Modeling dynamic and network heterogeneities in the spread of sexually transmitted diseases

Edited by Robert May, University of Oxford, Oxford, United Kingdom, and approved August 5, 2002 (received for review April 24, 2002)
Abstract
A wide range of communicable human diseases can be considered as spreading through a network of possible transmission routes. The implied network structure is vital in determining disease dynamics, especially when the average number of connections per individual is small as is the case for many sexually transmitted diseases (STDs). Here we develop an intuitive mathematical framework to deal with the heterogeneities implicit within contact networks and those that arise because of the infection process. These models are compared with full stochastic simulations and show excellent agreement across a wide range of parameters. We show how such models can be used to estimate parameters of epidemiological importance, and how they can be extended to examine the effectiveness of various control strategies, in particular screening programs and contact tracing.
Sexually transmitted diseases (STDs) are an important and increasing public-health problem throughout the world, causing widespread mortality and morbidity and having immense social and economic consequences (1–4). The progress and dynamics of these infections within a population must be understood so that treatment and interventions can be most effectively targeted. Here we present a mechanism to predict the spread of infection through a network of contacts. This work bridges the gap between full simulations, which are difficult to parameterize, and risk-structured mean-field models, which ignore the strong effects attributable to partnerships.
The localized spread of infectious diseases has been successfully captured with a variety of spatial models (5–10). Such approaches recognize that the predominantly local nature of disease transmission leads to a high degree of spatial segregation and means that the population is not uniformly well mixed. An alternative and often more realistic strategy is to consider the spread of infection across a network of contacts. This approach is necessary for STDs because proximity in space is no longer the determining risk factor for transmission. The resulting sexual mixing network contains all of the information on potential pathways of infection and as such has proved vital in understanding STD spread (11–18).
The sexual mixing network focuses on one of the central issues of epidemiology: who can acquire infection from whom (19–22). Such networks represent each individual within a population as a node, with connecting edges denoting relationships that could lead to the transmission of infection (Fig. 1). For many of the common airborne infections it may be difficult to define which contacts form an edge (23); for STDs, however, edges are more precisely defined and correspond to sexual partnerships. We note that the network is disease dependent, with STD networks contained within the networks of more readily transmitted infections.
The initial spread and long-term behavior of any infectious disease are determined by both its epidemiological characteristics and the graph-theoretical properties of the network, such as the average number of neighbors, degree of clustering, and the path length between nodes (24–26). One of the important dynamical features of infection taking place within the constraints of a network is the rapid buildup of correlations in the infection status of connected individuals (27–29); for example, most infected nodes have infected neighbors, by whom they were infected or to whom they have transmitted infection. This aggregation reduces the average number of susceptible partners per infected individual and consequently slows the propagation of an epidemic. Standard epidemiological models (19) ignore this powerful correlation; this simplification has the greatest impact when the number of neighbors is low, which is generally the case for STDs in the vast majority of the population.
Because of the large number of individuals involved, the personal nature of the contact information, and the biases that are frequently present, it is very difficult to accurately reconstruct the entire network of sexual contacts (30–32). However, work carried out by genitourinary medicine (GUM) clinics provides some data about the local properties of the network. To limit further spread, the GUM clinic attempts to trace the sexual partners of any index case or subsequently discovered infectious individual. Ideally, we would like to reconstruct the entire mixing network by using this or similar approaches (11–14, 33, 34), but in practice this is rarely possible as the chains of transmission detected are seldom more than a few people long (31, 35). This limitation identifies the need for a modeling approach that utilizes the available detailed information about the network characteristics, but does not require the complete network to be reconstructed.
We present a model formulation that utilizes the essential characteristics of the mixing network. It captures both the buildup of correlations within the population and the effects of heterogeneities already present in the network structure. This capability is achieved by modeling partnerships as dynamic variables, developing a set of differential equations for the various types of connected pairs within the network. This is a significant step toward a more individual-based understanding of the dynamics of STDs; unlike computer simulations of the full network, these models are easily parameterizable through the use of readily attainable contact tracing data, and they retain a high degree of generality.
The next section describes the basic formulation of the pairwise network equations and introduces a simplifying approximation that dramatically eases computation. The section after that tests these new equations and other standard approaches for modeling STDs against full stochastic simulations of disease transmission on computer-generated networks, which can provide complete and comprehensive information. Finally, we show how our model can be used to estimate important individual-level epidemiological parameters from available data and to determine the effects of both random screening and contact tracing, which cannot be achieved with the usual model formulation.
Pair-Wise Network Model
Standard models for the dynamics of diseases classify individuals in terms of their infection history. In general, such models consider the proportion of individuals in each class and ignore network or spatial structure. Commonly, individuals can be in one of three states; they are susceptible (S) and can catch the disease, infectious (I) and can spread the disease, or recovered (R) and immune (19). For STDs there is generally little or no immunity, so individuals return to the susceptible state on recovery; therefore we restrict our focus primarily to such SIS dynamics.
Model Formulation.
One obvious risk factor for contracting an STD is the number of sexual partners (in network terms, the number of neighbors) (16, 36). However, the risk also depends on the characteristics of the neighbors, in particular the likelihood that they are infected. Hence, to model STD dynamics accurately it is necessary to know not only the disease status and activity level of each individual as included in risk-structured models (19) but also the status of their sexual partners; thus information about partnerships is vitally important (20–22, 24, 37–39).
The model developed here considers partnerships as the fundamental variables (40–43), modeling the edges on the mixing network rather than just the nodes. Only by treating partnerships within their network context can the correlations between neighboring individuals be captured. This model extends the recent work on pairwise models (27–29, 44, 45) by introducing heterogeneities into the network structure (46). In this model all partnerships are considered to be active and hence concurrent (26, 40, 43), which may be a realistic approximation for highly promiscuous subpopulations although not for a general population over a long time period. Using the standard compartmental approach and following the notation of Keeling et al. (27), we label individuals as S or I, and include superscripts to denote their number of partners in the network. For example, [I^{n}] denotes the number of infected individuals with n partners, and [S^{n}I^{m}] is the number of partnerships between a susceptible with n partners and an infected with m (Table 1 gives the notation used in more detail). It is only through such partnerships, between a susceptible and an infected individual, that infection can be transmitted.
Considering the dynamics of infectious individuals, two basic events can occur: recovery, which is assumed to occur at rate g, or infection of a susceptible individual by an infectious partner. The latter leads to the inclusion of partnership information in an equation for individuals:

$$\frac{d[I^{n}]}{dt} = \tau \sum_{m} [S^{n}I^{m}] - g[I^{n}] \tag{1}$$

In more detail, the term ∑_{m} [S^{n}I^{m}] refers to the total number of infected partners of all S^{n}, each of whom transmits infection at rate τ. If we make the standard assumption and ignore partnerships but use the contact data to estimate mixing between classes, we could approximate this term by using Eq. 2. Although this equation includes risk-structured heterogeneities, such a mean-field approach ignores the correlations in infection status that emerge between connected individuals. More accurately, a further set of equations can be constructed to model the dynamics of pairs, for example,

$$\frac{d[S^{n}I^{m}]}{dt} = \tau \sum_{p} [S^{n}S^{m}I^{p}] - \tau \sum_{p} [I^{p}S^{n}I^{m}] - \tau [S^{n}I^{m}] - g[S^{n}I^{m}] + g[I^{n}I^{m}] \tag{3}$$

the last term being absent in the SIR model. The terms in Eq. 3 refer to creation of the [S^{n}I^{m}] pair caused by infection of an S^{m} within an [S^{n}S^{m}] pair, loss of the pair caused by infection of the S^{n} either from outside or from within the partnership, loss of the pair because of recovery of the infected individual, or in the SIS framework creation of the pair attributable to an I^{n} recovering. In a similar manner, by considering the possible events that can occur we construct equations for all types of pairs (see Appendix A). This process could in theory be extrapolated, modeling triples, such as [S^{n}S^{m}I^{p}], in terms of quadruples and so on; however, not only does the system rapidly become more complicated but the amount of data available to characterize triples is limited. We therefore make a moment closure approximation (27–29, 47), estimating the number of triples in terms of the number of pairs (see Appendix A), which closes the system, enabling us to calculate the behavior of individuals and pairs.
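To make the mean-field alternative concrete, the following is a minimal sketch of a risk-structured mean-field SIS model. It assumes that the pair term is estimated as [S^{n}I^{m}] ≈ [nm]·([S^{n}]/[n])·([I^{m}]/[m]), where [n] is the number of individuals with n partners and [nm] the number of n–m partnerships; this specific form, the two-class population, the parameter values, and the forward-Euler integrator are all illustrative assumptions rather than the authors' implementation.

```python
import numpy as np

def mean_field_rhs(I, N, pairs, tau, g):
    """d[I^n]/dt for every class n, with the pair term replaced by a mean-field estimate."""
    S = N - I
    # Assumed closure: [S^n I^m] ~ [nm] * ([S^n]/[n]) * ([I^m]/[m])
    SI = pairs * np.outer(S / N, I / N)
    return tau * SI.sum(axis=1) - g * I

def integrate_mean_field(I0, N, pairs, tau, g, dt=0.01, t_max=200.0):
    """Forward-Euler integration (chosen for brevity, not accuracy)."""
    I = np.array(I0, dtype=float)
    for _ in range(int(round(t_max / dt))):
        I = I + dt * mean_field_rhs(I, N, pairs, tau, g)
    return I

# Hypothetical population: 800 individuals with 2 partners, 200 with 4 partners.
degrees = np.array([2.0, 4.0])
N = np.array([800.0, 200.0])
ends = degrees * N                          # partnership "ends" contributed by each class
pairs = np.outer(ends, ends) / ends.sum()   # [nm] if partnerships form at random

I_eq = integrate_mean_field([1.0, 1.0], N, pairs, tau=0.5, g=1.0)
print("equilibrium prevalence per class:", I_eq / N)
```

Because the estimate factorizes over S and I, it carries no information about whether the partners of an infected individual are themselves infected, which is the correlation the pair equations are designed to capture.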
A Refining Approximation.
For the pairwise network model above, the number of equations grows quadratically with the maximum number of neighbors allowed, and quickly becomes computationally intense. We thus look for a simplifying approximation that can reduce the number of the equations without losing the partnership representation or network structure aspects of the model.
From Eq. 1 we observe that the number of new infections is proportional to ∑_{m} [S^{n}I^{m}], and hence the network properties of the infected neighbors are irrelevant at this scale. This observation motivates us to ignore the full set of pairs and use variables of the form [A^{n}B] (= ∑_{m} [A^{n}B^{m}]) to capture the behavior (Appendix B and Appendix C, which is published as supporting information on the PNAS web site, www.pnas.org). The network structure (i.e., the number of [nm] pairs) still enters into the equations for pairs, hence much of the network heterogeneity is retained, but it is now treated as independent of the infection status of the individuals concerned. The size of this approximate set of equations grows only linearly with the maximum number of neighbors, making it far more computationally efficient.
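The reduction in the number of variables can be illustrated with a short sketch. The helper names and the example matrix are hypothetical, and the variable count is a rough one that ignores the symmetries between pair types.

```python
import numpy as np

def aggregate_pairs(pairs_nm):
    """[A^n B] = sum over m of [A^n B^m]; rows index n, columns index m."""
    return np.asarray(pairs_nm, dtype=float).sum(axis=1)

def count_pair_variables(n_classes, n_states=2):
    """Rough count of ordered pair variables, ignoring symmetries."""
    full = (n_states * n_classes) ** 2      # variables of the form [A^n B^m]
    reduced = n_states ** 2 * n_classes     # variables of the form [A^n B]
    return full, reduced

# Made-up [S^n I^m] counts for two classes of individuals.
SI = [[12.0, 5.0],
      [5.0, 9.0]]
print(aggregate_pairs(SI))          # [S^n I] for each class n
print(count_pair_variables(10))     # quadratic versus linear growth
```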
Comparison to Full Network Simulations
We now wish to test the accuracy of both the pairwise network model and the approximation in comparison to the more standard risk-structured models (which ignore correlations) and the k-regular pairwise model (which includes correlations, but ignores network structure by assuming all individuals have exactly k neighbors). We compare all four of these models with the results of a true stochastic infection process occurring on a fully connected computer-generated network (Fig. 1). This procedure has two advantages. First, the computer simulations and networks can be tightly controlled to simulate many different scenarios. Second, the simulations provide very precise and detailed information without any of the usual biases that may be present in real data.
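A stochastic SIS process on a network of this kind can be sketched with a standard Gillespie (event-driven) algorithm. The random-graph generator below is a simple stand-in and is not the procedure used to build the networks in Fig. 1; the function names and parameter values are likewise illustrative, and the sketch assumes g > 0.

```python
import random

def random_network(n_nodes, n_edges, rng):
    """Adjacency sets for a simple random graph (illustrative generator only)."""
    adj = [set() for _ in range(n_nodes)]
    added = 0
    while added < n_edges:
        a, b = rng.randrange(n_nodes), rng.randrange(n_nodes)
        if a != b and b not in adj[a]:
            adj[a].add(b)
            adj[b].add(a)
            added += 1
    return adj

def gillespie_sis(adj, tau, g, initial_infected, t_max, rng):
    """Return [(time, number infected)] for one realization of SIS dynamics."""
    infected = set(initial_infected)
    t = 0.0
    history = [(t, len(infected))]
    while infected and t < t_max:
        # All susceptible-infected partnerships, recomputed each step for clarity.
        si_edges = [(j, i) for i in infected for j in adj[i] if j not in infected]
        rate_inf = tau * len(si_edges)      # transmission at rate tau per S-I pair
        rate_rec = g * len(infected)        # recovery at rate g per infected node
        total = rate_inf + rate_rec
        t += rng.expovariate(total)
        if rng.random() * total < rate_inf:
            susceptible, _ = rng.choice(si_edges)
            infected.add(susceptible)
        else:
            infected.remove(rng.choice(sorted(infected)))
        history.append((t, len(infected)))
    return history

rng = random.Random(1)
adj = random_network(200, 300, rng)
history = gillespie_sis(adj, tau=1.0, g=1.0, initial_infected=range(10), t_max=20.0, rng=rng)
print("final time and number infected:", history[-1])
```

Recomputing the list of S–I partnerships at every event keeps the code short at the cost of speed; a larger simulation would update that list incrementally.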
Incidence of Infection.
Fig. 2a shows the numbers of infectious cases, comparing the four deterministic models with the results of typical stochastic simulations. Neither the mean-field nor the k-regular pairwise model satisfactorily predicts the equilibrium level of infection, whereas both the full and approximate pairwise models perform well. Thus both correlations and risk structure appear to be important. This finding is further demonstrated in Fig. 2b, which shows the equilibrium levels of infection over a range of the dimensionless infection parameter τ/g, which is the normalized infectivity across a partnership. Not only do the equilibrium levels vary greatly between models, but so does the persistence threshold at which the disease can invade.
For low levels of the infection parameter τ/g, the k-regular pairwise model underestimates incidence because it does not include the highly sexually active, highly connected core group vital to maintaining infection at low levels (14, 18, 21, 37, 48). In contrast, at very high parameter values (not shown in Fig. 2) this model overestimates prevalence because of a lack of poorly connected individuals that are shielded from disease. The mean-field model consistently overestimates the level of infection, because the correlations that naturally limit transmission are ignored. The failure of these two forms of model emphasizes the importance of integrating the pairwise and network approaches, such that both forms of heterogeneity are included.
The full pairwise model and the approximation both perform extremely well for a global network where sexual connections are made largely at random; however, for a more local network, where triangular connections are more frequent and path lengths between individuals are longer, these models produce slight overestimates. Although sexual networks are likely to be significantly clustered, it seems that there are sufficient long-range links such that spatial segregation is less influential than for many infections (11, 34, 49, 50). The recent work on small-world networks has indicated that even a low density of long-range connections can reduce the influence of such segregation (51, 52), and in such cases our model is likely to perform well.
R_{0} and Early Growth Rates.
The basic reproductive ratio, R_{0}, is defined as the average number of secondary cases produced by an average infectious individual in a totally susceptible population. When R_{0} is greater than 1 a disease can invade and increase within a virgin population, whereas when R_{0} is less than 1 any invasion is doomed to deterministic extinction (although stochastic effects can make a difference, especially close to the R_{0} = 1 boundary). Hence, R_{0} is a fundamental quantity in epidemiology and disease control (19, 53–55).
In practice R_{0} is calculated from the initial growth rate of an infinitesimal infection in an otherwise susceptible population. However, when the population is structured, the growth rate may depend on which class of individual is infected. We therefore allow the level of infection to equilibrate between the classes (such that high-risk individuals are more likely to be infected), before calculating the number of secondary cases. Correspondingly, for network models, R_{0} should be calculated only once the early spatial correlations (which develop within a couple of generations) have formed (29).
For the SIR version of the pairwise network model (Appendix A) the basic reproductive ratio is given by Eq. 4, where λ is the dominant eigenvalue of the matrix M defined in Eq. 5 (see Appendix D, which is published as supporting information on the PNAS web site). This matrix M is therefore a useful means of quantifying the connectedness of contact networks. We note that M is a modified version of the standard contact matrix given by [nm], which contains all of the information about the types of partnerships present in the network.
The strong correlations between the infection statuses of neighboring individuals play two roles. First, the negative correlation between susceptible and infectious individuals acts to damp the epidemic spread and therefore reduces R_{0}. Second, in standard mean-field models, which ignore partnerships and correlations, R_{0} is the same for both SIS and SIR formulations. However, for a pairwise version of the SIR epidemic, infectious individuals have a high proportion of recovered (and therefore immune) individuals in their neighborhood, which limits the spread of the disease. This limitation does not occur in the SIS formulation, and consequently epidemic growth is more rapid, as can be seen from Fig. 2c. R_{0} cannot be given in a simple closed form for the SIS case (see Appendix D in supporting information), but Eq. 4 provides a lower bound.
Fig. 2c shows how R_{0} and therefore the initial growth rates differ between the various models. Where there is no straightforward analytic solution, R_{0} is calculated from the early growth rate of the epidemic, once the local spatial structure has equilibrated. The importance of taking partnerships into account is underlined by the degree to which the mean-field model overestimates initial epidemic spread.
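Once the entries of M are available, its dominant eigenvalue λ can be obtained numerically. The sketch below uses a small made-up nonnegative matrix as a stand-in; its entries are purely illustrative and are not the matrix M of the model. For a nonnegative matrix the dominant eigenvalue is real, so taking the largest real part is sufficient.

```python
import numpy as np

def dominant_eigenvalue(M):
    """Largest eigenvalue of a nonnegative square matrix."""
    eigenvalues = np.linalg.eigvals(np.asarray(M, dtype=float))
    return float(np.max(eigenvalues.real))

# Hypothetical 2 x 2 stand-in for M (rows and columns index partner-number classes).
M_example = [[2.0, 1.0],
             [1.0, 2.0]]
lam = dominant_eigenvalue(M_example)
print("dominant eigenvalue:", lam)
```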
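Estimating an early growth rate from an incidence time series can be sketched as a least-squares fit to the logarithm of the case counts. This is a generic illustration on synthetic data, not the authors' procedure; it assumes strictly positive counts taken from the exponential phase, and it does not attempt the model-specific conversion from growth rate to R_{0}.

```python
import numpy as np

def early_growth_rate(times, cases):
    """Slope of log(cases) against time, fitted by least squares."""
    t = np.asarray(times, dtype=float)
    log_cases = np.log(np.asarray(cases, dtype=float))
    slope, _intercept = np.polyfit(t, log_cases, 1)
    return float(slope)

# Synthetic exponential-phase data with a known growth rate of 0.3 per unit time.
t = np.linspace(0.0, 10.0, 21)
synthetic_cases = 5.0 * np.exp(0.3 * t)
r = early_growth_rate(t, synthetic_cases)
print("estimated growth rate:", r)
```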
Core Groups.
Fig. 2d considers how infection is distributed between the classes; as expected, those individuals with more neighbors are more likely to be infected, and this bias becomes more pronounced as the infection level decreases (48). There is little substantial difference between the results of the two models and simulations, as the distribution across cases is dependent primarily on the risk-level heterogeneities, which are the same in all formulations. Both the full and mean-field models underestimate, to some extent, the degree to which infection is skewed toward higher activity classes, the mean-field model in particular failing to represent the sheltering that the network can provide for the less active groups. The heterogeneities between classes are therefore vital for understanding the persistence, spread, and invasion of STDs. A control policy focused on the high-risk individuals (those with large numbers of contacts), taking advantage of the heterogeneities present in the network, is likely to be more successful than one applied at random (19). The high-risk classes are even more important when public health agencies are close to eradicating the disease, as they can act as both a reservoir of infection and a potential invasion route for new infections. Hence as an intervention policy achieves success it becomes increasingly important to target those individuals most central to disease spread.
Applications to Sexual Network Data
To illustrate the utility of this approach, we shall apply it to a network of sexual relationships in Manitoba, Canada, shown in Wylie and Jolly (11). These network data are unusual in that they contain the sexual partnerships between 82 connected individuals together with the presence of chlamydia and gonorrhea infection, and as such represent the results of a large volume of research by many public health workers. We now consider the invasion parameters and control of these two diseases on such a highly connected subpopulation.
Parameter Estimation.
The information on partnerships was used to parameterize the mixing matrix used by pairwise and mean-field models. In contrast to the previous section, which assumed the same individual-level parameters for each model, here we use the level of prevalence to estimate the model-specific infection parameter, τ/g (Fig. 3a and Table 2). It is clear that the mean-field model requires lower values of the infection parameter to achieve the same levels of prevalence, as it ignores the damping effects of partnerships and hence correlations. By using these parameters, it is possible to calculate R_{0} in each case, providing a network-specific measure of the invasiveness of each disease (Table 2). The significantly higher value of R_{0} obtained by using the mean-field model illustrates the differences between the approaches, even when controlling for prevalence. (The k-regular model, in contrast, lacking the highly active individuals, shows a much slower growth of infection.) The significantly larger value of R_{0} calculated for the mean-field model implies that invasion occurs more slowly in practice than predicted by standard models. However, as shown below, this does not necessarily mean that the disease is easier to control.
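Fitting τ/g to an observed prevalence amounts to inverting a model's prevalence curve, which can be sketched with a bisection search. As a stand-in for the models compared in Fig. 3a, the example uses the textbook homogeneous mean-field SIS equilibrium, 1 − 1/(k·τ/g) for k partners per individual; the function names, the value of k, and the target prevalence are illustrative assumptions and are not taken from Table 2.

```python
def fit_infection_parameter(prevalence_model, target, lo, hi, tol=1e-9):
    """Bisection for the tau/g value at which the model prevalence equals the target.

    Assumes prevalence_model is nondecreasing in tau/g on [lo, hi].
    """
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if prevalence_model(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def homogeneous_mean_field_prevalence(ratio, k=3):
    """Equilibrium prevalence of a well-mixed SIS model with k partners each."""
    return max(0.0, 1.0 - 1.0 / (k * ratio))

# Hypothetical observed prevalence of 20%.
ratio_hat = fit_infection_parameter(homogeneous_mean_field_prevalence, 0.2, lo=0.01, hi=10.0)
print("estimated tau/g:", ratio_hat)
```

Any model whose equilibrium prevalence can be computed for a given τ/g, including a numerically integrated pairwise model, could be passed in place of the stand-in function.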
Control of STDs.
Attempts to combat any infectious disease take two main forms: screening and contact tracing. Screening programs act at the individual level by testing and treating a random sample of the population, or a subpopulation of high-risk individuals. Such programs lead to a more rapid detection and recovery of infected individuals and, as such, act in a similar manner to public awareness campaigns. By contrast, contact tracing focuses on the partners of infected individuals (once they have been identified) and hence utilizes the network structure associated with transmission. Here we compare these two strategies using the pairwise network equations and the parameters derived from Wylie and Jolly (11). Contact tracing cannot be realistically modeled by a mean-field approach because information on partnerships has been neglected.
The level of screening (or public awareness) is reflected by a reduction in the infection parameter (τ/g) relative to its estimated value; screening alone results in individuals recovering back into the susceptible class while their partners are unaffected. Contact tracing is modeled by additionally treating a proportion of the partners of an identified index case; this proportion is referred to as the tracing efficiency. In practice this procedure would follow the chains of transmission, but for simplicity we truncate the tracing at secondary cases (see Appendix E in supporting information on the PNAS web site). Fig. 3b shows the prevalence of infection as the levels of the two control measures vary. Clearly, the level of screening has to be sufficiently high so that index cases can be detected; contact tracing alone is unlikely to be effective. Contact tracing also has little effect when the incidence of infection is high. However, as screening/awareness improves and the incidence of infection decreases, contact tracing becomes a more efficient way of searching for infection. This observation suggests that a combined policy is generally the optimal approach to achieve eradication. We note that a more targeted screening campaign, focusing on high-risk individuals, is likely to be far more successful.
Discussion
Sexual mixing networks demonstrate the interlinked pattern of sexual interactions, and are conceptually important in modeling STDs. The clearly defined nature of sexual partnerships and the restricted number of partners of each individual mean that the population cannot be considered well mixed; rather, infection status in neighboring individuals is highly correlated. For both STDs and other infections the accurate determination of the complete networks is rarely feasible, but local information such as the distribution of number of partners is often available.
We have presented a model that requires only local information to produce a highly accurate description of the disease dynamics. This model treats the partnerships between individuals as its variables, and as such can include both the heterogeneities within the network structure and the heterogeneities that develop because of the disease dynamics. This model proved far more accurate than either the standard (mean-field) risk-structured models (20, 21) or the k-regular pairwise models (27–29), agreeing closely with stochastic simulation on a computer-generated network. This result shows that both heterogeneities in behavior and correlations at the partnership level must be included in any predictive scheme. The full pairwise model developed here, however, suffers from requiring a vast number of variables to capture the state of the network (the system is high-dimensional); this complexity motivated the formulation of an approximation, which performs as well as the full model, but has a much lower number of variables and is far more computationally efficient.
Having demonstrated the accuracy of this approach against computer simulations, we use some of the available network data (11) to consider some more applied aspects. The prevalence level of infection can be used to estimate the individual-level parameters for that subpopulation by using any given model. The pairwise model consistently predicts higher partner-to-partner transmission rates, but a lower basic reproductive ratio, R_{0}. However, because of the nonlinear behavior of R_{0} as the infection parameter changes (Fig. 2c) this does not simply translate into an extinction threshold. Instead, for random disease control the standard models underestimate the difficulty of eradication. The pairwise model can also be used to evaluate the impact of contact tracing, a form of targeted control that utilizes the network structure. This extension of the pairwise model shows that an integrated strategy using both random screening/prevention and contact tracing is likely to be the most successful, because contact tracing has the biggest effect when the prevalence of infection is already low. In contrast, standard mean-field models predict that the eradication threshold is not affected by contact tracing because they ignore the essential information on partnerships.
The modeling approach developed here does have two potential flaws, however. First, it is deterministic, essentially assuming a vast population size, and therefore cannot reproduce the variability seen in the stochastic simulations or the chance localized extinctions of infection that can occasionally occur. Second, the model loses some of its accuracy if the partnerships are formed predominantly with nearby individuals (i.e., when there is a strong spatial element in the network). Although sexual networks contain primarily local interactions with most partners residing in the same village, town, or city region, the presence of a significant proportion of out-of-town, long-range links (11, 34, 49, 50) may mean that they are more like small-world than exclusively local networks. Hence for real-world networks and for large populations the model is expected to perform well.
The complexities and complications of real sexual networks and STD transmission can never be fully captured by simple models. However, the approach shown here, which can accurately decompose the detailed structure of a mixing network into a system of relatively simple equations, has been demonstrated to provide a robust and reliable framework. The techniques and methodologies can be simply adapted to match a variety of disease dynamics and network characteristics, and as such provide a practical tool for understanding and predicting the presence, progression, and prevention of STDs.
Acknowledgments
We thank Bryan Grenfell, Graham Medley, and two anonymous referees for their helpful comments and suggestions. This research was supported by the Medical Research Council (K.T.D.E.) and the Royal Society (M.J.K.).
Appendix A: The Pair-Wise Network Equations
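Because infection can be transmitted only via edges in the mixing network, for the purposes of disease spread only partnerships consisting of an infected and a susceptible individual need be considered. Hence the full SIS model is described by the following set of equations:

$$
\begin{aligned}
\frac{d[S^{n}]}{dt} &= \{g[I^{n}]\} - \tau \sum_{m} [S^{n}I^{m}] \\
\frac{d[I^{n}]}{dt} &= \tau \sum_{m} [S^{n}I^{m}] - g[I^{n}] \\
\frac{d[S^{n}S^{m}]}{dt} &= -\tau \sum_{p} \left([S^{n}S^{m}I^{p}] + [I^{p}S^{n}S^{m}]\right) + \{g([S^{n}I^{m}] + [I^{n}S^{m}])\} \\
\frac{d[S^{n}I^{m}]}{dt} &= \tau \sum_{p} \left([S^{n}S^{m}I^{p}] - [I^{p}S^{n}I^{m}]\right) - \tau [S^{n}I^{m}] - g[S^{n}I^{m}] + \{g[I^{n}I^{m}]\} \\
\frac{d[I^{n}I^{m}]}{dt} &= \tau \sum_{p} \left([I^{p}S^{n}I^{m}] + [I^{n}S^{m}I^{p}]\right) + \tau \left([S^{n}I^{m}] + [I^{n}S^{m}]\right) - 2g[I^{n}I^{m}]
\end{aligned}
\tag{6}
$$

where the triples are evaluated by using the moment closure approximation:

$$[B^{n}C^{m}D^{p}] \approx \frac{m-1}{m}\,\frac{[B^{n}C^{m}]\,[C^{m}D^{p}]}{[C^{m}]} \tag{7}$$

This expression itself can be refined when there are triangles (loops containing three individuals) in the population to take into account the partnership between the B^{n} and the D^{p} (27, 29). However, for many STD networks, the proportion of triangles compared with triples is likely to be very low (in heterosexual networks triangles are absent), hence for simplicity these will be ignored.
Diseases with SIR-type dynamics follow a similar set of differential equations, with the terms in brackets ({·}) absent.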
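The structure of the pairwise SIS system can be sketched numerically as follows. The sketch assumes the standard pair-approximation closure [B^{n}C^{m}D^{p}] ≈ ((m−1)/m)·[B^{n}C^{m}][C^{m}D^{p}]/[C^{m}] with no triangles, counts each partnership once in each direction, and seeds infection at random; the two-class population, the parameter values, and the forward-Euler integrator are illustrative choices and not the authors' implementation.

```python
import numpy as np

def pairwise_sis_rhs(state, deg, tau, g):
    """Time derivatives of ([S^n], [I^n], [S^n S^m], [S^n I^m], [I^n I^m])."""
    S, I, SS, SI, II = state
    # Closure: sum_p [S^n S^m I^p] ~ [S^n S^m] * a[m], where
    # a[m] = (m - 1)/m * sum_p [S^m I^p] / [S^m]   (assumes every [S^m] > 0)
    a = (deg - 1.0) / deg * SI.sum(axis=1) / S
    a_n, a_m = a[:, None], a[None, :]       # a[n] and a[m] broadcast over [n, m]
    dS = g * I - tau * SI.sum(axis=1)
    dI = -dS
    dSS = -tau * SS * (a_n + a_m) + g * (SI + SI.T)
    dSI = tau * (SS * a_m - a_n * SI) - (tau + g) * SI + g * II
    dII = tau * (a_n * SI + a_m * SI.T) + tau * (SI + SI.T) - 2.0 * g * II
    return dS, dI, dSS, dSI, dII

def simulate_pairwise_sis(deg, N, pairs, tau, g, eps=0.01, dt=0.01, t_max=50.0):
    """Forward-Euler integration from a random seeding of a fraction eps."""
    deg, N, pairs = (np.asarray(x, dtype=float) for x in (deg, N, pairs))
    state = [(1 - eps) * N, eps * N,
             (1 - eps) ** 2 * pairs, (1 - eps) * eps * pairs, eps ** 2 * pairs]
    for _ in range(int(round(t_max / dt))):
        derivs = pairwise_sis_rhs(state, deg, tau, g)
        state = [x + dt * dx for x, dx in zip(state, derivs)]
    return state

# Hypothetical population: 800 individuals with 2 partners, 200 with 4 partners.
degrees = np.array([2.0, 4.0])
N = np.array([800.0, 200.0])
ends = degrees * N
pairs = np.outer(ends, ends) / ends.sum()   # [nm] if partnerships form at random

S, I, SS, SI, II = simulate_pairwise_sis(degrees, N, pairs, tau=2.0, g=1.0)
print("prevalence per class:", I / N)
```

A useful sanity check on any implementation is that [S^{n}] + [I^{n}] and [S^{n}S^{m}] + [S^{n}I^{m}] + [I^{n}S^{m}] + [I^{n}I^{m}] stay equal to the fixed network quantities [n] and [nm] throughout the integration.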
Appendix B: The Approximated Pair-Wise Equations
From the form of the individual equations, we have been motivated to use variables of the form [A^{n}B] = ∑_{m} [A^{n}B^{m}] instead of the full set of all possible pairs. This can be achieved by summing the full pairwise equations over all possible m values. Where necessary we use the approximation of Eq. 8, in which the first term takes into account the partnership types (finding an A^{n} next to a B and a B^{m} next to an A) and the second term accounts for the neighborhood structure (finding an n–m pair). This leads to a smaller but more complex set of equations (see Appendix C in supporting information).
Footnotes

‡ To whom reprint requests should be addressed. E-mail: m.j.keeling@warwick.ac.uk.

This paper was submitted directly (Track II) to the PNAS office.
Abbreviation
STDs, sexually transmitted diseases
 Received April 24, 2002.
 Copyright © 2002, The National Academy of Sciences
References
Joint United Nations Program on HIV/AIDS and World Health Organization
Centers for Disease Control and Prevention
PHLS, DHSS & PS, the Scottish ISD(D)5 Collaborative Group
Mollison D
Durrett R, Levin S A
Rhodes C J, Jensen H J, Anderson R M
Rohani P, Earn D J D, Grenfell B T
Keeling M J, Woolhouse M E J, Shaw D J, Matthews L, Chase-Topping M, Haydon D T, Cornell S J, Kappey J, Wilesmith J, Grenfell B T
Friedman S R, Neagius A, Jose B, Curtis R, Goldstein M, Ildefonso G, Rothenberg R B, Des Jarlais D C
Rothenberg R B, Potterat J J, Woodhouse D E
Anderson R M, May R M
Renton A, Whitaker L, Ison C, Wadsworth J, Harris J R W
Edmunds W J, O'Callaghan C J, Nokes D J
Keeling M J, Rand D A, Morris A J
Bauch C, Rand D A
Keeling M J
Ghani A C, Garnett G P
Fenton K A, Johnson A M, McManus S, Erens B
Kretzschmar M, van Duynhoven Y T H P, Severijnen A J
Garnett G P, Anderson R M
Aral S O
Boots M, Sasaki A
Wasserheit J N, Aral S O
Watts D J