# Generalized reproduction numbers and the prediction of patterns in waterborne disease

^{a}Dipartimento di Elettronica e Informazione, Politecnico di Milano, 20133 Milan, Italy;^{b}Laboratory of Ecohydrology, School of Architecture, Civil and Environmental Engineering, École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland;^{c}Department of Civil and Environmental Engineering, Princeton University, Princeton, NJ 08544; and^{d}Dipartimento di Ingegneria Civile, Edile ed Ambientale, Università di Padova, 35131 Padua, Italy

See allHide authors and affiliations

Contributed by Andrea Rinaldo, October 10, 2012 (sent for review July 14, 2012)

## Abstract

Understanding, predicting, and controlling outbreaks of waterborne diseases are crucial goals of public health policies, but pose challenging problems because infection patterns are influenced by spatial structure and temporal asynchrony. Although explicit spatial modeling is made possible by widespread data mapping of hydrology, transportation infrastructure, population distribution, and sanitation, the precise condition under which a waterborne disease epidemic can start in a spatially explicit setting is still lacking. Here we show that the requirement that all the local reproduction numbers be larger than unity is neither necessary nor sufficient for outbreaks to occur when local settlements are connected by networks of primary and secondary infection mechanisms. To determine onset conditions, we derive general analytical expressions for a reproduction matrix , explicitly accounting for spatial distributions of human settlements and pathogen transmission via hydrological and human mobility networks. At disease onset, a generalized reproduction number (the dominant eigenvalue of ) must be larger than unity. We also show that geographical outbreak patterns in complex environments are linked to the dominant eigenvector and to spectral properties of . Tests against data and computations for the 2010 Haiti and 2000 KwaZulu-Natal cholera outbreaks, as well as against computations for metapopulation networks, demonstrate that eigenvectors of provide a synthetic and effective tool for predicting the disease course in space and time. Networked connectivity models, describing the interplay between hydrology, epidemiology, and social behavior sustaining human mobility, thus prove to be key tools for emergency management of waterborne infections.

Waterborne diseases, mainly due to protozoa or bacteria and often resulting in profuse diarrhea (cholera is a prominent example), are one of the leading causes of death and especially strike infants and children in low-income countries (1). Therefore, it is fundamental to develop realistic dynamical models that can provide insights into the course of past and ongoing epidemics, assist in emergency management, and allocate health-care resources via an assessment of alternative intervention strategies (2⇓⇓⇓⇓–7). These models must properly account for the relevant disease and transport timescales (8⇓–10). Whereas some diarrheal infections, like Rotavirus, have basically a fecal–oral transmission and their spread can thus be modeled as susceptible-infected-recovered (SIR) systems (11), traditional waterborne disease models (12, 13) include, in addition to susceptibles (*S*) and infectives (*I*) (14), the population dynamics of bacteria (*B*) in water reservoirs. More recent modeling has considered hyperinfectivity of newly excreted vibrios (15), prey–predator interactions with phages (16), and seasonal and climate forcings (17⇓⇓⇓⇓–22). All these models, however, have not considered explicitly the spatial spread that, in waterborne diseases, occurs primarily along hydrological pathways, from coastal to inland regions or vice versa and from inland epidemic sites to neighboring areas. Integrating hydrology into epidemiological models is clearly of paramount importance and represents a recent accomplishment (5, 6, 9, 10). Moreover, infected individuals are often asymptomatic—as much as 75% for cholera (23) and 80% for amoebiasis caused by *Entamoeba histolytica* (24)—and thus can move and spread the disease to communities other than those where they were infected. Therefore, including human mobility—via either diffusion-based (25) or gravity-like models (2⇓⇓⇓–6, 26)—is mandatory. The widespread availability of georeferenced data regarding hydrological and transportation networks, human demography, water sanitation, and treatment centers distribution has facilitated the introduction of explicitly spatial models (3, 5). However, a systematic analysis of the conditions under which a waterborne disease epidemic can start within a specific territory is not yet available. Here, we determine the onset conditions and link them to explicit geographic descriptions of demographic, epidemiological, climatic, and socio-economic characteristics. From these conditions, spatiotemporal patterns of the infection are derived in a predictive manner.

## Model and the Onset Conditions

The basis of our analysis is a spatially explicit nonlinear differential model (*Materials and Methods*) that accounts for both the hydrological and the human mobility network (3, 6). The hydrological networks can be those of river basins, extracted from landscape topography (27), or those of human-made water distribution and sewage systems (or both). Depending on spatial resolution, network nodes can be cities, towns, or villages. In the *i*th community of size , with (*n* is the number of nodes) the state variables at time *t* are the local abundances of susceptibles, , infected/infective individuals, , and the concentration of pathogens in water reservoirs of volume . Infectives release pathogens at a site-dependent rate . Susceptibles are exposed to contaminated water at a rate and become infected according to a saturating function of (*K* = half-saturation constant). Connections between communities are described by two matrices (3, 6), (hydrologic network) and (human mobility) with . Pathogens die at a rate and move in water from node *i* to node *j* with a probability at a rate *l* depending on downstream advection and other water-mediated transport pathways (e.g, attachment to plankton). They are also spread by human mobility: individuals leave their home node *i* with probability for susceptibles and for infectives (usually ), reach their target *j* with probability , and then come back to *i*. Consistency requires and for any *i*. We assume that the union of - and -associated graphs is strongly connected; namely the infection can spread to any community along either network.

Disease onset is determined by the instability of the disease-free equilibrium (, for all *i*). If the communities were isolated (no hydrological or mobility connections), the onset condition in each community (13) would require that the local basic reproduction number

where *γ* is the recovery rate, and *μ* and *α* are, respectively, human baseline and disease-induced mortality rates. The connection with more traditional derivations of local values of for classic SIR models is reported in *SI Text*. Note that various versions of SIR-like local models would command different expressions of , accounting for parameters of the relevant mechanisms (e.g., ref. 28). For the spatially connected system, instead, onset in the metacommunity does not correspond to one or more local reproduction numbers .

Technically, our approach is similar to that of the next-generation matrix (NGM) (29⇓–31), which has been used for compartmental models rather than spatially explicit ones. Both methods are based on dominant eigenvalue analysis (32). More specifically, however, we use a bifurcation analysis (33) to determine how the eigenvalues of the Jacobian at vary with the model parameters (*Materials and Methods*). As the system is positive and is characterized by null values of and , the bifurcation can occur only via an exchange of stability; i.e., the disease-free equilibrium switches from stable node to saddle through a transcritical bifurcation (the Jacobian has one zero eigenvalue).

Define and introduce the diagonal matrices , , , , and (whose diagonals consist of parameters , , , , and with , respectively). is also diagonal with elements equal to ; thus, . The bifurcation corresponds to

where is the (-dimensional) identity matrix and

is a transmission matrix accounting for different probabilities of movement. , , and are matrices that correspond, respectively, to metacommunities with (*i*) infectives only, (*ii*) susceptibles only, or (*iii*) both infectives and susceptibles being mobile. thus depends on mobility through the terms (movement to a community), (movement from a community), and (movement to and from) (*Materials and Methods*).

Equivalent to Eq. **1** is that the dominant eigenvalue of the generalized reproduction matrix

equals unity. Our main result is therefore that the onset of the disease can be triggered whenever switches from being less than to being larger than 1; namely

is the sum of two matrices, one depending (linearly) on the hydrological matrix and the other (nonlinearly) on the human mobility matrix . Therefore, the two networks interplay in a complex manner to determine disease insurgence and spread.

The geographical distribution of disease insurgence can be linked to the dominant eigenvector of (*SI Text*). When is slightly larger than unity and no other eigenvalue has modulus , is a saddle and the dominant eigenvector corresponds to the unique unstable direction in the state space along which the system orbit will diverge from the equilibrium. Once the eigenvector is suitably projected onto the subspace of infectives and normalized, its *n* components are the relative proportions of the infectives in the *n* communities (*SI Text*). Whenever the dominant eigenvalue is sufficiently larger than one, there may be other eigenvalues of with modulus and more than one unstable direction of . However, after a short-term transient due to initial conditions, the disease will mainly propagate to the nodes corresponding to the largest components of the dominant eigenvector. These communities are those where the number of infectives will be highest during the onset and will thus act as the main foci of the disease.

## Case Studies

Our approach is applied to the cholera epidemic that struck Haiti in 2010 and is still ongoing (details in *SI Text*). We use Haiti epidemiological data from November 2010 to May 2011 (Fig. 1 *A* and *B*) and a model (based on a network of 366 hydrologic/demographic entities, *SI Text*) derived from the reassessment (6) of the disease unfolding. Attractivity of a certain destination site is supposed to be given by a constrained gravity model (34),

where is the distance between node *i* and node *j* and *D* is the average travel distance. Results of the analysis are reported in Fig. 1 *C–F*. In the Haitian case is practically insensitive to changes in pathogen movement rate *l* and average mobility distance *D* and to increases of the human mobility rate *m* (Fig. 1*D*), whereas it is quite sensitive to variations of the exposure and contamination rates *β* and *p*, of the pathogen mortality rate , and of the recovery rate (which is the largest component of parameter *ϕ*). Therefore, an effective way to prevent the onset of cholera would have been to implement sanitation measures to decrease the exposure (or contamination) rate by more than 40%. The alternative would have been to increase the rate of recovery by more than 50% via, e.g., the use of antibiotics, a measure that is not recommended, however, on a massive scale (35). The dominant eigenvector is a good indicator of the spatial distribution of recorded cases at both the coarse administrative level (10 departments, Fig. 1*C*) and the fine-grained scale (1-km^{2} population pixels, Fig. 1*E*), as demonstrated by the corresponding coefficients of determination (Fig. 1 legend). A sensitivity analysis has been run to determine how the predictive ability of the dominant eigenvector and the value of change with parameter variations. The coefficients of determination (Fig. 1*F*) exceed 75% even for variations as high as 50%, thus indicating robustness in the prediction of the spatial pattern.

The Haitian study was purely retrospective. We attempt a real-time predictive use for another monitored case of cholera epidemic (Thukela river basin in KwaZulu-Natal, South Africa, which was struck by cholera in 2000) (Fig. 2 *A* and *B*). The structure of the water network (287 nodes) is borrowed from a previous work (9). Model parameters have been estimated via Markov chain Monte Carlo sampling (36), using only the first weeks of epidemiological records as calibration data (Fig. 2*A* and *SI Text*). The results are reported in Fig. 2 *C–F*. The dominant eigenvector (associated to ) is a good indicator of the spatial patterns of disease spread (Fig. 2*C*), with coefficients of determination for disease onset (gray area in Fig. 2*A*) and for the epidemic course in 2000–2001. On the contrary, the local reproduction number in excess of one is neither necessary nor sufficient to pinpoint the ensuing epidemic in a spatially connected system. Fig. 2*D*, in fact, shows a map of the local reproduction numbers at the onset of the epidemic that can be contrasted with the spatial distribution of infection cases in excess of 10 (Fig. 2*E*). Strikingly, the comparison shows that several regions where still have disease because they are connected to other infected areas. Furthermore, a correlation analysis (details in *SI Text*) between the number of cases and the local reproduction numbers reveals that the distribution of has a much lower explanatory power than the dominant eigenvector of . Similar results hold for the correlation with the local population sizes and the fraction of households without piped water or toilet facilities, as well as for the related model parameters (exposure probabilities and contamination rates ; *SI Text*). Their coefficients of determination cannot compete with those resulting from the eigenvector analysis, thus suggesting that not only local living conditions but also pathogen transport (along waterways and induced by human mobility) must be properly taken into account to understand the formation of the spatial patterns of disease spread.

A sensitivity analysis of the Thukela model (Fig. 2*F*) shows that the dominant eigenvector is quite a robust spatial indicator of disease incidence. Performances are lower than for the Haiti case study, in which, however, model parameters were calibrated using the whole epidemiological dataset. The dominant eigenvector of , as calculated at the end of initial calibration, is not only a good indicator, but also a satisfactory predictor of the disease spatial distribution in the months following the calibration’s end. In fact, of the eigenvector components vs. the fractions of cumulative cases from the end of the calibration window to the epidemic peak is 0.91, whereas it is 0.84 if calculated against the cases from the end of calibration to epidemic fading.

## Metapopulation Networks

More insight derives from exploring complex theoretical landscapes with different characteristics of river basin network, population distribution, and human mobility. To this end, we construct abstract hydrological networks endowed with a realistic topology, the Peano networks (37⇓–39), which are deterministic fractals (40) whose scaling topology and metrics have been solved analytically (41) (*SI Text*). We study both the spatially homogeneous case (all the site-dependent parameters are identical and so are all the ) and the more realistic one in which the population of different communities is distributed according to Zipf’s law (42) (*SI Text*), mobility is gravity-like, and water volumes are proportional to . Fig. 3 *A* and *B* shows the parameter combinations corresponding to disease onset for the homogeneous case. The higher the values are of either *l* ( is the average residence time of pathogens in each node) or the downstream transport bias *b* (proportional to stream velocity; *SI Text*), the higher the local reproduction number must be for an epidemic to be possibly triggered (or triggered with high probability in the case of inhomogeneous population distributions, Fig. 3*C*). Conversely, for low values of *l* and *b* the disease can start even if all the are smaller than unity. We term this phenomenon a locally subthreshold epidemic as was done for compartmental models (43). In fact, when *l*, , , the local onset conditions () are neither necessary nor sufficient for disease outbreak. Human mobility can remarkably favor disease onset, especially if large population shares move (high *m*) over either intermediate (homogeneous population, Fig. 3*B*) or short (Zipf-like distribution, Fig. 3*D*) distances *D*. The analysis can be extended to different mobility models based on small-world and scale-free graphs (*SI Text*). The predictive ability of the dominant eigenvector against the spatial distribution of cases for both homogeneous and Zipf-like population distributions is shown in *SI Text*. It is very good at disease emergence and during the onset phase, whereas it usually decreases as the epidemic outbreak unfolds (homogeneous population distribution, for disease emergence; for the onset phase; for the whole epidemic course; and Zipf-like population distribution, , , ). This negative trend is especially apparent for the homogeneous case, whereas it is much less evident for Zipf-like population distributions, suggesting that population heterogeneity may play an important role in the definition of long-term spatial patterns of epidemic spread.

## Conclusion

We have derived an epidemiological matrix accounting for both hydrological transport and human mobility. Its dominant eigenvalue is the generalized reproduction number controlling the onset of waterborne disease whereas the dominant eigenvector well characterizes the geography of disease insurgence. Despite the simplicity of the model, the method not only is successful from a theoretical viewpoint, but also proves valid as a tool for analyzing real epidemics (Haiti and KwaZulu-Natal) and inspiring appropriate control measures and sanitation actions, which are the subject of ongoing debate (44, 45). In particular, the model is a good compromise between traditional space-homogeneous approaches (13) and individual-based simulation (5), thus coupling realism with a sufficiently simple structure that allows statistically significant tuning of a few relevant parameters. Spatially explicit data, now widely available, can be easily incorporated into the transmission matrices to describe real landscapes as networks of any complexity, ranging from a few to hundreds of nodes. Obviously, it should be remarked that the approach is valid to study just the onset conditions and cannot be extrapolated to draw conclusions about the long-term fate of waterborne disease. This requires the further inclusion of other fundamental determinants such as immunity loss (6) and seasonal and interannual variations of hydrological conditions (46).

The model can be generalized in many important ways. Including different age classes of human hosts (47), spatial heterogeneity of pathogen ecology (48, 49), and competition between pathogen strains (50) can add further realism to the methodology. Although not easy, expressions for the reproduction matrix can be obtained even for these cases. Our methodology, based on dominant eigenvalue and eigenvector analysis (32) coupled with georeferenced data assimilation, will provide, in our opinion, a valuable and reasonably simple approach for predicting the spatiotemporal epidemic course of waterborne disease outbreaks.

## Materials and Methods

### Local Epidemic Model.

Epidemiological dynamics of susceptibles and infected/infectives in the *i*th community, with , and transport of pathogens (represented by their concentrations ) over the networks are described by the following set of ordinary differential equations (*t* is time):

The evolution of susceptibles (first equation) is a balance between population demography and infections due to contact with a pathogen. The host population, if uninfected, is at demographic equilibrium (the size of the *i*th local community), with *μ* being the baseline mortality rate of humans. The parameter is the site-dependent rate of exposure to contaminated water, and (dose-response function) is the probability of becoming infected due to exposure to concentration of pathogens. In accordance with much literature on cholera (13), we use the hyperbolic and saturating function (*K* is the half-saturation constant). The dynamics of infectives (second equation) are a balance between newly infected individuals and losses due to recovery or natural/pathogen-induced mortality, with *γ* and *α* being recovery and disease-induced mortality rates, respectively. The dynamics of recovered individuals are neglected, because waterborne diseases confer at least temporary immunity and its loss neither determines conditions for disease onset nor affects its evolution in the immediate development following the epidemic peak. The dynamics of free-living pathogens concentration (third equation) assume that bacteria or protozoa are released in water (e.g., excreted) by infective individuals and immediately diluted in a well-mixed local water reservoir of volume at a site-dependent rate . Free-living pathogens are also assumed to die at a constant, site-independent rate . Regarding the hydrological transport, the spread of pathogens over the river network is described as a biased random-walk process on an oriented graph (9). Here, we assume that pathogens can move at a rate *l* from node *i* to node *j* of the hydrological network with a probability . The rate depends on both downstream advection and other possible pathogen transport pathways along the hydrological network, e.g., short-range distribution of water for consumption or irrigation or bacterial attachment to phyto- and zooplankton. Possible topological structures for the hydrological network range from simple one-dimensional lattices to realistic mathematical characterizations of existing river networks. The nodes of the human mobility network are assumed to correspond to those of the hydrological layer, whereas edges are defined by connections among communities. We also assume that susceptible and infective individuals can undertake short-term trips from the communities where they live toward other settlements. While traveling or commuting, susceptible individuals can be exposed to pathogens and return as infected carriers to the settlement where they usually live. Similarly, infected hosts can disseminate the disease away from their home community. In many cases infected individuals are asymptomatic and thus are not barred, or are only partially barred, from their usual activities by the presence of the pathogen in their intestine. Human mobility patterns are defined according to a connection matrix in which individuals leave their original node (say *i*) with an infection-dependent probability (respectively for susceptibles and for infectives, usually with ), reach their target location (say *j*) with a probability , and then come back to node *i*. Topological and transition probability structures for human mobility networks used in epidemiology can be based on suitable measures of node-to-node distance like in gravity models (34); on the actual transportation network; or on models based on conceptually different interactions, such as Erdős–Rényi random graphs (51), scale-free networks (52), small-world–like graphs (53), and radiation models (54).

### Derivation of Onset Conditions.

To analyze stability, we consider the Jacobian of the linearized system evaluated at the disease-free equilibrium _{0} (, for all *i*), which is given by

where

Note that the variables for pathogen have been scaled as . Because of its block-triangular structure, the Jacobian has obviously *n* eigenvalues equal to ; therefore, instability is determined by the eigenvalues of the block matrix

is a proper Metzler matrix (55); namely its off-diagonal entries are all nonnegative and at least one diagonal entry is negative. Thus its eigenvalue with maximal real part (dominant eigenvalue) is real. If the union of the graphs associated with matrices and is strongly connected, then the graph associated with is also strongly connected. Therefore one can apply the Perron–Frobenius theorem (56) for irreducible matrices and state that the dominant eigenvalue is a simple real root of the characteristic polynomial. The condition for the transcritical bifurcation of the disease-free equilibrium is that the dominant eigenvalue crosses the imaginary axis at zero; namely, the determinant of is zero (33). Actually, when the disease-free equilibrium is stable, all the eigenvalues have negative real parts and is positive because is a matrix of order . So the disease-free equilibrium becomes unstable when switches from positive to negative or equivalently the dominant eigenvalue becomes zero.

As obviously commutes with any matrix, we have (ref. 57 and *SI Text*)

Because , , and are diagonal, thus commuting, matrices, we can state that and rework the determinant of (*SI Text*). The condition is thus given by

In addition to the matrix we can now introduce three other matrices of reproduction numbers; namely,

corresponding to metacommunities with infectives only being mobile, susceptibles only being mobile, and both infectives and susceptibles being mobile, respectively. If we account for the different probabilities of movement in the metacommunity, we can define a transmission matrix averaged over nonmobile individuals, mobile infectives, and mobile susceptibles as

Therefore, the bifurcation of the disease-free equilibrium corresponds to the condition

Equivalently, the dominant eigenvalue of the matrix

which is a convex combination of and , must equal unity. Actually, the disease-free equilibrium switches from being stable to being a saddle, thus triggering the start of the disease, whenever the dominant eigenvalue of switches from positive to negative and hence whenever switches from being less than 1 to being larger than 1.

## Acknowledgments

The authors thank Paolo Gatto for technical suggestions on numerical simulations. L.M., E.B., L.R., and A.R. acknowledge the support provided by the European Research Council (ERC) advanced grant program through the project “River networks as ecological corridors for biodiversity, populations and waterborne disease” (RINEC-227612) and by the Swiss National Science Foundation (SNF/FNS) projects 200021_124930/1 and CR2312_138104/1. M.G. and A.R. acknowledge the support from the SFN/FNS project IZK0Z2_139537/1 for international cooperation. I.R.-I. acknowledges the support of the James S. McDonnell Foundation through a grant for Studying Complex Systems (220020138).

## Footnotes

- ↵
^{1}To whom correspondence should be addressed. E-mail: andrea.rinaldo{at}epfl.ch.

Author contributions: M.G., L.M., E.B., R.C., I.R.-I., and A.R. designed research; M.G., L.M., E.B., R.C., L.R., and A.R. performed research; M.G., L.M., E.B., R.C., L.R., and A.R. analyzed data; and M.G., L.M., I.R.-I., and A.R. wrote the paper.

The authors declare no conflict of interest.

This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1217567109/-/DCSupplemental.

Freely available online through the PNAS open access option.

## References

- ↵
- World Health Organization

- ↵
- ↵
- Mari L,
- et al.

- ↵
- ↵
- Chao DL,
- Halloran ME,
- Longini IM Jr.

- ↵
- Rinaldo A,
- et al.

- ↵
- Mukandavire Z,
- et al.

- ↵
- Rohani P,
- Earn DJ,
- Grenfell BT

- ↵
- ↵
- Bertuzzo E,
- Casagrandi R,
- Gatto M,
- Rodriguez-Iturbe I,
- Rinaldo A

- ↵
- Pitzer VE,
- et al.

- ↵
- ↵
- ↵
- Anderson R,
- May R

- ↵
- ↵
- Jensen MA,
- Faruque SM,
- Mekalanos JJ,
- Levin BR

- ↵
- Colwell RR

- ↵
- ↵
- Lipp EK,
- Huq A,
- Colwell RR

- ↵
- ↵
- Reiner RC Jr.,
- et al.

- ↵
- ↵
- World Health Organization

*Prevention and Control of Cholera Outbreaks: WHO Policy and Recommendations*, Technical Report (WHO Global Task Force on Cholera Control, Geneva, Switzerland). - ↵
- ↵
- ↵
- ↵
- Rodriguez-Iturbe I,
- Rinaldo A

- ↵
- ↵
- Diekmann O,
- Heesterbeek JAP,
- Roberts MG

- ↵
- Roberts MG,
- Heesterbeek JAP

- ↵
- ↵
- ↵
- Kuznetsov Y

- ↵
- Erlander S,
- Stewart NF

- ↵
- ↵
- Gilks WR,
- Richardson S,
- Spiegelharter DJ

- ↵
- ↵
- ↵
- ↵
- Mandelbrot B

- ↵
- ↵
- Zipf GK

- ↵
- ↵
- ↵
- ↵
- ↵
- ↵
- ↵
- ↵
- Faruque SM,
- et al.

*Vibrio cholerae*strains carrying genetic variants of the toxin-coregulated pilus pathogenicity island. Infect Immun 71(2):1020–1025. - ↵
- Erdős P,
- Rényi A

- ↵
- Albert R,
- Barabási A-L

- ↵
- ↵
- ↵
- Farina L,
- Rinaldi S

- ↵
- Gantmacher F

- ↵

## Citation Manager Formats

## Article Classifications

- Biological Sciences
- Ecology

- Physical Sciences
- Environmental Sciences