Multiscale mobility networks and the spatial spreading of infectious diseases
 ^{a}Center for Complex Networks and Systems Research, School of Informatics and Computing, Indiana University, Bloomington, IN 47408;
 ^{b}Pervasive Technology Institute, Indiana University, Bloomington, IN 47404;
 ^{c}Computational Epidemiology Laboratory, Institute for Scientific Interchange Foundation, 10133 Torino, Italy; and
 ^{d}Department of Physics, Indiana University, Bloomington, IN 47406
See allHide authors and affiliations

Edited by H. Eugene Stanley, Boston University, Boston, MA, and approved October 13, 2009 (received for review June 19, 2009)
Abstract
Among the realistic ingredients to be considered in the computational modeling of infectious diseases, human mobility represents a crucial challenge both on the theoretical side and in view of the limited availability of empirical data. To study the interplay between shortscale commuting flows and longrange airline traffic in shaping the spatiotemporal pattern of a global epidemic we (i) analyze mobility data from 29 countries around the world and find a gravity model able to provide a global description of commuting patterns up to 300 kms and (ii) integrate in a worldwidestructured metapopulation epidemic model a timescaleseparation technique for evaluating the force of infection due to multiscale mobility processes in the disease dynamics. Commuting flows are found, on average, to be one order of magnitude larger than airline flows. However, their introduction into the worldwide model shows that the largescale pattern of the simulated epidemic exhibits only small variations with respect to the baseline case where only airline traffic is considered. The presence of shortrange mobility increases, however, the synchronization of subpopulations in close proximity and affects the epidemic behavior at the periphery of the airline transportation infrastructure. The present approach outlines the possibility for the definition of layered computational approaches where different modeling assumptions and granularities can be used consistently in a unifying multiscale framework.
Computational approaches to the realistic modeling of spatial epidemic spread make use of a wide array of simulation schemes (1) ranging from very detailed agentbased approaches (2–6) to structured metapopulation models based on datadriven mobility schemes at the interpopulation level (7–10). All these approaches integrate a wealth of realworld data. However, it is not yet clear how to discriminate the effects of the inclusion/lack of realworld features in specific models. This limitation is mainly related to our incomplete knowledge of human interactions and mobility processes, which are fundamental aspects to describe a disease spread. Although recent efforts started to make available massive data on human mobility from different sources and at different levels of description (11–20), the multiscale nature of human mobility is yet to be comprehensively explored. Human mobility can be generally described by defining a network of interacting communities where the connections and the corresponding intensity represent the flow of people among them (13, 14). Global mobility flows therefore form very complex multiscale networks (21) spanning several orders of magnitude in intensity and spatiotemporal scales ranging from the longrange intercontinental air traffic (13, 15) to the short range commuting flows (17–19). A multitude of heuristic models for population structure and mobility patterns have been proposed, but they all depend on the specific mobility process under consideration (22, 23). The limited understanding of the interrelations among the multiple scales entailed in human mobility and their impact on the definition of epidemic patterns constitute a major road block in the development of predictive largescale data driven epidemic models. In this context, two questions stand out: (i) Is there a most relevant mobility scale in the definition of the global epidemic pattern? and (ii) At which level of resolution of the epidemic behavior does a given mobility scale become relevant, and to what extent?
To begin addressing these questions, we use highresolution worldwide population data that allow for the definition of subpopulations according to a Voronoi decomposition of the world surface centered on the locations of International Air Transport Association (IATA)indexed airports (www.iata.org). We have then gathered data on the commuting patterns of 29 countries in five continents, constructing shortrange commuting networks for the defined subpopulations. Extensive analysis of these networks allows us to draw a general gravity law for commuting flows that reproduces commuting patterns worldwide. This law, valid at the scale defined by the tessellation process, is statistically stable across the world because of the globally homogeneous procedure applied to build the subpopulations around transportation hubs. The multiscale networks we obtain are integrated into the global epidemic and mobility (GLEaM) model, a computational platform that uses a metapopulation stochastic model on a global scale to simulate the largescale spreading of influenzalike illnesses (ILI). To fully consider the effect of multiscale mobility processes in the disease dynamics, we develop a timescaleseparation technique for evaluating the force of infection due to different mobility couplings and simulate global pandemics with tunable reproductive ratios. The results obtained from the full multiscale mobility network are compared with the simulations in which only the largescale coupling of the airline transportation network is included. Our analysis shows that although commuting flows are, on average, one order of magnitude larger than the longrange airline traffic, the global spatiotemporal patterns of diseasespreading are mainly determined by the airline network. Shortrange commuting interactions have, on the other hand, a role in defining a larger degree of synchronization of nearby subpopulations and specific regions, which can be considered weakly connected by the airline transportation system. In particular, it is possible to show that shortrange mobility has an impact in the definition of the subpopulation infection hierarchy. The techniques developed here allow for an initial understanding of the level of data integration required to obtain reliable results in largescale modeling of infectious diseases.
Results and Discussion
Simulations of worldwide epidemic spread are generally based on structured metapopulation models that consider datadriven schemes for longrange mobility at the interpopulation level coupled with coarsegrained techniques within each subpopulation (7–10, 24–26). In this paper, we use the GLEaM computational scheme based on a georeferenced metapopulation approach. The model consists of three data layers. The population and mobility layers allows the partition of the world into geographical census regions coupled by population movements. This partition defines the subpopulation network where the connections between subpopulations represent the fluxes of individuals due to the transportation infrastructures and mobility patterns. Superimposed on this subpopulation network is the epidemic layer that defines inside each subpopulation the disease dynamics that depends on the specific etiology of the disease considered (see Material and Methods).
Multiscale Mobility Networks.
The basic structure of GLEaM is based on highresolution population data^{†} that estimates the population with a resolution given by cells of 15 × 15 minutes of arc, covering the whole planet. This population data allows the construction of Voronoi tassels around transportation hubs in the world, defining the subpopulations structure of the metapopulation model (see SI Appendix). In particular, we identify 3,362 subpopulations centered around IATA airports in 220 different countries. The airtraffic network among the defined subpopulations is obtained from the IATA databases that contain the list of worldwide airport pairs connected by direct flights and the number of available seats on any given connection. The high level of geographical resolution of the subpopulation database enables us to integrate also the mobility flows due to commuting patterns between subpopulations (see Material and Methods) and construct the corresponding commuting network. The main difficulty in defining a commuting network worldwide is the lack of a global database as opposed to the case of the airtraffic flow. Data are scattered in different national and international databases that use different administrative and geographical granularities, and several definitions of commuting flows. We have collected commuting data from 29 countries (a full list of countries and the database properties are reported in the SI Appendix) in five different continents. Each dataset was mapped into the GLEaM Voronoi tessellation constructing the commuting networks at the subpopulation level.
In Fig. 1, we show the commuting network of the continental U.S. as obtained by mapping the county commuting data onto the subpopulations used by GLEaM. Commuting data do not consider airline flows that are accounted for by the IATA dataset. On the same scale, we also report the airline traffic network, readily highlighting the difference in scale and spatial structure of the two networks. The commuting network appears as an almost gridlike lattice connecting neighboring subpopulations, whereas the airline traffic network is dominated by long range connections. The wide range of scales is evident also in the intensities of the mobility flows, spanning several orders of magnitude, with the average commuting flow being one order of magnitude larger than the average airline traffic flow. Finally, it should be noted that, in general, commuting flows refer to round trip processes with a characteristic time of the order of 1/3 day (average duration of a work day) compared with much longer characteristic times for airline travel (average value around two weeks at the end.^{‡}
To gain general insight on the commuting flow, we use the general gravity model from transportation theory (22, 23) as a starting point. This model assumes that the commuting flow w_{ij} between subpopulation i (with population N_{i}) and subpopulation j (with population N_{j}) takes on the form: where C is a proportionality constant, α and γ tune the dependence with respect to each subpopulation size, and f(d_{ij}) is a distancedependent functional form. Gravity laws usually consider power or exponential laws for the behavior of f(d_{ij}). The results reported in the literature are variable and generally depend on the way the subpopulations are defined. In our case, we can take advantage of the statistical similarity of the subpopulations centered on major transportation hubs. We tested the gravity law by the statistical analysis of more than 10^{4} flows worldwide, and we found that the best fit is obtained by using an exponential function f(d_{ij}) = exp(d_{ij}/r), where r is the characteristic length that governs the decay of commuting flows. In Table 1, we report the estimated values for the exponents α, γ, and r.
Noticeably, we can validate the gravity law at the level of geographical area or country as shown in Fig. 1, where the commuting flows obtained from the data are compared with the synthetic ones predicted by the model, as functions of the various variables of the gravity law. In the SI Appendix, we report similar tests at the level of geographical areas and countries, that confirm the generality and goodness of the obtained law. It is worth remarking that although the present gravity law works well at the granularity defined by our Voronoi decomposition, these results cannot be extrapolated to different granularities. For instance, we notice that the exponents obtained by our approach are quite close to those obtained by Viboud et al. (17), although the spatial decay has a completely different form. We do not see any obvious reason for the observed scaling form, but it must be noted that we are working with subpopulations, which are by construction statistically more homogeneous and of larger size than the county level used in ref. 17. Administrative regions might indeed impose boundaries that define subpopulations not clearly associated to centers of gravity for mobility processes, as e.g., large urban areas cut in multiple counties. In general, both approaches are usually tested in transportation theory (as in ref. 19), and in some cases the exponential function was found to better fit migration phenomena (27).
The above gravity law allows us to work with two different worldwide commuting networks. An entirely synthetic one, generated by using the gravity law fitted to the empirical data, and one integrating the empirical data. The synthetic network considers only neighboring subpopulations. We have empirically observed that secondneighbor interactions are on average one order of magnitude smaller than the nearestneighbor interactions and their contribution can be neglected in the firstorder estimation of the effective force of infection. In the following, we will report the results obtained only with the synthetic network and we leave to the SI Appendix a demonstration that no significant differences are observed when we compare the results obtained by using the synthetic and the real data networks.
Epidemic Simulations.
To study the effect that commuting networks have on the overall spread of an emerging disease, we consider the simulation of an ILI and compare the results with a simulation in which we include only airline traffic as in previous works (10). For the sake of clarity, in the following we compare the results for the simulations of a hypothetical pandemic influenza with R_{0} = 1.9 starting in Hanoi on April 1 (the SI Appendix also reports results for October 1). The model includes a seasonal dependence of the transmission. In the SI Appendix file, we report simulations of the 2001–2002 seasonal influenza timeline and compare those with the real surveillance data. The simulations of the realistic model and comparison with real data confirm the analysis presented here for the synthetic pandemic model.
GLEaM is fully stochastic and takes into account the discrete nature of individuals both in the travel coupling and in the compartmental transitions. The transmission model within each urban area follows a compartmentalization specific to the disease under study. Here we use the classic ILI compartmentalization in which each individual is classified by one of the discrete states such as susceptible (S), latent (L), infectious (I), and permanently recovered (R). Infectious persons are further subdivided into asymptomatic, symptomatic traveling, and symptomatic nontraveling as detailed in Material and Methods. The discrete nature of individuals is implemented by introducing binomial and multinomial processes inside each urban area for the stochastic evolution of the infection. A detailed exposition of the stochastic approach to disease evolution is provided in the SI Appendix.
In GLEaM, the airline mobility is integrated explicitly as individuals are allowed to travel from one subpopulation to another by means of the airline transportation network (28) similarly to the models in refs. 7 and 8 and the stochastic generalizations of ref. 9. In each subpopulation j, the number of individuals is N_{j}(t) and X_{j}^{[m]}(t) is the number of individuals in compartment [m] of the disease evolution at time t; therefore N_{j}(t) = Σ_{m} X_{j}^{[m]}(t). The dynamics of individuals traveling between cities is described by the stochastic transport operator Ω_{j} ({X^{[m]}}) representing the net balance of individuals in a given state [m] that entered or left each city j. This operator is a function of the traffic flows per unit time ω_{jℓ} with the neighboring cities and of the city population N_{j}. In particular, the number of passengers of each category traveling from city j to city ℓ is an integer random variable, in that each of the potential travelers has a probability p_{jℓ} = ω_{jℓ} δt/N_{j} to go from j to ℓ in the time interval δt. In city j, the number of passengers traveling on each connection (j,ℓ) at time t defines a set of stochastic variables that follow a multinomial distribution (10). The calculation can be extended to include transit traffic up to one connecting flight (29).
The introduction of commuting flows represents a numerical challenge as it acts at a very different time scale with individuals that have very short visit duration in the neighboring subpopulation. Although the airline traffic finds a natural time scale of one day, an equivalent mechanistic simulation of the commuting mobility would require working on a much smaller time scale hardly compatible with the airline traffic. This problem can be solved by relying on the results of refs. 30 and 31. Commuting flows govern subpopulation interactions by defining the visiting rate of an individual in subpopulation i to subpopulation j as σ_{ij} = w_{ij}/N_{i}. Visiting individuals have a very short visit time associated to high return rates, τ, to their original subpopulations (τ ≥ 3 day^{−1}), corresponding to σ_{ij} ≪ τ for all of the populations. It can be shown that the system is then described by stationary quantities X_{ij}^{[m]} in each compartment [m] for time scales larger than τ^{−1} (the time scale governing the relaxation time to equilibrium across subpopulations) in which X_{ij}^{[m]} is defined as the number of visitors currently in j coming from subpopulation i in compartment [m]. When the disease duration is significantly larger than τ^{−1}, as is the case for the ILI with characteristic time μ^{−1} ≃ 3 days, we can use a time scale separation approximation in which at each time step, the force of infection experienced by susceptible individuals in each subpopulation is a function of the stationary values I_{ii} and I_{ji}. The full force of infection used in the model is reported in the Material and Methods section, and a full derivation of the time separation approximation is reported in the SI Appendix.
Fig. 2 shows that the global and regional timing and size of the epidemic is weakly affected by also considering the commuting network in GLEaM. Both the probability of having a global outbreak and the overall profiles are very similar in the two cases, with the global epidemic size at the end of the first year almost unaffected by the inclusion of commuting. We perform a sensitivity analysis by testing the outset variation of the intensity of the commuting flows and varying the return time rate τ of more than one order of magnitude. In the SI Appendix, we show that the profiles do not show significant variations, the results being very robust against strong fluctuations in the commuting mobility process. The effect of commuting flows is, however, noticeable during the tail of the epidemic event. As presented in Fig. 2, many regions of the world show a broader tail in the absence of commuting, showing that the commuting coupling enhances the synchronization of the local epidemic profiles. The observed broadening of an epidemic profile that includes multiple subpopulations is due to the different timing of the outbreak that reaches the various subpopulations. The effect is more pronounced in the lack of short range coupling, as highlighted in the example reported in Fig. 3D and E of an air transportation hub loosely connected by air travel flow to the surrounding subpopulations. As expected, no significant change is observed in the hub profile, whereas the time delay in neighboring locations with limited airline connections is dramatically reduced by the coupling due to local commuting flows. After infecting the hub, the epidemic radiates out to the neighboring geographical census areas in a pattern reminiscent of the physical process of diffusion. This effect naturally leads to a much stronger correlation and synchrony in the evolution of the pandemic at the local level. In the SI Appendix, we present the model informed with the realistic parameters and initial conditions of 2001–2002 seasonal influenza season and compare the obtained epidemic pattern with the real data. The results confirm the synchronization effect and the better agreement with the actual data at the regional level when commuting flows are introduced in the model.
Commuting flows therefore alter the hierarchy of epidemic transmission from region to region. This hierarchical organization can be inferred by constructing the epidemic invasion tree that represents the transmission of the infection from one subpopulation to the other during the history of the epidemic. The stochastic nature of the epidemic process implies that each realization will produce a different tree. An overall epidemic invasion network can be constructed by defining weighted, directed links, T_{ij}, that denote the probability that the epidemic in subpopulation j is seeded by individuals belonging to the subpopulation i. This probability is defined by the ratio between the number of realizations in which we have a seeding i → j and the total number of realizations. When constructing the epidemic invasion tree we use averages over 10^{3} realizations. Finally, to highlight only the most likely infection tree, we construct the minimum spanning tree from the worldseeding subpopulation where we minimize the distance defined on each link as
Conclusions
Data collected from 29 countries in five continents were used to fit a gravity law that was then used to model commuting behavior between the Voronoi geographical census areas built around every airport indexed by IATA. The effect of adding this shortrange commuting network to a worldwide epidemic model including all airline traffic flowing among 3,362 airport locations allows us to discriminate the main contribution of the long and shortrange mobility flows. The impact of the epidemic does not change as the competition between the long and shortrange coupling acts only at the beginning of the epidemic in each subpopulation. Both coupling terms become a secondorder effect once the epidemic ramps up and the major force of infection is endogenous to the subpopulation. Therefore, both coupling mechanisms affect just the hierarchy of epidemic progression and its timing. On the one hand, the global epidemic behavior is governed by the longrange airline traffic that determines the arrival of infectious individuals on a worldwide scale. At the local level, however, the shortrange epidemic coupling induced by commuting flows creates a synchrony between neighboring regions and a local diffusive pattern with the epidemic flowing from subpopulations with major hubs into the neighboring subpopulations. These results clearly show that the level of detail on the mobility networks can be chosen according to the scale of interest. Neglecting local coupling for instance does not produce a dramatic effect if one is mainly interested in the global overall pattern at the granularity level of a large geographical area or country. On the other hand, more refined strategies that require access to finer granularity can be implemented by the progressive addition of details without radically altering the perspective achieved at the larger scales. This is extremely important in the balance between computational time and flexibility of models and becomes very relevant when computational approaches are used in real time to aid the decision process for a public health emergency. The present analysis opens the path to quantitative approximation schemes that calibrate the level of data resolution and the needed computational resources with respect to the accuracy in the description of the epidemics.
Materials and Methods
Voronoi Tessellation Around Main Transportation Hubs.
We define the geographical census areas centered around IATA airports by assigning the population of each cell of 15 × 15 minutes of arc to the closest airport within the same country. Such a procedure defines a Voronoilike tessellation (32) for the populated cells of the world with a cutoff scale for the tassels size of 200 kms (see also the SI Appendix for further details).
Disease Structure, Seasonality and R_{0}.
In each urban area, the evolution of the disease is governed by the compartmental scheme of the baseline scenario of ref. 10. A susceptible individual S in contact with a symptomatic (I^{t},I^{nt}, traveling or nontraveling, respectively) and asymptomatic (I^{a}) infectious individual contracts the infection at rate β or r_{β} β, respectively, and enters the latent (L) compartment, where he/she is infected but not yet infectious. At the end of the latency period, individuals in the latent class enter one of the symptomatic infectious compartments (I^{t}, I^{nt}) with probability 1−p_{a} or become asymptomatic (I^{a}) with probability p_{a}. Symptomatic individuals are further divided between those who are allowed to travel (I^{t}) with probability p_{t} and those who are prevented from doing so (I^{nt}) with probability 1−p_{t}, depending on the severity of symptoms. All infectious individuals enter the permanently recovered/removed compartment (R) at a rate of μ per day. The latent period has an average duration of ε^{−1} = 1.9 days and is assumed to be followed by an infectious period with a mean duration of μ^{−1} = 3 days (3, 10, 33). Given that infection has occurred, we assume that individuals become asymptomatic with probability p_{a} = 0.33 (3, 10, 33). The relative infectiousness of asymptomatic individuals is r_{β} = 0.5 (10) and symptomatic individuals are allowed to travel with probability p_{t} = 0.5. The contagion process (i.e., the generation of new infections through the transmission of the disease from infectious to susceptible individuals) and the spontaneous transitions (e.g., from latent to infectious or from infectious to recovered) are modeled with binomial and multinomial distributions (see the SI Appendix for a detailed description of the processes). The threshold parameter of the disease that determines the spreading rate of infection is called basic reproduction number (R_{0}) and is defined as the average number of infected cases generated by a typical infectious individual when introduced into a fully susceptible population (34). For our compartmental model we have R_{0} = βμ^{−1} [1−p_{a} + r_{β} p_{a}]. The R_{0} values indicated in the figures and discussed in the paper do not consider the effect of seasonality and the commuting in the force of infection. We take into account the seasonal behavior of influenza by adopting the scheme from ref. 10. The transmission rate β_{j} in each geographical census area is adjusted by a scaling factor that varies monthly according to the city's climatic zone. For example, cities in the tropical zone have a scaling factor that is always 1, independent of the season. See ref. 10 and its supporting information for details.
Effective Force of Infection Generated by Commuting Flows.
The effect of commuting in the spread of infection can be considered implicitly by evaluating the force of infection between subpopulations coupled by commuting flows. In the case of τ ≪ σ_{j}, the relaxation time to equilibrium values in the populations is dominated by the return rate τ to the origin subpopulation, as shown in the SI Appendix file. We can therefore use the equilibrium values of population sizes in our calculations of force of infection, because the τ^{−1} is much smaller than the time scales of disease evolution (i.e., ε^{−1} and μ^{−1}). New infections in a subpopulation are due to the transmission between susceptibles and infectious individuals occurring in the subpopulation or during a visit to a neighboring subpopulation. Taking this into account, it is possible to derive the force of infection λ_{j} in j as A detailed derivation is provided in the SI Appendix. In the above expression, N*_{j} = N_{jj} + Σ_{i∈υ(j)} N_{ij} is the actual number of individuals in j due to commuting. The first terms of the righthand side of Eq. 2 takes into account the transmission of the infection from the local infectious individuals in j. The second term considers the transmission due to the infectious individuals during their visits to j with local susceptible persons. The third and fourth terms consider the interactions of susceptible individuals during their visits to neighboring subpopulations i with the local infectious persons and the infectious visitors of i, respectively. Here we have also considered that the transmission rate β may be different in each population. The last expression includes secondorder commuting terms (e.g., σ_{ji} σ_{ℓi}/τ^{2}), which are neglected in the actual computation. The probability of new infections to be generated in city j is finally given by λ_{j} δt in the time interval δt, acting on a pool of susceptible individuals S_{j}.
Acknowledgments
We are grateful to the International Air Transport Association for making the airline commercial flight database available to us. This work has been partially funded by the National Institutes of Health Grant R21DA024259, the Lilly Endowment Grant 2008 1639–000. A.V. was supported by Defense Threat Reduction Agency Grant 1–0910039, A.V. and V.C. was supported by Future Emerging Technologies contract no. 231807 (EPIWORK), and V.C. was supported by European Research Council Ideas contract no. ERC2007Stg204863 (EPIFOR).
Footnotes
 ^{1}To whom correspondence should be addressed. Email: alexv{at}indiana.edu

Author contributions: D.B., V.C., B.G., H.H., J.J.R., and A.V. designed research, performed research, analyzed data, and wrote the paper.

The authors declare no conflict of interest.

This article is a PNAS Direct Submission.

This article contains supporting information online at www.pnas.org/cgi/content/full/0906910106/DCSupplemental.

↵† The Gridded Population of the World and The Global RuralUrban Mapping Projects, Socioeconomic Data and Applications Center of Columbia University, http://sedac.ciesin.columbia.edu/gpw.

↵‡ Travel Trends 2007, Office for National Statistics, www.statistics.gov.uk.

Freely available online through the PNAS open access option.
References
 ↵
 Riley S
 ↵
 ↵
 Longini IM,
 et al.
 ↵
 ↵
 Germann TC,
 Kadau K,
 Longini IM,
 Macken CA
 ↵
 ↵
 ↵
 ↵
 Hufnagel L,
 Brockmann D,
 Geisel T
 ↵
 ↵
 Dutta S,
 Mia I
 Pentland A
 ↵
 Chowell G,
 Hyman JM,
 Eubank S,
 Castillo–Chavez C
 ↵
 Barrat A.,
 Barthelemy M.,
 Pastor–Satorras R.,
 Vespignani A
 ↵
 ↵
 Guimera R,
 Mossa S,
 Turtschi A,
 Amaral LAN
 ↵
 ↵
 Viboud C,
 et al.
 ↵
 ↵
 ↵
 ↵
 Barabási AL,
 Albert R
 ↵
 Erlander S,
 Stewart NF
 ↵
 Ortúzar J,
 de D,
 Willumsen LG
 ↵
 Flahault A,
 Valleron AJ
 ↵
 ↵
 ↵
 Kulldorf G
 ↵
 Colizza V,
 Barrat A,
 Barthélemy M,
 Vespignani A
 ↵
 ↵
 ↵
 ↵
 Okabe A,
 Boots B,
 Sugihara K,
 Chiu SN
 ↵
 ↵
 Anderson RM,
 May RM
Citation Manager Formats
Article Classifications
 Physical Sciences
 Applied Physical Sciences
Related Article
 In This Issue Dec 22, 2009