# Measurability of the epidemic reproduction number in data-driven contact networks

^{a}Web Sciences Center, University of Electronic Science and Technology of China, Chengdu 611731, Sichuan, People’s Republic of China;^{b}Big Data Research Center, University of Electronic Science and Technology of China, Chengdu 611731, Sichuan, People’s Republic of China;^{c}Laboratory for the Modeling of Biological and Socio-Technical Systems, Northeastern University, Boston, MA 02115;^{d}Bruno Kessler Foundation, 38123 Trento, Italy;^{e}Institute for Biocomputation and Physics of Complex Systems, University of Zaragoza, 50018 Zaragoza, Spain;^{f}Department of Theoretical Physics, University of Zaragoza, 50009 Zaragoza, Spain;^{g}ISI Foundation, 10126 Turin, Italy

Edited by Simon A. Levin, Princeton University, Princeton, NJ, and approved October 16, 2018 (received for review June 27, 2018)

## Significance

The analysis of real epidemiological data has raised issues of the adequacy of the classic homogeneous modeling framework and quantities, such as the basic reproduction number in real-world situations. Based on high-quality sociodemographic data, here we generate a multiplex network describing the contact pattern of the Italian and Dutch populations. By using a microsimulation approach, we show that, for epidemics spreading on realistic contact networks, it is not possible to define a steady exponential growth phase and a basic reproduction number. We show the operational use of the instantaneous reproduction rate as a good descriptor of the transmission dynamics.

## Abstract

The basic reproduction number is one of the conceptual cornerstones of mathematical epidemiology. Its classical definition as the number of secondary cases generated by a typical infected individual in a fully susceptible population finds a clear analytical expression in homogeneous and stratified mixing models. Along with the generation time (the interval between primary and secondary cases), the reproduction number allows for the characterization of the dynamics of an epidemic. A clear-cut theoretical picture, however, is hardly found in real data. Here, we infer from highly detailed sociodemographic data two multiplex contact networks representative of a subset of the Italian and Dutch populations. We then simulate an infection transmission process on these networks accounting for the natural history of influenza and calibrated on empirical epidemiological data. We explicitly measure the reproduction number and generation time, recording all individual-level transmission events. We find that the classical concept of the basic reproduction number is untenable in realistic populations, and it does not provide any conceptual understanding of the epidemic evolution. This departure from the classical theoretical picture is not due to behavioral changes and other exogenous epidemiological determinants. Rather, it can be simply explained by the (clustered) contact structure of the population. Finally, we provide evidence that methodologies aimed at estimating the instantaneous reproduction number can operationally be used to characterize the correct epidemic dynamics from incidence data.

Mathematical and computational models of infectious diseases are increasingly recognized as relevant quantitative support to epidemic preparedness and response (1–3). Independent of the type of modeling approach, our understanding of epidemic models is generally tied to two fundamental concepts. One is the basic reproduction number

Here, we study the very definition and measurability of

## Results

We use detailed sociodemographic data to generate two multiplex networks describing the contact patterns of about 500,000 agents, each representative of a subset of the Italian and Dutch populations (*Materials and Methods* discusses the methodology). The effective contacts through which the infection can spread are determined by the copresence of two individuals in the same setting. This effectively defines a weighted multiplex network (38–40) composed of four layers representing the network of contacts between household members, schoolmates, work colleagues, and casual encounters in the general community (Fig. 1*A*) (15, 26, 34, 41, 42). Each node in the household layer represents one individual of the real population and is linked only to the other nodes representing members of her/his own household. A second layer represents contacts in school (i.e., every node represents one student or teacher and has contact only with other individuals attending/working in the same school). A third layer accounts for contacts in workplaces, and a fourth layer encodes contacts in the community, where we assume a complete network [i.e., each individual has a certain (low) probability of infecting any other individual of the population]. The four layers are characterized by remarkably different degree distributions (Fig. 1*B*). This representation of links between individuals readily highlights the typical strong clustering of human populations, where individuals tend to meet the same set of contacts (e.g., household members, schoolmates, colleagues) on a regular basis (43–45).
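As a rough illustration of the four-layer structure described above, the following Python sketch builds a toy multiplex from hypothetical group assignments. The function names (`clique_edges`, `build_layers`), the six-person population, and the assumption that everyone sharing a household, school, or workplace is directly linked are illustrative choices, not the authors' data or code; edge weights are omitted for brevity.

```python
from collections import defaultdict
from itertools import combinations


def clique_edges(membership):
    """Link every pair of individuals assigned to the same group.

    membership maps person -> group label (None means "not in this layer").
    """
    groups = defaultdict(list)
    for person, group in membership.items():
        if group is not None:
            groups[group].append(person)
    edges = set()
    for members in groups.values():
        edges.update(frozenset(pair) for pair in combinations(sorted(members), 2))
    return edges


def build_layers(household_of, school_of, workplace_of):
    """Return a dict mapping layer name -> set of undirected edges."""
    people = sorted(household_of)
    return {
        "household": clique_edges(household_of),
        "school": clique_edges(school_of),
        "workplace": clique_edges(workplace_of),
        # Community layer: complete network over the whole population.
        "community": {frozenset(pair) for pair in combinations(people, 2)},
    }


# Hypothetical toy population of six individuals.
household_of = {0: "h1", 1: "h1", 2: "h1", 3: "h2", 4: "h2", 5: "h3"}
school_of = {0: None, 1: "s1", 2: "s1", 3: None, 4: "s1", 5: None}
workplace_of = {0: "w1", 1: None, 2: None, 3: "w1", 4: None, 5: "w1"}

layers = build_layers(household_of, school_of, workplace_of)

# Per-layer degree of each individual.
degree = {name: defaultdict(int) for name in layers}
for name, edges in layers.items():
    for edge in edges:
        for person in edge:
            degree[name][person] += 1
```

Even in this toy example the layers have very different degree profiles: the community layer connects everyone, while the household layer leaves the single-person household isolated.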

The influenza-like transmission dynamics are defined through a susceptible, infectious, removed (SIR) scheme (Fig. 1*C*). Essentially, susceptible individuals can acquire the infection through contacts with infectious individuals, and as soon as they are infected, they proceed to the infectious stage. Infectious individuals then move to the removed compartment according to a removal rate [such that an infectious individual spends an exponentially distributed amount of time (the removal time or infectious period) in the infectious stage before recovering]. We keep the transmission scheme as simple as possible (e.g., avoiding the introduction of other classes, such as a latent compartment, the distinction between symptomatic and asymptomatic individuals, hospitalized individuals, and so on) to avoid confounding effects. Of course, more refined models are needed to answer complex questions, such as the impact of control strategies.
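To make the SIR scheme concrete, here is a minimal discrete-time stochastic sketch in a homogeneously mixed population. It is not the data-driven multiplex model: the function name `simulate_sir`, the time step, and the values of `beta` and `n` are assumptions chosen for illustration, with `gamma = 1/3` per day matching only the 3-d mean infectious period mentioned in the text.

```python
import math
import random


def simulate_sir(n, beta, gamma, initial_infectious=1, dt=1.0, seed=0):
    """Discrete-time stochastic SIR in a homogeneously mixed population.

    Returns a list of (S, I, R) tuples, one per time step, until no
    infectious individuals remain.
    """
    rng = random.Random(seed)
    s, i, r = n - initial_infectious, initial_infectious, 0
    # A constant removal rate gamma gives an exponentially distributed
    # infectious period with mean 1 / gamma.
    p_remove = 1.0 - math.exp(-gamma * dt)
    history = [(s, i, r)]
    while i > 0:
        # Probability that a given susceptible is infected during this step.
        p_infect = 1.0 - math.exp(-beta * i / n * dt)
        new_infections = sum(rng.random() < p_infect for _ in range(s))
        new_removals = sum(rng.random() < p_remove for _ in range(i))
        s -= new_infections
        i += new_infections - new_removals
        r += new_removals
        history.append((s, i, r))
    return history


# Hypothetical parameters: 1,000 individuals, mean infectious period of 3 d.
history = simulate_sir(n=1000, beta=0.5, gamma=1.0 / 3.0)
```

Because there is no latent compartment, newly infected individuals count as infectious from the next step onward, in line with the simplified scheme described above.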

We simulate the transmission dynamics as a stochastic process for each specific individual, each with her/his own characteristics (e.g., age, individual infectiousness, membership in a specific household or school, and so on), and accounting for the clustering of contacts typical of human populations. For instance, an agent can transmit the infection in a given school only if she/he studies or works there. In each layer, the infection transmission between nodes is calibrated in such a way that the fractions of cases in the four layers are in agreement with literature values (namely, 30% of all influenza infections are linked to transmission occurring in households, 18% in schools, 19% in workplaces, and 33% in the community) (26, 27, 33). Moreover, we set these layer-specific transmission rates such that the reproduction number of the index case (

In *SI Appendix*, we report all of the details of the transmission model (*SI Appendix*, section 1.1). In the text, we also report the corresponding analysis of *SI Appendix*, sections 1.2–1.5). Null models include annealed (edges are constantly rewired) and quenched (edges are fixed over time) configuration models. Results reported in the text refer to the synthetic population for Italy.

### Effective Reproduction Number and Generation Time.

The first quantities generally investigated in epidemic models are the incidence (of new infections) as a function of time and the associated growth rate of the epidemic. In homogeneous models, the number of new cases increases exponentially at a nearly constant rate r (5, 49, 50) during the early phase of the epidemic. This is not the case in the data-driven model, where we find a nonmonotonic behavior: an increasing trend over the initial phase of the epidemic is followed by a marked decrease about 20 d before the epidemic peak (Fig. 2*A*). Such a result is in sharp contrast with the classic theory, where the epidemic growth rate is expected to slowly and monotonically decrease over time in the early epidemic phase (Fig. 2*A*). This suggests that, in contrast to simple SIR models where the basic reproduction number can readily be defined through the relation

The daily effective reproduction number and generation time can be computed from the microsimulations by keeping track of the exact number of secondary infections generated by each individual infected at time t in the simulations (Fig. 1*C*). We find that *B*). In contrast, in the homogeneous model, which lacks the typical structures of human populations, *A*) as predicted by classical mathematical epidemiology theory (4). The pattern found in the data-driven model can also be partly explained by the variation of the average degree induced by the infection of individuals with a higher number of adequate contacts [an effect already observed in heavy-tailed networks (51)], thus leading to an average growth of the reproduction number. The temporal dynamics of *SI Appendix*, section 2.1). In Fig. 2*C*, we show an analogous analysis of the estimated generation time in the data-driven model. We find that the generation time is considerably shorter than the duration of the infectious period (i.e., 3 d in our simulations). The estimated average *C*). This differs from what is predicted by the classic theory and the analysis of the homogeneous model (Fig. 2*C*), where the length of the infectious period corresponds to the generation time (6).
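The bookkeeping described above, tracking who infected whom and when, can be sketched as follows. This is an illustrative Python example on a hypothetical five-case transmission log; the function name `cohort_statistics` and the choice to group generation intervals by the infector's day of infection are assumptions, not necessarily the authors' exact definitions.

```python
from collections import defaultdict


def cohort_statistics(infection_day, infector_of):
    """Per-day effective reproduction number and mean generation time.

    infection_day: dict person -> day on which the person was infected
    infector_of:   dict person -> infector (index cases are absent)
    Both outputs are keyed by the day on which the infector was infected.
    """
    secondary = defaultdict(int)    # infector -> number of secondary cases
    intervals = defaultdict(list)   # infector's infection day -> intervals
    for infectee, infector in infector_of.items():
        secondary[infector] += 1
        day = infection_day[infector]
        intervals[day].append(infection_day[infectee] - day)

    cohort = defaultdict(list)      # infection day -> secondary-case counts
    for person, day in infection_day.items():
        cohort[day].append(secondary[person])

    r_t = {day: sum(counts) / len(counts) for day, counts in cohort.items()}
    t_g = {day: sum(vals) / len(vals) for day, vals in intervals.items()}
    return r_t, t_g


# Hypothetical transmission log: "a" is the index case.
infection_day = {"a": 0, "b": 1, "c": 2, "d": 2, "e": 4}
infector_of = {"b": "a", "c": "a", "d": "b", "e": "b"}

r_t, t_g = cohort_statistics(infection_day, infector_of)
```

Days on which the infected cohort produced no secondary cases appear in `r_t` with value 0 but are absent from `t_g`, since no generation interval is defined for them.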

A closer look at the transmission process in the different layers of the multiplex network helps in understanding the origin of the deviations of *SI Appendix*, section 2.2), and at least to some extent, the same happens in the school layer. However, *A*). Indeed, *B*)—with an average fluctuating around 2.6 d, close to the value reported by analyzing real data for household transmission (26). To provide a simple illustration of the saturation effect in households, let us consider a household of three, with one index case and two susceptible members. If, at time t, the index case infects exactly one of the two susceptibles, then at time *B*). All of the observed patterns of *SI Appendix*, sections 2.4–2.6.

### The 2009 H1N1 Influenza Pandemic in Italy.

To test the robustness of the results in a more realistic epidemic transmission model, we used the data-driven modeling framework to model the 2009 influenza pandemic in Italy. One of the characteristic signatures of the 2009 H1N1 pandemic was the presence of a differential susceptibility by age (34, 53, 54); this is included in the model by using values estimated for Italy as reported in the literature (55). We also consider prepandemic immunity by age in the population according to serological data (55). Vaccination is not considered, as vaccination started only during the tail of the pandemic and had a very limited uptake in Italy (vaccination coverage *SI Appendix*, section 1.1.

The calibrated model captures well the seropositive rates by age at the end of the pandemic (Fig. 4*A*). The estimated growth rate from the influenza-like illness (ILI) cases reported in Italy over the course of the 2009 H1N1 influenza pandemic clearly shows an increasing trend during the early phase of the epidemic followed by a sharp drop about 3 weeks before the epidemic peak (Fig. 4*B*). The trend observed in the data is consistent with that obtained in model simulations (Fig. 4*C*). Fitting a linear regression model to the estimated growth rate over time in the ILI data results in an estimated coefficient of 0.064 (SE of 0.033), while the mean value obtained with a linear regression model of case incidence from all stochastic realizations is 0.064 (*SI Appendix*, section 2.3).
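The idea of regressing the growth rate on time can be illustrated with a short Python sketch. The growth-rate estimator used here (the log ratio of consecutive incidence values) and the synthetic incidence series are assumptions made for illustration; the source does not specify the estimator in this section, and the numbers below are unrelated to the ILI data.

```python
import math


def growth_rates(incidence):
    """Log-ratio growth rate between consecutive time points."""
    return [math.log(later / earlier)
            for earlier, later in zip(incidence, incidence[1:])]


def ols_slope(xs, ys):
    """Slope of the ordinary least-squares line through (xs, ys)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Synthetic incidence whose growth rate rises linearly: r(t) = 0.1 + 0.02 t.
incidence = [10.0]
for t in range(10):
    incidence.append(incidence[-1] * math.exp(0.1 + 0.02 * t))

rates = growth_rates(incidence)
slope = ols_slope(list(range(len(rates))), rates)
```

A steady exponential phase would give a slope close to zero, whereas a positive slope, as in this synthetic series, indicates a growth rate that increases over time.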

## Discussion

Our simulation results clearly highlight how the heterogeneity and clustering of human interactions (e.g., contacts between household members, classmates, work colleagues) alter the standard results of fundamental epidemiological indicators, such as the reproduction number and generation time over the course of an epidemic. Our results seem to be consistent across countries (*SI Appendix*, section 2.4), suggesting that the observed patterns are due to the structure and clustering of human contact patterns rather than country-specific features. Furthermore, the analysis of alternative null models, such as degree-preserving and layer-preserving configuration models (*SI Appendix*, section 2.7), shows behaviors markedly different from those exhibited by the data-driven model, suggesting that the multiplex structure and the strong clustering effect typical of human populations, not captured by null models, are at the root of the observed behavior.

Our numerical study questions the measurability of *Materials and Methods*). In Fig. 5, we show the comparison between *SI Appendix*, section 2.8). The overall good agreement between the estimated and actual values shows that it is possible to operationalize the estimation of

## Conclusion

The analysis presented here takes advantage of “in silico” numerical experiments to open a window of understanding in the analysis of realistic epidemic scenarios. Although a lot of theoretical work has been done to define the reproduction number in nonhomogeneous models (19–24), a unified theory is still lacking. While we are not providing a theoretical framework for the computation/definition of the reproduction number and the generation time on realistic contact networks, we provide evidence that estimates of

## Materials and Methods

### Data-Driven Contact Network.

The model considers a weighted multiplex network (38)

### Homogeneous Mixing Network.

This model assumes a single fully connected network. Following the notation introduced above, we have

### The Epidemic Transmission Model.

On each of the introduced networks, we simulate the influenza transmission process as an SIR model. The SIR model assumes that individuals can be in one of the following three states: susceptible, infectious, and removed. Two types of transitions between states are possible: (*i*) from susceptible to infectious and (*ii*) from infectious to removed. The transition from susceptible to infectious requires a contact between an infectious individual and a susceptible individual. Specifically, given that, at time step t, node j is infectious and its neighbor node i is susceptible, the probability that j infects i (i.e., i changes its status from susceptible to infectious) is given by

### Estimation of *R*(*t*) from Transmission Events Time Series.

Following the same approach used in refs. 11 and 17, we assume that the daily number of new cases

## Acknowledgments

Q.-H.L. acknowledges support from Program of the China Scholarships Council Grant 201606070059. M.A. and A.V. acknowledge the support of NIH Grant MIDAS-U54GM111274. A.A. acknowledges the support of the Formación Personal Investigador Doctoral Fellowship from Ministerio de Economía y Competitividad (MINECO) and its mobility scheme. Y.M. acknowledges partial support from the Government of Aragón, Spain through a grant (to the group Física Estadística y No Lineal) as well as MINECO and Fondo Europeo de Desarrollo Regional Funds Grant FIS2017-87519-P. The funders had no role in study design, data collection and analysis, or preparation of the manuscript.

## Footnotes

^{1}To whom correspondence should be addressed. Email: a.vespignani@northeastern.edu.

Author contributions: Q.-H.L., M.A., A.A., S.M., Y.M., and A.V. designed research, performed research, analyzed data, and wrote the paper.

The authors declare no conflict of interest.

This article is a PNAS Direct Submission.

This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1811115115/-/DCSupplemental.

- Copyright © 2018 the Author(s). Published by PNAS.

This open access article is distributed under Creative Commons Attribution-NonCommercial-NoDerivatives License 4.0 (CC BY-NC-ND).

## References

- (4) Anderson RM, May RM, Anderson B
- (14) Britton T, Tomba GS
- (15) Cauchemez S, et al.
- (25) Longini IM, et al.
- (29) Halloran ME, et al.
- (30) Fumanelli L, Ajelli M, Merler S, Ferguson NM, Cauchemez S
- (32) Germann TC, Kadau K, Longini IM, Macken CA
- (37) Lipsitch M, et al.
- (39) De Domenico M, et al.
- (40) Cozzo E, Ferraz de Arruda G, Rodrigues F, Moreno Y. *Multiplex Networks: Basic Formalism and Structural Properties*, Springer Briefs in Complexity (Springer, Cham, Switzerland).
- (41) Ajelli M, Poletti P, Melegaro A, Merler S
- (45) Ajelli M, Litvinova M
- (53) Fraser C, et al.
- (54) Cauchemez S, et al.
- (55) Merler S, et al.
- (59) Nguyen VK, Rojas CP, Hernandez-Vargas E

## Article Classifications

- Physical Sciences
- Biophysics and Computational Biology

- Biological Sciences
- Population Biology