# Forward-looking serial intervals correctly link epidemic growth to reproduction numbers

Edited by Nils Chr. Stenseth, University of Oslo, Oslo, Norway, and approved December 4, 2020 (received for review June 4, 2020)

## Significance

The generation and serial interval distributions are key, but different, quantities in outbreak analyses. Recent studies suggest that the two distributions give different estimates of the reproduction number $\mathcal{R}$ as inferred from the observed growth rate $r$. Here, we show that estimating $\mathcal{R}$ based on $r$ and the serial interval distribution, when defined from the correct reference cohort, gives the same estimate as using $r$ and the generation interval distribution. We apply our framework to COVID-19 serial interval data from China, outside Hubei province (January 21 to February 8, 2020), revealing systematic biases in prior inference methods. Our study provides the theoretical basis for practical changes to the principled use of serial interval distributions in estimating $\mathcal{R}$ during epidemics.

## Abstract

The reproduction number $\mathcal{R}$ and the growth rate $r$ are critical epidemiological quantities. They are linked by generation intervals, the time between infection and onward transmission. Because generation intervals are difficult to observe, epidemiologists often substitute serial intervals, the time between symptom onset in successive links in a transmission chain. Recent studies suggest that such substitution biases estimates of $\mathcal{R}$ based on $r$. Here we explore how these intervals vary over the course of an epidemic, and the implications for $\mathcal{R}$ estimation. Forward-looking serial intervals, measuring time forward from symptom onset of an infector, correctly describe the renewal process of symptomatic cases and therefore reliably link $\mathcal{R}$ with $r$. In contrast, backward-looking intervals, which measure time backward, and intrinsic intervals, which neglect population-level dynamics, give incorrect $\mathcal{R}$ estimates. Forward-looking intervals are affected both by epidemic dynamics and by censoring, changing in complex ways over the course of an epidemic. We present a heuristic method for addressing biases that arise from neglecting changes in serial intervals. We apply the method to early (January 21 to February 8, 2020) serial interval-based estimates of $\mathcal{R}$ for the COVID-19 outbreak in China outside Hubei province; using improperly defined serial intervals in this context biases estimates of initial $\mathcal{R}$ by up to a factor of 2.6. This study demonstrates the importance of early contact tracing efforts and provides a framework for reassessing generation intervals, serial intervals, and $\mathcal{R}$ estimates for COVID-19.


The reproduction number $\mathcal{R}$ is one of the most important characteristics of an emerging epidemic, such as the current pandemic of COVID-19 (1). The reproduction number is defined as the average number of secondary infections caused by a primary infection. The value in a fully susceptible population—the “basic” reproduction number ${\mathcal{R}}_{0}$—allows us to predict the extent to which an infection will spread in the population, and the amount of intervention necessary to eliminate it in simple cases (2). Since the reproduction number represents an average (2, 3), it fails to capture heterogeneity among individuals or across space. The reproduction number also fails to provide any information about the time scale of disease transmission.

Estimating the reproduction number $\mathcal{R}$ is often challenging. Direct estimates based on observed infections will typically be biased down when some infections cannot be observed. A common method of estimating $\mathcal{R}$ near the beginning of an epidemic is based on the population-level exponential growth rate $r$, which can often be estimated robustly from case reports (4, 5). The growth rate $r$ and the reproduction number $\mathcal{R}$ are linked by the generation interval distribution (6), where the generation interval is defined as the time between when an individual (infector) is infected and when that individual infects another person (infectee) (7).
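To make the link between $r$ and $\mathcal{R}$ concrete, the following minimal sketch assumes the Euler–Lotka form $1/\mathcal{R}={\int}_{0}^{\infty}\mathrm{exp}\left(-r\tau \right)g\left(\tau \right)\,\mathrm{d}\tau$ (ref. 6) and a gamma-distributed generation interval $g$. The gamma shape and scale, the growth rate, and the helper names `gamma_pdf` and `reproduction_number` are illustrative assumptions, not values or code from this study.

```python
import math
import numpy as np

def gamma_pdf(tau, shape, scale):
    """Gamma density for tau > 0 (assumed stand-in for a generation interval distribution)."""
    tau = np.asarray(tau, dtype=float)
    log_pdf = ((shape - 1.0) * np.log(tau) - tau / scale
               - math.lgamma(shape) - shape * math.log(scale))
    return np.exp(log_pdf)

def reproduction_number(r, tau, density):
    """Invert 1/R = integral of exp(-r * tau) * g(tau) d tau on a uniform grid."""
    dt = tau[1] - tau[0]
    return 1.0 / np.sum(np.exp(-r * tau) * density * dt)

# Illustrative values only (assumptions, not estimates from the paper).
shape, scale = 4.0, 1.5          # mean generation interval = shape * scale = 6 days
r = 0.1                          # exponential growth rate per day

dt = 0.001
tau = np.arange(dt / 2, 80.0, dt)  # midpoint grid; the tail beyond 80 days is negligible here
g = gamma_pdf(tau, shape, scale)

R_numeric = reproduction_number(r, tau, g)
# For a gamma distribution the integral has a closed form: R = (1 + r * scale) ** shape.
R_closed_form = (1.0 + r * scale) ** shape

print(f"numerical R: {R_numeric:.4f}, closed-form R: {R_closed_form:.4f}")
```

For a fixed $r$, a longer or less variable generation interval in this sketch yields a larger $\mathcal{R}$, which is why the choice of interval distribution matters for the estimate.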

Since generation intervals measure time between infection events, which can be difficult to observe in practice, generation intervals are often replaced with serial intervals. The serial interval is defined as the time between when an infector and an infectee develop symptoms (7). While generation and serial intervals both measure the time scale of disease transmission, they measure fundamentally different quantities. In particular, previous studies have noted that, in many contexts, serial intervals are expected to have larger variances than generation intervals but the same mean (7–10). Serial intervals can, in some cases, even take negative values in the presence of presymptomatic transmission (11), whereas generation intervals must be positive.

Although these distributions were clearly and distinctly defined over a decade ago (7), the need for a better conceptual and theoretical framework for understanding their differences is becoming clearer as the COVID-19 pandemic unfolds. Researchers continue to base inferences about COVID-19 on both generation and serial intervals without clearly distinguishing between them (e.g., refs. 11–15), and, in some cases, explicitly conflate the definitions of the two intervals (e.g., refs. 16 and 17). This confusion is apparent even in standard software for estimating $\mathcal{R}$, such as EpiEstim, in which the serial interval distribution is used to infer time-dependent $\mathcal{R}$ (18). These studies are examples of many—indeed, it is a common practice to use the serial and generation intervals interchangeably.

One source of confusion arises from an apparent discrepancy between the generation interval and serial interval viewpoints. While the epidemic is growing exponentially, the spread of infection can be characterized as a renewal process based on previous incidence of infection, the associated generation interval distribution, and the average infectiousness of an infected individual. It is well established that this renewal formulation allows us to link the exponential growth rate of an epidemic $r$ with its reproduction number $\mathcal{R}$ using the generation interval distribution (6). However, the serial interval distribution also describes a renewal process—in this case, the creation of a new symptomatic case based on a symptomatic case in the previous generation. Since both renewal processes, based on either generation or serial interval distributions, describe the same underlying exponentially growing system, both should provide the same correct link between the reproduction number $\mathcal{R}$ and the epidemic growth rate $r$.

In contexts where the serial and generation interval distributions differ, current theory has no explanation for how two different distributions could provide identical estimates of $\mathcal{R}$ from $r$. In fact, recent theory suggests that using the serial interval can underestimate the reproduction number (19, 20). However, these studies rely on intrinsic distributions of incubation periods and generation intervals that neglect the population-level dynamics of disease spread.

Here we show that, by correctly defining and calculating the “forward” serial interval distribution (i.e., a distribution of serial intervals from a cohort of infectors that developed symptoms at the same time) that connects symptom onset dates, we can resolve this discrepancy. These forward intervals are different from the “intrinsic” serial intervals that previous studies have relied on (7–10, 19). During an ongoing epidemic, all observed epidemiological delays (e.g., incubation period) between primary (e.g., infection) and secondary (e.g., symptom onset) events are subject to backward biases: When the incidence of primary events is increasing (or decreasing), we are more likely to observe shorter (respectively, longer) intervals. In particular, when we consider forward serial interval distributions, the incubation periods of the infectors are subject to backward biases, because we have to look backward in time from their symptom onset to infection. Therefore, the realized incubation period distributions of the infector and the infectee can differ dynamically, even if the intrinsic analogs of the same distributions are expected to be equivalent.

We develop a cohort-based framework for characterizing and comparing realized serial intervals, as well as any other epidemiological delays, and show that the initial forward serial interval distribution correctly estimates $\mathcal{R}$ from $r$. Conversely, using inaccurately defined serial intervals or failing to account for changes in the observed serial interval distributions over the course of an epidemic can considerably bias estimates of $\mathcal{R}$. For example, in our analysis of the COVID-19 serial intervals from China, outside Hubei province, we find that the original ${\mathcal{R}}_{0}$ estimates based on aggregated serial interval data underestimated ${\mathcal{R}}_{0}$ by a factor of 2.0 to 2.6. We further lay out several principles to consider in using information about serial intervals and other epidemiological time delays to correctly infer the initial reproduction number during the early stages of an outbreak.

## Methods

### Intrinsic, Forward, and Backward Delay Distributions.

A time delay between two epidemiological events can involve either one infected individual (e.g., incubation period: infection and symptom onset of an individual) or two—an infector and an infectee (e.g., generation and serial intervals). We define the delay as the time difference between the primary event and the secondary event. In some cases, the primary event always occurs before the secondary event (e.g., the time from infection to onset of symptoms in a single individual, or the generation interval between two individuals). In other cases, the delay can sometimes be negative (e.g., the time from onset of symptoms to onset of infectiousness in a single individual, or the serial interval between two individuals).

At the individual level, we can define the time distribution between a primary and a secondary event that we expect to observe for a single infected individual by averaging across individual characteristics; we refer to this distribution as the *intrinsic distribution*. For example, the intrinsic incubation period distribution describes the expected time distribution from infection to symptom onset of an infected individual. Likewise, the intrinsic generation interval distribution describes the expected time distribution of infectious contacts made by an infected individual. However, the intrinsic time distributions are not always equivalent to the corresponding realized time distributions at the population level (i.e., the distribution of time between actual primary and secondary events that occur during an epidemic; Fig. 1). For example, an infectious contact results in infection only if the contacted individual is susceptible (and has not already been infected); this is one mechanism that causes realized generation intervals (time between actual infection events) to differ from the intrinsic generation intervals (time between infection and infectious contacts) (21). In this example, the difference between intrinsic and realized time distributions can be attributed to the fact that the fraction of susceptible individuals is itself dynamic.

At the population level, we model realized time delays between a primary and a secondary event from a cohort perspective. A cohort consists of all individuals whose (primary or secondary) event occurred at a given time. For example, when we are measuring incubation periods, a primary cohort consists of all individuals who became infected at time *p*, while a secondary cohort consists of all individuals whose symptom onset occurred at time *s*. Similarly, when we are measuring serial intervals, a primary cohort consists of all infectors who became symptomatic at time *p*. Then, for a primary cohort at time *p*, we can define the distribution of realized delays between primary and secondary events. We refer to this distribution as the forward delay distribution and denote it as ${f}_{p}\left(\tau \right)$.

Likewise, we define the backward delay distribution ${b}_{s}\left(\tau \right)$ for a secondary cohort at time *s*: The backward delay distribution describes the time delays between primary and secondary events given that the secondary event occurred at time *s*. For example, the backward incubation period distribution at time *s* describes incubation periods for a cohort of individuals who became symptomatic at time *s*. Likewise, the backward serial interval distribution at time *s* describes serial intervals for a cohort of infectees who became symptomatic at time *s*.

Both forward and backward perspectives must yield identical measurement (e.g., the length of the incubation period of a given individual is the same whether measured forward from the time of infection or backward from the time of symptom onset). Consequently, no matter how delays are distributed, if $\mathcal{P}$ and $\mathcal{S}$ represent the sizes of primary and secondary cohorts, then we can express the total density of intervals $\tau $ between calendar times *p* and *s* (i.e., $\tau =s-p$) as follows:

$$W\left(p\right)\mathcal{P}\left(p\right){f}_{p}\left(\tau \right)=\mathcal{S}\left(s\right){b}_{s}\left(\tau \right),$$

[1]

where $W\left(p\right)$, the “weight” of the primary cohort, represents the average number of forward intervals that an individual in cohort $\mathcal{P}\left(p\right)$ produces over the course of their infection. When we measure within-individual delays, we expect $W\left(p\right)\le 1$ because only a subset of individuals who experience the primary event (e.g., infection) will eventually experience the secondary event (e.g., symptom onset). For between-individual delays, we expect $W\left(p\right)$ to change throughout an epidemic, because individuals infected earlier in an epidemic will infect more individuals, on average, than those infected later.

Substituting $p=s-\tau $, it follows that

$${b}_{s}\left(\tau \right)=\frac{W\left(s-\tau \right)\mathcal{P}\left(s-\tau \right){f}_{s-\tau}\left(\tau \right)}{\mathcal{S}\left(s\right)}.$$

[2]

If we are considering incubation periods, the left-hand side of this equation is the probability density that an individual who became symptomatic at time *s* had an incubation period of length $\tau $. From the right-hand side, we see that this probability density depends on the weight parameter $W\left(s-\tau \right)$ (in this case, the proportion of symptomatic infection), the time-varying primary cohort size at the earlier time $\mathcal{P}\left(s-\tau \right)$ (in this case, the number of individuals infected at time $s-\tau $), and the forward delay distribution ${f}_{s-\tau}\left(\tau \right)$ (in this case, the probability density that an incubation period that starts at time $s-\tau $ ends at time *s*).

Several different mechanisms drive the changes in forward and backward delay distributions over time. Typically, within-individual forward delay distributions are not directly affected by epidemic dynamics. Some realized forward distributions, like incubation period distributions, are equivalent to their intrinsic distributions and remain invariant at the time scale of an outbreak. Other realized distributions, like the distribution of time from symptom onset to testing, may change over the course of an epidemic due to changes in public health policies or individual behavior. Between-individual forward delay distributions, such as generation or serial interval distributions, depend on epidemic dynamics. For example, forward generation intervals often become shorter as an epidemic progresses, due to the dynamical process of susceptible depletion, as well as due to other factors like behavioral change or interventions (22–24): If it is harder to infect later in the course of infection, then proportionally more intervals will be short.

Eq. **2** suggests that backward delay distributions change over time even if their corresponding forward delay distribution does not change. Backward delay distributions depend on changes in the primary cohort size over time, due to conditionality of observations: Conditioning on individuals whose secondary events have occurred at the same time means that we tend to observe shorter (or longer) interevent delays when cohort size has been increasing (decreasing) through time. When incidence is growing exponentially, we can calculate the amount of bias exactly. Assuming that the forward delay distribution (${f}_{p}\left(\tau \right)\approx {f}_{0}\left(\tau \right)$) and the weight parameter ($W\left(p\right)\approx W\left(0\right)$) remain constant during the exponential growth phase, we can substitute $\mathcal{P}\left(t\right)=\mathcal{P}\left(0\right)\mathrm{exp}\left(rt\right)$ in Eq. **2** to obtain

$${b}_{0}\left(\tau \right)=\left[W\left(0\right)\mathcal{P}\left(0\right)/\mathcal{S}\left(0\right)\right]\mathrm{exp}\left(-r\tau \right){f}_{0}\left(\tau \right),$$

[3]

where $r$ is the exponential growth rate. Since ${b}_{0}$ is a probability distribution, ${\left[W\left(0\right)\mathcal{P}\left(0\right)/\mathcal{S}\left(0\right)\right]}^{-1}={\int}_{-\infty}^{\infty}\mathrm{exp}\left(-r{\tau}^{\prime}\right){f}_{0}\left({\tau}^{\prime}\right)\hspace{0.17em}\mathrm{d}{\tau}^{\prime}$ corresponds to the normalization constant. Therefore, the backward delay distribution during the exponential growth phase depends only on the exponential growth rate $r$ and the initial forward delay distribution ${f}_{0}$.
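As a minimal numerical sketch of Eq. **3**, the code below tilts a forward delay distribution by $\mathrm{exp}\left(-r\tau \right)$ and renormalizes it to obtain the backward distribution during exponential growth. The gamma-shaped forward incubation period, its parameters, the growth rate, and the helper names are assumptions chosen for illustration; they are not estimates from this study.

```python
import math
import numpy as np

def gamma_pdf(tau, shape, scale):
    """Gamma density for tau > 0 (assumed forward incubation period distribution)."""
    tau = np.asarray(tau, dtype=float)
    log_pdf = ((shape - 1.0) * np.log(tau) - tau / scale
               - math.lgamma(shape) - shape * math.log(scale))
    return np.exp(log_pdf)

def backward_from_forward(tau, forward_density, r):
    """Eq. 3: b_0(tau) is proportional to exp(-r * tau) * f_0(tau); renormalize on the grid."""
    dt = tau[1] - tau[0]
    tilted = np.exp(-r * tau) * forward_density
    return tilted / np.sum(tilted * dt)

def mean_delay(tau, density):
    """Mean of a density tabulated on a uniform grid."""
    dt = tau[1] - tau[0]
    return np.sum(tau * density * dt)

# Illustrative values only (assumptions).
shape, scale = 5.0, 1.1            # forward mean = shape * scale = 5.5 days
r = 0.15                           # exponential growth rate per day

dt = 0.001
tau = np.arange(dt / 2, 80.0, dt)  # midpoint grid
f0 = gamma_pdf(tau, shape, scale)
b0 = backward_from_forward(tau, f0, r)

forward_mean = mean_delay(tau, f0)
backward_mean = mean_delay(tau, b0)
# Tilting a gamma density by exp(-r * tau) gives a gamma density with the same
# shape and scale / (1 + r * scale), so the backward mean is known in closed form.
expected_backward_mean = shape * scale / (1.0 + r * scale)

print(f"forward mean: {forward_mean:.3f} days, backward mean: {backward_mean:.3f} days")
```

With $r>0$ the tilt downweights long delays, so the backward mean in this sketch falls below the forward mean; setting `r = 0` recovers the forward distribution.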

The mean backward interval will always be shorter than the mean forward interval as long as $r>0$. Even for different epidemics of the same disease, we expect to observe shorter backward intervals within a fast-growing epidemic (high $r$), all else being equal. In general, the backward delay distribution will differ from the forward delay distribution (unless the disease is at equilibrium), even if we are measuring time delays that are intrinsic to the life history of a disease (e.g., the incubation period). These ideas apply to all epidemiological delay distributions and generalize the work by ref. 24, who compared forward and backward generation interval distributions to describe realized generation intervals from the perspective of an infector and an infectee, respectively, as well as the work by ref. 19, who showed that Eq. **3** holds for the backward generation interval distribution.

### Realized Serial Interval Distributions.

The serial interval is defined as the time between when an infector becomes symptomatic and when their infectee becomes symptomatic (7). Previous studies have often expressed serial intervals ${\tau}_{\mathrm{\text{s}}}$ in the form (Fig. 1*A*)

$${\tau}_{\mathrm{\text{s}}}=\left({\tau}_{g}+{\tau}_{\mathrm{\text{i2}}}\right)-{\tau}_{\mathrm{\text{i1}}},$$

[4]

where ${\tau}_{\mathrm{\text{i1}}}$ and ${\tau}_{\mathrm{\text{i2}}}$ represent incubation periods of an infector and an infectee, respectively, and ${\tau}_{g}$ represents the generation interval between the infector and the infectee. These studies concluded that the serial and generation intervals have the same mean when ${\tau}_{\mathrm{\text{i1}}}$ and ${\tau}_{\mathrm{\text{i2}}}$ are drawn from the same distributions (7, 8, 10, 19). However, distributions of realized incubation periods, ${\tau}_{\mathrm{\text{i1}}}$ and ${\tau}_{\mathrm{\text{i2}}}$, will be identical only if we assume that they are intrinsic to individuals (and not dependent on epidemic dynamics at the population-level), something that is generally true of forward but not backward incubation period distributions. We refer to the definition Eq. **4** as the intrinsic serial interval (Fig. 1*A*).

To correctly link the realized serial interval distribution to the renewal process between infections based on symptom onset dates, we must use the forward serial interval (i.e., use the perspective of a cohort of infectors that share the same symptom onset time). Given that an infector became symptomatic at time $p$, to calculate the forward serial interval, we first go backward in time to when the infector was infected, and then forward in time to when the infectee was infected, and then forward again to when the infectee became symptomatic. In Fig. 1*B*, we see that ${\tau}_{\mathrm{\text{i1}}}$ is drawn from the backward incubation period distribution of the cohort of infectors who became symptomatic at time $p$, ${\tau}_{g}$ is drawn from the forward generation interval distribution of the cohort of infectors who became infected at time $p-{\tau}_{\mathrm{\text{i1}}}$, and ${\tau}_{\mathrm{\text{i2}}}$ is drawn from the forward incubation period distribution of the cohort of infectees who became infected at time $p-{\tau}_{\mathrm{\text{i1}}}+{\tau}_{g}$. Likewise, we can define the backward serial interval distribution for a cohort of infectees who became symptomatic at time *s* (Fig. 1*C*). This conceptual framework demonstrates that the distributions of ${\tau}_{\mathrm{\text{i1}}}$, ${\tau}_{g}$, and ${\tau}_{\mathrm{\text{i2}}}$ (and therefore the distributions of realized serial intervals) depend on the reference cohort, which is defined by temporal direction (forward or backward) and a particular reference time.

To calculate realized serial interval distributions, we begin by modeling $\mathcal{T}\left(p,s\right)$: the total density of serial intervals that start (when infectors develop symptoms) at time $p$ and end (when infectees develop symptoms) at time *s*. For simplicity, we assume that all infected individuals eventually develop symptoms. Then, the density of serial intervals between times $p$ and *s*, given that the infectors became infected at time ${\alpha}_{1}\le p$ and the infectees became infected at time ${\alpha}_{2}\le s$, depends on the amount of infection that occurs between times ${\alpha}_{1}$ and ${\alpha}_{2}$ as well as the density of forward incubation periods between ${\alpha}_{1}$ and $p$ (realized incubation periods of infectors) and between ${\alpha}_{2}$ and *s* (realized incubation periods of infectees),

$$\begin{array}{ll}\hfill \mathcal{T}\left(p,s|{\alpha}_{1},{\alpha}_{2}\right)& =\underset{\begin{array}{c}\text{case}\\ \text{reproduction}\\ \text{number}\end{array}}{\underbrace{{\mathcal{R}}_{\mathrm{c}}\left({\alpha}_{1}\right)}}\times \underset{\begin{array}{c}\text{incidence}\\ \text{of}\\ \text{infection}\end{array}}{\underbrace{i\left({\alpha}_{1}\right)}}\times \underset{\begin{array}{c}\text{joint density of}\\ \text{forward incubation}\\ \text{periods}\hspace{0.17em}p-{\alpha}_{1}\hspace{0.17em}\text{and}\hspace{0.17em}\text{forward}\\ \text{generation}\hspace{0.17em}\text{intervals}\hspace{0.17em}{\alpha}_{2}-{\alpha}_{1}\\ \left(\text{of infectors}\right)\end{array}}{\underbrace{{h}_{{\alpha}_{1}}\left(p-{\alpha}_{1},{\alpha}_{2}-{\alpha}_{1}\right)}}\hfill \\ \hfill & \text{\hspace{1em}}\times \underset{\begin{array}{c}\text{marginal density of}\\ \text{forward incubation}\\ \text{periods}\hspace{0.17em}s-{\alpha}_{2}\\ \left(\text{of infectees}\right)\end{array}}{\underbrace{{\ell}_{{\alpha}_{2}}\left(s-{\alpha}_{2}\right)}},\hfill \end{array}$$

[5]

where the case reproduction number ${\mathcal{R}}_{\mathrm{c}}\left({\alpha}_{1}\right)$ is defined as the average number of secondary infections that an individual infected at time ${\alpha}_{1}$ will generate over the course of their infection (25). We describe the forward incubation periods and the forward generation intervals using a joint probability distribution because onset of symptoms and transmission potential jointly depend on the life history of a disease; for example, if an infected individual can only transmit the disease after symptom onset, the forward generation interval will necessarily be longer than the forward incubation period.

The total density of serial intervals between times $p$ and $s$ can now be obtained by integrating over all possible infection times for the infector and the infectee,

$$\mathcal{T}\left(p,s\right)={\int}_{-\infty}^{p}{\int}_{{\alpha}_{1}}^{s}\mathcal{T}\left(p,s|{\alpha}_{1},{\alpha}_{2}\right)\hspace{0.17em}\mathrm{d}{\alpha}_{2}\hspace{0.17em}\mathrm{d}{\alpha}_{1}.$$

[6]

Then, the forward serial interval distribution ${f}_{p}\left(\tau \right)$ is given by the density of intervals of length $\tau $ starting at time *p*, relative to the total number of serial intervals starting at time *p*,

$${f}_{p}\left(\tau \right)=\frac{\mathcal{T}\left(p,p+\tau \right)}{{\int}_{-\infty}^{\infty}\mathcal{T}\left(p,p+{\tau}^{\prime}\right)\hspace{0.17em}\mathrm{d}{\tau}^{\prime}}.$$

[7]

Likewise, the backward serial interval distribution ${b}_{s}\left(\tau \right)$ is given by the density of intervals of length $\tau $ ending at *s*, relative to the total number of serial intervals ending at *s*,

$${b}_{s}\left(\tau \right)=\frac{\mathcal{T}\left(s-\tau ,s\right)}{{\int}_{-\infty}^{\infty}\mathcal{T}\left(s-{\tau}^{\prime},s\right)\hspace{0.17em}\mathrm{d}{\tau}^{\prime}}.$$

[8]

The denominator of the forward serial interval distribution (Eq. **7**) then corresponds to the total number of infections generated by infected individuals who themselves developed symptoms at time *p*. Dividing this quantity by the number of individuals who developed symptoms at time *p*, $j\left(p\right)={\int}_{-\infty}^{\infty}\mathcal{T}\left(p-{\tau}^{\prime},p\right)\hspace{0.17em}\mathrm{d}{\tau}^{\prime}$, we obtain the serial reproduction number,

$${\mathcal{R}}_{\mathrm{s}}\left(p\right)=\frac{{\int}_{-\infty}^{\infty}\mathcal{T}\left(p,p+{\tau}^{\prime}\right)\hspace{0.17em}\mathrm{d}{\tau}^{\prime}}{j\left(p\right)},$$

[9]

which we define as the average number of infections generated by an individual who developed symptoms at time *p*. Combining the forward serial interval distribution with the serial reproduction number completes the renewal process between symptomatic cases,

$$j\left(t\right)={\int}_{-\infty}^{\infty}{\mathcal{R}}_{\mathrm{s}}\left(t-\tau \right)j\left(t-\tau \right){f}_{t-\tau}\left(\tau \right)\hspace{0.17em}\mathrm{d}\tau .$$

[10]

This framework allows us to understand changes in the realized serial intervals for any epidemic model and properly link serial interval distributions with the renewal process. In addition, assuming that the reproduction number as well as the forward serial interval distribution remain constant during the exponential growth phase, we can substitute $j\left(t\right)\approx j\left(0\right)\mathrm{exp}\left(rt\right)$, ${\mathcal{R}}_{\mathrm{s}}\left(t\right)\approx {\mathcal{R}}_{\mathrm{s}}\left(0\right)$, and ${f}_{t-\tau}\left(\tau \right)\approx {f}_{0}\left(\tau \right)$ to obtain

$$\frac{1}{{\mathcal{R}}_{\mathrm{s}}\left(0\right)}={\int}_{-\infty}^{\infty}\mathrm{exp}\left(-r\tau \right){f}_{0}\left(\tau \right)\mathrm{d}\tau .$$

[11]

Therefore, the initial forward serial interval distribution, ${f}_{0}\left(\tau \right)$, provides the correct link between the exponential growth rate $r$ and the initial serial reproduction number ${\mathcal{R}}_{\mathrm{s}}\left(0\right)$. We revisit this idea later in *Linking $r$ and $\mathcal{R}$* and show that the initial forward serial interval distribution provides the same $r$–$\mathcal{R}$ link as the intrinsic generation interval distribution.

### Epidemic Model.

We illustrate changes in forward and backward serial intervals over the course of an epidemic by applying our framework to a specific example of an epidemic model. We model disease spread with a renewal equation model (10, 26–30). Ignoring births and deaths, changes in the proportion of susceptible individuals $S\left(t\right)$ and incidence of infection $i\left(t\right)$ can be described as

$$\begin{array}{ll}\hfill \frac{\mathrm{d}S}{\mathrm{d}t}& =-i\left(t\right)\hfill \\ \hfill i\left(t\right)& =\mathcal{R}\left(t\right){\int}_{0}^{\infty}i\left(t-\tau \right)g\left(\tau \right)\hspace{0.17em}\mathrm{d}\tau ,\hfill \end{array}$$

[12]

where $\mathcal{R}\left(t\right)$ is the instantaneous reproduction number [i.e., the average number of infections that an individual infected at time $t$ will generate if conditions at time $t$ remain unchanged (25)], and $g\left(\tau \right)$ is the intrinsic generation interval distribution [i.e., the forward generation interval distribution of a primary case in a population where changes in $\mathcal{R}\left(t\right)$ are negligible (24)]. This model assumes that $g\left(\tau \right)$ remains constant through time; in other words, that epidemic dynamics are driven by changes in transmission rate. This assumption may not be well suited to individual-based intervention such as case isolation (25); nonetheless, this form has been widely used in the literature and has been successfully applied in modeling the current COVID-19 pandemic (31).

Here, changes in reproduction number can be modeled as a product of the basic reproduction number ${\mathcal{R}}_{0}$, proportion susceptible $S\left(t\right)$, and a time-dependent factor $M\left(t\right)$ (for example, accounting for nonpharmaceutical interventions and behavioral changes): $\mathcal{R}\left(t\right)={\mathcal{R}}_{0}S\left(t\right)M\left(t\right)$; ref. 32 used a similar framework to evaluate the impact of nonpharmaceutical interventions on the spread of COVID-19 in 11 countries. Then, the forward generation interval for a cohort of individuals that were infected at time $p$ follows (see ref. 14),

$${g}_{p}\left(\tau \right)=\frac{g\left(\tau \right)S\left(p+\tau \right)M\left(p+\tau \right)}{{\int}_{0}^{\infty}g\left({\tau}^{\prime}\right)S\left(p+{\tau}^{\prime}\right)M\left(p+{\tau}^{\prime}\right)\hspace{0.17em}\mathrm{d}{\tau}^{\prime}},$$

[13]

which allows us to separate the joint probability distribution ${h}_{p}$ of the forward incubation period and the forward generation interval distribution as a product of the proportion of susceptible individuals $S$ and the joint probability distribution $h$ of the forward incubation period and the intrinsic generation intervals,

$${h}_{p}\left(x,\tau \right)=\frac{h\left(x,\tau \right)S\left(p+\tau \right)M\left(p+\tau \right)}{{\int}_{0}^{\infty}{\int}_{0}^{\infty}h\left({x}^{\prime},{\tau}^{\prime}\right)S\left(p+{\tau}^{\prime}\right)M\left(p+{\tau}^{\prime}\right)\hspace{0.17em}\mathrm{d}{\tau}^{\prime}\hspace{0.17em}\mathrm{d}{x}^{\prime}}.$$

[14]

We further assume that the forward incubation period distribution does not vary across cohorts over the course of an epidemic, as it represents the life history of a disease; we denote it as $\ell $. Then, we have

$$\begin{array}{ll}\hfill \ell \left(x\right)& ={\int}_{0}^{\infty}h\left(x,\tau \right)\hspace{0.17em}\mathrm{d}\tau ,\hfill \\ \hfill g\left(\tau \right)& ={\int}_{0}^{\infty}h\left(x,\tau \right)\hspace{0.17em}\mathrm{d}x.\hfill \end{array}$$

[15]

Finally, the case reproduction number for this model is defined as follows:

$${\mathcal{R}}_{\mathrm{c}}\left(t\right)={\mathcal{R}}_{0}{\int}_{0}^{\infty}g\left(\tau \right)S\left(t+\tau \right)M\left(t+\tau \right)\hspace{0.17em}\mathrm{d}\tau .$$

[16]

The forward and backward serial interval distributions are then calculated by substituting these quantities into Eqs. **7** and **8**. We use this framework to illustrate how the realized epidemiological time distributions vary over the course of an epidemic and depend on the perspective (i.e., forward vs. backward).

For simplicity, we let $M=1$ and assume that epidemic dynamics depend only on susceptible depletion in our simulations. Since we are interested in the initial epidemic growth phase (i.e., linking $r$ to $\mathcal{R}$), we expect $\mathcal{R}\left(t\right)$ to remain roughly constant during this period. In addition, the qualitative effects of an $M$ that reduces $\mathcal{R}\left(t\right)$ monotonically over time will be similar to the impact of susceptible depletion under this modeling framework. Therefore, the general conclusions we draw from our analysis are expected to be robust; however, the detailed shape of the epidemic curve and changes in generation and serial intervals can still depend on the shape of $M$.
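The following is a rough discrete-time sketch of the renewal model in Eq. **12** with $M=1$, together with discrete analogues of the forward generation interval distribution (Eq. **13**) and the case reproduction number (Eq. **16**). The daily time step, the discretized gamma generation interval, the seed incidence, and all parameter values and function names are assumptions made for illustration; this is not the simulation code used in the study.

```python
import numpy as np

def discretized_gamma(shape, scale, max_lag):
    """Probability mass on lags 1..max_lag, proportional to a gamma density at each lag."""
    lags = np.arange(1, max_lag + 1)
    weights = lags ** (shape - 1.0) * np.exp(-lags / scale)
    return lags, weights / weights.sum()

def simulate_renewal(R0, g, n_steps, seed=1e-6):
    """Discrete analogue of Eq. 12 with M = 1, so that R(t) = R0 * S(t)."""
    S = np.ones(n_steps)
    incidence = np.zeros(n_steps)
    incidence[0] = seed
    for t in range(1, n_steps):
        S[t] = S[t - 1] - incidence[t - 1]           # dS/dt = -i(t)
        k = min(t, len(g))
        # Sum over lags 1..k of i(t - lag) * g(lag).
        pressure = np.dot(incidence[t - k:t][::-1], g[:k])
        incidence[t] = R0 * S[t] * pressure
    return S, incidence

def forward_generation_interval(p, lags, g, S):
    """Eq. 13 with M = 1: g_p(lag) is proportional to g(lag) * S(p + lag)."""
    idx = np.minimum(p + lags, len(S) - 1)           # hold S at its last value past the horizon
    weights = g * S[idx]
    return weights / weights.sum()

def case_reproduction_number(p, lags, g, S, R0):
    """Eq. 16 with M = 1: R_c(p) = R0 * sum over lags of g(lag) * S(p + lag)."""
    idx = np.minimum(p + lags, len(S) - 1)
    return R0 * np.sum(g * S[idx])

# Illustrative values only (assumptions).
R0 = 2.5
lags, g = discretized_gamma(shape=3.0, scale=5.0 / 3.0, max_lag=30)
S, incidence = simulate_renewal(R0, g, n_steps=300)

intrinsic_mean = np.sum(lags * g)
p_early, p_peak = 0, int(np.argmax(incidence))
mean_early = np.sum(lags * forward_generation_interval(p_early, lags, g, S))
mean_peak = np.sum(lags * forward_generation_interval(p_peak, lags, g, S))
Rc_early = case_reproduction_number(p_early, lags, g, S, R0)
Rc_peak = case_reproduction_number(p_peak, lags, g, S, R0)

print(f"intrinsic mean: {intrinsic_mean:.2f}, early cohort: {mean_early:.2f}, peak cohort: {mean_peak:.2f}")
print(f"case reproduction number, early cohort: {Rc_early:.2f}, peak cohort: {Rc_peak:.2f}")
```

In this sketch the cohort infected at the incidence peak transmits while $S$ is falling, so its forward generation intervals are weighted toward shorter lags and its case reproduction number is lower than that of the earliest cohort.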

### Linking $r$ and $\mathcal{R}$.

During the initial phase of an epidemic, the proportion susceptible remains approximately constant ($S\left(t\right)\approx S\left(0\right)$), and incidence of infection grows exponentially: $i\left(t\right)\approx {i}_{0}\mathrm{exp}\left(rt\right)$. During this period, the intrinsic generation interval distribution provides the correct link between the exponential growth rate $r$ and the initial reproduction number $\mathcal{R}={\mathcal{R}}_{0}S\left(0\right)$ based on the Euler–Lotka equation (6). Here, we focus on the estimates of the basic reproduction number ${\mathcal{R}}_{0}$ (the value of $\mathcal{R}$ in a fully susceptible population, $S\left(t\right)\approx 1$),

$$\frac{1}{{\mathcal{R}}_{0}}={\int}_{0}^{\infty}\mathrm{exp}\left(-r\tau \right)g\left(\tau \right)\,\mathrm{d}\tau . \tag{17}$$

Analogous to the intrinsic generation interval distribution, forward serial interval distributions describe the renewal process between symptomatic cases. Therefore, we expect the forward serial interval distribution during the exponential growth phase, which we refer to as the initial forward serial interval distribution ${f}_{0}$, to estimate the same value of ${\mathcal{R}}_{0}$ for a given $r$ as the intrinsic generation interval distribution (note, however, that the forward serial interval is not necessarily positive),

$$\frac{1}{{\mathcal{R}}_{0}}={\int}_{-\infty}^{\infty}\mathrm{exp}\left(-r\tau \right){f}_{0}\left(\tau \right)\,\mathrm{d}\tau . \tag{18}$$

Here, the initial forward serial interval distribution is given by

$${f}_{0}\left(\tau \right)=\frac{1}{\varphi}{\int}_{-\infty}^{0}{\int}_{{\alpha}_{1}}^{\tau}\mathrm{exp}\left(r{\alpha}_{1}\right)h\left(-{\alpha}_{1},{\alpha}_{2}-{\alpha}_{1}\right)\ell \left(\tau -{\alpha}_{2}\right)\,\mathrm{d}{\alpha}_{2}\,\mathrm{d}{\alpha}_{1}, \tag{19}$$

where the normalization constant $\varphi $ is determined by the requirement that ${\int}_{-\infty}^{\infty}{f}_{0}\left(\tau \right)\,\mathrm{d}\tau =1$. We provide a mathematical proof of this relationship in *SI Appendix*, section S3. Since we do not make any assumptions about the shape of the joint distribution $h$ between incubation periods and the generation intervals, Eq. **18** holds, in general, whether or not there is a presymptomatic transmission period.

We further compare this with the estimate of ${\mathcal{R}}_{0}$ based on the intrinsic serial interval distribution $q\left(\tau \right)$,

$$\frac{1}{{\mathcal{R}}_{\mathrm{intrinsic}}}={\int}_{-\infty}^{\infty}\mathrm{exp}\left(-r\tau \right)q\left(\tau \right)\,\mathrm{d}\tau . \tag{20}$$

The intrinsic serial interval distribution $q\left(\tau \right)$ does not depend on epidemic dynamics, and is given by

$$q\left(\tau \right)=\frac{1}{{\varphi}_{q}}{\int}_{-\infty}^{0}{\int}_{{\alpha}_{1}}^{\tau}h\left(-{\alpha}_{1},{\alpha}_{2}-{\alpha}_{1}\right)\ell \left(\tau -{\alpha}_{2}\right)\,\mathrm{d}{\alpha}_{2}\,\mathrm{d}{\alpha}_{1}, \tag{21}$$

where the normalization constant ${\varphi}_{q}$ is determined by the requirement that ${\int}_{-\infty}^{\infty}q\left(\tau \right)\,\mathrm{d}\tau =1$. Rather than numerically integrating over closed forms of $g$, ${f}_{0}$, and $q$ to estimate ${\mathcal{R}}_{0}$, we use simulation-based approaches for simplicity (*SI Appendix*, section S4).

The initial forward serial interval distribution depends on the exponential growth rate $r$. For a fast-growing epidemic (high $r$), we expect the backward incubation periods to be short (Eq. **3**), meaning that presymptomatic transmission is less likely to occur. Therefore, the initial forward serial interval distribution will generally have a larger mean than the intrinsic generation and serial interval distributions. However, the exact shape of the initial forward serial interval distribution depends on the shape of the joint distribution. For example, the Susceptible–Exposed–Infected–Recovered model, under the additional assumption that the incubation and exposed periods are equivalent (i.e., that onset of symptoms and infectiousness occur simultaneously), provides a special case. In this case, the forward serial and generation intervals follow the same distributions during the exponential growth phase because 1) infected individuals can only transmit after symptom onset and 2) the time between symptom onset and infection is independent of the incubation period of an infector (*SI Appendix*, section S5). Everywhere else in this paper, however, we do not assume that the incubation and exposed periods are equivalent. Instead, we allow for presymptomatic transmission in the model in order to reflect the transmission dynamics of COVID-19.

### Model Parameterization.

We have shown that the dynamics of the serial interval distribution depend on the joint distribution between incubation periods and generation intervals. Here, we use a bivariate lognormal distribution to model the joint probability distribution $h$ of intrinsic incubation periods and intrinsic generation intervals (in the renewal equation model, Eq. **12**), while allowing for the possibility that they might be correlated. Given that the viral load of SARS-CoV-2 peaks around the time of symptom onset (11), we generally expect the generation intervals to be positively correlated with the incubation period; that is, individuals who develop symptoms later are more likely to transmit later. Marginal distributions of incubation periods and generation intervals are parameterized based on parameter estimates for COVID-19 (Table 1). For simplicity, we consider four values for the correlation coefficients (on the log scale) of the bivariate lognormal distribution: $\rho =0, 0.25, 0.5, 0.75$. This parameterization allows for generation intervals to be shorter than the incubation period, allowing for presymptomatic transmission.

Table 1.

Parameter | Values | Source |
---|---|---|
Mean intrinsic incubation period | 5.5 d | (33) |
SD intrinsic incubation period | 2.4 d | (33) |
Mean intrinsic generation interval | 5.0 d | (34) |
SD intrinsic generation interval | 1.9 d | (34) |

The intrinsic incubation period distribution is parameterized using a lognormal distribution with log mean ${\mu}_{I}=1.62$ and log standard deviation ${\sigma}_{I}=0.42$. The intrinsic generation interval distribution is parameterized using a lognormal distribution with log mean ${\mu}_{G}=1.54$ and log standard deviation ${\sigma}_{G}=0.37$. Log mean and log standard deviations represent the mean and standard deviations of the underlying normal distributions, which are later exponentiated. The joint probability distribution is modeled using a bivariate lognormal distribution with correlations (on the log scale) $\rho =\left\{0, 0.25, 0.5, 0.75\right\}$. The intrinsic incubation period and generation interval distributions are chosen to match characteristics of COVID-19 to illustrate realistic magnitudes of time-varying/perspective effects in the current pandemic.
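To make the link between Table 1 and Eqs. 17, 18, and 20 concrete, here is a minimal Monte Carlo sketch. It moment-matches the lognormal parameters from the tabulated means and SDs, draws correlated incubation periods and generation intervals, and compares the three Euler–Lotka estimates. The sample size, the seed, the growth rate $r=0.1$ per day, and all function names are assumptions chosen for illustration; the initial forward serial interval is approximated by weighting each infector by $\mathrm{exp}\left(r{\alpha}_{1}\right)$ with ${\alpha}_{1}$ equal to minus the infector's incubation period, following Eq. 19. It is not the procedure of *SI Appendix*, section S4.

```python
import math
import random

def lognormal_params(mean, sd):
    """Moment-match a lognormal: return (mu, sigma) of the underlying normal."""
    sigma2 = math.log(1.0 + (sd / mean) ** 2)
    return math.log(mean) - sigma2 / 2.0, math.sqrt(sigma2)

def sample_intervals(n, rho, rng):
    """Draw (infector incubation, generation interval, infectee incubation)."""
    mu_i, sigma_i = lognormal_params(5.5, 2.4)   # Table 1, incubation period
    mu_g, sigma_g = lognormal_params(5.0, 1.9)   # Table 1, generation interval
    draws = []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho * z1 + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
        x1 = math.exp(mu_i + sigma_i * z1)       # infector incubation period
        gen = math.exp(mu_g + sigma_g * z2)      # generation interval, correlated with x1
        x2 = math.exp(mu_i + sigma_i * rng.gauss(0.0, 1.0))  # infectee incubation, independent
        draws.append((x1, gen, x2))
    return draws

def euler_lotka_estimates(r, draws):
    """R0 estimates from g (Eq. 17), f0 (Eq. 18), and q (Eq. 20)."""
    n = len(draws)
    inv_gen = sum(math.exp(-r * gen) for _, gen, _ in draws) / n
    # Intrinsic serial interval: generation interval minus infector incubation
    # plus infectee incubation, with every transmission pair weighted equally.
    inv_intrinsic = sum(math.exp(-r * (gen - x1 + x2)) for x1, gen, x2 in draws) / n
    # Initial forward serial interval: same intervals, weighted by exp(-r * x1).
    weights = [math.exp(-r * x1) for x1, _, _ in draws]
    inv_forward = sum(w * math.exp(-r * (gen - x1 + x2))
                      for w, (x1, gen, x2) in zip(weights, draws)) / sum(weights)
    return 1.0 / inv_gen, 1.0 / inv_forward, 1.0 / inv_intrinsic

rng = random.Random(1)                 # fixed seed for reproducibility
draws = sample_intervals(50_000, rho=0.5, rng=rng)
R_gen, R_fwd, R_int = euler_lotka_estimates(0.1, draws)   # r = 0.1 per day is illustrative
print(f"generation interval: {R_gen:.3f}")
print(f"initial forward serial interval: {R_fwd:.3f}")
print(f"intrinsic serial interval: {R_int:.3f}")
```

Up to Monte Carlo noise, the first two estimates should agree, because the weight $\mathrm{exp}\left(-r{x}_{1}\right)$ cancels the infector's incubation period in the exponent and the two incubation periods share the same marginal distribution; the intrinsic serial interval estimate should come out lower.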

## Results

We use parameter estimates for COVID-19 to characterize the degree to which the realized serial interval distribution can change over the course of an epidemic and to evaluate how different definitions of the serial interval distribution can affect the Euler–Lotka estimates of ${\mathcal{R}}_{0}$. We further address how the observed serial intervals, measured through contact tracing, are affected by right censoring during an ongoing epidemic and provide a heuristic method for addressing biases that can arise from using serial interval data to estimate ${\mathcal{R}}_{0}$. Finally, we analyze serial interval data from the COVID-19 epidemic in China, outside Hubei province, based on 468 transmission events reported between January 21 and February 8, 2020, under our framework.

### Realized Serial Interval Distributions during the Exponential Growth Phase.

Fig. 2 shows Euler–Lotka estimates of ${\mathcal{R}}_{0}$ based on different definitions of the serial interval. When the initial forward serial interval distribution ${f}_{0}\left(\tau \right)$ is used, estimates (from Eq. **18**) exactly match the (correct) generation interval-based estimates (Eq. **17**) for all values of the correlation $\rho $ between the intrinsic incubation period and the intrinsic generation interval (Fig. 2*A*). When the intrinsic distributions are used, however, estimates based on the serial interval (Eq. **20**) underestimate ${\mathcal{R}}_{0}$: As $r$ increases, ${\mathcal{R}}_{\mathrm{intrinsic}}$ saturates and eventually decreases due to the increasing inferred importance of negative serial intervals (Fig. 2*B*). While the initial forward serial intervals during the exponential growth phase can also be negative, their effects are appropriately balanced, because faster epidemic growth leads to longer serial intervals (and a corresponding lower proportion of negative intervals).

Comparing the shapes of the initial forward serial interval distribution (Eq. **19**) and the intrinsic generation interval distribution allows us to better understand how different forward distributions lead to identical estimates of ${\mathcal{R}}_{0}$. In general, distributions with higher means and less variability lead to higher ${\mathcal{R}}_{0}$ for a given $r$ (6, 35, 36). When incidence is growing exponentially, forward serial intervals have higher means (Fig. 2*C*) and squared coefficients of variation (Fig. 2*D*) than the intrinsic generation interval distribution. The effects of higher means (which increase ${\mathcal{R}}_{0}$) exactly cancel those of higher variability (which decrease ${\mathcal{R}}_{0}$). On the other hand, intrinsic serial intervals (Eq. **21**) have the same mean (equal to the mean initial forward serial interval at $r=0$ in Fig. 2*C*) as the intrinsic generation intervals but are more variable (also see the squared coefficient of variation of the initial forward serial interval distribution at $r=0$ in Fig. 2*D*); therefore, we underestimate ${\mathcal{R}}_{0}$ when we use the intrinsic serial interval distribution.

### Realized Serial Interval Distributions during an Ongoing Epidemic.

The initial forward serial interval distribution captures the exponential growth phase of an epidemic. We now explore how forward and backward serial intervals can vary over the course of an epidemic, using deterministic and stochastic simulations based on the renewal equations (*SI Appendix*, sections S1 and S2) using parameters in Table 1; we further assume ${\mathcal{R}}_{0}=2.5$, to reflect the transmission dynamics of COVID-19 in China (37). While the forward serial interval distribution is our primary focus, understanding the differences between the forward and the backward distributions is important because the observed intervals during an ongoing epidemic are often the backward ones: We typically identify infected individuals and ask when and by whom they were infected. Similarly, when we are estimating the incubation period of an individual, we typically observe their symptom onset date and try to estimate when they were infected (e.g., ref. 38).

Fig. 3 shows the epidemiological dynamics (Fig. 3*A*) together with the mean forward (Fig. 3 *B*–*D*) and the mean backward (Fig. 3 *E*–*G*) delay distributions of a deterministic model based on the renewal equation (Eq. **12**) and of the corresponding stochastic realizations based on individual-based simulations. The mean forward incubation period remains constant throughout an epidemic, by assumption (Fig. 3*B*). The mean forward generation interval decreases slightly when incidence is high, which is when the susceptible population declines rapidly (Fig. 3*C*) (22, 24). In contrast, the mean forward serial interval decreases over time (Fig. 3*D*).

The forward serial interval distributions depend on distributions of three intervals (Fig. 1*B*): 1) the backward incubation period, 2) the forward generation interval, and 3) the forward incubation period. In these simulations, both forward incubation period (Fig. 3*B*) and generation interval (Fig. 3*C*) distributions remain roughly constant; therefore, changes in the forward serial interval distributions (Fig. 3*D*) are predominantly driven by changes in the backward incubation period distribution, whose mean increases over time as the growth rate of disease incidence slows and then reverses. In general, relative contributions of the three distributions depend on their shapes, correlations between intrinsic incubation periods and generation intervals, and overall epidemiological dynamics.

We see similar qualitative patterns in all three backward delays (Fig. 3 *E*–*G* and Eq. **2**), because they are predominantly driven by the rate of change in incidence, which, in turn, affects relative cohort sizes. When incidence is increasing, individuals are more likely to have been infected recently, and therefore we are more likely to observe shorter intervals (Eq. **3**). Similarly, when incidence decreases, we are more likely to observe longer intervals. Neglecting these changes will bias the inference of intrinsic distributions from observed distributions.

### Observed Serial Interval Distributions.

Now, we turn to practical issues of estimating the reproduction number from the observed serial interval data during an ongoing epidemic. In order to have an unbiased estimate of the basic reproduction number, we need to estimate the initial forward serial interval distribution, that is, serial intervals based on cohorts of infectors who share the same symptom onset time, at the early stage of the epidemic. However, researchers typically use all available information to estimate epidemiological parameters (e.g., aggregating all serial intervals observed up to a certain time in an epidemic). For example, ref. 18 recently suggested that up-to-date serial interval data are necessary to accurately estimate the reproduction number. We explore the consequences of neglecting changes in the realized serial interval distribution on estimates of the basic reproduction number.

When an epidemic is ongoing, the observed serial intervals are subject to right censoring because we cannot observe a serial interval if either an infector or an infectee has not yet developed symptoms. For example, if we were to measure serial intervals on day 8 as in Fig. 4*A*, we would only be able to observe the first six events (ID 1 to 6). Fig. 4*B* demonstrates how the effect of right censoring in the observed serial intervals translates to the underestimation of the basic reproduction number ${\mathcal{R}}_{0}$ in our stochastic simulations (assuming ${\mathcal{R}}_{0}=2.5$ as in Fig. 3). Notably, even if we could observe and aggregate all serial intervals across all transmission pairs after the epidemic has ended, we would still underestimate the initial mean forward serial interval (and therefore ${\mathcal{R}}_{0}$), likely by a large amount. The observed serial interval distribution converges to the intrinsic serial interval distribution, as the incubation periods and generation intervals will no longer be subject to backward biases. In fact, we would even underestimate the intrinsic value slightly due to contraction of the forward generation interval distribution during the susceptible depletion phase if the epidemic burnt through the population (Fig. 3*C*). Therefore, aggregated distributions of serial intervals that have been collected throughout different periods of an epidemic must be interpreted with care.

Here, we provide a heuristic way of assessing potential biases in the estimate of the mean initial forward serial interval and therefore ${\mathcal{R}}_{0}$ retrospectively. We can rearrange the line list and group observed serial intervals based on the symptom onset date of infectors (Fig. 4*C*); as we showed earlier, serial intervals that share the same symptom onset date of a primary case give us the forward serial interval distribution. Then, we can compare how the shape of the serial interval distribution (particularly its mean) as well as the estimate of ${\mathcal{R}}_{0}$ change as we incorporate more recent cohorts into the analysis; that is, we analyze observed serial intervals from infectors who became symptomatic before time $t$ and evaluate how the estimates change as we increase $t$. This approach is analogous to averaging over a set of forward intervals, just as using all information up to a certain time is analogous to averaging over a set of backward intervals (Fig. 4*D*); the major difference is that we focus on serial intervals that begin in a certain period, rather than those that end in a certain period. During the exponential growth phase, the estimates of the mean serial interval and ${\mathcal{R}}_{0}$ are consistent with the true value (see “initial forward” in Fig. 4 *B* and *D*); adding more data allows us to make more precise inference during this period. However, the cohort-averaged estimates decrease rapidly soon after the exponential growth period, reflecting changes in the forward serial interval distributions. This approach allows us to detect dynamical changes in the forward serial interval distributions and their effect on the estimates of ${\mathcal{R}}_{0}$.

### Applications to the COVID-19 Pandemic.

Finally, we reanalyze serial intervals of COVID-19 collected by Du et al. (13) from mainland China, outside Hubei province, based on 468 transmission events reported between January 21 and February 8, 2020. Du et al. (13) estimated a mean serial interval of 3.96 d (95% CI: 3.53 d to 4.39 d) and an ${\mathcal{R}}_{0}$ of 1.32 (95% CI: 1.16 to 1.48). Fig. 5*A* shows the distribution of symptom onset dates of all individuals within 468 transmission pairs (consisting of a total of 752 unique individuals), resembling a COVID-19 epidemic curve in China (compare figure 1 in ref. 40). In order to quantify changes in serial intervals, we group them by the symptom onset dates of the primary (Fig. 5*B*) and secondary (Fig. 5*C*) cases (corresponding to forward and backward serial interval distributions, respectively) and compute their mean and 95% quantiles. Fig. 5*B* shows that the mean forward serial interval decreases over time. While the decrease is likely to be affected by the right censoring (indicated by the closeness between the quantiles of the observed serial intervals and maximum observable serial intervals), the increase in the proportion of negative serial intervals indicates changes in the forward serial interval distribution; this proportion is unlikely to be affected by left censoring (based on the gap between the quantiles of the observed serial intervals and minimum observable serial intervals). The decrease in the mean forward serial interval was probably driven by interventions against spread. Interventions during this time period both decreased (and then reversed) the growth rate of COVID-19 cases (thus increasing the backward incubation period) and also reduced generation intervals, by preventing infections once cases were identified. Both of these would have acted to reduce the forward serial interval. Fig. 5*C* shows that the mean backward serial interval increased over time, also likely driven directly by the decrease in COVID-19 infections.

While the qualitative changes in the mean forward and backward serial interval are consistent with our earlier simulations (Fig. 3), the initial mean forward serial interval (Fig. 5*B*) appears to be larger than what we calculated based on previously estimated incubation period and generation interval distributions (Fig. 2*C*). This difference may imply that the incubation period and generation interval (Table 1) were underestimated, as neither study explicitly accounted for the fact that the observed intervals were drawn from the backward distributions and were likely to have been censored.

Fig. 5*D* shows the cohort-averaged estimates of ${\mathcal{R}}_{0}$, which remain roughly constant until January 17 and then suddenly decrease; this sudden decrease is due to changes in the forward serial intervals consistent with the dynamics seen in our simulations (Fig. 4). The cohort-averaged estimates of ${\mathcal{R}}_{0}$ based on the early forward serial intervals are also consistent with previous estimates of ${\mathcal{R}}_{0}$ of the COVID-19 epidemic in China (1, 37): ${\mathcal{R}}_{0}=2.6$ (95% CI: 2.2 to 3.1) and ${\mathcal{R}}_{0}=3.4$ (95% CI: 2.7 to 4.3) based on a doubling period of 8 d or 6 d, respectively, using serial interval data from infectors who developed symptoms by January 17. These early cohort-averaged estimates of ${\mathcal{R}}_{0}$ are unlikely to be affected by the right censoring, as we expect the degree of right censoring to be low (Fig. 5*A*). Therefore, the original ${\mathcal{R}}_{0}$ estimate of 1.32 (95% CI: 1.16 to 1.48), which neglects the changes in the forward serial interval distribution, underestimates ${\mathcal{R}}_{0}$ by a factor of 2.0 to 2.6. This example demonstrates the danger of using the observed serial intervals to calculate the reproduction number without organizing serial intervals into cohorts.

## Discussion

Generation and serial intervals determine the time scale of disease transmission, and are therefore critical to dynamical modeling of infectious outbreaks. We have shown that the initial forward serial interval distribution, measured from the cohort of infectors who developed symptoms during the exponential growth phase of an epidemic, provides the correct link between the exponential growth rate $r$ and the initial reproduction number $\mathcal{R}$. In general, the forward serial interval distributions will not match the intrinsic serial interval distribution (which has the same mean as the intrinsic generation interval distribution) because the incubation periods of the infectors (conditional on their symptom onset dates) will be subject to backward biases. In particular, the mean forward serial interval can decrease over time for COVID-19, as individuals who develop symptoms later in an epidemic are more likely to have longer incubation periods, and therefore have greater opportunity to transmit presymptomatically. Failing to account for these effects can result in underestimation of initial $\mathcal{R}$.
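The backward bias in the infector's incubation period can be illustrated numerically. The sketch below assumes that incidence grows as $\mathrm{exp}\left(rt\right)$, so that the backward incubation period density is proportional to $\ell \left(x\right)\mathrm{exp}\left(-rx\right)$ (the same $\mathrm{exp}\left(r{\alpha}_{1}\right)$ weight that appears in Eq. 19), and uses the lognormal incubation period parameters from Table 1. The integration grid, the chosen growth rates, and the function names are assumptions made for this illustration.

```python
import math

def lognormal_pdf(x, mu, sigma):
    """Lognormal density for x > 0."""
    z = (math.log(x) - mu) / sigma
    return math.exp(-0.5 * z * z) / (x * sigma * math.sqrt(2.0 * math.pi))

def backward_incubation_mean(r, mu=1.62, sigma=0.42, x_max=60.0, dx=0.01):
    """Mean of the density proportional to l(x) * exp(-r * x), by a Riemann sum."""
    n = int(x_max / dx)
    num = den = 0.0
    for k in range(1, n + 1):
        x = k * dx
        w = lognormal_pdf(x, mu, sigma) * math.exp(-r * x)
        num += x * w
        den += w
    return num / den

# Illustrative growth rates (per day); r = 0 recovers the intrinsic mean.
means = {r: backward_incubation_mean(r) for r in (0.0, 0.1, 0.2)}
for r, m in means.items():
    print(f"r = {r:.1f} per day: mean backward incubation period {m:.2f} d")
```

Under these assumptions, faster growth should give shorter backward incubation periods, which is why infectors observed early in a growing epidemic have less opportunity for presymptomatic transmission than infectors observed later.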

Recently, Ali et al. (41) also showed that forward serial intervals of COVID-19 decreased through time in China. They grouped serial intervals by the symptom onset date of infectors across 14-d periods and found that the mean forward serial interval decreased from 7.8 d to 2.6 d. While they attributed the decrease in serial intervals to reduction of the isolation delay, their regression analysis showed that isolation delays explain only 51.5% of the variation in serial intervals (they could explain up to 72% of the variance by including other intervention measures). Our framework provides an explanation for the remaining variation: Changes in the backward incubation period during the decreasing phase of an epidemic act to further shorten serial intervals due to an increased amount of presymptomatic transmission (even in the absence of nonpharmaceutical interventions). Isolation delays and other intervention measures affect the amount of onward transmission, and therefore the distribution of realized (forward) generation intervals. They therefore are not expected to explain all of the variation in forward serial intervals, since these additionally depend on both the backward incubation period of the infector and the forward incubation period of the infectee (Fig. 1*B*).

Our results support the use of serial interval distributions for calculating $\mathcal{R}$ during the exponential growth phase, but they also reveal gaps in current practices in incorporating serial interval distributions into outbreak analyses. For example, ref. 18 recently emphasized the importance of using up-to-date serial interval data for accurate estimation of time-varying reproduction numbers. However, our results show that, if observational biases in the forward serial interval through time are not accounted for, using up-to-date serial interval data can actually exacerbate the underestimation of $\mathcal{R}$ in the initial growth phase of an outbreak. Future studies should explore how neglecting changes in the forward serial interval distribution can affect the estimates of $\mathcal{R}$ beyond the exponential growth phase, and potentially reassess existing estimates of $\mathcal{R}$. We also suggest that modelers should aim to characterize spatiotemporal variation in forward serial interval distributions. These modeling approaches should be coupled with epidemiological investigation through contact tracing. Going forward, an additional advantage of early, intensive contact tracing of emerging diseases is that it provides the best information to characterize the initial forward serial interval distribution.

Our study underlines the fact that the serial interval distribution depends not only on the generation interval and incubation period distributions but also on the correlation between their durations in a given individual. Here, we use a bivariate lognormal distribution to capture these correlations phenomenologically and to show that realized serial intervals can decrease over time in the context of COVID-19. Although their true correlation will depend on viral load dynamics, we expect our conclusions about decreasing serial intervals of COVID-19 to be robust, as individuals with longer incubation periods will generally have a longer time window to transmit before symptom onset. In general, the impact of increasing backward incubation periods on the forward serial intervals is likely to be disease specific; for example, we show, in *SI Appendix*, section S5, that the initial forward serial interval distribution can be equivalent to the intrinsic generation interval distribution, regardless of the growth rate $r$, due to independence between the incubation period and time from symptom onset to transmission and the lack of presymptomatic transmission. Future studies trying to interpret realized serial intervals should consider carefully the joint distribution between the generation intervals and incubation periods.

In closing, we lay out a few practical principles for analyzing and interpreting serial interval data. First, serial intervals should be cohorted based on the symptom onset date of the infector (and not of the infectee) whenever possible. Previous studies have often regarded serial intervals as an intrinsic quantity, having the same mean as the intrinsic generation interval (7, 8, 10, 19), but the distribution (and the mean) of observed serial intervals differs from this expectation, and changes through time, due to epidemic dynamics. Second, aggregating serial intervals across different cohorts and epidemic periods should be avoided because the realized serial interval distribution can be subject to different censoring and epidemiological biases: Even when all realized serial intervals can be observed throughout an unmitigated epidemic, we do not obtain the intrinsic serial interval distribution, due to susceptible depletion (Fig. 4). Third, applying serial interval information across epidemics of a given disease should be done with care, because serial intervals are epidemic specific, rather than disease specific. Finally, serial interval data should be accompanied by a trajectory of the epidemic curve, whenever possible, to provide epidemiological context. In practice, these recommendations will sometimes be hard to follow, due to limited data about serial intervals, but these issues should be kept in mind when interpreting serial interval data to inform transmission dynamics.
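The first principle, cohorting by the infector's symptom onset date, can be sketched in a few lines. The line list below is hypothetical (it is not the Du et al. data), and the function names are invented for this illustration; the second function mirrors the cohort-averaged heuristic of pooling all infectors who became symptomatic before time $t$.

```python
from collections import defaultdict

def forward_cohort_means(pairs):
    """Mean serial interval grouped by the infector's symptom onset day."""
    cohorts = defaultdict(list)
    for infector_onset, infectee_onset in pairs:
        cohorts[infector_onset].append(infectee_onset - infector_onset)
    return {day: sum(v) / len(v) for day, v in sorted(cohorts.items())}

def cohort_averaged_mean(pairs, t):
    """Mean serial interval over infectors who became symptomatic before day t."""
    intervals = [b - a for a, b in pairs if a < t]
    return sum(intervals) / len(intervals) if intervals else None

# Hypothetical line list: (infector onset day, infectee onset day).
pairs = [(1, 8), (1, 9), (2, 8), (5, 9), (5, 8), (6, 5), (7, 9)]

print(forward_cohort_means(pairs))
for t in (3, 6, 8):
    print(t, cohort_averaged_mean(pairs, t))
```

Tracking how the cohort-averaged mean changes as $t$ increases is one simple way to spot when later cohorts start to pull the estimate away from the early forward serial intervals. Note that negative intervals, such as the pair with infector onset on day 6, are kept rather than discarded.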

More broadly, our study underlines the importance of carefully defining measured epidemiological time distributions. Previous studies have shown the importance of forward vs. backward measurement of generation intervals (19, 23, 24); we generalize these ideas and show that they apply to other epidemiological distributions. Some studies during the early phases of the COVID-19 epidemics have tried to correct for the backward biases (42), but changes in the backward delay distributions due to changing cohort sizes are expected to be a pervasive feature of outbreak dynamics. Cohorting epidemiological delays by the primary event time can help avoid backward biases (although censoring biases can still exist) as well as detect potential changes in the distribution.

Here, we assume that all individuals develop symptoms and that the entire transmission process, including all relevant epidemiological delays, is known exactly. In practice, identifying who infected whom is difficult, in general, and asymptomatic and presymptomatic transmission of COVID-19 exacerbates this difficulty (11, 43, 44). Biases in the observed serial intervals will necessarily bias the estimates of $\mathcal{R}$. Furthermore, when one of the individuals in a transmission pair is asymptomatic, there is no symptom-based serial interval. Neglecting the time scale of asymptomatic transmission may also bias the estimates of $\mathcal{R}$ (45).

Despite these limitations, our analysis of serial intervals of COVID-19 from China provides further support for our theoretical framework, demonstrating temporal variation in serial intervals and its effect on the estimates of $\mathcal{R}$. Most existing estimates of the serial intervals of COVID-19 implicitly or explicitly assume that the serial interval distributions remain constant throughout the course of an epidemic (11, 13, 46–49). Our study provides a rationale for reassessing estimates of serial interval distributions—and their use in estimating $\mathcal{R}$—during the COVID-19 pandemic.

## Data Availability

All data and code are stored in a publicly available GitHub repository (https://github.com/parksw3/serial). All study data are included in the article and *SI Appendix*.

## Acknowledgments

J.D. and D.J.D.E. are grateful for COVID-19 rapid response funding from the Michael G. DeGroote Institute for Infectious Disease Research, McMaster University. D.J.D.E. was supported by Natural Sciences and Engineering Research Council (NSERC). J.D. was supported by Canadian Institutes of Health Research (CIHR). J.S.W. was supported by Army Research Office (W911NF1910384). The findings and conclusions in this report are those of the authors and do not necessarily represent the official position of the US NIH or Department of Health and Human Services.

## Supporting Information

Appendix (PDF)


## Information & Authors

### Information

#### Copyright

Copyright © 2021 the Author(s). Published by PNAS. This open access article is distributed under Creative Commons Attribution License 4.0 (CC BY).

#### Submission history

**Published online**: December 23, 2020

**Published in issue**: January 12, 2021

#### Notes

This article is a PNAS Direct Submission.

### Authors

#### Competing Interests

The authors declare no competing interest.
