Forward-looking serial intervals correctly link epidemic growth to reproduction numbers

Significance
The generation and serial interval distributions are key, but different, quantities in outbreak analyses. Recent studies suggest that the two distributions give different estimates of the reproduction number R as inferred from the observed growth rate r. Here, we show that estimating R based on r and the serial interval distribution, when defined from the correct reference cohort, gives the same estimate as using r and the generation interval distribution. We apply our framework to COVID-19 serial interval data from China, outside Hubei province (January 21 to February 8, 2020), revealing systematic biases in prior inference methods. Our study provides the theoretical basis for practical changes to the principled use of serial interval distributions in estimating R during epidemics.


S1 Deterministic simulation
We simulate the renewal equation model using a discrete-time approximation: where $\hat{g}$ is a discrete-time intrinsic generation-interval distribution that satisfies the following: , $m = 1, \dots, m_{\max}$.
The continuous-time intrinsic generation-interval distribution is parameterized using a lognormal distribution (Table 1). We define the intrinsic incubation period distribution $\hat{\ell}$ in a similar manner: , $m = 1, \dots, m_{\max}$, (S.3) where its continuous-time analog is also based on a lognormal distribution. For simplicity, we assume that the forward incubation periods and intrinsic generation intervals are independent:
$$\hat{h}(m\Delta t, n\Delta t) = \hat{\ell}(m\Delta t)\,\hat{g}(n\Delta t), \qquad m, n = 1, \dots, m_{\max}. \tag{S.4}$$
We use $\Delta t = 0.025$ days and $m_{\max} = 2001$ for the discretization. We initialize the simulation with population size $N = 40{,}000$ as follows: where $C$ is chosen such that $\sum_{m=1}^{m_{\max}} i(m\Delta t) = 10$. These initial conditions allow the model to follow exponential growth from time $\Delta t(m_{\max} + 1)$ without any transient behavior.
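The discretization and initialization described above can be sketched as follows. This is a minimal illustration rather than the authors' code: it assumes the common renewal form in which new incidence equals $R_0$ times the susceptible fraction times the lag-weighted sum of past incidence, it assumes $\hat{g}$ is obtained by normalizing the lognormal density on the grid, and it seeds the history with an exponential curve whose rate `r` solves the discrete Euler-Lotka equation. The function names and the lognormal parameters in the usage note are hypothetical placeholders, not the values in Table 1.

```python
import numpy as np

def lognormal_pdf(x, mu, sigma):
    """Lognormal density with log mean mu and log standard deviation sigma."""
    return np.exp(-(np.log(x) - mu) ** 2 / (2 * sigma ** 2)) / (x * sigma * np.sqrt(2 * np.pi))

def growth_rate(R0, g_hat, lags, lo=0.0, hi=10.0, iters=200):
    """Solve R0 * sum(g_hat * exp(-r * lags)) = 1 for r by bisection (assumes R0 > 1)."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if R0 * np.sum(g_hat * np.exp(-mid * lags)) > 1.0:
            lo = mid  # left-hand side still above 1, so r must be larger
        else:
            hi = mid
    return 0.5 * (lo + hi)

def simulate_renewal(R0, N, dt, m_max, n_steps, mu_g, sigma_g, seed_total=10.0):
    """Discrete-time renewal simulation (assumed form; see lead-in)."""
    lags = dt * np.arange(1, m_max + 1)
    # Assumption: discretize g by normalizing the density over the grid.
    g_hat = lognormal_pdf(lags, mu_g, sigma_g)
    g_hat /= g_hat.sum()
    r = growth_rate(R0, g_hat, lags)

    # Exponentially growing history, scaled so the seeded incidence sums to seed_total.
    inc = np.zeros(m_max + n_steps)
    inc[:m_max] = np.exp(r * lags)
    inc[:m_max] *= seed_total / inc[:m_max].sum()

    S = N - seed_total
    for t in range(m_max, m_max + n_steps):
        # inc[t-1] pairs with lag 1, inc[t-2] with lag 2, and so on.
        force = R0 * np.dot(inc[t - m_max:t][::-1], g_hat)
        inc[t] = force * S / N
        S -= inc[t]
    return inc, r
```

For example, `simulate_renewal(R0=2.5, N=40000, dt=0.1, m_max=300, n_steps=50, mu_g=1.5, sigma_g=0.5)` returns the incidence series and the growth rate; under these assumptions the first simulated steps grow by a factor close to `exp(r * dt)`, with no initial transient.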

S2 Stochastic simulation
We run stochastic simulations of the renewal equation model using an individual-based model on a fully connected network (i.e., a homogeneous population), based on the Gillespie algorithm that we developed earlier (Park et al., 2020). First, we initialize an epidemic with $I(0)$ infected individuals (nodes) in a fully connected network of size $N$. For each initially infected individual, we draw the number of infectious contacts from a Poisson distribution with mean $R_0$ and draw the corresponding generation interval for each contact from a lognormal distribution (Table 1). Contactees are sampled uniformly from the total population. All contactees are sorted into event queues based on their infection time. We update the current time to the infection time of the first person in the queue. Then, the first person in the queue makes contacts based on the Poisson offspring distribution described earlier, and their contactees are added to the sorted queue. Whenever contactees are added to the sorted queue, we remove all duplicated contacts (keeping the first one) as well as contacts made to individuals that have already been infected. Simulations continue until there are no more individuals in the queue. We simulate 10 epidemics with $I(0) = 10$ and $N = 40{,}000$.
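The event-queue procedure described above can be sketched in a few lines. This is an illustrative sketch, not the implementation of Park et al. (2020): it uses a binary heap as the sorted queue and discards duplicated or already-infected contactees when they are popped rather than when they are inserted, which yields the same infection times. The function name and the lognormal parameters in the usage note are hypothetical.

```python
import heapq
import numpy as np

def simulate_epidemic(R0, N, I0, mu_g, sigma_g, seed=0):
    """Individual-based renewal simulation on a fully connected population.

    Returns (infection_time, infector_of); never-infected individuals keep
    infection_time = inf and infector_of = -1.
    """
    rng = np.random.default_rng(seed)
    infection_time = np.full(N, np.inf)
    infector_of = np.full(N, -1)
    queue = []  # heap of (infection time, contactee, infector)

    def schedule_contacts(infector, now):
        # Poisson number of infectious contacts, each with a lognormal generation interval.
        n_contacts = rng.poisson(R0)
        contactees = rng.integers(0, N, size=n_contacts)
        delays = rng.lognormal(mu_g, sigma_g, size=n_contacts)
        for contactee, delay in zip(contactees, delays):
            heapq.heappush(queue, (now + float(delay), int(contactee), infector))

    initial = rng.choice(N, size=I0, replace=False)
    infection_time[initial] = 0.0
    for i in initial:
        schedule_contacts(int(i), 0.0)

    while queue:
        now, contactee, infector = heapq.heappop(queue)
        if np.isfinite(infection_time[contactee]):
            continue  # duplicate contact or already infected: keep only the earliest
        infection_time[contactee] = now
        infector_of[contactee] = infector
        schedule_contacts(contactee, now)
    return infection_time, infector_of
```

Calling `simulate_epidemic(R0=2.5, N=40000, I0=10, mu_g=1.5, sigma_g=0.5, seed=1)` would produce one realization; repeating with different seeds gives independent epidemics.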

S3 Linking r and R 0 using serial-interval distributions
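The intrinsic generation-interval distribution $g(\tau)$ provides a link between $r$ and $R_0$ via the Euler-Lotka equation (Wallinga and Lipsitch, 2007):
$$\frac{1}{R_0} = \int_{0}^{\infty} \exp(-r\tau)\, g(\tau)\, d\tau.$$
In this section, we prove that the initial forward serial-interval distribution $f_0(\tau)$ also estimates the same $R_0$ from $r$, except that the integral extends to $\tau = -\infty$ rather than beginning at $\tau = 0$, because serial intervals can be negative:
$$\frac{1}{R_0} = \int_{-\infty}^{\infty} \exp(-r\tau)\, f_0(\tau)\, d\tau. \tag{S.7}$$
Here, the initial forward serial-interval distribution $f_0(\tau)$ is defined as:
$$f_0(\tau) = \frac{1}{\phi} \int_{-\infty}^{0} \int_{\alpha_1}^{\tau} \exp(r\alpha_1)\, h(-\alpha_1, \alpha_2 - \alpha_1)\, \ell(\tau - \alpha_2)\, d\alpha_2\, d\alpha_1, \tag{S.8}$$
where $h$ is the joint probability distribution describing the intrinsic generation-interval distribution $g$ and the intrinsic incubation period distribution $\ell$ (see Eq. (15) in the main text), and the normalization constant $\phi$ is determined by the requirement that $\int_{-\infty}^{\infty} f_0(\tau)\, d\tau = 1$.

In order to verify Eq. (S.7), we first rewrite the integral in Eq. (S.8) by substituting $-\alpha_1$ for $\alpha_1$, and then changing the order of integration:
$$f_0(\tau) = \frac{1}{\phi} \int_{0}^{\infty} \int_{-\alpha_1}^{\tau} \exp(-r\alpha_1)\, h(\alpha_1, \alpha_2 + \alpha_1)\, \ell(\tau - \alpha_2)\, d\alpha_2\, d\alpha_1 = \frac{1}{\phi} \int_{-\infty}^{\tau} \int_{\max(0, -\alpha_2)}^{\infty} \exp(-r\alpha_1)\, h(\alpha_1, \alpha_2 + \alpha_1)\, \ell(\tau - \alpha_2)\, d\alpha_1\, d\alpha_2. \tag{S.9}$$
To further simplify the expression, we define $z(\alpha_2)$ as follows:
$$z(\alpha_2) = \int_{\max(0, -\alpha_2)}^{\infty} \exp(-r\alpha_1)\, h(\alpha_1, \alpha_2 + \alpha_1)\, d\alpha_1.$$
Substituting $z(\alpha_2)$ into Eq. (S.9) we obtain:
$$f_0(\tau) = \frac{1}{\phi} \int_{-\infty}^{\tau} z(\alpha_2)\, \ell(\tau - \alpha_2)\, d\alpha_2.$$
Because both $f_0$ and $\ell$ integrate to 1, it follows that $\phi = \int_{-\infty}^{\infty} z(x)\, dx$. Writing $\hat{z} = z/\phi$ for a normalized version of $z$, we can now express the initial forward serial-interval distribution $f_0$ as a convolution of $\hat{z}$ and $\ell$:
$$f_0(\tau) = (\hat{z} * \ell)(\tau) = \int_{-\infty}^{\tau} \hat{z}(\alpha_2)\, \ell(\tau - \alpha_2)\, d\alpha_2.$$
Since the right-hand side of Eq. (S.7) is also a Laplace transform of $f_0 = \hat{z} * \ell$, we can express it as the product of the Laplace transforms of $\hat{z}$ and $\ell$:
$$\int_{-\infty}^{\infty} \exp(-r\tau)\, f_0(\tau)\, d\tau = \left[ \int_{-\infty}^{\infty} \exp(-rx)\, \hat{z}(x)\, dx \right] \left[ \int_{0}^{\infty} \exp(-rx)\, \ell(x)\, dx \right]. \tag{S.14}$$
In order to derive an expression for the Laplace transform of $\hat{z}$, we first have to derive an analytical expression for $\int_{-\infty}^{\infty} z(x)\, dx$. By changing the order of integration, we have:
$$\int_{-\infty}^{\infty} z(x)\, dx = \int_{0}^{\infty} \exp(-r\alpha_1) \int_{-\alpha_1}^{\infty} h(\alpha_1, x + \alpha_1)\, dx\, d\alpha_1.$$
Since $\ell$ is a marginal probability distribution of $h$, it follows that:
$$\ell(\alpha_1) = \int_{0}^{\infty} h(\alpha_1, y)\, dy = \int_{-\alpha_1}^{\infty} h(\alpha_1, x + \alpha_1)\, dx.$$
Then, we have:
$$\hat{z}(x) = \frac{z(x)}{\int_{0}^{\infty} \exp(-r\alpha_1)\, \ell(\alpha_1)\, d\alpha_1}.$$
Substituting the expression into Eq. (S.14), the Laplace transform of $\ell$ cancels and we have:
$$\int_{-\infty}^{\infty} \exp(-r\tau)\, f_0(\tau)\, d\tau = \int_{-\infty}^{\infty} \exp(-r\alpha_2) \int_{\max(0, -\alpha_2)}^{\infty} \exp(-r\alpha_1)\, h(\alpha_1, \alpha_2 + \alpha_1)\, d\alpha_1\, d\alpha_2. \tag{S.18}$$
Recall that $g$ is also a marginal probability distribution of $h$:
$$g(\tau) = \int_{0}^{\infty} h(\alpha_1, \tau)\, d\alpha_1.$$
We can then substitute $\tau = \alpha_1 + \alpha_2$ into Eq. (S.18) and apply a change of variables to obtain:
$$\int_{-\infty}^{\infty} \exp(-r\tau)\, f_0(\tau)\, d\tau = \int_{0}^{\infty} \int_{0}^{\infty} \exp(-r\tau)\, h(\alpha_1, \tau)\, d\alpha_1\, d\tau = \int_{0}^{\infty} \exp(-r\tau)\, g(\tau)\, d\tau = \frac{1}{R_0}.$$
Therefore, the initial forward serial-interval distribution and the intrinsic generation-interval distribution give the same estimates of $R_0$ from $r$.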
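As a quick sanity check on this result, it may help to work through the special case in which incubation periods and generation intervals are independent, so that $h(x, \tau) = \ell(x)\, g(\tau)$ with $\ell$ denoting the intrinsic incubation period distribution. The following is a sketch under that added assumption; the symbols $b$, $F$, $X_1$, $G$, $X_2$, and $\mathcal{L}_\ell$ are introduced here for illustration only.

```latex
% Backward incubation period of an infector during exponential growth:
% the intrinsic density tilted by exp(-r x) and renormalized.
b(x) = \frac{e^{-rx}\,\ell(x)}{\mathcal{L}_\ell(r)}, \qquad
\mathcal{L}_\ell(r) = \int_0^\infty e^{-rx}\,\ell(x)\,dx .

% Forward serial interval F = -X_1 + G + X_2,
% with X_1 \sim b, G \sim g, X_2 \sim \ell mutually independent.
\mathbb{E}\!\left[e^{-rF}\right]
  = \mathbb{E}\!\left[e^{rX_1}\right]\,
    \mathbb{E}\!\left[e^{-rG}\right]\,
    \mathbb{E}\!\left[e^{-rX_2}\right]
  = \frac{1}{\mathcal{L}_\ell(r)} \cdot \frac{1}{R_0} \cdot \mathcal{L}_\ell(r)
  = \frac{1}{R_0}.
```

The first factor equals $1/\mathcal{L}_\ell(r)$ because the tilt $e^{-rx}$ in $b$ cancels against $e^{rx}$, leaving the integral of $\ell$, which is 1. The shortening of the infector's backward incubation period during growth is therefore exactly what offsets the infectee's forward incubation period.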

S4 Comparing the estimates of R 0 using the initial forward and the intrinsic serial-interval distributions
We use a simulation-based approach to compare the estimates of $R_0$ based on the serial- and generation-interval distributions. To do so, we model the intrinsic generation-interval distribution and the incubation period using a multivariate lognormal distribution with log means $\mu_G$, $\mu_I$, log variances $\sigma_G^2$, $\sigma_I^2$, and log-scale correlation $\rho$; the multivariate lognormal distribution is parameterized based on parameter estimates for COVID-19 (Table 1). We construct forward serial intervals during the exponential growth period as follows:
$$-X_{1,i} + (G_i \mid X_{1,i}) + X_{2,i},$$
where the backward incubation period $X_{1,i}$ of an infector is simulated by drawing random lognormal samples $Y_i$ with log mean $\mu_I$ and log variance $\sigma_I^2$ and resampling $Y_i$, each weighted by the inverse of the exponential growth function $\exp(-rY_i)$; the intrinsic generation interval conditional on the incubation period of the infector $(G_i \mid X_{1,i})$ is drawn from a lognormal distribution with log mean $\mu_G + \sigma_G \rho (\log(X_{1,i}) - \mu_I)/\sigma_I$ and log variance $\sigma_G^2 (1 - \rho^2)$; and the forward incubation period $X_{2,i}$ of an infectee is drawn from a lognormal distribution with log mean $\mu_I$ and log variance $\sigma_I^2$. We then calculate the basic reproduction number $R_0$ using the empirical estimator: (S.25)
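The sampling scheme above translates directly into code. The sketch below follows the stated steps (weighted resampling for the backward incubation period, conditional lognormal draw for the generation interval, fresh lognormal draw for the infectee) and assumes that the empirical estimator is the sample analog of Eq. (S.7), that is, the reciprocal of the sample mean of $\exp(-r \times \text{serial interval})$. The function name and the parameter values in the usage note are hypothetical and are not the Table 1 estimates.

```python
import numpy as np

def compare_R0(r, mu_G, sigma_G, mu_I, sigma_I, rho, n=200_000, seed=0):
    """Return (R0 from forward serial intervals, R0 from intrinsic generation intervals)."""
    rng = np.random.default_rng(seed)

    # Backward incubation period of infectors: resample lognormal draws
    # with weights exp(-r * Y), reflecting exponential growth.
    Y = rng.lognormal(mu_I, sigma_I, size=n)
    w = np.exp(-r * Y)
    X1 = rng.choice(Y, size=n, replace=True, p=w / w.sum())

    # Generation interval conditional on the infector's incubation period.
    cond_mean = mu_G + sigma_G * rho * (np.log(X1) - mu_I) / sigma_I
    cond_sd = sigma_G * np.sqrt(1.0 - rho ** 2)
    G = rng.lognormal(cond_mean, cond_sd)

    # Forward incubation period of infectees.
    X2 = rng.lognormal(mu_I, sigma_I, size=n)

    serial = -X1 + G + X2
    # Assumed estimator: sample analog of the Euler-Lotka integral.
    R0_serial = 1.0 / np.mean(np.exp(-r * serial))

    # Reference value from the intrinsic generation-interval distribution.
    G_intrinsic = rng.lognormal(mu_G, sigma_G, size=n)
    R0_generation = 1.0 / np.mean(np.exp(-r * G_intrinsic))
    return R0_serial, R0_generation
```

With, say, `compare_R0(r=0.15, mu_G=1.5, sigma_G=0.5, mu_I=1.6, sigma_I=0.4, rho=0.5)`, the two returned values should agree up to Monte Carlo error, which is the numerical counterpart of the result in Section S3.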
We compare this with an estimate of $R_0$ based on the intrinsic serial-interval distribution, which has the same mean as the intrinsic generation-interval distribution (Svensson, 2007; Klinkenberg and Nishiura, 2011; Champredon et al., 2018; Britton and Scalia Tomba, 2019):

S5 Applications: SEIR model
Consider a Susceptible-Exposed-Infectious-Recovered model: where $\beta$ is the transmission rate, $1/\gamma_E$ is the mean latent period, and $1/\gamma_I$ is the mean infectious period. We further assume that the latent period is equivalent to the incubation period; in other words, infected individuals can only transmit after symptom onset. Then, the generation interval will always be longer than the incubation period of the infector.
The joint probability distribution of the intrinsic incubation periods and intrinsic generation intervals for this model can be written as: Then, the intrinsic generation-interval distribution is given by: On the other hand, the initial forward serial-interval distribution is given by: Therefore, both the intrinsic generation intervals and the initial forward serial intervals are identically distributed and have the same mean.
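The claim that the two distributions coincide can be checked numerically. The sketch below is an illustration under assumptions that follow from the model description but are not spelled out in the text: incubation periods are exponential with rate $\gamma_E$, the time from symptom onset to transmission is exponential with rate $\gamma_I$ (constant infectiousness over an exponentially distributed infectious period), and the backward incubation period during exponential growth at rate $r$ is the exponential density tilted by $\exp(-rx)$, which is again exponential with rate $\gamma_E + r$. The function name and the parameter values in the usage note are hypothetical.

```python
import numpy as np

def seir_intervals(gamma_E, gamma_I, r, n=200_000, seed=0):
    """Sample initial forward serial intervals and intrinsic generation intervals (SEIR)."""
    rng = np.random.default_rng(seed)

    # Backward incubation period of the infector: Exp(gamma_E) tilted by exp(-r x).
    x1 = rng.exponential(1.0 / (gamma_E + r), size=n)
    # Time from the infector's symptom onset to transmission.
    after_onset = rng.exponential(1.0 / gamma_I, size=n)
    # Generation interval conditional on the infector's incubation period.
    g_cond = x1 + after_onset
    # Forward incubation period of the infectee.
    x2 = rng.exponential(1.0 / gamma_E, size=n)
    serial = -x1 + g_cond + x2

    # Intrinsic generation interval: latent period plus time to transmission.
    generation = rng.exponential(1.0 / gamma_E, size=n) + rng.exponential(1.0 / gamma_I, size=n)
    return serial, generation
```

For instance, with `gamma_E = 1/3`, `gamma_I = 1/5`, and `r = 0.1`, both samples should have mean close to $1/\gamma_E + 1/\gamma_I = 8$ days and matching quantiles. The infector's incubation period cancels in the serial interval, so the result does not depend on `r`.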

S6 Simulations with correlated intrinsic incubation periods and intrinsic generation intervals
Figure S1: Epidemiological dynamics and changes in mean forward and backward delay distributions. (A) Daily incidence over time. (B-D) Changes in the mean forward incubation period, generation interval, and serial interval. (E-G) Changes in the mean backward incubation period, generation interval, and serial interval. Intrinsic incubation periods and intrinsic generation intervals are modeled using a correlated bivariate lognormal distribution; therefore, generation intervals are drawn from the corresponding conditional distributions (given an incubation period), instead of the marginal distribution. Higher correlation reduces the amount of change in the mean forward serial interval because shorter (longer) backward incubation periods of infectors during the increasing (decreasing) phase of an epidemic are associated with shorter (longer) forward generation intervals. See Figure 3 in the main text for a detailed description.