New Research In
Physical Sciences
Social Sciences
Featured Portals
Articles by Topic
Biological Sciences
Featured Portals
Articles by Topic
 Agricultural Sciences
 Anthropology
 Applied Biological Sciences
 Biochemistry
 Biophysics and Computational Biology
 Cell Biology
 Developmental Biology
 Ecology
 Environmental Sciences
 Evolution
 Genetics
 Immunology and Inflammation
 Medical Sciences
 Microbiology
 Neuroscience
 Pharmacology
 Physiology
 Plant Biology
 Population Biology
 Psychological and Cognitive Sciences
 Sustainability Science
 Systems Biology
Reconstructing influenza incidence by deconvolution of daily mortality time series

Edited by Burton H. Singer, Princeton University, Princeton, NJ, and approved October 23, 2009 (received for review March 17, 2009)
Abstract
We propose a mathematically straightforward method to infer the incidence curve of an epidemic from a recorded daily death curve and timetodeath distribution; the method is based on the Richardson–Lucy deconvolution scheme from optics. We apply the method to reconstruct the incidence curves for the 1918 influenza epidemic in Philadelphia and New York State. The incidence curves are then used to estimate epidemiological quantities, such as daily reproductive numbers and infectivity ratios. We found that during a brief period before the official control measures were implemented in Philadelphia, the drop in the daily number of new infections due to an average infector was much larger than expected from the depletion of susceptibles during that period; this finding was subjected to extensive sensitivity analysis. Combining this with recorded evidence about public behavior, we conclude that public awareness and change in behavior is likely to have had a major role in the slowdown of the epidemic even in a city whose response to the 1918 influenza epidemic is considered to have been among the worst in the U.S.
In characterizing a newly emerging or historical epidemic, one often has access to an epidemic curve that provides the number of persons who became ill on a certain day (the symptom curve), or the number of cases that were reported each day (the report curve), or the number of dying that were reported each day (the death curve). Of more direct interest, for the purposes of visualizing the spread of the epidemic and calculating relevant quantities such as the daily reproductive number, is the (usually unobserved) epidemic curve of the number of persons becoming infected on each day (the incidence curve). The other three curves—the symptom, report, and death curves—provide information about the incidence curve but are imperfect representations of it, first because not all cases may appear in these other curves—asymptomatics, unreported cases, or nonfatal cases respectively will be missed—and perhaps more importantly because the delay between infection and subsequent events—symptom onset, report, or death—is a random variable that adds horizontal variation or “smear” to the epidemic curve. For some infections (e.g., HIV), diagnostic symptoms (i.e., AIDSdefining illness) may occur years after infection, so the symptom curve is a poor reflection of the evolution of the epidemic; there is much literature on the problem of deconvolution to estimate the HIV incidence curve from the symptom (AIDSincidence) curve (1). For acute infections, such as SARS and influenza, the incubation period is relatively short compared with the growth rate of the epidemic; hence the symptom curve and the incidence curve are similar. The time to death for such infections, however, can be several weeks, equivalent to three or more disease generations, and is highly variable (2–4), making the death curve a poor surrogate for the incidence curve.
In this paper, we propose a mathematically straightforward method to infer the incidence curve from a death curve. The incidence curve is then used to estimate the basic epidemiological quantities of interest, such as the daily reproductive numbers and infectivity ratios; the latter can be utilized to assess major changes in the epidemic's dynamics.
First, we describe a technique for deconvolution to estimate the incidence curve from the death curve and the timetodeath distribution. The technique, called the RichardsonLucy (RL) deconvolution, was originally developed for use in optics (5, 6), and is adapted in a simple way to the slightly different setting of the deathtoincidence deconvolution problem. The technique is illustrated for the 1918 influenza epidemic in Philadelphia and New York State (2, 7, 8).
Second, we use the reconstructed incidence curves to perform inference on daily reproductive numbers in that epidemic. One level of difficulty arises from the fact that because of the rapid progression of the epidemic, there is saturation of susceptibles and a possible change in behavior during the course of infection for persons infected on a given day (9). To deal with this issue, we have rederived the Wallinga–Teunis estimator by using an approach similar to the one in ref. 10. The derivation uses daily infectivity ratios, a concept which essentially appeared before in refs. 10 and 11.
Third, we examine the spread of pandemic influenza in the city of Philadelphia around the end of September and the beginning of October, 1918. On September 28, a 200,000person Liberty Loan Drive took place on the streets of Philadelphia against the advice of medical professionals (12). Within 72 hours, “every single bed in the city's 31 hospitals was filled” (ref. 13, p. 220). By October 1, residents were often encouraged to stay home to stem spread of the disease (14). On the evening of October 3, the closure of schools, churches, and places of public amusement was adopted by the Philadelphia city council (ref. 15, p. 74). The deconvolved incidence curve, which peaks around October 1–2, shows a drastic change in the growth patterns/infectivity ratios between September 26 and October 3. In particular, the infectivity ratios, representing the average number of infections by an infector on a given day, dropped by more than half. We estimate, assuming that casefatality ratio is at least 2% (16, 17), that during this period depletion of susceptibles was at most 16% (here and elsewhere, depletion is characterized in terms of the number of susceptibles on September 26, which equals Philadelphia's population minus the number of infected by September 25). This difference suggests that depletion of susceptibles played only a modest role in slowing the growth of the epidemic during that period, which took place before the closures went into effect. Our main conclusion, which is not merely of historical interest, is that public awareness and changes in behavior are likely to have had a major role in the slowdown of the epidemic before the official control measures were in place.
In the SI Appendix we use simulations and sensitivity analyses to test the robustness of our methodology/conclusions. We examine how well the deconvolved incidence curve represents the original one, and what is the best way to estimate the growth rate and the initial reproductive number of the original incidence curve. We also study the epidemic progression in Philadelphia between September 26 and October 3, considering various “incidence” curves deconvolved by using a collection of randomly generated timetodeath distributions and also allowing for time dependency of the casefatality ratios due to potential changes in demographics of the infected. Our general conclusion remains valid in all cases: Within that period, a drastic change in the epidemic's dynamics took place, and depletion of susceptibles was probably insufficient to explain that development.
Results
Reconstructing the Daily Incidence Curves for the 1918 Influenza Epidemic.
We apply the procedure from the Methods section to the 1918 influenza and pneumonia death curves in Philadelphia and New York State (excluding the city) to obtain the daily incidence in those places. Fig. 1 depicts the reconstructed incidence curves, which are scaled by a factor of p, where p is the case fatality ratio. Notably, the peaks of the incidence curves are shifted backward by approximately nine days, but the incidence curves have narrower peaks with steeper upward and downward trajectories as one would expect when removing the smear induced by the timetodeath distribution, which is analogous to removing the blur due to camera motion from a photographic image.
Estimates of the Daily Reproductive Numbers
The effective reproductive number R _{t} on day t measures the mean number of infections caused by persons who become infected on day t. Fig. 2 plots the estimated reproductive numbers for certain periods of the epidemic's progression, in both Philadelphia and New York State. The reproductive numbers fluctuate initially because of the small values for the deconvolved incidence. The reproductive numbers descend in the later part of the exponential growth period, representing future infections which happen under saturation of susceptibles or behavior change.
With the fluctuation of the reproductive numbers, one may wonder what is the best way to estimate R _{0}, the mean number of new infections caused by an infected individual during the early, exponential growth stage of the actual epidemic curve in Philadelphia. We address this via simulations in the SI Appendix. Our estimate is R _{0} = 2.14, with a standard error of 0.13.
Epidemic Peak in Philadelphia.
In this section, we examine the peak of the Philadelphia epidemic in more detail. We first compute the daily infectivity ratios, representing the number of infections on a given day caused by a weighted average of previously infected individuals (Eq. 4). Fig. 3 plots the estimated infectivity ratios for the period of September 14–October 19.
The infectivity ratio IR_{t} on a given day t presents a basic assessment of the epidemic's state on day t; in a mass action model, the infectivity ratio is proportional to the number of susceptibles left on that day. The reproductive number R _{t} on day t measures the number of infections caused during an infectious period of an average person who was infected on day t; it may be harder to relate R _{t} to the epidemic's state on day t because during that infectious period further depletion of susceptibles may occur, and conditions related to the epidemic's progression, such as an introduction of control measures and an increase in public awareness, may take place.
The estimated infectivity ratios dropped by more than half between September 26 and October 3. At the same time, depletion of susceptibles was much smaller. Philadelphia had a population of 1.7 million, with another 300,000 added by the war industry (15). As explained in Methods, the estimated incidence curve plotted in Figure 1 is scaled by a factor of p, where p is the case fatality ratio. Various estimates of p, according to location, age, gender and race, are given in refs. 16 and 17. There is considerable geographic variation in casefatality ratios, and for the surveyed communities in the Northeastern United States (from Baltimore, MD to New London, CT) casefatality ratios were above 2%, topping 3% in some places. Assuming P ≥ 0.02 for Philadelphia and by using the deconvolved incidence curve, we estimate that at most 16% of the susceptible population was depleted between September 26 and October 3, which cannot account for the over 50% drop in the infectivity ratios during that period; the latter finding was subjected to extensive sensitivity analysis with regard to the timetodeath distribution and other parameters (see the SI Appendix for more details). At the same time, we have evidence from refs. 12 and 14 about public awareness of the epidemic prior to October 3. We conclude that public awareness and change in behavior likely played a major role in the slowdown of the epidemic in Philadelphia before the official control measures were implemented on October 4.
Discussion
We have described and demonstrated the usefulness of a RLtype deconvolution approach to reconstruct incidence curves from epidemic curves showing the times of death in an epidemic. Applied to the 1918 influenza data, this approach reconstructs an epidemic curve that is more sharply peaked than the death curve—as one would expect given the smear introduced by the large variance in the timetodeath distribution. Moreover, this reconstructed curve allows the direct estimation of a daily value for the reproductive number R _{t}, bypassing the morecumbersome methods used previously, which involved extensive simulations and allowed the estimation of only a single value for R (4), or a parametric estimation of R _{t} (18). The estimate of R _{t} is given by the Wallinga–Teunis formula (9). Our derivation of this estimate relies on the fact that the number of infected individuals is large hence the notion of an averageinfectiousness profile makes sense; this is different from the original setting of an emerging epidemic in ref. 9, where it is assumed that all infected individuals had the same infectiousness profile. Finally, the deconvolved incidence curve was used to show that public awareness and change in behavior is likely to have played a major role in the slowdown of the epidemic even in Philadelphia, a city whose response to the 1918 influenza epidemic is considered to have been among the worst in the U.S. (13, 19). A related point regarding the belatedness of official control measures in Philadelphia was made in ref. 18.
The latter qualitative conclusion was subjected to extensive sensitivity analysis. The population of Philadelphia was not homogeneous, with casefatality ratios, timetodeath distributions, infectivity and susceptibility to infection varying by age and even gender. Although we cannot recover the complexity of the population network in Philadelphia, we have tested our observation that the drop in infectivity ratios during the week between September 26 and October 3 greatly exceeded the depletion of susceptibles under various scenarios; we have examined the sensitivity of that conclusion with respect to the timetodeath distribution, and allowed for certain forms of time dependency for the casefatality ratios, reflecting upon potential changes in the demographics of the infected as the epidemic progressed. The main conclusion persisted with remarkable stability in the simulations we have performed. Furthermore, visual inspection of the growth rate of the death curve (Fig. 4) shows a major change in its slope by October 5–6, which was likely predated by major changes in incidence patterns. Combining this with contemporary journalistic evidence (14) suggests that our hypothesis about the change in behavior before the official control measures were implemented is likely to be real.
We have not considered asymptomatic infections in our analysis. Data from the 1957 and the 1968 influenza epidemics yielded estimates of 20–42% for the rate of asymptomatic infections (20, 21). A review of several volunteer challenge studies (22) gives an estimate of 33% for the rate of asymptomatic infections. There is no data for the rate of asymptomatic infections for the 1918 influenza pandemic. Given its virulence, that rate could have been lower than the rates above. In our simulations, the drop in infectivity ratios between September 26 and October 3 surpasses depletion of susceptibles by a factor of at least 2.5. Thus, even if a third of all infections were asymptomatic, depletion of susceptibles during that period cannot explain the drop in infectivity ratios.
Our approach in principle gives a general method for reconstructing an incidence curve from the death curve. In practice, to have reliable estimates one needs the standard errors of the counts to be small as compared with the means—say, if the (Poisson) counts reach into the hundreds. We recommend performing simulations following the protocol in the SI Appendix to see how well the deconvolution process works.
We note that estimated infectivity ratios in Philadelphia (Fig. 3) first drop on September 27, one day before the notorious Liberty Loan parade, and seven days before the implementation of official control measures. Although reductions in transmission in the absence of official control measures have been inferred from the dynamics of influenza in U.S. cities by a previous study (18), this finding for Philadelphia is surprising, given that the parade is often interpreted as a sign that Philadelphia residents had not yet recognized the seriousness of the pandemic, and, moreover, that the parade is often seen as an opportunity for largescale transmission. If we have accurately estimated the timing of the decline in infectivity ratios, then the parade was probably not in fact a major venue for transmission, perhaps because it took place in the open air, and we must infer that behavior changes made a contribution to the slowing of the epidemic prior to official control measures. We have considered the possibility that the timing is estimated incorrectly. For example, it is possible that the timetodeath distribution we have employed is too long, resulting in artificially early changes in the incidence curve to reflect changes in the death curve. We believe that a shorter timetodeath distribution is unlikely because we are unaware of any reliable data suggesting that the timetodeath distribution is substantially shorter than the one we have adopted from ref. 2; in fact, the distribution we used is shorter than the one obtained from military data and used previously in refs. 4 and 18 (see SI Appendix). A likely source of error in estimating the exact timing of declines in infectivity ratios might be the nature of the deconvolution process, as seen in simulations in the SI Appendix. Briefly, deconvolved incidence curves have an exponential growth period ending about two days before the original one.
One may wonder if it is possible to use the death curve directly to infer epidemiological parameters of interest, such as R _{0}. One can try the exponential growth rate approach (see the SI Appendix). In theory, given a long exponential growth period for incidence (long in comparison to the timetodeath distribution), the death curve would have the same exponential growth rate. However for the Philadelphia data, deconvolution shows that the death curve growth rate is somewhat smaller than the incidence curve growth rate, and the estimate of R _{0} resulting from the death curve growth rate (≃ 2 − 2.02) is biased downward. Alternatively, one can consider the Wallinga and Teunis approach for estimating the reproductive numbers by using the “deathtodeath” distribution. The latter can be understood as follows: For people who died on day t, one can look at the times of death of the individuals they've infected. However this distribution depends on the state of the epidemic around time t, and thus changes with t. Moreover, assuming that the serial interval distribution is independent for different pairs of cases, the deathtodeath distribution will not be independent for successive pairs of cases (e.g., A who infects B who infects C), violating the assumptions of the approach. At the same time, we found that having information about the incidence curve can be used for a number of purposes besides estimating R _{0}. Our estimates of the daily reproductive numbers and assessment of the rapid decline of infectivity ratios around the peak of the epidemic in Philadelphia may not be accessible directly from the death curve.
It is also worthwhile to compare the approach to estimating R _{t} here to an approach which assumes that future opportunities for infection for each individual, regardless of when they are infected, will follow a fixed distribution going forward in time. One might expect that the relation between the reproductive numbers, the serial interval (w _{i}) (see Methods), and new infections is that for an average person infected on day t, the expected number E _{t,i} of people s/he will infect on day t + i is
The relation above is known to hold during periods when there is little to no depletion of susceptibles and behavior change (23)— during that period, R _{t} is in fact constant. However this may be untenable for the whole duration of an epidemic, where the number of susceptibles may decline over the course of a single individual's infectious period (24); similarly, measures to control the epidemic and behavior change may take place during that period. To put the issue a bit more precisely, consider the ratio
Methods
Richardson–LucyType Deconvolution.
The deconvolution problem is to assess the daily incidence curve (I _{t}) given the daily death curve (D _{1},…,D _{N}) and time frominfectiontodeath distribution (d _{1},…,d _{l}). Here, D _{j} is the number of deaths recorded on day j; and d _{k} represents the probability that an infected person who will eventually die, will die on day k after his/her infection. The daily death curves for Philadelphia and New York State (excluding city) are recorded in refs. 7 and 8. The time frominfectiontodeath distribution is obtained as a convolution of two distributions: time from symptom onset to death, taken from ref. 2, based on 599 hospital autopsy reports and the influenza latent period distribution, taken from ref. 25. It is plotted in Fig. 5.
To assess incidence I _{t} on day t, let p be the probability that an infected individual will die from influenza and pneumonia; in the SI Appendix, we allow for time dependence of p due to potential shifts in demographics of the infected. The number of people who got infected on day t and later died is binomial Bin(I _{t},p); because I _{t} is large and p is small, it is well approximated by a Poisson variable Pois(λ_{t}) with (an unknown) mean λ_{t} = p · I _{t}. Those Pois(λ_{t}) deaths are binned over the subsequent days according to the timetodeath distribution.
We wish to estimate the daily incidence curve during some time period (t _{1},…,t _{2}), which overlaps with the period (1,…,N) for which the daily counts for the number of deaths are available (N = 92 for Philadelphia and N = 75 for New York State). Because over 95% of deaths happen within three weeks of infection, we want to estimate incidence from day t _{1} = −20 (three weeks prior to the available death data); we also take t _{2} = N − 2 as d _{1} = 0 for our timetodeath distribution. Thus, the number of unknown parameters is N + 19, exceeding the number N of observations.
To assess the vector of unknown parameters (λ_{t1},…,λ_{t2}) (thus estimating the incidence curve up to a multiple), we iterate in the space of parameters by using the expectation maximization algorithm (26); this procedure is called the RL iteration. The initial guess λ^{0} = (λ_{t1} ^{0},…,λ_{t2} ^{0}) for the parameters is the death curve, shifted back by nine days, as the timetodeath distribution (d _{1},…,d _{31}) peaks on day nine after infection (see Fig. 4). RL iterations produce a sequence λ^{n} = (λ_{t1} ^{n},…,λ_{t2} ^{n}) of the Poisson parameters. Explicitly, let q _{j} = ∑_{−j+1≤i≤N−j} d _{i} be the probability that a death resulting from incidence on day j will be observed during the interval 1,…,N; here −20 ≤ j ≤ N − 2 and d _{i} = 0 for i ≤ 0 or i ≥ 32. Let D _{i} ^{n} = ∑_{j<i} d _{i−j}λ_{j} ^{n} be the expected number of deaths to occur on day i, conditional on the parameters λ^{n}. Then,
The probability of observations D _{1},…,D _{N}, conditional on the Poisson parameters λ^{n} and the timetodeath distribution, increases with each iteration. While iterating, we do not seek convergence to a maximum likelihood solution; rather, we use a criterion to end iterations and produce our estimate of the Poisson parameters. To understand this, for the n th iteration, let E _{i} ^{n} = ∑_{j<i} λ_{j} ^{n} d _{i−j} be the expected number of deaths on day i. If λ^{n} were the true parameters, D _{i} would be Poisson distributed with mean E _{i} ^{n}; thus, the expectation
Bootsrapping simulations in the SI Appendix address the question of how closely a deconvolved curve resembles the original incidence curve.
Estimating Reproductive Numbers and Infectivity Ratios.
In this section, we show how to estimate the daily reproductive numbers for the 1918 influenza from incidence curves and the infectiousness profile distribution. We remark that the words “infectiousness profile” and “serial interval” are often used interchangeably, though this is not quite accurate (see ref. 27, where “infectiousness profile distribution” is called “infectious contact distribution”). We prefer the term “infectiousness profile distribution” because it reflects upon average individual infectivity, which depends only on the time since one became infected, unlike the proportion of infectious contacts on each day after infection or the time between one's infection and the infection of one's infector.
We set a cutoff of 10 days for the infectivity process. The infectiousness profile distribution (w _{1},…,w _{10}), essentially taken from ref. 25, is plotted in the SI Appendix. Here each number w _{i} represents the proportion of the cumulative infectiousness which falls between days (i − 0.5,i + 0.5) for an average person. Note that, conveniently, the latent period in ref. 25 has an offset of 0.5 days.
By using the infectiousness profile distribution and the incidence curve, we can describe some key epidemiological parameters of interest. The infectivity ratio IR_{t} on day t measures the number of people infected by an “average” infector on day t: This concept was already used in ref. 10, where it was denoted by ϕ_{1}(t), and in ref. 11, where it was denoted by R _{t}. If the population of susceptibles and their behavior does not change much over a reasonable interval (covering the time of one individual's infectiousness history), the infectivity ratio would equal the reproductive number. This is not the case for the epidemics in question. However, because the number of infected is large, we can compute the reproductive number R _{t} on day t in a “forwardlooking” way, by adding up the numbers of infections caused in subsequent days by an average person who got infected on day t:
This is the Wallinga–Teunis formula, and its derivation essentially follows ref. 11. Note that this derivation relies on the fact that the number of infected is large, hence the notion of an average infectiousness profile makes sense; this is different from the original setting of an emerging epidemic in ref. 9, where it is assumed that all the infected have the same infectiousness profile. The two derivations help to clarify the notion that the reproductive number R _{t} is forwardlooking, taking account of all the events that may happen after time t that will affect how many secondary cases an individual infected at t will infect. By contrast, the infectivity ratio looks backward from the day on which infection occurs. In the case of exponential growth (where opportunities for transmission remain constant over time), the two measures are equivalent.
Acknowledgments
M.L. and E.G. were funded by the U.S. National Institutes of Health Models of Infectious Disease Agent Study Cooperative Agreements 5U01GM076497 and 1U54GM088588 to M.L. D.J.D.E., J.M., and J.D. were supported by the Canadian Institutes of Health Research and the Natural Sciences and Engineering Research Council of Canada. D.E. and J.B.P. were supported by the J. S. McDonnell Foundation. J.B.P. was supported by the Burroughs Wellcome Fund, the David and Lucille Packard Foundation, and National Institutes of Health Grant 3U54AI05716806S1. The authors would like to thank the Banff International Research Station for hosting focused research group meetings at which this project was begun.
Footnotes
 ^{1}To whom correspondence should be addressed. Email: egoldste{at}hsph.harvard.edu

Author contributions: E.G., J.D., J.M., J.B.P., D.J.D.E., and M.L. designed research; E.G., J.D., J.M., J.B.P., D.J.D.E., and M.L. performed research; E.G., J.D., J.M., J.B.P., D.J.D.E., and M.L. analyzed data; and E.G. and M.L. wrote the paper.

The authors declare no conflict of interest.

This article is a PNAS Direct Submission.

This article contains supporting information online at www.pnas.org/cgi/content/full/0902958106/DCSupplemental.

Freely available online through the PNAS open access option.
References
 ↵
 Brookmeyer R,
 Gail M
 ↵
 ↵
 ↵
 ↵
 ↵
 ↵
 Rogers SL
 ↵
 Eichel OR
 ↵
 Wallinga J,
 Teunis P
 ↵
 Wallinga J,
 Lipsitch M
 ↵
 ↵
 Anonymous (Sept 27, 1918)
 ↵
 Barry JM
 ↵
 Anonymous (Oct 2, 1918)
 ↵
 Crosby AW
 ↵
 Frost WH
 ↵
 ↵
 Bootsma M,
 Ferguson N
 ↵
 ↵
 ↵
 Davis LE,
 Caldwell GG,
 Lynch RE,
 Bailey RE,
 Chin TD
 ↵
 Carrat F,
 et al.
 ↵
 ↵
 Svensson A
 ↵
 ↵
 Dempster AP,
 Laird NM,
 Rubin DB
 ↵

 Lipsitch M,
 et al.

 Shepp LA,
 Vardi Y
Citation Manager Formats
Sign up for Article Alerts
Article Classifications
 Biological Sciences
 Medical Sciences