# Inferring influenza dynamics and control in households

Edited by David Cox, Nuffield College, Oxford, United Kingdom, and approved June 2, 2015 (received for review December 7, 2014)

## Significance

Traditionally, the efficacy of household-based nonpharmaceutical interventions against influenza is measured by the household secondary attack rate based on observed individual epidemiological data. We present an alternative measure of intervention efficacy that accounts for unobserved transmission using Bayesian techniques. We apply our methods to data from a study of household interventions (face masks and improved hand hygiene) in Hong Kong. This paper describes advances in our understanding of an important disease system and also our statistical ability to synthesize modern data streams.

## Abstract

Household-based interventions are the mainstay of public health policy against epidemic respiratory pathogens when vaccination is not available. Although the efficacy of these interventions has traditionally been measured by their ability to reduce the proportion of household contacts who exhibit symptoms [household secondary attack rate (hSAR)], this metric is difficult to interpret and makes only partial use of data collected by modern field studies. Here, we use Bayesian transmission model inference to analyze jointly both symptom reporting and viral shedding data from a three-armed study of influenza interventions. The reduction in hazard of infection in the increased hand hygiene intervention arm was 37.0% [8.3%, 57.8%], whereas the equivalent reduction in the other intervention arm was 27.2% [−0.46%, 52.3%] (increased hand hygiene and face masks). By imputing the presence and timing of unobserved infection, we estimated that only 61.7% [43.1%, 76.9%] of infections met the case criteria and were thus detected by the study design. An assessment of interventions using inferred infections produced more intuitively consistent attack rates when households were stratified by the speed of intervention, compared with the crude hSAR. Compared with adults, children were 2.29 [1.66, 3.23] times as infectious and 3.36 [2.31, 4.82] times as susceptible. The mean generation time was 3.39 d [3.06, 3.70]. Laboratory confirmation of infections by RT-PCR was only able to detect 79.6% [76.5%, 83.0%] of symptomatic infections, even at the peak of shedding. Our results highlight the potential use of robust inference with well-designed mechanistic transmission models to improve the design of intervention studies.

- influenza dynamics
- nonpharmaceutical interventions
- household transmission
- Bayesian inference
- Markov chain Monte Carlo

The household offers an ideal setting to study the transmission dynamics of viral respiratory pathogens (1–5) and, during periods of severe epidemics, to intervene and reduce the number of infections (6). Therefore, it is also the ideal setting in which to conduct trials of interventions designed to reduce infectivity and susceptibility. The known-index trial design has been used to measure the efficacy of different types of intervention in recent years, including nonpharmaceutical interventions (7–9), antivirals (10), and vaccines (11–13). In these studies, symptomatic individuals are recruited at a health care facility and asked if they (and potentially other members of their household) may want to participate in the trial. If the index agrees, biological samples are taken at that time in the clinic. Follow-ups normally occur in the household, with the first visit as soon after the recruitment of the index as possible. If other members of the household agree to participate, samples are taken at regular intervals after that first follow-up from the index and additional participating household members. Biological samples used in these studies include nasal or throat swabs, nasopharyngeal aspirates, and blood samples. Many different assays can be conducted on the samples (depending to some extent on the sample handling protocol), for example, rapid tests (14), RT-PCR (7, 15), and B-cell assays (16). Participants may also be asked to record symptoms in a diary or to report them over the phone.

The primary outcome measure for these trials is the household secondary attack rate (hSAR) (sometimes called secondary infection risk). The hSAR is most commonly defined as the proportion of nonindex household members who become cases, according to prespecified criteria, during the period of the study. Cases are usually defined in terms of either symptoms or virological outcome (e.g., PCR-confirmed infection), or sometimes both (7). Although significant reductions in hSAR between study arms are indicative of an effect, the amplitude of differences in hSAR can be difficult to interpret, partly because the statistic itself is dependent on the assays used and on the precise follow-up protocol. For example, criteria based on symptoms may fail to capture asymptomatic infections, and RT-PCR tests are sensitive to the frequency and timing of sampling. Also, the observed value of the hSAR in any specific household must be sensitive to the number of household members who participate, the precise timing of follow-up samples, and the pattern of any dropout.
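As a concrete illustration of the definition above, the crude hSAR can be computed by pooling nonindex members across households. The following is a minimal sketch, not part of the study analysis; the function name `hsar` and the example data are hypothetical.

```python
def hsar(households):
    """Crude household secondary attack rate.

    `households` is a list of households; each household is a list of
    booleans, one per participating nonindex member, True if that member
    met the case criteria during follow-up.
    """
    contacts = sum(len(h) for h in households)
    cases = sum(sum(h) for h in households)
    if contacts == 0:
        raise ValueError("no nonindex contacts")
    return cases / contacts


# Hypothetical data: three households with 3, 2, and 4 nonindex members.
example = [[True, False, False], [False, False], [True, True, False, False]]
print(f"hSAR = {hsar(example):.3f}")
```

Because the statistic depends only on who met the case criteria during follow-up, any change in the criteria, the participating members, or the sampling schedule changes the value without any change in the underlying transmission.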

Previous studies have analyzed the transmission dynamics of influenza in households by using household models and symptomatic data (3, 17), and also symptomatic data in conjunction with RT-PCR laboratory results (18). We defined a stochastic household transmission model, building on these works, that described the effect of interventions in reducing the daily hazard of infection, and estimated parameters of the model using Markov chain Monte Carlo (McMC) techniques (see *Materials and Methods* and *SI Text*).

## Results

### Study Data.

We analyze a superset of data from 385 households participating in a previously described known-index intervention study of influenza (7, 14). Three hundred and twenty-two of these were included in the primary analysis and randomly assigned to one of three groups: control (112 households), intervention with improved hand hygiene (HH, 106 households), or intervention with improved hand hygiene plus face masks (HH+FM, 104 households) (7). Each index case was initially screened by rapid test and confirmed using PCR. Nonindex household members were defined to be cases if they were PCR-confirmed on or after the second visit or if they showed at least two of the following signs and symptoms: temperature ≥37.8 °C, cough, headache, sore throat, or myalgia (7, 19). Using traditional hSAR as the outcome for the primary analysis, a significant difference was observed between the control and HH arms, but only for a subset of households in which the index case attended clinic rapidly after the onset of symptoms (Fig. 1; see also Fig. S1) (7). Data from the 63 additional households included here could not be included in the primary analyses because either the index was not confirmed as infected by PCR (*n* = 16, despite being positive on rapid test) or because a household contact had RT-PCR-confirmed infection at the initial household visit (*n* = 47, defined to be coprimary) (7).

### Parameter Estimation and Validation.

We defined a process model for the transmission of influenza in a study household, with parameters that would allow us to make inference on the efficacy of interventions and underlying dynamics (see *Materials and Methods*). We sampled from the joint posterior distribution of number of infections, times of infection, and process parameters of interest (Table 1), conditional on the full set of study results and uninformative prior distributions. The posterior distributions of model parameters are shown in Fig. S2. With the modal posterior values for the model parameters, we simulated study outcomes using the precise study protocol for each household and obtained distributions of hSAR consistent with the observed data (Tables S1 and S2). Also, for validation, we used multiple sets of these pseudodata to successfully reestimate the modal process parameters (Table S3).

### Household Transmission Dynamics.

We estimated that, in this study, children were substantially more susceptible and infectious than were adults (Table 1). The infectiousness parameters for children and adults in the model are defined relative to the household size and are therefore somewhat difficult to assess directly. However, their ratio is easier to interpret, with children 2.29 [1.66, 3.23] times as infectious as adults and 3.36 [2.31, 4.82] times as susceptible. These results are broadly consistent with prior studies based only on symptomatic outcomes (2, 3, 20, 21).

The basic functional form of the infectiousness over time was assumed to be the log-normal density function truncated at day 10 (see *Materials and Methods*). Fig. *2A* shows how the inferred amplitude of infectiousness varies over the time since infection for households of size 4. It was necessary to consider a specific household size because we assumed that pair-wise infectiousness between individuals could vary as a function of household size, i.e., as the size of the household increased, the probability of infection between each possible susceptible–infectious pair was not constant. These infectiousness profiles contrast somewhat with previous results (17) based only on symptoms. Although ref. 17 and our results both suggest that infectiousness is highest near the day of symptoms, our estimated profiles exhibit a fatter tail than that in ref. 17.

To give a more intuitive description of the infectivity profiles, we also calculated the pairwise transmission probabilities (see *SI Text*, *Pairwise transmission probability*) of children and adults over the full period of their infectiousness, in the absence of interventions. Fig. 2 *B* and *C* shows the absolute and relative comparison between pairwise transmission probabilities of children and adults in different household sizes.

The generation time is defined as the expected delay between the infection of an infector and the infection of all their infectees across all infection types (22). Leveraging our ability to infer infection events, we were also able to estimate the approximate generation time commonly reported from household studies, i.e., the time between the infection of the index case and the infection of the secondary cases. We estimated this to be 3.39 d [3.06, 3.70] (Fig. S3), which was consistent with estimates in the literature for this and other strains of influenza A (23–25). This estimate was also somewhat sensitive to the ratio of the sensitivity of RT-PCR testing between asymptomatic infections and symptomatic infections (see *Sensitivity Analysis*).

### Intervention Efficacy.

Intervention efficacy was modeled as the per day reduction of infectiousness and could take a different value in each of the three study arms. Using all available data, the efficacy in the HH group was estimated to be significantly different from 0 at 37.0% [8.3%, 57.8%] and, for HH+FM, was 27.2% [−0.46%, 52.3%] (Table 1). Although the reduction in infectivity was not significant for individual days, the cumulative effect reflected in the overall reduction in pairwise transmission probability was significant (Fig. 3).

By inferring the presence or absence of infections during the period of the study for all members of participating households (see *Materials and Methods*), we were able to compare the directly observed hSAR with an inferred hSAR. We estimated that only 61.7% [43.1%, 76.9%] of infections that occurred in households during the period of the study met the case criteria (see *Study Data*). This percentage was also somewhat sensitive to the ratio of the sensitivity of RT-PCR testing between asymptomatic infections and symptomatic infections (see *Sensitivity Analysis*). The underestimation was driven by variable timing of follow-up and also by variable sensitivity of the RT-PCR test, depending on the number of days since infection (26–28).

When the data were stratified by the delay between symptom onset in the index case and the interventions (i.e., the speed of intervention), the inferred hSARs described a more coherent story than the crude hSARs, with, in particular, higher numbers of infections being inferred in the control arm than would have been expected from the observed hSARs for delays of 2 d or greater (Fig. 4). Although the structural assumptions implicit in our model (see *Discussion* and *Materials and Methods*) must have constrained the inferred infection events to some degree, it is encouraging that the two interventions had similar efficacy within each delay stratum and that the pattern of increasing efficacy with decreasing delay was consistent. We also note that intervention efficacy was not constrained to be positive (see *SI Text*), so the model had the flexibility to explore parameter regimes where interventions increased the risk of transmission. The differences between inferred and observed hSAR were likely driven by stochastic variation in the timing and frequency of follow-up between households.

### RT-PCR Test Sensitivity.

RT-PCR is the gold standard laboratory method for confirming viral respiratory infections among symptomatic individuals. We estimated the peak level of RT-PCR test sensitivity in this field study to be 79.6% [76.5%, 83.0%] for symptomatic infections. This estimate is fundamentally different from previous estimates because it incorporates uncertainty about the true state of the individual, as well as the performance of the sampling protocol and assay themselves. Although the relative sensitivity of asymptomatic infections was not identifiable in this analysis and was assumed to be half that of symptomatic infections (29), key model parameters appeared to be robust toward these assumptions (see *Sensitivity Analysis*).

### Sensitivity Analysis.

We tested a number of our baseline assumptions and found few material differences in our results. In our baseline analyses, we assumed a Gamma distribution for the incubation period with mean 2 d and SD 0.2 d (30). Two alternative incubation periods were considered in the sensitivity analysis: a Weibull distribution with mean 1.48 d and SD 0.47 d (17) and a lognormal distribution with median 1.4 d and dispersion factor 1.51 d (31). These alternative assumptions for the incubation period produced results that were not materially different from the baseline results.
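The incubation period assumptions above are quoted as summary statistics; the sketch below converts two of them into standard distribution parameters by moment matching. It is illustrative only: the function names are hypothetical, and it assumes that the dispersion factor of the lognormal is the exponential of the log-scale SD, which is a common convention but is not stated in the text.

```python
import math


def gamma_shape_scale(mean, sd):
    """Shape and scale of a Gamma distribution with the given mean and SD."""
    return (mean / sd) ** 2, sd ** 2 / mean


def lognormal_mu_sigma(median, dispersion):
    """Log-scale mean and SD, ASSUMING dispersion factor = exp(sigma)."""
    return math.log(median), math.log(dispersion)


# Baseline assumption quoted in the text: mean 2 d, SD 0.2 d.
shape, scale = gamma_shape_scale(2.0, 0.2)

# Alternative assumption quoted in the text: median 1.4 d, dispersion 1.51.
mu, sigma = lognormal_mu_sigma(1.4, 1.51)
lognormal_mean = math.exp(mu + 0.5 * sigma ** 2)

print(f"Gamma: shape = {shape:.1f}, scale = {scale:.3f} d")
print(f"Lognormal: mu = {mu:.3f}, sigma = {sigma:.3f}, mean = {lognormal_mean:.2f} d")
```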

We also tested the robustness of assumptions about the ratio of the sensitivity of RT-PCR testing between asymptomatic infections and symptomatic infections, and about the background rate of noninfection symptoms onset. The latter, ϕ (see *Materials and Methods*), was assumed to be 0.001 in the baseline analysis (32, 33) and was varied in these sensitivity analyses. We tried a range of alternative values and found that higher values were associated with higher rates of asymptomatic infection. However, the values of other model parameters were not materially affected (see Figs. S4–S6).

The distribution of the generation time (last panels in Figs. S4–S6) and the distribution of the percentage of infections meeting case criteria (Table S4) were also somewhat sensitive to the assumed value of the ratio of the sensitivity of RT-PCR testing between asymptomatic infections and symptomatic infections.

### Optimization of Study Design.

We used our estimated process model parameters to examine some specific issues around trial design. We designed a simulation experiment to explore how the timing of measurements (i.e., the timing of RT-PCR tests) may affect the outcome of the traditional analysis. Specifically, we asked the following question: If it were only possible to make a single visit to each household to evaluate an intervention (perhaps due to logistic constraints or cost-effectiveness considerations), on which day should the household be visited so that the difference in the number of infections between the intervention arms and the control arm is most faithfully reflected by the traditional analysis? Based on the transmission-dynamic parameters estimated here, we simulated observations from our estimated model and assumed that only one home visit was carried out. To eliminate the effect of the timing of the interventions, we only considered households with an initial home visit on the same day as the symptoms onset of the index case. We found that a single sample at day 5 would have the highest expected difference in observed PCR-confirmed hSAR (Fig. 5). A single sample at day 4 would have a very similar expected difference between interventions and control but greater variance, suggesting that the most “detectable” infections may be circulating in households on day 4 or day 5.

## Discussion

We incorporated both viral shedding data and symptomatic data into a transmission model that allowed the estimation of the efficacy of interventions and key epidemiological parameters. Our analysis refines the primary study analyses in estimating a significant effect in one intervention arm using data from all households, rather than data only from a (prespecified) subset, highlighting opportunities to improve on traditional measures such as the observed hSAR (7). We showed that intervention efficacy can be more accurately captured using a disease-dynamic model coupled with rigorous statistical inference. By inferring the number and timing of infections, the underlying transmission dynamics could also be described and the impact of variable timing of interventions in households assessed. Subject to structural biases, we argue that this approach extracts substantially more information from known-index transmission studies than does traditional analysis reliant on the hSAR. More generally, our results highlight the potential use of robust inference with well-designed mechanistic transmission models to improve the design of intervention studies.

Compared with other nonpharmaceutical interventions, such as quarantine and social distancing, the use of face masks and improved hand hygiene is simple and imposes less burden on those infected and their contacts. Our results have helped to reinforce earlier findings of substantial efficacy. Also, we suggest that defining the estimated efficacy as a per-day reduction in transmission gives a more interpretable measure and could be a useful quantity to communicate as part of an overall health protection message.

Our study has a number of limitations associated with the structural assumptions implicit in our model. First of all, we only estimated the average efficacy of improved hand hygiene and face masks by aggregating other heterogeneities such as adherence and age distribution among the households. Nonetheless, adherence to hand hygiene intervention (i.e., the main contributing intervention in our study) was similar to that reported in previous community studies (34–36). Hence, although the estimated intervention efficacy would be expected to vary with adherence, the similarity of the adherence between our study and that in other community studies supports some generalization of our findings and the practicality of our conclusion.

We were not able to simultaneously estimate the reduction of absolute susceptibility and the reduction of infectivity due to interventions, as they were not separately identifiable. Instead, to avoid this issue of identifiability, our model was parameterized by the relative susceptibility of children, and we estimated the reduction of infectivity. Children and adults were assumed to be subject to a common community infection rate, which may only represent a relatively crude average measure. However, as the study design aimed to recruit households that had infections mostly initiated by the index cases, the data may not be able to inform the community infection at a finer resolution. Nevertheless, the estimated common community infection rate appeared to be insignificant and hence was believed not to have a significant impact (see Fig. S3).

A parallel analysis of these data using a different approach found evidence suggesting that aerosol (small droplets) transmission might be responsible for approximately half of all influenza transmission in households (37). Our results are consistent with this finding, because only a small to medium effect of face masks and hand hygiene would be expected given that they are thought to reduce transmission via large droplets and contact. Further extensions of our modeling framework could be considered in the future to account for different modes of transmission.

Although there is some evidence that children might have a higher level and a longer duration of shedding (23, 38) compared with adults, we assumed the same underlying function. Also, we only estimated the sensitivity of RT-PCR as a function of time since infection, and it was assumed to peak between 2 d and 5 d after infection (i.e., around the mean time of symptoms onset) (39). A more direct factor affecting the test sensitivity may be the viral shedding (26–28). Future work in linking viral shedding explicitly with the test sensitivity could further refine the approach we have used here.

Our framework can be used to answer specific questions related to trial design as well as to conduct secondary analysis of existing data. Here, we illustrated this by estimating the best possible day for a single follow-up visit in an interventions trial. However, a more systematic trial design study under well-defined resource constraints may well produce far more efficient protocols, perhaps varying by household size and age distribution.

## Materials and Methods

### Details of Data Collection.

In 2008, from 2 January through 30 September, 407 index patients with influenza-like illness with symptom onset in the previous 48 h, and who were positive for influenza A or B virus by QuickVue Influenza A

### Transmission Model.

We developed a stochastic model to jointly capture the study design, the transmission process, and the efficacy of the interventions. Household members were classified by their ages (i.e., children and adults) and were otherwise identical.

There were four major components to be explicitly modeled: (unobserved) infection times, symptom onset times, RT-PCR test results, and the intervention efficacy. Here, we describe the assumed processes related to each of these components. Each infected member was assumed to exhibit time-varying infectivity (i.e., hazard of infection) since infection, which was parameterized by its median *a* and mode *b*. Specifically, we used a log-normal density function truncated at day 10 to represent this infectivity profile; we denoted the effective median (i.e., half-life) of the infectivity profile after truncation as $\tilde{a}$. The hazard of infection exerted on member *j* by an infected member *k*, $\lambda_{kj}(t)$, was determined by coupling the infectivity profile with the household size and the age and time since infection of member *k*. The total hazard of infection exerted on member *j* at time *t* in the household was then taken to be the aggregated hazard of infection from the infected members at time *t* in the household, i.e.,

$$\lambda_j(t) = \sum_{k \neq j} \lambda_{kj}(t),$$

where the sum runs over the members *k* who are infected at time *t*.

We allowed a constant community infection hazard ρ common to children and adults on top of the within-household hazard. Interventions were assumed to reduce the magnitude of the infectivity profile by a constant proportion equal to the intervention efficacy, from the day the intervention was applied (see *SI Text*).

A nonindex case had a probability *p* of being an asymptomatic infection. Given the infection time, the onset time of a symptomatic case was determined by an assumed incubation period parameterized by α and γ. Also, for the robustness of the model, we had ϕ as a nuisance parameter representing the background constant rate of noninfection symptoms onset (32, 33). The RT-PCR testing results were assumed to depend on times of measurements, a (peak) test sensitivity ψ, and a test specificity *Q*. The test sensitivity was assumed to be a (step) function of the time since infection and to peak around the mean of the incubation period (also see *SI Text* for details). The test sensitivity of asymptomatic infections was assumed to be half that of symptomatic infections (see also different assumed values used in *Sensitivity Analysis*). Lastly, households were assumed to be independent given the sparse recruitment.

Events above were also described mathematically in *SI Text*.

### Bayesian Inference and McMC.

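Let θ denote the vector of model parameters described in *Transmission Model*. Estimation for the model parameters was performed in the Bayesian framework (i.e., we estimated the parameters from their posterior distributions). Denoting the observed data by *y* and the unobserved data (the presence and timing of infections) by *z*, we sampled from the joint posterior distribution $\pi(\theta, z \mid y) \propto L(\theta; y, z)\,\pi(\theta)$, where *L* is the likelihood and π(θ) is the prior. Uninformative uniform priors on intervals (*c*, *d*) were used, where *c* and *d* represented conservative lower and upper bounds of each parameter. Further details are given in *SI Text*.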

### Hidden Infection Process.

Symptom onset and RT-PCR are not perfect indicators for an infection [e.g., the symptom onset might have a different etiology, and the sensitivity of RT-PCR is only high at the time of peak infectiousness (26–28)]. To handle this uncertainty and to capture more accurately the underlying transmission dynamics, we required an algorithm that allows proper probabilistic transitions of a household member between the status of infected and noninfected. Specifically, we applied a reversible jump algorithm (41, 42) in which deletion (i.e., transit from infected to noninfected) and addition (i.e., transit from noninfected to infected) of an infection is allowed. More details on the inference of times of infection and how the process parameters and infection times were updated in the same algorithm are given in *SI Text*.

## SI Text

### Definition of Events and Likelihood Function.

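Within each household, we denoted by *n* the total number of household members; by $t_j^{I}$ and $t_j^{S}$ the infection time and the symptoms onset time, respectively, of member *j* in the household; and by $R$ the $m \times n$ matrix of RT-PCR test results, where *m* is the total number of times the test was carried out and the results for member *j* (including the index case) form the *j*th column of $R$.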

Because we assumed households were independent, we present the likelihood for one household in this section. The joint likelihood for all households is straightforward after specifying the likelihood for a single household. Denoting by θ the vector of model parameters, the components described in *Transmission Model* in the main text entered the household likelihood as a product of four terms (Eq. **S1**), which corresponded to the likelihood contributions of infection events, symptoms onset times, RT-PCR test results, and the infection times of index cases, respectively. In the remainder of this section, detailed definitions of these events and their exact mathematical representations are given.

#### Infectiousness over time.

We assumed that the infectiousness of an infectee varied according to the time since infection. Any infected person in the household contributed a hazard of infection, determined by a function $f(\tau; \beta_i, n, \varepsilon)$, where τ, $\beta_i$, *n*, and ε were the time since infection, the infectiousness parameter (it corresponded to the children when *i* = 1 and to the adults when *i* = 2), the household size, and the coefficient that explained the dependency between the infectiousness and household size, respectively. For the shape of this function over τ, a log-normal density function parameterized by *a* and *b* was used, where *a* and *b* were the median and mode, respectively, of the lognormal density function. We truncated this function at day 10 since infection; given the 7-d follow-up in the study (i.e., starting from the first home visit for the household, a follow-up of normally 7 d was given, during which subsequent home visits were arranged and symptoms data were collected), we should not expect to estimate reliably the infectiousness at later days (e.g., 10 d) since infection. Note that *a* is merely a nominal half-life of the infectivity profile because we truncated the profile at day 10; we reported the (effective) half-life corresponding to the truncated profile.
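The shape of the truncated profile can be sketched as follows. For a log-normal density with median *a* and mode *b*, the log-scale parameters are μ = ln *a* and σ² = ln(*a*/*b*). The code below is a minimal illustration under these identities; the function names and the example values *a* = 3 and *b* = 2 are hypothetical, the household-size and age scaling is omitted, and the "effective half-life" is assumed here to mean the time by which half of the infectivity mass on (0, 10] has accrued.

```python
import math

T_MAX = 10.0  # truncation day stated in the text


def lognormal_params(a, b):
    """(mu, sigma) of a log-normal with median a and mode b (needs 0 < b < a)."""
    if not 0 < b < a:
        raise ValueError("need 0 < mode < median")
    return math.log(a), math.sqrt(math.log(a / b))


def density(t, a, b):
    """Log-normal density, set to zero outside (0, T_MAX]."""
    if t <= 0 or t > T_MAX:
        return 0.0
    mu, sigma = lognormal_params(a, b)
    z = (math.log(t) - mu) / sigma
    return math.exp(-0.5 * z * z) / (t * sigma * math.sqrt(2.0 * math.pi))


def cdf(t, a, b):
    """Untruncated log-normal cumulative distribution function."""
    if t <= 0:
        return 0.0
    mu, sigma = lognormal_params(a, b)
    return 0.5 * (1.0 + math.erf((math.log(t) - mu) / (sigma * math.sqrt(2.0))))


def effective_half_life(a, b):
    """Time by which half of the mass on (0, T_MAX] has accrued (bisection)."""
    target = 0.5 * cdf(T_MAX, a, b)
    lo, hi = 0.0, T_MAX
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if cdf(mid, a, b) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Hypothetical values, not estimates from the study.
print(f"nominal half-life a = 3.00 d, effective = {effective_half_life(3.0, 2.0):.2f} d")
```

Because truncation removes the right tail, the effective half-life is always shorter than the nominal median *a*.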

#### Relative susceptibility and intervention efficacy.

It was assumed that the infectiousness per day was reduced by a constant proportion equal to the intervention efficacy from the day the intervention was applied. It should be noted that because the variation of adherence was not modeled, the intervention efficacies in our model represented average daily efficacies aggregated over adherence.
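The effect of a constant per-day reduction, and of the day on which it starts, can be illustrated with a small sketch. The daily hazard values below are hypothetical; 0.37 is the point estimate for the HH arm quoted in the main text. The function names are ours, and the transmission probability uses the usual relation 1 − exp(−cumulative hazard), which is an assumption about the functional form rather than a quotation from the model definition.

```python
import math


def apply_efficacy(daily_hazard, efficacy, intervention_day):
    """Scale the hazard by (1 - efficacy) from `intervention_day` onward."""
    return [h * (1.0 - efficacy) if day >= intervention_day else h
            for day, h in enumerate(daily_hazard)]


def transmission_probability(daily_hazard):
    """Probability of transmission over the period: 1 - exp(-cumulative hazard)."""
    return 1.0 - math.exp(-sum(daily_hazard))


# Hypothetical pairwise hazard on days 0..6 since infection of the infector.
hazard = [0.02, 0.06, 0.08, 0.05, 0.03, 0.02, 0.01]

print(f"no intervention: {transmission_probability(hazard):.4f}")
for start in (0, 2, 4):
    reduced = apply_efficacy(hazard, 0.37, start)
    print(f"intervention from day {start}: {transmission_probability(reduced):.4f}")
```

In this sketch an earlier start removes more of the cumulative hazard, which is the mechanism behind the pattern of increasing efficacy with decreasing delay described in the main text.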

Denote by $\lambda_j(t)$ the total hazard of infection exerted on individual *j* at time *t* in the household after taking into account the interventions. It was taken to be the aggregated hazard of infection from other infected members at time *t* in the household, i.e.,

$$\lambda_j(t) = \sum_{k \neq j} \lambda_{kj}(t),$$

where $\lambda_{kj}(t)$ is the hazard exerted by infected individual *k* at time *t* on individual *j*, determined by coupling the infectivity profile with the age of individuals *k* and *j* (i.e., the heterogeneity of infectivity among different age groups), household size, and the time since infection of individual *k* (see also Eq. **S2** and *Materials and Methods*).

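As a result, the conditional probability that individual *j* (excluding the index case) was infected at day *T*, given the infection times of all other members, was determined by the total hazard exerted on *j* up to day *T*, scaled by the susceptibility of the age group of *j* (*i* = 1 for children; *i* = 2 for adults). We set the susceptibility of adults as the reference, so that only the relative susceptibility of children was estimated.

If individual *j* was a noninfectee, the corresponding contribution was the probability of escaping infection up to the end of the observation period (see *Missing Data*) for the household.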
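One common way to turn a daily hazard into the probabilities described above is a discrete-time survival calculation: the probability of infection on day *T* is the probability of escaping infection on all earlier days multiplied by the probability of infection on day *T*. The sketch below assumes this discrete-time form and a multiplicative susceptibility; the function name and hazard values are hypothetical, and this is not claimed to be the exact likelihood term used in the study.

```python
import math


def infection_day_pmf(total_daily_hazard, susceptibility=1.0):
    """P(infected on day T) for each day, with P(escape) as the last entry.

    `total_daily_hazard[t]` is the total hazard on the individual on day t
    (household plus community); `susceptibility` scales that hazard.
    """
    pmf, survive = [], 1.0
    for h in total_daily_hazard:
        p_day = 1.0 - math.exp(-susceptibility * h)
        pmf.append(survive * p_day)
        survive *= 1.0 - p_day
    return pmf + [survive]


# Hypothetical total hazards on three days of follow-up.
adult = infection_day_pmf([0.05, 0.10, 0.08])
# 3.36 is the relative susceptibility of children quoted in the main text.
child = infection_day_pmf([0.05, 0.10, 0.08], susceptibility=3.36)
print(f"P(adult escapes) = {adult[-1]:.3f}, P(child escapes) = {child[-1]:.3f}")
```

The last entry is the contribution of a noninfectee, and the entries sum to 1 by construction.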

In the majority of recruiting sites, the criterion for recruitment was a positive result for influenza A or B using the QuickVue rapid diagnostic test on a nose and throat swab. Therefore, it can be safely assumed that the index case was an infectee, and the distribution of infection time can be determined from the reported symptoms onset time together with an assumed distribution of the incubation period (i.e., the difference between time of infection and time of symptoms onset). See the last product in Eq. **S1**.

#### Pairwise transmission probability.

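The pairwise transmission probability *q* was obtained by integrating the infectiousness profile up to day *T*, from the expression

$$q = 1 - \exp\left(-\int_0^{T} h(\tau)\,d\tau\right),$$

where $h(\tau)$ is the hazard exerted by an infector on a given household contact at time τ since infection, in the absence of interventions.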

#### Symptoms onset.

We defined acute respiratory infection as follows: at least 2 symptoms out of fever (temperature ≥37.8 °C), cough, headache, sore throat, and myalgia (3, 4). Similar to the event of infection, we assumed that the hazard of having the symptoms onset of an infected person follows a function of time since infection, denoted as $g(\tau)$, where τ is the time since infection.

#### Relation between hazard and incubation period.

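The hazard of symptoms onset and the incubation period distribution are two descriptions of the same quantity. If the incubation period has density $f_{\mathrm{inc}}$ and cumulative distribution function $F_{\mathrm{inc}}$, the hazard of symptoms onset at time τ since infection is

$$g(\tau) = \frac{f_{\mathrm{inc}}(\tau)}{1 - F_{\mathrm{inc}}(\tau)}.$$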
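The relation between a hazard and an incubation period distribution can be illustrated in discrete time, where the hazard on day *d* is the probability of onset on day *d* given no onset before day *d*. The sketch below is generic; the function names and the example probabilities are hypothetical and are not the incubation period used in the study.

```python
def hazard_from_pmf(pmf):
    """Discrete hazard h[d] = P(onset on day d | no onset before day d)."""
    hazards, survival = [], 1.0
    for p in pmf:
        # Guard against division by a vanishing survival probability.
        hazards.append(p / survival if survival > 1e-12 else 1.0)
        survival -= p
    return hazards


def pmf_from_hazard(hazards):
    """Inverse conversion: P(onset on day d) from the discrete hazard."""
    pmf, survival = [], 1.0
    for h in hazards:
        pmf.append(survival * h)
        survival *= 1.0 - h
    return pmf


# Hypothetical daily onset probabilities for days 0..3 since infection.
onset_pmf = [0.1, 0.5, 0.3, 0.1]
onset_hazard = hazard_from_pmf(onset_pmf)
print([round(h, 3) for h in onset_hazard])
```

The two functions are inverses of each other, so either description can be used in a likelihood.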

#### Asymptomatic infection.

The conversion between the incubation period distribution and hazard function was only valid if the infected case was certain to have symptoms onset, but this was not always the case. Therefore, we introduced a parameter, *p*, to indicate the probability that a case was an asymptomatic infection.

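As a result, the probability for an infected individual *j* to have symptoms onset at a given day was weighted by the probability (1 − *p*) of being a symptomatic infection. If individual *j* had no reported symptoms onset, the contribution combined the probability *p* of an asymptomatic infection with the probability that a symptomatic infection had not yet reached symptoms onset by the end of follow-up.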

#### RT-PCR test results.

RT-PCR was used for confirmation of infection. The symptoms onset and infection times characterized the complete transmission process. The testing data were conditionally independent of the transmission process but provided additional information for the inference of the transmission process.

One of the key elements in correctly incorporating the test results was the adoption of appropriate values of the sensitivity and specificity for the RT-PCR test. The test was, in general, of high sensitivity and specificity (8, 9). However, the sensitivity of the RT-PCR test was correlated with the time since infection and the viral load (10–13). We assumed a constant specificity of 0.99 and estimated the sensitivity at disjoint time intervals since infection. Specifically, the sensitivity was a step function of the time since infection that peaked around the mean of the incubation period (see *Materials and Methods* in the main text).

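Denote by $r_{jk} = 1$ and $r_{jk} = 0$ the events of individual *j* having positive and negative test results, respectively, from the *k*th test at the testing day $d_{jk}$. For an infected individual, the probability of a positive result was the sensitivity evaluated at the time since infection on the testing day; for a noninfected individual, it was one minus the specificity.

Therefore, because the tests were conditionally independent given the infection status and infection time, the probability of observing the test results of individual *j* was the product of these probabilities over the tests carried out on *j*.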
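The test-result likelihood can be sketched as a product over tests. In the code below, the peak sensitivity 0.796, the peak window of 2 d to 5 d after infection, the specificity 0.99, and the asymptomatic ratio 0.5 are taken from the text; the off-peak sensitivity 0.4, the function names, and the treatment of tests taken before infection as tests on a noninfected person are assumptions made only for illustration.

```python
def sensitivity(days_since_infection, peak=0.796, off_peak=0.4,
                symptomatic=True, asym_ratio=0.5):
    """Step-function sensitivity; `off_peak` is a hypothetical value."""
    level = peak if 2 <= days_since_infection <= 5 else off_peak
    return level if symptomatic else level * asym_ratio


def pcr_likelihood(results, infection_day=None, symptomatic=True,
                   specificity=0.99):
    """Probability of the observed (test_day, positive) results.

    `infection_day=None` means the individual was never infected.
    """
    likelihood = 1.0
    for test_day, positive in results:
        if infection_day is None or test_day < infection_day:
            p_positive = 1.0 - specificity          # false positive
        else:
            p_positive = sensitivity(test_day - infection_day,
                                     symptomatic=symptomatic)
        likelihood *= p_positive if positive else 1.0 - p_positive
    return likelihood


# Hypothetical results: positive on day 3, negative on day 6.
results = [(3, True), (6, False)]
print(f"infected on day 0: {pcr_likelihood(results, infection_day=0):.4f}")
print(f"never infected:    {pcr_likelihood(results):.4f}")
```

Comparing the two printed values shows how a single positive result at the peak shifts the evidence strongly toward infection, while negative results outside the peak window carry much less information.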

### Missing Data.

In the symptoms diaries collected, we found incomplete records. Also, the collection of specimens for RT-PCR was not complete. The missing patterns in both data were not regular. For example, within a household, we might find that one of the household members had an entry missing only on one particular day during the 7-d follow-up period but other members had the complete record. This induced extra difficulty in setting up a fixed timeframe for the transmission process. Therefore, herein we used a minimally complete approach such that we truncated the data of the symptoms record up to the point where every member in the household had complete records. This was equivalent to shortening the study period of a particular household with incomplete records to avoid the missing data issue. About 12% of the households had the data truncated, and the mean of the number of days of data truncated was 2 d. Because the RT-PCR testing was a conditionally independent process from the transmission, we could easily discard the missing records. Also, the records of RT-PCR test results that were obtained beyond the fixed timeframe determined by the symptoms record were discarded to retain the consistency.
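The truncation rule described above can be sketched as follows. The data layout (a dictionary of daily diary entries with `None` marking a missing entry, and RT-PCR records as `(member, day, result)` tuples) and the function names are hypothetical.

```python
def complete_followup_days(diaries):
    """Number of leading days on which every member has a diary entry."""
    n_days = min(len(entries) for entries in diaries.values())
    for day in range(n_days):
        if any(entries[day] is None for entries in diaries.values()):
            return day
    return n_days


def truncate_household(diaries, pcr_records):
    """Shorten the study period to the fully recorded days.

    Diary entries after the cutoff are dropped; RT-PCR records are dropped
    if they are missing or fall beyond the cutoff.
    """
    cutoff = complete_followup_days(diaries)
    kept_diaries = {m: entries[:cutoff] for m, entries in diaries.items()}
    kept_pcr = [(m, day, res) for m, day, res in pcr_records
                if res is not None and day < cutoff]
    return kept_diaries, kept_pcr


# Hypothetical household: the contact has a missing diary entry on day 3.
diaries = {
    "index":   [1, 0, 0, 1, 0, 0, 0],
    "contact": [0, 0, 1, None, 0, 0, 0],
}
pcr_records = [("contact", 1, True), ("contact", 5, False), ("index", 2, None)]
kept_diaries, kept_pcr = truncate_household(diaries, pcr_records)
print(complete_followup_days(diaries), kept_pcr)
```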

### McMC Algorithm.

The objective of the McMC sampling was to simulate from the joint posterior distribution of model parameters and the unobserved data, which can be represented as
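In generic data-augmentation notation, a joint posterior of this kind follows from Bayes' theorem. The symbols below are placeholders chosen for illustration and are not necessarily those of the original equation: θ stands for the model parameters, z for the unobserved data (for example, infection times and infection statuses), y for the observed data (symptom onsets and RT-PCR results), L for the complete-data likelihood, and π(θ) for the prior.

```latex
\pi(\theta, z \mid y) \;\propto\; L(\theta;\, y, z)\,\pi(\theta)
```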

Inference of the model parameters was challenging mainly because the infection process was unobserved and because the specification of the likelihood function depended on correctly distinguishing infectees from noninfectees. However, symptom onset and RT-PCR were not perfect indicators of infection [e.g., the symptom onset might have a different etiology, and the sensitivity of RT-PCR is only high at the time of peak infectiousness (10–13)]. To handle this uncertainty and to capture the underlying transmission dynamics more accurately, we required an algorithm that allows proper probabilistic transitions of a household member between the infected and noninfected states. Specifically, we applied a reversible jump algorithm (14, 15) in which deletion (i.e., transition from infected to noninfected) and addition (i.e., transition from noninfected to infected) of an infection are allowed.
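A minimal sketch of an add/delete move of this type is given below. It is not the authors' implementation. The target density is a toy stand-in for the model posterior, and the choices of an equal probability for the two move types, a uniform choice of member, and a uniform proposal for the new infection time are all assumptions. The proposal ratio in the acceptance probability accounts for the change in the number of infection times.

```python
import math
import random


def log_target(infections, n_members, prior_inf=0.3, horizon=7.0):
    """Toy log-density (an assumption, standing in for the model posterior):
    each nonindex member is infected independently with probability
    prior_inf, and infection times are uniform on [0, horizon]."""
    k = len(infections)
    return (k * math.log(prior_inf)
            + (n_members - k) * math.log(1.0 - prior_inf)
            - k * math.log(horizon))


def reversible_jump_step(infections, n_members, rng, horizon=7.0):
    """One add/delete move; infections maps member id -> infection time."""
    infected = sorted(infections)
    uninfected = [m for m in range(n_members) if m not in infections]
    proposal = dict(infections)

    if rng.random() < 0.5:  # propose adding an infection
        if not uninfected:
            return infections  # nothing to add: the chain stays put
        member = rng.choice(uninfected)
        proposal[member] = rng.uniform(0.0, horizon)
        # log of q(delete | proposed state) / q(add | current state)
        log_q = math.log(len(uninfected) * horizon / (len(infected) + 1))
    else:  # propose deleting an infection
        if not infected:
            return infections  # nothing to delete: the chain stays put
        member = rng.choice(infected)
        del proposal[member]
        # log of q(add | proposed state) / q(delete | current state)
        log_q = math.log(len(infected) / ((len(uninfected) + 1) * horizon))

    log_accept = (log_target(proposal, n_members, horizon=horizon)
                  - log_target(infections, n_members, horizon=horizon)
                  + log_q)
    if rng.random() < math.exp(min(0.0, log_accept)):
        return proposal
    return infections
```

Under the toy target, the number of infected members should settle around a binomial distribution, which gives a simple check that the acceptance ratio is balanced. In a real analysis `log_target` would be replaced by the log-posterior of the transmission model.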

First, we performed a single-component Metropolis–Hastings algorithm (16) to sample the model parameters, including *a*, *b*, and *p*. Each iteration consisted of five moves: (*i*) updating the infection times of nonindex cases; (*ii*) updating the infection times of the index cases; (*iii*) updating *a* and *b*, the median and mode parameters of the infectiousness profile; (*iv*) updating *p* and the remaining model parameters; and (*v*) applying a reversible jump algorithm in which one infection is added to or deleted from the population.

Moves *i* and *ii* were performed in each household, and the samples were pooled. Setting aside the distinction between infected and noninfected members, a conventional algorithm would randomly choose an infectee and then sample an infection time uniformly from its support, i.e., the infection time would be drawn from

Move *ii* was carried out in a manner similar to move *i*. Because the index case was assumed to be infected, there was no distinction problem. At each iteration, a new infection time was proposed by drawing from *i*. Moves *iii* and *iv* followed a random walk Metropolis–Hastings algorithm with symmetric proposal densities.
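For a single scalar parameter, a random walk Metropolis–Hastings update with a symmetric proposal can be sketched as below. The Gaussian proposal, the step size, and the function names are illustrative assumptions; the text specifies only that the proposal densities were symmetric.

```python
import math
import random


def random_walk_mh(log_post, start, step, n_iter, rng):
    """Single-parameter random-walk Metropolis-Hastings.

    The Gaussian proposal is symmetric, so the proposal ratio cancels and
    only the posterior ratio enters the acceptance probability.
    log_post(start) must be finite.
    """
    current = start
    current_lp = log_post(current)
    samples = []
    for _ in range(n_iter):
        candidate = current + rng.gauss(0.0, step)
        candidate_lp = log_post(candidate)
        # Accept with probability min(1, posterior ratio).
        if rng.random() < math.exp(min(0.0, candidate_lp - current_lp)):
            current, current_lp = candidate, candidate_lp
        samples.append(current)
    return samples
```

A `log_post` that returns `-math.inf` outside the prior bounds causes such candidates to be rejected automatically, which is how bounded uniform priors are usually combined with this kind of update.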

We used noninformative uniform priors for individual parameters with reasonably specified lower and upper bounds. Specifically, we had *a* and *b*. We had *p* and ψ, as they were probabilities. Finally, we had
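Bounded uniform priors of this kind contribute a constant to the log-posterior inside the bounds and exclude any value outside them. The sketch below shows one way to encode that. The bounds of 0 and 1 follow from *p* and ψ being probabilities, whereas the bounds shown for *a* and *b* are hypothetical placeholders and not the values used in the study.

```python
import math

# (0, 1) follows from p and psi being probabilities; the bounds for a and b
# are hypothetical placeholders, not the values used in the study.
BOUNDS = {
    "a": (0.0, 10.0),
    "b": (0.0, 10.0),
    "p": (0.0, 1.0),
    "psi": (0.0, 1.0),
}


def log_uniform_prior(params, bounds=BOUNDS):
    """Sum of independent uniform log-densities; -inf outside any bound."""
    total = 0.0
    for name, value in params.items():
        lower, upper = bounds[name]
        if not lower <= value <= upper:
            return -math.inf
        total -= math.log(upper - lower)
    return total
```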

## Footnotes

- ¹To whom correspondence should be addressed. Email: msylau@princeton.edu.

Author contributions: M.S.Y.L., B.J.C., and S.R. designed research; M.S.Y.L. and S.R. performed research; M.S.Y.L. analyzed data; and M.S.Y.L., B.J.C., A.R.C., and S.R. wrote the paper.

The authors declare no conflict of interest.

This article is a PNAS Direct Submission.

This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1423339112/-/DCSupplemental.

Freely available online through the PNAS open access option.

## References

- Longini IM Jr, Koopman JS, Monto AS, Fox JP
- Longini IM Jr, Koopman JS, Haber M, Cotsonis GA
- Halloran ME, Hayden FG, Yang Y, Longini IM Jr, Monto AS
- Longini IM Jr, Halloran ME, Nizam A, Yang Y
- Lambert SB, et al.
- Wrammert J, et al.
- Papenburg J, et al.
- Monto AS, et al.
- Wallinga J, Lipsitch M
- Carrat F, et al.
- Lee N, et al.
- Lau LL, et al.
- Birrell PJ, et al.
- Sandora TJ, et al.
- Green PJ
- Gibson GJ, Renshaw E

## Article Classifications

- Biological Sciences
- Medical Sciences