Estimation of multiple transmission rates for epidemics in heterogeneous populations

Cook et al. 10.1073/pnas.0706461104.

Supporting Information

Files in this Data Supplement:

SI Appendix
SI Table 3




SI Appendix: Additional Details of Methods

Bayesian Model Fitting

Following Gibson et al. (1), we fit the model by deriving the form for the likelihood function of the parameter vector (denoted by q) conditioned on the sets of nonobserved infection times (

) and sources of infection (s). The posterior distribution for q given the observed replicate data (D) is
with the region of integration being limited to the set of (t, s) consistent with D. Following ref. 2, the f (t, s | q) term is defined by:

,

where ls is the rate of infection exerted on s by the infecting source, ss. This follows from the definition of a Markov process (2). A detailed derivation for a related model with homogeneous hosts can be found in Gibson et al. (1).

It is preferable to use functional forms for fs(t) that have analytical integrals so that the survival terms above can readily be evaluated. Examples of such forms include the constant, exponentially declining and Weibull rates used in this paper.

The prior, f(q), is a subjective description of our beliefs regarding the parameters (q). We take relatively broad, independent priors, namely, exponential with mean 10 for all parameters except those for tertiary infection, which are exponential with mean 0.1.

To evaluate the 10,000-dimensional posterior, we use Markov chain Monte Carlo techniques (see ref. 3 for a review) to draw a sample from the joint distribution of (q, t, s) that then provides an estimate of the posterior distribution of q from which summary statistics were made (4, 5). Summary statistics are tabulated in SI Table 3. Proposed changes to the parameters are made through a Gaussian proposal kernel and accepted according to the Metropolis-Hastings algorithm; the kernel was centered on the current value and had variance chosen arbitrarily based on trial runs of the routine, with a view to making the chain mix well. Proposals to infection times also were accepted by dint of Metropolis-Hastings, with a uniform proposal distribution on the permissible range of infection times consistent with the set of sources of infection and the observed data. Proposals to the source of infection were made by a Gibbs step. A random scan algorithm was used to select parameters and hosts to propose changes to at a ratio of 120 parameter proposals to 10,000 time and source proposals.

Convergence of the chain was assessed by inspection of parameter and log-posterior trace plots and by running a further chain with different initial conditions for (q, t, s). A total of 25,000 iterations of the routine were performed and stored, each comprising 120 parameter and 10,000 time and source proposals. A further 1,000 such iterations were discarded as burn-in. Every 1,000 iterations took around an hour to perform on a modern desktop computer.

We infer the distribution of relevant functions of the parameters, such as b[hi, hs](t), from the posterior sample. Posterior distributions of sources of infection were similarly obtained to find average proportions of infection by transmission pathway.

The sample also was used to generate a posterior predictive distribution of the stochastic process through Monte Carlo simulation to explore hypothetical scenarios such as the effects of varying heterogeneity. Additionally, the predictive distribution of the daily increment in infection conditional on the existing (observed) pattern of disease was used to assess the goodness-of-fit of the model.

Model Selection

The deviance information criterion (DIC) was used to select the best model from a set of competing models with epidemiologically plausible time-varying functional forms for primary and secondary infection; it was also used to test for differences in secondary infection rates within and between species. The definition of the dic used here is given by ref. 6 (dic8), and it is obtained by first sampling the posterior before using the parameter means to estimate the augmented variables and hence the effective number of parameters of the model. A rule of thumb is to give consideration to models with a dic within 2 of the model with the lowest dic score (7); models with a dic between 2 and 10 of the best have less support, and those with a difference greater than 10 of the best have essentially no support.

Further tests for differences in parameters were carried out by using Bayesian hypothesis probabilities, by estimating Pr(hypothesis is true | data). The continuous independent priors used mean that, automatically, the posterior probability of parameters being equal is zero. We therefore consider the posterior probability that one parameter is greater than another: should this probability be close to zero or one it indicates there be a difference; should it not be near these extrema, we have insufficient evidence to declare which is greater. Note the difference in interpretation with classical hypothesis testing. As with classical hypothesis testing, the decision of what constitutes a "significant" result is arbitrary.

Analysis of Environmental Heterogeneity

Using the posterior distributions for the parameters, we analyze the effects of heterogeneity in the population structure at two spatial scales on the epidemic dynamics with and without spread between species. Large-scale heterogeneity is investigated by changing the proportion of favorable and unfavorable sites in the population, with random allocation of individuals throughout the population. Small-scale heterogeneity is characterized by a clustering parameter, K, defined as the proportion of neighboring pairs of hosts of the same type, which is varied by increasing the size of regular homogeneous clusters, each randomly allocated to host type such that a 50%:50% mixture is maintained. In all cases, 10,000 simulations of each scenario and spatial arrangement are used to generate the posterior predictive distribution of final disease levels, at an arbitrarily defined time, t = 20, after which further spread of the epidemic was negligible due to the decline in the transmission rates.

Robustness of Method

We assessed the ability of the approach to detect temporal changes to transmission rates for less frequently sampled systems by reducing the temporal detail of the data. This was done by discarding observations, from thirteen at daily intervals to three at times 4, 8, and 12. The effect is that the bounds on the infection time of each plant are broadened; both model and method remain the same. Posteriors are obtained and the temporal evolution of the primary and secondary plotted in Fig. 2.

This choice of observation times was made arbitrarily. It has been shown elsewhere how a small number of observation times can be chosen optimally for nonspatial, homogeneous systems, resulting in a substantial reduction in sampling costs with little loss of observation relative to intensively sampled populations (8). There is considerable scope and potential benefits for further work in this field.

1. Gibson GJ, Otten W, Filipe JAN, Cook A, Marion G, Gilligan CA (2006) Stat Comput 16:391-402.

2. Cox DR, Isham V (1980) Point Processes (Chapman & Hall, London).

3. Gilks WR, Richardson S, Spiegelhalter DJ, eds (1996) Markov Chain Monte Carlo in Practice: Interdisciplinary Statistics (Chapman & Hall, London).

4. Gibson GJ, Renshaw E (2001) Stat Comput 11:347-358.

5. O'Neill PD, Roberts GO (1999) J R Stat Soc A 162:121-129.

6. Celeux G, Forbes F, Robert CP, Titterington DM (2006) Bayesian Anal 1:651-674.

7. Spiegelhalter DJ, Best NG, Carlin BR, van der Linde A (2002) J R Stat Soc B 64:583-616.

8. Cook AR, Gibson GJ, Gilligan CA (2007) Biometrics, 10.1111/j.1541-0420.2007.00931.x.





Table 3. Posterior means and 95% credible intervals for the model parameters, fitted to 19 replicates of populations comprising different ratios of hosts favorable (F) and unfavorable (U) for damping-off infection caused by the soil-borne fungal plant pathogen, R. solani

 

Parameter

Posterior

Mean

95% CI

Primary transmission

a[F]

0.16

(0.11, 0.17)

a[U]

0.26

(0.20, 0.32)

r[F]

0.06

(0.01, 0.12)

r[U]

0.16

(0.11, 0.22)

Secondary transmission

b[F, F]

1.4

(1.3, 1.4)

b[F, U]

0.5

(0.4, 0.6)

b[U, F]

1.2

(0.9, 1.6)

b[U, U]

0.4

(0.4, 0.6)

κ[F, F]

3.5

(3.3, 3.8)

κ[F, U]

3.8

(2.9, 4.6)

κ[U, F]

2.9

(2.1, 3.6)

κ[U, U]

2.4

(1.7, 2.9)

μ[F, F]

9.3

(9.1, 9.5)

μ[F, U]

9.5

(9.0, 10.0)

μ[U, F]

10.2

(9.0, 13.1)

μ[U, U]

9.4

(8.3, 12.1)

Tertiary transmission

ε[F]

0.012

(0.011, 0.014)

ε[U]

0.004

(0.003, 0.005)

The time dependency of the transmission rates is given by a two-parameter exponential for primary infection and a three-parameter Weibull function for secondary infection.

This Article

  1. PNAS December 18, 2007 vol. 104 no. 51 20392-20397
  1. AbstractFree
  2. Figures Only
  3. Full Text
  4. Full Text (PDF)
  5. » Supporting Information