Previous Article |
Table of Contents
| Next Article
POPULATION BIOLOGY
Using conservation of pattern to estimate spatial parameters from a single snapshot



*Mathematics Institute and Department of Biological Sciences, University of Warwick, Gibbet Hill Road, Coventry CV4 7AL, United Kingdom;
Statistical Laboratory, Centre for Mathematical Sciences, University of Cambridge, Wilberforce Road, Cambridge CB3 0WB, United Kingdom; and
Department of Plant Sciences, University of Cambridge, Downing Street, Cambridge CB2 3EA, United Kingdom
Edited by Burton H. Singer, Princeton University, Princeton, NJ, and approved April 29, 2004 (received for review January 15, 2004)
| Abstract |
|---|
|
|
|---|
epidemic | spatial spread | parameter estimation | foot-and-mouth disease | citrus tristeza virus
Traditionally, spatial parameter estimation has been achieved by three main methods. The first method is through observation or experiment, in which the spatial kernel (which measures how the action or interaction of organisms changes with distance) is measured directly (19, 20). Although such practical methods are intuitively appealing, they are generally difficult to implement, labor intensive, and time consuming. A second method is to study the expansion around a single focus and, hence, impute the rate and form of spread (5); however, this technique can only be applied to a limited number of situations. Finally, in recent years, Markov Chain Monte Carlo (MCMC) analysis has been used to estimate parameters from spatial data (21-23). This technique analyzes the set of plausible transition pathways between two spatial configurations; hence, it requires two snapshots (or restrictive assumptions about initial conditions) and is frequently computationally intensive.
Here we outline an alternative approach that only requires a single snapshot and relies on conservation of the basic spatial patterns through time. The intuitive concept underlying this approach is to find a set of parameters that minimize the expected rates of change to the spatial pattern, which assumes that the spatial snapshot provides a balanced representation of the full dynamics, that the spatial domain is homogeneous, the spatial spread is not limited by boundaries, and the parameters and mechanisms have remained constant during the formation of the spatial pattern. These limitations are explored more fully in Discussion. The technique is tested by using simple lattice-based data generated by computer simulation with known parameters and for epidemiological field data for the spread of citrus tristeza virus in a plantation and for the recent foot-and-mouth epidemic in the U.K.
| Estimation Approach |
|---|
|
|
|---|
As a representation of spatial pattern we measure the spatial pairs, PXY(d), equal to the number of sites of types X and Y that are a distance d apart (where d is rounded to the nearest integer for convenience). Intuitively, this measure is of the spatial correlation; such pair-wise measures are commonly used in ecology to provide a simple estimate of the degree of spatial structure. Because the lattice is dynamic, the spatial pairs are continually changing as events, such as birth, death, or infection, happen. For every point (i) in the lattice and every possible event (Ei) that can occur there, the event will cause a change to a wide range of spatial pairs [
Ei PXY(d)]. For example, a single infection event will cause the loss of pairs that contain a susceptible individual at a range of spatial separations and the gain of pairs that contain an infected site.
The rate of change of these spatial pairs and, hence, the change in the spatial pattern can be calculated by incorporating the rate at which each event is likely to occur. Thus, more frequent events have more of an impact on the expected rates of change:
![]() | 1 |
If our assumption of statistical equilibrium is correct, then the rates of change of all pairs at all distances should be very small for the true spatial parameters (see supporting information, which is published on the PNAS web site). We therefore seek the set of parameters that minimizes these rates of change, thereby maintaining the spatial pattern. In particular, we wish to minimize the total deviation from zero,
:
![]() | 2 |
where
XY is a normalizing constant such that all distances and pair types contribute equally (see supporting information). In general, this constant is given by
![]() | 3 |
where N(d) is the total number of pairs at a given distance on an infinite lattice, X and Y refer to the density of sites of type X and Y, and
is the standard deviation in the rate of change of X and provides a measure of the fluctuations expected (see supporting information). For epidemiological problems, for which the dynamics are a simple progression from susceptible through infected to recovered, the normalization constant can be approximated as
![]() | 4 |
The N(d) term assures that local correlations play an equal role to more distant correlations, although there are more combinations of pairs that are separated by longer distances. Moreover, very long distances, such as those between opposite corners of the lattice, play a very minor role. The second term in the denominator takes into account the fact that more common pair types will necessarily be associated with the highest rates of change.
The minimization of
can be calculated with considerable efficiency. The quantity
Ei PXY(d) is parameter-independent; therefore, the time-consuming step of calculating these values only needs to be performed once. After the initial calculation of these conserved quantities has been performed, calculating the deviation (
) for each new set of parameters is relatively quick and scales linearly with both the number of possible events and the number of distance intervals considered. The following times correspond to calculations made with a standard 2.4-GHz personal computer: For a small dataset with a 36 x 28 lattice (see Fig. 3),
700 deviations can be calculated per minute; even for very large datasets with a 200 x 200 lattice (see Fig. 2) and
140,000 farms (see Fig. 4), the calculation only drops to
16 deviations per minute.
|
|
|
![]() | 5 |
|
The most likely candidate sites (Fig. 1A, darkest gray) agree with our intuitive expectations, preserving both the general aggregation and local density dependence. Although simple and rapid to implement, this predictive ability has strong practical benefits. In terms of disease or pest management, those regions identified as the most likely sites for imminent colonization should be the principal objectives of further investigation or control. This use of model prediction for targeting control efforts could significantly increase their efficiency compared with simple nontargeted approaches (18).
Estimation During Invasion. Although the statistical equilibrium assumption is likely to be true for many ecological systems and endemic diseases, those of greatest importance are often in flux. In particular, it is often vital to estimate the spatial spread of an invading organism or disease. In such situations, although the proportion of invaders, I, may be rapidly increasing, the conditional pairs (defined as PIX(d)/I; the expected number of X sites a distance d away from each invader I) remain constant throughout much of the invasion (25-27). This phenomenon is illustrated in the supporting information, which shows that from the early stages of invasion, the local pattern around each invader shows little variation. Intuitively, this lack of variation occurs because each invader rapidly sets the correlations within its own local environment, and so the shape of the invading wave-front generally changes slowly as it advances.
In this invasive situation, we therefore seek to minimize the deviation,
, associated with expected rates of change in such conditional measures:
![]() | 6 |
Intuitively, minimizing the derivation conserves the nature of the invading wave front and the pattern that has developed behind it.
| Estimation from Disease Data |
|---|
|
|
|---|
The spatial disease model takes the form of a cellular automata; a type of model that has been extensively used in the study of spatial dynamics (1, 4, 24). We begin with a square lattice of sites, where each site represents a single individual. To mimic the dynamics of disease, each site can be in one of three states: susceptible, infectious, or recovered. Susceptible individuals who are a distance d from an infectious site catch the disease at rate K(d) and become infectious themselves. K is the transmission kernel and measures how the risk of infection changes with distance; in our example we assume this kernel has a Gaussian form, K(d) = Pexp(-d2/2V). When there are multiple infectious sites, the transmission rates are simply added together. Infectious individuals are assumed to recover at rate 1; this sets the basic time scale of the simulation, so that all rates are effectively measured per infectious period. Recovered individuals become susceptible at rate
= 0.05, which can either be considered as waning immunity or demographic replacement by newly susceptible individuals. This model leads to a susceptible-infectious-recovered-susceptible framework (28) in which transmission is predominantly local but replenishment of susceptible individuals is at random.
We now utilize this lattice-based model of disease spread to examine how well the estimated parameters match the known values used in the simulation. The initial invasion occurs as a spreading wave (Fig. 2A), which clearly is not at statistical equilibrium, although the values of the conditional pairs quickly asymptote (see supporting information). It is assumed that the infectious period is known, which sets the basic time scale of the interactions. In addition, by fixing the recovery period to the value used in the simulation (
= 0.05), the deviation in conditional pairs,
, associated with different Gaussian transmission kernels can be calculated (Fig. 2B). The minimum deviation occurs at P
0.1686 and V
4.9989, which is in remarkable agreement with the actual parameter values (P = 0.16 and V = 5; Fig. 2B, yellow dot). The diagonal line of minimal values corresponds to kernels with the same basic reproductive ratio, R0; thus, small errors are unlikely to change our estimate of this most important epidemiological quantity. The optimal Gaussian kernels (which minimize the deviation) for this and two other snapshots are shown in Fig. 2 C-E. These best-fit results (red) are in close agreement with the actual kernels used in the simulations (black). For these three simulation snapshots, estimates of the full set of parameters, including the recovery period, are shown in Table 1; additional results are given in supporting information.
|
, against the average dispersal distance; the power-law distribution has the clearest and lowest minimum, suggesting that this kernel is the best fit to the data.
The optimal power-law scaling of
-1.35 is in close agreement with the value previously identified by MCMC analysis with two snapshots from consecutive years (22); however, our technique can elucidate transmission mechanisms in the first season without the need for a second sample. The optimal power-law kernel identified (Fig. 3C) shows a significant amount of transmission over longer distances. The "fat-tailed" nature of this distribution has strong implications for the long-term growth of this disease and its spread to other orchards (1, 5, 7). However, without detailed data about longer-range behavior, the precise nature of the tail is largely based on extrapolation from the more local dynamics. The other best-fit kernels show little variation with distance and, hence, would predict almost simple mass-action transmission. The results from all three kernels imply that a significant amount of long-distance spread occurs, which suggests that control measures should also be spread over a wide area and not simply focused on nearby trees.
Foot-and-Mouth Disease. Finally, we consider the data available during the early stages of the 2001 foot-and-mouth epidemic in the U.K., which illustrates the power of the methodology. The foot-and-mouth epidemic had a devastating impact on the farming industry and on the U.K. economy in general. From the early stages, spatial modeling played an important role in understanding the observed dynamics and assessing the likely impact of control measures (30, 31). However, in the early stages of the epidemic, the degree of spatial spread and even the reproductive ratio of the disease was unclear (11). By using conservation of the conditional pairs, we examine snapshots of the reported case data from March 2001. Each snapshot consists of the farms reporting infection in the previous week, which introduces two temporal delays: The first is the 9-day delay from infection to reporting; the second is the week's worth of reported data that is used in each snapshot. The exact spatial location of the farm is used in this estimation, but the conditional pairs information was aggregated into 1-km intervals.
Fig. 4 B and C shows the position of reported farms within the U.K. and the number reported on each day; these results are color-coded by day of reporting. During early March the disease is found in several small, widely disseminated clumps, whereas by late March the disease is clearly aggregated in three key areas (Cumbria, Devon, and the Welsh borders). The estimation procedure reflects this trend (Fig. 4A). In early March, the low local density of reported farms leads to a low estimation of the reproductive ratio, R, and a fairly compact transmission kernel. In part, these results are because the observed pattern is caused by two separate processes, with infections arising before the movement ban on February 23 being much more widely disseminated.
By mid-March the spatial pattern is predominantly defined by the post movement ban transmission kernel, and hence reported farms are highly aggregated in concentrations of high density. The estimated parameters from this time period are in good agreement with the reproductive ratio and spatial spread observed during the epidemic. Fig. 4D shows the power-law transmission kernels estimated from reported cases for the week ending on March 10, 20, and 30. These kernels are compared to the transmission kernel used in more sophisticated spatio-temporal models (17), for which the shape of the kernel is derived from tracing the most likely routes of infection. Clearly, the estimation technique is correctly identifying both the magnitude and the spatial scale of the infection process. Thus, although the detailed temporal resolution of the 2001 foot-and-mouth disease data allows parameter estimation by means of more standard techniques, our methodology produces comparable results and is far more computationally efficient (taking only 2-4 seconds to calculate the deviation for each parameter set). In addition, this technique can estimate parameters from any point in the epidemic without having to know the history, which could be a significant advantage if the epidemic has been spreading unnoticed for some time.
| Discussion |
|---|
|
|
|---|
Traditional methods of estimating spatial parameters have usually required direct observation of the processes in question, which is generally labor intensive. Recently, MCMC analysis has proven to be an increasingly popular tool for estimating parameters within the biological sciences, and this technique can also be successfully adapted to use spatial data. When appropriate multiple spatial data are available, then MCMC analysis is frequently the method of choice. However, given the scarcity and cost of obtaining multiple spatial samples and the computational overheads associated with MCMC, the efficient estimation technique given here has many practical advantages. Our estimation technique can also be used with multiple snapshots by simply minimizing the deviation
across all snapshots.
Parameterization from three distinct spatial snapshots (the disease model, citrus tristeza virus in an orchard, and foot-and-mouth disease within the U.K.) has shown the power and robustness of this technique. The results from all three snapshots are in remarkable agreement with the known values or the parameters already derived by more complex and intensive estimation methods. The data from the two real epidemics demonstrate the practical application of this technique, enabling us to predict the degree of spatial spread without the need for detailed temporal data. The citrus tristeza virus dynamics show a high proportion of long-range infection events, with a transmission kernel that decays slowly with distance; thus, such treatments should be applied throughout the orchard and even further a field. In contrast, estimates from the foot-and-mouth data predict very localized transmission kernel; therefore, control should be targeted toward farms in the local vicinity. The foot-and-mouth data also highlight two difficulties with our estimation technique; we have implicitly assumed that the pattern is caused by a consistent transmission process and that the infection is free to spread without bounds. However, the implementation of a movement ban on February 23 (11, 17) that shortened the transmission kernel and the finite shape of the U.K., which constrains the early long-range spread of infection, limits the accuracy of the estimation during the initial phase of the epidemic.
Throughout this study, it has been assumed that the snapshot provides a comprehensive picture of the spatial dynamics. However, this assumption can be violated in several ways described below.
Multiple Seeds. If there is no temporal variability, then invasions are generally rare, so experiencing two independent invasions is very unlikely and invasions at a single point are the norm. When there are two (or more) independent seeds, our estimation method can experience difficulties because of correlations between the invading wave fronts; although, as shown in the supporting information, problems only arise when the wave fronts get particularly close.
Boundary Effects. In some applied situations, the spread of a disease or an invading organism is prevented by natural boundaries. For example, the spread of West Nile Virus in the U.S. from its initial outbreak in New York State was predominately westerly (12) because the Atlantic formed a barrier to the east. Similarly, rivers can act as partial boundaries to the spread of rabies (32) that will again disrupt the spatial pattern. In such cases the inclusions of points close to the boundary may bias the estimation process (see supporting information).
Spatial Heterogeneities. Strong spatial heterogeneities in the parameters pose a significant problem for any estimation technique. For our methodology to determine the spatial heterogeneity in parameters, correlations would have to be specified in terms of distance, direction, and location. This increase in dimension makes the methodology unworkable. However, as shown in the supporting information, the aggregate results from the standard approach are still appropriate even with moderate amounts of heterogeneity.
We conclude that the method will fail when, for example, data are included close to a natural boundary, but overall our analyses show that, although the basic assumptions are not always valid, the estimation technique appears to be robust to moderate spatial heterogeneities and multiple seeds before coalescence of invading wave fronts.
Throughout this work we have focused exclusively on estimating parameters for the spread of diseases by using presence-absence data. Although this application is clearly one of the most important of this technique, there are a variety of other uses, from the invasion of unwanted species to understanding the spatial interactions of competing organisms. The study of disease dynamics at this scale is simplified as the dynamics at each location undergo clear successional changes, and there is only the one spatial kernel that defines the transmission of infection. When the dynamics at each location are more complex, such as in multispecies ecological models or when prevalence data are used, then the normalization constant
will need to be redefined (see supporting information), and more than one defining time scale (compared with the infectious period) may be required. For ecological situations, such as the spread of plants or animals, at least three spatial kernels may be needed: defining effects of competition on the death rate, competition on fecundity, and the dispersal to new locations. When multiple interacting species are involved, the number of kernels increases quadratically, causing additional difficulties because of the number of parameters that require estimation. This generic feature of such ecological systems is not specific to the technique developed here; however, it does illustrate the difficulties that can be encountered when the rapid estimation of parameters is essential.
When dealing with an invading organism or disease, delays can often prove costly. Because of the rapid exponential growth of such invaders, control measures should be implemented as soon as possible to maximize their effect. The recent outbreaks of foot-and-mouth in the U.K. (17) and severe acute respiratory syndrome worldwide (13) have reemphasized this epidemiological rule. However, control measures (such as culling farms at risk of foot-and-mouth, mass quarantine against severe acute respiratory syndrome, or vaccination of individuals against small-pox) often have high associated costs. Therefore, although it is imperative that control measures are strong enough to eradicate infection, they should not be so intensive that their effects are detrimental. Hence, the ability to ascertain the level of spatial spread and therefore optimize control measures from the earliest set of data has substantial benefits.
| Acknowledgements |
|---|
| Footnotes |
|---|
Abbreviation: MCMC, Markov Chain Monte Carlo.
To whom correspondence should be addressed. E-mail: m.j.keeling{at}warwick.ac.uk.
© 2004 by The National Academy of Sciences of the USA
| References |
|---|
|
|
|---|
This article has been cited by other articles in HighWire Press-hosted journals:
![]() |
A. R. Cook, W. Otten, G. Marion, G. J. Gibson, and C. A. Gilligan Estimation of multiple transmission rates for epidemics in heterogeneous populations PNAS, December 18, 2007; 104(51): 20392 - 20397. [Abstract] [Full Text] [PDF] |
||||
| |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||