Inference for nonlinear dynamical systems
See allHide authors and affiliations

Edited by Lawrence D. Brown, University of Pennsylvania, Philadelphia, PA, and approved September 21, 2006 (received for review April 19, 2006)
Abstract
Nonlinear stochastic dynamical systems are widely used to model systems across the sciences and engineering. Such models are natural to formulate and can be analyzed mathematically and numerically. However, difficulties associated with inference from timeseries data about unknown parameters in these models have been a constraint on their application. We present a new method that makes maximum likelihood estimation feasible for partiallyobserved nonlinear stochastic dynamical systems (also known as statespace models) where this was not previously the case. The method is based on a sequence of filtering operations which are shown to converge to a maximum likelihood parameter estimate. We make use of recent advances in nonlinear filtering in the implementation of the algorithm. We apply the method to the study of cholera in Bangladesh. We construct confidence intervals, perform residual analysis, and apply other diagnostics. Our analysis, based upon a model capturing the intrinsic nonlinear dynamics of the system, reveals some effects overlooked by previous studies.
State space models have applications in many areas, including signal processing (1), economics (2), cell biology (3), meteorology (4), ecology (5), neuroscience (6), and various others (7–9). Formally, a state space model is a partially observed Markov process. Realworld phenomena are often well modeled as Markov processes, constructed according to physical, chemical, or economic principles, about which one can make only noisy or incomplete observations.
It has been noted repeatedly (1, 10) that estimating parameters for state space models is simplest if the parameters are timevarying random variables that can be included in the state space. Estimation of parameters then becomes a matter of reconstructing unobserved random variables, and inference may proceed by using standard techniques for filtering and smoothing. This approach is of limited value if the true parameters are thought not to vary with time, or to vary as a function of measured covariates rather than as random variables. A major motivation for this work has been the observation that the particle filter (9–13) is a conceptually simple, flexible, and effective filtering technique for which the only major drawback was the lack of a readily applicable technique for likelihood maximization in the case of timeconstant parameters. The contribution of this work is to show how timevarying parameter algorithms may be harnessed for use in inference in the fixedparameter case. The key result, Theorem 1, shows that an appropriate limit of timevarying parameter models can be used to locate a maximum of the fixedparameter likelihood. This result is then used as the basis for a procedure for finding maximum likelihood estimates for previously intractable models.
We use the method to further our understanding of the mechanisms of cholera transmission. Cholera is a disease endemic to India and Bangladesh that has recently become reestablished in Africa, south Asia, and South America (14). It is highly contagious, and the direct fecal–oral route of transmission is clearly important during epidemics. A slower transmission pathway, via an environmental reservoir of the pathogen, Vibrio cholerae, is also believed to be important, particularly in the initial phases of epidemics (15). The growth rate of V. cholerae depends strongly on water temperature and salinity, which can fluctuate markedly on both seasonal and interannual timescales (16, 17). Important climatic fluctuations, such as the El Niño Southern Oscillation (ENSO), affect temperature and salinity, and operate on a time scale comparable to that associated with loss of immunity (18, 19). Therefore, it is critical to disentangle the intrinsic dynamics associated with cholera transmission through the two main pathways and with loss of immunity, from the extrinsic forcing associated with climatic fluctuations (20).
We consider a model for cholera dynamics that is a continuoustime version of a discretetime model considered by Koelle and Pascual (20), who in turn followed a discretetime model for measles (21). Discretetime models have some features that are accidents of the discretization; working in continuous time avoids this, and also allows inclusion of covariates measured at disparate time intervals. Maximum likelihood inference has various convenient asymptotic properties: it is efficient, standard errors are available based on the Hessian matrix, and likelihood can be compared between different models. The transformationinvariance of maximum likelihood estimates allows modeling at a natural scale. Nonlikelihood approaches typically require a variancestabilizing transformation of the data, which may confuse scientific interpretation of results. Some previous likelihoodbased methods have been proposed (22–25). However, the fact that nonlikelihoodbased statistical criteria such as least square prediction error (26) or gradient matching (27) are commonly applied to ecological models of the sort considered here is evidence that likelihoodbased methods continue to be difficult to apply. Recent advances in nonlinear analysis have brought to the fore the need for improved statistical methods for dealing with continuoustime models with measurement error and covariates (28).
Maximum Likelihood via Iterated Filtering
A state space model consists of an unobserved Markov process, x_{t} , called the state process and an observation process y_{t}. Here, x_{t} takes values in the state space ℝ^{dx}, and y_{t} in the observation space ℝ^{dy}. The processes depend on an (unknown) vector of parameters, θ, in ℝ^{dθ}. Observations take place at discrete times, t = 1, …, T; we write the vector of concatenated observations as y_{1:T} = (y_{1}, …, y_{T}); y_{1:0} is defined to be the empty vector. A model is completely specified by the conditional transition density f(x_{t}x_{t−1}, θ), the conditional distribution of the observation process f(y_{t}y_{1:t−1}, x_{1:t}, θ) = f(y_{t}x_{t}, θ), and the initial density f(x_{0}θ). Throughout, we adopt the convention that f(··) is a generic density specified by its arguments, and we assume that all densities exist. The likelihood is given by the identity f(y_{1:T}θ) = ∏_{t=1} ^{T} f(y_{t}y_{1:t−1}, θ). The state process, x_{t} , may be defined in continuous or discrete time, but only its distribution at the discrete times t = 1, …, T directly affects the likelihood. The challenge is to find the maximum of the likelihood as a function of θ.
The basic idea of our method is to replace the original model with a closely related model, in which the timeconstant parameter θ is replaced by a timevarying process θ_{t}. The densities f(x_{t}x_{t−1}, θ), f(y_{t}x_{t}, θ), and f(x_{0}θ) of the timeconstant model are replaced by f(x_{t}x_{t−1}, θ_{t−1}), f(y_{t}x_{t}, θ_{t}), and f(x_{0}θ_{0}). The process θ_{t} is taken to be a random walk in ℝ^{dθ}. Our main algorithm (Procedure 1 below) and its justification (Theorem 1 below) depend only on the mean and variance of the random walk, which are defined to be
In practice, we use the normal distributions specified by Eq. 1 . Here, σ and c are scalar quantities, and the new model in Eq. 1 is identical to the fixedparameter model when σ = 0. The objective is to obtain an estimate of θ by taking the limit as σ → 0. Σ is typically a diagonal matrix giving the respective scales of each component of θ; more generally, it can be taken to be an arbitrary positive–definite symmetric matrix. Procedure 1 below is standard to implement, as the computationally challenging step 2(ii) requires using only well studied filtering techniques (1, 13) to calculate for t = 1, …, T. We call this procedure maximum likelihood via iterated filtering (MIF).
Procedure 1. (MIF)

Select starting values θ̂^{(1)}, a discount factor 0 < α < 1, an initial variance multiplier c^{2} , and the number of iterations N.

For n in 1, …, N

Set σ_{n} = α^{n−1}. For t = 1, …, T, evaluate θ̂_{t} ^{(n)} = θ̂_{t}(θ̂^{(n)}, σ_{n}) and V_{t,n} = V_{t}(θ̂^{(n)}, σ_{n}).

Set θ̂^{(n+1)} = θ̂^{(n)} + V_{1,n} Σ_{t=1} ^{T} V_{t,n} ^{−1}(θ̂_{t} ^{(n)} − θ̂_{t−1} ^{(n)}), where θ̂_{0} ^{(n)} = θ̂^{(n)}.


Take θ̂^{(N+1)} to be a maximum likelihood estimate of the parameter θ for the fixed parameter model.
The quantities θ̂_{t} ^{(n)} can be considered local estimates of θ, in the sense that they depend most heavily on the observations around time t. The updated estimate is a weighted average of the values θ̂_{t} ^{(n)} , as explained below and in Supporting Text, which is published as supporting information on the PNAS web site. A weighted average of local estimates is a heuristically reasonable estimate for the fixed “global” parameter θ. In addition, taking a weighted average and iterating to find a fixed point obviates the need for a separate optimization algorithm. Theorem 1 asserts that (under suitable conditions) the weights in Procedure 1 result in a maximum likelihood estimate in the limit as σ → 0. Taking a weighted average is not so desirable when the information about a parameter is concentrated in a few observations: this occurs for initial value parameters, and modifications to Procedure 1 are appropriate for these parameters (Supporting Text).
Procedure 1, with step 2(i) implemented using a sequential Monte Carlo method (see ref. 13 and Supporting Text), permits flexible modeling in a wide variety of situations. The methodology requires only that Monte Carlo samples can be drawn from f(x_{t} x_{t−1}), even if only at considerable computational expense, and that f(y_{t} x_{t}, θ) can be numerically evaluated. We demonstrate this below with an analysis of cholera data, using a mechanistic continuoustime model. Sequential Monte Carlo is also known as “particle filtering” because each Monte Carlo realization can be viewed as a particle's trajectory through the state space. Each particle filtering step prunes particles in a way analogous to Darwinian selection. Particle filtering for fixed parameters, like natural selection without mutation, is rather ineffective. This explains heuristically why Procedure 1 is necessary to permit inference for fixed parameters via particle filtering. However, Procedure 1 and the theory given below apply more generally, and could be implemented using any suitable filter.
Example: A Compartment Model for Cholera
In a standard epidemiological approach (29, 30), the population is divided into disease status classes. Here, we consider classes labeled susceptible (S), infected and infectious (I), and recovered (R^{1}, …, R^{k} ). The k recovery classes allow flexibility in the distribution of immune periods, a critical component of cholera modeling (20). Three additional classes B, C, and D allow for birth, cholera mortality, and death from other causes, respectively. S_{t} denotes the number of individuals in S at time t, with similar notation for other classes. We write N_{t} ^{SI} for the integervalued process (or its realvalued approximation) counting transitions from S to I, with corresponding definitions of N_{t} ^{BS}, N_{t} ^{SD} , etc. The model is shown diagrammatically in Fig. 1. To interpret the diagram in Fig. 1 as a set of coupled stochastic equations, we write
The population size P_{t} is presumed known, interpolated from census data. Transmission is stochastic, driven by Gaussian white noise In Eq. 3 , we ignore stochastic effects at a demographic scale (infinitesimal variance proportional to S_{t}). We model the remaining transitions deterministically Time is measured in months. Seasonality of transmission is modeled by log(β _{t} ) = Σ _{k=0} ^{5} b_{k}s_{k}(t), where {s_{k}(t)} is a periodic cubic Bspline basis (31) defined so that s_{k}(t) has a maximum at t = 2k and normalized so that Σ_{k=0} ^{5} s_{k}(t) = 1; ε is an environmental stochasticity parameter (resulting in infinitesimal variance proportional to S_{t} ^{2}); ω corresponds to a nonhuman reservoir of disease; β_{t}I_{t}/P_{t} is humantohuman transmission; 1/γ gives mean time to recovery; 1/r and 1/(kr^{2}) are respectively the mean and variance of the immune period; 1/m is the life expectancy excluding cholera mortality, and m_{c} is the mortality rate for infected individuals. The equation for dN_{t} ^{BS} in Eq. 4 is based on cholera mortality being a negligible proportion of total mortality. The stochastic system was solved numerically using the Euler–Maruyama method (32) with time increments of 1/20 month. The data on observed mortality were modeled as y_{t} ∼ 𝒩[C_{t} − C_{t−1}, τ^{2}(C_{t} − C_{t−1})^{2}], where C_{t} = N_{t} ^{IC} . In the terminology given above, the state process x_{t} is a vector representing counts in each compartment.
Results
Testing the Method Using Simulated Data.
Here, we provide evidence that the MIF methodology successfully maximizes the likelihood. Likelihood maximization is a key tool not just for point estimation, via the maximum likelihood estimate (MLE), but also for profile likelihood calculation, parametric bootstrap confidence intervals, and likelihood ratio hypothesis tests (34).
We present MIF on a simulated data set (Fig. 2 A), with parameter vector θ* given in Table 1, based on data analysis and/or scientifically plausible values. Visually, the simulations are comparable to the data in Fig. 2 B. Table 1 also contains the resulting estimated parameter vector θ̂ from averaging four MIFs, together with the maximized likelihood. A preliminary indicator that MIF has successfully maximized the likelihood is that ℓ(θ̂) > ℓ(θ*). Further evidence that MIF is closely approximating the MLE comes from convergence plots and sliced likelihoods (described below), shown in Fig. 3. The SEs in Table 1 were calculated via the sliced likelihoods, as described below and elaborated in Supporting Text. Because inference on initial values is not of primary relevance here, we do not present standard errors for their estimates. Were they required, we would recommend profile likelihood methods for uncertainty estimates of initial values. There is no asymptotic justification of the quadratic approximation for initial value parameters, since the information in the data about such parameters is typically concentrated in a few early time points.
Applying the Method to Cholera Mortality Data.
We use the data in Fig. 2 B and the model in Eqs. 3 and 4 to address two questions: the strength of the environmental reservoir effect, and the influence of ENSO on cholera dynamics. See refs. 19 and 20 for more extended analyses of these data. A full investigation of the likelihood function is challenging, due to multiple local maxima and poorly identified combinations of parameters. Here, these problems are reduced by treating two parameters (m and r) as known. A value k = 3 was chosen based on preliminary analysis. The remaining 15 parameters (the first eleven parameters in Table 1 and the initial values S_{0}, I_{0}, R_{0} ^{1}, R_{0} ^{2}, R_{0} ^{3} , constrained to sum to P_{0}) were estimated. There is scope for future work by relaxing these assumptions.
For cholera, the difference between humantohuman transmission and transmission via the environment is not clearcut. In the model, the environmental reservoir contributes a component to the force of infection which is independent of the number of infected individuals. Previous data analysis for cholera using a mechanistic model (20) was unable to include an environmental reservoir because it would have disrupted the loglinearity required by the methodology. Fig. 4 shows the profile likelihood of ω and resulting confidence interval, calculated using MIF. This translates to between 29 and 83 infections per million inhabitants per month from the environmental reservoir, because the model implies a mean susceptible fraction of 38%. At least in the context of this model, there is clear evidence of an environmental reservoir effect (likelihood ratio test, P < 0.001). Although our assumption that environmental transmission has no seasonality is less than fully reasonable, this mode of transmission is only expected to play a major role when cholera incidence is low, typically during and after the summer monsoon season (see Fig. 5). Humantohuman transmission, by contrast, predominates during cholera epidemics.
Links between cholera incidence and ENSO have been identified (18, 19, 46). Such largescale climatic phenomena may be the best hope for forecasting disease burden (36). We looked for a relationship between ENSO and the prediction residuals (defined below). Prediction residuals are robust to the exact form of the model: they depend only on the data and the predicted values, and all reasonable models should usually make similar predictions. The lowfrequency component of the southern oscillation index (SOI), graphed in Fig. 2 C, is a measure of ENSO available during the period 1891–1940 (19); low values of SOI correspond to El Niño events. Rodó et al. (19) showed that low SOI correlates with increased cholera cases during the period 1980–2001 but found only weak evidence of a link with cholera deaths during the 1893–1940 period. Simple correlation analysis of standardized residuals or mortality with SOI reveals no clear relationship. Breaking down by month, we find that SOI is strongly correlated with the standardized residuals for August and September (in each case, r = −0.36, P = 0.005), at which time cholera mortality historically began its seasonal increase following the monsoon (see Fig. 5). This result suggests a narrow window of opportunity within which ENSO can act. This is consistent with the mechanism conjectured by Rodó et al. (19) whereby the warmer surface temperatures associated with an El Niño event lead to increased human contact with the environmental reservoir and greater pathogen growth rates in the reservoir. Mortality itself did not correlate with SOI in August (r = −0.035, P = 0.41). Some weak evidence of negative correlation between SOI and mortality appeared in September (r = −0.22, P = 0.063). Earlier work (20), based on a discretetime model and with no allowance for an environmental reservoir, failed to resolve this connection between ENSO and cholera mortality in the historical period: to find clear evidence of the external climatic forcing of the system, it is essential to use a model capable of capturing the intrinsic dynamics of disease transmission.
Discussion
Procedure 1 depends on the viability of solving the filtering problem, i.e., calculating θ̂_{t} and V_{t} in Eq. 2. This is a strength of the methodology, in that the filtering problem has been extensively studied. Filtering does not require stationarity of the stochastic dynamical system, enabling covariates (such as P_{t}) to be included in a mechanistically plausible way. Missing observations and data collected at irregular time intervals also pose no obstacle for filtering methods. Filtering can be challenging, particularly in nonlinear systems with a highdimensional state space (d_{x} large). One example is data assimilation for atmospheric and oceanographic science, where observations (satellites, weather stations, etc.) are used to inform large spatiotemporal simulation models: approximate filtering methods developed for such situations (4) could be used to apply the methods of this paper.
The goal of maximum likelihood estimation for partially observed data is reminiscent of the expectation–maximization (EM) algorithm (37), and indeed Monte Carlo EM methods have been applied to nonlinear state space models (24). The Monte Carlo EM algorithm, and other standard Monte Carlo Markov Chain methods, cannot be used for inference on the environmental noise parameter ε for the model given above, because these methods rely upon different sample paths of the unobserved process x_{t} having densities with respect to a common measure (38). Diffusion processes, such as the solution to the system of stochastic differential equations above, are mutually singular for different values of the infinitesimal variance. Modeling using diffusion processes (as in above) is by no means necessary for the application of Procedure 1, but continuoustime models for large discrete populations are well approximated by diffusion processes, so a method that can handle diffusion processes may be expected to be more reliable for large discrete populations.
Procedure 1 is well suited for maximizing numerically estimated likelihoods for complex models largely because it requires neither analytic derivatives, which may not be available, nor numerical derivatives, which may be unstable. The iterated filtering effectively produces estimates of the derivatives smoothed at each iteration over the scale at which the likelihood is currently being investigated. Although general stochastic optimization techniques do exist for maximizing functions measured with error (39), these methods are inefficient in terms of the number of function evaluations required (40). General stochastic optimization techniques have not to our knowledge been successfully applied to examples comparable to that presented here.
Each iteration of MIF requires similar computational effort to one evaluation of the likelihood function. The results in Fig. 3 demonstrate the ability of Procedure 1 to optimize a function of 13 variables using 50 function evaluations, with Monte Carlo measurement error and without knowledge of derivatives. This feat is only possible because Procedure 1 takes advantage of the statespace structure of the model; however, this structure is general enough to cover relevant dynamical models across a broad range of disciplines. The EM algorithm is similarly “only” an optimization trick, but in practice it has led to the consideration of models that would be otherwise intractable. The computational efficiency of Procedure 1 is essential for the model given above, where Monte Carlo function evaluations each take ≈15 min on a desktop computer.
Implementation of Procedure 1 using particle filtering conveniently requires little more than being able to simulate paths from the unobserved dynamical system. The new methodology is therefore readily adaptable to modifications of the model, allowing relatively rapid cycles of model development, model fitting, diagnostic analysis, and model improvement.
Theoretical Basis for MIF
Recall the notation above, and specifically the definitions in Eqs. 1 and 2 .
Theorem 1.
Assuming conditions (R1–R3) below, where ∇g is defined by [∇g]_{i} = ∂g/∂θ_{i} and θ̂_{0} = θ. Furthermore, for a sequence σ_{n} → 0, define θ̂^{(n)} recursively by where θ̂_{t} ^{(n)} = θ̂_{t}(θ̂^{(n)}, σ_{n}) and V_{t,n} = V_{t}(θ̂^{(n)}, σ_{n}). If there is a θ̂ with θ̂^{(n)} − θ̂/σ_{n} ^{2} → 0 then∇ log f(y_{1:T}θ = θ̂, σ = 0) = 0.
Theorem 1 asserts that (for sufficiently small σ_{n}), Procedure 1 iteratively updates the parameter estimate in the direction of increasing likelihood, with a fixed point at a local maximum of the likelihood. Step 2(ii) of Procedure 1 can be rewritten as θ̂^{(n+1)} = V_{1,n}{Σ_{t=1} ^{T−1} (V _{t,n} ^{−1} − V _{t+1,n} ^{−1})θ̂_{t} ^{(n)} + (V _{T,n} ^{−1})θ̂ _{T} ^{(n)}}. This makes θ̂^{(n+1)} a weighted average, in the sense that V_{1}{Σ_{t=1} ^{T−1} (V _{t} ^{−1} − V _{t+1} ^{−1}) + V _{T} ^{−1})} = I_{dθ}, where I_{dθ} is the _{θ} × d_{θ} identity matrix. The weights are necessarily positive for sufficiently small σ_{n} (Supporting Text).
The exponentially decaying σ_{n} in step 2(i) of Procedure 1 is justified by empirical demonstration, provided by convergence plots (Fig. 3). Slower decay, σ_{n} ^{2} = n^{−β} with 0 < β < 1, can give sufficient conditions for a Monte Carlo implementation of Procedure 1 to converge successfully (Supporting Text). In our experience, exponential decay yields equivalent results, considerably more rapidly. Analogously, simulated annealing provides an example of a widely used stochastic search algorithm where a geometric “cooling schedule” is often more effective than slower, theoretically motivated schedules (41).
In the proof of Theorem 1, we define f_{t}(ψ) = f(y_{t}y_{1:t−1}, θ_{t} = ψ). The dependence on σ may be made explicit by writing f_{t}(ψ) = f_{t}(ψ, σ). We assume that y_{1:T}, c and Σ are fixed: for example, the constant B in R1 may depend on y_{1:t}. We use the Euclidean norm for vectors and the corresponding norm for matrices, i.e., M = sup_{u≤1} u′Mu, where u′ denotes the transpose of u. We assume the following regularity conditions.

R1. The Hessian matrix is bounded, i.e., there are constants B and σ_{0} such that, for all σ < σ_{0} and all θ_{t} ∈ ℝ^{dθ}, ∇^{2}f_{t}(θ_{t}, σ) < B.

R2. E[θ_{t} − θ̂_{t−1}^{2} y_{1:t−1}] = O(σ^{2}).

R3. E[θ_{t}−θ̂_{t−1}^{3} y_{1:t−1}] = o(σ^{2}).
R1 is a global bound over θ_{t} ∈ ℝ^{dθ} , comparable to global bounds used to show the consistency and asymptotic normality of the MLE (42, 43). It can break down, for example, when the likelihood is unbounded. This problem can be avoided by reparameterizing to keep the model away from such singularities, as is common practice in mixture modeling (44). R2–R3 require that a new observation cannot often have a large amount of new information about θ. For example, they are satisfied if θ_{0:t}, x_{1:t} , and y_{1:t} are jointly Gaussian. We conjecture that they are satisfied whenever the state space model is smoothly parametrized and the random walk θ_{t} does not have long tails.
Proof of Theorem 1.
Suppose inductively that V_{t} = O(σ^{2}) and θ̂_{t−1} − θ = O(σ^{2}). This holds for t = 1 by construction. Bayes' formula gives The numerator in Eq. 8 comes from a Taylor series expansion of f_{t}(θ̂_{t}), and R1 implies R_{t} ≤ Bθ_{t} − θ̂_{t−1}^{2}/2. The denominator then follows from applying this expansion to the integral in Eq. 7 , invoking R2, and observing that Eq. 1 implies E[θ_{t} y_{1:t−1}] = θ̂_{t−1}. We now calculate Eq. 11 follows from Eq. 10 using Eq. 9 and R3. Eq. 12 follows from Eq. 11 using the induction assumptions on θ̂_{t−1} and V_{t}; Eq. 12 then justifies this assumption for θ̂_{t}. A similar argument gives where Eq. 14 follows from Eq. 13 via Eqs. 9 and 12 and the induction hypothesis on V_{t}. Eq. 14 in turn justifies this hypothesis. Summing Eq. 12 over t produces which leads to Eq. 5 . To see the second part of the theorem, note that Eq. 6 and the requirement that θ̂^{(n)} − θ̂/σ_{n} ^{2} → 0 imply that Continuity then gives which, together with Eq. 5 , yields the required result.
Heuristics, Diagnostics, and Confidence Intervals
Our main MIF diagnostic is to plot parameter estimates as a function of MIF iteration; we call this a convergence plot. Convergence is indicated when the estimates reach a single stable limit from various starting points. Convergence plots were also used for simulations with a known true parameter, to validate the methodology. The investigation of quantitative convergence measures might lead to more refined implementations of Procedure 1.
Heuristically, α can be thought of as a “cooling” parameter, analogous to that used in simulated annealing (39). If α is too small, the convergence will be “quenched” and fail to locate a maximum. If α is too large, the algorithm will fail to converge in a reasonable time interval. A value of α = 0.95 was used above.
Supposing that θ_{i} has a plausible range [θ_{i}
^{lo}, θ_{i}
^{hi}] based on prior knowledge, then each particle is capable of exploring this range in early iterations of MIF (unconditional on the data) provided
Although the asymptotic arguments do not depend on the particular value of the dimensionless constant c, looking at convergence plots led us to take c^{2} = 20 above. Large values c^{2} ≈ 40 resulted in increased algorithmic instability, as occasional large decreases in the prediction variance V_{t} resulted in large weights in Procedure 1 step 2(ii). Small values c^{2} ≈ 10 were diagnosed to result in appreciably slower convergence. We found it useful, in choosing c, to check that [V_{t}]_{ii} plotted against t was fairly stable. In principle, a different value of c could be used for each dimension of θ; for our example, a single choice of c was found to be adequate.
If the dimension of θ is even moderately large (say, d_{θ} ≈ 10), it can be challenging to investigate the likelihood surface, to check that a good local maximum has been found, and to get an idea of the standard deviations and covariance of the estimators. A useful diagnostic, the “sliced likelihood” (Fig. 3 B), plots ℓ(θ̂ + hδ_{i}) against θ̂_{i} + h, where δ_{i} is a vector of zeros with a one in the ith position. If θ̂ is located at a local maximum of each sliced likelihood, then θ̂ is a local maximum of ℓ(θ), supposing ℓ(θ) is continuously differentiable. Computing sliced likelihoods requires moderate computational effort, linear in the dimension of θ. A local quadratic fit is made to the sliced log likelihood (as suggested by ref. 35), because ℓ(θ̂ + hδ_{i}) is calculated with a Monte Carlo error. Calculating the sliced likelihood involves evaluating log f(y_{t} y_{1:t−1}, θ̂ + hδ_{i}), which can then be regressed against h to estimate (∂/∂θ_{i}) log f(y_{t} y_{1:t−1}, θ̂). These partial derivatives may then be used to estimate the Fisher information (ref. 34 and Supporting Text) and corresponding SEs. Profile likelihoods (34) can be calculated by using MIF, but at considerably more computational expense than sliced likelihoods. SEs and profile likelihood confidence intervals, based on asymptotic properties of MLEs, are particularly useful when alternate ways to find standard errors, such as bootstrap simulation from the fitted model, are prohibitively expensive to compute. Our experience, consistent with previous advice (45), is that SEs based on estimating Fisher information provide a computationally frugal method to get a reasonable idea of the scale of uncertainty, but profile likelihoods and associated likelihood based confidence intervals are more appropriate for drawing careful inferences.
As in regression, residual analysis is a key diagnostic tool for state space models. The standardized prediction residuals are {u_{t}(θ̂)} where θ̂ is the MLE and u_{t}(θ) = [Var(y_{t} y_{1:t−1}, θ)]^{−1/2} (y_{t} − E[y_{t} y_{1:t−1}, θ]). Other residuals may be defined for state space models (8), such as E∫ _{t−1} ^{t} dW_{s} y_{1:T}, θ̂] for the model in Eqs. 3 and 4 . Prediction residuals have the property that, if the model is correctly specified with true parameter vector θ*, {u_{t}(θ*)} is an uncorrelated sequence. This has two useful consequences: it gives a direct diagnostic check of the model, i.e., {u_{t}(θ̂)} should be approximately uncorrelated; it means that prediction residuals are an (approximately) prewhitened version of the observation process, which makes them particularly suitable for using correlation techniques to look for relationships with other variables (7), as demonstrated above. In addition, the prediction residuals are relatively easy to calculate using particlefilter techniques (Supporting Text).
Acknowledgments
We thank the editor and two anonymous referees for constructive suggestions, Mercedes Pascual and Menno Bouma for helpful discussions and Menno Bouma for the cholera data shown in Fig. 2 B. This work was supported by National Science Foundation Grant 0430120.
Footnotes
 ^{‡}To whom correspondence should be addressed. Email: ionides{at}umich.edu

Author contributions: E.L.I., C.B., and A.A.K. performed research, analyzed data, and wrote the paper.

The authors declare no conflict of interest.

This article is a PNAS direct submission.
 Abbreviations:
 ENSO,
 El Niño Southern Oscillation;
 MIF,
 maximum likelihood via iterated filtering;
 MLE,
 maximum likelihood estimate;
 EM,
 expectation—maximization.
 © 2006 by The National Academy of Sciences of the USA
References

↵
 Anderson BD ,
 Moore JB

↵
 Shephard N ,
 Pitt MK
 ↵
 ↵
 ↵

↵
 Brown EN ,
 Frank LM ,
 Tang D ,
 Quirk MC ,
 Wilson MA

↵
 Shumway RH ,
 Stoffer DS

↵
 Durbin J ,
 Koopman SJ

↵
 Doucet A ,
 de Freitas N ,
 Gordon NJ
 ↵

↵
 Gordon N ,
 Salmond DJ ,
 Smith AFM

↵
 Liu JS
 ↵
 ↵

↵
 Zo YG ,
 Rivera ING ,
 RussekCohen E ,
 Islam MS ,
 Siddique AK ,
 Yunus M ,
 Sack RB ,
 Huq A ,
 Colwell RR

↵
 Huq A ,
 West PA ,
 Small EB ,
 Huq MI ,
 Colwell RR
 ↵

↵
 Pascual M ,
 Rodó X ,
 Ellner SP ,
 Colwell R ,
 Bouma MJ

↵
 Rodó X ,
 Pascual M ,
 Fuchs G ,
 Faruque ASG
 ↵

↵
 Finkenstädt BF ,
 Grenfell BT

↵
 Liu J ,
 West M
 Doucer A ,
 de Freitas N ,
 Gordon JJ

↵
 Hürzeler M ,
 Künsch HR
 Doucer A ,
 de Freitas N ,
 Gordon JJ

↵
 Cappé O ,
 Moulines E ,
 Rydén T
 ↵

↵
 Turchin P
 ↵

↵
 Bjørnstad ON ,
 Grenfell BT

↵
 Kermack WO ,
 McKendrick AG

↵
 Bartlett MS

↵
 Powell MJD

↵
 Kloeden PE ,
 Platen E

↵
 Cleveland WS ,
 Grossel E ,
 Shyu WM
 Chambers JM ,
 Hastie TJ

↵
 BarndorffNielsen OE ,
 Cox DR

↵
 Ionides EL
 ↵

↵
 Dempster AP ,
 Laird NM ,
 Rubin DB

↵
 Roberts GO ,
 Stramer O

↵
 Spall JC
 ↵

↵
 Press W ,
 Flannery B ,
 Teukolsky S ,
 Vetterling W

↵
 Cramér H
 ↵

↵
 McLachlan G ,
 Peel D

↵
 McCullagh P ,
 Nelder JA
 ↵