## New Research In

### Physical Sciences

### Social Sciences

#### Featured Portals

#### Articles by Topic

### Biological Sciences

#### Featured Portals

#### Articles by Topic

- Agricultural Sciences
- Anthropology
- Applied Biological Sciences
- Biochemistry
- Biophysics and Computational Biology
- Cell Biology
- Developmental Biology
- Ecology
- Environmental Sciences
- Evolution
- Genetics
- Immunology and Inflammation
- Medical Sciences
- Microbiology
- Neuroscience
- Pharmacology
- Physiology
- Plant Biology
- Population Biology
- Psychological and Cognitive Sciences
- Sustainability Science
- Systems Biology

# Equation-free mechanistic ecosystem forecasting using empirical dynamic modeling

Edited by Stephen R. Carpenter, University of Wisconsin–Madison, Madison, WI, and approved January 28, 2015 (received for review September 17, 2014)

## Significance

The conventional parametric approach to modeling relies on hypothesized equations to approximate mechanistic processes. Although there are known limitations in using an assumed set of equations, parametric models remain widely used to test for interactions, make predictions, and guide management decisions. Here, we show that these objectives are better addressed using an alternative equation-free approach, empirical dynamic modeling (EDM). Applied to Fraser River sockeye salmon, EDM models (*i*) recover the mechanistic relationship between the environment and population biology that fisheries models dismiss as insignificant, (*ii*) produce significantly better forecasts compared with contemporary fisheries models, and (*iii*) explicitly link control parameters (spawning abundance) and ecosystem objectives (future recruitment), producing models that are suitable for current management frameworks.

## Abstract

It is well known that current equilibrium-based models fall short as predictive descriptions of natural ecosystems, and particularly of fisheries systems that exhibit nonlinear dynamics. For example, model parameters assumed to be fixed constants may actually vary in time, models may fit well to existing data but lack out-of-sample predictive skill, and key driving variables may be misidentified due to transient (mirage) correlations that are common in nonlinear systems. With these frailties, it is somewhat surprising that static equilibrium models continue to be widely used. Here, we examine empirical dynamic modeling (EDM) as an alternative to imposed model equations and that accommodates both nonequilibrium dynamics and nonlinearity. Using time series from nine stocks of sockeye salmon (*Oncorhynchus nerka*) from the Fraser River system in British Columbia, Canada, we perform, for the the first time to our knowledge, real-data comparison of contemporary fisheries models with equivalent EDM formulations that explicitly use spawning stock and environmental variables to forecast recruitment. We find that EDM models produce more accurate and precise forecasts, and unlike extensions of the classic Ricker spawner–recruit equation, they show significant improvements when environmental factors are included. Our analysis demonstrates the strategic utility of EDM for incorporating environmental influences into fisheries forecasts and, more generally, for providing insight into how environmental factors can operate in forecast models, thus paving the way for equation-free mechanistic forecasting to be applied in management contexts.

- ecosystem forecasting
- fisheries ecology
- physical–biological interactions
- empirical dynamic modeling
- nonlinear dynamics

One of the fundamental challenges of environmental science is to understand and predict the behavior of complex natural ecosystems. This task can be especially difficult when multiple drivers (e.g., species interactions, environmental influences) interact in a nonlinear state-dependent way to produce dynamics that appear to be erratic and nonstationary (1). In the standard parametric approach, which implicitly assumes that the selected model and its equations are essentially correct, the equations (really just mechanistic hypotheses) can lack the flexibility to describe the nonlinear dynamics that occur in nature. Consequently, these parametric models tend to perform poorly as descriptions of reality, with little explanatory or predictive power (2, 3), and limited usefulness for prediction and management.

## Parametric Models as Hypotheses

A common problem when applying the parametric approach to nonlinear systems is that of ephemeral fitting. That is, although population models may assume that demographic parameters such as growth rate or carrying capacity are fixed constants, these quantities are often observed to vary in time or in relation to other variables (e.g., resource availability, changing climate regimes) when tested on actual data (4). This principle is illustrated in Fig. 1*A*, where the Ricker spawner–recruit relationship is fit to the early (1948–1976) and late (1977–2005) halves of the time series from the Seymour stock. Very different relationships emerge in these two time periods, conflicting with the assumption of a fixed equilibrium and constant parameter values. Indeed, Beamish et al. (5) found that the Ricker model fit better when constrained by climate regimes, suggesting that the spawner–recruit relationship does vary in time, a fact consistent with the general notion of nonlinear state dependence (6, 7).

At its core, nonlinear dynamics [which are known to be ubiquitous in marine species (8, 9)] occur when variables have interdependent effects; this can be problematic when applying a reductionist approach to understand nonlinear systems. For example, in laboratory experiments, guppies (*Poecilia reticulatus*) preferentially eat *Drosophila* or tubificid worms depending on which prey is more abundant (10). Thus, the strength of predation on, say, *Drosophila*, will change depending on the abundance of tubificid worms. This prey-switching behavior typifies nonlinear state dependence, whereby different components cannot be treated independently, as would be the case in a linear system or even a nonlinear system approximated at equilibrium. Consequently, applying a model that assumes separability of effects [e.g., vector autoregression (11)] to a system that is actually nonlinear can give the appearance of nonstationarity or stochasticity even when the underlying mechanisms are unchanged and deterministic.

Nonlinearity is also known to affect the correct identification of causal drivers—a key prerequisite for understanding and predicting system behavior. In nonlinear systems, because interacting variables can exhibit transient (mirage) correlations that change in magnitude or sign (6, 7), the use of correlation to identify causal environmental variables can be misleading, producing both false positives (i.e., correlation does not imply causation) and false negatives (i.e., lack of correlation does not imply a lack of causation). Given the prevalence of nonlinear interactions in ecology, mirage correlations can be misleading. Indeed, a metaanalysis examining the robustness of correlations between recruitment and the environment (12) found that only 28 out of 74 initially significant correlations were upheld when subsequent data were included.

Even when causal variables are known, their inclusion into improperly formulated models can produce conflicting results. For example, with sockeye salmon in the Fraser River, although anomalous oceanic conditions experienced by juveniles are thought to be responsible for the low abundance of returning adults in 2009 (13⇓–15), extensions of the standard Ricker model that explicitly include environmental factors surprisingly show no significant improvements in the actual forecasts (16⇓–18). A simple explanation for this apparent contradiction is that the extended Ricker model does not accurately portray the relevant interaction between the oceanic environment and sockeye salmon. Indeed, the model naively assumes that the environment acts on recruitment dynamics independently with a constant multiplicative effect (e.g., a 1 °C decrease in temperature always doubles recruitment regardless of other factors important to the state of the system). Although temperature, in all likelihood, does affect recruitment, it probably does not follow this arbitrary form. We demonstrate this by fitting the model to Pine Island sea surface temperature (SST) and Seymour spawner–recruit data (Fig. 1*B*), finding that the model predicts unrealistically high recruitment (much higher than the historically observed maximum) for hypothetical (but plausible) conditions of high spawner abundance and low temperature. Thus, although the equation may appear reasonable as a hypothesis, it apparently does not incorporate the environment realistically.

## Empirical Dynamic Modeling

In contrast to fitting an assumed set of equations, empirical dynamic modeling (EDM) instead relies on time series data to reveal the dynamic relationships among variables as they occur (1, 6, 19⇓–21). By extracting these relationships empirically, EDM accommodates potentially complex and changing interactions that cannot be described in a simple set of equations. Thus, prediction accuracy with EDM is constrained by the quantity and quality of data rather than by the hypotheses represented in a set of equations [which may be subject to process error due to false or incomplete specification (22)].

Fundamental to EDM is the concept of a time series as an observation on a dynamic system. Broadly speaking, a dynamic system can be viewed as a set of “states” (*d*-dimensional vectors where each coordinate is a system variable) and deterministic rules (governing dynamics) for how the states evolve over time. Collectively, the set of states and their trajectories forms an “attractor manifold,” and projecting the motion on this manifold to a coordinate axis produces a time series of the corresponding variable (*SI Appendix*, Fig. S1*A*). For example, in a simple predator–prey system where the system evolves as a function of the two abundances, the system state could be represented as the ordered pair of predator and prey abundances. This system state can be projected onto the prey coordinate axis to produce a time series of prey abundance, although many other observation functions are also possible (e.g., predator abundance, average number of prey for each predator).

In theory, with time series for all of the system variables, it would be possible to reconstruct the original attractor manifold by plotting each time series as a separate coordinate. In practice, however, we typically do not have these data or know the identity of all relevant variables. Fortunately, a fundamental mathematical result proves that information about the entire system is contained in any one variable (23, 24), meaning that a shadow version of the original attractor can be constructed from just a single time series. This is accomplished by substituting lags of that time series for the unknown or unobserved variables (*SI Appendix*, Fig. S1*B*). These essential mechanics of EDM are detailed in ref. 6 and crisply summarized in a short animation (Movie S1).

Although a single time series is usually sufficient to reconstruct a system’s dynamics, there are exceptions (e.g., it is not a closed system). In the case of sockeye salmon, abundance alone may not skillfully predict future returns because they are influenced by external environmental factors. Here, the environment may be thought to act as stochastic external forcing, necessitating its inclusion as an additional coordinate in a multivariate reconstruction (1, 7, 24). We demonstrate this by using spawners and SST to predict recruitment (Fig. 1*C*). Unlike a parametric model in which a hypothesized interaction must be specified in advance (the extended Ricker model; Fig. 1*B*), the empirical surface in Fig. 1*C* makes no assumptions about the relationship between variables, but instead captures the interaction between density dependence and environmental conditions as revealed by the data: ocean temperatures have a stronger effect on recruitment when spawner abundance is low.

## Fraser River Sockeye Salmon

In this work, we perform a real-world test comparing EDM and the standard parametric paradigm, by forecasting returns for the nine most historically abundant stocks of sockeye salmon from the Fraser River system in British Columbia, Canada (Fig. 2), of significance to Canada’s iconic fisheries. Total returns in this system are highly variable and can span over an order of magnitude: a record low of 1.6 million in 2009 was followed by a record high of 28.3 million in 2010 (Fig. 3). Although some of this variability occurs because of cyclic dominance (25, 26), large interannual fluctuations in mortality and productivity (recruits-per-spawner) are difficult to predict, leading to considerable uncertainties in current parametric forecast models (27). This is suggestive of nonlinear dynamics in this fishery, and indeed, a Canadian federal inquiry (13, 14) concluded that recent declines in productivity could not be attributed to any single mechanism but were likely caused by the interaction of multiple stressors (e.g., predators, food availability, environment). Applying a simple S-map test (*P* = 0.002) (*SI Appendix*, Fig. S2), we confirm the presence of nonlinear dynamics among returns of Fraser River sockeye salmon.

Thus, we apply EDM methods to unravel the mechanisms by which the environment may affect sockeye salmon recruitment. First, we compare the classical Ricker spawner–recruit model with equivalent EDM spawner–recruit models. With nearly all adults returning as age 4 or age 5 fish, we can consider the total returns in a single calendar year to be composed of age 4 and age 5 recruits from different spawning broods. Following ref. 16, we predict annual returns by first estimating total recruitment for each spawning brood year. This recruitment is then partitioned by age, and the age 4 and age 5 estimates from separate brood years combined appropriately to forecast returns (*Materials and Methods*). Note that the time series of spawning abundance and recruitment already account for the effects of the fishery (this information is contained within the time series; *Materials and Methods*), which enables us to focus on just the natural population dynamics.

Second, to investigate the causal influence of the oceanic environment, we consider forecasts produced by the extended Ricker model and equivalent multivariate EDM formulations. In both cases, if the inclusion of environmental variables significantly improves forecasts (*Materials and Methods*), those variables are taken to have a causal influence on salmon recruitment.

Last, to avoid arbitrary fitting and to obtain a robust measure of forecast skill, we apply a fourfold cross-validation scheme for each model: the model is fit to three-fourths of the data to predict the remaining one-fourth out-of-sample, and the procedure is repeated for each one-fourth segment of the time series.

## Results

### Comparison of Spawner–Recruit Forecast Models.

As a fair comparison with the standard Ricker model where spawner abundance is used to predict recruitment, we examine an equivalent EDM spawner–recruit model, but which actually has fewer fitted parameters (*Materials and Methods*). Fig. 4 shows that this simple EDM model has significantly higher accuracy (*ρ*, correlation between observations and predictions) than the Ricker model, with more accurate forecasts in eight of nine cases and significantly lower error overall [mean absolute error (MAE); *SI Appendix*, Fig. S3]. Nonetheless, predictions for several stocks (Birkenhead, Chilko, Stellako, and Weaver) are not very skillful (*ρ* < 0.3), suggesting that in these cases, there is no simple spawner–recruit relationship (parametric or otherwise). Instead, environmental factors (e.g., SST, food availability) may dominate, and better performance can be obtained by accounting for these external drivers.

### Incorporating Environmental Influences.

As in the actual forecast models (16), we further consider three environmental variables [the Pacific Decadal Oscillation (PDO), SST, and Fraser River discharge] observed at different times and locations (12 time series in total). Each of these factors is believed to have a potential effect on recruitment, although significance has yet to be demonstrated in practice. For each stock, we compare the relative performance of the extended Ricker and corresponding multivariate EDM models that incorporate these environmental variables (Table 1; *Materials and Methods*). Fig. 4 shows that multivariate EDM is consistently and significantly better at forecasting than the extended Ricker model for all nine stocks, and is true for both accuracy and precision metrics (*SI Appendix*, Fig. S3). Here, the relevant causal influence of these environmental variables is verified by the fact that multivariate EDM models that include them perform significantly better than their simple EDM spawner–recruit counterparts.

By contrast, the extended Ricker models show no significant improvement over the simple Ricker models in any of the stocks. The difference between EDM and Ricker is especially visible for Late Stuart, Quesnel, Stellako, and Weaver, indicating that these particular environmental factors (currently considered in assessments) can explain much of the variability in these stocks, provided they are incorporated reasonably (i.e., with the minimal assumptions of EDM). For Birkenhead and Chilko, however, multivariate EDM models performed no better than the simplified stock–recruit versions, hinting that variables other than these are required to understand the dynamics of those stocks.

## Discussion

### Nonlinearity of Fraser River Sockeye Salmon.

Similar to many marine species (8, 9), Fraser River sockeye salmon show strong evidence for nonlinear dynamics (*SI Appendix*, Fig. S2 and Table S1). Thus, it should not be surprising that a simple EDM model, which accommodates nonlinearity, would outperform the assumed spawner–recruit equation of the Ricker model. Furthermore, because sockeye salmon are exposed to different sources of environmentally driven mortality and because it is likely that they integrate these effects in a nonlinear fashion, it should not be surprising that multivariate EDM models that explicitly accommodate relevant environmental factors would show dramatically improved performance. In contrast, the extended Ricker model cannot resolve the nonlinear effect of the environment and shows only nonsignificant improvements (that might be expected from having additional degrees of freedom).

### Identifying Environmental Drivers.

It is believed that growth during the early marine stage for Pacific salmon is a critical period that determines subsequent mortality and recruitment (28, 29). Thus, it is reasonable to expect that including related environmental variables into models should improve predictions. However, the extended Ricker model did not improve when river discharge, SST, or the PDO (Fig. 4 and *SI Appendix*, Fig. S3) were included. Rather than suggesting that these variables have no effect, it is more likely that the extended Ricker model is incorrectly specified. This is borne out by the fact that these factors produce improved forecasts for many stocks when included nonparametrically in EDM (Fig. 4 and *SI Appendix*, Fig. S3). Thus, our analysis suggests that the tested variables are indeed informative about the relevant environmental conditions experienced by juvenile sockeye salmon. For example, river discharge and SST may indicate primary productivity in the Strait of Georgia and other areas through which juveniles migrate (Fig. 2) (15, 30, 31), and the associations between large-scale oceanic climate indicators such as the PDO and Pacific salmon productivity are well known from other studies (32, 33). Although these variables do not reflect direct causal mechanisms, they may be useful as simple indicators of processes that influence salmon survival, thereby improving forecasts when included in the EDM approach.

Although individual stocks appear to be sensitive to different environmental factors (Table 1), we did observe some general patterns: for example, two of the nine stocks (Stellako and Quesnel) identified the PDO as an informative variable (the first-ranked EDM models for these stocks include the PDO as a coordinate), yet the predictability for these stocks is further improved when other variables (river discharge or SST) are included in addition to the PDO. This suggests that the PDO is an incomplete observation on the relevant environment for sockeye, and that local-scale measures of the environment can enhance the information in the PDO index (an ocean basin-scale indicator) (see *SI Appendix* for details).

Although our models confirm a general influence of the environment on sockeye salmon recruitment, some stocks appear to be skillfully predicted using only spawner abundance. One explanation for this is that the stocks experience unique environments: they are exposed to different freshwater conditions in their respective nursery lakes, and they exhibit different timings and migration routes as they travel through the Fraser River, the Strait of Georgia (34), and along the west coast of North America (35). Even with shared environmental influences (e.g., food availability in the Strait of Georgia), nonlinear state dependence can produce dynamics unique to each stock. Consequently, if these myriad effects are strongly density dependent, recruitment could be successfully predicted using just spawner abundance. However, if these effects are stochastic (i.e., environmentally driven), then it will be necessary to include informative indicator variables to improve forecasts.

Apart from multivariate models, an alternative approach to determine causal environmental variables would be to apply the method of convergent cross-mapping (CCM) (6). However, due to data limitations (in particular, the absence of annual monitoring of each cycle line including the oceanic phase), CCM may not be sufficiently sensitive to resolve causality here (see *SI Appendix* for details).

### Nonuniqueness of Models.

We note that, for a given stock, different EDM models can show similar performance (*SI Appendix*, Table S4). Although somewhat counterintuitive, this phenomenon is expected, because the tested variables (river discharge, SST, the PDO) are proxy indicators of the environment. Thus, they may contain redundant information such that different variable combinations are equally informative even as they represent alternative perspectives on the system. This reflects a fundamental property of EDM in that forecast performance depends solely on the information content of the data rather than on how well assumed equations match reality.

To clarify the concept of nonuniqueness, consider the canonical Lorenz attractor (*SI Appendix*, Fig. S1*A*). The behavior of this system is governed by three differential equations (*SI Appendix*, Eq. **S1**). However, the axes can be rotated to produce three new coordinates, *x′*, *y′*, and *z′*, and the equations rewritten in terms of these new coordinates, allowing the system to be described using either representation (*x*, *y*, and *z* or *x′*, *y′*, and *z′*) as well as mixed combinations (e.g., *x*, *y*, and *z′*). Thus, with an infinite number of ways to rotate the system, there are an unlimited number of “true variables” and “true models.” In the case of sockeye salmon, the similar performance of different models (*SI Appendix*, Table S4) does not mean that one or the other model is incorrect; instead, it reflects the fact that the environmental variables are indicators of the same general mechanism, and so different variable combinations can be equally informative for forecasting recruitment.

Again, we emphasize that including a variable does not imply a direct causal link—variables in an EDM model improve forecasts because they are informative; it does not mean that the included variables are proximate causes. Importantly, the converse does not hold either: a variable could be causal and yet not appear in the multivariate EDM; this might occur when multiple stochastic drivers affect recruitment in an interdependent way, necessitating that a model include measurements of all of the drivers to account for their combined effect. For example, although none of the tested variables seem to improve forecasts for the Birkenhead stock (*SI Appendix*, Table S4), this does not mean that these sockeye salmon are insensitive to SST, river discharge, and the PDO. Rather, it suggests that the effect of these variables may be modulated by other factors not considered here.

### Data Requirements of EDM.

Using EDM is fundamentally a data-driven approach: thus, it is important to ensure that time series are of sufficient length to recover dynamics. For example, Sugihara et al. (6) suggest that at least 35–40 points might be necessary as a rough minimum, although methods exist for using dynamically similar replicates in cases where time series are shorter (36). For many systems, however, the data requirements of EDM mean that increased budgets and additional sampling effort will be important to support long-term continuous observations and generate sufficient time series. We note, although, that it is not necessary to sample all putatively relevant drivers, because different measurements are often substitutable as proxies for true proximal causes.

When data requirements are met, however, we note that collecting additional data can further improve accuracy and precision of EDM models. Consider the simplex projection method, which uses nearest-neighbor analogs to approximate system behavior. With each new data point, more analogs are available (the reconstructed manifold becomes denser), and so these approximations become more precise. Thus, EDM models will improve with longer time series. In contrast, a parametric model will benefit from more data only when the assumed equations are essentially correct. In the case of the classic Ricker model in Fig. 1*A*, it is clear that similar levels of spawner abundance yield very different levels of recruitment, and so any simple function relating the two cannot fully explain the scattered observations. Adding more data may result in more “precise” parameter estimates, but individual errors will remain large when the underlying process is more complex than the assumed model can portray.

### Alternative Parametric Models.

In this work, we use the classical and extended Ricker models as examples of the parametric approach, but acknowledge that there are alternative models considered by Fisheries and Oceans Canada (DFO) for Fraser River sockeye salmon (37⇓⇓–40). Although some of these models may fit the data better, this does not always reflect a model’s true performance in out-of-sample forecasting. For example, a modified Ricker model that allows parameters to randomly drift over time (37) will explain variations in the data better than the static alternative, because doing so can indirectly track nonlinear state dependence. However, instead of a mechanism for why parameters change, such models based on the Kalman filter (41) typically use forward information (i.e., observations at time *t* + 1 help to estimate the growth rate at time *t*) and thus do not actually “predict.” Consequently, the actual forecast performance of such models will be overestimated by their fit to historical data. A more fundamental concern with the parametric approach is that it requires explicit equations to model the effects of included variables. Such equations may be overly simplified (e.g., linear correlations) and unable to accommodate the state-dependent effects that occur in nonlinear systems.

### Final Remarks.

EDM addresses two important challenges for modeling natural systems. First, EDM identifies relevant variables and interactions empirically and dynamically (6); this is in contrast to the conventional approach where the use of parametric equations poses the dual risks of model misspecification (22) as well as variable misidentification (6, 7, 12). Importantly, EDM allows proxy variables to be used, which can be a boon when observations on key processes (e.g., mortality) are lacking but indirect measurements (e.g., SST) are available. Second, the equation-free approach of EDM produces more accurate forecasts than equivalent parametric models using the same data. As Perretti et al. (2) have shown, even when a correct parametric model is known, fitting parametric models can be problematic and is an important concern with many systems exhibiting nonlinear behavior (8, 9). In contrast, EDM models can capture dynamic information and explain behavior that may be misclassified as random by parametric fitting procedures.

Consequently, the dynamic perspective of EDM has much to offer for modeling nonlinear systems, representing a viable framework (with minimal assumptions) for system identification and robust forecasting. When parametric models are required, EDM can also be used in a complementary role to identify causal links, recover variable relationships, and even guide the construction of reliable equation-based models (42). This represents a practical way to perform data-driven modeling instead of starting with complex parametric models [end-to-end ecosystem models, such as Ecopath with Ecosim (43)], which often make strong assumptions and require large amounts of data to parameterize.

Moreover, EDM models can also serve as direct substitutes for their parametric equivalents. Here, our simple and multivariate EDM models are formulated similarly to their Ricker-based counterparts: using spawner abundance (and the environment) to forecast returns. Some simple extensions to the methods, such as the development of uncertainty estimates (*SI Appendix*, Fig. S5), will enable these models to be integrated into the current management framework that uses parametric models. Thus, we believe that EDM has great potential as a tool for understanding and forecasting nonlinear ecosystems: by operating without assumed equations, it can be beneficial when exact mathematical descriptions are not available.

## Materials and Methods

### Data.

We analyze yearly time series data for the nine historically most abundant stocks (Birkenhead, Chilko, Early Stuart, Late Shuswap, Late Stuart, Quesnel, Seymour, Stellako, and Weaver) of sockeye salmon from the Fraser River system (Dataset S2). Data span brood years 1948–2005, except for Late Stuart and Weaver, where data begin in 1949 and 1966, respectively. We consider only single-stock models, so notation and equations are given as for a single stock.

*S*_{t} is the number of effective female spawners in brood year *t*, and *R*_{t} is the corresponding recruitment (returning adults). Recruitment is partitioned by age: *R*_{a,t} is the number spawned in year *t* and returning at age *a* in year *t* + *a*. Following ref. 16, total recruitment is the sum of age 4 and age 5 recruits: *R*_{t} = *R*_{4,t} + *R*_{5,t}. In contrast, total returns, *N*_{y}, are the adults that return to spawn in calendar year *y*, and computed as *N*_{y} = *R*_{4,y-4} + *R*_{5,y-5}. As explained below, recruitment is forecast from spawner abundance, and age 4 and age 5 recruits (from different brood years) are summed to estimate total returns in a given calendar year. Note that both recruitment and returns are computed as catch plus escapement plus en route loss, whereas spawner abundance is based on observations of escapement and egg production (27). Thus, both spawner abundance and recruitment account for the effects of catch, and the models we consider here focus just on the population dynamics of this system.

We investigate three environmental variables: the PDO, SST, and Fraser River discharge (Dataset S3). For the PDO, one annual time series is constructed as the average of monthly values from November to March (32). SST measures are monthly averages from two lighthouse stations (Entrance Island: April to June; Pine Island: April to July). River discharge is measured at Hope; we include peak daily flow and monthly averages (April to June). Fraser River sockeye salmon enter the ocean at age 2, so the environmental data are lagged 2 years to line up with ocean entry time.

### Attractor Reconstruction.

The goal of attractor reconstruction is to approximate the originating dynamic system using time series data. The simplest construction uses successive lags of a single time series (23, 44): given time series {*x*_{t}}, *E*-dimensional vectors *x*_{t} are composed of *E* lags of *x*, each separated by a time step

Generalizations of Takens’ theorem (24, 45) permit attractor reconstructions using multiple time series. For example, with {*x*_{t}} and {*y*_{t}} observed from the same system, one possible reconstruction forms vectors as

### Simplex Projection and S-Map.

Simplex projection estimates the trajectory (i.e., forecasts) of a novel system state by computing a weighted average of the trajectories of that state’s nearest neighbors (19). Given an attractor reconstruction, and a novel state *x*_{s}, we first find the *b* nearest neighbors (typically setting *b* = *E* + 1) that are closest to *x*_{s}: these neighbors are the vectors *x*_{n(s,i)}, where *n*(*s*,*i*) designates the time index of the *i*th closest neighbor to *x*_{s}. So, *x*_{n(s,1)} is the closest neighbor to *x*_{s}, *x*_{n(s,2)} is the second closest neighbor, etc. We then evolve the neighbors forward, and compute a weighted average of the forward evolutions (*h* time steps into the future) to estimate *x*_{s+h}:

The weights, *w*_{i}(*s*), are based on the distance between *x*_{s} and its *i*th neighbor, *x*_{n(s,i)}, scaled to the distance to the nearest neighbor: *d*(*x*_{s}, *x*_{t}) is the Euclidean distance between the vectors *x*_{s} and *x*_{t}.

In most cases, we desire forecasts of a scalar value rather than of the full system state. This is possible when the variable to be forecast, *y*, is an observation on the same dynamic system. As such, there will be a correspondence between *x*_{t} and the scalar value of *y*_{t}, and we can adjust Eq. **1** to compute a weighted average of the corresponding values of *y*:

The S-map procedure computes a local linear map between lagged-coordinate vectors and a target variable and is often used to test for nonlinear state dependence (22). It includes a tuning parameter, *θ*, that controls the weights associated with individual vectors: *θ* = 0 reduces the S-map to a linear autoregressive model of order *E*, whereas *θ* > 0 gives more weight to nearby states when computing the local linear map, thus allowing for nonlinear behavior. Following refs. 9 and 46, we test for nonlinearity by computing the decrease in forecast error (MAE) as *θ* is tuned to be greater than 0 (see *SI Appendix* for details).

### Model Descriptions.

We formulate EDM models to forecast recruitment from spawner abundance, combining age 4 and age 5 recruits (from different brood years) to estimate total returns in a given calendar year. Acknowledging the persistent 4-year quasicycle, the time series of recruits and spawners are scaled so that each cycle line has mean 0 and variance 1: *k* = 1, 2, 3, or 4, depending on cycle line and can be computed as *k* = 1 + ((*t* − 1) mod 4). *k*th cycle line.

The simple EDM model approximates the system state with 1 lag of the transformed spawner abundance:

Forecasts of the age 4 and age 5 recruits,

The multivariate models combine spawner data with up to two environmental indicators:

where

Following ref. 16, we use the standard Ricker model to estimate total recruitment and then partition it into age 4 and age 5 fish. Age 4 and age 5 fish from separate brood years are combined to forecast the number of returns:

where *p*_{4} is the average fraction of recruits that return as age 4 fish. The extended Ricker model is similar but includes an additional term in the exponent for an environmental covariate:

### Fitting Procedure and Performance Measures.

To avoid testing all combinations of (and overfitting) the environmental variables in the EDM model, we sequentially add the environmental variable that most improves forecast accuracy (*ρ*, the correlation between observed and predicted values). If none of the variables improves forecasts when added, then no further environmental variables are included. Thus, the best EDM model for some stocks may have only 0 or 1 environmental variable (Table 1). Similarly, for the extended Ricker model, we choose the environmental variable that gives the highest *ρ*.

The Ricker models were fit using R 3.0.2 (www.r-project.org/), the Rjags package (cran.r-project.org/web/packages/rjags/index.html), and JAGS 3.2.0 (Just Another Gibbs Sampler; mcmc-jags.sourceforge.net/) following the procedure outlined in ref. 16. Medians of the posterior distribution are used to obtain point estimates suitable for comparison. The EDM models were constructed using R 3.0.2 and the rEDM package (https://github.com/ha0ye/rEDM). The package can be installed with the following lines of R code:

`> library(devtools)``> install_github(“ha0ye/rEDM”)`

R scripts for the models can be found in Dataset S4. Data files can be found in Datasets S1–S3.

All forecasts are made using a fourfold cross-validation procedure. To quantify model performance, we use Pearson’s correlation coefficient (*ρ*) between observed and predicted returns as a measure of accuracy and MAE as a measure of error. Comparisons of *ρ* between models uses a one-sided *t* test with SE calculated using the HC4 estimator from (47) and with adjusted degrees of freedom as suggested by ref. 48. Improvement in MAE is computed using a one-sided paired *t* test for the difference, treating each forecast as an independent sample. To compute an aggregate statistic combining all nine stocks, we first scale the observations and predictions for each stock so that the observed returns have mean = 0, variance = 1, and then combine the normalized values across stocks. The comparisons of *ρ* and MAE are done using this combined set of observations and predictions (Fig. 4 and *SI Appendix*, Fig. S3).

## Acknowledgments

We thank Michael Fogarty, Terry Beacham, Brad Werner, Ethan Deyle, Charles Perretti, and two anonymous reviewers for their feedback on this work. This research is supported by a National Science Foundation Grant DEB-1020372 (to G.S. and H.Y.), Foundation for the Advancement of Outstanding Scholarship and Ministry of Science and Technology of Taiwan (C.-h.H.), National Science Foundation–National Oceanic and Atmospheric Administration Comparative Analysis of Marine Ecosystem Organization (CAMEO) Program Grant NA08OAR4320894/CAMEO (to G.S.), a National Science Foundation Graduate Research Fellowship (to H.Y.), the Sugihara Family Trust (G.S.), the Deutsche Bank–Jameson Complexity Studies Fund (G.S.), and the McQuown Chair in Natural Science (G.S.).

## Footnotes

- ↵
^{1}To whom correspondence may be addressed. Email: hye{at}ucsd.edu or gsugihara{at}ucsd.edu.

Author contributions: H.Y., S.M.G., C.-h.H., and G.S. designed research; H.Y. performed research; H.Y. analyzed data; and H.Y., R.J.B., S.M.G., S.C.H.G., C.-h.H., L.J.R., J.T.S., and G.S. wrote the paper.

The authors declare no conflict of interest.

This article is a PNAS Direct Submission.

See Commentary on page 3856.

This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1417063112/-/DCSupplemental.

Freely available online through the PNAS open access option.

## References

- ↵.
- Dixon PA,
- Milicich MJ,
- Sugihara G

- ↵.
- Perretti CT,
- Munch SB,
- Sugihara G

- ↵.
- Wood SN,
- Thomas MB

- ↵
- ↵.
- Beamish RJ,
- Schnute JT,
- Cass AJ,
- Neville CM,
- Sweeting RM

- ↵.
- Sugihara G, et al.

- ↵.
- Deyle ER, et al.

- ↵
- ↵
- ↵
- ↵
- ↵
- ↵.
- Peterman RM,
- Dorner B

*Fraser River Sockeye Production Dynamics*(The Cohen Commission of Inquiry into the Decline of Sockeye Salmon in the Fraser River, Vancouver), Technical Report 10 - ↵.
- Cohen BI

*The Uncertain Future of Fraser River Sockeye*(Commission of Inquiry into the Decline of Sockeye Salmon in the Fraser River, Ottawa), Final Report, Vol 2 - ↵
- ↵.
- Grant SCH,
- Michielsens CGJ,
- Porszt EJ,
- Cass A

*Pre-season Run Size Forecasts for Fraser River Sockeye Salmon (Oncorhynchus nerka) in 2010*(Canadian Science Advisory Secretariat, Ottawa), DFO Canadian Science Advisory Secretariat Research Document 2010/042 - ↵.
- MacDonald BL,
- Grant SCH

*Pre-season Run Size Forecasts for Fraser River Sockeye Salmon (Oncorhynchus nerka) in 2012*(Canadian Science Advisory Secretariat, Ottawa), DFO Canadian Science Advisory Secretariat Research Document 2012/011 - ↵.
- Grant S,
- MacDonald B

*Pre-season Run Size Forecasts for Fraser River Sockeye (*Oncorhynchus nerka*) and Pink (*O. gorbuscha*) Salmon in 2013*(Canadian Science Advisory Secretariat, Ottawa), DFO Canadian Science Advisory Secretariat Research Document 145 - ↵
- ↵
- ↵
- ↵.
- Sugihara G

- ↵.
- Takens F

*Dynamical Systems and Turbulence*. Lecture Notes in Mathematics (Springer, Berlin Heidelberg), Vol 898, pp 366–381 - ↵
- ↵
- ↵
- ↵.
- Grant SCH, et al.

*Evaluation of Uncertainty in Fraser Sockeye (*Oncorhynchus nerka*) Wild Salmon Policy Status Using Abundance and Trends in Abundance Metrics*(Canadian Science Advisory Secretariat, Ottawa), DFO Canadian Science Advisory Secretariat Research Document 2011/087 - ↵
- ↵
- ↵
- ↵
- ↵
- ↵.
- Beamish RJ,
- Neville CEM,
- Cass AJ

*Oncorhynchus nerka*) in relation to decadal-scale changes in the climate and the ocean. Can J Fish Aquat Sci 54(3):543–554 - ↵.
- Beacham TD, et al.

- ↵
- ↵
- ↵
- ↵
- ↵
- ↵.
- Grant SCH,
- MacDonald BL

*Pre-season Run Size Forecasts for Fraser River Sockeye (*Oncorhynchus nerka*) and Pink (*O. gorbuscha*) Salmon in 2011*(Canadian Science Advisory Secretariat, Ottawa), DFO Canadian Science Advisory Secretariat Research Document 2011/134 - ↵
- ↵.
- Crutchfield JP,
- McNamara BS

- ↵
- ↵
- ↵
- ↵
- ↵
- ↵

## Citation Manager Formats

### More Articles of This Classification

### Related Content

### Cited by...

- Locally embedded presages of global network bursts
- Data-driven discovery of partial differential equations
- Global environmental drivers of influenza
- Discovering governing equations from data by sparse identification of nonlinear dynamical systems
- Equation-free modeling unravels the behavior of complex ecological systems