# Discrete approach to stochastic parametrization and dimension reduction in nonlinear dynamics


Contributed by Alexandre J. Chorin, June 19, 2015 (sent for review March 28, 2015; reviewed by Xiantao Li and Juan Restrepo)

## Significance

Many problems in science are too large and/or too complex to be fully analyzed by standard methods of computation. We present an approach where such problems are treated statistically, and the statistical analysis and the equations actually solved are discrete rather than continuous. We connect our approach to well-known results in statistical physics, and demonstrate its effectiveness in a widely used test problem.

## Abstract

Many physical systems are described by nonlinear differential equations that are too complicated to solve in full. A natural way to proceed is to divide the variables into those that are of direct interest and those that are not, formulate solvable approximate equations for the variables of greater interest, and use data and statistical methods to account for the impact of the other variables. In the present paper we consider time-dependent problems and introduce a fully discrete solution method, which simplifies both the analysis of the data and the numerical algorithms. The resulting time series are identified by a NARMAX (nonlinear autoregression moving average with exogenous input) representation familiar from engineering practice. The connections with the Mori–Zwanzig formalism of statistical physics are discussed, as well as an application to the Lorenz 96 system.

There are many time-dependent problems in science, where the equations of motion are too complex for full solution, either because the equations are not certain or because the computational cost is too high, but one is interested only in the dynamics of a subset of the variables. Such problems appear, for example, in weather and climate modeling, e.g., refs. 1, 2; in economics, e.g., ref. 3; in statistical mechanics, e.g., refs. 4, 5; and in the mechanics of turbulent flow, e.g., refs. 6, 7. In these settings it is natural to look for simpler equations that involve only the variables of interest, and then use data to assess the effect of the remaining variables on the variables of interest in some approximate way. In the present paper we focus on stochastic methods for doing this, with data coming either from experimental observations or from prior numerical calculations.

Consider a set of differential equations of the form

$$\frac{d}{dt}x = f(x,y), \qquad \frac{d}{dt}y = g(x,y), \tag{1}$$

where *t* is the time, *x* is the vector of resolved variables, assumed here to be observed with negligible observation errors, and *y* is the vector of unresolved variables. One can write

$$\frac{d}{dt}x = R(x) + z(x,y), \tag{2}$$

where *R* is a computable approximation of the *x* dynamics; the remainder $z(x,y) = f(x,y) - R(x)$ represents the effect of the unresolved variables and must be taken into account. It has been variously called “unresolved tendency” (8, 9), “model error” (10, 11), or “model noise” (12); we call it simply the “noise.” In the present paper we do not discuss general methods for constructing *R*; dropping the noise altogether yields the truncated system

$$\frac{d}{dt}x = R(x), \tag{3}$$

which in general is not accurate enough.

A usual approach to the problem of computing *x* is to identify *z* from data (8, 13, 14), i.e., find a concise approximate representation of *z* as a (possibly random) function of *x* that can be evaluated on the computer, and then solve the equation

$$\frac{d}{dt}x = R(x) + z(x). \tag{4}$$

Eq. **4** is an instance of a dimension-reduced equation, in the sense that it has fewer variables than Eq. **1**. However, this approach has some difficulties. In general the available data are measurements of *x*, not of *z*; to find *z* so that it can be identified one has to use Eq. **2**, and in particular differentiate *x* numerically, which is generally impractical or inaccurate because *z* may have high-frequency components or fail to be sufficiently smooth, and because the data may not be available at sufficiently small time intervals (an illuminating analysis in a special case can be found in refs. 15, 16). If one can successfully identify *z*, Eq. **4** generally becomes a nonlinear stochastic differential system, whose numerical solution requires further approximations (17, 18), and where in general *x* and *z* are not independent.

Here we take a different approach. We note that Eqs. **3** and **4** are always solved on the computer, i.e., in discrete form, that the data are always given at a discrete collection of points, and that one wishes to determine *x* but in general one is not interested in determining *z*. We can therefore avoid the difficult detour through a continuous *z* followed by a discretization, by working entirely in a discrete setting as follows. We can pick once and for all a particular accurate discretization of Eq. **3** with a particular time step *δ*,

$$x^{n+1} = x^{n} + \delta R_{\delta}(x^{n}),$$

where *n* indexes the result after *n* steps and $R_\delta$ incorporates the numerical scheme. As in the continuous case, this equation has to be corrected to account for the impact of the unresolved variables, and here also for the possible inaccuracy of the numerical scheme. We use the data to identify the discrepancy sequence $z_{\delta}^{n} = \left(x^{n+1} - x^{n} - \delta R_{\delta}(x^{n})\right)/\delta$, which fits the *x* data without approximation. This is equivalent to identifying the following reduced system:

$$x^{n+1} = x^{n} + \delta R_{\delta}(x^{n}) + \delta z_{\delta}^{n}. \tag{5}$$
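The identification of the discrepancy sequence is purely algebraic and can be sketched in a few lines of code. In the minimal illustration below, the one-step scheme `R_delta` and the observed values are hypothetical stand-ins, not the paper's actual setup; the point is only that adding the identified discrepancies back into the scheme reproduces the data exactly.

```python
import numpy as np

def discrepancy_sequence(x, delta, R_delta):
    """Given observations x[0..N] at spacing delta and a one-step scheme
    x^{n+1} ~ x^n + delta * R_delta(x^n), return the discrepancy sequence
    z^n = (x^{n+1} - x^n - delta * R_delta(x^n)) / delta."""
    x = np.asarray(x, dtype=float)
    z = np.empty_like(x[:-1])
    for n in range(len(x) - 1):
        z[n] = (x[n + 1] - x[n] - delta * R_delta(x[n])) / delta
    return z

# Hypothetical scalar example with scheme R_delta(x) = -x:
x_obs = np.array([1.0, 0.9, 0.82, 0.75])
z = discrepancy_sequence(x_obs, 0.1, lambda x: -x)

# Marching the corrected scheme forward recovers the data without
# approximation, as stated in the text:
x_rec = [x_obs[0]]
for n in range(len(x_obs) - 1):
    x_rec.append(x_rec[-1] + 0.1 * (-x_rec[-1]) + 0.1 * z[n])
```

By construction this identification involves no numerical differentiation of *x*, in contrast to the continuous-time approach.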

We assume, as one does in the continuous case, that the system under consideration is ergodic, so that its long-time statistics are stationary. The sequence $\{z_{\delta}^{n}\}$ is then a stationary time series, which we represent with the resolved variables *x* as an exogenous input. This representation makes it possible to integrate the numerical scheme into the reduced equations, and to take into account efficiently the non-Markovian features of the reduced system as well as model and numerical errors. There is no stochastic differential system to solve. It is important to note that identifying $z_{\delta}^{n}$ is not the same as identifying the continuous noise *z*. The question, in what sense does $z_{\delta}^{n}$ approximate *z*, is not relevant, because the goal is to calculate *x*, and $z_{\delta}^{n}$ would converge to *z* only if Eq. **3** could be effectively approximated by a first-order Euler scheme. For practical purposes, the discrete stochastic parametrization accomplishes everything that would be accomplished by a successful continuous parametrization followed by an accurate approximation.

## NARMAX Representation

We represent the discrete-time process $\{z_{\delta}^{n}\}$ of Eq. [**5**] as a time series in the form of a variant of the NARMAX representation (19–22). The generality of the NARMAX approach will be increasingly important as model reduction methods are applied to more complex problems. To simplify notation, from now on we drop the subscript *δ* in $z_{\delta}$.

Eq. **5** becomes

$$x^{n+1} = x^{n} + \delta R_{\delta}(x^{n}) + \delta z^{n}, \qquad z^{n} = \Phi^{n} + \xi^{n}, \tag{6}$$

where the $\xi^{n}$ are independent random variables and

$$\Phi^{n} = \mu + \sum_{j=1}^{p} a_{j}\, z^{n-j} + \sum_{j=1}^{r} b_{j}\, Q_{j}(x^{n}, x^{n-1}, \dots) + \sum_{j=1}^{q} c_{j}\, \xi^{n-j}, \tag{7}$$

where the $Q_j$ are given functions of current and past values of *x*; Eq. [**7**] has been written as if Eq. **6** were scalar. This is the NARMAX representation of *z*, with *x* as the exogenous input. In Eq. **7**, the terms in *z* are the autoregression part of order *p*, the terms in *ξ* are the moving average part of order *q*. Note that if we substitute [**7**] and the second equation of [**6**] into the first equation of [**6**], we obtain a NARMAX representation for *x*. A suitable choice of the functions $Q_j$ makes it possible to build into the representation the numerical scheme for Eq. **3**.

To implement the NARMAX representation, one has to determine its structure and estimate the parameters. Although NARMAX has been widely studied (see, e.g., refs. 19, 23–26 and references therein), one should use the earlier work with caution, especially in the detection of structure by least-squares–based methods, because in the standard NARMAX, unlike here, the exogenous process is independent of the noise process. Suppose one has selected a structure, that is, chosen the orders *p*, *q*, *r* and the functions $Q_j$ in [**7**]. Because the representation is linear in the parameters *μ*, $a_j$, $b_j$, $c_j$, these parameters can be estimated by maximizing a Gaussian likelihood, with the noise sequence $\{\xi^{n}\}$ computed recursively from the data through [**6**]. Once the values of the parameters are available, the reduced system [**6**] can be integrated forward in time.

We remark that for the above Gaussian likelihood, the MLE is the same as the least-squares estimator for the parameters, which has been widely used in control (23, 24). As shown in ref. 23, the above estimation procedure can be made recursive; also, the noise sequence need not be Gaussian.
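Because the representation is linear in the parameters, the estimation reduces to a linear least-squares problem. The sketch below illustrates this on a toy model with autoregression order *p* = 1, a single exogenous term, and no moving-average part, so the least-squares estimator applies directly; the model and its coefficient values are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data from a known toy model (p = 1, exogenous term Q(x) = x,
# no moving-average part):  z^n = mu + a z^{n-1} + b x^n + noise
mu, a, b = 0.5, 0.3, -0.2
N = 5000
x = rng.standard_normal(N)
z = np.zeros(N)
for n in range(1, N):
    z[n] = mu + a * z[n - 1] + b * x[n] + 0.1 * rng.standard_normal()

# Linear-in-parameters structure: each row of the design matrix holds the
# regressors (1, z^{n-1}, x^n); the Gaussian MLE is the least-squares fit.
design = np.column_stack([np.ones(N - 1), z[:-1], x[1:]])
params, *_ = np.linalg.lstsq(design, z[1:], rcond=None)
```

With a moving-average part (*q* > 0) the residuals $\xi^n$ enter the regressors and must be reconstructed recursively during the fit, as in the recursive procedure of ref. 23; the simple one-shot solve above is the *q* = 0 special case.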

The detection of the representation’s structure, however, is less straightforward, as is well known (19, 21). Because in our problem the exogenous processes are not independent of the noise, popular techniques such as orthogonal least squares and error reduction ratios (see, e.g., ref. 19 and references therein) do not work. We use the following criteria for selecting the structure of the representation: (*i*) it should fit the long-term statistics of the resolved variables, such as the mean and autocorrelation function; (*ii*) as the size of the data increases, the estimated parameters should converge; and (*iii*) the estimated parameters should reflect features of the resolved variables, such as symmetry properties.

It is of interest to relate the NARMAX representation to the Mori–Zwanzig (MZ) formalism (4, 5, 27, 28). The overall setting in the MZ representation is the same as here: one has initial data *α* for the *x* variables, and one samples initial data *β* for *y* from a given initial probability density function (pdf). The MZ formalism creates the approximation *R* of Eq. **3** by conditional expectation:

$$R(\alpha) = \mathbb{E}\left[\, f(x,y) \mid x = \alpha \,\right],$$

where $\mathbb{E}[a \mid b]$ denotes the average of *a* with respect to the initial measure given *b*. When the system is ergodic and the initial pdf for *β* is the invariant measure conditioned by *α*, this is the best least-squares approximation of the right-hand side of the equation for *x*. The MZ formalism then yields an expression for the noise *z* of Eq. **2** as a sum of a fluctuating term and a non-Markovian memory/dissipation term, corresponding to the moving-average and autoregression parts of [**7**]; note that the approximation [**3**] is Markovian, so that the representation of *z* has to take the memory into account by including information from earlier steps.

Once the initial data *β* have been sampled, the noise and memory terms are determined; their statistics can be estimated by repeated sampling of *β*.

The evaluation of the various terms in the MZ formalism is difficult; as far as we know, there is only one case in the literature (29) where it has been successfully carried out in full for a nonlinear problem that is not completely trivial. The MZ formalism is a useful starting point for analytic approximations (29, 30), but it is difficult to use it to construct reduced models from data when suitable analytic approximations are not available. The formalism here is a more tractable way to use data for dimension reduction, and generalizes the MZ formalism to a broader class of approximations.

There is a large literature on data-based dimension reduction. In refs. 8, 31, the noise *z* is represented as the sum of an approximating polynomial in *x* obtained by regression and a one-step autoregression; the details are in the next section where we compare our results to those in refs. 8, 31. The shortcomings of this representation as a general tool are that it does not allow *z* to remember past values of *x*, and that the autoregression term is not necessarily small, making it difficult to solve the equations accurately. Furthermore, in refs. 8, 31, numerical values of a continuous *z* are obtained by finite differences. In refs. 9, 14, the noise is represented by a conditional Markov chain that depends on both current and past values of *x*; the Markov chain is deduced from data by binning and counting, assuming that exact observations of *z* are available. It should be noted that the Markov chain representation is intrinsically discrete, making this work close to ours in spirit. In ref. 32 the noise is treated as continuous and represented in a form that is partly analogous to the NARMAX representation once one translates from the continuum to the grid. An earlier construction of a reduced approximation can be found in ref. 12, where the approach was not yet fully discrete. Other interesting related work can be found in refs. 33–36.

## Lorenz 96 Equations

The Lorenz 96 equations (37) are a set of chaotic differential equations that is often used as a metaphor for the atmosphere. They have been widely used as a test bench for various dimension reduction and stochastic parametrization methods (8, 9, 14, 31, 38). Following ref. 14, we use the following formulation of the equations:

$$\frac{dx_k}{dt} = x_{k-1}\left(x_{k+1} - x_{k-2}\right) - x_k + F + z_k, \qquad z_k = \frac{h_x}{J} \sum_{j=1}^{J} y_{j,k},$$

$$\frac{dy_{j,k}}{dt} = \frac{1}{\varepsilon}\left[\, y_{j+1,k}\left(y_{j-1,k} - y_{j+2,k}\right) - y_{j,k} + h_y x_k \,\right], \tag{8}$$

with $k = 1, \dots, K$, $j = 1, \dots, J$, and periodic boundary conditions $x_{k+K} = x_k$, $y_{j,k+K} = y_{j,k}$, $y_{j+J,k} = y_{j,k+1}$. The parameter *ε* measures the time-scale separation between the resolved variables $x_k$ and the unresolved variables $y_{j,k}$.
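A sketch of the right-hand side of the two-scale Lorenz 96 system in Python may be helpful; the periodic conventions treat the fast variables as a single ring of length *JK*. The function name and the parameter values in the example below are illustrative assumptions, not the values used in the paper.

```python
import numpy as np

def l96_full_rhs(x, y, F, hx, hy, eps):
    """Right-hand side of the two-scale Lorenz 96 system.
    x: shape (K,) resolved variables; y: shape (J, K) fast variables.
    The coupling term (hx/J) * sum_j y[j, k] plays the role of the
    noise z_k in the resolved equation for x_k."""
    K, J = x.size, y.shape[0]
    dx = (np.roll(x, 1) * (np.roll(x, -1) - np.roll(x, 2))
          - x + F + (hx / J) * y.sum(axis=0))
    # Fast variables form one periodic ring: y_{J+1,k} = y_{1,k+1}.
    # Column-major flattening makes consecutive ring indices adjacent.
    yf = y.reshape(-1, order='F')
    dyf = (np.roll(yf, -1) * (np.roll(yf, 1) - np.roll(yf, -2))
           - yf + hy * np.repeat(x, J)) / eps
    return dx, dyf.reshape(J, K, order='F')

# Illustrative call with hypothetical parameters (K=6, J=4):
dx, dy = l96_full_rhs(np.full(6, 10.0), np.zeros((4, 6)), 10.0, -0.8, 1.0, 0.5)
```

A quick consistency check: with $y \equiv 0$ and $x_k \equiv F$, the resolved tendency vanishes, since the quadratic term is zero and $-x_k + F = 0$.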

In the following section we present numerical results produced by our NARMAX scheme and compare them to those in ref. 8, labeled POLYAR (polynomial regression and autoregression). We do not compare with the results of ref. 14 because they require exact observations of *z*. In ref. 8, the continuous *z* is estimated by finite differences:

$$z^{n} \approx \frac{x^{n+1} - x^{n}}{\delta} - R(x^{n}), \tag{9}$$

*z* is then represented as the sum of a polynomial in *x* and a one-step autoregression fitted to the estimates [**9**], and the resulting reduced version of Eq. [**8**] is solved by a fourth-order Runge–Kutta method.

In the NARMAX scheme, we use the representation [**7**], choosing one of the functions $Q_j$ to be the one-step map $R_\delta$ of the discretized Eq. [**3**] and the others to be powers of *x*. This connects the numerical scheme with the representation of the noise. We select the structure and estimate the parameters as described earlier. The parameters are the same for each spatial component, reflecting the spatial symmetry of the equation. Each component of *z* remembers only its own past and the past of the corresponding component of *x*. With these choices, the term $\Phi^n$ in Eq. **7** becomes, for each component, a low-order polynomial in the current resolved variable together with the scheme term and one-step autoregression and moving-average terms; we refer to the resulting representation as Eq. [**10**].

## Numerical Results

In the numerical runs, we generate data for *x* by integrating the full Lorenz 96 system numerically and sampling the resolved variables at a fixed spacing *δ*; we consider both a densely observed setting and a sparsely observed setting.

For the POLYAR equations of [**8**], a fifth-order polynomial is used for both observation settings. Increasing the order further produces small coefficients for the higher-degree terms, which do not reduce the variance of the noise. The estimated parameters, i.e., the coefficients of the polynomial and the parameters of the autoregression, for the first component of *x* are shown in Table 1.
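The POLYAR-type fit can be sketched as a two-step least-squares procedure: a polynomial regression of the noise estimates on *x*, followed by a one-step autoregression on the residuals. The synthetic data and the coefficient values below are invented for the illustration, not taken from Table 1.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic (x, z) pairs; in the actual method z comes from finite
# differences of the resolved variables. Here z is a cubic in x plus
# AR(1) residual noise, all coefficients invented for the example.
N = 4000
x = rng.uniform(-5, 5, N)
r = np.zeros(N)
for n in range(1, N):
    r[n] = 0.7 * r[n - 1] + 0.3 * rng.standard_normal()
z = 1.0 + 0.5 * x - 0.02 * x**3 + r

# Step 1: fit a fifth-order polynomial z ~ P(x) by least squares.
coeffs = np.polyfit(x, z, deg=5)
resid = z - np.polyval(coeffs, x)

# Step 2: one-step autoregression on the residuals,
# r^n ~ phi * r^{n-1} + sigma * eta^n.
phi = np.dot(resid[1:], resid[:-1]) / np.dot(resid[:-1], resid[:-1])
sigma = np.std(resid[1:] - phi * resid[:-1])
```

The two-step structure makes the limitation noted earlier concrete: the polynomial part depends only on the current *x*, and the residual autoregression carries no dependence on past values of *x*.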

For the NARMAX equation [**10**], we estimated the parameters as described above; the estimated values are shown in Table 2.

First, we compare the statistics of the solutions generated by the two reduced systems with the statistics of the full system. We integrate the reduced equations in both cases and compare the resulting empirical probability densities of the resolved variables with that of the full system; in the densely observed case both reduced systems reproduce the density well (Fig. 1, *Top Left*). In the sparser data case the NARMAX system remains accurate while the POLYAR system is visibly less so (Fig. 1, *Bottom Left*). Table 3 displays the mean, the SD, and the Kolmogorov–Smirnov statistic that compare the cumulative distributions of the full Lorenz 96 system with those of the reduced equations. The NARMAX system is more accurate than the POLYAR system in both cases. The autocorrelation function (ACF) and cross-correlation function (CCF) are well reproduced by both reduced systems when the data are dense (Fig. 1, *Middle* and *Top Right*). When the data are sparse, the NARMAX system continues to reproduce them well, whereas the POLYAR system does not (Fig. 1, *Middle* and *Bottom Right*).

We now investigate how well a reduced system predicts the behavior of the full system by calculating mean trajectories of the reduced systems and comparing them with a true trajectory of the full Lorenz 96 system, as follows. First we integrate the full Lorenz 96 system, divide the resulting trajectory into subintervals, and from the true state at the start of each subinterval generate an ensemble of trajectories of each reduced system by sampling the noise, *η* in POLYAR and *ξ* in NARMAX. We do not introduce artificial perturbations into the initial conditions, because the exact initial conditions for *x* are known, and by initializing *η* and *ξ* from data, we preserve the memory of the system so as to generate better ensemble trajectories. We then average the solutions of the reduced equations in each subinterval and compare these averages with the trajectories of the full system by calculating the root-mean-square error (RMSE) and anomaly correlation (ANCR) between them; the former measures the average difference between trajectories, whereas the latter measures the average correlation between them; the formulas and their rationale can be found in ref. 14.
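The two scores can be computed as follows. This sketch uses the standard definitions (see ref. 14 for the exact formulas used in the comparison), with the climatological mean as the reference state for the anomalies; the function names are ours.

```python
import numpy as np

def rmse(forecast, truth):
    """Root-mean-square error over all components and times."""
    return np.sqrt(np.mean((forecast - truth) ** 2))

def anomaly_correlation(forecast, truth, climatology):
    """Correlation between forecast and true anomalies, i.e.,
    deviations from the climatological mean state."""
    fa = forecast - climatology
    ta = truth - climatology
    return np.sum(fa * ta) / np.sqrt(np.sum(fa ** 2) * np.sum(ta ** 2))
```

RMSE penalizes amplitude errors, while the anomaly correlation is insensitive to amplitude and rewards forecasts whose departures from climatology have the right pattern and sign.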

Results for RMSE and ANCR are shown in Fig. 2. The NARMAX system predicts the full system more accurately than the POLYAR system, most visibly in the sparsely observed case.

## Conclusions

We have presented a discrete approach to data-based dimension reduction and stochastic parametrization in which the problem is consistently treated as discrete, obviating earlier difficulties in estimating noise from measurements and in approximating reduced continuum equations. Within this discrete approach, we have identified the reduced system via the NARMAX representation. This generalizes earlier work, in particular by making it easier to include memory effects in full. We have tested the resulting algorithm on the Lorenz 96 system of equations, often used as a test bench for dimension reduction schemes; our construction did better than earlier work in reproducing the dynamics and the long-range statistics of the variables of interest, most conspicuously in a problem where the data were sparse. We related our representation to the MZ formalism and suggested that our methods can be used to construct simpler data-based analogs of this formalism. We expect the advantages of our modeling to become even more marked as it is applied to increasingly complex problems.

## Acknowledgments

The authors thank the referees, as well as Prof. Jonathan Goodman, for reading the manuscript carefully and for their helpful suggestions, and Dr. Matthias Morzfeld, Prof. Kevin Lin, Prof. Xuemin Tu, and Prof. Robert Miller for helpful comments and good advice. This work was supported in part by the Director, Office of Science, Computational and Technology Research, US Department of Energy, under Contract DE-AC02-05CH11231, and by the National Science Foundation under Grants DMS-1217065 and DMS-1419044.

## Footnotes

^1To whom correspondence should be addressed. Email: chorin{at}math.berkeley.edu.

Author contributions: A.J.C. and F.L. designed research, performed research, contributed new analytic tools, analyzed data, and wrote the paper.

Reviewers: X.L., Pennsylvania State University; and J.R., Oregon State University.

The authors declare no conflict of interest.

## References

1. Kalnay E
2. Wilks DS
3. Zeng Y, Wu S
4. Evans D, Morris G
5. Chorin AJ, Hald OH
6. Bernard P, Wallace J
7. Chorin AJ
8.
9. Kwasniok F
10. Harlim J
11. Berry T, Harlim J
12.
13.
14.
15.
16.
17. Kloeden PE, Platen E
18. Milstein GN, Tretyakov MV
19. Billings SA
20. Brockwell P, Davis R
21. Fan J, Yao Q
22. Hamilton JD
23.
24.
25.
26.
27.
28. Zwanzig R
29.
30.
31. Arnold HM, Moroz IM, Palmer TN
32.
33. Chekroun MD, Kondrashov D, Ghil M
34.
35.
36.
37. Lorenz EN, *Proc ECMWF Seminar on Predictability* (European Centre for Medium-Range Weather Forecasts, Shinfield Park, Reading, Berkshire, UK), Vol 1, pp 1–18
38.

## Article Classifications

- Physical Sciences
- Applied Mathematics