Information flow between subspaces of complex dynamical systems
- Department of Mathematics and Center for Atmosphere and Ocean Science, Courant Institute of Mathematical Sciences, New York University, New York, NY 10012
-
Contributed by Andrew J. Majda, April 16, 2007 (received for review March 21, 2007)
Abstract
The quantification of information flow between subspaces in ensemble predictions for complex dynamical systems is an important practical topic, for example, in weather prediction and climate change projections. Although information transfer between dynamical system components is an established concept for nonlinear multivariate time series, the specific nature of the nonlinear dynamics generating the observed flow of information is ignored in such statistical analysis. Here, a general mathematical theory for information flow between subspaces in ensemble predictions of a dynamical system is developed, which accounts for the specific underlying dynamics. The results below also include potentially useful approximation strategies for practical implementation in dynamical systems with many degrees of freedom. Specific elementary examples are developed here with both stable and unstable dynamics to both illustrate facets of the theory and to test Monte Carlo solution strategies.
It has been known for several decades that the practical prediction of complex chaotic dynamical systems such as those occurring in weather and climate requires a statistical framework involving ensembles of solutions (1–3). Sophisticated strategies for statistical ensemble generation and prediction in complex dynamical systems have been developed and applied to weather and climate prediction (4–10).
Recently, concepts from information theory (11, 12) have been introduced as natural tools to quantify the uncertainty in ensemble predictions for complex dynamical systems (13–21). For extended range forecasting or climate change projections, it is important to quantify the flow of information between subspaces of such complex dynamical systems in an ensemble prediction. For example, one might be interested in the impact of the subspace spanned by the atmospheric variables on the subspace spanned by the ocean variables in a climate change prediction or the impact of the subspace spanned by the high frequency synoptic scale eddies on the subspace of low frequency teleconnection patterns in an extended range ensemble forecast for the atmosphere (20). It is anticipated that such quantification of information flow between subspaces of complex dynamical systems will also prove to be useful in other diverse scientific disciplines such as neurology, nanotechnology, material science, and environmental science whenever ensemble predictions for the future state of a system are needed.
Information transfer between dynamical system components is an important established concept for coherence analysis of nonlinear multivariate time series (22–24); however, the specific nature of the nonlinear dynamics generating the observed flow of information is ignored in such statistical analysis. Recently, Liang and Kleeman (25) have put forward a new way to look at information flow between the two components of two-dimensional systems which addresses the change in uncertainty of one component due to the specific dynamic interaction with the second component. Here, their results are systematically generalized to a theory for the information flow between subspaces of complex nonlinear dynamical systems with many degrees of freedom. In addition, several potentially practical estimation strategies for information flow for large dimensional complex nonlinear dynamical systems are developed here (16, 19, 21) as well as simple examples, illustrating various new phenomena. A completely different generalization of the work in ref. 25 is reported by those authors elsewhere (26, 27).
Formulation and Basic Theory
Consider the state variable z for a nonlinear for dynamical system so that z = (x, y) where x ∈ ℝN, y ∈ ℝM are variables spanning two different subspaces. Here we assume that the dynamical system for z can be written as the equations
where σσT = Q ≥ 0 is a nonnegative symmetric M × M covariance matrix and Ẇ is M-dimensional white noise. Below, we consider the information flow in an ensemble prediction from the y variables to the x variables with noise to represent the role of additional unresolved processes on the dynamical system (12). The results below also apply to the important special case without noise, i.e., σ ≡ 0.
An ensemble prediction for Eq. 1 defined through the probability density p(x, y, t) satisfies the Fokker–Planck equation
where “∇x·()” and “∇y·()” denote divergences in x and y subspaces, respectively. Given a probability density, p, the absolute uncertainty in p is given by the entropy (11, 12)
whereas the expected value of a random variable g with respect to p is denoted by 〈g〉p = ∫gp. For a probability density p(x, y), the marginal probability distribution p
1(x) and the conditional distribution p(y|x) are defined as
with ∫ p(y|x)dy = 1.
First, we state an intuitive definition for information flow between subspaces and then we translate this definition into a precise mathematical statement and supply an exact formula for the evolution of this quantity
Definition 1.
For the statistical solution p(x, y, t) satisfying Eq. 2, the information flow at time t from the subspace spanned by y to the subspace spanned by x, T y→x(p), is the time rate of change of the difference in the absolute uncertainty in the x variables due to the dynamic information in the y variable.
The absolute uncertainty in the x variables alone at time t is given by H(p
1) where p
1(x, t) is the marginal distribution of p(x, y, t). To isolate the dynamic information at time t in the y variables, consider the dynamical system related to Eq. 1 where only the x variables evolve nontrivially, i.e., dx/dt = F1, dy/dt = 0; the corresponding Fokker–Planck equation for a probability density p*(x, y, t) for t > τ in this related system with initial density p*(x, y, τ) = p(x, y, τ) is given by
It is easy to see that p*(x, y, t) = p*(x| y, t) p
2(y, τ) where p
2(y, τ) = ∫ p(x, y, τ) dx is the y marginal and also p*(x| y, t) satisfies the same Eq. 5. Thus, p*(x, y, t) for t > τ expresses the ensemble evolution which could occur through the x variables alone with the probability distribution for the y variables frozen at the density p
2(y, τ). The quantity H(p*) for t > τ measures the absolute uncertainty in the x variables due to the x dynamics alone. With these preliminaries, following ref. 25, the information flow at time t from the subspacs y to the subspace x is given by
The following is a general theoretical formula which takes into account the specific nonlinear dynamics in Eq. 1 in the information flow
Theorem 1.
Note that the information flow is a flux into the x variables averaged with respect to the conditional distribution p(y|x, t). The proof of this Theorem is a direct calculation. First, we readily compute
Second, we calculate that
The difference between Eqs. 9 and 8, decomposed through Eq. 4, yields the formula in Eq. 7. Note that, in the final equality Eq. 9, the integration by parts and Eq. 4 have been used.
Next, assume that the nonlinearity, F
1(x, y) has the decomposition
From Eq. 4, it follows that
Thus, the following result is immediate that confirms our intuition about information flow (25):
Proposition 2.
If F1 has the decomposition in Eq. 10, then F11(x) does not contribute to the information flow, Ty→x(p).
An equilibrium distribution p eq(x, y) is a probability density which is a time-independent solution of the Fokker–Planck equation in Eq. 2. We have the following general fact:
Proposition 3.
If peq(x, y) factors into the product distribution, peq(x, y) = peq,1(x)peq,2(y), then there is no information flow at equilibrium, i.e., Ty→x(p) = 0.
Under the assumption in Proposition 3, we have
So, Proposition 3 follows immediately and the intuition is confirmed that if the equilibrium distribution factors into a product in the x and y variables, then there is no information flow.
The next set of identities for information flow are useful for developing practical computational strategies. First, assume
that with a smooth function φ(x, t), the marginal distribution p
1(x, t) has the form
and that the nonlinear function F
1 has the decomposition in Eq. 10. Under these assumptions, we have the following identity for information flow
Proposition 4.
The result in Eq. 12 follows immediately from Proposition 2 and the following calculus identity using Eqs. 11 and 12
There is a formula similar to Eq. 12 under the assumption that the conditional probability distribution satisfies a parallel assumption as in Eq. 11, i.e., there is a smooth function ψ(x, y, t) so that
Under the assumption in Eq. 13, the following formula is valid for the information flow
Proposition 5.
First, one integrates by parts in Eq. 7 and then uses the form in Eq. 13 with a similar calculus identity as developed below Eq. 11 to establish Eq. 14. To illustrate the potential use of Proposition 4, assume that the marginal distribution is Gaussian, i.e., assume
where Γ(t) is the covariance matrix, then from Proposition 4
Prototype Geophysical Models and Information Flow Relative to the Climatology
One of the important potential applications of the above theory is to information flow in ensemble predictions for weather
prediction and climate change projection. As discussed, for example, in refs. 11 and 12, the dynamical core of atmosphere/ocean general circulation models as well as a large family of intermediate complexity models
can be written as quadratically nonlinear dynamical systems,
where L and B satisfy
and thus conserve energy, and D and F represent damping and forcing. Introduce the decomposition, z = (x, y) and assume special random forcing and dissipation in the y variable alone as in Eq. 1 to define the prototype system
Here, d ≥ 0 is dissipation in the y variables. Besides Eq. 17, it is natural to assume (see refs. 11 and 12) that the nonlinear terms satisfy
There are many examples of geophysical systems satisfying Eqs. 17 and 19, including all of the multilayer models in section 1.6 of refs. 11 and 12 as well as important geophysical test models such as the Lorenz-96 model (11, 12, 19, 21), the truncated Burgers-Hopf and the Kruskal-Zabusky models (11, 12, 17). First, ignore the dissipation and forcing in Eq. 18; then it follows from Eqs. 17 and 19 that the system in Eq. 18 has the family of Gaussian invariant measures
which clearly satisfy Proposition 3 above. Furthermore, it was shown in section 6 of ref. 28 that Eq. 20 remains as the invariant measure with appropriate dissipation and white noise forcing.
The relative entropy of a probability density p with respect to the prior density π is given by the nonnegative functional
It measures the average lack of information in π relative to p (see refs. 11 and 12); Kleeman (15) has introduced the relative entropy of an ensemble prediction p, relative to the climatology, π, as a measure of utility in an ensemble prediction and this notion has been developed extensively
(11, 12, 16–18, 20, 21). In the present context, the relative information in the marginal compared with the marginal of the climatology in Eq. 20 is the object of interest,
In the present models, Eqs. 8 and 19 guarantee dH(p*)/dt = 0, whereas
is the amount of the energy which resides in the subspace spanned by x at time t during the ensemble prediction. With Eqs. 2, 17, 18, 19, and 23, one readily calculates the identity
for the rate of energy transfer from the y variables. Using all of the facts below Eq. 22, we have the following identity for information flow relative to the climatology
Proposition 6.
Thus, under the hypothesis of Eq. 18, we have directly related information flow relative to the climatology in the variables x in an ensemble prediction to T
y→x(p) and the rate of energy transfer E
y→x(p) for the ensemble prediction. Clearly, the optimal value of σ in Eq. 20 for comparison with the ensemble simulation is the value with the same mean energy. Proposition 6 relates the information flow, T
y→x(p), to the time rate of change of utility in an ensemble forecast and serves as a starting point for further generalizations.
An Instructive Model Demonstrating Information Flow
Here, we briefly illustrate several facets of information flow in an ensemble prediction for the 2 × 2 system of linear stochastic
equations
With a Gaussian initial data, the above linear equation has a Gaussian solution for all times that is readily computed exactly
by explicit formulas (29). Furthermore, Proposition 4 and Eq. 15 apply for all times so that the information flow in these examples is given by
with Γ12(t) = 〈(x − 〈x〉p)(y − 〈y〉p)〉p the cross-covariance of x and y and Γ11(t) = 〈(x − 〈x〉p)2〉p the variance of x.
Below, the value of the damping coefficient γ is set to one, whereas the noise level is fixed by σ = 0.2; the parameters α
and β are varied systematically so that the mean of the solution of Eq. 26 exhibits complete variety of stable and unstable behavior ranging from a saddle to stable and unstable nodes to stable and
unstable spirals to a Hamiltonian center. In each of these six cases, an ensemble prediction is generated through a 1,000
member ensemble with initial ensemble mean (10, 10) and uncorrelated unity covariance matrix. In the ensemble prediction, the information transfer in Eq. 27 is approximated by the Monte Carlo formula
with μx and μy denote the Monte Carlo ensemble average over variables x and y for ensemble size of K = 1,000, respectively.
The comparison of the exact analytic formulas and the Monte-Carlo ensemble simulation of information transfer for the six cases are depicted in Figs. 1 and 2. Fig. 1 Upper shows the information transfers as functions of time from left to right for a saddle, stable node, and unstable node, whereas Fig. 2 Upper is for a stable spiral, unstable spiral, and Hamiltonian center. Figs. 1 Lower and 2 Lower show the ensemble average of the dynamical variables x and y as functions of time. The overall trend of the information transfer in all of these diverse cases is captured very well by the Monte Carlo simulations; this is even true for the cases with a positive Lyapunov exponent and no invariant measure involving the saddle, unstable node, and unstable spiral even though the mean has grown to the respective values 2 × 109, 2 × 105, and 2 × 102 on the time interval depicted.
Nonoscillatory model. (Left) Solution with fixed point of saddle type. (Center) Stable node. (Right) Unstable node. (Upper) The information flow T y→x as a function of time. (Lower) Ensemble mean states μx(t) in solid and μy(t) in dashes, both, as functions of time.
Oscillatory model. (Left) Solution with fixed point of stable spiral. (Center) Unstable spiral. (Right) Center. (Upper) The information flow T y[arrow]x as a function of time. (Lower) Ensemble mean states μx(t) in solid and μy(t) in dashes, both, as functions of time.
There is no oscillation in the mean in the cases in Fig. 1, whereas there is decaying, amplifying, and neutral oscillation of the mean state, respectively, in Fig. 2, and such oscillatory behavior is clearly reflected in the information transfer T y→x(p). More interestingly, the frequency of the oscillation in the information flow roughly doubles the frequency of the mean state (or signal). This is easy to understand because the random fields x(t) and y(t) from the explicit solution both contain pieces with the same temporal frequency ω; thus, the quadratic covariance from the information transfer in Eq. 27 has a contribution with frequency 2ω, i.e., one half the period. Finally, one should note that in these examples, especially for those with equilibrium density p eq (e.g., the stable node, the stable spiral, and the Hamiltonian center), the information flow never converges to zero as suggested in Proposition 3 because the cross-covariance between x and y is nonzero at equilibrium.
Systematic Approximations for Information Flow
A theory for information flow in ensemble predictions has been outlined above but the practical utility of such ideas rests on the ability to develop practical approximate algorithms for information flow. This is the topic here. Two general methods are proposed below.
The first method builds on the general formalism in Eq. 11 and Proposition 4 for information flow. First, assume that the general marginal distribution, p
1(x,t), is approximated by
with
where φJ(x, t) is chosen to be the least biased probability density which matches the mean and higher moments of p
1(x, t) up to order J. This is the strategy used in refs. 16, 17, and 19–21 for computing relative information in ensemble predictions, and there are efficient algorithms to compute this rapidly provided
x is one or two-dimensional (19–21, 30); for the prototype geophysical models in Eq. 16, the formula in Eq. 12 of Proposition 4 can be evaluated explicitly using only ensemble moments. The simplest version of this algorithm occurs when J = 2 so that φJ(x, t) defines a Gaussian approximation; the needed approximate formula is already listed in general in Eq. 15 and of course, the advantage of the Gaussian approximation is that it applies when x has an arbitrary number of variables. For the truncated Burgers–Hopf model, it has been established (see ref. 17) that, surprisingly, typical ensemble predictions are nearly Gaussian so for this system, such an approximation would be
excellent. Alternatively, the same approximation strategy can be applied to the conditional distribution p(y| x, t) by using suitable similar approximations to ψ in Eq. 13 and Proposition 4.
The second strategy for computing information flow in an ensemble prediction involves asymptotic methods that exploit the
disparity in time scales between the x and y processes for special systems such as
with ε ≪ 1. In this case, one assumes ergodicity and a unique smoothly varying invariant measure, p
eq(y|x), for the reduced system
with τ = t/ε and x regarded as a parameter.
With these and other suitable assumptions, it is not too difficult to establish rigorously (see ref. 31 for examples) that as ε → 0 for any positive time,
so in this setting for ε ≪ 1, the information transfer is given approximately by
The battery of approximate procedures discussed in the previous paragraph can be applied to p
eq(y|x) in the present context. A more attractive computational option would involve using “on the fly” computational procedures
for Eq. 28 (see ref. 32).
Concluding Discussion
A systematic mathematical theory for information transfer between subspaces in ensemble predictions of nonlinear dynamical systems has been developed here including potential practical computational strategies and a simple illustrative test example. Further comprehensive numerical tests are needed to demonstrate the practical utility of calculating information transfer in realistic complex dynamical systems.
Acknowledgments
The research of A.J.M. is partially supported by the National Science Foundation, the Office of Naval Research, and the Defense Advanced Research Projects Agency. J.H. is supported as a postdoctoral fellow through the last two agencies.
Footnotes
- †To whom correspondence should be addressed. E-mail: jonjon{at}cims.nyu.edu
-
Author contributions: A.J.M. designed research; A.J.M. and J.H. performed research; and A.J.M. and J.H. wrote the paper.
-
The authors declare no conflict of interest.
- © 2007 by The National Academy of Sciences of the USA







