Information flow between subspaces of complex dynamical systems

Contributed by Andrew J. Majda, April 16, 2007 (received for review March 21, 2007)
Abstract
The quantification of information flow between subspaces in ensemble predictions for complex dynamical systems is an important practical topic, for example, in weather prediction and climate change projections. Although information transfer between dynamical system components is an established concept for nonlinear multivariate time series, the specific nature of the nonlinear dynamics generating the observed flow of information is ignored in such statistical analysis. Here, a general mathematical theory for information flow between subspaces in ensemble predictions of a dynamical system is developed, which accounts for the specific underlying dynamics. The results below also include potentially useful approximation strategies for practical implementation in dynamical systems with many degrees of freedom. Specific elementary examples are developed here with both stable and unstable dynamics, both to illustrate facets of the theory and to test Monte Carlo solution strategies.
It has been known for several decades that the practical prediction of complex chaotic dynamical systems such as those occurring in weather and climate requires a statistical framework involving ensembles of solutions (1–3). Sophisticated strategies for statistical ensemble generation and prediction in complex dynamical systems have been developed and applied to weather and climate prediction (4–10).
Recently, concepts from information theory (11, 12) have been introduced as natural tools to quantify the uncertainty in ensemble predictions for complex dynamical systems (13–21). For extended range forecasting or climate change projections, it is important to quantify the flow of information between subspaces of such complex dynamical systems in an ensemble prediction. For example, one might be interested in the impact of the subspace spanned by the atmospheric variables on the subspace spanned by the ocean variables in a climate change prediction or the impact of the subspace spanned by the high frequency synoptic scale eddies on the subspace of low frequency teleconnection patterns in an extended range ensemble forecast for the atmosphere (20). It is anticipated that such quantification of information flow between subspaces of complex dynamical systems will also prove to be useful in other diverse scientific disciplines such as neurology, nanotechnology, material science, and environmental science whenever ensemble predictions for the future state of a system are needed.
Information transfer between dynamical system components is an important established concept for coherence analysis of nonlinear multivariate time series (22–24); however, the specific nature of the nonlinear dynamics generating the observed flow of information is ignored in such statistical analysis. Recently, Liang and Kleeman (25) have put forward a new way to look at information flow between the two components of two-dimensional systems, which addresses the change in uncertainty of one component due to the specific dynamic interaction with the second component. Here, their results are systematically generalized to a theory for the information flow between subspaces of complex nonlinear dynamical systems with many degrees of freedom. In addition, several potentially practical estimation strategies for information flow in large dimensional complex nonlinear dynamical systems are developed here (16, 19, 21), as well as simple examples illustrating various new phenomena. A completely different generalization of the work in ref. 25 is reported by those authors elsewhere (26, 27).
Formulation and Basic Theory
Consider the state variable z of a nonlinear dynamical system, written z = (x, y), where x ∈ ℝ^{N} and y ∈ ℝ^{M} are variables spanning two different subspaces. Here we assume that the dynamical system for z can be written as the equations in Eq. 1, where σσ^{T} = Q ≥ 0 is a nonnegative symmetric M × M covariance matrix and Ẇ is M-dimensional white noise. Below, we consider the information flow in an ensemble prediction from the y variables to the x variables, with noise representing the role of additional unresolved processes on the dynamical system (12). The results below also apply to the important special case without noise, i.e., σ ≡ 0.
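A form of Eq. 1 consistent with the surrounding description, with F_2 our assumed label for the drift of the y equation, is

```latex
\frac{dx}{dt} \;=\; F_1(x, y), \qquad
\frac{dy}{dt} \;=\; F_2(x, y) \;+\; \sigma(x, y)\,\dot{W},
```

so that the noise, with covariance σσ^{T} = Q, acts only in the y subspace.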
An ensemble prediction for Eq. 1 defined through the probability density p(x, y, t) satisfies the Fokker–Planck equation in Eq. 2, where "∇_{x}·()" and "∇_{y}·()" denote divergences in the x and y subspaces, respectively. Given a probability density p, the absolute uncertainty in p is given by the entropy (11, 12), whereas the expected value of a random variable g with respect to p is denoted by 〈g〉_{p} = ∫gp. For a probability density p(x, y), the marginal probability distribution p_{1}(x) and the conditional distribution p(y|x) are defined as in Eq. 4, with ∫ p(y|x)dy = 1.
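As a concrete check on these definitions, the entropy, marginal, and conditional distributions can be formed on a grid; the density below is an illustrative two-dimensional Gaussian of our own choosing, not a model from the text:

```python
import numpy as np

dx = dy = 0.05
xs = np.arange(-5.0, 5.0, dx)
ys = np.arange(-5.0, 5.0, dy)
X, Y = np.meshgrid(xs, ys, indexing="ij")

# Illustrative density: a standard bivariate Gaussian sampled on the grid.
p = np.exp(-0.5 * (X**2 + Y**2)) / (2 * np.pi)

H = -np.sum(p * np.log(p)) * dx * dy     # entropy H(p) = -int p log p
p1 = p.sum(axis=1) * dy                  # marginal p1(x) = int p dy
p_cond = p / p1[:, None]                 # conditional p(y|x) = p(x, y) / p1(x)
print(H)                                 # analytically, H = 1 + log(2*pi)
```

Each conditional slice integrates to one, as the normalization ∫ p(y|x)dy = 1 requires.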
First, we state an intuitive definition of information flow between subspaces; then we translate this definition into a precise mathematical statement and supply an exact formula for the evolution of this quantity.
Definition 1.
For the statistical solution p(x, y, t) satisfying Eq. 2 , the information flow at time t from the subspace spanned by y to the subspace spanned by x, T _{y→x}(p), is the time rate of change of the difference in the absolute uncertainty in the x variables due to the dynamic information in the y variable.
The absolute uncertainty in the x variables alone at time t is given by H(p_{1}), where p_{1}(x, t) is the marginal distribution of p(x, y, t). To isolate the dynamic information at time t in the y variables, consider the dynamical system related to Eq. 1 where only the x variables evolve nontrivially, i.e., dx/dt = F_{1}, dy/dt = 0; the corresponding Fokker–Planck equation for a probability density p*(x, y, t) for t > τ in this related system, with initial density p*(x, y, τ) = p(x, y, τ), is given by Eq. 5. It is easy to see that p*(x, y, t) = p*(x|y, t) p_{2}(y, τ), where p_{2}(y, τ) = ∫ p(x, y, τ) dx is the y marginal, and also that p*(x|y, t) satisfies the same Eq. 5. Thus, p*(x, y, t) for t > τ expresses the ensemble evolution which could occur through the x variables alone, with the probability distribution for the y variables frozen at the density p_{2}(y, τ). The quantity H(p*) for t > τ measures the absolute uncertainty in the x variables due to the x dynamics alone. With these preliminaries, following ref. 25, the information flow at time t from the subspace y to the subspace x is given by
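In symbols, Definition 1 can be transcribed as follows, with p_1 the x-marginal of p and p* as constructed above (this matches the two-component formula of ref. 25):

```latex
T_{y \to x}(p)\Big|_{t=\tau}
  \;=\; \frac{d}{dt} H(p_1)\Big|_{t=\tau}
  \;-\; \frac{d}{dt} H(p^{*})\Big|_{t=\tau}.
```

The second term removes the rate of uncertainty change attributable to the x dynamics alone, so what remains is the flow attributable to the y variables.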
The following is a general theoretical formula for the information flow, which takes into account the specific nonlinear dynamics in Eq. 1:
Theorem 1.
Note that the information flow is a flux into the x variables averaged with respect to the conditional distribution p(y|x, t). The proof of this theorem is a direct calculation. First, we readily compute Eq. 8; second, we calculate Eq. 9. The difference between Eqs. 9 and 8, decomposed through Eq. 4, yields the formula in Eq. 7. Note that, in the final equality in Eq. 9, integration by parts and Eq. 4 have been used.
Next, assume that the nonlinearity F_{1}(x, y) has the decomposition in Eq. 10. From Eq. 4, the following result is immediate and confirms our intuition about information flow (25):
Proposition 2.
If F_{1} has the decomposition in Eq. 10 , then F_{11}(x) does not contribute to the information flow, T_{y→x}(p).
An equilibrium distribution p_{eq}(x, y) is a probability density which is a time-independent solution of the Fokker–Planck equation in Eq. 2. We have the following general fact:
Proposition 3.
If p_{eq}(x, y) factors into the product distribution, p_{eq}(x, y) = p_{eq,1}(x)p_{eq,2}(y), then there is no information flow at equilibrium, i.e., T_{y→x}(p) = 0.
Under the assumption in Proposition 3, the conditional distribution satisfies p_{eq}(y|x) = p_{eq,2}(y). So, Proposition 3 follows immediately, and the intuition is confirmed that if the equilibrium distribution factors into a product in the x and y variables, then there is no information flow.
The next set of identities for information flow is useful for developing practical computational strategies. First, assume that, for a smooth function φ(x, t), the marginal distribution p_{1}(x, t) has the form in Eq. 11, and that the nonlinear function F_{1} has the decomposition in Eq. 10. Under these assumptions, we have the following identity for information flow:
Proposition 4.
The result in Eq. 12 follows immediately from Proposition 2 and the following calculus identity using Eqs. 11 and 12
There is a formula similar to Eq. 12 under a parallel assumption on the conditional probability distribution, i.e., that there is a smooth function ψ(x, y, t) with the form in Eq. 13. Under the assumption in Eq. 13, the following formula is valid for the information flow:
Proposition 5.
First, one integrates by parts in Eq. 7 and then uses the form in Eq. 13, with a calculus identity similar to the one developed below Eq. 11, to establish Eq. 14. To illustrate the potential use of Proposition 4, assume that the marginal distribution is Gaussian with covariance matrix Γ(t); then Eq. 15 follows from Proposition 4.
Prototype Geophysical Models and Information Flow Relative to the Climatology
One of the important potential applications of the above theory is to information flow in ensemble predictions for weather prediction and climate change projection. As discussed, for example, in refs. 11 and 12, the dynamical core of atmosphere/ocean general circulation models, as well as a large family of intermediate complexity models, can be written as the quadratically nonlinear dynamical systems in Eq. 16, where L and B satisfy Eq. 17 and thus conserve energy, and D and F represent damping and forcing. Introduce the decomposition z = (x, y) and assume special random forcing and dissipation in the y variables alone, as in Eq. 1, to define the prototype system in Eq. 18. Here, d ≥ 0 is dissipation in the y variables. Besides Eq. 17, it is natural to assume (see refs. 11 and 12) that the nonlinear terms satisfy Eq. 19.
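Schematically, and only as a sketch consistent with refs. 11 and 12 rather than a verbatim reproduction of Eqs. 16 and 17, such quadratically nonlinear systems have the form

```latex
\frac{dz}{dt} \;=\; L z \;+\; B(z, z) \;+\; D z \;+\; F,
\qquad z \cdot L z \;=\; 0, \qquad z \cdot B(z, z) \;=\; 0,
```

so that L and B conserve the energy |z|^{2}/2, while D z and F represent damping and forcing.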
There are many examples of geophysical systems satisfying Eqs. 17 and 19, including all of the multilayer models in section 1.6 of refs. 11 and 12 as well as important geophysical test models such as the Lorenz-96 model (11, 12, 19, 21) and the truncated Burgers–Hopf and Kruskal–Zabusky models (11, 12, 17). First, ignore the dissipation and forcing in Eq. 18; then it follows from Eqs. 17 and 19 that the system in Eq. 18 has the family of Gaussian invariant measures in Eq. 20, which clearly satisfy Proposition 3 above. Furthermore, it was shown in section 6 of ref. 28 that Eq. 20 remains the invariant measure with appropriate dissipation and white noise forcing.
The relative entropy of a probability density p with respect to the prior density π is given by the nonnegative functional in Eq. 21. It measures the average lack of information in π relative to p (see refs. 11 and 12); Kleeman (15) has introduced the relative entropy of an ensemble prediction p relative to the climatology π as a measure of utility in an ensemble prediction, and this notion has been developed extensively (11, 12, 16–18, 20, 21). In the present context, the object of interest is the relative information in the marginal compared with the marginal of the climatology in Eq. 20, as in Eq. 22. In the present models, Eqs. 8 and 19 guarantee dH(p*)/dt = 0, whereas Eq. 23 defines the amount of energy which resides in the subspace spanned by x at time t during the ensemble prediction. With Eqs. 2, 17, 18, 19, and 23, one readily calculates the identity in Eq. 24 for the rate of energy transfer from the y variables. Using all of the facts below Eq. 22, we have the following identity for information flow relative to the climatology:
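The relative entropy of two densities can be evaluated directly by quadrature; the following sketch checks a one-dimensional Gaussian case against the standard closed form (the particular means and variances are our illustrative assumptions, not values from the text):

```python
import numpy as np

# Relative entropy P(p, pi) = int p log(p/pi) of a Gaussian ensemble marginal
# p = N(m, s^2) with respect to a Gaussian "climatology" pi = N(0, sigma^2).
m, s, sigma = 1.0, 0.5, 1.0

u = np.linspace(-8.0, 8.0, 20001)
du = u[1] - u[0]
p = np.exp(-0.5 * ((u - m) / s) ** 2) / (s * np.sqrt(2 * np.pi))
# Form log p - log pi analytically so that underflow in the tails is harmless.
log_ratio = (-0.5 * ((u - m) / s) ** 2 - np.log(s)) - (-0.5 * (u / sigma) ** 2 - np.log(sigma))
P_quad = np.sum(p * log_ratio) * du

# Closed form for Gaussians: P = 0.5*[log(sigma^2/s^2) + s^2/sigma^2 - 1 + m^2/sigma^2].
P_exact = 0.5 * (np.log(sigma**2 / s**2) + s**2 / sigma**2 - 1 + m**2 / sigma**2)
print(P_quad, P_exact)
```

The quadrature value is nonnegative, as the general theory requires, and agrees with the closed form.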
Proposition 6.
Thus, under the hypothesis of Eq. 18 , we have directly related information flow relative to the climatology in the variables x in an ensemble prediction to T _{y→x}(p) and the rate of energy transfer E _{y→x}(p) for the ensemble prediction. Clearly, the optimal value of σ in Eq. 20 for comparison with the ensemble simulation is the value with the same mean energy. Proposition 6 relates the information flow, T _{y→x}(p), to the time rate of change of utility in an ensemble forecast and serves as a starting point for further generalizations.
An Instructive Model Demonstrating Information Flow
Here, we briefly illustrate several facets of information flow in an ensemble prediction for the 2 × 2 system of linear stochastic equations in Eq. 26. With Gaussian initial data, the above linear equation has a Gaussian solution for all times that is readily computed exactly by explicit formulas (29). Furthermore, Proposition 4 and Eq. 15 apply for all times, so that the information flow in these examples is given by Eq. 27, with Γ_{12}(t) = 〈(x − 〈x〉_{p})(y − 〈y〉_{p})〉_{p} the cross-covariance of x and y and Γ_{11}(t) = 〈(x − 〈x〉_{p})^{2}〉_{p} the variance of x.
Below, the value of the damping coefficient γ is set to one, whereas the noise level is fixed at σ = 0.2; the parameters α and β are varied systematically so that the mean of the solution of Eq. 26 exhibits the complete variety of stable and unstable behavior, ranging from a saddle to stable and unstable nodes, stable and unstable spirals, and a Hamiltonian center. In each of these six cases, an ensemble prediction is generated through a 1,000-member ensemble with initial ensemble mean (10, 10) and uncorrelated unit covariance matrix. In the ensemble prediction, the information transfer in Eq. 27 is approximated by the Monte Carlo formula, where μ_{x} and μ_{y} denote the Monte Carlo ensemble averages of the variables x and y, respectively, for an ensemble of size K = 1,000.
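A minimal Monte Carlo sketch of this kind of ensemble estimate is given below. The particular 2 × 2 drift matrix is an assumption of ours (a stable node), and the estimator a_{12} Γ_{12}/Γ_{11} is the Gaussian-marginal flow formula for a linear drift dx/dt = a_{11}x + a_{12}y, consistent in structure with the Γ_{12}/Γ_{11} formula above; the ensemble estimate is checked against the covariance obtained from the Lyapunov equation dΓ/dt = AΓ + ΓA^{T} + Q:

```python
import numpy as np

rng = np.random.default_rng(0)
A = np.array([[-1.0, 0.5], [0.0, -1.0]])   # assumed stable-node coefficients
sigma, dt, t_end, K = 0.2, 0.01, 1.0, 5000
n_steps = int(t_end / dt)

Z = rng.standard_normal((K, 2)) + 10.0     # ensemble: unit covariance, mean (10, 10)
Gamma = np.eye(2)                          # exact covariance, same initial condition
Q = np.diag([0.0, sigma**2])               # white noise forces the y equation only

for _ in range(n_steps):
    # Euler-Maruyama step for the ensemble, Euler step for the Lyapunov equation.
    Z = Z + dt * Z @ A.T
    Z[:, 1] += sigma * np.sqrt(dt) * rng.standard_normal(K)
    Gamma = Gamma + dt * (A @ Gamma + Gamma @ A.T + Q)

C = np.cov(Z.T)                            # Monte Carlo covariance at t = t_end
T_mc = A[0, 1] * C[0, 1] / C[0, 0]         # ensemble estimate of the flow
T_exact = A[0, 1] * Gamma[0, 1] / Gamma[0, 0]
print(T_mc, T_exact)
```

With a 5,000-member ensemble, the Monte Carlo ratio should track the Lyapunov-equation value to within a few percent, mirroring the agreement reported for the six cases above.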
The comparison of the exact analytic formulas and the Monte Carlo ensemble simulation of information transfer for the six cases is depicted in Figs. 1 and 2. Fig. 1 Upper shows the information transfers as functions of time, from left to right, for a saddle, stable node, and unstable node, whereas Fig. 2 Upper is for a stable spiral, unstable spiral, and Hamiltonian center. Figs. 1 Lower and 2 Lower show the ensemble averages of the dynamical variables x and y as functions of time. The overall trend of the information transfer in all of these diverse cases is captured very well by the Monte Carlo simulations; this is true even for the cases with a positive Lyapunov exponent and no invariant measure, namely the saddle, unstable node, and unstable spiral, even though the mean has grown to the respective values 2 × 10^{9}, 2 × 10^{5}, and 2 × 10^{2} on the time interval depicted.
There is no oscillation in the mean in the cases in Fig. 1, whereas there is decaying, amplifying, and neutral oscillation of the mean state, respectively, in Fig. 2, and such oscillatory behavior is clearly reflected in the information transfer T_{y→x}(p). More interestingly, the frequency of the oscillation in the information flow is roughly double the frequency of the mean state (or signal). This is easy to understand: the random fields x(t) and y(t) from the explicit solution both contain pieces with the same temporal frequency ω; thus, the quadratic covariance in the information transfer in Eq. 27 has a contribution with frequency 2ω, i.e., one-half the period. Finally, one should note that in these examples, especially for those with an equilibrium density p_{eq} (e.g., the stable node, the stable spiral, and the Hamiltonian center), the information flow does not converge to zero as Proposition 3 might suggest, because the cross-covariance between x and y is nonzero at equilibrium, so the equilibrium distribution does not factor into a product.
Systematic Approximations for Information Flow
A theory for information flow in ensemble predictions has been outlined above but the practical utility of such ideas rests on the ability to develop practical approximate algorithms for information flow. This is the topic here. Two general methods are proposed below.
The first method builds on the general formalism in Eq. 11 and Proposition 4 for information flow. First, assume that the general marginal distribution, p_{1}(x, t), is approximated by the least biased probability density that matches the mean and higher moments of p_{1}(x, t) up to order J, with a corresponding smooth function φ_{J}(x, t) as in Eq. 11. This is the strategy used in refs. 16, 17, and 19–21 for computing relative information in ensemble predictions, and there are efficient algorithms to compute this rapidly provided that x is one- or two-dimensional (19–21, 30); for the prototype geophysical models in Eq. 16, the formula in Eq. 12 of Proposition 4 can be evaluated explicitly using only ensemble moments. The simplest version of this algorithm occurs when J = 2, so that φ_{J}(x, t) defines a Gaussian approximation; the needed approximate formula is already listed in general in Eq. 15, and, of course, the advantage of the Gaussian approximation is that it applies when x has an arbitrary number of variables. For the truncated Burgers–Hopf model, it has been established (see ref. 17) that, surprisingly, typical ensemble predictions are nearly Gaussian, so for this system such an approximation would be excellent. Alternatively, the same approximation strategy can be applied to the conditional distribution p(y|x, t) by using suitable similar approximations to ψ in Eq. 13 and Proposition 5.
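For the Gaussian case J = 2, the least biased density matching an ensemble's mean and covariance Γ has entropy H = ½ log[(2πe)^{N} det Γ], which can be formed from ensemble moments in any dimension; the sketch below applies this to a synthetic ensemble of our own construction:

```python
import numpy as np

rng = np.random.default_rng(1)
N, K = 3, 100000
# Synthetic ensemble with known covariance L @ L.T (an illustrative choice).
L = np.array([[1.0, 0.0, 0.0], [0.3, 0.8, 0.0], [0.1, 0.2, 0.5]])
X = rng.standard_normal((K, N)) @ L.T

Gamma = np.cov(X.T)                      # ensemble covariance: the J = 2 moments
# Entropy of the Gaussian (least biased, J = 2) approximation.
H_gauss = 0.5 * np.log((2 * np.pi * np.e) ** N * np.linalg.det(Gamma))
H_true = 0.5 * np.log((2 * np.pi * np.e) ** N * np.linalg.det(L @ L.T))
print(H_gauss, H_true)
```

Because only first and second ensemble moments enter, this approximation remains cheap even when x has many components.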
The second strategy for computing information flow in an ensemble prediction involves asymptotic methods that exploit the disparity in time scales between the x and y processes for special systems, such as Eq. 28, in which the y variables evolve on a fast time scale of order ε, with ε ≪ 1. In this case, one assumes ergodicity and a unique, smoothly varying invariant measure, p_{eq}(y|x), for the reduced fast system in the time variable τ = t/ε, with x regarded as a parameter.
With these and other suitable assumptions, it is not too difficult to establish rigorously (see ref. 31 for examples) that, as ε → 0, the conditional distribution p(y|x, t) converges to p_{eq}(y|x) for any positive time, so in this setting, for ε ≪ 1, the information transfer is given approximately by replacing p(y|x, t) with p_{eq}(y|x). The battery of approximate procedures discussed in the previous paragraph can be applied to p_{eq}(y|x) in the present context. A more attractive computational option would involve using "on the fly" computational procedures for Eq. 28 (see ref. 32).
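The frozen-x construction can be mimicked numerically: run the fast equation with x held fixed as a parameter until it equilibrates, and sample p_{eq}(y|x). The fast dynamics below is an assumed scalar Ornstein–Uhlenbeck process standing in for the (unspecified) reduced system; its invariant measure is N(x, σ²/2):

```python
import numpy as np

rng = np.random.default_rng(2)
x_frozen, sigma, dt = 2.0, 0.4, 0.01
n_burn, n_samp = 10000, 200000

y = 0.0
samples = np.empty(n_samp)
noise = rng.standard_normal(n_burn + n_samp)
for i in range(n_burn + n_samp):
    # Euler-Maruyama step of dy = (x_frozen - y) dt + sigma dW, with x frozen.
    y += dt * (x_frozen - y) + sigma * np.sqrt(dt) * noise[i]
    if i >= n_burn:
        samples[i - n_burn] = y

# Empirical equilibrium statistics should match N(x_frozen, sigma**2 / 2).
print(samples.mean(), samples.var())
```

Repeating this for each frozen value of x (or interleaving short fast bursts with the slow integration, in the on-the-fly spirit of ref. 32) builds up the required conditional equilibrium statistics.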
Concluding Discussion
A systematic mathematical theory for information transfer between subspaces in ensemble predictions of nonlinear dynamical systems has been developed here including potential practical computational strategies and a simple illustrative test example. Further comprehensive numerical tests are needed to demonstrate the practical utility of calculating information transfer in realistic complex dynamical systems.
Acknowledgments
The research of A.J.M. is partially supported by the National Science Foundation, the Office of Naval Research, and the Defense Advanced Research Projects Agency. J.H. is supported as a postdoctoral fellow through the last two agencies.
Footnotes
 ^{†}To whom correspondence should be addressed. E-mail: jonjon@cims.nyu.edu

Author contributions: A.J.M. designed research; A.J.M. and J.H. performed research; and A.J.M. and J.H. wrote the paper.

The authors declare no conflict of interest.
 © 2007 by The National Academy of Sciences of the USA
References
Epstein ES
Kalnay E
Palmer T, Molteni F, Mureau R, Buizza R, Chapelet P, Tribbia J
Majda AJ, Wang X
Majda AJ, Abramov R, Grote M
Majda AJ, Kleeman R, Cai D
Kleeman R, Majda AJ, Timofeyev I
Haven K, Majda AJ, Abramov R
Liang XS, Kleeman R
Gardiner CW
Imkeller P, von Storch JS
Vanden-Eijnden E