Memory traces in dynamical systems

Communicated by David W. McLaughlin, New York University, New York, NY, October 3, 2008 (received for review April 3, 2008)
Abstract
To perform nontrivial, real-time computations on a sensory input stream, biological systems must retain a short-term memory trace of their recent inputs. It has been proposed that generic high-dimensional dynamical systems could retain a memory trace for past inputs in their current state. This raises important questions about the fundamental limits of such memory traces and the properties required of dynamical systems to achieve these limits. We address these issues by applying Fisher information theory to dynamical systems driven by time-dependent signals corrupted by noise. We introduce the Fisher memory curve (FMC) as a measure of the signal-to-noise ratio (SNR) embedded in the dynamical state relative to the input SNR. The integrated FMC indicates the total memory capacity. We apply this theory to linear neuronal networks and show that the capacity of networks with normal connectivity matrices is exactly 1 and that of any network of N neurons is, at most, N. A nonnormal network achieving this bound is subject to stringent design constraints: It must have a hidden feedforward architecture that superlinearly amplifies its input for a time of order N, and the input connectivity must optimally match this architecture. The memory capacity of networks subject to saturating nonlinearities is further limited, and cannot exceed √N.
Critical cognitive phenomena such as planning and decision-making rely on the ability of the brain to hold information in short-term memory. It is thought that the neural substrate for such memory can arise from persistent patterns of neural activity, or attractors, that are stabilized through reverberating positive feedback, either at the single-cell (1) or network (2, 3) level. However, such simple attractor mechanisms are incapable of remembering sequences of past inputs.
More recent proposals (4–6) have suggested that an arbitrary recurrent network could store information about recent input sequences in its transient dynamics, even if the network does not have information-bearing attractor states. Downstream readout networks can then be trained to instantaneously extract relevant functions of the past input stream to guide future actions. A useful analogy (4) is the surface of a liquid. Even though this surface has no attractors, save the trivial one in which it is flat, transient ripples on the surface can nevertheless encode information about past objects that were thrown in.
This proposal raises a host of important theoretical questions. Are there any fundamental limits on the lifetimes of such transient memory traces? How do these limits depend on the size of the network? If fundamental limits exist, what types of networks are required to achieve them? How does the memory depend on the network topology, and are special topologies required for good performance? To what extent do these traces degrade in the presence of noise? Previous analytical work has addressed some of these questions under restricted assumptions about input statistics and network architectures (7). To answer these questions in a more general setting, we use Fisher information to construct a measure of memory traces in networks and other dynamical systems. Traditionally, Fisher information has been applied in theoretical neuroscience to quantify the accuracy of population coding of static stimuli (see, e.g., ref. 8). Here, we extend this theory by combining Fisher information with dynamics.
The Fisher Memory Matrix in a Neuronal Network
We study a discrete-time network dynamics given by

x(n) = f(Wx(n − 1) + vs(n)) + z(n). [1]

Here a scalar, time-dependent signal s(n) drives a recurrent network of N neurons (Fig. 1B). x(n) ∈ ℛ^{N} is the network state at time n, f(·) is a general sigmoidal function, W is an N × N recurrent connectivity matrix, and v is a vector of feedforward connections from the signal into the network. We keep v time independent to focus on how purely temporal information in the signal is distributed in the N spatial degrees of freedom of the network state x(n). The norm ‖v‖ sets the scale of the network input, and we will choose it to be 1. The term z(n) ∈ ℛ^{N} denotes a zero mean Gaussian white noise with covariance 〈z_{i}(k_{1})z_{j}(k_{2})〉 = εδ_{k1,k2}δ_{i,j}.
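As a concrete reference point, the dynamics of Eq. 1 can be simulated directly. The following sketch uses an arbitrary stable random connectivity, a tanh nonlinearity, and illustrative parameter values; all of these choices are ours, for illustration only:

```python
import numpy as np

# A direct simulation sketch of the dynamics in Eq. 1.
rng = np.random.default_rng(4)
N, T, eps = 10, 200, 0.1

W = rng.standard_normal((N, N)) / np.sqrt(N)
W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))   # rescale: spectral radius 0.9, stable
v = rng.standard_normal(N)
v /= np.linalg.norm(v)                            # ||v|| = 1, as in the text

s = rng.standard_normal(T)                        # scalar input signal s(n)
x = np.zeros(N)
trajectory = np.zeros((T, N))
for n in range(T):
    z = np.sqrt(eps) * rng.standard_normal(N)     # white noise with covariance eps * I
    x = np.tanh(W @ x + v * s[n]) + z             # Eq. 1 with f = tanh
    trajectory[n] = x
```

Note that the noise z(n) is added after the nonlinearity, following the placement in Eq. 1, so the state can exceed the range of f by the noise amplitude.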
We build upon the theory of Fisher information to construct useful measures of the efficiency with which the network state x(n) encodes the history of the signal. Because of the noise in the system, a given past signal history {s(n − k) ∣ k ≥ 0} induces a conditional probability distribution P(x(n)∣s) on the network's current state. Here, we think of this history {s(n − k) ∣ k ≥ 0} as a temporal vector s whose kth component s_{k} is s(n − k). The Fisher memory matrix (FMM) between the present state x(n) and the past signal is then defined as

J_{k,l}(s) = 〈−∂²/∂s_{k}∂s_{l} log P(x(n)∣s)〉_{P(x(n)∣s)}. [2]

This matrix captures [see supporting information (SI) Appendix] how much the conditional distribution P(x(n)∣s) changes when the signal history s changes (Fig. 1). Specifically, if one were to perturb the signal slightly from s to s + δs, the Kullback-Leibler divergence between the 2 induced distributions P(x(n)∣s) and P(x(n)∣s + δs) would be approximated by (½)δs^{T}J(s)δs (SI Appendix). Thus the FMM (Eq. 2) measures memory through the ability of the past signal to perturb the network's present state. In this work, we will focus on the diagonal elements of the FMM. Each diagonal element J(k) ≡ J_{k,k} is the Fisher information that x(n) retains about a pulse entering the network at k time steps in the past. Thus, the diagonal captures the decay of the memory trace of a past input, and so we call J(k) the Fisher memory curve (FMC).
For a general nonlinear system, the FMC depends on the signal itself and is hard to analyze. In this article, we focus on linear dynamics, where the transfer function in Eq. 1 is f(x) = x. Because the noise is Gaussian, the conditional distribution P(x(n)∣s) is also Gaussian, with a mean that depends linearly on the signal, δx(n)/δs(n − k) = W^{k}v, and a noise covariance matrix C_{n} = ε Σ_{k=0}^{∞}W^{k}W^{kT}, which is independent of the signal. Hence, the FMC is independent of signal history and takes the form

J(k) = v^{T}W^{kT}C_{n}^{−1}W^{k}v. [3]
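Eq. 3 can be evaluated numerically for any stable network. A minimal sketch (the connectivity here is a random stable matrix, chosen only for illustration): the covariance C_n is obtained by iterating the discrete Lyapunov recursion C ← W C W^T + εI, whose fixed point is the series ε Σ_k W^k W^{kT}.

```python
import numpy as np

rng = np.random.default_rng(0)
N, eps = 20, 0.1

W = rng.standard_normal((N, N)) / np.sqrt(N)
W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))  # rescale to spectral radius 0.9
v = rng.standard_normal(N)
v /= np.linalg.norm(v)                           # unit input vector, as in the text

# steady-state noise covariance: fixed point of C = W C W^T + eps * I
C = eps * np.eye(N)
for _ in range(2000):                            # converges since spectral radius < 1
    C = W @ C @ W.T + eps * np.eye(N)
Ci = np.linalg.inv(C)

def fisher_memory_curve(W, v, Ci, kmax):
    """J(k) = v^T (W^k)^T C^{-1} W^k v for k = 0..kmax-1 (Eq. 3)."""
    J, u = [], v.copy()
    for _ in range(kmax):
        J.append(u @ Ci @ u)
        u = W @ u                                # advances W^k v by one step
    return np.array(J)

J = fisher_memory_curve(W, v, Ci, 100)
J_tot = J.sum()
```

Since C ⪰ εI, each J(k) is nonnegative and J(0) cannot exceed the input SNR ε^{−1}, which provides a quick sanity check on the computation.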
We focus on two related features of Eq. 3: the form of its dependence on the time lag k and the total area under the FMC, denoted by J_{tot}. An important parameter is the SNR in the input vector vs(n) + z(n) at a single time n, which, for ‖v‖ = 1, is ε^{−1}.
FMCs for Normal Networks
In the following, we uncover a fundamental dichotomy in the memory properties of two different classes of networks: normal and nonnormal. We first focus on the class of normal networks, defined as having a normal connectivity matrix W. A matrix W is normal if it has an orthonormal basis of eigenvectors or, equivalently, if it commutes with its transpose. For normal networks, the relationship between the connectivity and the FMC simplifies considerably. Denoting the eigenvalues of W by λ_{i}, the FMC reduces to

J(k) = ε^{−1} Σ_{i} v_{i}² ∣λ_{i}∣^{2k} (1 − ∣λ_{i}∣²), [4]

where v_{i} is the projection of the input connectivity vector v on the ith eigenmode. Thus, for normal matrices, the orthogonal eigenvectors make no essential contribution to memory performance; only the eigenvalues and the input projections v_{i} matter.
First we note that summing Eq. 4 over k yields the important sum rule for normal networks,

J_{tot} ≡ Σ_{k=0}^{∞} J(k) = ε^{−1}, [5]

which is independent of the network connectivity W and of v. This sum rule implies that normal networks cannot change the total SNR relative to that embedded in the instantaneous input; they can only redistribute it across time. Whereas in the input vector the SNR is concentrated at a single time step, the network can spread this same total SNR over many time lags.
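The sum rule can be checked numerically: for any stable normal W and any unit input vector v, the area under the FMC computed from Eq. 4 equals the input SNR ε^{−1}, independent of the particular W and v. A sketch with a random symmetric (hence normal) matrix, chosen only for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
N, eps = 15, 0.2

G = rng.standard_normal((N, N))
S = (G + G.T) / 2                         # symmetric, hence normal
W = 0.9 * S / np.max(np.abs(np.linalg.eigvalsh(S)))  # rescale to spectral radius 0.9

lam, V = np.linalg.eigh(W)                # orthonormal eigenbasis of a normal matrix
v = rng.standard_normal(N)
v /= np.linalg.norm(v)
vi = V.T @ v                              # projections of v on the eigenmodes

# Eq. 4: J(k) = (1/eps) * sum_i vi^2 |lam_i|^(2k) (1 - |lam_i|^2)
k = np.arange(5000)[:, None]              # truncation; tail is ~0.9^10000, negligible
absl = np.abs(lam)
Jk = (vi**2 * absl**(2 * k) * (1 - absl**2)).sum(axis=1) / eps
J_tot = Jk.sum()
```

Summing the geometric series Σ_k |λ_i|^{2k}(1 − |λ_i|²) = 1 mode by mode shows why the result is exactly ε^{−1} Σ_i v_i² = ε^{−1}.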
The reduction of the FMC to eigenvalues allows us to understand its asymptotics. For large k, the decay of the FMC in Eq. 4 is determined by the distribution of magnitudes of the largest eigenvalues. Dynamic stability requires that the largest eigenvalue magnitude, denoted by ∣λ_{max}∣, be smaller than 1; consequently, for large k the FMC decays exponentially, as ∣λ_{max}∣^{2k}.
Examples of Normal Networks
An important class of normal matrices includes translation-invariant lattices, i.e., circulant matrices. In the 1D case, W is of the form W_{ij} = d_{(i−j) mod N}, where d is any vector, and the eigenvectors of W are the Fourier modes. The signal enters at a single neuron, so that v_{k} = δ_{k,0}, and couples to all of the modes with a uniform strength 1/√N.
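The circulant structure, and the uniform 1/√N coupling of a single-site input to the Fourier modes, can be verified directly. The nearest-neighbour kernel below is an arbitrary illustrative choice:

```python
import numpy as np

N = 64
d = np.zeros(N)
d[1] = 0.9                               # nearest-neighbour ring coupling (any d works)

idx = (np.arange(N)[:, None] - np.arange(N)[None, :]) % N
W = d[idx]                               # W[i, j] = d[(i - j) mod N], circulant

lam = np.fft.fft(d)                      # eigenvalues of a circulant matrix = DFT of d
F = np.fft.fft(np.eye(N)) / np.sqrt(N)   # orthonormal Fourier basis

v = np.zeros(N)
v[0] = 1.0                               # signal enters at a single neuron
proj = np.abs(F @ v)                     # |projection on each mode| = 1/sqrt(N)
```

The conjugated rows of F are the eigenmodes of W, and every component of `proj` equals 1/√N, i.e., the single-site input spreads its weight uniformly over all N Fourier modes.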
Another class of normal networks consists of networks with symmetric connectivity matrices. An example is a symmetric d-dimensional lattice (a 2D example is shown in Fig. 2B). Near the edge of the eigenvalue spectrum, the density of eigenvalue magnitudes behaves as ρ(r) ∝ (1 − r)^{d/2−1}, which leads to a power-law decay of the FMC, J(k) ∝ k^{−(1+d/2)}, at large k.
Preferred Input Patterns in Nonnormal Networks
For nonnormal networks, J_{tot} depends not only on the network connectivity W but also on the feedforward connectivity v. To investigate the sensitivity to v, we note from Eq. 3 that, in general, J_{tot} can be expressed as

J_{tot} = ε^{−1} v^{T}J^{s}v, [7]

where we have introduced the spatial FMM

J^{s} ≡ ε Σ_{k=0}^{∞} W^{kT}C_{n}^{−1}W^{k}. [8]

This matrix, J^{s}, and the temporal FMM J in Eq. 2 can be unified into a general space–time framework (see SI Appendix). J^{s} measures the information in the network's spatial degrees of freedom x_{i}(n) about the entire signal history. The total information in all N degrees of freedom is Tr J^{s} = N, independent of W. Because J^{s} is positive definite with trace N, Eq. 7 yields a fundamental bound on the total area under the FMC of any network W and unit input vector v:

J_{tot} ≤ Nε^{−1}. [9]

If W is normal, then J_{i,j}^{s} = δ_{i,j}, implying that all directions in state space provide the same amount of total temporal information, and so J_{tot} is independent of the spatial structure v of the input, consistent with the sum rule (Eq. 5). However, if W is nonnormal, J^{s} has nontrivial spatial structure, reflecting an inherent anisotropy in state space induced by the connectivity matrix W. There will be preferred directions in state space, corresponding to the large principal components of J^{s}, that contain a large amount of information about the signal history, whereas other directions will perform relatively poorly. The choice of v that maximizes J_{tot} is the eigenvector of J^{s} with the largest eigenvalue.
The spatial anisotropy of nonnormal networks is demonstrated by evaluating the FMC for random asymmetric networks, where each matrix element W_{ij} is chosen independently from a zero-mean Gaussian with variance α/N. If the feedforward connectivity v is chosen to be a random unit vector, the distribution of J_{tot} (Fig. 3A, blue) is centered around 1, as expected: because the trace of J^{s} equals N, a random unit vector picks up, on average, the fraction 1/N of it. However, if v is chosen as the maximal principal component of J^{s}, the resultant J_{tot} is approximately 4 times as large (Fig. 3A, red).
Additional insight into the structure of the preferred v comes from the Schur decomposition of W. Whereas every normal matrix is unitarily equivalent to a diagonal matrix (Fig. 3B Upper), every nonnormal matrix is unitarily equivalent to an upper triangular matrix (Fig. 3B Lower). In its Schur basis, a nonnormal network thus has a hidden feedforward structure, with signal flowing from a source end toward a sink. It may therefore, in general, be preferable to distribute the signal near the beginning of this hidden feedforward network, to counterbalance the noise propagation along it. We have tested this hypothesis by plotting the magnitude of the components of the optimal input vector for the random asymmetric networks in their Schur basis. As Fig. 3C shows, the optimal choice of feedforward weights v does indeed exploit the hidden feedforward structure by coupling the signal more strongly to its source than to its sink.
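The Schur picture can be illustrated numerically with `scipy.linalg.schur` (the matrices below are random and purely illustrative, not those of the paper's figures). The complex Schur decomposition W = Z T Z^† expresses any matrix as an upper triangular T in a unitary basis Z; for a normal W, T is diagonal, and for a nonnormal W, the strictly upper triangular part of T is the hidden feedforward structure:

```python
import numpy as np
from scipy.linalg import schur

rng = np.random.default_rng(3)
N = 10

W_nonnormal = rng.standard_normal((N, N)) / np.sqrt(N)
T, Z = schur(W_nonnormal, output='complex')       # W = Z T Z^dagger, T upper triangular

S = rng.standard_normal((N, N))
W_normal = (S + S.T) / 2                          # symmetric, hence normal
T_n, Z_n = schur(W_normal, output='complex')

# norm of the off-diagonal (hidden feedforward) part of the Schur form
offdiag = np.linalg.norm(T - np.diag(np.diag(T)))
offdiag_normal = np.linalg.norm(T_n - np.diag(np.diag(T_n)))
```

For the normal matrix, `offdiag_normal` vanishes to machine precision, whereas for the random asymmetric matrix it is of order 1: the nonnormal network carries substantial hidden feedforward weight.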
Transient Amplification and Extensive Memory
Comparing Eqs. 5 and 9 motivates defining networks with extensive memory as networks in which J_{tot} is proportional to N for large N. With this definition, normal networks do not have extensive memory. Furthermore, as indicated in Fig. 3A, despite the enhanced performance of generic asymmetric networks, their total memory remains O(1), prompting the question of whether nonnormal networks with extensive memory exist at all. Surprisingly, they do.
A particularly simple example is the delay line shown in Fig. 4A Upper. In this example, the only nonzero matrix elements are W_{i+1,i} = √α, and the signal enters at the head of the chain, v_{i} = δ_{i,1}. For α > 1, the propagating signal is amplified at every step, and the network achieves extensive memory.
The delay line with extensive memory is an example of a dynamical system with strong transient amplification. Network amplification can be characterized by the behavior of A_{k} ≡ ‖W^{k}v‖² for k ≥ 0. Whereas in normal systems A_{k} is monotonically decreasing for all v, in nonnormal networks A_{k} may initially increase before decaying to zero for large k (9). In the case of Fig. 4C (red), A_{k} = α^{k} increases exponentially, and this amplification lasts for a time of order N. It is important to note that, in such a system, not only the signal but also the noise is exponentially amplified as it propagates along the chain. Introducing the signal at the beginning of the chain guarantees that the signal and noise are amplified equally, resulting in the saturation of J(k) (Fig. 4B).
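The amplification profile of the delay line can be checked in a few lines, assuming the homogeneous weights W_{i+1,i} = √α consistent with A_k = α^k above (the values of N and α below are illustrative):

```python
import numpy as np

N, alpha = 50, 1.2
W = np.zeros((N, N))
for i in range(N - 1):
    W[i + 1, i] = np.sqrt(alpha)     # feedforward chain, weight sqrt(alpha)

v = np.zeros(N)
v[0] = 1.0                           # signal enters at the head of the chain

A, u = [], v.copy()
for k in range(N + 10):
    A.append(u @ u)                  # A_k = ||W^k v||^2
    u = W @ u
A = np.array(A)
```

The pulse travels one neuron per step, multiplied by √α each time, so A_k = α^k exactly for k < N; once the pulse falls off the end of the chain, A_k drops to zero, making the O(N) duration of the transient amplification explicit.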
It is not necessary to have purely feedforward connectivity to achieve large transient amplification and extensive memory. As an example, we consider a delay line with feedback (Fig. 4A Lower). In addition to the feedforward connections W_{i+1,i}, this network has feedback connections W_{i,i+1} from each neuron back to its predecessor; despite the recurrent loops these create, it retains strong transient amplification and extensive memory.
Transient exponential amplification, as in the above examples, is not a necessary condition for extensive memory. Consider, for example, a delay line with inhomogeneous weights, W_{i+1,i} = ((i + 1)/i)^{p/2} with p > 1, for which A_{k} = (k + 1)^{p} grows superlinearly, but only polynomially, in k; such a network still achieves extensive memory.
Consequences of Finite Dynamic Range
The networks discussed above achieve extensive memory performance through transient superlinear amplification that lasts for O(N) time steps. However, such amplification may not be biophysically feasible for neurons that operate in a limited dynamic range, due, for example, to saturating nonlinearities. This raises the question, what are the limits of memory capacity for networks with saturating neurons? To address this question, we assume that the network architecture is such that all neurons have finite dynamic range, i.e., 〈x_{i}²(n)〉 < R for i = 1, …, N. We show (see SI Appendix) that, in this case, J_{tot} can grow at most as √(NRε^{−1}).
This bound implies that such a network cannot achieve an area under the FMC that is larger than O(√N), far below the O(N) attainable with unbounded amplification.
Can a network of neurons with finite dynamic range achieve the O(√N) memory allowed by this bound? We show that it can, using the divergent fan-out architecture of Fig. 5A, which amplifies the signal by distributing it over a growing number of neurons rather than by growing the activity of any single neuron.
A comparison between the performance of the fan-out architecture and a random Gaussian network of the same size, N ≈ 7,000, is shown in Fig. 5 B and C. The fan-out network consists of N = 7,021 neurons organized in a divergent chain of length L = 118, with the number of neurons at layer k growing as N_{k} = k and the connection strengths between successive layers chosen so that each neuron's activity stays within its dynamic range while the total signal power grows with depth (see SI Appendix).
Finally, to test the robustness of the fan-out architecture to saturation, we have simulated the dynamics of Eq. 1 with a saturating nonlinear transfer function f(x) = tanh(x) (see SI Appendix). As before, the input is white noise with an SNR of 20. A sample of the signal and its reconstruction from the layers' activity is shown in Fig. 5D. The correlation coefficient of the 2 traces, roughly 0.8, is in accord with the theoretical prediction for the linear system (Fig. 5C). Thus, the fan-out architecture achieves impressive memory capacity by distributed amplification of the signal across neurons, without significant amplification of the input to any individual neuron.
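The distributed-amplification idea behind Fig. 5A can be sketched in miniature. Here layer k holds N_k = k neurons, and each neuron in layer k + 1 averages the activity of layer k; the averaging weight 1/N_k is our guess at a choice that keeps every neuron's amplitude bounded, not necessarily the paper's exact construction:

```python
import numpy as np

L = 20
sizes = list(range(1, L + 1))        # N_k = k neurons in layer k
N = sum(sizes)                       # total number of neurons (L(L+1)/2)

W = np.zeros((N, N))
start = np.cumsum([0] + sizes)       # offset of each layer in the state vector
for k in range(L - 1):
    rows = slice(start[k + 1], start[k + 2])
    cols = slice(start[k], start[k + 1])
    W[rows, cols] = 1.0 / sizes[k]   # all-to-all averaging between layers

v = np.zeros(N)
v[0] = 1.0                           # pulse enters the single first-layer neuron

A, peak, u = [], [], v.copy()
for k in range(L):
    A.append(u @ u)                  # total signal power in the network state
    peak.append(np.abs(u).max())     # largest single-neuron amplitude
    u = W @ u
```

A unit pulse keeps per-neuron amplitude exactly 1 while the total power A_k grows linearly with depth: the amplification is carried by the growing number of active neurons, not by any one of them. Since depth L ~ √(2N), the transient lasts O(√N) steps, matching the scaling in the text.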
Nonnormal Amplification and Memory in Fluid Dynamics
To illustrate the generality of the connection between transient, nonnormal amplification and memory performance, we consider an example from fluid mechanics. Indeed, nonnormal dynamics is thought to play an important role in various fluid mechanical transitions, including the transition from certain laminar flows to turbulence (10). Here, we focus on a particular type of local instability known as a convective instability (11) that plays a role in describing fluid flow perturbations around wakes, mixing and boundary layers, and jets. For example, the fluid flow just behind the wake of an object, or in the vicinity of a mixing layer where two fluids at different velocities meet, is especially sensitive to perturbations, which transiently amplify but then decay away as they are convected away from the object or along the mixing layer as the two velocities equalize.
Following refs. 9 and 11, we model these situations phenomenologically through the time evolution of a 1D flow perturbation u(x, t) obeying the linear evolution equation

∂u(x, t)/∂t = −∂u/∂x + ∂²u/∂x² + (1 − h²x²)u + v(x)s(t) + η(x, t). [13]

This describes rightward drift plus diffusion in the presence of a quadratic feedback potential (Fig. 6A), driven by a 1D signal s(t) and zero mean, unit variance additive white Gaussian noise in time and space, η(x, t). Perturbations in the region ∣x∣ ≤ 1/h are transiently amplified, and decay once they are convected out of this region.
Although the sum rule (Eq. 5) holds also for continuous time (see SI Appendix) and discrete space, there is no analogous bound (Eq. 9) for continuous space because the number of degrees of freedom is infinite. Nevertheless, for a given system, J_{tot} is finite and is bounded by the amplification time, or equivalently by the effective number of amplified degrees of freedom, which, in our case, is O(1/h).
The optimal way for the signal to enter the network, i.e., the first principal eigenvector of J^{s}, is shown in Fig. 6A. This optimal input profile v(x) is a wave packet poised to travel through the convective instability (Fig. 6B). Fig. 6C (red) shows that the optimal memory performance, or maximal eigenvalue of J^{s}, scales linearly with 1/h. Thus, consistent with the results above, the maximal area under the FMC is proportional to the time over which inputs are superlinearly amplified. For comparison, we have computed the value of the optimal J_{tot} in the case of a fluid dynamics that contains only diffusion and drift, the first 2 terms of the right-hand side of Eq. 13, but not the amplifying potential. In this case, J_{tot} is low for all values of h (Fig. 6C, blue).
Discussion
In this work, we focused on the diagonal part of the FMM (Eq. 2). In networks that are unitarily equivalent to simple delay lines (e.g., Fig. 4A Upper and Fig. 5A), this matrix is diagonal. However, in general, the off-diagonal elements are not all zero. Their value reflects the interference between two signals injected into the system at two different times, and their analysis provides an interesting probe into the topology of (partially directed) loops in the system, which give rise to such interference (see SI Appendix).
It is interesting to note the relation between the FMC J(k) and the more conventional memory function m(k), defined through the correlation between the original signal s(n − k) and an optimal estimate ŝ(n − k) of the past signal based on the network state x(n) (see Fig. 5C). Even in the linear version of Eq. 1 studied here, m(k) depends on the full FMM. Furthermore, it depends in a complex manner on the signal statistics (see SI Appendix), whereas Fisher information is local in signal space and, in the present case, is in fact independent of the signal except for an overall factor of the input SNR. Both features render signal reconstruction a much more complex measure to study analytically. Nevertheless, it is important to note that the FMC measures the SNR embedded in the network state relative to the input SNR ε^{−1}.
Our results indicate that generic recurrent neuronal networks are poorly suited for the storage of long-lived memory traces, contrary to previous proposals (4–6). In systems with substantial noise, only networks with strong and long-lasting signal amplification can potentially sustain such traces. However, signal amplification necessarily comes at the expense of noise amplification, which could corrupt memory traces. To avoid this, long-lived memory maintenance at high SNR further requires that the input connectivity pattern be matched to the architecture of the amplifying network. By analyzing the dynamical propagation of signal and noise through arbitrary recurrent networks, we have shown (see SI Appendix), remarkably, that for a given amount of signal amplification, no recurrent network can achieve less noise amplification (i.e., higher SNR) than a delay line possessing the same signal-amplification profile, with the input entering at its source. However, a recurrent network, unlike a delay line, can amplify signals for a time longer than its network size (see Fig. 4D).
Although most of our analysis was limited to linear systems, we have shown that systems with a divergent fan-out architecture (see Fig. 5A) can achieve signal amplification in a distributed manner and thereby exhibit long-lived memory traces that last a time O(√N), even when individual neurons operate within a limited dynamic range.
Given the poor memory performance of generic networks, our work suggests that neuronal networks in the prefrontal cortex or hippocampus specialized for working-memory tasks involving temporal sequences may possess hidden, divergent feedforward connectivities. Other potential systems for testing our theory are neuronal networks in the auditory cortex specialized for speech processing, or networks in the avian brain specialized for song learning and recognition.
The principles we have discovered hold for general dynamical systems, as illustrated in the example from fluid dynamics. In light of the results of Fig. 6, it is not surprising that reconstruction of acoustic signals injected into the surface of water in a laminar state, attempted in ref. 12, fared poorly. Our theory suggests that performance could be substantially improved if, for example, the signal were injected behind the wake of a fluid flowing around an object, or in the vicinity of a mixing layer, or even into laminar flows at high Reynolds numbers just below the onset of turbulence.
In this work, we have applied the framework of Fisher information to memory traces embedded in the activity of neurons, usually identified as short-term memory. However, the same framework can be applied to study the storage of spatiotemporal sequences through synaptic plasticity, i.e., long-term memory (S.G. and H.S., unpublished work). More generally, memory of past events is a ubiquitous feature of biological systems, and they all face the problem of noise accumulation, decaying signals, and interference. In revealing fundamental limits on the lifetimes of memory traces in the presence of these various effects, and in uncovering general dynamical design principles required to achieve these limits, our theory provides a useful framework for studying the efficiency of dynamical processes underlying robust memory maintenance in biological systems.
Acknowledgments
We have benefited from useful discussions with Kenneth D. Miller, Eran Mukamel, and Olivia White. This work was supported by the Israeli Science Foundation (H.S.) and the Swartz Foundation (S.G.). We also acknowledge the support of the Swartz Theoretical Neuroscience Program at Harvard University.
Footnotes
 ^{1}To whom correspondence should be addressed. Email: surya@phy.ucsf.edu

Author contributions: S.G., D.H., and H.S. performed research; and S.G. and H.S. wrote the paper.

The authors declare no conflict of interest.

This article contains supporting information online at www.pnas.org/cgi/content/full/0804451105/DCSupplemental.
 © 2008 by The National Academy of Sciences of the USA
References
1. Loewenstein Y, Sompolinsky H (2003) Temporal integration by calcium dynamics in a model neuron. Nat Neurosci 6:961–967.
2. Seung HS (1996) How the brain keeps the eyes still. Proc Natl Acad Sci USA 93:13339–13344.
3. Mongillo G, Barak O, Tsodyks M (2008) Synaptic theory of working memory. Science 319:1543–1546.
4. Maass W, Natschläger T, Markram H (2002) Real-time computing without stable states: A new framework for neural computation based on perturbations. Neural Comput 14:2531–2560.
5. Jaeger H (2001) The "echo state" approach to analysing and training recurrent neural networks. GMD Report 148 (German National Research Center for Information Technology, Sankt Augustin, Germany).
6. Jaeger H, Haas H (2004) Harnessing nonlinearity: Predicting chaotic systems and saving energy in wireless communication. Science 304:78–80.
7. White OL, Lee DD, Sompolinsky H (2004) Short-term memory in orthogonal neural networks. Phys Rev Lett 92:148102.
8. Seung HS, Sompolinsky H (1993) Simple models for reading neuronal population codes. Proc Natl Acad Sci USA 90:10749–10753.
9. Trefethen LN, Embree M (2005) Spectra and Pseudospectra: The Behavior of Nonnormal Matrices and Operators (Princeton Univ Press, Princeton).
10. Trefethen LN, Trefethen AE, Reddy SC, Driscoll TA (1993) Hydrodynamic stability without eigenvalues. Science 261:578–584.
11. Cossu C, Chomaz JM (1997) Global measures of local convective instabilities. Phys Rev Lett 78:4387–4390.
12. Fernando C, Sojakka S (2003) Pattern recognition in a bucket. Advances in Artificial Life, Lecture Notes in Computer Science (Springer, Berlin), Vol 2801, pp 588–597.