# Sparse low-order interaction network underlies a highly correlated and learnable neural population code

See allHide authors and affiliations

Edited* by Nikos K. Logothetis, The Max Planck Institute for Biological Cybernetics, Tuebingen, Germany, and approved April 19, 2011 (received for review January 4, 2011)

## Abstract

Information is carried in the brain by the joint activity patterns of large groups of neurons. Understanding the structure and function of population neural codes is challenging because of the exponential number of possible activity patterns and dependencies among neurons. We report here that for groups of ~100 retinal neurons responding to natural stimuli, pairwise-based models, which were highly accurate for small networks, are no longer sufficient. We show that because of the sparse nature of the neural code, the higher-order interactions can be easily learned using a novel model and that a very sparse low-order interaction network underlies the code of large populations of neurons. Additionally, we show that the interaction network is organized in a hierarchical and modular manner, which hints at scalability. Our results suggest that learnability may be a key feature of the neural code.

Sensory and motor information is carried in the brain by sequences of action potentials of large populations of neurons (1–3) and, often, by correlated patterns of activity (4–11). The detailed nature of the code of neural populations, namely the way information is represented by the specific patterns of spiking and silence over a group of neurons, is determined by the dependencies among cells. For small groups of neurons, we can directly sample the full distribution of activity patterns of the population; identify all the underlying interactions, or lack thereof; and understand the design of the code (12–15). However, this approach cannot work for large networks: The number of possible activity patterns of just 100 neurons, a population size that already has clear functional implications (16), exceeds 10^{30}. Thus, our understanding of the code of large neural populations depends on finding simple sets of dependencies among cells that would capture the network behavior (17–19).

The success of pairwise-based models in describing the strongly correlated activity of small groups of neurons (19–25) suggests one such simplifying principle of network organization and population neural codes, which also simplifies their analysis. Using only a quadratic number of interactions, out of the exponential number of potential ones, pairwise maximum entropy models reveal that the code relies on strongly correlated network states and exhibits distributed error-correcting structure (19, 21). It is unclear, however, if pairwise models are sufficient for large networks, particularly when presented with natural stimuli that contain high-order correlation structure. Here we show that in this case pairwise models capture much, but not all, of the network behavior. This implies a much more complicated structure of population codes (26, 27). Because learning even pairwise models is computationally hard (28–33), this may seem to suggest that population codes would be extremely hard to learn.

We show here that this is not the case for neural population codes. The sparseness of neuronal spiking and the highly correlated behavior of large groups of neurons facilitate the learning of the functional interactions governing the population activity. Specifically, we found that a very sparse network of low-order interactions gives an extremely accurate prediction of the probability of the hundreds of thousands of patterns that were observed in a long experiment and, in particular, captures the highly correlated structure in the population response to natural movies. We uncover the dominant network interactions using a novel approach to modeling the functional dependencies that underlie the population activity patterns. This “reliable interaction” (RI) model learns the dominant functional interactions in the network, of any order, using only the frequent and reliably sampled activity patterns of the network. Although this approach would not be useful for general networks, for neural populations that have a heavy-tailed distribution of occurrences of activity patterns, the result is an extremely accurate model of the network code. Our results indicate that because of the sparse nature of neural activity (34), the code of large neural populations of ganglion cells is learnable in an easy and accurate manner from examples. Finally, we demonstrate that large network models can be constructed from smaller subnetworks in a modular fashion, which renders the approach scalable and perhaps applicable to much larger networks.

## Results

To study the code of large networks of neurons, we recorded simultaneously the spiking activity of groups of **~**100 ganglion cells in a 2-mm^{2} patch of Salamander retina responding to long natural and artificial movies (Fig. 1*A* and *Methods*). To give a general representation of network activity, we discretized time using a window of size ; for a small enough , the responses of each neuron, *x _{i}*, are binary [i.e., the cell either spikes (1) or is silent (0)]. Here, we used 20 ms, which reflects the temporal structure of correlations between cells; different bin sizes did not qualitatively affect our results.

The distribution of occurrences of activity patterns over the population was heavy-tailed for both spatial and spatiotemporal patterns (Fig. 1*B*). Similar to what was found for smaller populations, typical pairwise correlations were weak (e.g., ref. 35) (Fig. 1*B*, *Inset*), yet the network as a whole was strongly correlated. Assuming that cells were independent resulted in orders-of-magnitude errors in describing the network synchrony (Fig. 1*C*) and in predicting specific spatial population activity patterns (Fig. 1*D*) as well as spatiotemporal ones (Fig. 1*E* and *Methods*). We conclude that an accurate description of the population code must take into account correlations between cells.

### Pairwise Models Are Not Sufficient to Describe the Distribution of Activity Patterns of Large Networks Responding to Natural Stimuli.

The most general description of the population activity of *n* neurons, which uses all possible correlation functions among cells, can be written using the maximum entropy principle (36, 37) as:

where must be found numerically, such that the corresponding averages, such as <*x _{i}*>, <

*x*>, or <

_{i}x_{j}*x*> over , agree with the experimental ones; the partition function

_{i}x_{j}x_{k}*Z*is a normalization factor. However, has an exponential number of parameters (interactions). Thus, this representation would be useful only if a small subset of parameters would be enough to capture the network behavior.

For small groups of 10–20 neurons, it has been shown that the minimal model of the network that relies only on pairwise interactions, namely, the maximum entropy pairwise model, with only s and s in Eq. **1**, captures more than 90% of the correlation structure of the network patterns (19, 20). Learning , which has only parameters, is a known hard computational problem (29). For 100 cells, we cannot even enumerate all network states and must rely on a Monte Carlo algorithm to find the parameters of (*Methods*). The resulting model gave a seemingly accurate description of the network spatial patterns (Fig. 1*D*), correcting the orders-of-magnitude mistakes that makes. We obtained similar results for the spatiotemporal activity patterns of 10 neurons over 10 consecutive time bins, which we learned using the temporal correlations between cells (Fig. 1*E* and *Methods*).

However, closer inspection reveals that the most common population activity patterns were actually misestimated (Fig. 1 *D* and *E*), whereas the rare patterns were predicted well by. This reflects a real failure of the pairwise model in this case and not a numerical artifact of our algorithm (Fig. S1). Moreover, is an accurate model of the population's response to artificial movies without spatial structure, such as spatiotemporal white noise (24) (Fig. 1*F* and Fig. S1). Thus, the inaccuracy of the pairwise model for the retina responding to natural stimuli is the signature of the contribution of higher-order interactions to the neural code of large populations.

High-order interactions, which reflect the tendency of groups of triplets and quadruplets, for example, to fire synchronously (or to be silent together) beyond what can be explained from the pairwise relations between cells, have been studied before in small networks (4, 26, 27, 36, 37). These synchronous events are believed to be crucial for neural coding and vital for the transmission of information to downstream brain regions (4, 6, 7, 10). However, unlike the case of small groups of neurons, where one can systematically explore all interaction orders, how can we find the high-order interactions that define the neural population code in large networks among the exponential number of potential ones?

### Learning the Functional Interactions Underlying the Neural Code from Reliably Sampled Activity Patterns.

We uncovered the structure of the population code of the retina using a network model that gives an extremely accurate approximation to the empirical distribution of population activity patterns. We note that if we were given the true distribution, , over all the network activity patterns, we could then infer the interactions among neurons simply by solving a set of linear equations for the parameters. For a specific activity pattern , substituting the appropriate 0’s and 1’s in Eq. **1**, we get that , where the sums are only over terms where all *x _{i}*s are 1. The interaction parameters that govern the network could then be calculated recursively, starting from the low-order ones and going up (36, 38):

In general, this approach would not be useful for learning large networks, because the number of possible activity patterns grows exponentially with network size. We would therefore rarely see any pattern more than once, regardless of how long we observe the system, and as a result we would not be able to estimate the parameters in Eq. **2**. However, the distribution of pattern occurrences in large networks of neurons had many patterns, both spatial and spatiotemporal ones, that appeared many times during the experiment (Fig. 1*B*). This feature of the pattern count distribution is a direct result of the sparseness of the neural code and the correlation structure of neural activity, and does not hold for other kinds of distributions (Fig. S2).

We can then use the frequently occurring patterns to estimate the interactions governing the joint activity of neural populations accurately and to build the RI model that approximates the distribution of neural population responses:

where are the dependencies of any order among neurons, whose interaction parameters, , are inferred, as in Eq. **2**, from patterns μ that appeared more than *n _{RI}* times (our learning threshold) in the training data. The order of interactions in the model is therefore determined by the nature of the patterns in the data and not by an arbitrarily imposed finite order; higher-order interactions beyond pairs, if needed, would stand for synchronous events that could not be explained by pairwise statistics.

### Sparse Low-Order Interaction Network Model Captures the Code of Large Populations of Neurons.

We learned an RI model for the data from Fig. 1, and the resulting model had far fewer parameters than the pairwise maximum entropy model but included a small set of higher-order interactions. This RI model predicted the probability of the spatial population activity patterns in a 1-h test data extremely well (Fig. 2*A*). Its accuracy was actually within the experimental sampling error of the data itself, which we quantified using the log-likelihood ratio of the model and the empirical data, . The RI model was equally accurate in predicting the spatiotemporal patterns in the population (Fig. 2*B*). For comparison, we also present the poor performance of the independent model and the systematic failure of the pairwise model (Fig. 2 *A* and *B*). We found similar behavior of the models when we used only the subpopulation of “Off” cells in the salamander retina recorded in our experiment (Fig. S3). Moreover, the interactions among the Off cells network were similar to the ones we found using the full network of all ganglion cells recorded (Fig. S3). The RI model remained a highly accurate model for different time bin values used to analyze neural activity (Fig. S4), whereas became even less accurate for larger time bins of 40 ms.

Because of the sparseness of the neural code, the number of interaction parameters that the RI model relies on is very small, much smaller than the number of all possible pairs, and these parameters are low-order ones. The functional architecture of the retina responding to a natural movie, inferred from the RI model, is shown in Fig. 3*A*, overlaid on the physical layout of the receptive field centers of the cells. We found that beyond a selective group of single-cell biases (α* _{i}*) and pairwise interactions (β

*), there were significant higher-order interactions (e.g., γ*

_{ij}*, δ*

_{ijk}*). However, importantly, the model was very sparse, using only hundreds of pairwise and triplet interactions, tens of quadruplets, and a single quintuplet (with a learning threshold of ). The pairwise interactions were mostly positive; that is, cells tended to spike together or to be silent together, and their strength decayed with distance (Fig. 3*

_{ijkl}*D*). These higher-order interactions had a similar decay with distance, but the interactions among nearby cells were mostly negative, indicating that neurons were more silent together than would be expected from pairwise relations alone (Fig. 3

*D*). We note that the decay of interactions with distance is stronger than the decay of spatial correlations in the natural movies we showed the retina (Fig. 3

*D*,

*Inset*). We found a similar decay of pairwise and higher-order temporal interactions among cells with the time difference between them as well as a similar polarity of the interaction sign (Fig. 3

*E*).

### Higher-Order Interactions in the Neural Code of the Retina Responding to Natural Scenes.

We found that the RI model was much better than the pairwise model in describing the retinal population's response to different natural movies (Fig. 4 *A* and *B*). The benefit of the high-order interactions that the RI models identify was even more pronounced when the whole retina patch was presented with the same stimulus, as we found for a “natural pixel” (NatPix) movie in which the stimulus shown was the luminance of one pixel from a natural movie, retaining natural temporal statistics, and for Gaussian full-field movies (*Methods*). The RI model's accuracy in describing the population activity patterns was similar to that of the model (but with far fewer parameters) for spatiotemporal white noise stimuli that had no temporal or spatial correlations (Fig. 4 *A* and *B*). The RI model was also similar to the model in capturing the retinal response to a manipulated natural movie, which retained the pairwise correlations in the movie but with randomized phases (*Methods*). Fig. 4*B* summarizes the average gain in accuracy of the RI model compared with for many groups of neurons in salamander and archer fish retinae responding to artificial, naturalistic, and natural stimuli. Our results indicate that as the stimulus becomes more correlated, the network response becomes more correlated with a growing contribution of high-order interactions; thus, the improvement introduced by the RI model becomes more pronounced. We conclude that the higher-order interactions in the neural response to natural movies are driven by the higher-order statistics in natural scenes. Furthermore, most of the high-order interactions that we found for widely correlated stimuli were gone when we constructed the conditionally independent response of the cells (Fig. 4*C*). This is done by shuffling the responses of each neuron to a repetitive movie, thereby retaining only stimulus-induced correlations among neurons. Thus, beyond the stimulus statistics, the retinal circuitry does play an important role in generating the functional higher-order interactions in the response.

We emphasize that *f _{RI}* was much more accurate than using a considerably smaller number of parameters. In particular, although the full-pairwise model requires almost 5,000 parameters for 99 neurons, the RI model was already more accurate with just 450 parameters (Fig. 4

*D*). The model continued to improve as we added more parameters or lowered the learning threshold

*n*, or when we added training samples (Fig. S5). The cross-validation performance continued to improve even when we used patterns that appeared only twice in the data to train the model, indicating that the model was not overfit (other measures of model performance and validation are shown in Fig. S6). We found similar behavior of the RI and models for spatiotemporal patterns (Fig. S5).

_{RI}We note that the RI model is not guaranteed to be a normalized probability distribution over all the population activity patterns. The reason is that by using only the reliable patterns to learn the model, we are inevitably missing nonzero interactions, corresponding to less common patterns that are not used in the training. Unlike other such pseudolikelihood models, which are common in machine learning and statistics (29, 38, 39), the RI model does give an extremely accurate estimate of the actual probability for almost all patterns observed in the test data. The patterns for which it would fail are typically so rare that we would not see them even in a very long experiment.

### Hierarchical Modular Organization of the Interaction Network Underlying the Code of Large Neural Populations.

For networks much larger than 100 neurons, we expect that no activity pattern will appear more than once even in a very long experiment because of neural noise; thus, the RI approach could not be simply applied. However, because the interactions of all orders decayed with distance between cells, we hypothesized that the RI model might be applied in a hierarchical manner. Indeed, we found that the interactions inferred from smaller overlapping subnetworks of spatially adjacent neurons (based on their receptive fields) were very similar to those estimated using the full network (Fig. 4*E*). This surprising result reflects an overlapping modular structure of neuronal interactions (40, 41); namely, neurons directly interact only with a relatively small set of adjacent neurons, known as their “Markov blanket” (28), and the interaction neighborhood of each neuron overlaps with that of its neighbors. This is not an ad hoc approximation but rather a mathematical equality based on the local connectivity structure of the network (42). This suggests that the RI approach may be scalable and useful to model and analyze much larger networks, also beyond the retina.

### Contribution of Higher-Order Interactions to Stimulus Coding.

The average stimulus preceding the joint spiking events of pairs and of triplets of ganglion cells (pattern-triggered average) was significantly different from what would be expected from conditionally independent cells (Fig. 5 *A*–*C*), reflecting that the population codebook carried information differently than what could be read from the single cells or pairs (4, 7). To demonstrate the role of higher-order interactions in stimulus coding under natural conditions, we presented the retina repetitively with a 50-s natural movie and trained an RI model on the responses of groups of 10 neurons to each 1-s clip of the movie using half of the repeats. We found that these models could discriminate between the clips very accurately (Fig. 5*D*). For larger networks, more samples are required; thus, we presented the same retina with two long and different full-field stimuli: one with natural temporal statistics (NatPix) and one in which each frame was sampled from a Gaussian distribution [full field flicker (FFF); *Methods*]. For each of the stimuli, an RI model was learned from the population responses *f _{NatPix}* and

*f*, respectively. New stimuli (not used for the training) were sampled from both stimulus classes and presented to the retina, and they were then classified according to the log-likelihood ratio, . Using the RI models, we found that we could accurately classify new stimulus segments within a few hundred milliseconds (Fig. 5

_{FFF}*E*) far better than (almost threefold faster) or the independent model (almost 10-fold faster). Finally, we also show that a limited memory-based estimation of

*f*, which is learned using only frequent patterns that appeared at least once every few minutes in the training data (Fig. 5

_{RI}*F*), proves to be highly accurate. This limited memory approach can be readily converted into an online learning algorithm: As new samples are presented, only the parameters corresponding to patterns that appear frequently are retained, whereas parameters learned from patterns that have not appeared in the recent past are completely forgotten.

## Discussion

Using a novel nonnormalized likelihood model, we have found that a sparse low-order interaction network captures the activity of a large neural population in the retina responding to natural stimuli. This RI model gave an extremely accurate prediction of the observed and unobserved patterns using a small number of interaction parameters. The network interaction model further revealed the modular nature of the code, which could also be learned using limited memory/history.

Studies of the “design principles” of the neural code have suggested efficiency, optimality, and robustness as central features (3, 43–45). Our results suggest a possible new feature following (43), namely that the neural code is learnable from examples. What makes the code learnable in our case is the combination of several characteristics. First, the sparseness of neuronal spiking, which has long been suggested as a design principle of the neural code (34), means that even for large networks many population activity patterns appear often enough so that interactions governing the joint activity of the network can be estimated reliably. Second, the network activity as a whole is strongly correlated. Third, although higher-order interactions are present and meaningful (e.g., ref. 27), they are few and do not go to very high orders. Finally, the limited range of direct interactions (small Markov blankets) results in the ability to learn the model from small subnetworks, thus rendering the approach scalable and perhaps applicable to much larger networks.

Based on these observations, we expect that the RI approach may be applicable to brain regions beyond the retina. Any neural population that exhibits highly correlated and sparse activity, as well as a limited range of interaction, may be a good substrate for the RI approach.

The accuracy and simplicity of the RI model in the case of the retinal neural code are related to the fact that although *f _{RI}* predicts the actual probability of almost all patterns we observe in the experiment, it is not a normalized probability distribution over all pattern space. Sacrificing this mathematical completeness means that we can then overcome the computational difficulty of learning probability distributions exactly. Moreover, we note that in many decision or classification problems, it is likelihood ratios rather than the likelihood values that matter. However, unlike other pseudolikelihood models commonly used in statistics and machine learning, the RI model gives an excellent estimate of the partition function

*Z*, because the all-zero pattern is so common.

We reiterate that the functional interactions inferred by the model do not necessarily correspond to physical connections. They may also result from joint input circuitry between cells, unrecorded neurons, or stimulus correlations. In particular, our results indicate that the higher-order interactions are strongly driven by the stimulus statistics. In addition, the abundance of higher-order interactions in the data compared with conditionally independent surrogate data indicates a contribution of retinal processing to higher-order interactions. Regardless of how higher-order interactions arise from the physical retinal circuitry, they are crucial for any observer of the retina to capture the statistics of the neural circuit response accurately. Our recordings did not capture all neurons in the retinal patch. It will be interesting to explore how dense recordings affect both higher-order interactions and the size of the Markov blankets we have found here. Based on the full-field stimuli experiments, we expect that dense populations will exhibit more highly correlated activity with a larger contribution of higher-order interactions, thus increasing the advantage of the RI model over pairwise approaches, similar to our results for NatPix stimuli.

Finally, we point out that the modular structure of the RI model, its computational efficiency, and the fact that it can be updated continuously using limited memory suggest that the RI model may also be a biologically plausible approach to learning and representing the joint activity of large neural populations (34).

## Methods

Full details on experimental procedures, analysis, and modeling are presented in *SI Methods*.

### Electrophysiology.

Experiments were performed on adult tiger salamanders and archer fish in accordance with Ben-Gurion University of the Negev and government regulations. Retinal responses were recorded using a multielectrode array with 252 electrodes (salamander) or 196 electrodes (archer fish). Extracellularly recorded signals were amplified and digitized at 10 kSamples/s. Datasets used in this study had simultaneous recordings of 99, 115, and 101 ganglion cells responding to natural movies; 99, 99, and 68 cells responding to spatiotemporal white noise; 90 cells responding to full-field stimuli; 99 and 83 cells responding to random phase movie from salamander retina; and 38 cells in archer fish.

### Visual Stimulation.

Natural movie clips were acquired using a video camera and converted to gray scale. Stimuli were projected onto the retina from a cathode ray tube video monitor. NatPix stimuli were generated by selecting a random pixel from a natural movie and displaying the intensity of that pixel uniformly on the entire screen.

## Acknowledgments

We thank M. Tsodyks, I. Lampl, R. Paz, O. Barak, M. Shamir, Y. Pilpel, G. Tkacik, and W. Bialek for discussions and comments on the manuscript. This work was supported by the Israel Science Foundation and The Center for Complexity Science. E.S. is supported by the Minerva Foundation, ERASysBio+ program, The Clore Center for Biological Physics, and The Peter and Patricia Gruber Foundation.

## Footnotes

^{1}To whom correspondence may be addressed. E-mail: ronensgv{at}bgu.ac.il or elad.schneidman{at}weizmann.ac.il.

Author contributions: E.G., R.S., and E.S. designed research; E.G., R.S., and E.S. performed research; E.G., R.S., and E.S. analyzed data; and E.G., R.S., and E.S. wrote the paper.

The authors declare no conflict of interest.

↵*This Direct Submission article had a prearranged editor.

This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1019641108/-/DCSupplemental.

## References

- ↵
- Perkel DH,
- Bullock TH

- ↵
- Georgopoulos AP,
- Schwartz AB,
- Kettner RE

- ↵
- Rieke F,
- Warland D,
- de Ruyter van Steveninck RR,
- Bialek W

- ↵
- ↵
- ↵
- Riehle A,
- Grün S,
- Diesmann M,
- Aertsen A

- ↵
- ↵
- Hatsopoulos NG,
- Ojakangas CL,
- Paninski L,
- Donoghue JP

- ↵
- Brown EN,
- Frank LM,
- Tang D,
- Quirk MC,
- Wilson MA

- ↵
- ↵
- ↵
- Schneidman E,
- Bialek W,
- Berry MJ 2nd.

- ↵
- Narayanan NS,
- Kimchi EY,
- Laubach M

- ↵
- ↵
- ↵
- ↵
- Hopfield JJ,
- Tank DW

- ↵
- LeMasson G,
- Marder E,
- Abbott LF

- ↵
- ↵
- Shlens J,
- et al.

- ↵
- Tkacik G,
- Schneidman E,
- Berry MJ 2nd.,
- Bialek W

- ↵
- Tang A,
- et al.

- ↵
- ↵
- Shlens J,
- et al.

- ↵
- ↵
- ↵
- ↵
- Pearl J

- ↵
- ↵
- ↵
- Broderick T,
- Dudík M,
- Tkačik G,
- Schapire RE,
- Bialek W

- ↵
- Cocco S,
- Leibler S,
- Monasson R

- ↵
- Bethge M,
- Berens P

- ↵
- ↵
- Bair W,
- Zohary E,
- Newsome WT

- ↵
- ↵
- ↵
- Besag J

- ↵
- Besag J

- ↵
- Santos GS,
- Gireesh ED,
- Plenz D,
- Nakahara H

- ↵
- Ganmor E,
- Segev R,
- Schneidman E

- ↵
- Abbeel P,
- Koller D,
- Ng AY

- ↵
- ↵
- ↵
- Turrigiano G,
- Abbott LF,
- Marder E

## Citation Manager Formats

## Article Classifications

- Biological Sciences
- Neuroscience