# Optimal population coding by noisy spiking neurons

Edited* by Curtis G. Callan, Princeton University, Princeton, NJ, and approved June 30, 2010 (received for review April 10, 2010)

^{1}V.B. and E.S. contributed equally to this work.

## Abstract

In retina and in cortical slice the collective response of spiking neural populations is well described by “maximum-entropy” models in which only pairs of neurons interact. We asked, how should such interactions be organized to maximize the amount of information represented in population responses? To this end, we extended the linear-nonlinear-Poisson model of single neural response to include pairwise interactions, yielding a stimulus-dependent, pairwise maximum-entropy model. We found that as we varied the noise level in single neurons and the distribution of network inputs, the optimal pairwise interactions smoothly interpolated to achieve network functions that are usually regarded as discrete—stimulus decorrelation, error correction, and independent encoding. These functions reflected a trade-off between efficient consumption of finite neural bandwidth and the use of redundancy to mitigate noise. Spontaneous activity in the optimal network reflected stimulus-induced activity patterns, and single-neuron response variability overestimated network noise. Our analysis suggests that rather than having a single coding principle hardwired in their architecture, networks in the brain should adapt their function to changing noise and stimulus correlations.

Populations of sensory neurons encode information about stimuli into sequences of action potentials, or spikes (1). Experiments with pairs or small groups of neurons have observed many different coding strategies (2–6): (*i*) independence, where each neuron responds independently to the stimulus, (*ii*) decorrelation, where neurons interact to give a decorrelated representation of the stimulus, (*iii*) error correction, where neurons respond redundantly, in patterns, to combat noise, and (*iv*) synergistic coding, where population activity patterns carry information unavailable from separate neurons.

How *should* a network arrange its interactions to best represent an ensemble of stimuli? Theoretically, there has been controversy over what is the “correct” design principle for neural population codes (7–11). On the one hand, neurons have a limited repertoire of response patterns, and information is maximized by using each neuron to represent a different aspect of the stimulus. To achieve this, interactions in a network should be organized to remove correlations in network inputs and thus create a decorrelated network response. On the other hand, neurons are noisy, and noise is combatted via redundancy, where different patterns related by noise encode the same stimulus. To achieve this, interactions in a network should be organized to exploit existing correlations in neural inputs to compensate for noise-induced errors. Such a trade-off between decorrelation and noise reduction possibly accounts for the organization of several biological information processing systems, e.g., the adaptation of center-surround receptive fields to ambient light intensity (12–14), the structure of retinal ganglion cell mosaics (15–18), and the genetic regulatory network in a developing fruit fly (19, 20). In engineered systems, compression (to decorrelate the incoming data stream), followed by reintroduction of error-correcting redundancy, is an established way of building efficient codes (21).

Here we study optimal coding by networks of noisy neurons with an architecture experimentally observed in retina, cortical culture, and cortical slice—i.e., pairwise functional interactions between cells that give rise to a joint response distribution resembling the “Ising model” of statistical physics (6, 22–26). We extended such models to make them stimulus dependent, thus constructing a simple model of stimulus-driven, pairwise-interacting, noisy, spiking neurons. When the interactions are weak, our model reduces to a set of conventional linear-nonlinear neurons, which are conditionally independent given the stimulus. We asked how internal connectivity within such a network should be tuned to the statistical structure of inputs, given noise in the system, in order to maximize represented information.

We found that as noise and stimulus correlations varied, an optimal pairwise-coupled network should choose continuously among independent coding, stimulus decorrelation, and redundant error correction, instead of having a single universal coding principle hardwired in the network architecture. In the high-noise regime, the resulting optimal codes have a rich structure organized around “attractor patterns,” reminiscent of memories in a Hopfield network. The optimal code has the property that decoding can be achieved by observing a subset of the active neural population. As a corollary, noise measured in responses of single neurons can significantly overestimate network noise, by ignoring error-correcting redundancy. Our results suggest that networks in the brain should adapt their encoding strategies as stimulus correlations or noise levels change.

## Ising Models for Networks of Neurons

In the analysis of experimental data from simultaneously recorded neurons, one discretizes spike trains *σ*_{i}(*t*) for *i* = 1,…,*N* neurons into small time bins of duration Δ*t*. Then *σ*_{i}(*t*) = 1 indicates that the neuron *i* has fired in time bin *t*, and *σ*_{i}(*t*) = -1 indicates silence. To describe network activity, we must consider the joint probability distribution over *N*-bit binary responses of the neurons, *P*({*σ*_{i}}), over the course of an experiment. Specifying a general distribution requires an exponential number of parameters, but for retina, cortical culture, and cortical slice, *P*({*σ*_{i}}) is well approximated by the minimal model that accounts for the observed mean firing rates and covariances (6, 22–26). This minimal model is a *maximum-entropy* distribution (27) and can be written in the Ising form

$$P(\{\sigma_i\}) = \frac{1}{Z}\exp\left(\sum_i h_i\sigma_i + \frac{1}{2}\sum_{i\neq j} J_{ij}\sigma_i\sigma_j\right). \tag{1}$$

Here the *h*_{i} describe intrinsic biases for neurons to fire, and *J*_{ij} = *J*_{ji} are pairwise interaction terms, describing the effect of neuron *i* on neuron *j* and vice versa. We emphasize that the *J*_{ij} model functional dependencies, not physical connections. The denominator *Z*, or *partition function*, normalizes the probability distribution. The model can be fit to data by finding couplings **g** = {*h*_{i}, *J*_{ij}}, for which the mean firing rates 〈*σ*_{i}〉 and covariances *C*_{ij} = 〈*σ*_{i}*σ*_{j}〉-〈*σ*_{i}〉〈*σ*_{j}〉 over *P*({*σ*_{i}}) match the measured values (6, 11, 23, 24).
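For small *N*, the Ising form of Eq. **1** can be evaluated by enumerating all 2^{N} patterns. The following is a minimal sketch, not the authors' code: the function names (`all_patterns`, `ising_distribution`, `moments`) and the parameter values are hypothetical, and the pair term is assumed to be written as half a sum over *i* ≠ *j* with a symmetric, zero-diagonal *J*.

```python
import itertools
import numpy as np

def all_patterns(n):
    """All 2**n response patterns with entries -1 (silent) or +1 (spike)."""
    return np.array(list(itertools.product([-1, 1], repeat=n)), dtype=float)

def ising_distribution(h, J):
    """Eq. 1 by brute-force enumeration; feasible only for small N."""
    sigma = all_patterns(len(h))
    # J is symmetric with zero diagonal, so the factor 0.5 counts each pair once.
    log_w = sigma @ h + 0.5 * np.einsum("ki,ij,kj->k", sigma, J, sigma)
    w = np.exp(log_w - log_w.max())   # shift for numerical stability
    return sigma, w / w.sum()         # w.sum() plays the role of Z

def moments(sigma, p):
    """Means <sigma_i> and covariances C_ij under the distribution p."""
    mean = p @ sigma
    cov = sigma.T @ (sigma * p[:, None]) - np.outer(mean, mean)
    return mean, cov

# Hypothetical parameters for three neurons (illustrative values only).
h = np.array([0.1, -0.2, 0.0])
J = np.array([[0.0, 0.5, 0.0],
              [0.5, 0.0, -0.3],
              [0.0, -0.3, 0.0]])
sigma, p = ising_distribution(h, J)
mean, cov = moments(sigma, p)
print(mean)
print(cov)
```

Fitting the model to data would mean adjusting `h` and `J` until `mean` and `cov` match the measured values; that search is not shown here.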

This Ising-like model can be extended to incorporate the stimulus (*s*) dependence of neural responses by making the model parameters depend on *s*. We considered models where only the firing biases *h*_{i} depend on *s*:

$$P(\{\sigma_i\}\mid s) = \frac{1}{Z(s)}\exp\left\{\beta\left[\sum_i \left(h_i^0 + h_i(s)\right)\sigma_i + \frac{1}{2}\sum_{i\neq j} J_{ij}\sigma_i\sigma_j\right]\right\}. \tag{2}$$

Here *h*_{i}^{0} is a constant (stimulus-independent) firing bias, and *h*_{i}(*s*) is a stimulus-dependent firing bias. The parameter β, which we call “neural reliability,” is reminiscent of the inverse temperature in statistical physics and reflects the signal-to-noise ratio in the model (9). Here, noise might arise from ion channel noise, unreliable synaptic transmission, and influences from unobserved parts of the network. As *β* → ∞, neurons become deterministic and spike whenever the quantities (*h*_{i}^{0} + *h*_{i}(*s*) + ∑_{j≠i} *J*_{ij}*σ*_{j}) are positive and are silent otherwise. As *β* → 0, neurons are completely noisy and respond randomly to inputs. Thus, β parametrizes the reliability of neurons in the model: larger β leads to more reliable responses, and lower β leads to less reliable, noisier responses.
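The two limits of β can be checked numerically. The sketch below assumes the form of Eq. **2** with the reliability β multiplying both the bias and the coupling terms; the function name `conditional_distribution` and all numerical values are hypothetical.

```python
import itertools
import numpy as np

def conditional_distribution(h_stim, h0, J, beta):
    """P(sigma | s) from Eq. 2 for one stimulus, by enumerating all patterns."""
    sigma = np.array(list(itertools.product([-1, 1], repeat=len(h0))), dtype=float)
    drive = np.asarray(h0, dtype=float) + np.asarray(h_stim, dtype=float)
    pair = 0.5 * np.einsum("ki,ij,kj->k", sigma, J, sigma)
    log_w = beta * (sigma @ drive + pair)
    w = np.exp(log_w - log_w.max())   # shift for numerical stability
    return sigma, w / w.sum()

# Hypothetical two-neuron example.
h0 = np.zeros(2)
J = np.array([[0.0, 0.4],
              [0.4, 0.0]])
h_stim = np.array([1.0, -0.5])

# beta = 0: responses ignore the input and are uniform over patterns.
_, p_noisy = conditional_distribution(h_stim, h0, J, beta=0.0)

# Large beta: probability concentrates on the pattern maximizing the exponent.
sigma, p_reliable = conditional_distribution(h_stim, h0, J, beta=50.0)
best = sigma[np.argmax(p_reliable)]
print(p_noisy, best, p_reliable.max())
```

In this example the second neuron is silent at large β because its net drive, input plus coupling to the active first neuron, is negative (-0.5 + 0.4).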

The stimuli *s* are drawn from a distribution *P*_{s}(*s*), which defines the stimulus ensemble. Our analysis will investigate how *J*_{ij} should vary with the statistics of the stimulus ensemble and neural reliability (β) in order to maximize information represented in neural responses. As such, *J*_{ij} will not depend on specific stimuli within an ensemble.

In the absence of pairwise couplings (*J*_{ij} = 0), the model describes stimulus-driven neural responses that are conditionally independent given the stimulus:

$$P(\{\sigma_i\}\mid s) = \prod_i \frac{\exp\left[\beta\left(h_i^0 + h_i(s)\right)\sigma_i\right]}{2\cosh\left[\beta\left(h_i^0 + h_i(s)\right)\right]}, \tag{3}$$

$$\langle\sigma_i\rangle_s = \tanh\left[\beta\left(h_i^0 + h_i(s)\right)\right]. \tag{4}$$

Then, writing the stimulus-dependent drive *h*(*s*) as a convolution of a stimulus sequence *s*(*t*) with a linear filter (e.g., a kernel obtained using reverse correlation), Eq. **4** describes a conventional linear-nonlinear (LN) model for independent neurons with saturating nonlinearities given by tanh functions (shaped similarly to sigmoids). The bias of neurons is controlled by *h*_{i}^{0}, and the steepness of the nonlinearity by β. Thus, our model (Eq. **2**) can be regarded as the simplest extension of the classic LN model of neural response to pairwise interactions.
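As a worked step, the independent limit follows directly from Eq. **2**. This is a sketch using only the definitions above; the shorthand $a_i$ is introduced here for convenience and is not notation from the paper. With *J*_{ij} = 0 the exponent is a sum over neurons, so the numerator and the normalization both factorize:

$$P(\{\sigma_i\}\mid s) = \frac{\exp\left(\beta\sum_i a_i\sigma_i\right)}{\sum_{\{\sigma'_i\}}\exp\left(\beta\sum_i a_i\sigma'_i\right)} = \prod_i \frac{e^{\beta a_i\sigma_i}}{e^{\beta a_i} + e^{-\beta a_i}}, \qquad a_i \equiv h_i^0 + h_i(s).$$

The mean response of each neuron is then

$$\langle\sigma_i\rangle_s = \sum_{\sigma_i = \pm 1}\sigma_i\,\frac{e^{\beta a_i\sigma_i}}{2\cosh(\beta a_i)} = \frac{e^{\beta a_i} - e^{-\beta a_i}}{2\cosh(\beta a_i)} = \tanh(\beta a_i),$$

which is a saturating nonlinearity applied to the linearly filtered stimulus, with slope at the origin set by β.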

We will regard a given environment as being characterized by a stationary stimulus distribution *P*_{s}(*s*). In our model, the stimulus makes its way into neuronal responses via the bias toward firing *h*_{i}(*s*). Thus, for our purposes, a fixed environment can equally be characterized by the distribution of *h*_{i}, *P*_{h}({*h*_{i}}), implied by the distribution over *s*. So we will use the distribution *P*_{h}({*h*_{i}}) to characterize the stimulus ensemble from a fixed environment. The correlations in *P*_{h}({*h*_{i}}) can arise both from correlations in the external stimulus (*s*) as well as inputs shared between neurons in our network (28). We will show that given such a stimulus ensemble, and neural reliability characterized by β, information represented in network responses is maximized when the couplings are appropriately adapted to the stimulus statistics. In this way, the couplings effectively serve as an “internal representation” or “memory” of the environment, allowing the network to adjust its encoding strategy.

## Maximizing Represented Information

Let *N* neurons probabilistically encode information about stimuli in responses {*σ*_{i}} distributed as Eq. **2** (see Fig. 1). The amount of information about the inputs {*h*_{i}} encoded in {*σ*_{i}} is measured by the mutual information (29):

$$I(\{\sigma_i\};\{h_i\}) = \int \prod_i dh_i\, P_h(\{h_i\}) \sum_{\{\sigma_i\}} P(\{\sigma_i\}\mid\{h_i\})\,\log_2\frac{P(\{\sigma_i\}\mid\{h_i\})}{P(\{\sigma_i\})}, \tag{5}$$

where the conditional distribution of responses is given by Eq. **2** and the distribution of responses, *P*({*σ*_{i}}), is given by $P(\{\sigma_i\}) = \int \prod_i dh_i\, P_h(\{h_i\})\, P(\{\sigma_i\}\mid\{h_i\})$. This mutual information is an upper bound to how much downstream layers, receiving binary words {*σ*_{i}}, can learn about the world (1). Because of noise and ineffective decoding by neural “hardware,” the actual amount of information used to guide behavior can be smaller, but not bigger, than Eq. **5**.

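Eq. **5** is commonly rewritten as a difference between the entropy of the distribution over all patterns (sometimes called “output entropy”) and the average entropy of the conditional distributions (sometimes called the “noise entropy”):

$$I(\{\sigma_i\};\{h_i\}) = S\left[P(\{\sigma_i\})\right] - \left\langle S\left[P(\{\sigma_i\}\mid\{h_i\})\right]\right\rangle_{\{h_i\}}, \tag{6}$$

where the entropy of a distribution, $S[P(x)] = -\sum_x P(x)\log_2 P(x)$, measures uncertainty about the value of *x* in bits.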
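For a discrete input ensemble and a small network, both forms of the mutual information can be computed by enumeration and compared. This is a minimal sketch under stated assumptions: the input ensemble, couplings, and function names (`response_table`, `information_eq5`, `information_eq6`) are hypothetical, and the integral over inputs is replaced by a sum over *K* equiprobable input vectors.

```python
import itertools
import numpy as np

def entropy_bits(p):
    """Shannon entropy in bits, ignoring zero-probability entries."""
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

def response_table(inputs, h0, J, beta):
    """Row k holds P(sigma | h = inputs[k]) from Eq. 2 over all 2**N patterns."""
    sigma = np.array(list(itertools.product([-1, 1], repeat=len(h0))), dtype=float)
    pair = 0.5 * np.einsum("ki,ij,kj->k", sigma, J, sigma)
    log_w = beta * ((inputs + h0) @ sigma.T + pair)
    w = np.exp(log_w - log_w.max(axis=1, keepdims=True))
    return w / w.sum(axis=1, keepdims=True)

def information_eq5(p_h, p_cond):
    """Mutual information as an average log-ratio (form of Eq. 5)."""
    p_out = p_h @ p_cond
    joint = p_h[:, None] * p_cond
    mask = joint > 0
    return float(np.sum(joint[mask] * np.log2((p_cond / p_out)[mask])))

def information_eq6(p_h, p_cond):
    """Output entropy minus average noise entropy (form of Eq. 6)."""
    output_entropy = entropy_bits(p_h @ p_cond)
    noise_entropy = float(sum(w * entropy_bits(row) for w, row in zip(p_h, p_cond)))
    return output_entropy - noise_entropy, output_entropy, noise_entropy

# Hypothetical ensemble: K = 4 equiprobable inputs to N = 3 neurons.
inputs = np.array([[1, 1, 1], [1, -1, -1], [-1, 1, -1], [-1, -1, 1]], dtype=float)
p_h = np.full(4, 0.25)
h0 = np.zeros(3)
J = 0.3 * (np.ones((3, 3)) - np.eye(3))

p_cond = response_table(inputs, h0, J, beta=1.0)
i5 = information_eq5(p_h, p_cond)
i6, s_out, s_noise = information_eq6(p_h, p_cond)
print(i5, i6, s_out, s_noise)
```

The two values agree up to rounding, and neither can exceed the input entropy of log_{2} 4 = 2 bits for this ensemble.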

If neurons spiked deterministically (*β* → ∞), the noise entropy in Eq. **6** would be zero, and maximizing mutual information between inputs and outputs would amount to maximizing the output entropy *S*[*P*({*σ*_{i}})]. This special case of information maximization without noise is equivalent to all-order decorrelation of the outputs. It has been used for continuous transformations by Linsker (30) and Bell and Sejnowski (31), among others, to describe independent component analysis (ICA) as a general formulation for blind source separation and deconvolution. In contrast, here we examine a scenario where noise in the neural code cannot be neglected. In this setting, redundancy can serve a useful function in combating uncertainty due to noise (10). As we will see, information-maximizing networks in our scenario use interactions between neurons to minimize the effects of noise, at the cost of reducing the output entropy of the population.

Our problem can thus be compactly stated as follows. Given the distribution of inputs, *P*_{h}({*h*_{i}}), and the neural reliability β, find the parameters **g** = {*h*_{i}^{0}, *J*_{ij}} such that the mutual information *I*({*σ*_{i}}; {*h*_{i}}) between the inputs and the binary output words is maximized.

## Results

### Two Coupled Neurons.

We start with the simple case of two neurons, responding to inputs drawn from two different distributions *P*_{h}(*h*_{1},*h*_{2}). The first is the *binary* distribution, where *h*_{1,2} take one of two equally likely discrete values (± 1), with a covariance Cov(*h*_{1},*h*_{2}) = *α* (useful when the biological correlate of the input is the spiking of upstream neurons). In this case *P*_{h}(-1,-1) = *P*_{h}(1,1) = (1 + *α*)/4 and *P*_{h}(-1,1) = *P*_{h}(1,-1) = (1 - *α*)/4.

The second is a *Gaussian* distribution, where inputs take a continuum of values (useful when the input is a convolution of a stimulus with a receptive field). In this case, we also take the means to vanish (〈*h*_{1}〉 = 〈*h*_{2}〉 = 0), unit standard deviations (*σ*_{h1} = *σ*_{h2} = 1), and covariance Cov(*h*_{1},*h*_{2}) = *α*. In both cases, α measures *input correlation* and ranges from -1 (perfectly anticorrelated) to 1 (perfectly correlated). We asked what interaction strength *J* between the two neurons (Fig. 2*A* and Eq. **2**) would maximize information, as the correlation in the input ensemble (parameterized by α) and the reliability of neurons (parameterized by β) were varied.

For the binary input distribution, the mutual information of Eq. **5** can be computed exactly as a function of *J*, α, and β (see *SI Appendix*), and the optimal coupling *J*^{∗}(*α*,*β*) is obtained by maximizing this quantity for each α and β (Fig. 2*B*). When β is small, the optimal coupling takes the same sign as the input covariance. In this case, interactions between the two neurons enhance the correlation present in the stimulus. The resulting redundancy helps counteract loss of information to noise. As reliability (β) increases, the optimal coupling *J*^{∗} decreases in magnitude as compared to the input strength (see *Discussion*). This is because, in the absence of noise, a pair of binary neurons has the capacity to carry complete information about a pair of binary inputs. Thus, in the noise-free limit the neurons should act as independent encoders (*J*^{∗} = 0) of binary inputs.
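The two-neuron optimization with binary inputs can be sketched as a one-dimensional search over *J*. The sketch assumes zero constant biases (consistent with neurons being active about half of the time), a simple grid search rather than the exact analysis in the *SI Appendix*, and hypothetical function names and grid limits.

```python
import numpy as np

# The four response patterns (sigma_1, sigma_2); also the four binary inputs.
SIGMA = np.array([[-1, -1], [-1, 1], [1, -1], [1, 1]], dtype=float)

def binary_ensemble(alpha):
    """Inputs (h1, h2) in {-1, +1}^2 with covariance alpha."""
    h = SIGMA.copy()
    p_h = np.where(h[:, 0] == h[:, 1], (1 + alpha) / 4, (1 - alpha) / 4)
    return h, p_h

def information(J, beta, h, p_h):
    """Mutual information in bits for coupling J, with zero constant bias."""
    log_w = beta * (h @ SIGMA.T + J * SIGMA[:, 0] * SIGMA[:, 1])
    log_w -= log_w.max(axis=1, keepdims=True)
    log_p_cond = log_w - np.log(np.exp(log_w).sum(axis=1, keepdims=True))
    p_cond = np.exp(log_p_cond)
    log_p_out = np.log(p_h @ p_cond)
    return float(np.sum(p_h[:, None] * p_cond * (log_p_cond - log_p_out)) / np.log(2))

def optimal_coupling(alpha, beta, grid=np.linspace(-3, 3, 601)):
    """Grid search for the coupling that maximizes information."""
    h, p_h = binary_ensemble(alpha)
    info = np.array([information(J, beta, h, p_h) for J in grid])
    k = int(np.argmax(info))
    return grid[k], info[k]

# Unreliable neurons (low beta) with positively correlated inputs.
J_star, I_star = optimal_coupling(alpha=0.5, beta=0.3)
h, p_h = binary_ensemble(0.5)
I_uncoupled = information(0.0, 0.3, h, p_h)
print(J_star, I_star, I_uncoupled)
```

If the maximum lands on the edge of the grid, the grid limits (an arbitrary choice here) should be widened before reading `J_star` as the optimum; the sign of the coupling is the robust quantity in this sketch.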

For a Gaussian distribution of inputs, we maximized the mutual information in Eq. **5** numerically (Fig. 2 *C* and *D*). For small β, the optimal coupling *J*^{∗} has the same sign as the input correlation, as in the binary input case, thus enhancing input correlations and using redundancy to counteract noise. However, for large β, the optimal coupling has a sign *opposite* to the input correlation. Thus the neural output decorrelates its inputs (Fig. 2*E*). This occurs because binary neurons do not have the capacity to encode all the information in continuous inputs. Therefore, in the absence of noise, the best strategy is to decorrelate inputs to avoid redundant encoding of information. The crossover in strategies is at *β* ∼ 1 and is driven by the balance of output and noise entropies in Eq. **6**, as shown in Fig. S1. In all regimes more information is conveyed with the optimal coupling (*J*^{∗}) than by an independent (*J* = 0) network. The information gain produced by this interaction is larger for strongly correlated inputs (Fig. 2*F*).
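For Gaussian inputs the integral over inputs has no closed form, so it has to be approximated. One possible approach, shown here as an assumption rather than the authors' numerical method, is a tensor Gauss-Hermite quadrature over two standard normal variables that are then correlated with a Cholesky factor. The function names, quadrature order, and grid limits are hypothetical.

```python
import numpy as np

SIGMA = np.array([[-1, -1], [-1, 1], [1, -1], [1, 1]], dtype=float)

def gaussian_ensemble(alpha, order=30):
    """Quadrature nodes and weights for unit-variance inputs with covariance alpha."""
    z, w = np.polynomial.hermite_e.hermegauss(order)
    w = w / w.sum()                       # normalize to a probability weight
    z1, z2 = np.meshgrid(z, z, indexing="ij")
    weights = np.outer(w, w).ravel()
    L = np.linalg.cholesky(np.array([[1.0, alpha], [alpha, 1.0]]))
    h = np.column_stack([z1.ravel(), z2.ravel()]) @ L.T   # h = L z
    return h, weights

def information(J, beta, h, p_h):
    """Mutual information in bits for coupling J, with zero constant bias."""
    log_w = beta * (h @ SIGMA.T + J * SIGMA[:, 0] * SIGMA[:, 1])
    log_w -= log_w.max(axis=1, keepdims=True)
    log_p_cond = log_w - np.log(np.exp(log_w).sum(axis=1, keepdims=True))
    p_cond = np.exp(log_p_cond)
    log_p_out = np.log(p_h @ p_cond)
    return float(np.sum(p_h[:, None] * p_cond * (log_p_cond - log_p_out)) / np.log(2))

def optimal_coupling(alpha, beta, grid=np.linspace(-3, 3, 121)):
    """Grid search for the information-maximizing coupling."""
    h, p_h = gaussian_ensemble(alpha)
    info = np.array([information(J, beta, h, p_h) for J in grid])
    k = int(np.argmax(info))
    return grid[k], info[k]

h, w = gaussian_ensemble(0.7)
J_low, I_low = optimal_coupling(alpha=0.7, beta=0.2)
print(J_low, I_low, information(0.0, 0.2, h, w))
```

The quadrature is accurate when the response probabilities vary smoothly across nodes, which holds at low β. Probing the sign change of *J*^{∗} reported near *β* ∼ 1 may need a higher quadrature order or a different integration scheme, since the tanh-like response sharpens as β grows.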

For both binary and Gaussian stimulus ensembles, the biases toward firing (*h*_{i}^{0}) in the optimal network adjusted themselves so that individual neurons were active about half of the time (see *SI Appendix*). Adding a constraint on the mean firing rates would shift the values of *h*_{i}^{0} in the optimal network, but would leave the results for the optimal coupling *J*^{∗} qualitatively unchanged.

Thus, information represented by a pair of neurons is maximized if their interaction is adjusted to implement different functions (independence, decorrelation to remove redundancy, and averaging to reduce noise) depending on the input distribution and neural reliability.

### Networks of Neurons.

We then asked what would be the optimal interaction network for larger populations of neurons. First, we considered a network of *N* neurons responding to an input ensemble of *K* equiprobable *N*-bit binary patterns chosen randomly from the set of 2^{N} such patterns. For *N* ≲ 10 it remained possible to numerically choose couplings *h*_{i}^{0} and *J*_{ij} that maximized information about the input ensemble represented in network responses. We found qualitatively similar results to two neurons responding to a binary stimulus: For unreliable neurons (low β), the optimal network interactions matched the sign of input correlations, and for reliable neurons (high β), neurons became independent encoders. Input decorrelation was never an optimal strategy, and the capacity of the network to yield substantial improvements in information transmission was greatest when *K* ∼ *N* (see *SI Appendix*). Our results suggest that decorrelation will never appear as an optimal strategy if the input entropy is less than or equal to the maximum output entropy.

We then examined the optimal network encoding correlated Gaussian inputs drawn from a distribution with zero mean and a fixed covariance matrix. The covariance matrix was chosen at random from an ensemble of symmetric matrices with exponentially distributed eigenvalues (*SI Appendix*). As for the case of binary inputs, we numerically searched the space of **g** for a choice maximizing the information for *N* = 10 neurons and different values of neural reliability β. As β is changed, the optimal (*J*^{∗}) and uncoupled (*J* = 0) networks behave very differently. In the uncoupled case (Fig. 3*A*), decreasing β increases both the output and noise entropies monotonically. In the optimal case (Fig. 3*B*), the noise entropy can be kept constant and low by the correct choice of couplings *J*^{∗}, at the expense of losing some output entropy. The difference of these two entropies is the information, plotted in Fig. 3*C*. At low neural reliability β, the total information transmitted is low, but substantial relative increases (almost twofold) are possible by the optimal choice of couplings. The optimal couplings are positively correlated with their inputs, generating a redundant code to reduce the impact of noise (Fig. 3*D*). At high β, the total information transmitted is high, and optimal couplings yield smaller, but still significant, relative improvements (∼10%). The couplings in this case are anticorrelated with the inputs, and the network performs input decorrelation.

For unreliable neurons our results give evidence that the network uses redundant coding to compensate for errors. But theoretically there are many different kinds of redundant error-correcting codes—e.g., codes with checksums vs. codes that distribute information widely over a population. Thus we sought to characterize more precisely the structure of our optimal network codes.

### The Structure of the Optimal Code, Ongoing Activity, and the Emergence of Metastable States.

How does the optimal network match its code to the stimulus ensemble? Intuitively, the optimal network has “learned” something about its inputs by adjusting the couplings. Without an input, a signature of this learning should appear in correlated *spontaneous* network activity. Fig. 4 *A* and *B* *Top* shows the distributions of ongoing, stimulus-free activity patterns, *P*({*σ*_{i}}|*h* = 0), of the noninteracting network (*J* = 0) and those of a network that is optimally matched to stimuli (*J*^{∗}). While the activity of the *J* = 0 network is uniform over all patterns, the ongoing activity of the optimized network echoes the responses to stimuli.

To make this intuition precise and interpret the structure of the optimal code, it is useful to carefully examine the coding patterns in the stimulus-free condition. We find that the ability of the optimal network to adjust the couplings to the stimulus ensemble makes certain response patterns a priori much more likely than others. Specifically, the couplings generate a probability landscape over response patterns that can be partitioned into basins of attraction (see *SI Appendix*). The basins are organized around patterns with locally maximal likelihood (ML). For these ML patterns, flipping any of the neurons (from spiking to silence or vice versa) results in a less likely pattern. For all other patterns within the same basin, their neurons can be flipped such that successively more likely patterns are generated, until the corresponding ML pattern is reached.
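The partition into basins can be sketched as a greedy ascent: from any pattern, keep flipping the single neuron that increases the stimulus-free likelihood the most, and stop when no flip helps. The steepest-ascent rule and its tie-breaking (lowest neuron index) are assumptions made here for concreteness; the procedure actually used is described in the *SI Appendix*. Function names and the example couplings are hypothetical.

```python
import itertools
import numpy as np

def ascend_to_ml_pattern(sigma, h0, J):
    """Follow single-neuron flips uphill in log P(sigma | h = 0) to a local maximum."""
    sigma = np.array(sigma, dtype=float)
    while True:
        # Change in the exponent of Eq. 2 (at zero input) when neuron i is flipped:
        # -2 * sigma_i * (h0_i + sum_j J_ij sigma_j); beta > 0 only rescales it.
        gain = -2.0 * sigma * (h0 + J @ sigma)
        i = int(np.argmax(gain))
        if gain[i] <= 0:              # no flip gives a more likely pattern
            return tuple(int(s) for s in sigma)
        sigma[i] = -sigma[i]

# Hypothetical network: four neurons with uniform positive couplings, no bias.
n = 4
h0 = np.zeros(n)
J = 0.5 * (np.ones((n, n)) - np.eye(n))

basins = {}
for pattern in itertools.product([-1, 1], repeat=n):
    basins[pattern] = ascend_to_ml_pattern(pattern, h0, J)

ml_patterns = set(basins.values())
print(ml_patterns)
```

With uniformly positive couplings the only locally most likely patterns are "all spiking" and "all silent", so every one of the 16 patterns falls into one of two basins.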

In optimal networks, when no stimulus is applied, the ML patterns have likelihoods of comparable magnitude, but when a particular input is chosen, it will bias the prior probability landscape, making one of these ML patterns the most likely response (Fig. 4*B* *Bottom*). This maps similar stimuli into the same ML basin, while increasing the separation between responses coding for very different stimuli. Overall this improves information transmission. We used the Jensen–Shannon distance to quantify discriminability of responses in an optimal network, compared to the uncoupled (*J* = 0) network, as a function of neural reliability β (Fig. 4*C*).* For high reliability, the independent and optimized networks had similarly separable responses, whereas at low reliability, the responses of the optimized network were much more discriminable from each other.
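For discrete response distributions, the Jensen–Shannon distance between two distributions *p* and *q* averages the Kullback–Leibler divergences of each from their mixture *m* = (*p* + *q*)/2. A minimal sketch follows; the function name is hypothetical and base-2 logarithms are used so that the result lies between 0 and 1.

```python
import numpy as np

def jensen_shannon(p, q):
    """Jensen-Shannon distance in bits between two discrete distributions."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    m = 0.5 * (p + q)

    def kl(a, b):
        mask = a > 0                  # terms with a = 0 contribute nothing
        return np.sum(a[mask] * np.log2(a[mask] / b[mask]))

    return float(0.5 * kl(p, m) + 0.5 * kl(q, m))

same = jensen_shannon([0.5, 0.5], [0.5, 0.5])        # identical distributions
disjoint = jensen_shannon([1.0, 0.0], [0.0, 1.0])    # non-overlapping distributions
print(same, disjoint)
```

Applied to the response distributions evoked by two stimuli, a value near 1 means the responses are almost perfectly discriminable, and a value near 0 means they are nearly indistinguishable.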

The appearance of ML patterns is reminiscent of the storage of memories in dynamical “basins of attraction” for the activity in a Hopfield network (32) (for a detailed comparison, see *SI Appendix*). We therefore considered the hypothesis that in the optimal network a given stimulus could be encoded not only by the ML pattern itself, but redundantly by all the patterns within a basin surrounding this ML pattern. Since ML patterns are local likelihood maxima, the noise alone is unlikely to induce a spontaneous transition from one basin to the next, making the basins of attraction potentially useful as stable and reliable representations of the stimuli.

To check this hypothesis, we quantified how much information about the stimulus was carried by the identity of the basins surrounding the ML patterns, as opposed to the detailed activity patterns of the network (Fig. 5*A*). To do this, we first mapped each neural response {*σ*_{i}} to its associated basin of attraction indexed by the ML pattern within it. In effect, this procedure “compresses” the response of neurons in the network to one number: the identity of the basin. Then we computed the mutual information between the identity of the response basins and the stimulus. We found that at high neural reliability β, information is carried by the detailed structure of the response pattern, {*σ*_{i}}. But when neural reliability β is low, most of the information is already carried by the identity of the ML basin to which the network activity pattern belongs. This is a hallmark of error correction via the use of redundancy: at low β, all the coding patterns within a given ML basin indeed encode the same stimulus.
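The comparison between pattern information and basin information can be sketched by summing the response probabilities of all patterns that share a basin and recomputing the mutual information on the compressed variable. The network, the two-input ensemble, the greedy basin assignment, and all function names below are hypothetical choices for illustration, not the optimized networks of the paper.

```python
import itertools
import numpy as np

def mutual_information(p_h, p_cond):
    """I(response; input) in bits from P(h) and a table of P(response | h)."""
    p_out = p_h @ p_cond
    joint = p_h[:, None] * p_cond
    mask = joint > 0
    return float(np.sum(joint[mask] * np.log2((p_cond / p_out)[mask])))

def basin_of(sigma, h0, J):
    """Greedy single-flip ascent at zero input (assumed tie-breaking: lowest index)."""
    s = sigma.copy()
    while True:
        gain = -2.0 * s * (h0 + J @ s)
        i = int(np.argmax(gain))
        if gain[i] <= 0:
            return tuple(s)
        s[i] = -s[i]

def pattern_and_basin_information(inputs, p_h, h0, J, beta):
    sigma = np.array(list(itertools.product([-1, 1], repeat=len(h0))), dtype=float)
    pair = 0.5 * np.einsum("ki,ij,kj->k", sigma, J, sigma)
    log_w = beta * ((inputs + h0) @ sigma.T + pair)
    p_cond = np.exp(log_w - log_w.max(axis=1, keepdims=True))
    p_cond /= p_cond.sum(axis=1, keepdims=True)
    # Label each pattern by the ML pattern its basin flows to.
    labels = [basin_of(s, h0, J) for s in sigma]
    basin_ids = sorted(set(labels))
    member = np.array([[lab == b for b in basin_ids] for lab in labels], dtype=float)
    p_basin = p_cond @ member         # P(basin | h): sum over patterns in each basin
    return (mutual_information(p_h, p_cond),
            mutual_information(p_h, p_basin),
            len(basin_ids))

n = 4
h0 = np.zeros(n)
J = 0.5 * (np.ones((n, n)) - np.eye(n))
inputs = 0.5 * np.array([[1, 1, 1, 1], [-1, -1, -1, -1]], dtype=float)
p_h = np.array([0.5, 0.5])

i_pattern, i_basin, n_basins = pattern_and_basin_information(inputs, p_h, h0, J, beta=0.5)
print(i_pattern, i_basin, n_basins)
```

Because the basin label is a deterministic function of the pattern, the basin information can never exceed the pattern information; the quantity of interest is how close the two are at a given β.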

While noise in individual neurons will still result in response variability, at low β the optimal network uses its interactions to tighten the basin around the ML pattern within which a response is likely. Thus, noise in individual neural responses should overestimate network noise. To test this, we first measured the true network noise entropy per neuron, 〈*S*[*P*({*σ*_{i}}|{*h*_{i}})]〉/*N*, which quantifies the variability in network responses to given stimuli. Then we measured the apparent noise entropy per neuron, (1/*N*) ∑_{i} 〈*S*[*P*(*σ*_{i}|{*h*_{i}})]〉, which quantifies the average variability in individual neural responses to given stimuli. We found that apparent single-neuron noise could greatly overestimate network noise (Fig. 5*B*). Furthermore, as neural reliability β decreased, single-neuron noise entropy increased monotonically, whereas noise in the optimal network responses saturated. In contrast, the noise entropy as a fraction of the total output entropy was similar when measured on the population or the single-neuron level, regardless of the value of β (Fig. 5*C*). This surprising property of the optimal codes would therefore allow one to obtain an estimate of the coding efficiency of a complete optimal network from the average of many single-neuron coding efficiency measurements.
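The two noise measures can be compared directly for a small network. The sketch below computes the joint noise entropy per neuron and the average single-neuron noise entropy for a hypothetical coupled network (arbitrary couplings, not optimized ones); function names and parameter values are assumptions.

```python
import itertools
import numpy as np

def entropy_bits(p):
    """Shannon entropy in bits, ignoring zero-probability entries."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

def noise_entropies(inputs, p_h, h0, J, beta):
    """Return (true, apparent) noise entropy per neuron, averaged over inputs."""
    n = len(h0)
    sigma = np.array(list(itertools.product([-1, 1], repeat=n)), dtype=float)
    pair = 0.5 * np.einsum("ki,ij,kj->k", sigma, J, sigma)
    spikes = (sigma > 0).astype(float)
    true_s, apparent_s = 0.0, 0.0
    for h, w in zip(inputs, p_h):
        log_w = beta * (sigma @ (h0 + h) + pair)
        p = np.exp(log_w - log_w.max())
        p /= p.sum()
        true_s += w * entropy_bits(p) / n          # joint entropy per neuron
        p_spike = p @ spikes                       # P(sigma_i = +1 | h) per neuron
        apparent_s += w * np.mean([entropy_bits([q, 1 - q]) for q in p_spike])
    return true_s, apparent_s

inputs = np.array([[1, 1, 1], [1, -1, -1], [-1, 1, -1], [-1, -1, 1]], dtype=float)
p_h = np.full(4, 0.25)
h0 = np.zeros(3)
J_coupled = 0.8 * (np.ones((3, 3)) - np.eye(3))

true_c, apparent_c = noise_entropies(inputs, p_h, h0, J_coupled, beta=0.5)
true_0, apparent_0 = noise_entropies(inputs, p_h, h0, np.zeros((3, 3)), beta=0.5)
print(true_c, apparent_c, true_0, apparent_0)
```

The apparent value is never smaller than the true one, because the sum of single-neuron entropies bounds the joint entropy from above; the two coincide when the neurons are conditionally independent (*J* = 0).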

Finally, in an optimal network with unreliable neurons (low β) most of the network information can be read out by observing just a subset of the neurons (Fig. 5*D*). Meanwhile at high β, every neuron in a network of *N* neurons carries roughly 1/*N* of the total information, because in the high β regime the neural output tends toward independence. In no case did we find an optimal network with a synergistic code such that observing a subset of neurons would yield a less-than-proportional fraction of the total information.

## Discussion

Wiring neurons together in a network creates dependencies between responses and thus effectively reduces the repertoire of joint activity patterns with which neurons can encode their stimuli. This reduces the information the network can convey. On the other hand, connections between neurons can enable a population code to either mitigate the noisiness of each of the elements or decorrelate the network inputs. This would increase the information the network can convey. Here we studied the functional value of interactions by finding information-maximizing networks of pairwise-interacting, stimulus-dependent model neurons. We explored the coding properties of models containing three key features: (*i*) neurons are spiking (not continuous), (*ii*) neurons are noisy, and (*iii*) neurons can be functionally interacting, and recurrent connections can also be tuned to achieve optimal information transmission. We found that the optimal population code interpolates smoothly between redundant error-correcting codes, independent codes, and decorrelating codes, depending on the strength of stimulus correlations and neuronal reliability. In a related vein, other recent work has shown that efficient coding and discrimination of certain types of stimulus distributions favor nonzero interactions in a network (33, 34).

If neurons are unreliable (Fig. 6*A*), the optimal network “learns” the input distribution and uses this to perform error correction. This error correction is implemented in a distributed way, as opposed to using dedicated parity or check-bits that appear in engineered codes: The network “memorizes” different inputs using a set of patterns that maximize the likelihood (ML) at zero input, *P*({*σ*_{i}}|*h* = 0). These ML memories are encoded in the optimal couplings *J*_{ij}^{∗}. There are many such potential patterns, and the external input breaks the degeneracy among them by favoring one in particular. The information carried by just the identity of the basin around a ML pattern then approaches that carried by the microscopic state of the neurons, {*σ*_{i}}. This mechanism is similar to one used by Hopfield networks, although in our case the memories, or ML patterns, emerge as a consequence of information maximization rather than being stored by hand into the coupling matrix (*SI Appendix*).

If neurons are reliable (Fig. 6*B*), the optimal network behavior and coding depend qualitatively on the distribution of inputs. For binary inputs, the single units simply become more independent encoders of information, and the performance of the optimal network does not differ much from that of the uncoupled network. In contrast, for Gaussian stimuli the optimal network starts decorrelating the inputs. The transition between the low- and the high-reliability regime happens close to *β* ∼ 1.2. This represents the reliability level at which the spread in optimal couplings (standard deviation) is similar to the amplitude of the stimulus-dependent biases, *h*_{i}. Intuitively, this is the transition from a regime, in which the network is dominated by “internal forces” (low β, “couplings > inputs”), to a regime dominated by external inputs (high β, “inputs > couplings”).

Independently of the noise, individually observed neurons in an optimal network appear to have more variability than expected from the noise entropy per neuron in the population. Interestingly, we found that the efficiency of the optimal code, or the ratio of noise entropy to output entropy, stays approximately constant. This occurs mainly because the per-neuron output entropy is also severely overestimated when only single neurons are observed. Our results also indicate that in an optimal network of size *N*, the amount of information about the stimulus can be larger than proportional to the size *M* of the observed subnetwork (i.e., *I*_{M} > (*M*/*N*)*I*_{N}). This means that the optimal codes for Ising-like models are not “combinatorial” in the sense that *all* output units need not be seen to decode properly. A full combinatorial code would be conceivable if the model allowed higher-than-pairwise couplings *J*.

All the encoding strategies we found have been observed in neural systems. Furthermore, as seen for our optimal networks, spontaneous activity patterns in real neural populations resemble responses to common stimuli (35, 36). One strategy—synergistic coding—that has been seen in some experiments (2–5) did not emerge from our optimization analyses. Perhaps synergy arises only as an optimal strategy for input statistics that we have not examined, or perhaps models with only pairwise interactions cannot access such codes. Alternatively, synergistic codes may not optimize information transmission—e.g., they are very susceptible to noise (10).

Our results could be construed as predicting adaptation of connection strengths to stimulus statistics (see, e.g., ref. 37). This prediction could be compared directly to data. To do this, we would select the *h*_{i}(*s*) in our model (Eq. **2**) as the convolution of stimuli with the receptive fields of *N* simultaneously recorded neurons. Our methods would then predict the optimal connection strengths *J*_{ij} for encoding a particular stimulus ensemble. To compare to the actual connection strengths we would instead fit the model (Eq. **2**) directly to the recorded data (38, 39). Comparing the predicted and measured *J*_{ij} would provide a test of whether the network is an optimal, pairwise-interacting encoder for the given stimulus statistics. Testing the prediction of network adaptation would require changing the stimulus correlations and observing matched changes in the connection strengths.

## Acknowledgments

V.B. and G.T. thank the Weizmann Institute and the Aspen Center for Physics for hospitality. G.T., J.P., and V.B. were partly supported by National Science Foundation Grants IBN-0344678 and EF-0928048, National Institutes of Health (NIH) Grant R01 EY08124 and NIH Grant T32-07035. V.B. is grateful to the IAS, Princeton for support as the Helen and Martin Chooljian Member. E.S. was supported by the Israel Science Foundation (Grant 1525/08), the Center for Complexity Science, Minerva Foundation, the Clore Center for Biological Physics, and the Gruber Foundation.

## Footnotes

^{2}To whom correspondence should be addressed. E-mail: gtkacik{at}sas.upenn.edu.

Author contributions: G.T., J.S.P., V.B., and E.S. designed research; G.T., J.S.P., V.B., and E.S. performed research; G.T., J.S.P., V.B., and E.S. analyzed data; and G.T., V.B., and E.S. wrote the paper.

The authors declare no conflict of interest.

*This Direct Submission article had a prearranged editor.

This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1004906107/-/DCSupplemental.

^{*}Given distributions *p* and *q*, let *m*(*α*) = (*p*(*α*) + *q*(*α*))/2. The Jensen–Shannon distance is *D*_{JS} = 0.5∫*dα* *p*(*α*) log_{2}[*p*(*α*)/*m*(*α*)] + 0.5∫*dα* *q*(*α*) log_{2}[*q*(*α*)/*m*(*α*)]. *D*_{JS} = 0 for identical, and *D*_{JS} = 1 for distinct *p*, *q*.

## References

1. Rieke F, Warland D, de Ruyter van Steveninck RR, Bialek W
2. …
3. …
4. Narayanan NS, Kimchi EY, Laubach M
5. …
6. …
7. Barlow HB
8. Abeles M
9. Amit DJ
10. …
11. …
12. …
13. Srinivasan MV, Laughlin SB, Dubs A
14. …
15. Devries SH, Baylor DA
16. Borghuis BG, Ratliff CP, Smith RG, Sterling P, Balasubramanian V
17. Balasubramanian V, Sterling P
18. Liu YS, Stevens CF, Sharpee TO
19. …
20. …
21. MacKay DJC
22. Shlens J, et al.
23. Tkačik G, Schneidman E, Berry MJ II, Bialek W
24. Tkačik G, Schneidman E, Berry MJ II, Bialek W
25. Tang S, et al.
26. Shlens J, et al.
27. …
28. …
29. Cover TM, Thomas JA
30. Linsker R (Touretzky DS, ed.)
31. …
32. Hopfield JJ
33. …
34. …
35. …
36. …
37. …
38. Tkačik G
39. Granot-Atdegi E, Tkačik G, Segev R, Schneidman E
