Information optimization in coupled audiovisual cortical maps

Edited by Paul C. Martin, Harvard University, Cambridge, MA, and approved October 14, 2002 (received for review August 6, 2002)
Abstract
Barn owls hunt in the dark by using cues from both sight and sound to locate their prey. This task is facilitated by topographic maps of the external space formed by neurons (e.g., in the optic tectum) that respond to visual or aural signals from a specific direction. Plasticity of these maps has been studied in owls forced to wear prismatic spectacles that shift their visual field. Adaptive behavior in young owls is accompanied by a compensating shift in the response of (mapped) neurons to auditory signals. We model the receptive fields of such neurons by linear filters that sample correlated audiovisual signals and search for filters that maximize the gathered information, subject to the costs of rewiring neurons. Assuming a higher fidelity of visual information, we find that the corresponding receptive fields are robust and unchanged by artificial shifts. The shape of the aural receptive field, however, is controlled by correlations between sight and sound. In response to prismatic glasses, the aural receptive fields shift in the compensating direction, although their shape is modified due to the costs of rewiring.
In the struggle of biological organisms to survive and reproduce, processing of information is of central importance. Sensory signals provide valuable information about the external world such as the locations of predators and prey. Localization of sources is facilitated by topographic maps of neurons in various parts of the brain (1), reflecting the spatial arrangements of signals around the animal. The barn owl has to rely extensively on sounds to find its prey in the dark and consequently has developed precise “auditory space maps.”
By extensive experiments, Knudsen has shown that the optic tectum of the barn owl has both visual and aural maps of space that are in close registry (2, 3). The visual signal plays a crucial role in aligning the aural map; experimental manipulations of the owl's sensory experience reveal the plasticity of these maps in young animals and the instructive role played by the visual experience. (A recent review with specific references can be found in ref. 4.) The current study was motivated by experiments in which owls are fitted with prismatic spectacles that shift the visual fields by a preset degree in the horizontal direction (5, 6). In young owls, the receptive auditory maps were found to shift to remain in registry with the visual maps, which stayed unchanged.
There is at least one theoretical attempt to explain the registry of neural maps through “value-dependent learning,” where synaptic connections in a network are enhanced after “foveation toward an auditory stimulus” (7). In this article we take a more abstract approach to the coupling of audiovisual maps and search for neural connections (receptive fields) that maximize the information gained from the sensory signals. In earlier studies (8, 9), Bialek and A.Z. formulated an approach to optimization of information in the visual system and in computations with neural spike trains (10).
Here we extend the methods of ref. 8 for computing receptive fields in the visual system to finding the optimal connectivities in an audiovisual cortex such as the owl's optic tectum. We find that the shape and registry of the aural map is established by the correlations between the audio and visual signals. In response to an artificial shift of the visual field (as with the prismatic spectacles), the visual receptive field is unchanged. While the aural receptive field shifts in the adaptive direction, its shape changes due to the costs of rewiring the neurons.
The general formalism for our calculations is set up in General Formalism, which reviews the methodology introduced in ref. 8. The essence of this approach is the assumption that neural connections act as linear filters of the incoming signals and also introduce noise in the outputs. If the (correlated) input signals and the random noise are taken from Gaussian probability distributions, the outputs are also Gaussian-distributed. The Shannon and Weaver (11) information content of the resulting outputs is calculated easily. The task is to find filter functions that maximize this information, subject to biologically motivated costs, and for given correlations of the input signals. In ref. 8 this approach was used to obtain receptive fields in the visual system. In Coupled Audio-Visual Inputs, we generalize this formalism to coupled audiovisual signals.
A necessary input to the calculations is the correlations between the audio and visual signals, as discussed in Signal Correlations. Because it is clearly much easier to localize objects by sight than sound, it is reasonable that the information carried by the visual channel should far exceed that carried by the aural channel. The two sources of information, however, are quite likely to be correlated, resulting in couplings between the corresponding filters. In the experiments on barn owls, the prismatic glasses shift the visual field and hence modify the correlations between the signals. We examine how such shifts change the filter functions (neural connectivities) that optimize the information content in the outputs.
As argued in Results, the disparities in the strengths of visual and aural signals simplify the search for optimal filters. In particular, we find that the visual receptive fields are relatively robust and unchanged, whereas the shape of the aural receptive field is the product of two terms: one reflects the correlations between sights and sounds and shifts along with external displacement of these signals, and the second is associated with the costs of making connections to distant neurons. This result is interpreted further in Discussions, where some implications for experiments as well as directions for future extensions and generalizations are also discussed.
Analysis of Information
General Formalism.
The processing of information by neural connections in the cortex is modeled in ref. 8 as follows: After passing through intermediate stations, sensory signals arrive as a set of inputs {s_{J}}. Further processing takes place by neurons that sample the information from a subset of these inputs and produce an appropriate output. For ease of calculation, the outputs are represented as a linear transformation of the inputs, according to

O_{i} = Σ_{J}F_{iJ}s_{J} + η_{i}. [1]

The filtering of information thus is parameterized by the matrix {F_{iJ}} and is also assumed to introduce an unavoidable noise η_{i}. There are of course many possible sensory inputs, which can be taken from a joint probability distribution P_{in}[s_{J}]. Eq. 1 is thus a transformation from one set of random variables (the inputs) to another (the outputs); the latter is described by the joint probability distribution function P_{out}[O_{i}]. The amount of information associated with a given probability distribution is quantified (11) (up to a baseline and units) by ℐ[P] ≡ −〈ln P〉, where the averages are taken with the corresponding probability. The task of finding optimal filters is thus to come up with the matrix F that maximizes ℐ[P_{out}] for specified input and noise probabilities.
The Shannon information can be calculated easily for Gaussian-distributed random variables. Let us consider the set of N random variables {x_{i}}, taken from the probability

P[{x_{i}}] = [det A/(2π)^{N}]^{1/2} exp(−x_{i}A_{ij}x_{j}/2), [2]

where summation over the repeated indices is implicit, and det A indicates the determinant of the N × N matrix with elements A_{ij}. It is easy to check that, up to an unimportant additive constant of N/2,

ℐ[P] = (1/2) ln det 〈x_{i}x_{j}〉, [3]

where we have noted that the pairwise averages are related to the inverse matrix by 〈x_{i}x_{j}〉 = (A^{−1})_{ij}. A linear filter, as in Eq. 1, maps one set of Gaussian variables to new ones. Thus if we assume that the inputs {s_{J}} and the (independent) noise {η_{i}} are Gaussian-distributed, we can calculate the information content of the output using Eq. 3, with

〈O_{i}O_{j}〉 = F_{iJ}〈s_{J}s_{K}〉F_{jK} + 〈η_{i}η_{j}〉. [4]

We are interested in describing cortical maps related to visual or aural localization of objects. These locations vary continuously in space and are topographically mapped to positions on a two-dimensional cortex. As such, it is convenient to promote the indices i and J, used above to label output and input neurons, to continuous vectors in two-dimensional space. For example, following ref. 8, let us consider an image described by a scalar field s(x→) on a two-dimensional surface with coordinates x→. The image is sampled by an array of cells such that the output of the cell located at x→ is given by

O(x→) = ∫d²r F(r→)s(x→ + r→) + η(x→), [5]

where the function F(r→) describes the receptive field of the cell. Assuming uncorrelated neural noise, 〈η(x→)η(x→′)〉 = Nδ^{2}(x→ − x→′), and signal correlations, 〈s(x→)s(x→′)〉 = S(x→ − x→′), the filter-dependent part of the output information is given by

ℐ = (1/2) Tr ln[δ^{2}(x→ − x→′) + (1/N)∫d²r d²r′ F(r→)S(x→ + r→ − x→′ − r→′)F(r→′)]. [6]

Note that we have assumed that the signal is translationally invariant such that correlations only depend on the relative distance between their sources. This allows us to change basis to the Fourier components, s̃(k→) ≡ ∫d^{2}x exp(−ik→⋅x→)s(x→), which are uncorrelated for different wave vectors k→.
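The Gaussian information of filtered outputs can be checked numerically. The following sketch is our own illustration (random filter, illustrative sizes, not from the paper): it evaluates the filter-dependent part of Eqs. 3 and 4, (1/2) ln det(1 + FSF^{T}/N), for outputs O = Fs + η.

```python
import numpy as np

rng = np.random.default_rng(0)

def info_gain(F, S, N):
    """Filter-dependent Shannon information (nats) of Gaussian outputs
    O_i = F_iJ s_J + eta_i, from Eqs. 3 and 4:
    (1/2) ln det(<O O^T>) - (1/2) ln det(N 1) = (1/2) ln det(1 + F S F^T / N)."""
    M = np.eye(F.shape[0]) + F @ S @ F.T / N
    sign, logdet = np.linalg.slogdet(M)
    return 0.5 * logdet

# Illustrative sizes: 4 output neurons sampling 8 correlated inputs.
n_in, n_out, N = 8, 4, 0.5
A = rng.normal(size=(n_in, n_in))
S = A @ A.T / n_in                    # a valid (positive) input covariance
F = rng.normal(size=(n_out, n_in))
F /= np.sqrt((F**2).sum())            # normalization constraint, sum F^2 = 1
print(info_gain(F, S, N))
```

Because FSF^{T} is positive semidefinite, the gathered information is never negative, and it grows without bound as the filter amplitude increases, which is why a cost on F is needed below.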
The overall information then is obtained from a sum of independent contributions, using Σ_{k→} → A∫d^{2}k/(2π)^{2}, where A is the cortical area, and is equal to

ℐ = (A/2)∫ [d^{2}k/(2π)^{2}] ln[1 + |F(k→)|^{2}𝒮(k→)], [7]

where F(k→) and 𝒮(k→) are the Fourier transforms of the receptive field F(x→) and of the signal-to-noise correlations S(x→)/N, respectively.
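A discretized version of Eq. 7 can be evaluated with fast Fourier transforms. In the sketch below, the Gaussian receptive field and the k^{−2} signal spectrum are illustrative choices of ours (the latter echoing the potential used in ref. 8), not parameters from the paper; the discrete sum over grid modes plays the role of A∫d^{2}k/(2π)^{2}.

```python
import numpy as np

n, dx = 64, 0.25                       # grid points and spacing (arbitrary)
x = (np.arange(n) - n // 2) * dx
X, Y = np.meshgrid(x, x)
F = np.exp(-(X**2 + Y**2))             # example receptive field F(x)

k = 2 * np.pi * np.fft.fftfreq(n, d=dx)
KX, KY = np.meshgrid(k, k)
K2 = KX**2 + KY**2
# signal-to-noise spectrum S(k) ~ 1/k^2, with the divergent zero mode dropped
S = np.where(K2 > 0, 0.1 / np.where(K2 > 0, K2, 1.0), 0.0)

Fk = np.abs(np.fft.fft2(F)) * dx**2    # continuum-normalized transform of F
info = 0.5 * np.sum(np.log1p(Fk**2 * S))  # discrete stand-in for Eq. 7
print(info)
```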
The task is to find the function F(k→) that maximizes the information ℐ. Clearly, we need to impose certain costs on this function, since otherwise the information gain can become enormous for F → ∞. This cost ultimately originates from the difficulties of creating and maintaining neural connections that gather and transmit information over some distance, and is hard to quantify. Following ref. 8, we shall assume that the overall cost (in appropriate “information” units) has the form

C = ∫d^{2}x C(x→)F(x→)^{2}, with C(x→) = λ + μx^{2}. [8]

This expression can be regarded as an expansion in powers of F and x, with the assumption that the cost is invariant under changing the sign of F and independent of the direction of the vector x→. It imposes a penalty for creating connections that increases quadratically with the length of the connection. Our central conclusion is, in fact, insensitive to the form of C(x→).
If the costs are prohibitive, there will be no filtering of signals. To avoid such cases, we compare only filters that are constrained such that ∫d^{2}x F(x→)^{2} = 1 (or any other constant). In the optimization process, this constraint can be implemented via a Lagrange multiplier, resulting in an effective cost similar to the term proportional to λ in Eq. 8. Thus, this term and the constraint can be used interchangeably. In ref. 8 it was shown that optimizing Eq. 7 subject to the cost of Eq. 8, in the limit of low signal to noise, is equivalent to solving a Schrödinger equation with F(k→) playing the role of the wave function in a potential 𝒮(k) and the Lagrange multiplier taking the value of the ground-state energy. A potential of the form 𝒮(k) ∝ k^{−2} was used to obtain receptive fields with on-center/off-surround character. In the next section we generalize this approach by considering correlated visual and aural inputs.
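The Schrödinger analogy can be illustrated numerically in one dimension: discretize [−μ d^{2}/dk^{2} − 𝒮(k)]F = −λF and take the ground state. The smooth potential 𝒮(k) = 1/(1 + k^{2}), which regularizes k^{−2} at the origin, and all parameter values are our illustrative choices, not the paper's.

```python
import numpy as np

mu, n, kmax = 0.01, 400, 10.0          # illustrative parameters
k = np.linspace(-kmax, kmax, n)
dk = k[1] - k[0]
S = 1.0 / (1.0 + k**2)                 # smooth stand-in for the k^-2 potential

# second-derivative operator with Dirichlet boundaries
D2 = (np.diag(-2.0 * np.ones(n)) + np.diag(np.ones(n - 1), 1)
      + np.diag(np.ones(n - 1), -1)) / dk**2
H = -mu * D2 - np.diag(S)              # "Hamiltonian" acting on F(k)
evals, evecs = np.linalg.eigh(H)       # ascending eigenvalues

lam = -evals[0]                        # Lagrange multiplier = minus ground energy
F = evecs[:, 0]
F *= np.sign(F[np.argmax(np.abs(F))])  # the ground state can be chosen nodeless
```

The resulting F(k) is nodeless and concentrated where the potential (here, the signal-to-noise spectrum) is strongest, which is what makes the ground-state identification of the optimal filter natural.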
Coupled Audio-Visual Inputs.
In our idealized model, a neuron in the optic tectum of the owl filters input signals coming from both the visual and auditory systems, and its output is given by the generalization of Eq. 5 to

O(x→) = Σ_{α}∫d²r F_{α}(r→)s_{α}(x→ + r→) + η(x→), [9]

where α is summed over A and V for audio and visual signals, respectively. Assuming as before that the signals s_{α} and the noise η are independent, correlations of the output are obtained as

〈O(x→)O(x→′)〉 = Σ_{αβ}∫d²r d²r′ F_{α}(r→)S_{αβ}(x→ + r→ − x→′ − r→′)F_{β}(r→′) + Nδ^{2}(x→ − x→′). [10]

For translationally invariant signals, the output information is given by the generalization of Eq. 7 to

ℐ = (A/2)∫ [d^{2}k/(2π)^{2}] ln[1 + Σ_{αβ}F*_{α}(k→)𝒮_{αβ}(k→)F_{β}(k→)], [11]

where 𝒮_{αβ}(k→) is a 2 × 2 matrix of (Fourier-transformed) signal-to-noise correlations.
Once more we have to impose some constraints to make the maximization of the information in Eq. 11 with respect to the functions F_{V} and F_{A} biologically meaningful. In principle, there could be different costs for connections processing aural and visual signals. In the absence of concrete data, we make the simple choice of using the same form as Eq. 8 for both sets of filters, such that the overall cost is

C = Σ_{α}∫d^{2}x (λ + μx^{2})F_{α}(x→)^{2}. [12]

The first term in the above cost function can be interpreted again as a Lagrange multiplier λ imposing a normalization constraint

∫d^{2}x [F_{V}(x→)^{2} + F_{A}(x→)^{2}] = 1. [13]
Signal Correlations.
To proceed further, we need the matrix of signal-to-noise correlations, which has the form

𝒮_{αβ}(k→) = ( 𝒮_{V}(k)              ℛ(k)exp(ik→⋅c→) )
             ( ℛ(k)exp(−ik→⋅c→)      𝒮_{A}(k)        ). [14]

The diagonal terms represent the self-correlations of each signal. Because many sources generate both sight and sound, the audio and visual signals will be correlated. These correlations are captured by the off-diagonal term ℛ(k). In the experiments on owls (5, 6), the visual signal is artificially displaced by a fixed angle in the horizontal direction. If we indicate this angle by the vector c→, an aural signal at location x→ becomes correlated with a visual signal at (x→ + c→). After Fourier transformation, this shift appears as the exponential factor exp(ik→⋅c→) in the off-diagonal terms of the correlation matrix.
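The appearance of the exponential shift factor is just the Fourier shift theorem, and it can be verified with a one-dimensional FFT. This sketch is our own construction: the smooth ℛ(k) is illustrative, and the sign of the exponent is chosen so that the real-space cross-correlation peak lands at +c on the grid.

```python
import numpy as np

n, dx, c = 256, 0.1, 3.0               # grid and imposed shift (illustrative)
k = 2 * np.pi * np.fft.fftfreq(n, d=dx)
R = np.exp(-k**2)                      # a smooth R(k); a constant R gives a delta
cross_k = R * np.exp(-1j * k * c)      # off-diagonal correlation with shift factor
cross_x = np.fft.ifft(cross_k).real    # back to real space
x = np.arange(n) * dx
print(x[np.argmax(cross_x)])           # cross-correlation peaks at the shift c
```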
Thus far we have treated sight and sound on the same footing. It is reasonable to assume that under most (well lit) conditions the quality of visual information is much higher than the aural one. For ease of computation, we shall further assume that the actual signal-to-noise ratio is quite small, resulting in the set of inequalities

1 ≫ 𝒮_{V}(k) ≫ ℛ(k) ≫ 𝒮_{A}(k). [15]

In this limit of small signal to noise, the logarithm in Eq. 11 can be approximated by its argument (without the one), resulting in a quadratic form in the filter functions. Our task then comes down to maximizing the function

𝒲 = (A/2)∫ [d^{2}k/(2π)^{2}] [|F_{V}|^{2}𝒮_{V} + |F_{A}|^{2}𝒮_{A} + F*_{V}F_{A}ℛexp(ik→⋅c→) + F*_{A}F_{V}ℛexp(−ik→⋅c→)] − Σ_{α}∫d^{2}x (λ + μx^{2})F_{α}(x→)^{2} [16]

with respect to F_{V} and F_{A}.
Results.
The optimal filters are obtained from functional derivatives of Eq. 16. Setting the variations with respect to F*_{V}(k→) to 0 gives

𝒮_{V}(k)F_{V}(k→) + {ℛ(k)exp(ik→⋅c→)F_{A}(k→)} = (λ − μ∇^{2}_{k})F_{V}(k→), [17]

whereas δ𝒲/δF*_{A} = 0 leads to

ℛ(k)exp(−ik→⋅c→)F_{V}(k→) + {𝒮_{A}(k)F_{A}(k→)} = (λ − μ∇^{2}_{k})F_{A}(k→). [18]

In arranging the above equations, we have placed within curly brackets terms that are much smaller according to the hierarchy of inequalities in Eq. 15. Note that in the absence of any correlations between the two signals (ℛ = 0), F_{A} = 0, since the aural signal is assumed to be much weaker than the visual one. Any nonzero F_{A} reduces F_{V} due to the normalization condition, resulting in a smaller value of 𝒲. It is indeed the correlations between the two signals that lead to a finite value of F_{A}, of the order of (ℛ/𝒮_{V}) [since λ ∼ 𝒪(𝒮_{V}), as shown below].
To leading order, Eq. 17 is the Schrödinger equation obtained in ref. 8 for the visual receptive field. Without further discussion, we shall indicate its solution by

F_{V}(k→) = Φ(k→), with λ = E_{V}, [19]

where Φ(k→) is the corresponding ground-state wave function. Note that we do not imply that cells in the optic tectum should have a receptive field for visual signals identical to that in the visual cortex. The quality of signals, the costs of neural connections, and the response of the cells may well vary from one cortical area to another. The eigenvalue E_{V} is controlled by the strength of the visual correlations and is of the order of 𝒮_{V}.
To simplify the solution to Eq. 18, we first assume that ℛ(k→) = R, a constant independent of k→. This is quite a reasonable assumption, corresponding to visual and aural signals that are correlated only if coming from the same direction, i.e., with 〈s_{V}(x→_{1})s_{A}(x→_{2})〉 = Rδ^{2}(x→_{1} − x→_{2}). We can then Fourier-transform the two sides of this equation to obtain

RF_{V}(x→ − c→) = (λ + μx^{2})F_{A}(x→), [20]

and quite generally, for an arbitrary form of the cost function in Eq. 8, the solution is

F_{A}(x→) = RF_{V}(x→ − c→)/[λ + C_{A}(x)]. [21]

Due to the quadratic form of Eq. 16, the above result is the linear response of the system to the correlations between signals.
The significance of our result is that the aural receptive field F_{A}(x→) is not simply the visual receptive field shifted by c→, as one might have guessed. Rather, the shape of F_{A}(x→) could be distorted significantly by the cost function C_{A}(x). At the moment, the data may be too crude to determine the shape of F_{A}(x→), but it is still worthwhile to contemplate what sort of shape distortion may result in our simple model. For illustrative purposes, let us take F_{V}(x→) ∝ exp[−(x/l)^{2}] to be a Gaussian, with l a length scale characteristic of the visual receptive field, and C_{A}(x) = μx^{2}. Then we predict [with c→ = (c, 0) and x→ = (x, y)]

F_{A}(x, y) ∝ exp{−[(x − c)^{2} + y^{2}]/l^{2}}/[1 + (x^{2} + y^{2})/L^{2}], [22]

where L ≡ (λ/μ)^{1/2} defines a length scale characteristic of the relative cost of connecting distant neurons. Although there are three length scales involved, L, l, and c, the shape of F_{A}(x, y) depends only on the two ratios L/l and c/l.
We now qualitatively describe the change in the shape of the aural receptive field in Eq. 22 as the imposed shift c is varied, as in the experiments of Brainard and Knudsen (5). (The exact analysis of the extremal points of Eq. 22 involves the solution of a cubic equation that will not be given here.) Two types of behavior are possible, depending on the ratio l/L. For l ≪ L, where the cost of rewiring is negligible, the function F_{A}(x, y) has a single maximum located at x ≈ c (and y = 0), i.e., simply following the imposed shift. When l ≫ L, however, there is an intermediate range of values of c where the aural receptive field has two peaks, one close to the origin, x̄_{−} ≈ cL/l ≪ c, and another close to x̄_{+} ≈ c. A typical profile with two peaks is depicted in Fig. 1.
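The one- versus two-peak behavior can be explored numerically along y = 0, using the explicit profile reconstructed above for Eq. 22. All parameter values below are illustrative choices of ours; note that the two-peak regime also requires the shift c to be sufficiently large compared with l.

```python
import numpy as np

def F_A(x, c, l, L):
    """Profile of Eq. 22 along y = 0 (reconstructed, hypothetical form):
    exp(-(x - c)^2 / l^2) / (1 + x^2 / L^2)."""
    return np.exp(-(x - c)**2 / l**2) / (1.0 + x**2 / L**2)

def n_peaks(c, l, L):
    """Count interior local maxima of F_A on a fine grid."""
    x = np.linspace(-2 * c, 3 * c, 20001)
    f = F_A(x, c, l, L)
    interior = (f[1:-1] > f[:-2]) & (f[1:-1] > f[2:])
    return int(interior.sum())

print(n_peaks(c=5.0, l=1.0, L=10.0))   # cheap wiring, l << L: one peak near c
print(n_peaks(c=5.0, l=2.0, L=0.3))    # costly wiring, l >> L: two peaks
```

Differentiating the logarithm of F_A shows that the extrema solve the cubic x³ − cx² + (L² + l²)x − cL² = 0, which has either one or three positive roots, matching the one- and two-peak regimes counted above.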
Discussions
Eq. 21 is the central result of our study. It provides the optimal linear filter for a weak signal correlated to a stronger one. Some specific features of this result in connection with the coupled visual and aural maps are:
The shape of the aural receptive field is very much controlled by the visual information, modulated by the costs associated with neural connectivities.
Artificially displacing the two signal sources, as in the case of the prismatic spectacles used on the barn owls (5, 6), modifies the aural receptive field. However, the resulting receptive field is not simply shifted (unless the costs of neural wirings are negligible) but also changes its shape.
Eq. 21 is the product of two functions, one peaked at the origin and the other at x→ = c→. Depending on the relative strengths and widths of these two peaks, the receptive field may be more sensitive to signals at the original or at the shifted location.
The experiments find, not surprisingly, that adaptation to the prismatic glasses strongly depends on the age of the individual owl. This feature can be incorporated in our model with the reasonable assumption that the cost of neural connections increases with age of the individual.
This work is a small step toward providing a quantitative framework for deducing the workings of the brain, starting from the tasks that it has to perform for the organism to function in its natural habitat. In this framework, the tasks of the sensory systems are more apparent: to extract the relevant signals from the background of natural inputs and as a first step to localize the source of the signal in the external world. It is possible to experimentally gather information about the correlations of various signals in the natural world, and there are indeed several studies of the statistics of various aspects of visual images (12). Of course, such statistics are also specific to the instrument (e.g. camera) used to obtain the image. More relevant are psychophysical studies that probe how individuals parse the visual information (12). We are not aware of similar studies on the statistics of natural sounds in different directions and their correlations with visual signals. Such studies may provide part of the material needed for a more detailed study.
The outcome of the procedure outlined in this article is a set of filter functions, which hopefully are related to the actual connections between neurons. The shape and range of such connections can be studied directly by injection of biocytin dye (13) and indirectly by mapping the receptive field of a neuron via a microelectrode probe. Detailed studies of this kind for the owls reared with prismatic spectacles, and their comparison with Eq. 21, may provide insights about the cost of making neural connections, another necessary input to our general formalism.
The analytical formalism itself can be extended in several directions. Already, in ref. 8 it was proposed that colored images can be studied by considering a vector signal s→ ranging over the color wheel. In regard to different sensory inputs, we may also ask whether and when it is advantageous to segregate outputs to distinct cortical areas, allowing for distinct maps {O_{ν}}. A more ambitious goal is to extend the formalism to timedependent signals, allowing for filters with appropriate time delays that attempt to take advantage of temporal patterns in the signals.
Acknowledgments
This work was supported in part by National Science Foundation Grants DMR0118213 (to M.K.) and PHY8904035 and PHY9507065 (to A.Z.).
Footnotes

↵‡ To whom correspondence should be addressed. Email: zee{at}itp.ucsb.edu.

This paper was submitted directly (Track II) to the PNAS office.
 Received August 6, 2002.
 Copyright © 2002, The National Academy of Sciences
References
1.
2.
3. Knudsen, E. I.
4.
5. Brainard, M. S. & Knudsen, E. I.
6. Knudsen, E. I.
7. Rucci, M.
8. Bialek, W. & Zee, A.
9. Bialek, W. & Zee, A.
10.
11. Shannon, C. E. & Weaver, W. (1949) The Mathematical Theory of Communication (Univ. of Illinois Press, Urbana).
12. Sigman, M.
13. DeBello, W. M.