Cortical pooling algorithms for judging global motion direction
See allHide authors and affiliations

Communicated by William T. Newsome, Stanford University School of Medicine, Stanford, CA, December 21, 2006 (received for review September 22, 2006)
Abstract
Physiological studies suggest that decision networks read from the neural representation in the middle temporal area to determine the perceived direction of visual motion, whereas psychophysical studies tend to characterize motion perception in terms of the statistical properties of stimuli. To reconcile these different approaches, we examined whether estimating the central tendency of the physical direction of global motion was a better indicator of perceived direction than algorithms (e.g., maximum likelihood) that read from directionally tuned mechanisms near the end of the motion pathway. The task of human observers was to discriminate the global direction of random dot kinematograms composed of asymmetrical distributions of local directions with distinct measures of central tendency. None of the statistical measures of image direction central tendency provided consistently accurate predictions of perceived global motion direction. However, regardless of the local composition of motion directions, a maximumlikelihood decoder produced global motion estimates commensurate with the psychophysical data. Our results suggest that mechanismbased, readout algorithms offer a more accurate and robust guide to human motion perception than any stimulusbased, statistical estimate of central tendency.
The brain depends on sensory information to make evolutionary stable decisions that guide the behavior of an organism. In transforming incoming sensory information into a perceptual decision, the brain must combine, or “pool,” neural signals from early sensory cortex into a representation that can be decoded by decision networks at later stages of processing. The fundamental psychophysical and neural mechanisms mediating early sensory processing are relatively well understood, but we are only beginning to elucidate how cortical decision networks collate their sensory input to form stable and reliable decisions.
Physiological studies have exploited the strict organization of the middle temporal (MT) visual area in extrastriate cortex to delineate the major neural steps in the transformation of visual information to a perceptual decision (1, 2). MT is central to visual motion perception: Neural activity is broadly tuned for retinal position as well as the direction and speed of stimuli (3–7). Moreover, activity is equally sensitive to and covaries with a monkey's direction discrimination performance (8–11). The representation of motion direction in MT is topographically organized and superimposed on a coarse retinotopic map: Columns of neurons oriented perpendicularly to the pial surface respond to similar directions of motion and retinal positions (12, 13). Cortical networks downstream from MT access its neural representation of visual motion to direct many forms of behavior, including saccadic and smooth pursuit eye movements (14) and perceptual judgements of motion direction (15, 16) and speed (17).
A key question that remains unanswered is exactly how cortical networks executing different forms of behavior decode the neural representation of visual motion in MT. Numerous physiological studies have attempted to distinguish between two population coding algorithms for reading out directionally tuned activity in MT: “vector average” (population vector of stimulus direction and magnitude calculated from average, directiontuned neural responses weighted in proportion to their magnitude) and “winnertakeall” (most active directionally tuned column of MT neurons). In a series of pioneering studies combining visual stimulation with electrical microstimulation in monkeys, Newsome and colleagues (14, 18, 20) found that the algorithm deployed by cortical networks to decode the motion representation in MT depends on both task demands and the behavioral objective of the monkey. When discriminating very different visual motion directions, the perceptual system employs a winnertakeall algorithm (18), whereas the saccadic and smoothpursuit eye movement systems read off the vector average of competing velocity signals in MT (14, 19). Moreover, monkeys' performance on a free report task that incorporated elements of both of these experiments was mediated by a vector average algorithm when the direction preferences of visually and microstimulated neurons differed by <140° and by a winnertakeall algorithm when they differed by more than this value (20).
However, winnertakeall and vector average algorithms are not optimal in every sensory situation. For example, a monkey's performance on a fine directiondiscrimination task is better explained by an algorithm that selectively pools the most sensitive neurons (21). Furthermore, a growing body of work suggests that the likelihood function, representing the probability that a range of stimuli gave rise to the neural response, is an optimal estimator of performance on a wider range of perceptual tasks (22–27). Maximumlikelihood algorithms have been successfully applied to various sensory domains, including orientation perception (28), motion perception (29), and multisensory integration (30).
Surprisingly few human perceptual studies have investigated how the activity of directionally tuned mechanisms at late stages of motion processing is decoded to guide global motion perception. The few studies that have considered this question suggest that human observers exploit the statistical characteristics of stimuli to judge the direction and speed of global motion (31–35). Human perceptual judgments tend to coincide with the mean direction of a moving dot field when local dots directions are drawn from a uniform distribution with a restricted range (32). Having said this, when a small number of local directions dominate a moving dot field, observers' judgments are biased away from the mean toward the modal (31) direction of motion.
Thus, at present, it is unclear whether cortical decision networks simply exploit the statistical characteristics of moving stimuli or deploy one of a number of potential readout algorithms to guide human perceptual judgements of global motion direction. Here we examine whether estimating the central tendency of the physical direction of global visual motion provides a better indication of perceived direction than an algorithm that reads from directionally tuned mechanisms at late stages of motion processing.
Results
Statistical Guides to the Perceived Direction of Global Motion.
Our first objective was to determine the extent to which statistical characteristics of moving stimuli influence the perception of visual motion direction. In the first experiment, we sought to determine the relationship between the perceived and modal direction of global motion. The task in this and subsequent experiments was to judge whether a standard or comparison random dot kinematogram (RDK) had a more clockwise (CW) direction of motion (Fig. 1A). Dots of the standard RDK moved in a common direction (i.e., the vector average, median and modal direction were identical), randomly chosen on each trial from a range spanning 360°. Dot directions of the comparison RDK were sampled discretely from a Gaussian probability distribution. Across four conditions, we independently varied the half widths [standard deviations (SDs) counterclockwise (CCW) and CW to the mode] of the distribution (see Fig. 1 B–D and Methods) to generate a modal direction for the comparison RDK that was increasingly distinct from the vector average and median directions.
We expressed the data collected in each condition as the percentage of trials on which observers judged the modal direction of the comparison as more CW than the direction of the standard as a function of the angular difference between them. From logistic fits to these data, we estimated the modal direction of the comparison that had the same perceived direction as that of the standard (see Methods). Fig. 2A shows the results of this analysis for four observers as a function of the halfwidths of the distribution of dot directions for the comparison RDK (symbols). Values on the ordinate are normalized to the modal direction of the comparison (dotted line). As the clockwise (CW) and counterclockwise (CCW) halfwidths of the distribution of dot directions narrow and broaden, respectively, the perceived direction of motion diverges from the modal direction toward the median (solid line) and vector average (dashed line). When dot directions for the comparison were drawn from a Gaussian with a CCW SD of 60° and a CW SD of 0°, the modal direction of the comparison had to be rotated, on average, by 35° to be indistinguishable from the standard. Clearly, the perceived direction does not match the modal direction [mean square error (MSE) between the perceived (average of four observers) and modal direction of physical motion, 494.46]. Rather, it appears to coincide relatively closely with the median direction (MSE, 1.7) and to a lesser extent with the vector average (MSE, 29.82).
In a second experiment, we adopted the same approach used in the first with the following exception. In addition to varying the halfwidths of the Gaussian distribution of directions for the comparison RDK, we sampled the CCW and CW halves of the distribution at 5° and 1° intervals, respectively. Using this method, we were able to generate distributions of dot directions for the comparison RDK with the same mode and median but an increasingly disparate vector average.
Fig. 2B shows the results of this experiment plotted and notated in the same fashion as in Fig. 2A. As the CCW and CW halfwidths of the distribution of dot directions broaden, the perceived direction of motion diverges from the vector average and converges on the median (and modal) direction of motion. When the distribution of dot directions for the comparison had a CCW SD of 90° and a CW SD of 18°, the median direction of the comparison had to be rotated, on average, by only 0.5° to be indistinguishable from the standard. Once again, it is apparent that the perceived direction of motion does not match the vector average (MSE, 177.83) but does conform closely to the median direction of stimulus motion (MSE, 7.42).
Both experiments suggest that there is a close, but imperfect, correspondence between the perceived and median direction of global motion. One advantage of this strategy is that the median is a robust estimator of central tendency that minimizes the impact of outlying motion directions on the global percept. The aim of the third experiment was to determine whether we could produce the same pattern of results when the dot directions of the comparison RDK were chosen from a uniform (rectangular) distribution, which has no mode and equivalent energy at each local motion direction. To generate a median that was increasingly distinct from the vector average direction for the comparison RDK, we independently varied the ranges and used different sampling densities on the CCW and CW sides of the median of the distribution (see Fig. 1D and Methods).
To compare directly the perceived and median directions of motion, we estimated the median direction of the comparison that had the same perceived direction as that of the standard. Fig. 2C shows the results of this analysis as a function of the CCW and CW ranges of the distribution of dot directions for the comparison RDK. Values on the ordinate are normalized to the median direction of the comparison (solid line). When the distribution of dot directions for the comparison had a CCW range of 110° and a CW range of 70°, the median direction of the comparison had to be rotated, on average, by 9.14° to be indistinguishable from the standard. Therefore, when a moving stimulus consists of local directions with equivalent motion energy, there is a poor correspondence between the perceived and median global direction of motion (MSE, 51.58). Vector average (dashed line) is an even poorer estimator of perceived direction (MSE, 381.45), suggesting that computing the central tendency of the physical direction of global motion is not a particularly robust cue to the perceived direction of motion.
Neural Algorithms for Reading Out Global Motion Direction.
Our second objective was to determine whether an algorithm that reads from the activity of directionally tuned mechanisms near the end of the motion pathway offers a better estimate of the perceived direction of global motion. To tackle this question, we modeled the data collected in the three experiments on a trialbytrial basis with three readout algorithms: maximum likelihood, winnertakeall and vector average (see Fig. 4 and Methods). Briefly, each model consists of a bank of evenly (1°) spaced directionally tuned mechanisms that respond to a distribution of dot directions with a Gaussian sensitivity profile that is corrupted by Poisson noise. Chosen to be within the range obtained in previous psychophysical studies on the directional tuning of motion mechanisms (36–38) and physiological studies on the directional tuning of MT neurons (7, 39, 40), the bandwidth (halfwidth at halfheight) of each mechanism was fixed at 45°. Maximum likelihood estimates the perceived direction of motion from the weighted sum of responses for the population of mechanisms, where the activity of each mechanism is multiplied by the log of its tuning function. Winnertakeall estimates perceived direction as the preferred direction of the most active mechanism from the population of responses; vector average arrives at a corresponding estimate by averaging the preferred directions of all mechanisms weighted in proportion to the magnitude of their responses. Vector average produced very similar estimates of the perceived direction of motion (data not shown) to that of the vector average direction of physical global motion. Because we have already shown that the vector average of local motions in the stimulus is a consistently poor predictor of perceived direction, we now concentrate on maximum likelihood and winnertakeall.
Both algorithms provided very accurate estimates of perceived direction in all stimulus conditions, but the variance of winnertakeall's estimate was consistently larger than that of maximum likelihood. Because maximum likelihood is an unbiased, optimal estimator of perceived direction with minimal variance over trials, we chose to present its estimate of perceived direction. Filled symbols in Fig. 3 A–C show the perceived direction of motion of the four observers (mean ± SD) plotted in the same fashion as Fig. 2 A–C. Open symbols show the estimated global motion direction of the maximumlikelihood algorithm. Unlike the median direction of physical global motion, maximum likelihood captures the variability in the perceived direction of different global patterns of motion in each experiment. [The MSE between the data of the four observers and the maximum likelihood algorithm was 1.32 (Fig. 3A), 6.56 (Fig. 3B), and 13.3 (Fig. 3C) for each experiment, respectively.] This point is reinforced in Fig. 3D, which shows the perceived direction of motion of each observer in each experiment as a function of the predictions of the maximumlikelihood algorithm. All data points cluster around the unity line, indicating that maximum likelihood provides a robust estimate of the perceived motion direction, explaining, on average, 96% of the variance in the performance of observers across all stimulus conditions.
Discussion
We investigated whether the central tendency of the physical direction of global visual motion was a better guide to the perceived direction of global motion than an algorithm that reads from directionally tuned mechanisms at late stages of motion processing. In global dot motion direction discrimination, there was a close correspondence between the median and perceived direction of motion if the pattern of motion consisted of a small number of dominant local directions, but much less so if it consisted of equivalent energy at each local motion direction. Regardless of the local composition of motion directions, both maximumlikelihood and winnertakeall readout algorithms provided a much more robust guide to the perceived direction of global motion.
Previous psychophysical studies claim that estimates of the central tendency of the direction of global dot motion predicts the perceived direction of motion (31, 32). For example, Williams and Sekuler (32) found that a pattern appears to drift in a direction close to the mean of the dot directions when their range is constrained to be <180°. However, they used symmetrical distributions of dot directions and therefore could not distinguish the mean from other measures of central tendency. When a distribution of dot directions is skewed asymmetrically, the perceived direction can be biased away from the mean toward the modal direction of global motion (31). Zohary et al. (31) found that observers' judgements switched between the mean and modal direction, suggesting that the visual system has access to the entire distribution of local directions and adopts a flexible decision strategy. However, they did not consider the possibility that observers based their performance on the median direction of motion, which we have shown to be a better estimate of the global percept for patterns of motion dominated by small numbers of local directions. That said, in the present study the median failed to provide an accurate estimate across all conditions tested, suggesting that basing motion direction judgements on the statistical characteristics of a moving stimulus is not a reliable strategy for the visual system.
Zohary et al. (31) also assumed that the mode of an asymmetrically skewed distribution of dot directions corresponded to the most active directionally tuned mechanism. We have shown that this is not the case: The modal direction of patterns composed of asymmetrical distributions of directions is very different from that estimated by a winnertakeall, with the latter providing considerably more accurate estimates of the perceived direction. Indeed, both maximum likelihood and winnertakeall provided much more accurate and robust estimates of the perceived direction of global motion than any estimate of central tendency. This result demonstrates the benefit of designing stimuli that distinguish the statistical characteristics of physical stimulation from the underlying putative perceptual mechanism. Stochastic patterns composed of asymmetrical distributions of the parameter of interest are ideal for this purpose and as such may be a useful resource for future experimentation on motion perception. Such patterns could be used in physiological studies to distinguish between putative algorithms deployed by cortical networks to decode the velocity representation in MT. Similarly, psychophysical studies could use analogous stimulus distributions to study speed mechanisms. By way of example, Watamaniuk and Duchon (35) found that observers tend to base speed judgements on changes in the mean dot speed rather than the median or mode speeds. It follows from our results that an algorithm that reads from a bank of speedtuned mechanisms may offer a better estimate of speed perception than the mean physical speed of motion.
Both physiological and computational studies have shown that winnertakeall and maximum likelihood are members of a group of algorithms deployed by cortical decision networks to read from the motion representation in MT (14, 18, 20–27, 41). This encourages the view that performance on our task is mediated by cortical decision networks that read from the motion representation in the human homologue of monkey MT (26, 41). Whether or not decision networks adopt a flexible strategy, deploying different population coding algorithms to meet the prevailing behavioral objective (14, 18, 20, 21), or use a singular probabilistic readout strategy in MT (41), remains open to debate. The significance of our results is that readout algorithms, such as maximum likelihood, offer a more accurate and robust guide to human motion perception than the statistical characteristics of physical stimulation.
Methods
Observers.
Four observers with normal or correctedtonormal vision participated. Two were authors (B.S.W. and P.V.M.) and two (C.V.H. and E.F.) were naïve to the purpose of the experiments.
Stimuli.
RDKs (Fig. 1A) were generated on a Macintosh G5 (Apple, Cupertino, CA) by using software written in C. We displayed the RDKs on an Electron 22 Blue IV monitor (LaCie, Hillsboro, OR) with an update rate of 75 Hz at a viewing distance of 76 cm. Each RDK (generated anew before each trial) consisted of 10 frames, sequentially displayed at 18.75 Hz. Each frame consisted of 226 dots (luminance 0.05 cd/m^{2}) on a uniform background (luminance 25 cd/m^{2}) within a circular window (diameter 12°). The dot diameter was 0.1° and density was ≈2 dots/deg^{2}. On the first frame of a motion sequence, dots were randomly positioned inside the circular window; on subsequent frames, they drifted in a specified direction at 5°/sec. Dots that fell outside the circular window were redrawn inside at a random location.
Procedure.
Observers judged which of two sequentially presented RDKs had a more CW direction of motion. On each trial, we presented a standard and comparison RDK in a random temporal order, each presented for 0.53 sec and separated by a 1sec interval containing a blank screen of mean luminance (Fig. 1A). The dots of the standard RDK were displaced in the same direction, randomly assigned on each trial from a range spanning 360°. For the comparison RDK, the direction of each dot displacement was drawn, with replacement, from a probability distribution and was independent of both its own direction on previous frames and the directions of other dots.
In the first experiment, dot directions of the comparison RDK were discretely sampled at 5° intervals from a Gaussian distribution spanning a total range of 180° (Fig. 1B). We assigned each half of the Gaussian (i.e., directions CW and CCW to the modal direction) a different SD, thus generating asymmetrically distributed directions of dot motion. The SD of the CCW half of the Gaussian was 30°, 40°, 50°, or 60°; the corresponding values for the CW half were 30°, 20°, 10°, or 0°.
In the second experiment, dot directions of the comparison RDK were discretely sampled from a Gaussian (Fig. 1C) with an SD of 30°, 50°, 70°, or 90° for the CCW half and 6°, 10°, 14°, or 18° for the CW half. We sampled the CCW and CW halves of the distribution at 5° and 1° intervals, respectively. This technique generated asymmetrical distributions of dot directions with the same mode and median but a different vector average. For both experiments, the modal direction of the comparison RDK was randomly chosen on each trial using the Method of Constant Stimuli.
In the final experiment, for the comparison RDK we generated a uniform (rectangular) distribution of dot directions with a total range of 180° (Fig. 1D). We assigned each half of the distribution (i.e., directions CW and CCW to the median direction) a different range and sampling density. Dot directions for the CCW half of the distribution were sampled at 5° intervals over a range of 90°, 110°, 130°, or 150°; the corresponding values for the CW half were 90°, 70°, 50°, or 30°. The median direction of the comparison RDK was randomly chosen on each trial by using the Method of Constant Stimuli.
Data Analysis.
For each condition, observers completed 3–7 runs of 180 trials. Data were expressed as the percentage of trials on which observers judged the modal (experiments 1 and 2) or median (experiment 3) direction of the comparison as more CW than the standard as a function of the angular difference between them and fitted with a logistic function: where y is the percentage of CW judgements, μ is the stimulus level at which observers were unable to distinguish between the directions of the standard and comparison, and β is an estimate of direction discrimination threshold.
We modeled the pooling of local directional signals with three algorithms: maximum likelihood, winnertakeall, and vector average (Fig. 4). We simulated each model's performance on the direction discrimination task performed by the observers on a trialbytrial basis using analogous methods. Each model consisted of a bank of evenly spaced directionally tuned mechanisms spanning 360°, each of which responded to a limited range of local directions with a Gaussian sensitivity profile that was corrupted by Poisson noise. The centertocenter separation between adjacent mechanisms was fixed at 1°. The sensitivity of the ith mechanism, centered at θ_{i}, to direction of motion θ is where h is the bandwidth (halfheight, halfwidth), fixed at 45°. The response profile of the ith mechanism to stimulus D with a distribution of dot directions, D(θ), is where R_{max} is the maximum mean firing rate of the mechanism (to a stimulus in which all dots move in the preferred direction) in spikes per sec (60), t is stimulus duration, and pr{D(θ)} is the proportion of dot directions in the distribution. The number of spikes (n_{i}) elicited in response to a stimulus on a given presentation is Poisson distributed with a mean of R_{i}(D): The log likelihood of any stimulus D is computed as a weighted sum of the responses of the population of mechanisms, where the activity of each mechanism is multiplied by the log of its tuning function (24): The estimated direction is the value of θ_{i} for which log L(D) computed for all D is maximal. To obtain the estimated direction of the comparison from the winnertakeall model, we read off the preferred direction of the peak of the population response; to obtain the corresponding estimate from the vector average model, we calculated the average of the preferred directions of all mechanisms weighted in proportion to the magnitude of their responses.
Acknowledgments
B.S.W. was supported by the Leverhulme Trust, T.L. was supported by the Biotechnology and Biological Sciences Research Council, and P.V.M. was supported by the Wellcome Trust.
Footnotes
 ↵*To whom correspondence should be addressed. Email: bsw{at}psychology.nottingham.ac.uk

Author contributions: B.S.W., T.L., and P.V.M. designed research; B.S.W., T.L., and P.V.M. performed research; B.S.W. and T.L. analyzed data; and B.S.W. wrote the paper.

The authors declare no conflict of interest.
Abbreviations
 CW,
 clockwise;
 CCW,
 counterclockwise;
 MSE,
 mean square error;
 MT,
 middle temporal;
 RDK,
 random dot kinematogram.
 Received September 22, 2006.
 © 2007 by The National Academy of Sciences of the USA
References
 ↵
 ↵
 Born RT,
 Bradley DC
 ↵
 Lagae L,
 Raiguel S,
 Orban GA
 ↵
 Maunsell JH,
 Van Essen DC
 ↵
 ↵
 Zeki SM
 ↵
 Albright TD
 ↵
 ↵
 ↵
 ↵
 Shadlen MN,
 Britten KH,
 Newsome WT,
 Movshon JA
 ↵
 Albright TD,
 Desimone R,
 Gross CG
 ↵
 Malonek D,
 Tootell RB,
 Grinvald A
 ↵
 Groh JM,
 Born RT,
 Newsome WT
 ↵
 ↵
 ↵
 Liu J,
 Newsome WT
 ↵
 Salzman CD,
 Newsome WT
 ↵
 Lisberger SG,
 Ferrera VP
 ↵
 Nichols MJ,
 Newsome WT
 ↵
 ↵
 ↵
 Eeckman F,
 Bower J
 Foldiak P
 ↵
 Seung HS,
 Sompolinsky H
 ↵
 Sanger TD
 ↵
 ↵
 Rao R,
 Olshausen B,
 Lewicki MS
 Weiss Y,
 Fleet DJ
 ↵
 ↵
 ↵
 ↵
 ↵
 ↵
 ↵
 ↵
 ↵
 ↵
 ↵
 ↵
 Felleman DJ,
 Kaas JH
 ↵
 Britten KH,
 Newsome WT
 ↵