Performance-optimized hierarchical models predict neural responses in higher visual cortex
- ᵃDepartment of Brain and Cognitive Sciences and McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA 02139; and
- ᵇHarvard-MIT Division of Health Sciences and Technology, Institute for Medical Engineering and Science, Massachusetts Institute of Technology, Cambridge, MA 02139
Edited by Terrence J. Sejnowski, Salk Institute for Biological Studies, La Jolla, CA, and approved April 8, 2014 (received for review March 3, 2014)

Significance
Humans and monkeys easily recognize objects in scenes. This ability is known to be supported by a network of hierarchically interconnected brain areas. However, understanding neurons in higher levels of this hierarchy has long remained a major challenge in visual systems neuroscience. We use computational techniques to identify a neural network model that matches human performance on challenging object categorization tasks. Although not explicitly constrained to match neural data, this model turns out to be highly predictive of neural responses in both the V4 and inferior temporal cortex, the top two layers of the ventral visual hierarchy. In addition to yielding greatly improved models of visual cortex, these results suggest that a process of biological performance optimization directly shaped neural mechanisms.
Abstract
The ventral visual stream underlies key human visual object recognition abilities. However, neural encoding in the higher areas of the ventral stream remains poorly understood. Here, we describe a modeling approach that yields a quantitatively accurate model of inferior temporal (IT) cortex, the highest ventral cortical area. Using high-throughput computational techniques, we discovered that, within a class of biologically plausible hierarchical neural network models, there is a strong correlation between a model’s categorization performance and its ability to predict individual IT neural unit response data. To pursue this idea, we then identified a high-performing neural network that matches human performance on a range of recognition tasks. Critically, even though we did not constrain this model to match neural data, its top output layer turns out to be highly predictive of IT spiking responses to complex naturalistic images at both the single site and population levels. Moreover, the model’s intermediate layers are highly predictive of neural responses in the V4 cortex, a midlevel visual area that provides the dominant cortical input to IT. These results show that performance optimization—applied in a biologically appropriate model class—can be used to build quantitative predictive models of neural processing.
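As a rough illustration of what "predictive of neural responses" can mean in practice, the sketch below fits a linear map from model-layer features to recorded responses on one set of images and scores it on held-out images. It is not the authors' procedure, which this section does not specify. The ridge regression, the single train/test split, the squared Pearson correlation score, and the names `fit_ridge`, `neural_predictivity`, and `make_synthetic_data` are all assumptions made for illustration, and the data are synthetic.

```python
import numpy as np


def fit_ridge(X, Y, alpha=1.0):
    """Closed-form ridge regression weights mapping features X to responses Y."""
    n_features = X.shape[1]
    A = X.T @ X + alpha * np.eye(n_features)
    return np.linalg.solve(A, X.T @ Y)


def neural_predictivity(features, responses, n_train, alpha=1.0):
    """Per-site squared Pearson correlation on held-out images.

    features:  (n_images, n_features) model-layer activations (hypothetical)
    responses: (n_images, n_sites) neural responses (hypothetical)
    """
    X_train, X_test = features[:n_train], features[n_train:]
    Y_train, Y_test = responses[:n_train], responses[n_train:]

    # Center with training statistics only, so no test information leaks in.
    x_mean = X_train.mean(axis=0)
    y_mean = Y_train.mean(axis=0)
    W = fit_ridge(X_train - x_mean, Y_train - y_mean, alpha)
    Y_pred = (X_test - x_mean) @ W + y_mean

    # Pearson correlation between predicted and observed responses, per site.
    pred_c = Y_pred - Y_pred.mean(axis=0)
    true_c = Y_test - Y_test.mean(axis=0)
    r = (pred_c * true_c).sum(axis=0) / (
        np.linalg.norm(pred_c, axis=0) * np.linalg.norm(true_c, axis=0)
    )
    return r ** 2


def make_synthetic_data(seed=0, n_images=400, n_features=20, n_sites=5, noise=0.5):
    """Synthetic stand-in: responses are a noisy linear readout of the features."""
    rng = np.random.default_rng(seed)
    features = rng.normal(size=(n_images, n_features))
    true_weights = rng.normal(size=(n_features, n_sites))
    responses = features @ true_weights + noise * rng.normal(size=(n_images, n_sites))
    return features, responses


features, responses = make_synthetic_data()
scores = neural_predictivity(features, responses, n_train=300)
print("median held-out r^2 across sites:", float(np.median(scores)))
```

Under this kind of scheme, each candidate model would receive one predictivity score per recording site, and those scores could then be compared with the model's categorization performance across models.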
Footnotes
- ¹D.L.K.Y. and H.H. contributed equally to this work.
- ²To whom correspondence should be addressed. E-mail: dicarlo{at}mit.edu.
Author contributions: D.L.K.Y., H.H., and J.J.D. designed research; D.L.K.Y., H.H., and E.A.S. performed research; D.L.K.Y. contributed new reagents/analytic tools; D.L.K.Y., H.H., C.F.C., and D.S. analyzed data; and D.L.K.Y., H.H., and J.J.D. wrote the paper.
The authors declare no conflict of interest.
This article is a PNAS Direct Submission.
See Commentary on page 8327.
This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1403112111/-/DCSupplemental.
Freely available online through the PNAS open access option.
Article Classifications
- Biological Sciences
- Neuroscience