A neural basis for real-world visual search in human occipitotemporal cortex

Edited by Robert Desimone, Massachusetts Institute of Technology, Cambridge, MA, and approved June 15, 2011 (received for review January 20, 2011)
July 5, 2011
108 (29) 12125-12130


Mammals are highly skilled in rapidly detecting objects in cluttered natural environments, a skill necessary for survival. What are the neural mechanisms mediating detection of objects in natural scenes? Here, we use human brain imaging to address the role of top-down preparatory processes in the detection of familiar object categories in real-world environments. Brain activity was measured while participants were preparing to detect highly variable depictions of people or cars in natural scenes that were new to the participants. The preparation to detect objects of the target category, in the absence of visual input, evoked activity patterns in visual cortex that resembled the response to actual exemplars of the target category. Importantly, the selectivity of multivoxel preparatory activity patterns in object-selective cortex (OSC) predicted target detection performance. By contrast, preparatory activity in early visual cortex (V1) was negatively related to search performance. Additional behavioral results suggested that the dissociation between OSC and V1 reflected the use of different search strategies, linking OSC preparatory activity to relatively abstract search preparation and V1 to more specific imagery-like preparation. Finally, whole-brain searchlight analyses revealed that, in addition to OSC, response patterns in medial prefrontal cortex distinguished the target categories based on the search cues alone, suggesting that this region may constitute a top-down source of preparatory activity observed in visual cortex. These results indicate that in naturalistic situations, when the precise visual characteristics of target objects are not known in advance, preparatory activity at higher levels of the visual hierarchy selectively mediates visual search.
The selection of complex stimuli, such as objects, from cluttered environments presents a complicated problem: during real-world visual search, objects are present at unspecified locations and may have an almost infinite number of possible visual appearances. Despite these difficulties, the detection of highly familiar object categories (e.g., people or cars) in natural scenes has been shown to be remarkably fast (1). Surprisingly, such detection is even possible in the near absence of spatial attention, by stark contrast to the detection of simple feature conjunctions (e.g., discriminating “T” from “L”) under comparable task conditions (2). On the basis of these and other findings, it has been suggested that mechanisms related to the visual search for familiar objects in natural scenes may differ from those related to the visual search for simple shapes in artificial scenes, such as typically studied in the laboratory (3). In the present study, we investigated the brain mechanisms mediating the selection of objects in real-world scenes. Specifically, we addressed the role and nature of top-down preparatory processes in the detection of familiar object categories in daily life environments.
Theoretical accounts of visual search hold that top-down preparation is an important component of efficient target detection (4, 5). Indeed, most items in a complex scene are not fully processed without top-down attention, as shown, for example, in the change blindness paradigm (6). At a neural level, it has been suggested that search preparation may be mediated by activity of neurons that are directly involved in the perceptual discrimination of targets from distractors (7). Evidence for this hypothesis comes from studies in which participants detected a specific shape at a small number of possible locations (8–10). Neurons in monkey inferotemporal (IT) cortex that were activated by a specified target shape also responded during the search preparation phase (9). In humans, several functional magnetic resonance imaging (fMRI) studies have demonstrated that perceptual expectation of a centrally presented isolated object gives rise to object-specific preparatory activity in visual cortex (10–12), and that such activity facilitates subsequent processing of the anticipated object. These findings raise the possibility that similar anticipatory mechanisms may mediate visual search. However, in these previous fMRI studies, objects were presented centrally and the shape of the anticipated objects (e.g., a full-front view of a face) was known in advance, making it likely that these effects reflected explicit visual imagery during the preparation phase. By contrast, during visual search in natural vision, targets are located at unspecified locations in cluttered scenes, and are experienced in unexpected and novel viewing conditions. In other words, in real-world visual search the precise visual characteristics and location of target objects are not known in advance and thus cannot be precisely imagined. Therefore, preparation mechanisms mediating the detection of target objects in real-world environments remain largely unknown.
In the present study we measured brain activity using fMRI while participants detected (and prepared to detect) people or cars in natural scenes. We found that search preparation by itself evoked category-specific activity patterns in visual cortex. Importantly, category-specific preparatory activity in object-selective cortex (OSC) greatly facilitated the detection of objects, whereas such activity in early visual cortex (V1) hindered object detection. Behavioral results indicated that this dissociation might have reflected the use of different search strategies, with a more abstract search strategy being more effective than a specific imagery-like strategy. Finally, whole-brain searchlight analyses showed that response patterns in medial prefrontal cortex, like those in OSC, distinguished both the target categories present in the scenes and the categories indicated by the search cues alone, suggesting that this region may constitute a top-down source of the category-selective preparatory activity observed in OSC.


Fourteen participants performed a scene categorization task involving the detection of people or cars in a large set of natural scenes that were new to the participants. The set consisted of heterogeneous photographs of city- and landscapes with a subset containing people or cars that appeared in diverse locations, sizes, shapes, and viewpoints (Fig. S1). On each trial, a symbolic cue indicated the target category. On 2 out of 3 trials, the cue was followed, after a delay, by a briefly (100 ms) presented and subsequently masked scene. Critically, on 1 of 3 trials, no scene was presented, allowing for the isolation of preparatory activity in the absence of visual input (Fig. 1A). Data were analyzed using multivoxel pattern analysis, a technique that has been successfully used to reveal object category coding in extrastriate visual cortex (13, 14).
Fig. 1.
Experimental design and analytical approach. (A) Schematic of trial structure. Participants were instructed to detect either people or cars in briefly presented natural scenes. A symbolic cue (a letter or number) indicated the target category on a trial-by-trial basis. On a proportion of trials (33%) the cue, but no scene stimulus, was presented, to isolate responses to the cue itself (“cue-only trial”). (B) OSC was localized in each individual participant by contrasting activity to intact versus scrambled objects, presented in a separate experiment. Voxels activated by this contrast were selected for multivoxel pattern analysis. (C) Multivoxel response patterns to the cue-only trials (people and cars cues) in the main experiment were correlated with response patterns evoked by exemplar pictures of people and cars, presented without other visual context in a separate category localizer experiment. The between-category correlations (diagonal comparisons) were subtracted from the within-category correlations (horizontal comparisons) to estimate the category-specificity of cue-related activity.
In a first analysis, responses evoked by the scene stimuli were modeled to probe (i) whether the people and cars embedded in the scenes were processed up to the category level despite the short presentation duration, diverse visual appearance, and cluttered background, and (ii) whether this processing was modulated by the task relevance of a given category. To measure categorical processing of objects, activity patterns evoked by the natural scenes were correlated with activity patterns evoked by isolated pictures of people and cars, presented in a separate category localizer experiment (Materials and Methods). Within OSC (Fig. 1B), defined based on its preference for intact compared with scrambled objects (Materials and Methods), activity patterns to the scenes containing people or cars correlated more strongly with the matching than the nonmatching categories (people or cars, respectively) from the category localizer (t13 = 5.7, P < 0.0001; Fig. 2). Furthermore, this “category information” was modulated by task relevance, with greater information regarding task-relevant than task-irrelevant objects (t13 = 2.6, P < 0.05; Fig. 2). This result indicates that the processing of the scenes was biased toward the task-relevant category, confirming a previous report (14). By contrast to OSC, no information about the two types of scenes was represented in the voxel patterns of early visual cortex (Materials and Methods): V1 (t13 = −0.3; Fig. 2), V2/V3 (t13 = 0.3, P = 0.8; Fig. 2). There was also no modulation of category information as a function of task relevance in V1 (t13 = −1.4; Fig. 2), V2/V3 (t13 = 1.3, P = 0.2; Fig. 2), showing that category-based attentional modulation was specific to OSC.
Fig. 2.
Multivoxel category information in visual cortex. Activity patterns to the scenes containing people or cars were correlated with activity patterns to the isolated pictures of people and cars presented in the category localizer. Category information was defined as the mean difference between matching (e.g., people–people) and nonmatching (e.g., people–cars) correlations. OSC activity patterns contained significant information about the object categories embedded in the natural scenes, both for task-relevant objects (P < 0.00001) and task-irrelevant objects (P < 0.05). Significantly more information was observed for task-relevant (target) objects than for task-irrelevant objects (P < 0.05). No significant category information was present in V1 or V2/V3 activity patterns, and there was no significant modulation of task-relevance in these ROIs. See Fig. S2 for results in face-, body-, and scene-selective regions of interest.
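The category information measure defined in this caption can be sketched in a few lines. This is a minimal illustration, not the authors' analysis code: the function names, the dictionary layout, and the four-voxel patterns are hypothetical, and raw Pearson correlations are averaged without any further transformation, which is an assumption.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equal-length voxel patterns."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)

def category_information(test, localizer):
    """Mean matching minus mean nonmatching correlation.

    test, localizer: dicts mapping a category name to a voxel pattern.
    """
    cats = list(test)
    within = [pearson(test[c], localizer[c]) for c in cats]
    between = [pearson(test[c], localizer[d])
               for c in cats for d in cats if d != c]
    return sum(within) / len(within) - sum(between) / len(between)

# Hypothetical 4-voxel patterns, for illustration only (not data from the study).
localizer = {"people": [1.0, 0.2, -0.5, 0.3], "cars": [-0.4, 0.9, 0.6, -0.2]}
scenes = {"people": [0.8, 0.1, -0.3, 0.4], "cars": [-0.2, 0.7, 0.5, 0.0]}
info = category_information(scenes, localizer)
```

A positive value means the scene-evoked patterns resemble the matching localizer category more than the nonmatching one. Swapping the scene labels flips the sign.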
Next, we analyzed the trials during which no scene stimulus followed the cue presentation, modeling responses evoked by the symbolic cues in the absence of visual input. These analyses were performed to investigate the neural correlates of different preparatory states. Response patterns evoked by the “detect people” and “detect cars” cues were compared with response patterns evoked by actual pictures of people and cars from the independent category localizer (Fig. 1C). A greater similarity between response patterns of matching categories than those of nonmatching categories would provide evidence for content-specific preparatory activity in visual cortex. Confirming this hypothesis, OSC showed category-specific activity patterns in response to the cues (t13 = 2.0, P = 0.06; Fig. 3A), indicating that the symbolic search instruction alone activated category-selective populations in OSC. Unexpectedly, a similar effect was found in V1 (t13 = 2.9, P < 0.05; Fig. 3A), even though this region did not discriminate between the scenes containing cars and people (Fig. 2). No significant category-specific preparatory activity was found in V2/V3 (Fig. 3A), or in face-, body-, and scene-selective regions of visual cortex (P > 0.4, for all tests; Fig. S3A). Finally, although multivoxel activity patterns discriminated the two cue types in OSC and V1, the overall amplitude of activity in these and other ROIs did not (P > 0.3, for all tests; Fig. S4).
Fig. 3.
Category-specific cue effect in visual cortex. (A) Category-specific cue effect in V1, V2/V3, and OSC for all participants (n = 14). The category-specific cue effect was calculated as in Fig. 1C. (B) Category-specific cue effect in V1, V2/V3, and OSC separately for the good (accuracy > 82%, n = 7) and poor participants (accuracy < 82%, n = 7). A Group (good/poor) × ROI (OSC/V1) ANOVA on category-specific cue effects showed a significant interaction between Group and ROI (F1,12 = 16.1, P < 0.005). Whereas the “good” participants showed highly significant category-specific cue activity in OSC (t6 = 6.0, P < 0.001) but not in V1 (t6 = −0.7), the “poor” participants showed significant category-specific cue activity in V1 (t6 = 4.2, P < 0.005) but not in OSC (t6 = 0.7, P = 0.5).
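For a design with one two-level between-participant factor (Group) and one two-level within-participant factor (ROI), the interaction F equals the squared independent-samples t statistic on the per-participant difference scores (OSC minus V1), with n − 2 error degrees of freedom. The sketch below illustrates that equivalence. The function name and all values are hypothetical placeholders, not data from the study, and the paper does not state that the ANOVA was computed this way.

```python
from math import sqrt

def interaction_f(group_a, group_b):
    """Group x ROI interaction F for a 2 x 2 mixed design.

    Each group is a list of (osc_effect, v1_effect) pairs, one per participant.
    Returns (F, df_error), using a pooled-variance t-test on OSC - V1 differences.
    """
    da = [osc - v1 for osc, v1 in group_a]
    db = [osc - v1 for osc, v1 in group_b]
    na, nb = len(da), len(db)
    ma, mb = sum(da) / na, sum(db) / nb
    ss = sum((d - ma) ** 2 for d in da) + sum((d - mb) ** 2 for d in db)
    df = na + nb - 2
    pooled_var = ss / df
    t = (ma - mb) / sqrt(pooled_var * (1 / na + 1 / nb))
    return t * t, df

# Hypothetical cue effects (OSC, V1) for 7 "good" and 7 "poor" participants.
good = [(0.05, 0.00), (0.07, -0.02), (0.04, 0.01), (0.06, -0.01),
        (0.08, 0.00), (0.05, -0.03), (0.06, 0.01)]
poor = [(0.01, 0.05), (0.00, 0.06), (-0.01, 0.04), (0.02, 0.07),
        (0.01, 0.03), (0.00, 0.05), (-0.02, 0.06)]
f_value, df_error = interaction_f(good, poor)
```

With seven participants per group the error term has 12 degrees of freedom, matching the F1,12 reported in the caption.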
Subsequently, we tested whether the content specificity (i.e., being more “car-like” or more “person-like”) of the preparatory activity on cue-only trials in V1 and OSC was related, across participants, to the speed and accuracy with which targets were detected on the cue-plus-scene trials. Importantly, the category specificity of cue-related responses in OSC was positively correlated with accuracy [accuracy (acc): r = 0.70, P < 0.01; Fig. 4A] and negatively correlated with response time [(RT): r = −0.89, P < 0.0001; Fig. 4B], indicating that content-specific preparatory activity in OSC was directly related to the speed and accuracy with which objects were detected. By contrast, the category specificity of cue-related responses in V1 was negatively correlated with accuracy (acc: r = −0.53, P < 0.05; Fig. 4A) and positively correlated with RT (r = 0.19, P = 0.5; Fig. 4B), suggesting that in early visual regions content-specific preparatory activity adversely affected object detection. The dissociation between OSC and V1 was confirmed in an additional analysis in which participants were divided in two groups based on their behavioral performance (Fig. 3B). These results suggest that different participants may have used different search strategies: an effective preparation strategy reflected in OSC activity, or an ineffective preparation strategy reflected in V1 activity.
Fig. 4.
Relation between category-specific cue effect in visual cortex and behavioral performance. The relation between the category specificity of cue-related activity (category-specific cue effect, horizontal axis) and (A) accuracy, and (B) response time is shown for areas V1 (red diamonds) and OSC (blue triangles). Each data point represents a participant. The category specificity of cue-related activity was positively related to behavioral performance in OSC, but negatively related to behavioral performance in V1. The category-specific cue effect was calculated as in Fig. 1C.
What could be the difference between effective and ineffective strategies? Effective preparation likely consists of anticipating relatively general category-diagnostic features, given the large variability in the visual appearance and location of target objects, and the large overlap of specific features between target and distractor objects in natural vision. Conversely, less effective strategies may consist of the preparation to detect specific visual features, such as those associated with a specific or canonical view of a car or person [e.g., horizontal (cars) versus vertical (persons) object segments]. To test this hypothesis, we conducted a behavioral study in which participants (n = 16) performed 6 runs of the same search task as used in the fMRI experiment, and filled out a posttest questionnaire probing their strategy (Table S1). Each of the statements related either to a general strategy, defined as the preparation to detect the target category at a relatively abstract level without vivid visual imagery (e.g., “after the person cue I formed a general idea of what a person in the scene may look like”), or a specific strategy, defined as a strategy during which participants would visually imagine specific visual features (e.g., “after the person cue I imagined persons with a prototypical posture as seen from the front”). First, although on average participants gave higher ratings to the general (3.2) than the specific (2.3) statements (t15 = 4.2, P < 0.001), there was variability in this difference across participants (range −0.2 to 2.5), indicating that participants reported using different strategies. Next, we correlated the difference between these two averaged ratings (general minus specific) with behavioral performance across participants. 
This analysis revealed a strong positive correlation between the rating difference and accuracy (acc: r = 0.67, P < 0.005; RT: r = 0.03, P = 0.9), indicating that the use of the general strategy was beneficial to task performance relative to the use of the specific strategy. This correlation was a result of both a positive relation between accuracy and the rating for the general strategy (r = 0.52, P < 0.05; Fig. 5) and a negative relation between accuracy and the rating for the specific strategy (r = −0.42, P = 0.10; Fig. 5). This relation between search strategy and behavioral performance informs the interpretation of the fMRI results: the beneficial preparatory activity in OSC likely reflected the use of a relatively abstract strategy, whereas the disadvantageous preparatory activity in V1 may have reflected a more specific imagery-like strategy.
Fig. 5.
Relation between search strategy and behavioral performance. The degree to which participants reported using a relatively general search strategy was positively correlated with accuracy (blue triangles), whereas the reported use of a relatively specific imagery-like search strategy was negatively correlated with accuracy (red diamonds).
Finally, we explored the whole brain for regions that discriminated the scene types and for regions that showed facilitatory cue effects. We used a spherical searchlight approach (15) to test, for each voxel in the brain, the degree to which multivoxel patterns in a 10-mm sphere around this voxel could discriminate the scene types based on the category-specific patterns from the independent localizer (identical to the category information measure in the ROI analysis). Results of the individual participants were spatially normalized to allow for a whole-brain group analysis testing for scene-discriminating regions throughout the brain (Materials and Methods). This analysis revealed two bilateral clusters in occipitotemporal cortex, closely overlapping the ventral and dorsal foci of OSC, thus confirming the results of the ROI analysis (Fig. 6). Three additional clusters were located in the frontal cortex: medial prefrontal cortex (mPFC; peak: xyz = 2, 43, 5; t13 = 6.1; P < 0.0001; cluster size: 594 mm³), right middle frontal gyrus (peak: xyz = 40, 11, 35; t13 = 6.0; P < 0.0001; cluster size: 729 mm³), and right precentral gyrus (peak: xyz = 59, −10, 32; t13 = 7.0; P < 0.0001; cluster size: 702 mm³). Interestingly, the mPFC cluster showed a significant category-specific cue effect (t13 = 2.5; P < 0.05), which was positively (although not significantly) correlated with accuracy (r = 0.40). No category-specific cue effect was found in the right middle frontal gyrus and precentral gyrus clusters (P > 0.5 for both tests). We used the same approach to test for regions in which the category-specific cue effect (calculated for each sphere, as in Fig. 1C) positively correlated with behavioral performance (expressed by a normalized behavioral performance score incorporating both accuracy and response time; Materials and Methods). 
This analysis revealed a large cluster in right occipitotemporal cortex (peak: xyz = 46, −58, 8; r = 0.87; P < 0.0001; cluster size: 1161 mm³), overlapping both OSC and scene-discriminating regions (Fig. 6). The cue effect in this peak correlated significantly with both accuracy and response time (acc: r = 0.79, P < 0.001; RT: r = −0.76, P < 0.005). Furthermore, it showed significant intact versus scrambled object selectivity (t13 = 3.9; P < 0.005) and scene category information (t13 = 3.5; P < 0.005). Together, these whole-brain analyses indicate that facilitatory preparatory activity is primarily located within object-selective regions that discriminate target from distractor scenes. Furthermore, they provide evidence for mPFC involvement in real-world visual search, as this region both contained information about the categories of objects in the scenes and responded in a category-selective manner to search cues in the absence of visual input.
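The spatial part of a searchlight analysis amounts to collecting, for every center voxel, the neighboring voxels that fall inside a sphere. The sketch below enumerates those neighbors as index offsets. It rests on two assumptions that the text does not settle: that "10-mm sphere" refers to the radius, and that the voxel spacing is 3 × 3 × 4 mm (3-mm slices plus the 1-mm gap given in Materials and Methods). The function name is hypothetical.

```python
from math import ceil

def sphere_offsets(radius_mm, spacing_mm):
    """Voxel index offsets whose centers lie within radius_mm of the center voxel."""
    sx, sy, sz = spacing_mm
    rx, ry, rz = (ceil(radius_mm / s) for s in spacing_mm)
    offsets = []
    for i in range(-rx, rx + 1):
        for j in range(-ry, ry + 1):
            for k in range(-rz, rz + 1):
                dist_sq = (i * sx) ** 2 + (j * sy) ** 2 + (k * sz) ** 2
                if dist_sq <= radius_mm ** 2:
                    offsets.append((i, j, k))
    return offsets

# Assumed geometry: 10-mm radius, 3 x 3 mm in-plane, 4 mm between slice centers.
offsets = sphere_offsets(10.0, (3.0, 3.0, 4.0))
```

Adding these offsets to a center voxel's indices gives the voxels whose responses form the multivoxel pattern for that sphere; the pattern-correlation measure is then computed per sphere and written back to the center voxel.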
Fig. 6.
Whole-brain analyses. Results from whole-brain group analyses, overlaid on the group-average anatomical scan. (Left) The ventral (z = −18) and dorsal (z = 10) foci of OSC, activated by the univariate contrast between intact and scrambled objects in the OSC localizer (P < 0.001). (Center) Occipitotemporal and medial prefrontal clusters from the multivoxel searchlight analysis testing for spheres that discriminated between the two scene types (people vs. cars) on the basis of independent category localizer patterns (P < 0.001). (Right) The right occipitotemporal cluster from the multivoxel searchlight analysis testing for spheres with a positive correlation between the category-specific cue effect and behavioral performance (r > 0.78, corresponding to P < 0.001).
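The correspondence between the correlation threshold and the P value in this caption can be checked with the standard conversion from a Pearson correlation to a t statistic with n − 2 degrees of freedom. This is a worked check rather than part of the original analysis; the function name is hypothetical, and n = 14 is taken from the number of participants.

```python
from math import sqrt

def r_to_t(r, n):
    """t statistic (df = n - 2) for a Pearson correlation r over n observations."""
    return r * sqrt((n - 2) / (1 - r * r))

# Threshold used for the right occipitotemporal cluster, with 14 participants.
t_threshold = r_to_t(0.78, 14)
```

The result is about 4.32, which is close to the tabulated two-tailed critical t value for P = 0.001 at 12 degrees of freedom.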


Our results provide evidence for content-specific activity patterns in visual cortex during the preparation to search for familiar object categories in cluttered natural scenes. Critically, preparatory activity in object-selective visual cortex, but not in early visual cortex, was found to facilitate the subsequent detection of never-before-seen category exemplars. These results indicate that the detection of complex objects in cluttered real-world scenes is selectively mediated by preparatory activity in higher levels of the visual hierarchy, where target scenes are discriminated from distractor scenes.
In a previous study (14), we found that attending to a particular object category in briefly presented natural scenes biased the processing of the scenes in favor of the attended category, as confirmed by the present results (Fig. 2). Notably, this effect was similar for spatially attended and spatially unattended scenes, indicating a global biasing mechanism that operates in parallel across the visual field and independent of spatial attention. Our present findings provide important evidence as to the origin of this biasing mechanism. Specifically, our results indicate that search preparation involves the priming or preactivation of neuronal populations that are selective to the target category. This internally generated activity then provides a competitive advantage for target stimuli, biasing the processing of scenes in favor of the attended category, and facilitating target detection across the visual field. The finding that the degree of category selectivity of the preparatory activity in OSC positively correlated with detection performance suggests that this preparatory mechanism is critical for efficient target detection.
By contrast to the beneficial effects of preparatory activity in OSC, preparatory activity in early visual cortex was negatively correlated with behavior: Participants who showed category-specific cue-evoked activity in V1 performed worse in the category detection task than participants who did not. Based on the known response properties of early visual cortex, an area selective to simple features such as line orientation, this result may indicate that participants who showed selective preparatory activity in V1 were preparing to detect specific visual features such as orientation cues, instead of (or in addition to) more abstract category diagnostic features. Although low-level features may differentially match the features of isolated category exemplars (e.g., cars have more horizontal segments than people), searching for low-level features is unlikely to be the optimal strategy for the detection of objects in cluttered natural scenes, in which many other objects share these low-level features. Our behavioral study confirmed this hypothesis by showing that a search strategy comprising the preparation to detect specific visual features was detrimental to detection performance.
The finding of top-down activity in visual cortex, in the absence of visual input, is not unique to visual search paradigms. For example, directing spatial attention to a particular location has been shown to increase activity in extrastriate regions responsive to stimuli at the attended location (16). Other studies have reported top-down activity of visual cortex in paradigms where participants were asked to visually memorize the orientation of a Gabor stimulus (17, 18) or its color (18), were expecting a particular object shape (10–12), or were asked to vividly imagine a specific category exemplar (19, 20) or letter (21). How do these previous findings of top-down activity relate to the preparatory activity reported here? First, an important difference between our visual search task and previous studies on attention, perceptual expectation, working memory, and visual imagery is that in our task, participants did not know what the target objects would look like or where they would appear. It is therefore unlikely that the category-specific cue effects in OSC reflect the visual imagery or memory of a specific shape, given the variability in appearance of target objects in the large number of unique scenes that were presented. Second, our findings are different from the effects of feature-based attention (22), unless one considers object category itself a feature (23). As argued above, and confirmed by our behavioral study, attention to particular low-level features (e.g., color or orientation) will not be helpful in performing the category detection task, in which target objects overlap heavily with distractor objects in terms of the low-level features that they share. 
Finally, and perhaps most importantly, although it is conceivable that the top-down modulatory mechanisms involved in visual search, feature-based attention, visual working memory, and visual imagery partially overlap, our results show that top-down preparatory activity is an integral part of real-world visual search in that it biases target-selective neuronal populations in favor of target objects, thereby facilitating their detection in cluttered scenes.
What constitutes the source of the preparatory activity patterns that we observed in visual cortex? Spatial attention studies have consistently implicated a fronto-parietal network as the source of spatial attention biases in visual cortex (24). Activity in this network precedes activity in visual cortex (25), and activity within these source regions is spatially specific (26). A similar fronto-parietal network has been implicated in feature-based attention (27), and feature-specific responses have been reported in parietal cortex (22). Finally, the prefrontal cortex is thought to exert top-down modulatory influences on visual cortex during working memory maintenance (28). In the present experiment, a whole-brain searchlight analysis identified the medial prefrontal cortex as a putative source of preparatory activity in visual cortex. This region showed category-selective responses to the scenes, similar to OSC. Furthermore, this category selectivity was already present during the preparation phase of the task, as would be expected for a source region. The mPFC region identified here may correspond to a region labeled superior orbital sulcus (SOS) in a previous study, where it was linked to the processing of scene context (29). Consistent with the present findings, it was hypothesized that SOS may maintain an updated representation of scene context to modulate and facilitate object processing in visual cortex (29). Future work needs to follow up on these findings using methods that are optimally suited to address the temporal flow of information in the brain.
In summary, the present study has demonstrated content-specific preparatory activity in OSC during real-world visual search. This preparatory activity biased neural population responses in favor of the task-relevant object category, thereby facilitating the detection of objects in cluttered natural scenes. Our results further suggest that preparatory visual activity is most effective when implemented at the level of visual cortex that discriminates target from distractor scenes, that is, in OSC. Finally, we identified a region in medial prefrontal cortex as a putative source of preparatory activity in visual cortex. Together, these findings provide a neural basis for visual search in our daily-life environment.

Materials and Methods


Fourteen healthy adult volunteers (six females) participated in two scanning sessions. All participants were right-handed with normal or corrected-to-normal vision and no history of neurological or psychiatric disease. Participants all gave informed written consent for participation in the study, which was approved by the Institutional Review Panel of Princeton University.


Natural scene pictures (n = 192) were selected from an online database (30). Forty-eight pictures contained one or more people (but no cars), 48 pictures contained one or more cars (but no people), 48 pictures contained both cars and people, and 48 pictures contained no cars and no people. The pictures were mostly photographs of city streets. The position, viewpoint, and size of the people and cars in the pictures were highly variable, mimicking real-world viewing conditions (see Fig. S1 for sample pictures). Forty-eight different perceptual masks were created. Each was a colored picture of a mixture of white noise at different spatial frequencies on which a naturalistic texture was superimposed (31). Within a scanning session each of the presented pictures was unique. The same set of pictures was used in a second scanning session (see General procedure). All pictures were full-color photographs reduced to 480 (vertical) × 640 (horizontal) pixels. The pictures (size: 7.5° × 10.0°) were presented centrally on top of a fixation cross. A projector outside the scanner room displayed the stimuli onto a translucent screen located at the end of the scanner bore. Participants viewed the screen through a mirror attached to the head coil.

General Procedure.

Each volunteer participated in two scanning sessions separated by, on average, 34 d. Each session consisted of eight runs of the main experiment, two runs of the category pattern localizer, and two runs of the OSC localizer. Data were analyzed using the AFNI software package and MATLAB (MathWorks, Natick, MA).

Main Experiment.

Each run consisted of 42 trials and lasted for 210 s. Of these 42 trials, 24 trials were cue-plus-scene trials, 12 were cue-only trials, and 6 were fixation-only (“null”) trials. Each run started and ended with a blank screen showing a fixation cross for 12.25 s. Each experimental trial started with a 0.4-s blank screen, followed by a 0.5-s presentation of a symbolic cue indicating the target category. The cue was followed by a fixation cross that was presented for 2 s, 2.25 s, or 3 s (equiprobable). For cue-plus-scene trials, the scene picture was then presented for 0.1 s, followed directly by a 0.3-s presentation of a perceptual mask and 0.7 s of fixation until the next trial. For cue-only trials, a 0.4-s fixation cross was presented instead of the scene and mask. The average trial duration was 4.4 s (see Fig. 1 for an overview of the trial layout).
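The reported average trial duration and run length follow from the stated event durations, as the arithmetic below shows. Two assumptions are needed that the text does not spell out: fixation-only trials are taken to last as long as the average experimental trial, and cue-only trials are taken to end with the same 0.7-s fixation period as cue-plus-scene trials. Variable names are illustrative.

```python
from fractions import Fraction

# Event durations in seconds, as given in the text (exact rational arithmetic).
pre_cue_blank = Fraction("0.4")
cue = Fraction("0.5")
delays = [Fraction("2"), Fraction("2.25"), Fraction("3")]  # equiprobable
scene = Fraction("0.1")
mask = Fraction("0.3")
post_fixation = Fraction("0.7")
run_padding = Fraction("12.25")  # blank screen at run start and at run end

mean_delay = sum(delays) / len(delays)
mean_trial = pre_cue_blank + cue + mean_delay + scene + mask + post_fixation

# Assumption: all 42 trials (including null trials) share the same mean duration.
n_trials = 24 + 12 + 6
run_length = 2 * run_padding + n_trials * mean_trial
```

Under these assumptions the mean trial lasts 53/12 s (about 4.4 s) and the run lasts exactly 210 s, in agreement with the reported values.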
Of the 36 experimental trials (24 cue-plus-scene trials and 12 cue-only trials) in each run, 18 were “detect people” trials and 18 were “detect cars” trials. The symbolic cues for people-detection trials were “B” (runs 1–4) and “2” (runs 5–8), whereas the cues for car-detection trials were “C” (runs 1–4) and “3” (runs 5–8). Of the 24 scenes presented in each run, 6 contained one or more people (but no cars), 6 contained one or more cars (but no people), 6 contained both cars and people, and 6 contained no cars and no people. Trial order was randomized (without replacement). The task was to press one button for the presence of the target category in the relevant picture pair and another button for the absence of the target category. The mapping of the two buttons (index and middle finger) to present and absent responses was counterbalanced across sessions and participants.

Category Pattern Localizer.

Category-selective patterns of activation were established using a separate localizer experiment. Stimuli were presented centrally, subtended 12° × 12°, and showed isolated objects on a white background. The experiment consisted of four conditions: human bodies, cars, outdoor scenes, and faces. One scanning run consisted of 21 blocks of 14 s each. Blocks 1, 6, 11, 16, and 21 were fixation-only baseline epochs. In each of the remaining blocks, 20 different stimuli from one category were presented. Each stimulus appeared for 350 ms, followed by a blank screen for 350 ms. Twice during each block, the same picture was presented two times in succession. Participants were required to detect these repetitions and report them with a button press (1-back task). Each participant was tested with two different versions of the experiment that counterbalanced the order of the blocks. In both versions, assignment of category to block was counterbalanced, so that the mean serial position in the scan of each condition was equated.
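One way to equate the mean serial position of the four conditions is to mirror the block order across the run. The sketch below illustrates this with a hypothetical order (ABCD, DCBA, DCBA, ABCD); the actual orders used in the experiment are not given in the text, and the function names are made up for this example.

```python
# Hypothetical assignment of letters to the four localizer conditions.
LABELS = {"A": "bodies", "B": "cars", "C": "scenes", "D": "faces"}

# Hypothetical order of the four stimulus blocks within each group.
GROUP_ORDERS = ["ABCD", "DCBA", "DCBA", "ABCD"]

# Each block holds 20 stimuli of 350 ms followed by 350 ms of blank screen.
BLOCK_DURATION_MS = 20 * (350 + 350)


def build_run(group_orders):
    """Return the 21 block labels of one run (list index 0 is block 1)."""
    blocks = []
    for order in group_orders:
        blocks.append("fixation")            # blocks 1, 6, 11, 16
        blocks.extend(LABELS[letter] for letter in order)
    blocks.append("fixation")                # block 21
    return blocks


def mean_serial_positions(blocks):
    """Mean 1-based block position of each stimulus condition."""
    positions = {}
    for index, label in enumerate(blocks, start=1):
        if label != "fixation":
            positions.setdefault(label, []).append(index)
    return {label: sum(p) / len(p) for label, p in positions.items()}


run = build_run(GROUP_ORDERS)
print(BLOCK_DURATION_MS)           # 14000
print(mean_serial_positions(run))  # every condition has mean position 11.0
```

With this order each condition occupies block positions that sum to 44, so all four conditions share a mean serial position of 11, the middle of the 21-block run.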

OSC Localizer.

OSC was identified using a localizer scan with a design identical to that of the category pattern localizer described above, except that pictures of intact and scrambled objects were presented in alternating blocks.

Data Acquisition and Preprocessing.

Functional [EPI sequence; 34 slices per volume; resolution = 3 × 3 × 3 mm with 1-mm gap; repetition time (TR) = 2.0 s; echo time (TE) = 30 ms; flip angle = 90°] and anatomical (MPRAGE sequence; 256 matrix; TR, 2.5 s; TE, 4.38 ms; flip angle, 8°; 1 × 1 × 1 mm resolution) images were acquired with a 3T Allegra MRI scanner (Siemens, Erlangen, Germany). Functional data were slice-time corrected and motion corrected, and low-frequency drifts were removed with a temporal high-pass filter (cutoff of 0.006 Hz). Only data used for ROI definition were spatially smoothed with a Gaussian kernel (4-mm full-width half-maximum). No spatial smoothing was applied on data used for any of the other analyses.
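To make the preprocessing parameters more concrete, the short Python sketch below converts the smoothing kernel from full-width half-maximum to the Gaussian standard deviation and expresses the high-pass cutoff as a period. The conversion formulas are standard; the variable names are illustrative, and nothing here reproduces the AFNI commands actually used.

```python
import math

FWHM_MM = 4.0         # smoothing kernel used for ROI definition
HIGH_PASS_HZ = 0.006  # temporal high-pass cutoff
TR_S = 2.0            # repetition time of the functional scans

# For a Gaussian, FWHM = 2 * sqrt(2 * ln 2) * sigma.
sigma_mm = FWHM_MM / (2.0 * math.sqrt(2.0 * math.log(2.0)))

# Signal fluctuations slower than this period are removed by the filter.
cutoff_period_s = 1.0 / HIGH_PASS_HZ
cutoff_period_volumes = cutoff_period_s / TR_S

print(round(sigma_mm, 2))            # 1.7
print(round(cutoff_period_s, 1))     # 166.7
print(round(cutoff_period_volumes))  # 83
```

A cutoff period of roughly 167 s is long relative to the average trial duration of 4.4 s, so the filter targets slow scanner drift rather than trial-related signal.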

ROI Definition.

OSC was defined for each participant in native space, by contrasting responses evoked by intact objects with responses evoked by scrambled objects, at P < 0.05 (uncorrected). V1 (Brodmann area 17) and V2/V3 (Brodmann area 18) were defined using the Talairach atlas implemented in AFNI (“TT_Daemon”), and projected back to each participant's native space. Brodmann areas 17 and 18 have been shown to closely correspond to V1 and V2/V3, respectively (32). The mean ROI sizes, in number of voxels, were: OSC: 400 (SD = 123), V1: 255 (SD = 22), V2/V3: 1,166 (SD = 94). Left and right hemisphere ROIs were combined.

Statistical Analysis.

For each participant, general linear models were created for the main experiment and the category pattern localizer experiment. One predictor (convolved with a standard model of the hemodynamic response function) modeled each condition. All trials were included in the analyses. Regressors of no interest were also included to account for differences in the mean MR signal across scans and for head motion within scans. These regression analyses resulted, for each voxel, in a t value for each condition in the main experiment and for each condition in the localizer experiment. Following previous studies (13), we normalized these t values by subtracting, for each voxel, the mean t value across the relevant conditions of an experiment (e.g., the mean of bodies and cars in the category localizer) from the t value of each individual condition of this experiment (e.g., bodies and cars). This normalization resulted in the mean t value of each voxel being zero, thereby eliminating the effect of voxelwise response differences that were unspecific to our conditions but leaving condition-related variation intact. The normalized t values of conditions in the main experiment were correlated, across the voxels of an ROI, with the normalized t values of the body and car conditions in the localizer (Fig. 1; see ref. 14). The analysis was performed for each participant and session separately. Correlations were Fisher transformed [0.5 × ln((1 + r)/(1 − r))] before averaging the two sessions and statistical testing. Differences between voxelwise correlations were then tested using repeated-measures ANOVAs and t tests (two-tailed) with participant (n = 14) as random factor.
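The normalization, voxelwise correlation, and Fisher transform described above can be sketched in plain Python as follows. This is a minimal illustration, not the authors' AFNI/MATLAB code: the function names are hypothetical, the five-voxel t values are invented toy data, and the matching-minus-nonmatching contrast at the end is one plausible way to summarize category selectivity.

```python
import math


def normalize(t_by_condition):
    """Subtract, for each voxel, the mean t value across conditions."""
    conditions = list(t_by_condition)
    n_voxels = len(t_by_condition[conditions[0]])
    voxel_means = [
        sum(t_by_condition[c][v] for c in conditions) / len(conditions)
        for v in range(n_voxels)
    ]
    return {
        c: [t_by_condition[c][v] - voxel_means[v] for v in range(n_voxels)]
        for c in conditions
    }


def pearson(x, y):
    """Pearson correlation across voxels."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def fisher_z(r):
    """Fisher transform: 0.5 * ln((1 + r) / (1 - r))."""
    return 0.5 * math.log((1 + r) / (1 - r))


# Toy t values for a five-voxel ROI (invented for illustration only).
localizer = normalize({"body": [2.0, 1.0, 3.0, 0.5, 1.5],
                       "car": [1.0, 2.0, 1.0, 1.5, 1.5]})
main = normalize({"people": [1.2, 0.4, 2.0, 0.3, 1.0],
                  "car": [0.8, 1.0, 0.9, 0.9, 1.1]})

# Correlations between matching and nonmatching categories.
matching = [fisher_z(pearson(main["people"], localizer["body"])),
            fisher_z(pearson(main["car"], localizer["car"]))]
nonmatching = [fisher_z(pearson(main["people"], localizer["car"])),
               fisher_z(pearson(main["car"], localizer["body"]))]

category_effect = sum(matching) / 2 - sum(nonmatching) / 2
print(category_effect > 0)  # True for this toy data
```

Note that when only two conditions enter the normalization, the two normalized patterns are mirror images of each other, so the matching and nonmatching correlations have equal magnitude and opposite sign.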

Searchlight Analysis.

A whole-brain pattern analysis was performed using a spherical searchlight (15). For each voxel in the brain, we computed voxelwise correlations in a sphere of 10-mm radius (corresponding to 121 voxels) around this voxel. The voxelwise correlations were computed as described in Statistical Analysis. The correlation values from each sphere were Fisher transformed and assigned to the center voxel of this sphere. The correlations were computed for each subject and session separately. Results were transformed into Talairach space (which included resampling to 3 × 3 × 3 mm voxels), the correlations of the two sessions were averaged for each subject, and random-effects group analyses were performed.
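The reported sphere size can be reproduced with a short Python sketch that enumerates the voxels whose centers lie within 10 mm of a center voxel. Two assumptions are made here that the text does not spell out: the center-to-center spacing along the slice direction is taken to be 4 mm (3-mm slices plus the 1-mm gap), and voxels lying exactly on the 10-mm boundary are included. The function name is hypothetical.

```python
from itertools import product


def sphere_offsets(radius_mm, spacing_mm):
    """Index offsets of voxels whose centers lie within radius_mm of the center."""
    # Largest index offset that can still fall inside the sphere, per axis.
    reach = [int(radius_mm // s) for s in spacing_mm]
    offsets = []
    for i, j, k in product(*(range(-r, r + 1) for r in reach)):
        dist_sq = ((i * spacing_mm[0]) ** 2
                   + (j * spacing_mm[1]) ** 2
                   + (k * spacing_mm[2]) ** 2)
        if dist_sq <= radius_mm ** 2:  # boundary voxels are included
            offsets.append((i, j, k))
    return offsets


# Assumed spacing: 3 x 3 mm in plane, 4 mm between slice centers.
offsets = sphere_offsets(10.0, (3.0, 3.0, 4.0))
print(len(offsets))  # 121
```

In a searchlight analysis, these offsets would be added to the coordinates of each center voxel to select the voxels entering the correlation, with the resulting value written back to the center voxel.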
The first searchlight analysis tested for regions that discriminated between the two scene types (containing people or cars) based on the category localizer patterns (isolated pictures of bodies or cars). The average correlation between matching categories was contrasted with the average correlation between nonmatching categories. The threshold was set to P < 0.001 (uncorrected) and a minimum cluster size of 20 (resampled to 3 × 3 × 3 mm) voxels. The second searchlight analysis tested for regions in which the category-specific cue effect was correlated with behavior. The category-specific cue effect was calculated for each sphere as illustrated in Fig. 1C. The normalized behavioral performance score was calculated by taking the mean of the normalized RT score (multiplied by −1, such that higher scores reflected better performance) and the normalized accuracy score. Normalization consisted of subtracting the mean value from each participant's value, and dividing this by the SD of the group mean. The threshold was set to r > 0.78 (corresponding to P < 0.001) and a minimum cluster size of 20 (resampled) voxels.
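The composite behavioral score and the correlation threshold can be illustrated with the following Python sketch. It assumes that the normalization uses the sample SD across participants; the function names and the three-participant toy data are invented for illustration. The last function converts a correlation into a t statistic with n − 2 degrees of freedom, which shows why r > 0.78 corresponds to roughly P < 0.001 (two-tailed) for 14 participants.

```python
import math
from statistics import mean, stdev


def zscore(values):
    """Subtract the group mean and divide by the group SD (sample SD assumed)."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def performance_scores(rts, accuracies):
    """Mean of the sign-flipped normalized RT and the normalized accuracy."""
    z_rt = [-z for z in zscore(rts)]  # faster responses give higher scores
    z_acc = zscore(accuracies)
    return [(a + b) / 2 for a, b in zip(z_rt, z_acc)]


def r_to_t(r, n):
    """t statistic of a Pearson correlation r based on n observations."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


# Toy data for three hypothetical participants (RT in seconds, accuracy as proportion).
scores = performance_scores([0.60, 0.70, 0.80], [0.90, 0.80, 0.70])
print([round(s, 2) for s in scores])  # highest score for the fastest, most accurate participant

print(round(r_to_t(0.78, 14), 2))     # 4.32
```

A t value of about 4.32 with 12 degrees of freedom sits at the two-tailed P = 0.001 critical value listed in standard t tables, consistent with the threshold stated in the text.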


This work was supported by National Institutes of Health Grants R01-EY017699 and R01-MH064043 and National Science Foundation Grant BCS-1025149.



S Thorpe, D Fize, C Marlot, Speed of processing in the human visual system. Nature 381, 520–522 (1996).
FF Li, R VanRullen, C Koch, P Perona, Rapid natural scene categorization in the near absence of attention. Proc Natl Acad Sci USA 99, 9596–9601 (2002).
JM Wolfe, ML Võ, KK Evans, MR Greene, Visual search in scenes involves selective and nonselective pathways. Trends Cogn Sci 15, 77–84 (2011).
J Duncan, GW Humphreys, Visual search and stimulus similarity. Psychol Rev 96, 433–458 (1989).
JM Wolfe, KR Cave, SL Franzel, Guided search: an alternative to the feature integration model for visual search. J Exp Psychol Hum Percept Perform 15, 419–433 (1989).
DJ Simons, RA Rensink, Change blindness: past, present, and future. Trends Cogn Sci 9, 16–20 (2005).
R Desimone, J Duncan, Neural mechanisms of selective visual attention. Annu Rev Neurosci 18, 193–222 (1995).
L Chelazzi, J Duncan, EK Miller, R Desimone, Responses of neurons in inferior temporal cortex during memory-guided visual search. J Neurophysiol 80, 2918–2940 (1998).
L Chelazzi, EK Miller, J Duncan, R Desimone, A neural basis for visual search in inferior temporal cortex. Nature 363, 345–347 (1993).
M Stokes, R Thompson, AC Nobre, J Duncan, Shape-specific preparatory activity mediates attention to targets in human visual cortex. Proc Natl Acad Sci USA 106, 19569–19574 (2009).
M Esterman, S Yantis, Perceptual expectation evokes category-selective cortical activity. Cereb Cortex 20, 1245–1253 (2010).
AM Puri, E Wojciulik, C Ranganath, Category expectation modulates baseline and stimulus-evoked activity in human inferotemporal cortex. Brain Res 1301, 89–99 (2009).
JV Haxby, et al., Distributed and overlapping representations of faces and objects in ventral temporal cortex. Science 293, 2425–2430 (2001).
MV Peelen, L Fei-Fei, S Kastner, Neural mechanisms of rapid natural scene categorization in human visual cortex. Nature 460, 94–97 (2009).
N Kriegeskorte, R Goebel, P Bandettini, Information-based functional brain mapping. Proc Natl Acad Sci USA 103, 3863–3868 (2006).
S Kastner, MA Pinsk, P De Weerd, R Desimone, LG Ungerleider, Increased activity in human visual cortex during directed attention in the absence of visual stimulation. Neuron 22, 751–761 (1999).
SA Harrison, F Tong, Decoding reveals the contents of visual working memory in early visual areas. Nature 458, 632–635 (2009).
JT Serences, EF Ester, EK Vogel, E Awh, Stimulus-specific delay activity in human primary visual cortex. Psychol Sci 20, 207–214 (2009).
KM O'Craven, N Kanwisher, Mental imagery of faces and places activates corresponding stimulus-specific brain regions. J Cogn Neurosci 12, 1013–1023 (2000).
L Reddy, N Tsuchiya, T Serre, Reading the mind's eye: decoding category information during mental imagery. Neuroimage 50, 818–825 (2010).
M Stokes, R Thompson, R Cusack, J Duncan, Top-down activation of shape-specific population codes in visual cortex during mental imagery. J Neurosci 29, 1565–1572 (2009).
JT Serences, GM Boynton, Feature-based attentional modulations in the absence of direct visual stimulation. Neuron 55, 301–312 (2007).
A Treisman, How the deployment of attention determines what we see. Vis Cogn 14, 411–443 (2006).
S Kastner, LG Ungerleider, Mechanisms of visual attention in the human cortex. Annu Rev Neurosci 23, 315–341 (2000).
SL Bressler, W Tang, CM Sylvester, GL Shulman, M Corbetta, Top-down control of human visual cortex by frontal and parietal cortex in anticipatory visual spatial attention. J Neurosci 28, 10056–10061 (2008).
SM Szczepanski, CS Konen, S Kastner, Mechanisms of spatial attention control in frontal and parietal cortex. J Neurosci 30, 148–160 (2010).
T Egner, et al., Neural integration of top-down spatial and feature-based information in visual search. J Neurosci 28, 6141–6151 (2008).
A Gazzaley, J Rissman, M D'Esposito, Functional connectivity during working memory maintenance. Cogn Affect Behav Neurosci 4, 580–599 (2004).
M Bar, Visual objects in context. Nat Rev Neurosci 5, 617–629 (2004).
BC Russell, A Torralba, KP Murphy, WT Freeman, LabelMe: a database and web-based tool for image annotation. Int J Comput Vis 77, 157–173 (2008).
DB Walther, E Caddigan, L Fei-Fei, DM Beck, Natural scene categories revealed in distributed patterns of activity in the human brain. J Neurosci 29, 10573–10581 (2009).
AM Wohlschläger, et al., Linking retinotopic fMRI mapping and anatomical probability maps of human occipital areas V1 and V2. Neuroimage 26, 73–82 (2005).

Information & Authors


Published in

Proceedings of the National Academy of Sciences
Vol. 108 | No. 29
July 19, 2011
PubMed: 21730192


Submission history

Published online: July 5, 2011
Published in issue: July 19, 2011


  1. attention
  2. categorization
  3. natural vision
  4. object detection




This article is a PNAS Direct Submission.



Marius V. Peelen1 [email protected]
Center for Mind/Brain Sciences, University of Trento, 38068 Rovereto, Italy; and
Department of Psychology and
Princeton Neuroscience Institute, Princeton University, Princeton, NJ 08544
Sabine Kastner
Department of Psychology and
Princeton Neuroscience Institute, Princeton University, Princeton, NJ 08544


To whom correspondence should be addressed. E-mail: [email protected].
Author contributions: M.V.P. and S.K. designed research; M.V.P. performed research; M.V.P. analyzed data; and M.V.P. and S.K. wrote the paper.

Competing Interests

The authors declare no conflict of interest.
