# Novel neural circuit mechanism for visual edge detection

Contributed by Charles F. Stevens, December 1, 2014 (sent for review July 10, 2014; reviewed by Larry F. Abbott and Vijay Balasubramanian)

## Significance

This paper is significant for three reasons. First, it proposes a new way that the visual cortex carries out a computation: simply by having differing abundances of neurons that vary in a particular property (how large a piece of the visual world they report on). Second, it proposes a new use for population codes. Finally, it establishes a role for a quantitative anatomical feature of primary visual cortex: abundances that differ in the way observed experimentally serve to apply an edge-detection filter to the image.

## Abstract

The primary visual cortex is organized in a way that assigns a specific collection of neurons the job of providing the rest of the brain with all of the information it needs about each small part of the image present on the retina: Neighboring patches of the visual cortex provide the information about neighboring patches of the visual world. Each one of these cortical patches—often identified as a “pinwheel”—contains thousands of neurons, and its corresponding image patch is centered on a particular location in the retina. For stimuli within their image patch, neurons respond selectively to lines or edges with a particular slope (orientation tuning) and to regions of the patch of different sizes (known as spatial frequency tuning). The same number of neurons is devoted to reporting each possible slope (orientation). For the cells that cover different-sized regions of their image patch, however, the number of neurons assigned depends strongly on their preferred region size. Only a few neurons report on large and small parts of the image patch, but many neurons report visual information from medium-sized areas. I show here that having different numbers of neurons responsible for image regions of different sizes actually carries out a computation: Edges in the image patch are extracted. I also explain how this edge-detection computation is done.

Single neurons in sensory systems generally have receptive fields that are tuned to some particular characteristics of the environment, and the firing rate of these neurons signals how well the stimulus matches the preferred environmental features. For example, neurons in primary visual cortex (V1) have receptive fields that are tuned to stimulus location, the orientation of edges that enter the receptive field, and to an image property that is related to the size of the receptive field (spatial frequency). The firing rate of these neurons depends jointly on all three variables. In addition, firing rate reflects the contrast of the stimulus—how strongly the relevant stimulus features differ from the background—and not absolute stimulus intensity (1). Our ability to detect a particular feature in the environment depends not just on the firing rate of individual neurons but also on the number of neurons tuned to those features in the population of cortical neurons [a population code (2–7)]. The percept related to a stimulus feature depends, then, jointly on the number of neurons that are responding to the feature and the firing rates of the individual cells. This dual encoding is, as described in the next paragraph, dramatically illustrated in Campbell and Robson’s figure (first published in ref. 8) (Fig. 1) in which a sine-wave grating is presented with spatial frequency increasing along the horizontal axis and contrast of the grating decreasing along the vertical axis. This figure “draws out” the contrast function for our visual system by appearing, at each spatial frequency, to be gray above a certain contrast and to show the grating below that contrast level (9). Thus, one can directly visualize the visual system’s contrast sensitivity as a function of spatial frequency.

Individual V1 neurons are tuned to a preferred spatial frequency, but the number of neurons in a cortical region tuned to a particular spatial frequency is a maximum for some specific frequency (the peak of the curve in the Campbell–Robson figure) and decreases sharply for higher and lower frequencies (1). Our perception of the Campbell–Robson figure depends, as is described in more detail below, jointly on the tuning of individual cells, which fire at about the same rate to their best spatial frequency, and on the number of cells with each best frequency.

My goal here is to explore the computational consequences of this dual encoding mechanism. I conclude that weighting different spatial frequencies selectively, by devoting different numbers of neurons to each spatial frequency, provides a previously unappreciated way for the brain to extract edges from visual scenes. This sort of extraction could, for example, contribute to our ability to recognize line drawings of objects very easily even though line drawings are so different from the image they represent. Note that the emphasis of edges by lateral inhibition present in the V1 receptive fields (8) is quite different from the extraction of edges described here, in which the response to regions of constant luminance is completely suppressed.

The following presentation is arranged in five sections. In the first section, I present a function that specifies the number of neurons devoted to each spatial frequency, and in the second section, I give a simplified version of my argument for how a neural circuit does edge detection. The third section then describes the response of individual neurons and their relation to the V1 population response, and in the fourth section I show why assigning different numbers of neurons to different spatial frequencies is equivalent to filtering the visual scene. The final section identifies this filter as an edge detector.

## Results and Discussion

### More V1 Neurons Are Devoted to Some Spatial Frequencies Than to Others.

The resolution with which the visual scene is represented in V1 differs for various V1 simple cells (a simple cell is a particular type of V1 neuron that constitutes about half of the total neuron population in V1), and this resolution is quantified by the spatial frequency preference associated with each cell. Some spatial frequencies have more cells devoted to them than other spatial frequencies, and here I present an empirically derived function, a *k*th-order gamma distribution function, that gives an accurate description of the relative number of neurons assigned to each spatial frequency.

Surveys of the fraction of neurons with each spatial frequency preference are available for regions of V1 that correspond to different eccentricities (distance from the center of vision) in the visual field (1, 10). The results of one such survey (data provided in Table S1) are presented in Fig. 2*A* where the cumulative fraction of simple cells is displayed on the ordinate and their preferred spatial frequency on the abscissa. The three cumulative histograms refer to eccentricities, measured in degrees of visual angle, of 0°–5°, 10°–20°, and 20°–40°. As is apparent from Fig. 2*A*, the preferred spatial frequencies decrease with eccentricity.

I argue below that giving different weights to the various spatial frequencies by assigning different numbers of cells to each spatial frequency, as shown in Fig. 1, effectively applies an edge-detection filter to the visual scene. To make this interpretation, I need an accurate empirical description of the data presented in Fig. 2. To display such an empirical description, each of the experimentally measured histograms in Fig. 2*A* has been fitted with a *k* = 9 gamma distribution whose cumulative form is

$$G_k(f) = 1 - e^{-a(E)f}\sum_{n=0}^{k-1}\frac{[a(E)f]^{n}}{n!},$$

with *a*(*E*) = 6.53, 9.51, and 12.04 for the eccentricities *E* = 0°–5°, 10°–20°, and 20°–40°. This empirical fit clearly gives a good description of the experimentally determined histograms. The scale parameter *a*(*E*) is found to increase approximately with eccentricity *E* over the range studied. The corresponding density functions are shown in Fig. 2*B*, and these functions account for the part of the Campbell–Robson effect that depends on weighting by V1 neuron number for the various spatial frequencies.
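A minimal numerical sketch of this fit follows. The rate parameterization below, in which a larger *a*(*E*) shifts the density toward lower spatial frequencies, is inferred from the reported values and the observation that preferred frequencies decrease with eccentricity; the paper's exact convention may differ.

```python
import math

def gamma_density(f, a, k=9):
    """Order-k gamma density with rate a: relative number of simple
    cells preferring spatial frequency f (cycles per degree)."""
    return a * (a * f) ** (k - 1) * math.exp(-a * f) / math.factorial(k - 1)

def gamma_cumulative(f, a, k=9):
    """Cumulative fraction of cells preferring frequencies <= f."""
    return 1.0 - math.exp(-a * f) * sum((a * f) ** n / math.factorial(n)
                                        for n in range(k))

# Rate parameters reported for the three eccentricity bins
rates = {"0-5 deg": 6.53, "10-20 deg": 9.51, "20-40 deg": 12.04}
for label, a in rates.items():
    peak = (9 - 1) / a  # mode of an order-9 gamma density with rate a
    print(f"{label}: density peaks near {peak:.2f} cycles/deg")
```

With these rates the mode of the density moves to lower spatial frequencies as eccentricity grows, in qualitative agreement with Fig. 2.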

### The Main Argument Described for a One-Dimensional Visual Cortex.

The point of this section is to make the argument of this paper in the simple case of a one-dimensional cortex. Reducing the dimensionality of the problem eliminates many of the complications of the actual cortex, but retains the key feature that the properties of objects are preserved by the computations carried out by V1 (11).

Imagine a 2D creature with a one-dimensional visual cortex. This creature can move its eye to translate the visual scene, but it cannot change the image size (by changing its distance from objects) or rotate the image (by tilting its head). The visual scene is described as a function $S(x)$ of a single spatial coordinate $x$.

The receptive field of a V1 cell, in this simple case, is taken to be a complex exponential $e^{i\omega x}$ with preferred spatial frequency $\omega$, so that the firing rate of the cell is the corresponding Fourier coefficient $\hat S(\omega)$ of the visual scene.

Now suppose that different numbers of neurons $F(\omega)$ are devoted to each spatial frequency $\omega$. The population response is then the weighted transform $F(\omega)\hat S(\omega)$, and by the convolution theorem this weighted transform is just the Fourier transform of the convolution of the scene $S(x)$ with the function $K(x)$ whose Fourier transform is $F(\omega)$. In other words, weighting the spatial frequencies with unequal cell numbers is equivalent to filtering the image with the kernel $K(x)$.

The remainder of this paper is devoted to showing that the conclusion just reached—assigning different numbers of neurons to different cell types can result in filtering an image—also applies to the more complicated filtering operation carried out by the actual V1.
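For the one-dimensional cortex the equivalence is exact and easy to verify numerically: weighting the Fourier coefficients by a cell-number function $F$ and transforming back is identical to circularly convolving the scene with $K$, the inverse transform of $F$. A minimal sketch (the Gaussian bump standing in for the gamma-shaped density is an illustrative choice):

```python
import numpy as np

n = 128
rng = np.random.default_rng(0)
scene = rng.standard_normal(n)          # arbitrary 1D visual scene S(x)

# F(w): number of neurons devoted to each spatial frequency
# (any smooth even weighting works; this bump is illustrative)
w = np.fft.fftfreq(n)
F = np.exp(-(np.abs(w) - 0.1) ** 2 / 0.002)

# Population response: Fourier coefficients (firing rates) weighted by F
weighted = np.fft.ifft(F * np.fft.fft(scene)).real

# The same result by filtering: circular convolution with K = inverse FT of F
K = np.fft.ifft(F).real
filtered = np.array([sum(K[j] * scene[(i - j) % n] for j in range(n))
                     for i in range(n)])
```

The two arrays agree to floating-point precision, which is the content of the convolution theorem used in this section.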

### Relation of Individual Neuron Responses to the V1 Population Code.

At the outset, I pointed out that the population response to a particular stimulus is the firing rates of neurons caused by that stimulus times the number of neurons that are responding. In this section, I present an equation that describes this population response for V1 simple cells. The firing rate of each simple cell is a coefficient in a particular wavelet transform of the visual scene, and I describe this relation. Furthermore, the number of neurons assigned to each spatial frequency is given by the equation presented in the first section, and this equation provides the weight on the firing rates of neurons with a particular spatial frequency preference to define the population code V1 uses.

As several authors have noted, simple cells in V1 can be described as performing a wavelet transform (11, 12). What this means is that an idealized description of the V1 computation conforms to a wavelet transform in which the firing rate of each V1 simple cell gives the weight on one basis function in a wavelet transform of the visual scene and that the range of receptive fields present in V1 is sufficient to provide an accurate representation of the image as presented to V1. The relation between the response of V1 simple cells and a wavelet transform is explained in more detail below.

#### Description of simple cell firing rates.

Each location in the visual scene is associated with a location ($x$, $y$) on the V1 map of the visual world, and locations in the visual world are labeled by their coordinates in the V1 map. Also, each simple cell at V1 map location ($x$, $y$) responds to the image in the neighborhood of ($x$, $y$) with a firing rate that depends on the cell's preferred spatial frequency and orientation. Throughout, I consider only cells whose preferred orientation is aligned with the (vertical) $y$ axis; this simplification makes the following argument less complicated, and the result obtained immediately generalizes to any orientation.

In the following, a complex receptive field describes a pair of neurons at the same location, one whose receptive field is an even function (the real part) and the other whose receptive field is an odd function (the imaginary part). The receptive fields for such a pair of simple cells, with preferred spatial frequency $\omega$ and centered at the point ($x$, $y$), can be described by a Gabor function (13–16)

$$\psi_{\omega}(x',y';x,y)=e^{-[(x'-x)^2+(y'-y)^2]/2\sigma^2}\,e^{i\omega(x'-x)}.$$

The receptive fields are all generated from a single mother wavelet by translation in the $x$ direction, by translation in the $y$ direction, and by dilation by the scale factor that sets the preferred spatial frequency.

The firing rate response of the pair of cells to an image $S(x, y)$ is the inner product of the image with the complex receptive field.

Many types of wavelet transforms are known, and any specific wavelet transform is actually a member of a large family (12, 22). Unlike Fourier or Laplace transforms, which are defined by a single type of basis function (like $e^{i\omega x}$) that carries the transform from the $x$ domain to the frequency domain, a wavelet transform can be constructed from any suitable mother wavelet, with the basis functions generated by translations and dilations of that mother wavelet.

A wavelet transform of a 2D image function $S(x, y)$ is therefore obtained by taking the inner products of the Gabor receptive fields, over all centers and dilations, with the image $S(x, y)$.
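The receptive-field description above can be sketched numerically: a complex Gabor filter, whose real and imaginary parts stand for the even and odd members of the simple-cell pair, responds strongly when centered on a luminance edge of its preferred orientation and weakly elsewhere. All parameter values below are illustrative, and the signed responses are an idealization of firing rates.

```python
import numpy as np

def gabor(xs, ys, x0, y0, omega, cycles=0.5):
    """Complex Gabor receptive field centered at (x0, y0): a Gaussian
    envelope times a carrier varying along x, so the preferred
    orientation is vertical (aligned with the y axis, as in the text)."""
    sigma = cycles * 2 * np.pi / omega   # envelope width scales as 1/omega
    X, Y = np.meshgrid(xs - x0, ys - y0, indexing="ij")
    return np.exp(-(X ** 2 + Y ** 2) / (2 * sigma ** 2)) * np.exp(1j * omega * X)

def response(image, xs, ys, x0, y0, omega):
    """Inner product of the image with the receptive field.  The real and
    imaginary parts are the rates of the even and odd simple cells."""
    return np.sum(np.conj(gabor(xs, ys, x0, y0, omega)) * image)

# A vertical step edge at x = 0 on a [-8, 8] x [-8, 8] patch
xs = np.linspace(-8.0, 8.0, 161)
ys = np.linspace(-8.0, 8.0, 161)
image = (xs[:, None] > 0) * np.ones((1, ys.size))

on_edge = response(image, xs, ys, 0.0, 0.0, omega=2.0)   # centered on the edge
off_edge = response(image, xs, ys, 4.0, 0.0, omega=2.0)  # centered off the edge
```

As expected for a step edge, the response centered on the edge is much larger than the off-edge response, and it is dominated by the odd (imaginary) member of the pair.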

### Relation of Individual Cell Firing and the Population Response.

There are, however, two problems with this identification of the response of V1 simple cells with a wavelet transform based on Gabor functions. The first, and uninteresting, problem is that the way the receptive fields are normalized is a matter of convention; an overall constant can simply be absorbed into the definition of the firing rate. The second problem is that a wavelet transform based on Gabor functions does not share all of the convenient properties of the Fourier transform, as discussed below.

One of the properties a transform should have is that, when the image $S(x, y)$ is translated in the $x$ direction, translated in the $y$ direction, or scaled by multiplying its arguments by a constant $C$ that is independent of position, the transform of the modified image must be simply a translated and dilated version of the transform of the original image. The Gabor wavelet transform does behave in this way; here and throughout, $\tilde S$ denotes the wavelet transform of the function $S(x, y)$.

At the outset I pointed out that the population response of V1 simple cells is, at least in part, responsible for our percept of the visual world. If one assumes that the representation of a particular spatial frequency tuning at a given location in V1 depends linearly on the number of cells with that preferred frequency, then the population response is the product of the cell-number function $F$ and the wavelet transform of the image given above. Note that I have selected $F$ to be the gamma density determined by $E$ (eccentricity) and $k$ (order of the gamma distribution) in the first section.

### Weighting Certain Spatial Frequencies with Different Numbers of Cells Filters the Visual Image.

According to the discussion above, the population response of simple cells at any particular location in V1 is related to the wavelet transform of the part of the image represented at that V1 location, weighted by the number of cells $F$ devoted to each spatial frequency.

If the convolution theorem used in the second section held for wavelet transforms, then multiplication of the transform by $F$ would correspond to convolving the image with the filter function $K(x)$ whose Fourier transform is $F$.

Unfortunately, however, the convolution theorem does not, in general, hold for wavelet transforms. Nevertheless, as I show in *Methods*, a version of the convolution theorem does hold in special circumstances that apply to V1: The relation just described is indeed approximately true for V1 (*Methods*). Note that the filter function $K(x)$ depends on $x$ but not on $y$ (parallel to the edge orientation assumed throughout), so that the filter function operates perpendicular to the long axis of the receptive field Gabor function. Had I considered other orientations, the filter function would always operate perpendicular to the long axis of the rotated Gabor function.
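The special circumstance can be illustrated numerically in one dimension: when the cell-number weighting $F$ varies slowly over the bandwidth of a narrowband Gabor wavelet, multiplying the wavelet coefficients of a signal by $F$ approximately equals the wavelet coefficients of the filtered signal. The weighting function and the cycles-per-envelope parameter below are illustrative choices, not the paper's exact values.

```python
import numpy as np

n = 1024
rng = np.random.default_rng(1)
scene = rng.standard_normal(n)                 # arbitrary 1D scene

def gabor_coeff(signal, omega, b, cycles=8.0):
    """Inner product of a narrowband complex Gabor wavelet (center
    frequency omega in rad/sample, center position b) with the signal."""
    sigma = cycles / omega                     # envelope width ~ 1/omega
    x = np.arange(len(signal))
    rf = np.exp(-(x - b) ** 2 / (2 * sigma ** 2)) * np.exp(1j * omega * (x - b))
    return np.sum(np.conj(rf) * signal)

# Smooth weighting F(w): a gamma-like bump peaking at w = 0.5 rad/sample
F = lambda w: (6 * np.abs(w)) ** 3 * np.exp(-6 * np.abs(w))

# Filter the scene with K = inverse Fourier transform of F
w_grid = 2 * np.pi * np.fft.fftfreq(n)
filtered = np.fft.ifft(F(w_grid) * np.fft.fft(scene)).real

# Compare F(omega) * (coefficients of scene) with (coefficients of filtered)
omega = 0.5
bs = range(128, 896, 64)
lhs = np.array([F(omega) * gabor_coeff(scene, omega, b) for b in bs])
rhs = np.array([gabor_coeff(filtered, omega, b) for b in bs])
rel_err = np.sum(np.abs(lhs - rhs)) / np.sum(np.abs(rhs))
```

The aggregate relative discrepancy is a few percent here, because the wavelet's frequency bandwidth is narrow compared with the scale on which $F$ varies; this is the approximate convolution theorem invoked in the text.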

### Filtering in V1 by the Observed Unequal Cell Numbers for Spatial Frequency Carries out Edge Detection.

The goal of this last section is to identify the filter function $K(x)$ that the unequal cell numbers apply to the image.

In *Methods* I derive the filter function explicitly. Its form as a function of $x$ (measured in degrees of visual angle) is presented in Fig. 3 for an eccentricity of 20°–40°, and it can be seen to have the form of an edge detector. Note the appearance of Mach bands (the pair of smaller humps in Fig. 3) that are usually attributed to lateral inhibition (8).
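The edge-detector behavior can be confirmed with a numerical sketch. Taking the filter kernel to be $K(x)\propto(1-ix/a)^{-k}$, the closed form that follows from the inverse transform of an order-$k$ gamma density (the grid spacing, the luminance values, and the use of the response magnitude below are illustrative choices), convolving a luminance profile with $K$ gives essentially no response over regions of constant luminance and a localized response at a step edge.

```python
import numpy as np

k, a = 9, 12.04             # gamma order and scale for 20-40 deg eccentricity
n, dx = 2048, 0.1           # samples and degrees of visual angle per sample

x = (np.arange(n) - n // 2) * dx
K = (1 - 1j * x / a) ** (-k)  # filter kernel, up to a multiplicative constant

# Luminance profile: two constant regions separated by a step edge at x = 0
scene = np.where(np.arange(n) < n // 2, 0.2, 1.0)

response = np.convolve(scene, K, mode="same") * dx
mag = np.abs(response)      # combined even/odd (energy-like) response
```

The magnitude peaks at the edge and is vanishingly small over the flat regions, because $K$ integrates to zero (its transform, the gamma density, vanishes at zero spatial frequency); the oscillatory side lobes of the real part of $K$ correspond to the Mach bands noted in the text.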

Recall that I selected receptive fields with a single (vertical) preferred orientation; for any other orientation, the corresponding population applies the same filter rotated along with the receptive fields, so edges of every orientation are extracted.

I conclude, then, that weighting spatial frequencies the way V1 does (by devoting more cells to some spatial frequencies than to others) is a very rapid and efficient way to carry out edge detection filtering on an image. This is a new method of computation by neural circuits that may be used in other brain areas in which the V1-like computation has been conserved by evolution.

I have shown that assigning different numbers of cells to each spatial frequency carries out edge detection, but why have different preferred spatial frequencies at all? Why not just use a receptive field with a form that does edge detection rather than bothering to have so many different spatial frequencies? All of the information in the visual scene must be retained for some computations—like object recognition—and by using different cell numbers for each spatial frequency, both the information in the original image and that in its edge-detected version are available.

All of the arguments above have been formulated in terms of the average firing rates in the population code I present, and I have not considered the effects of noise and other idealizations in the description above. For the cells with heavy weighting—those with a preferred spatial frequency near the peak of the density functions in Fig. 2*B*—I expect that, even very near threshold, noise in the average would be small. I have not considered, however, the effects of noise on the weakly weighted cells near threshold, where noise would be greater compared with the signal, and the description above involves other idealizations whose effects remain to be examined.

It has always been something of a mystery why we can easily recognize objects represented by simple line drawings that are so very different from the image of the object that they represent. I argue that line drawings tap into the natural representation of visual scenes that our visual cortex uses as described above.

## Methods

### How Unequal Cell Numbers Can Perform a Filtering Operation.

The goal here is to show that multiplying the simple cell firing rates evoked by an image by the number of cells devoted to each spatial frequency—the gamma density shown in Fig. 2*B*—is approximately equivalent to applying a filter to the image itself.

Recall that the population response is the firing rate of each cell (a coefficient in the wavelet transform of the image) multiplied by the number of cells with that preferred spatial frequency, the gamma density $F$.

Because the spatial variables separate in the Gabor receptive field, the argument can be carried out in the $x$ direction alone (normal to the preferred orientation), and the $y$ dependence comes along unchanged.

Most of the results that apply to wavelet transforms are general in that they hold no matter what particular mother wavelet is being used. In some cases, however, a result applies only to wavelets based on a particular class of mother wavelets. Here I am concerned only with Gabor wavelets, so I consider mother wavelets of the form $H(x)e^{i\omega x}$, where $H(x)$ is the Gaussian part of the Gabor wavelet, although the argument to follow does not depend on $H$ being a Gaussian.

I start with the equation that describes the wavelet transform of the filtered image function $S$,

$$\widetilde{(K*S)}(\omega,b)=\int \overline{\psi_{\omega,b}(x)}\left[\int K(x-u)\,S(u)\,du\right]dx,$$

where $K(x)$ is the filter kernel and $\psi_{\omega,b}$ is the Gabor wavelet with preferred frequency $\omega$ centered at $b$. This equation is used to establish the relation between $K(x)$ and its Fourier transform.

If I start with the equation for the weighted population response $F(\omega)\widetilde S(\omega,b)$, write both the receptive field and the image in the frequency domain, and use the fact that the Fourier transform of the Gaussian envelope is sharply concentrated near the wavelet's center frequency, then $F$ can be evaluated at that center frequency and pulled outside the frequency integral. After a change of variables to $u$, the remaining integral is identified as the kernel $K(t)$, the inverse Fourier transform of $F$, so that

$$F(\omega)\,\widetilde S(\omega,b)\approx \widetilde{(K*S)}(\omega,b).$$

The preceding approximation establishes the general idea. To calculate the first correction term to the preceding approximation, expand $F(\omega')$ in a Taylor series about the wavelet's center frequency $\omega$,

$$F(\omega')\approx F(\omega)+F'(\omega)(\omega'-\omega),$$

where the second term on the right gives the first correction term; of course, higher-order corrections can also be calculated. The correction is small whenever $F$ varies slowly over the frequency bandwidth of the wavelet.

### The Filter Function *K*(*x*) Is an Edge Detector.

The *k* = 9 order gamma density function is a member of a class of densities that are known as “infinitely divisible,” which just means that they can be found by carrying out a *k*-fold convolution of a source function. In this case the source function is an exponential, so the gamma density function can be written

$$\gamma_k(\omega)=\frac{a(a\omega)^{k-1}e^{-a\omega}}{(k-1)!}=\left[a\,e^{-a\omega}\,\theta(\omega)\right]^{*k},$$

where the exponent $*k$ in the last expression indicates the $k$-fold convolution of the exponential with itself, and $\theta(\omega)$ is the unit step function.

The inverse Fourier transform of the gamma density is therefore, by the convolution theorem, proportional to the $k$th power of the inverse Fourier transform of the exponential enclosed in parentheses above. The inverse Fourier transform of $a\,e^{-a\omega}\theta(\omega)$ is

$$\frac{1}{2\pi}\,\frac{a}{a-ix},$$

so (up to a multiplicative constant involving $2\pi$) the filter function is

$$K(x)\propto\frac{1}{(1-ix/a)^{k}}.$$

The function $K(x)$ can be written in polar form as

$$K(x)\propto\frac{e^{\,ik\arctan(x/a)}}{(1+x^{2}/a^{2})^{k/2}},$$

where for $k$ = 9 this function has the edge-detector form plotted in Fig. 3.

Note that $x$ here is in the direction normal to the preferred orientation of the Gabor receptive field; had other orientations been considered, the direction of filtering would always be normal to the rotated Gabor function.
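The $k$-fold-convolution argument can be checked numerically: the inverse Fourier transform of the order-9 gamma density should equal the 9th power of the exponential's inverse transform, with the $2\pi$ bookkeeping of the convolution theorem restored. A sketch under the convention $\check g(x)=(1/2\pi)\int g(\omega)e^{i\omega x}d\omega$:

```python
import numpy as np
from math import factorial

k, a = 9, 12.04                      # order and scale for the 20-40 deg bin
n, dw = 2 ** 16, 0.001               # frequency grid: w in [0, 65.5)
w = np.arange(n) * dw

# Order-k gamma density = k-fold convolution of a*exp(-a*w) for w >= 0
gamma_k = a * (a * w) ** (k - 1) * np.exp(-a * w) / factorial(k - 1)

def inv_ft_numeric(xv):
    """(1/2pi) * integral of gamma_k(w) exp(i w x) dw, as a Riemann sum."""
    return np.sum(gamma_k * np.exp(1j * w * xv)) * dw / (2 * np.pi)

def inv_ft_closed(xv):
    """k-th power of the exponential's inverse transform a/(2pi(a - ix)),
    with one factor of 2pi restored for each of the k - 1 convolutions."""
    return (a / (a - 1j * xv)) ** k / (2 * np.pi)
```

The two agree closely over a range of $x$ values, confirming that the filter is proportional to $(1-ix/a)^{-k}$.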

## Acknowledgments

I am indebted to Prof. J. Anthony Movshon for supplying the data presented in Table S1 and for helpful suggestions about an earlier draft. This work was supported by the National Science Foundation under Grants PHY-1444273 and PHY-1066393 and by the hospitality of the Aspen Center for Physics, where much of the work was done.

## Footnotes

- ^1^Email: stevens{at}salk.edu.

Author contributions: C.F.S. designed research, performed research, contributed new reagents/analytic tools, analyzed data, and wrote the paper.

Reviewers: L.F.A., Columbia University; and V.B., University of Pennsylvania.

The author declares no conflict of interest.

This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1422673112/-/DCSupplemental.

## References

1. Movshon JA, Thompson ID, Tolhurst DJ
2. 
3. 
4. Georgopoulos AP, Schwartz AB, Kettner RE
5. Lampl I, Ferster D, Poggio T, Riesenhuber M
6. 
7. Shadlen MN, Newsome WT
8. Ratliff F, *Mach Bands: Quantitative Studies on Neural Networks in the Retina* (Holden-Day, San Francisco), 1st Ed
9. Campbell FW, Robson JG
10. Issa NP, Trepel C, Stryker MP
11. Stevens CF
12. Mallat S
13. 
14. Jones JP, Palmer LA
15. 
16. Ringach DL
17. 
18. 
19. Bosking WH, Zhang Y, Schofield B, Fitzpatrick D
20. 
21. 
22. Louis AK, Maass P, Rieder A

## Article Classifications

- Biological Sciences
- Neuroscience
