On a common circle: Natural scenes and Gestalt rules

Communicated by A. James Hudspeth, The Rockefeller University, New York, NY (received for review October 25, 2000)
Abstract
To understand how the human visual system analyzes images, it is essential to know the structure of the visual environment. In particular, natural images display consistent statistical properties that distinguish them from random luminance distributions. We have studied the geometric regularities of oriented elements (edges or line segments) present in an ensemble of visual scenes, asking how much information the presence of a segment in a particular location of the visual scene carries about the presence of a second segment at different relative positions and orientations. We observed strong long-range correlations in the distribution of oriented segments that extend over the whole visual field. We further show that a very simple geometric rule, cocircularity, predicts the arrangement of segments in natural scenes, and that different geometrical arrangements show relevant differences in their scaling properties. Our results show similarities to geometric features of previous physiological and psychophysical studies. We discuss the implications of these findings for theories of early vision.
One of the most difficult problems that the visual system has to solve is to group different elements of a scene into individual objects. Despite its computational complexity, this process is normally effortless, spontaneous, and unambiguous (1). The phenomenology of grouping was described by the Gestalt psychologists in a series of rules summarized in the idea of good continuation (2, 3). More quantitative psychophysical measurements have shown the existence of association fields (4) or rules that determine the interaction between neighboring oriented elements in the visual scene (5, 6). Based on these rules and on the Gestalt ideas, pairs of oriented elements that are placed in space in such a way that they extend on a smooth contour joining them will normally be grouped together.
These psychophysical ideas have been steadily gaining solid neurophysiological support. Neurons in primary visual cortex (V1) respond when a bar is presented at a particular location and at a specific orientation (7). In addition, the responses of V1 neurons are modulated by contextual interactions (6, 8–15), such as the joint presence of contour elements within the receptive field and in its surround. This modulation depends on the precise geometrical arrangement of linear elements (6, 16) in a manner corresponding to the specificity of linkage of cortical columns by long-range horizontal connections (17, 18). Thus, neurons in V1 interact with one another in geometrically meaningful ways, and through these interactions, neuronal responses become selective for combinations of stimulus features that can extend far from the receptive field core.
The rules of good continuation, the association field, and the connections in primary visual cortex provide evidence of interaction of pairs of oriented elements at the psychophysical, physiological, and anatomical level. The nature of the interaction is determined by the geometry of the arrangement, including spatial arrangement and the orientation of segments within the visual scene. An important question is whether this geometry is related to natural geometric regularities present in the environment. It is well known that natural images differ from random luminance distributions (19, 20), but the structural studies of natural scenes have not yet addressed the existence of geometrical regularities. We address this issue here by studying whether particular pairs of oriented elements are likely to co-occur in natural scenes as a function of their orientation and relative location in space. Our results are focused on two different aspects of the organization of oriented elements in natural scenes: scaling and geometric relationships. We will show that these two are interdependent.
Scaling measurements involve studying how the probability of finding a co-occurring pair changes as a function of the relative distance. A classic result in the analysis of natural scenes is that the luminance of pairs of pixels is correlated and that this correlation is scale-invariant (19, 20). This indicates that statistical dependencies between pairs of pixels do not depend on whether the observer zooms in on a small window or zooms out to a broad vista. The scale invariance results from stable physical properties such as a common source of illumination and the existence of objects of different sizes and similar reflectance properties (21). We show here that for particular geometries, the probability of finding a pair of segments follows a power law relation and thus is scale-invariant. We show further that a very simple geometric rule, consistent with the idea of good continuation, predicts the arrangement of segments in natural scenes.
Materials and Methods
Images were obtained from a publicly available database (http://hlab.phys.rug.nl/imlib/index.html; ref. 22) of about 4,000 uncompressed black and white pictures, 1,536 × 1,024 pixels in size and 12 bits in depth, with an angular resolution of ≈1 min of arc per pixel. This particular database was chosen because of the high quality of its pictures, especially in their lack of motion and compression artifacts, which would otherwise overwhelm our statistics. To obtain a measure of local orientation, we used the steerable filters of the H_{2} and G_{2} basis (23). By using steerable filters, the energy value at any orientation can be calculated by interpolating the responses of a set of basis filters. A G_{2} filter is a second derivative of a Gaussian and the H_{2} filter is its Hilbert transform. H_{2} and G_{2} filters have the same amplitude spectra, but they are 90° out of phase; that makes them quadrature pair basis filters. The size of the filters used was 7 × 7 pixels. A measure of oriented energy was obtained by combining both sets of filters, E(ϕ) = G_{2}^{2}(ϕ) + H_{2}^{2}(ϕ) (23). This measure is repeated at every pixel of the image to obtain the energy function for each image (n) of the ensemble {E_{n}(x, y, ϕ)}. To study the joint statistics of E(x, y, ϕ), we discretized the different orientations at 16 different values, 0 = (−π/32, π/32), 1 = (π/32, 3π/32), . . . , 15 = (29π/32, 31π/32), as shown in the color representation of orientations of Fig. 1. With this information one can obtain a measure of the statistics of pairs of segments by calculating the correlation (weighting the co-occurrences of segments by their energy), C(Δx, Δy, ϕ, ψ) = (1/N) Σ_{n = 1}^{N} ∫∫ E_{n}(x, y, ϕ) E_{n}(x + Δx, y + Δy, ψ) dx dy, where N is the total number of images and the integral is over each of the images of the ensemble. We were interested in measuring long-range correlations, so we studied values of Δx and Δy in the range [−256, 256].
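The orientation discretization described above can be illustrated with a short Python sketch. It is not the code used in the study: the function name orientation_bin is hypothetical, angles are assumed to be in radians, orientation is treated as π-periodic, and bin k is assumed to be centered on kπ/16 with half-width π/32, following the intervals listed in the text.

```python
import math

N_BINS = 16  # number of orientation bins stated in the text


def orientation_bin(phi, n_bins=N_BINS):
    """Map an orientation phi (radians) to a bin index in 0..n_bins-1.

    Orientation is pi-periodic. Bin k is assumed to cover
    [k*pi/n_bins - pi/(2*n_bins), k*pi/n_bins + pi/(2*n_bins)).
    """
    width = math.pi / n_bins
    # Shift by half a bin so that bin 0 is centered on phi = 0, then wrap.
    return int(math.floor((phi % math.pi + width / 2) / width)) % n_bins
```

Under these assumptions an orientation slightly below zero, such as −π/64, falls in bin 0 together with its π-periodic equivalent.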
The correlation matrix has dimensions 512 × 512 × 16 × 16 and each point results from averaging 4,000 integrals over a 1,536 × 1,024 domain. To simplify the computations, for the general case, we decided to store at each pixel, for every image, the maximum energy value E(ϕ_{max}) and its corresponding orientation ϕ_{max}. An energy threshold E_{T} was arbitrarily set to match the visual perception of edges in a few images. Pixels in an image were considered “oriented” if E(ϕ_{max}) ≥ E_{T}, and “nonoriented” otherwise. This unique threshold value was applied to all images in the ensemble. Thus, for each image, we extracted a binary field E_{n}^{bin}(x, y) = {0, 1} and an orientation field Ang_{n}(x, y) = {1, . . . , 16}. From this binary field we can construct a histogram of co-occurrences: how many times an element at position (x, y) was considered oriented with orientation ϕ and at position (x + Δx, y + Δy) a segment was considered oriented with orientation ψ. Thus, formally, the histogram is obtained as C, taking as the energy function E_{n}(x, y, ϕ) = 1 if ϕ = Ang_{n}(x, y) and E_{n}^{bin}(x, y) = 1; E_{n}(x, y, ϕ) = 0 in any other case. The computation is reduced to counting the co-occurrences in the histogram H(Δx, Δy, ϕ, ψ) with Δx = {−256, 256}, Δy = {−256, 256}, ϕ, ψ = (0, π/16, 2π/16, . . . , π). From the histogram we obtained a measure of statistical dependence. Although the threshold was introduced for computational reasons, cortical neurons perform a thresholding operation and, thus, the measure of linear correlation (weighting co-occurrences by their energy) is not necessarily a more accurate measure of statistical dependence. The histogram was used for all of the data shown in Figs. 2 A–C, 3, 4, and 5. For Fig. 2D, for the particular case of collinear interactions, we computed the full linear cross-correlation. This computation is considerably easier because it is done for fixed values of orientation and direction in space.
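The counting step can be sketched in Python with NumPy for a single image and a single displacement. This is an illustration under stated assumptions rather than the implementation used here: the function name cooccurrence_counts is hypothetical, arrays are indexed as [y, x], and orientation bins are numbered 0 to 15 instead of 1 to 16.

```python
import numpy as np


def cooccurrence_counts(oriented, angle_bin, dx, dy, n_bins=16):
    """Count co-occurring oriented pixels at displacement (dx, dy).

    oriented  : 2-D boolean array, True where E(phi_max) >= E_T
    angle_bin : 2-D integer array of orientation bins in 0..n_bins-1
    Returns an (n_bins, n_bins) array H[phi, psi] for this displacement.
    """
    h, w = oriented.shape
    hist = np.zeros((n_bins, n_bins), dtype=np.int64)
    if abs(dx) >= w or abs(dy) >= h:
        return hist  # the displaced window does not overlap the image

    # `ref` selects pixels (x, y); `other` selects (x + dx, y + dy).
    y0, y1 = max(0, -dy), min(h, h - dy)
    x0, x1 = max(0, -dx), min(w, w - dx)
    ref = (slice(y0, y1), slice(x0, x1))
    other = (slice(y0 + dy, y1 + dy), slice(x0 + dx, x1 + dx))

    # Keep only pairs in which both pixels passed the energy threshold.
    both = oriented[ref] & oriented[other]
    # Add one count per pair at (orientation of ref, orientation of other).
    np.add.at(hist, (angle_bin[ref][both], angle_bin[other][both]), 1)
    return hist
```

Summing the returned arrays over all images and repeating for every (Δx, Δy) in the range of interest would give a histogram of the form H(Δx, Δy, ϕ, ψ) described above.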
The two measures shown (Laplacian correlation and collinear correlation) were obtained according to the formulas: for Laplacian filtering, and for collinear oriented filtering.
A quantitative signature of scale invariance is given by a function of the form C(r) = r^{−a} (power law), where C is the correlation, r the distance, and a a constant. If the scale is changed, r → λr = r′, the function changes as C(r′) = λ^{−a}r^{−a} = kC(r), where k = λ^{−a} is a constant. A power law is easily identified as a straight line in the log–log graph, which is clear from the relation log(C) = −a log(r).
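As an illustration of how the exponent can be read off a log–log plot, the following Python sketch fits a straight line to log(C) against log(r) by ordinary least squares. The function name fit_power_law, the multiplicative prefactor k, and the synthetic data are assumptions made for the example; they are not taken from the measurements reported here.

```python
import math


def fit_power_law(r, c):
    """Least-squares fit of log(c) = log(k) - a*log(r); returns (a, k)."""
    xs = [math.log(v) for v in r]
    ys = [math.log(v) for v in c]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return -slope, math.exp(my - slope * mx)


# Synthetic data (not from the paper): C = 3 * r^(-0.7)
r = [1, 2, 4, 8, 16, 32]
c = [3.0 * v ** -0.7 for v in r]
a, k = fit_power_law(r, c)

# Rescaling the distances changes the prefactor but not the exponent.
a_rescaled, k_rescaled = fit_power_law([2 * v for v in r], c)
```

In this sketch a and a_rescaled agree, which is the scale invariance expressed by the relation C(r′) = kC(r).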
The axis of maximal correlation (Fig. 5b) was calculated as follows. For each pair of orientations (ϕ, ψ), a measure of co-occurrence was calculated integrating across 16 different lines of angles of values (0, π/128, 2π/128, . . . , π) over distances of [−40, 40] pixels from the center of the histogram. Thus, for an angle θ and orientations (ϕ, ψ) the measure of co-occurrence is P_{ϕ, ψ}(θ) = Σ_{i = −40}^{40} H(i cos(θ), i sin(θ), ϕ, ψ). We then calculated the direction of maximal correlation θ_{max}(ϕ, ψ) and grouped all angles with common relative orientation ϕ − ψ = ɛ. We had 16 different values for each ɛ, and from these 16 values we calculated the mean P(θ, ɛ) = 〈θ_{max}(ψ, ψ + ɛ)〉_{ψ} and the standard error. To calculate the mean energy as a function of relative orientation (Fig. 3) we integrated the histogram in spatial coordinates for each pair of orientations and, as before, the different pairs were grouped according to their relative difference in orientation to calculate a mean value and a standard deviation, E_{ϕ, ψ} = ∫_{x = −100}^{100} ∫_{y = −100}^{100} H(x, y, ϕ, ψ) dx dy and E(ϕ) = 〈E_{ɛ, ɛ + ϕ}〉_{ɛ}. The code was parallelized by using MPI libraries and run over a small Beowulf cluster of Linux workstations.
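The line-sum measure P_{ϕ, ψ}(θ) and the search for its maximum can be sketched in Python with NumPy for one fixed pair of orientations. The function name axis_of_max_cooccurrence is hypothetical, the histogram slice is assumed to be indexed as [Δy, Δx] with zero displacement at its central pixel, nearest-pixel sampling is an assumption, and the number of sampled angles is left as a parameter because the angular sampling is not fully specified in the text.

```python
import numpy as np


def axis_of_max_cooccurrence(hist2d, n_angles=128, radius=40):
    """Return theta in [0, pi) maximizing the line sum
    P(theta) = sum_{i=-radius..radius} H(i*cos(theta), i*sin(theta)).

    hist2d : 2-D array indexed [dy + cy, dx + cx], with zero displacement
             at the central pixel (cy, cx); each side must be at least
             2*radius + 1 pixels long.
    """
    cy, cx = hist2d.shape[0] // 2, hist2d.shape[1] // 2
    steps = np.arange(-radius, radius + 1)
    thetas = np.arange(n_angles) * np.pi / n_angles
    scores = np.empty(n_angles)
    for k, theta in enumerate(thetas):
        # Nearest-pixel sampling along the line through the center.
        xs = np.rint(steps * np.cos(theta)).astype(int) + cx
        ys = np.rint(steps * np.sin(theta)).astype(int) + cy
        scores[k] = hist2d[ys, xs].sum()
    return thetas[int(np.argmax(scores))]
```

The sign convention of θ depends on whether the vertical image axis points up or down, so the returned angle may need to be reflected before it is compared with the orientations ϕ and ψ.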
In general, horizontal and vertical directions had better statistics because there are more horizontal or vertical segments than oblique ones in the images; these special orientations are also the most prone to artifacts from aliasing, staircasing, and the ensemble choice. Because we are interested in this study in the correlations as a function of relative distance and orientations, all of the quantitative measurements were performed by averaging over all orientations. However, the results shown still held true for each individual orientation.
Results
All 4,000 images used in this study were black and white, 1,536 × 1,024 pixels in size, and 12 bits in depth. We used a set of filters to obtain a measure of orientation at each pixel of every image of the database (23). The filters were 7 × 7 pixels in size and thus provided a local measure of orientation. The output of the filter was high at pixels where contrast changed abruptly in a particular direction, typically because of the presence of line segments or edges, but also corners, junctions, or other singularities (Fig. 1). If the outputs of the filters were statistically independent, then we would expect a flat correlation as a function of (Δx, Δy, ϕ, ψ). In polar coordinates (r, θ, ϕ, ψ), the two problems that we address are naturally separated: the scaling properties result from studying how the histogram depends on r (distance), whereas the geometry results from the dependence of the histogram on θ, ϕ, and ψ.
We studied the number of co-occurring pairs of segments as a function of their relative distance for different geometries (Fig. 2 A–C). The different geometric configurations correspond to the different orientations of the segments and their relative position within an image. We first studied the number of co-occurrences as a function of distance in the line spanned by the orientation of the reference segment, averaged across all possible orientations of the reference line (Fig. 2A). When both segments have the same orientation, we observe a scale-invariant behavior, indicated by a linear relationship in the log–log plot (see Materials and Methods). It can also be seen from this plot that collinear co-occurrences are more frequent than any other configuration. Fig. 2B shows that the probability of co-occurrences is higher for the vertical orientation, and that scale invariance extends over a broader range.
The scaling properties are qualitatively different for segments positioned side-by-side, along a line orthogonal to the orientation of the first segment (Fig. 2C). Iso-oriented pairs were again the most frequent, but their co-occurrence in the orthogonal direction to the orientation of the first segment (Fig. 2C, black line) does not appear to be scale-invariant. This is reflected by the presence of a kink as opposed to a straight line (power law) in the log–log plot, indicating well-defined scales with different behavior.
It is worth comparing the scale of interactions one observes by using different kinds of filters. Before filtering, image luminance shows correlations that follow a power law (19, 20). After applying a Laplacian filter (equivalent to a center-surround operator, which measures nonoriented local contrast), the image is mostly decorrelated (Fig. 2D, red circles) (24, 25). This is seen in the exponential decay of the correlations, and in the fact that the correlations show similar behavior after a pixel-by-pixel shuffling of the image (Fig. 2D, cyan circles). The strength and scaling of the correlations across the collinear line change radically when an oriented filter is used. In this example, to make a direct comparison between the various filters, we weighted each pair of segments by their energy value (linear cross-correlation, instead of applying a threshold as in the earlier calculations). This calculation was done for the vertical reference line orientation, which showed long-range correlations (Fig. 2B, black circles), over much longer distances than observed with the Laplacian filter. Moreover, these correlations were not present when measured in the shuffled images (Fig. 2, green circles). It is clear from the above analysis that, when oriented filters are used, strong correlations that extend over large distances are revealed. The next question is how these correlations depend on the relative orientation of the line elements, and whether these dependencies have any underlying geometry. We first calculated the total number of co-occurrences as a function of the relative difference in orientation. Co-occurrences decreased as the relative orientation between the pair of segments increased, being maximal when they were iso-oriented and minimal when they were perpendicular (Fig. 3).
The next observation concerns spatial structure. The probability of finding co-occurring pairs of segments was not uniform, but rather displayed a consistent geometric structure. If the two segments were iso-oriented, their most probable spatial arrangement was as part of a common line, the collinear configuration (Fig. 4a). As the relative difference in orientation between the two segments increased, two effects were observed. The main lobe of the histogram (which in the iso-oriented case extends in the collinear direction) rotated and shortened, and a second lobe (where co-occurrences were also maximized) appeared at 90° from the first (Fig. 4 a–e). This effect progressed smoothly until the relative orientation of the two segments was 90°, where the two lobes were arranged in a symmetrical configuration, lying at 45° relative to the reference orientation. Thus, pairs of oriented segments have significant statistical correlations in natural scenes, and both the average probability and spatial layout depend strongly on their relative orientation. Remarkably, the structure of the correlations followed a very simple geometric rule. A natural extension of collinearity to the plane is cocircularity. Whereas two segments of different orientations cannot belong to the same straight line, they may still be tangent to the same circle if they are tilted at identical, but opposite, angles to the line joining them. Given a pair of segments tilted at angles ψ and ϕ, respectively, they should lie along one of two possible lines, at angles (ϕ + ψ)/2 or (ϕ + ψ + π)/2, in order to be cocircular (Fig. 4f). This is the arrangement we observed in natural scenes. The measured correlations, given any relative orientation of edges, were maximal when arranged along a common circle. To quantify this we calculated the orientation of the axis where co-occurrences were maximal. We did that for different relative orientations and compared it to the value predicted by the cocircularity rule (Fig. 5).
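The cocircularity rule is simple enough to state as a short Python sketch. The function names cocircular_axes and is_cocircular are hypothetical, angles are assumed to be in radians, and all orientations and directions are treated modulo π.

```python
import math


def cocircular_axes(phi, psi):
    """Directions (mod pi) of the line joining two segments with
    orientations phi and psi for which the pair is cocircular."""
    first = ((phi + psi) / 2) % math.pi
    second = ((phi + psi + math.pi) / 2) % math.pi
    return first, second


def is_cocircular(phi, psi, theta, tol=1e-9):
    """True if the segments are tilted at equal and opposite angles
    to the joining line of direction theta (all angles mod pi)."""
    # Equal and opposite tilts: (phi - theta) + (psi - theta) = 0 mod pi.
    residual = (phi + psi - 2 * theta) % math.pi
    return min(residual, math.pi - residual) < tol
```

For iso-oriented segments (ϕ = ψ = 0) the two predicted axes are 0 and π/2, and for perpendicular segments (ϕ = 0, ψ = π/2) they are π/4 and 3π/4, in line with the 45° lobes described above.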
This is particularly remarkable in that the comparison is not a fit, because the cocircularity rule has no free parameters.
Discussion
We have shown that there are strong, long-range correlations between local oriented segments in natural scenes, that their scaling properties change for different geometries, and that their arrangement obeys the cocircularity rule. The filters we used for edge detection in our images were an oriented version of Laplacian-like filters in that they were local but had elongated, rather than circularly symmetric, center-surround structures. This change is analogous to the difference between filters in the lateral geniculate nucleus (LGN) and simple cells in the primary visual cortex. Thus, given that Laplacian filtering decorrelates natural scenes (24), it was surprising to find the long-range correlations and scale-invariant behavior of the collinear configuration. It is important to note that our measure of correlation differs not only in the type of filters used (elongated vs. circularly symmetric), but also in the fact that we measured the correlations along a line containing the pair of segments. Long contours are part of the output of the Laplacian filters and thus the image should show correlations that might be hidden when integrating them across an area, essentially because a curve has zero area and thus the correlations along a curve are not significant when integrated over the two-dimensional field of view. The finding of long-range correlations of oriented elements extends the notion that the output of linear local oriented filtering of natural scenes cannot be statistically independent¶ and shows that those correlations might be very significant through global portions of the visual field for particular geometries.
The cocircular rule has been used heuristically to establish a pattern of interactions between filters in computer vision (1, 27–29), and psychophysical studies suggest that the human visual system utilizes a local grouping process (“association field”) with a similar geometric pattern (4). Our finding provides an underlying statistical principle for the establishment of form and for the Gestalt idea of good continuation, which states that there are preferred linkages endowing some contours with the property of perceptual saliency (2). An important portion of the classical Euclidean geometry has been constructed by using the two simplest planar curves, the line and the circle (30); we show here that those are, in the same order, the most significant structures in natural scenes.
We have reported the emergence of robust geometric and scaling properties of natural scenes. This raises a question as to the underlying physical processes that generate these regularities. Although our work was solely based on statistical analysis, we can speculate on the possible constraints imposed by the physical world. In a simplifying view, we can think of a natural image as composed of object boundaries or contours, and textures. Collinear pairs of segments are likely to belong to a common contour; thus, our finding of scale invariance for collinear correlations is in agreement with the idea that scale invariance in natural images is a consequence of the distribution of apparent sizes of objects (21). Parallel segments, on the contrary, may be part of a common contour as well as a common texture, which would explain the two scaling regimes we observed. Cocircularity in natural scenes probably arises because of the continuity and smoothness of object boundaries; when averaged over objects of vastly different sizes present in any natural scene, the most probable arrangement for two edge segments is to lie on the smoothest curve joining them, a circular arc. These ideas, however, require an investigation that is beyond the scope of this paper.
The geometry of the pattern of interactions in primary visual cortex parallels the interactions of oriented segments in natural scenes. Long-range interactions tend to connect iso-oriented segments (17, 18), and interactions between orthogonal segments, which span a short range in natural scenes, may be mediated by short-range connections spanning singularities in the orientation and topographic maps in the primary visual cortex (31). The finding of a correspondence between the interaction characteristics of neurons in visual cortex and the regularities of natural scenes suggests a possible role for cortical plasticity early in life, in order for the cortex to assimilate and represent these regularities. This plasticity might be mediated by Hebbian-like processes, reinforcing connections between neurons whose activity coincides (i.e., their corresponding stimuli are correlated under natural visual stimulation). Such plasticity could extend to adulthood to accommodate perceptual learning of novel and particular forms (32).
Although we find coincidences between the pattern of interactions in V1 and the distribution of segments in natural scenes, the sign of the interactions plays a crucial role. Reinforcement or facilitation of co-occurring stimuli (positive interaction) results in Hebbian-like coincidence detectors, whereas inhibiting the response results in Barlow-like detectors of “suspicious coincidences” that ignore frequent co-occurrences (33). Interestingly, the Hebbian idea and the decorrelation hypothesis represent two sides of the same coin. From our measurements of the regularities in natural scenes, and previous studies on the higher order receptive field properties in primary visual cortex, it appears that both types of operations exist. The response of a cell in V1 is typically inhibited when a second flanking segment is placed outside of its receptive field along an axis orthogonal to the receptive field orientation. This interaction is referred to as side-inhibition, which is strongest when the flanking segment has the same orientation as the segment inside the receptive field (13, 15, 34). In the present study, we found that iso-orientation is the most probable arrangement for side-by-side segments in natural scenes, which therefore constitutes an example, in the domain of orientation, of decorrelation through inhibition. This inhibition may mediate the process of texture discrimination (13, 16, 35). The property of end-inhibition has also been interpreted as a mechanism to remove redundancies and achieve statistical independence (36). The finding that responses of V1 neurons are sparse when presented with natural stimuli (37) and models of normalization of neuronal responses in V1 tuned to the statistics of natural scenes¶ also support the idea that the interactions in V1 play an important role in decorrelating the output from V1.
This is consistent with the general idea that one of the important functions of early visual processing is to remove redundant information (38–40), and suggests that interactions in V1 may continue with the process of decorrelation that is achieved by Laplacian (24) and local oriented filtering (41, 42).
But the visual cortex can also act in the opposite way, reinforcing the response to the most probable configurations. This is seen in the collinear configuration, which is the one that elicits the most facilitation, and therefore illustrates how V1 can enhance the regularities in natural scenes. The fact that those correlations are significant over the entire visual field and are highly structured suggests that this is not a residual, or second-order, process. The opposing processes of enhancement of correlations and decorrelation may be mediated by different receptive field properties that can exist within the same cell. The same flank can inhibit or facilitate depending on the contrast (26, 43), suggesting that V1 may be solving different computational problems at different contrast ranges or at different noise-to-signal relationships. The dialectic behavior of visual cortex shows that the interplay between decorrelation (extraction of suspicious coincidences) and enhancement of a particular set of regularities (identification of form) may be mediated by the same population of neurons. Although the decorrelating process may be required to operate in the orientation domain to solve the problem of texture segmentation, particular sets of coincidences that are repeated in the statistics, such as the conjunction of segments that form contours, need to be enhanced in the process of identification of form.
Acknowledgments
We thank M. Kapadia for suggesting connections of our work with neurophysiological data, and D. R. Chialvo, R. Crist, A. J. Hudspeth, and A. Libchaber for constructive comments on the manuscript. We especially thank P. Penev for stimulating input in the early stages of the project. This work was supported by National Institutes of Health Grant EY 07968 and by the Winston (G.A.C.) and Mathers Foundations (M.O.M.) and the Burroughs Wellcome Fund (M.S.).
Footnotes

‡ Present address: Functional Neuroimaging Laboratory, Department of Psychiatry, Cornell University, 1300 York Avenue, Box 140, New York, NY 10021.

§ To whom reprint requests should be addressed at: The Rockefeller University, 1230 York Avenue, Box 212, New York, NY 10021-6399. E-mail: marcelo@zahir.rockefeller.edu.

Article published online before print: Proc. Natl. Acad. Sci. USA, 10.1073/pnas.031571498.

Article and publication date are at www.pnas.org/cgi/doi/10.1073/pnas.031571498

¶ Simoncelli, E. P. & Schwarz, O., Oral Presentation, Conference on Neural Information Processing Systems, Dec. 1–3, 1998, Denver, CO.
Abbreviation
V1, primary visual cortex
 Received October 25, 2000.
 Accepted December 1, 2000.
 Copyright © 2001, The National Academy of Sciences
References
Ullman S
Koffka K
Wertheimer M
Polat U, Sagi D
Hubel D H, Wiesel T N
Gulyas B, Orban G A, Duysens J, Maes H
Knierim J J, Van Essen D C
Li C Y, Li W
Kapadia M, Westheimer G, Gilbert C D
Bosking W H, Zhang Y, Schofield B, Fitzpatrick D
Van Hateren J H, Van der Schaaf A
Atick J J, Redlich A N
Dan Y, Atick J J, Reid R C
Hilbert D, Cohn-Vossen S
Vinje W E, Gallant J L
Thorpe W H, Mitchison G J
Barlow H B
Miall C, Durbin R M, Mitchison G J
Barlow H B, Foldiak P
Kapadia M, Westheimer G, Gilbert C D