Bayesian estimation of the shape skeleton
Abstract
Skeletal representations of shape have attracted enormous interest ever since their introduction by Blum [Blum H (1973) J Theor Biol 38:205–287], because of their potential to provide a compact, but meaningful, shape representation, suitable for both neural modeling and computational applications. But effective computation of the shape skeleton remains a notorious unsolved problem; existing approaches are extremely sensitive to noise and give counterintuitive results with simple shapes. In conventional approaches, the skeleton is defined by a geometric construction and computed by a deterministic procedure. We introduce a Bayesian probabilistic approach, in which a shape is assumed to have “grown” from a skeleton by a stochastic generative process. Bayesian estimation is used to identify the skeleton most likely to have produced the shape, i.e., that best “explains” it, called the maximum a posteriori skeleton. Even with natural shapes with substantial contour noise, this approach provides a robust skeletal representation whose branches correspond to the natural parts of the shape.
Sign up for PNAS alerts.
Get alerts for new articles, or get an alert when an article is cited.
Skeletal representations of visual shape, in which a shape contour is represented in terms of local symmetries about a set of possibly curving axes, have played a prominent role in theories of visual shape ever since the introduction of the medial axis transform (MAT) by Blum (1) and Blum and Nagel (2). The MAT is widely suspected to play a role in cortical representations of visual shape, perhaps by a neural implementation of Blum's local “grassfire” procedure. Cells as early as primary visual cortex show enhanced sensitivity near medial points (3, 4), suggesting an early locus of computation. Moreover, medial axes have known psychophysical correlates, including increased sensitivity to contrast (5, 6) and position (7) and probe comparisons in which latency patterns respect perceived axial structure (8). Medial axes are also deeply intertwined in theories of how shapes are decomposed into parts (9); for example, despite considerable controversy about how part cuts (boundaries between perceptually distinct parts) are determined, there is substantial agreement that part cuts must cross a medial axis (10). More broadly, many higher-level theories of shape and shape recognition are substantially based on axial representation of parts (11–13) presupposing prior computation of some sort of skeletal shape representation.
However, the computation of the medial axis skeleton suffers from several notorious problems, including spurious axial branches stemming from hypersensitivity to perturbations along the contour, and counterintuitive results (forking) at the ends of blunt parts (see Fig. 3 Insets). More recent advances in the computation of the MAT (14–18) have reduced, but not eliminated, these problems, which seem to be endemic to the underlying geometric conception of the MAT.
Summary of the Approach
The basic idea behind our approach is that real shapes owe their structure to a mixture of generative and random factors, e.g., shapes that are the result of an underlying skeleton plus a stochastic growth process. We apply Bayesian estimation to the problem of identifying a shape's most likely “generative skeleton,” under simple assumptions about the probability distribution of skeletons (providing a Bayesian prior), and a stochastic model of how shapes are generated from skeletons (providing a Bayesian likelihood function). The prior favors simple skeletons with relatively few and relatively straight branches. The likelihood model, i.e., the shape-generating stochastic process, assumes that shapes are generated by a lateral outward growth process in which there is some random variation in the direction of growth away from the axis and some random variation in the extent of growth. We then combine this prior and likelihood function by Bayes' rule, identifying the generative skeleton that is most likely to have produced the shape. An axial branch is included in this skeleton only when the additional skeletal complexity it creates is more than offset by the improved “goodness of fit” to the shape. The estimated skeleton, called the maximum a posteriori (MAP) skeleton, is the skeletal interpretation that, under the generative assumptions underlying the prior and likelihood functions, best “explains” the shape.
Bayesian Formulation, Priors, and Likelihood Functions
We begin by assuming a shape given by a discrete approximation SHAPE = {x1, x2, …, xN} ⊂ R2. (We assume a closed shape, but formally all that is required is a boundary with figure and ground assigned, so that the direction of the field of normals is well defined.) Skeletons are generated under a probability density function p(SKEL); and in turn shapes are generated from skeletons under a conditional probability density function playing the role of a likelihood function p(SHAPE|SKEL). The key idea is that this likelihood function expresses a generative model of shape (19) so that selecting a particular skeletal interpretation, a particular generative skeleton, amounts to explaining the observed shape in the most plausible way under the assumed generative model. Following the Bayesian approach (20–22) the plausibility of a particular skeletal description corresponds to its posterior probability, given by Bayes' rule:summing over all possible skeletons SKELi. Because the denominator in this expression is constant for a given shape, we can maximize it by maximizing the numerator, i.e., the product of the prior and likelihood.
Priors.
A skeleton SKEL consists of a set of axial segments C1 … CK, hierarchically organized into a root contour, branches, subbranches, etc. We define a prior probability density p(SKEL), using a natural hierarchical extension of our earlier work on contour information (23). For each axial segment Ci, we induce a prior density p(Ci) by assuming that successive points in its discrete approximation are generated by a density function centered on straight (zero curvature) continuation of the axis, with angular deviation from collinearity α following a von Mises distribution V(0°,b) ∝ ecos(bα) [similar to a Gaussian (normal) distribution but suitable for angular measurements (24)], which has proved accurate in modeling human contour perception (23, 25). Under this assumption, relatively straight axes (α near zero) have high probability, whereas probability decreases with larger turning angles, i.e., with larger magnitude of curvature in the underlying curve. Successive turning angles are assumed independent, so the prior p(C) for a curve C containing a series α1, α2 … of turning angles in its discrete approximation is ∏ip(αi). To induce a prior over skeletons, we simply augment this prior by assuming that axial branches Ci sprout with fixed probability pC, which yields the probability of a skeleton SKEL comprising K axes ofThis prior is high for skeletons with few and relatively straight axes and diminishes with increased branching or increasing curvature in any of the axial branches (Fig. 1), an assumption emprically validated by the generally simple forms exhibited by naturally occurring shapes (26). For a skeleton consisting of a single axis, the prior reduces to the established prior for a simple open contour (23) as seems natural.
Fig. 1.
Likelihoods.
The next step in a Bayesian account is the adoption of a likelihood model, in this case meaning a stochastic generative model by which a shape is produced from a hypothesized skeleton. To capture the idea that the shape is “extruded” laterally from the generative skeleton, we assume that from each point on each skeletal axis, “ribs” sprout on both sides, approximately perpendicular to the axis (hence, primarily outward), but with partly random lengths and directions (Fig. 2). More specifically, each rib sprouts in a direction that is perpendicular to the axis plus a random directional error φx, chosen independently for each rib (i.e., the rib ending at shape point x) from a von Mises density centered on zero, i.e., φx ∼ V(0,bφ) with spread parameter bφ. The expected rib length μ at each point v along the axial segment C is given by a “rib length function” μC(vx), which we estimate from the shape assuming only a continuity constraint (see Methods). To this expected rib length μC(vx) is added a random rib length error εx, chosen independently for each rib from a normal distribution, ε ∼ N(0,σC2). The expected rib lengths μc(vx) are themselves drawn from an exponentially decreasing density p(μ) ∝ e−βμ with decay constant β, meaning that wider axial parts are less likely than narrower ones, with probability decaying gradually with increasing widths. For each shape point x, the expected rib length μ, directional error φ, and rib length error ε are mutually independent, so the likelihood of the shape point p(x|SKEL) generated by a rib at point vx along axis C is given by the productThe likelihood of the entire shape is the product of the likelihoods of its constituent points,
Fig. 2.
The MAP Skeleton
Given the prior and likelihood defined as above, the final step is to compute the skeletal structure with maximum posterior probability, the MAP skeleton. We propose estimation of this skeleton as a “competence” or computational theory of mental shape representation, meaning a specification of the function that the human system is attempting to compute when it represents shape (rather than an account of the implementation it uses to compute it). We can maximize the posterior by, equivalently, choosing the skeleton that minimizes the negative logarithm of the posterior, often referred to as its description length (DL) because it reflects the complexity of expressing the hypothesis in an optimal code (27). Taking the negative logarithm of Eq. 1, the DL of the skeletal posterior is justApart from the constant term (the negative logarithm of the denominator in Eq. 1) the DL has two additive components: DL(SKEL), which reflects the complexity of the skeletal hypothesis itself, and DL(SHAPE|SKEL), which reflects the complexity of the shape as described by that skeleton. The MAP skeleton is the description that minimizes the sum of these two complexities. Hence the MAP skeleton is naturally regarded as identifying the simplest description of the shape as the outcome of a skeletal generative process. This attractive interpretation stems directly from the Bayesian conception and is not shared by other stochastic techniques for skeletal-axis computation.
The process of estimating the MAP skeleton requires inverting the likelihood function by choosing, for each shape point, the skeletal point that has “responsibility” for it, i.e., assigns it the highest likelihood. This skeletal point is most likely to have sprouted a rib whose endpoint is the shape point in question. (To stabilize the computation, we allow shape points to have mixed sources, treating them as probability-weighted mixtures of multiple ribs.) Part boundaries along the contour can be regarded as points at which responsibility for contour points switches from one axis to another (e.g., the boundaries between color-coded regions in the hand in Fig. 2; see below). The shape likelihood depends on this hypothesized ensemble of responsibilities, whereas the responsibilities depend on the currently estimated skeleton, suggesting a process similar to the well known expectation–maximization procedure, in which we alternately (i) estimate the correspondences (i.e., the ribs) between axial and contour points (the expectation phase), and (ii) search through the parametric space of skeletons, attempting to increase the posterior (decrease the DL) given the currently hypothesized correspondence (the maximization phase). This procedure is described in more detail in Methods.
Results
Figs. 3–5 show typical examples of the MAP skeleton, along with a conventional Voronoi-based implementation of Blum's MAT (1, 15) shown in Figs. 3–5 Insets for comparison. Simple shapes (Fig. 3a) yield intuitive results devoid of spurious branches, and the estimated skeleton is robust against contour noise (Fig. 3 b and c). Fig. 4 more specifically illustrates the robustness of the MAP skeleton as contour noise is introduced; the axial structure of the human form is recovered in a substantially invariant way in all three versions (a: no noise; b: noise throughout; c: noise on one arm and one leg only). Fig. 4c exemplifies the difficult case in which noise is added to some parts but not others, as in Richards et al.'s (28) famous “fuzzy pear,” which cannot be correctly handled by uniform smoothing techniques. Finally, Fig. 5 shows results for a variety of animal shapes. In each case the MAP skeleton corresponds closely to the intuitive part structure of the shape. The perceptual naturalness of these computed skeletons can be taken as “instant psychophysics,” supporting our claim that the MAP skeleton corresponds reasonably well to psychological shape representations.
Fig. 3.
Fig. 4.
Fig. 5.
A critical component of MAP skeleton estimation is the evaluation of candidate axes for inclusion in the hypothesized skeleton. As noted above, traditional approaches to computing the MAT have suffered from the problem of spurious axial branches, interfering with what otherwise might be a desirable isomorphism between the branches of a skeleton and the natural parts of a shape. The Bayesian approach provides a tool for handling this problem: a principled estimate of the statistical “significance” or evidence in favor of an axial branch. The relevant comparison is between a skeletal hypothesis SKEL that does not include the axial branch C and an augmented hypothesis SKEL' = SKEL + C that does include it (Fig. 6). Following Bayes, we adopt the axial branch C if the posterior with it is better than the posterior without it, i.e.This condition can be easily restated in terms of DL,meaning that we should adopt axis C if doing so results in a net reduction in complexity (DL).
Fig. 6.
The difference in DLs is sometimes referred to as the weight of evidence, in this case quantifying the degree to which the added descriptive accuracy (or goodness of fit) of the augmented skeletal description offsets the added complexity of the additional axis. The criterion is thus a principled one in that it accurately reflects whether the new part yields a net benefit given the assumptions underlying the prior and likelihood functions. Because the weight of evidence quantifies the strength of posterior belief in the candidate axis C, it may serve to quantify the perceptual “salience” of the corresponding part of the shape (29).
More broadly, this argument raises the possibility that Bayesian skeleton estimation might subsume such well known determinants of parts as the minima rule (9) (that extrema of negative contour curvature constitute likely part boundaries) and the shortcut rule (10) (which holds that part cuts tend to be of minimal length). In the Bayesian framework each of these principles emerges as a by-product of the posterior maximization, in that, although they play no overt role in the computation, they tend to be obeyed by MAP skeletal estimates. For example, MAP skeleton estimation does not in any way involve the computation of curvature nor the identification of curvature extrema. Yet the boundaries between the points stochastically generated by one axial branch and another automatically tend to lie in regions of negative contour curvature and in particular near points of negative minima; so MAP skeletal estimates tend to place part boundaries at those locations. The boundaries between color-coded rib regions in Fig. 2, which lie between the fingers of the hand, are examples. The thrust of this argument admittedly parallels traditional, nonstochastic arguments, in this case amounting to a stochastic generalization of the transversality principle (30). (Distinct axes tend under our prior to be transverse, so stochastically generated ribs from distinct axes tend to point in transverse directions, meaning that collections of ribs stemming from a distinct axes tend to abut each other at points of rapidly changing tangent direction, as in Figs. 2 and 6.) But the Bayesian approach unifies what is otherwise a heterogeneous collection of disparate part-determination rules (31, 32) under the umbrella of one overarching inferential goal, the maximization of the skeletal posterior. Moreover, it provides a unified account of both part boundaries (points along the contour at which the shape divides) and part cuts (the resulting divisions of the shape itself), which in standard accounts each require its own distinct systems of rules, but which in ours both fall out of the same computation. Most importantly, our account has the potential to explain the well known pattern of exceptions to standard rules, for example, part boundaries that fall in positive-curvature regions of the bounding contour (31) and negative minima that are not perceived as part boundaries (8, 10, 33). Each of the traditional rules holds under some circumstances and fails under others, but always in accord with the maximization of the posterior.
Discussion
Among the most promising recent approaches to computing the MAT are those based on stochastic methods (34–37), which decrease sensitivity to noise by including a random element in the computation. Like ours, these methods presume a partly random rather than perfectly symmetric correspondence between opposing contour points and/or minimize some energy functional that embodies an asymmetry cost. Nevertheless, like Blum's (1), most of these methods are based on the notion of local symmetry, although treated stochastically rather than deterministically. In particular, the medial axis itself is defined geometrically as the locus of points equidistant between contour points that have been designated as corresponding. In this sense, these existing stochastic methods fall short of a fully Bayesian approach in several respects, most specifically in that they lack an overt probabilistic shape-generating model for the skeleton, and for the shape given a skeleton, and thus lack well defined priors and likelihood functions.
In contrast, our approach adopts a full-fledged inverse-probability conception, setting as the computational goal the identification of a skeletal model most likely to have generated the shape (38). This approach entails overtly decomposing the observed shape into a “signal” process (the skeleton) and a “noise” process (the stochastic growth process). Of course, visual features treated as noise in some contexts might best be treated as signal in others (e.g., the notch in Fig. 3b, which might in some contexts be better treated as a bite or “negative part”), suggesting that top-down factors may play a role as well. A Bayesian approach has the benefit that such knowledge, when available, may be readily incorporated into the model. As our results suggest, however, a generic default generative model gives reasonably good results with bottom-up geometry alone.
The MAP skeleton should not be regarded as an attempt to compute the MAT per se, but rather to estimate a related but distinct skeletal structure, the generative skeleton. The MAP skeleton is not, in principle, necessarily medial, thought it tends to maximize mediality, but only in conjunction with other properties not present in the MAT, such as skeletal simplicity, and low variance in the rib lengths. The previously noted problems connecting the MAT to psychological percepts of shape are widely regarded as intrinsic to its fixed geometric definition. By contrast the MAP skeleton represents a more abstract perceptual shape description, which (like the MAT) brings out axial structure, but (unlike the MAT) does so in such a way that is both perceptually plausible and, in the sense that we have posed the problem, inferentially optimal.
The main benefit of our approach is the intuitive skeletons that MAP skeleton estimation tends to produce, with each axis corresponding to one perceived “part,” even with substantial contour noise (Figs. 3–5). This approach allows a compact, low-dimensional, but intuitive representation of shape, with enormous potential applications for shape recognition (12, 13), computer-based indexing of shape databases (18), and understanding of the function of long-range connections in visual cortex (4, 39, 40).
Moreover, our approach offers a number of important technical tools not provided by other methods, including a principled measure of the statistical evidence in favor of an axis (the difference in the log posteriors with and without the axis), the maximum value of the posterior over the space of skeletons, which gives a measure of how well any skeletal description explains the given shape, and a principled measure of shape complexity, the DL of the MAP skeleton. Each of these quantities has a natural psychological correlate, respectively, the subjective part salience, the subjective “axiality” of the shape, and the subjective complexity of the shape, none of which has received rigorous definitions in the literature before to our knowledge. All of these advantages stem from the underlying idea of formulating shape representation as a Bayesian inference problem, bringing it into line with a growing segment of modern perceptual theory (20, 21) and drawing closer to Attneave's original goal (41) of understanding shape as an information-processing problem.
Methods
Here, we sketch a computational procedure for estimating the MAP skeleton. As mentioned, we regard our theory as a “theory of the computation,” not a processing model; the computational implementation should be taken simply as a “proof of concept” that the MAP skeleton is computable and has the intended desirable properties and not as a realistic model of neural shape processing.
We seek the skeletal description with minimum DL, defined as the negative logarithm of the posterior probability p(SKEL|SHAPE). The rib length function is estimated by pooling ribs within a moving mask with a fixed width (plus or minus ≈⅓ the length of a typical axis in the examples shown), enforcing the constraint of a continuous length function connected to each axis. (This pooling introduces some dependence among the estimated ribs, making Eq. 3 only an approximation.) We also assume a von Mises distribution on the deviation between the inward-pointing shape normal and the rib, which is amplified when this deviation exceeds π/2 and has the effect of preventing “explanation from outside the shape.”
To estimate the skeleton, we use a gradient descent procedure loosely based on expectation–maximization. We use the conventional Voronoi-based MAT (15) to form an initial, grossly overfitted estimate of the skeleton. This point set is organized into a hierarchical structure by merging axes so as to maximize collinearity within each axis. Then all nonroot axes are subjected to the Bayesian posterior ratio test of significance (Eq. 4); axes failing the test are then pruned. The remaining axes are parameterized by using a piecewise cubic spline approximation, with knot points at every axial branch point, and additional knot points chosen successively until the spline approximation fits the original axis to within a fixed tolerance, resulting in a variable number m of knot points per axis. This procedure yields a representation having 2m parameters per axis (plus one additional parameter required to code the location of the root axis). With this parametric description as a starting point, an iterative gradient procedure is initiated, with two stages alternating:This procedure converges to an estimate of the MAP skeleton, examples of which are shown in Figs. 4 and 5.
1.
Estimate the “ribs” by associating each contour point with some set of axis points that explain it. For each shape point x, we choose the axis point and side (left or right) that assigns x the highest likelihood (Eq. 3).
2.
With the rib correspondences fixed, take one step down the gradient of DL (equivalently, up the gradient of posterior). We use a standard simplex method (Nelder-Mead) to execute the gradient descent.
Abbreviations
- MAT
- medial axis transform
- MAP
- maximum a posteriori
- DL
- description length.
Acknowledgments
We thank Elan Barenholtz, Randy Gallistel, and Eileen Kowler for helpful comments. J.F. was supported by National Science Foundation Grant 9875175 and National Institutes of Health Grant R01 EY15888, and M.S. was supported by National Science Foundation Grant BCS-021694. Shapes in Figs. 2b, 4a, and 5 were obtained from the Laboratory for Engineering Man/Machine Systems at Brown University (Providence, RI).
References
1
H Blum J Theor Biol 38, 205–287 (1973).
2
H Blum, RN Nagel Pattern Recogn 10, 167–180 (1978).
3
TS Lee J Physiol (Paris) 97, 121–139 (2003).
4
TS Lee, D Mumford, R Romero, VAF Lamme Vision Res 38, 2429–2454 (1998).
5
I Kovacs, B Julesz Nature 370, 644–646 (1994).
6
I Kovacs, A Feher, B Julesz Vision Res 38, 2323–2333 (1998).
7
CA Burbeck, S Pizer Vision Res 35, 1917–1930 (1995).
8
E Barenholtz, J Feldman Vision Res 43, 1655–1666 (2003).
9
DD Hoffman, WA Richards Cognition 18, 65–96 (1984).
10
M Singh, GD Seyranian, DD Hoffman Percept Psychophys 61, 636–660 (1999).
11
D Marr, HK Nishihara 200, 269–294 (1978).
12
I Biederman Psychol Rev 94, 115–147 (1987).
13
MJ Tarr, HH Bulthoff, M Zabinski, V Blanz Psychol Sci 8, 282–289 (1997).
14
BB Kimia, AR Tannenbaum, SW Zucker Int J Comput Vision 15, 189–224 (1995).
15
RL Ogniewicz, O Kubler Pattern Recogn 28, 343–359 (1995).
16
D Geiger, T-L Liu, RV Kohn IEEE Trans Pattern Analysis Machine Intelligence 25, 86–99 (2003).
17
RA Katz, SM Pizer Int J Comput Vision 55, 139–153 (2003).
18
K Siddiqi, A Shokoufandeh, S Dickinson, S Zucker Int J Comput Vision 30, 1–24 (1999).
19
M Leyton Cognit Sci 13, 357–387 (1989).
20
, eds D Knill, W Richards (Cambridge Univ Press, Cambridge Perception as Bayesian Inference, 1996).
21
D Kersten, P Mamassian, A Yuille Annu Rev Psychol 55, 271–304 (2004).
22
WW Geisler, D Kersten Nat Neurosci 5, 508–510 (2002).
23
J Feldman, M Singh Psychol Rev 112, 243–252 (2005).
24
KV Mardia Statistics of Directional Data (Academic, London, 1972).
25
J Feldman Percept Psychophys 63, 1171–1182 (2001).
26
DW Thompson On Growth and Form (Cambridge Univ Press, Cambridge, 1942).
27
J Rissanen Stochastic Complexity in Statistical Inquiry (World Scientific, Singapore, 1989).
28
W Richards, B Dawson, D Whittington J Opt Soc Am A 3, 1483–1491 (1986).
29
DD Hoffman, M Singh Cognition 63, 29–78 (1997).
30
B Bennett, D Hoffman Image Understanding,, ed WA Richards (Ablex, Norwood, NJ), pp. 215–256 (1987).
31
M Singh, DD Hoffman From Fragments to Objects: Segmentation and Grouping in Vision: Advances in Psychology,, eds T Shipley, P Kellman (Elsevier, New York) Vol 130, 401–459 (2001).
32
J de Winter, J Wagemans Cognition 99, 275–325 (2006).
33
K Siddiqi, KJ Tresness, BB Kimia Perception 25, 399–424 (1996).
34
Y-F Tsao, K-S Fu Comput Vision Graphics Image Processing 25, 348–370 (1984).
35
SC Zhu, AL Yuille Int J Comput Vision 20, 187–212 (1996).
36
S-C Zhu IEEE Trans Pattern Anal Mach Intell 21, 1158–1169 (1999).
37
B Kegl, A Krzyzak IEEE Trans Pattern Anal Mach Intell 24, 59–74 (2002).
38
A Telea, C Sminchisescu, S Dickinson (IEEE Computer Society, Los Alamitos, CA) Vol 4, 19–22 (2004).
39
CD Gilbert The Cognitive Neurosciences,, ed MS Gazzaniga (MIT Press, Cambridge, MA), pp. 73–90 (1995).
40
L Spillman, JS Werner Trends Neurosci 19, 428–434 (1996).
41
F Attneave Psychol Rev 61, 183–193 (1954).
Information & Authors
Information
Published in
Classifications
Copyright
© 2006 by The National Academy of Sciences of the USA. Freely available online through the PNAS open access option.
Submission history
Received: May 19, 2006
Published online: November 21, 2006
Published in issue: November 21, 2006
Keywords
Acknowledgments
We thank Elan Barenholtz, Randy Gallistel, and Eileen Kowler for helpful comments. J.F. was supported by National Science Foundation Grant 9875175 and National Institutes of Health Grant R01 EY15888, and M.S. was supported by National Science Foundation Grant BCS-021694. Shapes in Figs. 2b, 4a, and 5 were obtained from the Laboratory for Engineering Man/Machine Systems at Brown University (Providence, RI).
Authors
Competing Interests
The authors declare no conflict of interest.
Metrics & Citations
Metrics
Citation statements
Altmetrics
Citations
Cite this article
103 (47) 18014-18019,
Export the article citation data by selecting a format from the list below and clicking Export.
Cited by
Loading...
View Options
View options
PDF format
Download this article as a PDF file
DOWNLOAD PDFLogin options
Check if you have access through your login credentials or your institution to get full access on this article.
Personal login Institutional LoginRecommend to a librarian
Recommend PNAS to a LibrarianPurchase options
Purchase this article to access the full text.