Structure and information in spatial segregation

Philip S. Chodrow
PNAS October 31, 2017 114 (44) 11591-11596; published ahead of print October 13, 2017 https://doi.org/10.1073/pnas.1708201114
a Operations Research Center, Massachusetts Institute of Technology, Cambridge, MA 02139; b Department of Civil and Environmental Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139; c Laboratory for Information Decision Systems, Massachusetts Institute of Technology, Cambridge, MA 02139
For correspondence: pchodrow@mit.edu
Edited by Michael F. Goodchild, University of California, Santa Barbara, CA, and approved September 11, 2017 (received for review May 17, 2017)


Significance

The multiscalar structure of ethnoracial spatial segregation informs both urban policy and sociological theory, but existing methods for studying this structure have limitations. The tools developed in this paper enable flexible, multiscalar forms of analysis and visualization for spatial segregation. These tools illuminate how spatial boundaries between demographic groups contribute to overall segregation, how those boundaries change over time, and how the scale of segregation varies throughout a region. They apply to both ethnoracial and income segregation and can be used to study large geographic datasets.

Abstract

Ethnoracial residential segregation is a complex, multiscalar phenomenon with immense moral and economic costs. Modeling the structure and dynamics of segregation is a pressing problem for sociology and urban planning, but existing methods have limitations. In this paper, we develop a suite of methods, grounded in information theory, for studying the spatial structure of segregation. We first advance existing profile and decomposition methods by posing two related regionalization methods, which allow for profile curves with nonconstant spatial scale and decomposition analysis with nonarbitrary areal units. We then formulate a measure of local spatial scale, which may be used for both detailed, within-city analysis and intercity comparisons. These methods highlight detailed insights into the structure and dynamics of urban segregation that would otherwise be easy to miss or difficult to quantify. They are computationally efficient, applicable to a broad range of study questions, and freely available in open source software.

  • diversity
  • segregation
  • multiscale analysis
  • information theory
  • machine learning

The ongoing ethnoracial diversification of America has been accompanied by ongoing evolution in the structure of ethnoracial segregation. While white–black, white–Asian, and white–Hispanic residential segregation are currently on the decline, the processes driving these declines vary extensively across groups, cities, and decades (1–3). The measurement and modeling of these trends constitute urgent problems for policy and planning. In addition to the moral challenges raised by unequal access to education, employment, and public resources, ethnoracial residential segregation is directly destructive to urban economies: a recent study estimated that white–black residential segregation costs Chicago $3 billion in income to black residents; 80,000 unrealized college graduates; and 100 lives lost to homicide annually.

Recent scholarly work on residential segregation has been empowered by the increasing availability of both high-resolution demographic data and computational resources with which to analyze it. Progress has moved from aspatial indices (4, 5) to explicitly spatial indices (6) to, most recently, multiscale analysis. Multiscalar methodology emphasizes that different features of urban segregation are visible only at certain scales of analysis, where a “scale” may refer to a characteristic geographic length (7–10), a number of neighbors (11–14), or a set of aggregate spatial units, such as census tracts or designated places (15, 16).

Geographic- and neighbor-based approaches have principally been pursued through the study of segregation profiles. These are curves that plot the value of a segregation index, usually the Information Theory Index (17), as a function of a geographic smoothing bandwidth or number of nearest neighbors. These profiles efficiently represent how the degree of segregation depends on the size of the “local neighborhoods of individuals” and, in the case of ref. 14, can also characterize the dependence of these values on the selection of the entire region of analysis. These methods, however, also share a characteristic limitation. Profile curves view scale as a global property: at each point on the curve, a single scale is used for the entire analytical region. Inspection of segregation in Detroit (Fig. 1) suggests the coexistence of multiple scales of separation in different areas of the city. Because profile methods use a global scale at each point on the profile curve, some of the local features of segregation in Detroit are necessarily lost. A recent paper (9) makes progress against the global-scale limitation by constructing egocentric profile curves and studying their properties using clustering and inferential methods. This approach illuminates interesting spatial patterns of demographic difference but decouples those egocentric profiles from overall segregation measures. Profile methods are also limited in their ability to characterize how much of overall segregation is at any particular geographic scale. While measures such as the macro–micro segregation ratio and net microsegregation of ref. 10 can suggest scalar decompositions of this type, they do not share with the explicit decomposition methods below the strong mathematical properties necessary to make such analysis precise.

Fig. 1.

Census block groups in Wayne County, Michigan, including the city of Detroit. Black boundaries delimit census-designated places within the county. Each block group is colored according to the group that is most concentrated in that block group relative to the city-wide average, and the shade of each block group corresponds to the degree of concentration. Details are in Supporting Information.

Another approach to multiscale measurement explicitly decomposes overall segregation into terms reflecting contributions at different scales using a mathematical property of the Information Theory Index. Unlike profile methods, the decomposition approach does not assume a global geographic or population scale. Fig. 1 illustrates this flexibility in Wayne County, with black outlines delimiting census-designated places used as an intermediate scale. The designated places span a wide range of geographic and population scales, and many of their boundaries correspond to lines of demographic separation. A typical analysis proceeds to decompose overall segregation into between-place and within-place components, with the latter term reflecting segregation on a level “below” the scale of places. However, this decomposition is dependent on the quality of the intermediate scale used. While ref. 15 argues that places correspond to meaningful communities and administrative units, the use of places may also obscure important large-scale features. Using the Information Theory Index, the place decomposition uses 30 distinct spatial units to capture just 48% of segregation in the between-place term, suggesting that more than one-half of overall segregation in Detroit is below this scale. This suggestion may mislead. As we will see, just seven spatial units suffice to capture 71% of overall segregation. The use of fixed spatial units, such as places, may thus substantially understate the scale on which segregation processes operate.

In this paper, we bring contemporary mathematics and machine learning to bear on the study of spatial segregation. The methods that we develop advance profile curve approaches by constructing curves that allow for local variation in geographic and population scales. They advance decomposition analysis by constructing nonarbitrary intermediate spatial units and attaching explicit segregation contributions to boundaries between those units. They advance average-scale measures, like those of ref. 10, by providing a measure of local scale that may be used for both intracity exploration and intercity comparison. Finally, our methods may be viewed as another approach to coping with the Modifiable Areal Unit problem (18). While we cannot fully avoid the dependence of our analysis on the spatial units made available to us, these methods provide a way to trade excessive data resolution for meaningful spatial units that reflect demographic structure rather than administrative boundaries.

Learning the Structure of Segregation

Operationally, we view the problem of learning the structure of segregation as the task of finding interpretable units of spatial aggregation with boundaries that correspond to demographic transitions. This problem is a form of regionalization, that is, spatially constrained clustering. While recent papers have developed an array of methods for regionalization (19–22), none are designed for multiscale segregation studies. A recent method (23) for identifying ethnoracial neighborhoods, on the other hand, does not produce contiguous spatial units. Methods adapted to the context of spatial segregation are therefore necessary.

We begin by revisiting and generalizing the relationship between the Information Theory Index and Shannon’s information theory. We view census tracts and cross-tabulations as defining an empirical distribution $p_{X,Y}(X,Y)$ between spatial locations $X$ and demographic labels $Y$, so that $p_{X,Y}(7,\text{Hispanic})$ is the likelihood that a resident of the city chosen uniformly at random lives in spatial unit 7 and identifies as Hispanic. Let $P$ be the space of valid probability distributions over demographic labels. A Bregman divergence (24) is a function $d_f: P \times P \to \mathbb{R}$ that assigns to each pair $q, r \in P$ a real number measuring the difference between them according to the formula

$$d_f(q,r) = f(q) - f(r) - \langle \nabla f(r),\, q - r \rangle. \tag{1}$$

In this expression, $f$ is a strictly convex function on $P$, and $\nabla f(r)$ is its gradient evaluated at $r$. The class of Bregman divergences includes many useful distance-like functions as special cases, including the Kullback–Leibler divergence and Euclidean distance. The Bregman entropy of a demographic distribution $p_Y$ is $H_f(Y) = -f(p_Y)$, and the Bregman information (25) is

$$I_f(X,Y) = \sum_x p_X(x)\, d_f(p_{Y|X=x},\, p_Y). \tag{2}$$

In words, the Bregman information is an average of divergences of local demographic distributions from the global one weighted by the population count at each location.
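As a concrete illustration of Eq. 2, the following minimal sketch computes the Bregman information of a small table of counts under two generators: negative entropy (which yields the Kullback–Leibler divergence) and the squared norm (which yields squared Euclidean distance). The function names and the toy table are illustrative assumptions, not part of the accompanying software.

```python
import math

def kl_divergence(q, r):
    """Bregman divergence generated by f(p) = sum_y p_y log p_y (Kullback-Leibler)."""
    return sum(qi * math.log(qi / ri) for qi, ri in zip(q, r) if qi > 0)

def squared_euclidean(q, r):
    """Bregman divergence generated by f(p) = sum_y p_y ** 2."""
    return sum((qi - ri) ** 2 for qi, ri in zip(q, r))

def bregman_information(counts, divergence):
    """I_f(X, Y) of Eq. 2, where counts[x][y] is the number of residents
    of spatial unit x who belong to demographic group y."""
    total = sum(sum(row) for row in counts)
    n_groups = len(counts[0])
    # Global demographic distribution p_Y.
    p_y = [sum(row[y] for row in counts) / total for y in range(n_groups)]
    info = 0.0
    for row in counts:
        n_x = sum(row)
        if n_x == 0:
            continue  # empty units carry no population weight
        local = [c / n_x for c in row]  # local distribution p_{Y|X=x}
        info += (n_x / total) * divergence(local, p_y)
    return info

# Hypothetical toy city: three units, two groups.
toy = [[90, 10], [10, 90], [50, 50]]
print(bregman_information(toy, kl_divergence))
print(bregman_information(toy, squared_euclidean))
```

A table in which every unit has the same composition yields zero information under either generator, which matches the interpretation of Eq. 2 as an evenness measure.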

The import of the Bregman formalism is that the majority of contemporary segregation measures, both categorical and ordinal, may be expressed as a Bregman information If(X,Y), sometimes normalized by the Bregman entropy Hf(Y). Such measures include the Information Theory Index (17), the Divergence Index (26), the Neighborhood Sorting Index (27), and three of the four ordinal measures of ref. 28. Table S1 in Supporting Information gives explicit formulae for these segregation measures in the Bregman formalism. The methods that we develop below are thus fully compatible with a wide range of contemporary categorical and ordinal segregation measures.

Table S1.

Taxonomy of smoothing-based segregation measures according to functional form, Bregman divergence generator f, and smoothing kernel ϕ

We now formulate the regionalization problem. Let $c: X \to \{1,\ldots,k\}$ be a function that assigns to each location $x$ a region label $c(x)$. We regard $C = c(X)$ as a random variable and aim to choose $c$ such that the aggregation that it induces captures segregation at large spatial scales. The Chain Rule of Bregman information (29) offers a decomposition of the form

$$I_f(X,Y) = I_f(C,Y) + I_f(X,Y \mid C). \tag{3}$$

The term $I_f(C,Y)$ gives the segregation captured at the aggregate spatial scale, and $I_f(X,Y \mid C)$ is the residual segregation at lower scales. A good labeling function will tend to make the first term large. This motivates the following problem:

$$c^* = \operatorname*{argmax}_{c}\; I_f(c(X),Y) \quad \text{subject to spatial constraints}. \tag{4}$$

In this paper, we will specify the spatial constraints in terms of contiguity and soft regularity requirements, but other spatial constraint formulations may be desirable in other contexts. In general, Eq. 4 is not efficiently solvable, and we therefore develop two approximate methods.
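The decomposition in Eq. 3 can be checked numerically. The sketch below, which assumes the Kullback–Leibler generator and a hypothetical four-unit table with a hand-chosen labeling, computes the between-region term and the within-region term separately and compares their sum with the overall information. All names are illustrative.

```python
import math

def kl(q, r):
    """Kullback-Leibler divergence between two discrete distributions."""
    return sum(a * math.log(a / b) for a, b in zip(q, r) if a > 0)

def information(counts):
    """KL-based Bregman information I_f(X, Y) of a count table counts[x][y]."""
    total = sum(map(sum, counts))
    p_y = [sum(col) / total for col in zip(*counts)]
    return sum(
        (sum(row) / total) * kl([c / sum(row) for c in row], p_y)
        for row in counts if sum(row) > 0
    )

def aggregate(counts, labels):
    """Sum the rows of `counts` that share a region label c(x)."""
    regions = {}
    for row, c in zip(counts, labels):
        acc = regions.setdefault(c, [0] * len(row))
        for y, n in enumerate(row):
            acc[y] += n
    return regions

def decompose(counts, labels):
    """Return (between, within), i.e., I_f(C, Y) and I_f(X, Y | C) in Eq. 3."""
    total = sum(map(sum, counts))
    regions = aggregate(counts, labels)
    between = information(list(regions.values()))
    within = 0.0
    for c, region_counts in regions.items():
        rows = [row for row, lab in zip(counts, labels) if lab == c]
        # Residual segregation inside region c, weighted by its population share.
        within += (sum(region_counts) / total) * information(rows)
    return between, within

# Hypothetical data: units 0-1 form one region, units 2-3 another.
counts = [[90, 10], [80, 20], [15, 85], [5, 95]]
labels = [0, 0, 1, 1]
between, within = decompose(counts, labels)
print(between, within, information(counts))
```

In this toy case most of the information sits in the between-region term, which is the situation that the optimization problem of Eq. 4 tries to produce.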

Our first method is a form of greedy information maximization. Whenever we merge two spatial units and combine their demographic counts, information is lost unless their distributions are identical. At each stage of the greedy algorithm, we identify the two adjacent spatial units for which this loss is minimized and merge them, forming a region. We continue this process until only the specified number of regions remains. The mathematical form of the information loss as well as the full specification of the algorithm itself are provided in Supporting Information.
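The greedy procedure described above can be sketched as follows. This is a deliberately naive illustration that assumes the Kullback–Leibler generator and recomputes every candidate loss at each step; the merge loss is written using the chain rule of Eq. 3, and the function names, data layout, and toy example are assumptions rather than the specification given in Supporting Information.

```python
import math

def kl(q, r):
    """Kullback-Leibler divergence between two discrete distributions."""
    return sum(a * math.log(a / b) for a, b in zip(q, r) if a > 0)

def merge_loss(u, v, total):
    """Information lost by pooling count vectors u and v (via the chain rule)."""
    pooled = [a + b for a, b in zip(u, v)]
    n_pooled = sum(pooled)
    p_pooled = [c / n_pooled for c in pooled]
    loss = 0.0
    for vec in (u, v):
        n = sum(vec)
        if n > 0:
            loss += (n / total) * kl([c / n for c in vec], p_pooled)
    return loss

def greedy_regions(counts, edges, k):
    """Repeatedly merge the adjacent pair with minimal loss until k regions remain."""
    total = sum(map(sum, counts))
    region = {i: list(row) for i, row in enumerate(counts)}  # region id -> counts
    members = {i: {i} for i in region}                       # region id -> unit ids
    adjacent = {frozenset(e) for e in edges}                 # contiguity pairs
    while len(region) > k and adjacent:
        pair = min(adjacent, key=lambda e: merge_loss(*(region[i] for i in e), total))
        a, b = sorted(pair)
        region[a] = [x + y for x, y in zip(region[a], region.pop(b))]
        members[a] |= members.pop(b)
        # Redirect b's edges to a, then drop the self-loop created by the merge.
        adjacent = {frozenset(a if i == b else i for i in e) for e in adjacent}
        adjacent = {e for e in adjacent if len(e) == 2}
    return sorted(sorted(m) for m in members.values())

# Hypothetical strip of six units with a sharp transition between units 2 and 3.
counts = [[90, 10], [85, 15], [80, 20], [20, 80], [15, 85], [10, 90]]
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)]
print(greedy_regions(counts, edges, k=2))
```

Because only adjacent units are ever merged, every region remains contiguous by construction. A production implementation would likely cache losses in a priority queue instead of rescanning all pairs.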

This algorithm, a form of agglomerative clustering, has the virtues of computational performance and direct optimization of the between-clusters Bregman information. Additionally, it presents major advances relative to previous work in both profile and decomposition methods. The plot of information captured against number of regions shown in Fig. 2D provides a profile curve of segregation against the number of clusters. Unlike previous profile methods, these curves do not assume the scale to be global at any of their points and support mathematically precise decomposition claims about how much segregation is captured at a given scale. The agglomerative method has an attractive structural feature: to each boundary between regions, agglomerative clustering assigns an information value reflecting its contribution to overall segregation. The sum of all such information values is overall segregation $I_f(X,Y)$. In Fig. 2A, for example, agglomerative clustering highlights the dividing line between the predominantly black urban core of Detroit and the predominantly white suburbs; this single boundary accounts for a full 44% of segregation in Detroit. The hierarchical nature of the algorithm additionally specifies smaller subdivisions nested within this dominating boundary. The seven regions shown account for 71% of total segregation.

Fig. 2.

Illustrative partitioning of Detroit, Chicago, and Philadelphia under (A–D) greedy and (E–H) spectral regionalization methods. The value of k gives the number of partitions, and the weight of each boundary reflects its contribution to overall segregation. In E–G, σ=30 was used for all spectral preprocessing. D and H give profile curves under each method. The dashed lines give the total segregation If(X,Y).

Fig. 2 B and C also highlights some characteristic limitations of greedy agglomerative methods. First, because greedy partitioning requires nothing more than a connectivity constraint between tracts, the regions that it finds may be highly irregular in shape, as seen in Chicago and especially Philadelphia. Whether this is acceptable depends on the analytical context. Second, greedy partitioning is extremely sensitive to data perturbations, which can lead to unpredictable results. In Chicago, for example, the small region to the northeast does not appear sharply differentiated from the surrounding, predominantly white suburbs. Meanwhile, the region to the west combines black and Hispanic neighborhoods, suggesting that the algorithm has missed a natural spatial boundary. We therefore seek alternative methodology for regionalization that combines the attractive features of agglomerative methods (fast computation and hierarchical visualization) with robustness to fine-scale data variation, the ability to control region shape, and a global view of spatial structure.

We therefore propose to coarse-grain the map into intermediate spatial units using a global algorithm and subsequently organize these units using agglomerative methods as before. The coarse-graining step plays a similar role to that of a spatial smoother that evens out small-scale irregularities in the data—but unlike a fixed-bandwidth smoother or bespoke neighborhood, we use a method that allows the spatial scale to vary throughout the study region. To carry out the coarse-graining step, we propose spectral graph partitioning. Spectral methods are well-suited for regionalization (22), as they approximately solve a normalized cut problem that is explicitly formulated in terms of spatial boundaries. Eigenvectors of the graph Laplacian correspond to interpretable graph partitions, as visualized in Fig. S1 in Supporting Information. The analyst controls the number of desired regions in the spectral stage as well as a spatial parameter σ that controls a tradeoff between detailed boundaries that may capture more structure and smoother boundaries that define more regularly shaped regions, such as those shown in Fig. S2 of Supporting Information. In the case of categorical variables, the parameter σ and the normalized cut problem have a natural statistical interpretation in terms of the likelihoods of confusing areal units within and between regions. This interpretation as well as a formal specification of the algorithm are provided in Supporting Information.
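To illustrate the spectral idea in its simplest form, the sketch below splits a set of contiguous units in two using the second eigenvector of the normalized graph Laplacian, which is the standard relaxation of the normalized cut problem. The Gaussian affinity on demographic differences, the meaning given to `sigma` here, and the toy data are assumptions made for illustration; the affinity and the full algorithm used in this paper are those specified in Supporting Information. For more than two regions, one would typically cluster several leading eigenvectors (for example, with k-means) rather than threshold a single one.

```python
import numpy as np

def spectral_bipartition(p, edges, sigma):
    """Split units in two with the second eigenvector of the normalized Laplacian.

    p     : (n, m) array whose rows are local demographic distributions
    edges : list of (i, j) contiguity pairs; every unit needs at least one neighbor
    sigma : kernel width of the (assumed) Gaussian affinity between neighbors
    """
    n = p.shape[0]
    W = np.zeros((n, n))
    for i, j in edges:
        dist2 = np.sum((p[i] - p[j]) ** 2)
        # Similar neighbors get weight near 1; dissimilar neighbors near 0.
        W[i, j] = W[j, i] = np.exp(-dist2 / sigma ** 2)
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    # Symmetric normalized Laplacian: I - D^{-1/2} W D^{-1/2}.
    L = np.eye(n) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    eigvals, eigvecs = np.linalg.eigh(L)   # eigenvalues in ascending order
    relaxed_cut = d_inv_sqrt * eigvecs[:, 1]
    return relaxed_cut > 0

# Hypothetical strip of six units with a sharp transition between units 2 and 3.
p = np.array([[0.90, 0.10], [0.85, 0.15], [0.80, 0.20],
              [0.20, 0.80], [0.15, 0.85], [0.10, 0.90]])
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)]
print(spectral_bipartition(p, edges, sigma=0.5))
```

In this sketch, a smaller `sigma` sharpens the contrast between weak and strong edges, while a larger one flattens the weights so that the partition is driven more by graph geometry than by demographics.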

Fig. S1.

Illustrative eigenvectors in Detroit. Each eigenvector contains information about spatial demographic boundaries, with divisions deemed more important by the algorithm highlighted by eigenvectors corresponding to smaller k. In these visualizations, σ=30.

Fig. S2.

Spectral partitions of Detroit (A), Philadelphia (B), and Chicago (C) before hierarchical postprocessing.

Fig. 2 E–G illustrates the relative virtues and limitations of spectral preprocessing. In Detroit, the spectral preprocessing finds substantively identical regions to those found through pure greed. In Chicago and Philadelphia, spectral preprocessing results in regions with more regular shapes and distinguishes, for example, the western Hispanic and black regions missed by pure greedy partitioning in Chicago. However, information values for these partitions are slightly lower than the corresponding greedy partitions; the parameters discussed above allow the analyst to exercise control over this tradeoff.

These methods extend easily to the study of boundaries in time as well as space. This may be achieved by connecting time slices into a single temporal graph on which we perform regionalization. As an example, Fig. 3 shows the dynamics of segregation in Detroit over the time period 1990–2010, using data from ref. 30, which provides consistent spatial units across time. Fig. 3 A–C shows evolving demographics as well as k=7 labeled spatiotemporal regions obtained via spectral partitioning with hierarchical postprocessing. These methods allow us to directly read off the changing shape and contribution to segregation of spatial boundaries. From 1990 to 2000, overall segregation increased in Detroit. This largely reflects increasing isolation of suburban whites (region B) from blacks and Hispanics, but within this overall trend, there are easily missed nuances. These nuances are shown in Fig. 3 by black boundaries, along which segregation increased, and white ones, along which it decreased, relative to the previous time step. For example, separation between the Grosse Pointe communities (cluster C) and the predominantly black cluster A decreased. Indeed, the analysis suggests that segregation in and around Grosse Pointe changed rapidly in this time period; a cluster of tracts that were demographically more similar to Grosse Pointe in 1990 more closely resembles central Detroit in 2000, and the boundary between the two clusters shifts accordingly. Overall segregation decreased in 2010, largely because of amelioration of the spatial divisions that had intensified in the previous decade. However, the growing Asian population of Hamtramck (cluster F) increasingly distinguished it from its surroundings. Regionalization allows us to quantify not only the changing magnitude of segregation in Detroit but also its changing structure.

Fig. 3.

Evolving spatial boundaries in Detroit. The size of each spatial boundary corresponds to its information value as identified via greedy partitioning with spectral preprocessing. In B and C, white boundaries have decreased in segregation magnitude, and black ones have increased relative to the previous time step. The demographic breakdowns in D correspond to the region labels of A. In this figure only, data were used on a tract level as supplied by ref. 30.

In practice, the local information density discussed below or other exploratory methods should be used to determine whether a given dataset is indeed “regionalizable” into spatially distinct demographic regions. The analyst must then choose a desired number of final clusters and in the case of spectral preprocessing, the hyperparameter σ and number of intermediate clusters. A simple approach to the selection problem is to fix a desired number of final clusters and then conduct a grid search over σ and the number of intermediate clusters using the mutual information as a loss function. More detailed approaches involving the inspection of the spectrum of the graph Laplacian are also possible.

Local Scale of Segregation

As discussed above, classical profile methods view scale as a global property that is constant in geographic space. Regionalization methods present one path past this limitation. In this section, we develop another such path in the form of a local measure of spatial scale. This measure highlights demographic transition areas in cities, and its average value may be used to compare cities according to the extent to which demographic difference exists on small spatial scales.

Let $x_0$ be a fixed location, and let $B_r(x_0)$ be the geographic neighborhood of radius $r$ centered at $x_0$. The quantity $j_r(x_0) \triangleq I_f(X,Y \mid X \in B_r(x_0))$ measures the degree of local segregation in this small neighborhood of $x_0$. For mathematical convenience, we view the joint distribution $p(x,y)$ as a smooth function of $x$. The local information density is defined as the limit of segregation per unit area when $r$ grows small:

$$j(x_0) \triangleq \lim_{r \to 0} \frac{j_r(x_0)}{\pi r^2}. \tag{5}$$

By definition, we expect $j(x_0)$ to be large at locations where high degrees of segregation exist on small spatial scales. To work with the local information density, we require proof that the limit exists and a practical computational formula, neither of which is provided by Eq. 5.

We may obtain both using an information-geometric (31) framework visualized in Fig. S3 in Supporting Information. The relationship between demographic and geographic variation is summarized by the pullback metric tensor

$$g_f(x) = \frac{1}{2}\,(\nabla_x p_{Y|X=x})^{T}\, H_f(p_{Y|X=x})\, \nabla_x p_{Y|X=x}.$$

The Hessian $H_f(p)$ consists of the second derivatives of $f$ evaluated at $p$. When working in 2D geographic space, $g_f(x)$ is a $2 \times 2$ matrix with entries that vary with location $x$. Intuitively, when the scale of segregation is small, the entries of $g_f$ are large, indicating that small changes in geography correspond to large changes in demographics. The metric tensor is deeply connected to classical statistics: when $d_f$ is the Kullback–Leibler divergence, $g_f$ is the Fisher Information matrix for the model $p_{Y|X}$, where $X$ is viewed as a deterministic parameter.

Fig. S3.

Construction of the approximate information manifold $M$. Starting with spatial data (A), we construct an adjacency network $G$ (B), which approximates the topology of a compact submanifold in $\mathbb{R}^2$. The map $\alpha$ embeds $G$ in $P$ (C).

The following theorem relates the local information density to the pullback metric.

Theorem 1.

The local information density $j$ defined by Eq. 5 exists everywhere, and its value at a point $x_0$ is

$$j(x_0) = \frac{1}{4\pi}\,\operatorname{trace}\, g_f(x_0). \tag{6}$$

The trace of $g_f(x)$ is the sum of its diagonal entries. Theorem 1 ensures that the local information density is well-defined and provides a convenient way to compute it that does not depend on limiting operations. The proof of Theorem 1, as well as the required computation of the spatial derivative $\nabla_x p_{Y|X=x}$, is outlined in Supporting Information.
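Theorem 1 can be sanity-checked numerically on a synthetic example. The sketch below assumes a hypothetical two-group field that varies linearly in one coordinate, a uniform population density, and the Kullback–Leibler generator, for which the Hessian of $f$ is diagonal with entries $1/p_y$. It compares the finite-radius quantity of Eq. 5 with the trace formula of Eq. 6, read as trace $g_f$ divided by $4\pi$. The grid resolution and all function names are illustrative choices.

```python
import math

def local_distribution(x1, x2):
    """Hypothetical smooth two-group field p_{Y|X=x}, linear in x1."""
    q = 0.5 + 0.1 * x1
    return (q, 1.0 - q)

def kl(q, r):
    """Kullback-Leibler divergence between two discrete distributions."""
    return sum(a * math.log(a / b) for a, b in zip(q, r) if a > 0)

def density_from_limit(r, n=200):
    """Approximate j_r(0) / (pi r^2) from Eq. 5, assuming uniform population."""
    step = 2 * r / n
    grid = [(-r + (i + 0.5) * step, -r + (j + 0.5) * step)
            for i in range(n) for j in range(n)]
    disc = [(a, b) for a, b in grid if a * a + b * b <= r * r]
    dists = [local_distribution(a, b) for a, b in disc]
    mean = [sum(d[y] for d in dists) / len(dists) for y in range(2)]
    j_r = sum(kl(d, mean) for d in dists) / len(dists)
    return j_r / (math.pi * r * r)

def density_from_trace(x1, x2, h=1e-5):
    """Eq. 6 for the KL generator, using central finite differences."""
    p = local_distribution(x1, x2)
    trace_g = 0.0
    for y in range(2):
        d1 = (local_distribution(x1 + h, x2)[y]
              - local_distribution(x1 - h, x2)[y]) / (2 * h)
        d2 = (local_distribution(x1, x2 + h)[y]
              - local_distribution(x1, x2 - h)[y]) / (2 * h)
        # Diagonal of g_f: (1/2) * |grad p_y|^2 / p_y, summed over groups.
        trace_g += 0.5 * (d1 * d1 + d2 * d2) / p[y]
    return trace_g / (4 * math.pi)

print(density_from_limit(0.1), density_from_trace(0.0, 0.0))
```

For this field, the two quantities should agree up to discretization error, which is consistent with the claim that the trace formula removes the need for limiting operations.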

Fig. 4 shows the local information density computed using Theorem 1 in Detroit and Philadelphia. In each city, the local information density is highest at boundaries between monoracial regions and lowest in areas in which the demographic distribution is constant in space. The local information density thus highlights areas that will tend to be divided by the boundaries drawn by the regionalization methods developed above.

Fig. 4.

Local information density in (A) Detroit and (B) Philadelphia computed using Eq. 6.

When the local information density is high at many spatial locations, this indicates that large amounts of spatial segregation exist at small spatial scales. This occurs in the case of Philadelphia and contrasts with the cases of Detroit and Atlanta. The mean value J(X,Y)≜⟨j(X)⟩ of the local information density thus provides a measure of average spatial scale in each city. Fig. 5 plots this measure against the overall segregation I(X,Y) for a range of selected cities. Cities, such as Detroit, Atlanta, and Chicago, are sharply segregated into megaregions; these concentrate toward the lower right of the plot. Cities in the upper right, such as New York and Philadelphia, are also sharply segregated but at much smaller characteristic spatial scales. These results are directionally aligned with the similarly motivated segregation ratio of ref. 8; however, the segregation ratio does not provide any local information concerning the dependence of spatial scale on location. The local information density is, therefore, to be preferred for detailed study of the structure of individual cities.

Fig. 5.

Mutual information and mean local information density of selected cities.

Discussion

We have developed a suite of methods for studying the structure of spatial segregation using modern information theory and machine learning. These methods advance profile curve approaches by generating curves that do not assume global scale and that support decomposition claims. They advance decomposition methods by constructing nonarbitrary boundaries based on spatial demographic trends, which may be studied directly or used as input to further analysis. The local information density may be averaged and compared between cities, but unlike existing scale measurements, it may also be used for detailed analysis to study how characteristic spatial scales of segregation vary across the region of analysis. In sum, these methods enable systematic, scalable analysis of the local and global structures of spatial segregation.

Although our example has been ethnoracial segregation, these methods generalize in two important ways. First, they are not limited to demographic study; arbitrary compositional data with spatial correlations may be used, such as those that frequently arise in ecology, geology, and geography. Second, they are not limited to categorical variables; the formalism of Bregman information allows the use of these methods for ordinal and partially ordinal variables as well. Many other generalizations and modifications are possible. It may, for example, be of interest to conduct soft regionalization, in which each spatial unit is assigned a fractional membership score for each region. Such analysis may be especially appropriate when distinct spatial regions are separated by soft gradients of demographic change rather than sharp boundaries. In the case of spectral partitioning, this may be done by replacing the hard k-means subroutine with a Gaussian mixture model or by using more advanced methods, such as those in ref. 32.

The primary limitation of these methods is their noninferential character. Although information theory is deeply intertwined with statistics, our methods use no explicit probabilistic model of spatial variation, making unavailable formal inferential procedures, such as model selection. Although the regionalization problem that we solve bears resemblance to the problem of community detection in annotated networks addressed by ref. 33, that framework does not incorporate spatial structure. An approach to segregation that supports both detailed spatial structure and formal inference in the context of segregation would be of considerable interest to both theorists and practitioners.

Materials and Methods

Data Access.

All data used in this study are freely available from the Five-Year Estimates of the 2013 American Community Survey (ACS), Table B03002, and the Longitudinal Tract Database of ref. 30. We used refs. 34 and 35 to programmatically access the 2013 ACS.

Software Repositories.

We are pleased to make available two software repositories accompanying this analysis. The package compx for the R programming language implements greedy and spectral regionalization and computes the information measures $I_f(X,Y)$ and $J(X,Y)$. The package may be accessed at https://github.com/PhilChodrow/compx. The analysis repository for this project is sufficient to fully reproduce the results of this paper. The project files are available at https://github.com/PhilChodrow/spatial_complexity.

Delimiting Cities.

The delimiting of cities based on nonarbitrary population densities or natural boundaries is an area of active discussion in urban planning (36, 37). Our simple approach in this study was to analyze the region composed of all counties in which some or all of the city’s municipal boundaries lie.

Data Visualization.

Ref. 38 was used for all data visualization.

1. Visualization of Demographic Distributions

The text makes heavy use of a map visualization that color-codes tracts according to the group that is most overrepresented there relative to the region-wide distribution. Formally, for each spatial location $x$, the hue corresponds to the group
$$y^*=\operatorname*{argmax}_y\; p_{Y|X}(y|x)\log\frac{p_{Y|X}(y|x)}{p_Y(y)}, \tag{S1}$$
and the saturation corresponds to the value of the maximization.
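As an illustration of Eq. S1, the following minimal Python sketch computes the hue and saturation for a single tract. The function name `overrepresented_group`, the dictionary layout, and the toy distributions are assumptions made for this example and are not taken from the compx package.

```python
import math

def overrepresented_group(p_y_given_x, p_y):
    """Return (y_star, value) maximizing p(y|x) * log(p(y|x) / p(y)), as in Eq. S1."""
    best_group, best_value = None, -math.inf
    for group, p_cond in p_y_given_x.items():
        # Use the convention 0 * log 0 = 0 for groups absent from the tract.
        value = p_cond * math.log(p_cond / p_y[group]) if p_cond > 0 else 0.0
        if value > best_value:
            best_group, best_value = group, value
    return best_group, best_value

# Hypothetical region-wide and tract-level group shares.
regional = {"A": 0.6, "B": 0.3, "C": 0.1}
tract = {"A": 0.2, "B": 0.3, "C": 0.5}

hue, saturation = overrepresented_group(tract, regional)
print(hue, saturation)
```

In this toy tract, group C makes up half of the residents but only a tenth of the region, so it determines the hue.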

2. Generality of Bregman Information

2.1. Smoothing-Based Measures Through Bregman Divergences.

The purpose of this section is to show that many state-of-the-art formulations of evenness and exposure are special cases of the information measures described in this article. We will need a variety of divergence-generating functions; the following theorem proves that they are indeed valid as such. In Table S1, we organize the most commonly used and cited smoothing-based measures of segregation according to their information-theoretic structure.

Theorem 2.

The following functions $f:\mathcal{P}\to\mathbb{R}$ are strictly convex on their domain and continuously differentiable on $\operatorname{int}\mathcal{P}$.

  • Euclidean norm: $f_1(p)=\|p\|^2$.

  • Negative entropy: $f_2(p)=\sum_i p_i\log p_i$.

  • Cumulative entropy: $f_3(p)=(g_3\circ\sigma)(p)$, where $g_3(c)=\sum_i c_i\log c_i$.

  • Cumulative variance: $f_4(p)=(g_4\circ\sigma)(p)$, where $g_4(c)=-4\sum_j c_j(1-c_j)$.

  • Cumulative root variance: $f_5(p)=(g_5\circ\sigma)(p)$, where $g_5(c)=-2\sum_j\sqrt{c_j(1-c_j)}$.

  • Square mean: Assume additionally that the alphabet $\mathcal{Y}$ is an alphabet of integers, and let $f_6(p)=\left(\sum_i p_i y_i\right)^2$.

The function $\sigma:\mathcal{P}\to\mathbb{R}_+^n$ is the cumulative summation map given by $\sigma(p)_k=\sum_{j=1}^k p_j$.

Proof:

Continuous differentiability of each function is clear, since $\sigma$ is a linear map. Strict convexity of $f_1$ and $f_2$ is well-known, and therefore, it remains to prove strict convexity of $f_3$–$f_6$.

  • Cumulative entropy: The map $\sigma$ is linear, and $f_3$ is, therefore, strictly convex if $g_3$ is strictly convex on $\mathbb{R}_+^n$. We note that
$$\frac{\partial^2 g_3(c)}{\partial c_i\,\partial c_j}=\begin{cases}\dfrac{1}{c_i}, & i=j\\ 0, & \text{otherwise.}\end{cases}$$
    The Hessian $H_{g_3}(c)$ is, therefore, diagonal with positive entries everywhere in $\mathbb{R}_+^n$, and it is positive definite as required.

  • Cumulative variance: Strict convexity is unaffected by linear terms. Since $g_4(c)+4e^Tc=4\|c\|^2$, where $e$ is the vector of ones, is strictly convex on $\mathbb{R}_+^n$, so is $g_4$.

  • Cumulative root variance: We have
$$\frac{\partial^2 g_5(c)}{\partial c_i\,\partial c_j}=\begin{cases}\dfrac{1}{2\left(c_i(1-c_i)\right)^{3/2}}, & i=j\\ 0, & \text{otherwise.}\end{cases}$$
    As with $g_3$, the Hessian $H_{g_5}(c)$ is, therefore, diagonal with positive entries wherever $0 < c_i < 1$, and it is positive definite.

  • Square mean: The map $p\mapsto\sum_i p_i y_i$ is linear, and the squaring operation is strictly convex.
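To make the role of a divergence-generating function concrete, the sketch below evaluates the standard Bregman divergence $d_f(p,q)=f(p)-f(q)-\langle\nabla f(q),p-q\rangle$ for the generators $f_1$ and $f_2$ of Theorem 2. The helper names and the two example distributions are hypothetical; the gradients are written out by hand rather than taken from any library.

```python
import math

def bregman_divergence(f, grad_f, p, q):
    """d_f(p, q) = f(p) - f(q) - <grad f(q), p - q>."""
    inner = sum(g * (pi - qi) for g, pi, qi in zip(grad_f(q), p, q))
    return f(p) - f(q) - inner

def f1(p):
    """Squared Euclidean norm."""
    return sum(x * x for x in p)

def grad_f1(p):
    return [2 * x for x in p]

def f2(p):
    """Negative entropy (entries assumed strictly positive)."""
    return sum(x * math.log(x) for x in p)

def grad_f2(p):
    return [math.log(x) + 1 for x in p]

# Two hypothetical demographic distributions over three groups.
p = [0.5, 0.3, 0.2]
q = [0.2, 0.3, 0.5]

squared_distance = bregman_divergence(f1, grad_f1, p, q)  # equals ||p - q||^2
kl_divergence = bregman_divergence(f2, grad_f2, p, q)     # equals KL(p || q) on the simplex
print(squared_distance, kl_divergence)
```

For $f_1$ the divergence reduces to the squared Euclidean distance, and for $f_2$ it reduces to the Kullback–Leibler divergence when both arguments sum to one.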

Corollary 1.

The spatial exposure index between group $m$ and group $n$ is defined by ref. 6 as
$$P_{mn}=\int p(m|x)\,p(n|x)\,d\mathbb{P}_X(x).$$
The spatial exposure index satisfies
$$\sum_{m\neq n}P_{mn}=1+H(Y|X),$$
where the divergence-generating function is the Euclidean norm $f_1(p)=\|p\|^2$.

Corollary 2.

The Divergence Index $D$ of ref. 26 satisfies
$$D=I(X,Y),$$
where the divergence-generating function is the negative entropy $f_2(p)=\sum_i p_i\log p_i$.

Corollary 3.

The Information Theory Index $\tilde{H}$ of refs. 6, 17, and 28 satisfies
$$\tilde{H}=\frac{I(X,Y)}{H(Y)},$$
where the divergence-generating function is the negative entropy $f_2(p)=\sum_i p_i\log p_i$.

Proof:

Equation 7 of ref. 6, for example, defines $\tilde{H}$ as
$$\tilde{H}=\frac{H(Y)-H(Y|X)}{H(Y)},$$
from which the result follows by the standard identity $I(X,Y)=H(Y)-H(Y|X)$.
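The identity in Corollary 3 can be checked numerically on aspatial toy data. The following sketch computes $\tilde{H}=(H(Y)-H(Y|X))/H(Y)$ from a table of counts; the input layout (`counts[i][j]` is the number of residents of group `j` in unit `i`) and the function names are assumptions for illustration, and no spatial smoothing is applied.

```python
import math

def entropy(p):
    """Shannon entropy in nats, with the convention 0 * log 0 = 0."""
    return -sum(x * math.log(x) for x in p if x > 0)

def information_theory_index(counts):
    """Compute (H(Y) - H(Y|X)) / H(Y) from per-unit group counts."""
    total = sum(sum(row) for row in counts)
    n_groups = len(counts[0])
    p_y = [sum(row[j] for row in counts) / total for j in range(n_groups)]
    h_y = entropy(p_y)
    # H(Y|X): population-weighted average of within-unit entropies.
    h_y_given_x = sum(
        (sum(row) / total) * entropy([c / sum(row) for c in row])
        for row in counts
        if sum(row) > 0
    )
    mutual_information = h_y - h_y_given_x
    return mutual_information / h_y

fully_segregated = information_theory_index([[100, 0], [0, 100]])
fully_integrated = information_theory_index([[50, 50], [50, 50]])
print(fully_segregated, fully_integrated)
```

Two units that each contain a single group give an index of 1, and two units with identical composition give an index of 0.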

Corollary 4.

The Ordinal Information Theory Index $H_O$, the Ordinal Variation Ratio $R_O$, and the Ordinal Square Root Index $S_O$ of ref. 28 are all of the form
$$\frac{I(X,Y)}{H(Y)},$$
where the divergence-generating functions are $f_3$, $f_4$, and $f_5$, respectively.

Corollary 5.

Let $N$ be a random variable with $H(N|X)=0$; that is, $N$ is completely determined by $X$. The Generalized Neighborhood Sorting Index $G$ of ref. 27 satisfies
$$G=\frac{\tilde{I}(N,Y)}{I(X,Y)},$$
where the divergence-generating function is $f_6$ and where $\tilde{I}(N,Y)$ is computed with respect to the spatially smoothed distribution $\Phi(p)$, in which $\Phi$ is the uniform ego-network smoother of order $n$.

Proof:

Since we have already proved the convexity and continuous differentiability of $f_6$, the only task required is to cast equation 4 of ref. 27 in the claimed form. The numerator of this equation may be written
$$\int d_f(\tilde{p}(\cdot|n),\bar{p})\,d\mathbb{P}_N(n),$$
and the denominator may be written
$$\int d_f(\tilde{p}(\cdot|x),\bar{p})\,d\mathbb{P}_X(x),$$
as required.

3. Algorithms

Let $c:\mathcal{X}\to\{1,\ldots,k\}$ be a function that assigns to each location $x$ a cluster label $c(x)$. We regard $C=c(X)$ as a random variable and aim to choose $c$ such that the aggregation that it induces captures segregation at large spatial scales. The Chain Rule of Bregman information (29) offers a decomposition of the form
$$I_f(X,Y)=I_f(C,Y)+I_f(X,Y|C). \tag{S2}$$
The term $I_f(C,Y)$ gives the segregation captured at the aggregate spatial scale, and $I_f(X,Y|C)$ is the residual segregation at lower scales. A good labeling function will tend to make the first term large. This motivates the following problem:
$$c^*=\operatorname*{argmax}_c\; I_f(c(X),Y)\quad\text{subject to spatial constraints.} \tag{S3}$$

We consider two approaches to this problem. Agglomerative hierarchical clustering directly addresses this problem using a stage-wise greedy approach, while spectral partitioning solves a related, first-order approximation to this problem in an approximate but global fashion.
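The chain-rule decomposition of Eq. S2 can be verified directly for the Kullback–Leibler case. The sketch below is a minimal check on hypothetical counts for four spatial units grouped into two clusters; the function names and data are illustrative assumptions, and every unit is assumed to have a positive population.

```python
import math

def kl(p, q):
    """Kullback-Leibler divergence with the convention 0 * log 0 = 0."""
    return sum(a * math.log(a / b) for a, b in zip(p, q) if a > 0)

def mutual_information(counts):
    """I(X,Y) = sum_x p(x) KL(p(.|x) || p_Y) for a list of per-unit count rows."""
    total = sum(map(sum, counts))
    p_y = [sum(col) / total for col in zip(*counts)]
    return sum(
        (sum(row) / total) * kl([c / sum(row) for c in row], p_y)
        for row in counts
    )

def decompose(counts, labels):
    """Return (I(C,Y), I(X,Y|C)) for the cluster labels c(x)."""
    total = sum(map(sum, counts))
    clusters = {}
    for row, label in zip(counts, labels):
        clusters.setdefault(label, []).append(row)
    # Between-cluster term: information after aggregating each cluster to one unit.
    aggregated = [[sum(col) for col in zip(*rows)] for rows in clusters.values()]
    between = mutual_information(aggregated)
    # Within-cluster term: population-weighted information inside each cluster.
    within = sum(
        (sum(map(sum, rows)) / total) * mutual_information(rows)
        for rows in clusters.values()
    )
    return between, within

counts = [[90, 10], [80, 20], [30, 70], [10, 90]]  # hypothetical tract counts
labels = [0, 0, 1, 1]                               # hypothetical labeling c(x)
between, within = decompose(counts, labels)
print(mutual_information(counts), between + within)
```

Because the labeling groups demographically similar units, most of the information sits in the between-cluster term, which is the quantity that Eq. S3 seeks to maximize.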

3.1. Agglomerative Partitioning.

Given a Bregman divergence $d_f$, the Jensen–Bregman divergence (39) between locations is defined by the formula
$$d_{jb}(x_1,x_2)=p_X(x_1)\,d_f\!\left(p_{Y|X=x_1},\,p_{Y|X\in\{x_1,x_2\}}\right)+p_X(x_2)\,d_f\!\left(p_{Y|X=x_2},\,p_{Y|X\in\{x_1,x_2\}}\right).$$
The Jensen–Bregman divergence $d_{jb}(x_1,x_2)$ measures the Bregman information loss associated with merging spatial units $x_1$ and $x_2$ into a single unit and combining their demographics. When the divergence is small, $x_1$ and $x_2$ have similar demographic characteristics, and we may cease to distinguish them with only small information loss. Greedy regionalization proceeds stage-wise. At each stage, we identify two adjacent spatial units that can be merged with minimal information loss; formally, we solve
$$(x_1^*,x_2^*)=\operatorname*{argmin}_{(x_1,x_2)\in E}\; d_{jb}(x_1,x_2). \tag{S4}$$
We then merge $x_1^*$ and $x_2^*$, repeating until only the desired number of spatial units remains. This procedure is formalized in Algorithm 1.

Algorithm 1: Agglomerative partitioning

function AGGLOMPARTITION($R$, $k$)
    while $|R|>k$ do
        $(x_1^*,x_2^*)\leftarrow\operatorname*{argmin}_{(x_1,x_2)\in E}\; d_{jb}(x_1,x_2)$
        aggregate($x_1^*$, $x_2^*$)
    end while
    return $R$
end function
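A minimal Python sketch of Algorithm 1 follows, assuming the Kullback–Leibler divergence as $d_f$, integer unit identifiers, positive populations, and an explicit list of adjacent pairs. The data structures and names are choices made for this example and do not reproduce the compx implementation.

```python
import math

def kl(p, q):
    return sum(a * math.log(a / b) for a, b in zip(p, q) if a > 0)

def jensen_bregman(c1, c2, total):
    """d_jb for KL: population-weighted divergence of each unit from the merged unit."""
    merged = [a + b for a, b in zip(c1, c2)]
    n1, n2, n12 = sum(c1), sum(c2), sum(merged)
    q = [m / n12 for m in merged]
    return ((n1 / total) * kl([a / n1 for a in c1], q)
            + (n2 / total) * kl([b / n2 for b in c2], q))

def agglom_partition(counts, edges, k):
    """Greedy regionalization. counts: {unit: [group counts]}; edges: adjacent unit pairs."""
    total = sum(sum(c) for c in counts.values())
    regions = {u: {"counts": list(c), "members": {u}} for u, c in counts.items()}
    adjacency = {u: set() for u in counts}
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)
    while len(regions) > k:
        candidates = [(u, v) for u in adjacency for v in adjacency[u] if u < v]
        if not candidates:
            break  # no adjacent pairs left, so no further merges are possible
        u, v = min(candidates, key=lambda e: jensen_bregman(
            regions[e[0]]["counts"], regions[e[1]]["counts"], total))
        # aggregate(u, v): fold v into u and rewire v's neighbors to u.
        regions[u]["counts"] = [a + b for a, b in
                                zip(regions[u]["counts"], regions[v]["counts"])]
        regions[u]["members"] |= regions[v]["members"]
        for w in adjacency.pop(v):
            adjacency[w].discard(v)
            if w != u:
                adjacency[w].add(u)
                adjacency[u].add(w)
        del regions[v]
    return [sorted(r["members"]) for r in regions.values()]

# Hypothetical path of four tracts: 0 - 1 - 2 - 3.
counts = {0: [90, 10], 1: [80, 20], 2: [30, 70], 3: [10, 90]}
partition = agglom_partition(counts, [(0, 1), (1, 2), (2, 3)], k=2)
print(partition)
```

This version recomputes every candidate divergence at each stage for clarity; a priority queue over edges would be the natural refinement for large regions.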

3.2. Spectral Partitioning.

Spectral graph partitioning according to ref. 40 operates on an undirected graph $G$ with weights $w_{ij}$ between nodes $i$ and $j$. It seeks cuts in $G$ such that the subsets defined by the cuts have strong connections within themselves but only weak ones between them. Formally, it aims to approximately solve the normalized cut problem:
$$c^*=\operatorname*{argmin}_{c:N(G)\to\{1,\ldots,k\}}\;\sum_\ell\frac{\operatorname{cut}(c,\ell)}{\operatorname{vol}(c,\ell)}, \tag{S5}$$
where $\operatorname{cut}(c,\ell)=\sum_{c(i)=\ell,\,c(j)\neq\ell}w_{ij}$ and $\operatorname{vol}(c,\ell)=\sum_{c(i)=\ell,\,j}w_{ij}$. To use spectral partitioning, we construct the entries of the weight matrix $W$ in terms of the Jensen–Bregman divergence:
$$w_{ij}=\begin{cases}\exp\left[-\dfrac{d_{jb}(x_i,x_j)}{2\sigma}\right], & (i,j)\in E\\ 0, & \text{otherwise.}\end{cases} \tag{S6}$$
To approximately solve Eq. S5, we form the “random walk” normalized Laplacian (41) given by
$$L=D^{-1}(D-W), \tag{S7}$$
where $D=\operatorname{diag}(We)$ is the diagonal matrix of row sums of $W$. The eigenvectors of $L$ then form a feature space, in which each additional dimension contains increasingly detailed information about the cut structure of $G$. Performing k-means clustering on the $k$ eigenvectors corresponding to the $k$ smallest eigenvalues then yields the partition. This procedure is formalized in Algorithm 2.

When $d_f$ is the Kullback–Leibler divergence, $I_f(X,Y)$ is the Shannon information between $X$ and $Y$. In this case, the weight matrix $W$, the hyperparameter $\sigma$, and the normalized cut problem (Eq. S5) all have attractive probabilistic interpretations. The Chernoff bound implies that $w_{ij}$ is asymptotically equal to the probability of randomly selecting $p_X(x_i)/(2\sigma)$ residents from areal unit $x_i$ with empirical distribution equal to $p_{Y|X\in\{x_i,x_j\}}$ and then similarly selecting $p_X(x_j)/(2\sigma)$ residents from areal unit $x_j$ with empirical distribution equal to $p_{Y|X\in\{x_i,x_j\}}$. This probability is small when $x_i$ and $x_j$ have very different demographic distributions. The parameter $\sigma$ may be viewed as a regularizer; when it is large, the “effective populations” of each spatial unit are small, and the weights $w_{ij}$, therefore, tend to be more uniform. The expressions $\operatorname{cut}(c,\ell)$ and $\operatorname{vol}(c,\ell)$ are then interpretable as first-order approximations to the probability of confusing two units along the boundary of region $\ell$ and confusing two units within region $\ell$, respectively.

Algorithm 2: Spectral partitioning

function SPECTRALPARTITION($R$, $\sigma$, $k$)
    for $(x_i,x_j)\in E$ do
        $W_{ij}\leftarrow\exp[-d_{jb}(x_i,x_j)/2\sigma]$
    end for
    $D\leftarrow\operatorname{diag}(We)$
    $L\leftarrow D^{-1}(D-W)$
    $V\leftarrow[v_1,\ldots,v_k]$    ▷ $v_\ell$ the $\ell$th eigenvector of $L$
    return kmeans($V$, $k$)
end function
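The following NumPy sketch mirrors Algorithm 2 under several stated assumptions: $d_f$ is the Kullback–Leibler divergence, every unit has at least one neighbor, and a small hand-written Lloyd iteration with farthest-point initialization stands in for the k-means subroutine. The eigenvectors of the random-walk Laplacian are obtained from the symmetric normalized Laplacian, a standard equivalence described in ref. 41. All names, the toy counts, and the value of `sigma` are hypothetical.

```python
import numpy as np

def kl(p, q):
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

def jensen_bregman(c1, c2, total):
    """d_jb for KL between two units given their group counts."""
    merged = c1 + c2
    q = merged / merged.sum()
    return ((c1.sum() / total) * kl(c1 / c1.sum(), q)
            + (c2.sum() / total) * kl(c2 / c2.sum(), q))

def kmeans(points, k, n_iter=50):
    """Plain Lloyd iterations with deterministic farthest-point initialization."""
    centers = [points[0]]
    for _ in range(1, k):
        dists = np.min([np.sum((points - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(points[np.argmax(dists)])
    centers = np.array(centers)
    for _ in range(n_iter):
        sq_dists = ((points[:, None, :] - centers[None]) ** 2).sum(axis=2)
        labels = np.argmin(sq_dists, axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = points[labels == j].mean(axis=0)
    return labels

def spectral_partition(counts, edges, sigma, k):
    counts = np.asarray(counts, dtype=float)
    n, total = len(counts), counts.sum()
    W = np.zeros((n, n))
    for i, j in edges:
        W[i, j] = W[j, i] = np.exp(
            -jensen_bregman(counts[i], counts[j], total) / (2 * sigma))
    d = W.sum(axis=1)  # row sums; assumed positive for every unit
    # Eigenvectors of D^{-1}(D - W) equal D^{-1/2} u, where u are eigenvectors
    # of the symmetric Laplacian I - D^{-1/2} W D^{-1/2}.
    L_sym = np.eye(n) - W / np.sqrt(np.outer(d, d))
    _, U = np.linalg.eigh(L_sym)  # eigenvalues returned in ascending order
    V = U[:, :k] / np.sqrt(d)[:, None]
    return kmeans(V, k)

# Hypothetical path of six tracts with a sharp demographic boundary in the middle.
counts = [[90, 10], [85, 15], [88, 12], [10, 90], [15, 85], [12, 88]]
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)]
labels = spectral_partition(counts, edges, sigma=0.01, k=2)
print(labels)
```

Working with the symmetric Laplacian allows the use of `np.linalg.eigh`, which guarantees real eigenvalues in sorted order; forming $D^{-1}(D-W)$ directly would require a general eigensolver.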

4. Sample Eigenvectors for Detroit

Spectral partitioning inspects the eigenvectors of the normalized Laplacian L to find regional structure. We show in Fig. S1 a sampling of eigenvectors in Detroit to illustrate the information that they contain. The k-means algorithm in the eigenspace is used to optimally aggregate this information into a final regionalization (42).

5. Spectral Partitioning Without Hierarchical Postprocessing

In the text, we show the results of spectral partitioning with hierarchical postprocessing to generate final regionalizations for Detroit, Chicago, and Philadelphia. For illustrative purposes, we show in Fig. S2 the intermediate stage of spectral partitioning using $\sigma=30$. The value of $k$ may be chosen by inspecting the spectrum of the normalized Laplacian; additional details are in ref. 41. Fig. S2A may be usefully contrasted with Fig. 1.

6. Manifold View of Local Scale

6.1. The Information Manifold and the Metric Tensor.

A geometric view of the metric tensor $g$ may be obtained from a manifold view of spatial compositional data analysis. Fig. S3 illustrates this framework in a selected subregion of Detroit using just three ethnoracial categories for visualization purposes. Fig. S3A shows the spatial units as provided by the US Census dataset. In Fig. S3B, we construct the adjacency network of spatial units and explicitly show the demographic composition of each one. By mapping each spatial unit to its demographic distribution, we obtain a set of points on the probability simplex $\mathcal{P}$, with edges between geographically adjacent units (Fig. S3C). We view this construction as approximating a smooth submanifold $M$ of $\mathcal{P}$. Distances between points in this space are measured according to the metric tensor $g$, which is fully determined by the Bregman divergence $d_f$ as discussed in the text. The role of the metric tensor is to provide a method of translating between geographic distances and demographic distances. This underlies its role as a local scale measure: when one can travel in geographic space without demographic change, spatial variation, if present, must occur locally on larger scales. In particular, the geodesic distance between points on $M$ is defined through the shortest path along $M$ between those points under $g$:
$$\delta(x_1,x_2)=\min_{\gamma\in C_M(x_1,x_2)}\int_0^1 g(\gamma'(t),\gamma'(t))\,dt. \tag{S8}$$
Here, $C_M(x_1,x_2)$ is the set of unit-speed curves $\gamma$ along $M$ such that $\gamma(0)=x_1$ and $\gamma(1)=x_2$. We may interpret $\delta(x_1,x_2)$ as the minimal amount of demographic change one would undergo in traveling from location $x_1$ to location $x_2$.

6.2. Proof of Theorem 1.

We now present a proof of the fully general statement of Theorem 1. Define the local information $j(x_0)$ as
$$j(x_0)\triangleq\lim_{r\to0}\frac{I(X,Y|X\in B_r(x_0))}{r^2}, \tag{S9}$$
where $B_r(x_0)=\{x\in R\mid\|x-x_0\|^2\le r^2\}$. For simplicity, fix $x_0$, and let $B_r=B_r(x_0)$. In this section, we will prove the following theorem.

Theorem.

Let $R\subset\mathbb{R}^n$ be compact and of nonzero measure under a finite measure $\mu$ that is absolutely continuous with respect to the Lebesgue measure $\lambda^n$. Assume that the map $\alpha:x\mapsto p_{Y|X=x}$ is smooth. Then, the local information $j(x)$ exists at all $x\in\operatorname{int}(R)$ and satisfies
$$j(x)=\frac{1}{2}\,\frac{1}{n+2}\operatorname{tr}g_x. \tag{S10}$$

The proof proceeds essentially by direct calculation; to structure the calculation in a coherent way, we divide it into a series of lemmas.

Recall that, by hypothesis, $\mathbb{P}_X$ is absolutely continuous with respect to the Lebesgue measure on $\mathbb{R}^n$ and, therefore, has a Radon–Nikodym derivative (probability density function) $p_X$, so that
$$\int f\,d\mathbb{P}_X=\int f\,p_X\,d\lambda$$
for any $f$. We also assume that $p_X$ is smooth and will proceed to take derivatives and Taylor expansions accordingly.

Let $V_n(r)$ be the volume of the $n$-ball of radius $r$ and $S_{n-1}(r)$ be the surface area of the $(n-1)$-sphere of radius $r$. We will abbreviate $V_n=V_n(1)$ and $S_{n-1}=S_{n-1}(1)$. We have $V_n(r)=r^nV_n$ and $S_{n-1}(r)=r^{n-1}S_{n-1}$.

We use the standard symbol $o(f(r))$ to denote terms satisfying
$$\lim_{r\to0}\frac{o(f(r))}{f(r)}=0.$$

Lemma 1.

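We have
$$\mathbb{P}(X\in B_r)=p_X(x_0)V_n(r)+o(r^{n+1}).$$

Proof:

We compute
$$\begin{aligned}\mathbb{P}(X\in B_r)&=\int_{B_r}d\mathbb{P}_X=\int_{B_r}p_X(x)\,d\lambda(x)\\&=\int_{B_r}\left[p_X(x_0)+Dp_X(x_0)(x-x_0)+o(\|x-x_0\|)\right]d\lambda(x).\end{aligned}$$
The middle term in the integral vanishes via spherical symmetry, and we obtain
$$\begin{aligned}\mathbb{P}(X\in B_r)&=\int_{B_r}\left[p_X(x_0)+o(\|x-x_0\|)\right]d\lambda(x)\\&=p_X(x_0)V_n(r)+\int_{B_r}o(\|x-x_0\|)\,d\lambda(x)\\&=p_X(x_0)V_n(r)+\int_{B_r}o(r)\,d\lambda(x)\\&=p_X(x_0)V_n(r)+o(r^{n+1}),\end{aligned}$$
as was to be shown.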

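For notational convenience, let $\alpha(x)=p(\cdot|x)$ and $a=p(\cdot|X\in B_r)$.

Lemma 2.

We have
$$a=\alpha(x_0)+o(r).$$

Proof:

We compute
$$\begin{aligned}a&=\mathbb{E}[\alpha(X)|X\in B_r]=\int\alpha(x)\,d\mathbb{P}_{X|X\in B_r}(x)\\&=\int\alpha(x)\,p(x|X\in B_r)\,d\lambda(x)=\frac{1}{\mathbb{P}(X\in B_r)}\int_{B_r}\alpha(x)\,p(x)\,d\lambda(x).\end{aligned}$$
Since $\alpha(x)$ and $p(x)$ are both smooth, their product is as well, and we may Taylor expand about $x=x_0$ to obtain
$$a=\frac{1}{\mathbb{P}(X\in B_r)}\int_{B_r}\left[\alpha(x_0)p(x_0)+T(x-x_0)+o(\|x-x_0\|)\right]d\lambda(x),$$
where $T$ stands for a linear map that we need not calculate, since this term vanishes in the integral through spherical symmetry. Thus,
$$\begin{aligned}a&=\frac{1}{\mathbb{P}(X\in B_r)}\int_{B_r}\left[\alpha(x_0)p(x_0)+o(\|x-x_0\|)\right]d\lambda(x)\\&=\frac{\alpha(x_0)p(x_0)V_n(r)+o(r^{n+1})}{\mathbb{P}(X\in B_r)}\\&=\frac{\alpha(x_0)p(x_0)V_n(r)+o(r^{n+1})}{p(x_0)V_n(r)+o(r^{n+1})}\qquad\text{(Lemma 1)}\\&=\alpha(x_0)+o(r),\end{aligned}$$
as was to be shown.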

Lemma 3.

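For any $x\in B_r$,
$$d_f(p(\cdot|x),p(\cdot|X\in B_r))=d_f(p(\cdot|x),p(\cdot|x_0))+o(r^2). \tag{S11}$$

Proof:

The proof proceeds by exploiting the local quadratic structure of the Bregman divergence $d_f$:
$$d_f(q+\delta,q)=\tfrac{1}{2}H_f^q(\delta,\delta)+o(\|\delta\|^2). \tag{S12}$$
Using the notation $\alpha(x)=p(\cdot|x)$ and $a=p(\cdot|X\in B_r)$ introduced before Lemma 2, Eq. S12 gives
$$d_f(p(\cdot|x),p(\cdot|X\in B_r))=d_f(\alpha(x),a)=\tfrac{1}{2}H_f^a(\alpha(x)-a,\alpha(x)-a)+o(\|\alpha(x)-a\|^2).$$
Since $p(y|x)$ is smooth as a function of $x$, so is $\alpha(x)$, and the final term is, therefore, $o(r^2)$. Rearranging terms, we can write
$$d_f(p(\cdot|x),p(\cdot|X\in B_r))=\tfrac{1}{2}H_f^{\alpha(x)}(\alpha(x)-a,\alpha(x)-a)+\tfrac{1}{2}\left(H_f^a-H_f^{\alpha(x)}\right)(\alpha(x)-a,\alpha(x)-a)+o(\|\alpha(x)-a\|^2).$$
Since $f$ is smooth, the components of the tensor $H_f^a-H_f^{\alpha(x)}$ are $O(r)$. Furthermore, by Lemma 2, $\alpha(x)-a=\alpha(x)-\alpha(x_0)+o(r)=O(r)+o(r)$. The entire second term is, therefore, $O(r^3)=o(r^2)$. Turning to the first term, we have
$$\begin{aligned}\tfrac{1}{2}H_f^{\alpha(x)}(\alpha(x)-a,\alpha(x)-a)&=\tfrac{1}{2}H_f^{\alpha(x)}(\alpha(x)-\alpha(x_0)+o(r),\alpha(x)-\alpha(x_0)+o(r))\\&=\tfrac{1}{2}H_f^{\alpha(x)}(\alpha(x)-\alpha(x_0),\alpha(x)-\alpha(x_0))+o(r^2)\\&=d_f(\alpha(x),\alpha(x_0))+o(r^2),\end{aligned}$$
where in the second line, we have used the fact that $\alpha(x)-\alpha(x_0)=O(r)$. This completes the proof.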

Lemma 4.

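Let $T$ be a real, symmetric bilinear form and $\Delta_2$ be the diagonal operator $v\mapsto(v,v)$. Then,
$$\int_{B_r(0)}T\circ\Delta_2\,d\lambda=\frac{r^{n+2}S_{n-1}}{n(n+2)}\operatorname{tr}(T).$$

Proof:

By the spectral theorem, there exists an orthonormal basis in which the matrix $A$ of $T$ is diagonal, and its entries are the eigenvalues $\{\lambda_i\}$ of $T$. Since $B_r(0)$ is radially symmetric about the origin, we may integrate in this basis instead, obtaining
$$\int_{B_r(0)}T\circ\Delta_2\,d\lambda=\int_{B_r(0)}v^TAv\,d\lambda(v).$$
Since $A$ is diagonal,
$$\int_{B_r(0)}v^TAv\,d\lambda(v)=\int_{B_r(0)}\sum_i\lambda_iv_i^2\,d\lambda(v)=\sum_i\lambda_i\int_{B_r(0)}v_i^2\,d\lambda.$$
By spherical symmetry, the integrals inside the sum are all equal to $\frac{1}{n}\int_{B_r(0)}\|v\|^2\,d\lambda(v)$, and we obtain
$$\begin{aligned}\sum_i\lambda_i\int_{B_r(0)}v_i^2\,d\lambda&=\frac{\sum_i\lambda_i}{n}\int_{B_r(0)}\|v\|^2\,d\lambda(v)\\&=\frac{\operatorname{tr}(T)}{n}\int_{B_r(0)}\|v\|^2\,d\lambda(v)\\&=\frac{\operatorname{tr}(T)}{n}\int_0^r\rho^2S_{n-1}(\rho)\,d\rho\qquad\text{(polar coordinate transform)}\\&=\frac{\operatorname{tr}(T)\,S_{n-1}}{n}\int_0^r\rho^{n+1}\,d\rho\\&=\frac{S_{n-1}r^{n+2}}{n(n+2)}\operatorname{tr}(T),\end{aligned}$$
as was to be shown.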

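We are now prepared to prove the theorem. We have
$$\begin{aligned}I(X,Y|X\in B_r)&=\int_{B_r}d_f(p(\cdot|x),p(\cdot|X\in B_r))\,d\mathbb{P}_{X|X\in B_r}(x)\\&=\int_{B_r}d_f(\alpha(x),a)\,d\mathbb{P}_{X|X\in B_r}(x)\\&=\int_{B_r}d_f(\alpha(x),a)\,p(x|X\in B_r)\,d\lambda(x)\\&=\frac{1}{\mathbb{P}(X\in B_r)}\int_{B_r}d_f(\alpha(x),a)\,p(x)\,d\lambda(x)\\&=\frac{1}{p(x_0)V_n(r)+o(r^{n+1})}\int_{B_r}d_f(\alpha(x),a)\,p(x)\,d\lambda(x).\qquad\text{(Lemma 1)}\end{aligned}$$
Focusing now on the integral, Lemma 3 and Eq. S12 give
$$\begin{aligned}\int_{B_r}d_f(\alpha(x),a)\,p(x)\,d\lambda(x)&=\int_{B_r}\left[d_f(\alpha(x),\alpha(x_0))+o(r^2)\right]p(x)\,d\lambda(x)\\&=\frac{1}{2}\int_{B_r}\left[H_f^{\alpha(x_0)}(\alpha(x)-\alpha(x_0),\alpha(x)-\alpha(x_0))+o(r^2)\right]p(x)\,d\lambda(x).\end{aligned}$$
Since $\alpha(x)-\alpha(x_0)=D\alpha_{x_0}(x-x_0)+o(r)$, we can rewrite the integrand, obtaining
$$\int_{B_r}d_f(\alpha(x),a)\,p(x)\,d\lambda(x)=\frac{1}{2}\int_{B_r}\left[\left(H_f^{\alpha(x_0)}\circ\Delta_2\circ D\alpha_{x_0}\right)(x-x_0)+o(r^2)\right]p(x)\,d\lambda(x).$$
Since $p(x)$ is smooth, $p(x)-p(x_0)=O(r)$, and we can further rewrite the integral as
$$\begin{aligned}\int_{B_r}d_f(\alpha(x),a)\,p(x)\,d\lambda(x)&=\frac{1}{2}\int_{B_r}\left[\left(H_f^{\alpha(x_0)}\circ\Delta_2\circ D\alpha_{x_0}\right)(x-x_0)+o(r^2)\right]p(x_0)\,d\lambda(x)+o(r^{n+2})\\&=\frac{p(x_0)}{2}\int_{B_r}\left(H_f^{\alpha(x_0)}\circ\Delta_2\circ D\alpha_{x_0}\right)(x-x_0)\,d\lambda(x)+o(r^{n+2})\\&=\frac{p(x_0)}{2}\int_{B_r(0)}\left(H_f^{\alpha(x_0)}\circ\Delta_2\circ D\alpha_{x_0}\right)d\lambda+o(r^{n+2}).\end{aligned}$$
By Lemma 4,
$$\int_{B_r(0)}\left(H_f^{\alpha(x_0)}\circ\Delta_2\circ D\alpha_{x_0}\right)d\lambda=\frac{S_{n-1}r^{n+2}}{n(n+2)}\operatorname{tr}\left(H_f^{\alpha(x_0)}\circ\Delta_2\circ D\alpha_{x_0}\right)=\frac{S_{n-1}r^{n+2}}{n(n+2)}\operatorname{tr}(g_{x_0}).$$
Returning now to the main calculation, we have
$$I(X,Y|X\in B_r)=\frac{1}{p(x_0)V_n(r)+o(r^{n+1})}\left[p(x_0)\,\frac{1}{2}\,\frac{S_{n-1}r^{n+2}}{n(n+2)}\operatorname{tr}(g_{x_0})+o(r^{n+2})\right].$$
We can cancel terms using the fact that $S_{n-1}/V_n=n$, and we obtain
$$I(X,Y|X\in B_r)=\frac{1}{2}\,\frac{r^2}{n+2}\operatorname{tr}(g_{x_0})+o(r^2).$$
Dividing through by $r^2$ and taking the limit as $r\to0$ proves the theorem.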
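The limit in Eq. S10 can be checked numerically in the simplest setting. The sketch below assumes one spatial dimension ($n=1$), a uniform population density, two groups, the Kullback–Leibler divergence, and a hypothetical linear attribute map $s(x)=0.3+0.4x$ giving the share of the first group. In that case $\operatorname{tr}g_x=s'(x)^2/[s(x)(1-s(x))]$, and the theorem predicts $j(x_0)=\operatorname{tr}g_{x_0}/6$.

```python
import math

def s(x):
    """Hypothetical share of group 1 at location x (linear in x)."""
    return 0.3 + 0.4 * x

def kl2(p, q):
    """KL divergence between the two-group distributions (p, 1-p) and (q, 1-q)."""
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def local_information_ratio(x0, r, m=2000):
    """I(X, Y | X in B_r(x0)) / r^2 under a uniform density, by midpoint quadrature."""
    xs = [x0 - r + (i + 0.5) * (2 * r / m) for i in range(m)]
    a = sum(s(x) for x in xs) / m                # aggregate composition on B_r
    info = sum(kl2(s(x), a) for x in xs) / m     # mean divergence from the aggregate
    return info / r ** 2

x0, n = 0.5, 1
trace_g = 0.4 ** 2 / (s(x0) * (1 - s(x0)))       # tr g_x for KL with slope s'(x) = 0.4
predicted = 0.5 * trace_g / (n + 2)              # right-hand side of Eq. S10
estimate = local_information_ratio(x0, r=1e-2)
print(predicted, estimate)
```

With $r=10^{-2}$, the quadrature estimate should agree with the predicted value to well under one part in a thousand, consistent with the $o(r^2)$ remainder in the proof.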

6.3. Numerical Computation of Spatial Derivatives.

Computation of the metric tensor $g_x$ requires the estimation of the derivative $D\alpha$ of the attribute map at $x$ as well as the computation of the Hessian tensor $H_f$ at $x$. For most common choices of $f$, the Hessian may be computed analytically, and therefore, we focus on the computation of the derivative $D\alpha$. Since the direct computation of difference quotients typically leads to numerical instability, we instead use a more robust method based on weighted linear regression. The fundamental idea is to regress the attribute differences $\alpha(x_i)-\alpha(x)$ on the geographic displacements $x_i-x$; the regression coefficients will then approximate the components of the derivative $D\alpha$. Let $E(x)$ denote the ego network of node $x$ in the geographic network of Fig. S2B. Our approximation formula is
$$D\alpha_x\approx(\mathbf{X}^T\mathbf{W}\mathbf{X})^{-1}\mathbf{X}^T\mathbf{W}\mathbf{Y}_x,$$
where $\mathbf{X}$, $\mathbf{W}$, and $\mathbf{Y}_x$ are defined below.

  • $\mathbf{X}$ is the matrix whose $i$th row is the difference $x_i-x$ for each $x_i\in E(x)$.

  • $\mathbf{W}$ is a diagonal weighting matrix that prioritizes tracts closer to the origin $x$. We used a Gaussian radial basis weight, yielding
$$\mathbf{W}_{ij}=\begin{cases}\exp\left[-\dfrac{\|x_i-x\|^2}{2\sigma}\right], & i=j\\ 0, & \text{otherwise,}\end{cases}$$
    where $\sigma$ is a tunable characteristic length scale set to 10 km in our computations, corresponding to very weak weighting.

  • $\mathbf{Y}_x$ is the matrix whose $i$th row is the vector $p_{Y|X=x_i}-p_{Y|X=x}$.
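The regression estimate above can be sketched in a few lines of NumPy. The function name, argument layout, and toy coordinates below are assumptions for illustration; the returned matrix has one row per spatial direction and one column per group, matching the shape of $(\mathbf{X}^T\mathbf{W}\mathbf{X})^{-1}\mathbf{X}^T\mathbf{W}\mathbf{Y}_x$.

```python
import numpy as np

def estimate_derivative(x, neighbors, alpha_x, alpha_neighbors, sigma):
    """Weighted least-squares estimate (X^T W X)^{-1} X^T W Y_x of the derivative at x."""
    X = np.asarray(neighbors, dtype=float) - np.asarray(x, dtype=float)   # rows x_i - x
    Y = np.asarray(alpha_neighbors, dtype=float) - np.asarray(alpha_x, dtype=float)
    w = np.exp(-np.sum(X ** 2, axis=1) / (2 * sigma))   # Gaussian radial basis weights
    XtW = X.T * w                                       # X^T W without forming diag(w)
    return np.linalg.solve(XtW @ X, XtW @ Y)            # shape (n_dims, n_groups)

# Hypothetical check: for an exactly linear attribute map, the estimate is exact.
true_derivative = np.array([[0.01, -0.01],
                            [-0.02, 0.02]])             # rows: directions, columns: groups
x = np.array([0.0, 0.0])
alpha_x = np.array([0.5, 0.5])
neighbors = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0], [1.0, 1.0]])
alpha_neighbors = alpha_x + (neighbors - x) @ true_derivative

estimate = estimate_derivative(x, neighbors, alpha_x, alpha_neighbors, sigma=10.0)
print(estimate)
```

Solving the normal equations with `np.linalg.solve` avoids forming an explicit inverse; the ego network must contain at least $n$ displacements that span the space for the system to be nonsingular.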

Acknowledgments

I am grateful to Marta C. González for helpful discussions. This article is based on work supported by National Science Foundation Graduate Research Fellowship Grant 1122374, the Massachusetts Institute of Technology and King Abdulaziz City for Science and Technology Center for Complex Engineering Systems, and the Phillips Corporation.

Footnotes

  • 1Email: pchodrow@mit.edu.
  • Author contributions: P.S.C. designed research, performed research, contributed new reagents/analytic tools, analyzed data, and wrote the paper.

  • The author declares no conflict of interest.

  • This article is a PNAS Direct Submission.

  • This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1708201114/-/DCSupplemental.

Published under the PNAS license.

References

  1. Iceland J, Weinberg D, Hughes L (2014) The residential segregation of detailed Hispanic and Asian groups in the United States: 1980-2010. Demogr Res 31:593–624.
  2. Firebaugh G, Farrell CR (2016) Still large, but narrowing: The sizable decline in racial neighborhood inequality in metropolitan America, 1980–2010. Demography 53:139–164.
  3. Logan JR, Stults BJ, Farley R (2004) Segregation of minorities in the metropolis: Two decades of change. Demography 41:1–22.
  4. Massey DS, Denton NA (1988) The dimensions of residential segregation. Soc Forces 67:281–315.
  5. Reardon SF, Firebaugh G (2002) Measures of multigroup segregation. Sociol Methodol 32:33–67.
  6. Reardon SF, O’Sullivan D (2004) Measures of spatial segregation. Sociol Methodol 34:121–162.
  7. Reardon SF, et al. (2009) Race and space in the 1990s: Changes in the geographic scale of racial residential segregation, 1990-2000. Soc Sci Res 38:55–70.
  8. Reardon SF, et al. (2008) The geographic scale of metropolitan racial segregation. Demography 45:489–514.
  9. Fowler SC (2016) Segregation as a multiscalar phenomenon and its implications for neighborhood-scale research: The case of South Seattle 1990–2010. Urban Geogr 37:1–25.
  10. Lee BA, et al. (2008) Beyond the census tract: Patterns and determinants of racial segregation at multiple geographic scales. Am Socio Rev 73:766–791.
  11. Östh J, Malmberg B, Andersson E (2015) Analysing segregation using individualized neighborhoods. Socio-Spatial Segregation: Concepts, Processes, and Outcomes, eds Lloyd CD, Shuttleworth I, Wong DWS (Policy, Bristol, UK), pp 135–162.
  12. Clark WAV, Anderson E, Östh J, Malmberg B (2015) A multiscalar analysis of neighborhood composition in Los Angeles, 2000–2010: A location-based approach to segregation and diversity. Ann Assoc Am Geogr 105:1260–1284.
  13. Östh J, Clark WAV, Malmberg B (2014) Measuring the scale of segregation using k-nearest neighbor aggregates. Geogr Anal 47:34–49.
  14. Hennerdal P, Nielsen MM (2017) A multiscalar approach for identifying clusters and segregation patterns that avoids the modifiable areal unit problem. Ann Am Assoc Geogr 107:555–574.
  15. Fowler CS, Lee BA, Matthews SA (2016) The contributions of places to metropolitan ethnoracial diversity and segregation: Decomposing change across space and time. Demography 53:1955–1977.
  16. Lichter DT, Parisi D, Taquino MC (2015) Toward a new macro-segregation? Decomposing segregation within and between metropolitan cities and suburbs. Am Socio Rev 80:843–873.
  17. Theil H, Finezza AJ (1971) A note on the measurement of racial integration of schools by means of informational concepts. J Math Sociol 1:187–194.
  18. Openshaw S, Taylor P (1981) The modifiable areal unit problem. Quantitative Geography: A British View, eds Wrigley N, Bennett R (Routledge and Kegan Paul, London).
  19. Bradley JR, Wikle CK, Holan SH (2017) Regionalization of multiscale spatial processes using a criterion for spatial aggregation error. J R Stat Soc Series B Stat Methodol 79:815–832.
  20. Duque JC, Rey SJ (2012) The max-p regions problem. J Reg Sci 52:397–419.
  21. Garreton M, Sánchez R (2016) Identifying an optimal analysis level in multiscalar regionalization: A study case of social distress in greater Santiago. Comput Environ Urban Syst 56:14–24.
  22. Yuan S, Tan PN, Cheruvelil KS, Collins SM, Soranno PA (2015) Constrained spectral clustering for regionalization: Exploring the trade-off between spatial contiguity and landscape homogeneity. Proceedings of the 2015 IEEE International Conference on Data Science and Advanced Analytics (DSAA) (IEEE, New York), pp 1–10.
  23. Spielman SE, Logan JR (2013) Using high-resolution population data to identify neighborhoods and establish their boundaries. Ann Assoc Am Geogr 103:67–84.
  24. Bregman LM (1967) The relaxation method of finding the common point of convex sets and its application to the solution of problems in convex programming. USSR Comput Math and Math Phys 7:200–217.
  25. Banerjee A, Merugu S, Dhillon IS, Ghosh J (2005) Clustering with Bregman divergences. J Mach Learn Res 6:1705–1749.
  26. Roberto E (2015) Measuring inequality and segregation. arXiv:1508.01167 pp 1–26.
  27. Jargowsky PA, Kim J (2005) A measure of spatial segregation: The generalized neighborhood sorting index (National Poverty Center, Dallas), Working Paper 05–3.
  28. Reardon SF (2008) Measures of ordinal segregation. Occupational and Residential Segregation, Research on Economic Inequality, eds Flückiger Y, Reardon SF, Silber J (Emerald Group Publishing, Somerville, MA), Vol 17, pp 129–155.
  29. Dhillon IS, Mallela S, Kumar R (2003) A divisive information-theoretic feature clustering algorithm for text classification. J Mach Learn Res 3:1265–1287.
  30. Logan JR, Xu Z, Stults BJ (2014) Interpolating U.S. Decennial census tract data from as early as 1970 to 2010: A longitudinal tract database. Prof Geogr 66:412–420.
  31. Amari SI, Nagaoka H (2007) Methods of Information Geometry (American Mathematical Society, Providence, RI), p 206.
  32. Nock R, Vaillant P, Henry C, Nielsen F (2009) Soft memberships for spectral clustering, with application to permeable language distinction. Pattern Recogn 42:43–53.
  33. Newman MEJ, Clauset A (2016) Structure and inference in annotated networks. Nat Commun 7:1–16.
  34. Walker K (2016) tigris Version 0.3: Load Census TIGER/Line Shapefiles into R.
  35. Glenn EH (2016) acs Version 2.0: Download, Manipulate, and Present American Community Survey and Decennial Data from the US Census.
  36. Rozenfeld HD, et al. (2008) Laws of population growth. Proc Natl Acad Sci USA 105:18702–18707.
  37. Rozenfeld HD, Rybski D, Gabaix X, Makse Ha (2011) The area and population of cities: New insights from a different perspective on cities. Am Econ Rev 101:2205–2225.
  38. Wickham H (2009) ggplot2: Elegant Graphics for Data Analysis (Springer, New York).
  39. Acharyya S, Banerjee A, Boley D (2013) Bregman Divergences and the Triangle Inequality. Proceedings of the 2013 SIAM International Conference on Data Mining, SDM 2013 (Society for Industrial and Applied Mathematics Publications, Philadelphia), pp 476–484.
  40. Shi J, Malik J (2000) Normalized cuts and image segmentation. IEEE Trans Pattern Anal Mach Intell 22:888–905.
  41. Von Luxburg U (2007) A tutorial on spectral clustering. Stat Comput 17:395–416.
  42. Bach FR, Jordan MI (2004) Learning spectral clustering. Advances in Neural Information Processing Systems (MIT Press, Cambridge, MA), pp 305–312.
Structure of spatial segregation
Philip S. Chodrow
Proceedings of the National Academy of Sciences Oct 2017, 114 (44) 11591-11596; DOI: 10.1073/pnas.1708201114


Copyright © 2019 National Academy of Sciences. Online ISSN 1091-6490