# Fast flow-based algorithm for creating density-equalizing map projections

See allHide authors and affiliations

Edited by Michael F. Goodchild, University of California, Santa Barbara, CA, and approved January 11, 2018 (received for review July 16, 2017)

## Significance

Geographic maps are a popular means to visualize spatial statistics. Conventionally, each map region is displayed with an area proportional to the actual land area. But equal-area maps can grossly misrepresent demographic data: Densely populated cities should be given more prominence than large, but sparsely populated territories. Cartograms solve this problem by rescaling map regions in proportion to, for example, population or gross domestic products. Until now, it has generally been cumbersome or slow to calculate map projections for contiguous cartograms. Here we describe and benchmark a fast flow-based algorithm that computes cartograms in a matter of seconds, yet maintains the strengths of previous methods—a development which may lead to a more widespread adoption of cartograms.

## Abstract

Cartograms are maps that rescale geographic regions (e.g., countries, districts) such that their areas are proportional to quantitative demographic data (e.g., population size, gross domestic product). Unlike conventional bar or pie charts, cartograms can represent correctly which regions share common borders, resulting in insightful visualizations that can be the basis for further spatial statistical analysis. Computer programs can assist data scientists in preparing cartograms, but developing an algorithm that can quickly transform every coordinate on the map (including points that are not exactly on a border) while generating recognizable images has remained a challenge. Methods that translate the cartographic deformations into physics-inspired equations of motion have become popular, but solving these equations with sufficient accuracy can still take several minutes on current hardware. Here we introduce a flow-based algorithm whose equations of motion are numerically easier to solve compared with previous methods. The equations allow straightforward parallelization so that the calculation takes only a few seconds even for complex and detailed input. Despite the speedup, the proposed algorithm still keeps the advantages of previous techniques: With comparable quantitative measures of shape distortion, it accurately scales all areas, correctly fits the regions together, and generates a map projection for every point. We demonstrate the use of our algorithm with applications to the 2016 US election results, the gross domestic products of Indian states and Chinese provinces, and the spatial distribution of deaths in the London borough of Kensington and Chelsea between 2011 and 2014.

Aguideline for displaying statistical data in a diagram is the “area principle”: Each part of the diagram should have an area in proportion to the number it represents (1). For many categorical data, a bar chart is a simple visualization method that satisfies the area principle. For example, if our data are the electors that voted for the US president in December 2016, we can categorize the electors by US state. Every bar in Fig. 1, *Top* corresponds to a state that sent at least one Republican elector to the Electoral College. In Fig. 1, *Bottom*, the bars show the states with Democratic electors. The colors chosen for the bars are the traditional red for Republicans and blue for Democrats. Because the bar chart satisfies the area principle, the election is won by the color that occupies more area, which is evidently red in this example.*

Although a bar chart is often a suitable visualization tool, it cannot reveal the spatial pattern behind the data. The bar chart in Fig. 1 lacks the information where the states are located: Neighboring bars do not necessarily correspond to states that are geographic neighbors. If we want to visualize how the states fit together in real space, we need a different approach. The common alternative is to show a map such as Fig. 2*A*, where we use an Albers equal-area conic projection for the contiguous United States to produce their familiar geographic outline. We add Alaska and Hawaii, suitably rescaled, below the map of the contiguous United States. Each state is colored with either red or blue, depending on the party affiliation of the electors.^{†}

The map in Fig. 2*A* accurately shows the relative area and position of each state. However, it does not obey the area principle of statistics. For example, Montana (abbreviated MT in Fig. 2*A*) covers more than 2,000 times the area of Washington, DC, but both regions have the same number of electors. On aggregate, Republican electors won 74% of the US area in square kilometers, but had only 57% of the vote share in the Electoral College. So, Fig. 2*A* has the opposite problem of Fig. 1 where we satisfied the statistical area principle, but conveyed no information about the states’ locations. One might suspect that showing the locations and simultaneously satisfying the area principle are as impossible as squaring the circle. Fortunately, however, there is a visualization method, known as a cartogram, that can tackle this challenge (2, 3).

After a brief review and classification of cartograms, we introduce a technique that produces cartograms of a quality comparable to the most popular technique currently in use: the diffusion cartogram (4). The technique proposed in this article solves a completely different set of equations so that the computation can finish within a fraction of the previously needed time. We benchmark our algorithm with data from the United States, India, China, and the London borough of Kensington and Chelsea to demonstrate that our method accurately satisfies the area principle and generates visually pleasing cartograms.

## Classification of Cartogram Methods

In a cartogram, regions are deformed such that their areas are equal to statistical data such as population, votes in an election, or gross domestic product. An example, showing the Electoral College on a cartogram, is the diagram in Fig. 2*B* which we adapted from Wikipedia (https://commons.wikimedia.org/wiki/File:Cartogram%E2%80%942012_Electoral_Vote.svg). Similar cartograms have been shown in the news media (5, 6). Here each elector is represented by a small square. The squares are then positioned with two objectives in mind. First, the shapes on the cartogram should resemble those on the map in Fig. 2*A*. Second, the set of neighboring states in Fig. 2 *A* and *B* should be the same. Satisfying both objectives is not trivial. A careful comparison with Fig. 2*A* shows that, for example, Arizona (AZ) and Texas (TX) incorrectly appear as neighbors in Fig. 2*B*. On the other hand, the geographic neighbors Colorado (CO) and Nebraska (NE) have been separated in Fig. 2*B* to make space for other states in the vicinity.

For certain applications, it is perfectly acceptable that neighboring states are split apart. As long as the areas of the states are proportional to the number of electors, such representations are called noncontiguous cartograms (7). Dorling’s circular cartograms are good examples of noncontiguous cartograms that, while not strictly maintaining the topology, indicate where the represented regions are located (8). Contiguous cartograms, by contrast, not only rescale the regions, but also keep the topology intact (i.e., neighbors on the map are neighbors on the cartogram and vice versa).

The methods that have been proposed for generating contiguous cartograms fall into two distinct categories. The first group consists of algorithms that operate only on the boundaries of regions (9⇓⇓⇓⇓⇓⇓⇓–17). Each region is represented by one or multiple polygons. The input to these algorithms is a finite number of polygon corners *B*) we would not be able to uniquely locate a state capital such as Austin, TX, because it is far from any state border. One might symbolically place all capitals at the centroid of the corresponding polygon, but some centroids might be outside the polygon if it is concave or contains holes (e.g., lakes or enclaves). The situation is even more complicated if we want to represent multiple distinct points or lines (e.g., rivers or roads) inside a state as distinct objects on a boundaries-only cartogram.

The second group of contiguous cartogram algorithms approaches the problem from a different point of view by producing a continuous transformation 𝐓 for the entire continuous set of longitudes and latitudes on the input map, including coordinates that are not on a boundary (4, 18⇓⇓⇓–22). We refer to this group as “all-coordinates” algorithms. Generating the map projection 𝐓 for all longitudes and latitudes can be computationally more demanding than shifting only the boundary coordinates. In fact, for applications where only the boundaries are of interest—as is the case for the US election map—the boundaries-only algorithms can give adequate results. However, the run time of these discrete algorithms typically increases steeply with the number of corners. As a result, they often rely on coarse-grained input to gain speed, for example by removing Michigan’s Upper Peninsula from the US map (12, 13, 16). If we wish to show data that are resolved at a scale much finer than the polygons to be displayed [e.g., graticules for a fine, spatially regular grid (23) or individual addresses], the all-coordinates algorithms usually outpace their boundaries-only counterparts.

In this article, we describe an all-coordinates algorithm that needs only a few seconds to produce the complete projection 𝐓 for realistic input. Knowing 𝐓 will allow us to show the positions of all US state capitals with respect to the states’ boundaries (Fig. 3*B*) and the coordinates of individual death cases in London (Fig. 4 *B* and *C*).

## Previous All-Coordinates Methods to Produce a Cartogram Projection

For the sake of concreteness, let us assume that we want to make a cartogram whose areas are proportional to the population. We define the population density as the function

An accurate cartogram must project the rectangle

Eq. **1** alone does not uniquely specify 𝐓 because it is only one single equation for the two unknowns **1** (19, 22). Although such mechanical metaphors make intuitive sense, there is no direct physical connection between force and area. Therefore, it is not immediately obvious how the forces should be chosen as functions of **1** is valid. Some methods treat the term “force” in a less literal sense so that the area constraints are more explicitly part of the equations (9, 11). However, these algorithms must take special care to avoid topological errors (e.g., regions that are flipped or boundaries that intersect themselves) during the relaxation of the forces. Another method, based on neural networks, starts by placing sample points on a regular grid (26). During the training of the network, the samples are attracted toward regions of high density to mimic the population distribution. Their final positions define a mapping which can produce a cartogram by considering its inverse. However, a large number of sample points are necessary to produce smooth boundaries.

An alternative physical metaphor is to view the process that generates the cartogram as the flow of a fluid. In this analogy, we think of the map as a Petri dish covered with a thin layer of water. In an experiment, we would model the population density

The most familiar process that equilibrates the density is Brownian motion. On a macroscopic scale, the Fokker–Planck equation that describes Brownian motion is Fick’s second law, *A*, where we show the US Electoral College results. The diffusion algorithm guarantees that, unlike in Fig. 2*B*, each state keeps its neighbors while still reaching the target areas to any desired level of accuracy. The diffusion cartogram distorts the shapes of the states, which is inevitable for any contiguous cartogram method. The shapes are, however, still recognizable; this is one of the reasons why diffusion cartograms have become popular in the past decade (3). Another reason is that, despite the apparent complexity of the equations, they can be computed relatively efficiently.

However, Fickian diffusion is only one of many types of fluid dynamic rules that make particle densities equal everywhere. As we argue now, there is an alternative that is computationally more efficient while producing cartograms of comparable quality.

## Flow-Based Cartogram with Linear Equalization

In a flow-based cartogram, the population density ρ is treated not only as a function of position *SI Appendix*, section 2, we explain in more detail why 𝐓 is density equalizing.

We can satisfy the continuity equation while simultaneously demanding Fick’s law **2** shows that the evolution of ρ is then governed by the heat equation *SI Appendix*, section 2) so that one is left wondering whether other vortex-free, mass-conserving processes might also be suitable for generating cartograms. As we now argue, if we replace the heat equation by a linear equalization of the density toward the mean,**4** so that the resulting transformation 𝐓 satisfies Eq. **1** (24). We derive the concrete formulas for 𝐯 in *SI Appendix*, section 2, and give only a brief summary here.

After an affine transformation of all coordinates, we place the mapped area inside a rectangular box with bounding coordinates

Eqs. **5**–**7** look superficially similar to the corresponding equations in the diffusion-based cartogram (4), but there are two important differences. First, neither the sums in Eqs. **5** and **6** nor the integral in Eq. **7** depend on t so that the Fourier transforms need to be computed only once at the beginning of the calculation. Second, after we have computed the Fourier transforms, here we require only quick arithmetic operations: addition, subtraction, multiplication, and division. For a diffusion cartogram, by contrast, we must repeatedly calculate time-dependent Fourier transforms and evaluate the exponential function during the integration of Eq. **3** (*SI Appendix*, section 2). The speed of computing the exponential function depends on details of the implementation and hardware, but is in general much slower than addition, subtraction, multiplication, or division (27).

These mathematical differences alone already cut the time needed per integration step by more than half. Another simplification compared with the diffusion cartogram is that we need to integrate Eq. **3** only until

We overlay the map with an **5**–**7** at the start of the calculation with the fast Fourier transform algorithm (28). We have found that the time needed for this one-time procedure is a negligible fraction of the total run time. After storing the **3** for nongrid positions 𝐫 by interpolating between the grid points. We numerically approximate the integral using a predictor–corrector method that automatically adapts the size of the next time step. During each step, we distribute the integration of the

## Benchmarking the Algorithm with Data for the United States, India, and China

We have implemented the algorithm based on Eqs. **4**–**7** as a C program. In this section, we illustrate its performance with three case studies: the 2016 vote in the US Electoral College (Fig. 3*B*), the distribution of India’s gross domestic product (GDP) by state (Fig. 5), and mainland China’s and Taiwan’s GDP by province (Fig. 6).

In each case, we first project the longitudes and latitudes of the territorial borders with an Albers equal-area conic projection onto a flat 2D space. As described above, we embed the resulting map (Figs. 2*A*, 5*A*, and 6*A*, respectively) inside an

For the discrete Fourier transforms, we divide the large rectangular box into a grid of ^{‡}

When numerically integrating Eq. **3**, the choice of time steps determines how accurately we estimate *B*) and *B*), respectively. These differences are certainly so small that they cannot be detected by eye. We generally set a maximum relative area error of

This level of accuracy is all the more remarkable when considering the speed of our implementation. On a Dell Precision T7810 workstation with a 12-core Intel Xeon E5-2680V3 processor and an Ubuntu 16.04.2 operating system, we need 1.5 s for the US Electoral College cartogram (Fig. 3*B*), 2.6 s for the India GDP cartogram (Fig. 5*B*), and 2.7 s for the China GDP cartogram (Fig. 6*B*). Compared with the diffusion algorithm, which needs 59.5 s to generate the US cartogram (Fig. 3*A*) with equal accuracy, this is a speedup by roughly a factor 40. Among other all-coordinates cartogram algorithms, only the rubber-sheet method Carto3F (22) can achieve comparable speed, but not for all types of input. For a cartogram of Chinese provinces, Carto3F needs 8 min of computer time. Our fast flow-based method achieves smaller area errors in a fraction of this time.

## Benchmarking with Data for Mortality in Kensington and Chelsea (London) 2011–2014

As noted above, cartogram algorithms that generate the complete density-equalizing projection 𝐓 are particularly advantageous when displaying demographic data that are individual points on a map. We now demonstrate how our algorithm can be applied to such input and how we can use it to compare different statistical models. The data also serve as another benchmark for the speed of our method. Our example involves the locations of all 3,197 death cases in the London borough of Kensington and Chelsea between the years 2011 and 2014. The database from the United Kingdom’s Office for National Statistics (ONS) (30) lists the number of deaths in each of London’s 4,835 Lower Layer Super Output Areas (LSOAs). A total of 103 LSOAs are located in Kensington and Chelsea. We show the density of death cases in this borough on an equal-area map in Fig. 4*A*. Each death corresponds to one point on the map placed at a random position inside the LSOA where it occurred.

The point pattern on the equal-area map is spatially heterogeneous with two bands of high density, one in the south and another in the north, separated by a band of lower density in the middle. However, it remains unclear from the equal-area map whether the differences in the spatial distribution of death cases are caused by differences in per-capita mortality or by a heterogeneous population density. We can distinguish between these two effects by projecting the death cases to a cartogram where each LSOA area is proportional to the number of inhabitants (Fig. 4*B*).

The most striking feature on this cartogram is the high per-capita mortality in the southeast corner of the borough. The reason for the high number of death cases in the LSOA with the ONS code “Kensington and Chelsea 018C” is a large proportion of elderly, most likely because of the St. Wilfrid’s nursing home located in this LSOA. Because mortality increases markedly as a person becomes elderly, total population is too crude a measure to predict death rates. We now show how to improve the prediction by using each LSOA’s age-adjusted mortality as the basis of a cartogram instead of the simple per-capita mortality displayed in Fig. 4*B*.

Data from the ONS (30, 31) include population size and death cases in the following age groups for each LSOA: 0 y old, 1–4 y old, 5–9 y old, …, 85–89 y old, and

In Fig. 4*C*, we show a cartogram with LSOA areas proportional to *C*, indicating that the data are inconsistent with a homogeneous Poisson process.

We show a kernel density estimate of the underlying probability distribution in Fig. 4*D*. We use a bivariate normal kernel with a bandwidth chosen according to ref. 34. Fig. 4*D* reveals a minimum in the age-adjusted death rate in the east of the borough and a maximum in the north. Previous studies have argued that indicators of health (e.g., life expectancy) in different parts of London are positively correlated to average household income (35). A choropleth map of deprivation in Kensington and Chelsea (36) does indeed follow a strikingly similar regional pattern to that of the death rate in Fig. 4*D*.

The flow-based method of Eqs. **4**–**7** calculates the cartograms in Fig. 4 *B* and *C* in 1.6 s and 1.9 s, respectively. To avoid boundary effects in Fig. 4*D*, we also include data for Kensington and Chelsea’s neighboring boroughs when computing the cartograms and the kernel density estimate. The equivalent calculations with the diffusion-based method take 69.9 s and 99.5 s, respectively.

## Measures of Distortion

Our algorithm not only is accurate and fast, but also generates cartograms whose visual appearance is on par with those of previous methods. In Fig. 3 we directly compare the diffusion cartogram of the United States (Fig. 3*A*) with the faster method based on Eqs. **4**–**7** (Fig. 3*B*). The border between Illinois (IL) and Indiana (IN) is straighter in Fig. 3*A* than in Fig. 3*B* and thus more similar to the input map (Fig. 2*A*). On the other hand, the border between New Mexico (NM) and Colorado (CO) is straighter and Oklahoma’s (OK) panhandle less bent in Fig. 3*B*. Overall, however, the differences between both cartograms are only subtle.

Because visual appearance is not a fully satisfactory criterion, we now turn to quantitative measures of distortion. One way to compare the local distortion of different projections is by analyzing the Tissot indicatrix that is constructed as follows. Suppose we draw an infinitesimal circle at the coordinates *SI Appendix*, Fig. S2, we show concrete examples of Tissot indicatrices for our benchmarking examples. We denote the semimajor and -minor axes of the Tissot indicatrix by *SI Appendix*, section 3, except in a few special cases there cannot be a conformal density-equalizing projection (20). As a global measure for the deviation of a cartogram from conformality, we can use, for example, either the spatially averaged or the maximum local distortion error,

When computing e and *SI Appendix*, section 4. Briefly, the aspect ratio of a region is the ratio of the larger to the smaller side length of the bounding rectangle with minimum area, minimized over all possible rotations with respect to the coordinate axes. The Hamming distance between two polygons is the area lying within exactly one of them (40). For the measurement in Table 1, we rescale each polygon on the input map and the corresponding polygon on the cartogram so that they have equal areas. We then calculate the minimum Hamming distance between these two polygons by shifting one polygon with respect to the other. We define δ as the sum of the minimum Hamming distances, where the summation is over all corresponding pairs of polygons. For the relative position error, we compute the angle between the line connecting the centroids of two polygons on the input map and the line that connects the centroids of the corresponding two polygons on the cartogram. We obtain θ by averaging over all pairs of polygons (41).

Most measures listed in Table 1 exhibit only small relative differences in the range of a few percent between the diffusion and fast flow-based methods. Diffusion performs a little better in the majority of examples and measures, but there are also cases where the fast flow-based method produces a smaller error. Considering the vastly different run times, the fast flow-based method is the better solution as a general-purpose algorithm for interactive applets.

## Conclusion

The scientific value of cartograms can go far beyond providing mere entertainment, shock, or amusement. As “isodemographic maps” they have been used for mapping diseases and mortality for several decades (42⇓–44) to improve health services (45). Arguably, the technical challenge of computing the map projection has so far prevented more widespread use. We accompany this article with C code available at https://github.com/Flow-Based-Cartograms/go_cart to alleviate some of the challenge. The code optionally produces the graticule of the inverse transformation so that features found on the cartogram can be identified in the original domain. We reconstruct the original positions by first approximating 𝐓 as a piecewise linear function and then computing its inverse. The speed of the cartogram algorithm depends on the number of processing units available to the user. If the calculation runs on a multicore server, users will be able to take full advantage of the parallelized code at no cost. We hope that in this form the algorithm will be accessible to a wider audience.

## Acknowledgments

This research was supported by the European Commission (Project FP7-PEOPLE-2012-IEF 6-4564/2013).

## Footnotes

- ↵
^{1}To whom correspondence should be addressed. Email: michael.gastner{at}yale-nus.edu.sg.

Author contributions: M.T.G. designed research; M.T.G., V.S., and P.M. performed research; M.T.G. analyzed data; and M.T.G., V.S., and P.M. wrote the paper.

The authors declare no conflict of interest.

This article is a PNAS Direct Submission.

↵*Because of peculiarities in the US electoral system, the Electoral College is not an exact representation of the proportion of votes cast by the US population at large. The Republican candidate Donald Trump was elected as US president despite losing the popular vote to the Democratic candidate Hillary Clinton. We show a cartogram of the popular vote distribution in

*SI Appendix*, section 1.↵

^{†}The only exception is Maine which applies the “congressional district method”: Although the majority in Maine voted for the Democratic candidate Hillary Clinton, the Republican candidate Donald Trump still gained one electoral vote for winning the second congressional district (abbreviated as ME2 in Fig. 2*A*).↵

^{‡}We exclude the island territory of Lakshadweep from the maps of India because it is so small that it is neither visible on an equal-area map nor on a GDP cartogram.This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1712674115/-/DCSupplemental.

- Copyright © 2018 the Author(s). Published by PNAS.

This open access article is distributed under Creative Commons Attribution-NonCommercial-NoDerivatives License 4.0 (CC BY-NC-ND).

## References

- ↵
- de Veaux RD,
- Velleman PF,
- Bock DE

- ↵
- ↵
- Nusrat S,
- Kobourov S

- ↵
- Gastner MT,
- Newman MEJ

- ↵
- Gamio L

*The Washington Post*, Nov 1, 2016. Available at https://www.washingtonpost.com/graphics/politics/2016-election/how-election-maps-lie/. Accessed April 25, 2017. - ↵
- Parlapiano A

*The New York Times*, Nov 1, 2016. Available at https://www.nytimes.com/interactive/2016/11/01/upshot/many-ways-to-map-election-results.html. - ↵
- ↵
- Dorling D

- ↵
- ↵
- Merrill DW,
- Selvin S,
- Mohr MS

- ↵
- House DH,
- Kocmoud CJ

- ↵
- ↵
- ↵
- Inoue R,
- Shimizu E

- ↵
- Kämper JH,
- Kobourov SG,
- Nöllenburg M

*arXiv*:1112.4626. - ↵
- Cano RG, et al.

- ↵
- Daya Sagar BS

- ↵
- ↵
- Cauvin C,
- Schneider C

- ↵
- ↵
- Edelsbrunner H,
- Waupotitsch R

- ↵
- Sun S

- ↵
- Hennig B

- ↵
- Dacorogna B,
- Moser J

- ↵
- Avinyó A,
- Solà-Morales J,
- València M

- ↵
- ↵
- Traub JF

- Brent RP

- ↵
- Frigo M,
- Johnson SG

- ↵
- Statistics Times

- ↵
- Office for National Statistics

- ↵
- Office for National Statistics

- ↵
- Lilienfeld DE,
- Stolley PD

- ↵
- ↵
- Venables WN,
- Ripley BD

- ↵
- Dorling D

- ↵
- The Economist

*The Economist*, June 24, 2017. Available at https://www.economist.com/news/britain/21723839-grenfell-tower-fire-has-become-stark-reminder-glaring-gap-between-rich-and-poor-even). Accessed December 23, 2017. - ↵
- Papadopoulos A

- ↵
- Snyder JP

- ↵
- Alam MJ,
- Kobourov SG,
- Veeramoni S

- ↵
- Skiena SS

- ↵
- Heilmann R,
- Keim DA,
- Panse C,
- Sips M

- ↵
- Selvin S, et al.

- ↵
- Armitage P,
- Colton T

- Dorling D

- ↵
- Wieland SC,
- Brownstein JS,
- Berger B,
- Mandl KD

- ↵
- Lovett DA, et al.

## Citation Manager Formats

## Article Classifications

- Physical Sciences
- Applied Mathematics

## Sign up for Article Alerts

## Jump to section

- Article
- Abstract
- Classification of Cartogram Methods
- Previous All-Coordinates Methods to Produce a Cartogram Projection
- Flow-Based Cartogram with Linear Equalization
- Benchmarking the Algorithm with Data for the United States, India, and China
- Benchmarking with Data for Mortality in Kensington and Chelsea (London) 2011–2014
- Measures of Distortion
- Conclusion
- Acknowledgments
- Footnotes
- References

- Figures & SI
- Info & Metrics