# Spatial patterning among savanna trees in high-resolution, spatially extensive data

^{a}Department of Ecology and Evolutionary Biology, Yale University, New Haven, CT 06511;^{b}Center for Global Discovery and Conservation Science, Arizona State University, Tempe, AZ 85287;^{c}Department of Ocean Engineering, Texas A & M University, College Station, TX 77843;^{d}Department of Ecology and Evolutionary Biology, Princeton University, Princeton, NJ 08544;^{e}Scientific Services, South African National Parks, Skukuza 1350, South Africa;^{f}Centre for African Ecology, School of Animal, Plant and Environmental Sciences, University of the Witwatersrand, Johannesburg 2050, South Africa

Contributed by Ignacio Rodriguez-Iturbe, February 25, 2019 (sent for review November 13, 2018; reviewed by Dara Entekhabi and Ricardo Holdo)

## Significance

Understanding how environmental variation shapes vegetation distributions is crucial for predicting biosphere responses to global change. Savannas represent a major challenge, and repeated efforts have failed to constrain variation in savanna vegetation structure to a useful degree. Here, we find that trees cluster in savannas and that their clustering is governed by power laws, with parameters and geometries that are invariant to environment. This explains why savannas have resisted mechanistic characterization and offers a possible tool for constraining variability in vegetation structure.

## Abstract

In savannas, predicting how vegetation varies is a longstanding challenge. Spatial patterning in vegetation may structure that variability, mediated by spatial interactions, including competition and facilitation. Here, we use unique high-resolution, spatially extensive data of tree distributions in an African savanna, derived from airborne Light Detection and Ranging (LiDAR), to examine tree-clustering patterns. We show that tree cluster sizes were governed by power laws over two to three orders of magnitude in spatial scale and that the parameters on their distributions were invariant with respect to underlying environment. Concluding that some universal process governs spatial patterns in tree distributions may be premature. However, we can say that, although the tree layer may look unpredictable locally, at scales relevant to prediction in, e.g., global vegetation models, vegetation is instead strongly structured by regular statistical distributions.

Savannas are defined as having open tree canopies and a more-or-less continuous grass layer, a classification that differentiates savannas from closed-canopy forests (1) and perhaps even from pure grasslands (2), but which encompasses substantial variability in vegetation and ecosystem structure (3). Beyond this broad definition, variability in savanna vegetation structure has resisted mechanistic characterization and explanation, despite repeated analysis (4–8), perhaps because savanna vegetation is the product of complex interactions and feedbacks between bottom-up constraints from climate and soils and top-down control from fire and herbivory. As such, savannas pose a challenge for predictive frameworks: diverse biosphere models consistently underpredict savanna distributions or even fail to capture savanna vegetation entirely. A more fundamental understanding of how savanna vegetation varies, and its drivers, will be critical for improving the representation of this globally widespread biome that covers at least 40% of the global tropics.

There may be mundane reasons that savanna structural variability has been difficult to constrain. One possibility is that existing efforts have examined vegetation structure at scales that are minuscule compared with the processes that shape it (3). These may include rare events (9, 10), which are difficult to sample adequately, or the problem may be more trivial, since field plots, often as small as fractions of hectares, may fail to statistically sample tree populations sufficiently. By the same token, larger-scale analyses (5, 11) often leverage optical remote sensing data, which cover relatively short time scales and are often insufficient for characterizing savannas (12). Scaling up, with high-quality data, will clearly be necessary for deepening our mechanistic understanding of savanna tree-layer variability (3).

Savannas may also vary for reasons less mundane than those related to undersampling. Extensive theoretical work (13, 14) and more limited empirical observations (15) suggest that variation in savannas may be more organized, and less haphazard, than plot-based studies suggest. While locally patchy, savannas can be highly regular. In special cases, patterns can repeat themselves at characteristic scales (16, 17); these patterns may result from the combination of local activation (e.g., facilitation via hydraulic redistribution; ref. 18) with longer-range inhibition (e.g., tree-tree competition; ref. 19). In other cases, or at least on other scales, patterns may be scale-free, best described by a power-law relationship between, e.g., the size of a patch of trees and its frequency (15, 20) of the form

$$f(x) \propto x^{-\alpha},$$

where the frequency *f* of clusters of size *x* decreases with *x* via the exponent α (known as the power-law exponent). Short-range facilitation without inhibition (a type of Yule process) could be a possible mechanism (21), but power laws can emerge from diverse processes such that inferring process from pattern is impossible without extremely careful parameterization, and is difficult even then (21). Either way, distributions at larger scales speak to a degree of predictability (albeit not at the plot level) that predictive approaches to savanna ecosystems have so far mostly ignored.

These power-law distributions may describe widespread tree-layer variation in savannas and other arid ecosystems (15, 20). However, empirical work examining power laws in data (not just in savannas) has come under scrutiny for a lack of statistical rigor in how power laws are estimated (22, 23), broadly invalidating results based on fitting linear regressions to statistical distributions. Data are also improving dramatically in both extent and quality; for instance, airborne Light Detection and Ranging (LiDAR) techniques can now provide highly accurate data on vegetation occurrence and structure at the submeter scale over large landscapes. Methodologically, these data represent a tangible improvement over optical approaches for differentiating trees from the grasses in between; LiDAR also allows for evaluations of power law occurrence over a range of spatial scales (24) that is unprecedented in savannas. Comprehensive reevaluation of claims about the ubiquity of power-law distributions in the savanna tree layer is necessary and now possible.

New high-resolution LiDAR data also permit more nuanced examinations of tree clusters. Theory suggests that scale-free patterns in vegetation structure should be reflected not just in power-law distributions of cluster sizes, but also in the geometry of those clusters (23). In a classic example, the length of the coastline of an island scales as a power of its area, via an exponent that defines its fractal dimension; however, because coastlines are tortuous (rough), these fractals do not scale as strictly 2D objects (with dimensionality of 2) but rather as something between a line and an area (25).

Here, we reexamine the statistical distributions and geometries of tree patch sizes in Kruger National Park in South Africa (Fig. 1), asking whether distributions are best described by power laws, and how their parameters and geometries depend on underlying environmental variation in climate and soils. To do this, we used high-resolution (56 cm) airborne LiDAR-based estimates of tree distributions (trees > 3.5 m height) across 10 large landscapes (>6,000 ha each) in Kruger, using robust distribution fitting techniques to estimate distribution parameters (see *Methods and Materials* for more detail). Kruger spans significant orthogonal ecological gradients in rainfall (300–750 mm) and soils (clay-rich versus sandy) (6), originating from differences in parent material, which also give rise to strongly contrasting topography and river structure, as well as contrasting fire and herbivory regimes.

## Results and Discussion

We found that power laws were widespread, in some cases over two or even three orders of magnitude in tree clump size (Fig. 2*A*). Across all landscapes, the tree cluster size above which power laws adequately fit the fat tail of the distribution (defined as *x*_{min}) is significantly related to independent, field-based estimates of the maximum canopy area of large trees (Fig. 2*B*; *R*^{2} = 0.630, *n* = 10, *P* = 0.0062), suggesting that, intuitively, power-law tree clustering can only occur above the size of the individual tree. Both tree size (*R*^{2} = 0.622, *n* = 10, *P* = 0.0067) and *x*_{min} (*R*^{2} = 0.552, *n* = 10, *P =* 0.013) increased significantly with rainfall (Fig. 3). Together, these findings substantially corroborate the idea that tree-clustering distributions are predictable, if not at small scales then at larger ones, and that universal statistical distributions (power laws) may be useful for describing the clustering of trees in savanna systems.

More surprising is the consistency in the distribution of large tree clumps across landscapes in Kruger. Once we controlled for the smallest meaningful cluster size (*x*_{min}), power-law slopes were relatively consistent across landscapes, showing no systematic variation with respect to rainfall (*t*_{10} = 0.13, *P* = 0.90), landform (*t*_{10} = 1.19, *P* = 0.30; Fig. 2*C*), fire frequency (*t*_{10} = −0.31, *P* = 0.77), or mean distance to river (*t*_{10} = −0.48, *P* = 0.66), although distributions were impossible to estimate with confidence (Fig. 2*C*) on two landscapes with low tree cover (0.1% and 1.5%) and should not be considered conclusive with respect to geology. Excluding these, the mean power-law α across landscapes was 2.72 ± 0.16 (or equivalently 1.72 ± 0.16 for the power-law exponent of the inverse cumulative distribution function), showing dramatically less variation in tree cluster size distributions over gradients in rainfall, in high-resolution, spatially extensive data, than previous studies have described (15, 20). Excluding clumps that were close to rivers from the analysis did not change estimates of α, which further suggests that rivers were not primarily responsible for observed tree clustering patterns (excluding clumps within 500 m, 1 km, and 5 km of rivers yields estimates of α = 2.71 or 2.72).
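The two exponents quoted above are linked by a single integration. The following is a short worked derivation, assuming a continuous power-law density above the minimum cluster size; the normalizing constant *C* and the integration variable *s* are symbols introduced here for illustration only.

```latex
\Pr(X \ge x) \;=\; \int_{x}^{\infty} C\, s^{-\alpha}\, ds
             \;=\; \frac{C}{\alpha - 1}\, x^{-(\alpha - 1)},
\qquad \alpha > 1 .
```

The inverse cumulative distribution therefore decays with exponent α − 1, so a density exponent of α = 2.72 corresponds to 2.72 − 1 = 1.72.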

Perimeter-area relationships were also predictable across landscapes, including those with lower tree cover (Fig. 4). Perimeter-area methods have been criticized as a method for characterizing scale-free distributions, because approaches are based on strong a priori assumptions about self-similarity, and are subject to computational issues and interactions with scales of analysis (23). Here, we found that patch geometry seemed to scale consistently within landscapes (Fig. 4*A*) in a way that did not depend on the scale of analysis [a frequent limitation of analyses of this type (23); Fig. 5]. Unlike patch-size distribution parameters, however, fractal dimension changed with rainfall (Fig. 4*B*), with fuller patches at higher rainfall and more linear ones at lower rainfall (*R*^{2} = 0.586, *n* = 10, *P* = 0.0099; see Table 1). Again, changes were small but may reflect increases in tree size with increasing rainfall or may alternatively suggest trends in tree clustering across a broader rainfall gradient. Regardless, tree-cluster geometry may be more accessible at typical scales of analysis (23) and more sensitive to underlying environment than tree cluster size distributions at large scales (Fig. 2), which are not often available, or direct plot-based assessments of tree density at smaller scales, which yield poor predictive ability (3, 4, 6).

From this analysis, tree clusters and their distributions look more consistent across environments than previously thought. Theory gives us no real reason to suppose this should be the case; while generating power laws in tree cluster sizes is not difficult, most mechanistic models yield distributions with parameters that vary with respect to environment (15, 26). One possibility could be that tree cluster distributions are governed by some universality class; another could be that tree clustering is governed by some predictable but hitherto unappreciated aspect of tree physiology and its scaling properties. Either would represent a major breakthrough in our understanding of savanna vegetation heterogeneity.

Before such radical conclusions can be supported, however, the next line of inquiry should be to broaden the scope of the current analysis to include more diverse environmental variation. Although Kruger encompasses broad variation in soils and a reasonable rainfall gradient (6), landscapes here range in tree cover (with height > 3.5 m) from near zero only up to ∼13% and in rainfall from 300 mm to 750 mm, and as such are not representative of the full range of possible savanna variation (7), especially among continents across which tree architecture may differ (27). This will require comprehensive large-scale, high-quality data, and may not be possible to the exceptional degree presented here, but establishing whether cluster size distributions are in fact invariant and accurately estimating their parameters will depend on a broader geographical approach.

What we can say with confidence is that, overall, the tree layer in savannas is far more predictable than smaller-scale plot analyses have suggested. While the tree layer looks unpredictable locally (3, 6, 8), it is instead strongly structured by regular statistical distributions at scales relevant to prediction. Savanna ecologists should aim for mechanistic understanding, both of invariant distributions (if indeed they are), and of parameters that vary smoothly with environment (here, individual tree size). Meanwhile, global models, which often operate assuming no spatial structure either within or among simulation units (28, 29), should consider how to leverage emergent distributions to improve predictions of vegetation distributions and carbon cycles in savannas.

## Methods and Materials

Tree clusters were mapped across 10 landscapes in Kruger National Park, South Africa, in April 2012 using the Carnegie Airborne Observatory (CAO)–AToMS System (30). AToMS includes a waveform LiDAR scanner with integrated Global Positioning System-Inertial Measurement Unit (GPS-IMU), which returned 3D positioning and altitude data for the sensor onboard the aircraft, allowing for highly precise and accurate projection of laser ranging measurements on the ground. The GPS-IMU data were combined with the laser ranging data to determine the 3D location of each laser return. LiDAR data were collected from 2,000 m above ground level at 50 kHz laser pulse repetition rate, a 17° half-scan angle and 50% overlap between adjacent flight lines, resulting in LiDAR measurements with 56-cm laser spot spacing. These raw LiDAR returns were then used to derive top-of-canopy and ground digital elevation models (DEMs) (31). Woody canopy height was estimated as the difference between top-of-canopy and ground DEMs.

This woody canopy height map was then thresholded at 3.5 m to yield a presence/absence map of trees; the threshold of 3.5 m was chosen to conservatively accommodate LiDAR accuracy (which erodes strongly below 2 m in height) but also has ecological meaning in savannas, since fires and most herbivores have much lighter impacts on savanna trees above this height. These maps were analyzed using the package “raster” in R (version 3.2.2), with tree clusters identified via the “clump” algorithm with a Queen (Moore) neighborhood. The algorithm is more robust to gaps in tree canopies than the alternative Rook (von Neumann) neighborhood would be. Having identified clusters and estimated their perimeter and area, we then proceeded to fit power law distributions with the package “poweRlaw” (22). This package fits the inverse cumulative distribution function *F*(*x*), defined as:

$$F(x) = \Pr(X \ge x) \propto x^{-(\alpha - 1)}, \qquad x \ge x_{\min},$$

rather than fitting either a statistical distribution to the frequency distribution or, worse, a linear model to binned data, which has been a common approach historically; this approach has been shown theoretically and via testing on distributions with known parameters to yield more robust estimates of power-law distribution parameters (22). This package also extracts an estimate for the minimum cluster size above which a fat-tailed distribution fits data (*x*_{min}); note that here *x* and α correspond to the quantities defined in the main text. Values of α and *x*_{min} were estimated by minimizing the Kolmogorov–Smirnov statistic. To evaluate sensitivity of parameter estimates to samples and sample sizes, we also bootstrapped (100×) parameter estimates to derive 95% confidence intervals.
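The fitting logic described above can be sketched in a few lines. This is a minimal illustration in Python, not the "poweRlaw" R package used in the analysis. It assumes continuous cluster sizes, uses the standard continuous maximum-likelihood estimator for α at each candidate *x*_{min}, and keeps the candidate with the smallest Kolmogorov–Smirnov distance. All function names and the `min_tail` parameter are hypothetical.

```python
import math
import random


def sample_power_law(n, xmin, alpha, rng):
    """Draw n values from a continuous power law by inverse-transform sampling."""
    return [xmin * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0)) for _ in range(n)]


def fit_alpha(tail, xmin):
    """Continuous maximum-likelihood estimate of alpha for values >= xmin."""
    return 1.0 + len(tail) / sum(math.log(x / xmin) for x in tail)


def ks_distance(tail, xmin, alpha):
    """Largest gap between empirical and fitted CDFs (tail sorted ascending)."""
    n = len(tail)
    worst = 0.0
    for i, x in enumerate(tail):
        model_cdf = 1.0 - (x / xmin) ** (1.0 - alpha)
        # The empirical CDF steps from i/n to (i + 1)/n at x; check both sides.
        worst = max(worst, abs(model_cdf - i / n), abs(model_cdf - (i + 1) / n))
    return worst


def fit_power_law(sizes, min_tail=50):
    """Return (xmin, alpha, ks) for the candidate xmin with the smallest KS distance.

    min_tail is an illustrative guard (an assumption, not from the source) that
    stops the scan before the tail becomes too short to estimate alpha.
    """
    data = sorted(sizes)
    best = None
    for i in range(len(data) - min_tail + 1):
        if i > 0 and data[i] == data[i - 1]:
            continue  # same candidate xmin as the previous index
        xmin, tail = data[i], data[i:]
        if tail[-1] == xmin:
            continue  # all remaining values identical: alpha is undefined
        alpha = fit_alpha(tail, xmin)
        ks = ks_distance(tail, xmin, alpha)
        if best is None or ks < best[2]:
            best = (xmin, alpha, ks)
    return best


if __name__ == "__main__":
    # Synthetic cluster sizes with a known exponent, for illustration only.
    demo = sample_power_law(1000, 10.0, 2.72, random.Random(0))
    print(fit_power_law(demo))
```

Bootstrapped confidence intervals, as in the analysis above, could be approximated by resampling the sizes with replacement and calling `fit_power_law` on each resample.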

Perimeter (*P*):area (*A*) relationships were evaluated as the linear relationship between log-transformed perimeter and log-transformed area, yielding an estimate of slope β and the overall relationship

$$\log P = \beta \log A + c, \qquad \text{i.e.,} \qquad P \propto A^{\beta}.$$

The fractal dimension of a set of self-similar clusters (25) is given by $D = 2\beta$. Only clusters larger than the *x*_{min} estimated for each landscape were included for estimating this relationship.
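The cluster identification and perimeter-area steps can be illustrated with a small pure-Python sketch. It is not the "raster" `clump` routine used in the analysis. It labels clusters with a Queen (Moore) neighborhood by breadth-first search, and it assumes that perimeter is the total length of cell edges facing non-tree cells or the map border, a convention the text does not specify. Function names and the `cell` parameter (pixel side length, 0.56 m here) are hypothetical.

```python
import math
from collections import deque

# Queen (Moore) neighborhood: all eight surrounding cells.
QUEEN = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]
# Rook (von Neumann) neighborhood: the four cells sharing an edge.
ROOK = [(-1, 0), (1, 0), (0, -1), (0, 1)]


def cluster_geometry(grid, cell=0.56):
    """Return [(area, perimeter), ...] for each Queen-connected tree cluster."""
    rows, cols = len(grid), len(grid[0])
    seen = [[False] * cols for _ in range(rows)]
    clusters = []
    for r0 in range(rows):
        for c0 in range(cols):
            if not grid[r0][c0] or seen[r0][c0]:
                continue
            seen[r0][c0] = True
            queue = deque([(r0, c0)])
            n_cells = n_edges = 0
            while queue:
                r, c = queue.popleft()
                n_cells += 1
                # Exposed edges: sides facing a non-tree cell or the map border.
                for dr, dc in ROOK:
                    rr, cc = r + dr, c + dc
                    if not (0 <= rr < rows and 0 <= cc < cols) or not grid[rr][cc]:
                        n_edges += 1
                # Grow the cluster through all eight neighbors.
                for dr, dc in QUEEN:
                    rr, cc = r + dr, c + dc
                    if (0 <= rr < rows and 0 <= cc < cols
                            and grid[rr][cc] and not seen[rr][cc]):
                        seen[rr][cc] = True
                        queue.append((rr, cc))
            clusters.append((n_cells * cell ** 2, n_edges * cell))
    return clusters


def loglog_slope(clusters):
    """Least-squares slope (beta) of log(perimeter) on log(area)."""
    xs = [math.log(a) for a, _ in clusters]
    ys = [math.log(p) for _, p in clusters]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx
```

For compact shapes such as squares, perimeter grows as the square root of area, so `loglog_slope` returns 0.5; more tortuous clusters give steeper slopes.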

To evaluate the effects of scale of analysis on estimates of power-law and P:A relationships (Fig. 5), we aggregated rasters from their original resolution by a factor of 1–8, taking average tree cover within each new pixel and defining presence when proportional tree cover was greater than or equal to 0.5. We found that no parameter estimate depended strongly on the scale of analysis, suggesting that results presented in the main text are robust.
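The coarsening step can be sketched as follows. This is a minimal pure-Python illustration, not the raster routine used in the analysis. It averages presence/absence values within each block of `factor` × `factor` pixels and marks the new pixel as a tree when mean cover is at least 0.5. The function name is hypothetical, and the treatment of partial blocks at the map edge (averaging over the cells that exist) is an assumption.

```python
def aggregate_presence(grid, factor, threshold=0.5):
    """Coarsen a 0/1 tree map by an integer factor using mean cover per block."""
    rows, cols = len(grid), len(grid[0])
    coarse = []
    for r0 in range(0, rows, factor):
        coarse_row = []
        for c0 in range(0, cols, factor):
            # Collect the fine cells in this block, clipped at the map edge.
            block = [grid[r][c]
                     for r in range(r0, min(r0 + factor, rows))
                     for c in range(c0, min(c0 + factor, cols))]
            cover = sum(block) / len(block)
            coarse_row.append(1 if cover >= threshold else 0)
        coarse.append(coarse_row)
    return coarse
```

Calling the function with `factor` from 1 to 8 and refitting the distribution and perimeter-area parameters on each output reproduces the structure of the sensitivity check; a factor of 1 returns the original map unchanged.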

Finally, we used a dataset maintained by Kruger National Park management (the Veld Condition Assessment [VCA] dataset; see ref. 6 for a complete analysis and description of these data) for ground-based estimates of individual tree canopy area, which were compared against estimates of the minimum cluster size above which power-law fits apply (i.e., are minimum cluster sizes related to tree size?) (Datasets S1 and S2). In 2008, vegetation structure data were collected at 457 sites throughout Kruger. At each site, tree size (including diameter, height, and maximum canopy diameter) and species were recorded for all trees with height > 3 m within a radius of 5 m of eight points distributed evenly within a 50 m × 60 m plot. For the purposes of this analysis, we then selected the tree with the largest canopy diameter at each plot. To maximize data use, canopy diameter for each landscape was estimated as the mean canopy diameter for all VCA plots within 50 km of the centroid of each large landscape (an area larger than the landscapes included here, but a better balance given sparse field sampling). Canopy area was determined as *A* = π*r*^{2}, with *r* taken as half the maximum canopy diameter. We are clearly overestimating canopy area, however, given that field measurements recorded only maximum canopy diameter; tree size trends with rainfall would thus benefit from further, careful direct examination.

Variation in tree cluster parameters with respect to environmental variables was evaluated using independent maps maintained by Kruger National Park Scientific Services of underlying parent geologic material, mean annual rainfall, and permanent river distributions. Rainfall maps are based on data collected at 22 weather stations that have been continuously monitored since 1989 throughout Kruger (see ref. 6 for more information). Fire frequency estimates are also derived from management-maintained maps; individual fires were mapped on the ground by park managers from 1941 through 2000; subsequent fire scar mapping was done from satellite imagery derived from NASA’s Moderate Resolution Imaging Spectroradiometer instrument.

## Acknowledgments

We thank Juan Bonachela and Sally Archibald for helpful discussion of this work. This collaboration was supported by a grant from the Andrew W. Mellon Foundation, National Science Foundation Division of Mathematical Sciences Grants 1615531 (to A.C.S.) and 1615585 (to S.A.L.), and National Science Foundation Macrosystems Biology Grant 1802453 (to A.C.S.). Airborne LiDAR data collection and processing were made possible by grants from the Andrew W. Mellon Foundation (to G.P.A.). The Carnegie Airborne Observatory has been made possible by grants and donations to G.P.A. from the Avatar Alliance Foundation, Margaret A. Cargill Foundation, David and Lucile Packard Foundation, Gordon and Betty Moore Foundation, Grantham Foundation for the Protection of the Environment, W. M. Keck Foundation, John D. and Catherine T. MacArthur Foundation, Mary Anne Nyburg Baker and G. Leonard Baker Jr., and William R. Hearst III.

## Footnotes

^{1}To whom correspondence may be addressed. Email: carla.staver@yale.edu or irodriguez@ocen.tamu.edu.

Author contributions: A.C.S., G.P.A., I.R.-I., S.A.L., and I.P.J.S. designed research; A.C.S. performed research; G.P.A. contributed new reagents/analytic tools; A.C.S. and G.P.A. analyzed data; and A.C.S., G.P.A., I.R.-I., S.A.L., and I.P.J.S. wrote the paper.

Reviewers: D.E., Massachusetts Institute of Technology; and R.H., University of Georgia.

The authors declare no conflict of interest.

This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1819391116/-/DCSupplemental.

Published under the PNAS license.

## References

1. Staver AC, Archibald S, Levin SA
2.
3. Staver AC
4.
5. Sankaran M, Ratnam J, Hanan N
6. Staver AC, Botha J, Hedin L
7. Lehmann CER, et al.
8. Colgan MS, Asner GP
9.
10. Gillson L
11. Bucini G, Hanan NP
12. Staver AC, Hansen MC
13. Tarnita CE, et al.
14. Rietkerk M, Dekker SC, de Ruiter PC, van de Koppel J
15.
16.
17. Bonachela JA, et al.
18. Scholz FG, et al.
19. Dohn J, Augustine DJ, Hanan NP, Ratnam J, Sankaran M
20. Berdugo M, Kefi S, Soliveres S, Maestre FT
21.
22. Clauset A, Shalizi CR, Newman MEJ
23.
24.
25.
26.
27. Moncrieff GR, et al.
28. Fisher RA, et al.
29.
30. Asner GP, et al.
31.

## Article Classifications

- Physical Sciences
- Environmental Sciences

- Biological Sciences
- Ecology