Species abundance distribution results from a spatial analogy of central limit theorem

Edited by James H. Brown, University of New Mexico, Albuquerque, NM, and approved March 2, 2009 (received for review October 8, 2008)
Abstract
The frequency distribution of species abundances [the species abundance distribution (SAD)] is considered to be a fundamental characteristic of community structure. It is almost invariably strongly right-skewed, with most species being rare. There has been much debate as to its exact properties and the processes from which it results. Here, we contend that an SAD for a study plot must be viewed as spliced from the SADs of many smaller nonoverlapping subplots covering that plot. We show that this splicing, if applied repeatedly to produce subplots of progressively larger size, leads to the observed shape of the SAD for the whole plot regardless of the shape of the SADs of those subplots. The widely reported shape of an SAD is thus likely to be driven by a spatial parallel of the central limit theorem, a statistically convergent process through which the SAD arises from small to large scales. Exact properties of the SAD are driven by species spatial turnover and the spatial autocorrelation of abundances, and can be predicted using this information. The theory therefore provides a direct link between SADs and the spatial correlation structure of species distributions, and thus between several fundamental descriptors of community structure. Moreover, the statistical process described may lie behind similar frequency distributions observed in many other scientific fields.
Many models have been proposed to explain the general observation that the majority of species are rare, and to predict the major properties of the species abundance distribution (SAD) (1). Some assume a particular biological process, such as sequential niche division among species (2), stochastic population dynamics driven by simple rules and constraints (3, 4), or spatial rules imposed on species geographic distributions (5–7). These models can produce quite realistic SADs, often close to lognormal distributions. However, the ubiquity of the SAD pattern (i.e., its independence from the particular taxon and from other biological settings) indicates that the processes responsible are much more general, and perhaps of a statistical rather than a biological nature (7). Indeed, similar patterns have also been observed in many nonbiological systems (8).
It has been suggested (9) that the approximately lognormal shape of the SAD might result from a random multiplicative process acting on abundances (i.e., an additive process acting on their logarithms). Purely multiplicative processes cannot, however, be responsible for the multiple SADs that are observed simultaneously at several spatial scales (10). The reason is that the SAD of an assemblage on a study plot (whose bounds may be arbitrary or ecological) is necessarily spliced from the SADs of subassemblages occurring in nonoverlapping subplots covering that plot (6, 11, 12). Because abundances for the whole plot arise by summing the abundances of the subassemblages across all of the subplots, an additive process acting on abundances must also play a role. In fact, many models of the SAD explicitly or implicitly comprise additive processes (4, 13, 14). However, this has never been clearly identified as the major mechanism responsible for the shape of the SAD. Here, we show that the additive process itself holds the key to understanding the universally reported shape of SADs, regardless of any model-specific dynamics.
Suppose that the SAD for an assemblage on a plot (SADp) is composed of those of the subassemblages on nonoverlapping subplots (SADsp). We can ask how the properties of the SADp depend on those of the SADsp, and to what extent the SADp is determined solely by the process through which it arises. We explore the possibility that the SADp is independent of the SADsp for the smallest subplots (initial-SADsp), because the statistical process giving rise to the SADp outweighs the contribution of the particular initial-SADsp. This situation would resemble the process that lies behind the central limit theorem (CLT) [introduced in 1733 by de Moivre and proved in 1901 by Lyapunov (15)]. According to the CLT, the normal (Gaussian) distribution arises from the addition of many mutually independent variables with finite variances, regardless of their original distribution.
The process through which an SADp arises by being spliced (see Materials and Methods) from many initial-SADsp is, however, different, because it is necessarily spatially determined. This means that the abundances of a species in two adjacent subplots to be joined are dependent on each other, and that some species are missing from some subplots. At each recurrent step, the SAD then arises by summing the pairs of abundances of species common to both joined subplots and appending the abundances of species missing from one of the subplots. The resulting distribution is thus shaped by the spatial correlation structure, as expressed by species spatial turnover and the spatial autocorrelation of abundances. Positive correlation between the abundances of a given species in neighbouring subplots elongates the right-hand tail of the SAD, because a high abundance in one subplot is likely to be added to a similarly high abundance in another. This elongation occurs even if abundances are uncorrelated (because abundances are positive, only the right-hand tail can grow), but the stronger the autocorrelation, the faster the tail grows, regardless of the exact nature of that autocorrelation (Fig. 1). Species spatial turnover, in turn, adds species that occur in only one of the two joined subplots, which produces a prevalence of rare species in the spliced SAD. Combined, these two effects lead to a right-skewed abundance distribution.
Results
We simulated the process described above (for details, see Materials and Methods), varying its inputs in terms of the shape of the initial-SADsp, and using observed levels of species spatial turnover (measured as the proportion of species common to both neighbouring subplots, i.e., the Jaccard index, J) and of spatial autocorrelation of abundances (measured by Pearson's correlation coefficient, r) (see Materials and Methods). We proceeded step by step, splicing pairs of neighbouring initial-SADsp in the first step, then splicing pairs of neighbouring SADsp that resulted from the first step in the second step, and so on, up to the SADp of the whole plot.
Three different simulation experiments were performed, each beginning with a differently shaped initial-SADsp (left-skewed, regular, right-skewed). We checked whether all of the simulations converged to a particular shape of the distribution and whether this shape was the same regardless of the initial-SADsp, and finally compared the resulting distributions from each of the three simulations with the observed SADp of two well-resolved spatial datasets. These comprised (i) trees within a tropical study plot on Barro Colorado Island (refs. 16 and 17 and http://ctfs.si.edu/datasets/bci) (see Materials and Methods), and (ii) central European birds mapped on a transect through the whole of the Czech Republic (7) (see Materials and Methods). All of the observed and simulated SADp and SADsp to be compared were standardized to the same mean abundance, i.e., $a_{\mathrm{st}} = a/\bar{a}$, where $a_{\mathrm{st}}$ is the standardized abundance, $a$ is a raw abundance, and $\bar{a}$ is the mean abundance, and were veiled by the minimum observed values. The SADsp to be spliced were neither standardized nor veiled.
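The standardization and veiling applied before comparison can be sketched in Python as follows. This is a minimal illustration rather than the authors' code: the function names are hypothetical, and we assume that "veiling" means discarding simulated values that fall below the smallest standardized abundance in the observed data, applied after standardization; the text does not spell out the exact rule.

```python
def standardize(abundances):
    """Return a_st = a / mean(a) for every abundance a."""
    mean = sum(abundances) / len(abundances)
    return [a / mean for a in abundances]


def veil(values, min_observed):
    """Drop values below the smallest observed (standardized) abundance.

    Assumption: this is our reading of 'veiled by minimum observed values'.
    """
    return [v for v in values if v >= min_observed]


# Usage with made-up numbers: prepare a simulated SADp for comparison.
observed = [1, 2, 2, 5, 40]
simulated = [0.2, 1.0, 3.0, 8.0, 60.0]
obs_st = standardize(observed)
sim_st = veil(standardize(simulated), min(obs_st))
```

Veiling mimics the fact that abundances below the smallest observed value would not appear in the field data, so the simulated and observed distributions are compared over the same range.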
Both datasets revealed close similarity to the respective SADsp resulting from the convergent processes (Figs. 2 and 3). None of (i) a rank plot (Fig. 2, bottom row, and Fig. 3C), (ii) the maximum distance between the simulated and observed cumulative distributions [the Kolmogorov–Smirnov statistic (KS)] (see SI Appendix, Fig. S1), or (iii) the skewness of the SADp of log-transformed abundances (Fig. 3 A and B) revealed any disagreement between the observations and the SADp resulting from the simulated splicing from the 200th step on. Visually, the simulations followed the usually reported shape (i.e., a sigmoid and almost symmetric rank–log-abundance plot) from the 20th step on (for steps 50 and 100, see Fig. 2, second and third rows). A nonparametric DKW test (18) based on the KS statistic could not reject agreement between the modeled and observed SADp in either dataset from step 200 on, whereas for the earlier steps the agreement was rejected at P < 0.01 (see Materials and Methods). The difference between the SADp for tropical trees and central European birds (Fig. 3C) was accurately predicted by the differences in species spatial turnover, J, and spatial autocorrelation of abundances, r. The probabilistic process of splicing SADsp in neighbouring subplots, as modeled by our simulations, thus represents a realistic mechanism for the emergence of observed SADs.
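Two of the summaries used above, the maximum distance between cumulative distributions (KS) and the skewness of log-transformed abundances, can be computed as in the following Python sketch. The function names are ours, and we assume the plain moment estimator of skewness ($m_3/m_2^{3/2}$), since the text does not state which estimator was used. Skewness is scale invariant, so the base of the logarithm does not matter.

```python
import bisect
import math


def ks_statistic(sample1, sample2):
    """Two-sample Kolmogorov-Smirnov statistic: max |F1(x) - F2(x)|.

    The empirical CDFs are step functions, so the maximum is attained
    at one of the pooled sample values.
    """
    s1, s2 = sorted(sample1), sorted(sample2)
    n1, n2 = len(s1), len(s2)
    d = 0.0
    for x in s1 + s2:
        f1 = bisect.bisect_right(s1, x) / n1
        f2 = bisect.bisect_right(s2, x) / n2
        d = max(d, abs(f1 - f2))
    return d


def log_skewness(abundances):
    """Moment skewness m3 / m2**1.5 of log-transformed abundances."""
    logs = [math.log(a) for a in abundances]
    n = len(logs)
    mean = sum(logs) / n
    m2 = sum((v - mean) ** 2 for v in logs) / n
    m3 = sum((v - mean) ** 3 for v in logs) / n
    return m3 / m2 ** 1.5
```

A log-skewness near zero corresponds to the "almost symmetric" rank–log-abundance shape described above; for example, `log_skewness([1, 10, 100])` is zero up to rounding.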
Discussion
We have demonstrated a universal principle that inevitably applies when summing variables that are irregularly distributed in space or time, and that therefore inevitably affects the SAD. This principle is similar to the CLT, which states that sums of the same numbers of mutually independent variables approach a bell-shaped distribution. We argue that sums of various numbers of mutually independent or dependent positive variables approach a right-skewed distribution that is more or less symmetric on a logarithmic scale. The crucial difference between the CLT and our principle, i.e., “various numbers of variables,” corresponds to the fact that some species are missing from some samples, whereas the potential mutual dependence of variables corresponds to the spatial intraspecific correlation between abundances in two adjacent plots. The mutual dependence is not, however, necessary, because it determines only how heavy the right tail of the distribution is (Fig. 1). Applying this simple principle to abundance data yields realistic SADs. Because missing observations (either truly missing or missing because of the limitations of the observation method) and/or their mutual dependence are rather common across all fields of science, we would not be surprised if this principle governed many other asymmetric distributions observed there (8).
The fit of our prediction was obtained under the simplifying assumption that both spatial parameters are constant over all steps (i.e., all spatial scales). This is clearly unrealistic, because at least spatial turnover has been reported to be scale dependent (19). However, by parameterizing the process with measurements extracted from the whole plot, we set the process using the parameters that are crucial for the final convergent stage. When only a small part of the transect data is considered, we should not assume that the observed SAD has already converged, but we might still expect agreement between the observed abundance distribution and the simulated SADsp at some particular step of the process. That is exactly what happened for all of the initial-SADsp and, surprisingly, for various parameter settings (see Materials and Methods) (Fig. 4 and SI Appendix, Fig. S2). The process is thus so pervasive that it predicts the observed shape, whatever the initial-SADsp, even for smaller areas whose SADsp do not represent a completely convergent stage.
This universal principle explains why so many of the models that have been proposed (1) produce quite realistic SADs. All of the spatial models include species spatial turnover, and most of them include spatial autocorrelation. The various mechanisms then merely tune their exact values to fit a model to data. For instance, manipulating the proportion of newly arriving individuals (13, 20) or the proportion of newly established species (21) effectively leads to specific levels of species turnover and spatial autocorrelation, so it is not surprising that doing so affects the shapes of the resulting SADs. Many similar processes effectively produce species turnover at several spatial scales, which is, according to our theory, the proximate driver of observed SADs.
Importantly, we need not assume that the SADs of real assemblages have actually emerged through a large number of steps of the process described above. However, we argue that this process encapsulates the major feature of the emergence of observed SADp, namely the splicing of SADsp in neighbouring subplots. In reality, the spatial scale of the initial-SADsp may correspond to the spatial requirements of an individual, i.e., the home range of an animal or the space required by a plant. The shape of such an initial-SADsp may be driven by that of the species–body size distribution (22), and thus may be much less extreme (i.e., closer to the shape resulting from the convergent process) than those used in our simulations. The process might therefore require far fewer steps to reach full convergence.
Another possibility is that an SAD really originates from many steps of splicing, starting with initial-SADsp for extremely small patches. The “abundance” of a species in these small patches would then be represented by its probability of occurrence, and the “true” SAD would be the frequency distribution of these probabilities. Because the probability of occurrence corresponds to the reciprocal of the size of a species' home range, the SADs might still be linked with the species–body size distribution. Both interpretations of the initial-SADsp could link our theory with the factors that affect the landscape properties enabling species coexistence (productivity and habitat complexity) and with species' energetic and resource requirements at very local scales. According to our theory, biologically important processes take place only at the very local scale, whereas the patterns observed at large scales are dominated by a statistical process. The theory thus has the potential to separate statistical and biological effects. Importantly, we do not need to assume any particular “fundamental” scale (comprising the initial-SADsp) from which the patterns at other scales are derived; given a sufficient number of splicing steps, the convergent process leads to the observed SAD shape regardless of the scale we begin with.
Our approach comprises purely bottom-up processes leading from SADs at local scales to a convergent SAD at large spatial scales. This approach contrasts with top-down attempts to derive particular shapes of SADs by spatially sampling a given regional SAD (23), and with the prevailing macroecological view of regional patterns as determining local ones (24). Biologically relevant processes may actually act at regional scales, or rather at many interacting scales. Even then, the purely statistical bottom-up process we describe has in most cases an overwhelming influence on the shapes of regional SADs, because it acts whenever there are local distributions (of any shape) and nonzero spatial species turnover between subplots.
Our theory provides a direct link between SADs on the one hand and species spatial turnover and autocorrelation on the other, i.e., between several fundamental descriptors of community structure. Many such links have already been determined (7), and mathematical connections to other macroecological patterns (e.g., the species–area relationship) have been demonstrated (25). Here, we have shown that abundance patterns can be derived from three assumptions: (i) that most species do not occur everywhere, (ii) that species abundances are positive (a trivial but critical detail), and (iii) that these abundances are spatially autocorrelated. These assumptions represent quite universal biological observations, so it is understandable that they universally lead to the observed shape of the SAD.
According to our theory, the approximately lognormal shape of SADs, universally found in species assemblages, is a consequence of a purely statistical limiting process parameterized by species spatial turnover. The exact parameters of each particular SAD are then given by the structure of species' spatial distributions, and an SAD thus reflects the spatial distribution of habitats and (meta)population and metacommunity dynamics. Therefore, as in the case of other macroecological patterns (7), the overall shape reflects a universal statistical process, but the details and particular parameters reveal biology and can bring important information about the structure and dynamics of ecological communities.
Materials and Methods
Splicing.
This is a newly introduced term for an operation on probability distribution functions that comprises summing and concatenating (appending) mutually dependent variables; the standard term “convolution” refers only to the summation of (mutually independent) variables. The analytical expansion of splicing is
$$f_1 \text{ spliced with } f_2 = \pi_1 f_1 + \pi_2 f_2 + J\,(f_1 *_c f_2),$$
where $\pi_1$ and $\pi_2$ are the proportions of species occurring only in the first and only in the second subplot, respectively, $J$ is the proportion of species common to both, $\pi_1 + \pi_2 + J = 1$, and $*_c$ denotes a correlated convolution.
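For intuition, the splicing formula can be evaluated numerically in the special case of uncorrelated abundances, where the correlated convolution $*_c$ reduces to the ordinary convolution of independent variables. The Python sketch below makes that simplifying assumption (so it does not capture the spatial autocorrelation that $*_c$ encodes) and represents each distribution as a list of probabilities indexed by integer abundance; the function names are hypothetical.

```python
def convolve(f1, f2):
    """Ordinary convolution of two pmfs indexed by integer abundance."""
    out = [0.0] * (len(f1) + len(f2) - 1)
    for i, p in enumerate(f1):
        for j, q in enumerate(f2):
            out[i + j] += p * q
    return out


def splice_independent(f1, f2, pi1, pi2, J):
    """pi1*f1 + pi2*f2 + J*(f1 conv f2) for uncorrelated abundances.

    Assumption: *_c is replaced by the ordinary convolution, which is
    valid only when abundances in the two subplots are independent.
    """
    if abs(pi1 + pi2 + J - 1.0) > 1e-12:
        raise ValueError("pi1 + pi2 + J must equal 1")
    out = [J * v for v in convolve(f1, f2)]
    for i, p in enumerate(f1):
        out[i] += pi1 * p  # species found only in subplot 1
    for i, q in enumerate(f2):
        out[i] += pi2 * q  # species found only in subplot 2
    return out


# Abundance 1 or 2 with equal probability; half of the species shared.
f = [0.0, 0.5, 0.5]
spliced = splice_independent(f, f, 0.25, 0.25, 0.5)
# spliced == [0.0, 0.25, 0.375, 0.25, 0.125]
```

Even in this toy case the shared species push probability mass to higher abundances (3 and 4), while the unshared species keep mass at the low end, which is the combination of tail growth and rare-species prevalence described in the Introduction.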
Simulation.
The simulation was a step-by-step process. Each step had three inputs, (i) a pair of identical distributions, each given by S positive real numbers; (ii) the Jaccard index, J; and (iii) a pair of real numbers $\{\sigma_{\min}; \sigma_{\max}\}$, which set the spatial autocorrelation of abundances, and one output, a distribution given by S positive real numbers. Each step consisted of (i) drawing two sets of $S \times J$ abundances (those of species common to the two subplots) from the input distributions; (ii) forming random pairs $\{a_1, a_2\}$ of these abundances such that $\sigma_{\min} a_1 \le a_2 \le \sigma_{\max} a_1$ (if the inequality cannot be met, the $a_2$ nearest to the bounds $\sigma_{\min} a_1$ and $\sigma_{\max} a_1$ is assigned to $a_1$) and appending $a_1 + a_2$ to the output distribution; and (iii) drawing $S \times (1 - J)$ abundances (those of species that occur in only one of the two subplots) from an input distribution and appending them to the output distribution. The parameter S = 5,000. Note that drawing from a distribution given by a set of particular values does not mean that only those values can be drawn. (For the procedure and a picture guide, see SI Appendix, Guide and Procedures.) For a utility to run the procedure, see www.cts.cuni.cz/wiki/ecology:start.
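The step described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' utility: all function names are hypothetical; we assume that drawing from a distribution given by a set of values means inverse-CDF sampling with linear interpolation between the sorted values (so draws are not restricted to those values, as the text notes); and we assume that each $a_1$ is paired with a randomly chosen, not-yet-used $a_2$ that satisfies the bounds, falling back to the nearest unused $a_2$ otherwise.

```python
import bisect
import random


def draw(values, k, rng):
    """Draw k values from the distribution given by `values`.

    Assumption: inverse-CDF sampling with linear interpolation between
    the sorted values, so draws are not limited to the listed values.
    """
    xs = sorted(values)
    n = len(xs)
    out = []
    for _ in range(k):
        pos = rng.random() * (n - 1)
        i = int(pos)
        j = min(i + 1, n - 1)
        out.append(xs[i] + (pos - i) * (xs[j] - xs[i]))
    return out


def splice_step(values, J, sigma_min, sigma_max, rng):
    """One splicing step: S input abundances -> S output abundances."""
    S = len(values)
    n_common = round(S * J)
    a1s = draw(values, n_common, rng)
    rng.shuffle(a1s)
    unused = sorted(draw(values, n_common, rng))  # candidate partners a2
    out = []
    for a1 in a1s:
        lo, hi = sigma_min * a1, sigma_max * a1
        i = bisect.bisect_left(unused, lo)
        j = bisect.bisect_right(unused, hi)
        if i < j:
            k = rng.randrange(i, j)  # random a2 with lo <= a2 <= hi
        elif i == 0:
            k = 0  # every unused a2 exceeds hi: take the smallest
        elif i == len(unused):
            k = i - 1  # every unused a2 is below lo: take the largest
        else:
            # nearest of the closest value below lo and the closest above hi
            k = i - 1 if lo - unused[i - 1] <= unused[i] - hi else i
        out.append(a1 + unused.pop(k))
    # species present in only one of the two subplots
    out.extend(draw(values, S - n_common, rng))
    return out


def splice(initial, J, sigma_min, sigma_max, steps, seed=0):
    """Apply `steps` splicing steps to an initial-SADsp."""
    rng = random.Random(seed)
    values = list(initial)
    for _ in range(steps):
        values = splice_step(values, J, sigma_min, sigma_max, rng)
    return values
```

A call such as `splice(initial, 0.881, 0.9, 1.11, steps=200)` with 5,000 initial values mirrors the settings listed below for Figs. 2 and 3A; the output could then be standardized and compared with an observed SADp. Because shared species contribute sums, the mean abundance grows by roughly a factor of 1 + J per step, which is harmless here since the comparison is made after standardization.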
Extracting of the Parameters.
$J = S_{\mathrm{com}}/S_{\mathrm{tot}}$, where $S_{\mathrm{com}}$ is the number of species common to the two halves (here, East and West) of the observed plot, and $S_{\mathrm{tot}}$ is the number of species within the whole observed plot. The values $\sigma_{\min}$ and $\sigma_{\max}$ were chosen empirically so that the simulations reproduced the observed r, where r is Pearson's correlation coefficient between the abundances in the two halves of the observed region; species occurring in only one half were excluded. This applies to both datasets.
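The two observed parameters can be extracted from a plot split into halves as in the following Python sketch. The input format (one dictionary per half, mapping species to abundance, with absent species simply omitted) and the function name are assumptions for illustration; the subsequent empirical search for $\sigma_{\min}$ and $\sigma_{\max}$ that reproduces r is not shown.

```python
import math


def turnover_and_correlation(east, west):
    """Return (J, r) for the two halves of a plot.

    east, west: hypothetical dicts mapping species -> abundance (> 0).
    J = S_com / S_tot; r is Pearson's correlation coefficient computed
    only over species present in both halves.
    """
    common = sorted(set(east) & set(west))
    total = set(east) | set(west)
    J = len(common) / len(total)
    x = [east[s] for s in common]
    y = [west[s] for s in common]
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    r = sxy / math.sqrt(sxx * syy)
    return J, r


# Toy example: species A and B are shared, C and D occur in one half only.
east = {"A": 10, "B": 5, "C": 1}
west = {"A": 8, "B": 6, "D": 2}
J, r = turnover_and_correlation(east, west)
```

Here J = 2/4 = 0.5, and r = 1 because the two shared species are ranked identically in both halves (with only two shared species, r can only be ±1).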
BCI 1983 Data.
Data on 307 tropical tree species from the 50-ha plot on Barro Colorado Island, Panama; all dead trees and trees labeled “which not yet entered census” were excluded.
Transect (April–June) 2004–2005 data.
Data on 144 temperate bird species censused within a 150-m radius around each of 768 points along a linear East–West transect in south Bohemia and Moravia; points were separated by 300 to 500 m.
Test.
We used a test based on the Dvoretzky–Kiefer–Wolfowitz (DKW) inequality, $P(\mathrm{KS} > \varepsilon) \le 2e^{-2n\varepsilon^{2}}$ for $\varepsilon > 0$, where P is the probability that KS exceeds ε by chance and n is the number of samples from the tested distribution. When both the assumed and the tested distributions are given by samples, as is the case here, the inequality is an even stronger criterion. From step 200 on, KS takes values of 0.07 and 0.1 for the tropical tree and central European bird data, respectively. Rejecting the agreement between data and simulation at these values would require significance levels of P > 0.09 (n = 307) and P > 0.1 (n = 144), respectively. In contrast, the values KS > 0.14 that hold for all steps < 50 in both cases are easily rejected at P ≪ 0.01. For Fig. 4, KS < 0.1, and rejection would require P > 0.37 (n = 84).
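The DKW bound is simple enough to evaluate directly. The Python sketch below (function name hypothetical) approximately reproduces the rejection levels quoted above from the reported KS values and sample sizes.

```python
import math


def dkw_bound(n, ks):
    """Upper bound on P(KS > ks) from the DKW inequality, 2*exp(-2*n*ks**2).

    The raw bound exceeds 1 when n*ks**2 is small, so it is capped at 1.
    """
    return min(1.0, 2.0 * math.exp(-2.0 * n * ks ** 2))


# KS values and sample sizes quoted in the text.
print(round(dkw_bound(307, 0.07), 3))  # tropical trees, about 0.099
print(round(dkw_bound(144, 0.10), 3))  # central European birds, about 0.112
print(round(dkw_bound(84, 0.10), 3))   # Fig. 4, about 0.373
```

Agreement could be rejected only at a significance level above these bounds, which is why the fits from step 200 on are not rejected at conventional levels.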
Settings.
Fig. 1: full bars J = 60%, empty bars J = 90%, regular initial-SAD, histograms show stages 450–500; Fig. 1 A and C: $\{\sigma_{\min}; \sigma_{\max}\} = \{0; 10^{99}\}$, which produces $\bar{r} \approx 0$; Fig. 1 B and D: $\{\sigma_{\min}; \sigma_{\max}\} = \{0.9; 1.1\}$, which produces $\bar{r} \approx 0.953$; Figs. 2 and 3A: J = 88.1%, $\{\sigma_{\min}; \sigma_{\max}\} = \{0.9; 1.11\}$, which produces $\bar{r} \approx 0.95$ (observed values: J = 88.1%, $\bar{r}$ = 0.97); Fig. 3 B and C: J = 77%, $\{\sigma_{\min}; \sigma_{\max}\} = \{0.5; 1.7\}$, which produces $\bar{r} \approx 0.84$ (observed values: J = 76.4%, $\bar{r} \approx 0.81$); Fig. 4 A and C: J = 70%, $\{\sigma_{\min}; \sigma_{\max}\} = \{0.9; 1.1\}$, $\bar{r} \approx 0.95$; Fig. 4 B and D: J = 70%, $\{\sigma_{\min}; \sigma_{\max}\} = \{0.3; 100\}$, $\bar{r} \approx 0.195$.
Acknowledgments
We thank Tomáš Herben, Petr Keil, and Ethan White for valuable comments on the text. This work was supported by Marie Curie Fellowship 039576RTBPEIF (to A.L.Š.); Czech Ministry of Education (Grants LC06073 and MSM0021620845); Grant Agency of the Academy of Sciences of the Czech Republic (Grant IAA601970801); and a Royal Society–Wolfson Research Merit Award (to K.J.G.). The Barro Colorado Island forest dynamics research project is supported by National Science Foundation (Grants DEB0640386, DEB0425651, DEB0346488, DEB0129874, DEB00753102, DEB9909347, DEB9615226, DEB9615226, DEB9405933, DEB9221033, DEB9100058, DEB8906869, DEB8605042, DEB8206992, and DEB7922197 to Stephen P. Hubbell); the Center for Tropical Forest Science; the Smithsonian Tropical Research Institute; the John D. and Catherine T. MacArthur Foundation; the Mellon Foundation; the Celera Foundation; numerous private individuals; and the hard work of over 100 people from 10 countries over the past 2 decades. The plot project is part of the Center for Tropical Forest Science, a global network of large-scale demographic tree plots.
Footnotes
 ^{1}To whom correspondence should be addressed. Email: sizling{at}cts.cuni.cz

Author contributions: A.L.S., D.S., E.S., and J.R. analyzed data; and A.L.S., D.S., E.S., J.R., and K.J.G. wrote the paper.

The authors declare no conflict of interest.

This article is a PNAS Direct Submission.

This article contains supporting information online at www.pnas.org/cgi/content/full/0810096106/DCSupplemental.
References
2. Tokeshi M
4. Hubbell SP
5. Harte J, Kinzig AP, Green JL
6. Šizling AL, Storch D. In: Storch D, Marquet PA, Brown JH, eds.
9. May RM. In: Cody ML, Diamond JM, eds.
11. Šizling AL, Storch D, Reif J, Gaston KJ
12. Storch D, Šizling AL
14. Marquet PA, Keymer JE, Cofré H. In: Blackburn TM, Gaston KJ, eds.
15. Kallenberg O
16. Condit R
17. Hubbell SP, et al.
18. Wasserman LA
19. Gaston KJ, Evans KL, Lennon JJ. In: Storch D, Marquet PA, Brown JH, eds.
Classification: Physical Sciences (Statistics); Biological Sciences