Phanerozoic marine biodiversity dynamics in light of the incompleteness of the fossil record

February 13, 2006
103 (8) 2736-2739


Long-term evolutionary dynamics have been approached through quantitative analysis of the fossil record, but without explicitly taking its incompleteness into account. Here we explore the temporal covariance structure of per-genus origination and extinction rates for global marine fossil genera throughout the Phanerozoic, both before and after corrections for the incompleteness of the fossil record. Using uncorrected data based on Sepkoski’s compendium, we find significant autocovariance within origination and extinction rates, as well as covariance between extinction and origination, not one, but two, intervals later, corroborating evidence for the unexplained temporal gap found by past studies. However, these effects vanish when the data are corrected for the incompleteness of the fossil record. Instead, we observe significant covariance only between extinction and origination in the immediately following intervals. The gap in the response of the biosphere to extinction in the uncorrected fossil record thus appears to be an artifact of the incompleteness of the fossil record, specifically due to episodic variation in the probability that taxa will be preserved, on time scales comparable to the temporal resolution of Sepkoski’s data. Our results also indicate that at that temporal resolution (the stage/substage of duration ≈5 million years), changes in origination and extinction do not persist for longer than one interval, except that elevated origination rates immediately after extinction may last for more than a single interval. Thus, although certain individual cases may deviate from the overall pattern, we find that in general the biosphere’s response to perturbation is immediate geologically and usually short-lived.
Sepkoski’s compendia of marine fossil families (1) and genera (2, 3) have been central to the analysis of the long-term patterns of origination, extinction, and overall diversity of marine animals throughout the Phanerozoic. Many analyses have looked for temporal correlations between rates of origination, extinction, and total biodiversity in stages and substages of the Phanerozoic (4–9). Of special interest has been the discovery of the delayed recovery of diversity after times of unusually high extinction (5, 6), as well as the observation that times of high extinction and origination tend to persist for more than a substage (5–7). These results suggest that the causes of elevated rates of origination or extinction persist for many millions of years, that the biosphere has some intrinsic limits on how fast it can recover from extinction (6), or both.
However, as all of these studies have recognized, Sepkoski’s raw data do not include any corrections for the incompleteness of the fossil record. Importantly, there is growing evidence that this incompleteness, and its temporal variation, can strongly affect perceived patterns of origination, extinction, and biodiversity in the fossil record (10–12). Although some analyses have been adjusted in an attempt to account for incompleteness (e.g., ref. 6), none has performed a basic time-series analysis on a data set in which the observed rates of origination and extinction derived from Sepkoski’s compendium of marine genera have been quantitatively corrected for incompleteness using geological and paleontological first principles. Foote has recently presented such a data set (13), and its differences from Sepkoski’s raw data are striking. Specifically, Foote noted that the measured peaks of origination and extinction in the corrected data are far more volatile than in Sepkoski’s uncorrected data (representative time series are presented in Fig. 1), and he cautioned that between-stage correlations in rates of origination and extinction may be artifacts created by the incompleteness of the fossil record.
Fig. 1.
Per-genus rates of (a) origination $O_i$ and (b) extinction $E_i$ as a function of time. Uncorrected data (Upper) and data corrected for the incompleteness of the fossil record (Lower) are presented. Diversity has been counted using the boundary-crosser method, assuming pulsed turnover (13). Different geological periods are indicated in standard colors; average rates are indicated by dotted lines.

Analytic Approach

Here we examine how the previously reported correlations of between-stage rates of origination and extinction (5, 6, 8) are affected when Foote’s corrected data are subjected to time-series analysis. Specifically, we use vector autoregression (VAR) analysis (14) to examine the temporal covariance structure between per-genus origination and extinction rates of Phanerozoic marine genera, for both Sepkoski’s raw data and the data corrected by Foote for the incompleteness of the fossil record (13).
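For each $i$th interval of duration $t_i$ in the time-series data (Fig. 1), we calculated the covariances of the per-genus rates of origination $O_i$ and extinction $E_i$ with the originations $O_{i-n}$ and extinctions $E_{i-n}$ in the $n$ previous intervals, denoted VAR($n$). Combining originations and extinctions into the two-dimensional vector $x_i = (O_i, E_i)$, the VAR(1) model is $x_i = \Phi_1 x_{i-1} + \varepsilon_i$, where $\Phi_1$ is the 2 × 2 matrix of normalized covariance coefficients relating to origination and extinction one interval previous, $\varepsilon_i$ represents serially uncorrelated random noise, and intercepts have been omitted to simplify notation. The VAR(2) model extends the analysis to two lags and is defined as $x_i = \Phi_1 x_{i-1} + \Phi_2 x_{i-2} + \varepsilon_i$; explicitly:

$$
\begin{pmatrix} O_i \\ E_i \end{pmatrix}
=
\begin{pmatrix} \Phi_1^{oo} & \Phi_1^{eo} \\ \Phi_1^{oe} & \Phi_1^{ee} \end{pmatrix}
\begin{pmatrix} O_{i-1} \\ E_{i-1} \end{pmatrix}
+
\begin{pmatrix} \Phi_2^{oo} & \Phi_2^{eo} \\ \Phi_2^{oe} & \Phi_2^{ee} \end{pmatrix}
\begin{pmatrix} O_{i-2} \\ E_{i-2} \end{pmatrix}
+
\begin{pmatrix} \varepsilon_i^{o} \\ \varepsilon_i^{e} \end{pmatrix}. \tag{1}
$$

In each superscript, the first letter denotes the earlier (lagged) rate and the second the later one; for example, $\Phi_2^{eo}$ measures the effect of extinction two intervals previous on present origination.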
Previous analyses (4, 6, 8) determined autocorrelations and cross-correlations, each separately normalized to the range from −1 to +1, which precludes direct numerical comparison between the magnitude of an autocorrelation within a time series and that of a cross-correlation with another. Here the multidimensional VAR technique (14) determines the relative contributions of the two simultaneously, but it also uses a different normalization: autocovariance and covariance coefficients cannot both be constrained to the same finite range if their relative magnitudes are not bounded. Instead, the VAR coefficients are normalized by the inverse of the variance, which in our two-dimensional system constrains the autocovariance coefficients (diagonal elements of $\Phi$) to the range from −1 to +1, the same as for the more familiar autocorrelation (the autocovariance of a time series of one-dimensional scalars). An origination autocovariance coefficient $\Phi_1^{oo} = 0.5 \pm 0.2$ implies that, holding extinction at its average value, if origination in the previous time interval is 1% above average, origination will be 0.5 ± 0.2% above average in the present interval. By contrast, the covariance coefficients (off-diagonal elements of $\Phi$) are determined relative to the normalized autocovariance coefficients; they are therefore numerically unconstrained and, in particular, can exceed +1. Although these covariance coefficients cannot easily be mapped to cross-correlation coefficients, their quantitative interpretation is nonetheless straightforward. For example, a covariance coefficient $\Phi_1^{oe} = 1.5 \pm 0.5$ implies that, holding extinction at its average value, if origination in the previous time interval is 1% above average, extinction will be 1.5 ± 0.5% above average in the present interval.
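A minimal sketch of how VAR(2) coefficients of this kind could be estimated, assuming plain equation-by-equation ordinary least squares with an intercept (the standard estimator for an unrestricted VAR; the authors' exact estimation settings are not given in this section). The function name `fit_var2` is hypothetical. Rows of each returned matrix index the present-interval rate (origination first, extinction second) and columns index the lagged rate, so `Phi1[0, 1]` plays the role of the extinction → origination coefficient $\Phi_1^{eo}$ and `Phi1[1, 0]` that of the origination → extinction coefficient $\Phi_1^{oe}$.

```python
import numpy as np


def fit_var2(O, E):
    """Fit x_i = c + Phi1 @ x_{i-1} + Phi2 @ x_{i-2} + eps_i by OLS (sketch).

    O, E: 1-D sequences of per-genus origination and extinction rates,
    ordered from the oldest to the youngest interval.
    Returns (c, Phi1, Phi2). Row 0 is the origination equation, row 1 the
    extinction equation; column 0 is lagged origination, column 1 lagged extinction.
    """
    X = np.column_stack([O, E]).astype(float)  # x_i = (O_i, E_i)
    n = len(X)
    # Regressors for interval i: intercept, O_{i-1}, E_{i-1}, O_{i-2}, E_{i-2}
    Z = np.column_stack([np.ones(n - 2), X[1:-1], X[:-2]])
    Y = X[2:]  # responses (O_i, E_i) for i = 2 .. n-1
    # Each column of B holds the coefficients of one equation
    B, *_ = np.linalg.lstsq(Z, Y, rcond=None)
    c = B[0]
    Phi1 = B[1:3].T
    Phi2 = B[3:5].T
    return c, Phi1, Phi2
```

For example, after `c, Phi1, Phi2 = fit_var2(O, E)`, the entry `Phi2[0, 1]` is the analogue of the two-interval extinction → origination coefficient, the quantity that most clearly separates the uncorrected and corrected data sets in Table 1.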


We analyzed several published tabulations of per-genus rates of origination and extinction of Phanerozoic genera. The uncorrected raw binned data come from the compendium of fossil genera published by Sepkoski (3) and have been used in previous studies (6–8). We used the 101 time intervals running from the Cambrian (543 million years ago) to the present, and excluded genera appearing in only a single interval (“singletons”) to minimize Lagerstätten and monographic effects. Per-genus rates of origination (extinction) were defined as the fraction of total genera that appeared (disappeared) in an interval.
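The rate definition above can be sketched as follows. This is a hypothetical illustration, not the authors' code: we assume each genus is summarized by the indices of its first and last intervals, and we assume that "total genera" in an interval means all non-singleton genera whose range includes that interval.

```python
from collections import Counter


def per_genus_rates(ranges, n_intervals):
    """Per-genus origination and extinction rates for each interval (sketch).

    ranges: iterable of (first, last) interval indices (0-based, inclusive),
            one pair per genus.
    Singletons (first == last) are excluded before counting.
    Returns two lists of length n_intervals: the fraction of genera present in
    each interval that first appear there, and the fraction that last appear there.
    """
    kept = [(first, last) for first, last in ranges if first != last]  # drop singletons
    firsts = Counter(first for first, _ in kept)
    lasts = Counter(last for _, last in kept)
    total = [0] * n_intervals  # genera whose range includes each interval
    for first, last in kept:
        for k in range(first, last + 1):
            total[k] += 1
    nan = float("nan")
    orig = [firsts[k] / total[k] if total[k] else nan for k in range(n_intervals)]
    ext = [lasts[k] / total[k] if total[k] else nan for k in range(n_intervals)]
    return orig, ext
```

For instance, `per_genus_rates([(0, 2), (1, 3), (1, 1), (2, 3)], 4)` drops the singleton `(1, 1)` and returns origination rates `[1.0, 0.5, 1/3, 0.0]` and extinction rates `[0.0, 0.0, 1/3, 1.0]`.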
A second major tabulation is Foote’s boundary-crossing data (13), which were broken into 77 intervals corresponding mainly to geological stages. With the boundary-crosser approach, we have only point estimates of origination and extinction rates, and compiling these rates depends on when within each interval the taxa actually originated or went extinct. Foote used two end-member approaches for dealing with this uncertainty: (i) he assumed that taxa originated and went extinct at a uniform rate throughout the interval (the model of continuous turnover); or (ii) he assumed that all originations occur at the beginning, and all extinctions at the end, of the interval (the model of pulsed turnover). The uncorrected data presented in Fig. 1 are tabulated assuming this pulsed model (figure 2 of ref. 13), which currently seems the more likely of the two (15).
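To make the boundary-crosser bookkeeping concrete, here is a sketch assuming per-capita estimators of the kind Foote introduced for boundary-crosser data: origination $-\ln(N_{bt}/N_t)$ and extinction $-\ln(N_{bt}/N_b)$, where $N_b$, $N_t$, and $N_{bt}$ count taxa crossing the bottom, top, and both boundaries of an interval. The function name and the count labels are our own, not taken from ref. 13. As a simplification, the sketch returns the per-interval quantity for pulsed turnover and divides by interval duration for continuous turnover.

```python
import math


def boundary_crosser_rates(ranges, k, duration=None):
    """Per-capita origination (p) and extinction (q) for interval k (sketch).

    ranges: (first, last) interval indices (0-based, inclusive) per taxon.
    duration: if given, rates are divided by it (continuous-turnover convention);
              if None, per-interval rates are returned (pulsed convention).
    """
    # Crosses the bottom boundary of k: present before k and in k
    n_b = sum(1 for first, last in ranges if first < k <= last)
    # Crosses the top boundary of k: present in k and after k
    n_t = sum(1 for first, last in ranges if first <= k < last)
    # Crosses both boundaries: range spans the whole interval
    n_bt = sum(1 for first, last in ranges if first < k < last)
    if n_bt == 0:
        raise ValueError("no through-ranging taxa; rates are undefined")
    p = -math.log(n_bt / n_t)  # origination: top crossers that originated within k
    q = -math.log(n_bt / n_b)  # extinction: bottom crossers that did not survive k
    if duration is not None:
        p, q = p / duration, q / duration
    return p, q
```

Because only taxa that cross interval boundaries enter the counts, singletons confined to one interval drop out automatically, which is one reason boundary-crosser tabulations are less sensitive to Lagerstätten than raw first and last appearances.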
Fig. 2.
Correcting for the incompleteness of the fossil record causes autocovariance within origination, and within extinction, to disappear. Scatter plots of origination $O_{i-1}$ in a given interval vs. origination $O_i$ in the following interval are presented for Sepkoski’s uncorrected (a) and corrected (c) data. Similar scatter plots of extinction are presented for uncorrected (b) and corrected (d) data. Boundary-crossers with pulsed turnover are used in all panels. Color coding corresponds to geologic time period, as in Fig. 1. Means are indicated by black lines. The autocovariance present in the uncorrected data (a and b), manifest as the clumping of data points around the gray line of slope 1, disappears in the corrected data (c and d).
Neither tabulation method, however, accounts for the incompleteness of the fossil record. Foote used preservation and rock-volume effects to estimate true per-taxon rates of origination and extinction, under both continuous and pulsed turnover scenarios (figures 3 and 4 of ref. 13); we term these the “corrected” data. Although it is inherently challenging to verify any model that corrects for missing data, to date Foote is the only one to have made such a correction, and it is highly desirable to determine what effect these corrections have on the diversity dynamics inferred from analysis of the uncorrected data. Analyzing the corrected data is especially important because the incompleteness of the fossil record distorts our view of the history of life; for example, observed times of extinction always predate true times of extinction, unless fossil misidentifications or correlation errors occur. The observed temporal ranges of taxa are thus not only incomplete but also biased.
Fig. 3.
Episodic incompleteness of the fossil record can make a rapid recovery from extinction appear delayed. (a) Complete preservation: an extinction pulse at $t_{i-1}$ is followed immediately by complete recovery (an origination pulse) at $t_i$. (b) Uniform, incomplete preservation: assuming an exponential distribution of waiting times between a fossil’s apparent and true extinction (6), the pulse of extinction at $t_{i-1}$ (dark purple) is smeared over $t_{i-2}$ and $t_{i-3}$ (lighter purple); the analogous argument applies to origination (blue). Nonetheless, the peak of the origination distribution at $t_i$ still immediately follows the peak of the extinction distribution at $t_{i-1}$, leaving temporal correlations qualitatively unchanged relative to the case of complete preservation. (c) Episodic, incomplete preservation: preservability in the interval $t_i$ immediately after an extinction at $t_{i-1}$ is often comparatively low (13), so taxa that actually originated during $t_i$ will be reported to have originated in $t_{i+1}$ and $t_{i+2}$. This drop in preservability at $t_i$ further decreases the reported origination rate at $t_i$ (blue arrow) while concomitantly increasing the apparent rates in $t_{i+1}$ and $t_{i+2}$ (white arrows), creating the artifactual lag in recovery from extinction that appears in Sepkoski’s data (13).


As a baseline for comparison with previous studies, we first analyzed the uncorrected data in both the binned and the pulsed boundary-crosser tabulations. We then analyzed the corrected data under the preferred pulsed model and, for comparison, under the assumption of continuous turnover. Our results are presented in Table 1; coefficients are reported as point estimate ± one standard error, with significant coefficients (P < 0.05) in boldface type.
Table 1.
VAR(2) coefficients for all data sets, reported as point estimate ± heteroskedasticity-robust standard error

| Covariance | Lag | Coefficient | Uncorrected binned | Uncorrected pulsed | Corrected continuous | Corrected pulsed |
|---|---|---|---|---|---|---|
| Origination autocovariance | 1 | $\Phi_1^{oo}$ | 0.60 ± 0.10 | 0.59 ± 0.21 | −0.10 ± 0.13 | 0.05 ± 0.10 |
| | 2 | $\Phi_2^{oo}$ | 0.00 ± 0.08 | −0.26 ± 0.10 | −0.01 ± 0.13 | −0.13 ± 0.07 |
| Origination → extinction covariance | 1 | $\Phi_1^{oe}$ | 0.17 ± 0.13 | 0.08 ± 0.05 | 0.13 ± 0.13 | 0.07 ± 0.04 |
| | 2 | $\Phi_2^{oe}$ | −0.07 ± 0.11 | −0.02 ± 0.02 | 0.22 ± 0.12 | −0.00 ± 0.03 |
| Extinction → origination covariance | 1 | $\Phi_1^{eo}$ | 0.08 ± 0.07 | 0.36 ± 0.38 | 0.62 ± 0.15 | 1.37 ± 0.42 |
| | 2 | $\Phi_2^{eo}$ | 0.23 ± 0.05 | 1.16 ± 0.30 | 0.13 ± 0.13 | 1.49 ± 0.54 |
| Extinction autocovariance | 1 | $\Phi_1^{ee}$ | 0.37 ± 0.17 | 0.46 ± 0.16 | 0.06 ± 0.14 | 0.12 ± 0.12 |
| | 2 | $\Phi_2^{ee}$ | 0.05 ± 0.11 | 0.05 ± 0.10 | −0.19 ± 0.15 | 0.01 ± 0.11 |
Numbers in boldface type are 5% significant values (P < 0.05). Φ components are as in Eq. 1. Uncorrected binned data are from Sepkoski’s compilation (3). Boundary-crosser data, from both uncorrected (pulsed) and corrected (continuous and pulsed) analyses, are as in figures 2, 3, and 4 in ref. 13.
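Table 1 reports heteroskedasticity-robust standard errors. This section does not say which robust estimator was used; the sketch below assumes the basic White (HC0) sandwich estimator applied to a single OLS equation, such as one row of the VAR(2) system. The function name `ols_hc0` is hypothetical.

```python
import numpy as np


def ols_hc0(Z, y):
    """OLS coefficients with White (HC0) heteroskedasticity-robust standard errors.

    Z: (n, k) design matrix (include a column of ones for an intercept).
    y: (n,) response vector, e.g. O_i for the origination equation.
    """
    Z = np.asarray(Z, dtype=float)
    y = np.asarray(y, dtype=float)
    bread = np.linalg.inv(Z.T @ Z)
    beta = bread @ Z.T @ y
    resid = y - Z @ beta
    # "Meat" of the sandwich: sum over observations of e_i^2 * z_i z_i^T
    meat = (Z * resid[:, None] ** 2).T @ Z
    cov = bread @ meat @ bread
    return beta, np.sqrt(np.diag(cov))
```

Unlike the classical OLS formula, the sandwich form does not assume a constant residual variance, which matters here because rate volatility differs markedly between intervals (Fig. 1). Variants such as HC1 rescale `cov` by a small-sample factor.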
Because several factors might have influenced our results, we also ran additional regressions: we included $t_i$ as an additional variable to account for varying interval duration, and we included a linear drift term (i.e., detrended the data) to account for long-term secular decline. These results, presented in Table 2, which is published as supporting information on the PNAS web site, agree within error with those in Table 1.
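The robustness checks described above amount to adding columns to the regressor matrix of each VAR(2) equation. A hedged sketch with a hypothetical helper `var2_design`: one common way to implement a linear drift is to add the interval index as a regressor, which is what the `trend` option does here; the authors' exact specification is not given in this section.

```python
import numpy as np


def var2_design(O, E, durations=None, trend=False):
    """Regressor matrix Z and responses Y for the two VAR(2) equations (sketch).

    Columns of Z: intercept, O_{i-1}, E_{i-1}, O_{i-2}, E_{i-2}, then optionally
    the duration t_i of the present interval and a linear time index.
    """
    X = np.column_stack([O, E]).astype(float)
    n = len(X)
    cols = [np.ones(n - 2), X[1:-1, 0], X[1:-1, 1], X[:-2, 0], X[:-2, 1]]
    if durations is not None:
        cols.append(np.asarray(durations, dtype=float)[2:])  # t_i for i = 2 .. n-1
    if trend:
        cols.append(np.arange(2, n, dtype=float))  # interval index as linear drift
    return np.column_stack(cols), X[2:]
```

Fitting `np.linalg.lstsq(Z, Y, rcond=None)` with and without the extra columns and comparing coefficient rows 1 to 4 gives the kind of check summarized in Table 2.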
For the uncorrected data, in both binned and boundary-crosser tabulations, we find that per-genus origination and extinction rates have strong autocovariance, even when extinction–origination covariance is accounted for (Fig. 2 a and b). This finding agrees with earlier results (4–7), which formed the basis for the suggestion of inertia in the rates of origination and extinction; i.e., once origination (6, 7) or extinction (5–7) rates increase (decrease), they remain high (low). The only significant covariance is between rates of origination and rates of extinction two intervals previous, or about 5–10 million years earlier ($\Phi_2^{eo} = 1.16 \pm 0.30$ for boundary-crossers). That is, the present origination rate depends on the extinction rate two intervals previous, and not on the rate in the immediately preceding interval ($\Phi_1^{eo}$ is insignificant). Although calculated by a completely separate analysis, this result agrees with the peak at 10 million years in the cross-correlation analysis by Kirchner and Weil (6) that has been used to infer delayed recovery after extinction.
The picture changes drastically, however, when Foote’s corrections for the incompleteness of the fossil record (13) are applied to Sepkoski’s raw data. The autocovariances vanish entirely (Fig. 2 c and d). Note that the statistical result is not a negative autocovariance: it is not the case, for example, that a high rate of extinction presages a low rate in the next interval. The temporal gap between extinction and subsequent origination also disappears entirely; no evidence for delayed recovery remains. In Foote’s continuous model, the two-interval lag $\Phi_2^{eo}$ becomes insignificant and is replaced by a significant one-interval lag $\Phi_1^{eo}$. In Foote’s pulsed model, both the one-interval and the two-interval lags contribute significantly.
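As a quick arithmetic check on the pattern described above, each extinction → origination coefficient in Table 1 can be divided by its standard error and compared with 1.96, the two-sided 5% critical value of the standard normal distribution. Using this normal approximation, rather than whatever exact reference distribution underlies the boldface in Table 1, is our assumption.

```python
# Extinction -> origination coefficients from Table 1: lag -> (estimate, robust SE)
PHI_EO = {
    "uncorrected binned": {1: (0.08, 0.07), 2: (0.23, 0.05)},
    "uncorrected pulsed": {1: (0.36, 0.38), 2: (1.16, 0.30)},
    "corrected continuous": {1: (0.62, 0.15), 2: (0.13, 0.13)},
    "corrected pulsed": {1: (1.37, 0.42), 2: (1.49, 0.54)},
}

CRITICAL_Z = 1.96  # two-sided 5% level under a normal approximation


def significant_lags(table, crit=CRITICAL_Z):
    """For each data set, return the sorted lags whose |estimate / SE| exceeds crit."""
    return {
        name: sorted(lag for lag, (est, se) in lags.items() if abs(est / se) > crit)
        for name, lags in table.items()
    }


for name, lags in significant_lags(PHI_EO).items():
    print(f"{name}: significant lags {lags}")
```

This reproduces the qualitative pattern in the text: only the two-interval lag for both uncorrected tabulations, only the one-interval lag for the corrected continuous data, and both lags for the corrected pulsed data.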


Our analysis suggests that the temporal gap between extinction and subsequent origination in the uncorrected data is an artifact of the backward smearing of extinctions [the Signor–Lipps effect (16)] and the forward smearing of originations [the Sppil–Rongis (6, 17) or Jaanusson (18, 19) effect] caused by incomplete preservation. If a taxon’s true last occurrence is missed because of incomplete preservation, its extinction will be reported too early; conversely, if its true first occurrence is missed, its origination will be reported too late. However, as Kirchner and Weil point out (6), if the fossil record were uniformly incomplete, the Signor–Lipps and Jaanusson effects would not be sufficient to explain why delayed recovery after extinction should be observed in the uncorrected data if delayed recovery were not a real feature of the history of life (Fig. 3 a and b). The Signor–Lipps and Jaanusson effects can, however, produce the false appearance of delayed recovery if the fossil record is nonuniformly incomplete (Fig. 3c). In fact, there is strong empirical evidence that preservation potential is not uniform through time (10–12). Foote’s corrections to Sepkoski’s data allow for stage-by-stage variation in the quality of preservation, and he found not only that preservation probabilities vary widely from stage to stage, but also that they correlate highly with the amount of preserved marine sedimentary rock (figure 5 of ref. 13). It therefore appears that the delayed recovery observed in the uncorrected data is specifically an artifact of the high temporal variability in the incompleteness of the fossil record (Fig. 3). Piecemeal examination of the stratigraphic record supports this conclusion: extinction is often observed in intervals of relatively high preservability, whereas the subsequent originations first appear in immediately following intervals of comparatively low preservability (13).
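The mechanism of Fig. 3 can be illustrated with a toy calculation. We assume, as a simplification that is not Foote's correction method, that a taxon present in an interval is preserved there independently with that interval's preservability, and that ranges are long enough that truncation can be ignored. The hypothetical functions below give the expected distribution of observed first occurrences for taxa that truly originate in one interval, and of observed last occurrences for taxa that truly go extinct in the preceding one.

```python
def observed_first(true_start, pres):
    """Expected fraction of taxa whose first occurrence is observed in each interval,
    for taxa that truly originate in `true_start` and survive to the end of the series
    (Jaanusson effect: originations are pushed later)."""
    out = [0.0] * len(pres)
    unseen = 1.0  # probability the taxon has not yet been preserved
    for j in range(true_start, len(pres)):
        out[j] = unseen * pres[j]
        unseen *= 1.0 - pres[j]
    return out


def observed_last(true_end, pres):
    """Expected fraction of taxa whose last occurrence is observed in each interval,
    for taxa that truly go extinct in `true_end` and existed from the first interval
    (Signor-Lipps effect: extinctions are pushed earlier)."""
    out = [0.0] * len(pres)
    unseen = 1.0
    for j in range(true_end, -1, -1):
        out[j] = unseen * pres[j]
        unseen *= 1.0 - pres[j]
    return out


def peak(dist):
    """Index of the interval with the largest expected count."""
    return max(range(len(dist)), key=lambda j: dist[j])


def apparent_lag(pres, extinction=4, origination=5):
    """Apparent gap (in intervals) between observed extinction and origination peaks."""
    return peak(observed_first(origination, pres)) - peak(observed_last(extinction, pres))


uniform = [0.5] * 10
episodic = [0.5] * 10
episodic[4], episodic[5], episodic[6] = 0.9, 0.1, 0.9  # good, poor, good preservation

print(apparent_lag(uniform))   # -> 1 (the true lag survives uniform incompleteness)
print(apparent_lag(episodic))  # -> 2 (an artifactual extra interval of delay)
```

With uniform preservability the observed origination peak still falls one interval after the observed extinction peak, as in Fig. 3b; a single poorly preserved interval right after the extinction shifts the apparent recovery one interval later, as in Fig. 3c.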
Highly episodic preservation probabilities are also expected, given our knowledge of the processes responsible for the deposition of sedimentary rock, which houses the fossil record. Explicit sequence stratigraphic modeling of how both tectonic factors and changes in eustatic sea level affect the structure of the sedimentary rock record predicts that, even with completely uniform origination and extinction, the rock record should exhibit artifactual peaks in extinction followed by delayed peaks in origination (20).
Incomplete preservation, whether episodic or not, may also cause members of a group of taxa that in reality all became extinct in a single interval to appear to have gone extinct over several intervals in the uncorrected data (Fig. 3 b and c) (6). This smearing broadens the spikes in the extinction and origination records, and the resulting multi-interval peaks have high autocovariance, as we observe in the uncorrected data. When the smearing is removed, Foote’s corrected data appear much more volatile, as is evident in Fig. 1.
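The smearing argument can also be checked numerically. This sketch assumes a constant per-interval preservability `p`, so that each last occurrence is displaced backward by a geometrically distributed number of intervals; it smears serially uncorrelated "true" extinction rates backward in time and compares the lag-1 autocorrelation of the true and apparent series. The names `smear_back` and `lag1_autocorr` are hypothetical.

```python
import numpy as np


def smear_back(true_ext, p, K=30):
    """Apparent extinction series when each last occurrence is seen k intervals early
    with probability p * (1 - p)**k (kernel truncated after K intervals)."""
    true_ext = np.asarray(true_ext, dtype=float)
    n = len(true_ext)
    obs = np.zeros(n)
    for k in range(min(K, n)):
        # extinctions truly in interval j + k are recorded in interval j
        obs[: n - k] += p * (1 - p) ** k * true_ext[k:]
    return obs


def lag1_autocorr(x):
    """Sample lag-1 autocorrelation."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    return float(np.dot(x[:-1], x[1:]) / np.dot(x, x))


rng = np.random.default_rng(1)
true_ext = rng.random(5000)  # serially uncorrelated "true" rates
apparent = smear_back(true_ext, p=0.5)
print(lag1_autocorr(true_ext), lag1_autocorr(apparent))
```

For a geometric kernel the theoretical lag-1 autocorrelation of the smeared series is 1 − p, so with p = 0.5 the apparent series shows an autocorrelation near 0.5 even though the true series has essentially none.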
Our results suggest that the Signor–Lipps and Jaanusson effects each shorten stratigraphic ranges on time scales of stage or substage duration (≈5 million years); i.e., originations (extinctions) typically appear one stratigraphic interval too late (too early) in the fossil record. This idea is consistent with the results of quantitative sequence stratigraphic models in which species are allowed to evolve and be preserved in basins that are accumulating sediment (21, 22). In these computer simulations, observed stratigraphic ranges may have gaps that span up to half the true species durations; the incompleteness of the stratigraphic ranges implied by Foote’s corrections of Sepkoski’s data is of a similar magnitude. Moreover, data on the preservability of marine taxa are also consistent with the finding that the incompleteness of the fossil record has typically shortened stratigraphic ranges by one or more stages/substages. Under the assumption of pulsed turnover, a preservability of 0.5 implies a 50% chance that a taxon’s first (last) occurrence will not be preserved in the earliest (latest) interval in which it was actually extant (15). Foote and Sepkoski (23) estimated the preservability of marine groups in the fossil record and found that preservabilities are typically about 0.5 at this stratigraphic resolution (e.g., Anthozoa, 0.4–0.5; Crinoidea, 0.36–0.37; Gastropoda, 0.41–0.55; Bivalvia, 0.45–0.51; Echinoidea, 0.56–0.65). Several major groups have significantly lower values (e.g., Malacostraca, 0.19–0.33; Osteichthyes, 0.15–0.29; Chondrichthyes, 0.07–0.18), implying even larger Signor–Lipps and Jaanusson effects, whereas only two have much higher values (Brachiopoda, 0.95–1.0; Cephalopoda, 0.85–0.90), suggesting that the Signor–Lipps and Jaanusson effects should not be significant for these groups in the type of analysis we have conducted here.
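The link between a preservability of about 0.5 and a displacement of roughly one interval can be made explicit under a simple assumption of our own, not stated in the source: if a taxon is preserved in each interval independently with probability p and its range is long, the number of intervals by which its observed first (or last) occurrence is displaced from the true one is geometrically distributed with mean (1 − p)/p. The helper `expected_shift` is hypothetical.

```python
def expected_shift(p):
    """Mean number of intervals by which an observed first (last) occurrence lags
    (leads) the true one, assuming independent per-interval preservation with
    probability p and no truncation of the range."""
    if not 0 < p <= 1:
        raise ValueError("preservability must be in (0, 1]")
    return (1 - p) / p


# A subset of the preservability ranges quoted above from ref. 23
PRESERVABILITY = {
    "Anthozoa": (0.40, 0.50),
    "Gastropoda": (0.41, 0.55),
    "Osteichthyes": (0.15, 0.29),
    "Chondrichthyes": (0.07, 0.18),
    "Brachiopoda": (0.95, 1.00),
}

for group, (low, high) in PRESERVABILITY.items():
    # higher preservability gives the smaller shift
    print(f"{group}: {expected_shift(high):.2f}-{expected_shift(low):.2f} intervals")
```

Under this toy model, p = 0.5 gives an expected displacement of exactly one interval, the low values for fishes imply displacements of several intervals, and the brachiopod values imply almost none, in line with the argument above.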
The results of the VAR analysis depend in part on when within the stratigraphic intervals the true originations and extinctions actually occurred. At present, there is no definitive answer as to how they are distributed, and although the pulsed model seems more likely, the case is less compelling for originations than for extinctions (15). Assuming that Foote’s corrections improve our view of global evolutionary trends, our analysis suggests that on average the response of origination rates to extinction is immediate, and may persist for more than one time interval, if the pulsed model is indeed the better correction for the limited temporal resolution in Sepkoski’s compendium. The lack of autocovariance in the corrected data also suggests that the response of the biosphere to perturbation in general may be both immediate and short-lived (24).
Finally, it is important to remember that in analyses such as ours here, we are reporting an average, or dominant, signal for the entire time series. In time series as causally rich and complex as Phanerozoic marine diversity, there will be individual instances that do not conform to the general pattern. Whereas our analysis does not support the finding of delayed recovery after extinctions in general (6), empirical data do suggest a delayed recovery in certain individual cases, such as following the end-Permian mass extinction (25), although empirical data from local sections can be hampered by the lack of control for sequence stratigraphic architecture (22). As always, in analyzing phenomenologically rich systems there is a tradeoff between large-scale generalities, which we present here, and detailed event-specific description and explanation.



Abbreviation: VAR, vector autoregression.


We thank R. Bambach for his binned and phyla tabulations of the data from the late J. Sepkoski’s compendium of fossil genera. We thank M. Foote for providing the raw data from ref. 13 and both M. Foote and S. Holland for very helpful reviews and invaluable discussion. This work was partially supported by National Science Foundation Grants EAR-000385 and DEB-0083983 (to C.R.M.).




1. J. J. Sepkoski, Milwaukee Pub. Mus. Contr. Biol. Geol. 51, 1–125 (1992).
2. J. J. Sepkoski, J. Paleontol. 71, 533–539 (1997).
3. J. J. Sepkoski, Bull. Am. Paleontol. 363, 1–563 (2002).
4. J. F. Quinn, Paleobiology 13, 465–478 (1987).
5. S. M. Stanley, Paleobiology 16, 401–414 (1990).
6. J. W. Kirchner, A. Weil, Nature 404, 177–180 (2000).
7. J. W. Kirchner, Nature 415, 65–68 (2002).
8. J. W. Kirchner, A. Weil, 267, 1301–1309 (2000).
9. R. A. Rohde, R. Muller, Nature 434, 208–210 (2005).
10. J. Alroy, C. R. Marshall, R. K. Bambach, K. Bezusko, M. Foote, F. T. Fürsich, T. A. Hansen, S. M. Holland, L. C. Ivany, D. Jablonski, et al., Proc. Natl. Acad. Sci. USA 98, 6261–6266 (2001).
11. S. E. Peters, M. Foote, Nature 416, 420–424 (2002).
12. A. B. Smith, Phil. Trans. R. Soc. London B 356, 351–367 (2001).
13. M. Foote, J. Geol. 111, 125–148 (2003).
14. J. D. Hamilton, Time Series Analysis (Princeton Univ. Press, Princeton, NJ, 1994), pp. 291–350.
15. M. Foote, Paleobiology 31, 6–20 (2005).
16. P. W. Signor, J. H. Lipps, Geol. Soc. Am. Spec. Pap. 190, 291–296 (1982).
17. C. R. Marshall, in New Approaches to Speciation in the Fossil Record, D. H. Erwin, R. L. Anstey, Eds. (Columbia Univ. Press, New York, 1995), pp. 208–235.
18. C. R. Marshall, in The Adequacy of the Fossil Record, S. K. Donovan, C. R. C. Paul, Eds. (Wiley, London, 1998), pp. 23–53.
19. V. Jaanusson, in The Ordovician System: Proceedings of a Palaeontological Association Symposium, Birmingham, September 1974, M. G. Bassett, Ed. (Univ. of Wales Press and National Museum of Wales, Cardiff, 1976), pp. 301–326.
20. S. M. Holland, Paleobiology 21, 92–109 (1995).
21. S. M. Holland, M. E. Patzkowsky, Palaios 17, 134–146 (2002).
22. S. M. Holland, M. E. Patzkowsky, Geology 27, 491–494 (1999).
23. M. Foote, J. J. Sepkoski, Nature 398, 415–417 (1999).
24. M. E. J. Newman, 263, 1605–1610 (1996).
25. J. L. Payne, D. J. Lehrmann, J. Y. Wei, M. J. Orchard, D. P. Schrag, A. H. Knoll, Science 305, 506–509 (2004).

Information & Authors


Published in

Proceedings of the National Academy of Sciences
Vol. 103 | No. 8
February 21, 2006
PubMed: 16477008


Submission history

Received: April 21, 2005
Published online: February 13, 2006
Published in issue: February 21, 2006


  1. extinction
  2. Jaanusson effect
  3. origination
  4. Signor–Lipps effect
  5. vector autoregression





Department of Physics, Jefferson Laboratory, 17 Oxford Street,
Motohiro Yogo
Department of Economics, and
Present address: Rolling Green Golf Club, 280 North State Road, Springfield, PA 19064.
Charles R. Marshall
Museum of Comparative Zoology, Department of Organismic and Evolutionary Biology, and Department of Earth and Planetary Sciences, Harvard University, Cambridge, MA 02138


To whom correspondence should be addressed. E-mail: [email protected]
Communicated by Steven M. Stanley, University of Hawaii at Manoa, Honolulu, HI, December 22, 2005
Author contributions: P.J.L. and C.R.M. designed research; P.J.L. and M.Y. performed research; M.Y. contributed new reagents/analytic tools; P.J.L., M.Y., and C.R.M. analyzed data; and P.J.L. and C.R.M. wrote the paper.

Competing Interests

Conflict of interest statement: No conflicts declared.
