On the path to extinction
See allHide authors and affiliations

Communicated by David Cox, University of Oxford, Oxford, United Kingdom, January 4, 2007 (received for review September 15, 2006)
Abstract
Populations can die out in many ways. We investigate one basic form of extinction, stable or intrinsic extinction, caused by individuals on the average not being able to replace themselves through reproduction. The archetypical such population is a subcritical branching process, i.e., a population of independent, asexually reproducing individuals, for which the expected number of progeny per individual is less than one. The main purpose is to uncover a fundamental pattern of nature. Mathematically, this emerges in large systems, in our case subcritical populations, starting from a large number, x, of individuals. First we describe the behavior of the time to extinction T: as x grows to infinity, it behaves like the logarithm of x, divided by r, where r is the absolute value of the Malthusian parameter. We give a more precise description in terms of extreme value distributions. Then we study population size partway (or uway) to extinction, i.e., at times uT, for 0 < u < 1, e.g., u = 1/2 gives halfway to extinction. (Note that mathematically this is no stopping time.) If the population starts from x individuals, then for large x, the proper scaling for the population size at time uT is x into the power u − 1. Normed by this factor, the population uway to extinction approaches a process, which involves constants that are determined by life span and reproduction distributions, and a random variable that follows the classical Gumbel distribution in the continuous time case. In the Markov case, where an explicit representation can be deduced, we also find a description of the behavior immediately before extinction.
General branching processes are defined through individuals who give birth independently of each other. We limit ourselves to singletype such populations, all individuals having the same reproduction and, more generally lifepath, distribution. Benchmark cases of such processes are the simple Galton–Watson, birthanddeath, and Markov branching processes. Even though such Markovian structures dominate the probabilistic literature, more general processes (allowing aging, various distributions of life spans and fertility periods, as well as repeated litters of varying sizes) are relevant for biological modelling (1). A presentation from a mathematical point of view can be found in ref. 2 (multitype populations) or ref. 3 (singletype processes).
A multitype framework would have been still more general but also far less accessible. The notsomathematical reader will undoubtedly note that even this presentation is demanding at certain points. However, conclusions should be clear, and they, rather than the work to derive them, form the message of this paper: intrinsic stable extinction follows a simple and beautiful pattern. The purpose is thus general understanding, rather than paving the way for specific applications. For those, multitype theory may be useful, but at the bitter end they will tend to need custommade models of narrow relevance.
A (singletype) branching process is subcritical if the expected number m of children per individual is less than one. Let μ(a) denote the expected number of children by age a. Clearly, m = μ(∞). In the lattice case, births can occur only at multiples of some time unit, e.g., yearly. For simplicity, our focus is on the nonlattice case.
We consider socalled Malthusian processes. These are defined by the requirement that there exists a number α, the Malthusian parameter, such that This is always fulfilled in the critical (m = 1) and supercritical (1 < m < ∞), and under very mild conditions (always met with in the real world) also in subcritical cases. Then, α < 0 and since −α will play an important role, we shall write r = −α > 0. In the case of Galton–Watson processes, r = 1n m.
Let Z_{t} ^{x} be a shorthand for the number of individuals alive at time t, provided the population started from x individuals at time 0, taken as newborn then, for preciseness. Z_{t} without a suffix, is Z _{t} ^{1}, the case of one ancestor. If L denotes the life span distribution, any nonlattice Malthusian process will then satisfy (ref. 3, p. 156). We write C for the proportionality constant in this, i.e., 𝔼[Z_{t} ] ∼ Ce ^{αt} = Ce^{−rt} and recall that in the lattice case the enumerator integral is replaced by a sum. (To the nonmathematical reader, the importance of this is not the special formula for C, but rather the fact that the constant can be calculated from demographic facts.) For Galton–Watson and Markov branching processes C = 1, and expressions are exact, 𝔼[Z_{t} ] = m^{t} for integer t in the Galton–Watson case.
The expected population size thus has the same asymptotic form in all three cases, α >, =, < 0. For variances, though, asymptotics take different forms in the sub and supercritical cases. Assume that m < 1 and that Malthusianness is strengthened to a second order property: let ξ denote the reproduction point process, so that μ = 𝔼[ξ], and write it is required that 𝔼[ξ̂(α)^{2}] < ∞. Then, for some constant B. This can be checked from the convolution equation holding for second moments (ref. 3, p. 136). In the more often studied supercritical case, the leading term is proportional to e ^{2αt}.
We impose second order Malthusianness throughout and assume that the life span distribution L and reproduction measure μ satisfy respectively. As a consequence, the so called x log xcondition holds, guaranteeing the Kolmogorov and Yaglom theorems on survival chances and conditional population size given nonextinction (see ref. 4, p. 170, for the case of agedependent processes and ref. 3, p. 159, and ref. 5, for general processes). In the nonlattice formulation, they tell us that for some 0 < c < 1, exist, Σ_{k} b_{k} = 1, and It is worth noting that these two theorems do not require the full strength of branching process independence assumptions (see ref. 6 for populationsizedependent branching).
Even though we shall not embark upon the subtleties of measuring the population in other ways than by just counting its individuals, it turns out that the proof of our results is considerably facilitated by use of counting with a particular random characteristic (ref. 3, p. 167 ff.). The characteristic we shall use will be one that records all events to come in an individual's life and daughter process. It is thus not “individual” like the ones treated in op. cit. but still well behaved (7).
The first and second moment results quoted above for processes just counting the number of individuals alive, clearly extend to processes counted with characteristics χ such that e^{rt} 𝔼[χ(t)] is directly Riemann integrable, and so do the KolmogorovYaglom theorems (5).
The Time to Extinction
Now consider processes with x ancestors, Z_{t} ^{x} .
By 2 the time to extinction, satisfies for some c_{t} tending to c, as t → ∞. As a consequence, where γ_{x} → Euler's γ.
To verify 6 note that for any fixed number v, 0 < v < ∞ On the other hand, where sup_{t≥v} c_{t} − c < ε for any ε > 0, given that v is sufficiently large. Thus it remains to observe that as x → ∞ where o(1), denotes a remainder term, vanishing as x → ∞.
Extreme value theory handily delivers distributional forms of this, though again distinction should be made between continuoustime and lattice cases. In the former, Kolmogorov's theorem yields the exponential tail that implies a classical Gumbel limit (ref. 8, p. 17). Indeed, for each initial number x, define η_{x} by Then, as x → ∞, This extends Pakes's (9) result for Markov branching processes.
In Between Dawn and Demise
Let Z_{uTx} ^{x} denote the population size at some time uT_{x} , 0 ≤ u ≤ 1, between its inception at size Z _{0} ^{x} = x and extinction. We consider nonlattice processes, and to ease exposition, write t_{x} = ln x, so that T_{x} ∼ t_{x} /r, by 7 and 8 . Since 𝔼[Z_{t} ^{x} ] ∼ xCe^{−rt} , as t → ∞, also x^{u−1} 𝔼[Z _{u(tx +t)/r} ^{x}] ∼ Ce^{−ut} , x → ∞, and it is natural to guess that x^{1−u} provides the right norming and that in distribution, as the initial population size x → ∞, at least for fixed 0 < u < 1.
A first check of this conjecture can be made by simulation. Consider a binary splitting Markov branching process, i.e., birthanddeath process, with the probability p _{0} = 0.75 for zero and p _{2} = 0.25 for two offspring, and expected life span = 1. For birthanddeath, as pointed out, C = 1 and further b can be shown to equal p _{0}/(2p _{0} − 1), which is 1.5 in the present case.
Fig. 1 displays the simulation results. The two upper panels show that the process Z_{uTx} ^{x} , 0 < u < 1 displays much more of variation than does Z_{t} ^{x} . The lower panels indicate that the amount of variation is just right to allow nontrivial normalisation. They also point at convergence towards a process with a rather constant mean and a variance increasing with u.
The expectation and variance of the proposed limiting population size are in terms of the classical Gamma function. Thus, for b/C close to one (it cannot be smaller), the Gamma function exerts a strong influence on the behavior of expected size, and since Γ(1) = Γ(2) = 1 the overall picture is that of a fairly constant mean. For large b/C it increases practically exponentially. This may seem intriguing, but is explained by the norming through x ^{1−u}. The limiting variance always increases with u, from 0 to b ^{2}.
Fig. 2 shows these expectations and standard deviations for different values of b assuming C = 1. The situation for b = 1.5 thus seems to fit simulations.
The PathtoExtinction Theorem and Its Proof
Our proof of the convergence follows a threestep programme, which requires acquaintance with weak convergence theory (10). The reader interested in the basic pattern of extinction, rather than mathematical technique is thus advised to jump to Theorem 1.

For fixed 0 < u < 1, write ζ_{x}(t) = x^{u−1} Z _{u(tx +t)/r} ^{x} and prove that as x → ∞, the arrow indicating weak convergence in the Lindvall–Skorohod topology extended to the whole real line.

Consider ζ_{x} at the random argument τ_{x} : = ln c + η_{x} → τ : = ln (C/b) + η, where η is Gumbel by 7 and 8 , and check conditions for ζ_{x}(τ_{x}) → Ce^{−uτ} = C^{1−u} b^{u} e^{−uη} in distribution for fixed u.

Study the asymptotics of ζ_{x}(τ_{x})(u) = x^{u−1} Z_{uTx} ^{x} as a function of u varying inside the unit interval.
Step 1.
We already noted that Similarly, We can conclude that for any fixed t, ζ_{x}(t) → Ce^{−ut} in mean square. The finite dimensional convergence is obvious.
Turning to tightness, we note that for any v ∈ ℝ and a > 0 But the total progeny Y of a population from one newborn ancestor clearly dominates the maximum of the numbers ever simultaneously present, and the sum of the maxima majorises the supremum of the sum. Since 𝔼[Y] = 1/(1 − m), it follows that We can conclude that for some constant K.
Now consider the population counted with a characteristic χ (ref. 3, p. 167) that for any h > 0 gives to each individual present the value of

an indicator of dying the next h/r time units plus

the total number of progeny of the individual born within the same next time span of length h/r.
Denote the process thus counted by D_{h} ^{x} (t). Clearly, for any t ∈ ℝ, Write δ = h/r, and denote the total progeny within t time units stemming from one newborn individual by Y_{t} . Then, For nonlattice processes it follows that, as t → ∞, If we assume that reproduction is L ^{2}Lipschitz in the sense that the second moment of the number of births between ages t and t + h, v_{h} (t), satisfies then a similar, but prohibitive, analysis of the convolution equation for second moments, shows that also This Lipschitzproperty is broadly satisfied, e.g., if the number of children in short intervals is bounded. Indeed, if B is a bound, then v_{h} (t) ≤ B(μ(t + h) − μ(t), so that such qualities of the reproduction function μ transfer.
By Chebyshev's inequality, therefore, for a suitable K, for h little enough and K′ some constant.
Tightness follows along the lines of the Corollaries in ref. 10 (p. 83 and p. 142). The process ζ_{x}(t) thus has the weak nonrandom continuous limit Ce^{−ut} , t ∈ ℝ, as x → ∞. The modification from there is that those corollaries concern processes on the unit interval, whereas our processes are defined on the whole line, with the natural Lindvall–Skorohod topology of convergence on all finite subintervals.
Step 2.
In the nonlattice case, the pair (τ_{x}, ζ_{x}) is also weakly convergent and by the continuity of the exponential limit function and the Gumbel limit, in distribution, as claimed (see ref. 10, p. 151, or ref. 11, Section 2.2).
Step 3.
The preceding two steps can now be repeated for any linear combination yielding the finite dimensional distributional convergence where η is standard Gumbel.
We summarize the situation for nonlattice populations:
Theorem 1.
Consider subcritical Malthusian general, singletype, nonlattice branching processes with finite reproduction variances. Assume conditions 1 and 9 . Then the finite dimensional distributional convergence 10 holds.
It remains to note the discontinuity at the endpoints of the unit interval in that x^{u−1} Z_{uTx} ^{x} equals 1 at u = 0 and 0 at u = 1. The former disappears in the Markov case, where age distribution does not matter and C = 1. The latter mirrors the drastic fall of population size from u close to but < 1 to u = 1, at which we proceed to have a closer look.
Markov Branching Processes on the Eve of Extinction
Markov branching processes, i.e., splitting processes (children are only born at their mother's death) where individuals have exponentially distributed life spans, have been much used in mathematical biology, even though they are of limited biological relevance, since exponential life spans imply the absence of aging. Being nonlattice, Markov branching processes are covered by our general results on extinction time and path to extinction.
However, they also allow a direct approach. The latter is mathematically interesting, using a conditioning argument at a nonstopping time. For the Markov case, it recovers Theorem 1 under slightly relaxed conditions, and adds new knowledge because it renders it possible to study the process immediately before extinction, viewing the discontinuity from the left, as u → 1, through a magnifying glass, as it were (Theorem 2).
Now, let Z_{t} ^{x} be a Markov branching process in which particles have exponentially distributed lifetimes with parameter a. Then exactly. Hence, as pointed out, C = 1; also r = a(1 − m), as any textbook would tell you.
Further, consider a function v(t) < t. A total probability argument yields In words, the last equation holds because, conditionally on Z _{v(t)} ^{x} = y, the probability of the original population, initiated by x ancestors, dying out at dt is the same as the probability that a population of y individuals at v(t) dies out after time t − v(t). Note that this argument applies quite generally to Markov processes. What is special in the branching case is the simple relation valid for any positive integer y and helpful in calculating an integral representation of the probability generating function Analysis of the latter for the case v(t) = ut, 0 < u < 1 yields Theorem 1 for the Markov branching case, as mentioned under somewhat less restrictive conditions: the finite reproduction moment can be replaced by the x log x condition where p _{k} is the probability of splitting into k children.
In the present context, it is more interesting that the case v(t) = t − u, 0 < u < t, yields distributional convergence of Z_{Tx} ^{x} −u, as x → ∞, the limiting process Y_{u} , thus displaying the properties of extinct populations on the eve of their disappearance.
Indeed, write π_{y}, y = 1, 2, … for the unique solution (up to multiplicative factors) of the system the stationary measure of the process (ref. 12, p. 24). Then, the following can be proved to hold:
Theorem 2.
Consider a subcritical Markov branching process Z_{t} ^{x} starting from x individuals, having parameters a and {p_{k} } satisfying the x log x condition, as above. Then, as x → ∞, in distribution for fixed u > 0, and finitedimensionally. The limiting process is Markov. It starts from Y _{0} = 1, stays at any position y = 1, 2, 3, … an exponentially distributed time, with parameter ay, and then jumps to a position z with probability In the long run, it increases exponentially: as u → ∞, in distribution. Here X is an exponentially distributed random variable, with mean b.
Acknowledgments
We thank a referee for taking readability seriously. This work has been supported by the Swedish and Australian Research Councils.
Footnotes
 ^{†}To whom correspondence should be addressed. Email: jagers{at}chalmers.se

Author contributions: P.J. designed research; P.J., F.C.K., and S.S. performed research; P.J., F.C.K., and S.S. contributed new reagents/analytic tools; and P.J., F.C.K., and S.S. wrote the paper.

The authors declare no conflict of interest.

Freely available online through the PNAS open access option.
 © 2007 by The National Academy of Sciences of the USA
References

 Haccou P ,
 Jagers P ,
 Vatutin V

↵
 Jagers P

↵
 Jagers P

↵
 Athreya K ,
 Ney P
 ↵
 ↵
 ↵

↵
 Leadbetter MR ,
 Lindgren G ,
 Rootzén H
 ↵

↵
 Billingsley P

↵
 Silvestrov DS

↵
 Harris TE