Prevolutionary dynamics and the origin of evolution

Communicated by Clifford H. Taubes, Harvard University, Cambridge, MA, July 14, 2008 (received for review May 31, 2008)
Abstract
Life is that which replicates and evolves. The origin of life is also the origin of evolution. A fundamental question is when do chemical kinetics become evolutionary dynamics? Here, we formulate a general mathematical theory for the origin of evolution. All known life on earth is based on biological polymers, which act as information carriers and catalysts. Therefore, any theory for the origin of life must address the emergence of such a system. We describe prelife as an alphabet of active monomers that form random polymers. Prelife is a generative system that can produce information. Prevolutionary dynamics have selection and mutation, but no replication. Life marches in with the ability of replication: Polymers act as templates for their own reproduction. Prelife is a scaffold that builds life. Yet, there is competition between life and prelife. There is a phase transition: If the effective replication rate exceeds a critical value, then life outcompetes prelife. Replication is not a prerequisite for selection, but instead, there can be selection for replication. Mutation leads to an error threshold between life and prelife.
The attempt to understand the origin of life has inspired much experimental and theoretical work over the years (1–10). Many of the basic building blocks of life can be produced by simple chemical reactions (11–15). RNA molecules can both store genetic information and act as enzymes (16–24). Fatty acids can self-assemble into vesicles that undergo spontaneous growth and division (25–28). The defining feature of biological systems is evolution. Biological organisms are products of evolutionary processes and capable of undergoing further evolution. Evolution needs a generative system that can produce unlimited information. Evolution needs populations of information carriers. Evolution needs mutation and selection. Normally, one thinks of these properties as being derivative of replication, but here, we formulate a generative chemistry (“prelife”) that is capable of selection and mutation before replication. We call the resulting process “prevolutionary dynamics.” Replication marks the transition from prevolutionary to evolutionary dynamics, from prelife to life.
Let us consider a prebiotic chemistry that produces activated monomers denoted by 0* and 1*. These chemicals can either become deactivated into 0 and 1 or attach to the end of binary strings. We assume, for simplicity, that all sequences grow in one direction. Thus, the following chemical reactions are possible:
i + 0* → i0
i + 1* → i1 [1]
Here i stands for any binary string (including the null element). These copolymerization reactions (29, 30) define a tree with infinitely many lineages. Each sequence is produced by a particular lineage that contains all of its precursors. In this way, we can define a prebiotic chemistry that can produce any binary string and thereby generate, in principle, unlimited information and diversity. We call such a system prelife and the associated dynamics prevolution (Fig. 1).
Each sequence, i, has one precursor, i′, and two followers, i0 and i1. The parameter a_{i} denotes the rate constant of the chemical reaction from i′ to i. At first, we assume that the active monomers are always at a steady state. Their concentrations are included in the rate constants, a_{i}. All sequences decay at rate d. The following system of infinitely many differential equations describes the deterministic dynamics of prelife:
ẋ_{i} = a_{i}x_{i′} − (d + a_{i0} + a_{i1})x_{i} [2]
The index, i, enumerates all binary strings of finite length, 0,1,00,…. The abundance of string i is given by x_{i} and its time derivative by ẋ_{i}. For the precursors of 0 and 1, we set x_{0′} = x_{1′} = 1. If all rate constants are positive, then the system converges to a unique steady state, where (typically) longer strings are exponentially less common than shorter ones. Introducing the parameter b_{i} = a_{i}/(d + a_{i0} + a_{i1}), we can write the equilibrium abundance of sequence i as x_{i} = b_{i} b_{i′} b_{i″}… b_{σ}. The product is over the entire lineage leading from the monomer, σ (= 0 or 1), to sequence i. The total population size converges to X = (a_{0} + a_{1})/d. The rate constants, a_{i}, of the copolymerization process define the “prelife landscape.” We will now discuss three different prelife landscapes.
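The equilibrium formula x_{i} = b_{i} b_{i′}… b_{σ} can be evaluated by walking the tree of binary strings level by level. The following is a minimal sketch, not part of the original analysis: the function name prelife_equilibrium, the callable a (mapping a string to its rate constant a_{i}), and the cutoff max_len are illustrative assumptions. Truncating at max_len does not change the abundances of the strings that are kept, because x_{i} depends only on the lineage of i and on the rates of its two followers.

```python
from itertools import product

def prelife_equilibrium(a, d, max_len):
    """Equilibrium abundances of all binary strings up to length max_len.

    a       : callable, a(i) returns the rate constant a_i of string i (assumed helper)
    d       : decay rate shared by all sequences
    max_len : truncation length (an assumption of this sketch)
    """
    x = {"": 1.0}  # precursor of the monomers: x_{0'} = x_{1'} = 1
    for n in range(1, max_len + 1):
        for bits in product("01", repeat=n):
            i = "".join(bits)
            # b_i = a_i / (d + a_{i0} + a_{i1})
            b_i = a(i) / (d + a(i + "0") + a(i + "1"))
            # x_i = b_i * x_{i'}, where i' is i without its last symbol
            x[i] = b_i * x[i[:-1]]
    del x[""]
    return x
```

Because shorter strings are processed first, each precursor abundance is already available when a string is visited.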
For “supersymmetric” prelife, we assume that a_{0} = a_{1} = α/2, and a_{i} = a for all other i. Hence, all sequences grow at uniform rates. In this case, all sequences of length n have the same equilibrium abundance given by x_{n} = [α/(2a)][a/(2a + d)]^{n}. Thus, longer sequences are exponentially less common. The total equilibrium abundance of all strings is X = α/d. The average sequence length is n̄ = 1 + 2a/d.
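The three closed-form results for the supersymmetric landscape follow from the product formula for the equilibrium abundances and a geometric series. The worked steps below are a sketch; the shorthand \rho = 2a/(2a+d) is introduced here only for convenience.

```latex
\begin{align*}
b_{\sigma} &= \frac{\alpha/2}{2a+d}, \qquad
b_{i} = \frac{a}{2a+d} \quad \text{for strings of length} \ge 2, \\
x_{n} &= \frac{\alpha/2}{2a+d}\left(\frac{a}{2a+d}\right)^{n-1}
       = \frac{\alpha}{2a}\left(\frac{a}{2a+d}\right)^{n}, \\
2^{n}x_{n} &= \frac{\alpha}{2a}\,\rho^{n}, \qquad \rho = \frac{2a}{2a+d}, \\
X &= \sum_{n\ge 1} 2^{n}x_{n}
   = \frac{\alpha}{2a}\cdot\frac{\rho}{1-\rho}
   = \frac{\alpha}{2a}\cdot\frac{2a}{d}
   = \frac{\alpha}{d}, \\
\bar{n} &= \frac{\sum_{n\ge 1} n\,2^{n}x_{n}}{X}
   = \frac{\rho/(1-\rho)^{2}}{\rho/(1-\rho)}
   = \frac{1}{1-\rho}
   = 1 + \frac{2a}{d}.
\end{align*}
```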
Selection emerges in prelife, if different reactions occur at different rates. Consider a random prelife landscape, where a fraction p of reactions are fast (a_{i} = 1 + s), whereas the remaining reactions are slow (a_{i} = 1). Fig. 2A shows the equilibrium distribution of all sequences as a function of the selection intensity, s. For larger values of s, some sequences are selected (highly prevalent), whereas the others decline to very low abundance. The fraction of sequences that are selected out of all sequences of length n is given by (1 − p)^{2}[1 − p(1 − p)]^{n−1}. See supporting information (SI) for all detailed calculations.
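A random prelife landscape of this kind can be explored numerically. The sketch below is illustrative only: the function name random_landscape_equilibrium, the seed, and the lazy assignment of rates are assumptions, and it does not attempt to reproduce Fig. 2A. Each reaction is fast (a_{i} = 1 + s) with probability p and slow (a_{i} = 1) otherwise, and the equilibrium abundances of all strings of a given length are returned.

```python
import random
from itertools import product

def random_landscape_equilibrium(p, s, d, max_len, seed=0):
    """Equilibrium abundances of all strings of length max_len
    in a random landscape with a fraction p of fast reactions."""
    rng = random.Random(seed)  # fixed seed so the landscape is reproducible
    rate = {}

    def a(i):
        # Draw the rate of reaction i' -> i once and remember it.
        if i not in rate:
            rate[i] = 1.0 + s if rng.random() < p else 1.0
        return rate[i]

    x = {"": 1.0}  # precursor of the monomers
    for n in range(1, max_len + 1):
        for bits in product("01", repeat=n):
            i = "".join(bits)
            x[i] = x[i[:-1]] * a(i) / (d + a(i + "0") + a(i + "1"))
    return {i: v for i, v in x.items() if len(i) == max_len}
```

With s = 0 every string of a given length has the same abundance; increasing s spreads the abundances apart, which is the sense in which selection emerges in prelife.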
Another example of an asymmetric prelife landscape contains a “master sequence” of length n (Fig. 2B). All reactions that lead to that sequence have an increased rate b, while all other rates are a. The master sequence is more abundant than all other sequences of the same length. But the master sequence attains a significant fraction of the population (= is selected) only if b is much larger than a. The required value of b grows as a linear function of n. In this prelife landscape, we can also discuss the effect of “mutation.” The fast reactions leading to the master sequence might incorporate the wrong monomer with a certain probability, u, which then acts as a mutation rate in prelife. We find an error threshold: The master sequence can attain a significant fraction of the population, only if u is less than the inverse of the sequence length, 1/n.
Let us now assume that some sequences can act as templates for replication. These replicators are not only formed from their precursor sequences in prelife but also from active monomers at a rate that is proportional to their own abundance. We obtain the following differential equation:
ẋ_{i} = a_{i}x_{i′} − (d + a_{i0} + a_{i1})x_{i} + rx_{i}(f_{i} − φ) [3]
As before, the index i enumerates all binary strings of finite length. The first part of the equation describes prelife (exactly as in Eq. 2). The second part represents the standard selection equation of evolutionary dynamics (28). The fitness of sequence i is given by f_{i}. All sequences have a frequency-dependent death rate, which represents the average fitness, φ = Σ_{i}f_{i}x_{i}/Σ_{i}x_{i}, and ensures that the total population size remains at a constant value.
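The combined dynamics of prelife and replication can be integrated numerically on a truncated tree. The following is a rough sketch under stated assumptions, not the simulation used for the figures: the function name simulate, the explicit Euler scheme, the step size dt, and the cutoff max_len are all illustrative choices. It assumes that only strings of length at most max_len replicate and that the system starts at the prelife equilibrium, so the total abundance stays at X = (a_{0} + a_{1})/d and φ can be computed from the tracked strings alone.

```python
from itertools import product

def simulate(a, f, d, r, max_len, t_end=50.0, dt=0.01):
    """Euler integration of prelife plus replication on strings up to max_len.

    a, f : callables giving the rate constant a_i and the fitness f_i of string i
    """
    strings = ["".join(b) for n in range(1, max_len + 1)
               for b in product("01", repeat=n)]
    loss = {i: d + a(i + "0") + a(i + "1") for i in strings}
    X = (a("0") + a("1")) / d  # total abundance, constant in time

    # Start from the prelife equilibrium x_i = b_i * x_{i'}.
    x = {"": 1.0}
    for i in strings:
        x[i] = a(i) * x[i[:-1]] / loss[i]

    for _ in range(int(t_end / dt)):
        phi = sum(f(i) * x[i] for i in strings) / X  # average fitness
        new = {"": 1.0}
        for i in strings:
            dx = a(i) * x[i[:-1]] - loss[i] * x[i] + r * x[i] * (f(i) - phi)
            new[i] = x[i] + dt * dx
        x = new
    del x[""]
    return x
```

For example, with all a_{i} = 1, d = 1, and a single replicator "010" with f = 1, the replicator stays at its prelife abundance for r = 0 and rises far above it for r = 10.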
The parameter r scales the relative rates of template-directed replication and template-independent sequence growth. These two processes are likely to have different kinetics. For example, their rates could depend differently on the availability of activated monomers. In this case, r could be an increasing function of the abundance of activated monomers. Template-directed replication requires double-strand separation. A common idea is that double-strand separation is caused by temperature oscillations, which means that r is affected by the frequency of those oscillations. The magnitude of r determines the relative importance of life versus prelife. For small r, the dynamics are dominated by prevolution. For large r, the dynamics are dominated by evolution.
Fig. 3 shows the competition between life (replication) and prelife. We assume a random prelife landscape where the a_{i} values are taken from a uniform distribution between 0 and 1. All sequences of length n = 6 have the ability to replicate. Their relative fitness values, f_{i}, are also taken from a uniform distribution on [0,1]. For small values of r, the equilibrium structure of prelife is unaffected by the presence of potential replicators; longer sequences are exponentially less frequent than shorter ones. There is a critical value of r, where a number of replicators increase in abundance. For large r, the fastest replicator dominates the population, whereas all other sequences converge to very low abundance. In this limit, we obtain the standard selection equation of evolutionary dynamics with competitive exclusion.
Between prelife and life, there is a phase transition. The critical replication rate, r_{c}, is given by the condition that the net reproductive rate of the replicators becomes positive. The net reproductive rate of replicator i can be defined as g_{i} = r(f_{i} − φ) − (d + a_{i0} + a_{i1}). For r < r_{c}, the abundance of replicators is low, and therefore, φ is negligibly small. In Fig. 3, we have d = 1 and a_{i0} + a_{i1} = 1 on average. For the fastest replicator, we expect f_{i} ≈ 1. Thus, the phase transition should occur around r_{c} ≈ 2, which is the case. Using the actual rate constants of the fastest replicator in our system, we obtain the value r_{c} = 1.572, which is in perfect agreement with the exact numerical simulation (see broken vertical line in Fig. 3).
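The threshold condition can be written out directly. Setting g_{i} = 0 with φ ≈ 0 gives r_{c} = (d + a_{i0} + a_{i1})/f_{i}. The helper names below are illustrative assumptions; the numbers in the usage note are the average values quoted in the text, not the actual rate constants behind Fig. 3.

```python
def net_reproductive_rate(r, f_i, phi, d, a_i0, a_i1):
    """g_i = r (f_i - phi) - (d + a_{i0} + a_{i1})."""
    return r * (f_i - phi) - (d + a_i0 + a_i1)

def critical_replication_rate(f_i, d, a_i0, a_i1):
    """Value of r at which g_i changes sign, assuming phi is negligible."""
    return (d + a_i0 + a_i1) / f_i
```

With d = 1, a_{i0} + a_{i1} = 1, and f_{i} = 1, critical_replication_rate(1.0, 1.0, 0.5, 0.5) evaluates to 2.0, matching the estimate r_{c} ≈ 2.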
Replication can be subject to mistakes. With probability u, a wrong monomer is incorporated. In Fig. 4, we consider a “single-peak” fitness landscape: One sequence of length n can replicate. The probability of error-free replication is given by q = (1 − u)^{n}. The net reproductive rate of the replicator is now given by g_{i} = r(f_{i}q − φ) − (d + a_{i0} + a_{i1}). The replicator is selected if the replication accuracy, q, is greater than a certain value, given by q > (d + a_{i0} + a_{i1})/(rf_{i}). Thus, mutation leads to an error threshold for the emergence of life. Replication is selected only if the mutation rate, u, is less than a critical value that is proportional to the inverse of the sequence length, 1/n. This finding is reminiscent of classical quasispecies theory (3, 4), but there, the error threshold arises when different replicators compete (“within life”). Here, we observe an error threshold between life and prelife.
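The 1/n scaling of the error threshold can be made explicit by solving q = (1 − u)^{n} > (d + a_{i0} + a_{i1})/(rf_{i}) for u. This gives u < 1 − [(d + a_{i0} + a_{i1})/(rf_{i})]^{1/n}, which for long sequences is approximately ln[rf_{i}/(d + a_{i0} + a_{i1})]/n. The sketch below assumes, as in the threshold argument above, that φ is negligible; the function names are illustrative.

```python
import math

def critical_mutation_rate(n, r, f_i, d, a_i0, a_i1):
    """Largest per-monomer error rate u for which q = (1 - u)^n still
    exceeds (d + a_{i0} + a_{i1}) / (r f_i), assuming phi is negligible."""
    q_min = (d + a_i0 + a_i1) / (r * f_i)
    if q_min >= 1.0:
        return 0.0  # replication is not selected even without errors
    return 1.0 - q_min ** (1.0 / n)

def approx_critical_mutation_rate(n, r, f_i, d, a_i0, a_i1):
    """Large-n approximation, valid when r f_i exceeds the loss rate:
    u_c ~ ln(r f_i / (d + a_{i0} + a_{i1})) / n."""
    return math.log(r * f_i / (d + a_i0 + a_i1)) / n
```

Doubling n roughly halves the critical mutation rate in this approximation.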
Traditionally, one thinks of natural selection as choosing between different replicators. Natural selection arises if one type reproduces faster than another type, thereby changing the relative abundances of these two types in the population. Natural selection can lead to competitive exclusion or coexistence. In the present theory, however, we encounter natural selection before replication. Different information carriers compete for resources and thereby gain different abundances in the population. Natural selection occurs within prelife and between life and prelife. In our theory, natural selection is not a consequence of replication, but instead natural selection leads to replication. There is “selection for replication” if replicating sequences have a higher abundance than nonreplicating sequences of similar length. We observe that prelife selection is blunt: Typically small differences in growth rates result in small differences in abundance. Replication sharpens selection: Small differences in replication rates can lead to large differences in abundance.
We have proposed a mathematical theory for studying the origin of evolution. Our aim was to formulate the simplest possible population dynamics that can produce information and complexity. We began with a “binary soup” where activated monomers form random polymers (binary strings) of any length (Fig. 1). Selection emerges in prelife, if some sequences grow faster than others (Fig. 2). Replication marks the transition from prelife to life, from prevolution to evolution. Prelife allows a continuous origin of life. There is also competition between life and prelife. Life is selected over prelife only if the replication rate is greater than a certain threshold (Fig. 3). Mutation during replication leads to an error threshold between life and prelife. Life can emerge only if the mutation rate is less than a critical value that is proportional to the inverse of the sequence length (Fig. 4). All fundamental equations of evolutionary and ecological dynamics assume replication (31–33), but here, we have explored the dynamical properties of a system before replication and the emergence of replication.
Acknowledgments
This work was supported by the John Templeton Foundation, the Japan Society for the Promotion of Science (H.O.), the National Science Foundation/National Institutes of Health joint program in mathematical biology (NIH Grant R01GM078986), and J. Epstein.
Footnotes
 ^{†}To whom correspondence should be addressed. Email: martin_nowak{at}harvard.edu

Author contributions: M.A.N. and H.O. wrote the paper.

The authors declare no conflict of interest.

This article contains supporting information online at www.pnas.org/cgi/content/full/0806714105/DCSupplemental.
 © 2008 by The National Academy of Sciences of the USA
References
2. Miller SL, Orgel LE
5. Stein DL, Anderson PW
8. Fontana W, Buss LW
9. Fontana W, Buss LW
10. Dyson F
11. Miller SL
13. Benner SA, Caraco MD, Thomson JM, Gaucher EA
14. Ricardo A, Carrigan MA, Olcott AN, Benner SA
16. Joyce GF
18. Bartel DP, Szostak JW
23. Johnston WK, Unrau PJ, Lawrence MS, Glasner ME, Bartel DP
26. Hanczyc MN, Fujikawa SM, Szostak JW
27. Chen IA, Roberts RW, Szostak JW
29. Flory PJ
30. Szwarc M, van Beylen M
31. Nowak MA
32. Hofbauer J, Sigmund K
33. May RM