Prevolutionary dynamics and the origin of evolution
Communicated by Clifford H. Taubes, Harvard University, Cambridge, MA, July 14, 2008 (received for review May 31, 2008)

Abstract
Life is that which replicates and evolves. The origin of life is also the origin of evolution. A fundamental question is when do chemical kinetics become evolutionary dynamics? Here, we formulate a general mathematical theory for the origin of evolution. All known life on earth is based on biological polymers, which act as information carriers and catalysts. Therefore, any theory for the origin of life must address the emergence of such a system. We describe prelife as an alphabet of active monomers that form random polymers. Prelife is a generative system that can produce information. Prevolutionary dynamics have selection and mutation, but no replication. Life marches in with the ability of replication: Polymers act as templates for their own reproduction. Prelife is a scaffold that builds life. Yet, there is competition between life and prelife. There is a phase transition: If the effective replication rate exceeds a critical value, then life outcompetes prelife. Replication is not a prerequisite for selection, but instead, there can be selection for replication. Mutation leads to an error threshold between life and prelife.
The attempt to understand the origin of life has inspired much experimental and theoretical work over the years (1–10). Many of the basic building blocks of life can be produced by simple chemical reactions (11–15). RNA molecules can both store genetic information and act as enzymes (16–24). Fatty acids can self-assemble into vesicles that undergo spontaneous growth and division (25–28). The defining feature of biological systems is evolution. Biological organisms are products of evolutionary processes and capable of undergoing further evolution. Evolution needs a generative system that can produce unlimited information. Evolution needs populations of information carriers. Evolution needs mutation and selection. Normally, one thinks of these properties as being derivative of replication, but here, we formulate a generative chemistry (“prelife”) that is capable of selection and mutation before replication. We call the resulting process “prevolutionary dynamics.” Replication marks the transition from prevolutionary to evolutionary dynamics, from prelife to life.
Let us consider a prebiotic chemistry that produces activated monomers denoted by 0* and 1*. These chemicals can either become deactivated into 0 and 1 or attach to the end of binary strings. We assume, for simplicity, that all sequences grow in one direction. Thus, the following chemical reactions are possible:

$$i + 0^{*} \to i0, \qquad i + 1^{*} \to i1. \tag{1}$$
Here i stands for any binary string (including the null element). These copolymerization reactions (29, 30) define a tree with infinitely many lineages. Each sequence is produced by a particular lineage that contains all of its precursors. In this way, we can define a prebiotic chemistry that can produce any binary string and thereby generate, in principle, unlimited information and diversity. We call such a system prelife and the associated dynamics prevolution (Fig. 1).
Fig. 1. A binary soup and the tree of prelife. (A) Prebiotic chemistry produces activated monomers, 0* and 1*, which form random polymers. Activated monomers can become deactivated, 0* → 0 and 1* → 1, or attach to the end of strings, for example, 00 + 1* → 001. We assume that all strings grow only in one direction. Therefore, each string has one immediate precursor and two immediate followers. (B) In the tree of prelife, each sequence has exactly one production lineage. The arrows indicate all of the chemical reactions of prelife up to length $n = 4$.
Each sequence, $i$, has one precursor, $i'$, and two followers, $i0$ and $i1$. The parameter $a_i$ denotes the rate constant of the chemical reaction from $i'$ to $i$. At first, we assume that the active monomers are always at a steady state. Their concentrations are included in the rate constants, $a_i$. All sequences decay at rate $d$. The following system of infinitely many differential equations describes the deterministic dynamics of prelife:

$$\dot{x}_i = a_i x_{i'} - (d + a_{i0} + a_{i1})\, x_i. \tag{2}$$
The index, $i$, enumerates all binary strings of finite length, 0, 1, 00, …. The abundance of string $i$ is given by $x_i$ and its time derivative by $\dot{x}_i$. For the precursors of 0 and 1, we set $x_{0'} = x_{1'} = 1$. If all rate constants are positive, then the system converges to a unique steady state, where (typically) longer strings are exponentially less common than shorter ones. Introducing the parameter $b_i = a_i/(d + a_{i0} + a_{i1})$, we can write the equilibrium abundance of sequence $i$ as $x_i = b_i b_{i'} b_{i''} \cdots b_\sigma$. The product is over the entire lineage leading from the monomer, $\sigma$ (= 0 or 1), to sequence $i$. The total population size converges to $X = (a_0 + a_1)/d$. The rate constants, $a_i$, of the copolymerization process define the “prelife landscape.” We will now discuss three different prelife landscapes.
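The product formula for the equilibrium can be checked numerically on a truncated tree. The following is a minimal sketch, not the authors' code: the helper names (`all_strings`, `random_rates`, `prelife_equilibrium`, `max_residual`) and the uniform random rate constants are assumptions made for illustration. It builds each abundance from its precursor, then confirms that the right-hand side of the prelife equation vanishes.

```python
import itertools
import random


def all_strings(max_len):
    """Return all binary strings of length 1..max_len, shortest first."""
    return ["".join(bits)
            for n in range(1, max_len + 1)
            for bits in itertools.product("01", repeat=n)]


def random_rates(max_len, seed=0):
    """Hypothetical landscape: one uniform(0, 1) rate constant a_i per string."""
    rng = random.Random(seed)
    # Rates are needed one level beyond max_len, because b_i uses a_i0 and a_i1.
    return {s: rng.random() for s in all_strings(max_len + 1)}


def prelife_equilibrium(a, d, max_len):
    """Equilibrium x_i = b_i * x_{i'} with b_i = a_i / (d + a_i0 + a_i1)."""
    x = {}
    for s in all_strings(max_len):  # shortest first, so the precursor is known
        b = a[s] / (d + a[s + "0"] + a[s + "1"])
        precursor = 1.0 if len(s) == 1 else x[s[:-1]]
        x[s] = b * precursor
    return x


def max_residual(a, d, x):
    """Largest |a_i x_{i'} - (d + a_i0 + a_i1) x_i| over the computed strings."""
    worst = 0.0
    for s, xs in x.items():
        precursor = 1.0 if len(s) == 1 else x[s[:-1]]
        rhs = a[s] * precursor - (d + a[s + "0"] + a[s + "1"]) * xs
        worst = max(worst, abs(rhs))
    return worst


a = random_rates(max_len=8)
x = prelife_equilibrium(a, d=1.0, max_len=8)
print("max |dx/dt| at equilibrium:", max_residual(a, 1.0, x))
print("total abundance up to length 8:", sum(x.values()))
print("infinite-tree total (a_0 + a_1)/d:", (a["0"] + a["1"]) / 1.0)
```

Because the tree is cut off at a finite length, the printed total stays below $(a_0 + a_1)/d$; the difference is the abundance carried by the longer strings that the sketch does not track.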
For “supersymmetric” prelife, we assume that $a_0 = a_1 = \alpha/2$, and $a_i = a$ for all other $i$. Hence, all sequences grow at uniform rates. In this case, all sequences of length $n$ have the same equilibrium abundance given by $x_n = \frac{\alpha}{2a}\left[\frac{a}{2a + d}\right]^n$. Thus, longer sequences are exponentially less common. The total equilibrium abundance of all strings is $X = \alpha/d$. The average sequence length is $\bar{n} = 1 + 2a/d$.
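Both totals follow from a geometric series over sequence length. The short derivation below is a sketch that uses only the quantities defined above; the shorthand $q = 2a/(2a + d)$ is introduced here for convenience and is not part of the original notation. There are $2^n$ sequences of length $n$, each with abundance $x_n$.

```latex
\begin{align}
X &= \sum_{n=1}^{\infty} 2^n x_n
   = \frac{\alpha}{2a} \sum_{n=1}^{\infty} q^n
   = \frac{\alpha}{2a}\cdot\frac{q}{1-q}
   = \frac{\alpha}{d},
   \qquad q = \frac{2a}{2a+d}, \\
\bar{n} &= \frac{1}{X}\sum_{n=1}^{\infty} n\, 2^n x_n
   = (1-q)\sum_{n=1}^{\infty} n\, q^{n-1}
   = \frac{1}{1-q}
   = 1 + \frac{2a}{d}.
\end{align}
```

In other words, sequence length is geometrically distributed with parameter $1 - q = d/(2a + d)$.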
Selection emerges in prelife if different reactions occur at different rates. Consider a random prelife landscape, where a fraction $p$ of reactions are fast ($a_i = 1 + s$), whereas the remaining reactions are slow ($a_i = 1$). Fig. 2A shows the equilibrium distribution of all sequences as a function of the selection intensity, $s$. For larger values of $s$, some sequences are selected (highly prevalent), whereas the others decline to very low abundance. The fraction of sequences that are selected out of all sequences of length $n$ is given by $(1 - p)^2 [1 - p(1 - p)]^{n-1}$. See supporting information (SI) for all detailed calculations.
Fig. 2. Selection can occur in prelife without replication. The equilibrium abundances of all sequences of length 1 to 6 are shown as a function of the intensity of selection, $s$. There are $2^n$ sequences of length $n$. (A) In a random prelife landscape, half of all reactions occur at rate $1 + s$, the other half at rate 1. As $s$ increases, a small subset of sequences is selected, whereas the others decline to very low abundance. (B) All reactions leading to the one “master sequence” of length 6 occur at rate $b = 1 + s$, all others at rate $a = 1$. As $s$ increases, the master sequence is selected. Lineages that share sequences with the master sequence are suppressed, whereas other lineages are unaffected. Color code: black, gray, green, light blue, blue, and red for sequences of length 1 to 6, respectively. Other parameters: $a_0 = a_1 = 1/2$ and $d = 1$.
Another example of an asymmetric prelife landscape contains a “master sequence” of length $n$ (Fig. 2B). All reactions that lead to that sequence have an increased rate $b$, whereas all other rates are $a$. The master sequence is more abundant than all other sequences of the same length. But the master sequence attains a significant fraction of the population (that is, it is selected) only if $b$ is much larger than $a$. The required value of $b$ grows as a linear function of $n$. In this prelife landscape, we can also discuss the effect of “mutation.” The fast reactions leading to the master sequence might incorporate the wrong monomer with a certain probability, $u$, which then acts as a mutation rate in prelife. We find an error threshold: The master sequence can attain a significant fraction of the population only if $u$ is less than the inverse of the sequence length, $1/n$.
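The linear growth of the required rate $b$ with $n$ can be illustrated with the equilibrium product formula. This is a rough sketch under stated assumptions rather than the calculation in the SI: it assumes the monomer is produced at the fixed rate 1/2 (as in the Fig. 2 parameters), that every later step along the master lineage runs at rate $b$, and it takes "significant fraction" to mean half of the abundance reached as $b \to \infty$. The function names are hypothetical.

```python
def master_abundance(b, n, a=1.0, d=1.0, a_monomer=0.5):
    """Equilibrium abundance of a master sequence of length n >= 2.

    Assumed convention: the monomer is produced at rate a_monomer, every later
    step along the master lineage runs at rate b, and all other reactions at a.
    """
    x = a_monomer / (d + b + a)          # monomer: followers at rates b and a
    for _ in range(n - 2):               # lineage members of length 2..n-1
        x *= b / (d + b + a)
    return x * b / (d + 2 * a)           # master: both followers at rate a


def b_for_half_saturation(n, a=1.0, d=1.0, a_monomer=0.5):
    """Smallest b at which the master reaches half of its b -> infinity limit."""
    target = 0.5 * a_monomer / (d + 2 * a)
    lo, hi = a, 1e6
    for _ in range(200):                 # bisection; abundance increases with b
        mid = 0.5 * (lo + hi)
        if master_abundance(mid, n, a, d, a_monomer) < target:
            lo = mid
        else:
            hi = mid
    return hi


for n in (6, 11, 21, 41):
    print(n, round(b_for_half_saturation(n), 2))
```

Under these assumptions the half-saturation condition reduces to $[b/(b + d + a)]^{n-1} = 1/2$, so $b = (d + a)/(2^{1/(n-1)} - 1) \approx (d + a)(n - 1)/\ln 2$, which is linear in $n$ for long sequences.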
Let us now assume that some sequences can act as templates for replication. These replicators are formed not only from their precursor sequences in prelife but also from active monomers at a rate that is proportional to their own abundance. We obtain the following differential equation:

$$\dot{x}_i = a_i x_{i'} - (d + a_{i0} + a_{i1})\, x_i + r\, x_i (f_i - \phi). \tag{3}$$
As before, the index $i$ enumerates all binary strings of finite length. The first part of the equation describes prelife (exactly as in Eq. 2). The second part represents the standard selection equation of evolutionary dynamics (28). The fitness of sequence $i$ is given by $f_i$. All sequences have a frequency-dependent death rate, which represents the average fitness, $\phi = \sum_i f_i x_i / \sum_i x_i$, and ensures that the total population size remains at a constant value.
The parameter r scales the relative rates of template-directed replication and template-independent sequence growth. These two processes are likely to have different kinetics. For example, their rates could depend differently on the availability of activated monomers. In this case, r could be an increasing function of the abundance of activated monomers. Template-directed replication requires double-strand separation. A common idea is that double-strand separation is caused by temperature oscillations, which means that r is affected by the frequency of those oscillations. The magnitude of r determines the relative importance of life versus prelife. For small r, the dynamics are dominated by prevolution. For large r, the dynamics are dominated by evolution.
Fig. 3 shows the competition between life (replication) and prelife. We assume a random prelife landscape where the $a_i$ values are taken from a uniform distribution between 0 and 1. All sequences of length $n = 6$ have the ability to replicate. Their relative fitness values, $f_i$, are also taken from a uniform distribution on [0, 1]. For small values of $r$, the equilibrium structure of prelife is unaffected by the presence of potential replicators; longer sequences are exponentially less frequent than shorter ones. There is a critical value of $r$, where a number of replicators increase in abundance. For large $r$, the fastest replicator dominates the population, whereas all other sequences converge to very low abundance. In this limit, we obtain the standard selection equation of evolutionary dynamics with competitive exclusion.
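The competition between prelife and replication can be explored with a small numerical experiment. The sketch below is an illustration only and makes several assumptions that are not in the original: the tree is truncated at a short maximum length (longer strings act as an untracked sink, and $\phi$ is averaged over the tracked strings), integration uses a plain forward-Euler step, and the names `random_landscape`, `simulate`, and `replicator_share` are hypothetical.

```python
import itertools
import random


def all_strings(max_len):
    """Return all binary strings of length 1..max_len, shortest first."""
    return ["".join(bits)
            for n in range(1, max_len + 1)
            for bits in itertools.product("01", repeat=n)]


def random_landscape(max_len, seed=0):
    """Hypothetical setup: uniform(0, 1) rates a_i; the longest strings replicate."""
    rng = random.Random(seed)
    a = {s: rng.random() for s in all_strings(max_len + 1)}
    f = {s: rng.random() for s in all_strings(max_len) if len(s) == max_len}
    return a, f


def simulate(a, f, r, d=1.0, max_len=4, dt=0.01, steps=10000):
    """Forward-Euler integration of prelife plus replication, truncated at max_len.

    a: rate constants for all strings up to length max_len + 1
    f: fitness of the replicating strings (missing strings have fitness 0)
    """
    strings = all_strings(max_len)
    x = {s: 0.0 for s in strings}
    for _ in range(steps):
        total = sum(x.values())
        # Average fitness phi, taken over the tracked (truncated) population.
        phi = sum(f.get(s, 0.0) * x[s] for s in strings) / total if total else 0.0
        new_x = {}
        for s in strings:
            precursor = 1.0 if len(s) == 1 else x[s[:-1]]
            prelife = a[s] * precursor - (d + a[s + "0"] + a[s + "1"]) * x[s]
            life = r * x[s] * (f.get(s, 0.0) - phi)
            new_x[s] = x[s] + dt * (prelife + life)
        x = new_x
    return x


def replicator_share(x, f):
    """Fraction of the tracked population made up of replicating strings."""
    return sum(x[s] for s in f) / sum(x.values())


a, f = random_landscape(max_len=4)
for r in (0.1, 1.0, 10.0):
    print(r, round(replicator_share(simulate(a, f, r), f), 3))
```

Sweeping `r` over a finer grid and plotting the individual abundances on a logarithmic axis should give a picture qualitatively similar to the one described for Fig. 3, although the truncation length and random seed used here are arbitrary choices.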
Fig. 3. The competition between life and prelife results in selection for (or against) replication. The equilibrium abundances of all sequences of length 1 to 6 are shown versus the relative replication rate, $r$. We assume a random prelife landscape, where the reaction rates $a_i$ are taken from a uniform distribution on [0, 1]. All sequences of length $n = 6$ can replicate. Their fitness values are also taken from a uniform distribution on [0, 1]. For small values of $r$, prelife prevails. For large values of $r$, the fastest replicator dominates the population. As $r$ increases, there is a phase transition at the critical value $r_c$. The fitness of the fastest replicator is $f_i = 0.999$, and its extension rates are $a_{i0} = 0.4418$ and $a_{i1} = 0.1284$. The death rate is $d = 1$. We have $r_c = (d + a_{i0} + a_{i1})/f_i = 1.572$, which is indicated by the broken vertical line and is in perfect agreement with the numerical simulation. The color code is the same as in Fig. 2.
Between prelife and life, there is a phase transition. The critical replication rate, $r_c$, is given by the condition that the net reproductive rate of the replicators becomes positive. The net reproductive rate of replicator $i$ can be defined as $g_i = r(f_i - \phi) - (d + a_{i0} + a_{i1})$. For $r < r_c$, the abundance of replicators is low, and therefore, $\phi$ is negligibly small. Setting $\phi \approx 0$ and $g_i = 0$ gives $r_c = (d + a_{i0} + a_{i1})/f_i$. In Fig. 3, we have $d = 1$ and $a_{i0} + a_{i1} = 1$ on average. For the fastest replicator, we expect $f_i \approx 1$. Thus, the phase transition should occur around $r_c \approx 2$, which is the case. Using the actual rate constants of the fastest replicator in our system, we obtain the value $r_c = 1.572$, which is in perfect agreement with the exact numerical simulation (see broken vertical line in Fig. 3).
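The arithmetic behind the two quoted values of the critical replication rate is easy to reproduce. The snippet below is a minimal sketch; the function name `critical_replication_rate` is an assumption introduced for illustration, and the inputs are the numbers reported for Fig. 3.

```python
def critical_replication_rate(d, a_i0, a_i1, f_i):
    """r_c from g_i = 0 with phi close to 0: r_c = (d + a_i0 + a_i1) / f_i."""
    return (d + a_i0 + a_i1) / f_i


# Fastest replicator reported for Fig. 3.
print(round(critical_replication_rate(1.0, 0.4418, 0.1284, 0.999), 3))  # 1.572

# Rough estimate from average rates: a_i0 + a_i1 = 1 and f_i = 1.
print(critical_replication_rate(1.0, 0.5, 0.5, 1.0))  # 2.0
```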
Replication can be subject to mistakes. With probability $u$, a wrong monomer is incorporated. In Fig. 4, we consider a “single-peak” fitness landscape: One sequence of length $n$ can replicate. The probability of error-free replication is given by $q = (1 - u)^n$. The net reproductive rate of the replicator is now given by $g_i = r(f_i q - \phi) - (d + a_{i0} + a_{i1})$. The replicator is selected if the replication accuracy, $q$, is greater than a certain value, given by $q > (d + a_{i0} + a_{i1})/(r f_i)$. Thus, mutation leads to an error threshold for the emergence of life. Replication is selected only if the mutation rate, $u$, is less than a critical value that is proportional to the inverse of the sequence length, $1/n$. This finding is reminiscent of classical quasispecies theory (3, 4), but there, the error threshold arises when different replicators compete (“within life”). Here, we observe an error threshold between life and prelife.
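Solving the accuracy condition for $u$ gives the critical mutation rate explicitly, and a small-$u$ approximation shows the $1/n$ scaling. The sketch below uses hypothetical function names and evaluates both forms for the parameters reported for Fig. 4 ($n = 20$, $r = 10$, $f_i = 1$, $d = 1$, $a_{i0} = a_{i1} = a = 1$); the approximation relies on $(1 - u)^n \approx e^{-nu}$.

```python
import math


def critical_mutation_rate(n, r, f_i, d, a_i0, a_i1):
    """u_c solving (1 - u)^n = (d + a_i0 + a_i1) / (r * f_i)."""
    return 1.0 - ((d + a_i0 + a_i1) / (r * f_i)) ** (1.0 / n)


def approx_critical_mutation_rate(n, r, f_i, d, a_i0, a_i1):
    """Small-u approximation: u_c ~ ln(r f_i / (d + a_i0 + a_i1)) / n."""
    return math.log(r * f_i / (d + a_i0 + a_i1)) / n


# Parameters reported for Fig. 4.
print(round(critical_mutation_rate(20, 10.0, 1.0, 1.0, 1.0, 1.0), 3))         # 0.058
print(round(approx_critical_mutation_rate(20, 10.0, 1.0, 1.0, 1.0, 1.0), 3))  # 0.06
```

The exact value matches the threshold $u_c = 0.058$ quoted for Fig. 4, and the approximation makes the proportionality to $1/n$ explicit.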
Fig. 4. There is an error threshold between life and prelife. We assume a “single-peak” fitness landscape, where one sequence of length $n = 20$ can replicate, but no other sequence replicates. Replication is subject to mutation. The mutation rate, $u$, denotes the error probability per base. Error-free replication of the entire sequence occurs with probability $q = (1 - u)^n$. We show all sequences that belong to the lineage of the replicator. The replicator is shown in red; shorter sequences are light blue, and longer ones dark blue. For small mutation rates, the replicator dominates the population, and the equilibrium structure is given by the mutation-selection balance of life. There is a critical error threshold. The theoretical prediction for this threshold, $u_c = 1 - [(d + 2a)/r]^{1/n} = 0.058$, is illustrated by the vertical broken line and is in perfect agreement with the numerical simulation. For larger mutation rates, we obtain the normal prelife equilibrium: Longer sequences (including the replicator) are exponentially less common than shorter ones. Parameter values: $a_0 = 1/2$, $a = 1$, $d = 1$; supersymmetric prelife; $r = 10$, $f_{20} = 1$.
Traditionally, one thinks of natural selection as choosing between different replicators. Natural selection arises if one type reproduces faster than another type, thereby changing the relative abundances of these two types in the population. Natural selection can lead to competitive exclusion or coexistence. In the present theory, however, we encounter natural selection before replication. Different information carriers compete for resources and thereby gain different abundances in the population. Natural selection occurs within prelife and between life and prelife. In our theory, natural selection is not a consequence of replication, but instead natural selection leads to replication. There is “selection for replication” if replicating sequences have a higher abundance than nonreplicating sequences of similar length. We observe that prelife selection is blunt: Typically small differences in growth rates result in small differences in abundance. Replication sharpens selection: Small differences in replication rates can lead to large differences in abundance.
We have proposed a mathematical theory for studying the origin of evolution. Our aim was to formulate the simplest possible population dynamics that can produce information and complexity. We began with a “binary soup” where activated monomers form random polymers (binary strings) of any length (Fig. 1). Selection emerges in prelife, if some sequences grow faster than others (Fig. 2). Replication marks the transition from prelife to life, from prevolution to evolution. Prelife allows a continuous origin of life. There is also competition between life and prelife. Life is selected over prelife only if the replication rate is greater than a certain threshold (Fig. 3). Mutation during replication leads to an error threshold between life and prelife. Life can emerge only if the mutation rate is less than a critical value that is proportional to the inverse of the sequence length (Fig. 4). All fundamental equations of evolutionary and ecological dynamics assume replication (31–33), but here, we have explored the dynamical properties of a system before replication and the emergence of replication.
Acknowledgments
This work was supported by the John Templeton Foundation, the Japan Society for the Promotion of Science (H.O.), the National Science Foundation/National Institutes of Health joint program in mathematical biology (NIH Grant R01GM078986), and J. Epstein.
Footnotes
- †To whom correspondence should be addressed. E-mail: martin_nowak{at}harvard.edu
- Author contributions: M.A.N. and H.O. wrote the paper.
- The authors declare no conflict of interest.
- This article contains supporting information online at www.pnas.org/cgi/content/full/0806714105/DCSupplemental.
- © 2008 by The National Academy of Sciences of the USA
References
2. Miller SL, Orgel LE
5. Stein DL, Anderson PW
8. Fontana W, Buss LW
9. Fontana W, Buss LW
10. Dyson F
11. Miller SL
13. Benner SA, Caraco MD, Thomson JM, Gaucher EA
14. Ricardo A, Carrigan MA, Olcott AN, Benner SA
16. Joyce GF
18. Bartel DP, Szostak JW
23. Johnston WK, Unrau PJ, Lawrence MS, Glasner ME, Bartel DP
26. Hanczyc MN, Fujikawa SM, Szostak JW
27. Chen IA, Roberts RW, Szostak JW
29. Flory PJ
30. Szwarc M, van Beylen M
31. Nowak MA
32. Hofbauer J, Sigmund K
33. May RM