Exact solution of the Eigen model with general fitness functions and degradation rates

  1. David B. Saakian*, and
  2. Chin-Kun Hu*,
  1. *Institute of Physics, Academia Sinica, Nankang, Taipei 11529, Taiwan; and
  2. Yerevan Physics Institute, Alikhanian Brothers Street 2, Yerevan 375036, Armenia
  1. Edited by H. Eugene Stanley, Boston University, Boston, MA, and approved January 18, 2006 (received for review June 13, 2005)

Abstract

We present an exact solution of Eigen’s quasispecies model with a general degradation rate and fitness functions, including a square root decrease of fitness with increasing Hamming distance from the wild type. The found behavior of the model with a degradation rate is analogous to a viral quasispecies under attack by the immune system of the host. Our exact solutions also revise the known results of neutral networks in quasispecies theory. To explain the existence of mutants with large Hamming distances from the wild type, we propose three different modifications of the Eigen model: mutation landscape, multiple adjacent mutations, and frequency-dependent fitness in which the steady-state solution shows a multicenter behavior.

Molecular models of biological evolution have attracted much attention in recent decades (115). Among them, Eigen’s concept of quasispecies plays a fundamental role (1, 2). It describes the evolution of a population consisting of a wild type accompanied by a large number of mutant types in sequence space by a large system of ordinary differential equations. The Eigen model has been found to describe quite well the evolution of viral populations (3) and has deeply changed our view of the process of evolution: adaptation does not wait for better adapted mutants to arise but starts with the selection of the better adapted mutants and then explores by mutation the surrounding sequence space for even better mutants. When the mutation rate surpasses an error threshold, the population gets genetically unstable, and it could be shown that indeed virus populations can be driven to extinction when the error rate is artificially raised beyond the error threshold.

To describe the population precisely, we should know the fitness value of each type and the mutation rates to go from one type to another. The experimental efforts to do so are immense. During the last three decades, the model has been investigated numerically as well as analytically for a simple fitness function. Although this sort of data reduction does allow a view on a large population, the fitness functions chosen are too simplistic to explain realistic cases such as a population of RNA virus. In this work, we solve the system of differential equations exactly, assuming uniform degradation rates and fitness functions including a square root decrease of fitness with increasing Hamming distance (HD) from the wild type. Our exact solutions also revise the known theoretical results of neutral networks in quasispecies theory (2). To explain biological systems more realistically (16), we propose three different modifications of the Eigen model: mutation landscape, multiple adjacent mutations, and frequency-dependent fitness in which the steady-state solution shows a multicenter behavior.

Model

Several excellent reviews (25) emphasize the merits of the quasispecies model for the interpretation of virological studies. Let us give a brief description of the quasispecies model as we use it in this work: a sequence type of length N is specified by a sequence of N spin values s k = ±1, 1 ≤ kN (1, 2). In reality the spins can take four values corresponding to the natural nucleotide types, but a two-value spin model already catches the essential features and can be studied more easily. Two-value spin models also have been used to study long-range correlations in DNA sequences (17) and DNA unzipping (18, 19), and valuable results have been obtained. A generalization of our results for the four-value spin case is presented in the Supporting Text, which is published as supporting information on the PNAS website. However, such results include more cumbersome formula, and from now on we will only consider the two-value spin model.

Let s j = +1 represent purines (R) and s j = −1 pyrimidines (Y). Type i is then specified by S i ≡ (s1 (i), s 2 (i), …, s N (i)). The model describes replicating molecules under control of variation and natural selection with Eq. 1, to be defined below. The rate coefficients of replication and mutation are assumed to be independent of the concentration of the types. The model describes the exponential growth phase of virus evolution, in which there are enough nutrients and low virus concentration. The multistep cross-catalytic reactions are replaced by an autocatalytic one. Here the evolution picture is rather simple, compared with the linear growth phase in the case of strong saturation effects.

Selection is on a genotype level: fitness is a function of S i. The variation is assumed to be produced only by point mutations. Eigen made a deterministic approach with kinetic rate equations that requires an infinite population, whereas classical population genetics uses probabilistic equations. We denote the probability for the appearance of S i at time t by p Sip i(t) and define fitness r i of S i as the average number of offsprings produced per unit time and degradation rate D i of S i as an inverse mean longevity. The chosen r i and D i are functions in genome sequence space S i, i.e., r i = f(S i) and D i = D(S i).

The mutation matrix element Q ij is the probability that an offspring produced by state j changes to state i, and the evolution is given by the set of equations for 2N probabilities p i (2, 6) Formula Here p i satisfies Σi=1 2N p i = 1 and Q ij = q Nd(i,j) (1 − q)d(i,j) with q being the mean nucleotide incorporation fidelity, and d(i, j) ≡ (N − Σl=1 N s l (i) s l (j))/2 being the HD between S i and S j (1, 2); d(i, j) represents the total number of different spin values in S i and S j.

In ref. 1 the concept of an error threshold is introduced, and the error threshold has been quantified by a formula. For the calculation of the error threshold, the selection values and mutation rates of all types would be required, which is still not feasible. For data reduction, several fitness functions (landscapes) have been considered (2). The simplest one has a single peak (2) in a flat landscape. Without loss of generality we set the peak to S 1 ≡ (1, 1, …, 1), i.e., the state with all spin up, and have Formula for which Eigen error threshold formula (equation II-45 in ref. 1) gives an exact result Formula for successful selection (p. 180 in ref. 2). The parameter γ = N(1 − q) describes the mutation efficiency. When the inequality is satisfied, a mutant distribution is built around the peak configuration in the steady state. Otherwise the distribution is flat in the infinite genome length limit, i.e., no sequence is preferred [error catastrophe (1, 2)].

We consider general fitness function f(S i) and degradation D(S i), which depend on the HD (the number of nucleotide differences between two sequences) from the peak configuration S 1. In statistical physics this case corresponds to mean-field-like interaction, which is exactly solvable (20). We can write f(S i) and D(S i) as (6) Formula where f 0(k) and d 0(k) with k = Σl s l (i)/N are polynomials. It is easy to show that k = 1 − 2d 1/N, where d 1 is the HD between S i and S 1. Following ref. 13, in Supporting Text we exactly reformulate the solution of the system (Eq. 1) as a problem of statistical mechanics of quantum spins with some non-Hermitian Hamiltonian H. Error threshold corresponds to singularity in the phase structure of Eq. 1. The same singularity exists also in the partition function Z = Tr exp[−Hβ] at the limit β → ∞; here β, the inverse temperature in statistical mechanics model, corresponds to the time in Eq. 1. In Supporting Text, we show that, when β → ∞, the dominant contributions to ln Z/β come from spin configurations with a particular k resulting in Formula The mean growth rate Σi(r iD i)p i is given as the maximum of Eq. 5 at −1 ≤ k ≤ 1, where nonzero k means a successful selection. We also show that for the steady-state distribution p i the surplus production s = Σi=1 2N p i Σl=1 N s l i/N satisfy the equation Formula where k gives the maximum of Eq. 5. The surplus s has a direct biological meaning about how the population is grouped around the peak configuration. The difference between s and k has been discussed carefully in ref. 12.

Error Thresholds

Now we use Eq. 5 to study error thresholds for the following cases.

Case 1: Single Peak Fitness Function Without Degradation.

Let us first disregard the degradation (d 0(k) = 0), and take f 0(k) = 1 + (A − 1)k p at the limit p → ∞. There is a simple nonselective “paramagnetic” (PM) phase with k = 0 and ln Z = β. At high values of A there is a selective “ferromagnetic” (FM) phase with k = 1 and ln Z = βAe −γ. The system chooses the phase with the highest Z, and the Eigen error threshold formula of Eq. 3 is obtained. Eigen introduced Ae −γ as a “selective value” of the peak (equation II-31 in ref. 1). In case of several isolated peaks the system chooses the one with the maximal selective value.

Case 2: Single Peak Fitness Function with Degradation.

Consider nonzero degradation d 0(k) = c + αk, where the positive number α > 0 is the degradation parameter, and the same fitness function as in case 1. Physically, d 0(k) should be always positive for any value of k and we take c > α. Because of symmetry of Eigen’s equations under the transformation D iD i + C with C being a constant, we can simply choose d 0(k) = αk, which will be used in the following discussion. Now we have the PM phase with k = 0 and ln Z/β = 1 ≡ W PM, the FM phase with k = 1 and Formula Besides these phases, we also have the antiselective (AS) phase, which can be located by finding the maximum of Formula in the interval −1 ≤ k ≤ 1. To find the maximum we should take the value of Eq. 8 either at the border with k = −1 or at the local maximum point. The derivative of W AS(k) with respect to k gives Formula The solution of Formula gives a negative value of k 0, which means that most spin values of the spin state with k = k 0 are different from those of the peak configuration S 1; hence, we call the spin state AS phase. At k = 0, W AS(k) of Eq. 8 coincides with W PM and WAS(k) = −α < 0; W AS(k) decreases in the interval [0, 1]. As k moves from 0 to the negative k 0 of Eq. 9, WAS(k) moves from −α to 0. Thus, W AS(k) monotonically decreases at the interval [k 0, 0], and it is larger than W PM at the internal [k 0, 0). In case of several solutions of Eq. 9, we should choose the one that gives the maximal W AS and compare such W AS with the border value e −γ + α at k = −1 to determine the maximal weight W AS M for the AS phase Formula There are two possible stable phases, AS and FM, in the model. For given α, A, and γ, we can compare W FM and W AS M to determine which phase dominates. The error threshold (phase boundary) is given by Formula

For a nonzero degradation rate, ref. 2 gives the formula: Ae −γ > 1 + D 0 − 〈D i〉 (see equations III.4 and III.5 in ref. 2), where D 0 is the degradation rate of the master type and 〈D i〉 is the average degradation rate of all other types. The majority of types groups around k = 0, therefore 〈D i〉 = 0 and the error threshold condition of ref. 2 is Formula This error threshold deviates from Eq. 11 because the AS phase was not considered in ref. 2. This AS phase can represent a virus quasispecies under attack by the immune system (21).

Case 3: Flat Fitness Functions.

During the last decade the importance of extended or flat fitness landscapes for the biological evolution (2229) has been recognized. We first consider a simple mesa-type mount in the fitness landscape Formula Formula where k is the overlap of the points on the mesa with the configuration with all up spins S 1; 1 − k 0 defines the broadness of the mesa. For the fitness function of Eq. 13 and zero degradation, again a PM solution with k = 0 is found. In the FM phase Eq. 5 takes a maximum value at the border k = k 0, and we derive an error threshold Formula If there are several mesa mounts with different A and k 0, the system chooses the one with the maximal left expression of Eq. 14, and we call this expression the selective value of the mesa. We see that increasing the extension of the mesa increases its selective ability.

Case 4: Fitness Decreases Slowly with the HD d from the Peak.

We now consider a landscape with a mount that has shallow slopes from the peak till d m Formula and f(d) = 1, for d > d m. We consider the case d mN. Eqs. 5 and 15 with d 0 = 0 and small (1 − k) ≡ 2d/N gives an optimization problem for Ae −γ[1 + (γ − a) Formula]. For γ < a the population groups closely around the master type, whereas for γ > α the distribution becomes wider. Therefore, such a simple mechanism could describe an extensive genomic change. As in ref. 30, the population landscape has a narrow steep peak at low mutation rates but a wide shallow hill at high mutation rates.

Neutral Selective Value

When flat fitness function of Eq. 13 exists only for a fraction ν (0 < ν < 1) of N spins, then fitness f 0(k 1, k 2) is a function of two group of spins with νN and (1 − ν)N spins, respectively Formula Formula where k 1 ≡ Σi=1 νN s i/(νN) corresponds to “flat fitness” spin group, and k 2 ≡ Σi=1+νN N s i/[(1 − ν)N] the remainder spins. In Section II of Supporting Text, for the partition Z = Tr exp[−βH] we obtain Formula where m is a transverse magnetization (related to quantum spin operator σx) for both groups of spins, and k 1, k 2 are longitudinal magnetization (related to quantum spin operator σz) for the first and second groups; h, x 1, and x 2 are Lagrange multipliers corresponding to m, k 1, and k 2, respectively. Setting the derivatives of ln Z/β with respect to h, x 1, and x 2 to 0, we obtain Formula with m = ν Formula + (1 − ν) Formula and −1 ≤ k 1 ≤ 1, − 1 ≤ k 2 ≤ 1. The maximum is at k 2 = 1 and k 1 = k 0. The error threshold condition is Formula The right-hand side is the selective value in the considered case. The neutral selective value has a factor ν Formula in the exponent, which is a product of two factors: the “width” ν and the “length” Formula.

In refs. 25 (equation 4) and 27 (equation 2), Ae −γ(1−ν) has been considered as a neutral selective value, and a similar expression has been given in ref. 29. They defined ν as the average number of neutral neighbors. If we consider small HD from the peak, N(1 − k 0) ≪ Formula, then the mean neutrality coincides with ν in our fitness landscape. We see that in the formula of refs. 25 and 27 the factor Formula (length of neutrality) is missed. To define neutral selective value, we should take exact copying probability q N plus the probability of all neutral mutations. We should differentiate neutrality for single mutation from neutrality for multiple ones; the latter makes major contribution for selective ability, which is the origin of small cutoff factor Formula. Even if there are neutral paths with a large HD (31) of length 40, their contributions to the mean fitness and error threshold might still be negligible due to small ν (width of neutrality).

Steady-State Distribution

Now we solve the steady-state distribution of the Eigen model with a single peak fitness and d 0(k) = 0. Because of the symmetry of the problem, the probabilities p i of Eq. 1 depend only on the HD from the peak configuration S 1 whose probability to appear is p 1. We denote the representatives of other N classes as configurations C 2, …, C (N+1), with corresponding probabilities p 2, …, p N+1. The total probability of the class C l is: p l = p l(l−1 N). We suggest an ansatz for p l: p 1 ∼ 1 and p l ∼ (γ/N)l−1. From Eq. 1 with dp i/dt = 0, we can derive an expression for p 1 and write p n+1 for n ≥ 1 in terms of p 1, …, p n: Formula Formula Eq. 20 implies L ≡ (A − Σj p j r j)/A = (1 − e −γ), which is a generalization of Haldane result for the mutational load (28, 32) L.

Let us consider the crude picture of the Eigen model’s steady state. There is a central cluster around the peak configuration (with maximal density). The majority of population is concentrated within the HD ∼ N(1 − s)/2, where the surplus s is defined above by Eq. 6. For the further classes the density decreases N times with every step. The case when fitness is a function of distances from several peaks, again, could be solved and is consistent with the described picture.

Mutants with a Large HD from the Wild Type

Rohde, Daum, and Biebricher (16) observed many different mutants at large HDs (up to 9) from the wild type. Eq. 21 implies that mutants with an HD d from the wild type will appear with a probability of the order of (γ/N)d; thus the data in ref. 16 could not be described by the original Eigen model in any way. Here we propose three possible modifications of the Eigen model to explain the experimental data.

The first way is to introduce a nontrivial mutation landscape (33), which perhaps is the case for retroviruses. In this case q j depends on the site and the mutation matrix Q ij = q j Nd(i,j) (1 − q j)d(i,j) is nonsymmetric. When q j is a function of the HD from the wild configuration, Eq. 5 is modified: γ → γ(k), where γ(k) = N(1 − q j) for the configuration S j having the overlap k with the wild one. Mutants at large HDs distances can appear, if the mutation rate for those genotypes is small enough to have a selective value A exp[−γ(k)] close to the wild one.

The second way is to introduce the frequency-dependent fitness (even in exponentially growth phase); such a proposal is supported by experimental data for RNA viruses (34). For example, in Eq. 1 with D j = 0 we replace r j by r j(1 − cp j), where c is a small coefficient, and have Formula We consider the case of fitness r 1 > r 2 > 1 and r 2 > r 1(1 − c), 1 > c > 0. For other configurations again r j = 1. Configuration S 2 is located in some HD d(1, 2) ≥ 1 from the peak configuration S 1.

When e −γ < Q 1, where Q 1 = 1/r 1, there is no selection in the system; the steady-state distribution is flat. When Q 1 < e −γ < Q 2, there is macroscopic distribution only around first configuration (Q 2 will be defined later by Eq. 24). When e −γ > Q 2, there are macroscopic probabilities ∼ 1 only at configurations S 1 and S 2 with p 1 = x and p 2 = y. For other configurations p i ∼ 1/N d, where d is the shortest HD of the given configuration from S 1 or S 2. Then we immediately derive a system of two equations for x and y Formula where Q = q N. The system is easy to solve. From the condition x > 0 and y > 0, we derive an equation for the threshold Q 2 Formula When x > 0 and y > 0, the distribution is a sum of two distributions, decreasing quickly with the HDs from the two peak configurations. We have a recurrence formula for the probabilities p n+1 = x n+1 of the first configuration’s neighbors at the HD n Formula where x 1 = x.

Let us consider an experimental mutant spectrum with RNA replicated by Qβ replicase (16). In this work, mutations in 86 nucleotides (N = 86) with mutation rate per replication γ ∼ 0.03 have been carefully studied. Figure 4 in ref. 16 shows a central cluster around the peak (the top one labeled by I24 at the left-hand side of the figure), as well as mutants at HD 1 ∼ 4 from the central cluster, having relative frequencies (compared with the peak one p 1) between 6/6 = 1.0 (for I28 and I20 with HD 1 and 2, respectively) and 1/6 = 0.166 … (for I1 and I4 with HD 4, I11 with HD 2, etc.). Now let us take I1 with HD = 4 as an example to compare the results from the traditional approach and the frequency-dependent fitness approach.

Extending the method to derive Eq. 21 to the present case with r 1 > r 2 > 1 (but p i still satisfy Eq. 1), we have an estimate for the p 5 with HD = 4 Formula with r 1/(r 1r 2) ≈ 10. Of course the last formula could be modified because of finite population effects, but no “rugged” fitness landscape could increase the effect by 1013 times (107, in case γ ∼ 1). The frequency-dependent Eigen model could predict p 5/p 1 with a correct order of magnitude as follows. Using r 1 = 10, r 2 = 9, Q = 0.851, and c = 0.19 in Eq. 23, we have x ≈ 0.67 and y ≈ 0.16; the latter is consistent with the relative frequency 1/6 for I1 mentioned above. We see that the population of the second peak y is independent of the HD d from the main peak, and y is finite at small mutations, whereas in the case of the Eigen model (c = 0) y disappears as (γ/N)d(1,2).

The third possible explanation could be multiple (duplet, triplet …) adjacent mutations (35). We can suggest a simple explanation by assuming triplet adjacent mutations (36). Assuming the existence (besides the simple point mutations) of triplet mutations with per genome per replication mutation probability γt ∼ 1, r 1/(r 1r 2) ≈ 10, and missing the nonlinearity, we can get a reasonable estimate Formula which is of the same order as p I112/p 1 with p I112 being the probability for the appearance of I112 in figure 4 in ref. 16 (I112 has adjacent triplet mutations).

Discussion

More than 30 years after the first analytical result by Eigen (1), we presented exact formulas (Eqs. 5 and 6) for the mean fitness and surplus production in Eigen’s quasispecies model, using simplified landscapes with fitness values and degradation rates that are functions of the HD from the master type (Eqs. 1 and 4). In addition to the selective and nonselective phases, an AS phase is described at higher degradation rates. Depending on the landscape chosen, the threshold condition reported in ref. 2 had to be modified. The Eigen model immediately produces the notion of selective value (vs. fitness value on Wright’s fitness landscape in zero mutation case); the evolving system is choosing the peak with maximal selective value. For an isolated peak, the selective value is the product of fitness and copying fidelity (ref. 1). We derived a selective value for a general case (Eq. 5), including the neutral phenomenon (mesa with neutral spins) Eq. 19. The fitness function, decreasing from the master type with the square root of the HD, plays a special role. Such fitness allows a radical rearrangement of population. The model with nonzero degradation maybe realistic for the host–parasite interaction (21) and could be useful for the problems of immunology as well (37).

We have proposed three possible modifications of Eigen’s quasispecies model to describe the existence of mutants at large HDs from the wild type. All three mechanisms exist in experimental data (16, 33, 34). From the theoretical point of view the adjacent site mutations are especially interesting, because they can give rapid relaxation in changing environments in realistic case of finite population. Consider the situation, when the adjacent triplet mutations have the same rate as the single-site one, and new wild configuration in changing environments at the HD d from the current configuration could be achieved by triplet mutations. Let us denote by γ1 and γ3 the probabilities of single and triplet mutations per genome per replication. In ref. 38 it has been calculated that finite population relaxation period (fitness barrier crossing time) in case of single mutations scales as t 1 ∼ (N1)d. A similar estimate for triplet mutations gives t 3 ∼ (N3)d/3. If a system needs 1 million replication cycles (t 1) to reach the steady state in case of only single-point mutations; a similar system needs only ≈100 replication cycles (t 3) to reach approximately the same steady state through adjacent site triplet mutations. Thus, the latter can relax more quickly.

Although we propose to modify the quasispecies model to explain the experimental data, we agree with the most fundamental idea of the quasispecies theory: the whole collection of the viruses (quasispecies) acts as a target of selection and mutation. Further work should deal with more realistic or interesting situations, such as finite population problems (39, 40), random fitness landscapes, and the role of neutral networks in evolution.

Acknowledgments

We thank C. Biebricher, M. Deem, and H. Wagner for useful discussions. This work was supported by National Science Council of the Republic of China (Taiwan) Grants NSC 93-2112-M-001-027, 94-2811-M-001-014, and NSC 94-2119-M-002-001; Academia Sinica (Taiwan) Grant AS-92-TP-A09; and U.S. Civilian Research and Development Foundation Grant ARP2-2647-Ye-05.

Footnotes

  • To whom correspondence should be sent at the ∗ address. E-mail: huck{at}phys.sinica.edu.tw
  • Author contributions: C.-K.H. and D.B.S. designed research, performed research, and wrote the paper.

  • Conflict of interest statement: No conflicts declared.

  • This paper was submitted directly (Track II) to the PNAS office.

  • Abbreviations:
    AS,
    antiselective;
    FM,
    ferromagnetic;
    HD,
    Hamming distance;
    PM,
    paramagnetic.

References