# Predictability of evolution depends nonmonotonically on population size

Edited by Richard E. Lenski, Michigan State University, East Lansing, MI, and approved November 16, 2012 (received for review August 6, 2012)

## Abstract

To gauge the relative importance of contingency and determinism in evolution is a fundamental problem that continues to motivate much theoretical and empirical research. In recent evolution experiments with microbes, this question has been explored by monitoring the repeatability of adaptive changes in replicate populations. Here, we present the results of an extensive computational study of evolutionary predictability based on an experimentally measured eight-locus fitness landscape for the filamentous fungus *Aspergillus niger*. To quantify predictability, we define entropy measures on observed mutational trajectories and endpoints. In contrast to the common expectation of increasingly deterministic evolution in large populations, we find that these entropies display an initial decrease and a subsequent increase with population size *N*, governed, respectively, by the scales *Nμ* and *Nμ*^{2}, corresponding to the supply rates of single and double mutations, where *μ* denotes the mutation rate. The amplitude of this pattern is determined by *μ*. We show that these observations are generic by comparing our findings for the experimental fitness landscape to simulations on simple model landscapes.

Evolutionary adaptations arise from an intricate interplay of deterministic selective forces and random reproductive or mutational events, and the relative roles of these two types of influences on the outcome of evolution have been the subject of long-standing controversy with significant philosophical implications (1, 2). Although the vision of “replaying the tape of life” on Earth or on some extrasolar planet remains confined to the realm of imagination (3, 4), evolution experiments with microbial populations have begun to address the predictability of adaptation on a microevolutionary scale (5–9). In particular, strong signatures of parallel evolution have been observed in the evolution of antibiotic resistance in pathogens, a finding of direct relevance to strategies of drug design and deployment (10–14). Because lack of knowledge of crucial parameters (e.g., the frequency of beneficial mutations) in such experiments prevents forward predictions, predictability is used in a weaker, a posteriori sense that implies repeatability of evolutionary trajectories in replicate populations. For this reason, the two terms will often be used interchangeably in the following (15).

The repeatability of adaptive trajectories is expected to depend on the genetic constraints imposed by epistatic interactions as well as on parameters such as population size *N*, mutation rate *μ*, and the typical scale *s* of selection coefficients (16–18). To be specific, consider a population evolving in the regime of strong selection and weak mutation (SSWM), where mutations are so rare that normally not more than one mutant is present simultaneously and the population can be represented as a single entity that performs an adaptive walk in the space of genotypes (19–21). Such walks are constrained to move uphill in fitness (strong selection, *Ns* ≫ 1) in single mutational steps (weak mutation, *Nμ* ≪ 1). As a consequence, a mutational pathway connecting two genotypes is selectively accessible in the SSWM regime only if fitness increases in each step (22). A number of recent studies of empirical fitness landscapes have shown that, in most cases, only a small fraction of possible adaptive pathways are accessible in this sense, which implies a dramatic enhancement of evolutionary predictability (10, 11, 15, 23–27). Moreover, the statistical weights of different accessible trajectories often vary widely, further narrowing the range of possibilities to a small number of dominant evolutionary pathways (10, 11, 15). In the SSWM regime, the likelihood of a given trajectory can be quantified straightforwardly in terms of the product of the relative fixation probabilities for individual mutational steps (10).

With increasing *N*, the simultaneous presence of several mutant clones becomes likely and clonal interference sets in (28–31). Clonal interference introduces a bias favoring mutations of large effect (32, 33), thus bringing the dynamics closer to the “greedy” limit, in which the mutation of largest effect is fixed deterministically in each step (34, 35). Although this in itself tends to reduce the heterogeneity of evolutionary trajectories (12, 35–37) and thus increases predictability, it is counteracted by the increasing availability of genotypes carrying multiple mutations. For sufficiently large populations, the crossing of small fitness valleys (which is completely suppressed in the SSWM limit) becomes relatively facile (38–40), opening up a host of previously inaccessible pathways and leading to a greater degree of randomness in the dynamics. The resulting overall effect on evolutionary repeatability in large populations is hard to assess without detailed analysis, and is expected to depend significantly on the structure of the underlying fitness landscape.

The objective of this article is to explore how the predictability of evolutionary dynamics depends on population parameters, primarily population size and mutation rate, in the presence of realistic epistatic interactions. To this end, we performed extensive simulations of standard asexual population dynamics of Wright–Fisher type on an empirical eight-locus fitness landscape obtained experimentally for the asexual filamentous fungus *Aspergillus niger* (26). We provide two definitions of adaptive pathways, which can be applied across all evolutionary regimes of interest, and reduce to the familiar adaptive walks in the SSWM regime. Probabilities of pathways and endpoints are then accumulated in a large number of independent runs, and their repeatability is quantified through the entropies of these empirical probability distributions. As usual, high predictability is signaled by low values of the entropies.

Our central result is that the entropies of evolutionary trajectories and endpoints vary nonmonotonically with population size and mutation rate. The variation with population size is governed by the parameters *Nμ* and *Nμ*^{2}, which describe the supply of single and double mutants, respectively, and it becomes more pronounced with decreasing *μ*. Simulations on the empirical *A. niger* landscape are complemented by a study using a class of model landscapes with tunable roughness (26, 41), which display the same type of behavior.

## Results

### Path Types and Arrow Plots.

There are different ways of defining the path taken by an adapting population in a genotypic fitness landscape, which are generally not equivalent but may yield complementary information. Here, we focus primarily on lines of descent (LODs), which represent the lineages that arrive at the most populated genotype at the final time (see below for a precise definition). Similar definitions of paths have been introduced previously (see, e.g., refs. 42 and 43). In addition, we will make use of the information supplied by the paths defined as the time-ordered sets of genotypes that at some time contain the largest subpopulation. We will call such a path the path of the maximum (POM). The POMs have been studied extensively in the context of deterministic mutation-selection models (44). Note that single steps in POMs, in contrast to those in LODs, can connect states that are separated by an arbitrary number of point mutations.

To gain a better understanding of the factors that determine the shape of the paths, we find it convenient to introduce arrow plots representing ensembles of paths realized up to time *T* (Fig. 1). Details of the construction are explained in the caption. Note that the choice of the final time *T* is, up to a certain point, arbitrary, as the population dynamics will never terminate completely. Here, we generally choose *T* such that the population has time to find at least some local fitness maximum. Only for the smallest population sizes, where the dynamics become very slow, do we observe trajectories that do not terminate at a local maximum.

The LODs are obtained as follows: After a fixed time *T*, the most populated state is determined. The last step of the path is then defined as the connection between this state and the state from which it arose for the last time by mutation. By “arose,” we mean that the target state was unpopulated before the mutation occurred. A given genotype may undergo several episodes of “colonization” and extinction, all of which are stored by the algorithm, and the last episode before the colonization of the final state is used to construct the step. Subsequent steps of the path are constructed analogously, starting from the most recently identified ancestor, i.e., we search for the state from which this ancestor arose for the last time before giving rise to the next genotype. The protocol is repeated until the starting point of the simulations is reached.
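The backtracking procedure just described can be sketched in Python. This is a minimal illustration, not the authors' code: it assumes a hypothetical bookkeeping structure `episodes` that maps each genotype (encoded as an integer bit mask) to its list of colonization events `(t, parent)`, with the starting genotype carrying the single entry `(0, None)`. The function name `line_of_descent` is likewise our own.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical bookkeeping format (not specified in the text): for each genotype,
# a list of colonization episodes (t, parent). Each entry records that at
# generation t the genotype was unpopulated and received a mutant from `parent`.
# The starting genotype carries a single episode (0, None).
Episodes = Dict[int, List[Tuple[int, Optional[int]]]]


def line_of_descent(episodes: Episodes, final_state: int, T: int) -> List[int]:
    """Reconstruct an LOD ending at final_state, ordered from start to end."""
    path = [final_state]
    state, t_limit = final_state, T
    while True:
        # Only colonizations that happened no later than t_limit are eligible.
        eligible = [(t, p) for t, p in episodes.get(state, []) if t <= t_limit]
        if not eligible:
            raise ValueError(f"genotype {state} was never colonized before t={t_limit}")
        # Use the most recent eligible colonization episode.
        t_col, parent = max(eligible, key=lambda episode: episode[0])
        if parent is None:  # reached the initial state of the simulation
            break
        path.append(parent)
        # The parent must already have been populated when it produced the mutant.
        state, t_limit = parent, t_col - 1
    return path[::-1]


# Genotypes as bit masks: 3 = 0b11 can be colonized from 1 = 0b01 or 2 = 0b10.
episodes = {0: [(0, None)], 1: [(5, 0), (40, 0)], 2: [(10, 0)], 3: [(20, 2), (60, 1)]}
print(line_of_descent(episodes, final_state=3, T=100))  # [0, 1, 3]
print(line_of_descent(episodes, final_state=3, T=50))   # [0, 2, 3]
```

Because the time limit strictly decreases at every backtracking step, the loop always terminates; which parent ends up on the path depends on when the final state was last colonized, as the two calls illustrate.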

Note that the paths generated in this way do not include all paths explored by the population, nor all of the paths that contribute to the production of the final state. Rather, they represent the “stepwise fastest” paths through which the final state may have been accessed. The assumption made here is that this path should normally be the one responsible for creating the first mutants on the final state. Once these mutants have been created, selection will dominate the further evolution. Thus, the supply of additional mutants through other paths should play a minor role. If there are several paths with similar probabilities for being the fastest in this sense, they should all show up when the numerical experiment is repeated many times.

The POMs, in contrast, are constructed by keeping track of the most populated genotype at every generation (Fig. S1). Note that maxima do not need to move between adjacent genotypes but can jump to states at Hamming distance larger than unity. Such events have sometimes been referred to as leapfrog events (28) (for experimentally observed examples, see, e.g., refs. 45 and 46). We depict them by wavy lines. By comparing LODs and POMs, we can thus obtain information about whether fitness valleys have been crossed by sequential fixation or by “stochastic tunneling” (38, 39). In the latter case, the deleterious mutation is not fixed, but the population on the deleterious state survives long enough for a secondary mutant of higher fitness to arise. As we will see later, tunneling is negligible as long as *N* is small compared with a threshold scaling as 1/*μ*^{2}, but becomes dominant for larger *N*.
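A POM can be extracted from a stored history of genotype counts, and leapfrog steps can be flagged by their Hamming distance. The sketch below assumes a hypothetical history format (one dictionary of genotype counts per generation, genotypes as bit masks); ties for the maximum are broken arbitrarily, a detail the text does not specify, and the function names are our own.

```python
from typing import Dict, List, Sequence, Tuple


def path_of_maximum(history: Sequence[Dict[int, int]]) -> List[int]:
    """Sequence of genotypes that successively hold the largest subpopulation."""
    pom: List[int] = []
    for counts in history:
        # Ties are broken by dictionary order (an arbitrary choice for this sketch).
        leader = max(counts, key=counts.get)
        if not pom or pom[-1] != leader:
            pom.append(leader)
    return pom


def leapfrog_steps(pom: List[int]) -> List[Tuple[int, int]]:
    """POM steps between genotypes at Hamming distance > 1 (drawn as wavy lines)."""
    return [(a, b) for a, b in zip(pom, pom[1:]) if bin(a ^ b).count("1") > 1]


history = [{0: 90, 1: 10}, {0: 50, 1: 20, 3: 30}, {0: 10, 3: 90}]
pom = path_of_maximum(history)
print(pom, leapfrog_steps(pom))  # [0, 3] [(0, 3)]
```

In this toy history the double mutant 3 overtakes genotype 0 without the single mutant 1 ever becoming the largest subpopulation, so the POM records a leapfrog step.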

### Population Size Dependence of Typical Paths.

Although the main focus of this article is on the statistical analysis of repeatability, it is instructive to first elucidate the effects of population size on the shape of evolutionary trajectories by means of a few typical examples. For this purpose, we refer to Fig. 1, where ensembles of LODs are shown that start from one of the four viable states at Hamming distance *d* = 7 from the global optimum (GO) in the *A. niger* landscape (see *Materials and Methods* for details on the landscape, Fig. S1 for the corresponding representation in terms of POMs, and Figs. S2–S4 for LODs starting from different initial genotypes). The figure was generated with a mutation rate of *μ* = 10^{−5}, and the data were accumulated over 1,000 realizations of the process. Note that, in the *A. niger* landscape, the wild type is the fittest state (the GO) and that, on average, the mutants are less fit the more mutations they have incorporated (26, 47). This is why most observed steps run in the direction of decreasing *d*.

For the population parameters used in Fig. 1*A* and Fig. S1*A*, SSWM behavior is expected. The population is mainly monomorphic, i.e., only single mutants appear on the background of the presently dominant genotype and fix with probabilities that are proportional to their selection coefficient but independent of *N* (19–21). Hence all steps that lead to fitter states are realized with comparable probabilities, provided their fitness values are not too different. At the same time, the fixation probability for deleterious mutations is exponentially small in *N* (48), making transitions to less fit states very unlikely (Fig. S5*B*). This leads to a large number of realized paths and endpoints and highly unpredictable dynamics.
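To make the *N* dependence concrete, the sketch below evaluates a standard expression for the fixation probability of a single new mutant with selection coefficient *s*. The text does not state which expression it relies on; as an assumption we use Kimura's diffusion approximation for a haploid Wright–Fisher population, π(*s*, *N*) = (1 − e^{−2s})/(1 − e^{−2Ns}). For beneficial mutations with *Ns* ≫ 1 this tends to 1 − e^{−2s} ≈ 2*s*, independent of *N*; for deleterious ones it decays like e^{−2N|s|}.

```python
import math


def _log_expm1(x: float) -> float:
    """log(exp(x) - 1) for x > 0, without overflow for large x."""
    return x + math.log1p(-math.exp(-x))


def fixation_probability(s: float, N: int) -> float:
    """Kimura's approximation for a single new mutant in a haploid population of size N."""
    if s == 0.0:
        return 1.0 / N  # neutral limit
    if s > 0.0:
        # (1 - e^{-2s}) / (1 - e^{-2Ns}), written with expm1 for accuracy
        return math.expm1(-2.0 * s) / math.expm1(-2.0 * N * s)
    # s < 0: (e^{2|s|} - 1) / (e^{2N|s|} - 1), evaluated in log space
    a = -s
    return math.exp(_log_expm1(2.0 * a) - _log_expm1(2.0 * N * a))


for N in (10**2, 10**4):
    print(N, fixation_probability(0.05, N), fixation_probability(-0.05, N))
```

The beneficial column barely changes between the two population sizes, whereas the deleterious column collapses toward zero, which is the contrast the paragraph above relies on.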

As the population size increases, several nearest neighbor mutants of the currently most populated genotype are present simultaneously, leading to competition between different mutants. Fitter mutations will be more commonly selected. Thus, in this regime, the dynamics becomes greedier and more deterministic (17, 35). The corresponding pronounced increase in predictability results in a dramatic thinning of the graph of adaptive pathways when going from Fig. 1*A* and Fig. S1*A* (with *N* = 10^{2}) to Fig. 1*B* and Fig. S1*B* (with *N* = 1.5 ⋅ 10^{4}). Direct evidence for the increasingly greedy nature of the dynamics is provided in Fig. S5*A*, which shows how the fraction of mutational steps that go to the fittest neighbor grows from *f*_{gs} ≃ 0.63 for *N* = 2^{10} to *f*_{gs} ≃ 0.88 for *N* = 2^{20}.
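The greedy-step fraction *f*_{gs} can be computed from a list of observed mutational steps once the landscape is known. In this sketch (our own, with hypothetical names) genotypes are bit masks over *L* loci, genotypes missing from the fitness table count as nonviable with fitness 0, ties for the fittest neighbor are broken arbitrarily, and steps that change more than one locus are skipped; how the original analysis handled these cases is not stated.

```python
from typing import Dict, Iterable, Tuple


def fraction_greedy_steps(fitness: Dict[int, float], L: int,
                          steps: Iterable[Tuple[int, int]]) -> float:
    """Fraction f_gs of single-mutation steps a -> b where b is a's fittest neighbor."""
    greedy = total = 0
    for a, b in steps:
        if bin(a ^ b).count("1") != 1:
            continue  # skip multi-mutation jumps; only single steps are scored
        # All L one-mutant neighbors of a; missing genotypes count as fitness 0.
        neighbors = [a ^ (1 << k) for k in range(L)]
        best = max(neighbors, key=lambda g: fitness.get(g, 0.0))
        total += 1
        greedy += int(b == best)
    return greedy / total if total else float("nan")


# Two-locus toy landscape; genotype 3 = 0b11 is the fittest state.
w = {0: 0.5, 1: 0.7, 2: 0.6, 3: 1.0}
print(fraction_greedy_steps(w, L=2, steps=[(0, 1), (0, 2), (1, 3)]))  # 0.6666666666666666
```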

As *N* is increased further, the number of first step mutants, including deleterious ones, becomes larger and therefore second step mutants are created more frequently. If these second step mutants are sufficiently fit, they can eventually take over the population, effectively tunneling through (38) (or leaping over) the intermediate state. Such events yield a mechanism for crossing fitness valleys that becomes increasingly important for increasing *N*, as can be verified in Fig. 1 and Figs. S1–S4 and S5*B*. As long as only a few second step mutants are produced by this mechanism, the dynamics again becomes less deterministic, as it depends sensitively on which mutants are randomly created and the number of possible second step mutations is enormous. Although some indication of this effect can be seen in the comparison between Fig. 1*C* and Fig. 1*D*, it is brought out more clearly by the quantitative analysis that we turn to next.

### Entropy Analysis.

When quantifying the degree of determinism of the evolutionary dynamics, it is important to distinguish between the repeatability of endpoints and of the paths taken, as well as between different types of paths. That this distinction matters is easily understood in the context of infinite population sizes. In that limit, the population always finds the GO and this optimal state always takes over the population. Thus, with respect to endpoints, the dynamics becomes totally deterministic. However, in the same limit, all possible paths (in the sense of LODs) to the GO will be taken, and the predictability of LODs should be low for very large *N*. In contrast, the most populated genotype follows a unique path (POM) in the infinite population limit (44).

To study the determinism of the dynamics on more quantitative grounds, it is convenient to define entropies with respect to the endpoints and the paths taken, respectively. The standard choice for the entropy function is $S = -\sum_i p_i \ln p_i$, where the sum runs over all endpoints (paths) and *p*_{i} is the probability to observe a certain endpoint (path). The *p*_{i} values are approximated by the fraction of times an endpoint (path) was observed among replicate simulation runs. The entropy is more appropriate to quantify determinism than just counting the number of endpoints or paths observed, as it includes information about how often each outcome occurs. Note that the findings to be presented in the following do not depend strongly on the specific choice of the entropy function. Largely equivalent results are obtained for similar observables such as the repeatability measure used in refs. 49 and 15.
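As a sketch of this estimator, assuming $S = -\sum_i p_i \ln p_i$ with the natural logarithm (the choice of base only rescales *S*), the entropy of replicate outcomes and its average over starting points, as used for 〈*S*_{e}〉_{d=4} and 〈*S*_{p}〉_{d=4}, might be computed as follows. The function names are hypothetical; endpoints can be any hashable label, and paths can be stored as tuples of genotypes.

```python
import math
from collections import Counter
from statistics import mean
from typing import Hashable, Iterable, Mapping, Sequence


def empirical_entropy(outcomes: Iterable[Hashable]) -> float:
    """S = -sum_i p_i ln p_i, with p_i estimated as observed frequencies."""
    counts = Counter(outcomes)
    n = sum(counts.values())
    return sum(-(c / n) * math.log(c / n) for c in counts.values())


def averaged_entropy(runs_by_start: Mapping[int, Sequence[Hashable]]) -> float:
    """Entropy computed separately for each starting genotype, then averaged."""
    return mean(empirical_entropy(runs) for runs in runs_by_start.values())


# Endpoints are genotypes; paths are tuples of genotypes (bit masks).
print(empirical_entropy([0] * 100))                      # fully repeatable: zero entropy
print(empirical_entropy([(12, 4, 0), (12, 8, 0)] * 50))  # two equally likely paths: ln 2
```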

It is important to notice that the observed ensemble of pathways generally depends strongly on the initial state, as is apparent when comparing the arrow plots in Fig. 1 to Figs. S2–S4 (the corresponding entropies are shown in Fig. S6 *A* and *B*). Although this effect is interesting in itself, here we focus on investigating how entropies behave on average when considering ensembles of equivalent starting points. To maximize the number of possible starting points on the *A. niger* landscape, we consider all paths that start at one of the 46 viable genotypes at Hamming distance *d* = 4 from the GO. To illustrate the role of the scales *Nμ* and *Nμ*^{2}, we calculate the entropies for a broad range of mutation rates.

In Fig. 2, we plot the average endpoint entropy for starting points at Hamming distance *d* = 4, 〈*S*_{e}〉_{d=4}, obtained in the following way: First, the entropy was determined separately for each starting point by carrying out 100 independent evolutionary runs up to time *T* = 2^{15}. Subsequently, the entropies were averaged over the different starting points, and the procedure was repeated for different values of *μ* and *N*. Apart from the case with the largest mutation rate (*μ* = 10^{−5}), one observes an initial decrease of the entropy followed by a subsequent rise with increasing *N* (Fig. 2*A*). This can be explained by means of the qualitative arguments given in the last section: The initial decrease of the entropy, i.e., increase of determinism of the dynamics, is due to the competition between single mutants causing the dynamics to become greedier, whereas the subsequent increase is a consequence of the increased appearance of double mutants. Fig. S5*A* shows that the fraction *f*_{gs} of greedy steps goes through a maximum around the same value of *N* at which the entropy is minimal.

The initial transition toward greedier dynamics depends on the production of nearest neighbor mutants, the supply rate of which is proportional to *Nμ*. In contrast, the subsequent increase of the entropy due to the appearance of double mutants is linked to their production rate ∼ *Nμ*^{2}. The separation between these two scales becomes more pronounced the smaller the mutation rate *μ*, and correspondingly the minimum value reached by the entropy decreases with decreasing *μ*, as is clearly seen in Fig. 2. To make apparent the importance of the scales *Nμ* and *Nμ*^{2}, in Fig. 2 *B* and *C* we plot the entropy as a function of *Nμ* and *Nμ*^{2}, respectively. The approximate collapse of the decreasing parts of the curves in Fig. 2*B* and of the increasing parts in Fig. 2*C* immediately affirms the roles played by the two scales. The lack of an increase of the entropy for the largest mutation rate considered here is most likely due to the lack of clear separation of the two scales.
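The separation between the two scales can be made explicit with a back-of-the-envelope calculation: *Nμ* reaches order one near *N* ≈ 1/*μ* and *Nμ*^{2} near *N* ≈ 1/*μ*^{2}, so the two crossovers are separated by a factor of 1/*μ*, which shrinks as *μ* grows. The snippet below only tabulates these order-of-magnitude markers for a few illustrative mutation rates (not the full set used for Fig. 2); it ignores the number of loci and other O(1) prefactors that shift the actual position of the entropy minimum.

```python
from typing import Tuple


def crossover_sizes(mu: float) -> Tuple[float, float]:
    """Population sizes where N*mu = 1 and N*mu**2 = 1 (prefactors ignored)."""
    return 1.0 / mu, 1.0 / mu ** 2


# Illustrative mutation rates only.
for mu in (1e-5, 1e-6, 1e-7, 1e-8):
    n_single, n_double = crossover_sizes(mu)
    print(f"mu={mu:.0e}: N*mu=1 at N~{n_single:.0e}, N*mu^2=1 at N~{n_double:.0e}, "
          f"separation factor {n_double / n_single:.0e}")
```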

We also calculated the averaged entropy with respect to the paths, 〈*S*_{p}〉_{d=4}, for the same ensemble of pathways (Fig. S7). Despite the expected distinct behaviors of the two quantities in the limit *N* → ∞, we find essentially the same *N* dependence as for 〈*S*_{e}〉_{d=4}. This reflects the fact that, even for the largest values of *N* and *μ* used here, populations are likely to get trapped at local fitness maxima. New paths that open when increasing *N* often lead to formerly unexplored local maxima, which implies that an increase of the number of explored paths is strongly correlated with the number of endpoints.

### Finding the Fittest State.

In the SSWM regime, adaptation proceeds through single mutational steps moving uphill in fitness, and the fixation probability is independent of *N* or *μ* (19–21). As a consequence, the statistical weights of different evolutionary trajectories are also independent of *N*, and the SSWM regime should therefore appear as a plateau at small population sizes in the graphs showing pathway or endpoint entropies as a function of *N*. The fact that no such plateau is observed in Fig. 2 and Fig. S7 shows that clonal interference already plays an appreciable role in the considered range of parameters, and that smaller populations or mutation rates would be needed to fully realize the SSWM regime.

Other quantities, however, seem to be more robust with respect to a certain level of clonal interference. As an example, we show in Fig. 3 the probability *P*_{GO} for the largest subpopulation to end up on the GO. The figure shows *P*_{GO} as a function of *N* averaged over starting points at a given Hamming distance *d* from the GO. The averaging is necessary, as the probability strongly depends on the specific starting point (Fig. S6*C*). For each starting point, 100–1,000 runs were carried out over *T* = 2^{17} generations.

The probability of finding the fittest state has a plateau for small *N* that coincides rather well with the SSWM value indicated by the horizontal dashed lines. The deviations, which are particularly pronounced for *d* = 4 and 5, are most likely due to valley crossings that happen with a low probability for small populations but are prohibited within the SSWM approximation. When *P*_{GO} is small, these valley crossings, albeit very rare, may open up additional mutational pathways that are not accessible to SSWM dynamics because they contain at least one fitness-decreasing step, thus increasing *P*_{GO} over the SSWM value. With increasing *N*, clonal interference sets in, making the dynamics greedier and thus more deterministic. As this implies a decrease in the number of different paths that are explored, the probability to find the GO decreases below the SSWM level. This effect is more pronounced the further away the starting point is from the GO, as the probability that the greedy dynamics misses the GO by leading the population to a suboptimal local fitness maximum increases. Only when *N* is increased further, to values so large that double mutants are regularly produced, does the dynamics again become more stochastic, leading to a higher number of explored paths and thus to a *P*_{GO} exceeding the SSWM value. As for the entropy measures discussed previously, the variation of *P*_{GO} with population size is distinctly nonmonotonic.

Apart from the probability for finding the fittest state, it is also of interest to study through which mutational pathways this state is reached. Here, we are particularly interested in the role played by paths along which fitness increases monotonically [monotonically increasing pathways (MIPs)]. These are the only paths that are accessible to adaptation in the SSWM regime and have therefore been the focus of much recent theoretical and empirical work on fitness landscapes (10, 11, 22–27). What we would like to clarify is whether (or when) such paths are actually the dominating ones when it comes to finding the fittest state, and to what extent they are realized by the dynamics.

To address these questions, we identified all MIPs that start at Hamming distance *d* = 4 from the GO, restricting ourselves to direct paths along which the distance to the GO decreases at every step. As a first measure, we computed the fraction of MIPs among all observed LODs that reach the GO, *f*_{MIP/LOD}. This quantity was averaged over 100 realizations from each starting point with Hamming distance 4 from the GO from which at least one such path exists (Fig. 4*A*). One finds that at small *N*, almost all successful paths are monotonically increasing in fitness. However, as *N* increases, *f*_{MIP/LOD} decreases rapidly, showing that the MIPs become increasingly less relevant for adaptation. The dashed line in Fig. 4*A* represents the ratio of the number of MIPs to the total number of direct paths, equal to *d*!, averaged over all starting points at *d* = 4. Values of *f*_{MIP/LOD} below this line indicate that MIPs are selected even less frequently than would be expected if all direct paths were equally likely.
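The enumeration of MIPs can be sketched directly: the direct paths from a starting genotype to the GO are the *d*! orderings of the *d* differing loci, and a path is a MIP if fitness strictly increases at every step. This is an illustrative implementation under our own conventions (bit-mask genotypes, missing genotypes treated as nonviable with fitness 0, hypothetical function names), not the code used for Fig. 4.

```python
from itertools import permutations
from math import factorial
from typing import Dict, List


def direct_paths(start: int, target: int) -> List[List[int]]:
    """All shortest mutational paths from start to target (d! of them)."""
    diff = start ^ target
    loci = [k for k in range(diff.bit_length()) if diff >> k & 1]
    paths = []
    for order in permutations(loci):
        g, path = start, [start]
        for k in order:
            g ^= 1 << k  # flip one differing locus per step
            path.append(g)
        paths.append(path)
    return paths


def monotonic_paths(fitness: Dict[int, float], start: int, target: int) -> List[List[int]]:
    """Direct paths along which fitness strictly increases at every step (MIPs)."""
    return [p for p in direct_paths(start, target)
            if all(fitness.get(b, 0.0) > fitness.get(a, 0.0) for a, b in zip(p, p[1:]))]


# Toy two-locus example with the GO at genotype 0 and the start at 3 (d = 2).
w = {0: 1.0, 1: 0.8, 2: 0.4, 3: 0.5}
mips = monotonic_paths(w, start=3, target=0)
print(mips, len(mips) / factorial(2))  # [[3, 1, 0]] 0.5
```

The printed ratio is the per-starting-point analog of the dashed reference line in Fig. 4*A*.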

Furthermore, we have measured the fraction *f*_{MIP} of MIPs that are actually observed within the 100 simulation runs (Fig. 4*B*). This quantity displays a nonmonotonic dependence on *N* that can be explained in similar terms as for the entropies. It should, however, be noticed that, even at small *N*, fewer than 70% of all existing MIPs are observed. Thus, the sheer existence of, in principle, easily accessible paths leading to high fitness genotypes does not guarantee that they are actually realized by the dynamics. This can also be concluded from the low values of *P*_{GO} observed in Fig. 3 for small *N*.

### Comparison with Model Landscapes.

To demonstrate that the results described so far are not caused by the idiosyncrasies of the specific empirical landscape used in this work, we carried out simulations on a family of random model landscapes tuned to reproduce the overall features of the *A. niger* fitness data set. The model we consider is a slight variation of the rough Mount Fuji (RMF) model (26, 27) originally introduced in ref. 41. Within the RMF model, the fitness value *w*_{i} of a genotype *i* is determined according to the following:

$$w_i = -c_1 d_i + c_2 + \xi_i, \tag{1}$$

where *d*_{i} is the Hamming distance of *i* to a reference state whose fitness is set to 1 and will be the GO, *c*_{1} and *c*_{2} are constants, and *ξ*_{i} is a Gaussian random variable with mean zero and SD σ. The constants *c*_{1} and *c*_{2} were obtained from the *A. niger* fitness data as follows: First, we averaged over all fitness values at a given Hamming distance *d*_{i} ≥ 1 from the GO, including the states with zero fitness corresponding to nonviable genotypes (26). Then a straight line was fitted to the averaged values plotted against Hamming distance. The slope yields *c*_{1} ≈ 0.064, and for the intercept we obtain the estimate *c*_{2} ≈ 0.730. The variance of the fitness values yields the estimate *σ*^{2} ≈ 0.091 for the variance of the *ξ*_{i}. Only values *w*_{i} < 1 are accepted, to ensure that the GO is located at the reference genotype. In cases when Eq. **1** yields a negative value, the corresponding fitness is set to zero. In principle the nonviable genotypes in the landscape could be modeled explicitly, e.g., along the lines of ref. 26. However, we prefer to keep the model simple and do not include a separate treatment of nonviable states.
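A landscape of this kind can be generated as sketched below. The form $w_i = c_2 - c_1 d_i + \xi_i$ is assumed from the description of *c*_{1} as the (negative) slope and *c*_{2} as the intercept of the linear fit; the defaults *c*_{1} = 0.064, *c*_{2} = 0.730, and σ = √0.091 are the *A. niger* estimates quoted above. We read "only values *w*_{i} < 1 are accepted" as redrawing *ξ*_{i} until the condition holds, which is our interpretation rather than a documented detail; negative values are then set to zero. The function name `rmf_landscape` is hypothetical.

```python
import math
import random
from typing import Dict, Optional


def rmf_landscape(L: int, c1: float = 0.064, c2: float = 0.730,
                  sigma: float = math.sqrt(0.091), reference: int = 0,
                  rng: Optional[random.Random] = None) -> Dict[int, float]:
    """RMF-type landscape: w_i = c2 - c1 * d_i + xi_i for d_i >= 1, w_ref = 1."""
    if rng is None:
        rng = random.Random()
    w: Dict[int, float] = {}
    for g in range(2 ** L):
        d = bin(g ^ reference).count("1")  # Hamming distance to the reference
        if d == 0:
            w[g] = 1.0  # the reference genotype is the global optimum
            continue
        while True:  # redraw xi_i until the value stays below the optimum
            value = c2 - c1 * d + rng.gauss(0.0, sigma)
            if value < 1.0:
                break
        w[g] = max(value, 0.0)  # negative values are set to zero (nonviable)
    return w


landscape = rmf_landscape(8, rng=random.Random(2012))
print(len(landscape), sum(v == 0.0 for v in landscape.values()))
```

Passing a seeded `random.Random` makes each landscape realization reproducible, which is convenient when, as in the text, a fresh landscape is drawn for every starting point.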

The observed qualitative features have been reproduced by simulations of the RMF model over a broad range of parameters, but here we focus on the specific “*A. niger*” parameter set described above, which optimally matches the empirical landscape (see Fig. S8 for results covering a broader range of parameters). In Fig. 5 and Fig. S9, we plot the quantities obtained from simulations of adaptation on this model landscape. We considered different starting points at Hamming distance *d* = 4 from the GO, and 100 independent runs from each of them were carried out. For each of the starting points, a new fitness landscape was created using Eq. **1**.

Fig. 5 and Fig. S9*A* show that the entropies obtained for the model landscapes display a similar nonmonotonic dependence on *N* and *μ* as the empirical landscape. Again, rescaling *N* by *μ* and *μ*^{2} leads to an approximate data collapse of the decreasing and increasing parts of the entropy curves, respectively. Although comparison with Fig. 2 and Fig. S7 reveals that the values of the entropies and the positions of the respective minima are not quantitatively recovered by the model, the qualitative behavior is well reproduced.

Fig. S9*B* depicts a similar comparison for the probability to reach the state with the highest fitness. Because *P*_{GO} ≪ 1 in most cases, this quantity is strongly affected by rare events and displays massive fluctuations between different realizations of the RMF model landscape. Averaging over realizations is therefore not appropriate for the comparison with the *A. niger* landscape. Instead, in Fig. S9*B*, we display data obtained for individual landscape realizations, which show that the overall shape of the variation of *P*_{GO} with *N* is reproduced by the model. Moreover, the supplementary results in Fig. S8 show that the nonmonotonic variation of the entropy with population size persists whenever the fitness landscape is sufficiently rugged, and disappears only when the limiting case of a smooth, additive landscape is approached.

## Discussion

The repeatability of evolutionary trajectories in replicate populations is determined jointly by the distribution of the fitness effects of beneficial mutations, by their epistatic interactions, and by the rate at which they appear in the population. Whereas previous work has addressed primarily the first two determinants of evolutionary predictability (14, 18, 22, 26, 49, 50), here we focused on the effect of mutation supply mediated by the population size *N* and the mutation rate *μ*. By performing simulations on an experimentally measured fitness landscape, we ensured a realistic representation of the distribution of mutational effects and their epistatic interactions.

Our key observation is that, because of the distinct roles played by the supply rate of single (∼*Nμ*) and double (∼*Nμ*^{2}) mutations, evolutionary predictability as quantified by the entropy measures *S*_{e} and *S*_{p} varies nonmonotonically with population size. Simulation results for the RMF model suggest that this behavior is generic whenever the underlying fitness landscape is rugged with many local optima, as is often the case for empirically determined fitness landscapes (27, 51). Similar to earlier observations of an evolutionary advantage of small populations in complex fitness landscapes (35–37), the phenomenon depends crucially on the clonal interference among beneficial mutations and cannot be captured within the commonly used SSWM approximation. This also implies that the restriction of evolutionary accessibility to pathways with monotonically increasing fitness (MIPs) assumed in a number of recent studies (10, 22, 24, 26) may be of limited relevance to adaptation.

Although the endpoint entropy *S*_{e} is easier to access experimentally than the path entropy *S*_{p}, at least partial information about adaptive pathways can be inferred from microbial evolution experiments (12, 13, 45). Parallel evolution has been observed on several occasions, but very few studies explicitly addressed the effect of population size on repeatability. Among those, one found an increase of genotypic diversity with increasing population size (45). Other experiments have addressed the population size dependence of phenotypic diversity on the level of fitness trajectories. One study using *Escherichia coli* found a pronounced reduction of the variability of fitness trajectories with increasing population size (36), but another study using *Aspergillus nidulans* found no effect (52). More experimental work under precisely controlled conditions is clearly needed to test the predictions of the present article.

## Materials and Methods

### Empirical Fitness Landscape.

The construction of the *A. niger* strains and the measurement of their fitness values have been explained in detail elsewhere (47). The fitness landscape consists of the wild-type strain and combinations of eight marker mutations: *fwnA1* (fawn-colored conidiospores), *argH12* (arginine deficiency), *pyrA5* (pyrimidine deficiency), *leuA1* (leucine deficiency), *pheA1* (phenylalanine deficiency), *lysD25* (lysine deficiency), *oliC2* (oligomycin resistance), and *crnB12* (chlorate resistance). Of the 2^{8} = 256 possible combinations, a total of 186 were found to be viable and assigned nonzero Wrightian fitness (26). Among these, there are four mutants with seven mutations each, i.e., incorporating all but one mutation. In the order in which the genotypes are presented in table S1 of ref. 26, these are genotypes 250 (all but *pyrA5*), 251 (all but *leuA1*), 252 (all but *pheA1*), and 253 (all but *lysD25*).
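For simulations it is convenient to encode each genotype as an eight-bit integer, so that Hamming distances reduce to counting differing bits. The sketch below assigns one bit per marker in the order listed above; this assignment and the helper names are our own choices and do not reproduce the genotype numbering of table S1 of ref. 26. The binomial counts show, for instance, that 70 genotypes lie at *d* = 4 (of which 46 are viable) and 8 at *d* = 7 (of which four are viable).

```python
from math import comb
from typing import Iterable, List

# Marker order follows the list in the text; the bit assignment is our own choice
# and does not reproduce the genotype numbering of table S1 in ref. 26.
MARKERS = ["fwnA1", "argH12", "pyrA5", "leuA1", "pheA1", "lysD25", "oliC2", "crnB12"]


def encode(mutations: Iterable[str]) -> int:
    """Bit mask with bit k set if MARKERS[k] is present; wild type = 0."""
    g = 0
    for m in mutations:
        g |= 1 << MARKERS.index(m)
    return g


def decode(g: int) -> List[str]:
    """Marker mutations carried by genotype g."""
    return [m for k, m in enumerate(MARKERS) if g >> k & 1]


def distance_to_wild_type(g: int) -> int:
    """Hamming distance to the wild type (the GO of the A. niger landscape)."""
    return bin(g).count("1")


# Number of genotypes at each Hamming distance d from the wild type: C(8, d).
print({d: comb(len(MARKERS), d) for d in range(len(MARKERS) + 1)})
# {0: 1, 1: 8, 2: 28, 3: 56, 4: 70, 5: 56, 6: 28, 7: 8, 8: 1}
all_but_pyrA5 = encode(m for m in MARKERS if m != "pyrA5")
print(decode(all_but_pyrA5), distance_to_wild_type(all_but_pyrA5))
```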

### Evolutionary Dynamics.

The simulations presented here were performed using standard Wright–Fisher dynamics according to the following algorithm:

*i*) Draw the number *n*_{μ} of mutation events in a generation from a Poisson distribution with mean *λ* = *NLμ*, where *N* is the population size, *L* is the number of loci, and *μ* is the mutation rate.

*ii*) The *n*_{μ} mutations are distributed among the genotypes present in the population with probabilities corresponding to their frequencies. The possibility of individuals accumulating several mutations in a single time step is neglected. Mutations at all loci are chosen with equal probability.

*iii*) Selection is carried out in two steps. First, frequencies are evolved analytically according to $f_i' = f_i w_i / \bar{w}$, where *f*_{i} denotes the frequency of the *i*th state before selection, i.e., at generation *t*, the *w*_{i} denote the respective fitnesses, and $\bar{w} = \sum_j f_j w_j$ denotes the mean fitness of the population.

*iv*) Finally, the frequencies at time step *t* + 1 are obtained by drawing *N* individuals from a multinomial distribution with probabilities $f_i'$.
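Steps *i*)–*iv*) can be sketched in plain Python as follows. This is a minimal sketch under stated assumptions, not the authors' simulation code: genotypes are bit masks over *L* loci; fitness values come from a dictionary in which missing genotypes are nonviable (fitness 0); the number of mutation events is drawn from a Poisson distribution with mean *NLμ* (generated by counting exponential inter-arrival times within one unit interval); and each event is assigned to a distinct individual that has not yet mutated in that generation. `random.choices` performs the multinomial draw, which is adequate only for small *N*; for population sizes such as *N* = 2^{20} considered in the text, a vectorized sampler such as NumPy's `Generator.multinomial` would be preferable. All function names are hypothetical.

```python
import random
from collections import Counter
from typing import Dict, Optional


def poisson(lam: float, rng: random.Random) -> int:
    """Poisson sample: number of exponential inter-arrival times fitting in [0, 1]."""
    if lam <= 0.0:
        return 0
    n, t = 0, rng.expovariate(lam)
    while t <= 1.0:
        n += 1
        t += rng.expovariate(lam)
    return n


def wf_generation(pop: Dict[int, int], fitness: Dict[int, float], N: int, L: int,
                  mu: float, rng: random.Random) -> Dict[int, int]:
    """One Wright-Fisher generation; genotypes are bit masks over L loci."""
    # (i) number of mutation events, Poisson with mean N * L * mu (capped at N)
    n_mut = min(poisson(N * L * mu, rng), N)
    # (ii) each event hits a distinct, not-yet-mutated individual, picked in
    # proportion to genotype counts; one uniformly chosen locus is flipped
    unmutated = dict(pop)
    counts = dict(pop)
    for _ in range(n_mut):
        present = [g for g, c in unmutated.items() if c > 0]
        parent = rng.choices(present, weights=[unmutated[g] for g in present])[0]
        unmutated[parent] -= 1
        counts[parent] -= 1
        child = parent ^ (1 << rng.randrange(L))
        counts[child] = counts.get(child, 0) + 1
    # (iii) deterministic selection step: f_i' = f_i * w_i / w_bar
    genotypes = [g for g, c in counts.items() if c > 0]
    weighted = [counts[g] / N * fitness.get(g, 0.0) for g in genotypes]
    w_bar = sum(weighted)
    if w_bar == 0.0:
        raise RuntimeError("all present genotypes are nonviable")
    probs = [x / w_bar for x in weighted]
    # (iv) multinomial sampling of N individuals for generation t + 1
    return dict(Counter(rng.choices(genotypes, weights=probs, k=N)))


def simulate(fitness: Dict[int, float], start: int, N: int, L: int, mu: float,
             T: int, seed: Optional[int] = None) -> Dict[int, int]:
    """Run T generations from a monomorphic population on `start`."""
    rng = random.Random(seed)
    pop = {start: N}
    for _ in range(T):
        pop = wf_generation(pop, fitness, N, L, mu, rng)
    return pop


# Toy two-locus landscape with the GO at genotype 0 (values are illustrative).
toy = {0: 1.0, 1: 0.9, 2: 0.9, 3: 0.8}
final = simulate(toy, start=3, N=200, L=2, mu=1e-3, T=500, seed=1)
print(max(final, key=final.get), final)
```

The most populated genotype at the end of a run is the endpoint entering *S*_{e}; recording colonization episodes inside `wf_generation` would additionally allow LODs to be reconstructed.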

## Acknowledgments

We thank J.-M. Park for useful discussions. This work was supported by Deutsche Forschungsgemeinschaft within Sonderforschungsbereich 680.

## Footnotes

^{1}To whom correspondence should be addressed. E-mail: krug{at}thp.uni-koeln.de.

Author contributions: I.G.S., J.A.G.M.d.V., and J.K. designed research; I.G.S., J.F., and J.K. performed research; I.G.S. and J.F. analyzed data; and I.G.S., J.A.G.M.d.V., and J.K. wrote the paper.

The authors declare no conflict of interest.

This article is a PNAS Direct Submission.

This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1213613110/-/DCSupplemental.

## References

- (1) Beatty J
- (2) Conway Morris S
- (3) Gould SJ
- (4) Conway Morris S
- (5) Travisano M, Mongold JA, Bennett AF, Lenski RE
- (6) Cooper TF, Rozen DE, Lenski RE … *Escherichia coli*. Proc Natl Acad Sci USA 100(3):1072–1077.
- (7) Blount ZD, Borland CZ, Lenski RE … *Escherichia coli*. Proc Natl Acad Sci USA 105(23):7899–7906.
- (9) Wichman HA, Brown CJ
- (10) Weinreich DM, Delaney NF, Depristo MA, Hartl DL
- (11) Lozovsky ER, et al.
- (24) Carneiro M, Hartl DL
- (27) Szendro IG, Schenk MF, Franke J, Krug J, de Visser JAGM
- (30) Park SC, Krug J
- (36) Rozen DE, Habets MGJL, Handel A, de Visser JAGM … *PLoS One* 3:e1715.
- (43) Østman B, Hintze A, Adami C
- (46) Woods RJ, et al.


## Article Classifications

- Biological Sciences
- Evolution