Competition between recombination and epistasis can cause a transition from allele to genotype selection

Edited by Curtis G. Callan, Jr., Princeton University, Princeton, NJ, and approved February 27, 2009 (received for review December 14, 2008)
Abstract
Biochemical and regulatory interactions central to biological networks are expected to cause extensive genetic interactions or epistasis affecting the heritability of complex traits and the distribution of genotypes in populations. However, the inference of epistasis from the observed phenotype–genotype correlation is impeded by statistical difficulties, while the theoretical understanding of the effects of epistasis remains limited, in turn limiting our ability to interpret data. Of particular interest is the biologically relevant situation of numerous interacting genetic loci with small individual contributions to fitness. Here, we present a computational model of selection dynamics involving many epistatic loci in a recombining population. We demonstrate that a large number of polymorphic interacting loci can, despite frequent recombination, exhibit cooperative behavior that locks alleles into favorable genotypes leading to a population consisting of a set of competing clones. When the recombination rate exceeds a certain critical value that depends on the strength of epistasis, this “genotype selection” regime disappears in an abrupt transition, giving way to “allele selection”—the regime where different loci are only weakly correlated as expected in sexually reproducing populations. We show that large populations attain highest fitness at a recombination rate just below critical. Clustering of interacting sets of genes on a chromosome leads to the emergence of an intermediate regime, where blocks of cooperating alleles lock into genetic modules. These haplotype blocks disappear in a second transition to pure allele selection. Our results demonstrate that the collective effect of many weak epistatic interactions can have dramatic effects on the population structure.
Selection acting on genetic polymorphisms in populations is a major force of evolution (1–4) and it is possible to identify specific loci under positive selection, e.g., the Adh locus in Drosophila (4). However, the attribution of fitness differentials to specific allelic variants and combinations remains a great challenge (5). Efforts to correlate quantitative phenotypes with genetic polymorphisms typically identify a small number of loci with a significant contribution to the observed phenotypic variance, but leave much of the variance unaccounted for (6). This unaccounted variance is believed to arise from a large number of loci with small individual contributions, or to be due to epistasis, and quite likely involves both effects. New studies accumulate evidence that epistasis is widespread and accounts for a significant fraction of phenotypic variation, e.g., in yeast (7–9). Additional evidence for epistasis comes from crosses of mildly diverged strains, where the recombinant progeny often has reduced average fitness, i.e., displays outbreeding depression. The reduction in fitness is attributed to the breakdown of favorable combinations of alleles in the ancestral strains (10). Outbreeding depression is often observed in partly selfing organisms such as Caenorhabditis elegans (11) or plants (12), species with strong geographic isolation such as copepods (13), or facultatively mating organisms such as yeast (14). Although most recombinant genotypes are less fit, novel genotypes that perform better than either parental strain can be generated as well (15). Such outcrossing events could play an important role in evolution.
Competition between epistatic selection and recombination, explicit in the outbreeding depression phenomenon, is the focus of the present study. In the presence of epistasis, selection, by increasing the frequency of favorable genotypes, establishes correlations between alleles at different loci. Recombination however reshuffles alleles and randomizes genotypes breaking up coadapted loci. Because the recombination rate between any 2 loci is largely determined by their physical distance on the chromosome, the effect of genetic interactions depends on gene location. It is known that functionally related genes tend to cluster (16, 17), suggesting selection on gene order. Furthermore, chromosomes have regions of infrequent recombination, interspersed with recombination hotspots (18). Does selection have a hand in defining low recombination regions? To understand how evolution shaped genomes as we observe them today, we have to tackle the problem of how selection acts on many interacting polymorphisms for a large range of recombination rates (19).
Standing variation harbored in natural populations provides important raw material for selection to act upon, in particular after a sudden change in environment or after hybridization events (20). In such a situation, selection will reduce genetic variation until a new mutation–selection equilibrium is reached. Here, we show that the selection dynamics on standing variation at a large number of loci can be strongly affected by epistasis, even if the individual contribution of each locus is small. The competition between selection on epistasis and recombination gives rise to 2 distinct regimes at high and low recombination rates separated by a sharp transition. The population dynamics in the two regimes is illustrated in Fig. 1 A and B: (i) the “clonal competition” (CC) regime, which occurs for recombination rates r < r_{c} and (ii) the quasi linkage equilibrium (QLE) regime for r > r_{c}. The different nature of the two regimes is best understood by considering the limiting cases of no and frequent recombination. In the case of purely asexual reproduction, selection operates on entire genotypes and results in clonal expansion of the fitter ones. The genetic variation present in the initial population is lost on a timescale inversely proportional to the average magnitude of fitness differentials between genotypes present in the population. Successful genotypes persist in time, which is apparent as continuous broad stripes of one color in Fig. 1A. The amplification of a small number of fit genotypes induces strong correlations, or linkage disequilibrium, among loci. In the presence of epistasis, a little recombination does not change this picture qualitatively, because most recombinant genotypes are less fit than the prevailing clones and novel successful clones are rare. Nevertheless, recombination is very important because it continuously introduces new genotypes, leading to an increase in the fitness attained by the population at long times.
In the limit of high recombination, genotypes are short-lived and essentially unique, resulting in a “pointillist” color pattern in Fig. 1B. Each allelic variant is therefore selected on the basis of its effect on fitness, averaged over many possible genetic backgrounds. The time scale on which allele frequencies change is given by the inverse of these marginal fitness effects. The term “linkage equilibrium” in QLE refers to the negligible correlations between loci, which are constantly reshuffled by recombination.
As we show below, the transition between the two regimes sharpens as the number of segregating loci L increases. The sharpening of the transition is related to the different scaling of the time scale of selection in the two regimes. For large L, the marginal fitness effects of individual loci become small compared with fitness differentials among individuals (assuming they are all of similar size, this ratio decreases as ∼1/√L).
To underscore the general nature of the results, we have studied 2 different models of epistasis. The first model follows the common treatment of epistasis in quantitative traits, which assumes that the epistatic contribution to fitness is disrupted when the parental genes are mixed in sexual reproduction (25, 26). This assumption becomes exact when the epistatic component of fitness of a specific genotype is a random number (which depends on the genotype, but is fixed in time), and we call this model the random epistasis (RE) model. Within the RE model, any change in the genotype randomizes the epistatic component of fitness so that the latter is not heritable when nonidentical parents mate. It is, however, faithfully passed on to the offspring in asexual reproduction. For the RE model, genomes are propagated asexually with probability 1 − r and with probability r are a product of mating where all genes are reassorted, as would be exactly correct if all genes were on different chromosomes. This model of facultative mating approximates reproductive strategies common in fungi (e.g., yeast) or nematodes and plants. As a more realistic alternative, we also study a model with only pairwise interactions between loci (27). This pairwise epistasis (PE) model allows the epistatic contribution to be partly heritable, because interacting pairs have a chance to be inherited together (28). For the PE model, we assume that all genes are arranged on a single chromosome with a uniform crossover rate ρ, which allows us to explore haplotype block formation and implications for recombination rate evolution.
The strength of selection is determined by the variance σ^{2} of the distribution of fitness in the population. Within our models, the fitness F(g) of a genotype g is the sum of an additive component A(g) representing independent contributions of alleles and an epistatic part E(g). For the RE model, the latter is a random number drawn from a Gaussian distribution, whereas for the PE model it is a sum of pairwise interactions with random coefficients f_{ij}. The variances V_{A} and V_{I} of the distributions of A(g) and E(g) add up to σ^{2} and their relative magnitude determines the importance of additive effects compared with epistasis. The two different models and their parameters are given explicitly in Methods. For the sake of simplicity, we assume haploid genomes. Random and pairwise epistasis represent 2 opposite extremes in the complexity of epistasis. Although the pairwise model is more realistic, the generic behavior is most clearly demonstrated using the RE model with random gene reassortment and facultative mating.
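The two fitness models can be sketched in a few lines of Python. This is a minimal illustration and not the authors' simulation code: the class names, the seeds, and the choice f = √(V_{A}/L) for the per-locus additive effect (so that L loci at intermediate frequency contribute a total additive variance V_{A}) are assumptions made here. Genotypes are tuples of s_{i} = ±1, as in Methods.

```python
import math
import random


class RandomEpistasis:
    """RE model sketch: F(g) = A(g) + E(g), with E(g) a fixed random number per genotype."""

    def __init__(self, L, V_A, V_I, seed=0):
        self.f = math.sqrt(V_A / L)      # assumed per-locus additive effect
        self.sigma_E = math.sqrt(V_I)    # standard deviation of the epistatic part
        self._rng = random.Random(seed)
        self._E = {}                     # cache: E(g) is drawn once, then fixed in time

    def additive(self, g):
        return self.f * sum(g)

    def epistatic(self, g):
        g = tuple(g)
        if g not in self._E:
            self._E[g] = self._rng.gauss(0.0, self.sigma_E)
        return self._E[g]

    def fitness(self, g):
        return self.additive(g) + self.epistatic(g)


class PairwiseEpistasis:
    """PE model sketch: E(g) = sum over pairs i < j of f_ij * s_i * s_j (requires L >= 2)."""

    def __init__(self, L, V_A, V_I, seed=0):
        rng = random.Random(seed)
        self.f = math.sqrt(V_A / L)      # assumed per-locus additive effect
        raw = {(i, j): rng.gauss(0.0, 1.0) for i in range(L) for j in range(i + 1, L)}
        norm = math.sqrt(sum(v * v for v in raw.values()))
        # Rescale so that the squared coefficients sum to V_I.
        self.f_ij = {pair: v * math.sqrt(V_I) / norm for pair, v in raw.items()}

    def additive(self, g):
        return self.f * sum(g)

    def epistatic(self, g):
        return sum(c * g[i] * g[j] for (i, j), c in self.f_ij.items())

    def fitness(self, g):
        return self.additive(g) + self.epistatic(g)


if __name__ == "__main__":
    re_model = RandomEpistasis(L=10, V_A=0.5, V_I=0.5)
    parent = (1, -1) * 5
    mutant = (-1,) + parent[1:]
    # An exact copy inherits E(g); a changed genotype gets an independent draw.
    print(re_model.epistatic(parent), re_model.epistatic(parent), re_model.epistatic(mutant))
```

Caching E(g) per genotype is what makes the epistatic part heritable under asexual copying but not under reassortment in the RE sketch, whereas in the PE sketch a recombinant keeps the contribution of every interacting pair it inherits intact.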
Results
Two Regimes of Selection Dynamics.
We performed extensive computer simulations of our two models for different relative strengths of epistasis, L = 25–200 loci, and population sizes between N = 500 and 10^{6}. We initialize simulations in a genetically diverse state, as would result from multiple crossings of 2 diverged strains, and examine the evolution under selection and recombination. The two regimes differ strongly in the amount of linkage disequilibrium (LD) (see Methods) built up by selection. Fig. 2A shows the average LD per locus pair for the RE model as a function of the outcrossing rate r. For r < r_{c}, the LD per locus pair is of order 1 and independent of L or N, indicating genome-wide LD. LD builds up despite a large number of different genotypes in the population interbreeding constantly. For r > r_{c}, the LD is much smaller, with the observed value determined by the sampling noise due to the finite population size (see Fig. 2A Inset and Fig. S1). Similar behavior occurs in the PE model, as shown in Fig. 2B. Above a critical recombination rate ρ_{c}, the observed linkage disequilibrium is time independent and well described by the QLE approximation (21, 22) (straight line) (see SI Appendix). The QLE approximation (in the high ρ/σ limit) predicts LD to be proportional to the strength of pairwise epistasis. Below ρ_{c}, the observed LD is dramatically larger than the QLE expectation. Here, recombination is sufficiently infrequent that genotypes with synergistic alleles are amplified faster than they are taken apart by recombination (see below). As a result, the few fittest genotypes grow exponentially in number, leading to a strong correlation in the occurrence of cooperating alleles, independent of physical linkage (i.e., proximity on the chromosome). Because of this extensive LD, results valid in the high recombination regime fail completely when extrapolated across the transition.
The relevant quantity that determines whether fit genotypes can be maintained is the probability that no crossover occurs, which is given by e^{−ρL}. Hence, ρ_{c} is inversely proportional to L.
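As a quick numerical illustration of this scaling, the sketch below evaluates the no-crossover probability e^{−ρL} and the crossover rate at which it drops to a chosen level. The function names and the threshold of 0.5 are illustrative assumptions, not quantities defined in the paper.

```python
import math


def prob_no_crossover(rho: float, L: int) -> float:
    """Probability that a genome of L loci is transmitted without any crossover,
    using the expression exp(-rho * L) quoted in the text."""
    return math.exp(-rho * L)


def rho_at_threshold(L: int, p_threshold: float) -> float:
    """Crossover rate at which the no-crossover probability equals p_threshold.
    p_threshold is a hypothetical parameter chosen for illustration."""
    return -math.log(p_threshold) / L


if __name__ == "__main__":
    for L in (25, 50, 100, 200):
        print(L, rho_at_threshold(L, 0.5))
```

Doubling L halves the rate returned by `rho_at_threshold`, which is the 1/L dependence stated for ρ_{c}.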
Self-Consistency Condition for QLE.
The fitness of a genotype can be decomposed as F = A + E, where A is the heritable additive part and E is the nonheritable epistatic part. As a coarse-grained descriptor of the population, we consider the joint distribution P(A, E; t) of the fitness components. In the QLE state, P(A, E; t) evolves approximately as

∂_{t}P(A, E; t) = (F − F̄ − r)P(A, E; t) + rρ(E)ϑ(A; t). [1]

The first term accounts for the exponential growth of genotypes with fitness advantage F − F̄ and the loss due to recombination at rate r. The second term accounts for the production of genotypes through recombination. To a good approximation, the distribution of A among recombinant offspring is identical to that among the parents, ϑ(A) = ∫ dE P(A, E), which in turn is approximately Gaussian (29). The distribution of E among recombinant offspring is independent of the parents and a random sample from the distribution of epistatic fitness ρ(E), which in our models is a zero-centered Gaussian. The latter is exactly true for the RE model and holds approximately for the PE model, where the correlation of E between ancestor and offspring halves every generation (28).
Eq. 1 admits the factorized solution P(A, E; t) = ϑ(A; t)ω(E) with ∂_{t}ϑ(A; t) = (A − Ā)ϑ(A; t) and a time-independent distribution of E

ω(E) = rρ(E)/(r + Ē − E), [2]

where Ē is determined by the condition that ω(E) has to be normalized, ∫ dE ω(E) = 1. This solution exists only if E < r + Ē for all genotypes; otherwise, fit genotypes escape recombination and grow as clones. These two scenarios are illustrated in Fig. 3.
The normalization condition can be fulfilled only if r is larger than some r_{c}. Note that ρ(E) has to go to 0 faster than linear for r_{c} to exist. The value of r_{c} is proportional to the maximal E and hence proportional to the strength of epistasis
The breakdown of the QLE state has some similarity to the error-threshold transition of a quasispecies model (30) in a rugged fitness landscape (31): Recombination of epistatic loci acts as deleterious mutations and prevents the emergence of quasispecies or clones (32, 33) for r > r_{c}.
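The normalization condition can be explored numerically on a finite sample of epistatic fitness values. The sketch below is an illustration under stated assumptions, not the authors' calculation: it assumes the stationary distribution has the form ω(E) = rρ(E)/(r + Ē − E), which is what a factorized solution of Eq. 1 implies when recombinants are produced at rate rρ(E)ϑ(A), and it replaces ρ(E) by M sampled values E_{k} of weight 1/M each. The function name, sample size, and Gaussian width are hypothetical choices.

```python
import math
import random


def solve_qle_weights(E_values, r, iterations=200):
    """Find E_bar such that the weights w_k = (r / M) / (r + E_bar - E_k) sum to 1.

    Uses bisection on x = r + E_bar. Returns (E_bar, weights).
    """
    M = len(E_values)
    E_max = max(E_values)

    def total(x):
        return sum((r / M) / (x - E) for E in E_values)

    lo = E_max + r / M   # here the fittest genotype alone has weight 1, so total >= 1
    hi = E_max + r       # here every weight is at most 1/M, so total <= 1
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if total(mid) > 1.0:
            lo = mid
        else:
            hi = mid
    x = 0.5 * (lo + hi)
    weights = [(r / M) / (x - E) for E in E_values]
    return x - r, weights


if __name__ == "__main__":
    rng = random.Random(1)
    E = [rng.gauss(0.0, 0.1) for _ in range(2000)]   # illustrative sample of E values
    for r in (0.01, 0.1, 0.3, 1.0):
        E_bar, w = solve_qle_weights(E, r)
        print(f"r={r}: E_bar={E_bar:.3f}, weight of fittest genotype={max(w):.3f}")
```

In a finite sample a normalized solution always exists formally, but as r decreases it places a growing share of the weight on the single fittest genotype. That concentration is one way to picture the loss of the QLE state and the onset of clonal growth.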
Maintenance of Genetic Diversity.
The transition between the two regimes leaves its imprint in virtually every quantity of interest in population genetics. For instance, the characteristic time for the decay of genetic diversity, τ (which we quantify via allele entropy, see Methods) scales differently with L in the two regimes, as shown in Fig. 4A. At low outcrossing rates, τ depends only on the total variance in fitness and neither on the number of loci nor the relative strength of additive contributions. This is consistent with the notion that in the CC regime genotypes are the units on which selection acts. With more frequent outcrossing, τ tends to be larger for weak additive contributions and large L. Beyond a certain outcrossing rate r_{c}, τ becomes independent of r, attaining a value inversely proportional to the additive contribution of the individual loci, independent of V_{I} (black diamonds in Fig. 3A). This observation confirms our assertion that for r > r_{c}, outcrossing decouples the loci and that the allele frequencies evolve independently under the action of the additive component of fitness. Given an additive variance V_{A}, the typical single locus fitness differential is f ∼ √(V_{A}/L).
The properties of the genotype that will eventually fixate in the population depend on the regime in which it was obtained. We find that the fitness of this fixated genotype depends nonmonotonically on the outcrossing rate and peaks just below the transition (see Fig. 4B). This can be understood as follows. Without recombination, the final state can be no fitter than the fittest genotype initially present. With some recombination, the population explores a greater number of genotypes, potentially finding ones with higher fitness, so that the fitness of the final state increases with r in the CC regime. A similar benefit of infrequent recombination due to exploration of genotype space has been studied in the context of virus evolution for additive fitness functions (35). As genotype selection gives way to allele selection, different loci decouple and the epistatic contribution to fitness is missed, leading to possible fixation of less fit genotypes and a sharp drop of the final fitness as r approaches r_{c}. The dependence of the final fitness on the population size N highlights the distinct properties of the dynamics in the two regimes: In the QLE regime, the final fitness is virtually identical for different N. This is a consequence of the fact that the relevant dynamical variables are allele frequencies, which are well sampled by O(N) individuals. Fluctuations of the allele frequencies are therefore negligible and the dynamics is essentially deterministic. This is different in the CC regime, where the dynamics is driven by the generation of rare, exceptionally fit genotypes. The rate at which such genotypes are generated is proportional to N, resulting in a pronounced dependence on the population size. QLE ceases to be deterministic once the marginal fitness effects become comparable to the inverse population size and random genetic drift overwhelms selection (see Fig. S3).
Selection on Genetic Modules.
So far, we assumed that each pair of loci is equally likely to interact epistatically, regardless of their physical distance on the chromosome. However, there is evidence that the order of genes along the chromosome is far from random and that related genes tend to cluster (16, 17). To emulate such a situation we use the PE model and construct an interaction matrix f_{ij} where arbitrary pairs interact with a small probability while clusters of neighboring genes interact with a high probability (see Methods). For such a hierarchical epistatic structure, we observe, as a function of increasing crossover rate ρ, a sequence of 2 transitions that define, sandwiched between CC and QLE, an intermediate Modular Selection (MS) regime, where the genome-wide LD characteristic of the CC regime has broken down to a set of modular blocks that are in quasi linkage equilibrium with each other. The resulting linkage disequilibrium patterns are shown in Fig. 5. The observed block structure of LD in the MS regime resembles haplotype blocks (18, 19), which are normally associated with regions of little recombination flanked by recombination hotspots. Indeed, the cumulative recombination history of the chromosomes in the population shows a very heterogeneous recombination distribution, as shown in Fig. 5D. However, here the origin of these blocks is not intrinsically low recombination (i.e., physical linkage) but the collective effect of epistatic selection: The surviving individuals have recombined more often in regions of low epistasis than in regions of high epistasis, even though the attempted crossovers are uniformly distributed along the chromosome. Clusters of epistatic interaction can therefore exert selective pressure to lower recombination within the cluster.
This lack of recombinant survival has been observed in experiments with mice (36), where inbreeding results in strong selective pressure on localized clusters of genes generating blocks with high LD and reduced effective recombination.
Conclusion
We have shown that the competition of epistatic selection and recombination can give rise to distinct regimes of population dynamics, separated by a transition that becomes sharp for a large number of interacting loci. The QLE and CC regimes are realizations of the opposing views on evolution of R. A. Fisher and S. Wright. For r > r_{c}, alleles are selected for their additive contributions, while selection acts on whole genotypes for r < r_{c}. The fundamental differences between these two regimes show up most clearly in the different scaling properties of the total LD and the decay time of genetic diversity. In the low recombination regime, LD is produced independent of physical linkage by the collective effect of many interactions. In the high recombination regime, LD can be attributed to specific interactions between pairs of loci and its value, determined by the ratio of the interaction strength and the rate of recombination between the loci, is small. Our results not only apply to the transition between genotype and allele selection, but also to localized clusters of interacting genes on the chromosome. Whenever the epistatic fitness difference between different allelic compositions of a cluster exceeds the recombination rate of the cluster, the fittest will amplify exponentially. Because such clusters are often small (36) (one to a few Mb), their recombination rates are low (in the centimorgan range); hence, fitness differentials around 1% can suffice to establish CC dynamics. Selective pressure to reduce recombination load, i.e., the fitness loss through recombination, will therefore favor the evolution of clusters of interacting genes and might be an important driving force for the evolution of recombination rate (37, 38). The effects described above may provide an explanation for the functional clustering associated with low and high LD regions reported in HapMap (18).
Methods
Random Epistasis Model.
A genotype g is described by L binary variables s_{i} = ± 1, i = 1, …, L. To each genotype we assign a fitness
The first term is the sum of the additive fitness contributions of the individual loci, each of which has equal magnitude f =
Pairwise Epistasis Model.
Here, we consider epistasis due to pairwise interactions between the different loci. Such pairwise interactions correspond to s_{i}s_{j} terms in the fitness function. The fitness of a particular genotype g is determined by the independent effects of the individual loci and the sum of the interactions between all pairs. When assuming uniform epistasis between all possible pairs, we draw the interaction strength f_{ij} from a Gaussian distribution with 0 mean and variance
Clustered Epistasis.
To mimic localized clusters of strongly interacting genes on a weakly interacting background, we constructed the matrix of f_{ij}'s as follows. The sparse background epistasis was modeled by assigning each f_{ij} a Gaussian random number with probability P = 0.1 and 0 otherwise. Then we built 3 epistatic clusters with centers c_{k} = 10, 50, 90 by adding a Gaussian random number to each f_{ij} with probability with r = 10 for k = 1, 2, 3. All f_{ij} were rescaled such that Σ_{i < j}f _{ij}^{2} = V_{I}.
Selection.
Our model assumes nonoverlapping generations. In each generation a pool of gametes is produced, to which each individual contributes a number of copies of its genome, which is drawn from a Poisson distribution with parameter exp(F(g) − F̄).
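A minimal sketch of this selection step follows. It is an illustration, not the authors' code: the function names are hypothetical, the Poisson sampler uses Knuth's multiplication method (a stand-in that is adequate only for small parameters), and the sketch does not rescale the pool to a fixed population size, since the text does not say how that is handled.

```python
import math
import random


def poisson(lam, rng):
    """Draw a Poisson variate with mean lam (Knuth's method, fine for small lam)."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def gamete_pool(population, fitness, rng):
    """One round of selection: each individual contributes a Poisson number of
    genome copies with parameter exp(F(g) - F_mean)."""
    F = [fitness(g) for g in population]
    F_mean = sum(F) / len(F)
    pool = []
    for g, F_g in zip(population, F):
        pool.extend([g] * poisson(math.exp(F_g - F_mean), rng))
    return pool


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy example with two labeled genotypes and hypothetical fitness values.
    population = ["fit"] * 500 + ["unfit"] * 500
    fitness = {"fit": 0.1, "unfit": -0.1}.get
    pool = gamete_pool(population, fitness, rng)
    print(len(pool), pool.count("fit"), pool.count("unfit"))
```

Because the Poisson parameter depends only on F(g) − F̄, adding a constant to all fitness values leaves the expected contributions unchanged.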
Gene Reassortment.
To model gene reassortment in a facultatively mating population, 2 gametes are chosen with probability r and a new genotype is formed by assigning each locus the allele of one or the other parent at random. Otherwise, the new genotype is an exact copy of 1 gamete.
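The reassortment rule can be sketched as follows, with genotypes represented as tuples of ±1 alleles. The function name and the use of a seeded `random.Random` instance are illustrative assumptions.

```python
import random


def reassort_offspring(gametes, r, rng):
    """Produce one offspring from a pool of gametes (needs at least 2 gametes).

    With probability r, two distinct gametes from the pool mate and each locus
    takes the allele of one or the other parent at random. Otherwise the
    offspring is an exact copy of a single gamete.
    """
    if rng.random() < r:
        mother, father = rng.sample(gametes, 2)
        return tuple(rng.choice(pair) for pair in zip(mother, father))
    return rng.choice(gametes)


if __name__ == "__main__":
    rng = random.Random(0)
    pool = [(1, 1, 1, 1, 1, 1), (-1, -1, -1, -1, -1, -1)]
    print(reassort_offspring(pool, r=1.0, rng=rng))   # mosaic of the two parents
    print(reassort_offspring(pool, r=0.0, rng=rng))   # exact copy of one gamete
```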
Crossovers.
Given a crossover rate ρ per locus, the number of crossovers is drawn from a Poisson distribution with parameter (L − 1)ρ and the crossover locations are chosen at random. When the number of crossovers is 0, the offspring inherits the entire genome from 1 parent. To model circular chromosomes, the number of crossovers is multiplied by 2 enforcing an even number of crossovers.
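The crossover procedure might look like the sketch below. Several details are assumptions made for illustration because the text does not specify them: crossover positions are drawn independently and uniformly among the L − 1 junctions between neighboring loci (so two crossovers at the same junction cancel), the parent contributing the first locus is chosen at random, and the circular case simply doubles the drawn number on this linear representation. The Poisson sampler is Knuth's method, suitable for small parameters only.

```python
import math
import random


def poisson(lam, rng):
    """Draw a Poisson variate with mean lam (Knuth's method, fine for small lam)."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def crossover_offspring(mother, father, rho, rng, circular=False):
    """Recombine two parental genomes with crossover rate rho per locus."""
    L = len(mother)
    n = poisson((L - 1) * rho, rng)
    if circular:
        n *= 2                              # enforce an even number of crossovers
    switches = [0] * L                      # switches[k]: crossovers between loci k-1 and k
    for _ in range(n):
        switches[rng.randrange(1, L)] += 1
    parents = (mother, father) if rng.random() < 0.5 else (father, mother)
    current = 0
    child = []
    for k in range(L):
        if switches[k] % 2:                 # an odd number of crossovers swaps the template
            current = 1 - current
        child.append(parents[current][k])
    return tuple(child)


if __name__ == "__main__":
    rng = random.Random(0)
    mother, father = (1,) * 20, (-1,) * 20
    print(crossover_offspring(mother, father, rho=0.1, rng=rng))
```

With zero crossovers the loop never switches templates, so the offspring inherits the entire genome from one parent, as described in the text.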
Measuring Genetic Diversity.
The allele entropy is a convenient descriptor of genetic diversity that is readily calculated from the evolving population. It is defined as S_{A} = −Σ_{i} [ν_{i}ln ν_{i}+(1 − ν_{i})ln (1 − ν_{i})], where ν_{i} is the allele frequency at locus i.
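The definition translates directly into code. In this sketch genotypes are tuples of s_{i} = ±1 and ν_{i} is taken to be the frequency of the +1 allele at locus i; the function name and the convention 0 ln 0 = 0 for fixed loci are choices made here for illustration.

```python
import math


def allele_entropy(population):
    """S_A = -sum_i [nu_i ln nu_i + (1 - nu_i) ln(1 - nu_i)] over all loci."""
    N = len(population)
    L = len(population[0])
    S = 0.0
    for i in range(L):
        nu = sum(1 for g in population if g[i] == 1) / N
        for p in (nu, 1.0 - nu):
            if p > 0.0:                     # convention: 0 * ln 0 = 0
                S -= p * math.log(p)
    return S


if __name__ == "__main__":
    diverse = [(1, 1), (1, -1), (-1, 1), (-1, -1)]
    fixed = [(1, -1)] * 4
    print(allele_entropy(diverse), allele_entropy(fixed))
```

The entropy is largest, L ln 2, when every locus is at frequency 1/2 and falls to 0 once all loci are fixed, so its decay tracks the loss of genetic diversity.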
Measuring Linkage Disequilibrium.
LD is the deviation of the frequency of a pair of alleles from the random expectation on the basis of the individual allele frequencies, i.e., D_{ij} = 〈s_{i}s_{j}〉 − 〈s_{i}〉〈s_{j}〉. Kimura (21) showed that in QLE is time independent despite changing allele frequencies ν_{i} and ν_{j} (ν̄_{i} = 1 − ν_{i}). To measure genome wide LD, we calculate the sum of all squared LD terms Σ_{i < j}ψ_{ij}^{2}. Pairs with ν_{i} or ν_{j} <0.01 or >0.99 were omitted. A different normalization is used in Fig. 5, where is shown (see ref. 19 for a recent review).
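The pairwise quantity D_{ij} can be computed as in the sketch below, which also sums the squared values over pairs while applying the allele-frequency cutoff described above. Note the assumption: the sum here uses the unnormalized D_{ij} rather than the normalized measure summed in the text, and the function names are hypothetical.

```python
def linkage_disequilibrium(population, i, j):
    """D_ij = <s_i s_j> - <s_i><s_j> for genotypes of s = +/-1 alleles."""
    N = len(population)
    mean_i = sum(g[i] for g in population) / N
    mean_j = sum(g[j] for g in population) / N
    mean_ij = sum(g[i] * g[j] for g in population) / N
    return mean_ij - mean_i * mean_j


def total_squared_ld(population, min_freq=0.01):
    """Sum of D_ij^2 over pairs i < j, skipping loci with allele frequency
    below min_freq or above 1 - min_freq."""
    N = len(population)
    L = len(population[0])
    nu = [sum(1 for g in population if g[i] == 1) / N for i in range(L)]
    keep = [i for i in range(L) if min_freq <= nu[i] <= 1.0 - min_freq]
    return sum(
        linkage_disequilibrium(population, i, j) ** 2
        for a, i in enumerate(keep)
        for j in keep[a + 1:]
    )


if __name__ == "__main__":
    clonal = [(1, 1, 1), (-1, -1, -1)] * 10        # two clones: perfectly correlated loci
    shuffled = [(a, b) for a in (1, -1) for b in (1, -1)]
    print(total_squared_ld(clonal), total_squared_ld(shuffled))
```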
Acknowledgments
We thank Michael Elowitz and Marie-Anne Felix for comments on the manuscript and acknowledge financial support from National Science Foundation Grant PHY05–51164.
Footnotes
 ^{1}To whom correspondence should be addressed. Email: shraiman{at}kitp.ucsb.edu

Author contributions: R.A.N. and B.I.S. designed research, performed research, and wrote the paper.

The authors declare no conflict of interest.

This article is a PNAS Direct Submission.

This article contains supporting information online at www.pnas.org/cgi/content/full/0812560106/DCSupplemental.
References
 1. Begun DJ, et al.
 2. Desai MM, Fisher DS
 3. Gerrish PJ, Lenski RE
 10. Dobzhansky T
 13. Edmands S
 15. Wright S
 21. Kimura M
 24. Franklin I, Lewontin RC
 25. Falconer DS, Mackay TFC
 26. Lynch M, Walsh B
 28. Bulmer MG
 33. Park J-M, Deem MW
 35. Rouzine IM, Coffin JM
 36. Petkov PM, et al.
 37. Barton NH, Otto SP
 38. Nei M