New Research In
Physical Sciences
Social Sciences
Featured Portals
Articles by Topic
Biological Sciences
Featured Portals
Articles by Topic
 Agricultural Sciences
 Anthropology
 Applied Biological Sciences
 Biochemistry
 Biophysics and Computational Biology
 Cell Biology
 Developmental Biology
 Ecology
 Environmental Sciences
 Evolution
 Genetics
 Immunology and Inflammation
 Medical Sciences
 Microbiology
 Neuroscience
 Pharmacology
 Physiology
 Plant Biology
 Population Biology
 Psychological and Cognitive Sciences
 Sustainability Science
 Systems Biology
Adaptations to fluctuating selection in Drosophila

Edited by Tomoko Ohta, National Institute of Genetics, Mishima, Japan, and approved December 1, 2006 (received for review August 17, 2006)
Abstract
Timedependent selection causes the adaptive evolution of new phenotypes, and this dynamics can be traced in genomic data. We have analyzed polymorphisms and substitutions in Drosophila, using a more sensitive inference method for adaptations than the standard populationgenetic tests. We find evidence that selection itself is strongly timedependent, with changes occurring at nearly the rate of neutral evolution. At the same time, higher than previously estimated levels of selection make adaptive responses by a factor 10–100 faster than the pace of selection changes, ensuring that adaptations are an efficient mode of evolution under timedependent selection. The rate of selection changes is faster in noncoding DNA, i.e., the inference of functional elements can less be based on sequence conservation than for proteins. Our results suggest that selection acts not only as a constraint but as a major driving force of genomic change.
Phenotypic adaptations build on genomic sequence substitutions driven by a positive fitness effect. The distribution of these fitness differences (selection coefficients) has been debated since the advent of neutral theory (1–6). Coding DNA evolves under considerable constraint, i.e., nonsynonymous substitutions take place at a lower rate than synonymous changes (7). This shows that selection on protein evolution is predominantly negative. However, there is also evidence that the nonsynonymous substitutions that do occur are in part driven by positive selection (8–10). The evolutionary role of noncoding DNA is less clear. A particularly intriguing idea is that phenotypic evolution is due in a large part to changes in gene regulation, whereas proteins evolve more slowly (11, 12). This hypothesis lacks quantitative evidence so far, but a number of recent studies have found fitness effects in noncoding DNA. Transcription factor binding sites in bacteria are under substantial selection for functionality (13), and putative regulatory regions in eukaryotes also show substantial selective constraints (14, 15). Evidence of positive selection has been reported for intergenic DNA in Drosophila (16, 17), albeit with nearneutral selection coefficients (17). Inference methods rely on various implicit assumptions, and they differ considerably in the inferred strength of selection and in its contribution to genomic change (6).
Our phenotypic concept of adaptation contains more than the mere presence of positive selection. Migration of a population followed by adaptation to a new habitat, the conquest of an ecological niche in coevolution, incipient sympatric speciation driven frequencydependent selection: in all these examples, fitness itself is timedependent, and the adaptive evolution of new functions is the response to this change.
Including the dynamics of selection into a quantitative picture of genome evolution is the purpose of this paper. To illustrate our rationale, let us first consider the case of static fitness, where intuition suggests that evolution reaches a balance between advantageous and deleterious substitutions. This point can be made more precise: the longterm dynamics of substitutions leads to an evolutionary equilibrium, where the probability of a (fixed, haplotype) sequence state a depends exponentially on its Malthusian fitness f_{a} (scaled by the effective population size). This simple form of equilibrium, originally derived for a twoallele model (18), applies to arbitrary sequence spaces and genomic fitness landscapes (19). It is characterized by detailed balance: the likelihood of any substitution (i.e., the product of initial state probability and substitution rate) between sequence states with a scaled fitness difference f = f_{b} − f_{a} equals the likelihood of the corresponding backward process, which involves a fitness difference (−f). Hence, on average every deleterious substitution is offset by an advantageous mutation, the average fitness remains constant, and there are no adaptations. Now consider a simple switch in the genomic fitness function: at a single position of the sequence, the selection coefficients f of all point mutations change sign. If the magnitude of selection is substantial, the most likely genomic state before the switch is fixation of the fittest nucleotide. After the switch, this state becomes suboptimal, which gives rise to an adaptive substitution to the new fitness optimum. This point is generic: adaptation in a timedependent fitness landscape involves a surplus of advantageous substitutions compared to equilibrium. Thus, we define adaptation as a pure nonequilibrium phenomenon, i.e., more restrictively than in much of the literature, where any substitution with a positive fitness effect is counted as an adaptation. However, our definition is in tune with the phenotypic view of adaptation sketched above and has a conceptual advantage to be made precise below: it provides an unambiguous statistical distinction of adaptive substitutions and nearneutral background changes. The adaptive surplus in substitutions can be traced in genomic data, which allows us to infer the dynamics of selection together with the genome's adaptive response.
The quantitative analysis of this joint evolution process is based on a statistical model of timedependent selection: point mutations at individual positions have fitness effects f(t) = ±σ, which randomly and independently change sign at a rate γ. Thus, selection is characterized by two parameters, the strength σ and the fluctuation rate γ. The latter governs the relative weight of positive and negative selection: higher values of γ generate a larger function of sites under positive selection, as shown by the above example. In this picture, populations evolve under two stochastic forces: genetic drift and selection switches. The classical work on fluctuating selection by Wright, Kimura, Ohta, Gillespie, and others (2, 20–26) was aimed at describing ecological changes such as seasonality or frequencydependent selection. For these microevolutionary fluctuations, the correlation time of selection does not exceed the characteristic time scale of genetic drift given by the effective population size, 1/γ ≤ N (2). Here we focus on macroevolutionary fluctuations with much longer correlation times of the order of the mutation time scale, 1/γ ∼ 1/μ ≫ N, as discussed by Gillespie (26) and Cutler (27) as an explanation for the overdispersion of the molecular clock. We obtain analytical solutions for the joint statistics of polymorphisms and substitutions, which depends on the strength and fluctuation rate of selection. Based on these solutions, we infer timedependent selection and the resulting adaptation by a systematic Bayesian procedure. From a computational point of view, our method is a modelbased scoring system for hybrid intra and interspecies sequence alignments. As we show by explicit benchmarking, it is more sensitive than the classical populationgenetic tests for adaptation (8, 28), which address partial aspects of the polymorphism and substitution frequencies. Hence, time dependence is not just an additional facet of selection but a crucial part of its quantitative inference.
Applying this method to sequence data from Drosophila, we find evidence for genomewide selection with substantial amplitudes (2Nσ_{0} > 1) and fluctuation rates of the order of the neutral mutation rate (γ ∼ μ). As will be discussed below, it is both characteristics together that establish adaptations as a major driving force of genomic evolution.
Theory of Fluctuating Selection
MutationSelectionDrift Model.
We first consider a singlelocus model with two alleles a and b and denote by x the population frequency of allele b, various generalizations will be discussed later. In the diffusion approximation, the timedependent haplotype frequency distribution p(x, t) obeys the Fokker–Planck equation describing reproductive fluctuations (genetic drift) in a population of effective size N, selection with a Malthusian fitness difference 2N f _{0}(t) ≡ f_{b} (t) − f_{a} (t) between the two alleles, and mutations with a rate μ_{0} per individual per generation (assumed to be equal for forward and backward changes). We consider a simple model of timedependent selection, with constant magnitude σ_{0} and fluctuating direction η(t) = ±1, which follows a Poisson process with rate γ_{0}. This process defines a statistical ensemble given by the average and the covariance of the variables η(t). Measuring time in units of the diffusion scale 2N introduces the rescaled evolutionary parameters f(t) = 2N f _{0}(t), σ = 2σ_{0}N, μ = 2μ_{0} N, and γ = 2γ_{0} N.
Stationary Evolution Under Constant Selection.
First recall well known results for timeindependent selection, f(t) = σ (29). The Kimura equation (Eq. 1 ) has the form of a continuity equation, ṗ = −∇Jp with the rescaled probability current J = −∇x(1 − x) + σx(1 − x) + μ(1 − 2x). Two normalized and linearly independent stationary solutions p_{a} (x) (a = ±1), which are eigenfunctions of J, can be defined by the boundary conditions that p _{+}(x) remain finite at x = 1 and p _{−}(x) remain finite at x = 0. Asymptotically for small μ, these solutions take the form up to terms of order μ^{2}, with normalization factors Z_{a} given in terms of hypergeometric functions [see supporting information (SI) Appendix ]. Their current eigenvalues define the Kimura–Ohta substitution rates for these processes, i.e., Jp_{a} (x) = au_{a}p_{a} (x) with A generic normalized stationary solution p(x) of Eq. 1 can be written as a linear combination with 0 ≤ λ _{a} ≤ 1 and λ_{+} + λ_{−} = 1. Evolutionary equilibrium defines the unique stationary solution p _{eq}(x) with a vanishing substitution current, Jp _{eq}(x) = 0, i.e., there is detailed balance between forward and backward substitutions. The equilibrium distribution, given exactly for all values of μ by p _{eq}(x) = [x(1 − x)]^{−1+μ}e^{σx}/Z _{eq} with a normalization factor Z _{eq} (30), has the form (Eq. 6 ) with
Quasistationary Evolution.
For timedependent solutions of Eq. 1 , no generic closed form exists even in the case of constant selection. For μ ≪ 1, however, an important simplification arises due to a separation of dynamical regimes, which is illustrated in Fig. 1. From an arbitrary initial distribution p(x, t = 0), the frequency distributions describing forward and backward processes reach their stationary shapes p _{+}(x) and p _{−}(x) within an initial time regime of order 2N generations (i.e., of order 1 in our rescaled time units); see Fig. 1 a. For larger values of t, the distribution takes the quasistationary form (Eq. 6 ) with timedependent coefficients λ_{a}(t) (a = ±1) up to correction terms of order exp(−t); see Fig. 1 b. The longterm dynamics of substitutions is given by the rate equations and governs the approach to equilibrium,
Evolution Under Fluctuating Selection.
In the full model given by Eqs. 1 – 3 , allele frequencies evolve under two stochastic forces: genetic drift and selection fluctuations. The joint statistics of both processes is very complicated in general (2). The problem becomes again tractable in the quasistationary approximation, which can be applied if μ ≪ 1 and γ ≪ 1, i.e., if selection fluctuations are macroevolutionary. This is demonstrated in Fig. 1 c for the coefficient function λ_{+}(t), which follows Eq. 9 in each interval of constant selection for a given fluctuation history η(t). Generalizing Eq. 6 , we define the joint probability Λ_{a} ^{ϵ}(t) (a, ϵ = ±1) of allele a and the direction of selection η(t) = ϵ. These probabilities follow the quasistationary evolution equations defining the 4 × 4 rate matrix U. Given initial probabilities at an earlier time t _{0}, we obtain with the transition matrix T(t − t _{0}) = exp[(t − t _{0})U ]. In the following, we focus on key properties of this process relevant for crossspecies sequence comparisons.
Nonequilibrium Stationarity.
The asymptotic limit of Eq. 11 defines a nonequilibrium stationary state, which describes the longterm average over selection fluctuations and genetic drift for a family of genomic loci evolving independently. The approach to stationarity is governed by the rate u _{+} + u _{−} as in Eq. 9 . In the data discussed below, it is much faster than for neutral evolution due to substantial levels of selection, with u _{+} + u _{−} ≃ σμ. The statistics of the stationary state depends on two scaled selection parameters, the strength σ and the fluctuation rate in units of the neutral mutation rate, κ = γ/μ. The stationary allele probabilities counted in their contemporary direction of selection are given by the asymptotic state probabilities lim_{t → ∞} Λ_{a} ^{ϵ} = λ̄_{a·ϵ} with and should be compared with the equilibrium probabilities (Eq. 7 ) for constant selection. They determine the efficiency of the genomic response to timedependent selection as measured by the degree of adaptation where η̅f̅ is the average fitness of the genomic nucleotide, f̄ is the average fitness of a random nucleotide, and f _{max} = f _{ϵ·a} is the fitness of the preferred nucleotide. Sequence turnover takes place at an average substitution rate The rate of adaptations is given as the surplus of advantageous substitutions over deleterious ones, and these produce an average fitness flux, i.e., a gain in fitness per unit of time. We can thus decompose the sequence turnover into nearneutral substitutions due to genetic drift and adaptive substitutions generated by fluctuating selection. The fraction of adaptive substitutions at selection amplitude σ > 0 is
CrossSpecies Correlations.
The timedependent transition probabilities (Eq. 11 ) determine genomic correlations between species due to their common ancestry, which depend on their divergence time t. The joint probability of states a, ϵ and a′, ϵ′ for two species is a sum over the states a″, ϵ″ of their last common ancestor, assuming stationarity of the ancestor states and subsequent divergent evolution with independent selection fluctuations along the two branches of the phylogeny. Unlike the standard notion of derived alleles, we do not make any ad hoc assumption that the outgroup carries the ancestral or the preferred nucleotide, which would introduce a bias in the predicted frequencies. The crossspecies correlations between allele frequencies at orthologous loci are then given by the joint distribution
To analyze genomic polymorphism data, we have to compute the expected allele frequency distributions in a finite sample of individuals randomly drawn from a population (31). For two species with divergence time t, the joint probability Q(k, k′m, m′) of finding k alleles b (k = 0, 1, …, m) in a random sample of m individuals of the first species and k′ alleles b (k′ = 0, 1, …, m′) at the orthologous locus in a random sample of m′ individuals of the second species is where M_{a} ^{ϵ}(k, m) are the binomial moments of the elementary stationary distributions (Eq. 4 ), given in terms of hypergeometric functions (see SI Appendix ). To leading order in t and in the limit of neutral evolution, the expressions (Eq. 20 ) reduce to the well known sampling formulae of ref. 31. Here we use the full timedependence of (Eq. 18 ), which allows for multiple substitutions at the same site, since the shorttime approximation can produce a spurious signal of selection.
Data Analysis
Sequence Data and Alignments.
We have aligned 271 Drosophila melanogaster sequence fragments, which are scattered across the X chromosome and are sampled from 12 individuals of a Zimbabwe population (17, 32, 33), to a single Drosophila simulans outgroup sequence. The aligned loci are binned into five broad genomic categories: 4fold synonymous sites and nonsynonymous substitutions in protein coding DNA, intergenic regions, introns, and UTRs, similarly to the classification in ref. 17. In each category, we count kfold singlenucleotide polymorphisms for k = 1, …, 11 (i.e., positions where k ingroup sequences differ from the nucleotide of the outgroup sequence at the orthologous position) as well as conserved positions (k = 0) and point substitutions within the sample (k = 12). Normalizing these counts per unit sequence length then defines the frequency distributions Q̂(k) (k = 0, …, 12), which are stable with respect to alignment changes and, in particular, do not overestimate the number of substitutions (for details and count tables, see SI Appendix ). These genomic distributions can be compared to symmetrized pair probabilities Q(k) ≡ Q(k, 0m, m′) + Q(m − k, m′m, m′) with m = 12 and m′ = 1 as given in ref. 20.
Bayesian Inference of Evolutionary Parameters.
In a given genomic category, the distribution Q(k) has to be averaged over an a priori unknown distribution Ω(σ) of selection amplitudes, To infer this distribution, we use simple parameterizations which contain the average level of selection σ_{ave} ≡ ∫ σ Ω(σ)dσ and the fraction of selected sites c_{s} = ∫_{2} ^{∞} Ω(σ)dσ (with the threshold 2 chosen by convention) as independent parameters, see SI Appendix . The likelihood scoring function is then defined as where we choose as reference distribution Q _{0} the best neutral model for 4fold synonymous sites. (This choice does not influence our inferences, which are based on score differences.) A sequence category consisting of L independent loci with allele counts k _{1}, …, k_{L} has the total score We then infer c_{s} , σ_{ave}, κ, and μ for each genomic category and the common parameter t by maximizing S summed over all categories (see ref. 34 for a similar approach). Confidence intervals follow from sampling of the Bayesian posterior probability.
Model Discrimination, P Values.
For a set of allele counts k _{1}, …, k_{L} drawn from a distribution Q(kσ, γ, μ, t), any other model Q′ has a likelihood given by the corresponding difference of scores (Eq. 24 ), P ∼ exp[−β(S − S′)], with β = 1 for independent counts and a β < 1 for the actual datasets due to the partial linkage of loci (see SI Appendix ). In particular, we can quantify the evidence for adaptive evolution by the P value of the best equilibrium model Q′, which is obtained by maximizing S′ with the constraint κ = 0. The resulting score difference per locus, Δs ≡ (S − S′)/L, is shown in Fig. 2 a as a function of the parameters σ and κ of the input model Q. This can be compared with the analogous score difference Δs _{MK} of a McDonald–Kreitman (MK) test, which is based on the overall frequencies of polymorphisms Q^{p} ≡ Σ _{k=1} ^{m−1} Q(k) and of substitutions Q^{s} ≡ Q(m), and on their counterparts Q _{0} ^{p}, Q _{0} ^{s} in a neutral reference class, see Fig. 2 b and SI Appendix for details. The increase in statistical power is significant: (i) The full score difference Δs is positive and, thus, produces evidence for adaptation for all parameters σ, κ > 0, whereas the MK test does not infer adaptations in the region where Q^{s}Q _{0} ^{p}/Q^{p}Q _{0} ^{s} ≤ 1 and hence Δs _{MK} = 0. (ii) Δs is always higher than Δs _{MK}, which implies that the same P value is reached with (sometimes considerably) less loci. (iii) Our method correctly reconstructs the fraction α̃ of adaptive substitutions as given by Eq. 17 , whereas estimate α̃_{MK} = 1 − Q^{p}Q _{0} ^{s}/Q^{s}Q _{0} ^{p} (6, 9) based on the MK test leads to a systematic underestimation; see Fig. 2 c and d. As shown in SI Appendix , there is a similar difference in statistical power compared to all inferences based only on the poly morphism spectrum, such as Tajimas D test and its variants (28).
Demographic Effects.
The history of a population enters the allele frequency data through a timedependent population size N(t) = Nν(t) (where N is today's effective population size), leading to deviations from equilibrium even for stationary selection. These demographic effects have been studied extensively for different Drosophila populations in the recent literature (32, 33, 35, 36). In particular, reduced population sizes for D. melanogaster in the past can lead to an increased number of substitutions, and recent variations in population size will also influence the polymorphism spectrum. As an alternative model to fluctuating selection, we consider the evolution under timeindependent selection (κ = 0) and a bottleneck on the ingroup branch with strength ν _{b} , initial time t_{i} , and duration t_{f} − t_{i} (0 ≤ t_{i} ≤ t_{f} ≤ t). The corresponding distributions Q(kσ, μ, t, t_{i} , t_{f} , ν _{b} ) have a selectiondependent increase in the substitution frequency. These distributions are obtained numerically, older bottlenecks (t_{f} < t − 1) can also be treated analytically using the quasistationary approximation (Eq. 8 ) with scaled selection coefficients σ(t) = σν(t). The demography is shared between the different genomic categories and is hence treated with global parameters in the maximumlikelihood analysis (for further details, see SI Appendix ).
Linkage Between Loci, Hitchhiking.
The sequence data originate from a number of different contiguous fragments. Loci on different fragments can be assumed independent, but loci on the same fragment are partially linked. These correlations can confound the inference of selection through hitchhiking effects, by which a locus under sufficiently strong positive selection influences the polymorphism spectrum at nearby loci, which are not under positive selection themselves (37). The characteristic length of reduced variance due to a hitchhiking event is ξ ∼ σ/ρ, where ρ is the recombination rate per 2N generations. The expected distance from a polymorphic site to the nearest adaptive change taking place during the lifetime of the polymorphism is ℓ ≈ 1/c_{s} γ. A scaling argument then suggests that the importance of hitchhiking is measured by the average ratio which can be estimated from our inferred values of Φ and independent values of ρ. This is discussed in SI Appendix , where we show by a simulation of partially linked loci that hitchhiking may lead to an underestimation of some selection parameters but leaves the conclusions unaffected. In the same way, we verify numerically that other features not explicitly contained in our model, such as differences in neutral mutation rates and fourallele loci, do not confound our parameter inference.
Results and Discussion
Fluctuating Selection in Drosophila.
The model distributions obtained from Eq. 20 produce a good fit to the observed distributions Q̂(k) in all genomic categories (see Fig. 3).As expected, the frequency distribution of 4fold synonymous sites in coding DNA is nearneutral (σ_{ave} ∼ 1). In all other categories, we find substantial levels of selection affecting a substantial fraction of sites. The average selection strength over all sites in one category ranges from σ_{ave} ∼ 14 in intronic regions to σ_{ave} ∼ 100 for replacement substitutions in coding DNA (see Table 1). The fractions of selected sites are c_{s} > 0.6, but these may be overestimated due to the possible presence of hitchhiking (37, 38) (see SI Appendix ). The Bayesian posterior yields likelihood values P < 10^{−37} for neutral evolution, indicating that the presence of selection is statistically significant. Perhaps more surprisingly, the evolution is not only far from neutrality but also far from equilibrium. The inferred fluctuation rates of selection are of the same order of magnitude as the neutral mutation rates (0.1 < κ < 0.5). Hence, the fitness function at individual genomic sites is not static: it evolves at the same tempo as nucleotide changes, resembling more a seascape than a landscape. Because fluctuations at different sites are independent, these rates translate into a number of selection switches of order one per year in the Drosophila genome. The timedependence of selection is statistically significant, with likelihood values P < 10^{−17} for any model with static selection (κ = 0) at equilibrium and P < 10^{−10} with only demographic deviations from equilibrium by a timedependent effective population size, see Fig. 3 and SI Appendix .
Genetic Drift and Adaptations.
If genomic evolution is driven by fluctuations in reproduction and selection, can we disentangle the effects of these two stochastic forces? At a given selection amplitude σ, Eq. 17 provides an unambiguous decomposition of the observed sequence divergence into driftgenerated background changes and adaptations. Background substitutions are mostly nearneutral, and they have selection coefficients f = +σ and f = −σ with equal frequencies, because advantageous changes merely compensate deleterious ones. For κ > 0, there is a surplus of substitutions with f = +σ, signaling the adaptive response to timedependent selection. Averaging over the inferred distribution of selection amplitudes in a genomic category, we find substantial fractions of adaptive changes, α̃_{ave} ≡ u _{ad, ave}/u _{tot, ave} > 0.6, again with a caveat due to possible hitchhiking. Yet, it appears that adaptations and background changes contribute to genomic evolution in comparable measure, a salomonic note in the neutralist–selectionist debate. Moreover, both classes are strongly intertwined: neutral changes can open new paths for subsequent adaptations.
Evolvability by Adaptation.
For all genomic categories in D. melanogaster except 4fold degenerate sites, we find average levels of selection σ_{ave} about an order of magnitude higher than previously reported values (17, 31, 34, 39), due to the increased statistical sensitivity of our method discussed above. These substantial levels are crucial for adaptations to be an efficient mode of evolution under timedependent selection: fitness changes occur at about the rate of neutral evolution, but adaptive responses are faster by a factor σ/κ. For example, a selection change in either D. melanogaster or D. simulans at the time of speciation ≈3 million years ago will have generated an adaptive differentiation between today's lineages in >60% of the affected sites with σ > 20 but in <10% of weakly selected sites (σ < 2). Evolvability by adaptation can be quantified by the degree of adaptation at stationarity, α, which is defined in (Eq. 13 ). Selected sites in all of the genomic categories in D. melanogaster except 4fold synonymous sites have values α > 0.9 (see Table 1), due to their substantial average values of σ. This justifies our conclusion that the deviations from equilibrium in the observed distributions Q̂(k) are due to ongoing selection fluctuations in a nonequilibrium steady state, rather than just a poorly adapted genomic state approaching equilibrium in a static local fitness function. The timedependence of selection implies a fitness cost due to temporary misadaptation (23). Thus, the high values of α imply that most of the selection switches readily trigger an adaptive response, resulting in a number of adaptive substitution of order one per year in the Drosophila genome. Importantly, evolvability by adaptation does not require evolutionary tuning of the neutral mutation rate to the fluctuation rate of selection, provided selection amplitudes are sufficiently high.
Fitness Flux Quantifies Adaptations.
The relative contribution of genomic categories to phenotypic adaptations can be estimated on the basis of their fitness flux, i.e., the expected fitness gain per unit time. This weighted measure is more appropriate than the mere number of changes, since substitutions differ by orders of magnitude in their fitness effects and, hence, in their putative phenotypic consequences. According to Eq. 16 , the fitness flux Φ per unit sequence length is proportional to the selection parameters σ and κ for strong selection, but the contribution of weakly advantageous substitutions is suppressed: these either happen too slowly (if f < κ) or are too unstable against reversal (if f < 1). All genomic categories of the Drosophila except 4fold synonymous sites are well above this threshold and have comparable values of Φ (see Table 1). Hence, the total flux in noncoding DNA, obtained by multiplying Φ with the number of sites in those categories, may indeed outweigh that of protein evolution, as suggested previously for the numbers of adaptive changes (17). Notably, the fitness flux in UTRs and intergenic DNA results from more frequent changes with a lower value of f each than for replacement changes in coding DNA, where selection is stronger but more static. This may point to the role of regulation in the adaptive differentiation between species.
Fitness Is Correlated Between Sites and in Time.
The timedependence of selection can be put in perspective by comparison with known evolution models, which are related to the fluctuatingselection model. For κ = 0, individual sites evolve independently under timeindependent selection. If a substitution at a given site has selection coefficient f, its backward substitution has selection coefficient (−f), i.e., the fitness effects of subsequent substitutions at the same site are correlated. On the other hand, the evolution of long genomic sequences is often described by the infinitesites model, regarding any two subsequent substitutions as independent and neglecting correlations between their selection coefficients f. This approximation is justified to the extent that fitness correlations between subsequent substitutions at the same site have decayed, either because of external fluctuations or due to inbetween substitutions at other sites that have a fitness effect on the site in question (epistasis). This is precisely what the fluctuation parameter κ measures in the generic case: the fitness changes at a given site driven by external causes or by the coupled evolution of the remainder of the genome. Fitness interactions changing the direction of selection for substitutions at a given site, socalled sign epistasis (41), occur in a number of observations and models of protein, RNA, and regulatory evolution (19, 40, 42–46). Thus, the higher values of κ in UTRs, intronic and intergenic DNA shown in Table 1 are in accordance with the expected ubiquity of sign epistasis in regulatory sequences (13, 19). Our analysis relates epistasis to temporal fitness correlations at individual genomic sites. It suggests that sign epistasis may be pervasive, indicating a genomewide rugged fitness landscape.
Genomic evolution emerges as a complex stochastic process, shaped jointly be the driving force of timedependent selection, fitness interactions between sites, and the ongoing background of nearneutral changes. Much more remains to be learned about the interplay of these evolutionary forces: in a large and strongly coupled system, one external signal can trigger an avalanche of subsequent compensatory responses, which build up an evolutionary innovation. This dynamics seems now within reach of genomic sequence analysis.
Acknowledgments
We thank Joachim Hermisson, Diethard Tautz, and Thomas Wiehe for discussions and the anonymous referees for helpful comments. This work was supported by Deutsche Forschungsgemeinschaft (DFG) Grant SFB 680 and STIPCO European Network Contract HPRNCT200200319.
Footnotes
 *To whom correspondence should be addressed. Email: lassig{at}thp.unikoeln.de

Author contributions: V.M. and M.L. designed research, performed research, analyzed data, and wrote the paper.

The authors declare no conflict of interest.

This article is a PNAS direct submission.

This paper contains supporting information online at www.pnas.org/cgi/content/full/0607105104/DC1.
 © 2007 by The National Academy of Sciences of the USA
References

↵
 Kimura M

↵
 Gillespie JH
 ↵

↵
 Nei M

↵
 Orr HA
 ↵

↵
 Li WH
 ↵
 ↵
 ↵

↵
 Monod J ,
 Jacob F

↵
 King MC ,
 Wilson AC

↵
 Mustonen V ,
 Lässig M

↵
 Chin CS ,
 Chuang JH ,
 Li H
 ↵

↵
 Kohn MH ,
 Fang S ,
 Wu CI
 ↵

↵
 Kimura M
 ↵

↵
 Wright S

↵
 Kimura M
 ↵
 ↵

↵
 Takahata N ,
 Ishii K ,
 Matsuda H

↵
 Takahata N ,
 Kimura M

↵
 Gillespie JH

↵
 Cutler DJ

↵
 Tajima F

↵
 Ewens WJ

↵
 Rouzine IM ,
 Rodrigo A ,
 Coffin JM

↵
 Sawyer SA ,
 Hartl DL

↵
 Glinka S ,
 Ometto L ,
 Mousset S ,
 Stephan W ,
 De Lorenzo D

↵
 Ometto L ,
 Glinga S ,
 De Lorenzo D ,
 Stephan W
 ↵
 ↵

↵
 Thorton K ,
 Andolfatto P

↵
 Fay JC ,
 Wu CI

↵
 Gillespie JH

↵
 Piganeau G ,
 EyreWalker A
 ↵

↵
 Weinreich DM ,
 Watson RA ,
 Chao L
 ↵
 ↵
 ↵

↵
 Weinreich DM ,
 Delaney NF ,
 Depristo MA ,
 Hartl DL
 ↵