Population-genomic inference of the strength and timing of selection against gene flow

Edited by Andrew G. Clark, Cornell University, Ithaca, NY, and approved May 18, 2017 (received for review October 9, 2016)
June 20, 2017
114 (27) 7061-7066

Significance

Selection against maladaptive gene flow drives local adaptation and speciation. We present an approach to quantify the genome-wide aggregate signal of such selection. We show that the product of the density of loci under selection times their selection coefficient forms a compound parameter that can be estimated from genome-wide polymorphism and recombination data. In addition, we can infer the timing of selection and the neutral baseline migration rate. Applications to Mimulus guttatus on the population and species level show evidence for selection maintaining locally adapted populations and a species barrier despite recent introgression. Our approach takes background selection into account and is applicable to the growing number of genomic datasets that include samples from multiple populations experiencing divergent selection.

Abstract

The interplay of divergent selection and gene flow is key to understanding how populations adapt to local environments and how new species form. Here, we use DNA polymorphism data and genome-wide variation in recombination rate to jointly infer the strength and timing of selection, as well as the baseline level of gene flow under various demographic scenarios. We model how divergent selection leads to a genome-wide negative correlation between recombination rate and genetic differentiation among populations. Our theory shows that the selection density (i.e., the selection coefficient per base pair) is a key parameter underlying this relationship. We then develop a procedure for parameter estimation that accounts for the confounding effect of background selection. Applying this method to two datasets from Mimulus guttatus, we infer a strong signal of adaptive divergence in the face of gene flow between populations growing on and off phytotoxic serpentine soils. However, the genome-wide intensity of this selection is not exceptional compared with what M. guttatus populations may typically experience when adapting to local conditions. We also find that selection against genome-wide introgression from the selfing sister species M. nasutus has acted to maintain a barrier between these two species over at least the last 250 ky. Our study provides a theoretical framework for linking genome-wide patterns of divergence and recombination with the underlying evolutionary mechanisms that drive this differentiation.
Estimating the timing and strength of divergent selection is fundamental to understanding the evolution and persistence of organismal diversity (13). Genes underlying local adaptation and speciation act as barriers to gene flow, such that genetic divergence around these loci is higher compared with the rest of the genome. However, a framework that explicitly links observable patterns of DNA polymorphism with the underlying evolutionary mechanisms and allows for robust parameter inference has so far been missing (4).
One way of studying adaptive genomic divergence in the face of gene flow is to apply methods for demographic inference to scenarios of speciation (e.g., refs. 5 and 6). This approach allows dating population splits and inferring the presence or absence of gene flow, yet generally does not explicitly account for natural selection (but see ref. 7). Another approach is to scan genomes for loci that are statistical outliers of divergence among populations. These scans are used to identify candidate loci underlying speciation or local adaptation (e.g., refs. 8 and 9) and include the search for so-called genomic islands of divergence (e.g., ref. 10) (i.e., extended genomic regions of elevated divergence). Methods of this type can be confounded by other modes of selection, as well as demography, and will always propose a biased subset of candidate loci (11, 12).
A third approach is to test for a negative correlation between absolute genetic divergence and recombination rate across the genome (e.g., refs. 1315). This approach is based on the prediction that divergence will be higher in regions of the genome where genetic linkage between neutral sites and loci under divergent selection is higher on average (16). Testing for this pattern of a negative correlation is powerful because it aggregates information across the entire genome and because it is specific to divergent selection with gene flow (17). However, this approach is purely descriptive and problematic if recombination covaries with a confounding factor (e.g., gene density) that in turn affects the intensity of selection (18).
Here, we develop a theory describing the pattern used by this third approach and a way of inferring the underlying parameters. Our approach explicitly models selection against gene flow and its effect on neutral variation, estimates the strength and timing of selection and gene flow, and filters out the confounding effect of background selection (BGS).

Idea of Approach and Population-Genomic Model

Here, we exploit the genome-wide variation in recombination rate and its effect on genetic divergence. Divergent selection reduces effective gene flow at neutral sites, and this effect decreases with the recombinational distance from the loci under selection. We conceptualize this relationship in terms of the effective migration rate and the expected pairwise between-population coalescence time (Fig. 1A). The latter directly connects to the absolute genetic diversity between populations, a quantity that is readily estimated from DNA sequence data. Our model considers two populations of diploids with effective sizes N1 and N2 and nonoverlapping generations. In population 1, a balance between one-way gene flow from population 2 at rate m per generation and local directional selection is maintained for τ generations before the present. In this “migration–selection” (MS) phase (Fig. 1A), selection against maladaptive immigrant alleles acts at an arbitrary number of biallelic loci that we call “MS polymorphisms” (MSPs). At each MSP, one allele is favored in population 1 over the other by an average selection coefficient s, whereas the deleterious allele is introduced by gene flow. We assume additive fitness and no dominance.
Fig. 1.
Divergent selection reduces gene flow and increases genetic divergence. (A) Selection against locally maladapted alleles at MSPs (black triangles) reduces the effective migration rate me. The effect is stronger in regions of low recombination (red; A, Left Upper) and decreases the probability that lineages sampled in different populations migrate and coalesce. Realizations of the coalescence process are shown in A, Left Lower for the (MS)P scenario (SI Appendix, Fig. S1). In regions of high recombination, me is higher (blue; A, Right Upper), such that migration and earlier coalescences are more likely (A, Right Lower). (B) The predicted between-population diversity πB=2u𝔼[TB] (curves) matches individual-based simulations (dots); error bars (±𝖲𝖤) are too short to be visible. The (MS)M scenario was used with N2=5,000, u=109, ν=2.5×107, m=m0=5×104, τ=4N2. (C) Approximately linear contour lines with slope 1 in the surface of πB as a function of log10(s) and log10(ν) support the compound parameter selection density, σ=sν. Here, rbp=108 (1 cM/Mb); other parameters are as in B.
Before the MS phase, we assume a “panmictic” (P) phase in an ancestral population of effective size N0 that starts τ generations ago and extends into the past (Fig. 1A). We call this the (MS)P demographic scenario. The P phase can be exchanged for an ancestral “migration” (M) phase with gene flow at rate m0. Here, we use the (MS)P and (MS)M scenarios to describe our approach. We provide extensions to more general scenarios with an intermediate “isolation” (I) phase in SI Appendix, part 1.
We denote the per-base-pair recombination rate by rbp and assume that the MSPs occur at a constant rate, ν, per base pair, such that the distance between consecutive MSPs is exponentially distributed with mean 1/ν base pairs.

Average Effective Gene Flow and Selection Density

Selection against maladapted immigrant alleles acts as a barrier to gene flow in the MS phase. At a focal neutral site, the baseline migration rate m is reduced to an effective migration rate me (19, 20). This reduction in effective gene flow increases with the strength of selection at the MSPs and decreases with their recombinational distance from the focal neutral site (Fig. 1A and SI Appendix, Eq. S1.1). To extrapolate from a given neutral site to the entire genome, we need to average over the possible genomic locations and selection coefficients of the MSPs. For simplicity, we assume an infinite chromosome with a linear relationship between physical and genetic map distance. Given an exponential distribution of selection coefficients, the expected effective migration rate depends on s, ν, and rbp exclusively through σ/rbp, where σ=sν is the product of the mean selection coefficient times the density of MSPs (SI Appendix, part 1). Note that σ/rbp has the meaning of a selection density per genetic map unit. For instance, conditioning on two MSPs on each side of an average neutral site, we find
𝔼[me(2,2)]m[1+2σ/rbpln(σ/rbp)].
[1]
This equation is a good approximation if σ/rbp0.1—that is, if recombination is at least 10 times stronger than selection, at which point effective gene flow is reduced by 50% (SI Appendix, Fig. S2). Eq. 1 shows that the mean effective gene flow decreases with selection density and increases with recombination rate. Adding increasing numbers of MSPs has a diminishing effect on 𝔼[me] (SI Appendix, Fig. S2A), so that Eq. 1 captures the essential pattern if σ/rbp0.1. The exclusive dependence of 𝔼[me] on selection and recombination through the compound parameter σ/rbp holds for any number of MSPs and so applies to the genome-wide average of me (SI Appendix, Eq. S1.8). Our results imply that doubling the number of MSPs has the same effect on average effective gene flow as doubling the mean selection coefficient. We therefore anticipate that, in practice, s and ν can be inferred only jointly as σ from population-genomic data in our framework.

Expected Pairwise Coalescence Time with Selection

To facilitate parameter inference from population-genomic data, we phrase our theory in terms of the expected coalescence time of two lineages, one from each population. The expectation of this coalescence time under neutrality, TB, depends on the baseline migration rate m (SI Appendix, Table S2). We incorporate the effect of selection by substituting the effective migration rate for m. Averaging over all possible numbers and genomic locations of MSPs, we obtain 𝔼[TB] and can predict the between-population diversity πB as 2u𝔼[TB], where u is the mutation rate per base pair and generation.
To better reflect real genomes, we now assume a finite genome size and define rf=0.5 as the recombination rate that corresponds to free recombination, such that MSPs located more than kf=1/(2rbp) base pairs from a neutral site are unlinked. We start by assuming that ν is so small that, at most, a single, nearest-neighboring MSP is linked to any focal neutral site. In the simplest case of the (MS)M scenario with m0=m (SI Appendix, Fig. S1C), the expected pairwise between-population coalescence time is approximately
𝔼[TB]2N2+1m+1m2σrbp(emτD+F)+1msrfe2νkf,
[2]
where D and F depend on m, τ, and ν (Materials and Methods). The first two terms in Eq. 2 are the expectation without selection (21) (SI Appendix, Table S2). The third and fourth terms reflect the increase in coalescence time if the MSP is linked and unlinked to the neutral site, respectively. Importantly, the term accounting for a linked MSP shows that σ/rbp strongly determines 𝔼[TB], although s, ν, and rbp also enter Eq. 2 independently. Indeed, given rbp and in the parameter range where Eq. 2 is a good approximation (i.e., for νrbp/s,m,τ), the effect of selection on 𝔼[TB] is entirely captured by σ (SI Appendix, Fig. S4). For details and other demographic scenarios, see SI Appendix, part 1. In this simplified model, the effect of all other MSPs is absorbed by m as a genome-wide reduction in gene flow that is independent of recombination.
In practice, we want to explicitly account for all MSPs possibly present in the genome, as well as for the average physical chromosome length. Finding 𝔼[TB] in this more realistic setting amounts to averaging over all possible numbers and genomic locations of the MSPs. We wrote a C++ program to do this integration numerically (SI Appendix, part 1). The result agrees well with individual-based forward simulations (Fig. 1B and SI Appendix, Fig. S5). As with a single MSP, if rbp is given, 𝔼[TB] depends on s and ν effectively only through the selection density σ (Fig. 1C). In fact, returning to the idealizing assumption of a global linear relationship between physical and genetic map distance (i.e., rf), we show that this dependence holds exactly (SI Appendix, Eq. S1.68). This finding corroborates σ as a key parameter and natural metric to quantify genome-wide divergent selection in the face of gene flow.

Application to Mimulus guttatus

We developed an inference procedure based on our theory and applied it to two datasets from the predominantly outcrossing yellow monkeyflower (Mimulus guttatus), an important model system for speciation and local adaptation (22). For both datasets, we fit the model with multiple MSPs to the empirical relationship between recombination rate (rbp, estimated from a linkage map) and putatively neutral between-population diversity (πB, estimated from fourfold degenerate coding sites), after correcting the latter for genomic correlates and divergence to the outgroup Mimulus dentilobus (SI Appendix, part 3). Our procedure computes the sum of squared deviations (SSD) across genomic windows between these observed values of πB and those predicted by our model, given the estimate of rbp for each window and a set of parameter values. Minimizing the SSD over a large grid of parameter values, we obtained point estimates for the selection density (σ), baseline migration rate (m), and duration of the MS phase (τ). We estimated 95% nonparametric confidence intervals (CIs) for the parameters by doing a block-bootstrap over genomic windows (SI Appendix, part 3). For both datasets, we explored two alternative demographic scenarios, but focus here on the one that provided more plausible parameter estimates and tighter 95% CIs. Unless otherwise stated, we only report results obtained with genomic windows of 500 kb because results for windows of 100 and 1,000 kb were very similar (SI Appendix, part 4).

Accounting for Background Selection.

In M. guttatus, pericentromeric regions are gene-poor and characterized by low recombination rates, which results in a genome-wide positive correlation between gene density and recombination rate (14) (SI Appendix, Fig. S13). Such a correlation could attenuate or even reverse the positive correlation between diversity and recombination rate otherwise expected under BGS and other modes of selection at linked sites, because the strength of such selection is, in turn, expected to be positively correlated with gene density (18). This effect could create a false signal of selection against gene flow, because the increased coalescence rate within populations in gene-dense regions could produce a negative correlation between rbp and πB. Comparing tests of a partial correlation between rbp and diversity with and without gene density as a covariate, we found that this effect might be present in our first dataset (SI Appendix, part 4 and Dataset S4). Therefore, we first fit a BGS model to the genetic diversity within source populations by allowing the effective population size (N2) to vary as a function of gene density and rbp. We then incorporated BGS into our migration-selection inference procedure, using these predicted relationships between N2, gene density, and rbp (SI Appendix, part 3). This procedure filters out the effect of BGS because, with unidirectional gene flow, BGS in the source, but not the focal population, may affect πB (Eq. 2).

Adaptive Divergence Maintained in the Face of Gene Flow.

M. guttatus can be found growing on serpentine soils throughout its range (ref. 25, p. 4). Although the mechanism and molecular basis of this adaptation are unresolved (26), strong differences in survival on serpentine soil exist between serpentine and nonserpentine ecotypes (25). To see whether there was a population-genomic signal of local adaptation, we used whole-genome pooled-by-population sequencing of 324 individuals collected from two pairs of geographically close populations growing on and off serpentine soil in California (the serpentine dataset; Fig. 2A and SI Appendix, part 2). We inferred the strength of selection in serpentine populations [McLaughlin Reserve (REM) and Red Hills Area (SLP)] against maladaptive immigrant alleles from the geographically proximate off-serpentine population [Soda Canyon, Napa (SOD) and Tulloch Reservoir (TUL), respectively], using the latter in each pair as a proxy for the source of gene flow. These pairs of geographically close serpentine×off-serpentine populations are genetically less diverged than any other population pair (Fig. 1A and SI Appendix, Fig. S10). Because we observed a strong signal of BGS in all populations (SI Appendix, part 4), we corrected for this signal when fitting our migration–selection model to the data (SI Appendix, part 3). We found that the conditional surface of the SSD (holding m and τ at their point estimates) showed a pronounced ridge for s and ν, with the 95% confidence hull falling along this ridge (Fig. 2B). With parameters on a log10 scale, the slope of this ridge is 1, nicely confirming our theoretical result that s and ν should be estimated jointly as their product, the selection density σ. We therefore adjusted our inference procedure to jointly infer m, τ, and σ, instead of m, τ, s, and ν (SI Appendix, part 3). This adjustment resulted in profile SSD surfaces for σ and m with a unique peak and tight confidence hulls (Fig. 2C).
Fig. 2.
Geographic context of serpentine dataset and quasi-likelihood surfaces. (A) Sampling sites in California (modified from ref. 23 with permission from Taylor and Francis Ltd.), and unrooted population phylogeny based on linearized genetic divergence (24). (B) The negative SSD (SSD) for the selection coefficient s and the genomic density ν of MSPs, conditional on point estimates m^5.6×104 and τ^5×107. The ridge with slope 1 confirms the compound parameter selection density, σ=sν. A cross denotes the point estimate and black hulls the 95% bootstrap confidence area. (C) Joint profile surface of the SSD for the baseline migration rate m and the selection density σ, maximized over τ. Results are shown for the population pair REM×SOD under the (MS)M scenario with genomic windows of 500 kb.
For both serpentine×off-serpentine pairs, we found a strong genome-wide signal of divergent selection against gene flow, with point estimates for σ of 8.3×104 and 4.8×104 per megabase (Mb) in REM×SOD and SLP×TUL, respectively, and tight 95% CIs (Fig. 3 A and C and Dataset S5). Given an assembled genome size of 320 Mb for M. guttatus, this selection density would, for instance, be consistent with 300 MSPs, each with a selection coefficient on the order of 104 to 103. The impact of this selection on genome-wide levels of polymorphism is reflected in an increase in the effective migration rate (me) with higher recombination rate (Fig. 3 B and D). The 95% CI of the relative difference (δme) between the maximum and minimum of me/m clearly excludes 0 (Fig. 3 B and D and SI Appendix, part 3), indicating a partial shutdown of gene flow due to selection. According to our estimates of m, selection maintains this divergence against a baseline rate of gene flow of 6.6×106 in REM×SOD and 3.5×106 in SLP×TUL (Fig. 3 B and C). Given the estimated effective population sizes of REM and SLP (SI Appendix, part 2), these rates of gene flow imply 3.8 and 2.1 diploid immigrants per generation, respectively.
Fig. 3.
Parameter estimates and model fit for the serpentine dataset. (A and C) Profile curves of the quasi-likelihood (SSD) for each parameter, maximizing over the two remaining parameters, for the serpentine×off-serpentine comparisons REM×SOD (A) and SLP × TUL (C) (Fig. 2). Vertical red and black dashed lines indicate the point estimate and 95% bootstrap CIs, respectively. (B and D) Raw data (blue dots) and model fit (red curve) with 95% CI (gray shading). The corresponding ratio of the effective to the baseline migration rate is shown on the right (red shading: 95% CI). The 95% CI of the distribution of the relative difference between the maximum and minimum me across all bootstrap samples, δme, is given on top. Other details are as in Fig. 2 B and C. For other population pairs, see SI Appendix, Fig. S26. Between-popul., between population.
We had little power to infer precise point estimates for τ, but lower bounds of the 95% CIs were 10 Mya. It seems unlikely that the two ecotypes persisted for so long, and so our parameter estimates should be interpreted as a long-term average over a potentially more complex scenario.
To assess whether the selection against gene flow we found is specific to serpentine×off-serpentine comparisons (REM×SOD, SLP×TUL), we also fit our model for the two long-distance off-serpentine×off-serpentine configurations (SOD×TUL, TUL×SOD) and the long-distance serpentine×off-serpentine pairs (REM×TUL, SLP×SOD). Interestingly, we inferred selection densities, durations of the MS phase, and migration rates on the same order as those estimated for the short-distance serpentine×off-serpentine comparisons (SI Appendix, Fig. S26 and Dataset S5). The signal we detected may therefore have little to do with local adaptation to serpentine per se, and not be specific to the history of particular pairs of populations. This interpretation is corroborated by the fact that, when pooling all nonfocal populations to a joint source of gene flow, we observed a similar, if not even stronger, signal of selection against migrants (SI Appendix, Fig. S27 and Dataset S5). Given the long time τ over which this selection appears to have acted, our estimates may reflect adaptive divergence between M. guttatus populations in response to locally varying conditions other than serpentine soil (e.g., refs. 2729). Our results could also imply that adaptation to serpentine has a simple genetic basis, because our approach only has power to detect a signal that is due to polygenic divergent selection acting across the entire genome.

Persistence of Species Barrier to M. nasutus.

Where M. guttatus has come into secondary contact with M. nasutus, a selfing sister species, hybridization occurs, despite strong reproductive barriers (30). A previous genome-wide analysis identified large genomic blocks of recent introgression from M. nasutus into sympatric M. guttatus populations (14). By using 100-kb genomic windows, this previous study also found a negative correlation between absolute divergence (πB=πGut×Nas) and recombination rate (rbp) in sympatric, but not allopatric, comparisons, as would be expected if there were selection against hybrids. Reanalyzing these data (the GutNas dataset; SI Appendix, part 2), we replicate this pattern of correlation. However, if we included gene density as a covariate, all previously negative partial correlations between rbp and πB became nonsignificant (SI Appendix, Fig. S21A). This observation might indicate that the positive correlation between recombination and gene density could have camouflaged an underlying signal of BGS in the source population, as was the case with the serpentine dataset above. To test this hypothesis, we fit a model of BGS in M. nasutus, but found no evidence for BGS (SI Appendix, part 4) (cf. ref. 14).
Applying our method to 100-kb windows, we indeed found a significant signal of selection against hybrids for one sympatric pair (CAC×Nas; σ^7.4×104/Mb), yet no signal in the other (DPR×Nas). We also inferred significant selection against gene flow in one of the allopatric comparisons (SLP×Nas; σ^1.3×104/Mb). The signal of selection in SLP×Nas could be due to the fact that, although allopatric, SLP is geographically close to M. nasutus populations (14). We might therefore be detecting selection against ongoing gene flow over a longer distance, or against past gene flow that has stopped only recently. Because levels of recent introgression are much lower in SLP than in the sympatric populations (AHQ and CAC) (14), the second explanation is more plausible. Indeed, repeating our analyses with blocks of recent introgression excluded, we found that the signal of selection against hybrids remained for SLP×Nas, but disappeared for sympatric comparisons (SI Appendix, Figs. S21B and S30).
Our estimates of m for CAC and SLP imply that selection maintains the species barrier against a baseline migration rate of 106 (i.e., 1.0 and 0.7 diploid introgressing genomes per generation, respectively) (SI Appendix, Fig. S28 and Dataset S6). With blocks of recent introgression excluded, our point estimate of m obtained with 100-kb windows dropped by a factor of 2.8 for CAC×Nas (Dataset S6), consistent with the removal of a substantial part of recently introgressed DNA. In contrast to the serpentine dataset, our results for the GutNas dataset were sensitive to the choice of window size. For 500- and 1,000-kb windows, the uncertainty in parameter estimates was higher (SI Appendix, Figs. S29 and S31 and Dataset S6).
With 100-kb windows and blocks of recent introgression included, lower 95%-CI limits for τ were all above 250 kya. Point estimates were between 550 kya (AHQ×Nas) and 1.6 Mya (DPR×Nas) (Dataset S6). These estimates are somewhat above a previous estimate of 196 kya for the divergence time between M. guttatus and M. nasutus (14). Our older estimates of τ are compatible with divergent selection acting already in the ancestral, geographically structured M. guttatus clade (14).

Discussion

The genomes of incompletely isolated species and locally adapted populations have long been thought of as mosaics of regions with high and low divergence (31, 32). This pattern is due in part to variation in effective gene flow along the genome, created by an interaction of divergent selection and recombination (33, 34). The recent explosion of genome-wide DNA sequencing data allows us to directly observe this mosaic and has spurred theoretical and empirical studies aiming to better understand the mechanisms underlying local adaptation and speciation (e.g., refs. 3538). However, an explicit, model-based framework linking observed genome-wide patterns of divergence with the underlying mechanism has hitherto been missing.
Here, we developed such a framework by merging the concept of effective migration rate with coalescence theory. We showed that a genome-wide negative correlation of between-population diversity with recombination rate (14, 17) can be described by the compound parameter “selection density,” such that very different genomic mosaic patterns are compatible with the same aggregate effect of divergent selection and gene flow: A large number of weak genetic barriers to gene flow (MSPs) are equivalent to a much smaller number of strong barriers. Our approach is most sensitive to polygenic selection and therefore complements existing genome scans for empirical outliers of population divergence (3942), which tend to identify only strong barriers to gene flow. It also provides a better null model for such genome scans, because outliers could be judged against the appropriate background level of divergence.
Our approach is inspired by earlier work exploiting the genome-wide relationship between recombination rate and genetic diversity within a population for quantitative inference about genetic hitchhiking (43, 44) and BGS (45, 46). In fact, we have used an established model of BGS (47, 48) to filter out any confounding effect of BGS and gene density in a first step, before fitting our model of divergent selection against gene flow to the relationship between recombination rate and diversity between populations.
We have assumed that MSPs occur at a constant rate ν along the genome. This assumption could be relaxed by making ν depend on the functional annotation of genomes (e.g., exon coordinates), which might allow ν and s to be estimated separately (49). We explored this solution heuristically for the serpentine dataset by setting ν proportional to the local density of exonic sites. Point estimates of σ were on the same order as before (1010 to 108.25). However, the 95% CIs became much wider, and the variance explained (R2) dropped from 15 to 5%. This reduced goodness of fit might suggest that selection against gene flow is not acting exclusively on coding (exonic) variation.
Our model currently does not account for the clustering of locally adaptive mutations arising in tight linkage to previously established MSPs and the synergistic sheltering effect among MSPs that protects them from being swamped by gene flow (20, 50). If accounted for, this clustering would lead to an even more pronounced uptick of between-population diversity in regions of low recombination. Therefore, one might be able to use deviations from our current model in regions of low recombination as a way of detecting the presence of clustering in empirical data. At the very least, our parameter estimates would indicate whether and in what genomic regions one should expect clustering of MSPs to have evolved.
An inherent limitation of our approach is that enough time must have passed for between-population divergence to accumulate. Otherwise, there is no power to detect variation in divergence among genomic regions. This limitation constrains the temporal resolution of our method, in particular if the duration of the MS phase is short, or if strong reproductive isolation evolved so quickly that gene flow was completely and rapidly reduced across the entire genome. Another potential limitation is a relatively low resolution to infer the duration of the MS phase. A genome-wide negative correlation of recombination rate with between-population diversity will persist for a long time, even after gene flow has come to a complete halt, because subsequent neutral divergence will just add uniformly to the existing pattern. Our inference approach should therefore still provide good estimates of the strength of selection and gene flow, even after speciation has completed, as long as these estimates are interpreted as averages over the inferred time τ. In this sense, our approach is likely robust to the specifics of the most recent demographic history of the populations or species of interest. To better resolve the timing of events, we suggest using the additional information contained in the entire distribution of pairwise coalescence times (and pairwise sequence differences), rather than relying on their mean, as we currently do.
The opposing roles of gene flow and selection in speciation and local adaptation have a long and contentious history in evolutionary biology and population-genetics theory (3, 51). We anticipate that the type of genome-wide quantitative inference developed here, applied to the growing amount of whole-genome polymorphism and recombination data, will help to resolve how gene flow is constraining divergent selection.

Materials and Methods

In Eq. 2, D=Ei[(1gf)mτ]Ei[(1g)mτ] and F=Ei[gmτ]Ei[gfmτ]+Ei[νkf]Ei[νk], where Ei[z]=zet/tdt is the exponential integral. Here, gf=[1+s/rf]1 and g=[1+s/(krbp)]1 are the contributions to the reduction in gene flow if the MSP is unlinked (k1=rf/rbp) or fully linked (k1=k, 0<k1/[rbpτ]); k is a small positive lower limit for the physical distance to the MSP. For details of our model, theory, and simulations, see SI Appendix, part 1. Statistical data analyses, bias corrections, and the inference procedure are described in SI Appendix, part 3. The Mimulus datasets (sampling design, DNA sequencing, and quality filtering) and the linkage map are discussed in SI Appendix, part 2. For complementary results, including tests of partial correlation between diversity and recombination rate, as well as the inference of BGS, see SI Appendix, part 4.

Acknowledgments

We thank Ben Blackman for help with collections; and Yaniv Brandvain, Lex Flagel, Amanda Kenney, Andrea Sweigart, Kevin Wright, Chenling Xu, and members of the G.C., Ross-Ibarra, and Schmitt laboratories at University of California, Davis for discussions and comments that improved our study. This work was supported by National Institute of General Medical Sciences of the National Institutes of Health Grants R01-GM083098 and R01-GM108779 (to G.C.); National Science Foundation (NSF) Grant 1354688 (to J.H.W.); NSF Grant 1353380 (to G.C.); NSF DDIG Grant 1110753 (to J.P.S.); and Swiss National Science Foundation Advanced Postdoc Mobility Fellowship P300P3_154613 (to S.A.). The computational results presented have been achieved in part by using the Vienna Scientific Cluster.

Supporting Information

Appendix (PDF)
Dataset_S01 (XLSX)
Dataset_S02 (XLSX)
Dataset_S03 (XLSX)
Dataset_S04 (XLSX)
Dataset_S05 (XLSX)
Dataset_S06 (XLSX)

References

1
JA Endler, Gene flow and population differentiation. Science 179, 243–250 (1973).
2
E Mayr, What is a species, and what is not? Philos Sci 63, 262–277 (1996).
3
JA Coyne, HA Orr Speciation (Sinauer Associates, 1st Ed, Sunderland, MA) Vol 37 (2004).
4
BA Payseur, LH Rieseberg, A genomic perspective on hybridization and speciation. Mol Ecol 25, 2337–2360 (2016).
5
LAF Frantz, O Madsen, HJ Megens, MAM Groenen, K Lohse, Testing models of speciation from genome sequences: Divergence and asymmetric admixture in island South-East Asian sus species during the plio-pleistocene climatic fluctuations. Mol Ecol 23, 5566–5574 (2014).
6
DA Filatov, OG Osborne, AS Papadopoulos, Demographic history of speciation in a Senecio altitudinal hybrid zone on Mt. Etna. Mol Ecol 25, 2467–2481 (2016).
7
A Kousathanas, et al., Likelihood-free inference in high-dimensional models. Genetics 203, 893–904 (2016).
8
MA Beaumont, RA Nichols, Evaluating loci for use in the genetic analysis of population structure. Proc R Soc Lond B Biol Sci 263, 1619–1626 (1996).
9
M Foll, O Gaggiotti, A genome-scan method to identify selected loci appropriate for both dominant and codominant markers: A Bayesian perspective. Genetics 180, 977–993 (2008).
10
NJ Nadeau, et al., Genomic islands of divergence in hybridizing Heliconius butterflies identified by large-scale targeted sequencing. Philos Trans R Soc Lond B Biol Sci 367, 343–353 (2012).
11
J Hermisson, Who believes in whole-genome scans for selection? Heredity (Edinb) 103, 283–284 (2009).
12
TE Cruickshank, MW Hahn, Reanalysis suggests that genomic islands of speciation are due to reduced diversity, not reduced gene flow. Mol Ecol 23, 3133–3157 (2014).
13
A Keinan, D Reich, Human population differentiation is strongly correlated with local recombination rate. PLoS Genet 6, e1000886 (2010).
14
Y Brandvain, AM Kenney, L Flagel, G Coop, AL Sweigart, Speciation and introgression between Mimulus nasutus and Mimulus guttatus. PLoS Genet 10, e1004410 (2014).
15
A Geraldes, P Basset, KL Smith, MW Nachman, Higher differentiation among subspecies of the house mouse (Mus musculus) in genomic regions with low recombination. Mol Ecol 20, 4722–4736 (2011).
16
B Charlesworth, Measures of divergence between populations and the effect of forces that reduce variability. Mol Biol Evol 15, 538–543 (1998).
17
MW Nachman, BA Payseur, Recombination rate variation and speciation: Theoretical predictions and empirical results from rabbits and mice. Philos Trans R Soc Lond B Biol Sci 367, 409–421 (2012).
18
T Slotte, The impact of linked selection on plant genomic variation. Brief Funct Genomic 13, 268–275 (2014).
19
BO Bengtsson, The flow of genes through a genetic barrier. Evolution: Essays in Honour of John Maynard Smith, eds PJ Greenwood, P Harvey, M Slatkin (Cambridge Univ Press, New York) Vol 1, 31–42 (1985).
20
S Aeschbacher, R Bürger, The effect of linkage on establishment and survival of locally beneficial mutations. Genetics 197, 317–336 (2014).
21
M Notohara, The coalescent and the genealogical process in geographically structured population. J Math Biol 29, 59–75 (1990).
22
CA Wu, et al., Mimulus is an emerging model system for the integration of ecological and genomic studies. Heredity (Edinb) 100, 220–230 (2008).
23
S Harrison, H Safford, J Wakabayashi, Does the age of exposure of serpentine explain variation in endemic plant diversity in California? Int Geol Rev 46, 235–242 (2004).
24
M Slatkin, A measure of population subdivision based on microsatellite allele frequencies. Genetics 139, 457–462 (1995).
25
J Selby, The genetic basis of local adaptation to serpentine soils in Mimulus guttatus. PhD dissertation (Duke University, Durham, NC). (2014).
26
E Palm, K Brady, E Van Volkenburgh, Serpentine tolerance in Mimulus guttatus does not rely on exclusion of magnesium. Funct Plant Biol 39, 679–688 (2012).
27
DB Lowry, MC Hall, DE Salt, JH Willis, Genetic and physiological basis of adaptive salt tolerance divergence between coastal and inland Mimulus Guttatus. New Phytol 183, 776–788 (2009).
28
NJ Kooyers, AB Greenlee, JM Colicchio, M Oh, BK Blackman, Replicate altitudinal clines reveal that evolutionary flexibility underlies adaptation to drought stress in annual Mimulus guttatus. New Phytol 206, 152–165 (2015).
29
KM Wright, et al., Adaptation to heavy-metal contaminated environments proceeds via selection on pre-existing genetic variation. bioRxiv, 10.1101/029900. (2015).
30
AM Kenney, AL Sweigart, Reproductive isolation and introgression between sympatric Mimulus species. Mol Ecol 25, 2499–2517 (2016).
31
LH Rieseberg, J Whitton, K Gardner, Hybrid zones and the genetic architecture of a barrier to gene flow between two sunflower species. Genetics 152, 713–727 (1999).
32
RG Harrison, EL Larson, Heterogeneous genome divergence, differential introgression, and the origin and structure of hybrid zones. Mol Ecol 25, 2454–2466 (2016).
33
N Barton, BO Bengtsson, The barrier to genetic exchange between hybridising populations. Heredity (Edinb) 57, 357–376 (1986).
34
RG Harrison, Pattern and process in a narrow hybrid zone. Heredity (Edinb) 56, 337–349 (1986).
35
M Kronforst, C Salazar, M Linares, L Gilbert, No genomic mosaicism in a putative hybrid butterfly species. Proc R Soc Lond B Biol Sci 274, 1255–1264 (2007).
36
R Bürger, A Akerman, The effects of linkage and gene flow on local adaptation: A two-locus continent–island model. Theor Popul Biol 80, 272–288 (2011).
37
S Via, G Conte, C Mason-Foley, K Mills, Localizing FST outliers on a QTL map reveals evidence for large genomic regions of reduced gene exchange during speciation-with-gene-flow. Mol Ecol 21, 5546–5560 (2012).
38
J Hemmer-Hansen, et al., A genomic island linked to ecotype divergence in Atlantic cod. Mol Ecol 22, 2653–2667 (2013).
39
MA Beaumont, DJ Balding, Identifying adaptive genetic divergence among populations from genome scans. Mol Ecol 13, 969–980 (2004).
40
JL Strasburg, et al., What can patterns of differentiation across plant genomes tell us about adaptation and speciation? Philos Trans R Soc Lond B Biol Sci 367, 364–373 (2012).
41
KE Lotterhos, MC Whitlock, The relative power of genome scans to detect local adaptation depends on sampling design and statistical method. Mol Ecol 24, 1031–1046 (2015).
42
RJ Haasl, BA Payseur, Fifteen years of genomewide scans for selection: Trends, lessons and unaddressed genetic sources of complication. Mol Ecol 25, 5–23 (2016).
43
DJ Begun, CF Aquadro, Levels of naturally occurring DNA polymorphism correlate with recombination rates in D. melanogaster. Nature 356, 519–520 (1992).
44
TH Wiehe, W Stephan, Analysis of a genetic hitchhiking model, and its application to DNA polymorphism data from Drosophila melanogaster. Mol Biol Evol 10, 842–854 (1993).
45
M Nordborg, B Charlesworth, D Charlesworth, The effect of recombination on background selection. Genet Res 67, 159–174 (1996).
46
B Charlesworth, M Nordborg, D Charlesworth, The effects of local selection, balanced polymorphism and background selection on equilibrium patterns of genetic diversity in subdivided populations. Genet Res 70, 155–174 (1997).
47
RR Hudson, NL Kaplan, Deleterious background selection with recombination. Genetics 141, 1605–1617 (1995).
48
RB Corbett-Detig, DL Hartl, TB Sackton, Natural selection constrains neutral diversity across a wide range of species. PLoS Biol 13, e1002112 (2015).
49
I Jurić, S Aeschbacher, G Coop, The strength of selection against Neanderthal introgression. PLoS Genet 12, e1006340 (2016).
50
S Yeaman, MC Whitlock, The genetic architecture of adaptation under migration-selection balance. Evolution 65, 1897–1911 (2011).
51
M Turelli, NH Barton, JA Coyne, Theory and speciation. Trends Ecol Evol 16, 330–343 (2001).

Information & Authors

Information

Published in

The cover image for PNAS Vol.114; No.27
Proceedings of the National Academy of Sciences
Vol. 114 | No. 27
July 3, 2017
PubMed: 28634295

Classifications

Submission history

Published online: June 20, 2017
Published in issue: July 3, 2017

Keywords

  1. speciation with gene flow
  2. local adaptation
  3. recombination
  4. divergence
  5. Mimulus

Acknowledgments

We thank Ben Blackman for help with collections; and Yaniv Brandvain, Lex Flagel, Amanda Kenney, Andrea Sweigart, Kevin Wright, Chenling Xu, and members of the G.C., Ross-Ibarra, and Schmitt laboratories at University of California, Davis for discussions and comments that improved our study. This work was supported by National Institute of General Medical Sciences of the National Institutes of Health Grants R01-GM083098 and R01-GM108779 (to G.C.); National Science Foundation (NSF) Grant 1354688 (to J.H.W.); NSF Grant 1353380 (to G.C.); NSF DDIG Grant 1110753 (to J.P.S.); and Swiss National Science Foundation Advanced Postdoc Mobility Fellowship P300P3_154613 (to S.A.). The computational results presented have been achieved in part by using the Vienna Scientific Cluster.

Notes

This article is a PNAS Direct Submission.

Authors

Affiliations

Department of Evolution and Ecology, University of California, Davis, CA 95616;
Institute of Ecology and Evolution, University of Bern, 3012 Bern, Switzerland;
Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland;
Jessica P. Selby
Department of Biology, Duke University, Durham, NC 27708
John H. Willis
Department of Biology, Duke University, Durham, NC 27708
Graham Coop
Department of Evolution and Ecology, University of California, Davis, CA 95616;

Notes

1
To whom correspondence should be addressed. Email: [email protected].
Author contributions: S.A. and G.C. designed research; S.A. performed research; S.A. analyzed data; J.P.S. and J.H.W. designed the sampling; J.P.S. processed samples; S.A. and J.P.S. performed bioinformatic analyses; and S.A., J.P.S., J.H.W., and G.C. wrote the paper.

Competing Interests

The authors declare no conflict of interest.

Metrics & Citations

Metrics

Note: The article usage is presented with a three- to four-day delay and will update daily once available. Due to ths delay, usage data will not appear immediately following publication. Citation information is sourced from Crossref Cited-by service.


Citation statements




Altmetrics

Citations

Export the article citation data by selecting a format from the list below and clicking Export.

Cited by

    Loading...

    View Options

    View options

    PDF format

    Download this article as a PDF file

    DOWNLOAD PDF

    Login options

    Check if you have access through your login credentials or your institution to get full access on this article.

    Personal login Institutional Login

    Recommend to a librarian

    Recommend PNAS to a Librarian

    Purchase options

    Purchase this article to access the full text.

    Single Article Purchase

    Population-genomic inference of the strength and timing of selection against gene flow
    Proceedings of the National Academy of Sciences
    • Vol. 114
    • No. 27
    • pp. 6873-E5485

    Figures

    Tables

    Media

    Share

    Share

    Share article link

    Share on social media