The limits of long-term selection against Neandertal introgression

Several studies have suggested that introgressed Neandertal DNA was subjected to negative selection in modern humans due to deleterious alleles that had accumulated in the Neandertals after they split from the modern human lineage. A striking observation in support of this is an apparent monotonic decline in Neandertal ancestry observed in modern humans in Europe over the past 45 thousand years. Here we show that this apparent decline is an artifact caused by gene flow between West Eurasians and Africans, which is not taken into account by statistics previously used to estimate Neandertal ancestry. When applying a more robust statistic that takes advantage of two high-coverage Neandertal genomes, we find no evidence for a change in Neandertal ancestry in Western Europe over the past 45 thousand years. We use whole-genome simulations of selection and introgression to investigate a wide range of model parameters, and find that negative selection is not expected to cause a significant long-term decline in genome-wide Neandertal ancestry. Nevertheless, these models recapitulate previously observed signals of selection against Neandertal alleles, in particular a depletion of Neandertal ancestry in conserved genomic regions that are likely to be of functional importance. Thus, we find that negative selection against Neandertal ancestry has not played as strong a role in recent human evolution as had previously been assumed.


Introduction
Interbreeding between Neandertals and modern humans approximately 55,000 years ago has resulted in all present-day non-Africans inheriting at least 1-2% of their genomes from Neandertal ancestors (1,2). There is significant heterogeneity in the distribution of this Neandertal DNA across the genomes of present-day people (3,4), including a reduction in Neandertal alleles in functionally constrained genomic regions (3). This has been interpreted as evidence that some Neandertal alleles were deleterious for modern humans and were subject to negative selection following introgression (3,5). Several studies have suggested that low effective population size (N e ) in Neandertals led to decreased efficacy of purifying selection and accumulation of weakly deleterious variants.
Following introgression, these deleterious alleles, along with linked neutral Neandertal alleles, would have been subjected to more efficient purifying selection in the modern human population (6,7).
In apparent agreement with this hypothesis, a study of Neandertal ancestry in a set of anatomically modern humans from Upper-Paleolithic Europe used two independent statistics to conclude that the amount of Neandertal DNA in modern human genomes decreased monotonically over the last 45 thousand years (8). This decline was interpreted as direct evidence for continuous negative selection against Neandertal alleles in modern humans (8)(9)(10)(11). However, it was not formally shown that selection on deleterious introgressed variants could produce a decline in Neandertal ancestry of the observed magnitude. Nevertheless, the decreasing Neandertal ancestry in modern humans over time, together with the suggestion of a higher burden of deleterious alleles in Neandertals are now commonly invoked to explain the fate of Neandertal ancestry in modern humans (9)(10)(11)(12).
Here, we re-examine estimates of Neandertal ancestry in ancient and present-day modern humans, taking advantage of a second high-coverage Neandertal genome that recently became available (13). This allows us to avoid some key assumptions about modern human demography that were made in previous studies. Our analysis shows that the Neandertal ancestry proportion in non-Africans has not decreased significantly through the last 45,000 years. We then compare this result to realistic, genome-scale simulations, and confirm that under a model of weak selection against Neandertal alleles, after an initial sharp decrease in the Neandertal ancestry in modern humans, an indefinite period during which this ancestry remains constant in the population is expected.

Previous Neandertal ancestry estimate
A number of methods have been developed to quantify Neandertal ancestry in modern human genomes (14). Among the most widely used is the f 4 -ratio statistic, which measures the fraction of drift shared with one of two parental lineages to determine the proportion of ancestry α contributed by that lineage (Fig. 1C and 1D) (15,16). Although it has been used to draw inferences about gene-flow between archaic and modern human populations, f 4 -ratio statistics are known to be sensitive to violations of the underlying population model (15). Estimating α, the proportion of ancestry contributed by lineage A to X, requires a sister lineage B to lineage A which does not share drift with X after separation of B from A (Fig. 1C, 1D). Fu et al. used an f 4 -ratio statistic to infer the contribution from an archaic lineage by first estimating the proportion of East African ancestry in a non-African individual X, under the assumption that Central and West Africans (A) are an outgroup to the East African lineage (B) and to the modern human ancestry in non-Africans (8). The proportion of Neandertal ancestry was then calculated simply as 1 -α, under the assumption that all ancestry that is not of East African origin must come from an archaic lineage (Fig. 1C). We refer to this statistic as an "indirect f 4ratio".
Given the sensitivity of the f 4 -ratio method to violations of the underlying population models (15), we explored the validity of assumptions on which this calculation was based.
In addition to the topology of the demographic tree, the indirect f 4 -ratio assumes that the relationship between Africans and West Eurasians has remained constant over time (8).
To test for gene-flow between Africans and West Eurasians over time, we calculated D statistics in the form of D(Ust'-Ishim, X; African population, Chimp), which tests for changes in affinity (i.e. the number of shared derived alleles) between a series of West Eurasians (X) and Africans with respect to Ust'-Ishim -a 45,000 year-old Eurasian individual who is the oldest modern human genome in this dataset. In the absence of gene-flow between a West Eurasian lineage X and Africans in this time-frame, the value of this D statistic will not be significantly different from 0 for any of the West Eurasians regardless of their age. However, in violation of the assumption of the indirect f 4 -ratio, we find that starting from as early as 20 thousand years ago (kya) this D statistic becomes increasingly negative, consistent with gene-flow between West Eurasians and Africans during this time (Fig. 2). To rule out potential technical issues, we repeated this analysis substituting Oceanians for Africans and find no changes in affinity over time (Fig. 2). We note that while the increase in affinity between African and West Eurasian populations shows that the assumption of a constant relationship between these populations over time is incorrect, it is not informative about the direction of the gene-flow.

A robust statistic to estimate Neandertal ancestry
The recent availability of a second high-coverage Neandertal genome allows us to estimate Neandertal ancestry using two Neandertals -an individual from the Altai Mountains, the so-called "Altai Neandertal" (17) and an individual from Vindija Cave in Croatia, the so-called "Vindija Neandertal" (13). Specifically, we estimate the proportion of ancestry coming from the Vindija lineage into a modern human (X) using the Altai Neandertal as a second Neandertal in an f 4 -ratio calculated as f 4 (Altai, Chimp; X, African) / f 4 (Altai, Chimp; Vindija, African) (Fig. 1D), which we refer to as a "direct f 4 -ratio". Note that unlike the indirect f 4 -ratio, the f 4 -ratio in this formulation does not make assumptions about deep relationships between modern human populations (Fig. 1D). Instead, it assumes that any Neandertal population that contributed ancestry to West Eurasians formed a clade with the Vindija Neandertal population. Recent analyses showed that this is the case for all non-African populations studied to date, including the ancient modern humans included in this study (13,18). Crucially, when we use the direct f 4  This result has several implications for our understanding of the fate of Neandertal ancestry in modern humans. First, it constrains the timescale during which selection could have significantly affected the average genome-wide Neandertal ancestry in modern humans, an issue addressed below in more detail. Second, it has consequences for the so-called "dilution" hypothesis, which suggests that the larger proportion of Neandertal ancestry in East Eurasians compared to West Eurasians can be explained by West Eurasian admixture with a "Basal Eurasian" population who may have carried less Neandertal ancestry than other non-Africans (19,20). We find that a statistic informative for Basal Eurasian ancestry (f 4 (West Eurasian W, East Asian X; Ust'-Ishim, Chimp)), is significantly negative in present-day West Eurasians (Fig. S1), implying that all West Eurasians have a contribution from a lineage that predates the split of East and West Eurasians. However, our finding that there is no significant decline in Neandertal ancestry in West Eurasians suggests that admixture with this population did not influence their Neandertal ancestry in a significant way (Fig. 1B). This is consistent with a previous analysis of an early European farmer from Germany, who was found to have carried a similar amount of Neandertal ancestry to present-day Europeans (~1.8%), despite having a significant proportion of her genome derived from Basal Eurasian ancestry (44%) (21).

Testing the robustness of Neandertal ancestry statistics
Gene-flow between West Eurasians and Africans is evident in our D statistic results (Fig.   2), and has also been detected by earlier studies of present-day Africans (22,23).
Crucially, such gene-flow violates the assumptions of the indirect f 4 -ratio, but it is not clear exactly how this statistic will be affected. Furthermore, the direction of gene-flow may be of interest, yet cannot be determined by the D-statistics presented above. Thus, we simulated several scenarios of migration between West Eurasians and Africans post-Neandertal introgression, and calculated both direct and indirect f4-ratios on these simulated sequences, along with the true Neandertal proportion (Fig. 3).
In short, we find that even moderate levels of gene flow from West Eurasians into Africans (0.0001 migrants per generation, starting at 20 kya) lead to mis-estimates of Neandertal ancestry when using the indirect f 4 -ratio statistic (Fig. 3C). Such scenarios result in depressed estimates of Neandertal ancestry, with this effect being more pronounced in simulated genomes sampled closer to the present day, incorrectly resulting in an apparent decline in Neandertal ancestry over time. This is qualitatively consistent with the apparent decline in Neandertal ancestry when using the indirect f 4ratio on real West Eurasian individuals, suggesting that the previous observations are artifacts produced by gene flow from West Eurasia into Africa.
In contrast, migration only from Africa to West Eurasia results in a true decline of Neandertal ancestry, due to its replacement by modern human alleles. In this scenario, the true decline is accurately estimated by both the indirect and direct f 4 -ratios (Fig. 3A).
In scenarios of bi-directional migration, this true decline is only accurately measured by the direct f 4 -ratio (Fig. 3B).
An independent statistic, using a different set of genomic sites in the same ancient individuals, has been used as a second line of evidence for an ongoing decrease in Neandertal ancestry (8). This statistic, which we refer to as the "admixture array statistic", measures the proportion of Neandertal-like alleles in a given sample at sites where present-day Yoruba individuals carry a nearly-fixed allele that differs from homozygous sites in the Altai Neandertal (24). Using the simulations discussed above, we selected genomic sites using the same conditioning. We then calculated the proportion of Neandertal-like alleles in each simulated West Eurasian genome. Much like the indirect f 4statistic, the admixture array statistic is affected by gene flow from West Eurasians into Africans, incorrectly inferring a decline of Neandertal ancestry (Fig. 3C).
We note that the direct f 4 -ratio is not immune to the effects of migration from West Eurasia to Africa. Specifically, the direct f 4 -ratio measures the amount of Neandertal ancestry in West Eurasians in excess of any Neandertal ancestry present in Africans.
Thus, it is likely that even our updated estimates of Neandertal ancestry are in fact underestimates.

Long-term dynamics of selection against introgressed DNA
Our observation that Neandertal ancestry levels did not significantly decrease from about 40,000 years ago until today is seemingly at odds with the hypothesis that lower effective population sizes in Neandertals led to an accumulation of deleterious alleles, which were then subjected to negative selection in modern humans (3,(8)(9)(10). To investigate the The depletion of Neandertal ancestry around functional genomic elements in modern human genomes has also been taken as evidence for selection against Neandertal introgressed DNA (3,8). We next examined the genomic distribution of Neandertal markers at different time-points in our simulations to determine whether our models can recapitulate these signals. In agreement with empirical results in present-day humans (3), we found a strong negative correlation between the proportion of Neandertal introgression surviving at a locus and distance to the nearest region under selection (Fig.   4C). Furthermore, we found that the strength of this correlation increases over time, with the bulk of these changes occurring between 10 and 400 generations post-admixture (mean Pearson's correlation coefficient ρ = 0.026, 0.8, 0.96 at generations 10, 400 and 2200, respectively (Fig. S9)). We note that this time period predates all existing ancient modern human sequences, frustrating any current comparison to empirical data. Despite no apparent change in average Neandertal ancestry proportion, we observe a smaller though still significant decrease in linked Neandertal ancestry during the time period for which modern human sequences exist (approximately 400-2200 generations postadmixture) (Fig. 4C, 4B). Indeed, by looking at the average per-generation changes in frequencies of simulated Neandertal mutations (that is, derivatives of allele frequencies in each generation), we observe the impact of negative selection on linked neutral Neandertal markers until at least ~700 generations post admixture (Fig. 4D), and find that it closely follows the pattern of frequency derivatives of introgressed deleterious mutations (Fig. 4D). After this period of gradual removal, selection against linked neutral variation slows down significantly as genome-wide Neandertal ancestry becomes largely unlinked from regions that are under negative selection (Fig. 4D). In contrast, the selected variants themselves are still removed, although at increasingly slower rates (Fig. 4D). Due to this slow rate, and the small contribution these alleles make to the overall levels of Neandertal ancestry, their continued removal has little impact on the slope of Neandertal ancestry over time.

Conclusions
Our re-evaluation of Neandertal ancestry in modern human genomes indicates that overall levels of Neandertal ancestry in West Eurasia have not significantly decreased over the past 45 thousand years, and that previous observations of continuous Neandertal ancestry decline were likely an artifact of unaccounted-for gene-flow from West Eurasia into Africa. Furthermore, we find that negative selection against introgression is expected to have the strongest impact on genome-wide Neandertal ancestry during the first few hundred generations, and becomes negligible in the time frame for which ancient samples are currently available.
Our findings can be extrapolated to other cases where one species or population contributes a fraction of ancestry to another species or population, a frequent occurrence in nature (5,(25)(26)(27)(28). Even in cases where the introgressing population carries a high burden of deleterious mutations, negative selection is not expected to result in an extended decrease in the overall genome-wide ancestry contributed by that population.
Therefore, any long-term shifts in overall ancestry proportions over time in such situations are likely to be the result of forces other than negative selection, for example admixture with one or more other populations.

Source code and computational notebooks
Complete source code for data processing and simulation pipelines, as well as R and Python Jupyter notebooks with all analyses, can be downloaded from the project repository on GitHub: https://www.github.com/bodkan/nea-over-time.

Data processing
SNP data captured at ~2.2 million loci (combination of SNP panels 1, 2, 3, described in Altai and Vindija Neandertal genotypes were converted from VCF to EIGENSTRAT format after filtering the data using the Map35_100% criteria described in (17). For comparisons with present-day populations, we used genotype calls published by the Simons Genome Diversity Project (SGDP) (29), and converted them to EIGENSTRAT format. All data were then combined into single EIGENSTRAT dataset using the "mergeit" command from the ADMIXTOOLS package (15). SNP data captured using the "archaic admixture array" (described as SNP panel 4 in (24)) published by Fu et al. (8) were also downloaded from the Reich lab website. To enrich for sites that truly originated in the Neandertals, we further restricted to sites homozygous in the Altai and Vindija Neandertal genomes, which results in a set of approximately 480 thousand sites carrying such nearly fixed Yoruba-Neandertal differences.

Admixture statistics
All D statistics, f 4 statistics and f 4 -ratio statistics were calculated on the merged 2.2 million loci EIGENSTRAT dataset using our R package admixr (available from https://www.github.com/bodkan/admixr, release v0.1) which utilizes the ADMIXTOOLS software suite for all underlying calculations (15).

Indirect f 4 -ratio Neandertal ancestry estimate
Indirect f 4 -ratio Neandertal ancestry estimates (Fig. 1A)  Archaics are the Altai Neandertal and the high-coverage Denisovan (30) individuals. This is the f 4 -ratio calculation used in the original Fu et al. study (8).

Direct f 4 -ratio Neandertal ancestry estimate
Direct f 4 -ratio estimates (Fig. 1B) were calculated on the same data as indirect f 4 -ratio estimates (see above), but using the following setup: f 4 (Altai, Chimp; X, African) / f 4 (Altai, Chimp; Vindija, African) for a combined set of African populations Yoruba, Dinka and Mbuti.

Admixture array proportion
Neandertal ancestry levels using the using the set of nearly-fixed African-Neandertal sites were calculated as a proportion of alleles in a test individual matching the allele seen in the Neandertals.

Affinity of EMH individuals towards present-day populations over time
We calculated a D statistic in the form D(Ust-Ishim, X; Y, Chimp), which tests for changes in the sharing of derived alleles between a series of West Eurasians (X) and population Y with respect to Ust'-Ishim (Fig. 2). Admixture between X and Y is expected to lead to an increase in the proportion of shared derived alleles. We set Y in our analyses to East, West and Central Africans and Oceanians, in which the value of the D statistic should not be significantly different from 0 if there is no admixture between X and Y.

Testing for the presence of Basal Eurasian ancestry
We used the statistic f 4 (West Eurasian W, East Asian or early hunter gatherer X; Ust'-Ishim, Chimp) to look for the presence of Basal Eurasian ancestry in a West Eurasian W ( Fig. S1) (19). This statistic tests if the data is consistent with a tree in which W and X lineages form a clade, which results in f 4 statistic not significantly different from 0.
Significantly negative values are evidence for an affinity between the W and X lineages, which has been most parsimoniously explained by W carrying some ancestry from a population that split off from the non-African lineage prior to the separation of Ust'-Ishim (19).

Simulations of selection
We used the simulation framework SLiM 2 (31) to build a realistic model of the human genome with empirical distributions of functional regions and selection coefficients, extending and generalizing a strategy previously applied by Harris and Nielsen (6). To In each simulation, we encoded those regions in a genomic structure in SLiM's Eidos programming language, maintaining the distances between them. In order to model the heterogeneity of recombination rate along a genome in our simulations, we used empirically estimated genetic distances between all simulated genomic features using a recombination map inferred by the HapMap project (http://hapmap.ncbi.nlm.nih.gov/downloads/recombination/2011-01_phaseII_B37/) (33).
To approximate a distribution of fitness effects (DFE) of new deleterious mutations, we used the DFE estimated from the frequency spectrum of human non-synonymous mutations, which represents a mixture of strongly, weakly and nearly-neutral mutations (34). The rate of accumulation of new mutations was set to 1x10 -8 per bp per generation.
The simulations themselves were performed in two steps (Fig. 4A), using a combination of human and Neandertal demographic models used in previous introgression studies (4,6).
In the first step, we simulated a simplified demography of modern humans and Neandertals prior to the introgression, starting with a burn-in period of 70,000 generations, to let the simulated genomes with mutations reach an equilibrium state (the length of this burn-in period was determined as 7 * ancestral human Ne, which was therefore set to a constant 10,000). The split of Neandertals and modern humans was set to 500,000 years ago, with N e of Neandertals and modern humans set to constant values of 1,000 and 10,000, respectively. This burn-in period was performed for each specific simulation scenario separately. At the end of the burn-in step, we simulated the split of African and non-African populations at 55 thousand years ago. Following the split, the non-African population experiences a bottleneck with N e = 1861 (as inferred by Gravel et al. (35)). All simulated individuals and accumulated mutations were saved to a population output file for use in the second step.
In the second step, we simulated a single pulse of admixture from Neandertals into the non-African population at a rate of 10%. To track Neandertal ancestry along simulated genomes through time, we placed 50,000 neutral Neandertal markers outside of any potentially functional sequence (which was determined as a union of all annotated Ensembl regions mentioned above) (Fig. 4A). The locations of these markers were randomly sampled from the set of nearly-fixed Yoruba-Neandertal differences present on the archaic admixture array (SNP panel 4 in (24)). Furthermore, to be able to track Neandertal ancestry within regions directly under negative selection, we placed additional set of fixed Neandertal markers within those regions (Fig. 4A).

Evaluating the effect of negative selection against introgression
All of the following analyses were performed on VCF outputs from 20 replicates of our  (x -y) / (a -b). This calculation was repeated for all 20 simulation replicates and mean frequency changes were plotted for each replicate separately (Fig. 4D).

Simulations of gene-flow between non-Africans and Africans
We simulated different scenarios of gene-flow between Africans and non-Africans after Neandertal introgression using the neutral coalescent programming framework msprime (37) (Fig. S2). We used the following demographic parameters: split of a chimpanzee lineage at 6 million years ago, split of Neandertals from anatomically modern humans at 500 kya, a split within Africa at 150 kya, and split of non-Africans from one of the two African lineages at 60 kya with a 5 ky long bottleneck of N e = 2000. We simulated a single 3% pulse of Neandertal admixture into a constant-size non-African population (N e = 10,000) at 55 kya. We sampled one chimpanzee chromosome, 4 Neandertal chromosomes sampled at 80 kya, single chromosomes from the non-African lineage sampled at time-points corresponding to dates of Upper-Paleolithic samples from our data, two sets of 50 chromosomes from the two present-day African populations and a set of 20 present-day non-African chromosomes. We simulated 500 Mb chromosomes using a mutation rate of 1x10 -8 mutations per bp per generation and a recombination rate of 1x10 -8 crossovers per bp per generation. We converted the sampled chromosomes into a table of all simulated SNPs (to represent genome-wide capture data similar to the set of 2.2 million sites from SNP panels 1, 2, 3 from (24)). For some analyses, we also generated a second set of SNPs by filtering only for sites carrying fixed African-Neandertal differences (to approximate the ascertainment of the archaic admixture array -SNP panel              Neandertal ancestry trajectories following entirely opposite patterns (6). Specifically, models with only recessive mutations lead to an initial increase of the Neandertal ancestry proportions due to heterosis (6). Due to these opposing effects of dominance, we investigated scenarios with different mixtures of dominance coefficients of deleterious mutations. We found that changing the ratios of recessive and additive mutations affects only the final baseline of Neandertal ancestry in the population, and does not lead to a steady decline in Neandertal ancestry over time.