New Research In
Physical Sciences
Social Sciences
Featured Portals
Articles by Topic
Biological Sciences
Featured Portals
Articles by Topic
- Agricultural Sciences
- Anthropology
- Applied Biological Sciences
- Biochemistry
- Biophysics and Computational Biology
- Cell Biology
- Developmental Biology
- Ecology
- Environmental Sciences
- Evolution
- Genetics
- Immunology and Inflammation
- Medical Sciences
- Microbiology
- Neuroscience
- Pharmacology
- Physiology
- Plant Biology
- Population Biology
- Psychological and Cognitive Sciences
- Sustainability Science
- Systems Biology
Early history of Neanderthals and Denisovans
Edited by Richard G. Klein, Stanford University, Stanford, CA, and approved July 7, 2017 (received for review April 18, 2017)
This article has a reply. Please see:

Significance
Neanderthals and Denisovans were human populations that separated from the modern lineage early in the Middle Pleistocene. Many modern humans carry DNA derived from these archaic populations by interbreeding during the Late Pleistocene. We develop a statistical method to study the early history of these archaic populations. We show that the archaic lineage was very small during the 10,000 y that followed its separation from the modern lineage. It then split into two regional populations, the Neanderthals and the Denisovans. The Neanderthal population grew large and separated into largely isolated local groups.
Abstract
Extensive DNA sequence data have made it possible to reconstruct human evolutionary history in unprecedented detail. We introduce a method to study the past several hundred thousand years. Our results show that (i) the Neanderthal–Denisovan lineage declined to a small size just after separating from the modern lineage, (ii) Neanderthals and Denisovans separated soon thereafter, and (iii) the subsequent Neanderthal population was large and deeply subdivided. They also (iv) support previous estimates of gene flow from Neanderthals into modern Eurasians. These results suggest an archaic human diaspora early in the Middle Pleistocene.
Around 600 kya, Europe was invaded by large-brained hominins using Acheulean stone tools (1, 2). They were probably African immigrants, because similar fossils and tools occur earlier in Africa. They have been called archaic Homo sapiens, Homo heidelbergensis, and early Neanderthals, yet they remain mysterious. They may have been ancestors of Neanderthals and modern humans (3), or ancestors of Neanderthals only (4, 5), or an evolutionary dead end. According to this last hypothesis, they were replaced later in the Middle Pleistocene by a wave of African immigrants that separated Neanderthals from modern humans and introduced the Levallois stone tool tradition to Europe (6, 7). To address this controversy, we introduce a statistical method and use it to study genetic data of Africans, Eurasians, Neanderthals, and Denisovans.
Our method extends an idea introduced by Reich et al. (8, 9). Their “ABBA-BABA” statistics infer admixture from the frequency with which derived alleles are shared by pairs of samples. As we have shown (10), these estimators have large biases when populations receive gene flow from more than one source. The magnitudes of these biases depend on the sizes and separation times of ancestral populations. Our method avoids bias by estimating these parameters simultaneously.
To accomplish this, our method uses an expanded dataset. ABBA-BABA statistics summarize allele sharing by pairs of samples. We extend this approach to include larger subsets, such as trios of samples, and to use all available subsets. This opens a rich and heretofore unused window into population history.
Nucleotide Site Patterns
Although our method can accommodate complex models, we work here with a four-population model of history (Fig. 1A), which has broad empirical support (11, 12). In this model, Neanderthals (
(A) Population tree representing an African population,
The gene tree describes how genes coalesce within the tree of populations. Fig. 1B illustrates one of many possible gene trees. Although closely linked nucleotide sites tend to share the same gene tree, this is not the case for sites farther apart on the chromosome, and any set of autosomal sequence data will encompass a multitude of gene trees.
The gene tree determines opportunities for allele sharing among samples. For example, a mutation on the solid red branch in Fig. 1B would be present in
Results
We studied site-pattern frequencies in four populations at a time: an African population (
One set of 10 site-pattern frequencies is shown in Fig. 2A. About 30% of the nucleotide sites in these data exhibit the
(A) Open circles show relative frequencies (horizontal axis) of nucleotide sites exhibiting each site pattern (vertical axis) in four populations:
As noted above, our model of history (Fig. 1A) excludes gene flow from Denisovans into Eurasians. This is not a limitation of our method; it is motivated by the structure of the datasets under study. To see why, consider Fig. 2B. Note first that
The analysis proceeds in two stages: one to discover dependencies among parameters and a second one imposing constraints to cope with these dependencies. In stage 1, we fit an unconstrained model to the observed data and also to 50 bootstrap replicates. With the data in Fig. 2A, stage 1 revealed strong dependencies among several parameters (Fig. S1). For example, there is a positive relationship between
Covariation of estimates of
Scatter-plot matrix showing associations among pairs of parameters. Each point represents a bootstrap replicate in the YRI.CEU data from Fig. 2A.
Such associations make estimation difficult, because points along the regression line have similar effects on the data. To reduce such issues, stage 2 of our analysis uses associations in the bootstrap data to impose constraints. Each constraint replaces one parameter with its regression on several others, as described in Section S1.4. Because this involves ignoring some of the sampling variation, we do not estimate confidence intervals for constrained parameters.
To calibrate the molecular clock, we use published estimates of
All four analyses—YRI.CEU, YRI.CHB, LWK.CEU, and LWK.CHB—yield similar results. Estimates of Neanderthal admixture (
Estimates of Neanderthal admixture (
All estimates of
Could these results be artifacts of a misspecified model? Our model (Fig. 1A) requires that
To test this “boundary-compression” hypothesis, we used our simulation program legosim, which is described in Section S1.5. We simulated 50 datasets under the model implied by one set of estimates and then estimated parameters from each simulated dataset. The resulting data (Fig. 6) show how our estimator behaves in the absence of specification error. Our simulation algorithm ignores linkage disequilibrium and may therefore underestimate the widths of sampling distributions. Nonetheless, these widths are similar to those of the confidence intervals in Figs. 4 and 5, suggesting that the bias in our simulations is small. Thus, it is interesting that the spreads of
Population size estimates. All point estimates are based on stage 2 of the analysis. Confidence interval for
These simulations also show that estimates of
Our base model (Fig. 1A) omits several forms of gene flow that are known or suspected, and these omissions may have introduced bias. We therefore fitted four alternative models, as described in Section S3. None of these explains the surprising features of our estimates. We have found no way to explain these features as artifacts of a misspecified model.
Our estimate of the Neanderthal–Denisovan separation time is surprisingly old. The most recent whole-genome estimate of this parameter is 381 kya (ref. 14, table S12.2), which corresponds to 502 kya or 17,318 generations under our molecular clock. To determine the cause of this inconsistency, we fitted a model in which
Poor fit of two constrained models. Horizontal axis shows deviation of fitted from observed site-pattern frequencies under two constraints:
Our estimate of
Our own date estimates inherit the uncertainty of the molecular clock. Using the YRI.CEU data, our point estimate of the Neanderthal–Denisovan separation time is 744 kya. Many authors prefer a higher mutation rate of
Discussion
These results contradict current views about Neanderthal population history. For example, Prüfer et al. (14) estimate that the Neanderthal population was very small—declining toward extinction. This view receives additional support from research showing elevated frequencies of nonsynonymous (and presumably deleterious) mutations among Neanderthals (22⇓–24). This abundance of deleterious alleles implies that drift was strong and thus that population size was small. Yet our estimate of Neanderthal population size is large—in the tens of thousands.
To reconcile these views, we suggest that the Neanderthal population consisted of many small subpopulations, which exchanged mates only rarely. In such a population, the effective size of the global population can be large, even if each local population is small (25). A sample from a single subpopulation would show a misleading signal of gradual population decline, even if the true population were constant (26). Furthermore, there is direct evidence of large genetic differences among Neanderthal populations (22, 27). Finally, the rich and widespread fossil record of Neanderthals is hard to reconcile with the view that their global population was tiny. We suggest that previous research has documented the small size of local Neanderthal populations, whereas our own findings document the large effective size of the metapopulation that contributed genes to modern humans.
This interpretation implies that at least some of the Neanderthals who contributed to the modern gene pool were distant relatives of the Altai Neanderthal. On the other hand, there is also evidence of gene flow from moderns into the Altai Neanderthal (28). This suggests contact between modern humans and at least two groups of Neanderthals: one that was ancestral to the Altai fossil and one or more others whose relationship to Altai was distant.
As discussed above, our results also disagree with previous estimates of the Neanderthal–Denisovan separation time. On the other hand, Meyer et al. (29) show that 430 ky-old fossils from Sima de los Huesos, Spain are more closely related to Neanderthals than to Denisovans. This implies an early separation of the two archaic lineages. Our own estimate—25,660 generations, or 744 ky—is earlier still. It is consistent with the results of Meyer et al. (29) but not with those of Prüfer et al. (14), as discussed above. The cause of this discrepancy is unclear. Prüfer et al. use the pairwise sequentially Markovian coalescent (PSMC) method (30), which may give biased estimates of separation times in subdivided populations (ref. 26, p. 6).
Our results shed light on the large-brained hominins who appear in Europe early in the Middle Pleistocene. Various authors have suggested that these were African immigrants (1, 2). This story is consistent with genetic estimates of the separation time of archaics and moderns (14). Our own results imply that, by the time these hominins show up in European archaeological sites, they had already separated from Denisovans. This agrees with Meyer et al. (29), who show that the hominins at Sima de los Huesos were genetically more similar to Neanderthals than to Denisovans. It also agrees with Hublin (4, 5), who argues that Neanderthal features emerged gradually in Europe, over an interval that began 500–600 kya.
We estimate a small effective size in the population ancestral to Neanderthals and Denisovans. The population may have been small throughout the interval between
Conclusions
It appears that Neanderthals and Denisovans separated only a few hundred generations after their ancestors left the modern lineage. During the intervening interval, the Neanderthal–Denisovan lineage was small. After separating from Denisovans, the Neanderthal population grew large and fragmented into largely isolated local groups. The Neanderthal metapopulation that contributed genes to modern humans was much larger than the local population of the Altai Neanderthal fossil.
This story is similar to that of modern Eurasians, who also separated from an African population and then experienced a population size bottleneck and split into regional populations. The modern Eurasian diaspora seems to have been foreshadowed by another one, which happened more than half a million years earlier.
Materials and Methods
Vcf files for archaic genomes were downloaded from cdna.eva.mpg.de/denisova/VCF and from cdna.eva.mpg.de/neandertal/altai/AltaiNeandertal/VCF. Ancestral-allele calls are from the Denisova genome.
We filter sites using the Map35_100% criteria (14). The minimum filtered site list was downloaded from bioinf.eva.mpg.de/altai_minimal_filters. We include only SNPs on chromosomes 1–22 that are biallelic across all samples and exclude sites in a CpG context, with systematic errors, or with missing data in any individual.
Statistical methods are described in Sections S1 and S2.
S1 Legofit, A Computer Package for Analysis of Nucleotide Site Patterns
We developed a package of computer programs called legofit, which is available free of charge at https://github.com/alanrogers/legofit. Documentation is at content.csbs.utah.edu/∼rogers/src/legofit/index.html.
S1.1 Site Patterns, Counts, and Their Expectations.
The solid black lines in Fig. 1B show the ancestry of four populations,
Given
At any given nucleotide position, the gene tree is a random variable, and so are the various branch lengths. Site pattern
For simple models such as the one in Fig. 1, it is possible to derive analytical formulas for
The input file of our programs legosim and legofit specifies the population tree and the location of genetic samples within it. It also specifies how population size varies throughout the tree and the times at which populations separate or introgress. These parameters fall into three categories: (i) Free parameters are estimated by legofit, (ii) fixed parameters have values that do not change, and (iii) constrained parameters are specified as known functions of one or more free parameters. These constraints make it possible to model relationships among variables that are implied either by theory or by analysis of variation among replicates generated by simulation or bootstrap.
We use computer simulations to estimate expected site-pattern counts. In each iteration of the simulation, the coalescent algorithm builds a gene genealogy analogous to the one in Fig. 1B. From this genealogy, we calculate the various branch lengths—
This approach simulates branch lengths but not mutations. For a given level of accuracy, it is orders of magnitude faster than programs that simulate both mutation and recombination. This speed makes it possible to deal with the entire suite of site patterns and with complex models involving tens of populations. We have validated it by comparison with theoretical results in models for which analytical theory is feasible (ref. 10, supplementary materials).
S1.2 Tabulating Site Patterns from Data.
Our analysis is in terms of site-pattern frequencies, which we tabulate from sequence data using our program tabpat. This tabulation does not require phased data. In introducing site patterns above, we assumed that one haploid genome is sampled from each population. Real samples are larger, and a given nucleotide site may contribute to several site patterns. The contribution to a given site pattern is the probability that a subsample, consisting of one haploid genome drawn at random from the larger sample of each population, would exhibit this site pattern. For example, consider a model with three populations,
Note that we have just redefined the
S1.3 Estimator.
To estimate parameters, we maximize the composite likelihood,
We use the method of differential evolution (DE) (32) to maximize
S1.4 Constraints.
There are strong dependencies among several of the parameters in our model. For example, Fig. S1 shows a tight relationship between estimates of
To solve this problem, we use legofit’s constraint mechanism. This allows us to define some parameters as polynomial functions of one or more free parameters. These polynomials are estimated by regression analysis across replicates generated by bootstrap or computer simulation.
This analysis proceeds in two stages. In stage 1, we define all parameters as free, except those used to calibrate the clock. This stage generates estimates from each replicate. We then study variation among replicates to identify associations among variables, such as those seen in Fig. S1. Several of the relationships in Fig. S1 are tight, suggesting that we need only bivariate regressions. This appearance, however, is misleading. Much of the scatter in these bivariate graphs is not noise; it is deterministic variation in other dimensions. For this reason, our regressions are usually multivariate.
Based on the results from stage 1, we identify a set of parameters that will be constrained in stage 2 of the analysis. For each constrained parameter, we begin by regressing against all free parameters and use residual dependence plots (33) to evaluate the fit. We then add terms to our model or remove them as needed. For each analysis in the main text, we defined three constrained parameters (
S1.5 Simulations.
Our simulation program, legosim, ignores linkage disequilibrium and makes use of the fact that a sum of Poisson random variables is itself Poisson. Consider the problem of simulating the number of instances of the
We take
S2 Calibrating the Molecular Clock
We assume generations of 29 y and a mutation rate,
To calibrate the molecular clock, we must specify the length of one interval within the population tree. Because we ignore singleton site patterns, none of the intervals in our tree extends to the present. Consequently, we base our calibration on two published estimates of population splits.
First, we assume that the archaic and modern lineages separated
Second, we assume that African and Eurasian populations separated
The time,
S3 Alternative Models of History
Our estimates (Figs. 4 and 5) of
Model HtoD assumes that Denisovans received gene flow from a “hyperarchaic” population (14). Specifically, it assumes hyperarchaics introgress into Denisovans 58 kya. The hyperarchaics had separated 1 Mya from other humans. Model XYtoN assumes that the ancestors of modern humans introgressed into Neanderthals 110 kya, before the separation of Africans and Eurasians (28). Model DtoX assumes that Denisovans introgressed into Africans, as suggested by Fig. 2B. Model NtoD assumes that Neanderthals introgressed into Denisovans (ref. 14, pp. 46–47). Each of these modified models adds an additional episode of gene flow, and in each case, the new admixture fraction is fixed at 1%.
Fig. S2 compares estimates under these models to those under the base model. Models HtoD, XYtoN, and NtoD give estimates similar to those under the base model. They therefore provide no alternative explanation for the surprising features of our results. Model DtoX yields an estimate of
How estimates respond to various modifications of the base model. Key: base, model in Fig. 1A; DtoX, gene flow from Denisovans into Africans; HtoD, gene flow from hyperarchaics into Denisovans; NtoD, gene flow from Neanderthals to Denisovans; XYtoN, gene flow from early moderns into Neanderthals. All estimates are based on the YRI.CEU dataset.
In summary, we are unable to explain our main results in terms of an alternative model. This suggests that these results are not artifacts of a misspecified model.
Acknowledgments
We are grateful for comments from Elizabeth Cashdan, Lounes Chikhi, Mitchell Lokey, Nala Rogers, Jon Seger, and Serena Tucci. A.R.R. was supported by National Science Foundation Award BCS 1638840 and by the Center for High Performance Computing at the University of Utah. R.J.B. was supported by National Cancer Institute Awards R25CA057730 (PI: Shine Chang, PhD) and CA016672 (Principal investigator: Ronald Depinho, MD). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Footnotes
- ↵1To whom correspondence should be addressed. Email: rogers{at}anthro.utah.edu.
Author contributions: A.R.R. and C.D.H. designed research; A.R.R. and R.J.B. performed research; A.R.R. and R.J.B. analyzed data; and A.R.R. wrote the paper.
The authors declare no conflict of interest.
This article is a PNAS Direct Submission.
See Commentary on page 9761.
This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1706426114/-/DCSupplemental.
Freely available online through the PNAS open access option.
http://www.pnas.org/site/misc/userlicense.xhtmlReferences
- ↵
- ↵.
- Klein RG
- ↵.
- Rightmire GP
- ↵.
- Hublin JJ
- ↵.
- Hublin JJ
- ↵
- ↵
- ↵
- ↵.
- Green RE, et al.
- ↵.
- Rogers AR,
- Bohlender RJ
- ↵
- ↵.
- Meyer M, et al.
- ↵.
- Liu RY,
- Singh K
- ↵
- ↵
- ↵
- ↵.
- Skoglund P,
- Jakobsson M
- ↵
- ↵.
- Wall JD, et al.
- ↵
- ↵
- ↵.
- Castellano S, et al.
- ↵
- ↵.
- Harris K,
- Nielsen R
- ↵
- ↵
- ↵
- ↵
- ↵
- ↵
- ↵.
- Crow JF,
- Kimura M
- ↵.
- Price K,
- Storn RM,
- Lampinen JA
- ↵.
- Cleveland WS
- ↵
- ↵
























