Previous Article |
Table of Contents
| Next Article
BIOLOGICAL SCIENCES / ANTHROPOLOGY
Statistical evaluation of alternative models of human evolution
,
,

,||
,

,

,

Laboratório de Biologia Genômica e Molecular, Faculdade de Biociências, Pontifícia Universidade Católica do Rio Grande do Sul (PUCRS), 90619-900 Porto Alegre, RS, Brazil;
Departamento de Genética, Universidade Federal do Rio Grande do Sul, 91501-970 Porto Alegre, RS, Brazil;
Computational and Molecular Population Genetics (CMPG), Zoological Institute, University of Bern, CH-3012 Bern, Switzerland; ¶School of Animal and Microbial Sciences, University of Reading, Reading RG6 6AJ, United Kingdom; and ||Department of Ecology and Evolution, University of Lausanne, Biophore, CH-1015 Lausanne, Switzerland
Contributed by Francisco M. Salzano, August 31, 2007 (received for review August 1, 2007)
| Abstract |
|---|
|
|
|---|
141 thousand years ago (Kya), an exit out-of-Africa
51 Kya, and a recent colonization of the Americas
10.5 Kya. We also find that the African replacement model explains not only the shallow ancestry of mtDNA or Y-chromosomes but also the occurrence of deep lineages at some autosomal loci, which has been formerly interpreted as a sign of interbreeding with Homo erectus.
Bayesian analysis | DNA nuclear data | multiregional hypothesis | out of Africa hypothesis
Many general scenarios of human evolution have been proposed based on paleontological, archeological, or genetic data (5, 6), and their fit to various aspects of our genetic diversity has been investigated (3, 7–9). The current debate over recent human evolution can be simplified by considering the alternative scenarios shown in Fig. 1(5). The African replacement scenarios (Fig. 1A), which posit a single and recent African origin for all modern humans, are mainly supported by mitochondrial DNA (mtDNA) and Y-chromosome polymorphisms (4), by the current lack of Neanderthal mtDNA genes in modern humans (10), and by gradients of nuclear genetic diversity from Africa toward the Americas (4, 11). Recent examination of nuclear DNA has, however, revealed some polymorphism patterns that were judged incompatible with a pure African replacement scenario (7, 12–17). For instance, the presence of very old lineages in Africa and Asia raised claims for some degree of interbreeding between modern and archaic Homo forms (13, 14, 16, 17). Such interbreeding can occur under assimilation scenarios (Fig. 1B), where modern humans migrating out of Africa would have hybridized with local Homo erectus and incorporated old lineages (15, 18) or under multiregional scenarios (Fig. 1C), where migrants would have been continuously exchanged between Africa and Asia, leading to a synchronized emergence of modern anatomy. Note that these simple scenarios are somewhat arbitrary and that human evolution has certainly been more complex, but they certainly incorporate key debated features of human evolution, such as population expansions within continents, intercontinental migrations, and potential hybridization with archaic forms.
|
| Results |
|---|
|
|
|---|
We then compared the best model of each scenario. We find that AFREG has the highest posterior probability (PAFREG = 0.781), followed by the best MRE (PMREBIG = 0.218) and ASEG (PASEG = 0.001) models (Fig. 1). Neutral nuclear sequence data thus give strong support to a recent African origin of modern humans without interbreeding with archaic Homo forms, at least in Asia. To study the power of our model choice procedure in the context of human evolution, we simulated 1,000 random data sets under the best model for African replacement (AFREG), assimilation (ASEG), and multiregional (MREBIG) models and each time estimated the posterior probability of the three models. We find that the AFREG and MREBIG models are correctly recovered (have the highest posterior probability) in 79.3% and 80.1% of the cases, respectively, but the ASEG model is correctly identified in only 50.3% of the cases [supporting information (SI) Fig. 3] and, thus, seems to be the model most difficult to identify. Assuming that the three models have the same prior probability, we can compute the probability that AFREG is the correct model given our observation that PAFREG = 0.78 as Pr(PAFREG = 0.78|AFREG)/[Pr(PAFREG = 0.78|AFREG) + Pr(PAFREG = 0.78|ASEG) + Pr(PAFREG = 0.78|MREBIG)] = 0.817 (see SI Fig. 4). Similarly, we obtain Pr(ASEG|PAFREG = 0.78) = 0.147 and Pr(MREBIG|PAFREG = 0.78) = 0.036. Because the ASEG model is generally difficult to identify (SI Fig. 3), a large posterior probability of the AFREG model is relatively likely under this scenario but not under the MREBIG model, which explains the low probability of <4% for the MREBIG model.
We then estimated the parameters of the overall best African replacement model (AFREG, Table 1 and SI Fig. 5) under an ABC framework based from 5 million simulations. Under this model, we find that an archaic African population of
12,800 effective individuals gave rise to modern humans
141 thousand years ago (Kya) after a bottleneck involving
600 effective individuals. The Out-of-Africa migration, initially involving only
450 effective individuals would have occurred some 51 Kya, and the Americas would have been colonized only
10.5 Kya by
450 individuals.
|
| Discussion |
|---|
|
|
|---|
12,500 individuals (3, 7). The size and timing of the exit out of Africa are in excellent agreement with recent molecular and archeological studies suggesting that this migration resulted in a limited number of lineages having left Africa only
55–65 Kya (6, 11, 24). Finally, the estimates for the current effective continental population sizes show a net decrease from Africa to America compatible with a series of spatial expansions and founder effects during the colonization of the world (4, 11).
Our estimated date for the colonization of the Americas (10.3 Kya) is more recent than usually considered, but the upper limit of the 95% highest posterior density includes the dates of the oldest archaeological sites of
14 Kya (25). This young settlement time could result in part from the sole sampling of Central and South American individuals. It is indeed known from the study of mtDNA and the Y chromosome that some rare alleles (haplogroups) were found only in North America (26). Therefore, the inclusion of northern Native Americans could lead to increased genetic diversity and colonization time estimates. This result nevertheless suggests a late, postglacial maximum colonization of the Americas, which is in better agreement with the estimates of
14 Kya based on the Y chromosome (27) than on those of
30 Kya based on mtDNA control region (28). The estimated founder population size for America is about six times larger than that recently proposed by Hey (29), who suggested that <80 effective individuals would have colonized the Americas, but a moderate bottleneck for the settlement of the New World agrees with recent results from nuclear loci (30) and with previous mtDNA studies (28). Differences in sampling design and marker choice between studies could explain this discrepancy: Although our study is based on a homogeneous set of 50 nuclear loci genotyped in the same individuals, the former study (29) used a mixture of fewer autosomal, X-linked, and uniparentally inherited markers assessed on a very heterogeneous set of individuals and sample sizes.
Even though the ASEG model is clearly not supported by our analysis, it is interesting to see that the archaic contribution to the current Asian gene pool estimated under this model is extremely small (median: 0.007; 95% highest posterior density: 6.3 x 10–5 – 0.023, SI Table 2). It shows that the observed data are only compatible with minute or virtually no admixture with previous members of the Homo genus, at odds with recent studies of nuclear diversity suggesting a substantial contribution of archaic genes to the modern gene pool (7, 15), but in keeping with analyses of mitochondrial DNA (31–33). Despite its convergence to the AFREG model, which assumes no admixture, the ASEG model has a much lower posterior probability, probably because the prior for the admixture proportion was chosen to be uniform between 0 and 1, to include potentially high levels of admixture (15). Note that an ASEG model with a narrower or a negative exponential prior for an archaic contribution would certainly be better supported. In contrast to the ASEG model, the MRE model most compatible with our data set (MREBIG), however, does not show any convergence toward an African replacement model, or toward previous implementations of a multiregional model (e.g., refs. 8 and 34), where a large archaic African population would send more migrants to Asia than the reverse. The median estimates of the MREBIG model (SI Table 2) rather suggest small archaic population sizes in both continents (<600 effective individuals) and recent migration rates between continents being very small but still larger than between the two archaic populations.
Because the occurrence of deep lineages in modern humans has sometimes been taken as evidence against replacement models (e.g., refs. 13 and 16), we have computed the empirical distribution of the times to the most recent common ancestors (TMRCAs) for the best model under each of the three scenarios (Fig. 2A and SI Table 3). We see that the multiregional model has the narrowest and shortest distribution because of the small estimated archaic population size that promotes coalescent events as soon as archaic Asian lineages are brought back (looking backward in time) to Africa
800 Kya in our model. On the other hand, very old TMRCAs exceeding several millions of years can be readily obtained under the African replacement models (in agreement with previous expectations, see, e.g., ref. 35), because the larger ancestral size in Africa prevents a rapid coalescence of the lineages that passed through the speciation bottleneck. When computing continent-specific TMRCAs under the overall best African replacement model (AFREG) (Fig. 2B and SI Table 4), we see that very ancient TMRCAs are not restricted to African samples but that they are also found for Asian and Amerindian samples. Our results therefore question the hypothesis that very old TMRCAs should be taken as evidence for interbreeding events between modern humans and individuals of other Homo species (13). Unexpectedly, we find that
10% of autosomal loci should have TMRCAs younger than 140 Kya. This is at odds with a recent review reporting no TMRCA younger than 600 Ky among 27 autosomal genes (13), even though we cannot exclude the possibility that loci with little or no variation are underrepresented in the published literature. However, in keeping with these results, we note that 4 of our 50 loci (8%) are entirely monomorphic in the three samples and, therefore, indicative of a shallow ancestry.
|
Our model choice framework relies on massive computer simulations and makes it now possible to estimate the posterior probability of relatively realistic (but still debated) models of human evolution. The ability to assign probabilities to various competing scenarios without need to estimate parameters seems much more satisfying than previous approaches based on goodness-of-fit statistics (e.g., refs. 7–9). However, this probabilistic approach still does not guarantee that the best supported scenario is the correct one, which could be an untested model. Although we considered a variety of alternative scenarios, we did not specifically attempt to design models of human evolution that would maximize the fit between observed and simulated data. However, these very simple models capture the basic differences between proposed alternative scenarios of human evolution (see, e.g., ref. 5). More elaborate models incorporating intracontinental population subdivisions, long-distance dispersal, or spatially explicit information (8) could be implemented, and the current model choice framework could be used to evaluate their respective merits, which would be impossible under a likelihood framework. An additional improvement would be to account for the spatial and population structure of the sampling. We have indeed performed here all simulations assuming random mating, whereas sampled individuals from different continents were each drawn from different subpopulations. The coalescent process in a structured population is similar to the coalescent in an unsubdivided population if one samples a single gene by subpopulation (39). Because we sampled two genes per subpopulation, we may have underestimated the frequency of recent coalescent events within subpopulations, which could, for instance, lead to some underestimation of divergence time between continents, potentially explaining the recent colonization time we find for the Americas. The choice of summary statistics may have an influence on our results, and we used only statistics that do not require information on the gametic phase (haplotypic information). It is likely that additional information could lead to a better discrimination among models. The analysis of larger data sets, such as genome-wide resequencing data (e.g., ref. 40) or STR data on population samples from various continents (41) would be helpful to confirm our results, and it could also lead to increased power for our model choice procedure. However, these much larger data sets would require much more computer power than that used in our study, which already exceeded 10 CPU-months of computations on a Linux cluster.
In conclusion, although our best supported model (AFREG) certainly does not represent the exact history of modern humans, we show here that it is much better supported by a random set of neutral loci than any other models involving interbreeding with other Homo species. Although we cannot exclude that any interbreeding ever occurred between modern and archaic humans or that any favorably selected H. erectus genes could have spread into modern humans (see, e.g., ref. 18), our results suggest that this archaic contribution, if present, should be very small. Our results therefore confirm that our modern gene pool has a recent and predominant African origin, and they offer a neutral demographic scenario that could be used to detect ancient admixture for specific gene regions. Moreover, the best African replacement model explains key features of other data sets, such as recent TMRCAs for mtDNA or Y-chromosome loci, as well as occasional deep lineages of nuclear loci, previously thought to be indicative of balancing selection or interbreeding with H. erectus or Neanderthals (7, 13). The demographic parameters of this model should prove useful to improve our ability to detect loci involved in complex diseases or in past adaptive events by providing better null distributions of various statistics used in genome scans or linkage disequilibrium mapping studies.
| Materials and Methods |
|---|
|
|
|---|
500 bp each, providing a total of
25,000 bp for each individual (see SI Table 6). These loci have been studied in human and chimpanzee populations (42–44). Additionally, each locus is short enough so that it can be considered as a nonrecombining segment. Because these data have been generated through DNA sequencing, they are not likely to be affected by ascertainment bias. To complement a first data set consisting of 10 African, and 8 Asian individuals previously analyzed (43), we sequenced here 12 Native American individuals, each affiliated to a different tribe. This sampling scheme is similar to that used for Africa and Asia (43) (see also SI Text).
We performed PCR amplification using primers and conditions described in ref. 43, except for five loci for which we designed new primers whose sequence is available in SI Text. Sequencing was performed in a MegaBACE1000 system (GE Healthcare, Chalfont St. Giles, U.K.). Individual reads were assembled in the PhredPhrap package (45), together with a reference containing the known variants for each locus. Assemblies were visually inspected by using Consed (46), and all possible heterozygous sites have been rechecked sequencing a new PCR product. Mutation rates at all loci were estimated after gametic phase estimation and comparison with chimpanzee sequences, as explained in SI Table 6.
Tested Evolutionary Scenarios. We modeled three different sets of scenarios constructed to capture most of the current debate concerning modern human evolution (see, e.g., refs 5 and 47 for a general account on different models of human evolution). Because there is still some uncertainty on the exact details of past human demography (48), we chose to evaluate several alternative models within each class of scenarios. For example, previous attempts of fitting molecular data to the African replacement scenario have used different demographic growth models (instantaneous, exponential, linear, or logistic) (3, 7–9, 23), but it is still unclear whether one of these models has better properties than others.
A general representation of the models contrasted in this study is shown in Fig. 1, and a detailed schematic representation is shown in SI Fig. 7, where we list the parameters of all models. The African replacement models (SI Fig. 7A) are simulated with instantaneous (AFRIG) or exponential (AFREG) growth after bottlenecks. Looking forward in time, both models start with an ancestral (archaic) population in Africa that passes through a bottleneck and gives rise to a population of modern humans. After the bottleneck, the population is allowed to grow to its current size, either instantaneously or exponentially, depending on the model. After this event, a migration occurs from Africa to Asia and, finally, from Asia to the Americas. In both cases, after a few generations, the founding population is allowed to expand to its current size.
Multiregional evolution generally refers to a class of models in which the transition toward modern morphology occurs simultaneously because of ongoing gene flow between continents (17, 48). We simulated four different models (see SI Fig. 7C) that differ in the way population sizes change over time and whether population growth has been instantaneous or exponential. Forward in time, all models start with an archaic African population that moves out of Africa in an event that attempts to model the peopling of Asia by H. erectus. Since then, and up to the present, Africa and Asia exchange migrants. Another major migration event takes place only from Asia to the Americas. In model MRE1S, African and Asian population sizes and migration rates are held constant over the whole simulated period. In model MRE2S, there is a transition between an "archaic" and "modern" population size that occurs independently first in Africa and then in Asia, with new migration rates occurring after the demographic transition in Africa. The remaining models implement a bottleneck in Africa during the emergence of modern humans: In model MREBIG, all populations grow instantaneously, whereas in model MREBEG, all "modern" populations grow exponentially.
Finally, the African origin with assimilation (see SI Fig. 7B) is a "hybrid" model that includes an early dispersal of H. erectus out of Africa, but it differs from MRE in two major aspects: There is no migration between continents, and a fraction of "modern" Asian lineages have originated recently from Africa, like in the African replacement model. However, another fraction of the "modern" Asian lineages come from the archaic Asian population. The ASIG and ASEG models differ by implementing instantaneous or exponential growth, respectively, after the bottlenecks associated with the founding of each continent by "modern" humans. These scenarios have been adapted from the models reviewed in Stringer (47). The prior distributions of the parameters of the eight tested models are described in SI Table 7 (see next section).
Approximate Bayesian Computations.
Parameter estimation and model evaluation were done under an ABC framework (21). The different steps of the ABC parameter estimation procedure are described in detail (see refs. 21 and 49), but we briefly outline them below. For each model, we first perform a large number of genetic simulations based on a demographic history that describes the model using the program SIMCOAL ver. 2 (50). Some or all parameters that define the model (e.g., population sizes, migration rates, timing of the demographic events, mutation rates) are considered as random variables for which some prior distribution must be defined, as shown in SI Table 7. For each simulation, the parameter values are drawn from their prior distributions defining a demographic history that is used to build a specific input file for the SIMCOAL program. SIMCOAL then performs coalescent-based (51) simulations to generate the genetic diversity of samples, with the same number of gene copies and loci than those observed. Summary statistics (S) identical to those computed on the observed data (S*) are then calculated for the simulated data set. As in any coalescent approach, our simulations were performed considering haploid individuals and with time scaled in generations. Following Beaumont et al. (21), a Euclidean distance
is calculated between normalized S and S* for each simulated data set.
Prior Distributions. The prior distributions of the parameters of all eight models are shown in SI Table 7.
Summary Statistics.
Summary statistics of genetic diversity were calculated by using the program Arlequin ver 3.1 (52). The following summary statistics were computed: total and per population number of segregating sites (S), nucleotide diversity (
) for each population, and total and pairwise FSTs (53). Because there is some uncertainty associated with the phasing procedure, we used only summary statistics that do not depend on phase information. Summary statistics calculated for the 50 loci are reported in SI Table 8.
Model Choice.
The posterior probability of each model is estimated by an approach based on a weighted multinomial logistic regression procedure (54). This approach is an extension of ordinary logistic regression to more than two categories, and it is explained in more detail in SI Text and SI Fig. 8. Briefly, Beaumont (54) has suggested that one can sample the model indicator (i.e., {1, ..., m} for models M1, ..., Mm) from the prior,
(M) and treat it as a categorical random variable, Y, in the ABC simulations. We can then apply categorical regression to estimate P(Y = y|S = S*), where y = 1... m is the indicator for model My, and S* is the vector of observed summary statistics. Specifically, we estimate the coefficients
in
![]() |
using the VGAM package implemented in R. We make the regression estimate locally around S* in the same way as in the standard regression approach, i.e., only the 5,000 simulations closest to the observations are retained, and these are weighted by an Epanechnikov kernel that has a maximum when S = S* (see SI Text for details). This procedure has been shown (54) to substantially improve on a previous method (55, 56) for selecting models by using ABC, and its application to a mixture model shows that it performs almost as well as a reversible jump Markov Chain Monte Carlo method using the full data (57). Model selection within each set of scenarios was based on 2 million simulations for each model. We performed an additional 3 million simulations for each of the best African replacement, multiregional, and assimilation models to obtain their posterior probability based on a total of 5 millions simulations for each model.
Parameter Estimation.
For the best model within each set of scenarios, we retained the 5,000 simulations with smallest associated Euclidean distance
computed on a total of 5 million simulations. Then posterior distributions of the parameters were obtained by means of a locally weighted multivariate regression (see ref. 21 for more details). Parameters (x) were transformed as z = log[tan(x)–1] before regression to prevent estimations from exceeding distribution limits (58). Although several point estimators can be computed from these distributions, we report only the median in Table 1 because it has been shown to have the overall lowest associated mean square error (59), but we also report the mode of the parameter distributions for all models in SI Table 2.
TMRCA Simulations. We generated for each model the expected distribution of the TMRCA by performing 5,000 simulations of 50 loci, using as fixed parameter values the median estimates obtained under our ABC approach, which is a reasonably good point in the parameter space. In the same way, we generated the distribution of TMRCAs for uniparentally inherited markers by dividing the population sizes by four because the effective size for these markers is four times less than for nuclear loci.
| Acknowledgements |
|---|
|
|
|---|
| Footnotes |
|---|
Abbreviations: ABC, approximate Bayesian computation; AFREG, African replacement with exponential population growth; ASEG, assimilation with exponential population growth; Kya, thousand years ago; MRE, multiregional evolution; MREBIG, MRE with bottleneck and instantaneous population growth; TMRCA, time to the most recent common ancestor.

To whom correspondence may be addressed. E-mail: francisco.salzano{at}ufrgs.br, slbonatto{at}pucrs.br, or laurent.excoffier{at}zoo.unibe.ch
Author contributions: S.L.B. and L.E. designed research; N.J.R.F. and N.R. performed research; N.R., M.B., S.N., F.M.S., and L.E. contributed new reagents/analytic tools; N.J.R.F., N.R., and L.E. analyzed data; and N.J.R.F., N.R., M.B., F.M.S., S.L.B., and L.E. wrote the paper.
The authors declare no conflict of interest.
Data deposition: Sequences reported in this paper have been deposited in the GenBank database (accession nos. EU105474–EU106045).
This article contains supporting information online at www.pnas.org/cgi/content/full/0708280104/DC1.
© 2007 by The National Academy of Sciences of the USA
| References |
|---|
|
|
|---|
This article has been cited by other articles in HighWire Press-hosted journals:
![]() |
D. Garrigan and M. F. Hammer Ancient lineages in the genome: A response to Fagundes et al. PNAS, January 15, 2008; 105(2): E3 - E3. [Full Text] [PDF] |
||||
![]() |
N. J. R. Fagundes, N. Ray, M. Beaumont, S. Neuenschwander, F. M. Salzano, S. L. Bonatto, and L. Excoffier Reply to Garrigan and Hammer: Ancient lineages and assimilation PNAS, January 15, 2008; 105(2): E4 - E4. [Full Text] [PDF] |
||||
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||