# Y genetic data support the Neolithic demic diffusion model

Edited by Henry C. Harpending, University of Utah, Salt Lake City, UT, and approved June 11, 2002 (received for review March 18, 2002)

## Abstract

There still is no general agreement on the origins of the European gene pool, even though Europe has been more thoroughly investigated than any other continent. In particular, there is continuing controversy about the relative contributions of European Palaeolithic hunter-gatherers and of migrant Near Eastern Neolithic farmers, who brought agriculture to Europe. Here, we apply a statistical framework that we have developed to obtain direct estimates of the contribution of these two groups at the time they met. We analyze a large dataset of 22 binary markers from the non-recombining region of the Y chromosome (NRY), by using a genealogical likelihood-based approach. The results reveal a significantly larger genetic contribution from Neolithic farmers than did previous indirect approaches based on the distribution of haplotypes selected by using post hoc criteria. We detect a significant decrease in admixture across the entire range between the Near East and Western Europe. We also argue that local hunter-gatherers contributed less than 30% in the original settlements. This finding leads us to reject a predominantly cultural transmission of agriculture. Instead, we argue that the demic diffusion model introduced by Ammerman and Cavalli-Sforza [Ammerman, A. J. & Cavalli-Sforza, L. L. (1984) *The Neolithic Transition and the Genetics of Populations in Europe* (Princeton Univ. Press, Princeton)] captures the major features of this dramatic episode in European prehistory.

It is widely accepted that the onset of agriculture in the Near East triggered a cultural change that brought farming and associated technologies across Europe about 10,000 years ago (1). Two alternative demographic scenarios have been proposed to account for this transition, documented in the archaeological record (2). In the demic diffusion model (DDM; ref. 1), the spread of technologies involved a massive movement of people, which implies a significant genetic input of Near Eastern genes from Neolithic farmers. Under the cultural diffusion model (CDM; refs. 3 and 4), on the contrary, the transition to agriculture is regarded essentially as a cultural phenomenon, involving the movement of ideas and practices rather than people. Consequently, it would not imply major changes at the genetic level.

Proponents of both models acknowledge that there is a spectrum of intermediate scenarios, which are essentially admixture models: settlements were founded by a mixture of farmers whose ancestors originally came from the Near East and indigenous hunter-gatherers. The question is, therefore, whether the dispersing farmers were few (as in the CDM) or many (as in the DDM).

The DDM seemed to explain the major geographic trends detected in allele frequencies at conventional marker loci, such as blood groups and enzymes (5, 6). Conversely, recent mtDNA data have been interpreted in favor of the CDM, thereby generating a controversy (7–15). Similarly, Semino *et al.* (16) have used their results from the non-recombining Y-chromosome region (NRY) to argue that the genetic contribution of Neolithic people may have been as low as 22%. This figure represents the proportion in Europe of the four haplotypes (Eu4, -9, -10, and -11), which were singled out because they show a distinct gradient from the epicenter of the agricultural revolution in the Levant. Although this gradient may well have been established during the Neolithic transition, it is not clear that the proportion of these haplotypes should provide an estimate of admixture proportions. Indeed, admixture is a demographic process, and, as such, it affects the entire genome. In particular, simulation studies demonstrated that only a limited fraction of alleles will exhibit a clinal pattern after expansion and introgression (1), and only a fraction of these will be visible thousands of years later.

The best way to quantify the relative contributions of different populations is far from trivial (see refs. 17–19). The limited genetic differentiation between human populations indicates that traces of ancient population movements will be uncovered only by efficient statistical methods (20). Although indirect evidence, such as correlations between genetic and non-biological information (archaeology, linguistics), can be persuasive, the full use of genetic data requires explicit models of the admixture process (21). In particular, we argue that it is necessary to base the analysis on estimates of the ancestral allele frequencies in each population. By doing so, it becomes possible to distinguish the relative contribution of genetic drift and admixture. Because the ancestral frequencies cannot be known exactly, the calculations must take into account the range of possible histories (19).

Recent innovations in computational statistics, such as the extension of importance sampling and Markov chain Monte Carlo (MCMC) exploration to genealogical models, allow inferences about demographic history by using likelihood-based methods. In this paper, we make use of an MCMC method that we developed (19) to estimate the genetic contributions, *p*_{1} and (1 − *p*_{1}), of two parental populations, P_{1} and P_{2}, into a third, hybrid population, H, and apply it to the NRY data of Semino *et al.* (16).

We use the method to estimate the change from place to place in Europe of admixture proportions of “Neolithic” and “Palaeolithic” genes. Importantly, the method does not require us to define these “Palaeolithic” or “Neolithic” alleles. It requires only the definition of parental populations, which is fortunately one of the few points on which there is a broad agreement (8, 16). The method takes into account, and quantifies, the effect of genetic drift since the time of admixture in each population. This innovation in the method is important because the populations are expected to have expanded after acquiring agriculture and, consequently, to have experienced a reduction in genetic drift. Because the archaeological data suggest that the timing of this transition varied from place to place in Europe, the method should be able to pick up a signature of this sequence in the genetic data.

## Materials and Methods

### Populations Used.

We have used the genetic data of Semino *et al.* (16) comprising 22 binary markers from the NRY in a large number of European populations (*n* = 1,007 chromosomes from 25 samples, Table 1). These markers are considered to be the result of unique mutational events and are called unique-event polymorphisms (UEPs; refs. 21 and 22). They are thought to be rare enough to have occurred only once in the recent history of human populations. The presence of these UEPs in different populations is thus unlikely to indicate recurrent mutation but rather common ancestry, migration, or admixture events. These data are therefore particularly appropriate for our admixture analysis.

The method requires that we define two populations as the descendants of the original parental populations. Our choice is based on current archaeological, linguistic, and genetic knowledge and is similar to the conventions found in the literature. To represent descendants of Near Eastern Neolithic farmers, previous studies (e.g., refs. 8, 16, and 23) have used available samples from Turkey, Iraq, Iran, Lebanon, or Syria. The Y chromosome data of Semino *et al.* (16) had three samples from these areas: Turkey (*n* = 30), Lebanon (*n* = 31), and Syria (*n* = 20). Given their limited size, and because they had similar heterozygosity (*H*_{e} = 0.83 for Turkey and Lebanon and *H*_{e} = 0.87 for Syria, whereas *H*_{e} ranged between 0.38 and 0.83 in the European samples) and essentially the same haplotypes in similar frequencies, it seemed sensible to pool them. This choice also aids comparisons with previous studies and with the interpretation of the same data by Semino *et al.* (16).
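For readers who want to see how a heterozygosity value such as *H*_{e} can be computed from haplotype counts, the following is a minimal sketch. It assumes Nei's unbiased gene-diversity estimator for haploid data; the text does not state which estimator was used, and the function name `gene_diversity` and the toy sample are hypothetical.

```python
from collections import Counter


def gene_diversity(haplotypes):
    """Unbiased gene diversity for a haploid sample (assumed estimator: Nei's)."""
    n = len(haplotypes)
    if n < 2:
        raise ValueError("need at least two chromosomes")
    counts = Counter(haplotypes)
    # Sum of squared sample frequencies of each haplotype.
    sum_sq = sum((c / n) ** 2 for c in counts.values())
    return n / (n - 1) * (1.0 - sum_sq)


# Toy sample: the labels reuse haplotype names from the text, but the counts
# are invented for illustration and are not taken from Semino et al.
sample = ["Eu9"] * 4 + ["Eu10"] * 3 + ["Eu4"] * 2 + ["Eu17"]
print(round(gene_diversity(sample), 3))
```

A sample dominated by one haplotype gives a value near 0, whereas many haplotypes at similar frequencies give a value near 1.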

Under the CDM, all European populations are mostly derived from local Palaeolithic ancestors. As a consequence, under this hypothesis, any European sample could have been used to represent the Y-chromosomes of the Palaeolithic parental population. We used the two Basque samples (*n* = 45 + 22) because linguistic, archaeological, and genetic data agree in suggesting a persistence of pre-Neolithic features in the Basque country (24). This choice was cross-validated by comparison with an analysis using the Sardinians as an alternative approximation of the parental population (see *Results* and *Discussion*).

Three samples were not analyzed in the present study because of their geographical location: these were the Saami (*n* = 24), Udmurt (*n* = 43), and Mari (*n* = 46) samples. These samples come from Uralic-speaking populations of North-Eastern Europe and are well away from the supposed route of Neolithic immigrants. The admixture model is therefore unlikely to hold, given the parental populations used.

### The Admixture Model and Estimation Methodology.

Our method assumes a simple admixture model in which two independent parental populations, P_{1} and P_{2}, of size *N*_{1} and *N*_{2}, have contributed a proportion *p*_{1} and *p*_{2} (*p*_{2} = 1 − *p*_{1}) of the genes of a third “hybrid” or “admixed” population, H, of size *N*_{h}, *T* generations ago. At the time of the admixture event, the gene frequencies are given by the vectors *x*_{1} and *x*_{2} in the two parental populations and by *p*_{1}*x*_{1} + *p*_{2}*x*_{2} in the hybrid population. After the admixture event, the three populations are isolated from each other and diverge because of genetic drift, the magnitude of which is determined by *t*_{1} = *T*/*N*_{1}, *t*_{2} = *T*/*N*_{2}, and *t*_{h} = *T*/*N*_{h}. The assumption of independent drift implies negligible gene flow between the populations after time *T*, which is reasonable given the large geographical distances between them. We assess this assumption in *Results*.
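The admixture model described above can be illustrated with a small forward simulation. This is only a sketch under stated assumptions: drift is modeled as haploid Wright-Fisher resampling, and the allele frequencies, population sizes, and the function names `admix` and `drift` are hypothetical values chosen for illustration, not quantities from the study.

```python
import random


def admix(x1, x2, p1):
    """Allele frequencies in the hybrid population at the admixture event."""
    return [p1 * a + (1.0 - p1) * b for a, b in zip(x1, x2)]


def drift(freqs, pop_size, generations, rng):
    """Haploid Wright-Fisher drift: resample pop_size gene copies each generation."""
    alleles = range(len(freqs))
    for _ in range(generations):
        draws = rng.choices(alleles, weights=freqs, k=pop_size)
        freqs = [draws.count(a) / pop_size for a in alleles]
    return freqs


rng = random.Random(1)
x1 = [0.7, 0.2, 0.1]  # hypothetical ancestral frequencies in P1
x2 = [0.1, 0.3, 0.6]  # hypothetical ancestral frequencies in P2
xh = admix(x1, x2, p1=0.4)

T = 50  # generations since admixture (illustrative)
for name, x, size in [("P1", x1, 100), ("P2", x2, 2000), ("H", xh, 500)]:
    drifted = drift(x, size, T, rng)
    print(name, "T/N =", T / size, [round(f, 2) for f in drifted])
```

In runs of this kind, the population with the largest *T*/*N* typically moves furthest from its starting frequencies, which is the signal the drift parameters are meant to absorb.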

Full details of the derivation of the calculation are given in ref. 19. In outline, the likelihood for a sample of size *n*_{1} from P_{1} (or any other) is the product of three components. It depends on the number of coalescent events, *c*_{1}, between the present and the time of admixture *T*, the probability of which is *p*(*c*_{1} | *T*/*N*_{1}, *n*_{1}). The number of coalescent events determines the number of lineages in the ancestral population that have left descendants, and hence the number of each allele with descendants, *f*_{1}. The probability of a particular vector of counts, *p*(*f*_{1} | *x*_{1}, *c*_{1}), additionally depends on the ancestral allele frequencies *x*_{1}. Finally, the probability of the observed counts of alleles in the present sample, *p*(*a*_{1} | *f*_{1}), can be calculated from *f*_{1}. Because the three populations are assumed to be independent, the probability of the full data set *D* is obtained from the product of the three probabilities above for each of the three populations. The value must be summed over all possible values of *c*_{i} and *f*_{i}:

$$p(D \mid p_1, T/N_1, T/N_2, T/N_h, x_1, x_2) = \sum_{c_1, c_2, c_h}\ \sum_{f_1, f_2, f_h}\ \prod_{i \in \{1, 2, h\}} p(a_i \mid f_i)\, p(f_i \mid x_i, c_i)\, p(c_i \mid T/N_i, n_i) \qquad [1]$$

where *x*_{h} = *p*_{1}*x*_{1} + *p*_{2}*x*_{2}.

The likelihood specified by Eq. 1 is difficult to evaluate directly, and we estimate it by using the Griffiths and Tavaré (25) scheme, as described in ref. 19. Having obtained the likelihood, it is useful to be able to make inferences about parameters without assuming any particular value of the others. This result can be achieved by using the Metropolis-Hastings algorithm (e.g., ref. 26), which allows us to obtain samples from the posterior distribution of *p*_{1}, *T*/*N*_{1}, *T*/*N*_{2}, *T*/*N*_{h}, *x*_{1}, and *x*_{2}. Posterior distributions of each parameter, independent of the values of the others (in particular independent of the “nuisance parameters” *x*_{1} and *x*_{2}), can be obtained by simply looking at the samples corresponding to the parameter of interest.
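To make the sampling step concrete, here is a deliberately simplified Metropolis-Hastings sketch for *p*_{1} alone. It is not the method of ref. 19: it assumes no drift, treats the parental frequencies as known, and replaces the genealogical likelihood with a plain multinomial likelihood for the hybrid sample. The data, the proposal width, and the function names are all hypothetical.

```python
import math
import random


def log_likelihood(p1, counts_h, x1, x2):
    """Multinomial log-likelihood of hybrid allele counts (constant term dropped)."""
    total = 0.0
    for a, f1, f2 in zip(counts_h, x1, x2):
        freq = p1 * f1 + (1.0 - p1) * f2
        if a > 0:
            if freq <= 0.0:
                return float("-inf")
            total += a * math.log(freq)
    return total


def metropolis_p1(counts_h, x1, x2, steps=20000, width=0.1, seed=0):
    """Random-walk Metropolis sampler for p1 under a flat prior on [0, 1]."""
    rng = random.Random(seed)
    p1 = 0.5
    current = log_likelihood(p1, counts_h, x1, x2)
    samples = []
    for _ in range(steps):
        proposal = p1 + rng.uniform(-width, width)  # symmetric proposal
        if 0.0 <= proposal <= 1.0:  # proposals outside the prior support are rejected
            candidate = log_likelihood(proposal, counts_h, x1, x2)
            accept_prob = math.exp(min(0.0, candidate - current))
            if rng.random() < accept_prob:
                p1, current = proposal, candidate
        samples.append(p1)
    return samples


x1 = [0.7, 0.2, 0.1]      # hypothetical frequencies in parental population P1
x2 = [0.1, 0.3, 0.6]      # hypothetical frequencies in parental population P2
counts_h = [34, 26, 40]   # hypothetical allele counts in the hybrid sample
samples = metropolis_p1(counts_h, x1, x2)
kept = samples[2000:]     # discard an initial burn-in
posterior_mean = sum(kept) / len(kept)
print(round(posterior_mean, 2))
```

The retained draws approximate the posterior of *p*_{1}; in the full method the same idea is applied jointly to all six parameters, and the marginal for any one of them is read off by ignoring the others.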

We chose flat priors for *p*_{1}, *T*/*N*_{1}, *T*/*N*_{2}, and *T*/*N*_{h}. For *x*_{1} and *x*_{2}, we chose a prior in which all possible allele frequencies have equal probability; this prior is given by a uniform Dirichlet distribution (19). The posterior distributions generated by the MCMC scheme described above are therefore proportional to the likelihood curves.
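As a small illustration of the uniform Dirichlet prior, one standard way to draw a frequency vector from it is to normalize independent unit-rate exponential variates. The function name `uniform_dirichlet` is hypothetical, and this sketch is not taken from ref. 19.

```python
import random


def uniform_dirichlet(k, rng):
    """Draw k allele frequencies from Dirichlet(1, ..., 1), uniform on the simplex."""
    draws = [rng.expovariate(1.0) for _ in range(k)]
    total = sum(draws)
    return [d / total for d in draws]


rng = random.Random(7)
x = uniform_dirichlet(5, rng)
print([round(f, 3) for f in x], "sum =", round(sum(x), 6))
```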

### Principle and How Drift Is Taken into Account.

The MCMC method reconstructs ancestral allelic configurations compatible with the data while estimating the probability of the observed (present-day) allelic configurations for different values of the admixture parameter (*p*_{1}), of the times since admixture (*T*/*N*_{1}, *T*/*N*_{2}, *T*/*N*_{h}), and of the parental allelic distributions just before admixture (*x*_{1}, *x*_{2}). The *T*/*N*_{i} values measure genetic drift since admixture. For example, if one of the ancestral populations has remained relatively small since admixture (as might be the case for the Basques), it is expected to deviate more from its ancestral frequencies than a community that has grown in size. Because the *T*/*N*_{i} and *x*_{i} values are not constrained, the method encompasses different demographic scenarios *before* admixture (leading to different *x*_{i} distributions) and allows for the three populations to experience different amounts of drift *after* admixture (leading to different *T*/*N*_{i} distributions). This aspect of the analysis is important and allows the effect of pure drift and of admixture to be distinguished. For example, two populations may have similar allelic compositions, as a result of genetic drift, yet the method can still detect large differences in their admixture proportions (see *Discussion*).
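The link between *T*/*N* and drift can be sketched by simulating how many coalescent events occur among sampled lineages within a given scaled time. This sketch assumes the standard coalescent for a haploid population, with time measured in units of *N* generations so that *k* lineages coalesce at rate *k*(*k* − 1)/2. The function name and the chosen values are hypothetical.

```python
import random


def coalescent_events(n, t, rng):
    """Number of coalescences among n lineages within scaled time t = T/N."""
    k, elapsed, events = n, 0.0, 0
    while k > 1:
        # Waiting time to the next coalescence with k lineages.
        elapsed += rng.expovariate(k * (k - 1) / 2.0)
        if elapsed > t:
            break
        k -= 1
        events += 1
    return events


rng = random.Random(42)
means = []
for t in (0.01, 0.1, 1.0):
    mean_c = sum(coalescent_events(50, t, rng) for _ in range(2000)) / 2000
    means.append(mean_c)
    print(f"T/N = {t}: mean coalescent events among 50 lineages = {mean_c:.1f}")
```

More coalescent events mean fewer ancestral lineages at the time of admixture, so the present-day sample is a noisier reflection of the ancestral frequencies.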

The method assumes a model of pure drift without mutations. In practice, it means that mutations since the time of admixture have negligible effect on our estimate. This assumption is reasonable for these NRY data because the admixture events we are studying can be dated by the archaeological record to less than 10^{4} years ago (2) and the mutation rate for these markers appears to be less than 10^{−8} per site per year (27). Furthermore, the small effective size of Y-linked loci enhances the effect of drift (16, 21).
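As a rough check on this assumption, the two upper bounds quoted above can be multiplied to bound the expected number of new mutations per site along a single lineage since admixture. The symbols μ (mutation rate per site per year) and *T*_{yr} (time since admixture in years) are introduced here only for illustration.

$$\mu \, T_{\mathrm{yr}} < 10^{-8}\ \mathrm{yr}^{-1} \times 10^{4}\ \mathrm{yr} = 10^{-4}$$

On these figures, fewer than about one lineage in 10,000 would be expected to carry a new mutation at a given marker site, which is small compared with the frequency changes that drift can produce over the same period.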

### Regression Analysis.

A linear regression approach was used to detect, quantify, and assess the significance of any geographical trend in admixture proportions across Europe. By combining information across locations, this procedure reduces the uncertainty in admixture proportions at each distance. As in Semino *et al.* (16), the geographic distance was calculated from the middle point between Syria and Lebanon.

We could have obtained the regression of average *p*_{1} values against distance. We rejected such an approach because it would have ignored the error on each *p*_{1} estimate. We therefore assessed the uncertainty in the regression estimate by repeatedly sampling from the *p*_{1} distributions in the following manner. For each of the European samples, one *p*_{1} value was randomly sampled from the corresponding posterior distribution (Fig. 1*a*). A linear regression was then calculated between this set of values and geographic distance. This process was repeated 1,000 times to obtain the empirical distribution of regression curves shown in Fig. 1*b*. A similar approach was used for *T*/*N*_{h}.
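The resampling procedure described above can be sketched as follows. The posterior draws here are placeholders generated from Beta distributions purely for illustration, and the distances, function names, and number of populations are hypothetical; only the resample-then-regress loop reflects the text.

```python
import random


def ols(xs, ys):
    """Ordinary least-squares slope and intercept of ys on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def resampled_regressions(distances, posteriors, reps=1000, seed=0):
    """Draw one p1 per population from its posterior sample, regress, repeat."""
    rng = random.Random(seed)
    fits = []
    for _ in range(reps):
        draw = [rng.choice(samples) for samples in posteriors]
        fits.append(ols(distances, draw))
    return fits


# Placeholder inputs: four populations at increasing distance (km), each with
# 500 made-up posterior draws of p1 whose mean rises with distance.
toy = random.Random(1)
distances = [500.0, 1500.0, 2500.0, 3500.0]
posteriors = [[toy.betavariate(2 + i, 5 - i) for _ in range(500)] for i in range(4)]

fits = resampled_regressions(distances, posteriors)
share = sum(slope > 0 for slope, _ in fits) / len(fits)
print(f"share of resampled regressions with positive slope: {share:.2f}")
```

Because each replicate uses a fresh draw from every posterior, the spread of the fitted lines carries the uncertainty of the individual *p*_{1} estimates into the regression.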

## Results

### Admixture Proportions.

Table 1 shows summary statistics for the posterior distributions for *p*_{1}, the Palaeolithic contribution to 17 European populations (represented in Fig. 1*a*). The modes correspond to the most probable values (equivalent to the maximum likelihood estimates in a classical likelihood framework). The distributions are clearly rather wide, as expected from simulations (19). For instance, even for populations as far from the Levant as France or Germany, Palaeolithic (hunter-gatherers) contributions as low as 20% are within the 90% most probable values. Similarly, for Greece and Albania, *p*_{1} values as high as 70% cannot be rejected (Table 1). Thus, our approach highlights that estimates of admixture in a particular population made from a single locus are often imprecise.
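One common way to summarize posterior samples in the manner described above is to report a mode and the shortest interval containing a given share of the draws. The sketch below is an assumption about how such summaries could be computed, not a description of how Table 1 was produced. The function names and bin count are hypothetical.

```python
def shortest_interval(samples, mass=0.9):
    """Shortest interval containing the given fraction of the samples."""
    ordered = sorted(samples)
    n = len(ordered)
    k = max(1, int(round(mass * n)))  # number of draws the interval must hold
    best = min(range(n - k + 1), key=lambda i: ordered[i + k - 1] - ordered[i])
    return ordered[best], ordered[best + k - 1]


def histogram_mode(samples, bins=20):
    """Midpoint of the fullest of `bins` equal-width bins on [0, 1]."""
    counts = [0] * bins
    for s in samples:
        counts[min(int(s * bins), bins - 1)] += 1
    fullest = counts.index(max(counts))
    return (fullest + 0.5) / bins


# Made-up draws of p1 for one population, for illustration only.
draws = [0.1] + [0.5] * 8 + [0.9]
print(histogram_mode(draws), shortest_interval(draws, mass=0.8))
```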

Despite uncertainty on *p*_{1} for specific populations, there is a clear trend across Europe, with the proportion of Neolithic genes decreasing from modal values around 85–100% in Albania, Macedonia, or Greece to around 15–30% in France, Germany, or Catalonia. The statistical significance of this trend can be assessed and quantified by combining information from the individual populations and their geographic distance from the Near East and by plotting the regressions as shown in Fig. 1*b*. Using the same data, Semino *et al.* (16) obtained the regression represented by the blue dotted line in Fig. 1*b*, which is significantly different from the 1,000 regressions obtained by our randomization approach (see Fig. 1 legend and *Materials and Methods*). This approach allows us to reject both the regression and the associated 22% estimate for the Neolithic contribution (*P* < 0.001).

Note that the regression analysis might be biased if the information from adjacent populations were non-independent because of local gene flow. However, previous work by Sokal and collaborators (e.g., ref. 28) has shown that such effects are found over scales less than 300 km, whereas the current samples were more widely spaced than this. We checked the plot of residuals from the regression and found no evidence of non-independence. Thus, it appears that the linear regression is a reasonable approximation and although gene flow might have had some influence locally, it cannot explain the trend in admixture proportions observed across Europe.

Geographical patterns cannot be completely summarized by one average value. Nevertheless, it is instructive to estimate the average *p*_{1} value across Europe to compare it with the value given by Semino *et al.* (16). The estimate was obtained by averaging *p*_{1} values drawn from the distributions shown in Fig. 1*a*. We found an average Neolithic contribution of 50% across all samples, 56% for the Mediterranean subset and 44% in non-Mediterranean samples. Thus, whichever region of Europe is considered, we find that the average value is more than twice that suggested by Semino *et al.* on the basis of the more readily apparent trends.

Another important result of the admixture analysis is the *p*_{1} distribution obtained for the Sardinian sample. Sardinia appears as a clear outlier from other European samples, showing a very tight distribution compared with other populations, with a peak at *p*_{1} = 1, indicating a high proportion of genes derived from the Palaeolithic inhabitants of Europe. This point is discussed below.

### Drift.

The method also generates estimates of the *t*_{i} = *T*/*N*_{i} values (*T*/*N*_{1}, *T*/*N*_{2}, *T*/*N*_{h}), which indicate the amount of genetic drift since admixture. Fig. 2 shows the posterior distributions for the two parental populations. Remember that the same two parental populations are used but that there are 17 European samples, and therefore 17 estimates for *T*/*N*_{1} and *T*/*N*_{2}. Each curve in Fig. 2*a* (showing *T*/*N*_{1}) or *b* (showing *T*/*N*_{2}) thus corresponds to an estimate of the effects of drift among the Basques and the Near Easterners, respectively, obtained from the analysis of a particular hybrid (European) population. It is expected that the *T*/*N*_{1} and *T*/*N*_{2} curves should be different because Basques and Near Eastern populations acquired agriculture at different times and therefore were subjected to different amounts of drift. This is indeed what we see, with the *T*/*N*_{2} curves being almost identical (Fig. 2*b*), suggesting limited drift, and hence a rather large long-term population size for the Near East population. For the Basques (Fig. 2*a*), the *T*/*N*_{1} distributions are much wider and more variable, although all modes but one fall in the interval between 0.1 and 0.2. This effect is expected because simulations have shown that, as drift increases, *T*/*N*_{i} distributions both become wider and have variable modal values (19). Such a variation suggests a smaller population size and some level of differentiation among the hunter-gatherers that originally contributed to the pre-Neolithic gene pool (see below). The difference observed between the *T*/*N*_{1} and *T*/*N*_{2} distributions is to be expected when an expanding population (here, the one from the Near East) disperses into scarcely populated areas (whose descendants are here represented by the Basque sample).

The *T*/*N*_{h} would be expected to show a geographical trend because of the change in population size as agriculture arrived. To test this effect, *T*/*N*_{h} values were randomly drawn from the *T*/*N*_{h} distributions and were regressed against geographical distance. Fig. 2*c* shows that the *T*/*N*_{h} values increase as distance from the Near East increases. In other words, drift was greater where the archaeological record suggests a later arrival of agriculture, in agreement with the idea that demographic growth started when food began to be produced. To obtain an absolute dating scale for this geographic trend, we have plotted calibrated radiocarbon dates (S. Shennan, personal communication) of the first arrival of agriculture in a number of populations across Europe. A good fit between the absolute dates and the *T*/*N*_{h} values is obtained when we assume a starting date around 10,000 years B.P. and an average rate of 1 km/yr, both figures being widely accepted (1, 6). These values are in agreement with our current knowledge of human history.

## Discussion

### Admixture and Drift.

Europe-wide gradients of allele frequencies have repeatedly been described since the early work of Ammerman and Cavalli-Sforza (1, 5, 12, 13, 28). They were originally interpreted as a consequence of the admixture between low-density local hunter-gatherers and the large numbers of newcoming farmers from the Near East.

A number of studies based on mtDNA have recently criticized this view and suggested that the Neolithic contribution could have been much smaller, perhaps around 15%. This assertion has generated a controversy (9, 11, 15, 29, 30) over the interpretation of mtDNA data. Y chromosome data, on the contrary, appeared to confirm previous work on nuclear genes (23, 31).

It therefore came as a surprise when Semino *et al.* (16) analyzed the largest set of NRY data at the time and proposed that Y chromosome data also favor limited contribution from Near Eastern farmers.

One basic reason for the discrepancy between our interpretation and that of Semino *et al*. is that they used only a subset of information from selected haplotypes. Such an approach could make inefficient use of the data, or introduce bias. Conversely, the likelihood calculations on which our method is based can take advantage of all of the information present in the allelic distributions, without preselection of any allele. For instance, haplotype Eu17 is observed twice in the Near Eastern and Calabrian samples and once in the Georgian, Greek, Andalusian, and Hungarian samples. Although Eu17 and other similar haplotypes are unlikely to show any *visible* spatial pattern such as those shown by Eu4, -9, -10, and -11, they may convey relevant information. A closer look at Semino *et al*.'s table 1 shows that 60% of non-empty cells are singletons, doublets, or triplets. The total frequency of these “rare” haplotypes represents approximately 17%. This calculation is given here as an illustration and should not be used to estimate how much information was lost because such a computation is not trivial.

A particular innovation of our approach is that it estimates the trend in the Neolithic contribution directly, rather than evaluating it indirectly from the clines in allele frequencies.

One of the most striking results was obtained for the Sardinian sample (Fig. 1). Semino *et al*.'s ordination of the haplotype frequencies showed the Sardinian sample clustering with Greek and Albanian samples, far removed from the Basque samples. That result appeared at odds with archaeological data that suggest a limited Neolithic immigration in Sardinia (e.g., ref. 32). Conversely, in Fig. 1*a*, Sardinia appears as an outlier with a significantly high proportion of Palaeolithic genes. This result suggests that the Y-chromosome differentiation observed between Basques and Sardinians today is due to drift from common Palaeolithic ancestors, with little input of genes from the Near East, rather than to a greater Neolithic immigration in Sardinia. This result shows the importance of separating drift from admixture in the analysis of ancient demographic events.

This result prompted us to carry out a reanalysis of the data by using the Sardinian sample to represent descendants of the Palaeolithic people instead of the Basques. Although this was not our original choice, it is consistent with the archaeological evidence and provides an interesting comparison. Indeed, any SE-NW geographical trend of *p*_{1} values could not be attributed to geographic proximity to Sardinia. This new analysis confirms and strengthens the results obtained with the Basque samples. The regression of the Neolithic contribution against geographic distance is very similar to that in Fig. 1*b* (not shown), and the proportions are again significantly higher (*P* < 0.001) than estimates of Semino *et al.* Indeed, the average value of the Neolithic contribution is actually higher, ≈65%.

Our analysis also showed differences between Mediterranean and non-Mediterranean samples, which are in agreement with archaeological evidence for an earlier development of farming communities along the Mediterranean shores and with mitochondrial studies suggesting a greater introgression of Near Eastern genes in Southern European populations (30).

It is worth stressing again that the analyses presented here rest on the use of Basques (or Sardinians) as descendants of Palaeolithic people. Because the Basques are likely to contain an unknown proportion of Neolithic genes, there is reason to believe that the Palaeolithic contribution has actually been overestimated, even though we cannot say by how much.

### Preadmixture Population Structure and Selection.

The existence of population structure in Europe before the arrival of farmers might influence our admixture estimates (and any other published estimates, in fact). For instance, it has been suggested that geographic diversity of mtDNA reflects population contractions and expansions, occurring in response to movements of the ice sheet, before and after the last glacial maximum, i.e., in the Mesolithic period. Whereas the relative importance of differentiation during this period is uncertain, it is very likely that hunter-gatherer populations exhibited some level of genetic differentiation, and this has to be accounted for.

As explained in *Materials and Methods*, our analysis does not require the allele frequencies among hunter-gatherers to be uniform across Europe. If the initial European populations resembled the ancestral Basques in some cases and were more differentiated in others, then this would generate different *T*/*N*_{1} estimates. The 17 *T*/*N*_{1} estimates shown in Fig. 2*a* are similar to each other, suggesting that the hunter-gatherers were not dramatically different compared with the amount of drift in the last 5,000 to 10,000 yr.

An additional point that needs to be considered is whether the observed patterns can be attributed to the action of natural selection. This is an important issue because this could mean that estimation of genetic admixture may not properly represent demographic admixture. In other words, a selective sweep might lead us to overestimate the overall demographic impact. Conversely, balancing selection would lead to underestimates of the demographic impact. Whereas the data from a single locus cannot exclude the possibility of selection, we do have background information from other studies that suggest that selection may not be a significant issue. First, assuming that drift has been more important in the last 10,000 yr for the Y chromosome is in line with most population-based studies and therefore should not bias our study more than previous ones tackling similar questions (16). Second, and more importantly, the patterns observed here are in agreement with those observed at a number of independent loci and are therefore most likely to reflect demographic rather than selective processes (5, 12). A recent review by Harpending and Rogers also indicates that selection is likely to act on other sets of loci (33). Finally, we have applied a similar approach to a set of mtDNA data, and the preliminary results indicate that similar trends are emerging. This finding again would argue for the signal of a demographic event.

### Implications for the DDM vs. CDM Controversy.

Our analysis clearly suggests that the contribution of genes from the Near East to Europe was substantial. The average values are 50% and 65% by using Basques and Sardinians as references, respectively, and these are likely to be underestimates. It is however important to realize that the CDM/DDM controversy is not directly and simply related to such average values. In particular, they do not represent the relative proportions of farmers and hunter-gatherers *during the initial formation* of settlements, but rather the proportion of genes that can be traced back to ancestors in the Near East. This very important distinction has been neglected in much of the recent literature, even though it was clearly made by Ammerman and Cavalli-Sforza (1).

By way of clarification, consider a simple “stepping-stone” model that assumes that admixture took place, across Europe, in the form of a series of steps, where farmers migrated to areas occupied by hunter-gatherers and mixed to create new communities of farmers. If we call *P*_{N} the proportion of farmers in the admixed populations, the Neolithic contribution in each location will decrease geometrically from *P*_{N} to *P*_{N}^{n}, where *n* is the number of steps or admixture events taking place as populations move toward Western Europe. When *n* is large, the Neolithic contribution will appear to decrease very quickly and then stay at low values across most of Europe. Indeed, farmers could contribute as much as 90% at each settlement, yet with *n* = 20 the westernmost populations will have only 12% of Neolithic genes, and the average contribution will be only 100 × (*P*_{N} + *P*_{N}^{2} + ⋯ + *P*_{N}^{n})/*n* ≈ 40%. Thus, high *P*_{N} values are required to maintain a cline across the whole of Europe and even low averages can correspond to high *P*_{N} values.

Although the model is clearly a simplification, and many different combinations of *P*_{N} and *n* are compatible with the data, it is instructive to look at the implications for extreme values. A minimum Neolithic contribution at each event can be found by fitting *P*_{N} and *n* values to the trend obtained in Fig. 1*b*, for low *n* values. In the extreme case of *n* = 3, it would require *P*_{N} ≈ 0.7 to explain the observed trend (mean and slope). Because the archaeological evidence suggests a much more gradual expansion across Europe (i.e., larger *n* values) as shown by the radiocarbon dates plotted in Fig. 2*c*, it appears that *P*_{N} must have been larger than 0.7. More reasonable values of *n* suggest *P*_{N} values between 0.8 and 0.95 (see legend of Fig. 1*b*). These values are in agreement with previous estimates obtained in simulation studies, showing that, to generate gradients similar to those observed in proteins, the genetic contribution of Neolithic farmers had to be between 65 and 100% (34, 35).
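The arithmetic of the stepping-stone model is easy to reproduce. The short sketch below evaluates the geometric decline for the two cases mentioned in the text (*P*_{N} = 0.9 with *n* = 20, and *P*_{N} = 0.7 with *n* = 3). The function name `stepping_stone` is hypothetical.

```python
def stepping_stone(p_n, n):
    """Neolithic contribution after each of n successive admixture steps."""
    return [p_n ** k for k in range(1, n + 1)]


for p_n, n in [(0.9, 20), (0.7, 3)]:
    contributions = stepping_stone(p_n, n)
    westernmost = contributions[-1]
    average = sum(contributions) / n
    print(f"P_N = {p_n}, n = {n}: westernmost = {westernmost:.2f}, average = {average:.2f}")
```

For the first case this gives roughly 12% in the westernmost population and an average near 40%, matching the figures quoted above; the second case gives an average close to the 50% found with the Basque reference.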

## Conclusion

In summary, our results provide direct estimates of the Neolithic contribution in Europe, and suggest that large movements of people accompanied the introduction of farming to Europe. Of course, farming practice may have spread concurrently by imitation and cultural transmission. Different processes are likely to have been important at different localities and at different times. Nevertheless, the broad picture produced by our method has led to diametrically different conclusions from the previous interpretations of the same data. We therefore argue that drawing inferences indirectly from the clines in haplotype frequency could be misleading. We suggest that data from other independent loci should be analyzed by using a similar approach to separate the effects of demography and selection. Our assessment of the demographic impact of the Neolithic expansion into Europe is largely independent from, but appears consistent with, archaeological evidence, simulations, and classical studies of allele frequencies. Despite some reports of its demise, the original model proposed by Ammerman and Cavalli-Sforza is more alive than ever.

## Acknowledgments

We are grateful to S. Aris-Brosou, T. Burland, D. Goldstein, R. Gray, S. Jones, S. Rossiter, S. Shennan, M. Trindade, Z. Yang, and G. Zampieri for reading and commenting on earlier versions of the manuscript and to anonymous reviewers for constructive and helpful criticisms. We also wish to thank S. Shennan for the archaeological data and S. Rossiter for the use of his computer to do many of the MCMC runs presented here, and Z. Yang for the use of the Linux cluster for the Sardinian analysis. L.C. was supported by a Small Natural Environment Research Council (NERC) grant (ref. GR9/04474, awarded to M.B, L.C, and R.N.) and by a Biotechnology and Biological Sciences Research Council (BBSRC) grant (31/G13580 attributed to Z. Yang, University College London, London). G.B. was supported by funds from the University of Ferrara.

## Footnotes

† To whom reprint requests should be addressed. E-mail: l.chikhi@ucl.ac.uk.

This paper was submitted directly (Track II) to the PNAS office.

## Abbreviations

DDM, demic diffusion model

CDM, cultural diffusion model

NRY, non-recombining region of the Y chromosome

MCMC, Markov chain Monte Carlo

- Received March 18, 2002.

- Copyright © 2002, The National Academy of Sciences

## References

- ↵
- Ammerman A. J.

- ↵
- ↵
- Zvelebil M.

- ↵
- Whittle A.

- ↵
- Menozzi P.

**,**786-792.pmid:356262 - ↵
- Cavalli-Sforza L. L.

- ↵
- ↵
- ↵
- Barbujani G.

**,**22-25.pmid:11136246- ↵
- ↵
- Chikhi L.

**,**9053-9058.pmid:9671803 - ↵
- Barbujani G.

- ↵
- Chikhi L.

- ↵
- Semino O.

*et al.*(2000) Science 290**,**1155-1159.pmid:11073453 - ↵
- Dupanloup I.

**,**672-675.pmid:11264419- ↵
- Chikhi L.

**,**1347-1362.pmid:11454781 - ↵
- ↵
- Stumpf M. P.

**,**1738-1742.pmid:11249819 - ↵
- ↵
- ↵
- Wilson J. F.

**,**5078-5083.pmid:11287634 - ↵
- ↵
- Hastings W. K.

**,**97-109. - ↵
- ↵
- ↵
- ↵
- ↵
- ↵
- Pinhasi R.

- ↵
- ↵
- ↵