Research Article

Distance from sub-Saharan Africa predicts mutational load in diverse human genomes

Brenna M. Henn, Laura R. Botigué, Stephan Peischl, Isabelle Dupanloup, Mikhail Lipatov, Brian K. Maples, Alicia R. Martin, Shaila Musharoff, Howard Cann, Michael P. Snyder, Laurent Excoffier, Jeffrey M. Kidd, and Carlos D. Bustamante
  a. Department of Ecology and Evolution, Stony Brook University, The State University of New York, Stony Brook, NY 11794;
  b. Institute of Ecology and Evolution, University of Berne, 3012 Berne, Switzerland;
  c. Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland;
  d. Interfaculty Bioinformatics Unit, University of Berne, 3012 Berne, Switzerland;
  e. Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305;
  f. Centre d’Etude du Polymorphisme Humain, Foundation Jean Dausset, 75010 Paris, France;
  g. Department of Human Genetics and Department of Computational Medicine and Bioinformatics, University of Michigan Medical School, Ann Arbor, MI 48109


PNAS January 26, 2016 113 (4) E440-E449; first published December 28, 2015; https://doi.org/10.1073/pnas.1510805112
Author affiliations (letters refer to the list above): Brenna M. Henn (a), Laura R. Botigué (a), Stephan Peischl (b, c, d), Isabelle Dupanloup (b), Mikhail Lipatov (a), Brian K. Maples (e), Alicia R. Martin (e), Shaila Musharoff (e), Howard Cann (f), Michael P. Snyder (e), Laurent Excoffier (b, c), Jeffrey M. Kidd (g), Carlos D. Bustamante (e)
For correspondence (Brenna M. Henn, Carlos D. Bustamante): brenna.henn@stonybrook.edu cdbustam@stanford.edu
Edited by Charles F. Aquadro, Cornell University, Ithaca, NY, and accepted by the Editorial Board November 13, 2015 (received for review June 9, 2015)


Significance

Human genomes carry hundreds of mutations that are predicted to be deleterious in some environments, potentially affecting the health or fitness of an individual. We characterize the distribution of deleterious mutations among diverse human populations, modeled under different selection coefficients and dominance parameters. Using a new dataset of diverse human genomes from seven different populations, we use spatially explicit simulations to reveal that classes of deleterious alleles have very different patterns across populations, reflecting the interaction between genetic drift and purifying selection. We show that there is a strong signal of purifying selection at conserved genomic positions within African populations, but most predicted deleterious mutations have evolved as if they were neutral during the expansion out of Africa.

Abstract

The Out-of-Africa (OOA) dispersal ∼50,000 y ago is characterized by a series of founder events as modern humans expanded into multiple continents. Population genetics theory predicts an increase of mutational load in populations undergoing serial founder effects during range expansions. To test this hypothesis, we have sequenced full genomes and high-coverage exomes from seven geographically divergent human populations from Namibia, Congo, Algeria, Pakistan, Cambodia, Siberia, and Mexico. We find that individual genomes vary modestly in the overall number of predicted deleterious alleles. We show via spatially explicit simulations that the observed distribution of deleterious allele frequencies is consistent with the OOA dispersal, particularly under a model where deleterious mutations are recessive. We conclude that there is a strong signal of purifying selection at conserved genomic positions within Africa, but that many predicted deleterious mutations have evolved as if they were neutral during the expansion out of Africa. Under a model where selection is inversely related to dominance, we show that OOA populations are likely to have a higher mutation load due to increased allele frequencies of nearly neutral variants that are recessive or partially recessive.

  • mutation
  • founder effect
  • range expansion
  • expansion load
  • purifying selection

It has long been recognized that a human genome may carry many strongly deleterious mutations; Morton et al. (1) estimated that each human carries on average four or five mutations that would have a “conspicuous effect on fitness” if expressed in a homozygous state. Empirically estimating the deleterious mutation burden is now feasible through next-generation sequencing (NGS) technology, which can assay the complete breadth of variants in a human genome. For example, recent sequencing of over 6,000 exomes revealed that nearly half of all surveyed individuals carried a likely pathogenic allele in a known Mendelian disease gene (i.e., from a disease panel used for newborn screening) (2). Although there is some variation across individuals in the number of deleterious alleles per genome, we still do not know whether there are significant differences in deleterious variation among populations. Human populations vary dramatically in their levels of neutral genetic diversity, which suggests variation in the effective population size, Ne. Theory suggests that the efficacy of natural selection is reduced in populations with lower Ne because they experience greater genetic drift (3, 4). In an idealized population of constant size, the efficacy of purifying selection depends on the relationship between Ne and the selection coefficient s against deleterious mutations. If 4Nes << 1, deleterious alleles evolve as if they were neutral and can, thus, reach appreciable frequencies. This theory raises the question of whether human populations carry differential burdens of deleterious alleles due to differences in demographic history.
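The threshold argument above can be illustrated with a minimal sketch. The function names, the cutoff of 1, and the example values of Ne and s are assumptions chosen for illustration; the text only states the condition 4Nes << 1, so a hard cutoff is a simplification.

```python
def scaled_selection(n_e: float, s: float) -> float:
    """Population-scaled selection coefficient, 4 * Ne * s."""
    return 4.0 * n_e * s


def is_effectively_neutral(n_e: float, s: float, cutoff: float = 1.0) -> bool:
    """Crude check of the 4*Ne*s << 1 condition.

    The cutoff of 1.0 is an illustrative assumption; the text only says
    'much less than 1', so values near the cutoff are ambiguous.
    """
    return scaled_selection(n_e, s) < cutoff


# Hypothetical values: the same allele (s = 1e-4) in a large and a small population.
for n_e in (10_000, 1_000):
    print(n_e, scaled_selection(n_e, 1e-4), is_effectively_neutral(n_e, 1e-4))
```

With these illustrative numbers, the allele is visible to selection in the larger population but behaves as nearly neutral in the smaller one.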

Several recent papers have tested for differences in the burden of deleterious alleles among populations; these papers have focused primarily on comparing populations of western European and western African ancestry. Despite similar genomic datasets, these papers have reached a variety of contradictory conclusions (4–9). Initially, Lohmueller et al. (10) found that a panel of European Americans carried proportionally more derived, deleterious alleles than a panel of African Americans, potentially as the result of the Out-of-Africa (OOA) bottleneck. More recently, analyses using NGS exome datasets from samples of analogous continental ancestry found small or no differences in the average number of deleterious alleles per genome between African Americans and European Americans, depending on which prediction algorithm was used (11–13). Simulations by Fu et al. (11) found that strong bottlenecks with recovery could recapitulate patterns of differences in the number of deleterious alleles between African and non-African populations, supporting Lohmueller et al. (10), but in contrast to work by Simons et al. (12).

It is important to note two facts about these contradictory observations. First, these papers tend to use different statistics, which differ in their power to detect changes across populations and the impact of recent demographic history (6, 11). Lohmueller et al. (10) compared the relative number of nonsynonymous to synonymous (or “probably damaging” to “benign”) SNPs per population in a sample of n chromosomes, whereas Simons et al. (12) examined the special case of n = 2 chromosomes, namely, the average number of predicted deleterious alleles per genome (i.e., heterozygous + 2 × homozygous derived variants per genome). One way to think about these statistics is that the total number of variants, S, gives equal weight, w = 1, to an SNP regardless of its frequency, p. The average number of deleterious variants statistic gives weights proportional to the expected heterozygous and homozygous frequencies or w = 2p(1 − p) + p² = 2p − p². The average number of deleterious alleles per genome is fairly insensitive to differences in demographic history because heterozygosity is biased toward common variants. In contrast, the proportion of deleterious alleles has greater power to detect the impact of recent demographic history for large n across the populations because it is sensitive to rare variants that tend to be more numerous, younger, and enriched for functionally important mutations (14–16). Second, empirical comparisons between two populations have focused primarily on an additive model for deleterious mutations, even though there is evidence for pathogenic mutations exhibiting a recessive or dominant effect (17, 18), and possibly an inverse relationship between the strength of selection s and the dominance parameter h (19).
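A short sketch of the two weighting schemes described above may help show why the per-genome statistic is dominated by common variants. The function names and the example frequencies are illustrative assumptions; only the weights w = 1 and w = 2p(1 − p) + p² come from the text.

```python
def weight_total_variants(p: float) -> float:
    """Weight given to an SNP by the total-variant count S: always 1."""
    return 1.0


def weight_per_genome(p: float) -> float:
    """Weight from expected heterozygous plus homozygous frequencies."""
    return 2.0 * p * (1.0 - p) + p * p


# Hypothetical derived allele frequencies, from rare to common.
for p in (0.01, 0.10, 0.50, 0.90):
    print(f"p={p:.2f}  S weight={weight_total_variants(p):.2f}  "
          f"per-genome weight={weight_per_genome(p):.4f}")
```

Under these weights a variant at 1% frequency contributes roughly forty times less to the per-genome statistic than a variant at 50%, whereas both count equally toward S.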

There remains substantial conceptual and empirical uncertainty surrounding the processes that shape the distribution of deleterious variation across human populations. We aim here to clarify three aspects underlying this controversy: (i) Are there empirical differences in the total number of deleterious alleles among multiple human populations? (ii) Which model of dominance is appropriate for deleterious alleles (i.e., should zygosity be considered in load calculations)? (iii) Are the observed patterns consistent with predictions from models of range expansions accompanied by founder effects? We address these questions with a new genomic dataset of seven globally distributed human populations.

Results

Population History and Global Patterns of Genetic Diversity.

We obtained moderate coverage whole-genome sequence (median depth 7×) and high coverage exome sequence data (median depth 78×) from individuals from seven populations from the Human Genome Diversity Panel (HGDP) (20). Unrelated individuals (no relationship closer than first cousin) were selected from seven populations chosen to represent the spectrum of human genetic variation from throughout Africa and the OOA expansion, including individuals from the Namibian San, Mbuti Pygmy (Democratic Republic of Congo), Algerian Mozabite, Pakistani Pathan, Cambodian, Siberian Yakut, and Mexican Mayan populations (Fig. 1A). The 2.48-Gb full genome callset consisted of 14,776,723 single nucleotide autosomal variants, for which we could orient 97% to ancestral/derived allele status (SI Appendix).

Fig. 1.

Decrease in heterozygosity and estimated Ne with distance from southern Africa. (A) Locations of HGDP populations sampled for genome and exome sequencing are indicated on the map. Putative migration paths after the origin of modern humans are indicated with arrows (adapted from ref. 46). (B) PSMC curves for individual genomes, corrected for differences in coverage. Whereas populations experiencing an OOA bottleneck have substantially reduced Ne, African populations also display a reduction in Ne between ∼100 kya and 30 kya (see SI Appendix for simulations of population history with resulting PSMC curves). (C) For each individual’s exome, the number of putatively deleterious variants (equivalent to number of heterozygotes + twice the number of derived homozygotes) is shown by population.

Heterozygosity among the seven populations decreases with distance from southern Africa, consistent with an expansion of humans from that region (21). The Namibian San population carried the highest number of derived heterozygotes, ∼2.39 million per sample, followed closely by the Mbuti Pygmies (SI Appendix, Table S1 and Fig. S5). The North African Mozabites carry more heterozygotes than the OOA populations in our dataset (2 million) but substantially fewer than the sub-Saharan samples, likely reflecting a complex history of an OOA migration, followed by reentry into North Africa and subsequent recent gene flow with neighboring African populations (22). The Maya have the lowest median number of heterozygotes in our sample, ∼1.5 million, which may be inflated due to recent European admixture (23). Two Mayan individuals displayed substantial recent European admixture (>20%) as assessed with local ancestry assignment (24) (SI Appendix, Fig. S6); these individuals were removed from analyses of deleterious variants. When we recalculated heterozygosity in the Maya, it was reduced by 3.5%. The decline in heterozygosity in OOA populations with distance from Africa strongly supports earlier results based on SNP array and microsatellite data for a serial founder effect model for the OOA dispersal (25, 26). We analyzed population history for individuals having sufficient coverage from five of the studied populations using the pairwise sequential Markovian coalescent software (PSMC) to estimate changes in Ne (11, 12, 27). Because dating demographic events with PSMC is dependent on both the assumed mutation rate and the precision with which a given event can be inferred, we compare relative bottleneck magnitudes and timing among the seven HGDP populations. Consistent with previous analyses (27), the OOA populations show a sharp reduction in Ne, with virtually identical population histories (Fig. 1B and SI Appendix). 
Simulations indicate that the magnitude of the 12-fold bottleneck is accurately estimated (SI Appendix, Fig. S7), even if the time of the presumed bottleneck is difficult to estimate precisely using PSMC. Interestingly, both the Mbuti and the Namibian San show a moderate reduction in Ne relative to the ancestral maximum, with the San experiencing an almost twofold reduction in Ne and the Mbuti displaying a reduction intermediate between the San and OOA populations (see also refs. 20, 28, and 29). These patterns are consistent with multiple population histories (e.g., both short and long bottlenecks) and multiple demographic events, including a reduction in substructure from the ancestral human population rather than a bottleneck per se (27).

Differences in Deleterious Alleles per Individual Genomes.

Owing to differences in coverage among the whole genome sequences, our subsequent analyses focus on the high-coverage exome dataset (78× median coverage) to minimize any bias in comparing populations (Materials and Methods). We classified all mutations discovered in the exome dataset into categories based on Genomic Evolutionary Rate Profiling (GERP) Rejected Substitution (RS) scores. These conservation scores reflect various levels of constraint within a mammalian phylogeny (Materials and Methods) and are used to categorize mutations by their predicted deleterious effect (30, 31). Importantly, the allele present in the human reference genome was not used in the GERP RS calculation, avoiding the reference-bias effect previously observed in other algorithms (11, 12) (SI Appendix, Fig. S8A). Variants were sorted into four groups reflecting the likely severity of mutational effects: “neutral” (−2 < GERP < 2), “moderate” (2 ≤ GERP < 4), “large” (4 ≤ GERP < 6), and “extreme” (GERP ≥ 6) (SI Appendix, Fig. S9). GERP categories were concordant with ANNOVAR functional annotations (SI Appendix, Table S2 and Fig. S8B).
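The four GERP bins above can be written as a small lookup, sketched here under the assumption that scores of −2 or below are left unclassified because the text does not assign them to a category. The function name is hypothetical.

```python
from typing import Optional


def gerp_category(rs_score: float) -> Optional[str]:
    """Map a GERP RS score to the effect category listed in the text."""
    if rs_score >= 6:
        return "extreme"
    if rs_score >= 4:
        return "large"
    if rs_score >= 2:
        return "moderate"
    if rs_score > -2:
        return "neutral"
    # Scores of -2 or below fall outside the four bins given in the text.
    return None


print([gerp_category(x) for x in (-3.0, 0.5, 2.0, 4.7, 6.1)])
```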

When considering the total number of derived alleles per individual, defined here as AI = (1 × HET) + (2 × HOMder), we observe an increase of predicted deleterious alleles with distance from Africa (Fig. 1C). The number of predicted deleterious alleles per individual increases along the range expansion axis (from San to Maya), consistent with theoretical predictions for expansion load (32). The maximal difference in the number of deleterious alleles between African and OOA individuals is ∼150 alleles. This result is consistent with theoretical predictions; the rate at which deleterious mutations accumulate in wave-front populations is limited by the total number of mutations occurring during the expansion (32). Assuming an exomic mutation rate of u = 0.5 per haploid exome and an expansion that lasted for t = 1,000 generations, a very conservative upper limit for the excess of deleterious alleles in OOA individuals would be 2*u*t = 1,000. The cline in AI is most pronounced for large-effect alleles (4 ≤ GERP < 6, Fig. 2E), whereby the San individuals carry AI = 4,450 large-effect alleles on average, increasing gradually to 4,550 in Yakut. The Mayans carry slightly fewer large-effect mutations per individual than the Yakut, which may be influenced by the residual European ancestry (between 5–20%) in our sample. For extreme alleles (GERP ≥ 6), each individual in the dataset carries on average 110–120 predicted highly deleterious alleles with no significant differences among populations (Fig. 2F). The average additive GERP score—obtained by counting the GERP scores at homozygous sites twice—for all predicted deleterious variants per individual is lowest in the San (∼3.3) and highest in the Maya (∼3.8).
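The per-individual count and the upper bound quoted above reduce to simple arithmetic, sketched below. Encoding genotypes as derived-allele counts (0, 1, or 2) is an assumption made for illustration, as are the function names.

```python
from typing import Iterable


def alleles_per_individual(genotypes: Iterable[int]) -> int:
    """A_I = (1 x HET) + (2 x HOMder) for one individual.

    Each genotype is the number of derived alleles at a site: 0, 1, or 2.
    """
    genotypes = list(genotypes)
    het = sum(1 for g in genotypes if g == 1)
    hom_der = sum(1 for g in genotypes if g == 2)
    return 1 * het + 2 * hom_der


def expansion_excess_upper_bound(u: float, t: float) -> float:
    """Upper limit 2*u*t on the excess of deleterious alleles per diploid."""
    return 2.0 * u * t


print(alleles_per_individual([0, 1, 2, 2, 1]))
print(expansion_excess_upper_bound(u=0.5, t=1000))
```

With the values given in the text (u = 0.5, t = 1,000), the bound evaluates to 1,000, well above the observed maximal difference of about 150 alleles.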

Fig. 2.

Individual counts of deleterious variants. (A–C) For each individual’s exome, the number of derived homozygotes is plotted by population for moderate-, large-, and extreme-effect GERP categories. (D–F) For each individual’s exome, the number of derived variants (equivalent to number of heterozygotes + twice the number of homozygotes) is plotted by population for moderate-, large-, and extreme-effect GERP categories.

Similar patterns are found when we consider the number of derived homozygous sites per individual. We find that individuals from OOA populations exhibit significantly more homozygotes for moderate, large, and extreme variants than African populations (Fig. 2 A–C). In addition, we observe a clear increase in the number of derived homozygotes with distance from Africa for the moderate (2 ≤ GERP < 4) and large (4 ≤ GERP < 6) mutation effect categories, whereas the number of derived “extreme” homozygotes (GERP ≥ 6) is similar among OOA populations: All OOA genomes possess 30–40 extremely deleterious alleles in the homozygous state (Fig. 2C). These patterns are in excellent agreement with theoretical predictions for the evolution of genetic variation during range expansions (7). The average GERP score per individual for derived homozygous variants is less differentiated than under the additive model (above), ranging from 2.43 to 2.49.

It is important to note that AI is strongly influenced by common variants. Goode et al. (33) observed that as much as 90% of deleterious alleles in a single genome have a derived allele frequency greater than 5%, suggesting that the bulk of mutational burden using this metric will come from common variants. To explore this idea, we randomly chose an individual in each population and calculated the proportion of deleterious variants that are rare (<10%, i.e., a singleton within our population samples) and common (>10%), for each GERP category (Fig. 3A). Common deleterious alleles contribute to more than 90% of an individual’s AI, and the proportion of common deleterious variants increases with distance from Africa, as can be seen by the decrease of rare deleterious variants. This includes common large-effect variants, which make up proportionally more of AI for an OOA individual than for an African individual. For example, in a Mayan individual, 93% of large-effect variants are common compared with a San individual, where only 85% of large-effect variants are common (SI Appendix, Fig. S12). Given the small number of chromosomes in each population (n = 14–16), estimates of allele frequencies are subject to sampling effects. We recently performed the same analysis on exome data from the 1000 Genome Phase 1 Project (34). We find a similar pattern as in our HGDP data: On a per-genome basis, common variants represent a majority of the alleles predicted to be deleterious (5).

Fig. 3.

Differences in the proportion of deleterious alleles by frequency class. (A) The proportion of rare versus common deleterious variants per individual. For a given individual, deleterious variants were divided into common (>10%, solid colors) and rare (<10%, white space). The contribution of common deleterious variants to an individual’s burden is much greater than rare variants. (B) For each population, we calculated the proportional site frequency spectrum by plotting the proportion of deleterious large-effect alleles in each frequency class (translucent coloring) along with the proportion of neutral alleles for each frequency class (opaque coloring). African populations have proportionally fewer rare deleterious alleles than expected from neutrality. Populations with OOA ancestry have proportionally more fixed deleterious mutations.

Differences in Deleterious Alleles at the Population Level.

To further elucidate the relationship between predicted mutation effect and allele frequencies, we compared the site frequency spectrum (SFS) for neutral and large- (4 ≤ GERP < 6) effect variants (Fig. 3B; see SI Appendix, Fig. S14 for a comparison between neutral and extreme variants). For all populations, singletons are enriched for deleterious variants (compared with neutral variants), consistent with the effect of purifying selection against deleterious variants (15, 35). However, the SFSs of OOA and African populations show marked differences. The neutral and deleterious SFSs of OOA populations show a global shift toward higher frequencies, consistent with the effects of serial bottlenecks/founder effects. It follows that OOA populations have fewer rare deleterious variants than Africans, as well as a larger proportion of fixed deleterious alleles; almost 7.9% of large-effect variants are fixed in the Maya, whereas the San have only 1.8% of deleterious variants fixed (Fig. 3B).
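As a rough illustration of the proportional site frequency spectrum discussed above, the sketch below tabulates the share of sites in each derived-allele count class, with the last class corresponding to fixed derived alleles. The input format (one derived-allele count per site) and the toy data are assumptions, not the authors' pipeline.

```python
from collections import Counter
from typing import List, Sequence


def proportional_sfs(derived_counts: Sequence[int], n_chromosomes: int) -> List[float]:
    """Proportion of sites with k derived alleles, for k = 1..n_chromosomes.

    Sites with zero derived alleles in the sample are ignored. The final
    entry is the proportion of sites fixed for the derived allele.
    """
    counts = Counter(c for c in derived_counts if 1 <= c <= n_chromosomes)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * n_chromosomes
    return [counts.get(k, 0) / total for k in range(1, n_chromosomes + 1)]


# Toy data for a sample of 4 chromosomes (hypothetical counts).
sfs = proportional_sfs([1, 1, 2, 4, 4, 0], n_chromosomes=4)
print("singleton share:", sfs[0], "fixed share:", sfs[-1])
```

Computing this separately for neutral and large-effect sites in each population would give the two spectra being compared in the text.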

Simulations of Purifying Selection Under a Range Expansion.

We sought to interpret the population-specific patterns of genetic diversity for each GERP category under a model including serial founder effects across geographic space and purifying selection. We simulated the evolution of both neutral and deleterious mutations under a simple model of range expansion in a 2D habitat (SI Appendix, Fig. S21). At selected loci, the ancestral allele was assumed selectively neutral and a mutant allele reduced an individual’s fitness by a factor 1 − s only if it was present in the homozygous state, that is, deleterious mutations were assumed to be completely recessive. Three thousand generations (corresponding to about 75 kya) after the onset of the range expansion, we computed the average expected heterozygosity for all populations. Computational limitations of individual-based simulations prohibit a complete exploration of the parameter space for this model, but, by varying migration rates and selection coefficients, we identified parameter values that fit the observed clines in heterozygosity reasonably well (Fig. 4B). Specifically, we first identified selection coefficients that yield the same relative differences between observed neutral and selected heterozygosities (Fig. 4A). Then, the migration rate was adjusted to fit the observed clines in heterozygosities, assuming that the distance between two demes is 250 km (Fig. 4B). The fitted selection coefficients were 0, 1.25 × 10⁻⁴, 1 × 10⁻³, and 2 × 10⁻³ for the neutral, moderate, large, and extreme GERP score categories, respectively; the GERP ≥ 6 category showed the worst fit and observed counts indicate that even stronger selection coefficients should be considered for these extreme mutations (16). We performed the same analysis using a model in which mutations are codominant and, as expected, we found that the fitted selection coefficients are smaller than those obtained under a recessive model. These coefficients are estimated as s = 0, 0.5 × 10⁻⁴, 1.2 × 10⁻⁴, and 2 × 10⁻⁴, respectively (SI Appendix, Fig. S16) (16).

Fig. 4.

Heterozygosity under range expansion simulations with different selection coefficients. (A) Observed and simulated patterns of the reduction of heterozygosity (RH). Selection coefficients used in the simulations are s = 0 (black), s = −0.000125 (lavender), s = −0.001 (red), and s = −0.002 (orange). (B) Colored circles show average expected heterozygosity for populations with ancestry from the OOA bottleneck. Solid lines show the regression lines obtained from simulations and dashed lines indicate 95% confidence intervals for the regression. The boxplots and colored circles on the left show the simulated heterozygosities in ancestral (i.e., African) populations, and the observed heterozygosity in our African dataset (San/Mbuti), respectively. (C) Comparison of the distribution of RH between African and non-African individuals for different GERP categories, tested with a two-tailed Student t test (SI Appendix, Fig. S15).

Evolutionary Forces Acting on Heterozygosity.

To better understand which evolutionary forces have acted in different populations to shape their levels of genetic diversity, we define a new statistic, RH. RH measures the reduction in heterozygosity at conserved sites relative to neutral heterozygosity, RH = (Hneu − Hdel)/Hneu, where Hneu indicates heterozygosity at neutral sites and Hdel at GERP score categories >2. RH can be seen as a way to quantify changes of functional diversity across populations relative to neutral expectations. For instance, a constant RH value across populations would suggest that average functional diversity is determined by the same evolutionary force(s) as neutral diversity, that is, genetic drift and migration. In contrast, if RH changes across populations, it suggests that different evolutionary forces have shaped neutral and functional diversity, that is, selection has changed functional allele frequencies.
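The RH statistic defined above is straightforward to compute once the two heterozygosities are in hand. In the sketch below, heterozygosity is taken as the mean of 2p(1 − p) across sites, which is an assumption for illustration; the function names and the toy allele frequencies are also hypothetical.

```python
from typing import Sequence


def expected_heterozygosity(freqs: Sequence[float]) -> float:
    """Mean expected heterozygosity, 2p(1 - p), across sites (assumed estimator)."""
    return sum(2.0 * p * (1.0 - p) for p in freqs) / len(freqs)


def reduction_in_heterozygosity(h_neu: float, h_del: float) -> float:
    """RH = (Hneu - Hdel) / Hneu."""
    return (h_neu - h_del) / h_neu


# Toy derived allele frequencies at neutral and conserved sites.
h_neu = expected_heterozygosity([0.5, 0.3, 0.2])
h_del = expected_heterozygosity([0.2, 0.1, 0.05])
print(reduction_in_heterozygosity(h_neu, h_del))
```

A value near zero would indicate that conserved sites are about as diverse as neutral sites, while larger values indicate a stronger relative reduction at conserved sites.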

In our dataset, RH is significantly larger in sub-Saharan Africans than in OOA populations across all functional GERP categories (Fig. 4C), indicating that selection has acted differently relative to drift between the two groups. The correlation between RH value and predicted mutation effect observed in Africa (Fig. 4A) confirms that purifying selection has kept strongly deleterious alleles at lower frequencies than in OOA populations. We then asked whether there were significant differences across OOA populations, as oriented by their distance from eastern Africa. Interestingly, we see that the OOA RH values do not depend on their distance from Africa for predicted moderate-effect alleles (P = 0.82; SI Appendix, Fig. S15), suggesting that the frequencies of moderate mutations have evolved mainly according to neutral demographic processes during the range expansion out of Africa. In contrast, for strongly deleterious variants (large and extreme GERP categories) we see a significant cline in RH (P = 0.01 and P = 1.12 × 10⁻⁶, respectively; SI Appendix, Fig. S15), which implies that purifying selection has also contributed to their evolution relative to demographic processes.

Models of Dominance.

We next considered whether there is empirical evidence for nonadditive effects for deleterious variants. Prior studies generally calculated “mutation load” by assuming an additive model, summing the number of deleterious alleles per individual, without factoring in whether an SNP occurs in a homozygous or heterozygous state. Determining an individual’s mutation load is, however, highly dependent on the underlying model of dominance (36) (a formal definition of mutation load is given below). For humans, Mendelian diseases tend to be overrepresented in endogamous populations or consanguineous pairings, indicating that many of these mutations are recessive (37); Gao et al. (38) estimate 0.58 lethal recessive mutations per diploid genome in the Hutterite population. Gene conversion can also lead to a differential burden of derived, recessive disease alleles among populations (39). Even height, a largely quantitative trait, seems to be affected by the architecture of recessive homozygous alleles in different populations (40).

To further clarify the impact of dominance, we compared the distribution of deleterious variants across genes associated with dominant or recessive disease as reported in Online Mendelian Inheritance in Man (OMIM) (41). We expect to see a lower proportion of large- and extreme-effect variants in genes with dominant OMIM mutation annotations, compared with genes with recessive OMIM mutation annotations. We tested this hypothesis with the HGDP as well as the much larger 1000 Genomes Phase 1 dataset (SI Appendix, Fig. S18B). We averaged the proportion of variants within each effect category and performed a Wilcoxon test to determine whether the distribution of the proportion of large-effect variants was different between dominant and recessive genes. In the HGDP dataset, we observed P = 0.06, and for the larger 1000 Genomes dataset, P = 0.03. Our results indeed show a significantly higher proportion of large-effect variants in genes with recessive annotations, compared with genes with dominant annotations, suggesting that deleterious variants in the genome may tend to be recessive. However, we caution that OMIM genes are here annotated as dominant or recessive, whereas dominance is a property of specific mutations, and therefore all deleterious variants in a gene will not necessarily have the same dominance coefficient. Nonetheless, our results are consistent with an interpretation that genes may have certain properties, for example negative selection against dominant mutations in crucial housekeeping or developmental genes, that influence the tolerable distribution of dominance among variants. We consider the effect of dominance (summarized by h, which measures the effect of selected mutations in heterozygotes relative to homozygotes) on mutation load in the HGDP population samples given the observed differences in heterozygosity.

Modeling the Burden of Deleterious Alleles.

We modeled three different scenarios to estimate the burden of deleterious alleles across populations. The relationship between fitness W and load for a given locus v is classically defined (36) as $L_v = 1 - W = 1 - (1 - W_{het} - W_{hom})$, where $W_{het} = g_{Aa} \times (1 - hs)$ and $W_{hom} = g_{aa} \times (1 - s)$, and $g_{Aa}$ and $g_{aa}$ are the observed genotype frequencies of the heterozygotes and derived homozygotes, respectively. The estimated population load (ignoring epistasis) is the sum of the load for all variants: $L_T = \sum_v L_v$. For each variant we assigned the selection coefficient inferred by the range expansion simulations according to its GERP score [see also Henn et al. (5)]. Given that we do not know the distribution of dominance effects in human variation, we started by estimating the bounds for the mutation load for each population by considering two extreme scenarios: completely recessive and completely additive models for deleterious variants. We calculated $L_T$ for each HGDP population (Fig. 5). When all mutations are considered strictly additive (h = 0.5), values for mutation load are very similar across populations, with sub-Saharan African populations having the lowest mutational load ($L_T$ = 2.83), followed by the Pathan and Mozabites, and finally the Asian and Native American populations showing the highest load ($L_T$ = 2.89) (Fig. 5B). We consider this model, as adopted in earlier studies, to demonstrate that even under an additive assumption there is a statistically significant 1.7% difference in the spectrum of load between populations (SI Appendix, Fig. S24). When all mutations are considered recessive (h = 0), this model yields a much larger 45% difference in load ($L_T$ ranges between 1.27 and 1.85) between the San and the Maya (Fig. 5A). Although this is surely an overestimate, it illustrates the broad range of potential values and consistent signal in the data for differences among populations in estimated load.
The mutation load under a recessive model is not explained by inbreeding, as measured by the cumulative amount of the genome in runs of homozygosity (cROH) greater than 1 Mb (r = 0.27, P = 0.55) (SI Appendix, Fig. S25); this is because the African hunter–gatherers have relatively high cROH compared with other global populations, as is commonly observed in small endogamous populations (21, 42).

Fig. 5.

Estimates of mutational load in seven populations as a function of dominance assumptions. Total mutation load was summed over all annotated mutations in the exome dataset for the observed heterozygote and derived homozygote genotype frequencies in each population. The cumulative mutational load is shown in increasing order from neutral to extremely deleterious mutations. Strongly deleterious mutations contribute the most to mutational load. Mutations were assigned a selection coefficient, s, based on their GERP score. (A) h = 0, recessive model; (B) h = 0.5, additive model; (C) h(s), intermediate dominance model. For each selection coefficient, an h dominance coefficient was assigned based on the inverse relationship between s and h.

For the third scenario we used a model based on studies of dominance in yeast and Drosophila (19, 43, 44), in which there is an inverse relationship between selection and dominance (highly deleterious mutations tend to be recessive), and where h is sampled from a distribution following Agrawal and Whitlock (19). The maximal difference in load under this model was 30.8% (Fig. 5C), again between the San and Maya, and the minimum difference in load was 1%, between the Cambodians and Yakut. We note that the difference in relative fitness [$e^{-L_T}$] is much less than the difference in mutation load (i.e., a relative reduction of 79% in the San versus 87% in the Maya translates to an 8% difference between the two populations under the h(s) model; see also Discussion). As in the other modeled dominance scenarios, the majority of calculated mutational load is contributed by the large-effect mutational category, because this category has a relatively strong selection coefficient and thousands of mutations (>4,000 on average per individual). Thus, this category contributes proportionally more to the total load, even though the extreme-effect mutations have a higher selection coefficient. We note, however, that our assumed selection coefficients, particularly for the extreme effect, are somewhat lower than those obtained by other studies of the distribution of fitness effects (16, 45), and simulations under an additive model result in even smaller selection coefficients (discussed above). Because selection coefficients are the same across populations in our calculations, s will affect the absolute value of load but not relative differences across populations.

Discussion

Two primary demographic signals are reflected in human genetic data from non-African populations. First, a major 5- to 10-fold population bottleneck is associated with the OOA dispersal(s) (46–48). Second, the distribution of genetic diversity among non-African populations is characterized by a decrease in heterozygosity proportional to geographic distance from northeastern Africa. A model of serial founder effects in the ancestral populations of Eurasia, Oceania, and the Americas has been posited as the most likely model for explaining the systematic variation in genetic diversity across this geographic range for humans (25, 26), as well as commensal human species (49, 50). By directly ascertaining genomic variation in over 50 individuals from seven populations, we observe a clear cline of genetic diversity as a function of distance from Africa, supporting evidence for a serial founder effect model. We also observe differences in the amount of predicted deleterious variation across populations. These differences seem to result from the genetic drift of existing deleterious variants to higher frequencies during the sequential range expansion after the OOA exit (Fig. 3B). Clines in heterozygosity for the different mutational effect categories can be reproduced by spatially explicit simulations with negative selection and recessive mutations (Fig. 4; see also codominant simulations in SI Appendix, Fig. S16). Although both moderate- and large-effect deleterious mutations have evolved under negative selection in Africa (Fig. 4C and SI Appendix, Fig. S15), many predicted moderate variants have evolved as if they were neutral in non-African populations. However, selection has remained a major force during the OOA expansion for strongly deleterious variants.

Impact of the OOA Bottleneck.

There is an ongoing debate on whether selection has been equally or more efficient in African versus non-African populations due to the major bottleneck that occurred in the ancestors of OOA populations (10, 12, 13, 35). Two studies found no significant differences in mutation load between European Americans and African Americans under an additive model with two classes of alleles: deleterious and neutral (12, 13, 33). Fu et al. (11) identified small but significant differences in the average number of alleles and the SFS, potentially due to a different algorithm for predicting mutation effect than earlier studies. We argue that estimates of the efficacy of selection should take into account not only the number of mutations per individual but also the predicted severity of mutational effect. Here, we classify mutations into four categories and find differences across populations in some, but not all, mutational categories. For variants that have putatively moderate (2 ≤ GERP < 4) or extreme deleterious effect (GERP ≥ 6), we do not see a significant difference between African and non-African populations in the number of mutations per individual. Significant per-individual differences are only observed for the intermediate large-effect category. We used PhyloP scores (51) as an alternative measure of conservation to verify our main results (SI Appendix, Fig. S26). We found qualitatively very similar patterns for both the spatial distribution of the number of derived homozygous sites per individual (SI Appendix, Fig. S26A) as well as the number of derived alleles per individual, suggesting that our results are robust to the choice of prediction algorithm that is used to estimate deleteriousness of mutations.

We note that the observed differences between populations are relatively small compared with the within-population variance (Fig. 2). Nonetheless, a novel measure of the efficacy of selection, RH, is significantly different across all three mutational categories (Fig. 4C and SI Appendix, Fig. S15) between sub-Saharan Africans and non-Africans in our dataset. That is, the observed heterozygosity at deleterious loci is greater in non-Africans than in Africans after correcting for neutral genetic diversity in each group. This is particularly significant for moderate- and large-effect mutations, in agreement with theory that would suggest that differences in purifying selection will primarily emerge for variants at the $N_e s$ boundary.

Serial Founder Effects/Range Expansion.

Several simulation studies have attempted to characterize the distribution of deleterious alleles under OOA demographic scenarios. Some simulations focused on differences in the cumulative number of deleterious alleles per individual; others focused on differences in the proportion of segregating alleles within a population that are deleterious. Lohmueller et al. (10) found that a long bottleneck lasting more than 7,500 generations (>150,000 y) could produce the excess proportion of deleterious mutations observed in European Americans. A bottleneck model with subsequent explosive growth has also been proposed to explain the proportionally greater number of nonsynonymous or deleterious mutations in Eurasian populations (52, 53). As a consequence, deleterious mutations accumulate in populations during the expansion process. Simons et al. (12) tested a long bottleneck and subsequent population expansion model contrasting African and non-African populations and found no evidence that human demography played a role in the differential accumulation of deleterious alleles per individual.

A recent theoretical study of spatial range expansions (i.e., a model similar to geographic serial founder effects) showed that strong genetic drift at the wave front of expanding populations decreases the efficiency of selection (32). Under a spatial range expansion model, deleterious variants, unless they have a large selection coefficient, should evolve as if they are neutral on the wave front (32), and their overall frequency should therefore not change much during the range expansion (7). The loss of deleterious variants at some loci should be compensated by an increase of their frequencies at other loci. The frequency of deleterious homozygotes should therefore increase with distance from Africa, which is observed here in the rightward shift of the SFS in OOA populations (Fig. 3), except for the most evolutionarily constrained sites. We can address the question of whether this increased frequency is driven entirely by drift and gene surfing or by differential selection in non-African populations by considering the spatial distribution of the RH statistic (Fig. 4C). The fact that RH does not change among OOA populations for moderately deleterious alleles suggests that they have evolved as if they were neutral alleles during the expansion and that selection has not yet purged the deleterious mutations that increased in frequency. In contrast, extremely deleterious alleles (GERP ≥ 6) exhibit similar heterozygosity in all OOA populations, suggesting that they are subject to similar levels of purifying selection in these populations. The remaining deleterious alleles (4 ≤ GERP < 6) present an intermediate pattern, implying that both drift and selection have acted on this category of sites.

A recent controversy concerns whether there are differences in the efficacy of purifying selection between African and non-African populations (6, 12, 13). It is difficult to discuss our results in the context of this controversy because there is no generally accepted definition of “efficacy of selection,” and different definitions will lead to different interpretations (4). We therefore prefer to interpret our results in the context of our spatially explicit model of range expansions, and the relative roles of drift and selection in this model. Recurrent founder events should contribute to a decrease in the effective population sizes with distance from Africa, and it is commonly assumed that selection will become weaker with smaller effective population sizes. However, reducing the impact of a range expansion to a simple gradient in effective size, and thus to a decrease of the efficacy of selection, can be misleading. Diversity-based estimates of Ne are not necessarily informative about the strength of selection in nonequilibrium scenarios because estimates of Ne may lag behind recent demographic changes (e.g., ref. 54). Rather, if one considers that deleterious alleles were kept at low frequencies by purifying selection in ancestral African populations, those that increased in frequency by gene surfing during the OOA expansion also became more accessible to subsequent selection, especially for those alleles that were recessive. The observed cline in RH for large-effect mutations is more compatible with an unequal purging of deleterious variants by selection. Indeed, selection will have had less time to act on newly formed populations that are further away from Africa, and it will also operate more slowly on populations that have less diversity and therefore lower interindividual differences in fitness. 
Furthermore, the fact that our simulations can reproduce the observed pattern with spatially uniform population sizes and strength of selection against deleterious mutations implies that the simulated gradients in RH in Fig. 4A, as well as the increased number of deleterious homozygous sites, are not the consequence of reduced strength of selection away from Africa. Rather, they are caused by increased drift during the expansion, as well as by differential purging of deleterious mutations after the expansion.

The Importance of Dominance.

Multiple modeling assumptions are crucial when considering the burden of deleterious alleles across populations. In addition to the selection coefficients, the assumed dominance terms are critical. An estimated 16% of Mendelian diseases are known to be autosomal recessive (estimated from the OMIM) and many contribute significantly to infant mortality. Owing to the difficulty of detecting recessive diseases, unless they are extremely damaging, there are potentially many more disease mutations that have an h coefficient less than 0.5. Autosomal recessive diseases seem to be more frequent than autosomal dominant diseases (55), and even mildly deleterious mutations are predicted to have a mean h of 0.25 (56). Although formal calculations of genetic load require multiple assumptions, we demonstrate that differences in calculated load across human populations are primarily sensitive to assumptions about dominance, as expected given the increased extent of homozygosity in OOA populations.

We have modeled deleterious mutations as having variable h coefficients. Whereas strongly deleterious mutations are likely recessive, dominance for weakly deleterious mutations is particularly problematic to estimate because there is less power to measure weak effects and h may be upwardly biased in model organism competition experiments (19). When sampling h coefficients under our model, we allowed weakly deleterious mutations to be assigned a coefficient h > 0.5, but this had little effect on mutational load because the bulk of the load was contributed by large-effect variants. However, a fraction of strongly deleterious mutations are clearly dominant, as ascertained from disease studies, and future work may need to model different mixture distributions of h. We also note that the absolute mutational load is twofold higher under an additive model than under a recessive model (Fig. 5), as expected from theory (36).

Estimates of Mutational Load.

We estimate that there are differences in mutational burden, calculated using a formal load model, among extant human populations, particularly if we depart from a simple additive assumption. We found that the differences in mutation load between sub-Saharan African and Native American populations (the two ends of the range) were significant at P < 0.05 under recessive, partially recessive, and additive models (SI Appendix, Fig. S24). Mutational load under a fully or partially recessive model is 10 to 30% greater in non-African populations (Fig. 5A), as the result of higher homozygosity from the legacy of the OOA bottleneck across all (deleterious) mutation categories [e.g., $L_T$(Mbuti) = 1.59 and $L_T$(Yakut) = 1.95 under the h(s) model]. All populations carry significant load, relative to a population with the alternate, ancestral allele genotype. Under a model where fitness differences are determined only by genotype and environments are equal across individuals, the relative fitness [$e^{-L_T}$] of 0.204 for the Mbuti indicates a reduction in fitness of 79.6%, whereas a relative fitness of 0.142 for the Yakut indicates an 85.7% reduction. These fitness differences are relatively small, even under a partially recessive model.
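
The conversion from total load to relative fitness quoted above can be checked with a few lines of Python. This is only an illustrative sketch: the function name `relative_fitness` is hypothetical, and the two load values are the Mbuti and Yakut figures reported in the text for the h(s) model.

```python
import math

def relative_fitness(total_load):
    """Relative fitness implied by a total load L_T, taken as exp(-L_T)."""
    return math.exp(-total_load)

# Load values quoted in the text for the h(s) model.
reported_loads = (("Mbuti", 1.59), ("Yakut", 1.95))

for population, load in reported_loads:
    w = relative_fitness(load)
    print(f"{population}: L_T = {load:.2f}, relative fitness = {w:.3f}")
# Mbuti: L_T = 1.59, relative fitness = 0.204
# Yakut: L_T = 1.95, relative fitness = 0.142
```

Because the mapping is exponential, a sizable proportional difference in load corresponds to a much smaller absolute difference in relative fitness.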

Although illustrative, such models of load have important limitations. The mutations identified in this dataset have not been functionally characterized and are predicted to be deleterious based on degree of sequence conservation. The assumed selective coefficients across GERP categories are fit based on a recessive model, which is not applicable to all sites. However, although different selection coefficients will change the values of load in our calculation, they will not change the relative difference among populations because the same set of coefficients was applied to all populations (5). If mutations have different fitness effects across heterogeneous global environments, then the values of mutation load will change. Indeed, a proportion of the alleles may be locally adaptive, or neutral, and hence the sign of the selection coefficient for the mutation would be misestimated in our analysis. For example, the Duffy null allele is classified as a large-effect mutation using GERP (RS = 4.27) and is found at high frequency in western Africa; however, it has likely increased in frequency due to positive selection as a response to malaria (57). Recent genome-wide studies have stressed the paucity of selective sweeps in the human genome (35, 58, 59); only 0.5% of nonsynonymous mutations in the 1000 Genomes Pilot Project were identified as having undergone positive selection. Others have emphasized evidence for pervasive adaptive selection (60, 61) and a variety of studies have identified specific beneficial alleles locally adapted to high altitude, immune response, and pigmentation (62–64). We considered local adaptive evolution by examining highly differentiated alleles in our dataset, that is, alleles that differ by 80% in frequency between a pair of populations, indicative of a strong local adaptation.
We find that highly differentiated alleles have the same GERP score distribution as nondifferentiated alleles, indicating there is little reason to believe that most large- and extreme-effect mutations have been subjected to strong local adaptation (SI Appendix, Fig. S20; also see ref. 65). We conclude that the raw, calculated mutational burden may differ across human populations, although the effects of positive selection, varying environments, and epistasis have yet to be explored and remain a significant challenge to fully understanding mutational burden.

Conclusions.

A major difference between our work and previous results is the interpretative framework we present, which underlines the role of range expansions out of Africa to explain patterns of neutral and functional diversity. Whereas previous comparisons between African and non-African diversity attributed the observed increased proportion of deleterious variants in non-Africans to the OOA bottleneck (10), our study shows that a single bottleneck is not sufficient to reproduce the gradient we observe in the number of deleterious alleles per individual with distance from Africa (Fig. 2). Taking into account the range expansion of modern humans (66) sheds new light on this apparent controversy. Finally, we note that recent simulation work (4) suggests that the impact of a bottleneck on the efficacy of natural selection depends critically on the distribution of fitness and dominance effects as well as postbottleneck demographic history. Although these models and parameter choices clearly affect the interpretation of the pattern of deleterious alleles across populations, we find empirical evidence for significant differences in deleterious alleles as tabulated by a variety of statistics across the spectrum of human genetic diversity.

Materials and Methods

Samples and Data.

Aliquots of DNA isolated from cultured lymphoblastoid cell lines were obtained from Centre d’Étude du Polymorphisme and prepared for both full genome sequencing on Illumina HiSeq technology and exome capture with an Agilent SureSelect 44Mb array. One hundred one base pair read-pairs were mapped onto the human genome reference (GRCh37) using a mapping and variant calling pipeline designed to effectively manage massive amounts of short-read data. This pipeline followed many of the best practices developed by the 1000 Genomes Project Consortium (34).

Variant Annotation.

Ancestral state was inferred based on orthologous regions in a great ape and rhesus macaque phylogeny as reported by Ensembl Compara and used by the 1000 Genomes Project. To determine the biological impact of a variant we used GERP score (30) as a measure of conservation across a phylogeny. Positive scores reflect a site showing a high degree of conservation, based on the inferred number of “rejected substitutions” across the phylogeny. GERP scores were obtained from the University of California, Santa Cruz genome browser (hgdownload.cse.ucsc.edu/gbdb/hg19/bbi/All_hg19_RS.bw) based on an alignment of 35 mammals to human. The allele represented in the human hg19 sequence was not included in the calculation of GERP RS scores. The human reference sequence was excluded from the alignment for the calculation of both the neutral rate and site-specific “observed” rate for the RS score to prevent any bias in the estimates. In addition to GERP, we also used PhyloP scores (51) as measures of genomic constraint during the evolution of mammals. We used the PhyloPNH scores computed in Fu et al. (11) from the 36 eutherian-mammal EPO alignments [available in Ensembl release 70 (67)], which are also computed without using the human reference sequence.

Classification of Mutation Effects by GERP Scores.

Variants were classified as being neutral, moderate, large, or extreme for GERP scores with ranges [−2,2), [2,4), [4,6), and [6,max], respectively. The use of four “bins” of GERP scores simplifies the range expansion simulations performed for distinct selection coefficients. For every individual, the number of derived deleterious alleles found in homozygosity (i.e., 2 × HOM) and the total number of derived deleterious alleles [i.e., HET + (2 × HOM)] within each category were recorded.
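
The binning and per-individual tallies described above can be sketched in Python as follows. This is a minimal illustration, not the study's pipeline: the function names, the half-open bin boundaries, the decision to skip scores below −2, and the toy genotypes are all assumptions.

```python
from collections import Counter

def gerp_category(score):
    """Return the effect category for a GERP RS score, or None if below -2.

    Bin edges follow the four categories in the text; treating each bin as
    half-open (lower bound included) is an assumption.
    """
    if score < -2:
        return None
    if score < 2:
        return "neutral"
    if score < 4:
        return "moderate"
    if score < 6:
        return "large"
    return "extreme"

def derived_allele_counts(variants):
    """Tally derived alleles per category for one individual.

    `variants` is an iterable of (gerp_score, genotype) pairs, where genotype
    is the number of derived alleles carried (0, 1, or 2). Returns two
    Counters: derived alleles in homozygosity (2 x HOM) and all derived
    alleles (HET + 2 x HOM).
    """
    hom_counts = Counter()
    total_counts = Counter()
    for score, genotype in variants:
        category = gerp_category(score)
        if category is None or genotype == 0:
            continue
        total_counts[category] += genotype
        if genotype == 2:
            hom_counts[category] += 2
    return hom_counts, total_counts

# Toy individual with hypothetical scores and genotypes (not real data).
toy = [(0.5, 1), (3.1, 2), (4.27, 1), (5.0, 2), (6.2, 1), (-3.0, 2)]
hom, total = derived_allele_counts(toy)
print(dict(hom))
print(dict(total))
```

Keeping the two tallies separate mirrors the distinction between counts in homozygosity and total derived allele counts that the dominance models rely on.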

Individual-Based Simulations.

To simulate changes in heterozygosity, we modeled human range expansion across an array of 10 × 100 demes (32). After reaching migration-selection-drift equilibrium, populations expand into the empty territory, which is separated from the ancestral population by a geographical barrier, through a spatial bottleneck (SI Appendix, Fig. S21). After 3,000 generations, we computed the average expected heterozygosity for all populations. The migration rate and selection coefficients were adjusted to generate heterozygosity consistent with the observed data, without formally maximizing the fit. The code used for simulations can be downloaded from https://github.com/CMPG/ADMRE.

Calculating Load.

Mutational load was calculated following Kimura et al. (36), but using observed genotype frequencies instead of inferring them from Hardy–Weinberg based on the allele frequencies. In this way, the fitness of the heterozygotes and the homozygotes will be $W_{het} = g_{Aa} \times (1 - hs)$ and $W_{hom} = g_{aa} \times (1 - s)$, where $g_{Aa}$ and $g_{aa}$ are the genotype frequencies of the heterozygotes and derived homozygotes, respectively. The fitness for a given variant will be relative to that of the ancestral variant, which for numerical convenience is set to 1. The relationship between fitness and load is $L_v = 1 - W = 1 - (1 - W_{het} - W_{hom})$, and the total population load is the sum of the load for all variants, $L_T = \sum_v L_v$.
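
As a rough illustration of how per-variant loads sum to a population total under different dominance assumptions, the sketch below uses the textbook mean-fitness form in which heterozygotes lose hs and derived homozygotes lose s, so each variant contributes g_Aa·h·s + g_aa·s. That way of writing the heterozygote and homozygote terms is an assumption made for this example, as are the function names and the toy genotype frequencies; it is not a transcription of the study's code.

```python
def variant_load(g_het, g_hom, s, h):
    """Load contributed by one variant.

    Assumes heterozygotes have fitness 1 - h*s and derived homozygotes
    1 - s, relative to an ancestral homozygote with fitness 1, so the
    reduction in mean fitness is g_het*h*s + g_hom*s.
    """
    return g_het * h * s + g_hom * s

def total_load(variants, h):
    """Sum the per-variant load over (g_het, g_hom, s) tuples, ignoring epistasis."""
    return sum(variant_load(g_het, g_hom, s, h) for g_het, g_hom, s in variants)

# Hypothetical observed genotype frequencies and selection coefficients.
toy_variants = [
    (0.20, 0.05, 0.001),  # weakly selected variant
    (0.10, 0.01, 0.010),  # more strongly selected variant
]

recessive = total_load(toy_variants, h=0.0)  # only derived homozygotes count
additive = total_load(toy_variants, h=0.5)   # heterozygotes carry half the effect
print(f"recessive: {recessive:.5f}, additive: {additive:.5f}")
```

Using observed genotype frequencies directly, rather than Hardy–Weinberg expectations, means that any excess homozygosity in a population feeds straight into the recessive-model total.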

Acknowledgments

We thank Chris Tyler-Smith, David Reich, Yuval Simons, Spencer Koury, and Simon Gravel for helpful discussion. L.R.B. was supported by a Beatriu de Pinós Programme Fellowship. This work was supported by NIH Grants 3R01HG003229 (to C.D.B. and B.M.H.) and DP5OD009154 (to J.M.K.). S.P. and I.D. were supported by Swiss SNSF Grant 31003A-143393 (to L.E.).

Footnotes

  • ¹B.M.H., L.R.B., and S.P. contributed equally to this work.

  • ²To whom correspondence may be addressed. Email: brenna.henn@stonybrook.edu or cdbustam@stanford.edu.
  • ³Deceased May 3, 2014.

  • ⁴L.E., J.M.K., and C.D.B. contributed equally to this work.

  • Author contributions: B.M.H., M.P.S., L.E., J.M.K., and C.D.B. designed research; B.M.H., L.R.B., S.P., and J.M.K. performed research; S.P., H.C., and L.E. contributed new reagents/analytic tools; B.M.H., L.R.B., S.P., I.D., M.L., B.K.M., A.R.M., S.M., and J.M.K. analyzed data; and B.M.H., L.R.B., S.P., L.E., J.M.K., and C.D.B. wrote the paper.

  • Conflict of interest statement: C.D.B. is the founder of IdentifyGenomics, LLC, and is on the scientific advisory boards of Personalis, Inc. and Ancestry.com as well as the medical advisory board of InVitae. None of this played a role in the design, execution, or interpretation of experiments and results presented here.

  • This article is a PNAS Direct Submission. C.F.A. is a guest editor invited by the Editorial Board.

  • Data deposition: The sequence reported in this paper has been deposited in the NCBI Sequence Read Archive (accession no. SRP036155).

  • See Commentary on page 809.

  • This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1510805112/-/DCSupplemental.

Freely available online through the PNAS open access option.

References

  1. Morton NE, Crow JF, Muller HJ (1956) An estimate of the mutational damage in man from data on consanguineous marriages. Proc Natl Acad Sci USA 42(11):855–863.
  2. Tabor HK, et al., NHLBI Exome Sequencing Project (2014) Pathogenic variants for Mendelian and complex traits in exomes of 6,517 European and African Americans: Implications for the return of incidental results. Am J Hum Genet 95(2):183–193.
  3. Ohta T (1973) Slightly deleterious mutant substitutions in evolution. Nature 246(5428):96–98.
  4. Gravel S (2014) When is selection effective? bioRXiv, dx.doi.org/10.1101/010934.
  5. Henn BM, Botigué LR, Bustamante CD, Clark AG, Gravel S (2015) Estimating the mutation load in human genomes. Nat Rev Genet 16(6):333–343.
  6. Lohmueller KE (2014) The distribution of deleterious genetic variation in human populations. Curr Opin Genet Dev 29:139–146.
  7. Peischl S, Excoffier L (2015) Expansion load: Recessive mutations and the role of standing genetic variation. Mol Ecol 24(9):2084–2094.
  8. Casals F, et al. (2013) Whole-exome sequencing reveals a rapid change in the frequency of rare functional variants in a founding population of humans. PLoS Genet 9(9):e1003815.
  9. Kehdy FSG, et al., Brazilian EPIGEN Project Consortium (2015) Origin and dynamics of admixture in Brazilians and its effect on the pattern of deleterious mutations. Proc Natl Acad Sci USA 112(28):8696–8701.
  10. Lohmueller KE, et al. (2008) Proportionally more deleterious genetic variation in European than in African populations. Nature 451(7181):994–997.
  11. Fu W, Gittelman RM, Bamshad MJ, Akey JM (2014) Characteristics of neutral and deleterious protein-coding variation among individuals and populations. Am J Hum Genet 95(4):421–436.
  12. Simons YB, Turchin MC, Pritchard JK, Sella G (2014) The deleterious mutation load is insensitive to recent population history. Nat Genet 46(3):220–224.
  13. Do R, et al. (2015) No evidence that selection has been less effective at removing deleterious mutations in Europeans than in Africans. Nat Genet 47(2):126–131.
  14. Fu W, et al., NHLBI Exome Sequencing Project (2013) Analysis of 6,515 exomes reveals the recent origin of most human protein-coding variants. Nature 493(7431):216–220.
  15. Tennessen JA, et al., Broad GO; Seattle GO; NHLBI Exome Sequencing Project (2012) Evolution and functional impact of rare coding variation from deep sequencing of human exomes. Science 337(6090):64–69.
  16. Boyko AR, et al. (2008) Assessing the evolutionary impact of amino acid mutations in the human genome. PLoS Genet 4(5):e1000083.
  17. Bittles AH, Black ML (2010) Evolution in health and medicine Sackler colloquium: Consanguinity, human evolution, and complex diseases. Proc Natl Acad Sci USA 107(Suppl 1):1779–1786.
  18. Slatkin M (2004) A population-genetic test of founder effects and implications for Ashkenazi Jewish diseases. Am J Hum Genet 75(2):282–293.
  19. Agrawal AF, Whitlock MC (2011) Inferences about the distribution of dominance drawn from yeast gene knockout data. Genetics 187(2):553–566.
  20. Cann HM, et al. (2002) A human genome diversity cell line panel. Science 296(5566):261–262.
  21. Henn BM, et al. (2011) Hunter-gatherer genomic diversity suggests a southern African origin for modern humans. Proc Natl Acad Sci USA 108(13):5154–5162.
  22. Henn BM, et al. (2012) Genomic ancestry of North Africans supports back-to-Africa migrations. PLoS Genet 8(1):e1002397.
  23. Wang S, et al. (2007) Genetic variation and population structure in native Americans. PLoS Genet 3(11):e185.
  24. Maples BK, Gravel S, Kenny EE, Bustamante CD (2013) RFMix: A discriminative modeling approach for rapid and robust local-ancestry inference. Am J Hum Genet 93(2):278–288.
  25. Ramachandran S, et al. (2005) Support from the relationship of genetic and geographic distance in human populations for a serial founder effect originating in Africa. Proc Natl Acad Sci USA 102(44):15942–15947.
  26. ↵
    1. Prugnolle F,
    2. Manica A,
    3. Balloux F
    (2005) Geography predicts neutral genetic diversity of human populations. Curr Biol 15(5):R159–R160
    .
    OpenUrlCrossRefPubMed
  27. ↵
    1. Li H,
    2. Durbin R
    (2011) Inference of human population history from individual whole-genome sequences. Nature 475(7357):493–496
    .
    OpenUrlCrossRefPubMed
  28. ↵
    1. Meyer M, et al.
    (2012) A high-coverage genome sequence from an archaic Denisovan individual. Science 338(6104):222–226
    .
    OpenUrlAbstract/FREE Full Text
  29. ↵
    1. Kidd JM, et al.
    (2012) Population genetic inference from personal genome data: Impact of ancestry and admixture on human genomic variation. Am J Hum Genet 91(4):660–671
    .
    OpenUrlCrossRefPubMed
  30. ↵
    1. Cooper GM, et al., NISC Comparative Sequencing Program
    (2005) Distribution and intensity of constraint in mammalian genomic sequence. Genome Res 15(7):901–913
    .
    OpenUrlAbstract/FREE Full Text
  31. ↵
    1. Cooper GM, et al.
    (2010) Single-nucleotide evolutionary constraint scores highlight disease-causing mutations. Nat Methods 7(4):250–251
    .
    OpenUrlCrossRefPubMed
  32. ↵
    1. Peischl S,
    2. Dupanloup I,
    3. Kirkpatrick M,
    4. Excoffier L
    (2013) On the accumulation of deleterious mutations during range expansions. Mol Ecol 22(24):5972–5982
    .
    OpenUrlCrossRef
  33. ↵
    1. Goode DL, et al.
    (2010) Evolutionary constraint facilitates interpretation of genetic variation in resequenced human genomes. Genome Res 20(3):301–310
    .
    OpenUrlAbstract/FREE Full Text
  34. ↵
    1. 1000 Genomes Project Consortium
    (2012) An integrated map of genetic variation from 1,092 human genomes. Nature 491:56–65
    .
    OpenUrlCrossRefPubMed
  35. ↵
    1. Lohmueller KE, et al.
    (2011) Natural selection affects multiple aspects of genetic variation at putatively neutral sites across the human genome. PLoS Genet 7(10):e1002326
    .
    OpenUrlCrossRefPubMed
  36. ↵
    1. Kimura M,
    2. Maruyama T,
    3. Crow JF
    (1963) The mutation load in small populations. Genetics 48:1303–1312
    .
    OpenUrlFREE Full Text
  37. ↵
    1. Reich DE,
    2. Lander ES
    (2001) On the allelic spectrum of human disease. Trends Genet 17(9):502–510
    .
    OpenUrlCrossRefPubMed
  38. ↵
    1. Gao Z,
    2. Waggoner D,
    3. Stephens M,
    4. Ober C,
    5. Przeworski M
    (2015) An estimate of the average number of recessive lethal mutations carried by humans. Genetics 199(4):1243–1254
    .
    OpenUrlAbstract/FREE Full Text
  39. ↵
    1. Lachance J,
    2. Tishkoff SA
    (2014) Biased gene conversion skews allele frequencies in human populations, increasing the disease burden of recessive alleles. Am J Hum Genet 95(4):408–420
    .
    OpenUrlCrossRefPubMed
  40. ↵
    1. McQuillan R, et al., ROHgen Consortium
    (2012) Evidence of inbreeding depression on human height. PLoS Genet 8(7):e1002655
    .
    OpenUrlCrossRefPubMed
  41. ↵
    1. Hamosh A,
    2. Scott AF,
    3. Amberger JS,
    4. Bocchini CA,
    5. McKusick VA
    (2005) Online Mendelian Inheritance in Man (OMIM), a knowledgebase of human genes and genetic disorders. Nucleic Acids Res 33(Database issue):D514–D517
    .
    OpenUrlAbstract/FREE Full Text
  42. ↵
    1. Henn BM, et al.
    (2012) Cryptic distant relatives are common in both isolated and cosmopolitan genetic samples. PLoS One 7(4):e34267
    .
    OpenUrlCrossRefPubMed
  43. ↵
    1. Mukai T,
    2. Chigusa SI,
    3. Mettler LE,
    4. Crow JF
    (1972) Mutation rate and dominance of genes affecting viability in Drosophila melanogaster. Genetics 72(2):335–355
    .
    OpenUrlAbstract/FREE Full Text
  44. ↵
    1. Houle D,
    2. Hughes KA,
    3. Assimacopoulos S,
    4. Charlesworth B
    (1997) The effects of spontaneous mutation on quantitative traits. II. Dominance of mutations with effects on life-history traits. Genet Res 70(1):27–34
    .
    OpenUrlCrossRefPubMed
  45. ↵
    1. Racimo F,
    2. Schraiber JG
    (2014) Approximation to the distribution of fitness effects across functional categories in human segregating polymorphisms. PLoS Genet 10(11):e1004697
    .
    OpenUrlCrossRefPubMed
  46. ↵
    1. Henn BM,
    2. Cavalli-Sforza LL,
    3. Feldman MW
    (2012) The great human expansion. Proc Natl Acad Sci USA 109(44):17758–17764
    .
    OpenUrlAbstract/FREE Full Text
  47. ↵
    1. Laval G,
    2. Patin E,
    3. Barreiro LB,
    4. Quintana-Murci L
    (2010) Formulating a historical and demographic model of recent human evolution based on resequencing data from noncoding regions. PLoS One 5(4):e10284
    .
    OpenUrlCrossRefPubMed
  48. ↵
    1. Marth GT,
    2. Czabarka E,
    3. Murvai J,
    4. Sherry ST
    (2004) The allele frequency spectrum in genome-wide human variation data reveals signals of differential demographic history in three large world populations. Genetics 166(1):351–372
    .
    OpenUrlAbstract/FREE Full Text
  49. ↵
    1. Tanabe K, et al.
    (2010) Plasmodium falciparum accompanied the human expansion out of Africa. Curr Biol 20(14):1283–1289
    .
    OpenUrlCrossRefPubMed
  50. ↵
    1. Linz B, et al.
    (2007) An African origin for the intimate association between humans and Helicobacter pylori. Nature 445(7130):915–918
    .
    OpenUrlCrossRefPubMed
  51. ↵
    1. Pollard KS,
    2. Hubisz MJ,
    3. Rosenbloom KR,
    4. Siepel A
    (2010) Detection of nonneutral substitution rates on mammalian phylogenies. Genome Res 20(1):110–121
    .
    OpenUrlAbstract/FREE Full Text
  52. ↵
    1. Keinan A,
    2. Clark AG
    (2012) Recent explosive human population growth has resulted in an excess of rare genetic variants. Science 336(6082):740–743
    .
    OpenUrlAbstract/FREE Full Text
  53. ↵
    1. Lohmueller KE
    (2014) The impact of population demography and selection on the genetic architecture of complex traits. PLoS Genet 10(5):e1004379
    .
    OpenUrlCrossRefPubMed
  54. ↵
    1. Pennings PS,
    2. Kryazhimskiy S,
    3. Wakeley J
    (2014) Loss and recovery of genetic diversity in adapting populations of HIV. PLoS Genet 10(1):e1004000
    .
    OpenUrlCrossRefPubMed
  55. ↵
    1. Erickson RP,
    2. Mitchison NA
    (2014) The low frequency of recessive disease: Insights from ENU mutagenesis, severity of disease phenotype, GWAS associations, and demography: An analytical review. J Appl Genet 55(3):319–327
    .
    OpenUrlCrossRefPubMed
  56. ↵
    1. Manna F,
    2. Martin G,
    3. Lenormand T
    (2011) Fitness landscapes: An alternative theory for the dominance of mutation. Genetics 189(3):923–937
    .
    OpenUrlAbstract/FREE Full Text
  57. ↵
    1. Sabeti PC, et al., International HapMap Consortium
    (2007) Genome-wide detection and characterization of positive selection in human populations. Nature 449(7164):913–918
    .
    OpenUrlCrossRefPubMed
  58. ↵
    1. Hernandez RD, et al., 1000 Genomes Project
    (2011) Classic selective sweeps were rare in recent human evolution. Science 331(6019):920–924
    .
    OpenUrlAbstract/FREE Full Text
  59. ↵
    1. Granka JM, et al.
    (2012) Limited evidence for classic selective sweeps in African populations. Genetics 192(3):1049–1064
    .
    OpenUrlAbstract/FREE Full Text
  60. ↵
    1. Enard D,
    2. Messer PW,
    3. Petrov DA
    (2014) Genome-wide signals of positive selection in human evolution. Genome Res 24(6):885–895
    .
    OpenUrlAbstract/FREE Full Text
  61. ↵
    1. Grossman SR, et al., 1000 Genomes Project
    (2013) Identifying recent adaptations in large-scale genomic data. Cell 152(4):703–713
    .
    OpenUrlCrossRefPubMed
  62. ↵
    1. Yi X, et al.
    (2010) Sequencing of 50 human exomes reveals adaptation to high altitude. Science 329(5987):75–78
    .
    OpenUrlAbstract/FREE Full Text
  63. ↵
    1. Pickrell JK, et al.
    (2009) Signals of recent positive selection in a worldwide sample of human populations. Genome Res 19(5):826–837
    .
    OpenUrlAbstract/FREE Full Text
  64. ↵
    1. Scheinfeldt LB,
    2. Tishkoff SA
    (2013) Recent human adaptation: Genomic approaches, interpretation and insights. Nat Rev Genet 14(10):692–702
    .
    OpenUrlPubMed
  65. ↵
    1. Coop G, et al.
    (2009) The role of geography in human adaptation. PLoS Genet 5(6):e1000500
    .
    OpenUrlCrossRefPubMed
  66. ↵
    1. Sousa V,
    2. Peischl S,
    3. Excoffier L
    (2014) Impact of range expansions on current human genomic diversity. Curr Opin Genet Dev 29:22–30
    .
    OpenUrlCrossRefPubMed
  67. ↵
    1. Flicek P, et al.
    (2013) Ensembl 2013. Nucleic Acids Res 41(Database issue):D48–D55
    .
    OpenUrlAbstract/FREE Full Text
Deleterious alleles in human genomes
Brenna M. Henn, Laura R. Botigué, Stephan Peischl, Isabelle Dupanloup, Mikhail Lipatov, Brian K. Maples, Alicia R. Martin, Shaila Musharoff, Howard Cann, Michael P. Snyder, Laurent Excoffier, Jeffrey M. Kidd, Carlos D. Bustamante
Proceedings of the National Academy of Sciences Jan 2016, 113 (4) E440-E449; DOI: 10.1073/pnas.1510805112


Article Classifications

  • Biological Sciences
  • Genetics
  • Evolution
