Research Article

Evidence for recent, population-specific evolution of the human mutation rate

Kelley Harris
PNAS March 17, 2015 112 (11) 3439-3444; first published March 2, 2015 https://doi.org/10.1073/pnas.1418652112
Kelley Harris
Department of Mathematics, University of California, Berkeley, CA 94703
  • For correspondence: harris.kelley@gmail.com
  1. Edited by Mark Stoneking, Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany, and accepted by the Editorial Board February 6, 2015 (received for review September 26, 2014)


Significance

Most, but not all, human genetic variation is shared between populations, and whole-genome sequencing is now making it possible to catalogue population-private mutations that occur in only one ethnic group and are especially informative about recent human history. By contrasting frequencies of mutations private to Europe, Asia, and Africa, I have revealed a previously undetected difference between Europeans and other ethnic groups: Europeans experience higher rates of a specific mutation type that has known associations with UV light exposure. Although it is not clear whether the excess mutations are harmful or directly related to the UV sensitivity of light skin, this result demonstrates that the human mutation rate has evolved on a much faster timescale than previously believed.

Abstract

As humans dispersed out of Africa they adapted to new environmental challenges, including changes in exposure to mutagenic solar radiation. Humans in temperate latitudes have acquired light skin that is relatively transparent to UV light, and some evidence suggests that their DNA damage response pathways have also experienced local adaptation. This raises the possibility that different populations have experienced different selective pressures affecting genome integrity. Here, I present evidence that the rate of a particular mutation type has recently increased in the European population, rising in frequency by 50% during the 40,000–80,000 y since Europeans began diverging from Asians. A comparison of SNPs private to Africa, Asia, and Europe in the 1000 Genomes data reveals that private European variation is enriched for the transition 5′-TCC-3′ → 5′-TTC-3′. Although it is not clear whether UV played a causal role in changing the European mutational spectrum, 5′-TCC-3′ → 5′-TTC-3′ is known to be the most common somatic mutation present in melanoma skin cancers, as well as the mutation most frequently induced in vitro by UV. Regardless of its causality, this change indicates that DNA replication fidelity has not remained stable even since the origin of modern humans and might have changed numerous times during our recent evolutionary history.

  • mutation rate
  • molecular clock
  • human evolution
  • genetic variation
  • UV damage

Anatomically modern humans left Africa less than 200,000 y ago and have since dispersed into every habitable environment (1). Because different habitats have presented humans with diverse environmental challenges, many local adaptations have caused human populations to diverge phenotypically from one another. Some adaptations such as light and dark skin pigmentation have been studied since the early days of evolutionary theory (2–4). However, other putative genetic signals of local adaptation are poorly understood with respect to their phenotypic effects (5–7).

One phenotype that is notoriously hard to measure is the human germ-line mutation rate. It recently became possible to estimate this rate by sequencing parent–offspring trios and counting new mutations directly; however, the resulting estimates are complicated by sequencing error and differ more than twofold from earlier estimates inferred indirectly from the genetic divergence between humans and chimpanzees (8–11). One possible explanation for this discrepancy is a “hominoid slowdown”: a putative mutation rate decrease along the human ancestral lineage that might be related to lengthening generation time (12, 13).

A hominoid slowdown would present a caveat to the standard “molecular clock” assumption, which posits that genetic differences accumulate at a constant rate. This assumption underlies most methods for inferring demographic history from genetic variation data. There is widespread interest in using genetic variation to infer the timing of divergence and gene flow (14–16), and the accuracy of such inference is limited by the accuracy of our knowledge about mutation rates (9).

Trio-based estimates of human and chimp mutation rates have so far both fallen in the range of $1.0$–$1.25 \times 10^{-8}$ mutations per site per generation (10, 11, 17). However, the two species seem to differ in the distribution of de novo mutations between the male and female germ lines and among different mutation types (e.g., a higher proportion of chimp mutations are CpG transitions) (17). These patterns suggest that there has been some degree of mutation rate evolution since the two species diverged.

To my knowledge, previous studies have presented no evidence of mutation rate evolution on a timescale as recent as the human migration out of Africa. Most human trios that have been sequenced are European in origin (SI Appendix, section 9), meaning that there exist few measurements of de novo mutation patterns on diverse genetic backgrounds. However, there is some reason to suspect that mutation rates might have changed owing to recent regional adaptations affecting DNA repair. SNPs that affect gene expression in DNA damage response pathways show evidence of recent diversifying selection, exhibiting geographic frequency gradients that seem to be correlated with environmental UV exposure (7). I sought to test whether mutation rates vary between populations using rare segregating SNPs that arose as new mutations relatively recently (11, 18), examining the 1000 Genomes data for mutation spectrum asymmetries that could be informative about human mutation rate evolution.

Results

Mutation Spectra of Continent-Private Variation.

To test for differences in the spectrum of mutagenesis between populations I compiled sets of population-private variants from the 1000 Genomes phase I panel of 1,092 human genome sequences (18). Excluding singletons and SNPs with imputation quality lower than RSQ = 0.95, which might be misleadingly classified as population-private owing to imputation error, there remain 462,876 private European SNPs (PE) that are variable in Europe but fixed ancestral in all nonadmixed Asian and African populations, as well as 265,988 private Asian SNPs (PAs) that are variable in Asia but fixed ancestral in Africa and Europe. These SNPs should be enriched for young mutations that arose after humans had already left Africa and begun adapting to temperate latitudes. I compared PE and PAs to the set of 3,357,498 private African SNPs (PAf) that are variable in the Yorubans (YRI) and/or Luhya (LWK) but fixed ancestral in Europe and Asia. One notable feature of PE is the percentage of SNPs that are C→T transitions, which is higher (41.01%) than the corresponding percentages in PAs (38.99%) and PAf (38.29%).

Excess C→T transitions are characteristic of several different mutagenic processes including UV damage and cytosine deamination (19). To some extent, these processes can be distinguished by partitioning SNPs into 192 different context-dependent classes, looking at the reference base pairs immediately upstream and downstream of the variable site (20). For each mutation type $m = B_{5'}B_{A}B_{3'} \to B_{5'}B_{D}B_{3'}$ and each private SNP set $P$, I obtained the count $C_P(m)$ of type-$m$ mutations in set $P$ and used a $\chi^2$ test to compare $C_{PE}(m)$ and $C_{PAs}(m)$ to $C_{PAf}(m)$.
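As an illustrative sketch only (not the code used in this study), the 192 context-dependent classes can be enumerated as every ancestral triplet paired with each of the three possible derived middle bases. The function names below are hypothetical. The sketch also shows how a mutation can be identified with its reverse-strand complement, as is done when TCC→T is combined with GGA→A.

```python
from itertools import product

BASES = "ACGT"
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def all_mutation_types():
    """Enumerate every (ancestral triplet, derived middle base) pair."""
    types = []
    for five, anc, three in product(BASES, repeat=3):
        for der in BASES:
            if der != anc:  # the derived base must differ from the ancestral base
                types.append((five + anc + three, der))
    return types


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return "".join(COMPLEMENT[b] for b in reversed(seq))


def collapse_strands(triplet, derived):
    """Represent a mutation on the strand whose ancestral middle base is A or C."""
    if triplet[1] in "AC":
        return triplet, derived
    return reverse_complement(triplet), COMPLEMENT[derived]
```

For example, `collapse_strands("GGA", "A")` returns `("TCC", "T")`, so both orientations are counted as one type; collapsing all 192 classes this way leaves 96 strand-symmetric types.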

As shown in Fig. 1A, the strongest candidate for mutation rate change is the transition 5′-TCC-3′→5′-TTC-3′ (hereafter abbreviated as TCC→T). Combined with its reverse strand complement 5′-GGA-3′→5′-GAA-3′, TCC→T has frequency 3.32% in PE compared with 1.98% in PAf and 2.04% in PAs. Several other C→T transitions are also moderately more abundant in PE than PAf, in most cases flanked by either a 5′ T or a 3′ C.
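As a worked illustration, the relative difference from the African frequency (the fold frequency difference plotted on the x axis of Fig. 1 A and B) can be computed from the rounded percentages quoted above. Because the inputs are rounded, the results are approximate:

```latex
\frac{f_{PE}(\mathrm{TCC}) - f_{PAf}(\mathrm{TCC})}{f_{PAf}(\mathrm{TCC})}
  = \frac{3.32 - 1.98}{1.98} \approx 0.68,
\qquad
\frac{f_{PAs}(\mathrm{TCC}) - f_{PAf}(\mathrm{TCC})}{f_{PAf}(\mathrm{TCC})}
  = \frac{2.04 - 1.98}{1.98} \approx 0.03.
```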

Fig. 1.

Overrepresentation of 5′-TCC-3′→5′-TTC-3′ within Europe. (A and B) The x coordinate of each point in A [respectively B] gives the fold frequency difference $(f_{PE}(m)-f_{PAf}(m))/f_{PAf}(m)$ [respectively $(f_{PAs}(m)-f_{PAf}(m))/f_{PAf}(m)$], and the y coordinate gives the Pearson’s $\chi^2$ P value of its significance. Outlier points are labeled with the ancestral state of the mutant nucleotide flanked by two neighboring bases, and the color of the point specifies the ancestral and derived alleles of the mutant site. (C and D) The $\chi^2$ contingency tables used to compute the respective P values in A and B. (E) The distribution of f(TCC) across bins of 1,000 consecutive population-private SNPs. Only chromosome-wide frequencies are shown for chromosome Y because of its low SNP count.

The TCC→T frequency difference holds genome-wide, evident on every chromosome except for chromosome Y, which has too little population-private variation to yield accurate measurements of context-dependent SNP frequencies (Fig. 1E). The most parsimonious explanation is that Europeans experienced a genetic change increasing the rate of TCC→T mutations. C→T transitions may not be the only mutations that experienced recent rate change; for example, TTA→TAA mutations seem to be less abundant in Europe than in Africa. Several CpG transitions including ACG→ATG seem to have higher frequencies in Asia than in Africa, but these differences disappear when the spectrum is folded to, for instance, identify ACG→ATG with the inverse mutation ATG→ACG (SI Appendix, section 5). This is not surprising given that high CpG mutation rates make these sites especially susceptible to ancestral state miscalls (21).

If mutation type m occurs at a higher rate in Europe than in Asia, a European haplotype should contain excess type-m derived alleles compared with an Asian haplotype. This prediction is tested in SI Appendix, section 1. The results suggest that many mutation types occur at slightly higher rates in Europe compared with Asia, with C→T transitions, particularly TCC→T, showing the strongest signal of rate differentiation. This asymmetry cannot be explained by a demographic event such as a population bottleneck; however, it should be interpreted with caution because many bioinformatic biases have the potential to confound this test. Prüfer et al. (22) used a similar technique to quantify divergence between archaic and modern genomes and found that branch length differences between sequencing batches often exceeded branch length differences between populations. In addition, because the 1000 Genomes phase I dataset is heavily imputed and contains more European genomes than Asian genomes, rare European variants might be ascertained more completely than rare Asian variants. This could produce a false overall excess of European-derived alleles but seems unlikely to elevate the discovery rate of TCC→T in Europe relative to other mutation types.

Robustness to Sources of Bioinformatic Error.

Fig. 1 and SI Appendix, Figs. S2 and S3 suggest that the human mutation rate is remarkably labile, with significant change having occurred since the relatively recent European/Asian divergence. In this section, I summarize evidence that this conclusion is not founded on bioinformatic artifacts. I focus on confirming the veracity of the TCC→T excess in Europe but do not discount the possibility that other mutation types might have experienced smaller rate changes.

To rule out the possibility that TCC→T excess in Europe is a bioinformatic artifact specific to the 1000 Genomes data, I reproduced Fig. 1 A and B in a set of human genomes sequenced at high coverage using Complete Genomics technology (SI Appendix, section 3) (23). I also folded the context-dependent mutation frequency spectrum to check for effects of ancestral misidentification (SI Appendix, section 5). Finally, I partitioned the 1000 Genomes data into bins based on GC content, sequencing depth, and imputation accuracy, finding that the TCC→T excess in Europe was easily discernible within each bin (SI Appendix, sections 6, 7, and 10). Three other C→T transitions (TCT→TTT, ACC→ATC, and CCC→CTC) are also more abundant in Europe than in Africa across a broad range of GC contents and sequencing depths. In contrast, genomic regions that differ in GC content and/or sequencing depth show little consistency as to which mutation types show the most frequency differentiation between Africa and Asia.

As mentioned previously, singleton variants (minor allele count = 1) were excluded from all analyses. When singletons are included they create spurious between-population differences that are not reproducible with nonsingleton SNPs (SI Appendix, section 8). This is true of both the low-coverage 1000 Genomes dataset and the smaller, higher-coverage Complete Genomics dataset, suggesting that singletons are error-prone even in high-coverage genomes.

A particularly interesting class of singletons are de novo mutation calls in trios. Barring bioinformatic problems, counting these mutations should yield an accurate estimate of the current human mutation rate and spectrum. I compared PE and PAs to the de novo mutation calls from 82 Icelandic trios by rank-ordering mutation types in each dataset from most frequent to least frequent (SI Appendix, section 9) (10). In PE, TCC→TTC and its complement GGA→GAA are the 16th and 17th most common SNP types, respectively, whereas they are only ranked 27th and 32nd in PAs. In the Icelandic trios, TCC→TTC and GGA→GAA are ranked 20th and 18th. By this measure, the Icelandic mutations resemble PE more closely than PAs, as expected for European trios.

Antiquity of the European Mutation Rate Change.

The 1000 Genomes phase I dataset contains samples from five European subpopulations: Italians (TSI), Spanish (IBS), Utah residents of European descent (CEU), British (GBR), and Finnish (FIN). All of these populations have elevated TCC→T frequencies, suggesting that the European mutation rate changed before subpopulations diversified across the continent. To assess this, I let Ptotal denote the combined set of private variants from PE, PAs, and PAf, and for each haplotype h let Ptotal(h) denote the subset of Ptotal whose derived alleles are found on haplotype h. fh(TCC) then denotes the frequency of TCC→T within Ptotal(h). For each 1000 Genomes population P, Fig. 2 shows the distribution of fh(TCC) across all haplotypes h sampled from P, and it can be seen that the distribution of f(TCC) values found in Europe does not overlap with the distributions from Asia and Africa. In contrast, the four admixed populations ASW (African Americans), MXL (Mexicans), PUR (Puerto Ricans), and CLM (Colombians) display broader ranges of f(TCC) with extremes overlapping both the European and non-European distributions. The African American f(TCC) values are only slightly higher on average than the nonadmixed African values, but a few African American individuals have much higher f(TCC) values in the middle of the admixed American range, presumably because they have more European ancestry than the other African Americans who were sampled.
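The per-haplotype statistic defined above can be sketched as follows. This is a minimal illustration with a hypothetical data layout, not the code used in this study: it assumes each private SNP in Ptotal has already been assigned a strand-collapsed mutation type, and that a parallel list records whether the haplotype carries the derived allele.

```python
def haplotype_tcc_frequency(mutation_types, carries_derived):
    """Return f_h(TCC): the share of TCC->T among the private SNPs whose
    derived allele is present on haplotype h.

    mutation_types  -- one (ancestral triplet, derived base) tuple per SNP
    carries_derived -- parallel sequence of booleans for haplotype h
    """
    carried = [m for m, d in zip(mutation_types, carries_derived) if d]
    if not carried:
        return None  # undefined when the haplotype carries no private alleles
    return sum(m == ("TCC", "T") for m in carried) / len(carried)
```

Computing this value for every haplotype in a population gives the kind of distribution that is compared across populations in Fig. 2.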

Fig. 2.

Variation of f(TCC) within and between populations. This plot shows the distribution of f(TCC) within each 1000 Genomes population (i.e., the proportion of all derived variants from PAs, PE, and PAf present in a particular genome that are TCC→T mutations). There is a clear division between the low f(TCC) values of African and Asian genomes and the high f(TCC) values of European genomes. The slightly admixed African Americans and more strongly admixed Latin American populations have intermediate f(TCC) values reflecting partial European ancestry.

Within Europe, Fig. 2 shows a slight f(TCC) gradient running from north to south; the median f(TCC) is lowest in the Finns and highest in the Spanish and Italians. In this way, TCC→TTC frequency seems to correlate negatively with recent Asian coancestry (SI Appendix, section 2).

To roughly estimate the time when the TCC→T rate increased, I downloaded allele age estimates that were generated from the Complete Genomics data using the program ARGweaver (compgen.bscb.cornell.edu/ARGweaver/CG_results/) (24). Based on these estimates, TCC→T rate acceleration seems to have occurred between 25,000 and 60,000 y ago, not long after Europeans diverged from Asians (SI Appendix, section 4). In the 1000 Genomes data, TCC→T frequency differentiation is greatest for private alleles of frequency less than 0.02 (SI Appendix, Fig. S6B).

It is hard to tell from current data whether skin lightening predated TCC→T acceleration in Europe. A 7,000-y-old Early European farmer was found to be homozygous for the skin-lightening SLC24A5 allele (25), suggesting that light skin was relatively prevalent by 7,000 y ago and could have originated much earlier. An attempt to date the origin of this allele yielded a 95% confidence interval of 6,000–38,000 y ago (26), which overlaps with the time interval when the TCC→T rate seems to have accelerated.

Reversal of TCC→T Transcription Strand Bias in Europe.

Transcribed genomic regions are subject to transcription-coupled repair (TCR) of damaged nucleotides that occur on the sense DNA strand. This can lead to patterns of strand bias that contain information about underlying mutational mechanisms. For example, CpG transitions generally result from deamination damage to the cytosine rather than the guanine, and damaged Cs that occur on the transcribed strand are repaired more often than damaged Cs occurring on the nontranscribed strand. As a result, CpG transitions in genic regions are usually oriented with the C→T change on the nontranscribed strand (27).

To assess the strand bias of genic TCC→T mutations and look for strand bias differences between populations I counted the occurrences of each A/C ancestral mutation m from each private SNP set P on transcribed gene strands versus nontranscribed gene strands, denoting these counts T(P,m) and N(P,m), respectively. Strand biases S(P,m)=N(P,m)/T(P,m) were compared between populations using a χ2 test (Fig. 3). Private Asian and African TCC→T SNPs were found to exhibit the strand bias that is typical of A/C→G/T mutations (28), with the C→T change usually affecting the antisense strand and the G→A change usually affecting the sense strand. In contrast, private European TCC→T SNPs exhibit no discernible strand bias; the C→T change affects the sense strand about 50% of the time (Fig. 3E). TCC→T is the only mutation type that exhibits a significant strand bias difference between populations at the level P<0.01.

Fig. 3.

Differences in transcriptional strand bias. Each point in A and B represents a mutation type with an A or C ancestral allele. The x coordinate of each point in A is the PAf strand bias minus the PE strand bias; similarly, the x coordinates in B describe the PAf strand bias minus the PAs strand bias. The y coordinate of each point is the χ2 P value of the strand bias difference. The P values in A and B were computed from contingency tables C and D, respectively, using a χ2 test. At the P=0.01 significance level (gray dashed line), only TCC→T has a significant strand bias difference between Europe and Africa, whereas no mutation type significantly differs in strand bias between Asia and Africa. E shows the variance of strand bias in each population across 100 bootstrap replicates. Similarly, F shows the distribution across bootstrap replicates of the ratio between genic f(TCC) and intergenic f(TCC).

If we assume that the TCC→T mutation rate is the same in genic and intergenic regions because TCR is ineffective at preventing TCC→T mutations in Europeans, we should expect the frequency f(TCC) to be slightly higher among private genic SNPs than among private intergenic SNPs. This is because the frequencies of all mutation types sum to 1; mutation types that are efficiently prevented by TCR should have lower frequencies in genic regions than in intergenic regions, and mutation types that are not very susceptible to TCR must have higher genic frequencies to compensate. As predicted by this logic, f(TCC) is higher among private genic European SNPs than among private intergenic European SNPs (Fig. 3F). In contrast, when PAs and PAf are partitioned into genic and intergenic SNPs, the genic SNP sets have lower TCC→T frequencies, suggesting that TCR of this mutation type is relatively efficient in non-Europeans. This TCR differential alone could modestly elevate the European TCC→T mutation rate. However, it is not likely to be the sole cause of the observed TCC→T rate acceleration because this acceleration is evident in both genic and intergenic regions.

Discussion

It is beyond the scope of this paper to pinpoint why the rate of TCC→T increased in Europe. However, some promising clues can be found in the literature on UV-induced mutagenesis. In the mid-1990s, Drobetsky et al. (29) and Marionnet et al. (30) observed that TCC→T dominated the mutational spectra of single genes isolated from UV-irradiated cell cultures. Much more recently, Alexandrov et al. (19) systematically inferred “mutational signatures” from 7,042 different cancers and found that melanoma has a unique mutational signature not present in any other cancer type they studied. Melanoma somatic mutations consist almost entirely of C→T transitions, 28% of which are TCC→T mutations (19, 31). The mutation types CCC→CTC and TCT→TTT, two other candidates for rate acceleration in Europe, are also prominent in the spectrum of melanoma (SI Appendix, section 11). Incidentally, melanoma is not only associated with UV light exposure but also with European ancestry, occurring at very low rates in Africans, African Americans, and even light-skinned Asians (32–34). A study of the California Cancer Registry found that the annual age-adjusted incidence of melanoma cases per 100,000 people was 0.8–0.9 for Asians, 0.7–1.0 for African Americans, and 11.3–17.2 for Caucasians (35). Melanoma incidence in admixed Hispanics is strongly correlated with European ancestry (33–35).

The association of TCC→T mutations with UV exposure is not well understood, but two factors seem to be in play: (i) the propensity of UV to cross-link the TC into a base dimer lesion and (ii) poorer repair efficacy at TCC than at other motifs where UV lesions can form (36, 37). Drobetsky et al. (29) compared the incidence of UV lesions to the incidence of mutations in irradiated cells and found that TCC motifs were not hotspots for lesion formation but instead were disproportionately likely to have lesions develop into mutations rather than undergoing error-free repair.

Despite the strong evidence that UV causes TCC→T mutations, the question remains how UV could affect germ-line cells that are generally shielded from solar radiation. Although the testes contain germ-line tissue that lies close to the skin with minimal shielding, to my knowledge it has not been tested whether UV penetrates this tissue effectively enough to induce spermatic mutations. Another possibility is that UV might indirectly cause germ-line mutations by degrading folate, a DNA synthesis cofactor that is transmitted through the bloodstream and required during cell division (3, 4, 38, 39). Folate deficiency is known to cause DNA damage including uracil misincorporation and double-strand breaks, leading in some cases to birth defects and reduced male fertility (40–42). It is therefore possible that folate depletion could cause some of the mutations observed in UV-irradiated cells, and that these same mutations might appear in the germ line of a light-skinned individual rendered folate-deficient by sun exposure. It has also been hypothesized that, in a variety of species, differences in metabolic rate can drive latitudinal gradients in the rate of molecular evolution (43–45).

Although the data presented here do not reveal a clear mechanism, they leave little doubt that the European population experienced a recent mutation rate increase. TCC→T and a few other C→T transitions exhibit the clearest evidence of European rate acceleration, but other mutation types might have experienced smaller rate changes within Europeans or other human populations. Pinpointing finer-scale mutation rate changes will be an important avenue for future work.

Even if the overall European mutation rate increase was small, it adds to a growing body of evidence that molecular clock assumptions break down on a faster timescale than generally assumed during population genetic analysis. It was once assumed that the human lineage’s mutation rate had changed little since we shared a common ancestor with chimpanzees, but this assumption is losing credibility owing to the conflict between direct mutation rate estimates and molecular-clock-based estimates (8, 9). Although this conflict might have arisen from a gradual decrease in the rate of germ-line mitoses per year as our ancestors evolved longer generation times (12, 13), the results of this paper indicate that another force may have come into play: change in the mutation rate per mitosis. If the mutagenic spectrum was able to change during the last 60,000 y of human history, it might have changed numerous times during great ape evolution and beforehand. Given such a general challenge to the molecular clock assumption, it may be wise to infer demographic history from mutations such as CpG transitions that accumulate in a more clocklike way than other mutations (8, 20). At the same time, less clocklike mutations may provide valuable insights into the changing biology of genome integrity.

Materials and Methods

Publicly available VCF files containing the 1000 Genomes phase I data were downloaded from www.1000genomes.org/data. Ancestral states were inferred using the UCSC alignment of the chimp PanTro4 to the human reference genome hg19. These data were then subsampled to obtain four sets of SNPs: PE (derived allele private to Europe), PAs (derived allele private to Asia), PAf (derived allele private to Africa), and PAsE (fixed ancestral in Africa but variable in both Asia and Europe).

Construction of Private SNP Sets PE, PAs, PAf, and PAsE.

The definitions of PE, PAs, and PAf differ slightly from the definitions of continent-private SNPs in the paper announcing the release of the 1000 Genomes phase I data (18). In that paper, an SNP is considered private to Africa if it is variable in at least one of the populations LWK (Luhya from Kenya), YRI (Yoruba from Nigeria), and ASW (African Americans from the southwestern United States). In contrast, I consider an SNP to be private to Africa if it is variable in either LWK or YRI and is not variable in any of the following samples: CHB (Chinese from Beijing), CHS (Chinese from Shanghai), JPT (Japanese from Tokyo), CEU (Individuals of Central European descent from Utah), GBR (Great Britain), IBS (Spanish from the Iberian Peninsula), TSI (Italians from Tuscany), and FIN (Finnish). A private African SNP might or might not be variable in any of the admixed samples ASW, MXL (Mexicans from Los Angeles), CLM (Colombians from Medellin), and PUR (Puerto Ricans). Similarly, a private European SNP in PE is variable in one or more of the CEU, GBR, IBS, TSI, and FIN; is not variable in any of YRI, LWK, CHB, CHS, or JPT; and might or might not be variable in ASW, MXL, CLM, and PUR. The private Asian SNPs in PAs are variable in one or more of CHB, CHS, or JPT; are not variable in any of YRI, LWK, CEU, GBR, IBS, TSI, and FIN; and might or might not be variable in ASW, MXL, CLM, and PUR. These definitions are meant to select for mutations that have been confined to a single continent for most of their history except for possible recent transmission to the Americas. The shared European–Asian SNPs in PAsE are variable in one or more of CHB, CHS, or JPT plus one or more of CEU, GBR, IBS, TSI, and FIN and are not variable in YRI or LWK. Singletons are excluded to minimize the impact of possible sequencing error, and variants with imputation quality lower than RSQ=0.95 are excluded to minimize erroneous designation of shared SNPs as population-private.
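The set definitions above can be summarized in a short sketch. It is illustrative rather than the pipeline used here: the function name and input format are hypothetical, the input is assumed to be the set of population codes in which a SNP is variable, and the singleton and RSQ filters are assumed to have been applied beforehand.

```python
EUROPE = {"CEU", "GBR", "IBS", "TSI", "FIN"}
ASIA = {"CHB", "CHS", "JPT"}
AFRICA = {"YRI", "LWK"}
# The admixed samples ASW, MXL, CLM, and PUR are deliberately ignored.


def classify_snp(variable_pops):
    """Assign a SNP to PE, PAs, PAf, or PAsE, or return None if it fits none.

    variable_pops -- set of population codes in which the SNP is variable
    """
    in_eur = bool(variable_pops & EUROPE)
    in_asia = bool(variable_pops & ASIA)
    in_afr = bool(variable_pops & AFRICA)
    if in_afr and not in_eur and not in_asia:
        return "PAf"
    if in_eur and not in_afr and not in_asia:
        return "PE"
    if in_asia and not in_afr and not in_eur:
        return "PAs"
    if in_asia and in_eur and not in_afr:
        return "PAsE"
    return None  # shared with Africa, or variable only in admixed samples
```

For instance, a SNP variable in CEU and MXL is still classified as `"PE"`, reflecting the allowance for recent transmission to the Americas.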

Statistical Analysis of Frequency Differences.

Given two SNP sets P1 and P2 and one SNP type m, a Pearson’s χ2 test was used to measure the significance of the difference between the frequency of m in P1 and the frequency of m in P2.

Let $C_{P_i}(m)$ denote the number of type-$m$ SNPs in set $P_i$, and let $T(P)=\sum_{m\in M} C_P(m)$ denote the total number of SNPs in $P$, where $M$ is the set of all mutation types. The expected values of $C_{P_1}(m)$ and $C_{P_2}(m)$, assuming no frequency differences between $P_1$ and $P_2$, are calculated as follows based on the 2×2 contingency tables in Fig. 1 C and D:

$$E(C_{P_i}(m)) = \frac{T(P_i)\,\bigl(C_{P_1}(m)+C_{P_2}(m)\bigr)}{T(P_1)+T(P_2)}.$$

The following χ2 value with one degree of freedom measures the significance of the difference between fm(P1) and fm(P2):χ2=∑i=12(CPi(m)−E(CPi(m)))2E(CPi(m))+(T(Pi)−CPi(m)−E(T(Pi)−CPi(m)))2E(T(Pi)−CPi(m)).
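The test statistic can be computed directly from the four counts. The sketch below is an illustration rather than the original analysis code; the function names are hypothetical. It takes the expected count of non-type-m SNPs to be T(Pi) minus the expected count of type-m SNPs, and obtains the P value from the identity that a one-degree-of-freedom χ2 tail probability equals erfc(√(χ2/2)).

```python
import math


def chi2_frequency_difference(c1: int, t1: int, c2: int, t2: int) -> float:
    """Pearson chi-square (1 df) for the frequency of one SNP type in two sets.

    c1, c2: counts of type-m SNPs in P1 and P2.
    t1, t2: total SNP counts T(P1) and T(P2).
    Assumes 0 < c1 + c2 < t1 + t2, so no expected count is zero.
    """
    pooled = (c1 + c2) / (t1 + t2)  # frequency of m under the null
    chi2 = 0.0
    for c, t in ((c1, t1), (c2, t2)):
        e_m = t * pooled       # E(C_Pi(m))
        e_other = t - e_m      # E(T(Pi) - C_Pi(m))
        chi2 += (c - e_m) ** 2 / e_m + ((t - c) - e_other) ** 2 / e_other
    return chi2


def chi2_pvalue_1df(chi2: float) -> float:
    """Upper-tail probability of a chi-square variable with one df."""
    return math.erfc(math.sqrt(chi2 / 2.0))
```

With 10 of 100 SNPs of type m in one set and 20 of 100 in the other, the expected counts are 15 and 85 in each set, and the statistic is 200/51, or about 3.92.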

Nonparametric Bootstrapping Within Chromosomes.

To assess the variance of f(TCC) within each of the autosomes and the X chromosome, each private SNP set PE, PAs, and PAf was partitioned into nonoverlapping bins of 1,000 consecutive SNPs. The frequency f(TCC) of the mutation TCC→T was computed for each bin and the distribution of these estimates is shown in Fig. 1C. No partitioning into separate bins was performed for chromosome Y because the entire chromosome has only 1,130 private European SNPs, 1,857 private Asian SNPs, and 3,852 private African SNPs. Instead, the global frequency of TCC→T was computed for each SNP set restricted to the Y chromosome.
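The binning step can be sketched as below. The input is assumed to be the list of mutation-type labels for one private SNP set on one chromosome, already sorted by position. The label string `"TCC>T"` and the decision to drop a trailing partial bin are assumptions of this example; the text does not say how leftover SNPs were handled.

```python
from typing import List, Sequence


def binned_frequency(
    snp_types: Sequence[str],
    target: str = "TCC>T",
    bin_size: int = 1000,
) -> List[float]:
    """Frequency of `target` in nonoverlapping bins of consecutive SNPs.

    `snp_types` must be ordered by chromosomal position. A trailing bin
    with fewer than `bin_size` SNPs is dropped (an assumption).
    """
    n_full_bins = len(snp_types) // bin_size
    freqs = []
    for i in range(n_full_bins):
        chunk = snp_types[i * bin_size:(i + 1) * bin_size]
        freqs.append(sum(1 for t in chunk if t == target) / bin_size)
    return freqs
```

The spread of the returned values gives the within-chromosome variance of f(TCC) for that SNP set.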

Quantifying Strand Bias.

Gene locations and transcription directions for hg19 were downloaded from the UCSC Genome browser. For the purpose of this analysis, each SNP located between the start codon and stop codon of an annotated gene is considered to be a genic SNP. All SNPs not located within introns or exons are considered to be intergenic SNPs.
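A simple way to apply this genic versus intergenic rule is an interval lookup. The sketch below is illustrative only: the `Gene` record and its field names are hypothetical and do not correspond to a specific UCSC table schema, and returning the first matching gene when annotations overlap is an assumption.

```python
from typing import List, NamedTuple, Optional


class Gene(NamedTuple):
    chrom: str
    cds_start: int  # leftmost coordinate of the start-to-stop-codon span
    cds_end: int    # rightmost coordinate of that span
    strand: str     # '+' or '-', the direction of transcription


def genic_strand(chrom: str, pos: int, genes: List[Gene]) -> Optional[str]:
    """Return the strand of a gene containing the SNP, or None if intergenic.

    A linear scan keeps the example short; an interval tree or sorted
    lookup would be preferable for genome-scale data.
    """
    for gene in genes:
        if gene.chrom == chrom and gene.cds_start <= pos <= gene.cds_end:
            return gene.strand
    return None
```

The returned strand is what determines whether a given ancestral allele lies on the transcribed or the nontranscribed strand.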

Within each private SNP set $P$, each mutation type $m$ with an A or C ancestral allele was counted separately on transcribed and nontranscribed genic strands to obtain counts $T(P,m)$ and $N(P,m)$. (Each mutation with a G/T ancestral allele on the transcribed strand is equivalent to a complementary A/C ancestral mutation on the nontranscribed strand.) The strand bias of mutation $m$ is then defined to be $S(P,m)=N(P,m)/T(P,m)$. The significances of the strand bias differences $S(P_{\mathrm{Af}},m)-S(P_{\mathrm{E}},m)$ and $S(P_{\mathrm{Af}},m)-S(P_{\mathrm{As}},m)$ were assessed using a $\chi^2$ test with one degree of freedom. Assuming no difference in strand bias between $P_1$ and $P_2$, the expected numbers of transcribed-strand and nontranscribed-strand mutations are the following:

$$E\big(T(P_i,m)\big) = \frac{\big(T(P_1,m)+T(P_2,m)\big)\big(T(P_i,m)+N(P_i,m)\big)}{T(P_1,m)+T(P_2,m)+N(P_1,m)+N(P_2,m)},$$

$$E\big(N(P_i,m)\big) = \frac{\big(N(P_1,m)+N(P_2,m)\big)\big(T(P_i,m)+N(P_i,m)\big)}{T(P_1,m)+T(P_2,m)+N(P_1,m)+N(P_2,m)}.$$

The $\chi^2$ value measuring the significance of the difference between $N(P_1,m)/T(P_1,m)$ and $N(P_2,m)/T(P_2,m)$ is computed as follows:

$$\chi^2 = \sum_{i=1}^{2} \left[ \frac{\big(T(P_i,m) - E(T(P_i,m))\big)^2}{E\big(T(P_i,m)\big)} + \frac{\big(N(P_i,m) - E(N(P_i,m))\big)^2}{E\big(N(P_i,m)\big)} \right].$$
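The strand bias comparison follows the same pattern as the frequency test, with transcribed and nontranscribed counts as the two columns of the table. The following is a sketch with hypothetical function names, assuming all four counts are positive so that no expected value is zero.

```python
def strand_bias(n: int, t: int) -> float:
    """S(P, m) = N(P, m) / T(P, m)."""
    return n / t


def chi2_strand_bias(t1: int, n1: int, t2: int, n2: int) -> float:
    """Chi-square (1 df) for a difference in strand bias between P1 and P2.

    t1, n1: transcribed- and nontranscribed-strand counts of mutation m in P1.
    t2, n2: the same counts in P2.
    """
    total = t1 + t2 + n1 + n2
    chi2 = 0.0
    for t, n in ((t1, n1), (t2, n2)):
        e_t = (t1 + t2) * (t + n) / total  # E(T(Pi, m))
        e_n = (n1 + n2) * (t + n) / total  # E(N(Pi, m))
        chi2 += (t - e_t) ** 2 / e_t + (n - e_n) ** 2 / e_n
    return chi2
```

For counts (T, N) of (40, 60) in one set and (50, 50) in the other, the expected counts are 45 and 55 in each set and the statistic is 200/99, or about 2.02.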

Nonparametric bootstrapping was used to estimate the variance of TCC→T strand bias within each population. The transcribed portion of the genome was partitioned into 100 bins containing approximately equal numbers of SNPs, and 100 replicates were each generated by sampling 100 bins with replacement. For each replicate, the frequency of TCC→T was calculated on the transcribed and nontranscribed strands. These two frequencies were added together to obtain the cumulative TCC→T frequency within genic regions. The distribution of S(TCC→T) across replicates is shown for each population in Fig. 3E.

Bootstrapping was similarly applied to intergenic SNPs by partitioning the nongenic portion of the genome into 100 bins with similar SNP counts. One hundred bootstrap replicates were generated by sampling 100 bins with replacement, and the intergenic TCC→T frequency was computed for each replicate.

The distribution of ratios in Fig. 3F was generated by pairing up each genic bootstrap replicate with a unique intergenic bootstrap replicate and calculating the ratio of genic f(TCC) to intergenic f(TCC), thereby obtaining 100 estimates of the ratio $f_{\mathrm{genic}}(\mathrm{TCC})/f_{\mathrm{intergenic}}(\mathrm{TCC})$.
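The bin-level bootstrap and the pairing of replicates can be sketched as follows. Each bin is summarized here as a pair (TCC→T count, total SNP count), which is an assumed data layout. For brevity the sketch pools counts within a replicate instead of summing separate transcribed and nontranscribed frequencies, and the fixed seed is only there for reproducibility.

```python
import random
from typing import List, Sequence, Tuple

Bin = Tuple[int, int]  # (number of TCC>T SNPs, total number of SNPs) in one bin


def bootstrap_frequencies(
    bins: Sequence[Bin], n_replicates: int = 100, seed: int = 0
) -> List[float]:
    """Resample bins with replacement and return f(TCC) for each replicate."""
    rng = random.Random(seed)
    freqs = []
    for _ in range(n_replicates):
        # Draw as many bins as there are in the original partition.
        sample = rng.choices(bins, k=len(bins))
        hits = sum(count for count, _ in sample)
        total = sum(size for _, size in sample)
        freqs.append(hits / total)
    return freqs


def paired_ratios(genic: Sequence[float], intergenic: Sequence[float]) -> List[float]:
    """Pair the i-th genic replicate with the i-th intergenic replicate."""
    return [g / i for g, i in zip(genic, intergenic)]
```

Because the two replicate lists are generated independently, pairing them by index uses each intergenic replicate exactly once, which matches the one-to-one pairing described in the text.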

Acknowledgments

I thank Rasmus Nielsen for advice and comments and two anonymous reviewers for providing suggestions that improved earlier drafts. David Reich shared valuable insights into gene flow between early European farmers and hunter-gatherers, and Richard Durbin, Stuart Linn, and Elad Ziv contributed additional helpful comments and suggestions. This work was supported by a National Science Foundation Graduate Research Fellowship (to K.H.) and NIH Grant IR01GM109454-01 (to Rasmus Nielsen, Yun Song, and Steve Evans).

Footnotes

  • ¹Email: harris.kelley@gmail.com.
  • Author contributions: K.H. designed research, performed research, analyzed data, and wrote the paper.

  • The author declares no conflict of interest.

  • This article is a PNAS Direct Submission. M.S. is a guest editor invited by the Editorial Board.

  • This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1418652112/-/DCSupplemental.

References

  1. Cann RL, Stoneking M, Wilson AC (1987) Mitochondrial DNA and human evolution. Nature 325(6099):31–36.
  2. Loomis WF (1967) Skin-pigment regulation of vitamin-D biosynthesis in man. Science 157(3788):501–506.
  3. Jablonski NG, Chaplin G (2000) The evolution of human skin coloration. J Hum Evol 39(1):57–106.
  4. Jablonski NG, Chaplin G (2010) Colloquium paper: Human skin pigmentation as an adaptation to UV radiation. Proc Natl Acad Sci USA 107(Suppl 2):8962–8968.
  5. Pritchard JK, Pickrell JK, Coop G (2010) The genetics of human adaptation: Hard sweeps, soft sweeps, and polygenic adaptation. Curr Biol 20(4):R208–R215.
  6. Lachance J, Tishkoff SA (2013) Population genomics of human adaptation. Annu Rev Ecol Evol Syst 44:123–143.
  7. Fraser HB (2013) Gene expression drives local adaptation in humans. Genome Res 23(7):1089–1096.
  8. Ségurel L, Wyman M, Przeworski M (2014) Determinants of mutation rate variation in the human germline. Annu Rev Genomics Hum Genet 15:19.1–19.24.
  9. Scally A, Durbin R (2012) Revising the human mutation rate: Implications for understanding human evolution. Nat Rev Genet 13(10):745–753.
  10. Kong A, et al. (2012) Rate of de novo mutations and the importance of father’s age to disease risk. Nature 488(7412):471–475.
  11. 1000 Genomes Project (2010) A map of human genome variation from population-scale sequencing. Nature 467:1061–1073.
  12. Goodman M (1961) The role of immunochemical differences in the phyletic development of human behavior. Hum Biol 33:131–162.
  13. Li WH, Tanimura M (1987) The molecular clock runs more slowly in man than in apes and monkeys. Nature 326(6108):93–96.
  14. Gutenkunst RN, Hernandez RD, Williamson SH, Bustamante CD (2009) Inferring the joint demographic history of multiple populations from multidimensional SNP frequency data. PLoS Genet 5(10):e1000695.
  15. Li H, Durbin R (2011) Inference of human population history from individual whole-genome sequences. Nature 475(7357):493–496.
  16. Harris K, Nielsen R (2013) Inferring demographic history from a spectrum of shared haplotype lengths. PLoS Genet 9(6):e1003521.
  17. Venn O, et al. (2014) Nonhuman genetics. Strong male bias drives germline mutation in chimpanzees. Science 344(6189):1272–1275.
  18. 1000 Genomes Project (2012) An integrated map of genetic variation from 1,092 human genomes. Nature 491:56–65.
  19. Alexandrov LB, et al., Australian Pancreatic Cancer Genome Initiative, ICGC Breast Cancer Consortium, ICGC MMML-Seq Consortium, ICGC PedBrain (2013) Signatures of mutational processes in human cancer. Nature 500(7463):415–421.
  20. Hwang DG, Green P (2004) Bayesian Markov chain Monte Carlo sequence analysis reveals varying neutral substitution patterns in mammalian evolution. Proc Natl Acad Sci USA 101(39):13994–14001.
  21. Hernandez RD, Williamson SH, Bustamante CD (2007) Context dependence, ancestral misidentification, and spurious signatures of natural selection. Mol Biol Evol 24(8):1792–1800.
  22. Prüfer K, et al. (2014) The complete genome sequence of a Neanderthal from the Altai Mountains. Nature 505(7481):43–49.
  23. Drmanac R, et al. (2010) Human genome sequencing using unchained base reads on self-assembling DNA nanoarrays. Science 327(5961):78–81.
  24. Rasmussen MD, Hubisz MJ, Gronau I, Siepel A (2014) Genome-wide inference of ancestral recombination graphs. PLoS Genet 10(5):e1004342.
  25. Lazaridis I, et al. (2014) Ancient human genomes suggest three ancestral populations for present-day Europeans. Nature 513(7518):409–413.
  26. Beleza S, et al. (2013) The timing of pigmentation lightening in Europeans. Mol Biol Evol 30(1):24–35.
  27. Skandalis A, Ford BN, Glickman BW (1994) Strand bias in mutation involving 5-methylcytosine deamination in the human hprt gene. Mutat Res 314(1):21–26.
  28. Green P, Ewing B, Miller W, Thomas P, NISC Comparative Sequencing Program, Green E (2003) Transcription-associated mutational asymmetry in mammalian evolution. Nat Genet 33:514–517.
  29. Drobetsky EA, Sage E (1993) UV-induced G:C→A:T transitions at the APRT locus of Chinese hamster ovary cells cluster at frequently damaged 5′-TCC-3′ sequences. Mutat Res 289(2):131–138.
  30. Marionnet C, Benoit A, Benhamou S, Sarasin A, Stary A (1995) Characteristics of UV-induced mutation spectra in human XP-D/ERCC2 gene-mutated xeroderma pigmentosum and trichothiodystrophy cells. J Mol Biol 252(5):550–562.
  31. Pleasance ED, et al. (2010) A comprehensive catalogue of somatic mutations from a human cancer genome. Nature 463(7278):191–196.
  32. Crombie IK (1979) Racial differences in melanoma incidence. Br J Cancer 40(2):185–193.
  33. Hu D, Yu G, McCormick S, Schneider S, Finger P (2005) Population-based incidence of uveal melanoma in various races and ethnic groups. Am J Ophthalmology 140:612.e1–612.e6.
  34. Bakos L, et al. (2009) European ancestry and cutaneous melanoma in Southern Brazil. J Eur Acad Dermatol Venereol 23(3):304–307.
  35. Cress RD, Holly EA (1997) Incidence of cutaneous melanoma among non-Hispanic whites, Hispanics, Asians, and blacks: An analysis of California cancer registry data, 1988-93. Cancer Causes Control 8(2):246–252.
  36. Brash DE, Seetharam S, Kraemer KH, Seidman MM, Bredberg A (1987) Photoproduct frequency is not the major determinant of UV base substitution hot spots or cold spots in human cells. Proc Natl Acad Sci USA 84(11):3782–3786.
  37. Drobetsky EA, Grosovsky AJ, Glickman BW (1987) The specificity of UV-induced mutations at an endogenous locus in mammalian cells. Proc Natl Acad Sci USA 84(24):9103–9107.
  38. Branda RF, Eaton JW (1978) Skin color and nutrient photolysis: An evolutionary hypothesis. Science 201(4356):625–626.
  39. Off MK, et al. (2005) Ultraviolet photodegradation of folic acid. J Photochem Photobiol B 80(1):47–55.
  40. Blount BC, et al. (1997) Folate deficiency causes uracil misincorporation into human DNA and chromosome breakage: Implications for cancer and neuronal damage. Proc Natl Acad Sci USA 94(7):3290–3295.
  41. Wallock LM, et al. (2001) Low seminal plasma folate concentrations are associated with low sperm density and count in male smokers and nonsmokers. Fertil Steril 75(2):252–259.
  42. Stover PJ (2009) One-carbon metabolism-genome interactions in folate-associated pathologies. J Nutr 139(12):2402–2405.
  43. Gillooly JF, Allen AP, West GB, Brown JH (2005) The rate of DNA evolution: Effects of body size and temperature on the molecular clock. Proc Natl Acad Sci USA 102(1):140–145.
  44. Allen AP, Gillooly JF, Savage VM, Brown JH (2006) Kinetic effects of temperature on rates of genetic divergence and speciation. Proc Natl Acad Sci USA 103(24):9130–9135.
  45. Wright S, Keeling J, Gillman L (2006) The road from Santa Rosalia: A faster tempo of evolution in tropical climates. Proc Natl Acad Sci USA 103(20):7718–7722.
Evidence for recent, population-specific evolution of the human mutation rate
Evidence for recent human mutation rate evolution
Kelley Harris
Proceedings of the National Academy of Sciences Mar 2015, 112 (11) 3439-3444; DOI: 10.1073/pnas.1418652112


Copyright © 2020 National Academy of Sciences. Online ISSN 1091-6490