Maternal age effect and severe germ-line bottleneck in the inheritance of human mitochondrial DNA
- Departments of (a) Biochemistry and Molecular Biology,
- (b) Biology, and
- (g) Statistics,
- (c) Forensic Science Program, Pennsylvania State University, University Park, PA 16802;
- (d) School of Science and Technology, Nottingham Trent University, Nottingham NG1 4BU, United Kingdom;
- (e) Department of Integrative Biology, University of California, Berkeley, CA 94720;
- (f) Centre for GeoGenetics, Natural History Museum of Denmark, University of Copenhagen, DK-1350 Copenhagen, Denmark; and
- (h) Department of Pediatrics, College of Medicine, Pennsylvania State University, Hershey, PA 17033
Edited by Michael Lynch, Indiana University, Bloomington, IN, and approved September 8, 2014 (received for review May 20, 2014)

Significance
The frequency of intraindividual mitochondrial DNA (mtDNA) polymorphisms—heteroplasmies—can change dramatically from mother to child owing to the mitochondrial bottleneck at oogenesis. For deleterious heteroplasmies such a change may transform alleles that are benign at low frequency in a mother into disease-causing alleles when at a high frequency in her child. Our study estimates the mtDNA germ-line bottleneck to be small (30–35) and documents a positive association between the number of child heteroplasmies and maternal age at fertilization, enabling prediction of transmission of disease-causing variants and informing mtDNA evolution.
Abstract
The manifestation of mitochondrial DNA (mtDNA) diseases depends on the frequency of heteroplasmy (the presence of several alleles in an individual), yet its transmission across generations cannot be readily predicted owing to a lack of data on the size of the mtDNA bottleneck during oogenesis. For deleterious heteroplasmies, a severe bottleneck may abruptly transform a benign (low) frequency in a mother into a disease-causing (high) frequency in her child. Here we present a high-resolution study of heteroplasmy transmission conducted on blood and buccal mtDNA of 39 healthy mother–child pairs of European ancestry (a total of 156 samples, each sequenced at ∼20,000× per site). On average, each individual carried one heteroplasmy, and one in eight individuals carried a disease-associated heteroplasmy, with minor allele frequency ≥1%. We observed frequent drastic heteroplasmy frequency shifts between generations and estimated the effective size of the germ-line mtDNA bottleneck at only ∼30–35 (interquartile range from 9 to 141). Accounting for heteroplasmies, we estimated the mtDNA germ-line mutation rate at 1.3 × 10⁻⁸ (interquartile range from 4.2 × 10⁻⁹ to 4.1 × 10⁻⁸) mutations per site per year, an order of magnitude higher than for nuclear DNA. Notably, we found a positive association between the number of heteroplasmies in a child and maternal age at fertilization, likely attributable to oocyte aging. This study also took advantage of droplet digital PCR (ddPCR) to validate heteroplasmies and confirm a de novo mutation. Our results can be used to predict the transmission of disease-causing mtDNA variants and illuminate evolutionary dynamics of the mitochondrial genome.
The centerpiece of cellular metabolic machinery—the mitochondrion—harbors a 16.5-kb genome, mitochondrial DNA (mtDNA). Mutations in mtDNA cause over 200 diseases and contribute to diabetes, cancer, male infertility, Parkinson’s and Alzheimer’s diseases (1). In mammals, mtDNA mutates at high rates and is maternally inherited, making it a popular marker in evolutionary genetics (2). Despite its importance, mtDNA has drifted away from the spotlight eclipsed by nuclear DNA studies (3), and there are still gaps in our understanding of the basic aspects of human mtDNA biology. The lack of cures for diseases caused by mtDNA mutations makes it critical to understand how these mutations arise and are transmitted between generations.
Heteroplasmy, the presence of more than one mtDNA variant in a cell or a tissue, is the result of a de novo mtDNA mutation occurring in an individual or inherited through the maternal lineage. Currently there is no consensus about how prevalent mtDNA heteroplasmy is in human populations (4, 5). Such knowledge is crucial for assessing the load of mtDNA pathogenic mutations, formulating prognoses for patients with mtDNA diseases, and preimplantation diagnostics after mtDNA replacement in oocytes. Most mtDNA diseases are heteroplasmic and their phenotype depends on the allele frequency of the pathogenic variant (1).
Heteroplasmy levels can change dramatically between generations owing to genetic drift during the germ-line bottleneck, a reduction in the number of mtDNA segregating units during oogenesis (6–8). The size of the bottleneck for mice has been evaluated to be 185 (9), yet for humans this size is difficult to obtain experimentally. Published estimates of the human bottleneck size are too broad [1–200 (10, 11)] to be useful in predicting the transmission of disease variants. Genetic drift theory predicts that a small bottleneck size will result in drastic shifts in heteroplasmy levels from a mother to her child, potentially reaching nondisease levels or levels with higher disease severity. After fertilization, mtDNA variants are distributed among cells owing to mitotic segregation, the random partitioning of mitochondria during cell divisions (12). We also lack an accurate estimate of the germ-line mtDNA mutation rate in humans, with pedigree and phylogenetic studies producing conflicting results (13, 14).
To conduct a population study of heteroplasmy transmission, we analyzed full-length mtDNA in 39 mother–child pairs using the MiSeq platform. Accounting for PCR and sequencing errors, we were able to accurately score heteroplasmies with allele frequency above 1%. With these data, we addressed (i) how common heteroplasmy is in a human population, (ii) how heteroplasmy frequency changes between tissues of the same individual and between generations, and (iii) whether maternal age at conception influences heteroplasmy occurrence in a child. We also estimated the size of the germ-line mtDNA bottleneck and the germ-line mutation rate via population genetics modeling of heteroplasmies. Focused on heteroplasmy inheritance in healthy individuals, our data serve as a valuable baseline to study disease associations for mtDNA and provide important insights into mtDNA evolution.
Results
Samples, mtDNA Enrichment, and Sequencing.
We studied the prevalence of mtDNA heteroplasmy in blood and buccal cells from 39 mother–child pairs residing in central Pennsylvania, analyzing 156 samples (39 mothers × 2 tissues + 39 children × 2 tissues) grouped in sets of four (two tissues from a mother and two tissues from her child). Total genomic DNA was isolated from each sample. Haplogroup analysis conducted via Sanger sequencing of the D-loop indicated European ancestry for all families (Dataset S1, Table S1). For each sample, we amplified mtDNA from total DNA in two overlapping 9-kb fragments and sequenced them with paired-end 250-bp reads on a MiSeq instrument (Materials and Methods). This enriched for mtDNA and minimized the presence of numts (Fig. S1 and SI Materials and Methods), the majority of which are short (15). Multiplexing 12 samples per run resulted in ∼10⁶ read pairs per sample. We confirmed the efficacy of our approach for mtDNA enrichment by applying it to Rho0 cells not harboring mitochondria (Fig. S2). To minimize potential contamination among samples, we followed previously devised guidelines (16) and used pUC18 and PhiX174 as spike-ins (Materials and Methods).
Heteroplasmy Discovery.
The sequencing read pairs were mapped to human mtDNA and nuclear genomes. On average, 85% of the reads per sample mapped to mtDNA (97% for samples without a spike-in; Fig. S3). We then applied our heteroplasmy discovery pipeline (Materials and Methods and Fig. S4). We required both reads of the pair to map uniquely, and in a proper orientation, to the reference mtDNA (Fig. S5). To compute minor allele frequency (MAF) at each site for each sample, we used bases with sequencing quality ≥30 from reads with mapping quality ≥20 (other thresholds led to almost identical results; Fig. S6). The mean sequencing depth per sample (averaged across sites) was 19,789× ± 770× (mean ± SEM). Tabulating depth on a per-site basis, 90% of sites in the mitochondrial genome were sequenced at ≥7,858× per sample (Fig. S7). The proportion of spike-in reads aligning to their respective references was as anticipated, suggesting absence of contamination among adjacent samples (Fig. S3).
In the search for heteroplasmies, we first identified sites with MAF ≥1% in individual samples. The sequencing depth per site required to detect true heteroplasmies with MAF ≥1% over the base quality error (0.1% for Phred score 30) with 99% power is 839× per site (one-sided power calculation for one-sample proportion test). Conservatively, we rounded up the depth requirement to 1,000×. A detection limit of MAF ≥1% allows detection of inherited and de novo variants that pass through the bottleneck if its size is <100. Mutations with lower frequency are accounted for with population genetics modeling (discussed below). After filtering for potential sequencing artifacts (Dataset S1, Table S2 and Materials and Methods), we identified 174 point heteroplasmies distributed among 100 quartets—groups of site-specific heteroplasmy frequencies from two tissues of a mother and two tissues from her child (Dataset S1, Table S3). These heteroplasmies were found in 31 families (eight families had no heteroplasmies).
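The depth requirement above can be approximated with a short calculation. The sketch below is illustrative only: it assumes a one-sided significance level of 0.05 and the arcsine (Cohen's h) effect size for a one-sample proportion test, neither of which is stated in the text, and the function name `depth_for_power` is hypothetical. Under these assumptions the result comes out close to the 839× quoted above.

```python
import math
from statistics import NormalDist


def depth_for_power(p_error=0.001, p_min=0.01, alpha=0.05, power=0.99):
    """Approximate per-site depth for a one-sided one-sample proportion test.

    p_error: background error rate (0.1% for Phred score 30)
    p_min:   smallest minor allele frequency to detect (1%)
    alpha:   one-sided significance level (assumed here, not given in the text)
    power:   desired power (99%)
    """
    # Cohen's effect size h on the arcsine-transformed scale (an assumption)
    h = 2 * math.asin(math.sqrt(p_min)) - 2 * math.asin(math.sqrt(p_error))
    z_alpha = NormalDist().inv_cdf(1 - alpha)  # one-sided critical value
    z_power = NormalDist().inv_cdf(power)
    return ((z_alpha + z_power) / h) ** 2


required_depth = depth_for_power()
print(round(required_depth))  # 839
```

Rounding this figure up to 1,000× gives the conservative threshold used for site selection.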
Statistical Validation of Point Heteroplasmies.
To validate heteroplasmies, we used a novel statistical method that identifies heteroplasmic sites via a likelihood function accounting for instrument sequencing and mapping errors (SI Materials and Methods). With this method, all 174 point heteroplasmies were significant (P < 0.0003 for each site; Dataset S1, Table S4). Additionally, the allele counts for all 174 heteroplasmies tested were significant (P < 0.0003 for 172 sites, and P < 0.03 for the remaining two sites; Table S4) based on the variability observed for the same position among all samples (17).
Experimental Validation of Point Heteroplasmies.
We used Sanger sequencing to test all point heteroplasmies with MiSeq MAF ≥10% (Sanger method detection limit, Fig. S8A and Dataset S1, Table S5) in at least one sample per family and the corresponding sites from the other samples from the same family (we always sequenced newly amplified fragments). In total, we examined 21 sites × 4 samples = 84 sites, 44 of which had MiSeq MAF ≥10% (Dataset S1, Table S6). The presence of heteroplasmy was successfully validated in all these 44 cases. Thus, our false-positive rate for detecting heteroplasmies with MAF ≥10% is below 0.023 (1/44). The MAFs from the MiSeq and Sanger methods were well correlated (R² = 75%; Fig. S9A).
A set of point heteroplasmies with MiSeq MAF <10% was analyzed with droplet digital PCR (ddPCR) (18), which can detect heteroplasmies with MAF >0.2% (Fig. S8 B and C and Dataset S1, Table S7). Here we analyzed point heteroplasmies with MiSeq MAF between 1% and 10% in at least one sample per family and the corresponding sites from the other samples of the same family, a total of 10 sites × 4 samples = 40 sites, 18 of which had MiSeq MAF ≥1% (Fig. S9B and Dataset S1, Table S8). When we assayed the original amplicons used for MiSeq sequencing, the presence of heteroplasmy was confirmed in all these 18 instances. However, when we reamplified mtDNA from these 18 samples, in two instances (site 11,616 in M203C5-ch and site 11,825 in M210-bl) ddPCR did not confirm the presence of heteroplasmy. Repeating amplification and ddPCR for a third time again did not detect heteroplasmy (Dataset S1, Table S8), suggesting PCR errors in the amplicons sequenced with MiSeq. Thus, our false-positive rate for detecting heteroplasmies with MAF between 1% and 10% is 0.11 (2/18). Overall, the MAFs from the MiSeq and ddPCR methods were well correlated for the sequenced and newly amplified amplicons (R² = 95% and 79%, respectively; Fig. S9B).
Distribution of Point Heteroplasmies.
After removing two sites that failed to validate with ddPCR (discussed above), we retained 172 point heteroplasmies in 98 quartets (Dataset S1, Table S3). We assumed that these 98 point mutations arose independently in the families analyzed (or in their maternal ancestors). Point heteroplasmies were found at 87 unique mtDNA positions (Fig. S10). Six positions (185, 189, 214, 215, 16,093, and 16,183) were heteroplasmic in multiple families (four, three, three, two, three, and two families, respectively; Dataset S1, Table S3), likely owing to high mutation rate at the D-loop (13). Each mother on average carried 1.13 ± 0.04 heteroplasmies in her blood. This value was similar for maternal buccal tissue and for buccal and blood tissues of children (Fig. S11). Among the 98 point heteroplasmies 96 were transitions and 2 were transversions, resulting in a transition-to-transversion ratio of 48 (Fig. S10 and Dataset S1, Table S3).
There were significantly more and significantly fewer heteroplasmies in the D-loop and protein-coding regions, respectively, than expected based on their length and assuming equal propensity to harbor a heteroplasmy along mtDNA (Table 1). A high mutation rate for the D-loop had been documented previously (13). The nonsynonymous-to-synonymous rate ratio (dN/dS) at (concatenated) protein-coding genes was significantly lower than 1 (P = 5 × 10⁻³; Fisher’s exact test; Dataset S1, Table S9), suggesting purifying selection (19). Most nonsynonymous mutations were predicted to affect protein function (Dataset S1, Table S10).
Table 1. The distribution of point heteroplasmies among mtDNA regions
Disease-Associated Mutations and Mutation Burden.
Eight families harbored eight point heteroplasmies (one per family) that can cause disease when present at high allele frequencies (Table 2). Among 39 mothers, 5 (or 1 in 8) were carriers of disease-associated mtDNA mutations in at least one of the two tissues analyzed. Mutations at four of the eight sites are associated with disease when homoplasmic for the mutant allele (20–23); however, in our data these were heteroplasmic (Table 2). For the other four of the eight sites above, disease can develop even when mutant alleles are heteroplasmic, with disease severity depending on the allele frequency. For A1555G, G13708A, and G3242A mutations, the allele frequencies were much lower than disease-associated frequencies (Table 2) (24–26), suggesting lack of symptoms. Mutations at tRNA-Leu sites 3,242 and 3,243 contribute to several mitochondrial diseases (25, 27); notably, allele frequencies observed at site 3,243 in the child of family M512 (Table 2) were comparable to those observed in patients with mitochondrial encephalomyopathy, lactic acidosis, and stroke-like episodes (27).
Table 2. Disease-causing heteroplasmies
Transmission of Heteroplasmies.
Considering heteroplasmy-containing quartets (Dataset S1, Table S3), we used the presence of heteroplasmy with MAF ≥1% in at least one sample from a family as a “prior” to support the existence of heteroplasmy at the same position for other samples of the same family if their MAF was ≥0.2% (greater than or equal to twice the value of the allowed sequencing quality error of 0.1%—Phred score 30). We classified 98 quartets into five categories (Table 3 and Dataset S1, Table S3) based on whether heteroplasmies were present in (i) both tissues of a mother and both tissues of her child (category “all,” which included dramatic shifts in allele frequency from mother to child, suggesting the germ-line bottleneck); (ii) both tissues of a mother, but absent from both tissues of her child (category “mother,” suggesting loss of a variant in the child owing to the germ-line bottleneck); (iii) both tissues of a child, but absent from both tissues of a mother (category “child” with candidate germ-line de novo mutations); (iv) both tissues of a mother and one tissue of a child, or in one tissue of a mother and both tissues of a child (category “somatic loss,” suggestive of a change in MAF in tissues owing to mitotic segregation) (12); and (v) one tissue of one individual of a family (category “somatic gain” with candidate somatic de novo mutations).
Table 3. Categories of quartets with examples
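The five-category scheme described above can be written as a small decision rule. The following is a minimal sketch, not the authors' implementation: the `Quartet` record, the `classify` function, and the "unclassified" fallback for patterns the text does not describe are all hypothetical. It assumes MAFs are given as fractions and applies the two thresholds stated in the text (≥1% in at least one sample, ≥0.2% as support in the others).

```python
from dataclasses import dataclass

PRIMARY_MAF = 0.01   # at least one sample in the family must reach MAF >= 1%
SUPPORT_MAF = 0.002  # other samples count as heteroplasmic at MAF >= 0.2%


@dataclass
class Quartet:
    """MAFs (as fractions) at one site in the four samples of a family."""
    mother_blood: float
    mother_buccal: float
    child_blood: float
    child_buccal: float


def classify(q: Quartet) -> str:
    mafs = [q.mother_blood, q.mother_buccal, q.child_blood, q.child_buccal]
    if max(mafs) < PRIMARY_MAF:
        return "no heteroplasmy"
    present = [m >= SUPPORT_MAF for m in mafs]
    n_mother = sum(present[:2])  # maternal tissues carrying the variant
    n_child = sum(present[2:])   # child tissues carrying the variant
    if n_mother == 2 and n_child == 2:
        return "all"
    if n_mother == 2 and n_child == 0:
        return "mother"
    if n_mother == 0 and n_child == 2:
        return "child"
    if (n_mother, n_child) in {(2, 1), (1, 2)}:
        return "somatic loss"
    if n_mother + n_child == 1:
        return "somatic gain"
    return "unclassified"  # pattern not covered by the five categories


# Site 4191 in family M500: absent in the mother, 5.6% (blood) and 4.7% (buccal) in the child
print(classify(Quartet(0.0, 0.0, 0.056, 0.047)))  # child
```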
Site 4191 in family M500 seemed to harbor a de novo mutation in the child. ddPCR confirmed complete absence of the mutant allele in both maternal tissues but presence in both tissues of the child (with MAF of 4.7% and 5.6% in buccal and blood tissues, respectively; Table 3 and Dataset S1, Table S8). Examination of hair from the same individuals indicated homoplasmy in the mother and MAF of 1.2% in the child (Dataset S1, Table S11), confirming emergence of a novel allele.
The changes in allele frequencies between tissues of an individual, or between two generations, tabulated for our 98 quartets (Dataset S1, Table S3) followed an approximately normal distribution with mean zero (Fig. S12), corroborating the action of genetic drift as the major force affecting heteroplasmy allele frequencies (28). A decrease in allele frequency for a variant from mother to child will be indicative of purifying selection (29). When we plotted the relative change in allele frequency between mothers and children (Fig. S13), such a decrease was significant for nonsynonymous sites (P = 9.54 × 10⁻⁷, one-tailed nonparametric sign test), suggestive of purifying selection. Consistent with selection operating against transmission of nonsynonymous mutations, we observed a significantly lower proportion of these mutations among transmitted heteroplasmies (5 out of 43, or 12%, in “all” and “somatic loss” categories) compared with untransmitted heteroplasmies (8 out of 22, or 36%, in “mother” category; P = 0.025, Fisher’s exact test).
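The comparison of transmitted and untransmitted heteroplasmies can be checked from the counts given above. This is a minimal sketch using only the standard library; the function name `fisher_exact_two_sided` is hypothetical, and the two-sided convention (summing all tables no more probable than the observed one) is an assumption, because the text does not state the sidedness of the test. With that convention the result is close to the reported P = 0.025.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(k):
        # Hypergeometric probability of k in the top-left cell with fixed margins
        return comb(row1, k) * comb(row2, col1 - k) / denom

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_obs = prob(a)
    # Sum the tables that are no more probable than the observed one
    return sum(
        prob(k) for k in range(lowest, highest + 1) if prob(k) <= p_obs * (1 + 1e-9)
    )


# Rows: transmitted (43), untransmitted (22); columns: nonsynonymous, other
p_value = fisher_exact_two_sided(5, 38, 8, 14)
print(f"{p_value:.3f}")  # 0.025
```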
Comparing MAFs Between Tissues and Generations.
Compared with mtDNA in maternal tissues, mtDNA in child tissues underwent fewer mitotic segregations and replications and was exposed to mutagens for a shorter time. Therefore, we expect heteroplasmy allele frequency at a site to diverge less in the tissues of a child than in those of a mother. Indeed, the allele frequencies for the sites tabulated as quartets (Dataset S1, Table S3) were more strongly correlated between the two tissues for children (R² = 92%, Fig. 1A) than between the two tissues for mothers (R² = 49%, Fig. 1B). Stronger correlation for allele frequencies was observed between two tissues of a mother or of a child (discussed above) than between a mother and a child for the same tissue (R² = 13% for buccal, Fig. 1C; R² = 29% for blood, Fig. 1D), likely owing to the stronger action of the mtDNA germ-line bottleneck relative to mitotic segregation.
Fig. 1. Correlation in heteroplasmy allele frequencies between (A) the two tissues of children, (B) the two maternal tissues, (C) buccal tissues of mothers and children, and (D) blood of mothers and children.
Maternal Age Effect.
We explored the relationship between age and the total number of point heteroplasmies for each individual. No association was found for children. For mothers we found a positive association that was significant for combined and buccal heteroplasmies and marginal for blood (P = 0.039, 0.049, and 0.055, respectively, Poisson regression; Fig. 2 and Fig. S14). This suggests that older mothers accumulate more mutations in their somatic tissues, with the number of point heteroplasmies tripling over 30 y of life. Intriguingly, a positive association exists between the number of heteroplasmies in children and maternal age at fertilization (P = 0.010, 0.005, and 0.006, for combined, buccal, and blood heteroplasmies, respectively; Fig. 2 and Fig. S14). This suggests that older mothers also accumulate more mutations in their germ-line tissues. In our dataset, there was a correlation between maternal age at fertilization and maternal age at sampling (R² = 43%, P = 0.002, linear regression), and we found that older mothers, who also had children later, likely transferred a larger number of accumulated mutations to their children (Fig. 2); whereas mothers who conceived under the age of 20 transmitted zero to one heteroplasmies, this number was two to three for mothers conceiving in their late 30s.
Fig. 2. Maternal age effect. The dependence of the total number of heteroplasmies (blue) in mothers on their age at collection and (red) in children on maternal age at fertilization. Poisson generalized linear model fitted curves are indicated.
Estimating the Size of the Germ-Line mtDNA Bottleneck and Mutation Rate.
Because most heteroplasmy allele frequency changes between the two generations are consistent with genetic drift (Fig. S12), we can estimate the effective size of the germ-line bottleneck, that is, the size of the bottleneck in a traditional population model required to explain the observed amount of genetic drift [the actual number of mtDNA molecules passing through the bottleneck might be different, because they might segregate in units (8)]. Following the method developed by Millar, Hendy, and coworkers (30, 31), we assume that a child samples mutant mtDNA alleles at a given site from a binomial distribution with parameters $p$, the maternal MAF in the germ line (estimated here from somatic tissues), and $N$, the germ-line bottleneck size (SI Materials and Methods). Then the variance of the child’s heteroplasmy frequency at conception, or genetic variance, is $\sigma^2_{\mathrm{gen}} = p(1 - p)/N$. Solving for $N$, we obtain $N = p(1 - p)/\sigma^2_{\mathrm{gen}}$. We estimate the genetic variance as $\sigma^2_{\mathrm{gen}} = \sigma^2_{\mathrm{raw}} - 4\sigma^2_{\mathrm{measure}}$, where $\sigma^2_{\mathrm{raw}}$ is the squared difference between the maternal and the child MAF at the site and $\sigma^2_{\mathrm{measure}}$ is the uncertainty in measuring heteroplasmy frequency (includes sampling, PCR, and sequencing errors), which we estimated from sequencing amplified D-loop–containing clones (SI Materials and Methods; $\sigma^2_{\mathrm{measure}}$ was multiplied by 4 because we are taking four measurements). This procedure produced an estimate of $N$ in a quartet. To minimize false positives, we applied this approach to quartets where heteroplasmy was present in both maternal tissues (51 quartets in which at least one tissue in the mother had heteroplasmy with MAF ≥1% and the other tissue had MAF ≥0.2%; Dataset S1, Table S3), and thus likely was present in the maternal germ line. The median estimated $N$ across these 51 quartets, when MAFs were averaged between the two maternal tissues and (separately) between the two child tissues, was 32.3 [interquartile range (IQR) 10.5–103.3; Fig. S15].
Similar estimates of bottleneck size were obtained when only blood (median N = 33.5, IQR 14.1–79.6) or only buccal tissues were used (median N = 29.8, IQR 9.5–68.1), and when quartets with nonsynonymous mutations were excluded (median N = 31.9, IQR 8.8–99.2). Accounting for the variance owing to mitotic segregation (SI Materials and Methods) led to median N = 35.0 (IQR 10.0–141.4; Fig. S15). Also, assuming that the single germ-line mutation we observe (at site 4191 in family M500, Table 3) originated in a single mtDNA segregating unit in the maternal germ line, and that its MAF in the child’s zygote was 3.8% (averaged across three tissues), we can estimate N as 1/0.038 = 26.3.
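The per-quartet estimate described above reduces to a few lines of arithmetic. The sketch below is a simplified illustration, not the authors' code: the function name `bottleneck_size`, the example MAF values, and the choice to return `None` when the estimated genetic variance is not positive are assumptions made here. The measurement variance must be supplied by the caller.

```python
def bottleneck_size(mother_mafs, child_mafs, var_measure):
    """Estimate N = p(1 - p) / var_gen for one quartet.

    mother_mafs, child_mafs: MAFs (fractions) in the two tissues of each individual
    var_measure: variance of a single MAF measurement
    Returns None when the estimate is undefined (an assumption of this sketch).
    """
    p = sum(mother_mafs) / len(mother_mafs)  # maternal MAF averaged across tissues
    c = sum(child_mafs) / len(child_mafs)    # child MAF averaged across tissues
    var_raw = (p - c) ** 2                   # squared mother-child difference
    var_gen = var_raw - 4 * var_measure      # four measurements per quartet
    if var_gen <= 0 or not 0 < p < 1:
        return None
    return p * (1 - p) / var_gen


# Hypothetical quartet: 20% in the mother, 30% in the child, no measurement error
example_n = bottleneck_size((0.20, 0.20), (0.30, 0.30), var_measure=0.0)

# De novo mutation at site 4191: one segregating unit at MAF 3.8% in the zygote
de_novo_n = 1 / 0.038
print(round(de_novo_n, 1))  # 26.3
```

The summary value for a data set would then be the median of the defined per-quartet estimates, for example with `statistics.median`.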
Next, we estimated the mtDNA germ-line mutation rate $\mu$ as in Millar, Hendy, and coworkers (30, 31). Assuming that new mutations enter the germ line at rate $\alpha$, and that they are neutral and have equal probability to be transmitted to the next generation, only $1/N$ of them will go to fixation, leading to $\mu = \alpha/N$. A heteroplasmy can only be observed when its MAF is above a detection threshold $\theta$. Analytically, it was shown (30, 31) that most heteroplasmies are lost without reaching $\theta$, that most heteroplasmies reaching $\theta$ do not go to fixation, and that the rate of observed heteroplasmies can be approximated as $\mu_0 = 2\alpha \ln(1/\theta - 1)$. Solving for $\alpha$, one obtains $\alpha = \mu_0/[2 \ln(1/\theta - 1)]$. Thus, $\mu = \mu_0/[2N \ln(1/\theta - 1)]$. Setting $\theta = 0.01$ (our detection threshold) results in $\mu = 0.109\,\mu_0/N$. Having observed 51 germ-line point heteroplasmies among 39 mothers, we estimated $\mu_0$ as 51/(39 × 16,569 bp) = 7.9 × 10⁻⁵ heteroplasmies per transmission per site. Using $N = 32.3$, we thus estimated the mutation rate $\mu$ as 2.7 × 10⁻⁷ mutations per site per generation (IQR 8.3 × 10⁻⁸ to 8.2 × 10⁻⁷), or, assuming a generation time of 20 y, 1.3 × 10⁻⁸ mutations per site per year (IQR 4.2 × 10⁻⁹ to 4.1 × 10⁻⁸). The mutation rate estimate excluding nonsynonymous sites was 4.4 × 10⁻⁷ mutations per site per generation (IQR 1.4 × 10⁻⁷ to 1.6 × 10⁻⁶), or 2.2 × 10⁻⁸ mutations per site per year (IQR 7.0 × 10⁻⁹ to 7.9 × 10⁻⁸). That for the D-loop was 1.5 × 10⁻⁶ mutations per site per generation (IQR 4.8 × 10⁻⁷ to 4.7 × 10⁻⁶), or 7.7 × 10⁻⁸ mutations per site per year (IQR 2.4 × 10⁻⁸ to 2.4 × 10⁻⁷).
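The arithmetic in this paragraph can be reproduced directly from the quoted inputs. The sketch below is only a numerical check of the stated formula; the function name `germline_mutation_rate` and its parameter names are hypothetical, and all input values are taken from the text.

```python
import math


def germline_mutation_rate(n_heteroplasmies, n_mothers, genome_bp, bottleneck, theta):
    """mu = mu0 / (2 N ln(1/theta - 1)), with mu0 the observed heteroplasmy rate."""
    mu0 = n_heteroplasmies / (n_mothers * genome_bp)  # per transmission per site
    return mu0 / (2 * bottleneck * math.log(1 / theta - 1))


# Conversion factor at theta = 0.01: 1 / (2 ln 99)
factor = 1 / (2 * math.log(1 / 0.01 - 1))

mu_generation = germline_mutation_rate(51, 39, 16_569, bottleneck=32.3, theta=0.01)
mu_year = mu_generation / 20  # assuming a 20-y generation time, as in the text

print(f"{factor:.3f}")         # 0.109
print(f"{mu_generation:.1e}")  # 2.7e-07
print(f"{mu_year:.1e}")        # 1.3e-08
```

Substituting the quartile values of the bottleneck size (10.5 and 103.3) for `bottleneck` gives the corresponding interquartile range of the rate.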
Indel Heteroplasmies.
Using the same thresholds (MAF ≥1% and depth ≥1,000×), we identified 120 instances of small indels affecting 10 unique mtDNA sites and 28 families (Dataset S1, Table S12 and SI Materials and Methods). All indels occurred in repeats: eight in homopolymer runs, one in a 9-bp tandem repeat, and one in a CA repeat. The latter indel was validated with ddPCR (Dataset S1, Table S13). The MAFs of indels in our samples were above microsatellite sequencing errors for our long-range PCR protocol (SI Materials and Methods). Further experimental validation will allow us to determine indel MAFs more accurately.
Discussion
Prevalence of Heteroplasmy in Humans.
mtDNA heteroplasmy has strong associations with neurodegenerative diseases, aging, and tumorigenesis (1). Our results indicate that an individual carries on average one heteroplasmic variant with allele frequency ≥1%, pointing to the ubiquitous occurrence of heteroplasmy (4), and are in remarkable agreement with a recent analysis of the 1000 Genomes Project data (19) as well as several smaller-scale studies (5, 11, 32–34) (Dataset S1, Table S14).
Maternal Age Effect.
A positive association between an individual’s age and the number of heteroplasmies in postmitotic somatic tissues had already been demonstrated (e.g., refs. 35 and 36). Here, we found evidence for it in the dividing tissues as well. Kennedy et al. (35) found that the frequency of point mutations in brain increases fivefold during 80 y of life. In our data, the number of heteroplasmies in the maternal buccal and blood tissues triples over 30 y. Likewise, with high transition-to-transversion ratio in our data, we do not find transversion-causing oxidative damage (37) to be the major driver of such mutation accumulation.
The positive association we found between maternal age at conception and the number of heteroplasmies in her child has important medical implications. The frequencies of large mitochondrial deletions (38) and the T414G mutation (39) were shown to increase in oocytes as a function of age—consistent with altered mitochondrial cytochemistry and a mutagenic environment with increased glycation and carbonyl stress in aging oocytes (40). However, the number of oocytes with defective mitochondria is significantly reduced during oogenesis (41). Our results suggest that, despite this process, some oocytes with suboptimal mitochondria (e.g., with negatively selected amino acid changes), which are more likely to occur in older women, do proceed to fertilization. This predicts an increase in mtDNA diseases in children born to older mothers—a prediction not evaluated to date—and could be one of the reasons for a lower success rate of assisted reproduction in older women (42).
Germ-Line Bottleneck Size.
Our results support a severe germ-line bottleneck—with effective size of only 30–35—and are particularly striking given ∼100,000 mtDNAs in mature human oocytes (12). Our findings corroborate strong shifts in heteroplasmy frequency observed in Holstein cows (43, 44) but are more robust because they are based on more accurate estimation of MAF changes at many sites, in two tissues, and for a large number of transmissions from multiple families. Moreover, results obtained for other species are not directly applicable to humans, especially for the purposes of genetic counseling. Most previous human studies analyzed one or two sites, some of which were disease-associated, and usually a small number of transmissions (Dataset S1, Table S15). Our estimate is comparable to those in some earlier studies [e.g., 36–180 (45)], higher than in some other studies [e.g., 1–5 (10)], but substantially lower than the recently proposed estimate of 200 (11).
The number 30–35 is obtained by taking medians over effective bottleneck sizes estimated for individual transmission sites, which show a broad variation (Fig. S15). Some of this variation is random, because only one transmission was examined for each site. Selection acting at some sites might have contributed to this variation as well; however, the median bottleneck size remained very similar when nonsynonymous sites were removed. Another contributor to the observed variation in bottleneck size might be the variation among women (46, 47). Future studies examining multiple offspring per mother will allow one to evaluate the differential contribution of these factors to the variability in the bottleneck size in more detail.
Germ-Line Mutation Rate.
The germ-line mutation rate estimated here for mtDNA is an order of magnitude higher than that for the human nuclear genome [1.2 × 10⁻⁸ mutations per site per generation (48)], in agreement with previous studies (1). It is similar to estimates obtained in phylogenetic studies (e.g., refs. 13 and 14) and an order of magnitude lower than estimates in most pedigree studies (13, 49–51) (Dataset S1, Table S16). In part this is due to the fact that analyzing two tissues allowed us to identify germ-line (and discard somatic) heteroplasmies (51). However, we also had to perform strict filtering of candidate heteroplasmic sites to minimize sequencing artifacts when estimating the bottleneck size, which may have led to the removal of some real heteroplasmies. Our mtDNA mutation rate estimate should therefore be seen as a “lower bound” (our bottleneck size estimate is not affected by this potential limitation). In agreement with this, our estimate (2.7 × 10⁻⁷ mutations per site per generation) is only about three- and fourfold higher than estimates from mutation accumulation cell lines for Caenorhabditis elegans and Drosophila melanogaster mtDNA (9.7 × 10⁻⁸ and 6.2 × 10⁻⁸ mutations per site per generation, respectively) (52, 53); human mutation rates were shown to be approximately fivefold the rates in these species (54, 55).
Disease-Causing Mutations.
The high prevalence of mtDNA disease-associated mutations found here, with one carrier in eight individuals, is similar to that reported from the 1000 Genomes Project data (19) and has important practical implications. Indeed, as we demonstrated, the severe germ-line bottleneck can lead to drastic changes in allele frequencies between generations, potentially affecting the manifestation of the more than 200 diseases caused by mtDNA mutations. In one instance we found a disease-associated mutation present at high allele frequencies in child tissues. Genetic background (both mtDNA and nuclear), known to significantly modulate mtDNA disease manifestation (56), may be preventing symptoms in this individual.
Materials and Methods
Sample Collection, DNA Isolation, and Sequencing.
DNA from buccal and blood cells (collected under IRB 30432EP) was isolated as described (32). To determine mtDNA haplogroup, mtDNA was amplified and sequenced using the Sanger method (Dataset S1, Table S17). Before MiSeq sequencing, mtDNA was amplified in two ∼9-kb amplicons (16) that were mixed at an equimolar ratio and spiked with 5% (wt/wt) of pUC18 or PhiX174 DNA, or with no spike-in. Sequencing libraries were prepared according to the customized Nextera XT protocol (57).
Experimental Validation of Heteroplasmic Sites.
The primers used for heteroplasmy validation with Sanger sequencing are listed in Dataset S1, Table S17. For ddPCR, we followed the manufacturer’s protocol. TaqMan probes are listed in Dataset S1, Table S18. All experiments were performed in duplicate. To assess the detection limit for ddPCR and Sanger sequencing, we examined artificially mixed variant alleles at predetermined frequencies (SI Materials and Methods).
Preprocessing of Next-Generation Sequencing Data.
Parameters and versions of all tools are listed in Dataset S1, Table S19. The sequencing read pairs were mapped to chrM and hg19 (Fig. S4). For the pair to be retained, we required both reads to (i) map to chrM, (ii) map properly in a pair, (iii) have read length ≥100 bp, and (iv) not form a chimeric alignment.
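The four retention criteria for read pairs can be expressed as a single predicate. The sketch below assumes each read has already been summarized into a small record; the `Read` class, its field names, and the function names are hypothetical and do not correspond to the tools listed in Table S19.

```python
from dataclasses import dataclass

MIN_READ_LENGTH = 100   # criterion (iii): read length >= 100 bp
TARGET_REFERENCE = "chrM"


@dataclass(frozen=True)
class Read:
    """Hypothetical per-read summary of an alignment record."""
    reference: str        # name of the sequence the read mapped to
    length: int           # read length in bp
    is_proper_pair: bool  # aligner reports the pair as properly paired
    is_chimeric: bool     # read forms a chimeric (split) alignment


def read_passes(read: Read) -> bool:
    """Apply criteria (i)-(iv) to a single read."""
    return (
        read.reference == TARGET_REFERENCE      # (i) maps to chrM
        and read.is_proper_pair                 # (ii) maps properly in a pair
        and read.length >= MIN_READ_LENGTH      # (iii) long enough
        and not read.is_chimeric                # (iv) not chimeric
    )


def keep_pair(first: Read, second: Read) -> bool:
    """A pair is retained only if BOTH reads satisfy every criterion."""
    return read_passes(first) and read_passes(second)
```

Requiring both mates to pass means that a pair with one read on chrM and the other on a nuclear chromosome is dropped, which is one reason to map against chrM and hg19 together.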
Identification of Point Heteroplasmic Sites.
The tools Naive Variant Caller and Variant Annotator implemented in Galaxy (16) were used to extract the counts of each nucleotide per position in each strand. We selected sites with MAF ≥1% and depth ≥1,000×. We then discarded sites with MAF <1% on either strand or with strand bias >1 (58); sites in low-complexity regions as annotated in ref. 32; sites at positions 3106–3107; and sites at which >85% of the reads supporting an alternative base carried it within the first or last 25 bp of the read.
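The selection and exclusion rules above can be combined into one filter, sketched below. Several points are assumptions made for illustration: the `Site` record and its field names are hypothetical, depth is approximated as the sum of major- and minor-allele counts on both strands, and the strand-bias score and low-complexity annotation are taken as precomputed inputs because their definitions are given in refs. 58 and 32 rather than here.

```python
from dataclasses import dataclass

MIN_MAF = 0.01                 # MAF >= 1%, overall and on each strand
MIN_DEPTH = 1000               # depth >= 1,000x
MAX_STRAND_BIAS = 1.0          # discard if strand bias > 1
MAX_READ_END_FRACTION = 0.85   # discard if > 85% of support lies in read ends
EXCLUDED_POSITIONS = frozenset({3106, 3107})


@dataclass(frozen=True)
class Site:
    """Hypothetical per-site summary of strand-specific allele counts."""
    position: int
    fwd_major: int
    fwd_minor: int
    rev_major: int
    rev_minor: int
    strand_bias: float        # precomputed score (definition in ref. 58)
    read_end_fraction: float  # share of alt-supporting reads with the base
                              # in the first or last 25 bp
    low_complexity: bool      # precomputed annotation (ref. 32)


def _maf(minor: int, major: int) -> float:
    total = minor + major
    return minor / total if total else 0.0


def is_candidate_heteroplasmy(site: Site) -> bool:
    """Return True if the site survives every filter described in the text."""
    depth = site.fwd_major + site.fwd_minor + site.rev_major + site.rev_minor
    overall = _maf(site.fwd_minor + site.rev_minor, site.fwd_major + site.rev_major)
    if depth < MIN_DEPTH or overall < MIN_MAF:
        return False
    if _maf(site.fwd_minor, site.fwd_major) < MIN_MAF:
        return False
    if _maf(site.rev_minor, site.rev_major) < MIN_MAF:
        return False
    if site.strand_bias > MAX_STRAND_BIAS:
        return False
    if site.low_complexity or site.position in EXCLUDED_POSITIONS:
        return False
    if site.read_end_fraction > MAX_READ_END_FRACTION:
        return False
    return True
```

Checking MAF on each strand separately, in addition to the pooled MAF, removes sites where the minor allele is supported almost entirely by reads from one strand.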
Acknowledgments
We are grateful to Jessica Beiler, MPH, for coordinating sample collection, to clinical nurses from Penn State College of Medicine Pediatric Clinical Research Office, to Lily Borhan for collecting the samples, and to volunteers for donating the samples. Bonnie Higgins isolated DNA from hair for family M500. Michael DeGiorgio made useful comments on the earlier drafts of the manuscript and Prabhani Kuruppummulage Don provided statistical advice. This work was funded by Battelle Memorial Institute, the Huck Institutes of Life Sciences and Eberly College of Sciences at Pennsylvania State University, and Penn State Clinical and Translational Science Institute. Additional funding was provided, in part, under a grant from the Pennsylvania Department of Health using Tobacco Settlement Funds. The department specifically disclaims responsibility for any analyses, interpretations, or conclusions.
Footnotes
¹B.R.-J. and M.S.-W.S. contributed equally to this work.
²To whom correspondence may be addressed. Email: anton@bx.psu.edu or kdm16@psu.edu.
Author contributions: B.R.-J., M.S.-W.S., F.C., R.N., M.M.H., I.M.P., A.N., and K.D.M. designed research; B.R.-J., M.S.-W.S., N.S., J.A.M., and B.D. performed research; D.B., T.S.K., and R.N. contributed new reagents/analytic tools; I.M.P. organized sample collection; B.R.-J., M.S.-W.S., N.S., and K.D.M. analyzed data; and B.R.-J., M.S.-W.S., N.S., F.C., M.M.H., A.N., and K.D.M. wrote the paper.
The authors declare no conflict of interest.
This article is a PNAS Direct Submission.
Data deposition: The sequences reported in this paper have been deposited in the Sequence Read Archive, www.ncbi.nlm.nih.gov/sra (accession no. SRP047378).
This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1409328111/-/DCSupplemental.
Freely available online through the PNAS open access option.
References
- (1) Wallace DC, Chalkia D
- (3) Pesole G, et al.
- (4) Payne BA, et al.
- (14) Henn BM, Gignoux CR, Feldman MW, Mountain JL
- (19) Ye K, Lu J, Ma F, Keinan A, Gu Z
- (24) del Castillo FJ, et al.
- (31) Hendy MD, Woodhams MD, Dodd A
- (34) Avital G, et al.
- (41) Barritt JA, Brenner CA, Cohen J, Matt DW
- (44) Ashley MV, Laipis PJ, Hauswirth WW
- (50) Santos C, et al.
- (52) Denver DR, Morris K, Lynch M, Vassilieva LL, Thomas WK
- (55) Sung W, Ackerman MS, Miller SF, Doak TG, Lynch M
- (56) Kenney MC, et al.
Article Classifications
- Biological Sciences
- Evolution