Evolution of the rapidly mutating human salivary agglutinin gene (DMBT1) and population subsistence strategy
Edited by Huntington F. Willard, The Marine Biological Laboratory, Woods Hole, MA, and approved March 4, 2015 (received for review August 27, 2014)

Significance
Humans have undergone an evolutionarily very recent change in environment of their own making. The development of agriculture profoundly altered diet and exposure to pathogens, and yet the evolutionary response to this is still poorly understood. Here, we characterize extensive copy number variation (CNV) of the gene encoding salivary agglutinin (deleted in malignant brain tumors 1, DMBT1). Salivary agglutinin comprises 10% of salivary protein and binds bacteria, including mediating the attachment of the causative agent of dental caries, Streptococcus mutans, to teeth. We show that DMBT1 is a very fast-mutating protein-coding locus, and DMBT1 CNV correlates with a population history of agriculture. Furthermore, we examine the relationship between variation of the S. mutans region that binds salivary agglutinin and CNV of the DMBT1 gene.
Abstract
The dietary change resulting from the domestication of plant and animal species and development of agriculture at different locations across the world was one of the most significant changes in human evolution. An increase in dietary carbohydrates caused an increase in dental caries following the development of agriculture, mediated by the cariogenic oral bacterium Streptococcus mutans. Salivary agglutinin [SAG, encoded by the deleted in malignant brain tumors 1 (DMBT1) gene] is an innate immune receptor glycoprotein that binds a variety of bacteria and viruses, and mediates attachment of S. mutans to hydroxyapatite on the surface of the tooth. In this study we show that multiallelic copy number variation (CNV) within DMBT1 is extensive across all populations and is predicted to result in between 7 and 20 scavenger receptor cysteine-rich (SRCR) domains within each SAG molecule. Direct observation of de novo mutation in multigeneration families suggests these CNVs have a very high mutation rate for a protein-coding locus, with a mutation rate of up to 5% per gamete. Given that the SRCR domains bind S. mutans and hydroxyapatite in the tooth, we investigated the association of sequence diversity at the SAG-binding gene of S. mutans, and DMBT1 CNV. Furthermore, we show that DMBT1 CNV is also associated with a history of agriculture across global populations, suggesting that dietary change as a result of agriculture has shaped the pattern of CNV at DMBT1, and that the DMBT1-S. mutans interaction is a promising model of host-pathogen-culture coevolution in humans.
The effect of the agricultural transition on human genome variation has been extensive (1). In addition to the indirect effect of an exponential increase in population size, direct effects on particular genes have occurred, most notably the evolution, at multiple locations through multiple alleles, of lactase persistence at the LCT gene, enabling adults to drink milk generated from domesticated mammals (2). The agricultural transition is also thought to have had an impact on the oral commensal microbiota, in particular Streptococcus mutans, the causative agent of dental caries, which is the most common chronic infectious disease in humans. Analyses of ancient skeletal remains (3) and modern genomic diversity (4) have suggested that S. mutans became a major oral pathogen only after the development of agriculture and the concomitant increase in availability of sugars consumed directly or derived from starchy foods. The increased level of caries in individuals from agricultural societies is observed in both modern and prehistoric populations (5–7). This increase in caries was likely to have profound consequences for the health of the individuals concerned before the development of modern dental treatment (8). Caries left untreated leads to tooth loss, potential severe infections, and a decrease in masticatory efficiency potentially leading to a reduction in access of enzymes to the food bolus (9). It is unclear whether human genetic variation has responded to this change in dental health via natural selection.
We analyzed the variation of the deleted in malignant brain tumors 1 (DMBT1) gene encoding a major salivary glycoprotein, salivary agglutinin, also known as gp-340, hensin or muclin, and hereafter referred to as DMBT1SAG (10). This protein comprises ∼10% of total salivary protein in children and 5% in adults (11), and is also present at other mucosal surfaces (12). DMBT1SAG is a component of innate immunity, acting as a pattern recognition receptor interacting with bacteria such as S. mutans and Helicobacter pylori and viruses such as HIV-1 and influenza (12). Variation between host saliva affects the adhesion of S. mutans (13), and protein variants of DMBT1SAG have been suggested to affect caries susceptibility in children (14).
Copy number variation (CNV) describes a difference in DNA dosage between different individuals, and includes simple deletions and duplications as well as more complex multicopy and multiallelic variation (15). CNV can affect gene expression by altering the total number of copies of individual genes and therefore gene dosage, by changing tissue-specific enhancers, or by varying the number of exons within a gene, potentially altering the number of protein-coding subunits, for example (16). CNV can also show a germ-line mutation rate at least an order of magnitude higher than single nucleotide substitutions, because of the distinct mutational processes that underlie copy number change (16). Genome wide, CNVs are enriched for genes that encode proteins that interact with the environment, particularly those in host defense (17), and a high mutation rate of these loci may contribute to immunological individuality of the host. Whether selection or a relaxation of functional constraint is responsible for this bias in genome-wide distribution remains unresolved, although there are strong arguments for the role of gene duplication in evolution (18). There is convincing evidence that CNV in humans can affect the host’s susceptibility to infectious diseases, including the well established effect of α-globin deletion on malaria susceptibility (19). Furthermore, it has been suggested that the frequency of high copy number alleles of the salivary amylase gene AMY1 has increased by natural selection in populations that eat a carbohydrate-rich diet (20).
DMBT1SAG mostly consists of an array of scavenger receptor cysteine-rich (SRCR) domains which bind bacteria, including S. mutans (21), and promote their adherence to hydroxyapatite of the tooth (22, 23), which is critical for the cariogenic activity of the bacteria. The canonical DMBT1 gene annotated in the hg19 human genome assembly has 13 repeats each containing a SRCR domain (Fig. 1A). The repeats containing the SRCR domain, hereafter known as SRCR repeats, within the DMBT1 gene are distinct at the DNA level but share ∼80% identity at the protein level. Within the SRCR domain, smaller regions that bind to S. mutans and hydroxyapatite have been identified (Fig. 1D), although bacterial binding is inhibited by sialidases, showing glycosyl groups are also important in bacterial binding (24). Two polymorphic deletions within the DMBT1 gene, involving variable numbers of SRCR repeats, have been partially described previously, but the nature and extent of CNV within this gene remained incompletely characterized. For example, a polymorphic deletion involving SRCR3–SRCR6 was associated with Crohn’s disease. Furthermore, a polymorphic deletion involving at least repeats SRCR9–SRCR11 has been described (25, 26). Genome-wide arrayCGH (aCGH) analysis has identified two CNVs consistent with the known polymorphic deletions, but showing extensive complex loss and gain of copy number (17) (Fig. 1B).
Fig. 1. Analysis of DMBT1 structure and CNV. (A) Dotplot of the DMBT1 gene (exons and introns) aligned against itself. Lines indicate high similarity and emphasize the repeated nature of the structure of the gene. Individual SRCR domains are indicated and numbered. Note that the canonical DMBT1 gene sequence has one fewer SRCR domain than that predicted by the genome assembly, and the extra SRCR domain is labeled 9’. (B) Exon–intron structure of the DMBT1 gene with CNV signals. Three DMBT1 gene annotations derived from different transcripts are shown. CNV signals from the Database of Genome Variants (dgv.tcag.ca/dgv/app/home) are shown below, with red indicating loss of signal compared with a reference genome, blue gain of signal, and brown both loss and gain of signal observed in different samples. Note that these annotations are often larger than the actual CNV, because they can represent large insert clones that detect a CNV, but with the CNV boundaries unknown. CNV1 and CNV2 are annotated, with the reference genomic sequence showing one copy of the CNV1 region and four copies of the CNV2 region. Figure is based on UCSC Genome Browser screenshot hg19 assembly. (C) Comparison of copy number calling methods for CNV1 (Left) and CNV2 (Right). Each point on the scatterplot represents an individual sample, with different symbols reflecting the final copy number call. The x axes indicate the copy number value estimated from paralog ratio tests (PRTs), and the y axes indicate the first principal component of probe intensity data for probes spanning the CNV in array comparative genomic hybridization (data from ref. 17). (D) Sequence relationship of SRCR repeats. A maximum-likelihood tree shows the relationship of the SRCR repeats (between 3 and 4 kb) carrying the SRCR-coding-domain exon. Scale bar indicates 0.1 substitutions per site. Amino acid sequences of the SRCR domains corresponding to the S. mutans and hydroxyapatite-binding regions are arranged alongside the tree, ordered according to the order of SRCR domains on the tree. Note that the divergent SRCR14 domain does not bind bacteria (54), is not coded by a repeated DNA region (A), and is located at the C-terminal end of the DMBT1SAG protein.
We aimed to fully characterize the CNV involving DMBT1, investigate its mutation rate, question whether it has adapted to different environments across human populations, and dissect the variation in the context of its interaction with S. mutans.
Results
Characterization of Copy Number Variation at DMBT1.
Genome-wide analysis using high-resolution tiling arrayCGH (aCGH) identified two CNVs whose location was consistent with the known polymorphic deletions described in the literature, but showing a much more extensive and complex loss and gain of copy number (17) (Fig. 1B). We used these two CNV regions as a starting point for our analysis, by interrogating the two CNVs (CNV1 involving SRCR3–SRCR6 and CNV2 involving SRCR9–SRCR11, Fig. 1B) separately. To this end, we designed several paralog ratio tests (PRTs), a form of quantitative PCR that is particularly robust in accurately calling CNVs (27). These PRTs were used to estimate copy number at each CNV in 270 individuals from HapMap phase 1. To verify our PRTs, we used concordance between PRT assays, clustering of PRT copy number estimates into distinct groups reflecting integer copy number and comparison with aCGH probe intensities (Fig. 1C and Fig. S1). By typing samples in duplicate, we estimated an upper 95% confidence limit for the error rate in determining CNV1 and CNV2 copy number to be 0.37% (366 samples, no discrepancies) and 0.33% (407 samples, no discrepancies), respectively. In addition, we further validated a subset of samples using long-PCR and fiber-FISH (Figs. S2 and S3). We show that CNV1 is a multiallelic CNV with a copy number varying between 0 and 5 per diploid genome, and the copy number variable unit includes four SRCR domains (Fig. S1). For CNV1, zero, one, and two or more copies reflect homozygous deletion, heterozygous deletion, and normal genotype of the deletion described previously (26). Sanger sequencing of homozygous deleted samples suggests that nonallelic homologous recombination (NAHR) between the 98% identical SRCR repeats carrying SRCR2 and SRCR6 is responsible for CNV1 (Fig. 1D and SI Methods). 
It is also clear that CNV2 is considerably more complex than the small deletion described previously, being a multiallelic CNV ranging between 1 and 11 copies per diploid genome with each repeat unit carrying a single SRCR domain.
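The clustering of unrounded PRT estimates around integer copy numbers, described above, can be illustrated with a minimal sketch. The function name, the tolerance of 0.25, and the example values are assumptions made for illustration only; the calling procedure actually used is described in SI Methods.

```python
from statistics import mean

def call_integer_copy_number(prt_estimates, tolerance=0.25):
    """Average independent PRT estimates and round to the nearest integer.

    Returns (call, mean_estimate). The call is None when the mean lies
    further than `tolerance` from the nearest integer, flagging an
    ambiguous sample that would need retyping.
    """
    m = mean(prt_estimates)
    nearest = round(m)
    if nearest < 0 or abs(m - nearest) > tolerance:
        return None, m
    return nearest, m

# Hypothetical unrounded estimates from three PRT assays on one sample
call, mean_estimate = call_integer_copy_number([3.9, 4.1, 4.05])

# Hypothetical sample falling midway between two integer classes
ambiguous_call, _ = call_integer_copy_number([2.4, 2.6, 2.5])
```

Requiring concordance between assays before accepting a call is one simple way to keep the genotyping error rate low enough that de novo mutations can be distinguished from typing errors.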
Analysis of further samples from the CEPH-Human Genome Diversity Project (HGDP) panel of 971 individuals from 52 populations worldwide (28) showed rare individuals with a CNV2 copy number of zero. Sanger sequencing of PCR products from these individuals showed that all of the zero-copy CNV2 alleles had a breakpoint within 33 bp of sequence identical between SRCR8 and SRCR11 (Fig. S4), just upstream of the exon encoding the SRCR domain, suggesting that this allele was generated by NAHR between these repeats (SI Methods). This finding suggests that other larger CNV2 alleles have also been generated by NAHR between any of the repeats carrying SRCR domains 8–11.
DMBT1 Copy Number Variation Has a High Mutation Rate.
The extensive allelic diversity and repetitive genomic structure of DMBT1, together with the knowledge that NAHR is likely to have mediated generation of new alleles, led us to consider whether CNV1 and CNV2 had a high mutation rate. To study this directly, we used our validated PRT assays to call copy number of DMBT1 CNV1 and CNV2 on 522 samples from 40 large multigenerational families from the Centre d’Etude du Polymorphisme Humain (CEPH) collection. We robustly identified de novo copy number mutations at both loci (Fig. S5 and Datasets S1 and S2). The mutation rate at CNV1 is estimated to be 1.4% per gamete (9 out of 632 meioses, 95% CI 0.7–2.7%) and the mutation rate at CNV2 is 3.3% per gamete (21 out of 632 meioses, 95% CI 2.1–5.0%). These mutation rate estimates place both loci among the most highly mutating loci known, with comparable rates seen only for noncoding minisatellites (29) or for the mitochondrial D-loop (30). Error rates for CNV1 and CNV2 of 0.37% and 0.33% respectively are below the lower 95% CI bound of both mutation rates, showing that these high mutation rates are not due to errors in copy number calling. Furthermore, examination of the copy number calls of the individuals showing de novo mutations indicates a high posterior probability of that copy number call (Fig. S5). All mutations were of a loss or gain of one CNV repeat unit, with no evidence of a bias toward loss or gain. Analysis of our data showed that two CNV1 mutational events and one CNV2 mutational event were associated with a crossover involving flanking marker exchange at the correct position in that individual. This observation suggests that although NAHR events involving homologous chromosomes do contribute to CNV mutation rate, most events are likely to be inter- or intrachromatidal.
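As a worked illustration of the per-gamete mutation rates and their confidence intervals, the sketch below computes an exact (Clopper-Pearson) binomial interval from the counts given above (9 and 21 mutations in 632 meioses). The choice of the Clopper-Pearson method is an assumption; the text does not state which interval method was used, and the function names are illustrative.

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    if k < 0:
        return 0.0
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

def _solve(f, target, increasing, iters=60):
    """Bisection on [0, 1] for a monotonic function f(p) = target."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if (f(mid) < target) == increasing:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def exact_binomial_ci(k, n, alpha=0.05):
    """Two-sided Clopper-Pearson interval for k events in n trials."""
    lower = 0.0 if k == 0 else _solve(
        lambda p: 1 - binom_cdf(k - 1, n, p), alpha / 2, increasing=True)
    upper = 1.0 if k == n else _solve(
        lambda p: binom_cdf(k, n, p), alpha / 2, increasing=False)
    return lower, upper

meioses = 632                      # from the text
rate_cnv1 = 9 / meioses            # CNV1: 9 de novo events
rate_cnv2 = 21 / meioses           # CNV2: 21 de novo events
ci_cnv1 = exact_binomial_ci(9, meioses)
ci_cnv2 = exact_binomial_ci(21, meioses)
```

Under this assumption the intervals come out close to the reported 0.7–2.7% and 2.1–5.0%, and both lower bounds sit above the genotyping error rates quoted in the text.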
Global Diversity of DMBT1 and Agriculture.
To examine global diversity of DMBT1, we determined diploid copy number of CNV1 and CNV2 on the CEPH-HGDP panel (Dataset S2). We observed a similar range for CNV1 (0–5 copies per diploid genome) and for CNV2 (0–11 copies per diploid genome) as in the HapMap samples (Fig. S5). Although there was no linkage disequilibrium between copy number alleles at CNV1 and CNV2 (r² = 0 for CEU parents, r² = 0.01 for YRI parents, r² = 0.01 for all HGDP, Fig. 2A), there was a clear negative relationship between average copy number at CNV1 and CNV2 at the population level (r² = 0.11, Fig. 2B) and at the continental level (r² = 0.43). Because, across populations, the increase in CNV2 is mirrored in part by a decrease in CNV1, the total predicted number of the SRCR domains for the two copies of DMBT1 on homologous chromosomes does not mirror this trend. Nevertheless the total number of SRCR domains per diploid DMBT1 is highly variable, and the number of SRCR domains in a given DMBT1SAG molecule is predicted to range between 7 and 20, at least (Fig. S5G).
Fig. 2. Distribution of DMBT1 copy number values in the CEPH-HGDP panel. (A) Across individuals. Each point represents the mean unrounded PRT copy number of an individual, with the histogram on each axis representing the distribution of CNV1 copy numbers (x axis) and CNV2 copy numbers (y axis). (B) Across populations. The means of CNV1 and CNV2 in each population are plotted, colored according to continent of origin. The red dashed lines represent the value above which 99.5% of mean copy numbers of simulated populations fall. (C) Average CNV1 and CNV2 copy number in agricultural and nonagricultural populations. Populations are colored according to region (legend in B); the thick line indicates the median value, thin lines indicate the 25th and 75th centiles, and P values are from logistic regression, with distance from Africa as a covariate (Table 1). Agricultural population definition is according to ref. 33.
Given the fact that the SRCR domain is known to bind S. mutans, we considered that the development of agriculture and the consequent increase in carbohydrate-rich foods, oral S. mutans and dental caries might be a selective pressure influencing the frequency distributions of CNV1 and CNV2. To test this, we correlated CNV1 and CNV2 mean copy number for each HGDP population with an index of extent of agricultural practice for each population, as published (31). We used two statistical approaches. First, we used a regression analysis corrected for population effects by using distance from East Africa as a covariate. Second, we used a partial Mantel analysis corrected using a population pairwise geographical distance matrix, a population pairwise genetic distance matrix, or distance from East Africa as covariates (32). We found a negative relationship between agricultural populations and the mean CNV1 copy number and a positive relationship between agricultural populations and CNV2 (Table 1). By resampling from an empirical distribution of partial Mantel r correlations with agriculture, we estimated the genome-wide significance of this observation to be P = 0.0467, using the geographic distance matrix as a covariate, and P = 0.0410 using the genetic distance matrix as a covariate. We also tested the association between mean copy number of a population and a subsistence strategy based on carbohydrate-rich foods, such as cereals, roots and tubers, as defined previously (Fig. 2C and ref. 33). We found an association for CNV1 and a weak association for CNV2, both in the expected directions (Table 1). This finding suggests that the subsistence history of a population has affected the frequency distribution of both CNVs within DMBT1.
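The partial Mantel analysis mentioned above correlates two pairwise distance matrices while controlling for a third. The following is a minimal sketch of that idea using a first-order partial correlation and a row/column permutation test; it is not the published implementation (ref. 32), and all population values in the example are invented toy data.

```python
import random
from math import sqrt

def upper_triangle(m):
    """Flatten the upper triangle of a square matrix into a list."""
    n = len(m)
    return [m[i][j] for i in range(n) for j in range(i + 1, n)]

def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def partial_r(x, y, z):
    """Correlation of x and y after removing the linear effect of z."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / sqrt((1 - rxz**2) * (1 - ryz**2))

def permute(m, order):
    """Reorder rows and columns of a matrix together."""
    return [[m[i][j] for j in order] for i in order]

def partial_mantel(a, b, c, n_perm=999, seed=1):
    """Return (partial r, two-sided permutation P) for a vs b given c."""
    rng = random.Random(seed)
    vb, vc = upper_triangle(b), upper_triangle(c)
    observed = partial_r(upper_triangle(a), vb, vc)
    hits = 1  # the observed arrangement counts as one permutation
    order = list(range(len(a)))
    for _ in range(n_perm):
        rng.shuffle(order)
        r_perm = partial_r(upper_triangle(permute(a, order)), vb, vc)
        if abs(r_perm) >= abs(observed):
            hits += 1
    return observed, hits / (n_perm + 1)

def distance_matrix(values):
    return [[abs(a - b) for b in values] for a in values]

# Toy data for six hypothetical populations (not from the study)
cn_dist = distance_matrix([2.1, 1.8, 2.6, 1.5, 2.9, 1.7])    # mean copy number
agri_dist = distance_matrix([0.8, 0.9, 0.2, 1.0, 0.1, 0.7])  # agriculture index
geo_dist = distance_matrix([5.0, 9.0, 1.0, 12.0, 3.0, 7.0])  # distance covariate
r, p = partial_mantel(cn_dist, agri_dist, geo_dist)
```

Permuting rows and columns of one matrix together preserves the dependence among pairwise distances that share a population, which is why a Mantel-type test is used here rather than an ordinary correlation test.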
Table 1. DMBT1 CNV and population subsistence strategy
To investigate this association further, we called DMBT1 CNV1 and CNV2 copy number genotype using sequence read depth analysis on three published ancient DNA samples (refs. 34–36, Table S1, and Fig. S5 E and F). Both Denisova and Neanderthal hominins show a high CNV1 copy number of 3 and a low CNV2 copy number of 3, within the range of hunter–gatherer human populations. Analysis of an 8,000-y-old hunter–gatherer from Loschbour in Luxembourg provides a more recent directly ancestral calibration point. He had a CNV1 genotype of 1, which is common in modern Europeans, and a CNV2 genotype of 4, which is less common but still present in modern Europeans. This observation tentatively suggests that a reduction in CNV1 copy number had occurred or was occurring, but an increase in CNV2 was yet to occur; however, further samples are required before any firm conclusions can be drawn, as the observed genotype is consistent with a copy number distribution that is unchanged from modern Europeans.
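The principle behind read depth genotyping can be sketched as follows: depth over the copy-variable region is compared with depth over a nearby region assumed to be present at two copies per diploid genome. This is a generic illustration, not the pipeline used for the ancient samples; the function name and the depth values are hypothetical, and the only figure taken from the text is that the reference assembly carries four copies of the CNV2 repeat.

```python
def diploid_copy_number(cnv_depth, control_depth, reference_copies=1,
                        control_copies=2):
    """Estimate diploid copy number from mean sequencing depths.

    cnv_depth        mean depth across all reference copies of the repeat
    control_depth    mean depth across a control region of known copy number
    reference_copies number of repeat copies in the reference assembly
    control_copies   diploid copy number of the control region
    """
    if control_depth <= 0:
        raise ValueError("control depth must be positive")
    return control_copies * reference_copies * cnv_depth / control_depth

# Hypothetical depths: reads from the sample's CNV2 repeats are spread over
# the four CNV2 copies present in the reference assembly.
estimate = diploid_copy_number(cnv_depth=11.25, control_depth=30.0,
                               reference_copies=4)
genotype = round(estimate)
```

In practice, low and uneven coverage in ancient DNA makes such estimates noisy, which is one reason the text treats the resulting genotypes cautiously.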
To provide further support for natural selection in shaping frequencies of CNV1 and CNV2 copy number, we used forward simulations to model CNV1 and CNV2 in situations of population expansion but no selection. Using a mutation rate derived from our pedigree analysis as well as initial allele frequency distributions based on our largest hunter-gatherer population sample (Biaka), we simulated 1,000 populations at CNV1 and CNV2 using a stepwise mutation model and a realistic demographic model (SI Methods). The resulting distribution of mean copy number for both CNVs provides an empirical test of the departure of our observed populations from this neutral stepwise mutation model. We interpreted a departure from the model as evidence of natural selection acting on the locus in ancestors of individuals in that population. Using an estimate of mutation rate of the lower 95% confidence limit for both CNV1 and CNV2, 99.5% of simulated populations had a CNV1 mean copy number above 2.4 and 99.5% of populations had a CNV2 mean copy number above 7.4 (Fig. 2B). This observation shows that CNV2 copy numbers are lower than expected given a neutral stepwise mutation model for all populations, probably reflecting selective constraints on DMBT1SAG protein length. For CNV1, however, six populations show mean copy number consistent with a neutral stepwise mutation model. Four of those populations are from Africa, one from East Asia and one from South America, yet all six have been classified as nonagricultural (33).
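The logic of the neutral forward simulation can be sketched in plain Python as a Wright-Fisher model with single-step copy number mutation and population growth. This is a simplified stand-in for the simuPOP model described in SI Methods: the starting allele distribution, growth rate, population sizes, and number of generations below are all hypothetical, and only the mutation rate (the lower 95% bound for CNV1, 0.7%) comes from the text.

```python
import random

def simulate_mean_copy_number(start_alleles, mu, generations, growth,
                              max_size, rng):
    """Neutral Wright-Fisher simulation with stepwise copy number mutation.

    Each generation, alleles are sampled at random from the previous
    generation (no selection) and gain or lose one repeat unit with
    probability mu. Returns the mean diploid copy number at the end.
    """
    pop = list(start_alleles)
    size = len(pop)
    for _ in range(generations):
        size = min(max_size, max(1, round(size * growth)))  # expansion
        next_pop = []
        for _ in range(size):
            allele = rng.choice(pop)
            if rng.random() < mu:
                allele = max(0, allele + rng.choice((-1, 1)))
            next_pop.append(allele)
        pop = next_pop
    return 2 * sum(pop) / len(pop)  # diploid mean = 2 x haploid mean

rng = random.Random(42)
# Hypothetical haploid CNV1 allele counts in a founding population
founders = [1] * 60 + [2] * 35 + [0] * 5
means = sorted(
    simulate_mean_copy_number(founders, mu=0.007, generations=80,
                              growth=1.03, max_size=200, rng=rng)
    for _ in range(100)
)
# Value above which 99.5% of simulated means fall; with only 100
# replicates this is simply the smallest simulated mean.
threshold = means[int(0.005 * len(means))]
```

An observed population mean falling below such a threshold would be unusual under neutrality with this demography, which is the comparison drawn against the red dashed lines in Fig. 2B.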
The increase in CNV2 copy number and decrease in CNV1 copy number might be due to selection for a particular phenotype more favored following the transition to agriculture. Our favored hypothesis is that a particular SRCR domain binds S. mutans or hydroxyapatite of the tooth more weakly, thereby reducing the likelihood of caries. Analysis of the S. mutans binding domain sequence shows that there is no difference between SRCR domains in CNV1 and CNV2 (Fig. 1D). However, manual inspection of both the human GRCh37/UCSC hg19 reference sequence and 10 HGDP samples sequenced to high depth (ref. 34 and Table S1) shows that the SRCR domains in CNV2 share a serine to tyrosine change which disrupts the hydroxyapatite-binding domain (37) and abolishes a strong potential mucin-type O-linked glycosylation site (Fig. 1D). Replacement of CNV1-type SRCR domains with CNV2-type SRCR domains has therefore allowed this tyrosine substitution to propagate rapidly through the DMBT1SAG molecule. This observation suggests that the transition to agriculture has been accompanied by a partial replacement of canonical SRCR domains with SRCR domains that either bind the tooth less strongly or, because glycosylation is important for binding, bind S. mutans less strongly.
Evolutionary Relationship Between DMBT1 and its Ligand in S. mutans.
If the interaction between DMBT1SAG and S. mutans is coevolving, we might expect to see a relationship between variation of S. mutans and CNV1 and CNV2 of DMBT1 across different individuals. Adaptation of S. mutans to the DMBT1SAG phenotype of different mouths reflects a more recent evolutionary time scale than adaptation of DMBT1 in humans. However, given that S. mutans colonizes the mouth in early childhood (38) and the doubling time of biofilm-attached S. mutans is in the order of hours (39), it is likely that it can adapt genetically to the oral environment. We genotyped 125 adult individuals resident in Leicester, U.K., 92 of European origin, for CNV1 and CNV2, and sequenced part of the S. mutans gene spaP from DNA isolated from matched saliva. spaP encodes AgI/II, which is the ligand for human DMBT1SAG (40). We focused on a 1-kb region of the spaP gene encoding 336 amino acids from the C-terminal region known to contain two binding domains for human DMBT1SAG, namely Ad1 and Ad2 (41). For 98 of our cohort (78%), only one S. mutans strain (as defined by homozygosity of the sequenced region) was found. This observation suggests very low within-mouth diversity of S. mutans, and that most people are colonized by only one strain. However, alignment of sequences showed 136 single-nucleotide polymorphisms, 82 of which altered amino acid sequence, reflecting very high levels of diversity between individuals (Fig. S6).
A difference between the allele frequency spectra (AFS) of nonsynonymous and synonymous polymorphisms can indicate selection, if synonymous polymorphisms are assumed to be neutral. Comparison of the AFS of the S. mutans spaP gene shows a difference in both the total and European-only cohort (P = 0.015 and P = 0.018 respectively), with an enrichment of polymorphisms with rare alleles (minor allele frequency < 1%, P = 0.003 for total cohort, P = 0.022 for European-only cohort, Fig. S6 E and F). This result indicates that weak negative selection is acting on spaP, consistent with previous genome-wide approaches (4).
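One simple way to test for an excess of rare alleles among nonsynonymous polymorphisms is a Fisher's exact test on a 2 × 2 table of polymorphism class against frequency class. The sketch below is illustrative and is not necessarily the test used in the study. The totals of 82 nonsynonymous and 54 synonymous polymorphisms follow from the counts given in the text (136 polymorphisms, 82 altering amino acid sequence), but the split into rare and common alleles is hypothetical.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more probable than the observed table.
    """
    row1, row2, col1, n = a + b, c + d, a + c, a + b + c + d

    def prob(x):
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    return sum(prob(x) for x in range(lo, hi + 1)
               if prob(x) <= p_obs * (1 + 1e-9))

# Rows: nonsynonymous (82) and synonymous (54) polymorphisms.
# Columns: rare and common minor alleles. The split is HYPOTHETICAL.
p_value = fisher_exact_two_sided(60, 22, 28, 26)

# Small check table where the answer is known: P = 2/20 = 0.1
p_check = fisher_exact_two_sided(3, 0, 0, 3)
```

An excess of rare nonsynonymous alleles relative to synonymous ones is the pattern expected when new amino acid changes are weakly deleterious and rarely reach high frequency.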
If a particular spaP allele was adapted to a particular DMBT1SAG phenotype, we might expect the spaP allele and the DMBT1 genotype to be associated across a number of individuals. Across our cohort, we identified 13 polymorphisms that changed amino acid sequence in spaP Ad1 or Ad2. Given the low derived allele frequencies of these polymorphisms, we had limited power to detect an association with CNV1 or CNV2 copy number. However, two of these polymorphisms affect an amino acid highly conserved across oral Streptococci (Fig. S6 G and H), and in one the derived allele (A1090D, PDB P11657) was associated with lower copy number at CNV1 (nominal P = 0.0416) and CNV2 (nominal P = 0.0169) in the European-origin cohort, which leads to an overall association with low DMBT1 copy number (P = 0.046, corrected for multiple comparisons). Only CNV2 remains associated with low copy number (nominal P = 0.0196) in the full cohort, suggesting an effect of ethnicity on this interaction.
Discussion
We have shown that DMBT1 shows extensive CNV, with two distinct regions (termed CNV1 and CNV2) showing extensive copy number polymorphism across a wide range of populations. This polymorphism is predicted to underlie the variable number of tandemly repeated SRCR domains of the DMBT1SAG protein observed in different individuals. These SRCR domains have both S. mutans- and hydroxyapatite-binding activities (13, 22, 26, 42). Direct observation of de novo mutations in pedigrees shows that both CNV1 and CNV2 have exceptionally high mutation rates, 1.4% and 3.3% per gamete per generation, respectively; to our knowledge the fastest mutation rates affecting coding sequence yet described in humans. Analysis of breakpoints suggests that, at least for CNV2, NAHR drives the mutation process and that most, but not all, NAHR events are inter- or intrachromatidal, rather than between homologous chromosomes. Such a bias has also been observed at the tandemly repeated DEFA1A3 locus (43) and at alpha-satellite DNA (44), suggesting a shared mechanism.
Populations which practice agriculture generally show a low copy number of CNV1 and high copy number of CNV2, distinct from hunter-gatherer populations and ancient hominins. This pattern of CNV increases the number of copies of a particular type of SRCR domain containing an amino acid change predicted to disrupt binding to hydroxyapatite in the tooth and to abolish a mucin-like O-glycosylation site. Given that cariogenic S. mutans became prevalent after the development of agriculture, our data suggest that DMBT1SAG has evolved to modulate its binding to the tooth surface or S. mutans (or both) by rapidly mutating SRCR domain units carrying the appropriate binding motifs. This scenario presupposes that caries was an agent of natural selection before the development of modern dentistry (8, 45). We think that this is not unlikely, given the known acute consequences of caries, such as the increased risk of abscess (46), and chronic consequences, such as increased difficulty eating, particularly in children, and reduced weight/height gain (47, 48). Nevertheless, hypotheses about the agents of evolutionary change in humans are very difficult to prove, and we note that, given DMBT1SAG protein is expressed on other mucosal epithelia and interacts with other microbes, other evolutionary scenarios are possible, such as adaptation to an altered microbiome of the gut. Indeed, CNV1 of DMBT1 corresponds to a previously described deletion (26), with zero copies reflecting a homozygous deletion and one copy reflecting a heterozygous deletion. This deletion has previously been associated, in a small case-control study, with increased susceptibility to Crohn’s disease (25), an intestinal inflammatory disease. If this association is confirmed, this would represent an interesting case of pathogen/culture-driven selection increasing the allele frequency of an autoimmune susceptibility allele.
We also investigated variation of S. mutans in the context of DMBT1SAG CNV. The overall pattern of variation of the DMBT1SAG-binding region of AgI/II is that of weak negative selection across the population, where new amino acid changes are selected against when transferred from host to host. Our data also support the lack of geographical structure of S. mutans, as our sampling of a restricted population effectively captures the global diversity of sequences analyzed elsewhere, at least for this particular region, and again argue for weak negative selection and background selection being the dominant forces shaping diversity in S. mutans. There is weak evidence that there is a relationship between sequence variation at AgI/II and CNV at DMBT1. As minor alleles at AgI/II are generally rare, a much larger cohort and functional analysis are needed to tease apart the natural variation modulating this interaction.
Our results now provide a framework for understanding the full nature and functional effect of sequence variation at this locus, which will have an unclear linkage disequilibrium relationship with neighboring SNPs. One study has highlighted DMBT1 as a strong candidate for ancient balancing selection in the genome (49). Another recent study has identified the region upstream of DMBT1 as showing unusually negative Tajima’s D (in the fifth percentile genome wide) in Europeans, supporting our model of selection (50). However, the effect on SNP diversity of a rapidly mutating CNV undergoing fluctuating geographically structured selection remains unclear. The relationship between the CNV we describe here and diseases should also be studied further, in particular those diseases with an infectious or immune component to their etiology. Taken further, we hope that the rapidly mutating DMBT1 gene will become a paradigm of host-pathogen evolutionary study leading to important insights in understanding the process of caries formation, and other host-microbe interactions, in humans.
Methods
Full details on the methods used, and details of the samples, are described in SI Methods. CNV was characterized and typed using multiple paralog ratio tests (PRT), a form of quantitative PCR that uses the same primer sequences for test and reference loci to minimize amplification bias (27, 51, 52). PRTs were validated both by examining assay concordance and by testing a subset of samples using long PCR, array CGH, fiber-FISH, and next-generation sequencing (Tables S2 and S3).
S. mutans sequences were derived by Sanger sequencing following PCR amplification of a region of the spaP gene directly from DNA isolated from human mouthwash samples. Population simulations were conducted using simuPOP (53). Correlation of population subsistence with mean population copy number was performed as previously described (31, 33).
Acknowledgments
We thank Gurdeep Lall, Jenny Bowdrey, and Seijal Patel for help and support and Rita Neumann, Alec Jeffreys, Mark Jobling, and Jan Mollenhauer for DNA samples. This work was funded by a Government of India Ministry of Social Justice and Empowerment PhD studentship (to S.P. and E.J.H.). E.J.H. was supported in part by a Medical Research Council New Investigator Grant (GO801123). S.L. and F.Y. were supported by the Wellcome Trust (WT098051). D.S.H. was supported by NIH Grant RC4DK090937-01. This research used the ALICE and SPECTRE High Performance Computing Facilities at the University of Leicester.
Footnotes
- ¹To whom correspondence should be addressed. Email: ejh33@le.ac.uk.
Author contributions: S.P. and E.J.H. designed research; S.P., S.L., T.B., F.Y., and E.J.H. performed research; D.F., M.S., and D.S.H. contributed new reagents/analytic tools; S.P., D.F., M.S., T.B., and E.J.H. analyzed data; and S.P. and E.J.H. wrote the paper.
The authors declare no conflict of interest.
This article is a PNAS Direct Submission.
This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1416531112/-/DCSupplemental.
Freely available online through the PNAS open access option.
References
- ↵.
- Jobling M,
- Hollox E,
- Hurles M,
- Kivisild T,
- Tyler-Smith C
- ↵.
- Gerbault P, et al.
- ↵
- ↵
- ↵
- ↵.
- Cohen MN,
- Crane-Kramer GMM
- ↵
- ↵
- ↵
- ↵
- ↵
- ↵
- ↵.
- Esberg A,
- Löfgren-Burström A,
- Öhman U,
- Strömberg N
- ↵
- ↵
- ↵
- ↵
- ↵
- ↵
- ↵
- ↵.
- Bikker FJ, et al.
- ↵.
- Kishimoto E,
- Hay DI,
- Gibbons RJ
- ↵.
- Lamont RJ,
- Demuth DR,
- Davis CA,
- Malamud D,
- Rosan B
- ↵.
- Loimaranta V, et al.
- ↵
- ↵.
- Sasaki H,
- Betensky RA,
- Cairncross JG,
- Louis DN
- ↵
- ↵
- ↵
- ↵
- ↵
- ↵
- ↵.
- Hancock AM, et al.
- ↵.
- Meyer M, et al.
- ↵
- ↵
- ↵
- ↵
- ↵
- ↵
- ↵
- ↵
- ↵
- ↵
- ↵.
- Caselitz P
- ↵
- ↵
- ↵.
- Alkarimi HA,
- Watt RG,
- Pikhart H,
- Sheiham A,
- Tsakos G
- ↵
- ↵
- ↵
- ↵
- ↵
- ↵.
- Bikker FJ, et al.