New Research In
Physical Sciences
Social Sciences
Featured Portals
Articles by Topic
Biological Sciences
Featured Portals
Articles by Topic
- Agricultural Sciences
- Anthropology
- Applied Biological Sciences
- Biochemistry
- Biophysics and Computational Biology
- Cell Biology
- Developmental Biology
- Ecology
- Environmental Sciences
- Evolution
- Genetics
- Immunology and Inflammation
- Medical Sciences
- Microbiology
- Neuroscience
- Pharmacology
- Physiology
- Plant Biology
- Population Biology
- Psychological and Cognitive Sciences
- Sustainability Science
- Systems Biology
Replication and refinement of a vaginal microbial signature of preterm birth in two racially distinct cohorts of US women
Edited by Lora V. Hooper, The University of Texas Southwestern, Dallas, TX, and approved August 1, 2017 (received for review April 11, 2017)

Significance
Premature birth (PTB) is a major global public health burden. Previous studies have suggested an association between altered vaginal microbiota composition and PTB, although findings across studies have been inconsistent. To address these inconsistencies, improve upon our previous signature, and better understand the vaginal microbiota’s role in PTB, we conducted a case-control study in two cohorts of pregnant women: one predominantly Caucasian at low risk of PTB, the second predominantly African American at high risk. With the results, we were able to replicate our signature in the first cohort and refine our signature of PTB for both cohorts. Our findings elucidate the ecology of the vaginal microbiota and advance our ability to predict and understand the causes of PTB.
Abstract
Preterm birth (PTB) is the leading cause of neonatal morbidity and mortality. Previous studies have suggested that the maternal vaginal microbiota contributes to the pathophysiology of PTB, but conflicting results in recent years have raised doubts. We conducted a study of PTB compared with term birth in two cohorts of pregnant women: one predominantly Caucasian (n = 39) at low risk for PTB, the second predominantly African American and at high-risk (n = 96). We profiled the taxonomic composition of 2,179 vaginal swabs collected prospectively and weekly during gestation using 16S rRNA gene sequencing. Previously proposed associations between PTB and lower Lactobacillus and higher Gardnerella abundances replicated in the low-risk cohort, but not in the high-risk cohort. High-resolution bioinformatics enabled taxonomic assignment to the species and subspecies levels, revealing that Lactobacillus crispatus was associated with low risk of PTB in both cohorts, while Lactobacillus iners was not, and that a subspecies clade of Gardnerella vaginalis explained the genus association with PTB. Patterns of cooccurrence between L. crispatus and Gardnerella were highly exclusive, while Gardnerella and L. iners often coexisted at high frequencies. We argue that the vaginal microbiota is better represented by the quantitative frequencies of these key taxa than by classifying communities into five community state types. Our findings extend and corroborate the association between the vaginal microbiota and PTB, demonstrate the benefits of high-resolution statistical bioinformatics in clinical microbiome studies, and suggest that previous conflicting results may reflect the different risk profile of women of black race.
Preterm birth (PTB; delivery at <37 gestational wk) affects ≈12% of US births and is the leading cause of neonatal death and morbidity worldwide. Multiple lines of evidence support a role for the indigenous microbial communities of the mother (the maternal microbiota) in the pathophysiology of PTB. Microbial invasion of the amniotic cavity is one of the most frequent causes of spontaneous PTB (1), and the most common invading taxa are consistent with maternal origin (2⇓–4). Bacterial vaginosis (BV), a condition involving an altered vaginal microbiota, has been consistently identified as a risk factor for PTB (5, 6). Multiple studies have also found chronic periodontitis, another condition associated with an altered microbiota, to be a risk factor for PTB (7, 8).
High-throughput sequencing methods have facilitated new lines of investigation into the microbial etiology of PTB (9, 10). Amplification and high-throughput sequencing of the 16S rRNA gene (metabarcoding) simultaneously measures the presence and relative abundance of thousands of bacterial taxa (composition), and resolves differences to the level of genus and sometimes species or subspecies. To date, metabarcoding studies of the relationship between the vaginal microbiota and PTB have yielded mixed, even discordant, results. We previously reported that diverse, low Lactobacillus vaginal communities (BV-like) were associated with PTB in two predominantly Caucasian cohorts (11). Romero et al. (12) detected no association between vaginal microbiota composition and PTB in 90 women, 88% of whom were African American. Recent studies disagreed along similar lines: BV-like communities were associated with preterm premature rupture of membranes in a predominantly Caucasian and Asian cohort (n = 91), while no significant association with PTB was detected in a cohort of African-American women (n = 40) (13, 14).
Resolving the mixed findings in prior PTB-microbiota studies will require addressing two challenges: (i) low power resulting from the combination of small study populations (30–91 pregnant women), the many taxa measured by metabarcoding, and the absence of initial hypotheses more specific than some difference between preterm and term gestations; and (ii) insufficient understanding of population-specific factors that might modulate the PTB–microbiota association. For example, there is evidence that women of black race, a proxy for host genetics and a correlate to social and demographic factors, have a different range of normal vaginal community compositions (15, 16).
Additional difficulties arise from differences in metabarcoding methodologies. Sample collection, DNA extraction, and PCR primers impact sensitivity to different taxa (17, 18), sometimes so strongly that particular taxa can go undetected (e.g., Gardnerella; ref. 19). Variation in the rate of evolutionary diversification within the 16S rRNA gene makes results from different 16S rRNA gene regions difficult to compare, especially at higher taxonomic resolutions (20). The confounding of methodological differences with population differences across studies is particularly challenging.
Here, we present a study designed with these challenges in mind. Using a uniform methodology, we attempt to replicate and refine a small number of previously generated hypotheses in two different populations differing dramatically in racial composition.
In our 2015 study (11), we measured the composition of the vaginal microbiota longitudinally over the course of pregnancy in 49 women and tested for associations with PTB. At the taxonomic level of genus, we reported that PTB was associated with (i) lower abundance of Lactobacillus, (ii) higher abundance of Gardnerella, and (iii) higher abundance of Ureaplasma. Here, we test those associations in two new study cohorts: a “replication” (Stanford) cohort of 39 low-risk women (30 term, 9 preterm deliveries) enrolled from the same predominantly Caucasian population that provided the hypothesis-generating cohort in our 2015 study, and a “high-risk” (UAB, University of Alabama at Birmingham) cohort of 96 women (55 term, 41 preterm) enrolled from the predominantly African-American population of pregnant women with a prior history of PTB who received prenatal care and progesterone treatment at the UAB. Our deep sequencing and weekly sampling approach provides the most accurate quantification yet of total gestational exposure to bacterial taxa in the vaginal microbiota.
We find that the previously reported associations between PTB and the Lactobacillus and Gardnerella genera replicate in the Stanford cohort, but not in the UAB cohort. High-resolution statistical bioinformatics reveals that Lactobacillus crispatus and a subspecies Gardnerella vaginalis variant (G2) drive the association of those genera with PTB. We describe the patterns of cooccurrence between three key species of the vaginal microbiota—L. crispatus, Lactobacillus iners, and G. vaginalis—finding that L. iners, but not L. crispatus, often coexists with G. vaginalis. We propose a model of the vaginal microbiota based on these key taxa, and discuss how ecological interactions and host–population-specific effects may explain the conflicting findings on the relationship between the vaginal microbiota and health.
Results
Replication.
The average gestational frequencies of the Lactobacillus, Gardnerella, and Ureaplasma genera, stratified by birth outcome, are shown in Fig. 1. Our previously reported associations between less Lactobacillus and PTB, and between more Gardnerella and PTB, replicated in the Stanford cohort (Wilcoxon rank-sum test; Lactobacillus, P = 0.0093; Gardnerella, P = 0.0070). However, neither association was significant in the UAB cohort, and our previously hypothesized association between more Ureaplasma and PTB was not supported in either cohort. For characteristics of these cohorts, see Tables S1 and S2.
Average gestational frequencies of Lactobacillus, Gardnerella, and Ureaplasma for women who delivered preterm and at term. Each point shows the average frequency of the genus-of-interest across gestational samples from one woman. Upper shows the Stanford cohort (n = 39), and Lower shows the UAB cohort (n = 96). The significance of the differences between average gestational frequencies in term and preterm births, in the previously reported directions, were assessed by one-sided Wilcoxon rank-sum test; **P < 0.01.
Demographic and clinical characteristics of the total study population (n = 135)
Features of women in both cohorts who delivered preterm (n = 50)
We calculated odds ratios (ORs) for the Lactobacillus and Gardnerella associations in the Stanford cohort. Using a somewhat arbitrary frequency threshold of 70% to delineate low Lactobacillus (high risk) and high Lactobacillus (low risk), we obtain an OR of 5.81 (95% CI range: 1.12–33.7) for the low Lactobacillus category. Using a frequency threshold of 0.1% for Gardnerella produced an OR of 5.12 (95% CI range: 1.05–31.1) for the high Gardnerella (high risk) category. These ORs are specific to the Stanford population.
Refinement.
Our bioinformatics approach grants subgenus taxonomic resolution for many taxa. We reconsidered the Lactobacillus association for each of the four principal Lactobacillus species of the vaginal microbiome and the Gardnerella association for each of the nine unique G. vaginalis 16S rRNA sequence variants detected in our study.
We found striking differences among the associations of different Lactobacillus species with PTB, especially between the highly abundant L. crispatus and L. iners species. A lower abundance of L. crispatus was significantly associated with PTB in both cohorts, while no significant association was detected for L. iners (Fig. 2A). In the UAB cohort, decreased abundance of the less common species L. jensenii and L. gasseri was also associated with PTB.
Associations between Gardnerella/Lactobacillus variants and PTB in two cohorts of women. (A) Associations between average gestational frequency and PTB were tested for the nine detected Gardnerella 16S rRNA sequence variants, and for the four major Lactobacillus species in the vaginal microbiota. Testing was performed separately for the Stanford and UAB cohorts. Dashed lines indicate P = 0.05 (one-sided Wilcoxon rank-sum test). (B) The three most abundant Gardnerella sequence variants were mapped onto a phylogenetic tree constructed from sequenced G. vaginalis genomes (SI Materials and Methods). (C) MDS was performed on the Bray–Curtis distances between all samples. The two MDS dimensions that explain the greatest amounts of variation in the data are displayed. Stanford (n = 897) and UAB (n = 1,282) samples are plotted separately. Landmark samples (magenta) contained the highest proportion of the labeled taxa: G1 (66% frequency in landmark sample), G2 (96%), L. crispatus (>99%), and L. iners (>99%).
The association between the Gardnerella genus and PTB was driven by a single variant. Gardnerella sequence variant 2 (G2) had a strong significant association in the Stanford cohort (P = 0.0033, Wilcoxon rank-sum test), while no other variants, nor the Gardnerella genus excluding G2, significantly associated with PTB in either cohort (Fig. 2A). The most abundant Gardnerella sequence variants (G1, G2, and G3) differ at just one or two nucleotides. However, when mapped onto whole-genome phylogenies (Fig. 2B and SI Materials and Methods), these variants differentiate between significantly diverged “genovars” within the G. vaginalis species (21). The functional classifications of clade-specific genes are largely redundant but transporters, membrane proteins, and toxin-antitoxin systems appear overrepresented in G2 variant genomes (Figs. S1 and S2).
Hierarchical clustering on the Euclidean distances of the presence/absence of 1,553 orthologous genes in 17 G. vaginalis genomes. Green: genes present; black: genes absent. Cluster 1, orthologous genes present in nearly all variants; cluster 2, orthologs in G2 variants; cluster 3, orthologous genes in G2 and G3 variants; cluster 4, orthologous genes in G1 variants.
Relative contribution of functions in clade-specific G. vaginalis genes. The relative number of genes in each functional category was determined for clusters 2–4 from Fig. S1.
We also considered the refined Lactobacillus and Gardnerella associations in the Stanford cohort for samples restricted to fall within certain time windows of pregnancy (Fig. S3). These results suggest that the replicated associations hold even early in the second trimester (13–18 gestational wk).
Statistical association between PTB and the average frequencies of Lactobacillus and Gardnerella during particular periods of gestation. The association between PTB and Lactobacillus/Gardnerella, as well as the two most abundant subgenus variants of each, was tested (one-sided Wilcoxon rank-sum test) in subsets of samples from specific time periods of gestation.
Ecology.
Four “landmark” samples with the highest observed proportion of G1, G2, L. crispatus, and L. iners, are highlighted in a multidimensional scaling (MDS) ordination of all samples (Fig. 2C). The landmarks sit at the four locations with the highest concentration of samples, indicating that these taxa drive or track community-wide variation in the vaginal microbiota.
The Lactobacillus and Gardnerella genera together make up more than two-thirds of the study-wide vaginal microbiota in both the Stanford and UAB cohorts (Fig. 3A). Notable differences between the cohorts are the higher frequency of Gardnerella in the UAB cohort (8.3%) relative to the Stanford cohort (4.7%) and the lower frequency of L. crispatus in UAB (15%) relative to Stanford (45%).
The distribution of Lactobacillus and Gardnerella in the vaginal microbiota of two cohorts of pregnant women. (A) The frequencies of Lactobacillus species and Gardnerella variants in the pooled samples from the Stanford and UAB cohorts. (B) The joint distributions of L. crispatus with Gardnerella, and L. iners with Gardnerella, stratified by cohort. The x axis shows the summed frequency of Gardnerella and the Lactobacillus species; the y axis shows the ratio between the two (log-scale).
The pattern of cooccurrence between L. crispatus and Gardnerella is different from between L. iners and Gardnerella (Fig. 3B). The ratio of L. crispatus to Gardnerella is extremely skewed toward one predominating over the other when the two make up a large fraction of the community, a pattern consistent with an exclusionary interaction. In marked contrast, L. iners and Gardnerella often coexist at comparable frequencies, at least in the UAB cohort, even at high combined frequencies.
We quantified community instability as the Bray–Curtis dissimilarity between communities sampled in consecutive weeks. The gestational average of community instability was inversely correlated with Lactobacillus dominance as expected (Spearman ρ = −0.76), but intriguingly community instability was associated with PTB in the Stanford cohort (Wilcoxon rank-sum test; P = 0.0018) even more strongly than was Lactobacillus frequency (22).
Exploration and Contamination.
We tested for associations between PTB and increased gestational frequencies of several genera frequently linked with BV (Gardnerella, Prevotella, Atopobium, Mobiluncus, Megasphaera, Dialister, Peptoniphilus, and Mycoplasma) and three sequence variants that exactly matched 16S rRNA genes from the recently identified bacterial vaginosis-associated bacteria BVAB1, BVAB2, and BVAB3 (23, 24). Prevotella was associated with PTB in both cohorts (P < 0.05), while most of the remaining genera were significantly associated only in the Stanford cohort (Table 1). BVAB sequence variants were not significantly associated with PTB in either cohort.
Association between PTB and BV-relevant taxa
We detected sequence variants corresponding to the causative organisms for three sexually transmitted infections (STIs) in the UAB cohort: Chlamydia trachomatis (5 gestations), Neisseria gonorrhoeae (4), and Trichomonas vaginalis (28). We also detected sequence variants corresponding to Trichomonas-associated Candidatus Mycoplasma girerdii (25⇓–27), and Trichomonas was indeed present in all samples with frequency >10−3.5. No STI organism was significantly associated with PTB (Fig. S4), but this may reflect insufficient power due to the small numbers of subjects in which they were observed.
Average gestational frequencies of STI-associated microbes. None of these taxa were significantly associated with PTB: Chlamydia P = 0.96 (one-sided Wilcoxon rank-sum test), Gonorrhea (N. gonorrhoeae) P = 0.30, Trichomonas P = 0.12, although power was limited by the small number of women in which they were observed. These STI-associated microbes were only detected in the UAB cohort.
We also performed an exploratory analysis in which the gestational frequency of each detected genus was tested for an association with PTB (Fig. 4 and Table S3). Many genera met a common standard for statistical significance in exploratory analyses (FDR < 0.1) (Table S3). However, closer inspection of the four genera which stand out the most from the bulk of the P value distribution (Yersinia, Abiotrophia, Stomatobaculum, Tumebacillus) suggests a potentially artifactual signal. These taxa were present in a small fraction of samples (0.5–7.0%) and at low overall frequency (<2 × 10−5), and were detected in at least one negative control (attempted PCR of extraction blank or DNA-free water). Furthermore, the prevalence of Yersinia and Tumebacillus in vaginal and control samples was strongly correlated across runs (Fig. 4B), suggesting these taxa were contaminants in the vaginal samples (SI Discussion). Alternative exploratory analyses restricted to sequence variants and genera with frequency greater than 10−3 (to avoid contaminants) yielded results largely consistent with BV-associated PTB risk (Figs. S5 and S6).
Statistical association between the average gestational frequencies of detected genera and preterm birth in two cohorts of women. (A) We tested the association between PTB and increased gestational frequency for all detected genera (for Lactobacillus, decreased frequency) in each cohort by the Wilcoxon rank-sum test. Genera in red have a significant composite P value after controlling FDR < 0.1 (Methods). Text size scales with the square root of study-wide frequency. (B) The fraction of samples in which the four highlighted genera were present among vaginal samples and negative controls, stratified by sequencing run.
Genera with a significant association between increased average gestational frequency and preterm birth, many of which may be contaminants
Statistical association between the average gestational frequencies of high-frequency sequence variants and preterm birth in two cohorts of women. We tested the association between PTB and increased gestational frequency for all non-Lactobacillus sequence variants with a minimum frequency of 0.001 in either the Stanford or UAB cohort by the Wilcoxon rank-sum test. Sequence variants are labeled by their assigned genus. Sequence variants with a red label have a significant composite P value after controlling FDR < 0.1 (Methods). Text size scales with the square root of study-wide frequency. The significant Gardnerella sequence variant is the G2 variant highlighted in the main text.
Statistical association between the average gestational frequencies of high-frequency genera and preterm birth in two cohorts of women. We tested the association between PTB and increased gestational frequency for all non-Lactobacillus genera with a minimum frequency of 0.001 in either the Stanford or UAB cohort by the Wilcoxon rank-sum test. Genera with a red label have a significant composite P value after controlling FDR < 0.1 (Methods). Text size scales with the square root of study-wide frequency. Note that Prevotella has a much stronger association with PTB when the data are aggregated at the level of genus than when each sequence variant is considered separately.
SI Materials and Methods
Study Population and Sampling Procedure.
Two cohorts of pregnant women were enrolled using the same study design, procedures, and methods at separate sites within the United States between 2013 and 2015, and analyzed under a case-control design. One cohort, the “Stanford” cohort, was enrolled from women presenting to the obstetrical clinics of the Lucille Packard Children’s Hospital at Stanford University for prenatal care. The underlying population is predominantly Caucasian and Asian and has a low risk for PTB (<10%); the Stanford cohort analyzed here was selected to enrich for PTB outcomes. Another cohort, the “UAB” cohort, was enrolled from women referred to the UAB for intramuscular 17α-hydroxyprogesterone caproate therapy due to a history of prior preterm delivery. For both cohorts, participants were included in the analysis if they met the following inclusion criteria: 18–45 y of age, singleton gestation, not immunosuppressed, ability to perform the study procedures, and ability to provide informed consent. Both cohorts were enrolled using the same study protocol, but each cohort was analyzed in a separate case-control analysis (using identical bioinformatics approaches). Cases were defined as women who delivered preterm (<37 wk of gestation); controls were women who delivered at term. The analyzed Stanford cohort consisted of 39 women, and the UAB cohort comprised 96 women; demographic and clinical characteristics of each cohort, stratified by enrollment site and by delivery outcome, are presented in Table S1. Features of the 50 women who delivered preterm are presented in Table S2.
Study participants self-collected vaginal swabs weekly by using sterile Catch-All Sample Collection Swabs (Epicentre Biotechnologies) to obtain material from the midvaginal wall. All clinical specimens were placed immediately after collection at −20 °C until transport to the laboratory for storage at −80 °C until processing. Information related to socioeconomic status, diet, medical conditions and symptoms, medication and dietary supplement use, and stress were captured in a detailed questionnaire completed by each study participant upon enrollment; any significant changes in these parameters were documented in a follow-up questionnaire completed at periodic clinical visits.
DNA Extraction, 16S rRNA Gene Amplification, and Amplicon Sequencing.
Whole genomic DNA was extracted from each clinical specimen by means of the PowerSoil DNA isolation kit (MO BIO Laboratories) according to the manufacturer’s protocol except for the inclusion of a 10-min incubation at 65 °C immediately after the addition of solution C1. The V4 hypervariable region of the 16S rRNA gene was amplified by PCR. The forward PCR primer (5′ AAT GAT ACG GCG ACC ACC GAG ATC TAC ACG CTN NNN NNN NNN NNT ATG GTA ATT GTG TGY CAG CMG CCG CGG TAA 3′) was a 74-nucleotide (nt) fusion primer consisting of the 32-nt Illumina adapter (designated by bold), a unique 12-nt barcode to label each amplicon (designated by the N’s), a 9-nt forward primer pad, a 2-nt linker (GT), and the 19-nt broad-range bacterial primer 515F (designated by underlining). The 56-nt reverse primer (5′ CAA GCA GAA GAC GGC ATA CGA GAT AGT CAG CCA GCC GGA CTA CNV GGG TWT CTA AT 3′) consisted of the 24-nt Illumina adapter (designated by bold), a 10-nt reverse primer pad, a 2-nt reverse primer linker (CC), and the 20-nt broad-range bacterial primer 806R (designated by underlining).
Triplicate 25-µL PCRs were carried out by using 1× HotMasterMix (5 PRIME), 0.4 µM concentrations of each commercially synthesized primer, and 3 µL of prepared DNA template. Thermal cycling conditions consisted of an initial denaturing step of 94 °C for 3 min, followed by 30 cycles of 94 °C for 45 s, 50 °C for 60 s, and 72 °C for 90 s, with a final extension step of 72 °C for 10 min. Upon completion of the PCRs, the corresponding triplicate reaction mixtures were pooled and purified by using the Ultra-clean-htp 96-well PCR clean-up kit (Mo Bio Laboratories) according to the manufacturer’s protocol. DNA concentrations from each triplicate pool were quantified using the Quant-iT High-Sensitivity dsDNA Assay Kit (Invitrogen) and combined in equimolar ratios into a single tube. The resulting amplicon mixture was concentrated by ethanol precipitation and resuspended in 100 µL of molecular biology-grade water (Life Technologies). The resuspended amplicon mixture was gel purified and recovered using a QIAquick gel extraction kit (Qiagen). Recovered PCR products were sequenced on an Illumina HiSeq 2500 instrument (Illumina) at the W. M. Keck Center for Comparative Functional Genomics at the University of Illinois, Urbana–Champaign, IL.
Bioinformatics.
Bioinformatics processing largely followed the DADA2 Workflow for Big Data (benjjneb.github.io/dada2/bigdata_paired.html). Forward/reverse read pairs were trimmed and filtered, with forward reads truncated at 245 nt and reverse reads at 235 nt, no ambiguous bases allowed, and each read required to have less than two expected errors based on their quality scores. The relationship between quality scores and error rates was estimated for each sequencing run to reduce batch effects arising from run-to-run variability. ASVs were independently inferred from the forward and reverse of each sample using the run-specific error rates, and then read pairs were merged. Chimeras were identified in each sample, and ASVs were removed if identified as chimeric in a sufficient fraction of the samples in which they were present.
Taxonomic assignment was performed against the Silva v123 database using the implementation of the RDP naive Bayesian classifier available in the dada2 R package (45, 46). Lactobacillus species were assigned by hand via BLAST against sequences from cultured Lactobacillus strains.
Gardnerella Genome Analysis.
The multiple genome alignment for the phylogenetic tree was built with progressiveMauve (47). Briefly, progressiveMauve finds initial local alignments used as seeds, which are progressively extended to global alignments (the pangenome) using a guide tree. The output of progressiveMauve, an extended multifasta file (.xmfa format), was converted to fasta format using a publicly available script (https://github.com/kjolley/seq_scripts/blob/master/xmfa2fasta.pl). The multiple genome alignment was used to construct the phylogenetic tree using FastTree Version 2.1.8 with parameters –nt –gtr –gamma. Orthologous genes were obtained from the Mauve alignment based on at least 70% alignment coverage.
G. vaginalis genomes that contained a full-length 16S rRNA gene were classified as sequence variants G1 (seven genomes), G2 (nine genomes), and G3 (one genome) by exact matching, and were used for gene orthology analysis (Figs. S1 and S2). Presence or absence of genes unique to each strain were not evaluated because only 3 of the 34 Gardnerella genomes are closed. From a total of 3,750 genes predicted in all 17 strains, 1,448 were orthologous genes present in at least half of the genomes in each variant group. Hierarchical clustering on the Euclidean distances of the presence/absence of the 1,448 orthologous genes was done with Cluster 3.0 and Average Linkage as clustering method (48), and visualized with Java Treeview (49). The relative contribution of genes in functional categories was estimated from the number of genes in each functional category divided by the total number of orthologous genes in each cluster.
Accession nos. for Gardnerella vaginalis genomes used in this study are as follows: SAMN02436832, SAMN02472074, SAMN02472073, SAMN02393773, SAMN02393775, SAMN00138210, SAMN02393781, SAMN02393782, SAMN02393783, SAMN02393779, SAMN02393784, SAMN02393777, SAMN02393778, SAMN02393774, SAMN02472072, SAMN02471014, SAMN02436904, SAMN02436712, SAMN02436831, SAMN02436711, SAMN02436912, SAMN02436773, SAMN02436710, SAMN02436911, SAMN02436830, SAMN02436772, SAMN02436771, SAMN02436910, SAMN02436829, SAMN02436909, CP001849, CP002104, CP002725, SAMN02472071.
SI Discussion
Low Frequency Variants and Contamination.
Our focus here was on testing previously hypothesized PTB–microbiota associations, and then refining those associations with higher-resolution methods. However, we additionally performed an exploratory analysis of all possible PTB-genus associations to generate new hypotheses while controlling the false discovery rate to be less than a common standard for reporting new associations (FDR < 0.1).
One of the most notable features of this analysis, and one not usually highlighted, is that we believe many of these newly detected associations result from contamination rather than a legitimate biological signal. Of particular note are the Yersinia and Tumebacillus genera, which had prevalence in vaginal and control samples that was strongly correlated across runs (Fig. 4B), suggesting these taxa are contaminants in the vaginal samples. We did not detect such strong run-to-run correlations for Abiotrophia (present in 11 PTB, one term) or Stomatobaculum (16 PTB, four terms), nor when considering other known potential batch effects such as technician or cohort. The correlation coefficients for all of the most significant associations are reported in Table S2.
We are presenting these results without attempting to remove likely contaminants, in part to demonstrate the critical impediment that contaminants present to fulfilling the potential for metabarcoding methods to probe low frequencies. In principle, metabarcoding methods can detect variation at frequencies as low as one over the library size, which here is about a frequency of 10−5, and when averaging over a gestation can be as low as 10−6. However, the prevalence of contaminants at study-wide frequencies of up to 10−4 made identifying real biological patterns at those low frequencies very difficult. This is more problematic in lower biomass samples, such as vaginal swabs, and represents a serious barrier in identifying rare but highly impactful bacterial taxa.
High correlations between prevalence in negative controls and prevalence in vaginal samples of many of the PTB-associated taxa reported here strongly suggests that contaminants are driving many of the potential associations. The problematic taxa are all relatively rare (<5 × 10−4 frequency), and we do not expect contamination to materially affect inference among more abundant taxa. Unfortunately, identifying contaminating sequences is not trivial; 94/100 of the most abundant genera in our study appear in negative controls because of cross-contamination, eliminating the possibility of simply removing anything that appears in a negative control. The size of our study allows us to detect some contaminants by looking at prevalence correlations between the natural batches of sequencing runs, but even so there are some taxa for which it is difficult to make a determination.
Run-specific contaminants should minimally interfere with exploratory testing (beyond decreasing power by increasing the number of hypotheses tested) if samples are randomly distributed among sequencing runs. Unfortunately, the samples in this study were not completely randomized among the sequencing runs, in part because of the need to give sequencing precedence to the samples from a nonrandom subset of the subjects that were also included in a second collaborative study. This likely contributed to the number of spurious associations we found, and our results highlight the importance of robust randomization procedures for exploratory studies, especially those which span multiple sequencing runs and that intend to test low-abundance taxa.
Comparison with Results in Kindinger et al.
A recent study by Kindinger et al. (37) also examined the relationship between the vaginal microbiota and PTB in a similarly sized cohort (n = 161) of predominantly Caucasian and Asian women with a prior history of preterm birth. Using a CST-based analysis, they corroborated an association between a reduction in PTB and an increased relative abundance of L. crispatus during pregnancy. They also found a weaker or absent association between the vaginal microbiota and PTB in women of black race. However, their study differed from ours in finding a significant positive association between L. iners-dominated communities (CST3) and PTB, and no significant association between diverse BV-like communities (CST4) and PTB. As a result, the authors argued that L. iners might promote PTB, rather than members of the diverse BV-associated community as has been previously suggested and as was supported by our study. We believe the differences in our findings may reflect methodological differences in our 16S rRNA gene sequencing protocols and analysis methods, rather than biological differences.
Kindinger et al. used a V1-V3 primer set that is sensitive to Lactobacillus sequences, but may be insensitive to some other important taxa. In particular, Gardnerella is essentially absent (<0.01%) in their amplicon sequencing data, while it accounts for almost 5% of our study-wide sequencing reads. We do not believe this reflects structural differences in the vaginal communities between our cohorts given the widespread observation of significant Gardnerella fractions in previous population studies of the pregnant vaginal microbiota (11, 12). Instead, we suspect that the three mismatches in their forward primer prevented amplification of the Gardnerella 16S rRNA gene and, thus, Gardnerella remained essentially absent.
If it is the case that the primer set used by Kindinger et al. is systematically less sensitive to non-Lactobacillus species than ours, the divergences between the studies may yet reflect a common biological reality. Our study showed clearly that L. iners and BV-associated taxa like Gardnerella often exist at comparable frequencies. An amplicon sequencing protocol that poorly amplified BV-associated taxa would then cause amplicon sequencing data from BV-like communities to contain a preponderance of reads from the well-amplified L. iners populations with which they coexist. This effect would be compounded by assigning communities to CSTs, which largely tracks the identity of the most frequent taxa in amplicon reads. Thus, even if BV-like communities were promoting PTB, as we believe our findings suggest, the classification of BV-like communities (CST4) as L. iners-dominated communities (CST3) based on amplicon data depleted of Gardnerella sequences could create the appearance of an association between communities classified as CST3 and PTB that would otherwise be absent.
The differences between the specifics of the positive associations detected in these two studies should not diminish the areas of agreement: Both studies found a significant protective role for L. crispatus, a significant positive association between states of the vaginal microbiota not dominated by L. crispatus and PTB in predominantly Caucasian populations, and weaker or absent associations in African-American women. Together, we believe these studies have strengthened the evidence for a role of the vaginal microbiota in the pathophysiology of PTB, and point toward L. iners and G. vaginalis as taxa of particular importance for further study. Other taxa that often cooccur with G. vaginalis, such as the genera Prevotella, Dialister, Mobiluncus, and Atopobium, among others, also merit additional investigation.
Discussion
Replication and Refinement.
The profusion of exploratory microbiome studies in recent years has yielded a surfeit of reported associations between health outcomes and microbial taxa, and a lack of confidence in each individually. Here, we focused on retesting the associations we discovered and published in our previous exploratory study (11). The replication of the associations between PTB and the Lactobacillus and Gardnerella genera substantially increases our confidence that these are true associations within the Stanford population that could serve as biomarkers of PTB risk.
More precise characterization of PTB-associated alterations in the vaginal microbiota is important given the nonspecific diagnostic criteria of bacterial vaginosis (BV), and the previous failures of antibiotic treatment of BV to reduce PTB (28, 29). Our second focus here was on refining the replicated associations with new methods that resolve exact sequence variants from amplicon data, rather than lumping sequences within an arbitrary dissimilarity threshold into operational taxonomic units (OTUs) (30). This high-resolution approach localized all of the risk associated with the Gardnerella genus to a subspecies clade, and differentiated N. gonorrhoeae from other Neisseria species within 3% Hamming distance over the sequenced gene region. In clinical applications, we see little reason to choose OTUs over exact sequence variants, as health impacts can vary significantly between closely related taxa.
An Ecological Model of the Vaginal Microbiota in Health.
The vaginal microbiota is commonly classified into five discrete CSTs (community state types): CST1, communities dominated by L. crispatus; CST2, by L. gasseri; CST3, by L. iners; CST5, by L. jensenii; and CST4, diverse communities not dominated by Lactobacillus (15). We argue that the lack of a clear distinction between CST3 and CST4 is a critical shortcoming of the CST approach. In this study, L. iners and Gardnerella often coexisted at near equal frequencies. Discrete CSTs do not well-represent that variation, and will obscure the relationship between the vaginal microbiota and health.
We propose instead a simplified ecological model of the vaginal microbiota based on the quantitative frequencies of three key taxa: G. vaginalis, L. crispatus, and L. iners. The frequency of G. vaginalis correlates with adverse health outcomes, i.e., PTB and symptomatic BV. Gardnerella and L. crispatus strongly exclude each other; exclusion between L. iners and Gardnerella is weak or absent. Additional taxa can be added as their ecology and health importance are established (e.g., Prevotella).
The association between Gardnerella and PTB in the Stanford cohort could be explained by a gain of pathologic function, such as the stimulation of a detrimental host immune response, a loss of protective community function such as colonization resistance against pathogens by L. crispatus, by a cooccurrence between Gardnerella and other risk factors, or combinations thereof. That one Gardnerella variant (G2) drives the association for the entire genus is suggestive of a causative role, but is still consistent with indirect association if evolutionary changes in the G2 clade modified host specificity and/or ecological behavior.
The Gardnerella genus deserves further investigation. Different associations between PTB and Gardnerella sequence variants, which tracked genovars identified from whole genome sequences, suggest health-relevant differentiation. Characterizing more Gardnerella strains will help elucidate their role in health, especially coupled with the health status of the women from whom they are recovered. Diagnostic information more specific than Nugent score is needed: The signs and symptoms of the women must also be reported. Diversity within L. iners deserves similar scrutiny (31).
The striking difference we found between the exclusion of Gardnerella by L. crispatus, and the coexistence of Gardnerella with L. iners, was also seen in previous population surveys (32, 33). Laboratory experiments support direct exclusion between Gardnerella and L. crispatus, but not L. iners, via reciprocal interference in epithelial adhesion and biofilm formation (34, 35). Some but not all Lactobacillus species can produce lactocillin, a peptide possessing potent antibiotic activity against G. vaginalis and other Gram-positive pathogens (36). Although some evidence suggests that L. iners may contribute to adverse health outcomes (37), it would be consistent with our findings, and previous observations that the L. iners-dominated CST3 is relatively unstable (11, 32), if L. iners were itself harmless but promoted or at least did not inhibit the growth of other harmful microbes.
Population Effects.
The differences between the Stanford and UAB cohorts strongly suggest that PTB–microbiota associations are population-dependent. This is especially likely here because of the multiple differences between these cohorts: Women in the UAB cohort are predominantly African American, had a prior history of PTB, and were treated with intramuscular 17α-hydroxyprogesterone caproate, an antiinflammatory agent, during pregnancy.
Women of black race have a different range of normal vaginal community compositions when not pregnant (15). A previous study of a cohort of predominantly African-American women found no significant association between Gardnerella or Lactobacillus and PTB (12). These findings are consistent with a higher normal frequency of L. iners and Gardnerella, and a lower susceptibility to Gardnerella-associated adverse health outcomes, in women of black race. Interestingly, the PTB-associated G2 variant was of nearly equal frequency in the Stanford and UAB cohorts, but the unassociated G1 and G3 variants were more than twice as frequent in UAB. Additionally, the less frequent L. gasseri and L. jensenii species were significantly associated with PTB in the UAB cohort but not in the Stanford cohort.
It is also possible that PTB–microbiota associations do not reach significance in the UAB cohort because of a lower fraction of PTB that is microbiota-associated. All women in the UAB cohort had a prior history of PTB. PTB has multiple causes, including factors unrelated to the microbiota such as vascular disorders (1). If nonmicrobiota-associated causes of PTB are more likely to persist between pregnancies, then PTB in women with a prior history will be enriched for nonmicrobiota causation. The use of a progesterone with antiinflammatory properties in the UAB cohort might have reduced potential inflammatory effects from the vaginal microbiota.
Genus-level associations were inconsistent between the UAB and Stanford cohorts, but our higher resolution analysis of Lactobacillus species revealed that PTB is significantly associated with a lower frequency of L. crispatus in both cohorts. Better discrimination of critical taxa may reveal consistencies across populations that are obscured when taxa are grouped at higher levels (38).
Comparison with Kindinger et al.
A recent study also examined the relationship between the vaginal microbiota and PTB in a similarly sized cohort (n = 161) of predominantly Caucasian and Asian women with a prior history of PTB (37). Using a CST-based analysis, they corroborated an association between increased gestational frequency of L. crispatus and reduced risk for PTB. They also found a weaker or absent association between the vaginal microbiota and PTB in women of black race. However, their study differed from ours in finding a significant positive association between L. iners-dominated communities (CST3) and PTB, and no significant association between diverse BV-like communities (CST4) and PTB. We believe the differences in our findings may reflect methodological differences in our 16S rRNA gene sequencing protocols and analysis methods, rather than biological differences (SI Discussion). Kindinger et al. used a V1-V3 primer set that is sensitive to Lactobacillus, but insensitive to Gardnerella. Classification of BV-like communities (CST4) as L. iners-dominated communities (CST3) based on amplicon data depleted of Gardnerella sequences could create an association between CST3 and PTB that would otherwise be absent.
We believe these two studies have strengthened the evidence for a role of the vaginal microbiota in the pathophysiology of PTB, and point toward L. iners and G. vaginalis as taxa of particular importance for further study. Other taxa that often cooccur with G. vaginalis, such as the genera Prevotella, Dialister, Mobiluncus, and Atopobium, also merit additional investigation.
Strengths and Weaknesses.
The strengths of our study include accurate quantification of total gestational exposure to each taxon from our dense longitudinal sampling regimen; high-resolution bioinformatics that discriminated health-relevant differences at the species and subspecies level; a hypothesis-driven approach based on previously proposed associations; and characterization of two very different cohorts with the same methodology. The weaknesses of our study include an inability to distinguish between the effects of race, progesterone treatment, and prior history of PTB because of their confounding across the two study cohorts; heterogeneity in the type of PTB considered (e.g., spontaneous or medically indicated); and incomplete use of the time-series aspects of the data in our analysis.
Materials and Methods
Study Cohorts.
Two study cohorts were enrolled prospectively from different locations within the United States. The Stanford cohort (Palo Alto, CA; 39 women, 9 of whom delivered preterm) was selected to enrich for PTB from an ongoing prospective study of women presenting to the obstetrical clinics of the Lucille Packard Children’s Hospital Stanford for prenatal clinical care; previous cohorts from this study provided samples that led to the identification of a vaginal microbiota signature of PTB (11). The underlying population is predominantly Caucasian and Asian and has low risk for PTB (<10%). The UAB cohort (Birmingham, AL; 96 women, 41 preterm) was enrolled prospectively from the pregnant women referred to UAB for prior history of PTB. The referral population is predominantly African American, and the analyzed cohort was representative of that population. The study was approved by an Administrative Panel for the Protection of Human Subjects (IRB, Institutional Review Board) of Stanford University (IRB protocol no. 21956) and by an IRB at UAB (protocol no. X121031002). All women provided written informed consent before completing an enrollment questionnaire and providing biological samples.
The UAB and Stanford cohorts were enrolled using the same study protocol, but because of the substantial differences in their demographic and clinical characteristics (Table S1), we treated them as separate case-control analyses. Study participants self-collected vaginal swabs weekly (897 swabs from 39 women in the Stanford cohort; 1,282 from 96 women in UAB) by sampling material from the midvaginal wall (Fig. S7). Controls were women who delivered at term; cases were women who delivered preterm, i.e., <37 wk of gestation (Table S2). See SI Materials and Methods for further details.
Sampling regimen in the Stanford and UAB cohorts. Points show vaginal swab samples. Parentheses indicate the gestational week of delivery. The median week of sampling in the Stanford cohort was 24, and the interquartile range (IQR) was 17–31 wk. The median week of sampling in the UAB cohort was 27, and the IQR was 21–32 wk.
DNA Sequencing.
The V4 hypervariable region of the 16S rRNA gene was PCR-amplified from genomic DNA extracted from vaginal swabs. Amplicons were sequenced on an Illumina HiSeq 2500 platform (2 × 250 bp), which yielded a median sequencing depth of ∼200,000 reads per sample. See SI Materials and Methods for details.
Bioinformatics.
DADA2 was used to infer the amplicon sequence variants (ASVs) present in each sample (39). Exact sequence variants provide a more accurate and reproducible description of amplicon-sequenced communities than is possible with OTUs defined at a constant level (97% or other) of sequence similarity (30). See SI Materials and Methods for details.
Statistical Analysis.
Sequence variants were filtered before analysis if observed in fewer than three samples or with 100 reads study-wide (≈10−7 frequency). Statistical testing was performed on average gestational frequencies—the mean frequency of a taxon across all samples from the same gestation—in the R software environment (40). Based on prior hypotheses, associations between PTB and decreased frequencies of Lactobacillus, and increased frequencies of all other taxa, were tested by the one-sided Wilcoxon rank-sum method. Testing was performed separately for the Stanford and UAB cohorts, composite P values were calculated by Fisher’s method, and the FDR was controlled by the method of Benjamini and Hochberg (41).
MDS ordination plots based on the Bray–Curtis dissimilarity were generated with the phyloseq R package (42). Odds ratios and their confidence intervals were evaluated with the epitools R package (43). Figures were created with the ggplot2 R package (44).
Data Availability.
Raw sequence data have been deposited at SRA under accession no. SRP115697. The processed sequence table and R scripts implementing the bioinformatics and analysis workflows are available from the Stanford Digital Repository (https://purl.stanford.edu/yb681vm1809), and the R markdown file in Dataset S1.
Acknowledgments
We thank study participants, as well as Cele Quaintance, Anna Robaczewska, March of Dimes Prematurity Research Center study coordinators, and nursing staff in the obstetrical clinics and the labor and delivery unit of Lucille Packard Children’s Hospital. This research was supported by the March of Dimes Prematurity Research Center at Stanford University, the Stanford Child Health Research Institute, the Bill and Melinda Gates Foundation, the Gabilan Fellowship (to S.P.H.), and the Thomas C. and Joan M. Merigan Endowment at Stanford University (to D.A.R.).
Footnotes
↵1B.J.C. and D.B.D. contributed equally to this work.
- ↵2To whom correspondence should be addressed. Email: relman{at}stanford.edu.
Author contributions: B.J.C., D.B.D., S.P.H., and D.A.R. designed research; B.J.C., D.B.D., D.S.A.G., C.L.S., E.K.C., P.J., J.R.B., R.J.W., M.L.D., G.M.S., D.K.S., S.P.H., and D.A.R. performed research; B.J.C., D.B.D., P.J., S.P.H., and D.A.R. contributed new reagents/analytic tools; B.J.C., D.B.D., D.S.A.G., C.L.S., E.K.C., P.J., R.J.W., S.P.H., and D.A.R. analyzed data; and B.J.C., D.B.D., D.S.A.G., C.L.S., E.K.C., J.R.B., R.J.W., M.L.D., G.M.S., D.K.S., S.P.H., and D.A.R. wrote the paper.
The authors declare no conflict of interest.
This article is a PNAS Direct Submission.
Data deposition: The sequences reported in this paper have been deposited in the NCBI Sequence Read Archive, https://www.ncbi.nlm.nih.gov/sra (accession no. SRP115697).
This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1705899114/-/DCSupplemental.
Freely available online through the PNAS open access option.
http://www.pnas.org/site/misc/userlicense.xhtmlReferences
- ↵.
- Romero R,
- Dey SK,
- Fisher SJ
- ↵
- ↵.
- Han YW,
- Shen T,
- Chung P,
- Buhimschi IA,
- Buhimschi CS
- ↵
- ↵
- ↵
- ↵
- ↵.
- Ide M,
- Papapanou PN
- ↵
- ↵
- ↵.
- DiGiulio DB, et al.
- ↵
- ↵.
- Brown R, et al.
- ↵
- ↵.
- Ravel J, et al.
- ↵
- ↵.
- Gill C,
- van de Wijgert JH,
- Blow F,
- Darby AC
- ↵
- ↵.
- Frank JA, et al.
- ↵
- ↵.
- Ahmed A, et al.
- ↵
- ↵
- ↵.
- Foxman B, et al.
- ↵
- ↵.
- Costello EK, et al.
- ↵.
- Fettweis JM, et al., Vaginal Microbiome Consortium
- ↵
- ↵.
- Brocklehurst P,
- Gordon A,
- Heatley E,
- Milan SJ
- ↵
- ↵.
- Petrova MI,
- Reid G,
- Vaneechoutte M,
- Lebeer S
- ↵.
- Verstraelen H, et al.
- ↵
- ↵
- ↵
- ↵
- ↵
- ↵
- ↵
- ↵.
- R Core Team
- ↵.
- Benjamini Y,
- Hochberg Y
- ↵
- ↵.
- Aragon TJ
- ↵.
- Wickham H
- ↵
- ↵.
- Wang Q,
- Garrity GM,
- Tiedje JM,
- Cole JR
- ↵
- ↵.
- Eisen MB,
- Spellman PT,
- Brown PO,
- Botstein D
- ↵
Citation Manager Formats
Sign up for Article Alerts
Article Classifications
- Biological Sciences
- Microbiology