From the Cover
BIOLOGICAL SCIENCES / MICROBIOLOGY
Variation in virulence among clades of Escherichia coli O157:H7 associated with disease outbreaks






*Microbial Evolution Laboratory, National Food Safety and Toxicology Center, Michigan State University, East Lansing, MI 48824;
Division of Infectious Diseases, University of Medicine and Dentistry of New Jersey, Newark, NJ 07103;
Bureau of Laboratories, Michigan Department of Community Health, Lansing, MI 48909;
National Center for Food Safety and Technology, Illinois Institute of Technology, Summit, IL 60501; and ¶Foodborne and Diarrheal Diseases Branch, National Center for Infectious Diseases, Centers for Disease Control and Prevention, Atlanta, GA 30333
Edited by Masatoshi Nei, Pennsylvania State University, University Park, PA, and approved January 25, 2008 (received for review November 16, 2007)
| Abstract |
|---|
|
|
|---|
pathogens | polymorphisms | population genetics
It is not clear why outbreaks of EHEC O157 vary dramatically in the severity of illness and the frequency of the most serious complication, hemolytic uremic syndrome (HUS) (10–12). The 1993 outbreak in western North America (4) and the large 1996 outbreak in Japan (13) had low rates of hospitalization and HUS (14, 15), whereas the 2006 North American spinach outbreak (8) had high rates of both hospitalization (>50%) and HUS (>10%). One hypothesis is that outbreak strains differ in virulence as a result of variation in the presence and expression of different Shiga toxin (Stx) gene combinations (16–19).
To assess the genetic diversity and variability in virulence among E. coli O157 strains, we developed a real-time PCR system for identifying synonymous and nonsynonymous mutations as SNPs (20–23). Although molecular subtyping methods, such as pulsed-field gel electrophoresis (PFGE), reveal extensive genomic diversity among O157 outbreaks, "DNA fingerprinting" data are not amenable to population genetic or phylogenetic analyses. PFGE analysis has demonstrated that differences between O157 strains result from discrete insertions or deletions that contribute to restriction site changes between strains rather than SNPs (24). Comparison of multiple O157 genomes has shown that bacteriophage variation is a major factor in generating genomic diversity (25) and presumably underlies most genomic variability detected by PFGE (24, 26). In contrast, the systematic analysis of SNPs, also useful for outbreak investigations, can resolve closely related bacterial genotypes, provide insights into the microevolutionary history of genome divergence (20, 27), and contribute to an epidemiologic assessment of associations between bacterial genotypes and disease. Here we genotyped >500 clinical strains of EHEC O157 based on 96 SNPs that separated strains into genetically distinct groups and sequenced the genome of the O157 strain implicated in the spinach outbreak. These data form a basis for addressing how EHEC O157 has diversified and evolved in genome content and for assessing intrinsic differences among O157 lineages with regard to clinical presentation and disease severity.
| Results |
|---|
|
|
|---|
|
), a measure of the degree of polymorphism within the O157 population, is 0.212 ± 0.199, indicating that two strains selected at random differ on average at ≈20% of SNP loci (Fig. 1B). The minimum evolution (ME) algorithm, which selects as the best tree the one with the smallest sum of branch length estimates among all possible trees (32), revealed nine clusters among the 39 genotypes (Fig. 1C). Eight of the nine clusters are significant (multiple SGs grouped with >85% bootstrap support). The deepest node in the ME phylogeny occurs at 15 SNP locus differences and separates a lineage that includes ancestral O157 strains and close relatives with wild-type E. coli phenotypes (i.e., GUD+; sorbitol-positive, Sor+) from the evolutionarily derived lineages (GUD– and Sor–) (Fig. 1C).
Neighbor-net Resolves Clades. Subsequent analyses of the 39 SG profiles revealed phylogenetically informative loci, defined as loci with two variants that are each found in two or more SGs. Among the 96 SNP loci, 71 sites had complete data, and, of these, there were 23 singletons and 48 parsimony-informative (PI) sites. The 48 PI sites were used to construct a Neighbor-net tree (33) to determine whether the informative sites support conflicting phylogenies or a single tree (Fig. 2). In this analysis, the 39 SGs were resolved into 25 distinct nodes: 10 nodes contained two or more SGs with the same profiles across all 48 loci (Fig. 2). Clade 9 roots the phylogenetic network because it includes strains with wild-type E. coli phenotypes (e.g., GUD+ and Sor+), characteristics of the lineage ancestral to the derived EHEC O157 lineages (e.g., GUD– and Sor–) (31, 34). Rather than producing a unique bifurcating tree, the Neighbor-net reveals a central group of four clades (clades 3, 4, 5, and 7) connected by multiple paths. The presence of these parallel paths suggests that either recombination or recurrent mutation has contributed to the divergence of the central clades from the evolutionarily derived lineages. In contrast, clades 1, 2, 6, and 8 occur at the end of distinct branches with no evidence of conflicting phylogenetic signals, indicating that these lineages are diverging without evidence of recombination in background polymorphisms.
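The site classification used for the Neighbor-net input can be illustrated with a minimal Python sketch. The profiles below are hypothetical toy data, not the study's SNP matrix, and the function name and the `?` missing-data symbol are assumptions made for illustration. A site is treated as informative when at least two variants each occur in two or more SGs, as a singleton when it is variable but fails that criterion, and as incomplete when any call is missing.

```python
from collections import Counter

def classify_sites(profiles, missing="?"):
    """Label each SNP column as incomplete, invariant, singleton, or informative."""
    n_sites = len(next(iter(profiles.values())))
    labels = []
    for i in range(n_sites):
        column = [profile[i] for profile in profiles.values()]
        if missing in column:
            labels.append("incomplete")  # sites without complete data are set aside
            continue
        counts = Counter(column)
        # Number of variants carried by two or more genotypes
        shared_variants = sum(1 for c in counts.values() if c >= 2)
        if len(counts) == 1:
            labels.append("invariant")
        elif shared_variants >= 2:
            labels.append("informative")
        else:
            labels.append("singleton")
    return labels

# Hypothetical toy profiles (one character per SNP locus)
toy_profiles = {
    "SG-a": "AACG?",
    "SG-b": "AATGC",
    "SG-c": "GATAC",
    "SG-d": "GATAC",
}
site_labels = classify_sites(toy_profiles)
print(site_labels)
```

In this toy set, only the first and fourth columns would be passed on to a network analysis.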
|
|
65%) of 519 of the 528 O157 strains tested, the distribution is highly nonrandom across clades (Fig. 3B). The stx1 gene was common in clade 2 strains (95.1% of all stx1-positive strains are in clade 2) but not clade 8 (3.7%). The stx2 gene was present in virtually all (98.5%) O157 strains evaluated (Fig. 3B), occurring most frequently in clade 2 (46.8% of 519 strains) and clade 8 (25.4%) strains. In total, 98.4% and 100% of clade 2 and clade 8 strains, respectively, were positive for stx2 (Fig. 3B). The stx2c gene also has a nonrandom distribution and is concentrated in clades 4, 6, 7, and 8 (Fig. 3B) but is missing from clades 1, 2, and 3. Most noteworthy is that clade 8 strains were significantly more likely to have both the stx2 and stx2c genes when compared with the other stx2c-positive clades (P < 0.0001); 69 of the 79 O157 strains positive for both the stx2 and stx2c genes belonged to clade 8, but not all (57.6%) of the 128 clade 8 strains had stx2c.
Virulence Differences Between O157 Clades.
Clade 1 contains two SGs and includes the O157 genome strain Sakai (29) (SG-1), implicated in the 1996 Japanese outbreak (Table 1) linked to radish sprouts (13). Clade 2, the predominant lineage identified, contains nine SGs and includes strain 93-111 (SG-9) from the 1993 outbreak associated with contaminated hamburgers in western North America (4). Clade 3 consists of seven genotypes and includes the genome strain EDL-933 (30) (SG-12) from the first human O157 outbreak in 1982 linked to hamburgers sold at a chain of fast food restaurant outlets in Michigan and Oregon (36). Although these outbreaks representing clades 1, 2, and 3 affected ≈12,000 people combined, the rate of HUS and hospitalization was low for each (4, 14, 15, 36) compared with the average rates for 350 North American outbreaks (3) (Table 1). Clade 8, in contrast, consists of five SGs that include O157 strains from multistate outbreaks linked to contaminated spinach (37) and lettuce (7) (SG-30) in North America. These 2006 outbreaks caused reportable illnesses in >275 patients and resulted in remarkably high rates of hospitalization (average 63%) and HUS (average 13%), a rate that is three times greater than the average HUS rate for 350 outbreaks (Table 1).
|
Among the 4,103 shared backbone genes within the Sakai and spinach genomes, the average sequence identity is 99.8%, and, of the 958 shared island genes with Sakai, the average sequence identity is 97.96%. The average sequence identity for all shared genes (n = 5,061) is 99.25%. We then compared the conservation of backbone genes and identified 2,741 shared genes with <0.5% nucleotide divergence among all three O157 genomes (SI Fig. 5). Interestingly, the Sakai and EDL-933 genomes are more similar to each other in gene content and nucleotide sequence identity than to the clade 8 spinach outbreak strain, which carries additional genetic material including stx2c and the Stx2c lysogenic bacteriophage 2851 (39). This suggests that the spinach outbreak genome and, by inference, clade 8 have had substantial time to diverge in genetic composition from strains of other lineages.
Association Between Clades and Severe Disease. To determine whether the O157 infections caused by clade 8 pathogens differ with respect to clinical presentation, we examined epidemiological data for all laboratory-confirmed O157 cases (n = 333 patients) identified in Michigan since 2001 (40). There are significant associations between specific O157 clades and patient symptoms as well as disease severity via univariate (SI Table 3) and multivariate (SI Table 4) analyses. Patients infected with O157 strains of clade 8 were significantly more likely to be younger (ages 0–18), and, despite the small number (n = 11) of HUS cases identified, HUS patients were seven times more likely to be infected with clade 8 strains than patients with strains from clades 1–7 combined (Fig. 4). This HUS association could not be explained by the presence of stx2c in clade 8 strains, because only four of 11 HUS patients had stx2c-positive strains.
|
Clade Frequencies over Time.
Because both the 2006 spinach and lettuce outbreaks were caused by members of the same SG within clade 8, we estimated the frequency of clade 8 over time in an epidemiologically relevant setting. There was a significant increase (Mantel–Haenszel χ² = 32.5, df = 1, P < 0.0001) in the frequency of disease caused by clade 8 strains among all 444 O157 cases in Michigan (SI Fig. 6). Specifically, the frequency of clade 8 strains increased from 10% in 2002 to 46% in 2006 despite the steady decrease in all O157 cases identified via surveillance (40) since 2002 (SI Fig. 6).
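A trend test of this kind can be sketched in Python. The sketch assumes the common formulation of the Mantel–Haenszel chi-square for linear association, (N − 1)·r², where r is the Pearson correlation between the year score and the binary outcome (clade 8 or not); the source does not state which software or scoring was used. The yearly counts below are hypothetical and are not the Michigan surveillance data.

```python
import math

def mantel_haenszel_trend(scores, cases, totals):
    """Mantel-Haenszel chi-square (1 df) for a 2 x k table, computed as (N - 1) * r**2."""
    n = sum(totals)
    sum_x = sum(s * t for s, t in zip(scores, totals))
    sum_xx = sum(s * s * t for s, t in zip(scores, totals))
    sum_y = sum(cases)
    sum_xy = sum(s * c for s, c in zip(scores, cases))
    cov = sum_xy - sum_x * sum_y / n
    var_x = sum_xx - sum_x ** 2 / n
    var_y = sum_y - sum_y ** 2 / n  # y**2 == y for a binary outcome
    chi2 = (n - 1) * cov ** 2 / (var_x * var_y)
    # Upper-tail probability of a chi-square variable with 1 degree of freedom
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Hypothetical counts: clade 8 cases out of all O157 cases per year
years = [2002, 2003, 2004, 2005, 2006]
clade8_cases = [10, 15, 25, 35, 46]
all_cases = [100, 100, 100, 100, 100]

chi2, p_value = mantel_haenszel_trend(years, clade8_cases, all_cases)
print(f"chi-square = {chi2:.1f}, P = {p_value:.2g}")
```

Because the statistic depends only on the correlation, shifting the year scores (for example, using 0 to 4 instead of calendar years) leaves the result unchanged.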
| Discussion |
|---|
|
|
|---|
Substantial variability in clinical presentation also has been observed among patients with EHEC O157 infections. This variation is even apparent among different O157 outbreaks, as some outbreaks have contributed to remarkably high frequencies of HUS and hospitalization relative to others (Table 1). Consequently, we hypothesize that there is extensive variation in virulence among distinct clades of O157. The genetic basis of virulence that contributes to variation by clade will require further investigation, as will assessing the ecological differences that contribute to variation in transmission rates and linkage to food and waterborne disease.
The evaluation of >500 O157 strains from clinical sources for up to 96 SNP loci highlights the degree of genetic variation among strains and identifies a specific O157 lineage (clade 8) that has increased in frequency (SI Fig. 6). This increase in clade 8 is surprising given that, at the same time, the overall national prevalence of EHEC O157 infections has been decreasing (45). Strains of the clade 8 lineage have caused two recent and unusually severe outbreaks linked to produce, are associated with HUS, and more frequently carry both the stx2 and stx2c genes. In concert, these results suggest that a more virulent subpopulation of EHEC O157 is increasing in its contribution to the overall disease burden associated with O157 infections. Although there are clear differences in the frequency and combination of stx genes among clades, the toxin–gene combination alone does not account for the variation in hospitalization and HUS rates by clade.
The observation that clade 8 strains more frequently have both the stx2 and stx2c genes implies that carriage of both the Stx2 and Stx2c phages contributes in part to the greater virulence of clade 8 strains. The Stx genes, encoded by lambda-like bacteriophages, can circulate among hundreds of different E. coli strains (46) and integrate into many sites in the O157 genome (25, 44). Previous studies have observed correlations between specific Stx genes and disease, particularly for stx2 and stx2c (18, 19), although it has not been suggested that having both variants together may increase virulence. Because not all clade 8 strains have both stx2 and stx2c, and none of the strains has only stx2c, the presence and presumable production of the Stx2c variant alone cannot be solely responsible for the enhanced virulence attributed to this lineage. This also is true for the production of Stx2, because it was detected in nearly every strain representing all nine clades. We cannot, however, rule out the possibility that stx2c is rapidly lost during infection, thereby limiting our ability to detect it in some strains. What accounts for the greater intrinsic virulence among clade 8 strains and other O157 genotypes is not fully understood. There is a constellation of mobile genetic elements that contribute to the virulence of pathogenic E. coli (47), and it is possible that a novel combination of virulence factors has emerged in the clade 8 lineage. The extent to which other ancillary factors contribute to the full virulence of clade 8 strains requires further investigation.
Among the three most common clades (2, 7, and 8) examined, there are noteworthy differences in transmission and clinical disease characteristics (SI Table 3) in addition to the association between clade 8 and HUS. For example, patients infected with strains from both clades 2 and 8 reported bloody diarrhea more frequently when compared with patients with clade 7 infections. Furthermore, clades 7 and 8 were more common among female patients, and clade 8 was associated with disease in younger (<18 years) patients (Fig. 4). These observed differences among patients with O157 infections clearly reflect differences among the common clades that can result from variability in gene content or genetic variation in conserved, common genes. The sequence comparisons of the spinach outbreak genome (clade 8) with the two other complete genomes (clades 1 and 3) indicate that there has been sufficient evolution time for 5% mutational substitution (10% differences in sequence of 2,741 conserved genes). This is consistent with a study by Zhang et al. (23) that estimated the most recent ancestor for EHEC O157 strains in clades 1–8 (β-glucuronidase-negative, non-sorbitol-fermenting) to be ≈20 thousand years ago (based on the assumed rate of 4.7 × 10⁻⁹ per site per year).
To determine when specific clades first appeared in human disease and to assess whether clade 8 has increased in frequency among strains recovered outside of Michigan, we evaluated a subset of O157 strains isolated during different time periods. Through this screening, we identified clade 8 strains from clinical cases dating back to 1984 on multiple continents (SI Table 2), suggesting that clade 8 has not recently emerged. This result was confirmed by both the spinach outbreak genome (Fig. 4) and phylogenetic analyses (Fig. 1B), because clade 8 is more closely related to the evolutionarily ancestral O157 lineage (clade 9) than other lineages. In contrast to clade 8 strains from Michigan patients, the frequency of stx2c with or without stx2 did not increase over time, and stx2c was detected in a strain isolated in 1984, indicating that it, too, has not recently emerged.
It is clear that EHEC O157 is genetically diversified and comprises multiple detectable clades with substantial genomic, biological, and epidemiological variation. SNP genotyping has revealed the clades that reflect the genetic variability among pathogenic strains associated with clinical infection. These results support the hypothesis that the clade 8 lineage has recently acquired novel factors that contribute to enhanced virulence. Evolutionary changes in the clade 8 subpopulation could explain its emergence in several recent foodborne outbreaks; however, it is not clear why this virulent subpopulation is increasing in prevalence. Because humans are more an incidental host for EHEC O157, further investigation of the bovine reservoir (48, 49) and environment is critical, as is the evaluation of agricultural practices in areas where livestock and produce are farmed side by side. Identifying the underlying factors that lead to enhanced virulence and the successful transmission of EHEC O157 in contaminated food and water is imperative. Similarly, conducting large-scale molecular epidemiologic studies is necessary to assess the actual distribution of SGs, clades, and Stx variants in environmental reservoirs and broad geographic scales (50). The development and deployment of a rapid, inexpensive molecular test that can identify more virulent O157 subtypes also would be useful for clinical laboratories to identify patients with an increased likelihood of developing HUS.
| Materials and Methods |
|---|
|
|
|---|
SNP Loci and Real-Time PCR Assays. The 96 SNP loci (SI Table 5) were identified from data generated by comparative genome sequencing microarrays (23), multilocus sequence typing (28), virulence gene sequencing, and in silico comparisons of the two O157 genomes (29, 30). Hairpin-shaped primers were designed by adding a 5' tail complementary to the 3' end of each linear primer (22) for each locus, and real-time PCR was used to identify the SNP. Six strains were duplicated to serve as internal controls; identical SNP profiles were observed.
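The hairpin primer construction can be sketched as follows. The primer sequence and the tail length of five bases are hypothetical choices for illustration; the source does not give the tail length, and degenerate bases are not handled here. The 5' tail is taken to be the reverse complement of the primer's 3'-terminal bases so that the two ends can pair when the oligonucleotide folds back on itself.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of an unambiguous DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]

def hairpin_primer(linear_primer, tail_length=5):
    """Prepend a 5' tail that is complementary to the primer's 3' end."""
    three_prime_end = linear_primer[-tail_length:]
    tail = reverse_complement(three_prime_end)
    return tail + linear_primer

# Hypothetical linear primer, written 5' to 3'
linear = "GATTACAGGCTTACG"
hairpin = hairpin_primer(linear, tail_length=5)
print(hairpin)  # tail followed by the original primer
```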
To reduce the number of SNP assays, we used the SNPT program (21), which identified a subset of 32 loci to delineate all 39 SGs. Additional assays were performed to confirm certain SGs. The final set of 32 loci was obtained by substituting three SNP loci that resolved SNP types 35–39 and adding three different loci for classifying SGs 1–34.
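The idea of reducing a SNP panel to a subset that still separates every genotype can be sketched with a simple greedy heuristic. This is not the SNPT program's algorithm, which the text does not describe; it is a stand-in under that stated assumption, and the profiles are hypothetical toy data.

```python
from itertools import combinations

def greedy_locus_subset(profiles):
    """Choose loci greedily until every pair of genotypes differs at a chosen locus."""
    names = list(profiles)
    n_loci = len(profiles[names[0]])
    unresolved = set(combinations(names, 2))
    chosen = []

    def resolved_by(locus):
        # Pairs still unresolved that this locus would tell apart
        return {(a, b) for a, b in unresolved if profiles[a][locus] != profiles[b][locus]}

    while unresolved:
        best = max(range(n_loci), key=lambda locus: len(resolved_by(locus)))
        newly_resolved = resolved_by(best)
        if not newly_resolved:
            raise ValueError("some genotypes are identical at every locus")
        chosen.append(best)
        unresolved -= newly_resolved
    return sorted(chosen)

# Hypothetical toy profiles (one character per SNP locus)
toy_profiles = {"SG-a": "AACA", "SG-b": "AATA", "SG-c": "GATA", "SG-d": "GATG"}
subset = greedy_locus_subset(toy_profiles)
print(subset)  # zero-based indexes of the retained loci
```

A greedy choice is not guaranteed to find the smallest possible subset, but it always returns one that resolves all genotypes when such a subset exists.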
Phylogenetic Analyses. Distance between SGs was measured as the pairwise number of nucleotide differences. ME trees were used to infer the evolutionary relationships among the 39 SGs based on a pairwise distance matrix for concatenated SNP data using MEGA3 (51). Bootstrap analysis of the ME trees was also performed in MEGA3 (51), and bootstrap confidence levels (based on 1,000 replicate trees) were used to classify SGs into clades. A phylogenetic network based on the Neighbor-net algorithm (33) was constructed from the 48 PI sites by using the SplitsTree4 program (52).
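The distance measure described here, the number of loci at which two genotypes differ, can be sketched in Python. The profiles are hypothetical toy data and missing calls are not handled; tree building itself was done in MEGA3 and is not reproduced.

```python
from itertools import combinations

def pairwise_differences(profiles):
    """Count the loci at which each pair of concatenated SNP profiles differs."""
    distances = {}
    for a, b in combinations(sorted(profiles), 2):
        distances[(a, b)] = sum(x != y for x, y in zip(profiles[a], profiles[b]))
    return distances

# Hypothetical toy profiles (one character per SNP locus)
toy_profiles = {"SG-a": "AACA", "SG-b": "AATA", "SG-c": "GATA", "SG-d": "GATG"}
n_loci = 4

distances = pairwise_differences(toy_profiles)
# Average proportion of loci that differ between two genotypes
mean_fraction = sum(distances.values()) / len(distances) / n_loci
print(distances)
print(round(mean_fraction, 3))
```

The mean of these pairwise values, expressed per locus, corresponds to the kind of average difference between two randomly chosen genotypes that a diversity estimate summarizes.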
Spinach Outbreak Strain Genomic Analysis.
A culture isolated from a Michigan patient hospitalized in September 2006, linked by the PulseNet PFGE system (53) to the spinach outbreak pattern by the Michigan Department of Community Health and the Centers for Disease Control and Prevention, was sequenced. The Michigan State University Genomic Research Support Technical Facility used parallel pyrosequencing on the GS20 454 that included four standard sequencing runs and one paired-end run. The final assembly had 201 large contigs (>500 nt) with ≈20 times coverage arranged into 79 scaffolds with a total of 5,307,096 nt, and 680 small contigs for a total of 213,699 nt (4% of the total assembled length). Contig alignments to published genomes [Sakai (29) and EDL-933 (30)] were conducted by MUMmer (38). Sakai/EDL-933 genes with at least one alignment of >90% nucleotide identity in the spinach genome were considered present in the spinach strain.
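The assembly arithmetic and the gene-presence rule can be checked with a short Python sketch. The contig totals come from the text; the function name and the example identity values are hypothetical, and parsing of the MUMmer output is not shown.

```python
# Totals reported for the assembly, in nucleotides
large_contig_nt = 5_307_096
small_contig_nt = 213_699

total_nt = large_contig_nt + small_contig_nt
small_fraction = small_contig_nt / total_nt
print(f"small contigs: {small_fraction:.1%} of {total_nt:,} nt")

def gene_present(alignment_identities, threshold=90.0):
    """A reference gene counts as present if any alignment exceeds the identity threshold."""
    return any(identity > threshold for identity in alignment_identities)

# Hypothetical percent identities for alignments of two reference genes
print(gene_present([72.4, 99.6]))  # one alignment above 90%
print(gene_present([85.0, 88.9]))  # no alignment above 90%
```

The small contigs account for about 3.9% of the assembled length, which rounds to the 4% stated above.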
To evaluate the distribution of SNPs in the spinach genome, a strict set of comparison rules was applied. Conserved genes were included only if the alignment was 100% unique in both genomes (i.e., multicopied genes in either genome were excluded), the identity between the aligned regions was >90%, and the alignment region was >90% of the length of Sakai/EDL-933 genes. Insertions and deletions were excluded. A total of 2,741 genes that fit these criteria and occurred in all three genomes were compared to identify SNP differences. A map was plotted by GenomeViz (54).
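The comparison rules can be expressed as a filter, sketched here in Python. The record type, its field names, and the example values are hypothetical; the exclusion of insertions and deletions within alignments is not modeled.

```python
from dataclasses import dataclass

@dataclass
class GeneAlignment:
    gene: str
    gene_length: int       # length of the Sakai/EDL-933 gene, nt
    aligned_length: int    # length of the aligned region, nt
    identity: float        # percent nucleotide identity of the aligned region
    copies_reference: int  # matching copies in the reference genome
    copies_query: int      # matching copies in the spinach genome

def passes_rules(a):
    """Apply the uniqueness, identity (>90%), and coverage (>90% of gene length) rules."""
    unique = a.copies_reference == 1 and a.copies_query == 1
    covered = a.aligned_length > 0.9 * a.gene_length
    return unique and a.identity > 90.0 and covered

# Hypothetical alignment records
records = [
    GeneAlignment("geneA", 1000, 990, 99.8, 1, 1),   # passes all rules
    GeneAlignment("geneB", 1000, 850, 99.0, 1, 1),   # aligned region too short
    GeneAlignment("geneC", 1200, 1200, 88.0, 1, 1),  # identity too low
    GeneAlignment("geneD", 900, 900, 99.9, 2, 1),    # multicopy in one genome
]
kept = [r.gene for r in records if passes_rules(r)]
print(kept)
```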
Stx2c Detection. Multiplex PCR was used to detect stx2c and the Stx2c-phage o and q genes (39) in 519 strains; stx data were missing for 19 strains, four of which were repeatedly stx-negative. The malate dehydrogenase (mdh) gene was used as a positive control. Strains were considered positive for stx2c if mdh (835 bp), stx2c (182 bp), o (533 bp), and q (321 bp) were present.
The multiplex PCR does not distinguish between stx2 and stx2c [the two genes differ by only 3 aa in the B subunit (55)]; thus, we developed a RFLP-based method that amplifies a larger PCR product (1,152 bp) using primers stx2_F61 (5'-TATTCCCRGGARTTTAYGATAGA-3') and stx2-2g_R1213 (5'-ATCCRGAGCCTGATKCACAG-3'). PCR conditions include a 10-min soak at 94°C and 35 cycles of 92°C for 1 min, 59°C for 30 sec, and 72°C for 1 min, followed by a 5-min soak at 72°C. Digestion with FokI at 37°C for 3 h yields banding patterns specific for stx2 (453 bp, 362 bp, 211 bp, and 126 bp) or stx2c (488 bp, 453 bp, and 211 bp). All bands from each pattern are visible in strains with both stx2 and stx2c.
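Interpretation of the FokI banding patterns can be sketched in Python. The fragment sizes are those listed above; the function name is hypothetical, and exact size matching is an assumption, since real gel readings would need a size tolerance. Each pattern sums to the 1,152-bp amplicon, and 362 + 126 = 488, which is consistent with stx2c lacking one FokI site that is present in stx2.

```python
AMPLICON_BP = 1152
PATTERNS = {
    "stx2": (453, 362, 211, 126),
    "stx2c": (488, 453, 211),
}

def call_variants(observed_bands):
    """Return the variants whose full FokI pattern is contained in the observed bands."""
    bands = set(observed_bands)
    return sorted(name for name, pattern in PATTERNS.items() if set(pattern) <= bands)

# Fragment sizes of each pattern should add up to the undigested amplicon
for name, pattern in PATTERNS.items():
    print(name, sum(pattern) == AMPLICON_BP)

print(call_variants([453, 362, 211, 126]))       # stx2 only
print(call_variants([488, 453, 211]))            # stx2c only
print(call_variants([488, 453, 362, 211, 126]))  # both variants
```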
Epidemiological Analyses.
We tested for differences in the frequency of clinical characteristics for Michigan patients using the likelihood χ² test and described the distributions using odds ratios with 95% confidence intervals. Clade 9 was omitted from the analysis, as was one strain not part of a clade. To adjust for factors associated with infection by clade, we fit logistic regression models adjusting for age, gender, and symptoms. The final epidemiologic analysis was limited to 333 of the 444 Michigan patients, because only one strain from each outbreak or cluster was included.
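An odds ratio with a 95% confidence interval can be computed from a 2 × 2 table as in the Python sketch below. The interval uses the Woolf (log-scale) method, which is an assumption because the text does not state how the intervals were obtained, and the counts are hypothetical rather than the Michigan data.

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio for the table [[a, b], [c, d]] with a Woolf (log-scale) 95% CI."""
    odds_ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper

# Hypothetical counts: rows are exposure (clade of interest vs. other clades),
# columns are outcome (symptom present vs. absent)
odds_ratio, lower, upper = odds_ratio_ci(a=12, b=88, c=8, d=192)
print(f"OR = {odds_ratio:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

An interval that excludes 1 indicates an association at the 5% level under this approximation; with small cell counts an exact method would be a more cautious choice.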
| Acknowledgments. |
|---|
|
|
|---|
| Footnotes |
|---|
Freely available online through the PNAS open access option.
Author contributions: D.A. and T.S.W. designed research; S.D.M., A.S.M., A.C.S., L.M.O., and J.M.M. performed research; S.D.M., A.S.M., W.Q., D.W.L., P.S., J.T.R., S.E.D., W.Z., and B.S. analyzed data; and S.D.M. and T.S.W. wrote the paper.
The authors declare no conflict of interest.
This article is a PNAS Direct Submission.
Data deposition: The SNPs of E. coli O157 reported in this paper have been deposited in the STEC Center database, www.shigatox.net.
See Commentary on page 4535.
This article contains supporting information online at www.pnas.org/cgi/content/full/0710834105/DC1.
© 2008 by The National Academy of Sciences of the USA
| References |
|---|
|
|
|---|
dms/spinacqa.html#howmany accessed 2007.