Makarova et al. 10.1073/pnas.0607117103.
Fig. 4. Phylogenetic tree of Lactobacillales constructed on the basis of concatenated alignments of RNA polymerase subunits. All branches are supported at >75% bootstrap values. Species are colored according to the current taxonomy: blue, Lactobacillaceae; magenta, Leuconostocaceae; red, Streptococcaceae.
Fig. 5. Proportion of genes (56%) that evolve in the same mode as ribosomal proteins.
Fig. 6. Phylogenetic analysis of enolase. The maximum-likelihood unrooted tree was built with the MOLPHY program. The same program was used to compute bootstrap probabilities. Each terminal node of the tree is labeled by the numeric GenBank identifier (GI) number (where available) and the respective species name. Major branches of interest that were supported by bootstrap probability >70% are marked by black circles. The species analyzed in this work are shown in blue.
Fig. 7. A schematic representation of the reconstruction of key metabolic pathways associated with central carbon (carbohydrate) metabolism in lactic acid bacteria. Black arrows show reactions present in all species; pale blue arrows show reactions present in all species except one; and red arrows show reactions present in a smaller subset of species. For the latter category, phyletic patterns are indicated (the detailed information for all LaCOGs associated with this figure is provided in Table 7). In the phyletic patterns, "|" indicates the presence of the gene in a given species and "-" indicates absence. Species in the phyletic pattern are shown in the order Streptococcus thermophilus, Lactococcus lactis ssp. lactis, Lactococcus lactis ssp. cremoris, Lactobacillus brevis, Lactobacillus plantarum, Pediococcus pentosaceus, Leuconostoc mesenteroides, Oenococcus oeni, Lactobacillus johnsonii, Lactobacillus gasseri, Lactobacillus delbrueckii, and Lactobacillus casei. The systematic protein names (from Escherichia coli, Bacillus subtilis, or Lactobacillales) for the enzymes assigned to each reaction are indicated. Key reactions of homo- and heterofermentation are color-coded (green and pink, respectively). Substrates that are additional precursors or products of several reactions are dark green. An additional representation of the presence or absence of key genes associated with central carbon metabolism is listed in Table 7.
+, Phyletic pattern for phosphoenolpyruvate carboxykinase, pckA, includes representatives of LaCOG2238 and a single protein from Lb. casei; *, acetoin reductase, ButA, homologs belong to the large family of short chain dehydrogenases from multiple LaCOGs. Many of these LaCOGs have unknown substrate specificity and might be involved in the same reaction as ButA; &, although there is no dedicated phosphotransferase system for lactose transport identified, it is possible that some phosphotransferase systems with wide substrate specificity can also transport lactose; ^, the presence of the system was essentially determined on the basis of the presence of this gene.
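The phyletic-pattern notation described in the legend of Fig. 7 can be decoded mechanically. The following is a minimal sketch in Python; the function name `decode_phyletic_pattern` and the example pattern are hypothetical and are not taken from the figure, whereas the species order is the one given in the legend.

```python
# Species order as listed in the legend of Fig. 7.
SPECIES_ORDER = [
    "Streptococcus thermophilus",
    "Lactococcus lactis ssp. lactis",
    "Lactococcus lactis ssp. cremoris",
    "Lactobacillus brevis",
    "Lactobacillus plantarum",
    "Pediococcus pentosaceus",
    "Leuconostoc mesenteroides",
    "Oenococcus oeni",
    "Lactobacillus johnsonii",
    "Lactobacillus gasseri",
    "Lactobacillus delbrueckii",
    "Lactobacillus casei",
]


def decode_phyletic_pattern(pattern):
    """Map a 12-character pattern of '|' (present) and '-' (absent) to species."""
    if len(pattern) != len(SPECIES_ORDER):
        raise ValueError("pattern must have one character per species")
    if set(pattern) - {"|", "-"}:
        raise ValueError("pattern may contain only '|' and '-'")
    return {species: char == "|" for species, char in zip(SPECIES_ORDER, pattern)}


# Hypothetical pattern, used only to illustrate the encoding.
example = decode_phyletic_pattern("|--|||||----")
```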
Fig. 8. Genome clusters encoding bacteriocins and genes for their export systems, including novel putative bacteriocins.
Table 1. General genome features of the sequenced genomes
Species | Genome length, bp | Plasmids (no. of genes) | No. of proteins | No. of pseudogenes | No. of rRNA operons | No. of tRNAs | No. of prophages | Transposon-related ORFs, %
Lactobacillus gasseri | 1,894,360 | 0 | 1,763 | 43 | 6 | 78 | 1 | 0.18
Lactobacillus brevis | 2,340,228 | pLVIS1 (12); pLVIS2 (25) | 2,221 | 50 | 5 | 65 | 1 | 1.40
Pediococcus pentosaceus | 1,832,387 | 0 | 1,757 | 19 | 5 | 55 | 2? | 0.24
Lactococcus lactis ssp. cremoris | 2,641,635 | pLACR1 (11); pLACR2 (8); pLACR3 (64); pLACR4 (38); pLACR5 (8) | 2,509 | 153 | 6 | 62 | 4? | 4.82
Streptococcus thermophilus | 1,864,178 | pSTER1 (2); pSTER2 (4) | 1,718 | 206 | 6 | 67 | 1? | 3.72
Oenococcus oeni | 1,780,517 | 0 | 1,701 | 120 | 2 | 43 | 0 | 0.54
Leuconostoc mesenteroides | 2,075,763 | pLEUM1 (34) | 2,009 | 17 | 4 | 71 | 1 | 0.51
Lactobacillus casei | 2,924,325 | pLSEI1 (20) | 2,776 | 82 | 5 | 59 | 2 | 3.25
Lactobacillus delbrueckii ssp. bulgaricus | 1,856,951 | 0 | 1,725 | 192 | 9 | 98 | 0 | 1.93
Supporting Materials and Methods
Construction of Lactobacillales COGs. Lactobacillales COGs (LaCOGs) were constructed on the basis of triangles of best-hit relationships formed by proteins from different genomes. LaCOGs that included only two organisms were constructed by using reciprocal best-hit relationships. Lineage-specific expansions in Lactobacillales species were identified essentially as described in ref. 1. LaCOGs were linked to prokaryotic COGs by using the COGNITOR approach (2).
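The two building blocks named above, reciprocal best hits and triangles of best hits, can be illustrated with a minimal Python sketch. It is not the pipeline used for LaCOG construction: the `genome:protein` identifier format, the `best_hit` dictionary layout, and the function names are assumptions made for illustration, and the sketch simplifies by requiring every edge of a triangle to be a reciprocal best hit.

```python
from itertools import combinations


def genome_of(protein):
    """Assumed identifier format: 'genome:protein'."""
    return protein.split(":", 1)[0]


def reciprocal_best_hits(best_hit):
    """best_hit maps (query protein, target genome) -> best-hit protein.

    Returns the set of unordered protein pairs that are each other's best hit.
    """
    pairs = set()
    for (query, _target_genome), hit in best_hit.items():
        if best_hit.get((hit, genome_of(query))) == query:
            pairs.add(frozenset((query, hit)))
    return pairs


def best_hit_triangles(best_hit):
    """Triples of proteins from three genomes joined pairwise by reciprocal best hits."""
    rbh = reciprocal_best_hits(best_hit)
    proteins = sorted({p for pair in rbh for p in pair})
    triangles = []
    for a, b, c in combinations(proteins, 3):
        if len({genome_of(a), genome_of(b), genome_of(c)}) < 3:
            continue  # a triangle must span three different genomes
        if all(frozenset(edge) in rbh for edge in ((a, b), (b, c), (a, c))):
            triangles.append((a, b, c))
    return triangles


# Hypothetical best-hit table for three genomes A, B and C.
hits = {
    ("A:1", "B"): "B:1", ("B:1", "A"): "A:1",
    ("A:1", "C"): "C:1", ("C:1", "A"): "A:1",
    ("B:1", "C"): "C:1", ("C:1", "B"): "B:1",
    ("A:2", "B"): "B:1",  # one-directional hit, not reciprocal
}
```

In this toy table the three reciprocal pairs close a single triangle, whereas the one-directional hit from `A:2` joins neither a pair nor a triangle.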
Phylogenetic Reconstructions. Multiple alignments of protein sequences were constructed by using the MUSCLE program (3); sites with >33% gap content were removed. Least-squares trees were constructed with the PROTDIST and FITCH programs of the PHYLIP package (4), and maximum likelihood trees were constructed with the TREEPUZZLE program (5). The trees were tested for compatibility with the molecular clock assumption by using the LINTRE program (6). Alignments of the protein-coding nucleotide sequences were produced on the basis of the corresponding protein sequence alignments; synonymous and nonsynonymous substitutions were analyzed with the CODEML program of the PAML package, using the Yang-Nielsen-Hasegawa codon substitution model and nucleotide frequencies estimated from the data (7).
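The gap-filtering step can be sketched as follows. This is a minimal Python illustration, assuming that the alignment is held as a list of equal-length strings with "-" as the gap character; the function name `drop_gappy_columns` is hypothetical, and the filter actually applied to the MUSCLE alignments may differ in detail.

```python
def drop_gappy_columns(alignment, max_gap_fraction=0.33):
    """Remove alignment columns in which the fraction of gaps exceeds the threshold."""
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    n_seqs = len(alignment)
    keep = [
        i for i in range(length)
        if sum(seq[i] == "-" for seq in alignment) / n_seqs <= max_gap_fraction
    ]
    return ["".join(seq[i] for i in keep) for seq in alignment]


# Toy alignment: the last column has 50% gaps and is removed.
filtered = drop_gappy_columns(["MKA-", "MK--", "M-AL", "MKAL"])
# filtered == ["MKA", "MK-", "M-A", "MKA"]
```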
Whole Genome Similarity Reconstructions. Intergenome distance matrices were constructed by using several previously described measures, namely, median distance between predicted orthologs [defined as reciprocal best hits (8, 9)], similarity of the gene content [number of shared LaCOGs normalized by the size of the smaller genome (10)], and similarity of the gene order (fraction of genes covered by aligned gene chains). Alignments of gene order were constructed with the LamarckN program as described previously (11). Similarity dendrograms were constructed with a neighbor-joining algorithm from the NEIGHBOR program of PHYLIP (4).
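As an illustration of the gene-content measure, the following minimal Python sketch computes the number of shared LaCOGs normalized by the smaller genome. It assumes that each genome is summarized as the set of LaCOG identifiers it contains and that genome size is measured as the number of LaCOGs in that set; the function name and the identifiers are hypothetical.

```python
def gene_content_similarity(lacogs_a, lacogs_b):
    """Number of shared LaCOGs divided by the LaCOG count of the smaller genome."""
    a, b = set(lacogs_a), set(lacogs_b)
    smaller = min(len(a), len(b))
    if smaller == 0:
        raise ValueError("both genomes must contain at least one LaCOG")
    return len(a & b) / smaller


# Hypothetical LaCOG identifiers: two shared LaCOGs, smaller genome has three.
similarity = gene_content_similarity(
    {"LaCOG1", "LaCOG2", "LaCOG3", "LaCOG4"},
    {"LaCOG3", "LaCOG4", "LaCOG5"},
)
```

Normalizing by the smaller genome means that a genome whose gene content is a strict subset of a larger one still scores 1.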
Consistency of Local Molecular Clock. The consistency of the local molecular clock in individual LaCOGs was tested by using a previously described approach (12). Specifically, maximum likelihood distances between aligned LaCOG sequences were estimated by using the CODEML program of PAML (7); if paralogs were present, the minimum distance between the paralogs from a pair of species was used to represent the interspecies distance. The matrix of interspecies distances was compared to the matrix of distances computed for concatenated alignments of ribosomal proteins. The residual variance after a straight-line, zero-intercept approximation was compared with the complete variance of the distances; LaCOGs for which this residual variance was greater than the complete variance were considered cases of severe deviation from the local molecular clock (as defined by the rate of evolution of the ribosomal proteins). For selected cases, the presence of horizontal gene transfer (HGT) was verified by phylogenetic tree reconstruction.
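The variance comparison can be sketched in Python as follows. This is an illustration under stated assumptions rather than the procedure of ref. 12: the ribosomal-protein distances are taken as the predictor and the LaCOG distances as the response, both are supplied as flat lists over the same species pairs, and the two sums of squares share the same denominator, so that the normalization does not affect the comparison. The function name is hypothetical.

```python
def deviates_from_local_clock(ribosomal, lacog):
    """Return True if the zero-intercept fit leaves more variance than it explains.

    ribosomal, lacog: interspecies distances for the same species pairs, same order.
    """
    if len(ribosomal) != len(lacog) or len(lacog) < 2:
        raise ValueError("need two equally long lists with at least two distances")
    sum_xx = sum(x * x for x in ribosomal)
    if sum_xx == 0:
        raise ValueError("ribosomal distances must not all be zero")
    # Least-squares slope of a straight line forced through the origin.
    slope = sum(x * y for x, y in zip(ribosomal, lacog)) / sum_xx
    residual = sum((y - slope * x) ** 2 for x, y in zip(ribosomal, lacog))
    mean_y = sum(lacog) / len(lacog)
    total = sum((y - mean_y) ** 2 for y in lacog)
    return residual > total


# Hypothetical distances: proportional to the ribosomal distances (clock-like)
# versus inversely ordered (severe deviation).
clock_like = deviates_from_local_clock([0.1, 0.2, 0.3, 0.4], [0.2, 0.4, 0.6, 0.8])
deviating = deviates_from_local_clock([0.1, 0.2, 0.3, 0.4], [0.8, 0.6, 0.4, 0.2])
```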
Reconstruction of Gene Gains and Losses. For the analysis of gene losses in the common ancestor of Lactobacillales, the complement of COGs present in at least one Lactobacillales species was compared with the COGs present in other Firmicutes. The COGs present in other Firmicutes but missing in Lactobacillales were considered lost by the common ancestor of Lactobacillales. The median number of genes per COG in Bacillales and Clostridiales was used as the estimate of the number of genes in the lost COGs. Upper and lower estimates of the complement of COGs that were represented in the common ancestor of Lactobacillales and Bacillales were obtained by requiring (for the lower bound) or not requiring (for the upper bound) a COG to be present in both Bacillales and non-Bacilli Firmicutes (Clostridiales or Mollicutes).
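The identification of lost COGs and the estimate of the number of lost genes can be sketched with set operations. The Python code below is a minimal illustration; the data layout (a mapping from each reference genome to its per-COG gene counts), the treatment of a missing COG as a count of zero, and the function names are assumptions.

```python
from statistics import median


def lost_cogs(lactobacillales_cogs, other_firmicutes_cogs):
    """COGs found in other Firmicutes but in no Lactobacillales species."""
    return set(other_firmicutes_cogs) - set(lactobacillales_cogs)


def estimated_lost_genes(lost, counts_by_genome):
    """Sum, over the lost COGs, of the median gene count across reference genomes.

    counts_by_genome: genome name -> {COG id: number of genes}; a COG that is
    missing from a genome is counted as 0 (an assumption of this sketch).
    """
    return sum(
        median(counts.get(cog, 0) for counts in counts_by_genome.values())
        for cog in lost
    )


# Hypothetical COG content and reference-genome gene counts.
lost = lost_cogs({"COG1"}, {"COG1", "COG2", "COG3"})
reference_counts = {
    "genome_1": {"COG2": 2, "COG3": 1},
    "genome_2": {"COG2": 4},
    "genome_3": {"COG2": 3, "COG3": 1},
}
n_lost_genes = estimated_lost_genes(lost, reference_counts)
```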
For the analysis of gene losses within the Lactobacillales, the phyletic patterns of LaCOGs were analyzed with a version of the weighted parsimony algorithm (13) with a gain penalty of 2. A gene was assigned to the common ancestor of Lactobacillales and Bacillales if the members of a COG corresponding to a given LaCOG were present in at least one Bacillales species.
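The effect of the gain penalty can be illustrated with a generic Sankoff-style weighted parsimony on a rooted tree. This sketch is not the algorithm of ref. 13: the tree is given as nested tuples of species names, the loss penalty is assumed to be 1, and the state of the root is fixed by the caller (for example, present if the gene is assigned to the common ancestor). All names are hypothetical.

```python
GAIN_PENALTY = 2  # from the text
LOSS_PENALTY = 1  # assumption of this sketch


def transition_cost(parent_state, child_state):
    """Cost of going from the parent state to the child state (0 absent, 1 present)."""
    if parent_state == child_state:
        return 0
    return GAIN_PENALTY if child_state == 1 else LOSS_PENALTY


def subtree_costs(node, present):
    """Return (cost if node is absent, cost if node is present) for the subtree."""
    if isinstance(node, str):  # leaf: the observed state is the only allowed one
        observed = 1 if node in present else 0
        return tuple(0 if state == observed else float("inf") for state in (0, 1))
    costs = [0, 0]
    for child in node:
        child_costs = subtree_costs(child, present)
        for state in (0, 1):
            costs[state] += min(
                child_costs[child_state] + transition_cost(state, child_state)
                for child_state in (0, 1)
            )
    return tuple(costs)


def parsimony_cost(tree, present, root_state):
    """Minimum total penalty for a phyletic pattern, given the state at the root."""
    return subtree_costs(tree, present)[root_state]


# Hypothetical four-species tree; the gene is present in species A and B only.
tree = (("A", "B"), ("C", "D"))
cost_if_ancestral = parsimony_cost(tree, {"A", "B"}, root_state=1)  # one loss
cost_if_gained = parsimony_cost(tree, {"A", "B"}, root_state=0)     # one gain
```

With these penalties, a single loss on the branch leading to C and D (cost 1) is cheaper than a single gain on the branch leading to A and B (cost 2).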
Data Deposition. The sequences reported in this paper have been deposited in the GenBank database for the following genomes and plasmids, with the accession nos. in parentheses: Oenococcus oeni (CP000411), Lactobacillus delbrueckii subsp. bulgaricus (CP000412), Lactobacillus gasseri ATCC 33323 (CP000413), Leuconostoc mesenteroides subsp. mesenteroides (CP000414), Leuconostoc mesenteroides subsp. mesenteroides plasmid pLEUM1 (CP000415), Lactobacillus brevis (CP000416), Lactobacillus brevis plasmid pLVIS1 (CP000417), Lactobacillus brevis plasmid pLVIS2 (CP000418), Streptococcus thermophilus (CP000419), Streptococcus thermophilus plasmid pSTER1 (CP000420), Streptococcus thermophilus plasmid pSTER2 (CP000421), Pediococcus pentosaceus (CP000422), Lactobacillus casei (CP000423), Lactobacillus casei plasmid pLSEI (CP000424), Lactococcus lactis subsp. cremoris (CP000425), Lactococcus lactis subsp. cremoris plasmid pLACR1 (CP000426), plasmid pLACR2 (CP000427), plasmid pLACR3 (CP000428), plasmid pLACR4 (CP000429), and plasmid pLACR5 (CP000430).
1. Jordan IK, Makarova KS, Spouge JL, Wolf YI, Koonin EV (2001) Genome Res 11:555-565.
2. Tatusov RL, Fedorova ND, Jackson JD, Jacobs AR, Kiryutin B, Koonin EV, Krylov DM, Mazumder R, Mekhedov SL, Nikolskaya AN, et al. (2003) BMC Bioinformatics 4:41.
3. Edgar RC (2004) Nucleic Acids Res 32:1792-1797.
4. Felsenstein J (1996) Methods Enzymol 266:418-427.
5. Schmidt HA, Strimmer K, Vingron M, von Haeseler A (2002) Bioinformatics 18:502-504.
6. Takezaki N, Rzhetsky A, Nei M (1995) Mol Biol Evol 12:823-833.
7. Yang Z (1997) Comput Appl Biosci 13:555-556.
8. Grishin NV, Wolf YI, Koonin EV (2000) Genome Res 10:991-1000.
9. Wolf YI, Rogozin IB, Grishin NV, Tatusov RL, Koonin EV (2001) BMC Evol Biol 1:8.
10. Korbel JO, Snel B, Huynen MA, Bork P (2002) Trends Genet 18:158-162.
11. Wolf YI, Rogozin IB, Kondrashov AS, Koonin EV (2001) Genome Res 11:356-372.
12. Novichkov PS, Omelchenko MV, Gelfand MS, Mironov AA, Wolf YI, Koonin EV (2004) J Bacteriol 186:6575-6585.
13. Mirkin BG, Fenner TI, Galperin MY, Koonin EV (2003) BMC Evol Biol 3:2.
Present Addresses
K. Huang: Physical Biosciences Division, Lawrence Berkeley National Laboratory, Berkeley CA 94720.
T. Hawkins: GE Healthcare, 800 Centennial Avenue, Piscataway, NJ 08855.
V. Plengvidhya: National Center for Genetic Engineering and Biotechnology, 113 Paholyothin Road, Klong 1, Klong Luang, Pathumthani 12120, Thailand.
I. Díaz-Muñiz: U.S. Department of Agriculture, Agricultural Research Service, North Carolina Agricultural Research Service, Department of Food Science, North Carolina State University, Raleigh, NC 27695.
W. Wechter: U.S. Vegetable Laboratory, U.S. Department of Agriculture, Agricultural Research Service, Charleston, SC 29414.
Y. Xie: Department of Pediatric Infectious Diseases, The Johns Hopkins University, 720 Rutland Avenue, Ross 1109, Baltimore, MD 21205.
G. Lorca: Banting and Best Institute, University of Toronto, 112 College Street, Toronto, ON, Canada M5G 1L6.
E. Altermann: AgResearch, Ltd., Tennent Drive, Private Bag 11008, Palmerston North, New Zealand.
R. Barrangou: Danisco, Inc., 2802 Walton Commons West, Madison, WI 53718.
H. Rawsthorne: Department of Bioscience and Biotechnology, Drexel University, 32nd & Chestnut Streets, Philadelphia, PA 19104-2875.
D. Tamir: Department of Plant Pathology and Microbiology, Agricultural, Food and Environmental Quality Sciences, Hebrew University of Jerusalem, P.O. Box 12, Rehovot 76100, Israel.
C. Parker: Fresh Express, Inc., Salinas, CA 93901.