Proceedings of the National Academy of Sciences




Makarova et al. 10.1073/pnas.0607117103.

Supporting Information

Files in this Data Supplement:

Supporting Table 1
Supporting Figure 4
Supporting Table 2
Supporting Figure 5
Supporting Table 3
Supporting Table 4
Supporting Table 5
Supporting Table 6
Supporting Figure 6
Supporting Figure 7
Supporting Table 7
Supporting Figure 8
Supporting Table 8
Supporting Materials and Methods




Supporting Figure 4

Fig. 4. Phylogenetic tree of Lactobacillales constructed from concatenated alignments of RNA polymerase subunits. All branches are supported by bootstrap values >75%. Species are colored according to the current taxonomy: blue, Lactobacillaceae; magenta, Leuconostocaceae; red, Streptococcaceae.





Supporting Figure 5

Fig. 5. Proportion of genes (56%) that evolve in the same mode as ribosomal proteins.





Supporting Figure 6

Fig. 6. Phylogenetic analysis of enolase. The unrooted maximum-likelihood tree was built with the MOLPHY program, which was also used to compute bootstrap probabilities. Each terminal node is labeled with the GenBank identifier (GI) number (where available) and the species name. Major branches of interest supported by bootstrap probability >70% are marked by black circles. The species analyzed in this work are shown in blue.





Supporting Figure 7

Fig. 7. Schematic reconstruction of key metabolic pathways of central carbon (carbohydrate) metabolism in lactic acid bacteria. Black arrows show reactions present in all species; pale blue arrows, reactions present in all species except one; red arrows, reactions present in a smaller subset of species. For the latter category, phyletic patterns are indicated (detailed information for all LaCOGs associated with this figure is provided in Table 7). In the phyletic patterns, "|" indicates the presence of the gene in a given species and "-" indicates its absence. Species in the phyletic pattern are shown in the order Streptococcus thermophilus, Lactococcus lactis ssp. lactis, Lactococcus lactis ssp. cremoris, Lactobacillus brevis, Lactobacillus plantarum, Pediococcus pentosaceus, Leuconostoc mesenteroides, Oenococcus oeni, Lactobacillus johnsonii, Lactobacillus gasseri, Lactobacillus delbrueckii, and Lactobacillus casei. The systematic protein names (from Escherichia coli, Bacillus subtilis, or Lactobacillales) of the enzymes assigned to each reaction are indicated. Key reactions of homo- and heterofermentation are color-coded green and pink, respectively. Substrates that are additional precursors or products of several reactions are shown in dark green. Table 7 also lists the presence or absence of key genes associated with central carbon metabolism. +, the phyletic pattern for phosphoenolpyruvate carboxykinase (pckA) includes representatives of LaCOG2238 and a single protein from Lb. casei; *, homologs of acetoin reductase (ButA) belong to the large family of short-chain dehydrogenases spanning multiple LaCOGs, many of which have unknown substrate specificity and might catalyze the same reaction as ButA; &, although no dedicated phosphotransferase system for lactose transport has been identified, some phosphotransferase systems with broad substrate specificity might also transport lactose; ˆ, the presence of the system was determined essentially on the basis of the presence of this gene.
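The phyletic-pattern notation in Fig. 7 can be decoded mechanically. The following is a minimal Python sketch, assuming each pattern is a string of 12 "|"/"-" symbols in the species order given in the legend (spaces are ignored); the function name `decode_pattern` and the example pattern are hypothetical and are not taken from the figure.

```python
from typing import Dict

# Species order used in the Fig. 7 phyletic patterns (copied from the legend).
SPECIES_ORDER = [
    "Streptococcus thermophilus",
    "Lactococcus lactis ssp. lactis",
    "Lactococcus lactis ssp. cremoris",
    "Lactobacillus brevis",
    "Lactobacillus plantarum",
    "Pediococcus pentosaceus",
    "Leuconostoc mesenteroides",
    "Oenococcus oeni",
    "Lactobacillus johnsonii",
    "Lactobacillus gasseri",
    "Lactobacillus delbrueckii",
    "Lactobacillus casei",
]


def decode_pattern(pattern: str) -> Dict[str, bool]:
    """Map a phyletic pattern to presence (True) or absence (False) per species."""
    symbols = pattern.replace(" ", "")  # tolerate spacing between symbols
    if len(symbols) != len(SPECIES_ORDER):
        raise ValueError(f"expected {len(SPECIES_ORDER)} symbols, got {len(symbols)}")
    if any(s not in "|-" for s in symbols):
        raise ValueError("pattern may contain only '|' (present) and '-' (absent)")
    return {species: s == "|" for species, s in zip(SPECIES_ORDER, symbols)}


# Hypothetical pattern: gene present only in the two Lactococcus lactis subspecies.
example = decode_pattern("-||---------")
print([species for species, present in example.items() if present])
```

Rejecting malformed patterns, rather than silently truncating them, keeps a misaligned pattern from being attributed to the wrong species.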





Supporting Figure 8

Fig. 8. Genome clusters encoding bacteriocins and genes for their export systems, including novel putative bacteriocins.





Table 1. General genome features of the sequenced genomes

Species | Genome length, bp | Plasmids (no. of genes) | No. of proteins | No. of pseudogenes | No. of rRNA operons | No. of tRNAs | No. of prophages | Transposon-related ORFs, %
Lactobacillus gasseri | 1,894,360 | 0 | 1,763 | 43 | 6 | 78 | 1 | 0.18
Lactobacillus brevis | 2,340,228 | pLVIS1 (12); pLVIS2 (25) | 2,221 | 50 | 5 | 65 | 1 | 1.40
Pediococcus pentosaceus | 1,832,387 | 0 | 1,757 | 19 | 5 | 55 | 2? | 0.24
Lactococcus lactis ssp. cremoris | 2,641,635 | pLACR1 (11); pLACR2 (8); pLACR3 (64); pLACR4 (38); pLACR5 (8) | 2,509 | 153 | 6 | 62 | 4? | 4.82
Streptococcus thermophilus | 1,864,178 | pSTER1 (2); pSTER2 (4) | 1,718 | 206 | 6 | 67 | 1? | 3.72
Oenococcus oeni | 1,780,517 | 0 | 1,701 | 120 | 2 | 43 | 0 | 0.54
Leuconostoc mesenteroides | 2,075,763 | pLEUM1 (34) | 2,009 | 17 | 4 | 71 | 1 | 0.51
Lactobacillus casei | 2,924,325 | pLSEI1 (20) | 2,776 | 82 | 5 | 59 | 2 | 3.25
Lactobacillus delbrueckii ssp. bulgaricus | 1,856,951 | 0 | 1,725 | 192 | 9 | 98 | 0 | 1.93





Supporting Materials and Methods

Construction of Lactobacillales COGs. Lactobacillales COGs (LaCOGs) were constructed on the basis of triangles of best-hit relationships formed by proteins from different genomes; LaCOGs that included only two organisms were constructed by using reciprocal best-hit relationships. Lineage-specific expansions in Lactobacillales species were identified essentially as described previously (1). LaCOGs were linked to prokaryotic COGs by using the COGNITOR approach (2).
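The triangle-based clustering can be illustrated with a minimal Python sketch under simplifying assumptions: best hits are precomputed (for example, from BLAST scores), every side of a triangle is required to be a reciprocal best hit, and triangles that share a side are merged into one cluster. The actual procedure may differ, for example in how unidirectional best hits are handled; the identifiers and function names here are hypothetical and this is not the authors' pipeline.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set, Tuple

Protein = Tuple[str, str]  # hypothetical identifier: (genome, protein_id)


def reciprocal_best_hits(best_hit: Dict[Tuple[Protein, str], Protein]) -> Set[FrozenSet[Protein]]:
    """best_hit[(query, genome)] is the best hit of `query` among proteins of `genome`."""
    pairs = set()
    for (query, _genome), hit in best_hit.items():
        # keep the pair only if the hit's best hit in the query genome is the query itself
        if best_hit.get((hit, query[0])) == query:
            pairs.add(frozenset((query, hit)))
    return pairs


def triangles(rbh: Set[FrozenSet[Protein]]) -> Set[FrozenSet[Protein]]:
    """Triples of proteins from three different genomes that are pairwise best hits."""
    neighbors: Dict[Protein, Set[Protein]] = {}
    for pair in rbh:
        a, b = tuple(pair)
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    found = set()
    for a, nbrs in neighbors.items():
        for b, c in combinations(sorted(nbrs), 2):
            if c in neighbors[b] and len({a[0], b[0], c[0]}) == 3:
                found.add(frozenset((a, b, c)))
    return found


def merge_triangles(tris: Set[FrozenSet[Protein]]) -> List[FrozenSet[Protein]]:
    """Merge triangles that share a side (two proteins) into clusters via union-find."""
    tri = list(tris)
    parent = list(range(len(tri)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    side_owner: Dict[Tuple[Protein, Protein], int] = {}
    for i, t in enumerate(tri):
        for side in combinations(sorted(t), 2):
            if side in side_owner:
                parent[find(i)] = find(side_owner[side])
            else:
                side_owner[side] = i
    clusters: Dict[int, Set[Protein]] = {}
    for i, t in enumerate(tri):
        clusters.setdefault(find(i), set()).update(t)
    return [frozenset(c) for c in clusters.values()]
```

Two-organism groups, which cannot form triangles, would come directly from `reciprocal_best_hits`, in line with the text above.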

Phylogenetic Reconstructions. Multiple alignments of protein sequences were constructed by using the MUSCLE program (3); alignment columns with >33% gaps were removed. Least-squares trees were constructed with the PROTDIST and FITCH programs of the PHYLIP package (4), and maximum-likelihood trees were constructed with the TREE-PUZZLE program (5). The trees were tested for compatibility with the molecular clock assumption by using the LINTRE program (6). Alignments of the protein-coding nucleotide sequences were derived from the corresponding protein sequence alignments; synonymous and nonsynonymous substitutions were analyzed with the CODEML program of the PAML package (7), using the Yang-Nielsen-Hasegawa codon substitution model with nucleotide frequencies estimated from the data.
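Two of the alignment-processing steps above can be sketched in Python: removing columns with >33% gaps, and deriving a nucleotide (codon) alignment by threading each coding sequence through its aligned protein. This is a minimal sketch, assuming "-" is the only gap character and that each coding sequence starts at the first codon of the protein; the function names are hypothetical and do not correspond to MUSCLE or PAML options.

```python
from typing import List


def drop_gappy_columns(alignment: List[str], max_gap_fraction: float = 0.33) -> List[str]:
    """Remove alignment columns whose gap fraction exceeds max_gap_fraction."""
    n_seqs, n_cols = len(alignment), len(alignment[0])
    keep = [
        j for j in range(n_cols)
        if sum(seq[j] == "-" for seq in alignment) / n_seqs <= max_gap_fraction
    ]
    return ["".join(seq[j] for j in keep) for seq in alignment]


def thread_codons(aligned_protein: str, cds: str) -> str:
    """Build a codon alignment row: each residue becomes its codon, each gap '---'."""
    codons = [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]
    n_residues = sum(1 for r in aligned_protein if r != "-")
    if n_residues > len(codons):
        raise ValueError("coding sequence is shorter than the protein")
    out, k = [], 0
    for residue in aligned_protein:
        if residue == "-":
            out.append("---")
        else:
            out.append(codons[k])  # trailing codons (e.g. a stop codon) are ignored
            k += 1
    return "".join(out)
```

Codon threading has to be applied to the unfiltered protein alignment, because column removal deletes residues and breaks the residue-to-codon correspondence.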

Whole Genome Similarity Reconstructions. Intergenome distance matrices were constructed by using several previously described measures: the median distance between predicted orthologs [defined as reciprocal best hits (8, 9)], the similarity of gene content [the number of shared LaCOGs normalized by the size of the smaller genome (10)], and the similarity of gene order (the fraction of genes covered by aligned gene chains). Gene-order alignments were constructed with the LamarckN program as described previously (11). Similarity dendrograms were constructed with the neighbor-joining algorithm implemented in the NEIGHBOR program of PHYLIP (4).
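The gene-content measure can be sketched in Python as shown below, together with a writer for a square PHYLIP-style distance matrix of the kind that NEIGHBOR reads. Assumptions (ours): each genome is represented by the set of LaCOGs it contains, "size of the smaller genome" is taken as its LaCOG count, similarity is converted to distance as 1 - similarity, and names are truncated to PHYLIP's 10-character field. The ortholog-distance and gene-order measures are not covered, and the function names are hypothetical.

```python
from itertools import combinations
from typing import Dict, Set


def gene_content_similarity(a: Set[str], b: Set[str]) -> float:
    """Shared LaCOGs normalized by the size of the smaller genome (its LaCOG count)."""
    return len(a & b) / min(len(a), len(b))


def distance_matrix(content: Dict[str, Set[str]]) -> Dict[str, Dict[str, float]]:
    """Convert similarities to distances as 1 - similarity (an assumed transform)."""
    names = sorted(content)
    dist = {a: {b: 0.0 for b in names} for a in names}
    for a, b in combinations(names, 2):
        d = 1.0 - gene_content_similarity(content[a], content[b])
        dist[a][b] = dist[b][a] = d
    return dist


def to_phylip(dist: Dict[str, Dict[str, float]]) -> str:
    """Square distance matrix: taxon count, then a 10-character name field per row."""
    names = sorted(dist)
    lines = [str(len(names))]
    for a in names:
        row = " ".join(f"{dist[a][b]:.4f}" for b in names)
        lines.append(f"{a[:10]:<10} {row}")
    return "\n".join(lines)


# Hypothetical LaCOG sets for three genomes.
content = {"Lgas": {"x", "y", "z"}, "Lbre": {"x", "y"}, "Ooen": {"w"}}
print(to_phylip(distance_matrix(content)))
```

Normalizing by the smaller genome means that a genome whose gene content is a subset of another's gets similarity 1, so reductive genome evolution alone does not inflate the distance.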

Consistency of Local Molecular Clock. The consistency of the local molecular clock in individual LaCOGs was tested by using a previously described approach (12). Specifically, maximum-likelihood distances between aligned LaCOG sequences were estimated with the CODEML program of PAML (7); when paralogs were present, the minimum distance between the paralogs from a pair of species was taken as the interspecies distance. The matrix of interspecies distances was compared with the matrix of distances computed from concatenated alignments of ribosomal proteins. The residual variance after a zero-intercept straight-line fit was compared with the complete variance of the distances; LaCOGs for which the residual variance exceeded the complete variance were considered severe deviations from the local molecular clock (as defined by the rate of evolution of the ribosomal proteins). For selected cases, horizontal gene transfer (HGT) was verified by phylogenetic tree reconstruction.
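The residual-variance criterion can be sketched in Python as follows. The sketch assumes that the two distance matrices have already been flattened into paired lists over the same species pairs (one LaCOG distance and one ribosomal-protein distance per pair) and takes the "complete variance" to be the variance of the LaCOG distances about their mean; both assumptions are ours, and the function names are hypothetical.

```python
from typing import Sequence


def interspecies_distance(paralog_pair_distances: Sequence[float]) -> float:
    """With paralogs, the species-pair distance is the minimum over paralog pairs."""
    return min(paralog_pair_distances)


def deviates_from_local_clock(lacog: Sequence[float], ribosomal: Sequence[float]) -> bool:
    """True if the residual variance of a zero-intercept fit exceeds the total variance.

    lacog[i] and ribosomal[i] are distances for the same species pair i.
    """
    n = len(lacog)
    if n != len(ribosomal) or n == 0:
        raise ValueError("need paired, non-empty distance lists")
    # least-squares slope of lacog = k * ribosomal, with no intercept term
    k = sum(x * y for x, y in zip(ribosomal, lacog)) / sum(x * x for x in ribosomal)
    residual_var = sum((y - k * x) ** 2 for x, y in zip(ribosomal, lacog)) / n
    mean = sum(lacog) / n
    total_var = sum((y - mean) ** 2 for y in lacog) / n
    return residual_var > total_var
```

Because the fit has no intercept, its residual variance can exceed the total variance; this happens when LaCOG distances fail to grow in proportion to the ribosomal-protein distances, which is the case the criterion flags.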

Reconstruction of Gene Gains and Losses. To analyze gene losses in the common ancestor of Lactobacillales, the complement of COGs present in at least one Lactobacillales species was compared with the COGs present in other Firmicutes. COGs present in other Firmicutes but missing from Lactobacillales were considered to have been lost by the common ancestor of Lactobacillales. The median number of genes per COG in Bacillales and Clostridiales was used to estimate the number of genes in the lost COGs. Upper and lower estimates of the complement of COGs represented in the common ancestor of Lactobacillales and Bacillales were obtained by not requiring (upper bound) or requiring (lower bound) a COG to be present in both Bacillales and non-Bacilli Firmicutes (Clostridiales or Mollicutes).
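The set logic behind these estimates can be sketched in Python. Assumptions (ours): COG memberships are available as sets; the upper bound accepts any COG present in Bacillales or in non-Bacilli Firmicutes, and the lower bound requires presence in both; `genes_per_cog` holds per-genome gene counts of each COG in Bacillales and Clostridiales genomes, and the median is taken per COG. The text may instead intend a single overall median, and all names and data below are hypothetical.

```python
from statistics import median
from typing import Dict, List, Set, Tuple


def ancestral_cog_bounds(bacillales: Set[str], non_bacilli: Set[str]) -> Tuple[Set[str], Set[str]]:
    """Lower bound: COG in both groups; upper bound: COG in either group (assumed reading)."""
    return bacillales & non_bacilli, bacillales | non_bacilli


def lost_cogs(ancestral: Set[str], lactobacillales: Set[str]) -> Set[str]:
    """COGs inferred for the ancestor but absent from every Lactobacillales species."""
    return ancestral - lactobacillales


def estimated_lost_genes(lost: Set[str], genes_per_cog: Dict[str, List[int]]) -> float:
    """Sum, over lost COGs, of the median per-genome gene count (assumed per-COG median)."""
    return float(sum(median(genes_per_cog[cog]) for cog in lost))


bacillales = {"COG1", "COG2", "COG3"}  # hypothetical memberships
non_bacilli = {"COG2", "COG3", "COG4"}
lacto = {"COG3"}
counts = {"COG1": [1, 1, 3], "COG2": [2, 4], "COG4": [1]}

low, high = ancestral_cog_bounds(bacillales, non_bacilli)
print(estimated_lost_genes(lost_cogs(low, lacto), counts))   # 3.0
print(estimated_lost_genes(lost_cogs(high, lacto), counts))  # 5.0
```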

To analyze gene losses within the Lactobacillales, the phyletic patterns of LaCOGs were analyzed with a version of the weighted parsimony algorithm (13) with a gain penalty of 2. A gene was assigned to the common ancestor of Lactobacillales and Bacillales if members of the COG corresponding to the given LaCOG were present in at least one Bacillales species.
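Weighted parsimony on a presence/absence pattern can be illustrated with the Sankoff dynamic-programming algorithm. This is a minimal Python sketch, not the algorithm of ref. 13 itself: the gain penalty of 2 comes from the text, the loss cost of 1 is an assumption, the tree is a hypothetical nested tuple of leaf names, and the option to fix the root state mirrors assigning a gene to the common ancestor of Lactobacillales and Bacillales.

```python
from typing import Dict, Optional, Tuple, Union

Tree = Union[str, Tuple["Tree", "Tree"]]  # leaf name or (left, right) subtree

GAIN = 2.0  # gain penalty stated in the text
LOSS = 1.0  # assumption: loss cost fixed at 1
INF = float("inf")


def step_cost(parent: int, child: int) -> float:
    """Cost of the state change along one branch (0 = absent, 1 = present)."""
    if parent == child:
        return 0.0
    return GAIN if child == 1 else LOSS


def sankoff(tree: Tree, pattern: Dict[str, int]) -> Tuple[float, float]:
    """Minimal subtree cost given that this node is absent (index 0) or present (index 1)."""
    if isinstance(tree, str):
        state = pattern[tree]
        return (0.0 if state == 0 else INF, 0.0 if state == 1 else INF)
    children = [sankoff(sub, pattern) for sub in tree]
    return tuple(
        sum(min(c[s] + step_cost(parent, s) for s in (0, 1)) for c in children)
        for parent in (0, 1)
    )


def parsimony_cost(tree: Tree, pattern: Dict[str, int],
                   root_present: Optional[bool] = None) -> float:
    """Optionally fix the root state, e.g. present when the COG occurs in Bacillales."""
    absent, present = sankoff(tree, pattern)
    if root_present is None:
        return min(absent, present)
    return present if root_present else absent


tree = (("A", "B"), ("C", "D"))  # hypothetical four-species tree
print(parsimony_cost(tree, {"A": 1, "B": 1, "C": 0, "D": 0}))
```

With a gain penalty larger than the loss cost, the reconstruction prefers one ancestral presence followed by losses over independent gains, which is why the example scores a single loss (1.0) rather than a gain (2.0).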

Data Deposition. The sequences reported in this paper have been deposited in the GenBank database for the following chromosomes and plasmids (accession nos. in parentheses): Oenococcus oeni (CP000411), L. delbrueckii subsp. bulgaricus (CP000412), Lactobacillus gasseri ATCC 33323 (CP000413), L. mesenteroides subsp. mesenteroides (CP000414), L. mesenteroides subsp. mesenteroides plasmid pLEUM1 (CP000415), Lactobacillus brevis (CP000416), Lactobacillus brevis plasmid pLVIS1 (CP000417), Lactobacillus brevis plasmid pLVIS2 (CP000418), Streptococcus thermophilus (CP000419), Streptococcus thermophilus plasmid pSTER1 (CP000420), Streptococcus thermophilus plasmid pSTER2 (CP000421), Pediococcus pentosaceus (CP000422), Lactobacillus casei (CP000423), Lactobacillus casei plasmid pLSEI (CP000424), Lactococcus lactis subsp. cremoris (CP000425), L. lactis subsp. cremoris plasmid pLACR1 (CP000426), plasmid pLACR2 (CP000427), plasmid pLACR3 (CP000428), plasmid pLACR4 (CP000429), and plasmid pLACR5 (CP000430).

1. Jordan IK, Makarova KS, Spouge JL, Wolf YI, Koonin EV (2001) Genome Res 11:555-565.

2. Tatusov RL, Fedorova ND, Jackson JD, Jacobs AR, Kiryutin B, Koonin EV, Krylov DM, Mazumder R, Mekhedov SL, Nikolskaya AN, et al. (2003) BMC Bioinformatics 4:41.

3. Edgar RC (2004) Nucleic Acids Res 32:1792-1797.

4. Felsenstein J (1996) Methods Enzymol 266:418-427.

5. Schmidt HA, Strimmer K, Vingron M, von Haeseler A (2002) Bioinformatics 18:502-504.

6. Takezaki N, Rzhetsky A, Nei M (1995) Mol Biol Evol 12:823-833.

7. Yang Z (1997) Comput Appl Biosci 13:555-556.

8. Grishin NV, Wolf YI, Koonin EV (2000) Genome Res 10:991-1000.

9. Wolf YI, Rogozin IB, Grishin NV, Tatusov RL, Koonin EV (2001) BMC Evol Biol 1:8.

10. Korbel JO, Snel B, Huynen MA, Bork P (2002) Trends Genet 18:158-162.

11. Wolf YI, Rogozin IB, Kondrashov AS, Koonin EV (2001) Genome Res 11:356-372.

12. Novichkov PS, Omelchenko MV, Gelfand MS, Mironov AA, Wolf YI, Koonin EV (2004) J Bacteriol 186:6575-6585.

13. Mirkin BG, Fenner TI, Galperin MY, Koonin EV (2003) BMC Evol Biol 3:2.

Present Addresses

K. Huang: Physical Biosciences Division, Lawrence Berkeley National Laboratory, Berkeley CA 94720.

T. Hawkins: GE Healthcare, 800 Centennial Avenue, Piscataway, NJ 08855.

V. Plengvidhya: National Center for Genetic Engineering and Biotechnology, 113 Paholyothin Road, Klong 1, Klong Luang, Pathumthani 12120, Thailand.

I. Díaz-Muñiz: U.S. Department of Agriculture, Agricultural Research Service, North Carolina Agricultural Research Service, Department of Food Science, North Carolina State University, Raleigh, NC 27695.

W. Wechter: U.S. Vegetable Laboratory, U.S. Department of Agriculture, Agricultural Research Service, Charleston, SC 29414.

Y. Xie: Department of Pediatric Infectious Diseases, The Johns Hopkins University, 720 Rutland Avenue, Ross 1109, Baltimore, MD 21205.

G. Lorca: Banting and Best Institute, University of Toronto, 112 College Street, Toronto, ON, Canada M5G 1L6.

E. Altermann: AgResearch, Ltd., Tennent Drive, Private Bag 11008, Palmerston North, New Zealand.

R. Barrangou: Danisco, Inc., 2802 Walton Commons West, Madison, WI 53718.

H. Rawsthorne: Department of Bioscience and Biotechnology, Drexel University, 32nd & Chestnut Streets, Philadelphia, PA 19104-2875.

D. Tamir: Department of Plant Pathology and Microbiology, Agricultural, Food and Environmental Quality Sciences, Hebrew University of Jerusalem, P.O. Box 12, Rehovot 76100, Israel.

C. Parker: Fresh Express, Inc., Salinas, CA 93901.






Copyright © 2008 by the National Academy of Sciences