Papke et al. 10.1073/pnas.0706358104.
Fig. 4. Phylogenetic relationships of STs for individual loci (see SI Methods for details of phylogenetic reconstruction). Bootstrap values less than 50% are not shown. (A) 16S rRNA gene phylogeny is compared to the concatenated phylogeny. Different color bars indicate the phylogroups to which specific STs were assigned in Fig. 2. Note that identical 16S rRNA gene sequences are frequently assigned to multiple phylogroups in the concatenated tree. Although many branches appear as not supported by bootstrapping (because there are so few phylogenetically informative sites), many branches are supported by undeniable shared derived characters, which could not have arisen independently (see Fig. 5A). (B) atpB gene phylogeny. (C) bop gene phylogeny. (D) EF-2 gene phylogeny. (E) radA gene phylogeny.
Fig. 5. Condensed alignments showing only the positions that are variable for 16S rRNA (A), atpB (B), bop (C), EF-2 (D), and radA (E). Sequences from phylogroups defined in Fig. 2 are boxed. Examples of potential intragenic recombination events are highlighted. Numbers at the top of each condensed alignment refer to the nucleotide position in alignments of PCR amplified fragments, not the full-length gene.
Fig. 6. NeighborNet reconstruction of the non-tree-like evolutionary relationships among the STs. STs belonging to phylogroups A, B, and C as defined in Fig. 2 are color-coded.
Fig. 7. The three largest eBURST defined complexes. STs within designated complexes have a minimum of three identical loci with any other member. STs residing in different complexes may share two or fewer identical loci. However, it is only with 16S rRNA genes that identical alleles have been identified in more than one complex. Numbers next to dots correspond to ST numbers shown in Fig. 2 and used throughout the text. The radius of each dot is proportional to the number of strains within an ST. Lines connect STs with four of five identical loci; however, lines do not connect all possible single locus variants. This is not a phylogenetic reconstruction, and line length has no meaning.
SI Methods
Data Collection
Strain cultivation. Samples in this study were collected from two different sources of hypersaline waters: a "manmade" saltern (located near Santa Pola, Spain) and an inland hypersaline lake (Tinnsilt Sebkha, located near Ain Mlila, Algeria). Salterns are commercial ventures that use a series of connected ponds and natural evaporative processes to precipitate seawater's constitutive salts, and they are easily visualized in satellite images like those from Google Earth. The Santa Pola saltern has been intensely studied since 1981 by Rodriguez-Valera et al. (1). As with any environment, abiotic conditions may fluctuate daily or seasonally in response to changes in weather, but evidence suggests that individual ponds are remarkably stable with respect to biology; for instance, identical gene sequences have been found in samples taken years apart or in different seasons (2, 3). Additionally, analyses indicated that the abiotic conditions for ponds of 23% and 36% salinity are remarkably similar for temperature, pH, O2, and Kjeldahl nitrogen and phosphorus concentrations (4), suggesting that the major difference between the sampled ponds is salinity. Similar analyses have not been reported for Tinnsilt Sebkha, although quite clearly the source of water is not seawater, suggesting that its constituent abiotic conditions differ from saltern waters of similar salinity. The close proximity of saltern ponds cannot prevent migration of populations, which probably occurs through the mixing of waters that were incompletely removed from a drained pond or from wind-blown foam "tumble weeds".
Sampling and strain isolation were performed as described in ref. 5. Briefly, in November 2002, 50-ml samples of hypersaline water from two different-salinity ponds (23% and 36%) located at the saltern facilities in Santa Pola, Spain, and a hypersaline lake (22%), were inoculated onto agar plates containing 25% salt, 0.1% yeast extract and ampicillin to prevent bacterial growth. Plates were incubated ≈3 wks at 37°C in the dark. Individual colonies were grown to high density by using liquid medium (per liter of H2O: NaCl, 250 g; MgSO4×7H2O, 20 g; trisodium citrate×2H2O, 3 g; KCl, 3 g; yeast extract, 5 g; pH 7), and cells were harvested by centrifugation. Although we cultivated strains from three sites that differed in salinity and presumably other in situ conditions, our aim was to acquire strains of the same species, not detect the full spectrum of Halorubrum spp. diversity. Therefore, we applied only a single set of conditions to obtain isolates. The strains from Spain appear to be low frequency members of the communities, based on knowledge of previous in situ molecular work (6, 7). No additional data are known for the Algerian site. Strains will be phenotypically characterized in future work.
Nucleotide sequencing. PCR primers, PCR amplification, and sequencing protocols were described in ref. 5. Briefly, we amplified and sequenced between 300 and 500 base pairs from five genes: 16S rRNA, atpB, bop, EF-2, and radA. For the bop locus, the degenerate primers bop401F 5'-GAC TGG TTG TTY ACV ACG CC-3' and bop795R 5'-AAG CCG AAG CCG AYC TTB GC-3' were used. PCR products were sequenced in both directions by using BigDye technology (Applied Biosystems). DNA sequence chromatograms were examined by using Sequencher (Gene Codes), and incorrectly identified DNA nucleotides were edited.
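The degeneracy of the two bop primers can be checked with a short Python sketch. It is offered only as an illustration and was not part of the original protocol. The IUPAC table is the standard nucleotide ambiguity code; the function and variable names are hypothetical.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

def degeneracy(primer):
    """Number of distinct oligonucleotides encoded by a degenerate primer."""
    count = 1
    for base in primer.replace(" ", "").upper():
        count *= len(IUPAC[base])
    return count

def expand(primer):
    """List every non-degenerate sequence encoded by the primer."""
    seq = primer.replace(" ", "").upper()
    return ["".join(bases) for bases in product(*(IUPAC[b] for b in seq))]

# Primer sequences as given in the text (5' to 3').
BOP401F = "GAC TGG TTG TTY ACV ACG CC"
BOP795R = "AAG CCG AAG CCG AYC TTB GC"

if __name__ == "__main__":
    for name, primer in (("bop401F", BOP401F), ("bop795R", BOP795R)):
        print(name, degeneracy(primer), expand(primer))
```

Each primer contains one twofold (Y) and one threefold (V or B) degenerate position, so each represents six distinct oligonucleotides.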
Phylogenetic Analysis
Alignment and Neighbor Joining (NJ). Automated multiple taxon DNA alignments were performed by using ClustalX (8). Alignments were inspected and edited if necessary by using MacClade (9). Individual gene and concatenated trees were constructed by using the NJ method from distances calculated under the HKY85+Γ model (10, 11) as implemented in PAUP* (12). The outgroup taxa were chosen based on 16S rRNA gene analysis and their affiliation with distantly related Halorubrum taxa. Bootstrap support was determined by implementing the same models and 1000 replications in PAUP* (12).
Maximum Likelihood Mapping
Maximum likelihood mapping (13) analyses were performed by using the TREE-PUZZLE program version 5.2 (14) under the HKY85+Γ model (10, 11). Four clusters were defined, corresponding to phylogroups A, B, and C and to the rest of the taxa, respectively. All possible quartets for the defined clusters were evaluated for support of the three possible unrooted tree topologies.
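The set of quartets examined in a four-cluster analysis can be sketched as follows. This assumes that each quartet takes exactly one taxon from each cluster, which is our reading of the four-cluster mode and should be checked against the TREE-PUZZLE manual. The ST labels are hypothetical, and no likelihoods are computed here.

```python
from itertools import product

# The three possible unrooted topologies for clusters a, b, c, and d.
TOPOLOGIES = ("(a,b)|(c,d)", "(a,c)|(b,d)", "(a,d)|(b,c)")

def cluster_quartets(cluster_a, cluster_b, cluster_c, cluster_d):
    """Return every quartet that draws one taxon from each of four clusters."""
    return list(product(cluster_a, cluster_b, cluster_c, cluster_d))

# Hypothetical cluster membership, for illustration only.
EXAMPLE_CLUSTERS = (
    ["ST1", "ST2"],          # phylogroup A
    ["ST3", "ST4", "ST5"],   # phylogroup B
    ["ST6"],                 # phylogroup C
    ["ST7", "ST8"],          # remaining taxa
)

if __name__ == "__main__":
    quartets = cluster_quartets(*EXAMPLE_CLUSTERS)
    print(len(quartets), "quartets, each scored against", len(TOPOLOGIES), "topologies")
```

Under this assumption the number of quartets is the product of the four cluster sizes.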
SplitsTree analysis
A splits network was calculated for the concatenated data set by using SplitsTree version 4.5 (15) with the NeighborNet distance transformation.
Recombination
eBURST analysis. Allelic profiles were generated in standard MLST format, that is, each unique allele for each gene is given a numerical identifier. "Sequence type" (ST) numerical identifiers were assigned to unique allele combinations (strains with identical allelic profiles were assigned the same ST number). These profiles were then used as input data into eBURST version 3 (www.eburst.mlst.net) (16). This program subdivides the data into groups based on a threshold level of allelic identity. We used the settings that assigned STs to the same complex when they shared three of five identical alleles with at least one other strain in the group. STs that differ at only one allele are termed single locus variants (SLVs). For each complex, eBURST then attempts to assign a parsimonious "founder" genotype (the ST with the most representative strains and SLVs), to which other variants are then compared. These relationships are then outputted graphically.
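The ST assignment and grouping rules described above can be sketched in Python. This is a minimal illustration, not a reimplementation of eBURST: it covers only ST numbering, the three-of-five grouping threshold, and SLV detection, and it omits founder assignment. All function names and the example profiles are hypothetical.

```python
from itertools import combinations

def assign_sts(profiles):
    """Give strains with identical allelic profiles the same ST number."""
    st_of_profile, strain_to_st = {}, {}
    for strain, profile in profiles.items():
        key = tuple(profile)
        if key not in st_of_profile:
            st_of_profile[key] = len(st_of_profile) + 1
        strain_to_st[strain] = st_of_profile[key]
    st_profiles = {st: prof for prof, st in st_of_profile.items()}
    return strain_to_st, st_profiles

def shared_loci(a, b):
    """Count loci at which two profiles carry the same allele."""
    return sum(x == y for x, y in zip(a, b))

def complexes(st_profiles, min_shared=3):
    """Group STs sharing >= min_shared alleles with at least one other member."""
    parent = {st: st for st in st_profiles}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(st_profiles, 2):
        if shared_loci(st_profiles[a], st_profiles[b]) >= min_shared:
            parent[find(a)] = find(b)
    groups = {}
    for st in st_profiles:
        groups.setdefault(find(st), set()).add(st)
    return sorted(groups.values(), key=lambda g: (-len(g), min(g)))

def single_locus_variants(st_profiles):
    """Return ST pairs that differ at exactly one locus."""
    return [(a, b) for a, b in combinations(sorted(st_profiles), 2)
            if shared_loci(st_profiles[a], st_profiles[b]) == len(st_profiles[a]) - 1]

# Hypothetical five-locus allelic profiles, for illustration only.
EXAMPLE = {
    "s1": (1, 1, 1, 1, 1),
    "s2": (1, 1, 1, 1, 1),
    "s3": (1, 1, 1, 1, 2),
    "s4": (1, 1, 1, 3, 2),
    "s5": (4, 4, 4, 4, 4),
}

if __name__ == "__main__":
    strain_to_st, st_profiles = assign_sts(EXAMPLE)
    print(strain_to_st)
    print(complexes(st_profiles))
    print(single_locus_variants(st_profiles))
```

Because membership requires sharing three alleles with only one other member, the grouping is single-linkage: two STs in the same complex need not share three alleles with each other.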
Relative rates of HR to mutation (R/M). By examining the sequence differences between the founder and variant alleles in SLVs, it is possible to estimate the proportion of those alleles that have changed because of HR and point mutation (R/M) by using methods described in ref. 17. Briefly, cases where an allele is noted to change at a single nucleotide site are consistent with those changes having occurred by point mutation. As recombination between similar alleles may also result in a single nucleotide change, these changes should also be absent from distantly related strains to qualify as mutations (assuming low rates of homoplasy). Unfortunately, in the current data set, there are many cases where it is not possible to confidently exclude the possibility that the distribution of an allele is not due to identity by descent. For this reason, all alleles differing at a single base were assigned as putative mutations, regardless of which other strains contained the allele. Alleles differing by more than one nucleotide are assigned as recombinational replacements. This approach is conservative with respect to recombination as assessed in previous Halorubrum spp. analyses and most bacterial MLST studies, although this conservatism will be partly offset by the possibility that independent mutation events in a single locus will result in two or more changes (such alleles will be assigned as recombinants). This simplified approach does not require assumptions concerning the directionality of events, hence the analysis was not restricted to those SLVs involving an assigned founder. Two ratios were calculated as described previously: a per allele R/M ratio and a per site R/M ratio.
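The simplified assignment rule can be sketched in Python. The sketch assumes that the per allele ratio is the number of recombinant alleles divided by the number of mutated alleles, and that the per site ratio is the number of nucleotide differences in recombinant alleles divided by the number in mutated alleles; these definitions should be checked against ref. 17. Function names and sequences are hypothetical.

```python
def hamming(a, b):
    """Count nucleotide differences between two aligned alleles."""
    if len(a) != len(b):
        raise ValueError("alleles must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))

def r_over_m(allele_pairs):
    """Classify the differing allele of each SLV pair and return R/M ratios.

    One nucleotide difference is scored as a putative point mutation;
    more than one difference is scored as a recombinational replacement.
    """
    mut_alleles = rec_alleles = mut_sites = rec_sites = 0
    for a, b in allele_pairs:
        d = hamming(a, b)
        if d == 0:
            continue  # identical alleles carry no information
        if d == 1:
            mut_alleles += 1
            mut_sites += 1
        else:
            rec_alleles += 1
            rec_sites += d
    if mut_alleles == 0:
        raise ValueError("no putative mutations; ratios are undefined")
    return rec_alleles / mut_alleles, rec_sites / mut_sites

# Hypothetical variant alleles from four SLV pairs, for illustration only.
EXAMPLE_PAIRS = [
    ("ACGTACGT", "ACGTACGA"),  # 1 difference: mutation
    ("ACGTACGT", "TCGAACGA"),  # 3 differences: recombination
    ("ACGTACGT", "ACCTAGGT"),  # 2 differences: recombination
    ("ACGTACGT", "ACGTTCGT"),  # 1 difference: mutation
]

if __name__ == "__main__":
    per_allele, per_site = r_over_m(EXAMPLE_PAIRS)
    print(per_allele, per_site)
```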
Linkage disequilibrium analysis. The Index of Association (IA) (18) was computed by using the Linkage Disequilibrium analysis available on the MLST web site (http://linux.mlst.net/link_dis/index.htm), and significance was gauged from 1,000 random permutations of the data. The IA measures whether the alleles from the different loci in a population are randomly or nonrandomly associated in the analyzed genomes. If the alleles are randomly associated, then the population is said to be in a state of linkage equilibrium. Such a state is very difficult to explain without invoking very high rates of HR. However, if the alleles are not randomly associated, such that the frequency of alleles at locus N is predictive of the frequency of alleles at locus M, then it is possible that rates of HR are more moderate. The interpretation of this latter state (linkage disequilibrium) is complicated, however, because it does not preclude the possibility of high rates of recombination. Linkage disequilibrium may be apparent even in normally freely recombining populations due to the short-term expansion of highly adapted clones, or excessive sampling bias. Alternatively, gene flow may be structured within the population such that recombination occurs more frequently within than between subpopulations, and this will also complicate the interpretation of IA. Despite the fact that this approach is a fairly blunt tool, it provides a useful metric for gauging the structure of the population.
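For readers who wish to reproduce the statistic, the following Python sketch computes IA as VO/VE - 1 (18), where VO is the observed variance of the number of loci at which pairs of strains differ and VE is the variance expected under linkage equilibrium, summed over loci as h(1 - h). It is not the MLST web site implementation. The choice to estimate h from pairwise comparisons, the permutation scheme (shuffling alleles within each locus), and all names are our assumptions.

```python
import random
from itertools import combinations

def index_of_association(profiles):
    """Return IA = VO / VE - 1 for a list of equal-length allelic profiles."""
    pairs = list(combinations(profiles, 2))
    n_loci = len(profiles[0])
    # Number of loci at which each pair of strains differs.
    mismatches = [sum(x != y for x, y in zip(a, b)) for a, b in pairs]
    mean_k = sum(mismatches) / len(pairs)
    v_obs = sum((k - mean_k) ** 2 for k in mismatches) / len(pairs)
    # Expected variance under random association of alleles.
    v_exp = 0.0
    for j in range(n_loci):
        h_j = sum(a[j] != b[j] for a, b in pairs) / len(pairs)
        v_exp += h_j * (1.0 - h_j)
    if v_exp == 0.0:
        raise ValueError("no allelic variation; IA is undefined")
    return v_obs / v_exp - 1.0

def permutation_p_value(profiles, n_perm=1000, seed=0):
    """Fraction of within-locus permutations with IA >= the observed IA."""
    rng = random.Random(seed)
    observed = index_of_association(profiles)
    columns = [list(col) for col in zip(*profiles)]
    hits = 0
    for _ in range(n_perm):
        for col in columns:
            rng.shuffle(col)  # break associations between loci
        if index_of_association(list(zip(*columns))) >= observed:
            hits += 1
    return hits / n_perm

# Hypothetical, completely linked three-locus data set.
LINKED = [(1, 1, 1)] * 3 + [(2, 2, 2)] * 3

if __name__ == "__main__":
    print(index_of_association(LINKED))
    print(permutation_p_value(LINKED, n_perm=200, seed=1))
```

For the completely linked example, IA equals the number of loci minus one, the value expected under complete linkage.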
Intragenic recombination detection. Assessment of recombination within individual gene sets was performed by using the PHI test as implemented in PhiPack (window size 100) (19). However, detection of recombinant sequences or recombination breakpoints is complicated (e.g., see evaluation studies of different methods in ref. 20), especially given the limited length of our alignments and the possibility of multiple recombination events. When GeneConv version 1.81 (21) (with -gscale = 5 -no_indels options) was also used to search for intragenic recombination, no fragments were recovered with significant Bonferroni-corrected Karlin-Altschul P values. However, the Profile program used in PhiPack (19) allowed us to identify several putative mosaic alleles (see SI Fig. 5 for examples).
Nucleotide Analyses
Average nucleotide distance within and between groups. Pairwise distances were calculated for all unique STs by using the concatenated alignment and the HKY85+Γ model (10, 11) as implemented in PAUP* (12). STs were assigned to groups based on a subjective assessment of the concatenated gene phylogeny (Fig. 2).
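The averaging step can be sketched in Python. The sketch assumes that pairwise distances have already been computed elsewhere (e.g., exported from PAUP*). The data layout, function name, and example values are hypothetical.

```python
from itertools import combinations

def mean_group_distances(dist, group_of):
    """Average pairwise distances within and between groups.

    dist maps frozenset({st_a, st_b}) to a distance; group_of maps each ST
    to its group label. Keys of the result are sorted label pairs, so
    ("A", "A") is a within-group mean and ("A", "B") a between-group mean.
    """
    sums, counts = {}, {}
    for a, b in combinations(sorted(group_of), 2):
        key = tuple(sorted((group_of[a], group_of[b])))
        sums[key] = sums.get(key, 0.0) + dist[frozenset((a, b))]
        counts[key] = counts.get(key, 0) + 1
    return {key: sums[key] / counts[key] for key in sums}

# Hypothetical distances among four STs in two groups, for illustration only.
EXAMPLE_GROUPS = {1: "A", 2: "A", 3: "B", 4: "B"}
EXAMPLE_DIST = {
    frozenset((1, 2)): 0.01,
    frozenset((3, 4)): 0.03,
    frozenset((1, 3)): 0.10,
    frozenset((1, 4)): 0.12,
    frozenset((2, 3)): 0.11,
    frozenset((2, 4)): 0.13,
}

if __name__ == "__main__":
    for key, value in sorted(mean_group_distances(EXAMPLE_DIST, EXAMPLE_GROUPS).items()):
        print(key, round(value, 4))
```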
dN/dS ratio estimation. Trees were calculated by using the PhyML program version 2.4.4 (22) under the HKY85+Γ model (10, 11). The dN/dS ratio for individual genes was estimated by using the PAML program version 3.14 (23) under two models: one in which dN/dS is fixed to one, and one in which dN/dS is estimated as a single value applied to every branch on the tree. The significance of the estimated dN/dS values was tested by using a likelihood ratio test with one degree of freedom.
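The likelihood ratio test can be sketched in Python, assuming the log-likelihoods of the two models have already been obtained from PAML. The test statistic is twice the difference in log-likelihood, compared with a chi-square distribution with one degree of freedom. The function name and the log-likelihood values are hypothetical.

```python
import math

def lrt_df1(lnl_null, lnl_alt):
    """Likelihood ratio test with one degree of freedom.

    lnl_null: log-likelihood with dN/dS fixed to one.
    lnl_alt:  log-likelihood with dN/dS estimated.
    Returns the test statistic and its chi-square (1 df) P value.
    """
    stat = max(2.0 * (lnl_alt - lnl_null), 0.0)
    # For 1 df, P(X > x) = erfc(sqrt(x / 2)), since X is a squared normal.
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value

if __name__ == "__main__":
    # Hypothetical log-likelihoods, for illustration only.
    stat, p_value = lrt_df1(lnl_null=-1002.0, lnl_alt=-1000.0)
    print(stat, p_value)
```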
1. Rodriguez-Valera F, Ruiz-Berraquero F, Ramos-Cormenzana A (1981) Microb Ecol 7:235-243.
2. Bolhuis H, Poele EM, Rodriguez-Valera F (2004) Environ Microbiol 6:1287-1291.
3. Legault BA, Lopez-Lopez A, Alba-Casado JC, Doolittle WF, Bolhuis H, Rodriguez-Valera F, Papke RT (2006) BMC Genomics 7:171.
4. Rodriguez-Valera F, Ventosa A, Juez G, Imhoff JF (1985) Microb Ecol 11:107-115.
5. Papke RT, Koenig JE, Rodriguez-Valera F, Doolittle WF (2004) Science 306:1928-1929.
6. Benlloch S, Lopez-Lopez A, Casamayor EO, Ovreas L, Goddard V, Daae FL, Smerdon G, Massana R, Joint I, Thingstad F, et al. (2002) Environ Microbiol 4:349-360.
7. Casamayor EO, Massana R, Benlloch S, Ovreas L, Diez B, Goddard VJ, Gasol JM, Joint I, Rodriguez-Valera F, Pedros-Alio C (2002) Environ Microbiol 4:338-348.
8. Thompson JD, Gibson TJ, Plewniak F, Jeanmougin F, Higgins DG (1997) Nucleic Acids Res 25:4876-4882.
9. Maddison D, Maddison W (2003) MacClade, Version 4.06. (Sinauer Associates, Sunderland, MA).
10. Hasegawa M, Kishino H, Yano T (1985) J Mol Evol 22:160-174.
11. Yang Z (1994) J Mol Evol 39:306-314.
12. Swofford D (1998) PAUP* 4.0, Phylogenetic Analysis Using Parsimony (and Other Methods) (Sinauer, Sunderland, MA), beta version 10.
13. Strimmer K, von Haeseler A (1997) Proc Natl Acad Sci USA 94:6815-6819.
14. Schmidt HA, Strimmer K, Vingron M, von Haeseler A (2002) Bioinformatics 18:502-504.
15. Huson DH, Bryant D (2006) Mol Biol Evol 23:254-267.
16. Feil EJ, Li BC, Aanensen DM, Hanage WP, Spratt BG (2004) J Bacteriol 186:1518-1530.
17. Feil EJ, Spratt BG (2001) Annu Rev Microbiol 55:561-590.
18. Smith JM, Smith NH, O'Rourke M, Spratt BG (1993) Proc Natl Acad Sci USA 90:4384-4388.
19. Bruen TC, Philippe H, Bryant D (2006) Genetics 172:2665-2681.
20. Posada D (2002) Mol Biol Evol 19:708-717.
21. Sawyer S (1989) Mol Biol Evol 6:526-538.
22. Guindon S, Gascuel O (2003) Syst Biol 52:696-704.
23. Yang Z (1997) Comput Appl Biosci 13:555-556.