Genome dynamics in a natural archaeal population

Allen et al. 10.1073/pnas.0604851104.

Supporting Information

Files in this Data Supplement:

SI Table 3
SI Figure 6
SI Figure 7
SI Table 4
SI Figure 8
SI Table 5
SI Figure 9
SI Figure 10
SI Figure 11
SI Materials and Methods




SI Figure 6

Fig. 6. Breakdown of Ferroplasma acidarmanus fer1 CDSs by functional category.





SI Figure 7

Fig. 7. Metabolic reconstruction of F. acidarmanus fer1 based on genome annotation.





SI Figure 8

Fig. 8. Comparison of the same genomic region in fer1 versus fer1(env), highlighting the degree of conserved gene order and coding sequence identity. Heterogeneous regions include areas affected by phage insertion (a 22-kb insert in fer1) and areas that often colocalize with transposase elements.





SI Figure 9

Fig. 9. Gene content dendrogram for the reconstructed fer1(env) strain population genome relative to the fer1 isolate genome. Genes found in the fer1(env) population (1,792 of 1,963 fer1 ORFs) were further categorized by the sequence types present. Genes for which only a single sequence type was found in the environmental population were denoted homogeneous. Within the homogeneous fraction, genes were further classified by the identity of the sequence type present: isolate-type sequence only (≥99.9% nt similarity to the isolate sequence) or non-isolate-type (env-type) sequence only. Genes that exhibited multiple sequence types (either more than one env-type sequence, or env-type plus isolate-type) were denoted heterogeneous. Of the 171 fer1 ORFs not found in the fer1(env) population, 19 were absent because of gaps in fer1(env) genome sequence coverage; the remaining 152 ORFs were classified as missing. Functional categories for selected subsets of the classified fer1(env) genes were assigned based on gene annotations. Shown are functional category breakdowns for the fer1 genome (1,963 ORFs), the homogeneous (545) and heterogeneous (1,247) fer1(env) fractions, the isolate-type-only (427) and mixed isolate-plus-env-type (1,006) subsets, and the ORFs present in fer1 but absent from the fer1(env) population (152).
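The classification rules in this caption can be sketched as a small decision function. This is a minimal illustration, not the authors' pipeline: it assumes (hypothetically) that each fer1 ORF has already been reduced to a list of fractional nucleotide identities to the isolate sequence, one per distinct sequence type recovered from fer1(env) reads, and that any type at ≥99.9% identity counts as the isolate type. The function name `classify_orf` and the category labels are invented for this sketch. The count checks use only the numbers stated in the caption; the env-type-only subsets (118 homogeneous, 241 heterogeneous) follow by subtraction.

```python
# Hypothetical sketch of the Fig. 9 gene classification; input format is assumed.
ISOLATE_IDENTITY = 0.999  # >=99.9% nt similarity to the isolate = isolate-type sequence


def classify_orf(identities, in_coverage_gap=False):
    """Classify one fer1 ORF from the identities (to the isolate sequence) of the
    distinct sequence types observed for it in the environmental population."""
    if not identities:
        # Not found in fer1(env): either a coverage gap or a genuinely missing gene.
        return "gap" if in_coverage_gap else "missing"
    has_isolate = any(s >= ISOLATE_IDENTITY for s in identities)
    n_env = sum(1 for s in identities if s < ISOLATE_IDENTITY)
    n_types = int(has_isolate) + n_env  # all isolate-type hits collapse to one type
    if n_types == 1:
        return "homogeneous/isolate-only" if has_isolate else "homogeneous/env-only"
    return "heterogeneous/isolate+env" if has_isolate else "heterogeneous/env-only"


# Consistency checks on the counts reported in the caption.
totals = {"fer1": 1963, "found": 1792, "gap": 19, "missing": 152,
          "homogeneous": 545, "heterogeneous": 1247,
          "isolate_only": 427, "isolate_plus_env": 1006}
assert totals["found"] + totals["gap"] + totals["missing"] == totals["fer1"]
assert totals["homogeneous"] + totals["heterogeneous"] == totals["found"]
env_only_homogeneous = totals["homogeneous"] - totals["isolate_only"]          # 118
env_only_heterogeneous = totals["heterogeneous"] - totals["isolate_plus_env"]  # 241
```

For example, an ORF seen only as one sequence type at 97% identity is classified as homogeneous env-only, while one seen at both 99.95% and 96% is heterogeneous (isolate plus env).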





SI Figure 10

Fig. 10. Example of incongruent tree topologies obtained when comparing paired end sequences from environmental fer1(env) clones (numbered 1, 2, and 3) with the fer1 isolate genome sequence. These data provide further evidence consistent with recombination among individual variant sequence types within the fer1(env) population. (A) Ferroplasma acidarmanus fer1 genomic region spanning ~4.5 kb (genes 1,456 to 1,458) and the locations of fer1(env) clones mapped to this region. Solid colored lines represent end sequences (forward and reverse reads) generated by shotgun sequencing. (B) Maximum likelihood trees for the forward (F) and reverse (R) clone sequences. If multiple strains coexisted in the absence of recombination, the paired F and R end sequences of each clone should yield congruent topologies. In contrast, a highly recombinogenic population with a limited number of sequence variants would be expected to yield incongruent topologies like those depicted here. This pattern recurs throughout the fer1(env) environmental sequence data.
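The logic of the F/R comparison can be illustrated with a much simpler stand-in for maximum likelihood trees: assign each end read to its nearest reference sequence type by p-distance, and flag clones whose forward and reverse reads fall to different types. This nearest-reference rule is a swapped-in simplification, not the method used for the figure, and all names and toy sequences below are hypothetical. It assumes the reads are already aligned, gap-padded to the same length as the reference segments, and that `f_refs` and `r_refs` use the same sequence-type names.

```python
# Simplified congruence check: nearest reference type for F and R reads of each clone.
def p_distance(a, b):
    """Proportion of differing sites between two aligned sequences, skipping gaps ('-')."""
    sites = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not sites:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in sites) / len(sites)


def nearest_type(read, references):
    """Name of the reference sequence type closest to the read."""
    return min(references, key=lambda name: p_distance(read, references[name]))


def incongruent_clones(clones, f_refs, r_refs):
    """clones: {clone_id: (forward_read, reverse_read)}. A clone is flagged when its
    F and R reads group with different sequence types, as recombination would predict."""
    return [cid for cid, (f_read, r_read) in clones.items()
            if nearest_type(f_read, f_refs) != nearest_type(r_read, r_refs)]


# Toy data: two sequence types at the F-read and R-read locations.
f_refs = {"isolate": "ACGTACGTAC", "env": "ACGAACGAAC"}
r_refs = {"isolate": "TTGCATTGCA", "env": "TTGGATTCCA"}
clones = {
    "1": ("ACGTACGTAC", "TTGCATTGCA"),  # isolate / isolate
    "2": ("ACGAACGAAC", "TTGCATTGCA"),  # env / isolate -> incongruent
    "3": ("ACGAACGAAC", "TTGGATTCCA"),  # env / env
}
flagged = incongruent_clones(clones, f_refs, r_refs)  # ["2"]
```

A tree-based analysis captures more than a two-way nearest-type call, but the underlying expectation is the same: without recombination, both ends of a clone should group with the same variant.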





SI Figure 11

Fig. 11. Example of a recombination event (read XYD1690) between the isolate sequence type (gray) and a fer1(env) sequence type (yellow).
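A mosaic read such as the one shown can be located with a simple single-breakpoint scan: at sites where two parental sequence types differ, record which parent the read matches, then find the one switch point that explains the most sites and compare it with the best non-recombinant explanation. This is a minimal sketch assuming the read and both parental sequences are already aligned to equal length; "A" and "B" stand in for the isolate and env types, and the function names are hypothetical. It is not a statistical recombination test and is not claimed to be how the event in this figure was identified.

```python
# Single-breakpoint scan over informative sites (where the two parents differ).
def informative_sites(read, parent_a, parent_b):
    """List of (position, 'A' or 'B') where the parents differ and the read matches one."""
    calls = []
    for i, (r, a, b) in enumerate(zip(read, parent_a, parent_b)):
        if a != b and r in (a, b):
            calls.append((i, "A" if r == a else "B"))
    return calls


def best_single_breakpoint(read, parent_a, parent_b):
    """Return (last site on the left, first site on the right, (left parent, right parent))
    for the best single switch, or None if one parent explains the read at least as well."""
    calls = informative_sites(read, parent_a, parent_b)
    best_score, best = -1, None
    for k in range(1, len(calls)):  # switch between calls[k - 1] and calls[k]
        for left_p, right_p in (("A", "B"), ("B", "A")):
            score = (sum(c == left_p for _, c in calls[:k])
                     + sum(c == right_p for _, c in calls[k:]))
            if score > best_score:
                best_score, best = score, (calls[k - 1][0], calls[k][0], (left_p, right_p))
    # Best explanation without recombination: the read matches one parent throughout.
    single = max(sum(c == "A" for _, c in calls), sum(c == "B" for _, c in calls))
    if best is None or best_score <= single:
        return None
    return best


isolate_like = "AAAAAAAAAA"  # hypothetical parent A (isolate type)
env_like = "CCCCCCCCCC"      # hypothetical parent B (env type)
result = best_single_breakpoint("AAAAACCCCC", isolate_like, env_like)  # (4, 5, ('A', 'B'))
```

The breakpoint is reported as an interval between two informative sites, because the exact crossover position cannot be resolved between sites where the parents are identical.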





SI Materials and Methods

fer1 Genome Assembly and Annotation. The genome sequence of F. acidarmanus fer1 was obtained from 41,779 small-insert library reads with an average read length of >600 bp, resulting in ~13-fold redundancy. In addition, 1,537 end sequences with an average read length of >765 bp were generated from a fosmid library (average insert size, 36 kb). Small-insert shotgun sequencing libraries were created in 1999 and 2000 from cells maintained in liquid culture. Clone manipulation and sequencing were performed as described previously (1). Large-insert fosmid libraries were generated and sequenced in fall 2003.
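The ~13-fold redundancy relates read number N, mean read length L, and genome length G through the average coverage c = N·L/G. The sketch below back-calculates the genome length implied by the stated figures, using 600 bp (the stated lower bound on mean read length); the result, roughly 1.93 Mb, is a derived estimate rather than a reported genome size, and because the true mean read length exceeds 600 bp it slightly underestimates G at exactly 13-fold. The Lander-Waterman expectation e^(-c) for the fraction of bases left unsequenced is included to show why ~13-fold redundancy leaves very little of the genome uncovered by chance. Function names are hypothetical.

```python
import math


def coverage(n_reads, mean_read_len, genome_len):
    """Average sequence coverage (redundancy): c = N * L / G."""
    return n_reads * mean_read_len / genome_len


def implied_genome_len(n_reads, mean_read_len, c):
    """Solve c = N * L / G for G."""
    return n_reads * mean_read_len / c


def expected_unsequenced_fraction(c):
    """Lander-Waterman expectation for the fraction of bases with zero coverage: e^(-c)."""
    return math.exp(-c)


# Figures from the text: 41,779 small-insert reads, mean length >600 bp, ~13-fold redundancy.
g = implied_genome_len(41_779, 600, 13)  # about 1.93e6 bp
```

Real assemblies leave more gaps than e^(-c) predicts because cloning bias and repeats violate the model's assumption of uniformly random read placement.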

All generated sequences were assembled using the PHRAP assembly tool (2), JAZZ (3), and the ATLAS whole-genome assembly suite (4). Independent outputs from the three assemblers were compared using BLAST (5) and the MUMmer 3.0 software package (6) to verify their consistency. Assembly discrepancies were corrected and verified manually using mate-pair information. ORFs likely to encode proteins were predicted using CRITICA (7) and ARTEMIS (8). Predicted ORFs and their annotations were curated manually in ARTEMIS. The set of predicted ORFs was searched against the PFAM, PROSITE, PRODOM, SMART, and COG databases, in addition to BLASTP searches against the NCBI NR database and the membrane transport protein classification database TC-DB (9). The tRNAscan-SE package was used for tRNA identification (10), and the TMHMM Server v. 2.0 (www.cbs.dtu.dk/services/TMHMM/) was used to predict transmembrane helices in predicted proteins. Functional classification of annotated gene products was verified using KEGG and the comprehensive enzyme information system BRENDA (11).
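Mate-pair information constrains an assembly because the two end reads of a clone should face each other on the same contig at a distance close to the library's insert size. The sketch below classifies one pair under those assumptions, using the 36-kb mean fosmid insert from the text; the `Placement` record, the status labels, and the 6-kb tolerance are hypothetical choices for illustration, not parameters reported for this assembly. Pairs repeatedly failing the check in the same region point to candidate misassemblies for manual inspection.

```python
from dataclasses import dataclass


@dataclass
class Placement:
    """Where one end read of a clone aligns in the assembly (hypothetical record)."""
    contig: str
    start: int     # 0-based leftmost aligned coordinate
    end: int       # exclusive rightmost aligned coordinate
    forward: bool  # True if the read aligns to the + strand


def mate_pair_status(r1, r2, mean_insert, tolerance):
    """Classify a pair from an inward-facing (forward/reverse) end-sequenced library."""
    if r1.contig != r2.contig:
        return "cross_contig"  # may support scaffolding; not judged here
    left, right = sorted((r1, r2), key=lambda r: r.start)
    if not left.forward or right.forward:
        return "bad_orientation"  # mates should point toward each other
    insert = right.end - left.start  # implied clone insert length
    if abs(insert - mean_insert) > tolerance:
        return "bad_distance"
    return "ok"


a = Placement("contig1", 1_000, 1_800, True)
b = Placement("contig1", 36_200, 37_000, False)
status = mate_pair_status(a, b, mean_insert=36_000, tolerance=6_000)  # "ok"
```

A fixed tolerance is the simplest choice; a tolerance based on the empirical spread of insert sizes from pairs on well-supported contigs would be less arbitrary.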

Differences in sequence content between the small- and large-insert libraries, both constructed from DNA of fer1 maintained in liquid culture over 4 years, further illustrate that phage-related gene gain can be very rapid. In 2003, shotgun sequencing of one of the fosmids yielded 15 kb of sequence (14 genes) not represented in the earlier small-insert libraries. The remaining 28 kb of sequence on this clone mapped identically to a contiguous region of the fer1 genome. The novel region of this fosmid clone contained numerous hypothetical genes and genes involved in DNA replication and repair. We infer that a prophage insertion event occurred in the cultured isolate after the small-insert libraries were made and before the large-insert libraries were constructed.
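Identifying the novel segment of such a fosmid amounts to interval arithmetic: merge the parts of the clone that align to the existing genome sequence and report what is left. In the minimal sketch below, the clone length (43 kb = 15 kb novel + 28 kb mapped) follows from the text, but the alignment coordinates are hypothetical, since the position of the novel segment on the clone is not given; the function name is likewise invented for illustration.

```python
def uncovered_spans(clone_length, aligned_intervals):
    """Maximal [start, end) spans of a clone not covered by any aligned interval."""
    spans, cursor = [], 0
    for start, end in sorted(aligned_intervals):
        if start > cursor:
            spans.append((cursor, start))  # gap before this alignment
        cursor = max(cursor, end)          # extend the covered prefix (handles overlaps)
    if cursor < clone_length:
        spans.append((cursor, clone_length))
    return spans


# Hypothetical layout: two overlapping alignments cover the first 28 kb of a 43-kb clone.
novel = uncovered_spans(43_000, [(0, 12_000), (11_500, 28_000)])  # [(28000, 43000)]
novel_length = sum(end - start for start, end in novel)           # 15000
```

Sorting once and sweeping with a cursor keeps the computation linear after the sort and merges overlapping alignments without a separate merge step.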

1. Detter JC, Jett JM, Lucas SM, Dalin E, Arellano AR, Wang M, Nelson JR, Chapman J, Lou Y, Rokhsar D, et al. (2002) Genomics 80:691-698.

2. Ewing B, Green P (1998) Genome Res 8:186-194.

3. Aparicio SJ, Chapman J, Stupka E, Putnam N, Chia NJ, Dehal P, Christoffels A, Rash S, Hoon S, Smit A, et al. (2002) Science 297:1301-1310.

4. Havlak P, Chen R, Durbin KJ, Egan A, Ren Y, Song XZ, Weinstock GM, Gibbs RA (2004) Genome Res 14:721-732.

5. Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ (1990) J Mol Biol 215:403-410.

6. Kurtz S, Phillippy A, Delcher AL, Smoot M, Shumway M, Antonescu C, Salzberg SL (2004) Genome Biol 5:R12.

7. Badger JH, Olsen GJ (1999) Mol Biol Evol 16:512-524.

8. Rutherford K, Parkhill J, Crook J, Horsnell T, Rice P, Rajandream M-A, Barrell B (2000) Bioinformatics 16:944-945.

9. Busch W, Saier MH (2002) CRC Crit Rev Biochem Mol Biol 37:287-337.

10. Lowe TM, Eddy SR (1997) Nucleic Acids Res 25:955-964.

11. Schomburg I, Chang A, Ebeling C, Gremse M, Heldt C, Huhn G, Schomburg D (2004) Nucleic Acids Res 32:D431-D433.

PNAS February 6, 2007 vol. 104 no. 6 1883-1888