Genome analysis of the smallest free-living eukaryote Ostreococcus tauri unveils many unique features

Derelle et al. 10.1073/pnas.0604795103.

Supporting Information

Files in this Data Supplement:

Supporting Table 2
Supporting Table 3
Supporting Figure 4
Supporting Figure 5
Supporting Text
Supporting Figure 6
Supporting Table 4
Supporting Table 5
Supporting Figure 7
Supporting Table 6
Supporting Table 7
Supporting Table 8
Supporting Figure 8
Supporting Table 9
Supporting Table 10
Supporting Figure 9
Supporting Table 11
Supporting Table 12





Supporting Figure 4

Fig. 4. Estimation of the genome size of Ostreococcus tauri. (A) Evolution of assembly size as a function of sequence accumulation. The size of the O. tauri assembly (Mb) is shown as a function of the total number of bases read. (B) Pulsed-field gel electrophoresis analysis of the O. tauri genome. Lane 1, O. tauri genome migrated for 72 h; lane 2, concatemerized λ phage DNA as molecular weight marker; lane 3, detail of the migration of the largest O. tauri chromosomes. This latter electrophoresis was carried out for 88 h in a 0.8% agarose gel containing a dye (bis-benzimide-PEG conjugate; Q-Bioanalytic, Germany) that binds preferentially to GC-rich regions (1). All of the bands seen on the gel hybridized with a telomeric probe, indicating that all of them are of nuclear origin and not chloroplastic or mitochondrial.

1. Wawer, C., Rüggeberg, H., Meyer, G., & Muyzer, G. (1995) Nucleic Acids Res. 23, 4928–4929.





Supporting Figure 5

Fig. 5. Physical assignment of Ostreococcus tauri genome sequences to chromosomes. (A) Purified chromosomes 1 (lane 1), 2 (lane 2), and 3 (lane 3) labeled and hybridized to pulsed-field gel electrophoresis of O. tauri chromosomes. (B) Hybridization of the O. tauri genomic BAC library spotted on macroarrays with purified chromosomes 2 (array 1) and 3 (array 2).





Supporting Figure 6

Fig. 6. Markov model analysis of the Ostreococcus tauri chromosomes. To reveal similarities and differences in the structural organization of the chromosomes, a hidden Markov model (HMM) was used to divide the genome into parts with a similar genomic structure using the SHOW software (1). To apply this model, a hypothesis about differences in coding and noncoding structure has to be stated. The HMM can then test this hypothesis by showing how well the observed sequences fit it. We postulated that three types of compositional structures exist in O. tauri: one for chromosome 2, one for chromosome 19, and one for the rest of the chromosomes. Based on this hypothesis, an HMM with seven states was designed with three coding states (one for each type of structure, modeled by a fifth-order Markov model), three intron states (one for each type of structure, modeled by a second-order Markov model), and one intergenic state (same for all three types of structures, modeled by a zero-order Markov model). State transition probabilities for the HMM were defined based on gene statistics. The results of our analysis showed that only two coding states exist: one for chromosomes 2 and 19 (Lower) and one for the rest of the chromosomes (Upper). This figure displays the probability for each chromosome to be in one or the other state and shows that chromosomes 2 and 19 can be distinguished from the other chromosomes, based on their coding structure.

1. Nicolas, P., Bize, L., Muri, F., Hoebeke, M., Rodolphe, F., Ehrlich, S.-D., Prum, B., & Bessières, P. (2002) Nucleic Acids Res. 30, 1418–1426.
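
The Fig. 6 legend describes states whose sequence composition is modeled by fixed-order Markov models. As a rough illustration of that building block only, the sketch below trains two fixed-order Markov chains on toy sequences and compares their log-likelihoods for a query sequence. It is not the SHOW software and not the seven-state HMM of the analysis: the function names, the order of 2, the pseudocount, and the toy sequences are all assumptions made for the example.

```python
import math
from collections import defaultdict

ALPHABET = "ACGT"

def train_markov(sequences, order, pseudocount=1.0):
    """Estimate P(next base | previous `order` bases) with additive smoothing."""
    counts = defaultdict(lambda: defaultdict(float))
    for seq in sequences:
        for i in range(order, len(seq)):
            counts[seq[i - order:i]][seq[i]] += 1

    def prob(context, base):
        ctx = counts.get(context, {})
        total = sum(ctx.values()) + pseudocount * len(ALPHABET)
        return (ctx.get(base, 0.0) + pseudocount) / total

    return prob

def log_likelihood(seq, prob, order):
    """Sum of log transition probabilities along the sequence."""
    return sum(math.log(prob(seq[i - order:i], seq[i]))
               for i in range(order, len(seq)))

def log_odds(seq, prob_a, prob_b, order):
    """Positive values favor model A, negative values favor model B."""
    return log_likelihood(seq, prob_a, order) - log_likelihood(seq, prob_b, order)

# Toy training data (hypothetical, not O. tauri sequence).
ORDER = 2
model_gc = train_markov(["GCGCGGCCGCGCGGCC"], ORDER)
model_at = train_markov(["ATATTAATATATTAAT"], ORDER)

print(log_odds("GCGGCC", model_gc, model_at, ORDER) > 0)  # favors the GC-rich model
print(log_odds("ATTAAT", model_gc, model_at, ORDER) < 0)  # favors the AT-rich model
```

A full HMM would add transition probabilities between such states and compute per-position state probabilities along each chromosome, which is the quantity plotted in the figure.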





Supporting Figure 7

Fig. 7. Intron heterogeneity of the Ostreococcus tauri genome. Comparison of chromosome 2 intron structure with that of other chromosomes. (a) Size distribution (bp) of documented small-type (A, red) and "normal"-type (B, blue) introns in chromosome 2, and of documented introns from other chromosomes (C, green). (b) Intron composition and splicing motifs (donor, acceptor, and branch site) of the three intron types. Different font sizes indicate the probability of a particular nucleotide at the respective motif position. The %G+C and mean size for each of the intron types are also indicated.





Supporting Figure 8

Fig. 8. Examples of gene fusion in Ostreococcus tauri. Bpnt-HADL: N-terminal domain, 5'-bisphosphate nucleotidase (Bpnt); C-terminal domain, haloacid dehalogenase (HADL). MBD4-FTSH2: N-terminal domain, putative mismatch-specific glycosylase Mbd4; C-terminal domain, cell division protein FtsH2. PNDO-SelD: Putative bifunctional protein; N-terminal domain, putative pyridine nucleotide-disulfide oxidoreductase family protein (PNDO); C-terminal part, selenium phosphate synthase (SPS or SelD). A similar situation of gene fusion is found in several cyanobacteria. Nii-Rb-Cb5R: Putative NADH:nitrite reductase, built from a ferredoxin:nitrite reductase at the N terminus, a rubredoxin-like domain downstream, and a cytochrome b5 reductase at the C terminus. TPS-GTPase: Fusion of thiamin phosphate synthase (TPS) at the N terminus with GTPase at the C terminus. Both enzymes are found on separate genes in bacteria but with the same organization in plants. Ckase-RPE: C-terminal domain, family of carbohydrate kinases and putative xylulose kinase; N-terminal domain, putative D-ribulose-5-phosphate 3-epimerase. SK-PPE: C-terminal domain, ribulose-phosphate 3-epimerase (PPE); N-terminal domain, sugar kinase or xylulokinase (prokaryotic). Both genes are involved in carbohydrate metabolism, but one appears to be of bacterial origin and the other eukaryotic. LYC-CYCbe: The lycopene ε (N-term) and β (C-term) cyclase genes, encoding enzymes necessary for the synthesis of β- and ε-carotene, respectively, are fused into a single gene. NCED-HYP: A homolog of the 9-cis-epoxycarotenoid dioxygenase (NCED) gene, involved in abscisic acid (ABA) biosynthesis, is fused with a conserved hypothetical gene of unknown function. Given the functional relatedness of other fused genes, this latter example supports a role in ABA biosynthesis of the as yet uncharacterized gene in other organisms.





Supporting Figure 9

Fig. 9. (A) Comparison of nitrate assimilation clusters in Ostreococcus tauri (chromosome 10) and Chlamydomonas reinhardtii (V3.Scaffold 30). Nar, plastid nitrite transporter; Nia, nitrate reductase apoenzyme [the functional NIA protein reduces nitrate to nitrite using NAD(P)H through the contribution of three redox cofactors: FAD, heme, and MoCo (molybdenum cofactor)]; Nii, plastid-targeted nitrite reductase apoenzyme [the functional NII reduces nitrite to ammonium using ferredoxin through a siroheme-iron sulfur cofactor; in Ostreococcus, NII comprises two additional redox domains, rubredoxin-like and its corresponding reductase, suggested to allow reduction of nitrite directly from NAD(P)H]; Nar2, nitrate high-affinity transporter accessory protein; Ntr2, nitrate high-affinity transporter; Snt, putative molybdate transporter; Cnx2, molybdenum cofactor biosynthesis protein (CNX2 performs the first step of MoCo synthesis together with CNX3, forming molybdopterin precursor Z); Maf4, Maf4-related hypothetical protein (although Maf4 is a MADS-box protein in higher plants, OtMaf4 lacks the features of a MADS-box protein); Cnx5, molybdenum cofactor biosynthesis protein (molybdopterin synthase sulfurylase); Cb5f, cytochrome b5 reductase (closer to the nitrate reductase FAD/heme reductase domain than to stand-alone cytochrome b5 reductase). (B) Urea assimilation cluster in O. tauri (chromosome 15). UreABC, Ni-dependent urease apoenzyme. The A, B, and C subunits are encoded by three separate genes in bacteria but form a single gene here, as in higher plants. UreGD, urease accessory proteins G and D are fused together, whereas, in other organisms, they are encoded by two separate genes; UreF, urease accessory protein F, which forms a complex with G and D and with apourease to allow nickel insertion, resulting in activation of urease; Dur3, urea high-affinity symporter.





Supporting Text

Characteristics of the Ostreococcus tauri genome draft. The O. tauri genome was sequenced by using a whole-genome shotgun approach from a culture isolated in 1995 from the Thau lagoon (France, 43°24'N, 3°36'E) (1-3). This strain has been deposited in the Roscoff culture collection (RCC 745, http://www.sb-roscoff.fr/Phyto/RCC/index.php). The raw genome data used in the assembly are summarized in Table 2 A and B and can be downloaded from the following site: http://bioinformatics.psb.ugent.be/genomes/ostreococcus_tauri/. Four independent shotgun libraries, two independent BAC libraries, and two independent cDNA libraries were sequenced (Table 2 A and B). Clones were sequenced by using primers at both ends of each insert, and sequences were trimmed to retain high-quality bases excluding vector sequence. Reads with trim lengths >100 bp were considered "passing." Insert size estimates are based on placement on the assembly and are consistent with gel cuts made in the creation of the libraries. The O. tauri genome size obtained by sequencing is 12.56 Mb (Fig. 4A), which is compatible with the size determined by pulsed-field gel electrophoresis (PFGE) (between 12 and 13 Mb; Fig. 4B).

An oriented-sequencing approach was used to close gaps in the shotgun assembly. Clones for which specific primer-based sequencing would allow extension of contigs were chosen and sequenced with 1,520 primers. The current assembly consists of 102 scaffolds (N50 = 143,558 bp), all linked by paired-end constraints. These scaffolds are themselves linked by paired-end BAC clone constraints into 20 superscaffolds corresponding to chromosomes. Completeness of the final assembly was measured by comparison with BAC end and EST sequences. We found that 0.1% of the former and 1.3% of the latter have no match against the current genome sequence.
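
For readers unfamiliar with the N50 statistic quoted above, the following minimal sketch shows one common way to compute it: the length of the shortest scaffold in the smallest set of longest scaffolds that together cover at least half of the total assembled length. The function name and the toy lengths are assumptions for illustration; they are not the O. tauri scaffold lengths, and the source does not state which N50 convention or tool was used.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths (in bp)."""
    if not lengths:
        raise ValueError("need at least one length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first scaffold that brings the running sum to half of the
        # total length defines the N50.
        if running >= half_total:
            return length

# Hypothetical scaffold lengths, for illustration only.
toy_scaffolds = [100, 200, 300, 400, 500]
print(n50(toy_scaffolds))  # 400
```

In the toy case the total is 1,500 bp, and the two longest scaffolds (500 + 400 = 900 bp) already exceed half of it, so the N50 is 400 bp.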

In parallel with this approach, a global strategy to physically assign all contigs to chromosomes was carried out. Individual chromosomes could be purified from PFGE by electrodialysis, except for chromosomes 1-2, 10-11, and 12-14, which could not be separated. After random priming labeling, each purified chromosome hybridized only to itself with a low background on other chromosomes, confirming that very few repeated sequences are present in the O. tauri genome (Fig. 5A). The O. tauri BAC libraries spotted on macroarrays were hybridized with purified chromosomes as probes, allowing the direct allocation of BAC clones, and consequently of contigs and scaffolds, to chromosomes (Fig. 5B). Lastly, for ≈100 contigs that could not be allocated to chromosomes by this approach, specific probes were designed and directly hybridized on PFGE (4). All these sequencing data show that the current assembly represents >98-99% of the chromosomal sequence and of the O. tauri genes.

Mitochondrial and Chloroplastic Genomes. Because total cellular DNA was used for the preparation of the shotgun libraries, mitochondrial and chloroplast sequences were obtained and identified by their high BLAST scores with higher plant or green alga organellar genomes. The mitochondrial genome sequence is in one circular contig of 44,237 bp and contains 72 genes. No introns have been identified. The chloroplast genome is also in one circular contig and has a total size of 71,666 bp, and 94 genes have been identified.

1. Courties, C., Perasso, R., Chrétiennot-Dinet, M.-J., Gouy, M., Guillou, L. & Troussellier, M. (1998) J. Phycol. 34, 844–849.

2. Guillou, L., Eikrem, W., Chrétiennot-Dinet, M.-J., Le Gall, F., Massana, R., Romari, K., Pedrós-Alió, C. & Vaulot, D. (2004) Protist 155, 193–214.

3. Chrétiennot-Dinet, M.-J., Courties, C., Vaquer, A., Neveux, J., Claustre, H., Lautier, J. & Machado, M. C. (1995) Phycologia 34, 285–292.

4. Derelle, E., Ferraz, C., Lagoda, P., Eychenié, S., Cooke, R., Regad, F., Sabau, X., Courties, C., Delseny, M., Demaille, J., et al. (2002) J. Phycol. 38, 1150–1156.

5. Wawer, C., Rüggeberg, H., Meyer, G. & Muyzer, G. (1995) Nucleic Acids Res. 23, 4928–4929.

6. Nicolas, P., Bize, L., Muri, F., Hoebeke, M., Rodolphe, F., Ehrlich, S. D., Prum, B. & Bessières, P. (2002) Nucleic Acids Res. 30, 1418–1426.

7. Ramesh, M. A., Malik, S.-B. & Logsdon, J. M., Jr. (2005) Curr. Biol. 15, 185–191.

8. Altschul, S. F., Gish, W., Miller, W., Myers, E. W. & Lipman, D. J. (1990) J. Mol. Biol. 215, 403–410.

9. Combet, C., Blanchet, C., Geourjon, C. & Deléage, G. (2000) Trends Biochem. Sci. 25, 147–150.

10. Haas, B. J., Wortman, J. R., Ronning, C. M., Hannick, L. I., Smith, R. K., Jr., Maiti, R., Chan, A. P., Yu, C., Farzad, M., Wu, D., et al. (2005) BMC Biol. 3, 7.

This Article

  1. PNAS August 1, 2006 vol. 103 no. 31 11647-11652