Single-cell genomics unveiled a cryptic cyanobacterial lineage with a worldwide distribution hidden by a dinoflagellate host
Edited by David M. Karl, University of Hawaii at Manoa, Honolulu, HI, and approved May 24, 2019 (received for review February 13, 2019)

Significance
Cyanobacteria are an important component of marine microbial ecology, and thus their biodiversity has been extensively studied. Here, through whole-genome sequencing, we discovered that a marine cyanobacterium in a symbiotic association with a unicellular eukaryote (OmCyn) represents a previously under-described lineage within an ecologically important cyanobacterial group. Our metagenomic analyses showed that the cyanobacterium OmCyn thrives in global oceans, further suggesting the existence of other cryptic cyanobacterial lineages that have been overlooked because of their symbiotic lifestyle. Via comparison with genomes of free-living relatives, the OmCyn genome was shown to have a reductive nature, which apparently resulted from intimate association with the host. Together, our results expand current understanding of the biology of cyanobacteria and marine microbial ecology.
Abstract
Cyanobacteria are one of the most important contributors to oceanic primary production and survive in a wide range of marine habitats. Much effort has been made to understand their ecological features, diversity, and evolution, based mainly on data from free-living cyanobacterial species. In addition, symbiosis has emerged as an important lifestyle of oceanic microbes and increasing knowledge of cyanobacteria in symbiotic relationships with unicellular eukaryotes suggests their significance in understanding the global oceanic ecosystem. However, detailed characteristics of these cyanobacteria remain poorly described. To gain better insight into marine cyanobacteria in symbiosis, we sequenced the genome of cyanobacteria collected from a cell of a pelagic dinoflagellate that is known to host cyanobacterial symbionts within a specialized chamber. Phylogenetic analyses using the genome sequence revealed that the cyanobacterium represents an underdescribed lineage within an extensively studied, ecologically important group of marine cyanobacteria. Metagenomic analyses demonstrated that this cyanobacterial lineage is globally distributed and strictly coexists with its host dinoflagellates, suggesting that the intimate symbiotic association allowed the cyanobacteria to escape from previous metagenomic studies. Furthermore, a comparative analysis of the protein repertoire with related species indicated that the lineage has independently undergone reductive genome evolution to a similar extent as Prochlorococcus, which has the most reduced genomes among free-living cyanobacteria. Discovery of this cyanobacterial lineage, hidden by its symbiotic lifestyle, provides crucial insights into the diversity, ecology, and evolution of marine cyanobacteria and suggests the existence of other undiscovered cryptic cyanobacterial lineages.
Cyanobacteria are one of the most successful groups of oxygen-producing photoautotrophs, occupying a broad range of habitats on Earth. Organisms in this group are ubiquitous in marine environments and play a vital role in oceanic biogeochemical cycles through metabolic abilities that include photosynthesis and nitrogen fixation (1–3). To obtain a better understanding of marine microbial ecology, the biodiversity and ecological features of marine cyanobacteria have been actively explored (4–8). Previous studies based on environmental DNA and cultivated strains have revealed that marine cyanobacteria display broad genetic diversity and have evolved by adapting to various ecological niches (9–11).
In addition to free-living species, which have been extensively studied, some cyanobacterial species have symbiotic relationships with various organisms (12, 13). Recent developments in DNA-sequencing technologies have revealed the genetic characteristics of one group of symbiotic cyanobacteria. Unicellular cyanobacteria group A (UCYN-A; also known as Candidatus Atelocyanobacterium thalassa) is a recently recognized, ecologically important cyanobacterial group in the marine environment. Recent studies have revealed that this cyanobacterial lineage symbiotically interacts with a lineage of unicellular photosynthetic eukaryotes (14–17). This symbiosis is thought to trace back to at least the Late Cretaceous Period (18). Whole-genome sequencing of UCYN-A showed that the cyanobacterial lineage has greatly reduced its metabolic capacities for photosynthesis and has become specialized for nitrogen fixation, thus furthering current understanding of marine cyanobacteria (16, 19). While detailed genetic features of some symbiotic cyanobacteria other than UCYN-A have been reported (20–22), those of most cyanobacterial symbionts remain unknown. These poorly understood symbiotic species potentially harbor biodiversity that is important to an understanding of cyanobacteria as a whole.
Pelagic heterotrophic dinoflagellates of the genus Ornithocercus have long been known to host cyanobacteria as symbionts (Fig. 1). Ornithocercus species cells are surrounded by a cellulosic covering known as the thecal plate, and crown-shaped extensions of the thecal plate form an extracellular chamber per cell (SI Appendix, Fig. S1). A number of coccoid cyanobacteria reside in the specialized chamber (Fig. 1 and SI Appendix, Fig. S1); these extracellular symbionts are also called phaeosomes (12, 23). Despite the existence of phaeosomes being first recorded over 100 y ago (23), there is no laboratory culture or any report of successful cultivation of the symbiont and detailed characteristics of the cyanobacteria surviving in these chambers remain poorly understood. Here, we sequenced the genome of the cyanobacteria isolated from the dinoflagellate Ornithocercus magnificus (Fig. 1) using single-cell genomics technology. Analyses based on the genome sequence revealed that the cyanobacterium represents an underdescribed lineage of a cyanobacterial clade, of which biodiversity has been extensively studied and which has independently undergone reductive genome evolution. Analyses using metagenomic data from the Tara Oceans Expedition further suggest worldwide distribution of this cyanobacterial lineage and the potential existence of other undiscovered symbiotic cyanobacterial lineages.
Fig. 1. Microscopy images of an Ornithocercus magnificus dinoflagellate with its cyanobacterial symbiont. (Left) Bright-field image of O. magnificus from the same sampling area where the individual used in the present study was collected, showing brown cyanobacterial symbionts (arrowheads) residing inside the symbiotic chamber. (Right) Blue-light epifluorescence image of the same individual at the same angle of view as that of the left image. The symbionts emit yellow autofluorescence. Arrowheads point to the same cyanobacterial cells as those indicated in the bright-field image.
Results and Discussion
Genome Sequence and Phylogenetic Analyses of Cyanobacteria Isolated from a Pelagic Dinoflagellate.
A single O. magnificus cell was obtained from seawater collected off Shimoda Bay, Japan, and cyanobacterial symbiont cells were isolated from a chamber of the host (SI Appendix, Fig. S2). The genomes of the isolated cyanobacterial cells were amplified and analyzed using two high-throughput sequencing platforms, Illumina MiSeq and Oxford Nanopore MinION. From scaffolds assembled using the short and long sequencing reads from the MiSeq and MinION systems, respectively, we successfully recovered ∼1.87 Mbp of sequence (SI Appendix, Table S1) representing the genome sequence of the cyanobacteria isolated from O. magnificus (hereafter referred to as OmCyn). The sequence showed particular similarity to genomes of the cyanobacterial lineage referred to as the marine picocyanobacterial clade, consisting of the genera Synechococcus and Prochlorococcus. This is consistent with the results of a previous study (24), which showed that the majority of prokaryotic 16S ribosomal RNA gene (16S rDNA) sequences amplified from marine dinoflagellates with cyanobacterial symbionts have high similarities to Synechococcus/Prochlorococcus species. Our phylogenetic analysis of 16S rDNA sequences confirmed the phylogenetic relationship between OmCyn and the marine picocyanobacteria (SI Appendix, Fig. S3). The OmCyn sequence made a monophyletic clade along with previously reported sequences. Most of those were a part of diverse 16S rDNA sequences directly amplified from dinoflagellate cells with cyanobacterial symbionts that are akin to O. magnificus (24). Biodiversity within the picocyanobacterial clade has been extensively studied, as these cyanobacterial species numerically dominate global oceans (7, 25). 
However, we detected no strong phylogenetic affinity between the clade of cyanobacterial sequences from dinoflagellates, including OmCyn, and previously explicitly described marine picocyanobacterial lineages (7) based on traditional phylogenetic markers for this group, namely 16S rDNA (SI Appendix, Fig. S3) and the internal transcribed spacer (ITS; SI Appendix, Fig. S4). The two trees showed different phylogenetic positions of the clade comprising the cyanobacterial sequences from dinoflagellates: The ITS tree suggested the clade is basal to a well-defined Synechococcus clade, known as Synechococcus subcluster 5.1 (7), while the 16S rDNA tree placed the clade as sister to one of the clades within Synechococcus subcluster 5.1, named clade V. Nevertheless, since the backbone of neither tree received high bootstrap support, we could not determine the phylogenetic position of OmCyn from these single-gene phylogenetic analyses. To reveal the position of OmCyn within the phylogeny of marine picocyanobacteria, we carried out phylogenomic analyses using datasets consisting of 40 protein sequences (Fig. 2, Left, and SI Appendix, Fig. S5). The resultant trees clearly showed that OmCyn occupies a basal position to Synechococcus subcluster 5.1, in agreement with the ITS phylogeny (SI Appendix, Fig. S4). Synechococcus subcluster 5.1, together with OmCyn, was shown to be sister to another monophyletic clade consisting of Prochlorococcus species; this topology was strongly supported with high bootstrap values (Fig. 2, Left, and SI Appendix, Fig. S5). Based on this phylogeny, the OmCyn lineage likely emerged soon after the split of the Prochlorococcus lineage and before the divergence of Synechococcus subcluster 5.1.
Fig. 2. Maximum likelihood (ML) phylogenetic tree of a multiple-protein dataset and protein repertoire comparison between the marine picocyanobacteria and OmCyn. (Left) ML tree inferred from a concatenated dataset of 40 orthologous proteins. The dataset includes only taxa whose genomes have been completely sequenced. The ML tree for the full dataset comprising 50 taxa is shown in SI Appendix, Fig. S5. Thick branches indicate 100% bootstrap values based on ML nonparametric bootstrapping. (Right) Comparison of protein numbers in the nonredundant proteomes of marine picocyanobacteria and OmCyn. Each bar indicates a proteome size without functional redundancy (protein number in a species protein repertoire).
Metagenomic Analyses Reveal an Intimate Association of OmCyn with the Host Dinoflagellate and Its Worldwide Distribution.
Marine picocyanobacteria occupy a wide variety of ecological niches (7, 26). We attempted to determine the OmCyn distribution pattern using publicly available marine metagenomic data to reveal the habitat and behavior of the newly identified marine picocyanobacterial lineage in the natural environment. A read-mapping analysis using Tara Oceans short-read data (27–29) against OmCyn and 19 picocyanobacterial genome sequences revealed a marked difference in the occurrence pattern among sample size fractions between OmCyn and the remaining species examined. Reflecting their small cell size (<2 μm), the genome sequences of marine picocyanobacteria were most frequently (>70%; Fig. 3A) identified in samples from the 0.8–5-μm fraction relative to other fractions (5–20, 20–180, and 180–2,000 μm). In contrast, only 7% of OmCyn genome sequences were detected in samples from the 0.8–5-μm fraction, with 83% detected in the 20–180-μm fraction (Fig. 3A), inconsistent with the size of the symbiont cells (<5 μm) (30). This distribution pattern closely matches that of the 18S rDNA of O. magnificus, which hosts the OmCyn lineage, based on Tara Oceans metabarcoding analysis data (Fig. 3A) (31). The majority of the 18S rDNA V9 amplicons corresponding to the O. magnificus 18S rDNA sequence were from the 20–180-μm fraction (Fig. 3A), with only 3% from the 0.8–5-μm fraction. This distribution pattern for the host is reasonable, considering that O. magnificus cells are 75–115 μm in size (32); the low proportion of reads detected in the smaller fraction (0.8–5 μm) may have originated from free DNA in the water sample (e.g., DNA released from disrupted cells). OmCyn sequences were detected in the 20–180-μm samples from 51 of 57 Tara Oceans geographical stations, suggesting its wide distribution across the world's oceans (Fig. 3B). Consistent with the comparison of sample size fractions, global distribution patterns of the OmCyn genome sequence and the O. magnificus 18S rDNA V9 amplicon correlated most strongly with each other, relative to the free-living marine picocyanobacterial genomes (Fig. 3C). The result of our principal coordinate analysis (PCoA) on the distribution patterns also showed clear isolation of the OmCyn genome and the O. magnificus 18S rDNA V9 amplicon from the marine picocyanobacterial genomes (SI Appendix, Fig. S6A). Moreover, the sequence occurrence patterns for both the OmCyn genome and the O. magnificus rDNA amplicon among Tara Oceans metagenomic samples were predicted to be similarly associated with environmental physicochemical parameters (SI Appendix, Fig. S6B), further supporting the co-occurrence of OmCyn and O. magnificus.
Fig. 3. Distribution patterns in natural environments estimated on the basis of metagenomic and metabarcoding data. (A) Relative abundances among four size fractions. While the marine picocyanobacterial lineages were abundant in the 0.8–5-μm fraction, OmCyn was estimated to be most abundant in the 20–180-μm fraction, matching the abundance pattern of O. magnificus, the host dinoflagellate. The relative abundances of cyanobacterial lineages, including OmCyn, and O. magnificus were calculated based on Tara Oceans metagenomic data and 18S rRNA V9 metabarcoding data, respectively. (B) OmCyn genome-sequence read abundance in the 20–180-μm fraction from 57 Tara Oceans stations. Circle sizes indicate abundance of detected reads per 5 million metagenomic reads. (C) Correlation matrix of distribution patterns among marine picocyanobacteria, OmCyn, and O. magnificus.
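The quantities summarized in the figure above (reads scaled to 5 million metagenomic reads, relative abundance among size fractions, and pairwise correlation of distribution patterns) can be illustrated with a minimal Python sketch. The function names are hypothetical, and the use of Pearson correlation is an assumption made only for illustration; the actual procedure is described in SI Appendix, SI Materials and Methods.

```python
from math import sqrt

def reads_per_5m(mapped_reads, total_reads, scale=5_000_000):
    """Scale a mapped-read count to a fixed number of metagenomic reads."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return mapped_reads * scale / total_reads

def fraction_shares(counts_by_fraction):
    """Turn scaled counts per size fraction into shares that sum to 1."""
    total = sum(counts_by_fraction.values())
    if total == 0:
        return {name: 0.0 for name in counts_by_fraction}
    return {name: value / total for name, value in counts_by_fraction.items()}

def pearson(xs, ys):
    """Pearson correlation of two abundance profiles over the same stations.

    Assumption: the source does not state which correlation measure was used.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("profiles must have equal length of at least 2")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / sqrt(var_x * var_y)

# Hypothetical scaled counts for one genome across the four size fractions (in μm).
example = {"0.8-5": 7.0, "5-20": 5.0, "20-180": 83.0, "180-2000": 5.0}
shares = fraction_shares(example)
```

A symbiont that travels with a large host would be expected to show its highest share in the host's size fraction rather than in the fraction matching its own cell size.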
Given the strong correlation of sequence occurrence patterns for OmCyn and O. magnificus, along with reports that the symbiotic cyanobacteria of Ornithocercus species can be passed on to the symbiotic chambers of daughter cells through host cell division (12, 33), OmCyn cells are likely to reside exclusively within the symbiotic chambers of these dinoflagellates in the natural environment. This assumption could explain why the lineage including OmCyn was never explicitly described in previous surveys that extensively explored the genetic diversity of the marine picocyanobacterial clade. This may be attributable to the fact that studies focused on marine picocyanobacteria often use samples of small fraction size (i.e., filter size <20 μm) (34–38) to increase the sensitivity of analyses by filtering out nontarget organisms with larger cell size; such procedures may lead to oversight of species attached to larger organisms. This implies that we may still be missing symbiotic cyanobacterial lineages other than OmCyn. Interestingly, our metagenomic read-mapping analyses detected, in the eukaryote size fraction (20–180 μm; Fig. 4), a large number of metagenomic reads that did not exactly match but showed high similarity to the OmCyn genome sequence. Considering that Ornithocercus species other than O. magnificus and their relatives also have cyanobacterial symbionts (39, 40), and that some 16S rDNA sequences from dinoflagellate cells formed a monophyletic clade with the OmCyn sequence (SI Appendix, Fig. S3), these metagenomic reads may partially represent the genomic diversity of cyanobacteria in symbiotic relationships with dinoflagellates.
Fig. 4. Metagenomic-read recruitment plot of the OmCyn genome. A total of 250 million Tara Oceans metagenomic reads from each of four filter size fractions (0.8–5, 5–20, 20–180, and 180–2,000 μm), totaling 1 billion metagenomic reads, were used for the mapping. Each marker in the left plot indicates reads with ≥100-bp alignment lengths. The histogram on the right shows the number of mapped reads. The asterisk indicates the group of reads that did not exactly match the OmCyn genome sequence but had high similarity (>90% identity). The color code represents filter size fractions from which each metagenomic read was obtained.
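The filtering behind a recruitment plot of this kind can be sketched as follows. This is a minimal illustration rather than the pipeline used in the study: the `Alignment` record and the function names are hypothetical, the 100-bp and 90% thresholds come from the figure caption, and treating 100% identity as an "exact" match is an assumption.

```python
from collections import Counter
from dataclasses import dataclass

MIN_ALIGNED_LENGTH = 100      # bp, from the figure caption
NEAR_MATCH_IDENTITY = 90.0    # percent, from the figure caption
EXACT_IDENTITY = 100.0        # percent, assumed definition of an exact match

@dataclass(frozen=True)
class Alignment:
    """One mapped metagenomic read (hypothetical record layout)."""
    read_id: str
    size_fraction: str
    aligned_length: int
    identity: float

def classify(aln):
    """Return 'exact', 'near', 'distant', or None for too-short alignments."""
    if aln.aligned_length < MIN_ALIGNED_LENGTH:
        return None
    if aln.identity >= EXACT_IDENTITY:
        return "exact"
    if aln.identity > NEAR_MATCH_IDENTITY:
        return "near"
    return "distant"

def tally(alignments):
    """Count retained reads per (size fraction, identity class)."""
    counts = Counter()
    for aln in alignments:
        label = classify(aln)
        if label is not None:
            counts[(aln.size_fraction, label)] += 1
    return counts
```

In this scheme, the reads marked with an asterisk in the figure would correspond to the "near" class.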
Genome Streamlining.
Another remarkable feature of the OmCyn genome is its streamlined protein-coding gene content. Given the high genome-sequencing coverage (mean, 436-fold), and the fact that the sequence contains tRNA genes for all 20 amino acids, the scaffolds likely represent the complete, or nearly complete, OmCyn genome. The genome sequence of OmCyn was predicted to contain 1,846 protein-coding genes, which could be clustered into 1,726 protein types by orthologous protein clustering (41). This protein repertoire of OmCyn (1,726 protein types; Fig. 2, Right, and SI Appendix, Table S2) was significantly smaller (detected as an outlier by Grubbs’ test; α = 0.05) than the repertoires of other species within Synechococcus subcluster 5.1 (2,061–2,498 protein types; Fig. 2, Right). The size of the protein repertoire of OmCyn is comparable to those of Prochlorococcus species, which have the most reduced genomes among free-living cyanobacteria (Fig. 2, Right). The limited protein repertoire suggests that many protein-coding genes were lost during the evolution of the OmCyn lineage. To estimate the extent of OmCyn protein repertoire reduction, we assessed how the protein repertoires in each lineage have been altered from the ancestral state by comparing them with the putative protein repertoire of a common ancestor of the marine picocyanobacterial clade. The ancestral protein repertoire was defined as the protein set in common between total protein repertoires for each group of genomes included in Synechococcus subcluster 5.1 and the Prochlorococcus clade. This putative ancestral protein repertoire of marine picocyanobacteria consists of 2,114 proteins, and comparisons showed that, while the Synechococcus genomes have retained an average of 86% of ancestral proteins, the OmCyn genome possesses only 65% (Fig. 2, Right), supporting the view that a genome reduction event occurred in this lineage (see Dataset S1 for a list of proteins missing from OmCyn).
The same assessment revealed that Prochlorococcus species with small genomes (Prochlorococcus other than P. marinus MIT 9313) have retained only 69% of ancestral proteins. While genome size and retention of putative ancestral proteins are apparently equivalent in OmCyn and Prochlorococcus species with reduced genomes, genome reduction, including gene losses, appears to have occurred independently in the two lineages based on their phylogenetic relationship (Fig. 2, Left, and SI Appendix, Fig. S5). In support of this notion, the OmCyn genome retains most genes encoding the components of the phycobilisome, a large light-harvesting antenna complex, that were characteristically lost during Prochlorococcus evolution (26). To evaluate the impact of gene loss on metabolic function in OmCyn, we further analyzed the proteins that were predicted to be discarded from the OmCyn genome using the Kyoto Encyclopedia of Genes and Genomes (KEGG) orthology system (42). The analysis showed that the metabolic functional category that experienced the most severe reduction was membrane transport (SI Appendix, Fig. S7), which is responsible for interactions with extracellular environments. Similar trends in gene loss have been documented in the genomes of other bacterial symbionts (22, 43–45), implying that the symbionts have adapted to relatively stable environments (e.g., the symbiotic chamber of dinoflagellates).
In contrast to these gene losses, the cyanobacterial “core” protein set that is conserved among all complete genomes of free-living cyanobacteria analyzed in this study is mostly retained (96.9%; Fig. 2, Right) in the OmCyn genome. The high degree of universality of the core protein set appears to indicate the importance of these proteins for growth and function of cyanobacteria. The genome of OmCyn has likely been streamlined while retaining minimum metabolic integrity, including photosynthetic ability. A total of 20 cyanobacterial core proteins are missing in OmCyn (SI Appendix, Table S3); however, to our knowledge, none of these proteins are critical for growth or photosynthetic ability. It is noteworthy that the list of the missing cyanobacterial core proteins includes the products of five genes for DNA recombination and repair: recF, recG, recN, recO, and recR. Since such loss of genes for genetic recombination has been observed in various bacterial symbionts with reduced genomes (44, 46–48), it is likely to be a common phenomenon associated with reductive genome evolution. In contrast to OmCyn, which retains most of the cyanobacterial core protein set, intracellular cyanobacterial symbionts, which metabolically depend on their host cell for growth (16, 19, 21, 44), retain only 74–88% (SI Appendix, Table S4) of core proteins. Retention of the conserved protein set implies that OmCyn retains metabolic independence from its host cell, contrasting sharply with obligate intracellular cyanobacterial symbionts.
The genome streamlining of OmCyn cyanobacteria can be explained by the intimate association between the symbiont and its host; if the habitat of the cyanobacteria is limited to inside the O. magnificus symbiotic chamber, the effect of genetic drift should increase due to shrinkage of the effective population size (49). An increased effect of genetic drift would accelerate the loss of genes dispensable in the relatively mild and stable environment inside the symbiotic chamber.
The characteristics of OmCyn also shed light on another important question regarding the function of the symbiont within dinoflagellate cells. Some symbiotic cyanobacteria provide fixed nitrogen to their hosts (17, 21, 50); however, this cannot be the case for the relationship between OmCyn and O. magnificus since all species of the marine picocyanobacterial clade, including OmCyn, lack genes encoding proteins required for nitrogen fixation. As the OmCyn genome retains genes required for photosynthetic activity, and since Ornithocercus is a nonphotosynthetic dinoflagellate, a reasonable explanation for this symbiotic relationship would be that the symbiont provides a source of organic carbon for its host in open oceans, an oligotrophic habitat. It remains ambiguous how the host cell takes up nutrients from the symbionts, which reside in an extracellular space (i.e., the symbiotic chamber). An intriguing possibility that Ornithocercus species and their relatives ingest and digest their cyanobacterial symbionts has been raised, based on transmission electron microscopy observations (33, 51). In symbiont-bearing dinoflagellate cells, food vacuoles with inclusions resembling degraded cyanobacterial symbionts were observed (33). If true, given the intimate association between OmCyn and its host, cyanobacterial symbionts could be considered a “domesticated breed” farmed within the specialized chambers of the dinoflagellates.
Conclusions
Metagenomic analyses using high-throughput sequencing have provided opportunities for greater understanding of the diversity and structure of oceanic microbial communities, including organisms that have not been cultivated; however, because the interpretation of metagenomic data still depends heavily on knowledge of the properties of organisms inhabiting an environment (e.g., morphology, lifestyle, and whole-genome sequence), efforts to obtain basic knowledge about poorly understood organisms must be simultaneously made. In the present study, we focused on, and revealed the genome sequence of, a yet to be cultured cyanobacterium in symbiosis with a dinoflagellate. The genome sequence provided insights into its unique phylogenetic position and reductive genome evolution, and enabled the use of metagenomic data to estimate its global distribution and relationship with its host. Our findings suggest the possibility of cryptic microbial lineages that were overlooked in previous metagenomic analyses, and further emphasize the significance of symbiosis as an important lifestyle in oceanic microbial ecology.
Materials and Methods
Sampling, Cell Isolation, Genome Amplification, and Sequencing.
An Ornithocercus magnificus cell containing cyanobacterial symbionts (SI Appendix, Fig. S2A) was isolated from a natural sample collected using a microplankton net (mesh size 25 μm) in Shimoda Bay, Shizuoka, Japan, on 2017 September 14. The water temperature of the sampling area was 23.7 °C. The isolated dinoflagellate cell was transferred into sterilized fresh water, and the cyanobacterial symbionts residing in a symbiotic chamber were isolated using a microcapillary pipette under an inverted microscope (SI Appendix, Fig. S2B). The genomes of the isolated cyanobacterial cells were amplified and analyzed using two different high-throughput sequencing platforms, Illumina MiSeq and Oxford Nanopore MinION. Detailed descriptions of genome amplification and sequencing are provided in SI Appendix, SI Materials and Methods.
De Novo Genome Assembly, Scaffold Screening, and Genome Annotation.
Paired-end short reads from the two Illumina libraries were assembled into scaffolds with the nanopore reads using SPAdes (version 3.11.1) (52, 53) in hybrid assembling mode (54). Preliminary assessment of the resulting scaffolds showed that the majority exhibited similarity to marine picocyanobacterial genome sequences while some had similarity to proteobacterial sequences, indicating the presence of contaminating genome sequences. To remove the contaminating sequences, we conducted scaffold screening based on phylogenetic analyses (see SI Appendix, SI Materials and Methods for details). With this procedure, 16 scaffolds totaling 1.879 Mbp were retrieved as the OmCyn genome, while 8 scaffolds totaling 2.371 Mbp were judged to be derived from contaminating (noncyanobacterial) cells. Annotation of predicted open reading frames in the OmCyn genome and prediction of tRNA and rRNA genes were initially conducted with Prokka (55) using a prokaryotic protein database including Reference Sequence (RefSeq) protein sequences from a wide range of cyanobacterial lineages (SI Appendix, Table S6), particularly marine picocyanobacteria, and then checked manually. The annotated genome data are available from the DNA Data Bank of Japan/GenBank/European Molecular Biology Laboratory DNA databases under BioProject accession number PRJDB7787.
Phylogenetic Analyses.
Phylogenetic analyses based on the 16S rRNA gene, the internal transcribed spacer, and the multiprotein datasets were conducted as described in SI Appendix, SI Materials and Methods.
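As an illustration of how a multiprotein dataset is typically assembled before tree inference, the following Python sketch concatenates per-protein alignments into one supermatrix. It is not the authors' pipeline: the function name is hypothetical, and padding a taxon that lacks a protein with gap characters is an assumption that the source does not state.

```python
def concatenate_alignments(alignments, missing="-"):
    """Join per-protein alignments into one sequence per taxon.

    alignments: list of dicts mapping taxon name -> aligned sequence.
    Taxa absent from an alignment are padded with `missing` (an assumption).
    """
    taxa = sorted({taxon for aln in alignments for taxon in aln})
    parts = {taxon: [] for taxon in taxa}
    for aln in alignments:
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError("sequences within one alignment must share one length")
        width = lengths.pop()
        for taxon in taxa:
            parts[taxon].append(aln.get(taxon, missing * width))
    return {taxon: "".join(chunks) for taxon, chunks in parts.items()}

# Hypothetical toy input: two short protein alignments, one lacking taxon "B".
toy = [{"A": "MK", "B": "ML"}, {"A": "GG-"}]
supermatrix = concatenate_alignments(toy)
```

Every taxon in the resulting matrix has the same length, which is what downstream maximum likelihood software generally expects.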
Protein Repertoire Analysis.
The OrthoFinder (version 2.1.2) (41) clustering method was used to classify the proteomes predicted from cyanobacterial genomes, as well as that from the OmCyn genome, into orthologous protein groups, referred to as orthogroups. The overall statistics for the OrthoFinder analysis, including a list of cyanobacterial genomes used, are presented in SI Appendix, Table S2. By this orthologous protein–clustering procedure, functionally redundant proteins coded by a genome (i.e., products of gene duplications within a genome) were clustered into a single orthogroup. For each species, the sum of the total number of orthogroups detected in the proteome and the number of proteins unassigned to any orthogroups (proteins unique to a species) was defined as the protein repertoire size for the species, without functional redundancy. The cyanobacterial core protein set was defined as the protein repertoire conserved in all completely sequenced, free-living cyanobacterial genomes used for this analysis. The protein repertoire for the common ancestor of the marine picocyanobacteria was estimated based on the assumption that a protein in common between protein repertoires for two groups of genomes that are included in the Prochlorococcus clade and Synechococcus subcluster 5.1 should be possessed by the common ancestor of these two clades. Total orthogroups for each of the Prochlorococcus clade and Synechococcus subcluster 5.1 were separately extracted, and the intersection of the two sets was defined as the putative protein repertoire for the ancestral marine picocyanobacterium. Only completely sequenced genomes were used to estimate the protein repertoire in the ancestral marine picocyanobacterium. KEGG orthology ID assignment for the putative ancestral proteins (2,114 proteins) was performed using eggNOG-mapper (56) based on eggNOG 4.5 orthology data (57).
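The set definitions in this section can be made concrete with a short Python sketch. The data layout (a mapping from orthogroup ID to the proteins each species contributes) and all function names are hypothetical and do not reflect OrthoFinder's actual output format; the toy species labels are placeholders.

```python
def protein_repertoire(orthogroups, unassigned, species):
    """Protein types of one species: its orthogroups plus unassigned proteins."""
    types = {og for og, members in orthogroups.items() if members.get(species)}
    types |= {f"{species}:{pid}" for pid in unassigned.get(species, [])}
    return types

def clade_orthogroups(orthogroups, clade):
    """Orthogroups present in at least one genome of a clade."""
    return {og for og, members in orthogroups.items()
            if any(members.get(sp) for sp in clade)}

def ancestral_repertoire(orthogroups, clade_a, clade_b):
    """Orthogroups shared by two clades, taken as the putative ancestral set."""
    return clade_orthogroups(orthogroups, clade_a) & clade_orthogroups(orthogroups, clade_b)

def core_set(orthogroups, free_living):
    """Orthogroups present in every free-living genome considered."""
    return {og for og, members in orthogroups.items()
            if all(members.get(sp) for sp in free_living)}

def retention(repertoire, reference):
    """Fraction of a reference protein set found in a repertoire."""
    if not reference:
        raise ValueError("reference set is empty")
    return len(repertoire & reference) / len(reference)

# Hypothetical toy data: orthogroup -> species -> protein IDs.
orthogroups = {
    "OG1": {"ProA": ["p1"], "SynA": ["s1", "s2"], "Sym": ["x1"]},
    "OG2": {"ProA": ["p2"], "SynA": ["s3"]},
    "OG3": {"SynA": ["s4"], "Sym": ["x2"]},
    "OG4": {"ProA": ["p3"]},
}
unassigned = {"Sym": ["x3"]}

ancestor = ancestral_repertoire(orthogroups, ["ProA"], ["SynA"])
symbiont = protein_repertoire(orthogroups, unassigned, "Sym")
```

Because duplicated genes within a genome fall into the same orthogroup (as with `s1` and `s2` here), counting set members gives the repertoire size without functional redundancy.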
Abundance Estimation Using Tara Oceans Expedition Data and Co-Occurrence Analysis.
Relative abundances for OmCyn, free-living cyanobacteria, and O. magnificus in natural environments were estimated using metagenomic and metabarcoding data provided by the Tara Oceans Expedition (27). Detailed descriptions of the abundance estimation and co-occurrence analysis are provided in SI Appendix, SI Materials and Methods.
Acknowledgments
We are indebted to Kenjiro Hinode at Nagasaki University, Toshihiko Sato, Daisuke Shibata, Tomomi Kodaka, Jiro Takano, and all other staff at the Shimoda Marine Research Center, University of Tsukuba, for their support in sampling the dinoflagellate. We thank the Tara Oceans Consortium and sponsors who supported the Tara Oceans Expedition for making the data accessible. This work was supported by Japan Society for the Promotion of Science KAKENHI Grants (16H04826, 17K15164, 18KK0203, and JP16H06280) and Advanced Bioimaging Support platform (ABiS): Imaging Marine Organisms.
Footnotes
1To whom correspondence may be addressed. Email: nakayama.t{at}tohoku.ac.jp.
2Present address: Graduate School of Human and Environmental Studies, Kyoto University, Kyoto 606-8501, Japan.
Author contributions: T.N., M.N., Y.T., and Y.I. designed research; T.N., M.N., G.T., K.S., and K.I. performed research; T.N. analyzed data; and T.N., M.N., Y.T., G.T., K.S., K.I., Y.I., and M.K. wrote the paper.
The authors declare no conflict of interest.
This article is a PNAS Direct Submission.
Data deposition: The data reported in this paper are available from the DNA Data Bank of Japan (http://www.ddbj.nig.ac.jp/), NCBI GenBank database (https://www.ncbi.nlm.nih.gov/genbank), and the European Molecular Biology Laboratory (https://www.embl.de/) DNA database under BioProject accession number PRJDB7787.
See Commentary on page 15757.
This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1902538116/-/DCSupplemental.
Published under the PNAS license.
References
- ↵
- W. K. W. Li
- ↵
- D. G. Capone,
- J. P. Zehr,
- H. W. Paerl,
- B. Bergman,
- E. J. Carpenter
- ↵
- ↵
- F. Partensky,
- J. Blanchot,
- D. Vaulot
- ↵
- D. J. Scanlan et al
- ↵
- ↵
- ↵
- ↵
- ↵
- ↵
- ↵
- S. Ohtsuka,
- T. Suzaki,
- T. Horiguchi,
- N. Suzuki,
- F. Not
- J. Decelle,
- S. Colin,
- R. A. Foster
- ↵
- E. J. Carpenter,
- R. A. Foster
- ↵
- ↵
- J. P. Zehr et al
- ↵
- ↵
- A. W. Thompson et al
- ↵
- ↵
- ↵
- ↵
- T. Nakayama et al
- ↵
- T. Nakayama,
- Y. Inagaki
- ↵
- F. Schütt
- ↵
- ↵
- P. Flombaum et al
- ↵
- ↵
- S. Sunagawa et al
- ↵
- S. Pesant et al
- ↵
- Q. Carradec et al
- ↵
- ↵
- C. de Vargas et al
- ↵
- Y. B. Okolodkov
- ↵
- I. A. N. Lucas
- ↵
- G. K. Farrant et al
- ↵
- B. Díez et al
- ↵
- ↵
- ↵
- ↵
- ↵
- V. Krishnamurti
- R. E. Norris
- ↵
- ↵
- ↵
- ↵
- ↵
- ↵
- ↵
- ↵
- ↵
- ↵
- T. Nakayama,
- Y. Inagaki
- ↵
- ↵
- ↵
- ↵
- ↵
- ↵
- ↵
Article Classifications
- Biological Sciences
- Ecology