Error, signal, and the placement of Ctenophora sister to all other animals
Edited by David M. Hillis, The University of Texas at Austin, Austin, TX, and approved March 24, 2015 (received for review February 18, 2015)

Significance
Traditional interpretation of animal phylogeny suggests traits, such as mesoderm, muscles, and neurons, evolved only once given the assumed placement of sponges as sister to all other animals. In contrast, placement of ctenophores as the first branching animal lineage raises the possibility of multiple origins of many complex traits considered important for animal diversification and success. We consider sources of potential error and increase taxon sampling to find a single, statistically robust placement of ctenophores as our most distant animal relatives, contrary to the traditional understanding of animal phylogeny. Furthermore, ribosomal protein genes are identified as creating conflict in signal that caused some past studies to recover a sister relationship between ctenophores and cnidarians.
Abstract
Elucidating relationships among early animal lineages has been difficult, and recent phylogenomic analyses place Ctenophora sister to all other extant animals, contrary to the traditional view of Porifera as the earliest-branching animal lineage. To date, phylogenetic support for either ctenophores or sponges as sister to other animals has been limited and inconsistent among studies. Lack of agreement among phylogenomic analyses using different data and methods obscures how complex traits, such as epithelia, neurons, and muscles, evolved. A consensus view of animal evolution will not be accepted until datasets and methods converge on a single hypothesis of early metazoan relationships and putative sources of systematic error (e.g., long-branch attraction, compositional bias, poor model choice) are assessed. Here, we investigate possible causes of systematic error by expanding taxon sampling with eight novel transcriptomes, strictly enforcing orthology inference criteria, and progressively examining potential causes of systematic error while using both maximum-likelihood with robust data partitioning and Bayesian inference with a site-heterogeneous model. We identified ribosomal protein genes as possessing a conflicting signal compared with other genes, which caused some past studies to infer ctenophores and cnidarians as sister. Importantly, biases resulting from elevated compositional heterogeneity or elevated substitution rates are ruled out. Placement of ctenophores as sister to all other animals and sponge monophyly are both strongly supported under multiple analyses herein.
Resolving relationships among extant lineages at the base of the metazoan tree is integral to understanding evolution of complex animal traits, including nervous systems and gastrulation. Historically, sponges and placozoans, both of which have relatively simple body plans and lack neurons, have been considered to diverge from other animals earlier than ctenophores, cnidarians, and bilaterians (1). Phylogenomic studies have resulted in controversial hypotheses placing either Placozoa (Fig. 1A) (2), ctenophores (ctenophore-sister hypothesis) (Fig. 1B) (3–7), or a clade of ctenophores and sponges (Fig. 1C) (6) as sister to all remaining animals. Others (7–10) have claimed nontraditional findings resulted from systematic error and argued for traditional placement of sponges as sister to all remaining animals (Eumetazoa, or Porifera-sister hypothesis) (Fig. 1D) and a sister relationship between ctenophores and cnidarians (Coelenterata) (Fig. 1D). Limited statistical support for various hypotheses and conflict among, and even within, studies have undermined confidence in our understanding of early animal evolution. Basal metazoan relationships must be resolved with greater consistency before a consensus viewpoint is widely accepted.
Fig. 1. Phylogenetic hypotheses from previous molecular studies. (A) Placozoa-sister hypothesis (2). (B) Ctenophora-sister hypothesis (3–7). Placozoa was not included in some studies that found support for this topology. (C) Ctenophora + Porifera-sister hypothesis (6). (D) Traditional Porifera-sister hypothesis (7–10).
Long-branch attraction (LBA) (11), which occurs when two divergent lineages are artificially inferred as related because of substitutional saturation (11), is perhaps the most often invoked explanation for controversial or spurious phylogenetic results (7–10, 12, 13). Additional sources of systematic error include poor taxon or character sampling (10, 14, 15), large amounts of missing data (16), and model misspecification (16, 17). Such errors have been implicated as influencing the position of ctenophores in metazoan phylogeny studies (5–8). For example, Ryan et al. (6) recovered a sister relationship between sponges and ctenophores in analyses where taxa with high amounts of missing data were excluded, and support for this was highest in Bayesian inference with the CAT (17) substitution model. However, ctenophores were recovered as sister to all other extant animals in maximum-likelihood analyses with greater taxon sampling (Bayesian inference never converged for datasets with more than 19 taxa). The CAT model is a site-heterogeneous model that may handle LBA artifacts better than site-homogeneous substitution models like GTR (18). Therefore, LBA plausibly influenced the phylogenetic position of ctenophores in analyses of Ryan et al. (6) that recovered ctenophores-sister. In Moroz et al. (5), strong nodal support for ctenophores sister to all other extant animals disappeared when both the strictest orthology criteria were enforced and ctenophore taxon sampling increased. Thus, further consideration of systematic error influencing phylogeny reconstruction at the base of the animal tree is desirable.
The ctenophore-sister hypothesis has challenged our understanding of early metazoan evolution, but given conflicting results (2–10), this and other hypotheses must be carefully scrutinized. Ideally, if robust datasets are assembled and causes of systematic error are accounted for, different datasets and analytical methods will converge on a single phylogenetic hypothesis (19). However, practical barriers exist in assembling robust datasets free of systematic error. For example, phylogenomic datasets are prone to missing data given the incomplete nature of transcriptome and even genome sequences, and orthology determination among distantly related species can be difficult (20, 21). Computational limitations of complex phylogenetic methods can also prevent using what may be the best theoretical phylogenetic method. Nevertheless, both data quality and appropriate methods should be emphasized if deep relationships of any organismal group are to be robustly resolved.
Here, we have assembled a more comprehensive phylogenomic dataset of metazoan lineages that branched early in animal evolution than previous studies to alleviate taxon sampling concerns. We have sequenced transcriptomes of eight additional species and used other deeply sequenced publicly available transcriptomes (including some not used in past studies). Additionally, we use a number of data-filtering steps to explore the sensitivity of these results to potential sources of error. This process includes strict orthology determination, removal of taxa and genes that may cause LBA, and removal of heterogeneous genes that may cause model misspecification. Regardless of how data were filtered, all maximum-likelihood analyses with model partitioning and all Bayesian inference analyses using a site-heterogeneous model recover ctenophores as sister to all other animals with strong support. We identify overreliance on ribosomal protein genes in some datasets (7, 9) as the source of incongruence among previous phylogenomic studies. We also find strong support for sponge monophyly in contrast to previous reports (7, 22–24).
Results
Datasets and Accounting for Biases.
Orthology filtering of transcriptome and genome data from 76 species resulted in 251 orthologous groups (OGs) and 81,008 aligned amino acid sites (Tables S1 and S2). TreSpEx (25) further identified 83 “certain” paralogs (i.e., sequences TreSpEx classified as high-confidence paralogs) from 10 OGs. TreSpEx also identified 2,684 “uncertain” paralogs (i.e., sequences TreSpEx classified as possibly, but not definitively, paralogous) from 104 OGs. Datasets with certain and both certain and uncertain paralogs pruned were starting points for progressively filtering other causes of systematic error. Overall, 25 hierarchical datasets that had progressively fewer characters, but controlled for more potential causes of systematic errors, were assembled (Fig. 2 and Table S2) (all datasets have been deposited on figshare, doi 10.6084/m9.figshare.1334306). The percentage of gene occupancy and missing data ranged from 70–82% and 35–44%, respectively (Table S2). Other than progressive data filtering, differences between datasets analyzed here and those used in previous studies of basal metazoan relationships (3–10) are increased character sampling and less missing data compared with some studies and increased nonbilaterian taxon sampling. In contrast to Nosenko et al. (7) and Philippe et al. (9), both of which relied heavily on ribosomal proteins (i.e., 71% and 52%, respectively), our dataset did not contain a large representation of any one gene class (e.g., only 8 of 250 were ribosomal protein genes) (SI Methods).
Fig. 2. Hierarchy of datasets with progressive data filtering. Numbers associated with each dataset are referenced throughout the text. Datasets 5, 11, 15, and 21 have choanoflagellates as the only outgroup, and datasets 22, 23, 24, and 25 have all outgroups removed. RAxML analyses were used for each dataset, and shaded datasets were also analyzed with PhyloBayes. Data matrix statistics (e.g., number of sites, percentage of missing data, and so forth) for each dataset can be found in Table S2.
Ctenophores Sister to Other Extant Animals and Monophyletic Sponges.
All maximum-likelihood analyses with outgroups resulted in topologies with strong support [≥ 97% bootstrap support (BS)] for ctenophores sister to all other extant animal lineages (datasets 1–21 in Figs. 2 and 3 and Figs. S1–S4). Importantly, PhyloBayes (26) analyses using the CAT-GTR+Γ model also resulted in phylogenies with Ctenophora sister to all other animals with 100% posterior probability (PP), and 100% PP for sponge monophyly (datasets 6 and 16 in Figs. 2 and 3 and Fig. S5 B and C). Inferred relationships among major sponge lineages (i.e., Demospongiae + Hexactinellida sister to Calcarea + Homoscleromorpha) were consistent with morphology (27, 28) and most other molecular analyses that have recovered monophyletic sponges (datasets 1–25 in Figs. 2 and 3 and Figs. S1–S5) (4, 5, 8, 9, 28–30). Alternative hypotheses of basal animal relationships (Fig. 1) were rejected by every phylogenetic analysis as measured by the approximately unbiased (AU) (31) test (P ≤ 0.001) (Table 1). Overall, our results overwhelmingly reject alternative hypotheses to ctenophores sister to all other extant animals (Table 1).
Fig. 3. Reconstructed maximum-likelihood topology of metazoan relationships inferred with dataset 10. Maximum-likelihood and Bayesian topologies inferred with other datasets (Fig. 2) have identical basal branching patterns (Figs. S1–S5). Nodes are supported with 100% bootstrap support unless otherwise noted. Support, as inferred from each dataset (Fig. 2), for nodes covered by black boxes is in Table 1.
Table 1. AU test P values for alternative hypotheses of animal relationships and support for Ctenophora-sister and sponge monophyly
Ribosomal protein genes can have conflicting signal with most other genes (32). The datasets of Philippe et al. (9) and Nosenko et al. (7), which recovered cnidarians and ctenophores sister, had high proportions of ribosomal protein genes (67 of 128 and 87 of 122 genes, respectively). Nosenko et al. (7) analyzed a dataset without ribosomal protein genes and recovered ctenophores sister to all other animals, but a similar analysis has not been done for the original Philippe et al. (9) dataset. If certain topologies are recovered only with the use of a high proportion of one group of genes (e.g., ribosomal protein genes), this may indicate a phylogenetic signal that conflicts with the true evolutionary history. As such, we analyzed the Philippe et al. (9) dataset with ribosomal proteins removed (67 of 128) using maximum-likelihood and Bayesian inference, and neither reconstruction placed ctenophores and cnidarians sister as in the original study (Fig. S5 D and E). The maximum-likelihood analysis recovered strong support for ctenophores as sister to all other metazoan lineages (BS = 93) (Fig. S5D). However, Bayesian inference (Fig. S5E) recovered sponges as sister to all other metazoans, but support for this and other deep nodes was low (PP ≤ 90).
Systematic Biases and Their Effect on Phylogenetic Inference.
Long-branch (LB) scores (25), a measurement for identifying taxa and OGs that could cause LBA, were calculated for each species and OG with TreSpEx (25). In total, we identified six “long-branched” taxa, all nonmetazoans (Fig. S6A and Table S2), and 28 OGs with high LB scores compared with other OGs (Fig. S6 B and C). We found complete congruence in relationships among basal metazoan phyla in trees inferred with (datasets 1, 2, 8, 12, and 18 in Fig. 2) and without (datasets 3–7, 9–11, 13–17, 19–21, and 22–25 in Fig. 2) taxa and genes that had high LB scores, and nodal support for critical nodes showed little variation among analyses (Fig. 3 and Figs. S1–S5). Removing OGs with high amino acid compositional heterogeneity (datasets 7–11, 17–21, 23, and 25 in Fig. 2) also had no effect on branching order (Fig. 3 and Figs. S2 A–E, S3 E and F, S4 A–E, and S5A). Topologies inferred with only the slowest evolving half of OGs assembled here (datasets 6 and 16 in Fig. 2) (i.e., least saturated and least prone to homoplasy; see Fig. S7 for saturation plots) recovered high support for ctenophores sister to all other animals and sponge monophyly with both maximum-likelihood (BS = 100) (Fig. 3 and Figs. S1F and S3D) and Bayesian inference using the CAT-GTR model (PP = 1) (Fig. 3 and Fig. S5 B and C). Importantly, our datasets of the slowest evolving half of OGs were of a broad range of protein classes (SI Methods; figshare), rather than consisting of a majority of ribosomal proteins (7, 9).
Inaccurate orthology assignment can also introduce systematic error into phylogenomic analyses. Although relationships among basal lineages were unaffected, removal of paralogs as identified by TreSpEx appeared to have the greatest effect on support for some critical nodes. For example, most topologies with both certain and uncertain paralogs removed had strong support for sponge monophyly (i.e., ≥ 95% BS) (datasets 12–14 and 18–20 in Figs. 2 and 3 and Figs. S2F, S3 A, B, and F, and S4 A and B), but four analyses with only certain paralogs removed recovered low support (< 90% BS) for sponge monophyly (datasets 5, 7, 9, and 10 in Figs. 2 and 3 and Figs. S1E and S2 A, D, and E).
Because outgroup sampling has the potential to influence rooting of the animal tree, we explored outgroup sampling as well. When all outgroups except two choanoflagellates were removed (datasets 5, 11, 15, and 21 in Fig. 2), inferred nonbilaterian relationships were identical to those in analyses we performed with full outgroup sampling (datasets 5, 11, 15, and 21 in Figs. 2 and 3 and Figs. S1E, S2E, S3C, and S4C), but support for sponge monophyly decreased. In these analyses the leaf-stability indices for homoscleromorph and calcareous sponges were less than 0.94, but in all other analyses they were greater than 0.97 (Fig. S5 F and G). Regardless, when choanoflagellates were the only outgroup, ctenophores were still recovered as the deepest split within the animal tree with 100% BS support. Analyses with all outgroup taxa removed (datasets 22–25 in Fig. 2) recovered the same relationships among major metazoan lineages as other analyses (Figs. S4 D–F and S5A). However, we observed low support for relationships among ctenophores, sponges, and placozoans in these analyses. This resulted from the long placozoan branch being attracted to ctenophores in the absence of outgroup taxa, as indicated by bootstrap tree topologies and a leaf-stability index for Trichoplax of less than 0.92, whereas leaf-stability indices were greater than 0.99 in all other analyses (Fig. S5 F and G).
Discussion
Placement of Ctenophores Sister to all Remaining Animals Is Not Sensitive to Systematic Errors.
Every analysis conducted herein strongly supported the ctenophore-sister hypothesis (Fig. 3 and Table 1). A major hurdle to wide acceptance of ctenophores as sister to other animals has been that different analyses have yielded conflicting hypotheses of early animal phylogeny (2–9). Sensitivity to the selected model of molecular evolution has been especially problematic (2–9). In contrast, both maximum-likelihood analyses using data partitioning and Bayesian analyses using the CAT-GTR model of our datasets resulted in identical branching patterns among ctenophores, sponges, placozoans, cnidarians, and bilaterians. Past critiques of studies that found ctenophores to be sister to all other animals have emphasized the CAT model as the most appropriate model for deep phylogenomics because it is an infinite mixture model that accounts for site-heterogeneity (7, 8, 29). Notably, when the CAT-GTR model was used here (datasets 6 and 16 in Fig. 2), we recovered ctenophores-sister to all other metazoans (Fig. 3 and Fig. S5 B and C).
The argument for LBA (7–10) or saturated datasets (7, 8) as the reason past studies found ctenophores to be sister to all other animals seems to have been overstated. The recovered position of ctenophores was identical in analyses with (datasets 1, 2, 8, 12, and 18 in Fig. 2 and Figs. S1 A and B, S2 B and F, and S3F) and without (datasets 3–7, 9–11, 13–17, and 19–25 in Fig. 2, and Figs. S1 C–F, S2 A and C–E, S3 A–E, S4, and S5 A–C) taxa and genes with high LB scores, and analyses with the slowest evolving genes (datasets 6 and 16 in Fig. 2 and Fig. S7) also recovered ctenophores sister to all other animals (Fig. 3 and Figs. S1F, S3D, and S5 B and C). Furthermore, despite the long internal branch leading to the ctenophore clade, the position of this lineage did not change in any analysis, including those in which outgroups were removed (datasets 5, 11, 15, 21, and 22–25 in Fig. 2 and Figs. S1E, S2E, S3C, and S4 C–F). If this branch were being artificially attracted toward outgroups, then employment of different outgroup schemes would be expected to result in different ctenophore placement. Maximum-likelihood and Bayesian inference using the CAT-GTR model of the least saturated datasets (datasets 6 and 16 in Fig. 2 and Fig. S7) recovered the same basal relationships as our other analyses (Fig. 3 and Figs. S1F, S3D, and S5 B and C), also indicating homoplasy and model choice did not bias results. Given the consistency among our analyses that were designed to have different levels of potential biases, we conclude that the ctenophore-sister hypothesis is robust to systematic errors.
Rather than focusing on long branches, fast evolving genes, or model misspecification as influencing the position of ctenophores, the individual genes underlying datasets that resulted in a sister relationship between ctenophores and cnidarians (7, 9) should be the focus of identifying problems with phylogenetic reconstruction. A benefit of phylogenomic datasets is that multiple gene classes and many parts of the genome are analyzed. As such, phylogenomic datasets should not rely too heavily on a single gene class. Past molecular studies that found support for Coelenterata and the Porifera-sister (7, 9) hypotheses appear to have been strongly affected by a disproportionate reliance (i.e., > 50%) on ribosomal protein genes. Nosenko et al. (7) and Philippe et al. (9) found support for ctenophores sister to cnidarians, but Nosenko et al. (7) recovered ctenophores sister to all other animals when ribosomal proteins were excluded. Ribosomal protein datasets from these studies are less saturated than datasets assembled here based on linear regression of patristic distance versus uncorrected genetic distance (Fig. S7) (7–9, 25). This lower mutational saturation has been the primary rationale for emphasizing ribosomal genes when inferring deep animal relationships (7). However, standard measurements of sequence saturation (7, 8, 25) average across the length of the sequence. Thus, a sequence with a few variable, highly saturated sites may appear less saturated than a sequence with numerous variable sites but less saturation per site. Furthermore, extremely low mutation rates can indicate selection and result in too little phylogenetic information, both of which could lead to the inference of incorrect relationships (33). Our maximum-likelihood analysis of Philippe et al.’s (9) dataset with ribosomal genes removed recovered support for ctenophores as sister to all other animals (Fig. S5D). Basal relationships were poorly resolved in the Bayesian analysis, which may be a result of too few characters, but ctenophores and cnidarians were not recovered as sister (Fig. S5E). Notably, a study focused only on myxozoan cnidarians (34) used the same matrix as Philippe et al. (9), but added two highly divergent cnidarians and recovered ctenophores sister to all other animals. Ribosomal protein genes have previously been identified as a potential source of phylogenetic error (32), and the above indicates that ctenophores sister to cnidarians as in Philippe et al. (9) was caused by either limited cnidarian taxon sampling, misleading signal in ribosomal genes, or both. More work is needed to assess saturation, possible convergent evolution, and selective pressures of ribosomal proteins in ctenophores, sponges, placozoans, and cnidarians. However, differences in topologies when ribosomal proteins are included or excluded strongly imply a misleading signal in ribosomal protein genes. Put simply, it appears highly improbable that all genes other than ribosomal protein genes could be recovering an incorrect phylogeny.
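The saturation measure referred to above can be illustrated with a minimal sketch. It assumes that saturation is summarized by the least-squares slope of uncorrected pairwise distance regressed on patristic (tree-path) distance, so that a slope near 1 suggests little saturation and a shallow slope suggests that observed differences have plateaued. The function names and toy values are hypothetical and are not taken from TreSpEx.

```python
def p_distance(seq_a, seq_b, missing="-?X"):
    """Uncorrected distance: share of differing sites among sites present in both sequences."""
    compared = 0
    differing = 0
    for a, b in zip(seq_a, seq_b):
        if a in missing or b in missing:
            continue  # skip gaps and ambiguous residues
        compared += 1
        if a != b:
            differing += 1
    return differing / compared if compared else float("nan")


def regression_slope(x, y):
    """Ordinary least-squares slope of y on x."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx


# Toy example: observed distances flatten out as patristic distance grows.
patristic = [0.1, 0.5, 1.0, 2.0]
observed = [0.1, 0.4, 0.6, 0.7]
slope = regression_slope(patristic, observed)  # well below 1: saturated
```

Because the p-distance is averaged over all compared sites, a slope computed this way cannot distinguish a few heavily saturated sites from many mildly variable ones, which is the limitation noted in the text.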
Sponges Are Monophyletic.
Sponge monophyly, although less controversial than the phylogenetic positions of ctenophores, cnidarians, and placozoans, remains an important question because several studies have supported sponge paraphyly (7, 22–24). With regard to inferring the characteristics of the metazoan ancestor, sponge paraphyly, coupled with sponges being at the base of the metazoan phylogeny, is an attractive hypothesis that implies the metazoan ancestor was sponge-like. However, sponge monophyly was recovered in all of our analyses and best supported when the strictest orthology criteria were applied with TreSpEx (Fig. 3 and Table 1), which also removes sequences resulting from sample contamination (e.g., endosymbionts). This observation suggests that spurious paralogs or sequence contamination may have been a source of error when sponges were found paraphyletic, but datasets that have recovered sponge paraphyly were also much smaller than those analyzed here (e.g., refs. 7, 22–24). Sponge monophyly and the well-supported ctenophore-sister hypothesis complicate inferring the ancestral condition of metazoans and other major metazoan groups (e.g., Placozoa + Cnidaria + Bilateria) because many sponge characteristics are likely apomorphic traits. Our robust support of sponge monophyly agrees with morphology (35) and most other large molecular datasets (5, 6, 8–10).
Conclusions
For more than a century, sponges were traditionally considered sister to all other extant metazoans because unlike ctenophores, cnidarians, and bilaterians, they lack true tissues and body symmetry (36, 37). Sponges also possess choanocyte cells that are similar in morphology to choanoflagellates, the sister group to metazoans (36). However, Mah et al. (38) found that homology between choanoflagellates and sponge choanocytes is not as definitive as previously assumed. Similarly, some authors have argued that placozoans are sister to all other animals because they lack neural and muscular systems and also share similarities in mitochondrial genome size with choanoflagellates (2, 39). A common theme of these two hypotheses is the placement of morphologically simple animals near the base of the animal tree, but complexity is not a good proxy for metazoan evolution (40). Challenges to long-held viewpoints of morphological complexity and assumed improbability of convergent evolution must not be dismissed simply because they seem unlikely at face value, especially considering a growing body of evidence that supports convergent evolution of many animal traits including neurons (5, 6, 41–43). Furthermore, the Porifera-sister hypothesis lacks critical evaluation, and homology of many characters in these taxa still needs thorough analysis (44). Overall, findings presented here robustly support ctenophores as sister to all remaining animals.
Methods
Taxon and Character Sampling.
Taxon sampling included previously available data and eight new transcriptomes from two choanoflagellates, three glass sponges (Hexactinellida), two demosponges, and a deep-sea cnidarian (Scyphozoa) (Table S1). Briefly, RNA was extracted, reverse-transcribed, and amplified using the SMART kit (Clontech), and sequenced on an Illumina HiSeq (SI Methods). Raw or assembled transcriptome data for 68 additional species were retrieved from public databases (Table S1).
Raw Illumina transcriptome data were digitally normalized using normalize-by-median.py (45) with a k-mer size of 20, a desired coverage of 30, and four hash tables with a lower bound of 2.5 × 10^9. Normalized Illumina reads were assembled using default parameters in Trinity v20131110 (46). Raw 454 transcriptome data were assembled with Newbler (47).
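The idea behind digital normalization can be sketched as follows. This is a simplified illustration, not the normalize-by-median.py implementation: it assumes exact k-mer counts held in a Python Counter, whereas the actual tool uses fixed-size probabilistic hash tables (hence the table count and lower bound given above). A read is kept only if the median abundance of its k-mers, counted over the reads kept so far, is below the desired coverage.

```python
from collections import Counter
from statistics import median


def kmers(seq, k):
    """All overlapping substrings of length k, in order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def normalize_by_median(reads, k=20, cutoff=30):
    """Keep a read only while its median k-mer abundance is below the cutoff."""
    counts = Counter()  # exact counts; an assumption made for clarity
    kept = []
    for read in reads:
        read_kmers = kmers(read, k)
        if not read_kmers:
            continue  # read shorter than k; skipped in this sketch
        if median(counts[km] for km in read_kmers) < cutoff:
            kept.append(read)
            counts.update(read_kmers)
    return kept


# Toy example with small k and cutoff: five copies of one read collapse to two.
toy_reads = ["ACGTTGCA"] * 5 + ["GGGGCCCCAA"]
toy_kept = normalize_by_median(toy_reads, k=4, cutoff=2)
```

With the defaults shown (k = 20, cutoff = 30), the parameters mirror the k-mer size and desired coverage reported in the text.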
Orthology Determination and Data Filtering.
Putative orthologs were determined for each species using HaMStR v13.2 (48) with the model organism core ortholog set. OGs, determined by HaMStR, were further processed using a custom pipeline that filtered OGs with too much missing data (i.e., OGs with fewer than 37 species), aligned sequences, and filtered potential paralogs (https://github.com/kmkocot) (SI Methods).
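The occupancy filter described above can be sketched in a few lines. This is not the authors' pipeline; it assumes each OG is held as a mapping from species name to sequence, and it assumes gene occupancy is the fraction of species-by-OG cells that contain a sequence. Both function names are hypothetical.

```python
def filter_by_occupancy(ogs, min_taxa=37):
    """Drop OGs represented by fewer than min_taxa species.

    ogs maps an OG name to a dict of {species: sequence}.
    """
    return {og: seqs for og, seqs in ogs.items() if len(seqs) >= min_taxa}


def gene_occupancy(ogs, all_species):
    """Fraction of species-by-OG cells that are filled (assumed definition)."""
    filled = sum(len(seqs) for seqs in ogs.values())
    return filled / (len(ogs) * len(all_species))


# Toy example with three species and a threshold of two.
toy_ogs = {
    "og1": {"sp_a": "MKV", "sp_b": "MKI", "sp_c": "MRV"},
    "og2": {"sp_a": "MLL"},
}
toy_filtered = filter_by_occupancy(toy_ogs, min_taxa=2)
toy_occupancy = gene_occupancy(toy_ogs, ["sp_a", "sp_b", "sp_c"])
```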
TreSpEx (25), which requires individual gene trees for each OG, was used to identify putative paralogs and exogenous contamination missed by our initial orthology inference approach. Gene trees were inferred with RAxML v.8.0.2 (49) with 100 rapid bootstrap replicates followed by a full maximum-likelihood inference; each tree was inferred with the LG+Γ model, which was by far the most common best-fitting model when the complete dataset was partitioned (see below). Paralogs in the initial dataset were detected with TreSpEx using the automated BLAST method and the prepackaged Capitella teleta and Helobdella robusta BLAST databases. This method identified two classes of paralogs: “certain,” or sequences that are high-confidence paralogs, and “uncertain,” or sequences that are potential paralogs. From the initial 251-gene dataset, we created one dataset by removing certain paralogs and another dataset with both certain and uncertain paralogs pruned (Fig. 2); after pruning, OGs with fewer than 37 taxa were removed. LB scores (25) were calculated for each taxon and OG with TreSpEx. Following Struck (25), these values were plotted in R (Fig. S6) (50), and outliers were identified as taxa or genes that could cause LBA artifacts. After removal of taxa and genes, each OG was ranked by evolutionary rate with a custom Python script (https://github.com/nathanwhelan) following Telford et al. (51). Datasets with only the slowest half of remaining genes were then generated to assess if fast evolving homoplasious genes were biasing inferences (Fig. 2).
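Two of the steps above, scoring taxa for long branches and keeping the slowest half of genes, can be sketched as follows. The sketch assumes the taxon LB score is the percentage deviation of a taxon's mean patristic distance to all other taxa from the mean pairwise patristic distance across all taxa; it operates on a precomputed distance matrix rather than on trees, and it is not the TreSpEx code. The per-gene rate is treated as an already computed number, because the rate estimator itself is not specified here. All names are hypothetical.

```python
def lb_scores(dist):
    """Per-taxon LB score from a symmetric patristic distance matrix.

    dist[a][b] is the tree-path distance between taxa a and b.
    """
    taxa = sorted(dist)
    n = len(taxa)
    pair_total = sum(
        dist[a][b] for i, a in enumerate(taxa) for b in taxa[i + 1:]
    )
    overall_mean = pair_total / (n * (n - 1) / 2)
    scores = {}
    for a in taxa:
        taxon_mean = sum(dist[a][b] for b in taxa if b != a) / (n - 1)
        scores[a] = (taxon_mean / overall_mean - 1) * 100
    return scores


def slowest_half(rates):
    """Names of the slowest-evolving half of genes (rounded down)."""
    ranked = sorted(rates, key=rates.get)
    return ranked[: len(ranked) // 2]


# Toy example: taxon C sits on a long branch relative to A and B.
toy_dist = {
    "A": {"B": 1.0, "C": 3.0},
    "B": {"A": 1.0, "C": 3.0},
    "C": {"A": 3.0, "B": 3.0},
}
toy_lb = lb_scores(toy_dist)
toy_slow = slowest_half({"og1": 0.9, "og2": 0.2, "og3": 0.5, "og4": 1.4})
```

Positive scores flag taxa whose branches are longer than average; in the text, outliers were identified from plots of such scores rather than by a fixed threshold.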
We used BaCoCa (52) and two metrics (χ²-test of heterogeneity and relative composition frequency variability; RCFV) (53) to identify genes with amino acid compositional heterogeneity. Some datasets were further filtered by removing non-choanoflagellate outgroups and all outgroup taxa to determine if outgroup choice affects inferred relationships. Saturation of each filtered dataset with full outgroup sampling was explored with TreSpEx and plotted in R (Fig. S7) to provide a further metric to compare datasets.
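As an illustration of the RCFV metric, the sketch below assumes RCFV is the sum, over taxa and amino acids, of the absolute difference between a taxon's amino acid frequency and the mean frequency across taxa, divided by the number of taxa. It is not the BaCoCa implementation, and the handling of gaps and ambiguous residues (ignored here) is an assumption.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aa_frequencies(seq):
    """Relative frequency of each of the 20 amino acids, ignoring other symbols."""
    residues = [c for c in seq.upper() if c in AMINO_ACIDS]
    counts = Counter(residues)
    total = len(residues)
    if total == 0:
        return {aa: 0.0 for aa in AMINO_ACIDS}
    return {aa: counts[aa] / total for aa in AMINO_ACIDS}


def rcfv(alignment):
    """Relative composition frequency variability for {taxon: sequence}."""
    freqs = [aa_frequencies(seq) for seq in alignment.values()]
    n = len(freqs)
    mean = {aa: sum(f[aa] for f in freqs) / n for aa in AMINO_ACIDS}
    deviation = sum(abs(f[aa] - mean[aa]) for f in freqs for aa in AMINO_ACIDS)
    return deviation / n


# Identical compositions give 0; completely different compositions give 1 here.
toy_same = rcfv({"sp_a": "ACDE", "sp_b": "EDCA"})
toy_diff = rcfv({"sp_a": "AAAA", "sp_b": "CCCC"})
```

Higher values indicate greater compositional heterogeneity among taxa for a given gene.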
Phylogenetics.
In addition to removing compositionally heterogeneous genes from some datasets, two approaches were used to handle site-heterogeneity: (i) partitioning schemes for each dataset and associated protein substitution models were determined using the relaxed clustering method in PartitionFinder (54) with 20% clustering and the corrected Akaike information criterion; (ii) a site-heterogeneous mixture model, CAT-GTR+Γ, was used in PhyloBayes (26). Maximum-likelihood topologies were inferred with RAxML using partitions as indicated by PartitionFinder, associated best-fit substitution models, and the gamma parameter to model rate-heterogeneity. Nodal support was measured with 100 fast bootstrap replicates. PhyloBayes analyses were run with two chains until the maxdiff statistic between chains was below 0.3 as measured by bpcomp (26). Convergence was also assessed with tracecomp (26) to ensure each parameter had a maximum discrepancy between chains of less than 0.3 and an effective sample size of at least 50. Computational demands and convergence issues prevented us from using the CAT-GTR+Γ model for most datasets. Therefore, Bayesian phylogenies are only reported for the two analyses of the slowest evolving half of OGs. Leaf-stability indices (55) for each taxon were measured in PhyUtility (56) to identify potentially unstable taxa in each dataset.
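The maxdiff convergence criterion can be illustrated with a minimal sketch. It assumes each sampled tree has already been reduced to a set of bipartitions (represented here as frozensets of taxon names) and that maxdiff is the largest absolute difference in bipartition frequency between two chains. It does not parse tree files or reproduce bpcomp; the function names and toy chains are hypothetical.

```python
from collections import Counter


def bipartition_frequencies(trees):
    """Share of sampled trees in which each bipartition appears."""
    counts = Counter(bp for tree in trees for bp in tree)
    return {bp: c / len(trees) for bp, c in counts.items()}


def maxdiff(chain_a, chain_b):
    """Largest between-chain difference in bipartition frequency."""
    freq_a = bipartition_frequencies(chain_a)
    freq_b = bipartition_frequencies(chain_b)
    all_bps = set(freq_a) | set(freq_b)
    return max(
        (abs(freq_a.get(bp, 0.0) - freq_b.get(bp, 0.0)) for bp in all_bps),
        default=0.0,
    )


# Toy example: one clade is in every tree of chain A but 3 of 4 trees of chain B.
ab = frozenset({"A", "B"})
ac = frozenset({"A", "C"})
chain_a = [{ab}] * 4
chain_b = [{ab}] * 3 + [{ac}]
toy_maxdiff = maxdiff(chain_a, chain_b)
converged = toy_maxdiff < 0.3  # threshold used in the text
```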
Maximum-likelihood and Bayesian inference trees were also inferred from the Philippe et al. (9) dataset with ribosomal proteins removed (i.e., 67 of 128 genes) to determine if a single gene class biased phylogenetic inference. Ribosomal proteins were filtered from the original dataset following data matrix annotations (9), and the matrix was split into individual genes for model testing using a custom R script (https://github.com/nathanwhelan).
The AU test (31) was used to determine if a priori hypotheses of basal metazoan relationships could be rejected (Fig. 1). Topological constraints were enforced in RAxML and the most likely tree given this constraint was inferred with the same partitioning scheme and models used for unconstrained phylogenetic inference. Per site log-likelihoods for trees were calculated in RAxML and AU tests were performed in Consel (57).
Acknowledgments
We thank members of the Molette Biology Laboratory for Environmental and Climate Change Studies at Auburn University for help with bioinformatics and data collection, especially Damien Waits. This work was made possible in part by a grant of high-performance computing resources and technical support from the Alabama Supercomputer Authority and was supported by the US National Aeronautics and Space Administration (Grant NASA-NNX13AJ31G) and in part by National Science Foundation (Grant 1146575). This is Molette Biology Laboratory Contribution 36 and Auburn University Marine Biology Program Contribution 128.
Footnotes
- 1To whom correspondence should be addressed. Email: nwhelan@auburn.edu.
Author contributions: N.V.W., K.M.K., L.L.M., and K.M.H. designed research; N.V.W. and K.M.K. performed research; N.V.W. and K.M.K. analyzed data; and N.V.W., K.M.K., L.L.M., and K.M.H. wrote the paper.
The authors declare no conflict of interest.
This article is a PNAS Direct Submission.
Data deposition: The sequence reported in this paper has been deposited in the NCBI Sequence Read Archive, www.ncbi.nlm.nih.gov/sra (accession no. PRJNA278284). Transcriptome assemblies, phylogenetic datasets, and an annotation file were deposited to figshare, figshare.com (doi: 10.6084/m9.figshare.1334306).
This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1503453112/-/DCSupplemental.
Freely available online through the PNAS open access option.
References
- ↵.
- Dohrmann M,
- Wörheide G
- ↵.
- Dellaporta SL, et al.
- ↵
- ↵.
- Hejnol A, et al.
- ↵
- ↵.
- Ryan JF, et al., NISC Comparative Sequencing Program
- ↵
- ↵
- ↵
- ↵.
- Pick KS, et al.
- ↵.
- Felsenstein J
- ↵
- ↵
- ↵.
- Heath TA,
- Hedtke SM,
- Hillis DM
- ↵
- ↵.
- Roure B,
- Baurain D,
- Philippe H
- ↵
- ↵.
- Tavaré S
- ↵
- ↵
- ↵
- ↵
- ↵
- ↵.
- Sperling EA,
- Peterson KJ,
- Pisani D
- ↵
- ↵.
- Lartillot N,
- Rodrigue N,
- Stubbs D,
- Richer J
- ↵.
- van Soest RWM
- ↵
- ↵.
- Dohrmann M,
- Janussen D,
- Reitner J,
- Collins AG,
- Wörheide G
- ↵
- ↵.
- Shimodaira H
- ↵
- ↵.
- Edwards SV
- ↵
- ↵.
- Ax P
- ↵
- ↵
- ↵
- ↵
- ↵.
- Halanych KM
- ↵
- ↵.
- Liebeskind BJ,
- Hillis DM,
- Zakon HH
- ↵.
- Moroz LL
- ↵.
- Halanych KM
- ↵.
- Brown T,
- Howe C,
- Zhang A,
- Pyrkosz Q,
- Brom AB
- ↵
- ↵
- ↵
- ↵.
- Stamatakis A
- ↵.
- R Core Development Team
- ↵.
- Telford MJ, et al.
- ↵
- ↵
- ↵
- ↵
- ↵.
- Smith SA,
- Dunn CW
- ↵.
- Shimodaira H,
- Hasegawa M