BIOLOGICAL SCIENCES / EVOLUTION
Using plastid genome-scale data to resolve enigmatic relationships among basal angiosperms
Department of Botany and Florida Museum of Natural History, University of Florida, Gainesville, FL 32611; and
Department of Biological Sciences, University of New Orleans, New Orleans, LA 70149
Communicated by David L. Dilcher, University of Florida, Gainesville, FL, August 28, 2007 (received for review June 15, 2007)
| Abstract |
|---|
42,000 bp) for 45 taxa, including members of all major basal angiosperm lineages. We also report the complete plastid genome sequence of Ceratophyllum demersum. Parsimony analyses of combined and partitioned data sets varied in the placement of several taxa, particularly Ceratophyllum, whereas maximum-likelihood (ML) trees were more topologically stable. Total evidence ML analyses recovered a clade of Chloranthaceae + magnoliids as sister to a well supported clade of monocots + (Ceratophyllum + eudicots). ML bootstrap and Bayesian support values for these relationships were generally high, although approximately unbiased topology tests could not reject several alternative topologies. The extremely short branches separating these five lineages imply a rapid diversification estimated to have occurred between 143.8 ± 4.8 and 140.3 ± 4.8 Mya.
Ceratophyllum | molecular dating | phylogenetics | mesangiosperms
Whereas the three basalmost angiosperm nodes are now well resolved and supported, relationships among the major lineages of Mesangiospermae [sensu (18)] have been more difficult to elucidate. Analyses of multigene data sets have provided strong support for the monophyly of each of the five major clades of mesangiosperms: Chloranthaceae, Magnoliidae [sensu (18), consisting of Laurales, Magnoliales, Canellales, and Piperales; for the rest of this paper, we will refer to this group as "magnoliids"], Ceratophyllum, monocots, and eudicots. However, the relationships among these five lineages remain unclear. For example, Ceratophyllum has been variously recovered as sister to eudicots (7, 16, 19) or monocots (20, 21). Bootstrap (BS) and Bayesian posterior probability values for these alternative relationships of Ceratophyllum and for the positions of these clades relative to magnoliids and Chloranthaceae have usually been low, even when as many as nine genes have been combined (2, 15). The phylogenetic position of monocots has also been problematic. In molecular phylogenetic studies, monocots have been recovered as sister to magnoliids, Ceratophyllum, or as part of a clade with magnoliids and Chloranthaceae, generally with low support (7, 12, 15, 17, 19–24). The unstable relationships exhibited among these five major lineages of angiosperms are likely due to a combination of the relatively ancient age of these taxa [at least four of which have fossil records that extend back >100 Mya to the Early Cretaceous (25–29)], the short evolutionary branches separating these lineages, and the relatively long branches leading to Ceratophyllum and to the basal lineages of monocots (2).
The increasing number of complete angiosperm plastid genome sequences presents an opportunity to explore whether character-rich data sets can resolve the relationships among these five major angiosperm lineages (13, 14, 30). Here we present phylogenetic analyses of 61 plastid protein-coding genes (~42,000 bp of sequence data) derived from complete plastid genome sequences of 45 taxa, including at least one member of every major basal lineage of angiosperms. As part of this study, we also report the complete nucleotide sequence of the Ceratophyllum demersum plastid genome. Although topology tests do not exclude several alternative relationships, we find generally high support for a fully resolved topology of Mesangiospermae, including a clade of Ceratophyllum, eudicots, and monocots. We also provide a time frame for the likely rapid diversification of the five major lineages of mesangiosperms.
| Results |
|---|
25 kb separating large and small single-copy regions (31, 32). The Ceratophyllum genome is unrearranged relative to Nicotiana (33), and the plastid gene content in Ceratophyllum is identical to that in most angiosperms (32) [supporting information (SI) Fig. 3]. General genome characteristics as well as 454 sequence assembly characteristics are available in SI Table 2.
Phylogenetic Analyses.
The total analyzed aligned length (total aligned length minus excluded base pairs) of the 61-gene combined data set was 42,519 bp, whereas the analyzed aligned lengths of the fast and slow gene partitions (see Materials and Methods) were 22,682 and 19,837 bp, respectively. Total aligned and analyzed aligned lengths for all partitions and genes are given in SI Table 3. The Akaike Information Criterion selected GTR + I + Γ as the optimal model for maximum likelihood (ML) and Bayesian searches for the 61-gene combined, fast, and slow gene data sets, and TVM + I + Γ for the 61-gene first/second codon position data set (although all analyses used GTR + I + Γ; see Materials and Methods).
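As a rough illustration of how the Akaike Information Criterion ranks candidate substitution models, the sketch below computes AIC = 2k − 2 ln L for three models. The log-likelihood values are hypothetical placeholders, not values from this study, and the parameter counts cover only the substitution model (branch lengths are the same across models on a fixed tree and are omitted).

```python
def aic(log_likelihood, n_free_params):
    """Akaike Information Criterion; the model with the lowest value is preferred."""
    return 2 * n_free_params - 2 * log_likelihood


# (log-likelihood, free substitution-model parameters).
# Log-likelihoods are HYPOTHETICAL and serve only to illustrate the ranking.
candidates = {
    "HKY+I+G": (-250480.0, 6),   # 1 ts/tv ratio + 3 base freqs + I + gamma shape
    "TVM+I+G": (-250105.0, 9),   # 4 rates + 3 base freqs + I + gamma shape
    "GTR+I+G": (-250100.0, 10),  # 5 rates + 3 base freqs + I + gamma shape
}

scores = {name: aic(lnl, k) for name, (lnl, k) in candidates.items()}
best_model = min(scores, key=scores.get)
```

With these placeholder values the extra parameter of GTR + I + Γ is outweighed by its gain in likelihood, so it receives the lowest AIC.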
In almost all analyses, Amborella, Nymphaeales, and Austrobaileyales (represented by Illicium) were successively sister to the remaining angiosperms with strong support regardless of partitioning strategy or phylogenetic optimality criterion (Figs. 1 and 2; SI Figs. 4–10). Only ML analyses of combined first and second codon positions for the 61-gene data set recovered a differing optimal topology; in the best ML tree from this analysis, a clade of Amborella + Nymphaeales was sister to remaining angiosperms (Fig. 1; SI Fig. 5).
Compared with MP, ML and Bayesian methods provided different but more stable topological results across our partitioning scheme. Regardless of partitioning strategy, ML and Bayesian methods recovered monocots as sister to a clade of Ceratophyllum + eudicots, with generally high support values (Fig. 1) in most cases despite the extremely short branch lengths separating these groups. A sister relationship of Ceratophyllum to eudicots received the highest support in the 61-gene combined data trees (ML BS = 71%; Bayesian posterior probability = 1.0) but was less well supported in fast and slow gene trees (Fig. 1). Chloranthaceae and magnoliids (including Piper) formed a clade that received moderate to high support values in the 61-gene and fast gene ML and Bayesian trees, but the slow gene analyses recovered magnoliids as sister to a clade of Chloranthaceae + Ceratophyllum/eudicots/monocots (Fig. 1). However, the latter relationship received ML BS support <50% and Bayesian posterior probability <0.5. Removing Piper from ML 61-gene combined data, fast gene, and slow gene analyses in no case altered the overall ML topology (SI Figs. 15–17).
After submission of this paper, the Ceratophyllum genome sequence data were added to an expanded 64-taxon (including extra eudicot and monocot taxa as well as a cycad outgroup), 81-gene (~76,000-bp) plastid genome matrix in conjunction with Jansen et al. (34). The analyses of this data set are presented as supporting information figures 7–9 in ref. 34. ML analyses were topologically identical to those in the 61-gene combined analyses, with higher BS support for Ceratophyllum + eudicots (82%) but with lower support for the sister relationship of this clade to monocots (73%) as well as for the clade of Chloranthaceae and magnoliids (64%) (supporting information figure 8 in ref. 34). MP analyses continued to unite Piper and Ceratophyllum with high support (supporting information figure 7 in ref. 34).
Topology Tests. The approximately unbiased (AU) test failed to reject 17 of 104 alternative topologies involving Ceratophyllum, Chloranthaceae, eudicots, magnoliids, and monocots at the 0.05 significance level (SI Table 4). A strict consensus of these 17 trees and the best ML tree provided no resolution among these five lineages. An AU test of the 81-gene 65-taxon data set (including Ceratophyllum) jointly undertaken with Jansen et al. (34) also failed to resolve mesangiosperm relationships (supporting information figure 9 in ref. 34).
Molecular Dating.
Divergence time estimates varied little (<0.5% for all nodes) across the three fossil constraint schemes used in the penalized likelihood (PL) analyses (Table 1); the results of the unconstrained analysis are therefore reported here. The unconstrained PL analysis indicated that extant angiosperms began to diversify in the mid-Jurassic, ~170 Mya, and that the five major mesangiosperm lineages diversified relatively rapidly in the earliest Cretaceous. The initial divergence of these five lineages was dated to 143.8 ± 4.8 Mya, and the youngest divergence (of Chloranthus and magnoliids) was dated to 140.3 ± 4.8 Mya (Table 1; SI Fig. 18). The origins of the extant crown groups of magnoliids, monocots, and eudicots were dated to somewhat later in the Cretaceous: 130.1 ± 4.4 Mya for magnoliids, 128.9 ± 4.9 Mya for monocots, and 124.8 ± 6.3 Mya for eudicots (Table 1; SI Fig. 18).
| Discussion |
|---|
42,000 bp of sequence data, it is difficult to resolve the relationships among Ceratophyllum, Chloranthaceae, eudicots, magnoliids, and monocots with confidence. The combination of extremely short internal and relatively long terminal branches that characterize these lineages is almost certainly responsible for the topological differences among MP, ML, and Bayesian analyses (35–37). For example, MP recovers a clearly incorrect topology by uniting Ceratophyllum and Piper (Fig. 1). Strong molecular and morphological evidence supports a magnoliid clade that includes Piperales and excludes Ceratophyllum (21, 30, 38). The relatively long branches leading to Ceratophyllum and Piper, the occurrence of each of these taxa in different parts of the tree in the absence of the other taxon in MP analyses, the increasing support for the erroneous Ceratophyllum/Piper topology in MP with increasing sequence length, and the fact that ML never unites these taxa suggest this is almost certainly a case of long-branch attraction (35, 39, 40). Although breaking up the long branch to Ceratophyllum is impossible, the addition of unsampled Piperales may resolve this problem. Despite the generally high ML support for a resolved basal angiosperm phylogeny in the 61-gene combined analyses (Fig. 2), the topology test results indicate that no statistically significant resolution of the relationships among Ceratophyllum, Chloranthaceae, eudicots, magnoliids, and monocots is possible with either the current data set (SI Table 4) or the 81-gene expanded data set (supporting information figure 9 in ref. 34). This difficulty in resolving mesangiosperm relationships may result from a number of phenomena, including the early and potentially rapid diversification of mesangiosperms in conjunction with the erosion of phylogenetic signal at more rapidly evolving sites. 
It is possible that increasing taxon sampling in several lineages (for example, magnoliids, monocots, and eudicots) in future analyses may alter the topology or support values by potentially reducing any phylogenetic error that arises from substitutional rate heterogeneity and mutational saturation (13, 14, 41–43). However, the problem of resolving mesangiosperm diversification may remain even with improved taxon sampling. More sophisticated analytical approaches may need to be developed before basal angiosperm branching order can be reconstructed confidently.
Monocot/Eudicot Relationships. Systematists have long thought in terms of a major split in angiosperms between monocotyledons and dicotyledons. This longstanding view dates to Ray (44) and served until recently as a fundamental division in angiosperm classifications, with these two groups designated as distinct classes, Liliopsida and Magnoliopsida (45–47). Many earlier angiosperm systematists (e.g., refs. 45, 46, and 48) proposed that monocots formed a clade derived from "primitive" dicot ancestors, such as Nymphaeales. Early molecular phylogenetic analyses confirmed that monocots were derived from a paraphyletic grade of "dicots" but did not resolve their position with high support. Molecular analyses have variously placed monocots as sister to all remaining angiosperms after the Amborella–Nymphaeales–Austrobaileyales grade (11, 15, 17), as part of a clade with magnoliids and Chloranthaceae (7, 15), or as sister to the magnoliids (15, 24).
Our analyses of the plastid genome, although not conclusive, suggest that monocots may be sister to eudicots or part of a clade with Ceratophyllum + eudicots. A close relationship between monocots and eudicots is recovered in all ML analyses, with high support in several cases (Figs. 1 and 2). Likewise, of the 17 alternative topologies not rejected by the AU test, only five do not place monocots sister to eudicots or within a Ceratophyllum/eudicot/monocot clade (SI Table 4). Moreover, the P values of these five topologies fell just above the 0.05 significance level. Other recent analyses have also provided evidence of a monocot/eudicot sister relationship (24, 30, 49) or a monocot + Ceratophyllum/eudicot clade (16). Should this tentative support be validated in future analyses with additional data and/or taxa, it would suggest that after some initial evolutionary "experiments" (the Amborella–Nymphaeales–Austrobaileyales grade, magnoliids, Chloranthaceae), there was indeed a major split in angiosperms between monocots and eudicots + Ceratophyllum, which collectively represent 97% of extant flowering plants.
Mesangiosperm Radiation.
The PL divergence dates obtained for deep-level angiosperm diversification using the optimal 61-gene combined ML tree generally agree with those estimated in several previous studies. For example, several studies have documented a mid- to late-Jurassic age for extant angiosperms as well as Early Cretaceous ages for the divergences of Ceratophyllum, Chloranthaceae, eudicots, magnoliids, and monocots (50–52). As has been noted elsewhere, however, the age estimates for all of these basal angiosperm divergences antedate the earliest unambiguous fossil angiosperms, which are of Hauterivian Age, ~136–130 Mya (53–56). A number of causes have been advanced to explain this discrepancy, ranging from missing fossil histories to the problems inherent in molecular-based dating techniques (51, 52, 57–59). Our dating analyses rely on a large and apparently internally consistent data set (as judged by the similar phylogenetic results among the various partitions under ML) and consequently should be less susceptible to phylogenetic sources of error (52, 57). The age estimated in the unconstrained PL analyses for the origin of eudicots (124.8 ± 6.3 Mya) is also encouraging, because it is consistent with the earliest known appearance of fossil eudicot pollen (125 Mya; refs. 26 and 54). However, the data set used here contains relatively sparse taxon sampling in several major angiosperm lineages (e.g., Nymphaeales, magnoliids, and monocots) despite the fact that it contains exemplar taxa from all of the major basal lineages of angiosperms. Thus we cannot rule out other sources of error involving rate variation among lineages [the lineage effects of (57)] and the proper placement of fossil constraints. Adding key angiosperm taxa to our data set will therefore be important to correct such error and allow the exploration of the effects of more fossil constraints.
Regardless of the absolute divergence times of individual clades, the PL analyses indicate that mesangiosperms diversified rapidly, probably over just a few million years (Table 1; SI Fig. 18). The origin and relatively rapid rise of the angiosperms have long been considered enigmatic [e.g., Darwin's "abominable mystery" (60)]. Although the fossil record certainly supports the presence of many diverse lineages early in angiosperm evolution (56, 61–63), our analyses clearly indicate that the radiation responsible for nearly all extant angiosperm diversity was not associated with the origin of the angiosperms but occurred after the earlier diversification of Amborella, Nymphaeales, and Austrobaileyales.
| Materials and Methods |
|---|
DNA Sequence Alignment. The data set for phylogenetic analyses was composed of the nucleotide sequence of the 61 protein-coding genes (SI Table 3) that are present, with very few exceptions, in all angiosperm plastid genomes (32). We modified the 61-gene alignment of Cai et al. (30) by adding Ceratophyllum and several other recently sequenced chloroplast genomes as well as by reducing taxonomic coverage in Poaceae and Solanaceae, both of which have many available plastid genome sequences. The complete taxonomic sampling for the current analyses is given in SI Table 5. Manual realignment of some genes was necessary after the addition of new sequences. Several short regions that were difficult to align in the more quickly evolving genes (e.g., matK, ndhF, and rpoC2) were excluded from analyses, as were all sequence insertions present in only one taxon.
Phylogenetic Analyses. MP, ML, and Bayesian searches were conducted on the combined 61-gene data set as well as on two partitions of the combined data set that were designed to test the influence of relative evolutionary rate on tree reconstruction. These partitions were created by first ranking the 61 genes based on the average pairwise distance across all taxa for each gene. A relatively large break in pairwise distances of 0.007 units between rps14 and atpA (SI Table 3) was then chosen to divide the genes into relatively more quickly and more slowly evolving groups (hereafter called fast and slow gene partitions) that contained roughly similar numbers of base pairs and genes. The genes included in each partition, along with sequence characteristics for each gene, are given in SI Table 3. The complete data set is available in SI Dataset 1.
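The rate-based partitioning described above can be sketched as follows. This is a simplified illustration, not the authors' procedure: it splits at the single largest gap in the ranked distances, whereas the break actually used was also chosen to give partitions of roughly similar size. The gene names and distance values are hypothetical.

```python
def split_at_largest_gap(mean_distances):
    """Rank genes by mean pairwise distance and split at the largest gap.

    Returns (slow_genes, fast_genes), each ordered from slowest to fastest.
    """
    ranked = sorted(mean_distances.items(), key=lambda item: item[1])
    # Difference between each gene and the next faster gene in the ranking.
    gaps = [ranked[i + 1][1] - ranked[i][1] for i in range(len(ranked) - 1)]
    cut = max(range(len(gaps)), key=gaps.__getitem__) + 1
    slow = [gene for gene, _ in ranked[:cut]]
    fast = [gene for gene, _ in ranked[cut:]]
    return slow, fast


# HYPOTHETICAL mean pairwise distances for six placeholder genes.
toy_distances = {
    "geneA": 0.041, "geneB": 0.045, "geneC": 0.048,
    "geneD": 0.060, "geneE": 0.063, "geneF": 0.071,
}
slow_genes, fast_genes = split_at_largest_gap(toy_distances)
```

In practice one would also compare the number of genes and base pairs on each side of a candidate break before settling on it.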
We also investigated the effects of mutational saturation at third codon positions in the 61-gene combined data set. Uncorrected pairwise distances for transitions and transversions among all taxa in the data set were plotted against GTR + I + Γ distances to detect mutational saturation at first and second codon positions combined, as well as at third positions. Because third codon position transitions displayed the strongest evidence of mutational saturation (SI Fig. 19), we also performed phylogenetic analyses on combined first and second codon positions for the 61-gene data set.
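The uncorrected distances used in such saturation plots can be computed along the following lines. This is a minimal sketch assuming an in-frame alignment of unambiguous nucleotides; the function names and the toy sequences are illustrative, and the model-corrected distances on the other axis of the plot are not reproduced here.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def ts_tv_p_distances(seq1, seq2):
    """Uncorrected proportions of transitions and transversions between two aligned sequences."""
    transitions = transversions = compared = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # skip gaps and ambiguity codes
        compared += 1
        if a == b:
            continue
        # A transition stays within purines or within pyrimidines.
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return transitions / compared, transversions / compared


def codon_position(seq, position):
    """Sites at codon position 1, 2, or 3 of an in-frame coding alignment."""
    return seq[position - 1::3]


# Toy in-frame sequences (illustrative only).
seq_x = "ATGGCCAAA"
seq_y = "ATGGCTAAG"
third_ts, third_tv = ts_tv_p_distances(codon_position(seq_x, 3), codon_position(seq_y, 3))
```

A plateau in the uncorrected transition distances as the corrected distances keep increasing is the usual visual sign of saturation.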
MP heuristic searches were performed by using PAUP* 4.0 (66) with 1,000 random sequence addition replicates, TBR branch swapping and MULTREES, with gaps treated as missing data. Clade support under MP was assessed by using 1,000 BS replicates (67) with the same settings as for heuristic searches, except with 10 random sequence addition replicates per BS replicate. Trees were rooted with Pinus and Ginkgo.
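The nonparametric bootstrap used to assess clade support resamples alignment columns with replacement and then repeats the tree search on each pseudoreplicate. The sketch below shows only the resampling and the tallying of support; the tree search itself (done in PAUP* in this study) is not reproduced, and the toy alignment, taxon labels, and replicate trees are hypothetical.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Return one pseudoreplicate: columns drawn with replacement, same length as the original."""
    n_sites = len(next(iter(alignment.values())))
    columns = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {taxon: "".join(seq[i] for i in columns) for taxon, seq in alignment.items()}


def clade_support(replicate_trees, clade):
    """Percentage of replicate trees (each a set of frozenset clades) that contain `clade`."""
    hits = sum(1 for tree in replicate_trees if clade in tree)
    return 100.0 * hits / len(replicate_trees)


# Toy alignment with placeholder taxa (illustrative only).
toy_alignment = {"A": "ACGTACGT", "B": "ACGTACGA", "C": "ACGAACGA"}
replicate = bootstrap_alignment(toy_alignment, random.Random(0))

# HYPOTHETICAL clade sets recovered from four replicate searches.
toy_trees = [
    {frozenset({"A", "B"})},
    {frozenset({"A", "B"})},
    {frozenset({"A", "C"})},
    {frozenset({"A", "B"})},
]
support_ab = clade_support(toy_trees, frozenset({"A", "B"}))
```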
ML and Bayesian searches were performed for the 61-gene combined data set and for all data partitions, incorporating the model selected as optimal by Modeltest Ver. 3.7 (68) by using the Akaike Information Criterion (AIC) (69) whenever possible. ML analyses were conducted by using the program GARLI, which uses a genetic algorithm to perform rapid heuristic ML searches (www.bio.utexas.edu/faculty/antisense/garli/Garli.html). Default parameters were used for the GARLI searches except that significanttopochange was set to 0.01. A total of 100 ML BS replicates was also performed by using GARLI. Bayesian searches were performed with MrBayes Ver. 3.1.2 (70). To ensure convergence on the appropriate posterior probability distribution, three replicate analyses were run for 6,000,000 generations each for all data sets except the 61-gene first/second codon position data set (1,500,000 generations each). Each replicate used four chains with default parameters. Trees were sampled every 1,000 generations, and the point of stationarity was determined by examining plots of the values of the estimated parameters against generation time and examining split parameters in the program AWTY (http://king2.scs.fsu.edu/CEBProjects/awty/awty_start.php). After ensuring that stationarity was reached in each run, the final 5,000 trees (the final 1,400 trees in the first/second codon position analyses) sampled from each replicate were combined to compute Bayesian majority-rule consensus trees. The AIC selected the TVM + I + Γ model for the 61-gene first/second codon position data set. Because neither MrBayes nor GARLI incorporates five-state models as analysis options, the model was set to GTR + I + Γ for this data partition.
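The sampling figures above imply the following burn-in and pooled sample sizes. This is simple arithmetic on the numbers stated in the text; the helper name is illustrative, and the count ignores the additional tree that MrBayes records at generation 0.

```python
def discarded_samples(generations, sample_every, kept):
    """Number of early sampled trees discarded when only the last `kept` are retained."""
    total = generations // sample_every  # excludes the generation-0 sample
    if kept > total:
        raise ValueError("cannot keep more trees than were sampled")
    return total - kept


# Main data sets: 6,000,000 generations, sampled every 1,000, final 5,000 kept.
main_burnin = discarded_samples(6_000_000, 1_000, 5_000)
# First/second codon position data set: 1,500,000 generations, final 1,400 kept.
codon_burnin = discarded_samples(1_500_000, 1_000, 1_400)

# Three replicate runs were pooled for each consensus tree.
pooled_main = 3 * 5_000
pooled_codon = 3 * 1_400
```

So roughly the first 1,000 sampled trees (the first 100 for the codon position data set) of each run were treated as burn-in, and each consensus summarizes 15,000 (or 4,200) trees.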
Hypothesis Testing.
To assess whether alternative relationships among Ceratophyllum, Chloranthaceae, eudicots, magnoliids, and monocots could be statistically rejected, we performed AU tests (71) as implemented in CONSEL Ver. 0.1i (72). All 105 possible rooted alternative topologies (including the best ML tree) involving these five major lineages were tested, while holding all other relationships constant to those found in the best GARLI ML tree. Individual site likelihoods were estimated in PAUP* under the GTR + I + Γ model.
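The figure of 105 follows from the number of rooted, fully bifurcating topologies for n labeled tips, (2n − 3)!!, which for n = 5 is 7 × 5 × 3 × 1 = 105. The sketch below checks this both by the formula and by explicit enumeration; the nested-tuple tree representation is an illustrative choice and is unrelated to how CONSEL or PAUP* store trees.

```python
def insert_leaf(tree, leaf):
    """Yield every tree obtained by attaching `leaf` as sister to one subtree of `tree`."""
    yield (tree, leaf)  # sister to the whole (sub)tree
    if isinstance(tree, tuple):
        left, right = tree
        for new_left in insert_leaf(left, leaf):
            yield (new_left, right)
        for new_right in insert_leaf(right, leaf):
            yield (left, new_right)


def rooted_topologies(taxa):
    """Enumerate all rooted binary topologies by stepwise addition of tips."""
    trees = [taxa[0]]
    for leaf in taxa[1:]:
        trees = [new for tree in trees for new in insert_leaf(tree, leaf)]
    return trees


def rooted_topology_count(n_tips):
    """(2n - 3)!! for n >= 2."""
    count = 1
    for k in range(3, 2 * n_tips - 2, 2):
        count *= k
    return count


lineages = ["Ceratophyllum", "Chloranthaceae", "eudicots", "magnoliids", "monocots"]
all_topologies = rooted_topologies(lineages)
n_topologies = len(all_topologies)
```

Excluding the best ML tree leaves the 104 alternatives referred to in the topology test results.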
Molecular Dating Analyses.
A likelihood ratio test of rate constancy across lineages indicated that our data do not conform to a molecular clock model. Divergence times were therefore estimated under a relaxed molecular clock by using PL (73) as implemented in the program r8s (74). The smoothing parameter (λ) was determined by cross-validation. The best ML topology for the 61-gene combined data set as found by GARLI was used for divergence time analyses, but branch lengths and model parameters were reestimated in PAUP* by using a GTR + I + Γ model of sequence evolution, because GARLI does not fully optimize these parameters (although GARLI-estimated ML parameters are always extremely close to the fully optimized values). To quantify errors in our divergence time estimates, we used the nonparametric BS approach outlined by ref. 75.
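The text does not give details of the likelihood ratio test of rate constancy, so the following is only a sketch of the standard form of that test: twice the difference in log-likelihood between the unconstrained and clock-constrained trees, compared with a chi-square distribution with n − 2 degrees of freedom for n taxa. The log-likelihood values are hypothetical, and the critical value uses the Wilson-Hilferty approximation rather than an exact chi-square quantile.

```python
import math


def clock_lrt(lnl_clock, lnl_free, n_taxa):
    """Return the likelihood ratio statistic and its degrees of freedom (n_taxa - 2)."""
    statistic = 2.0 * (lnl_free - lnl_clock)
    return statistic, n_taxa - 2


def chi2_critical_approx(df, z=1.6449):
    """Approximate upper-tail chi-square critical value (Wilson-Hilferty).

    z = 1.6449 is the standard normal quantile for a 0.05 upper tail.
    """
    term = 2.0 / (9.0 * df)
    return df * (1.0 - term + z * math.sqrt(term)) ** 3


# HYPOTHETICAL log-likelihoods for a 45-taxon tree with and without a clock.
lrt_statistic, lrt_df = clock_lrt(lnl_clock=-251000.0, lnl_free=-250100.0, n_taxa=45)
critical_value = chi2_critical_approx(lrt_df)
clock_rejected = lrt_statistic > critical_value
```

A statistic far above the critical value, as in this toy case, is the kind of result that motivates a relaxed-clock method such as PL.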
Three PL analyses that varied in the application of fossil constraints were run. All analyses used root constraints of a maximum age of 310 Mya and a minimum age of 290 Mya as a conservative estimate of the age of crown group seed plants. The first PL analysis used no further age constraints (this will be referred to as the unconstrained analysis), whereas the second and third analyses used a minimum age of 125 Mya for crown and stem group eudicots, respectively. The latter two analyses also incorporated a number of other minimum age constraints across the tree. All fossil constraints are discussed in detail in SI Text.
| Footnotes |
|---|
Abbreviations: AU test, approximately unbiased test; BS, bootstrap; ML, maximum likelihood; MP, maximum parsimony; PL, penalized likelihood.
To whom correspondence should be addressed at the present address: Biology Department, Oberlin College, Oberlin, OH 44074-1097. E-mail: michael.moore{at}oberlin.edu
Freely available online through the PNAS open access option.
Author contributions: P.S.S. and D.E.S. designed research; M.J.M. performed research; M.J.M. and C.D.B. analyzed data; and M.J.M., C.D.B., P.S.S., and D.E.S. wrote the paper.
The authors declare no conflict of interest.
Data deposition: The sequences reported in this paper have been deposited in the GenBank database (accession no. EF614270).
This article contains supporting information online at www.pnas.org/cgi/content/full/0708072104/DC1.
© 2007 by The National Academy of Sciences of the USA