Skip to main content

Main menu

  • Home
  • Articles
    • Current
    • Special Feature Articles - Most Recent
    • Special Features
    • Colloquia
    • Collected Articles
    • PNAS Classics
    • List of Issues
  • Front Matter
    • Front Matter Portal
    • Journal Club
  • News
    • For the Press
    • This Week In PNAS
    • PNAS in the News
  • Podcasts
  • Authors
    • Information for Authors
    • Editorial and Journal Policies
    • Submission Procedures
    • Fees and Licenses
  • Submit
  • Submit
  • About
    • Editorial Board
    • PNAS Staff
    • FAQ
    • Accessibility Statement
    • Rights and Permissions
    • Site Map
  • Contact
  • Journal Club
  • Subscribe
    • Subscription Rates
    • Subscriptions FAQ
    • Open Access
    • Recommend PNAS to Your Librarian

User menu

  • Log in
  • My Cart

Search

  • Advanced search
Home
Home
  • Log in
  • My Cart

Advanced Search

  • Home
  • Articles
    • Current
    • Special Feature Articles - Most Recent
    • Special Features
    • Colloquia
    • Collected Articles
    • PNAS Classics
    • List of Issues
  • Front Matter
    • Front Matter Portal
    • Journal Club
  • News
    • For the Press
    • This Week In PNAS
    • PNAS in the News
  • Podcasts
  • Authors
    • Information for Authors
    • Editorial and Journal Policies
    • Submission Procedures
    • Fees and Licenses
  • Submit
Research Article

Extensive gene tree discordance and hemiplasy shaped the genomes of North American columnar cacti

View ORCID ProfileDario Copetti, Alberto Búrquez, Enriquena Bustamante, Joseph L. M. Charboneau, Kevin L. Childs, Luis E. Eguiarte, Seunghee Lee, Tiffany L. Liu, Michelle M. McMahon, Noah K. Whiteman, Rod A. Wing, Martin F. Wojciechowski, and Michael J. Sanderson
  1. aArizona Genomics Institute, School of Plant Sciences, University of Arizona, Tucson, AZ 85721;
  2. bInternational Rice Research Institute, Los Baños, Laguna, Philippines;
  3. cInstituto de Ecología, Unidad Hermosillo, Universidad Nacional Autónoma de México, Hermosillo, Sonora, Mexico;
  4. dDepartment of Ecology and Evolutionary Biology, University of Arizona, Tucson, AZ 85721;
  5. eDepartment of Plant Biology, Michigan State University, East Lansing, MI 48824;
  6. fDepartamento de Ecología Evolutiva, Instituto de Ecología, Universidad Nacional Autónoma de México, Ciudad de México, Mexico;
  7. gSchool of Plant Sciences, University of Arizona, Tucson, AZ 85721;
  8. hDepartment of Integrative Biology, University of California, Berkeley, CA 94720;
  9. iSchool of Life Sciences, Arizona State University, Tempe, AZ 85287

See allHide authors and affiliations

PNAS November 7, 2017 114 (45) 12003-12008; first published October 23, 2017; https://doi.org/10.1073/pnas.1706367114
Dario Copetti
aArizona Genomics Institute, School of Plant Sciences, University of Arizona, Tucson, AZ 85721;
bInternational Rice Research Institute, Los Baños, Laguna, Philippines;
  • Find this author on Google Scholar
  • Find this author on PubMed
  • Search for this author on this site
  • ORCID record for Dario Copetti
Alberto Búrquez
cInstituto de Ecología, Unidad Hermosillo, Universidad Nacional Autónoma de México, Hermosillo, Sonora, Mexico;
  • Find this author on Google Scholar
  • Find this author on PubMed
  • Search for this author on this site
Enriquena Bustamante
cInstituto de Ecología, Unidad Hermosillo, Universidad Nacional Autónoma de México, Hermosillo, Sonora, Mexico;
  • Find this author on Google Scholar
  • Find this author on PubMed
  • Search for this author on this site
Joseph L. M. Charboneau
dDepartment of Ecology and Evolutionary Biology, University of Arizona, Tucson, AZ 85721;
  • Find this author on Google Scholar
  • Find this author on PubMed
  • Search for this author on this site
Kevin L. Childs
eDepartment of Plant Biology, Michigan State University, East Lansing, MI 48824;
  • Find this author on Google Scholar
  • Find this author on PubMed
  • Search for this author on this site
Luis E. Eguiarte
fDepartamento de Ecología Evolutiva, Instituto de Ecología, Universidad Nacional Autónoma de México, Ciudad de México, Mexico;
  • Find this author on Google Scholar
  • Find this author on PubMed
  • Search for this author on this site
Seunghee Lee
aArizona Genomics Institute, School of Plant Sciences, University of Arizona, Tucson, AZ 85721;
  • Find this author on Google Scholar
  • Find this author on PubMed
  • Search for this author on this site
Tiffany L. Liu
eDepartment of Plant Biology, Michigan State University, East Lansing, MI 48824;
  • Find this author on Google Scholar
  • Find this author on PubMed
  • Search for this author on this site
Michelle M. McMahon
gSchool of Plant Sciences, University of Arizona, Tucson, AZ 85721;
  • Find this author on Google Scholar
  • Find this author on PubMed
  • Search for this author on this site
Noah K. Whiteman
hDepartment of Integrative Biology, University of California, Berkeley, CA 94720;
  • Find this author on Google Scholar
  • Find this author on PubMed
  • Search for this author on this site
Rod A. Wing
aArizona Genomics Institute, School of Plant Sciences, University of Arizona, Tucson, AZ 85721;
bInternational Rice Research Institute, Los Baños, Laguna, Philippines;
  • Find this author on Google Scholar
  • Find this author on PubMed
  • Search for this author on this site
Martin F. Wojciechowski
iSchool of Life Sciences, Arizona State University, Tempe, AZ 85287
  • Find this author on Google Scholar
  • Find this author on PubMed
  • Search for this author on this site
Michael J. Sanderson
dDepartment of Ecology and Evolutionary Biology, University of Arizona, Tucson, AZ 85721;
  • Find this author on Google Scholar
  • Find this author on PubMed
  • Search for this author on this site
  • For correspondence: sanderm@email.arizona.edu
  1. Edited by David M. Hillis, The University of Texas at Austin, Austin, TX, and approved September 26, 2017 (received for review April 24, 2017)

  • Article
  • Figures & SI
  • Info & Metrics
  • PDF
Loading

Significance

Convergent and parallel evolution (homoplasy) is widespread in the tree of life and can obscure evidence about phylogenetic relationships. Homoplasy can be elevated in genomes because individual loci may have independent evolutionary histories different from the species history. We sequenced the genomes of five cacti, including the iconic saguaro of the Sonoran Desert and three other columnar cacti, to investigate whether previously uncharacterized features of genome evolution might explain long-standing challenges to understanding cactus phylogeny. We found that 60% of the amino acid sites in proteins exhibiting homoplasy do so because of conflicts between gene genealogies and species histories. This phenomenon, termed hemiplasy, is likely a consequence of the unusually long generation time of these cacti.

Abstract

Few clades of plants have proven as difficult to classify as cacti. One explanation may be an unusually high level of convergent and parallel evolution (homoplasy). To evaluate support for this phylogenetic hypothesis at the molecular level, we sequenced the genomes of four cacti in the especially problematic tribe Pachycereeae, which contains most of the large columnar cacti of Mexico and adjacent areas, including the iconic saguaro cactus (Carnegiea gigantea) of the Sonoran Desert. We assembled a high-coverage draft genome for saguaro and lower coverage genomes for three other genera of tribe Pachycereeae (Pachycereus, Lophocereus, and Stenocereus) and a more distant outgroup cactus, Pereskia. We used these to construct 4,436 orthologous gene alignments. Species tree inference consistently returned the same phylogeny, but gene tree discordance was high: 37% of gene trees having at least 90% bootstrap support conflicted with the species tree. Evidently, discordance is a product of long generation times and moderately large effective population sizes, leading to extensive incomplete lineage sorting (ILS). In the best supported gene trees, 58% of apparent homoplasy at amino sites in the species tree is due to gene tree-species tree discordance rather than parallel substitutions in the gene trees themselves, a phenomenon termed “hemiplasy.” The high rate of genomic hemiplasy may contribute to apparent parallelisms in phenotypic traits, which could confound understanding of species relationships and character evolution in cacti.

  • Saguaro
  • homoplasy
  • lineage sorting
  • phylogenomics

Cactaceae have undergone adaptive radiation on a continental scale in the Americas. Occurring from arid deserts to alpine steppes and tropical forests, they exhibit a remarkable diversity of growth forms, ranging from tiny, nearly subterranean “buttons” to giant columnar or candelabra forms, epiphytes, and leafy shrubs (1). Classification of the family’s 1,438 species (2) has been unusually fraught, with taxonomic treatments recognizing anywhere from 20 to 233 genera (1⇓⇓–4), and classification above the genus level being equally problematic (1, 5). In part this has been attributed to homoplasy (ref. 1, p. 18 and ref. 6, p. 45), the independent evolutionary origin of the same trait (7). For example, early taxonomists combined the giant columnar cacti of North and South America into one large genus Cereus Mill (3). Later split into numerous smaller genera, this association was sometimes maintained at a suprageneric rank (8), but molecular phylogenies clearly show North and South American Cereus are separate clades (6, 9).

Frequent convergence in growth habit may reflect design limitation (10) of the relatively simple cactus body plan of stem succulence, simplified branching, and loss or reduction of leaves (9) (ref. 11, p. 536). In cacti, convergent simplification via paedomorphosis (12), along with parallelisms among close relatives, has also obscured phylogeny (13, 14). This may help explain the finding that <50% of (nonmonotypic) cactus genera are monophyletic (15). Taxonomic impediments take on special significance in cacti because of the unusually high fraction of species of conservation concern (31% estimated in ref. 16). Despite its potential significance, homoplasy has been quantified in cacti for only a small number of phenotypic traits (14, 17). Here, we focus on the genomes of cacti and assess the degree to which apparent homoplasy among these species is elevated due to discordance between gene trees and the species tree.

An important but recently recognized contributor to molecular homoplasy is “hemiplasy” (18, 19), originally defined as the apparent multiple origin of a character state on the inferred species tree arising when an inferred gene tree (with no homoplasy) is discordant with that species tree (18⇓–⇓21) (Fig. 1). A slight generalization of this definition is needed to account for characters with homoplasy on both trees (Fig. 1, Lower Right): A character (site) exhibits hemiplasy if the inferred number of character state changes on the species tree is strictly greater than on the gene tree, which occurs only if the two trees are discordant (Materials and Methods).

Fig. 1.
  • Download figure
  • Open in new tab
  • Download powerpoint
Fig. 1.

Hemiplasy in amino acid alignments of cacti (genes annotated as Cgig1_gene##:site ##). Hemiplasy occurs when the inferred number of state changes of a trait is greater on the species tree than on the gene tree, which happens only if the two trees are discordant. Gene trees in blue are imbedded in black species trees. Yellow dot is the position of a state change inferred for an amino acid site for this gene tree. Black rectangles are locations where state changes would be inferred on the species tree (alternative equally optimal reconstructions are possible but the number of state changes is the same). Amino acids are indicated at leaf nodes. Three trees have the same number of state changes on species and gene trees, but at Lower Left a homology (one change) on the gene tree is seen as homoplasy (two changes) on the species tree because of discordance (i.e., hemiplasy). Cgig, C. gigantea; Lsch, L. schottii; Phum, P humboldtii; Ppri, P. pringlei; Sthu, S. thurberi.

Genome scale datasets revealed high levels of gene tree discordance (20, 22⇓–24), which suggests a potentially important role for hemiplasy as a contributor to overall molecular homoplasy. Discordance can arise because of incomplete lineage sorting (ILS) and gene flow, which depend, in turn, on demographic factors at the population level, phylogenetic history, divergence times, mutation rate, generation time, and the timing and taxa involved in introgression or hybridization (25). It can also arise from gene duplication and loss (26).

The cactus tribe Pachycereeae is a model for taxonomic and phylogenetic complexities in cacti (27, 28). Its ∼70 species, including most of the columnar cacti of Mexico and adjacent regions, have been dispersed among 6–23 genera in various taxonomic treatments (1, 5, 27). Despite early molecular support for its monophyly (11), broadly sampled recent molecular studies have led to abandonment of the tribe (9, 13, 15), or to recasting it with the informal name, “core Pachycereeae” (6). Notably, homoplasy in vegetative and floral traits has been cited as a factor contributing to its difficult taxonomic history (ref. 28, p. 1086 and ref. 29, p. 556).

To sample widely in Pachycereeae, we sequenced the genomes of four representatives of the two subtribes of Pachycereeae recognized in Gibson and Horak (30) (Table S1) and some earlier treatments. This included the iconic saguaro cactus of the Sonoran Desert (Carnegiea gigantea) to serve as a high-coverage reference assembly, and Pachycereus pringlei (cardón, sahueso), Lophocereus schottii (senita), and Stenocereus thurberi (organ pipe, pitaya), also of the Sonoran Desert, to lower coverage. Pereskia humboldtii, a leafy, Andean cactus, was included as an outgroup (31).

View this table:
  • View inline
  • View popup
Table S1.

Sample, voucher, and sequence library information for taxa in this paper

Results and Discussion

The assembly of saguaro’s 1.4-Gb genome (32) from short read libraries (Table S1) spanned 980 Mb, with a scaffold N50 of 61.5 kb (Table S2). Transcriptomes were also assembled (Table S3) and used with other evidence to annotate the saguaro genome. The saguaro genome contains 28,292 protein coding genes; 58% of the assembly consists of transposable elements and retroviral and repeated sequences (Tables S2 and S4). Assemblies from single libraries of the other four cacti were more fragmented (Table S5).

View this table:
  • View inline
  • View popup
Table S2.

C. gigantea (SGP5) assembly metrics (percent of assembled genome included where appropriate)

View this table:
  • View inline
  • View popup
Table S3.

Saguaro transcriptome sequence and assembly results

View this table:
  • View inline
  • View popup
Table S4.

Repeat annotation of the saguaro genome

View this table:
  • View inline
  • View popup
Table S5.

Comparison of assembly results for five cacti in this study

The high completeness of the gene space in the saguaro assembly (Table S2) let us construct 4,436 alignments of orthologous genes across the five cacti, each of which contained two or more exons for all five taxa, and contiguous introns. Based on gene tree confidence levels estimated from alignments containing all nucleotide positions, sets of alignments differing in phylogenetic robustness (“gene confidence sets”) were compiled for downstream analyses. For example, the 90% gene confidence set comprised 458 genes, with gene trees having maximum likelihood bootstrap support value above 90% for all clades (Table 1). These gene sets provide a control for the effect of weakly supported gene trees on gene tree discordance estimates (33).

View this table:
  • View inline
  • View popup
Table 1.

Gene tree discordance relative to the Pachycereeae species tree (Fig. 2B)

The most robust 90% gene confidence set implies levels of gene tree discordance between Pachycereeae genera (Fig. 2A) comparable to those seen in closely related short-lived angiosperm genera such as Solanum sect. Lycopersicon [time scale of 2 million years (MYR); ref. 24] and the Oryza AA genome clade (2.5 MYR: ref. 22). Species trees constructed using both gene tree-based (MP-EST; ref. 34) and alignment-based methods (BPP; ref. 35) were the same for all gene tree confidence sets and for three different partitioning schemes for the alignments by codon position and intron (Materials and Methods and Fig. 2B). An exhaustive search of tree space indicated a single optimal species tree under the MP-EST pseudolikelihood criterion, and three replicate runs of each MCMC chain in BPP from random starting trees generated this same species tree for all five gene confidence sets, indicating convergence. This optimal tree agreed with previous molecular phylogenetic analyses of mostly plastid genes (6, 28, 29). In the 80% and 90% gene confidence sets, 61% and 63% of gene trees, respectively, agreed with the species tree (Table 1). Estimates of species tree edge lengths for these gene confidence sets in BPP ranged from 1.11 to 1.24 for edge A and from 1.22 to 1.34 for edge B, in “coalescent time units” (= T/2Ne, where T is measured in generations), depending on data partition, with slightly lower values estimated by MP-EST (Table 1).

Fig. 2.
  • Download figure
  • Open in new tab
  • Download powerpoint
Fig. 2.

(A) Topological discordance in the 90% gene confidence set of 458 gene trees visualized in DensiTree (78). Blue trees are the most frequent, followed by red and shades of green. (B) Species tree inferred with BPP and MP-EST, which was the same for all gene sets, with divergence times estimated using the Neutral partition. Blue dashed line indicates reticulation, having inferred inheritance probability, γ, on the optimal PhyloNet analysis of the same set of gene trees. Green dashed line indicates position of reticulation on the next best inferred network (Fig. S1). The origin of this reticulation edge earlier than its endpoint implies the existence of an extinct or unsampled taxon.

Introgression may also contribute to gene tree discordance. Network reconstruction allowing both ILS and gene flow in PhyloNet (36) indicated substantial support for a network model with one reticulation vs. a tree model (ΔAIC = 17.0 and 28.6 for the 90% and 80% gene sets, respectively; ref. 37), implicating introgression into P. pringlei from either S. thurberi or some more closely related but unsampled or extinct taxon (Fig. 2B and Fig. S1). The best two networks were the same in both gene confidence sets (although the ranking reversed; Fig. S1). Moreover, the “major tree” within these networks (Materials and Methods) was the same as that inferred by species tree methods, and the inheritance probability estimate from the 90% gene confidence set, which indicates the level of introgression, was low (6.5% and 14.2% for the optimal and next best networks in the 90% gene confidence set; although higher in one of the two networks from the 80% gene set). The two optimal networks were consistently recovered in multiple searches from different starting networks, and in each gene set, the two results were substantially better than other suboptimal networks (Fig. S1).

Fig. S1.
  • Download figure
  • Open in new tab
  • Download powerpoint
Fig. S1.

PhyloNet (36) network reconstructions for the 90% (A) and 80% (B) gene confidence sets. Likelihood scores decrease from left to right. AIC scores are relative to the optimal networks at Left. Branch lengths are not scaled to sequence divergence, but the lesser of the two inheritance probabilities for reticulation edges is given next to edge in the format ##.#%. Parametric bootstrap probabilities are given next to edges for top scoring networks in A and B, as percentages in the format ##. The dotted box encloses the two best networks in each gene confidence set, and these are the same in A and B, but in reverse ranking. The remaining networks have substantially worse AIC scores relative to the optimal networks in each row.

Intergeneric hybrids are well known in cacti (1, 5), including rare hybrids between P. pringlei and Bergerocactus emoryi in Baja California in a narrow zone where the two species are sympatric (1, 38). Bergerocactus (not sampled here) is more closely related to Carnegiea, Lophocereus, and Pachycereus than to Stenocereus in the most complete cactus phylogeny to date (6). P. pringlei is possibly a recent autotetraploid (39), so postzygotic inviability and sterility barriers (40) would have had to have been overcome if the introgression took place following tetraploidy.

Owing to the limited level of introgression, we inferred demographic history using BPP assuming that ILS is the primary cause of discordance. The neutral mutation rate was estimated from the substitution rate in the “neutral” partition of the alignments (Table S6) assuming a root age of the tree at 26.88 MYR (Materials and Methods). Estimates ranged from μG = 5.98 to 6.07 × 10−8 per site for each generation across the five gene confidence sets. An alternative estimate based on pairwise synonymous Ks distances in CDS regions between Carnegiea and Pereskia from all 4,426 alignments was somewhat higher at μG = 8.75 × 10−8 per site per generation (Table S7). Per year rates are comparable to several angiosperm tree species (41⇓–43), but generation times in the cacti, ranging from 20 to 75 y (44⇓⇓–47), are 2–12 times longer. The BPP estimates of scaled mutation rate (θ = 4 NeμG), imply an Ne of 24,000–39,000 and 31,000–36,000 for edges A and B, respectively, over the 15 gene sets and partitions (Table S6). Combining the BPP estimates of θ with the pairwise Ks estimate of mutation rate produces ancestral Ne of ∼2/3 the BPP estimates. These ancestral Ne values are higher than estimates of recent Ne for some perennial angiosperms [Populus trichocarpa: ∼4,000−6,000 (48); Eucalyptus grandis: ∼11,000 (41); Amborella trichopoda: 5,000 (49)] but comparable to Populus balsamifera (44,000−59,000; ref. 50); and much less than Pinus taeda (560,000; ref. 51).

View this table:
  • View inline
  • View popup
Table S6.

Demographic inference based on BPP analysis

View this table:
  • View inline
  • View popup
Table S7.

Neutral mutation (substitution) rate analyses across cacti

The probability that a rooted gene tree with three taxa disagrees with the species tree that contains it is 2/3 e−T/2Ne, where T is the time in generations along the internal edge of the tree (52). This can be high if T is small, either because divergence time in years is small or generation time is large. Thus, the level of gene tree discordance we inferred in columnar cacti having Ne of 25–40,000, along edges with a duration of 2–3 million years would be far too high were it not for the long generation times of these plants, similar to findings in long-lived conifers (53).

Gene tree discordance has a strong impact on the distribution of protein sequence variation among these genera of cacti in generating hemiplasy. Taking the 80% and 90% gene confidence sets as samples of potentially phenotypically relevant amino acid sequence variation across these genomes, we partitioned the homoplastic amino acid sites in genes on the species tree into three parts: the fraction arising from homoplasy on concordant gene trees (Fig. 1, Upper Right), the fraction arising solely from gene trees with less homoplasy than the species tree but that are discordant with the species tree (hemiplasy: Fig. 1, Lower Left), and the small fraction that arises from homoplasy on discordant gene trees that is equally homoplastic on the species tree (which as yet has no term defined: Fig. 1, Lower Right). Hemiplasy accounts for 58–63% of all apparent homoplasy (Table 2).

View this table:
  • View inline
  • View popup
Table 2.

Homoplasy and hemiplasy in amino acid alignments of genes

As examples, consider two genes from the 90% gene confidence set having strongly discordant gene trees. Both play potential roles in the physiological adaptation of cacti to arid environments. The saguaro nuclear gene annotated as a chloroplastic NADP-dependent malate dehydrogenase (MDH: Cgig1_18427) is the only 1 of 11 MDH isoforms in the saguaro annotation that is NADP-dependent. It catalyzes the reduction of oxaloacetate to malate in the chloroplast (54) and appears to participate in the fixation of carbon dioxide under both light and dark conditions (55). Together with the cytosolic NAD-dependent MDH, these two isoforms may be primarily responsible for malate formation during dark CO2 fixation in crassulacean acid metabolism (CAM) plants (56). The MDH gene tree has a single replacement of an ancestral serine with an alanine, but the gene tree is discordant with the species tree, making the alanines appear to have evolved twice on the species tree, when they are in fact hemiplastic.

The gene annotated as a DNAJ JJJ1 homolog (Cgig1_00352) contains a DNAJ domain involved in heat shock protein interactions. Columnar cacti must be able to tolerate internal tissue temperatures that can exceed 50 °C in the summer (5). The gene tree is discordant with the species tree, and 19 of its 20 potentially informative amino acid replacements exhibit homoplasy on the species tree, but only one does on the gene tree, meaning 18 sites are hemiplastic. A search of the National Center for Biotechnology Information (NCBI) CDD database (57) shows that two of these hemiplastic sites are in the conserved DNAJ domain proper, although neither involves known HSP70 interaction sites.

Hemiplasy is not restricted to the coding sequence in this gene. The phylogeny of the 1,000 bp upstream of the start codon, which often contains proximal conserved regulatory elements in plants (58), is the same discordant tree as is found for the coding region. Of 22 potentially informative nucleotide sites there, 21 are homoplastic on the species tree, whereas only two are on the gene tree. Most homoplasy in this noncoding region on the species tree is thus also due to hemiplasy. All of these cases suggest caution in viewing multiple origins of the same amino acid or nucleotide in a species tree as possible evidence of convergent evolution (18).

These conclusions depend on the robustness of various model assumptions. The extent of gene tree discordance and hemiplasy was estimated from gene tree topologies inferred with an HKY substitution model in PAUP*. Estimates of tree topology are generally more robust than estimates of rates, especially at low sequence divergences (59⇓–61), where simple models sometimes even outperform more general ones (62). Mean pairwise nonsynonymous and synonymous distances between the most divergent Pachycereeae were only 1.3% and 4.1% respectively, and about one-half of the 4,436 sequence alignments had bootstrap values below 50%, a consequence of low sequence divergence and short length. Substitution model therefore likely had little impact on estimates of the prevalence of discordance and hemiplasy.

However, our explanation for these levels of hemiplasy in terms of generation time rests on estimates of divergence times and Ne obtained from BPP, which adds the assumptions of panmixia within and no gene flow between species, constant population sizes within tree edges, and no linkage (61). To examine the impact of BPP’s Jukes–Cantor clock model on its estimates of divergence times and effective population sizes, we evaluated more general models with subsets of the data using BEAST (63), finding that the parameter effect sizes were small (SI Materials and Methods and Table S8). Other assumptions were more difficult to test. Panmixia will not be strictly true even for outcrossing columnar cacti, but gene flow is likely high within three of the four Pachycereeae that are bat pollinated (e.g., S. thurberi; ref. 64). We did find limited gene flow between species in PhyloNet analyses, and this can lead to overestimates of population sizes and underestimates of divergence times when using the multispecies coalescent approach (65), but because the rates of ILS correlate with the product of population size and time, these biases potentially counteract each other. However, it is unlikely that population sizes have remained constant within species tree edges. Paleoclimatic and packrat midden evidence document the ebb and flow of Sonoran Desert plant communities during Pleistocene glacial and interglacial periods (66). BPP’s divergence time estimates are biased toward the recent in “more extreme bottleneck scenarios” of population history (67), but shorter edge durations in Pachycereeae would tend to make the observed level of ILS even higher than estimated. Finally, linkage is unlikely to be problematic. Our 80% gene confidence set has an average of 1.8 Mb between genes. Linkage disequilibrium in long-lived, outcrossing angiosperms typically decays in distances from a few kb to 50 kb (41, 48).

View this table:
  • View inline
  • View popup
Table S8.

Effect of model choice on parameter estimates in nine samples of 50 genes from 90% gene confidence set, assayed in BEAST 2 (63)

The phylogenomic history of saguaro and its relatives exhibits extensive gene tree discordance due to ILS and low rates of introgression. Comparable findings have been seen in rapid and/or very recent radiations (20, 22, 24), but in Pachycereeae, ILS acts at long time scales, between taxonomically divergent genera with very long generation times. A consequence of this discordance is elevated levels of apparent homoplasy in the species tree. The connection between apparent genomic homoplasy arising from ILS and apparent phenotypic homoplasy is probably strongest for traits with a simple genetic architecture (18), such as those involving the function or regulation of a single enzyme (21, 68). In plants, enzymes in biosynthetic pathways for floral pigments or defensive compounds are good candidates (69, 70). Notably, the taxonomic distribution of alkaloids, triterpenes, and sterols have played a role in the systematics of Pachycereeae (30). When hemiplasy arises because of introgression, genetic architecture may be less of an issue because multiple loci may undergo gene flow almost simultaneously (18). In plants, even complex traits with multiple components and fitness effects have been adaptively introgressed (71). Collectively, this body of evidence lends plausibility to the hypothesis that the phenotypic effects of genomic hemiplasy may have exacerbated the long-standing problem of inferring relationships in these charismatic cacti.

SI Materials and Methods

Sampling and Genome Sequencing.

Fresh material was obtained from an individual C. gigantea (Engelm.) Britton & Rose (“SGP5”) growing at the Tumamoc Hill Reserve in Tucson, AZ, and from a second plant purchased at a Tucson nursery (“SGP3”). Fresh material of L. schottii (Engelm.) Britton & Rose [=Pachycereus schottii (Engelm.) D. R. Hunt], P. pringlei (S. Watson) Britton & Rose, P. humboldtii Britton & Rose (=P. horrida DC), and S. thurberi (Engelm.) Buxb. were obtained from cultivated individual plants at the Desert Botanical Garden (Table S1). Genomic DNAs were extracted using a modified cetyltrimethylammonium bromide (CTAB) method (64).

For C. gigantea, paired end (PE) and mate pair (MP) libraries were constructed as described previously (82) (Table S1). The C. gigantea genome was sequenced in four different libraries (insert size ranging from 180 bp to 2.5 kb) on Illumina HiSeq (2 × 100-nt reads) and MiSeq (2 × 300-nt reads) sequencers. Single PE libraries were constructed for Lophocereus, Pachycereus, Pereskia, and Stenocereus and sequenced on Illumina NextSeq 500 (2 × 150 nt reads; Table S1). All reads were deposited in GenBank (Table S1).

Genome Assemblies.

Raw reads were trimmed using Sickle (83) for bases below QV20 and shorter than 20 bp. Genome size was estimated from k-mer frequency using the script KmerFreq_AR from the SOAPdenovo v2.01 suite (84). The C. gigantea data were error-corrected with BLESS v. 0.14 (85), and residual adapter bases were removed with PRINSEQ (86).

For all species, preprocessed reads were assembled de novo in scaffolds with Meraculous v. 2.0.4 (87) with the diploid option enabled. Out of the multiple iterations tested at different k-mer values, the most contiguous saguaro assembly was obtained with a k-mer value of 69; the genomes of P. pringlei, L. schottii, and S. thurberi were assembled with k-mer values of 55, 37, and 57, respectively. Gaps were closed with Platanus (88). Scaffolding was performed with BESST v. 1.2.3 (89) using the paired end and (when available) mate pair libraries; each scaffolding step was followed by a step of gap closing. Low confidence contigs or gaps were broken with REAPR v. 1.0.17 (90), producing the final assembly. Finally, we removed scaffolds and contigs shorter than 1 kb (saguaro) or 200 bp (all other species) that matched the ϕX174 genome, or that aligned to the saguaro chloroplast (82) for 90% of their length and had at least 90% similarity.

Blastx alignments against GenBank databases revealed significant hits only to other plant sequences. Completeness of the gene-space component of the genome assembly was assessed by running CEGMA (91) and BUSCO (92), using for the latter the Plantae database and tomato as the species for the Augustus gene prediction.

Annotation.

A saguaro repeat library was developed using RepeatExplorer (93) parsing the output as described previously (94). The annotation of the repeats in the saguaro assembly was obtained merging the output of RepeatMasker (www.repeatmasker.org/, v. 3.3.0) and Blaster (a component of the REPET package; ref. 95). Reconciliation of the two masked repeat sets was carried out using custom Perl scripts and formatted as gff3 files.

To identify noncoding RNAs, Infernal (96) was run using the Rfam library Rfam.cm.12.2. Hits with an e value higher than the threshold of 1e-5 were removed, as well as results with score below the family-specific gathering threshold. When overlapping loci were predicted, only the hit with the highest score was kept. Transfer RNAs were predicted using tRNAscan-SE (97) with default parameters.

We also obtained transcriptome data for saguaro using an RNA-seq approach. Samples were collected from SGP3 and from seedlings of the SGP5 specimen (Table S3). Libraries were constructed with the Ovation RNA-Seq System V2 (NuGEN) kit and sequenced on an Illumina instrument (Table S3). All reads were deposited in GenBank (Table S3).

The saguaro genome assembly was annotated using the MAKER pipeline (98). Several transcript and protein sequence datasets were identified to aid in gene prediction. Transcript unigenes were generated for the peyote cactus Lophophora williamsii from publicly available RNA-seq datasets (SRR1575212, SRR1575317, and SRR1575216). Saguaro RNA-seq reads were assembled de novo with Trinity r2013-02-25 (99), with the normalization option (Table S3). Assembled transcripts were merged in a unique EST file and clustered at 95% with Usearch v4.2.66 (100).

Protein sequences used for evidence during gene predictions included predicted proteins from Arabidopsis thaliana (TAIR10, ref. 101), Uniprot/Swissprot proteins (102), and GenBank protein sequences for the Caryophyllalean species Beta vulgaris and Spinacia oleracea. After repeat masking and transcript and protein alignment, high-quality saguaro transcript alignments were used to create an initial set of gene models that were used to train the Hidden Markov Model (HMM) for the SNAP gene prediction program (103). MAKER was run a second time, and genes were predicted using SNAP. High-quality SNAP gene predictions (annotation edit distances ≤ 0.2; ref. 104) were used to train the HMM for SNAP a second time, and MAKER was run again allowing the second SNAP HMM to predict genes. High-quality genes from the second SNAP run were then used to train the HMM for AUGUSTUS (105). MAKER was run a final time allowing genes to be predicted by both SNAP and AUGUSTUS. Alignments of single exon transcripts spanning less than 500 bp were excluded from the final MAKER run, and the “always_complete” and “keep_preds” options were set to 1. Predicted genes with 50 amino acids or less were discarded. Overall, 87% of the genes (25,862 models) had annotation edit distance between 0 and 0.5, with more than half (15,635) between 0 and 0.2.

Predicted proteins were analyzed with HMMER hmmscan to identify matching Pfam domains (106, 107). Gene models with support from transcript evidence, protein evidence, and/or matching Pfam domains were retained as an initial high-quality gene set that was further probed to identify and remove transposable element-related genes. Transcripts with homology to known transposable elements (TEs) and containing domains matching TE-related Pfam domains were excluded. The final set of 28,292 predicted genes were functionally annotated using the Trinotate pipeline (trinotate.github.io). Genes were deemed as transcribed if they had a Salmon v.0.6.1 (108) transcripts per million score equal or greater to 5, or if the Lophophora williamsii or Opuntia ficus-indica (SRR 1616998) unigenes aligned for more than 50 bp with at least 80% blastn similarity. Under these criteria, 50% (14,164) of gene models had evidence of transcription from saguaro RNA-Seq data, and two-thirds (18,597) if including evidence from assembled unigenes.

Robustness of BPP Model Assumptions.

BPP’s substitution model for computing gene tree likelihoods is the Jukes–Cantor (JC) model with a molecular clock. We evaluated model choice in BPP relative to more complex models using likelihood ratio tests (109) with likelihoods computed in PAUP*. Of the 2,291 gene alignments in the most inclusive 50% gene confidence set, the JC model was rejected in favor of HKY85 in 97.8% of the alignments (P = 0.05; df = difference in model parameters = 4; base frequencies estimated). The molecular clock was rejected in far fewer: 18.6–21.6% of alignments (P = 0.05; df = #taxa-2 = 3), depending on the confidence set. Results were similar for other partitions. Although few in number, these genes might conceivably have an outsize effect. To estimate the magnitude of overall lineage effects in Pachycereeae, we concatenated all alignments in the 90% gene set in which the gene tree was concordant with the species tree and estimated branch lengths in PAUP* with an HKY85 + Gamma model (estimating ti/tv ratio). There was evidence of a slight slowdown in rate in the Carnegiea/Pachycereus clade, but overall the effect size for lineage effects was small, with a coefficient of variation in root to tip rates of 13.2%.

To examine model robustness in more detail, we compared population size and divergence time parameter estimates (posterior means) from our data using another program, BEAST 2 v. 2.4.6 (63), which implements more substitution models, including a relaxed clock. We estimated parameters in nine independent sets of 50 genes from the 90% gene All partition set comparing either JC/HKY85 models (assuming a clock), or clock/UCLN (uncorrelated lognormal) models (assuming JC). Input files for BEAST, in XML format, were generated in BEAUti using the StarBeast2 package. Models were unlinked across genes, sites, trees, and clock rates. We estimated a single population size for each branch by setting the population model to “constant” population size. We assigned the same priors in BEAST as we had used in BPP, noting that population sizes in BEAST are in units of 2Neµ, whereas BPP uses units of 4Neµ. “Clock rates” and SDs for each gene tree were unlinked, and we assumed the default prior exponential distribution for the SD. Number of generations was set to 10–50 million as needed to obtain ESS values above 200; information from the first 10% of the MCMC chains was discarded.

Paired t tests indicated no significant differences between JC and HKY85 models for population size, although estimates at one internal node were nearly significant. Effect sizes of the models (the difference between the mean estimate across replicates in the two treatments) were less than 2.8% (Table S8). Divergence time estimates, which were scaled to the root age, were not significantly different between JC/HKY85 models. Population size estimates were more sensitive to clock/UCLN model differences, but the effect sizes were still small. For two of the three internal edges, population size estimates varied by as much as 10.6% between the clock and UCLN models. Divergence time estimates were less sensitive to the clock/UCLN model assumptions, with no significant t tests and effect sizes of less than 3.4% of the root to tip distance in the tree. Overall, these experiments suggest that these datasets are large enough to uncover significant but subtle differences in parameter estimates between some models. However, these effects are unlikely to be large enough to affect our conclusions.

Materials and Methods

Sampling, Genome Sequencing, and Annotation.

For details of taxa sampled, library construction, genome sequencing, assembly, and annotation, see SI Materials and Methods.

Phylogenomic Analyses.

Gene alignments and gene trees.

Sets of gene trees were inferred from gene alignments constructed from the genome assemblies with custom PERL scripts. CDS and intron regions were extracted from the saguaro genome based on its annotation. Each region was used as a query in blastn searches (72) against the other four cactus assemblies, with an e value cutoff of 10−10 for the intron searches. To enrich for orthologs, (i) a CDS region was kept for further processing if it returned exactly one hit for all four taxa and (ii) the gene was kept if and only if at least two CDS regions from the same gene passed test (i).

We then used predicted saguaro amino acid sequences within the retained CDS regions in pairwise tblastn runs against the nucleotide hits found in each of the four cacti. Thereby, we obtained subject sequences in the same frame as the saguaro query. We used Muscle (73) to align these protein regions and then tranalign (74) to align the underlying nucleotide sequences. Each gene thus consists of CDS regions that have sequence data present for all taxa. Given at least one such region, introns were then evaluated. An intron region was kept if it returned exactly four hits, one per taxon, and each hit covered at least 50% of the query saguaro sequence. Any alignment in which at least one taxon had more than 50% gaps or missing data was excluded from further consideration.

Three partitions of each gene nucleotide alignment were prepared: one having just third positions in codons (“Third”), one having third positions plus introns (“Neutral”); and one having all nucleotide positions (“All”). Bootstrap maximum likelihood majority rule trees for the All positions alignment were constructed with PAUP* v. 4.0a with an HKY85 model (75). “Gene confidence sets” were compiled based on the quality of their trees (e.g., the “90%” set is all genes for which the minimum bootstrap value is 90% for all clades). Sets were assembled at 50, 60, 70, 80, and 90% levels and rooted with Pereskia.

Species tree inference.

Species trees were constructed first by an alignment-based Bayesian method, BPP v.3.3a (35, 61), in which the input was the alignments from a gene confidence set and partition. Convergence of Markov chains was checked by running three chains from random starting trees for each of the five gene confidence sets (burnin = 4,000; number of generations = 40,000). This is aimed at avoiding multiple optima in the posterior (76) and may be more informative than examining acceptance rates or effective sample sizes (35).

The prior for the scaled root divergence time, τroot, was set to a gamma distribution, based on the mean and variance of pairwise Ks distances between saguaro and Pereskia (Table S7). The prior for scaled population size, θroot, was set to a gamma distribution with a mean estimated from the modern nucleotide diversity of saguaro, which was estimated, using the program PSMC (77) applied to the saguaro genome scaffolds ≥100 kb in length, to be π = 0.00146 (variance set to half the mean). The rate variation prior was set to a Dirichlet prior with parameter α = 2.

A second, tree-based pseudolikelihood method, MP-EST v. 1.5 (34), was used with an input of all rooted gene trees constructed from the All partition from each gene confidence set. In each, five replicate searches from random starting species trees were done. To check for multiple optima, an exhaustive enumeration of pseudolikelihood scores for all possible ingroup rooted species trees was done by supplying MP-EST with those 15 trees.

Gene tree discordance.

Discordance between gene trees and the inferred species tree was assayed on a clade basis and for the entire gene tree jointly using PAUP*. Discordant gene trees were binned into groups of identical topology by comparing them to all 14 possible rooted discordant binary trees for five taxa using a custom PERL script. Topological discordance in the All partition was visualized using DensiTree (78) by making the gene trees in the 90% gene confidence set ultrametric using a clock model in PAUP* and scaling root ages to 1.0 (Fig. 2A).

Introgression and network reconstruction.

We used InferNetwork_ML in PhyloNet (36) to reconstruct optimal phylogenetic networks using maximum likelihood, using as input gene trees from the All partition [invoked with “InferNetwork_ML (all) h -n 10 -di -o -po -x 20 -s starting_tree”, where h is the maximum allowed number of reticulation events]. Values of h of 0 (a tree) and 1 were each run from three different starting topologies. Each inferred reticulation node has two incident edges with inheritance probabilities γ and 1 − γ. The tree imbedded in the network, obtained by following the path along the larger of the two inheritance probabilities, is called the major tree.

To compare solutions, we used the AIC information criteria (37): AIC = 2k − 2 log L, where k is the number of parameters in the model and L is the maximum likelihood value of the model. To assess the statistical support for the optimal network, we used PhyloNet’s parametric bootstrap procedure with the same search parameters as used previously.

Demographic inference.

We used BPP to infer divergence times and effective population sizes conditional on the species tree found above. To ensure convergence of the Markov chains, we increased the number of generations to 400,000 (burnin = 40,000), leading to all parameter estimates having an acceptable effective sample size (ESS) >500 (79) and acceptance rates lying in the range 0.25–0.40. We also examined parameter traces in Tracer (63). The raw parameter output of BPP is in units of sequence divergence, D = Tμ, where T is the time in generations and μ is the per generation mutation rate per site, and scaled effective population size, θ = 4Neμ. Note that 2D/θ = T/2Ne is the edge length in “coalescent time units.”

Quantifying Hemiplasy.

For any site in an alignment, let mS and mG be the inferred number of state changes on the species and gene trees respectively. If the two trees are concordant, mS = mG, but if the trees are discordant, then the number of changes may differ. The simplest case of hemiplasy is mG = 1 but mS > 1: A homology on the gene tree is a homoplasy on the species tree (Fig. 1, Lower Left). A site may also exhibit homoplasy on the gene tree: mG > 1 (Fig. 1, Lower Right). We define a hemiplastic site as one in which mS > mG. The exceptional case of mG > 1 but mS = mG is “homoplasy but not hemiplasy.”

Ancestral states for each residue in each protein sequence alignment in the most robust 80% and 90% gene confidence sets were reconstructed using parsimony with PAUP*, and mS and mG were computed for all potentially parsimony informative sites. Sites homoplastic on the species tree for concordant gene trees, and sites homoplastic or hemiplastic for nonconcordant gene trees, were tallied by a custom PERL script.

Neutral Mutation Rate Estimates.

We estimated neutral mutation rate from BPP output as sequence divergence from root to tip divided by the crown group age of cacti, since the assumed model is ultrametric. We also used codeml (80) to infer pairwise synonymous divergences in the coding regions across the entire set of gene trees, using the crown age of Cactaceae estimated at 26.88 MYR (81).

Acknowledgments

We thank S. Kumar, T. Hernández-Hernández, B. Rannala, K. Steele, F. Tax, L. Venable, and D. Zwickl for discussion, and the Tumamoc Hill Reserve, Tucson, and the Desert Botanical Garden, Phoenix, for permission to collect material. Funding was provided by the University of Arizona–Universidad Nacional Autónoma de México Consortium for Drylands Research, the Tucson Cactus and Succulent Society, and Arizona State University’s College of Liberal Arts and Sciences and the School of Life Sciences. A.B. received sabbatical support at the University of Arizona from DGAPA-Universidad Nacional Autónoma de México and by Programa de Apoyo a Proyectos de Investigación e Innovación Tecnológica, Universidad Nacional Autónoma de México Grant IN213814. N.K.W. received support from NIH Grant R35GM119816.

Footnotes

  • ↵1To whom correspondence should be addressed. Email: sanderm{at}email.arizona.edu.
  • Author contributions: D.C., A.B., M.M.M., N.K.W., R.A.W., M.F.W., and M.J.S. designed research; D.C., A.B., E.B., J.L.M.C., M.M.M., M.F.W., and M.J.S. performed research; D.C., K.L.C., L.E.E., S.L., T.L.L., M.M.M., M.F.W., and M.J.S. analyzed data; and D.C., A.B., E.B., L.E.E., T.L., M.M.M., N.K.W., M.F.W., and M.J.S. wrote the paper.

  • The authors declare no conflict of interest.

  • This article is a PNAS Direct Submission.

  • Data deposition: The sequences reported in this paper have been deposited in GenBank under BioProject number PRJNA318822; individual accession numbers are listed in Tables S1, S3, and S5.

  • This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1706367114/-/DCSupplemental.

Published under the PNAS license.

References

  1. ↵
    1. Anderson EF
    (2001) The Cactus Family (Timber, Portland, OR), p 776.
  2. ↵
    1. Hunt D
    (2006) The New Cactus Lexicon (Remous, Milborne Port, UK).
  3. ↵
    1. Schumann K
    (1903) Gesamtbeschreibung der Kakteen (Monographia Cactacearum) (J. Neumann, Neudamm, Germany), 2nd Ed, p 832.
  4. ↵
    1. Backeberg C
    (1966) Das Kakteenlexicon (Gustav Fischer, Stuttgart).
  5. ↵
    1. Gibson AC,
    2. Nobel PS
    (1986) The Cactus Primer (Harvard Univ Press, Cambridge, MA), p 286.
  6. ↵
    1. Hernández-Hernández T, et al.
    (2011) Phylogenetic relationships and evolution of growth form in Cactaceae (Caryophyllales, Eudicotyledoneae). Am J Bot 98:44–61.
    OpenUrlAbstract/FREE Full Text
  7. ↵
    1. Sanderson M,
    2. Hufford L
    , eds (1996) Homoplasy: The Recurrence of Similarity in Evolution (Academic, New York).
  8. ↵
    1. Britton NL,
    2. Rose JN
    (1920) The Cactaceae (Dover, Mineola, NY), 2nd Ed.
  9. ↵
    1. Nyffeler R
    (2002) Phylogenetic relationships in the cactus family (Cactaceae) based on evidence from trnK/ matK and trnL-trnF sequences. Am J Bot 89:312–326.
    OpenUrlAbstract/FREE Full Text
  10. ↵
    1. Wake DB
    (1991) Homoplasy–The result of natural-selection, or evidence of design limitations. Am Nat 138:543–567.
    OpenUrlCrossRef
  11. ↵
    1. Cota JH,
    2. Wallace RS
    (1997) Chloroplast DNA evidence for divergence in Ferocactus and its relationships to North American columnar cacti (Cactaceae: Cactoideae). Syst Bot 22:529–542.
    OpenUrlCrossRef
  12. ↵
    1. Griffith MP,
    2. Porter JM
    (2009) Phylogeny of Opuntioideae (Cactaceae). Int J Plant Sci 170:107–116.
    OpenUrlCrossRef
  13. ↵
    1. Nyffeler R,
    2. Eggli U
    (2010) A farewell to dated ideas and concepts: Molecular phylogenetics and a revised suprageneric classification of the family Cactaceae. Schumannia 6:109–149.
    OpenUrl
  14. ↵
    1. Porter JM,
    2. Kinney M,
    3. Heil KD
    (2000) Relationships between Sclerocactus and Toumeya (Cactaceae) based in chloroplast trnL-F sequences. Haseltonia 7:8–23.
    OpenUrl
  15. ↵
    1. Bárcenas RT,
    2. Yesson C,
    3. Hawkins JA
    (2011) Molecular systematics of the Cactaceae. Cladistics 27:470–489.
    OpenUrlCrossRef
  16. ↵
    1. Goettsch B, et al.
    (2015) High proportion of cactus species threatened with extinction. Nat Plants 1:15142.
    OpenUrl
  17. ↵
    1. Ogburn RM,
    2. Edwards EJ
    (2009) Anatomical variation in Cactaceae and relatives: Trait lability and evolutionary innovation. Am J Bot 96:391–408.
    OpenUrlAbstract/FREE Full Text
  18. ↵
    1. Hahn MW,
    2. Nakhleh L
    (2016) Irrational exuberance for resolved species trees. Evolution 70:7–17.
    OpenUrlCrossRefPubMed
  19. ↵
    1. Avise JC,
    2. Robinson TJ
    (2008) Hemiplasy: A new term in the lexicon of phylogenetics. Syst Biol 57:503–507.
    OpenUrlCrossRefPubMed
  20. ↵
    1. Suh A,
    2. Smeds L,
    3. Ellegren H
    (2015) The dynamics of incomplete lineage sorting across the ancient adaptive radiation of neoavian birds. PLoS Biol 13:e1002224.
    OpenUrlCrossRefPubMed
  21. ↵
    1. Storz JF
    (2016) Causes of molecular convergence and parallelism in protein evolution. Nat Rev Genet 17:239–250.
    OpenUrlCrossRefPubMed
  22. ↵
    1. Zwickl DJ,
    2. Stein JC,
    3. Wing RA,
    4. Ware D,
    5. Sanderson MJ
    (2014) Disentangling methodological and biological sources of gene tree discordance on Oryza (Poaceae) chromosome 3. Syst Biol 63:645–659.
    OpenUrlCrossRefPubMed
  23. ↵
    1. Fontaine MC, et al.
    (2015) Mosquito genomics. Extensive introgression in a malaria vector species complex revealed by phylogenomics. Science 347:1258524.
    OpenUrlAbstract/FREE Full Text
  24. ↵
    1. Pease JB,
    2. Haak DC,
    3. Hahn MW,
    4. Moyle LC
    (2016) Phylogenomics reveals three sources of adaptive variation during a rapid radiation. PLoS Biol 14:e1002379.
    OpenUrlCrossRefPubMed
  25. ↵
    1. Yu Y,
    2. Dong J,
    3. Liu KJ,
    4. Nakhleh L
    (2014) Maximum likelihood inference of reticulate evolutionary histories. Proc Natl Acad Sci USA 111:16448–16453.
    OpenUrlAbstract/FREE Full Text
  26. ↵
    1. Rasmussen MD,
    2. Kellis M
    (2012) Unified modeling of gene duplication, loss, and coalescence using a locus tree. Genome Res 22:755–765.
    OpenUrlAbstract/FREE Full Text
  27. ↵
    1. Gibson AC,
    2. Spencer KC,
    3. Bajaj R,
    4. McLaughlin JL
    (1986) The ever-changing landscape of cactus systematics. Ann Mo Bot Gard 73:532–555.
    OpenUrl
  28. ↵
    1. Hartmann S,
    2. Nason JD,
    3. Bhattacharya D
    (2002) Phylogenetic origins of Lophocereus (Cactaceae) and the senita cactus-senita moth pollination mutualism. Am J Bot 89:1085–1092.
    OpenUrlAbstract/FREE Full Text
  29. ↵
    1. Arias S,
    2. Terrazas T,
    3. Cameron K
    (2003) Phylogenetic analysis of Pachycereus (Cactaceae, Pachycereeae) based on chloroplast and nuclear DNA sequences. Syst Bot 28:547–557.
    OpenUrl
  30. ↵
    1. Gibson AC,
    2. Horak KE
    (1978) Systematic anatomy and phylogeny of Mexican columnar cacti. Ann Mo Bot Gard 65:999–1057.
    OpenUrlCrossRef
  31. ↵
    1. Edwards EJ,
    2. Nyffeler R,
    3. Donoghue MJ
    (2005) Basal cactus phylogeny: Implications of Pereskia (Cactaceae) paraphyly for the transition to the cactus life form. Am J Bot 92:1177–1188.
    OpenUrlAbstract/FREE Full Text
  32. ↵
    1. Bennett MD,
    2. Leitch IJ
    (2012) Plant DNA C-values database (release 6.0, December 2012). Available at data.kew.org/cvalues/. Accessed December 1, 2016.
  33. ↵
    1. Xu B,
    2. Yang Z
    (2016) Challenges in species tree estimation under the multispecies coalescent model. Genetics 204:1353–1368.
    OpenUrlAbstract/FREE Full Text
  34. ↵
    1. Liu L,
    2. Yu L,
    3. Edwards SV
    (2010) A maximum pseudo-likelihood approach for estimating species trees under the coalescent model. BMC Evol Biol 10:302.
    OpenUrlCrossRefPubMed
  35. ↵
    1. Rannala B,
    2. Yang Z
    (2017) Efficient Bayesian species tree inference under the multispecies coalescent. Syst Biol 66:823–842.
    OpenUrl
  36. ↵
    1. Than C,
    2. Ruths D,
    3. Nakhleh L
    (2008) PhyloNet: A software package for analyzing and reconstructing reticulate evolutionary relationships. BMC Bioinformatics 9:322.
    OpenUrlCrossRefPubMed
  37. ↵
    1. Burnham KP,
    2. Anderson DR
    (2010) Model Selection and Multi-Model Inference (Springer, New York), 3rd Ed, p 496.
  38. ↵
    1. Moran R
    (1962) Pachycereus orcuttii—A puzzle solved. Cactus Succulent J (US) 34:88–94.
    OpenUrl
  39. ↵
    1. Murawski DA,
    2. Fleming TH,
    3. Ritland K,
    4. Hamrick JL
    (1994) Mating system of Pachycereus pringlei: An autotetraploid cactus. Heredity 72:86–94.
    OpenUrlCrossRef
  40. ↵
    1. Husband BC,
    2. Sabara HA
    (2004) Reproductive isolation between autotetraploids and their diploid progenitors in fireweed, Chamerion angustifolium (Onagraceae). New Phytol 161:703–713.
    OpenUrlCrossRef
  41. ↵
    1. Silva-Junior OB,
    2. Grattapaglia D
    (2015) Genome-wide patterns of recombination, linkage disequilibrium and nucleotide diversity from pooled resequencing and single nucleotide polymorphism genotyping unlock the evolutionary history of Eucalyptus grandis. New Phytol 208:830–845.
    OpenUrlCrossRefPubMed
  42. ↵
    1. Sollars ESA, et al.
    (2017) Genome sequence and genetic diversity of European ash trees. Nature 541:212–216.
    OpenUrl
  43. ↵
    1. Luo MC, et al.
    (2015) Synteny analysis in Rosids with a walnut physical map reveals slow genome evolution in long-lived woody perennials. BMC Genomics 16:707.
    OpenUrlCrossRef
  44. ↵
    1. Steenbergh W,
    2. Lowe C
    (1983) Ecology of the Saguaro: Growth and Demography, Part 3, National Park Service Scientific Monograph Series (Government Printing Office, Washington, DC).
  45. ↵
    1. Parker KC
    (1988) Growth-rates of Stenocereus thurberi and Lophocereus schottii in Southern Arizona. Bot Gaz 149:335–346.
    OpenUrl
  46. ↵
    1. Parker KC
    (1989) Height structure and reproductive characteristics of senita, Lophocereus schottii (Cactaceae), in Southern Arizona. Southwest Nat 34:392–401.
    OpenUrl
  47. ↵
    1. Dimmitt M
    (2016) Cactaceae (the cactus family). Available at www.desertmuseum.org/books/nhsd_cactus_.php. Accessed January 1, 2017.
  48. ↵
    1. Slavov GT, et al.
    (2012) Genome resequencing reveals multiscale geographic structure and extensive linkage disequilibrium in the forest tree Populus trichocarpa. New Phytol 196:713–725.
    OpenUrlCrossRefPubMed
  49. ↵
    1. Albert VA, et al., Amborella Genome Project
    (2013) The Amborella genome and the evolution of flowering plants. Science 342:1241089.
    OpenUrlAbstract/FREE Full Text
  50. ↵
    1. Olson MS, et al.
    (2010) Nucleotide diversity and linkage disequilibrium in balsam poplar (Populus balsamifera). New Phytol 186:526–536.
    OpenUrlCrossRefPubMed
  51. ↵
    1. Brown GR,
    2. Gill GP,
    3. Kuntz RJ,
    4. Langley CH,
    5. Neale DB
    (2004) Nucleotide diversity and linkage disequilibrium in loblolly pine. Proc Natl Acad Sci USA 101:15255–15260.
    OpenUrlAbstract/FREE Full Text
  52. ↵
    1. Hudson RR
    (1983) Testing the constant-rate neutral allele model with protein sequence data. Evolution 37:203–217.
    OpenUrlCrossRef
  53. ↵
    1. Zhou Y, et al.
    (2017) Importance of incomplete lineage sorting and introgression in the origin of shared genetic variation between two closely related pines with overlapping distributions. Heredity (Edinb) 118:211–220.
    OpenUrl
  54. ↵
    1. Scheibe R
    (1987) NADP+-malate dehydrogenase in C3-plants: Regulation and role of a light-activated enzyme. Physiol Plant 71:393–400.
    OpenUrlCrossRef
  55. ↵
    1. Cushman JC
    (1993) Molecular cloning and expression of chloroplast NADP-malate dehydrogenase during Crassulacean acid metabolism induction by salt stress. Photosynth Res 35:15–27.
    OpenUrlCrossRefPubMed
  56. ↵
    1. Mallona I,
    2. Egea-Cortines M,
    3. Weiss J
    (2011) Conserved and divergent rhythms of crassulacean acid metabolism-related and core clock gene expression in the cactus Opuntia ficus-indica. Plant Physiol 156:1978–1989.
    OpenUrlAbstract/FREE Full Text
  57. ↵
    1. Marchler-Bauer A, et al.
    (2015) CDD: NCBI’s conserved domain database. Nucleic Acids Res 43:D222–D226.
    OpenUrlCrossRefPubMed
  58. ↵
    1. Haudry A, et al.
    (2013) An atlas of over 90,000 conserved noncoding sequences provides insight into crucifer regulatory regions. Nat Genet 45:891–898.
    OpenUrlCrossRefPubMed
  59. ↵
    1. Zharkikh A
    (1994) Estimation of evolutionary distances between nucleotide sequences. J Mol Evol 39:315–329.
    OpenUrlCrossRefPubMed
  60. ↵
    1. Bos DH,
    2. Posada D
    (2005) Using models of nucleotide evolution to build phylogenetic trees. Dev Comp Immunol 29:211–227.
    OpenUrlCrossRefPubMed
  61. ↵
    1. Yang ZH
    (2015) The BPP program for species tree estimation and species delimitation. Curr Zool 61:854–865.
    OpenUrlCrossRef
  62. ↵
    1. Doerr D,
    2. Gronau I,
    3. Moran S,
    4. Yavneh I
    (2012) Stochastic errors vs. modeling errors in distance based phylogenetic reconstructions. Algorithms Mol Biol 7:22.
    OpenUrlPubMed
  63. ↵
    1. Bouckaert R, et al.
    (2014) BEAST 2: A software platform for Bayesian evolutionary analysis. PLOS Comput Biol 10:e1003537.
    OpenUrlCrossRefPubMed
  64. ↵
    1. Bustamante E,
    2. Búrquez A,
    3. Scheinvar E,
    4. Eguiarte LE
    (2016) Population genetic structure of a widespread bat-pollinated columnar cactus. PLoS One 11:e0152329.
    OpenUrl
  65. ↵
    1. Leaché AD,
    2. Harris RB,
    3. Rannala B,
    4. Yang Z
    (2014) The influence of gene flow on species tree estimation: A simulation study. Syst Biol 63:17–30.
    OpenUrlCrossRefPubMed
  66. ↵
    1. McAuliffe JR,
    2. Van Devender TR
    (1998) A 22,000-year record of vegetation change in the north-central Sonoran Desert. Palaeogeogr Palaeoclimatol Palaeoecol 141:253–275.
    OpenUrlCrossRef
  67. ↵
    1. Barley AJ,
    2. Brown JM,
    3. Thomson RC
    (2017) Impact of model violations on the inference of species boundaries under the multispecies coalescent. Syst Biol doi:10.1093/sysbio/syx073.
    OpenUrlCrossRef
  68. ↵
    1. Lang M, et al.
    (2012) Mutations in the neverland gene turned Drosophila pachea into an obligate specialist species. Science 337:1658–1661.
    OpenUrlAbstract/FREE Full Text
  69. ↵
    1. Huang R,
    2. O’Donnell AJ,
    3. Barboline JJ,
    4. Barkman TJ
    (2016) Convergent evolution of caffeine in plants by co-option of exapted ancestral enzymes. Proc Natl Acad Sci USA 113:10613–10618.
    OpenUrlAbstract/FREE Full Text
  70. ↵
    1. Brockington SF, et al.
    (2015) Lineage-specific gene radiations underlie the evolution of novel betalain pigmentation in Caryophyllales. New Phytol 207:1170–1180.
    OpenUrlCrossRefPubMed
  71. ↵
    1. Whitney KD,
    2. Randell RA,
    3. Rieseberg LH
    (2010) Adaptive introgression of abiotic tolerance traits in the sunflower Helianthus annuus. New Phytol 187:230–239.
    OpenUrlCrossRefPubMed
  72. ↵
    1. Altschul SF, et al.
    (1997) Gapped BLAST and PSI-BLAST: A new generation of protein database search programs. Nucleic Acids Res 25:3389–3402.
    OpenUrlCrossRefPubMed
  73. ↵
    1. Edgar RC
    (2004) MUSCLE: A multiple sequence alignment method with reduced time and space complexity. BMC Bioinformatics 5:113.
    OpenUrlCrossRefPubMed
  74. ↵
    1. Rice P,
    2. Longden I,
    3. Bleasby A
    (2000) EMBOSS: The European molecular biology open software suite. Trends Genet 16:276–277.
    OpenUrlCrossRefPubMed
  75. ↵
    1. Swofford DL
    (2002) PAUP*. Phylogenetic Analysis Using Parsimony (*and Other Methods) (Sinauer, Sunderland, MA), Version 4.0.
  76. ↵
    1. Whidden C,
    2. Matsen FA 4th
    (2015) Quantifying MCMC exploration of phylogenetic tree space. Syst Biol 64:472–491.
    OpenUrlCrossRefPubMed
  77. ↵
    1. Li H,
    2. Durbin R
    (2011) Inference of human population history from individual whole-genome sequences. Nature 475:493–496.
    OpenUrlCrossRefPubMed
  78. ↵
    1. Bouckaert RR
    (2010) DensiTree: Making sense of sets of phylogenetic trees. Bioinformatics 26:1372–1373.
    OpenUrlCrossRefPubMed
  79. ↵
    1. Drummond A,
    2. Bouckaert RR
    (2015) Bayesian Evolutionary Analysis with BEAST (Cambridge Univ Press, Cambridge, UK), p 260.
  80. ↵
    1. Yang Z
    (2007) PAML 4: Phylogenetic analysis by maximum likelihood. Mol Biol Evol 24:1586–1591.
    OpenUrlCrossRefPubMed
  81. ↵
    1. Hernández-Hernández T,
    2. Brown JW,
    3. Schlumpberger BO,
    4. Eguiarte LE,
    5. Magallón S
    (2014) Beyond aridification: Multiple explanations for the elevated diversification of cacti in the New World Succulent Biome. New Phytol 202:1382–1397.
    OpenUrlCrossRefPubMed
  82. ↵
    1. Sanderson MJ, et al.
    (2015) Exceptional reduction of the plastid genome of saguaro cactus (Carnegiea gigantea): Loss of the ndh gene suite and inverted repeat. Am J Bot 102:1115–1127.
    OpenUrlAbstract/FREE Full Text
  83. ↵
    1. Joshi N,
    2. Fass J
    (2011) Sickle: A Sliding-Window, Adaptive, Quality-Based Trimming Tool for FastQ Files, Version 1.33. Available at https://github.com/najoshi/sickle. Accessed December 1, 2016.
  84. ↵
    1. Luo R, et al.
    (2012) SOAPdenovo2: An empirically improved memory-efficient short-read de novo assembler. Gigascience 1:18.
    OpenUrlCrossRefPubMed
  85. ↵
    1. Heo Y,
    2. Wu XL,
    3. Chen D,
    4. Ma J,
    5. Hwu WM
    (2014) BLESS: Bloom filter-based error correction solution for high-throughput sequencing reads. Bioinformatics 30:1354–1362.
    OpenUrlCrossRefPubMed
  86. ↵
    1. Schmieder R,
    2. Edwards R
    (2011) Quality control and preprocessing of metagenomic datasets. Bioinformatics 27:863–864.
    OpenUrlCrossRefPubMed
  87. ↵
    1. Chapman JA, et al.
    (2011) Meraculous: De novo genome assembly with short paired-end reads. PLoS One 6:e23501.
    OpenUrlCrossRefPubMed
  88. ↵
    1. Kajitani R, et al.
    (2014) Efficient de novo assembly of highly heterozygous genomes from whole-genome shotgun short reads. Genome Res 24:1384–1395.
    OpenUrlAbstract/FREE Full Text
  89. ↵
    1. Sahlin K,
    2. Vezzi F,
    3. Nystedt B,
    4. Lundeberg J,
    5. Arvestad L
    (2014) BESST–Efficient scaffolding of large fragmented assemblies. BMC Bioinformatics 15:281.
    OpenUrlCrossRefPubMed
  90. ↵
    1. Hunt M, et al.
    (2013) REAPR: A universal tool for genome assembly evaluation. Genome Biol 14:R47.
    OpenUrlCrossRefPubMed
  91. ↵
    1. Parra G,
    2. Bradnam K,
    3. Ning Z,
    4. Keane T,
    5. Korf I
    (2009) Assessing the gene space in draft genomes. Nucleic Acids Res 37:289–297.
    OpenUrlCrossRefPubMed
  92. ↵
    1. Simão FA,
    2. Waterhouse RM,
    3. Ioannidis P,
    4. Kriventseva EV,
    5. Zdobnov EM
    (2015) BUSCO: Assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics 31:3210–3212.
    OpenUrlCrossRefPubMed
  93. ↵
    1. Novák P,
    2. Neumann P,
    3. Pech J,
    4. Steinhaisl J,
    5. Macas J
    (2013) RepeatExplorer: A Galaxy-based web server for genome-wide characterization of eukaryotic repetitive elements from next-generation sequence reads. Bioinformatics 29:792–793.
    OpenUrlCrossRefPubMed
  94. ↵
    1. Copetti D, et al.
    (2015) RiTE database: A resource database for genus-wide rice genomics and evolutionary biology. BMC Genomics 16:538.
    OpenUrlCrossRef
  95. ↵
    1. Flutre T,
    2. Duprat E,
    3. Feuillet C,
    4. Quesneville H
    (2011) Considering transposable element diversification in de novo annotation approaches. PLoS One 6:e16526.
    OpenUrlCrossRefPubMed
  96. ↵
    1. Nawrocki EP,
    2. Eddy SR
    (2013) Infernal 1.1: 100-fold faster RNA homology searches. Bioinformatics 29:2933–2935.
    OpenUrlCrossRefPubMed
  97. ↵
    1. Schattner P,
    2. Brooks AN,
    3. Lowe TM
    (2005) The tRNAscan-SE, snoscan and snoGPS web servers for the detection of tRNAs and snoRNAs. Nucleic Acids Res 33:W686–W689.
    OpenUrlCrossRefPubMed
  98. ↵
    1. Campbell MS, et al.
    (2014) MAKER-P: A tool kit for the rapid creation, management, and quality control of plant genome annotations. Plant Physiol 164:513–524.
    OpenUrlAbstract/FREE Full Text
  99. ↵
    1. Grabherr MG, et al.
    (2011) Full-length transcriptome assembly from RNA-Seq data without a reference genome. Nat Biotechnol 29:644–652.
    OpenUrlCrossRefPubMed
  100. ↵
    1. Edgar RC
    (2010) Search and clustering orders of magnitude faster than BLAST. Bioinformatics 26:2460–2461.
    OpenUrlCrossRefPubMed
  101. ↵
    1. Berardini TZ, et al.
    (2015) The Arabidopsis information resource: Making and mining the “gold standard” annotated reference plant genome. Genesis 53:474–485.
    OpenUrlCrossRefPubMed
  102. ↵
    1. Apweiler R, et al.
    (2014) Activities at the Universal Protein Resource (UniProt). Nucleic Acids Res 42:D191–D198.
    OpenUrlCrossRefPubMed
  103. ↵
    1. Korf I
    (2004) Gene finding in novel genomes. BMC Bioinformatics 5:59.
    OpenUrlCrossRefPubMed
  104. ↵
    1. Eilbeck K,
    2. Moore B,
    3. Holt C,
    4. Yandell M
    (2009) Quantitative measures for the management and comparison of annotated genomes. BMC Bioinformatics 10:67.
    OpenUrlCrossRefPubMed
  105. ↵
    1. Stanke M,
    2. Waack S
    (2003) Gene prediction with a hidden Markov model and a new intron submodel. Bioinformatics 19:ii215–ii225.
    OpenUrlCrossRefPubMed
  106. ↵
    1. Finn RD,
    2. Clements J,
    3. Eddy SR
    (2011) HMMER web server: Interactive sequence similarity searching. Nucleic Acids Res 39:W29–W37.
    OpenUrlCrossRefPubMed
  107. ↵
    1. Finn RD, et al.
    (2014) Pfam: The protein families database. Nucleic Acids Res 42:D222–D230.
    OpenUrlCrossRefPubMed
  108. ↵
    1. Patro R,
    2. Duggal G,
    3. Love MI,
    4. Irizarry RA,
    5. Kingsford C
    (2017) Salmon provides fast and bias-aware quantification of transcript expression. Nat Methods 14:417–419.
    OpenUrlCrossRefPubMed
  109. ↵
    1. Felsenstein J
    (1981) Evolutionary trees from DNA sequences: A maximum likelihood approach. J Mol Evol 17:368–376.
    OpenUrlCrossRefPubMed
PreviousNext
Back to top
Article Alerts
Email Article

Thank you for your interest in spreading the word on PNAS.

NOTE: We only request your email address so that the person you are recommending the page to knows that you wanted them to see it, and that it is not junk mail. We do not capture any email address.

Enter multiple addresses on separate lines or separate them with commas.
Extensive gene tree discordance and hemiplasy shaped the genomes of North American columnar cacti
(Your Name) has sent you a message from PNAS
(Your Name) thought you would like to see the PNAS web site.
CAPTCHA
This question is for testing whether or not you are a human visitor and to prevent automated spam submissions.
Citation Tools
Genome evolution in columnar cacti
Dario Copetti, Alberto Búrquez, Enriquena Bustamante, Joseph L. M. Charboneau, Kevin L. Childs, Luis E. Eguiarte, Seunghee Lee, Tiffany L. Liu, Michelle M. McMahon, Noah K. Whiteman, Rod A. Wing, Martin F. Wojciechowski, Michael J. Sanderson
Proceedings of the National Academy of Sciences Nov 2017, 114 (45) 12003-12008; DOI: 10.1073/pnas.1706367114

Citation Manager Formats

  • BibTeX
  • Bookends
  • EasyBib
  • EndNote (tagged)
  • EndNote 8 (xml)
  • Medlars
  • Mendeley
  • Papers
  • RefWorks Tagged
  • Ref Manager
  • RIS
  • Zotero
Request Permissions
Share
Genome evolution in columnar cacti
Dario Copetti, Alberto Búrquez, Enriquena Bustamante, Joseph L. M. Charboneau, Kevin L. Childs, Luis E. Eguiarte, Seunghee Lee, Tiffany L. Liu, Michelle M. McMahon, Noah K. Whiteman, Rod A. Wing, Martin F. Wojciechowski, Michael J. Sanderson
Proceedings of the National Academy of Sciences Nov 2017, 114 (45) 12003-12008; DOI: 10.1073/pnas.1706367114
del.icio.us logo Digg logo Reddit logo Twitter logo CiteULike logo Facebook logo Google logo Mendeley logo
  • Tweet Widget
  • Facebook Like
  • Mendeley logo Mendeley

Article Classifications

  • Biological Sciences
  • Evolution
Proceedings of the National Academy of Sciences: 114 (45)
Table of Contents

Submit

Sign up for Article Alerts

Jump to section

  • Article
    • Abstract
    • Results and Discussion
    • SI Materials and Methods
    • Materials and Methods
    • Acknowledgments
    • Footnotes
    • References
  • Figures & SI
  • Info & Metrics
  • PDF

You May Also be Interested in

Smoke emanates from Japan’s Fukushima nuclear power plant a few days after tsunami damage
Core Concept: Muography offers a new way to see inside a multitude of objects
Muons penetrate much further than X-rays, they do essentially zero damage, and they are provided for free by the cosmos.
Image credit: Science Source/Digital Globe.
Water from a faucet fills a glass.
News Feature: How “forever chemicals” might impair the immune system
Researchers are exploring whether these ubiquitous fluorinated molecules might worsen infections or hamper vaccine effectiveness.
Image credit: Shutterstock/Dmitry Naumov.
Venus flytrap captures a fly.
Journal Club: Venus flytrap mechanism could shed light on how plants sense touch
One protein seems to play a key role in touch sensitivity for flytraps and other meat-eating plants.
Image credit: Shutterstock/Kuttelvaserova Stuchelova.
Illustration of groups of people chatting
Exploring the length of human conversations
Adam Mastroianni and Daniel Gilbert explore why conversations almost never end when people want them to.
Listen
Past PodcastsSubscribe
Panda bear hanging in a tree
How horse manure helps giant pandas tolerate cold
A study finds that giant pandas roll in horse manure to increase their cold tolerance.
Image credit: Fuwen Wei.

Similar Articles

Site Logo
Powered by HighWire
  • Submit Manuscript
  • Twitter
  • Facebook
  • RSS Feeds
  • Email Alerts

Articles

  • Current Issue
  • Special Feature Articles – Most Recent
  • List of Issues

PNAS Portals

  • Anthropology
  • Chemistry
  • Classics
  • Front Matter
  • Physics
  • Sustainability Science
  • Teaching Resources

Information

  • Authors
  • Editorial Board
  • Reviewers
  • Subscribers
  • Librarians
  • Press
  • Cozzarelli Prize
  • Site Map
  • PNAS Updates
  • FAQs
  • Accessibility Statement
  • Rights & Permissions
  • About
  • Contact

Feedback    Privacy/Legal

Copyright © 2021 National Academy of Sciences. Online ISSN 1091-6490