Likelihood-mapping: A simple method to visualize phylogenetic content of a sequence alignment
Abstract
We introduce a graphical method, likelihood-mapping, to visualize the phylogenetic content of a set of aligned sequences. The method is based on an analysis of the maximum likelihoods for the three fully resolved tree topologies that can be computed for four sequences. The three likelihoods are represented as one point inside an equilateral triangle. The triangle is partitioned into different regions. One region represents star-like evolution, three regions represent a well-resolved phylogeny, and three regions reflect the situation where it is difficult to distinguish between two of the three trees. The location of the likelihoods in the triangle defines the mode of sequence evolution. If n sequences are analyzed, then the likelihoods for each subset of four sequences are mapped onto the triangle. The resulting distribution of points shows whether the data are suitable for a phylogenetic reconstruction or not.
The sequence-based study of phylogenetic relationships among different organisms has become routine. Parallel to the increasing amount of sequence information available, a variety of methods have been suggested to reconstruct a phylogenetic tree (1) or a phylogenetic network (2–4). So far, few approaches have been proposed to elucidate the phylogenetic content in a set of aligned sequences a priori (5, 6). The so-called statistical geometry in sequence space analyzes the distribution of numerical invariants for all possible subsets of four sequences. The resulting distributions make it possible to distinguish between tree-, star-, and net-like geometry of the data. Moreover, based on the averages of the invariants, the method allows one to draw a graph that illustrates the mode of evolution. While the description of this diagram is straightforward if sequences consist only of purines and pyrimidines, it becomes difficult if more complex alphabets (nucleic acids, amino acids) are used (7). Statistical geometry in sequence space has been successfully applied to study the evolution of tRNAs (8) and HIV (9).
In this paper, we present an alternative approach, likelihood-mapping, to display phylogenetic information contained in a sequence alignment. The method is applicable to nucleic acid sequences, amino acid sequences, or any other alphabet, provided a model of sequence evolution (1, 10, 11) is available that can be implemented in a maximum-likelihood tree reconstruction program (12, 13). Our approach allows one to visualize the tree-likeness of all quartets in a single graph and therefore permits a quick interpretation of the phylogenetic content. We will exemplify the method by applying it to simulated sequences that evolved on a star tree or on a completely resolved tree. The analysis of two biological data sets (14, 15) will conclude the paper.
METHOD
Four Sequences.
Let us consider a set of four sequences, a so-called quartet. For this quartet the maximum likelihoods (not log-likelihoods) belonging to the three possible fully resolved tree topologies (Fig. 1) are computed, using any model of sequence evolution (1, 10, 11). Let L_{i} be the maximum likelihood of tree T_{i} where i = 1, 2, 3. Assuming equal prior probabilities for the three trees, we can compute, according to Bayes’ theorem, the posterior probabilities
p_{i} = L_{i} / (L_{1} + L_{2} + L_{3})   [1]
for each tree. Note that the p_{i} are true probabilities satisfying p_{1} + p_{2} + p_{3} = 1 and 0 ≤ p_{i} ≤ 1, in contrast to the maximum likelihoods L_{i}, which are only conditional probabilities with L_{1} + L_{2} + L_{3} ≠ 1. The probabilities (p_{1}, p_{2}, p_{3}) can be viewed as the barycentric coordinates of the point P belonging to the two-dimensional simplex
S_{2} = {P = p_{1}e_{1} + p_{2}e_{2} + p_{3}e_{3} : p_{1} + p_{2} + p_{3} = 1, p_{i} ≥ 0}   [2]
where the e_{i} are real-valued and independent vectors. They point to the three corners of the simplex. As a special case S_{2} can be illustrated as an equilateral triangle. This construction allows an easy geometric interpretation of the p_{i} values. For a given point P ∈ S_{2} the p_{i} are simply the lengths of the perpendiculars from the point P to the three sides of the triangle (Fig. 2), provided the triangle is drawn with unit height.
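The mapping from three likelihoods to a point in the triangle can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the implementation used in the puzzle program: the function names, the corner placement, and the log-likelihood variant (added because real likelihoods are usually too small to represent directly) are all choices made here. The weights follow p_{i} = L_{i} / (L_{1} + L_{2} + L_{3}), and the triangle is drawn with unit height so that the distance from P to each side equals the corresponding p_{i}.

```python
import math


def posterior_probabilities(likelihoods):
    """Equal-prior posterior weights p_i = L_i / (L_1 + L_2 + L_3)."""
    total = sum(likelihoods)
    return tuple(l / total for l in likelihoods)


def posteriors_from_log_likelihoods(log_likelihoods):
    """Same weights computed from log-likelihoods.

    Subtracting the maximum before exponentiating avoids underflow;
    the common factor cancels in the ratio.
    """
    m = max(log_likelihoods)
    weights = [math.exp(x - m) for x in log_likelihoods]
    total = sum(weights)
    return tuple(w / total for w in weights)


# Hypothetical corner placement: equilateral triangle of height 1,
# so its side length is 2 / sqrt(3). Corner i corresponds to tree T_i.
SIDE = 2.0 / math.sqrt(3.0)
CORNERS = ((0.0, 0.0), (SIDE, 0.0), (SIDE / 2.0, 1.0))


def to_triangle(p):
    """Convert barycentric coordinates (p1, p2, p3) to planar (x, y)."""
    x = sum(pi * cx for pi, (cx, _) in zip(p, CORNERS))
    y = sum(pi * cy for pi, (_, cy) in zip(p, CORNERS))
    return x, y
```

With this placement the y coordinate of the plotted point is exactly p_{3}, the perpendicular distance to the side opposite the third corner.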
If P is close to one corner of the triangle, the probabilities (p_{1}, p_{2}, p_{3}) clearly favor one tree over the other two. Thus, every corner of the triangle corresponds to one of the three quartet topologies T_{1}, T_{2}, or T_{3}. In a typical maximum-likelihood analysis one chooses the tree T_{i} with
p_{i} = max{p_{1}, p_{2}, p_{3}}.   [3]
It is easy to compute the corresponding basins of attraction for each tree topology (Fig. 3A). The location of a point P in the simplex gives an immediate impression of which tree is preferred.
Unfortunately, this picture is too optimistic. For real data it is not always possible to resolve the phylogenetic relationships of four sequences. Either the resolution is limited because the sequences are short (“noise”), or the true evolutionary tree was a star phylogeny. To account for this case, we introduce a region in the triangle S_{2} representing the star phylogeny. The center c of the simplex is the point where all probabilities take on the value p_{i} = 1/3, that is, the three trees are equally likely. Thus, if P is near the center, the phylogenetic relationship cannot be resolved and is better displayed by a star phylogeny. On the other hand, it might also be possible to exclude one of the three trees but not to choose between the two remaining alternatives. This is the case if, for example, T_{1} and T_{2} show probabilities p_{1} = p_{2} = ½ and p_{3} = 0. Near point x_{12} (see Fig. 3A) the phylogenetic relationship is best displayed by a net-like geometry that excludes tree T_{3}. Similarly, near points x_{13} and x_{23} it is impossible to unambiguously favor one tree. Based on these seven attractors in the triangle (marked with dots in Fig. 3B) the corresponding basins of attraction are easily computed. Each point in one of the seven regions has the smallest Euclidean distance to its own attractor. By A_{∗} we denote the region where the star tree is the optimal tree. Its area equals the sum of the areas of A_{1}, A_{2}, and A_{3}, the regions where one tree is clearly better than the remaining ones. The regions A_{ij} represent the situation where we cannot distinguish trees T_{i} and T_{j}. The area of A_{ij} equals the sum of the areas of A_{i} and A_{j}.
There is yet another way to describe the basins of attraction. If one considers the three-dimensional simplex S_{3} where the fourth corner represents the star phylogeny, the basins of attraction can be viewed as projections of their corresponding volumes of the tetrahedron S_{3} onto the two-dimensional plane.
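The nearest-attractor rule can be illustrated with a short Python sketch. It assumes the seven attractors are the three corners, the three edge midpoints, and the center, expressed in barycentric coordinates; the region labels and function names are chosen here for illustration. Distances are taken directly between (p_{1}, p_{2}, p_{3}) vectors, which is equivalent to measuring in the drawn triangle because the two differ only by a uniform scaling. The Monte Carlo helper is an added check of the stated area relations, which imply fractions of 1/12 for each A_{i}, 1/6 for each A_{ij}, and 1/4 for A_{∗}.

```python
import random

# Attractors in barycentric coordinates (labels are illustrative).
ATTRACTORS = {
    "A1": (1.0, 0.0, 0.0),
    "A2": (0.0, 1.0, 0.0),
    "A3": (0.0, 0.0, 1.0),
    "A12": (0.5, 0.5, 0.0),
    "A13": (0.5, 0.0, 0.5),
    "A23": (0.0, 0.5, 0.5),
    "A*": (1.0 / 3.0, 1.0 / 3.0, 1.0 / 3.0),
}


def classify(p):
    """Return the label of the attractor closest to p (Euclidean distance)."""
    def sq_dist(a):
        return sum((pi - ai) ** 2 for pi, ai in zip(p, a))
    return min(ATTRACTORS, key=lambda name: sq_dist(ATTRACTORS[name]))


def estimate_area_fractions(n_points=100_000, seed=0):
    """Estimate each region's share of the triangle by uniform sampling."""
    rng = random.Random(seed)
    counts = dict.fromkeys(ATTRACTORS, 0)
    for _ in range(n_points):
        # The gaps between two sorted uniform numbers on [0, 1] give a
        # point distributed uniformly over the simplex.
        u, v = sorted((rng.random(), rng.random()))
        counts[classify((u, v - u, 1.0 - v))] += 1
    return {name: c / n_points for name, c in counts.items()}
```

For example, `classify((0.9, 0.05, 0.05))` falls in the basin of the first corner, while a point such as `(0.45, 0.45, 0.10)` is assigned to the region between T_{1} and T_{2}.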
The General Case.
For a set of n aligned sequences there are exactly \binom{n}{4} different possible quartets of sequences. To get an overall impression of the phylogenetic signal present in the data we compute the probability vectors P for the quartets and draw the corresponding points in the simplex. If only a few sequences are analyzed, the P vectors of all \binom{n}{4} quartets are considered; otherwise a random sample of, e.g., 1,000 quartets is sufficient to obtain a comprehensive picture of the phylogenetic quality of the data set. The resulting distribution of points in the triangle S_{2} forms a distinct pattern allowing us to predict a priori whether an n-taxon tree will show a good resolution or not. If most of the points P are found, e.g., in regions A_{12}, A_{13}, A_{23}, or in the star-tree region A_{∗}, it is clear that the overall tree will be highly multifurcating. That is, evolution was either star-like or not tree-like at all. However, the opposite conclusion is not necessarily true: Even if all quartets are completely resolved, that is, almost all P vectors are in A_{1}, A_{2}, and A_{3}, it is possible that the overall n-taxon tree is not completely resolved (13, 16).
Four-Cluster Likelihood-Mapping.
Instead of looking at all quartets, the analysis of tree-likeness for four disjoint groups of sequences (clusters) is also possible. Let C_{1}, C_{2}, C_{3}, and C_{4} be a set of four clusters with c_{1}, c_{2}, c_{3}, and c_{4} sequences. Then, we compute the probability vectors P for the c_{1}·c_{2}·c_{3}·c_{4} possible quartets and plot the corresponding points on the triangle S_{2}. While the p_{i} values are randomly assigned to the trees T_{1}, T_{2}, and T_{3} when all quartets are studied, the assignment of p_{i} to tree T_{i} is now fixed. Each tree represents one of the three possible phylogenetic relationships among the clusters. As an illustration, think of the S_{i} in Fig. 1 as a representative of cluster C_{i}. The distribution of the c_{1}·c_{2}·c_{3}·c_{4} probability vectors over the basins of attraction not only allows one to identify the correct phylogenetic relationship of the four clusters but also shows the support for this and alternative groupings. This type of likelihood-mapping analysis is a helpful tool to illustrate how well supported an internal branch of a given tree topology is.
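The choice between enumerating all quartets and drawing a random sample might be sketched as follows. The function names and the default threshold of 1,000 are illustrative, and sampling without replacement is an assumption made here; the text only says that a random sample of about 1,000 quartets is sufficient.

```python
import itertools
import math
import random


def number_of_quartets(n):
    """Number of distinct four-sequence subsets of n sequences."""
    return math.comb(n, 4)


def choose_quartets(taxa, max_quartets=1000, seed=None):
    """Return all quartets if there are few, otherwise a random subset.

    Assumption: sampled quartets are distinct (no replacement).
    """
    taxa = list(taxa)
    if math.comb(len(taxa), 4) <= max_quartets:
        return list(itertools.combinations(taxa, 4))
    rng = random.Random(seed)
    chosen = set()
    while len(chosen) < max_quartets:
        # Sorting the four indices gives each quartet one canonical form,
        # so the set rejects duplicates regardless of draw order.
        chosen.add(tuple(sorted(rng.sample(range(len(taxa)), 4))))
    return [tuple(taxa[i] for i in idx) for idx in sorted(chosen)]
```

For 16 sequences, `number_of_quartets(16)` gives 1,820, so a sample of 1,000 already covers more than half of all quartets.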
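Enumerating the quartets for a four-cluster analysis amounts to taking one sequence from each cluster. The sketch below uses placeholder sequence names and toy cluster sizes that are not taken from the data sets in this paper; the function name is likewise illustrative.

```python
import itertools


def four_cluster_quartets(c1, c2, c3, c4):
    """All quartets with exactly one sequence from each of four clusters.

    The position within each tuple identifies the cluster, so the
    assignment of p_i to tree T_i stays fixed across quartets.
    """
    return list(itertools.product(c1, c2, c3, c4))


# Toy clusters with placeholder names (not real data).
cluster_1 = ["a1", "a2", "a3"]
cluster_2 = ["b1", "b2"]
cluster_3 = ["c1", "c2"]
cluster_4 = ["d1"]

quartets = four_cluster_quartets(cluster_1, cluster_2, cluster_3, cluster_4)
```

Here `len(quartets)` equals 3·2·2·1 = 12, the product of the cluster sizes.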
RESULTS
Simulation Studies.
Fig. 4 displays the result of a typical likelihood-mapping analysis. A simulated set of 16 DNA sequences was used to show the distribution of probability vectors P as a function of sequence length and evolutionary history.
If evolution followed a star topology, then the probability vectors are concentrated in the center of the simplex with rays emanating to the corners of the triangle. This picture does not change with increasing sequence length. However, the proportion of quartets found in area A_{∗} increases (Table 1). If sequence evolution followed a completely resolved tree, then the proportion of points P located inside A_{1} + A_{2} + A_{3} increases with longer sequences, an indication that noise due to sampling artifacts is diminished. Correspondingly, the number of quartets in the remaining regions decreases. For sequences of length 500 bp the non-tree-like regions of the triangle are empty (Table 1). Thus, Fig. 4 illustrates that likelihood-mapping enables an easy distinction between star-like and tree-like evolution. The influence of sequence length (“noise”) on the tree-likeness of the data is easily recognized.
Data Analysis.
We illustrate the power of likelihood-mapping using two data sets published recently (14, 15). The first set (14) comprises eight partial cytochrome-b sequences (135 bp) and nine putative dinosaur sequences (17). The second alignment (1,850 bp) consists of ribosomal DNA from major arthropod classes (three myriapods, two chelicerates, two crustaceans, three hexapods) and six other sequences (human, Xenopus, Tubifex, Caenorhabditis, mouse, and rat). Likelihood-mapping suggests (Fig. 5) that the Zischler et al. (14) data show a fair amount of star-likeness, with 17.5% of all quartet points in region A_{∗} in contrast to only 0.2% for the ribosomal DNA. This result is corroborated by the bootstrap analysis as shown in refs. 14 and 15. Because of the short sequence length, the percentage of quartets mapped into regions A_{12}, A_{13}, and A_{23} is very high for the sequences from ref. 14 (10.1%) compared with the rDNA sequences (1.6%). However, the cytochrome-b data still contain a reasonable amount of tree-likeness, as 72.4% of all quartets are placed in the areas A_{1}, A_{2}, and A_{3}. The tree-likeness of the ribosomal DNA is extremely high (A_{1} + A_{2} + A_{3} = 98.3%). The a posteriori analysis based on bootstrap values (15) shows that all groupings in the tree receive high support.
Four-Cluster Likelihood-Mapping.
A further application of likelihood-mapping allows testing of an internal edge of a tree as given by any tree reconstruction method. As an example we consider the sister group status of myriapods and chelicerates as suggested by Friedrich and Tautz (15). Fig. 6 shows that 90.4% of all quartets between the four corresponding clusters support the branching pattern that groups chelicerates and myriapods versus crustaceans, hexapods, and the remaining sequences. We find only very low support (6.9%) for the topology that pairs myriapods with crustaceans plus hexapods rather than with chelicerates or with the rest. Based on likelihood-mapping we cannot reject the hypothesis of monophyly of myriapods and chelicerates. The outcome of statistical tests as suggested in ref. 18 remains to be seen, however, and is outside the scope of this paper.
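Percentages such as those quoted above are simple tallies over the region assigned to each quartet. A minimal sketch, assuming each quartet has already been given one of seven region labels (the label strings and the function name are hypothetical, and the example input is toy data rather than either data set):

```python
from collections import Counter

RESOLVED = ("A1", "A2", "A3")            # one tree clearly preferred
PARTLY_RESOLVED = ("A12", "A13", "A23")  # two trees indistinguishable
STAR = "A*"                              # star-like, unresolved


def region_summary(labels):
    """Percentage of quartets in the resolved, partly resolved, and star regions."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no quartets to summarize")
    resolved = sum(counts[r] for r in RESOLVED)
    partly = sum(counts[r] for r in PARTLY_RESOLVED)
    return {
        "resolved": 100.0 * resolved / total,
        "partly_resolved": 100.0 * partly / total,
        "star": 100.0 * counts[STAR] / total,
    }


# Toy input: eight quartets with made-up region assignments.
toy_labels = ["A1", "A1", "A2", "A2", "A3", "A3", "A12", "A*"]
summary = region_summary(toy_labels)
```

The three values returned for a data set sum to 100% as long as every label is one of the seven regions.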
DISCUSSION
The evaluation of the phylogenetic content in a data set is of prime importance if one wants to avoid false conclusions about evolutionary relationships among organisms. Methods abound that evaluate the reliability of a reconstructed tree a posteriori (1). Likelihood-mapping† can be viewed as a complementary approach to existing methods of a priori or a posteriori evaluation of tree-likeness. Our method may be helpful when analyzing controversial phylogenies. Similar to statistical geometry in sequence space (5–7), likelihood-mapping is based on the analysis of quartets, the basic ingredients for reconstructing trees (16). Moreover, the description of seven basins of attraction (Fig. 3) that can be characterized as fully resolved (A_{1}, A_{2}, and A_{3}), star-like (A_{∗}), or intermediate between two trees (A_{12}, A_{13}, and A_{23}) is also of great importance in the quartet-puzzling tree search algorithm (13, 19). Using a variant of likelihood-mapping it is also possible to detect recombination (A.v.H., unpublished data).
Here, we have provided a simple, but versatile, approach to visualize the phylogenetic content of a data set. We have shown that the method has reasonable predictive power. While we have presented only a visual tool to analyze the phylogenetic signal of sequences, it is certainly necessary to develop solid statistical tests that provide evidence as to the significance of clusters (18) or of a deviation from tree-likeness. For example, the assumption of equal prior probability for the trees may be debatable. It remains to be seen how approaches like Jeffreys’ prior (20) or the inclusion of the variance of likelihood estimates (21) will influence the analysis.
Finally, one should keep in mind that the interpretation of the result of a likelihood-mapping analysis strongly depends on sequence length. The alignment of human mitochondrial control-region data (22) comprises 1,137 positions, and 82.5% of the quartets belong to the regions that represent fully resolved trees. Thus, the result suggests that the data are very well suited to reconstruct a well-resolved tree. However, we observe 8.3% of all quartets in the star-like region A_{∗} of the triangle. This value is too high for a completely resolved phylogeny (see Table 1). Therefore, we expect a phylogeny that is well resolved in certain parts of the tree only.
Acknowledgments
We thank Roland Fleissner, Nick Goldman, Sonja Meyer, Svante Pääbo, and Gunter Weiss for fruitful and stimulating discussions. We also would like to thank Hans Zischler and Diethard Tautz for providing the sequence alignments. Walter Fitch made helpful comments on a late version of the manuscript. Finally, we would like to acknowledge financial support from the Deutsche Forschungsgemeinschaft.
Footnotes

* To whom reprint requests should be addressed. e-mail: arndt{at}zi.biologie.unimuenchen.de.

Walter M. Fitch, University of California, Irvine, CA

† Likelihood-mapping analysis is available as part of the maximum-likelihood tree reconstruction program puzzle Version 3.0 (13, 19). It can be retrieved free of charge over the Internet from URLs ftp://ftp.ebi.ac.uk/pub/software and http://www.zi.biologie.unimuenchen.de/~strimmer/puzzle.html.
 Received September 28, 1996.
 Accepted April 23, 1997.
 Copyright © 1997, The National Academy of Sciences of the USA
References
 ↵
 Hillis D M,
 Moritz C,
 Mable B K
 Swofford D L,
 Olsen G J,
 Waddell P J,
 Hillis D M
 ↵

 Dopazo J,
 Dress A,
 von Haeseler A
 ↵
 ↵
 Eigen M,
 Winkler-Oswatitsch R,
 Dress A
 ↵
 ↵
Nieselt-Struwe, K., Mayer, C. B. & Eigen, M. (1996) Determining the Reliability of Phylogenies with Statistical Geometry, preprint.
 ↵
 Eigen M,
 Lindemann B F,
 Tietze M,
 Winkler-Oswatitsch R,
 Dress A,
 von Haeseler A
 ↵
Eigen, M. & Nieselt-Struwe, K. (1990) AIDS Suppl. 1, 4, S85–S93.
 ↵
 ↵
 ↵
 Felsenstein J
 ↵
 Strimmer K,
 von Haeseler A
 ↵
 Zischler H,
 Höss M,
 Handt O,
 von Haeseler A C,
 van der Kuyl A,
 Goudsmit J,
 Pääbo S
 ↵
 ↵
 ↵
 Woodward S R,
 Weynand N J,
 Bunnell X
 ↵
 ↵
 Strimmer K,
 Goldman N,
 von Haeseler A
 ↵
 Lake J A
 ↵
 ↵
 Vigilant L,
 Stoneking M,
 Harpending H,
 Hawkes K,
 Wilson A C