New Research In
Physical Sciences
Social Sciences
Featured Portals
Articles by Topic
Biological Sciences
Featured Portals
Articles by Topic
 Agricultural Sciences
 Anthropology
 Applied Biological Sciences
 Biochemistry
 Biophysics and Computational Biology
 Cell Biology
 Developmental Biology
 Ecology
 Environmental Sciences
 Evolution
 Genetics
 Immunology and Inflammation
 Medical Sciences
 Microbiology
 Neuroscience
 Pharmacology
 Physiology
 Plant Biology
 Population Biology
 Psychological and Cognitive Sciences
 Sustainability Science
 Systems Biology
Prospects for inferring very large phylogenies by using the neighborjoining method

Contributed by Masatoshi Nei, June 11, 2004
Abstract
Current efforts to reconstruct the tree of life and histories of multigene families demand the inference of phylogenies consisting of thousands of gene sequences. However, for such large data sets even a moderate exploration of the tree space needed to identify the optimal tree is virtually impossible. For these cases the neighborjoining (NJ) method is frequently used because of its demonstrated accuracy for smaller data sets and its computational speed. As data sets grow, however, the fraction of the tree space examined by the NJ algorithm becomes minuscule. Here, we report the results of our computer simulation for examining the accuracy of NJ trees for inferring very large phylogenies. First we present a likelihood method for the simultaneous estimation of all pairwise distances by using biologically realistic models of nucleotide substitution. Use of this method corrects up to 60% of NJ tree errors. Our simulation results show that the accuracy of NJ trees decline only by ≈5% when the number of sequences used increases from 32 to 4,096 (128 times) even in the presence of extensive variation in the evolutionary rate among lineages or significant biases in the nucleotide composition and transition/transversion ratio. Our results encourage the use of complex models of nucleotide substitution for estimating evolutionary distances and hint at bright prospects for the application of the NJ and related methods in inferring large phylogenies.
Inference of phylogenetic trees is becoming increasingly important in the study of molecular evolution and functional genomics. However, with the enormous increase in the size of data sets for orthologous genes from diverse species and homologous sequences from multigene families, the probability of finding the optimal tree(s) diminishes rapidly with an astronomical increase in the number of possible topologies to be examined (Fig. 1A ) (1, 4). Even a moderate exploration of topological (tree) space is not practical because of the enormous amount of computational time required. These circumstances have led to the extensive use of the NJ method (2), which quickly generates a final tree for large phylogenies under the principle of minimum evolution. This method is especially useful when the number of sequences to be analyzed is in the order of hundreds or thousands (e.g., 5–7). Furthermore, the accuracy of NJ trees is similar to other more timeconsuming methods for relatively small data sets (<200 sequences) (1, 4, 8–13).
The NJ method constructs trees by clustering neighboring sequences in a stepwise manner. In each step of sequence clustering, it minimizes the sum of branch lengths (2) and thus examines multiple topologies. For large data sets, however, NJ examines only a minuscule fraction of the total number of possible topologies. For instance, it will examine all three unrooted trees in the case of four sequences but only 10^{10} of >10^{13,867} possible trees when 4,000 sequences are used (Fig. 1B ). The effect of this property on the accuracy of NJ trees has been unclear. Therefore, we have done computer simulations to study the accuracy of NJ trees when tens to thousands of sequences are used.
NJ is known to be statistically consistent in the sense that, if correct pairwise distances with no statistical errors are used, it reconstructs the true tree (2). In practice, however, estimates of all distances are subject to statistical errors, so it may produce erroneous trees. At present, all distances are estimated independently for each pair of sequences [independent estimation (IE) method] either by analytical formulas (1, 14, 15) or by likelihood methods (4, 15). The standard errors of the estimates obtained in this way are rather high unless very long sequences are used. However, these standard errors can be reduced considerably if all distances for a given set of aligned sequences are estimated simultaneously [simultaneous estimation (SE) method]. Recently, we developed an SE method based on the maximum likelihood principle and found that the use of this method substantially improves the accuracy of NJ trees. In this article, we first present this SE method and then discuss the accuracy of NJ trees obtained for large and small phylogenies.
Methods
Simultaneously Estimating All Pairwise Distances. Pairwise distances used for constructing NJ trees are currently estimated by the IE method for a variety of mathematical models to incorporate varying degrees of complexity of nucleotide or amino acid substitution (reviewed in refs. 1, 4, 14, 15). The estimates obtained by the IE method are expected to have larger standard errors than those obtained by the SE method. These larger errors are partly because the parameters, such as the transition/transversion rate ratio in the Kimura model (16), are estimated independently for each pair of sequences in the IE method, whereas they are estimated by considering all pairs of sequences simultaneously in the SE method. There is obviously a greater advantage of using the SE method for a sophisticated model of substitution than for a simple model, because in the former model we have to estimate a larger number of parameters. Furthermore, the accuracy of parameter estimates obtained by the SE method is expected to increase as the number of sequences increases.
In the following we consider the evolutionary distance based on the TN model (3), which is one of the most sophisticated models of nucleotide substitution. For this model, the evolutionary distance (d) is given by where a _{1}, a _{2}, and b are the numbers of transitional substitutions between purines (a _{1}), transitional substitutions between pyrimidines (a _{2}), and transversional substitutions (b) per site, respectively, and g _{A}, g _{T}, g _{C}, and g _{G} each represent the frequencies of nucleotides A, T, C, and G. For the distance (d_{ij} ) between sequences i and j, Eq. 1 can be rewritten as follows. where k _{1} = a _{1} _{ij} /b_{ij} and k _{2} = a _{2} _{ij} /b_{ij} . In Eq. 2, the transition/transversion rate ratios (k _{1} and k _{2}) are independent of evolutionary time and are the same for all pairs of sequences, whereas b_{ij} depends on evolutionary time, varying with sequence pair i and j. When there are m sequences, the total number of b_{ij} 's is m(m – 1)/2. To estimate d_{ij} in Eq. 2, we need to know the estimates of k _{1}, k _{2}, and b_{ij} . The maximum likelihood estimates of k _{1}, k _{2}, and b_{ij} in Eq. 2 can be obtained by maximizing the following log likelihood function (IE method): where P̂ _{1} _{ij} , P̂ _{2} _{ij} , and Q̂_{ij} are the observed proportions of nucleotide sites showing transitional differences between purines and between pyrimidines and sites showing transversional differences when sequences i and j are compared, respectively. P _{1} _{ij}, P _{2} _{ij} , and Q_{ij} are the theoretical values of P̂ _{1} _{ij} , P̂ _{2} _{ij} , and Q̂_{ij} , respectively, and are given by where g _{R} = g _{A} + g _{G} and g _{Y} = g _{C} + g _{T}.
However, because the parameters k _{1} and k _{2} are shared by the log likelihood functions for all pairs of i and j, they should be estimated by maximizing the sum of all likelihood functions (SL), which is Theoretically, this is not a likelihood function, because d_{ij} 's are not necessarily independent. However, by using the method of Taylor's expansion (as in ref. 17), one can show that approximately unbiased estimates of k _{1} and k _{2} can be obtained by maximizing SL. We therefore suggest the following procedure for estimating k _{1}, k _{2}, and b_{ij} 's. We first compute the initial estimates of k _{1}, k _{2}, and b_{ij} 's by using uncorrected estimates, k _{1} = (P̂ _{1} _{ij} /g _{A} g _{T})/(Q̂/g _{R} g _{Y}), k _{2} = (P̂ _{2} _{ij} /g _{T} g _{C})/(Q̂_{ij} /g _{R} g _{Y}), and b_{ij} = Q̂_{ij} /(4g _{R} g _{Y}). This method is computationally efficient and gives estimates with smaller standard errors. We then compute the averages of estimates of k _{1} and k _{2} separately and use them for obtaining improved estimates of b_{ij} 's by maximizing L_{ij} in Eq. 3. We can now obtain improved estimates of k _{1} and k _{2} by maximizing SL in Eq. 7. These estimates are then used for further improvement of the estimates of b_{ij} 's by Eq. 3. The last two processes are repeated until stable estimates of k _{1}, k _{2}, and b_{ij} 's are obtained. The final estimates are asymptotically unbiased, although they may not be maximum likelihood estimates.
The SE method above can be used for many other substitution models. For example, the Hasegawa–Kishino–Yano (HKY) model (18) is a special case of the TN model, in which k _{1} and k _{2} are assumed to be the same. Therefore, the HKY distance for sequences i and j can be estimated by using Eq. 3 under the assumption of k _{1} = k _{2} = k. Similarly, the Tamura (19) and the Kimura (16) models are a special case of the TN model (see ref. 1). In the case of the Tamura model the G + C content (θ = g _{G} + g _{C}) instead of nucleotide frequencies is considered with a single k parameter. In the Kimura model all nucleotide frequencies are assumed to be equal (0.25) with a single k. The SE method can also be used when the substitution rate varies with site, following the gamma or other distribution (e.g., refs. 20 and 21).
In Eqs. 4–6, the use of average nucleotide frequencies for the entire set of sequences is recommended because of the smaller sampling errors. However, when the distance methods that relax the assumption of the equality of nucleotide frequencies among lineages (heterogeneity of substitution pattern over the phylogeny) are used, the sequencespecific nucleotide frequencies should be used for each comparison (22). Note that these methods do not take into account the variation of transition/transversion ratio among lineages and are not guaranteed to generate asymptotically unbiased estimates of evolutionary distances.
The SE approach easily produces estimates of the transition/transversion rate ratios (k _{1} and k _{2}) and pairwise distances (d_{ij} ) for all sequence pairs simultaneously. Computation of the variances of these estimates by analytical formulas is somewhat complicated, but they can be obtained by the bootstrap method with site resampling (1, 23).
Methods of Computer Simulation. In our study of the accuracy of NJ trees obtained by the IE and SE methods of distance estimation, we used three different sets of model trees. The first model tree consisted of 66 sequences (Fig. 2A ), of which the relative branch lengths were derived from the mammalian DNA sequence data (figure 1 in ref. 24). For this model tree, we considered 448 different sets of evolutionary parameters (substitution parameters and sequence lengths), in which the number of nucleotides per sequence (n) varied from 147 to 9,359, the evolutionary rate varied 10 times, the G + C content (θ) varied from 0.3 to 0.9, and the transition/transversion rate ratio (k _{1} = k _{2} = k) varied from 2.1 to 26.6 (see ref. 25). Using this model tree and the set of evolutionary parameters with the TN model, we evaluated the relative performance of the SE and IE methods in estimating evolutionary parameters and inferring phylogenies by the NJ method.
The second and third sets of model trees were used to compare the accuracy of NJ trees of different sizes. The second set was based on two predefined model trees in Fig. 2, where the tree in B represents a case of constant rate evolution for 32 sequences and the tree in C represents a varying rate case with 32 sequences. Multiple copies of these trees, connected at the roots, generated increasingly larger model trees consisting of 32 to 4,096 sequences (see also ref. 10). The third set of model trees was generated by using an agglomerative algorithm. In this algorithm we combined a given set of sequences and randomly selected pairs or groups of sequences for making different model trees in a stepwise manner. This process was continued until the required number of sequences was clustered. Model trees in Fig. 2 D and E are two examples of such randomly generated trees. For model trees in Fig. 2 B and D , the exterior branch lengths (0.2 substitutions per site) were 20 times longer than the interior branch lengths (0.01 substitutions per site). For model trees in Fig. 2 C and E , the exterior branches were either long (0.2) or short (0.01). For the type of model trees in Fig. 2E , the exterior branches were assigned a long or short length randomly. The number of nucleotides per sequence (n) used in the second and third data sets was 1,000.
For the second and third sets of model trees, nucleotide substitution was simulated by using the TN model with two sets of biologically realistic values of substitution parameters: (i) k _{1} = k _{2} = 4 and g _{A} = g _{T} = g _{C} = g _{G} = 0.25 to simulate nuclear gene evolution and (ii) k _{1} = k _{2} = 20, g _{A} = g _{T} = 0.40, and g _{C} = g _{G} = 0.10 to simulate animal mitochondrial DNA gene evolution.
Using the standard simulation procedure (e.g., refs. 1 and 10), we generated simulated sequence data for each model tree with a set of nucleotide substitution parameters and sequence length (n). A NJ tree was then constructed and compared with the true tree. The accuracy of the NJ tree was measured by the percentage of phylogenetic clades correctly inferred (P _{C}). This was obtained by P _{C} = 100 [1 – d _{T}/(2m – 6)], where d _{T} is the topological distance between the inferred and model trees and m is the number of sequences used (1, 26, 27). P _{C} is 100% when all clades are correctly inferred and is 0% when none of the clades is correctly identified. For each model tree (model tree in Fig. 2 A, B, or C ) for a given number of sequences, 100 replicates of simulated sequence data were generated for the same topology, whereas in the case of randomly generated trees (trees in Fig. 2 D and E ), a new model tree was generated in each replicate simulation.
To make the P _{C} values comparable among model trees of different sizes (except Fig. 2 A ), the expected interior branch lengths were assumed to be the same (0.01 substitutions per site) for all topologies. This approach is different from that used in most previous studies, in which the maximum depths of trees or the maximum evolutionary distances were assumed to be the same (e.g., refs. 11 and 28). The latter approach makes the interior branch lengths shorter in large phylogenies than in small phylogenies and, therefore, the comparison of P _{C} values among trees of different sizes becomes improper.
Results
Performance of the SE Method. Using the model tree in Fig. 2 A , we first compared the standard errors of distance estimates obtained by the IE and SE methods. Fig. 3A shows that the standard errors for the SE method are always smaller than those for the IE method and that the extent of reduction of the standard errors for the SE method increases as the standard errors for the IE method increases. In this case we used the TN model with k _{1} = k _{2} = 4.4, θ = 0.61, and n = 1,066. Essentially the same results were obtained for each of the 448 simulation conditions examined (data not shown). Fig. 3B shows the accuracy of the estimates of k obtained by the SE method. In each of the 448 simulation conditions, the mean estimate of k was close to the true value, and the 95% confident interval was narrow. These results indicate that the SE method is effective in obtaining reliable estimates of k without knowing the topology of the tree.
Fig. 4A shows the P _{C} values of NJ trees obtained by the SE and IE methods for each of the 448 cases. Each data point in this figure represents the average P _{C} from 100 replicate simulations for each case. The P _{C} values of NJ trees obtained by the SE method (NJSE) are always higher than those of NJ trees obtained by the IE method (NJIE). On average, 19% of all erroneously identified phylogenetic clades of NJIE trees were corrected in NJSE trees. Large improvements were observed when evolutionary distances were large or when the base composition or the transition/transversion biases were high. In these cases >50% of erroneous clades of NJIE trees were corrected in NJSE trees. Essentially the same results were obtained for other model trees in Fig. 2.
Previously we reported that the accuracy of NJ trees is often higher when p distances (proportions of different nucleotide sites) are used than when IE distance estimates for the correct substitution models are used (1, 11, 29, 30). This occurred mainly because p distances have smaller standard errors than distances estimated by the IE method. The same results were observed in the present simulation; P _{C} was higher in NJp trees than in NJIE trees for 60.4% of the simulation conditions when the TN model was used. However, NJSE trees had a higher P _{C} value than NJp trees in 92.9% of the cases (Fig. 4B ). Therefore, the SE method considerably improved the accuracy of NJ trees when more realistic models were used. The same results were obtained for other tree topologies given in Fig. 2. For this reason, we consider only NJSE trees in the following comparison of P _{C} values among trees of different sizes.
Accuracy of NJ Trees with Increasing Number of Sequences. One might expect P _{C} to decline significantly as the number of sequences used (m) increases, because the number of possible topologies rapidly increases with increasing m. To study this problem, we examined the P _{C} value for different numbers of sequences (32, 64, 128, 256, 512, 1,024, 2,048, and 4,096 sequences) using the model trees in Fig. 2. When type B model trees (constantrate) with nuclear gene evolution were considered, P _{C} was 79% for m = 32 (Fig. 5A , filled circles). When m was increased to 256, P _{C} declined only by 0.8%. Even for m = 4,096 (128times increase), P _{C} was lower than that for m = 32 only by 3.4%. These results indicate that the accuracy of NJSE trees does not decline significantly even when m increased by 4,064 sequences. When the evolution of mitochondrial DNA was considered, P _{C} was much lower than that for nuclear genes (55% for m = 32) (Fig. 5A , open circles). Yet, the decline of P _{C} with increasing m was quite small (Fig. 5A , open circles). The P _{C} value for m = 4,096 was smaller than that for m = 32 only by 2.2%. A small extent of decline of P _{C} with increasing m was observed even when type C model trees (varying evolutionary rates) were used (Fig. 5A , open and filled triangles) or when randomly generated model trees (types D and E) were used (Fig. 5B , squares and diamonds). The decrease in P _{C} was <10% in every case in Fig. 5. These results indicate that P _{C} does not decline very much when m increases from 32 to 4,096 and, therefore, NJ can be used efficiently even when hundreds or thousands of sequences are used.
Similar results were obtained for NJSE trees when the substitution pattern and G + C content vary with evolutionary lineage or when the sequence length is reduced to a half (results not shown). Therefore, although the absolute values of P _{C} depend on the substitution pattern or sequence length, the relationship between P _{C} and m is nearly the same for all cases. In other words, P _{C} generally declines with increasing m, but the extent of the decline is remarkably small.
Discussion
We have seen that the SE of evolutionary distances reduces the variances of distance estimates considerably and consequently increases the accuracy (P _{C}) of NJ trees. We have also seen that the P _{C} of NJ trees does not decrease significantly when the number of sequences (m) increases from 32 to 4,096. However, this latter property is not necessarily due to the use of SE distances, because a similar property was observed even with IE distances when m was ≤100 (10). Some authors have reported a substantial decrease of P _{C} when m increased from 50 to 100 (e.g., ref. 28). In the latter study the lengths of interior branches were considerably smaller for large trees than for small trees, and this difference contributed to the substantial decrease of P _{C}. Therefore, it is important to maintain the same interior branch lengths for a comparison of P _{C} for large and small trees, as mentioned earlier.
In reality, however, our assumption that all interior branches have the same length irrespective of the tree size is unrealistic. When the number of sequences used is very large, it is quite likely that some interior branches are relatively long and others are very short. If the expected length of an interior branch is very short, no substitutions may occur in the branch when the number of nucleotides examined is small. In this case it would be difficult to resolve the clades associated with the short branches. Therefore, our results should be interpreted only as a guideline. As long as the interior branch is sufficiently long and a sufficient number of nucleotides per sequence is used, the NJ method is capable of producing reasonably accurate trees.
The relatively high P _{C} values obtained for large trees in this study are partly due to the use of SE distances, which have smaller standard errors than IE distances. This SE approach is effectively the same as the use of a larger number of nucleotides in the IE approach. For example, when model tree B was used, the average standard error of SE distances for the TN model with k _{1} = k _{2} = 20, g _{A} = g _{T} = 0.4, g _{C} = g _{G} = 0.1, and n = 1,000 was similar to that of IE distances with n = 1,650. Furthermore, another merit of using the SE approach instead of the IE approach exists. That is, when the number of sequences (m) is as large as 1,000, estimates of IE distances are not always obtainable, because the arguments of logarithms in the analytical formulas may become negative by chance (31). The proportion of such unestimable cases increases as m increases in the IE method (Fig. 1C ). By contrast, we have encountered no such cases in our simulation when the SE method was used. Of course, unestimable cases might happen even with the SE method, if the extent of sequence divergence is very high, but the probability of occurrence of such events is much smaller in the SE approach than in the IE approach.
The higher accuracy of NJSE trees also appears to be related to the fact that the sum of branch lengths (S) for NJSE trees is on average closer to that of the true tree than the S value for NJIE trees (Fig. 6). It is already known that the S value for NJIE trees is on average considerably smaller than that of the true tree when the number of sequences relative to the sequence length is small (8, 11, 12, 32). Our simulation results indicate that the S values for NJSE trees are closer to that of the true tree although, in general, they are still smaller than that of the true tree (Fig. 6). These results also indicate that although the NJ method examines a minuscule portion of the entire tree space, a further search for trees with smaller S values does not necessarily lead to a tree closer to the true tree. This is because the optimization principle does not work well in phylogenetic inference when m is large relative to n, whether the minimum evolution, maximum parsimony, or maximum likelihood method is used (1, 32).
We therefore conclude that the prospects are bright for using the NJ method for generating initial evolutionary hypotheses for very large species and multigene family trees. These large phylogenies often span long evolutionary times in which the pattern of nucleotide substitution is expected to be complex (1, 4). In these cases, the use of the SE approach with sophisticated models of DNA substitution is expected to improve the accuracy of phylogenetic inference.
Acknowledgments
We thank C. R. Rao, Xun Gu, George Zhang, Adriana Briscoe, Alan Filipski, Araxi Urritia, Dana Desonie, Michael Rosenberg, Sankar Subramanian, and Sudhindra Gadagkar for comments. This work was supported by grants from the National Institutes of Health (to S.K. and M.N.), the National Science Foundation (to S.K. and James L. Collins, School of Life Sciences, Arizona State University), and the Japan Society for the Promotion of Science (to K.T.).
Footnotes

↵ ¶ To whom correspondence should be addressed. Email: s.kumar{at}asu.edu.

Abbreviations: NJ, neighborjoining; SE, simultaneous estimation; IE, independent estimation; TN, Tamura–Nei.

Freely available online through the PNAS open access option.
 Copyright © 2004, The National Academy of Sciences
References

↵
Nei, M. & Kumar, S. (2000) Molecular Evolution and Phylogenetics (Oxford Univ. Press, New York).
 ↵
 ↵

↵
Felsenstein, J. (2003) Inferring Phylogeny (Sinauer, Sunderland, MA).
 ↵

↵
RuizPesini, E., Mishmar, D., Brandon, M., Procaccio, V. & Wallace, D. C. (2004) Science 303 , 223–226. pmid:14716012
 ↵
 ↵

↵
Takahashi, K. & Nei, M. (2000) Mol. Biol. Evol. 17 , 1251–1258. pmid:10908645
 ↵

↵
Desper, R. & Gascuel, O. (2004) Mol. Biol. Evol. 21 , 587–598. pmid:14694080

↵
Kumar, S., Tamura, K. & Nei, M. (2004) Brief. Bioinform. 5 , 150–163. pmid:15260895

↵
Swofford, D. L. (1998) PAUP*, Phylogenetic Analysis Using Parsimony (* and Other Methods) (Sinauer, Sunderland, MA).
 ↵
 ↵
 ↵
 ↵
 ↵
 ↵

↵
Tamura, K. & Kumar, S. (2002) Mol. Biol. Evol. 19 , 1727–1736. pmid:12270899

↵
Efron, B. (1982) The Jackknife, the Bootstrap, and Other Resampling Plans (Soc. Industrial and Appl. Math., Philadelphia).

↵
Rosenberg, M. S. & Kumar, S. (2001) Proc. Natl. Acad. Sci. USA 98 , 10751–10756. pmid:11526218

↵
Rosenberg, M. S. & Kumar, S. (2003) Mol. Biol. Evol. 20 , 610–621. pmid:12679548
 ↵
 ↵
 ↵
 ↵
 ↵
 ↵

↵
Nei, M., Kumar, S. & Takahashi, K. (1998) Proc. Natl. Acad. Sci. USA 95 , 12390–12397. pmid:9770497
Citation Manager Formats
Sign up for Article Alerts
Jump to section
You May Also be Interested in
More Articles of This Classification
Biological Sciences
Related Content
 No related articles found.
Cited by...
 The rhg1a (Rhg1 lowcopy) nematode resistance source harbors a copiafamily retrotransposon within the Rhg1encoded {alpha}SNAP gene
 Molecular Characterization and Phylogenetic analysis of Wuchereria bancrofti in human blood samples from Malindi and Tana River Delta, endemic regions in Kenya
 Laboratory Methods Supporting Measles surveillance in Queensland, Australia, 2010  2017
 Detection of Seoul virus in wild brown rats (Rattus norvegicus) from pig farms in Northern England
 Population genomic analyses reveal a highly differentiated genetic cluster of northern goshawks (Accipiter gentilis laingi) in Haida Gwaii
 Nodalike RNA viruses infecting Caenorhabditis nematodes: sympatry, diversity and reassortment
 A global perspective on microbial diversity in the terrestrial deep subsurface
 Bacteriamediated stabilization of a panel of Picornaviruses
 Influenza A(H1N1)pdm09 M and HA segments sequences from Rio Grande do Sul, Brazil
 Xylanolytic psychrotrophs from andosolic sedge fens and moss heaths in Iceland
 Entomophthovirus: An insectderived iflavirus that infects a behavior manipulating fungal pathogen of dipterans
 Rapid and sensitive virulence prediction and identification of Newcastle disease virus genotypes using thirdgeneration sequencing
 Miniscule differences between the sex chromosomes in the giant genome of a salamander, Ambystoma mexicanum
 Bacterial Stabilization of a Panel of Picornaviruses
 Bacillus safensis FO36b and Bacillus pumilus SAFR032: A Whole Genome Comparison of Two Spacecraft Assembly Facility Isolates
 Wholegenome resequencing and pantranscriptome reconstruction highlight the impact of genomic structural variation on secondary metabolism gene clusters in the grapevine Esca pathogen Phaeoacremonium minimum
 APPLES: Distancebased Phylogenetic Placement for Scalable and Assemblyfree Sample Identification
 Evolution of the Highly Repetitive PEVK Region of Titin Across Mammals
 Morphological variation in Blommersia pelvic shape coincides with muscular architecture and parallels genus phylogenetic evolution
 Integrating phylogenetic and network approaches to study gene family evolution: the case of the AGAMOUS family of floral genes
 Genomics of Natural Populations: Evolutionary Forces that Establish and Maintain Gene Arrangements in Drosophila pseudoosbscura
 Bacterial lipopolysaccharide reduces the stability of avian and human influenza viruses
 Endopeptidase Regulation as a Novel Function of the ZurDependent Zinc Starvation Response
 A New Highly Sensitive and Specific RealTime PCR Assay Targeting the Malate Dehydrogenase Gene of Kingella kingae and Application to 201 Pediatric Clinical Specimens
 Characterization of a Vesivirus Associated with an Outbreak of Acute Hemorrhagic Gastroenteritis in Domestic Dogs
 Using ProjectBased Learning to Teach Phylogenetic Reconstruction for Advanced Undergraduate Biology Students: Molluscan Evolution as a Case Study
 Expanded cellular clones carrying replicationcompetent HIV1 persist, wax, and wane
 The Complexities of Nocardia Taxonomy and Identification
 Insights into the Impact of CD8+ Immune Modulation on Human Immunodeficiency Virus Evolutionary Dynamics in Distinct Anatomical Compartments by Using Simian Immunodeficiency VirusInfected Macaque Models of AIDS Progression
 Heterogeneous Patterns of Genetic Diversity and Differentiation in European and Siberian Chiffchaff (Phylloscopus collybita abietinus/P. tristis)
 Structures of humaninfecting Thogotovirus fusogens support a common ancestor with insect baculovirus
 Bacterial Lipopolysaccharide Destabilizes Influenza Viruses
 Multiresistant opportunistic pathogenic bacteria isolated from polluted rivers and first detection of nontuberculous mycobacteria in the Algerian aquatic environment
 Characterization of the Protein Tyrosine Phosphatase LmPRL1 Secreted by Leishmania major via the Exosome Pathway
 Proliferation of latently infected CD4+ T cells carrying replicationcompetent HIV1: Potential role in latent reservoir dynamics
 Effects of Glucose and Starch on Lactate Production by Newly Isolated Streptococcus bovis S1 from Saanen Goats
 Reply to "May Newly Defined Subgenotypes Va and Vb of Newcastle Disease Virus in Poultry Be Considered Two Different Genotypes?"
 Malassezia arunalokei sp. nov., a Novel Yeast Species Isolated from Seborrheic Dermatitis Patients and Healthy Individuals from India
 Photobacterium damselae subsp. damselae, an Emerging Fish Pathogen in the Black Sea: Evidence of a Multiclonal Origin
 Evolution of Neuroadaptation in the Periphery and Purifying Selection in the Brain Contribute to Compartmentalization of Simian Immunodeficiency Virus (SIV) in the Brains of Rhesus Macaques with SIVAssociated Encephalitis
 MetaPalette: a kmer Painting Approach for Metagenomic Taxonomic Profiling and Quantification of Novel Strain Variation
 Detection of the Emerging Picornavirus Senecavirus A in Pigs, Mice, and Houseflies
 Newcastle Disease Viruses Causing Recent Outbreaks Worldwide Show Unexpectedly High Genetic Similarity to Historical Virulent Isolates from the 1940s
 Phenotypic, Genotypic, and Antimicrobial Characteristics of Streptococcus halichoeri Isolates from Humans, Proposal To Rename Streptococcus halichoeri as Streptococcus halichoeri subsp. halichoeri, and Description of Streptococcus halichoeri subsp. hominis subsp. nov., a Bacterium Associated with Human Clinical Infections
 Solution Structure of the HIV1 Intron Splicing Silencer and Its Interactions with the UP1 Domain of Heterogeneous Nuclear Ribonucleoprotein (hnRNP) A1
 Clades of Photosynthetic Bacteria Belonging to the Genus Rhodopseudomonas Show Marked Diversity in LightHarvesting Antenna Complex Gene Composition and Expression
 Treatment of Alkaline Cr(VI)Contaminated Leachate with an Alkaliphilic MetalReducing Bacterium
 Phytobeneficial Properties of Bacteria Isolated from the Rhizosphere of Maize in Southwestern Nigerian Soils
 Modeling malaria genomics reveals transmission decline and rebound in Senegal
 A New Fungal Isolate, Penidiella sp. Strain T9, Accumulates the Rare Earth Element Dysprosium
 The MET Gene Is a Common Integration Target in Avian Leukosis Virus Subgroup JInduced Chicken Hemangiomas
 GlycanForaging Systems Reveal the Adaptation of Capnocytophaga canimorsus to the Dog Mouth
 Ecology and Biotechnology of SeleniumRespiring Bacteria
 Parallel Evolution of Streptococcus pneumoniae and Streptococcus mitis to Pathogenic and Mutualistic Lifestyles
 HIV1 Interacts with Human Endogenous Retrovirus K (HML2) Envelopes Derived from Human Primary Lymphocytes
 Separate Evolution of Virulent Newcastle Disease Viruses from Mexico and Central America
 Genome sequencing of disease and carriage isolates of nontypeable Haemophilus influenzae identifies discrete population structure
 Distribution of Indigenous Bacterial Pathogens and Potential Pathogens Associated with RoofHarvested Rainwater
 Prevalence of Virulence Genes Associated with Pathogenic Escherichia coli Strains Isolated from Domestically Harvested Rainwater during Low and HighRainfall Periods
 Novel Firmicutes Group Implicated in the Dechlorination of Two Chlorinated Xanthones, Analogues of Natural Organochlorines
 Aeromonas hydrophila and Aeromonas veronii Predominate among Potentially Pathogenic Ciprofloxacin and TetracyclineResistant Aeromonas Isolates from Lake Erie
 Systemic Spread and Propagation of a PlantPathogenic Virus in European Honeybees, Apis mellifera
 Evolution and Diversity in Human Herpes Simplex Virus Genomes
 Molecular Investigation and Phylogeny of Anaplasma spp. in Mediterranean Ruminants Reveal the Presence of NeutrophilTropic Strains Closely Related to A. platys
 Urban Leptospirosis in Africa: A CrossSectional Survey of Leptospira Infection in Rodents in the Kibera Urban Settlement, Nairobi, Kenya
 ColdActive Acid Pectinolytic System from Psychrotolerant Bacillus: Color Extraction from Red Grape Skin
 Prevalence of Hemolysin Genes and Comparison of ehxA Subtype Patterns in Shiga ToxinProducing Escherichia coli (STEC) and NonSTEC Strains from Clinical, Food, and Animal Sources
 Sequence Analyses of 2012 West Nile Virus Isolates from Texas Fail to Associate Viral Genetic Factors with Outbreak Magnitude
 Conservation of Meningococcal Antigens in the Genus Neisseria
 Functional Genomics Reveals That a Compact Terpene Synthase Gene Family Can Account for Terpene Volatile Production in Apple
 Highly Divergent Virulent Isolates of Newcastle Disease Virus from the Dominican Republic Are Members of a New Genotype That May Have Evolved Unnoticed for Over 2 Decades
 Longitudinal Analysis of the Evolution and Dissemination of Neisseria gonorrhoeae Strains (Saskatchewan, Canada, 2005 to 2008) Reveals Three Major Circulating Strains and Convergent Evolution of Ciprofloxacin and Azithromycin Resistance
 Evaluation of Methods for Identification and Determination of the Taxonomic Status of Strains Belonging to the Streptococcus porcinusStreptococcus pseudoporcinus Complex Isolated from Animal, Human, and Dairy Sources
 Genetic Characterization of Atypical Mansonella (Mansonella) ozzardi Microfilariae in Human Blood Samples from Northeastern Peru
 Metagenomic Analyses of Drinking Water Receiving Different Disinfection Treatments
 Genetic Analysis of Cytomegalovirus in Malignant Gliomas
 Phylogeny of the Staphylococcal Major Autolysin and Its Use in Genus and Species Typing
 The ligand specificity of the Gproteincoupled receptor GPR34
 Abrupt Temporal Fluctuations in the Chicken Fecal Microbiota Are Explained by Its Gastrointestinal Origin
 Unified modeling of gene duplication, loss, and coalescence using a locus tree
 Atypical Listeria monocytogenes Serotype 4b Strains Harboring a Lineage IISpecific Gene Cassette
 Complete Genome and Clinicopathological Characterization of a Virulent Newcastle Disease Virus Isolate from South America
 LowDissolvedOxygen Nitrifying Systems Exploit AmmoniaOxidizing Bacteria with Unusually High Yields
 Genotypic Characterization of Cryptosporidium hominis from Water Samples in Sao Paulo, Brazil
 D1/D2 Domain of LargeSubunit Ribosomal DNA for Differentiation of Orpinomyces spp.
 Conserved Mutations in the Pneumococcal Bacteriocin Transporter Gene, blpA, Result in a Complex Population Consisting of Producers and Cheaters
 Identification and Characterization of Potential PerformanceRelated Gut Microbiotas in Broiler Chickens across Various Feeding Trials
 Identification of a TolueneDegrading Bacterium from a Soil Sample through H218O DNA Stable Isotope Probing
 Persistent Phytoplankton Bloom in Lake St. Lucia (iSimangaliso Wetland Park, South Africa) Caused by a Cyanobacterium Closely Associated with the Genus Cyanothece (Synechococcaceae, Chroococcales)
 Colletotrichum truncatum: an Unusual Pathogen Causing Mycotic Keratitis and Endophthalmitis
 Common Ancestry and Novel Genetic Traits of Francisella novicidaLike Isolates from North America and Australia as Revealed by Comparative Genomic Analyses
 Antifungal Susceptibilities of Sporothrix albicans, S. brasiliensis, and S. luriei of the S. schenckii Complex Identified in Brazil