From the Cover
BIOLOGICAL SCIENCES / ECOLOGY
DNA barcoding the floras of biodiversity hotspots





*Department of Botany and Plant Biotechnology, APK Campus, University of Johannesburg, P.O. Box 524, Auckland Park 2006, Johannesburg, South Africa;
Lankester Botanical Garden, University of Costa Rica, P.O. Box 1031-7050 Cartago, Costa Rica;
Royal Botanic Gardens, Kew, Richmond TW9 3DS, United Kingdom; and
Imperial College London, Silwood Park Campus, Buckhurst Road, Ascot SL5 7PY, United Kingdom
Edited by Daniel H. Janzen, University of Pennsylvania, Philadelphia, PA, and approved December 17, 2007 (received for review October 18, 2007)
| Abstract |
|---|
CITES | Kruger National Park | Mesoamerica
Kress et al. (13) originally proposed that the trnH-psbA plastid region would be a suitable universal barcode for land plants. Concurrently, the newly established "plant working group" of the Consortium for the Barcode of Life tested a series of other genomic regions, at first disregarding trnH-psbA because of its complex molecular evolution (14). It was also proposed that, because the plastid genome evolves so slowly relative to other genomes, more than one barcode may be necessary to provide enough variation for this technique to work (15–17). Several competing proposals have since been put forward, and all need thorough evaluation. Kress and Erickson (16) proposed to combine the original trnH-psbA barcode from Kress et al. (13) with rbcL, following analyses from Newmaster et al. (17). By contrast, Chase et al. (15) proposed to combine either rpoC1, rpoB, and matK or rpoC1, matK, and trnH-psbA, whereas Taberlet et al. (18) suggested the trnL intron as a suitable plant barcode. Furthermore, tests of potential DNA barcodes have been based on a taxonomic coverage approach, necessarily encompassing just a few representatives from a wide range of distantly related groups of land plants (13, 15–17). The critical test, evaluating the applicability of DNA barcoding for biodiversity inventories in species-rich geographic areas, has been lacking.
Here, we focus on two biodiversity hotspots (19, 20), Mesoamerica and Maputaland–Pondoland–Albany in southern Africa, in which we analyze >1,600 plant specimens. We test eight potential DNA barcodes, six of which were made publicly available at the plant working group's website [www.kew.org/barcoding (15)], whereas a further two were proposed by Kress and Erickson (16). Our study sites were chosen for their exceptional plant diversity and contrasting habitats. Costa Rica comprises tropical forests and has one of the richest orchid floras in the world. Although there is a well-developed network of protected areas in Costa Rica, the orchid flora remains under constant threat from deforestation and illegal trade. Orchids are also well known to be difficult to identify, particularly when they are sterile, which makes them an ideal model group in which to test DNA barcoding techniques. In southern Africa, we undertook our study in the Kruger National Park (KNP), one of the largest protected areas in the world. The KNP is renowned for its large game animals but less for its flora, which is under continuous pressure from mega-herbivores. Home to 600 species of trees and shrubs (21), the KNP area has the highest tree diversity of any of the world's temperate regions.
During 2005–2007, we conducted extensive fieldwork to collect samples for this study. We used several metrics to evaluate the various potential barcoding regions. Intra- and interspecific genetic divergences were assessed by using pairwise calculations (22). Statistical tests were used to compare divergences. Phylogenetic analyses were performed to look for species monophyly. Genetic clustering algorithms (23, 24) were applied to test whether the coalescent process in a given barcode matched species delimitation.
| Results and Discussion |
|---|
We assessed genetic divergences within and between species using various metrics (22). We comment here on calculations using the best-fit models for each barcode (Table 1). For comparison with other studies, we also provide the results based on other distances as supporting information (SI) (SI Tables 6 and 7). A suitable barcode must exhibit high interspecific but low intraspecific divergence. Here, the highest interspecific divergence is provided by trnH-psbA (KNP and combined datasets; Table 1). The next most variable barcode at the interspecific level is matK for all datasets. Three different metrics were used to characterize intraspecific divergence: (i) the average of all pairwise distances between all individuals sampled within those species that had at least two representatives; (ii) "mean theta," with theta being the average pairwise distance calculated for each species that has more than one representative, thereby eliminating biases associated with uneven sampling among taxa; and (iii) average coalescent depth, i.e., the maximum distance from the tips to the node linking all sampled extant members of a species, "book-ending" intraspecific variability (see also SI Table 8). The results from these calculations of intraspecific differences do not show a clear pattern. In orchids, the barcodes exhibiting the lowest intraspecific divergence are rpoC1 (average mean divergence), accD/matK (mean theta), and matK (coalescent depth). In the KNP, the lowest intraspecific divergence is provided by ndhJ with all three metrics. Wilcoxon signed rank tests on combined data show that trnH-psbA is the most variable barcode at the interspecific level, followed by matK (Table 2). At the intraspecific level, Wilcoxon signed rank tests show rpoC1 and accD having the lowest level of divergence, whereas the highest is provided by trnH-psbA (Table 3). Based on these results alone, it is difficult to decide which barcode is best suited for plants.
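To make the difference between metric (i) and metric (ii) concrete, the following minimal sketch computes both on a toy alignment. It is an illustration only: the sequences and species labels are invented, and uncorrected p-distances stand in for the best-fit model distances that the study obtained from PAUP. Coalescent depth (metric iii) requires a tree and is not covered here.

```python
from itertools import combinations
from statistics import mean

def p_distance(a, b):
    """Proportion of differing sites between two aligned sequences.
    Assumption: uncorrected p-distance, not a best-fit substitution model."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)

def divergence_summary(samples):
    """samples: list of (species, aligned_sequence) tuples."""
    by_species = {}
    for species, seq in samples:
        by_species.setdefault(species, []).append(seq)

    # Intraspecific distances, kept per species (only species with >= 2 samples).
    intra = {
        sp: [p_distance(a, b) for a, b in combinations(seqs, 2)]
        for sp, seqs in by_species.items() if len(seqs) >= 2
    }
    # Interspecific distances: every pair of individuals from different species.
    inter = [
        p_distance(s1, s2)
        for (sp1, s1), (sp2, s2) in combinations(samples, 2) if sp1 != sp2
    ]
    pooled = [d for dists in intra.values() for d in dists]
    return {
        "mean_interspecific": mean(inter),
        # Metric (i): all intraspecific pairs pooled together.
        "mean_intraspecific": mean(pooled),
        # Metric (ii): average of per-species means, so heavily sampled
        # species do not dominate the estimate.
        "mean_theta": mean(mean(d) for d in intra.values()),
    }

# Hypothetical toy data (not from the study).
toy = [
    ("sp_A", "ACGTACGTAC"), ("sp_A", "ACGTACGTAT"), ("sp_A", "ACGTACGTAC"),
    ("sp_B", "ACGAACGTGC"), ("sp_B", "ACGAACGTGC"),
    ("sp_C", "TCGAACCTGC"),
]
summary = divergence_summary(toy)
print(summary)
```

Because sp_A contributes three of the four intraspecific pairs, the pooled average is pulled toward that species, whereas mean theta weights sp_A and sp_B equally.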
Our sampling is more comprehensive than previous studies on DNA barcoding in plants. Kress et al. (13) used 19 species with duplicates/triplicates and a further 83 species with only one representative per species. Kress and Erickson (16) used 48 pairs of species, each represented by one sample. Cowan et al. (29) and Chase et al. (15) report that the plant working group started with 96 pairs of taxa but narrowed it down to fewer species. Cameron used 343 species from within the botanical garden in New York (28). We used here 86 species in which all barcodes were tested and a further 1,036 orchid species in the dataset restricted to matK. Because the assessment of intraspecific variability is crucial for deciding on a suitable barcode, we included 44 species in which there were at least two and up to seven representatives per species. Our results are robust and all point toward the same pair of loci. Given that the second half (5' end) of the matK exon is easy to amplify and align, we propose that matK is used as a preferred universal DNA barcode for flowering plants. The trnH-psbA region performs nearly equally well, although its pattern of molecular evolution is complex. Therefore, we propose that trnH-psbA is used as either an alternative to matK or a complementary barcode to matK. When combined, these loci achieve only moderate improvement, as shown by our analyses of recovering species monophyly.
The use of matK as a barcode has been criticized mainly because no universal primers were available (15), which is why it had the lowest amplification success in Kress and Erickson (16). However, we found that primers 390F and 1326R from Cuénoud et al. (26) amplify the same region with 100% success. The use of trnH-psbA has been criticized because extensive length variation makes alignment difficult and because certain species host a pseudogene (15). Although trnH-psbA might indeed be problematic in certain cases, we found here that it was one of the most useful regions across a wide range of angiosperms.
Using matK alone or in combination with trnH-psbA, our tests of monophyly reach >90% correct species identification. Had our sampling been restricted to sister species rather than natural geographic assemblages of species, this value might have dropped. However, our samples do include very closely related species, given that Costa Rica and southern Africa have both experienced extensive rapid radiations (30, 31).
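The monophyly criterion can be illustrated with a short sketch. This is not the procedure used in the study, which relied on trees inferred in PAUP and MrBayes; it assumes a rooted tree written as nested tuples, invented specimen labels of the form `species_number`, and it scores only species with at least two specimens.

```python
def leaf_sets(tree):
    """Return (all leaves, list of leaf sets of every clade) for a rooted
    tree written as nested tuples with string leaves."""
    if isinstance(tree, str):
        single = frozenset([tree])
        return single, [single]
    leaves, clades = frozenset(), []
    for child in tree:
        child_leaves, child_clades = leaf_sets(child)
        leaves |= child_leaves
        clades.extend(child_clades)
    clades.append(leaves)
    return leaves, clades

def monophyletic_fraction(tree, species_of):
    """Fraction of multi-specimen species whose specimens form exactly one clade.
    Assumption: single-specimen species are excluded from the score."""
    leaves, clades = leaf_sets(tree)
    clade_set = set(clades)
    members = {}
    for leaf in leaves:
        members.setdefault(species_of(leaf), set()).add(leaf)
    testable = [frozenset(m) for m in members.values() if len(m) >= 2]
    if not testable:
        raise ValueError("no species with two or more specimens")
    return sum(m in clade_set for m in testable) / len(testable)

# Hypothetical tree: species A is monophyletic, B and C are not, D is a singleton.
toy_tree = ((("A_1", "A_2"), ("B_1", ("B_2", "C_1"))), ("C_2", "D_1"))
fraction = monophyletic_fraction(toy_tree, lambda leaf: leaf.split("_")[0])
print(fraction)
```

In this toy tree only one of the three testable species is recovered as monophyletic, so the score is one third.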
Apart from combining matK and trnH-psbA, we found that adding the other barcodes did not improve species identification by >3% and therefore was not worth pursuing if one balances gains in identification versus sequencing efforts. It is possible that some regions yet untested here may be useful as a complementary barcode, and we await further studies. Alternatively, we may need to accept that no more than 90% of species will be identified with universal plastid barcodes and that those difficult lineages will need "case-by-case" analyses, using, for example, nuclear population genetic markers and taking advantage of recent developments in DNA sequencing technology (32).
Our results differ from the proposal of Kress and Erickson (16) in that we advocate matK rather than rbcL, although we agree on the utility of trnH-psbA. As explained above, the amplification of matK is not as problematic as Kress and Erickson thought, and the pattern of variation in its second half (5' end) is particularly appropriate for its use as a DNA barcode, as exemplified by our large-scale analysis in orchids. The matK gene also presents another advantage: its first half (3' end) was useful to reconstruct the phylogeny of angiosperms (33), and therefore the complete sequence of this gene can be used as a dual barcode-phylogenetic marker. The matK gene has an unusual mode and tempo of evolution; it is the only putative chloroplast-encoded group II intron maturase, and its function relates to the regulation of plant development. Analyses of the expression of this gene suggested that "genetic buffers" are in operation and constrain its evolution, which may explain why relatively low intraspecific but high interspecific variation is found and therefore why it fits DNA barcoding purposes so well. We disagree with Kondo et al. (34), who argued that matK on its own was not useful for species identification; their study focused exclusively on species of liquorice in the legume family. We also disagree with the proposal of Chase et al. (15), because we found that neither rpoC1 nor rpoB performed well as a barcode (Tables 1–5). These two loci amplify easily in non-angiosperms (15), but we found that they were too conserved in angiosperms. It might in fact not be so important to design primer pairs or barcodes that work universally across ferns, mosses, and seed plants. Several of the DNA barcoding applications (e.g., rapid inventories for conservation) may not need to identify non-seed plants at the species level; if this were required, moss- and fern-specific primers or barcodes could be used to complement seed plant barcodes. In the meantime, we propose that DNA barcoding with matK be used on a large scale.
DNA barcoding with matK alone (or matK plus trnH-psbA combined) has the potential to speed up the exploration and preservation of plant life on Earth by considerably facilitating biodiversity inventories beyond South Africa and Costa Rica. In addition, new methods are now being developed in which DNA barcoding data can be used in conservation (35). As an example, we illustrate how customs officers could use DNA barcoding to identify plant fragments from species in which trade is controlled by the Convention on International Trade in Endangered Species (CITES). All orchids are in Appendix 2 of CITES [i.e., a special permit is required for their trade (www.cites.org)], but a few species, such as the lady's slipper orchids in Mesoamerica (genus Phragmipedium), are so threatened in the wild that their trade is prohibited altogether (i.e., they are listed in Appendix 1 of CITES). We included in our large matK matrix one sequence of Phragmipedium as a reference and ran a UPGMA analysis with all 1,500+ orchids with 10 additional Phragmipedium sequences representing another seven species (GenBank accession nos. AY918826–31, AJ581442, AY557204). All species of Phragmipedium clustered together correctly. This means that in our theoretical case, using our proposed DNA barcode, customs services would have positively distinguished species in CITES Appendix 1 (i.e., the lady's slipper orchids) from species in Appendix 2 (i.e., the other orchids) and from those not listed by CITES (here, the species from the KNP).
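The customs scenario can be sketched as follows. This is a minimal illustration under stated assumptions, not the analysis run in PAUP: the sequences and sample names are invented, distances are uncorrected p-distances, and the UPGMA routine is a plain textbook implementation. The check asks whether the reference and query Phragmipedium sequences form a single cluster that excludes everything else.

```python
from itertools import combinations

def p_distance(a, b):
    """Uncorrected proportion of differing sites (an assumption of this sketch)."""
    return sum(x != y for x, y in zip(a, b)) / len(a)

def upgma(seqs):
    """seqs: dict name -> aligned sequence. Returns a nested-tuple tree."""
    names = list(seqs)
    clusters = {i: (name, 1) for i, name in enumerate(names)}  # id -> (tree, size)
    dist = {
        frozenset((i, j)): p_distance(seqs[names[i]], seqs[names[j]])
        for i, j in combinations(range(len(names)), 2)
    }
    next_id = len(names)
    while len(clusters) > 1:
        pair = min(dist, key=dist.get)          # closest pair of clusters
        i, j = sorted(pair)
        (tree_i, n_i), (tree_j, n_j) = clusters.pop(i), clusters.pop(j)
        # Size-weighted average distance from the merged cluster to each other cluster.
        for k in clusters:
            d_ik = dist.pop(frozenset((i, k)))
            d_jk = dist.pop(frozenset((j, k)))
            dist[frozenset((next_id, k))] = (n_i * d_ik + n_j * d_jk) / (n_i + n_j)
        del dist[pair]
        clusters[next_id] = ((tree_i, tree_j), n_i + n_j)
        next_id += 1
    return next(iter(clusters.values()))[0]

def leaves(tree):
    return [tree] if isinstance(tree, str) else [l for c in tree for l in leaves(c)]

def forms_clade(tree, target):
    """True if some subtree has exactly the leaf set `target`."""
    if set(leaves(tree)) == target:
        return True
    return not isinstance(tree, str) and any(forms_clade(c, target) for c in tree)

# Hypothetical sequences; names are placeholders, not GenBank records.
toy = {
    "Phrag_ref":    "AAGGTTCCAAGG",
    "Phrag_query1": "AAGGTTCCAAGT",
    "Phrag_query2": "AAGGTTCCATGG",
    "Orchid_X":     "AAGCTACCTAGG",
    "Orchid_Y":     "AAGCTACCTAGC",
    "KNP_tree":     "TTGCAACGTACC",
}
tree = upgma(toy)
appendix1 = {"Phrag_ref", "Phrag_query1", "Phrag_query2"}
print(tree, forms_clade(tree, appendix1))
```

If the queries cluster with the reference to the exclusion of all other samples, they would be flagged as belonging to the Appendix 1 genus; a query that falls elsewhere in the tree would not.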
To ensure even longer-term benefit of the DNA barcoding efforts, it is also essential to put in place DNA banking strategies (36) so that complementary barcodes to the ones identified here can be produced in the future. More importantly, if DNA barcoding is to achieve its goals, it must urgently become available to countries rich in biodiversity but poor in resources through efficient capacity building and judicious funding programs.
| Methods |
|---|
DNA Sequencing. Total DNA was extracted by using the method of Doyle and Doyle (37). We amplified and sequenced accD, rpoC1, rpoB, ndhJ, matK, and ycf5, following guidelines from the plant working group. For matK, additional primers 390F and 1326R (26) were used. Primers trnHf and psbA3'f were used for trnH-psbA (13). For the first half of the rbcL exon, primers 1F and 724R were used following Kress et al. (13). DNA sequences were aligned in PAUP4b10 (38).
Genetic and Phylogenetic Analyses. Inter- and intraspecific genetic divergences were calculated following Meyer and Paulay (22). Pairwise distances were calculated with PAUP4b10 (38), using the best-fitting model as selected by MODELTEST 3.7 (39). Wilcoxon signed rank tests were performed to compare intra- and interspecific variability for each pair of barcodes following Kress and Erickson (16). We evaluated DNA barcoding gaps by comparing the distribution of intra- versus interspecific divergences (22). To evaluate whether species were recovered as monophyletic with each barcode, we used standard phylogenetic techniques: maximum parsimony (MP), maximum likelihood (ML), neighbor joining (NJ), and UPGMA with PAUP4b10 (38). Bayesian statistical inferences (BI) were performed with MrBayes software, Version 3.1.2 (40). The parsimony analysis of the large matK matrix of Mesoamerican orchids was performed by using the parsimony ratchet method (41). We identified genetic clusters by coalescence analyses, using methods developed by Pons et al. (23) and Fontaneto et al. (24). Details are available from the corresponding author upon request.
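As an illustration of the paired comparison between two barcodes, the sketch below computes the Wilcoxon signed rank sums by hand. The divergence values are hypothetical (expressed as substitutions per 1,000 sites for the same six species pairs), and the function returns only the rank sums; it does not compute a p-value, for which a statistics package such as SciPy (`scipy.stats.wilcoxon`) would normally be used.

```python
def wilcoxon_signed_rank(x, y):
    """Return (W_plus, W_minus, n) for paired samples x and y.
    Zero differences are dropped; tied absolute differences get average ranks."""
    diffs = [a - b for a, b in zip(x, y) if a != b]
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    pos = 0
    while pos < len(order):
        # Find the run of tied absolute differences starting at `pos`.
        end = pos
        while (end + 1 < len(order)
               and abs(diffs[order[end + 1]]) == abs(diffs[order[pos]])):
            end += 1
        avg_rank = (pos + end) / 2 + 1  # ranks are 1-based
        for k in range(pos, end + 1):
            ranks[order[k]] = avg_rank
        pos = end + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return w_plus, w_minus, len(diffs)

# Hypothetical interspecific divergences for the same species pairs,
# measured with two barcodes (not data from the study).
barcode_a = [12, 30, 8, 25, 40, 18]
barcode_b = [10, 22, 8, 27, 31, 12]
w_plus, w_minus, n = wilcoxon_signed_rank(barcode_a, barcode_b)
print(w_plus, w_minus, n)
```

A large imbalance between the two rank sums, as in this toy case, is what indicates that one barcode is consistently more variable than the other across the same comparisons.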
| ACKNOWLEDGMENTS. |
|---|
| Footnotes |
|---|
Freely available online through the PNAS open access option.
Author contributions: M.v.d.B., J.W., and V.S. designed research; R.L., D.B., F.P., G.G., O.M., and S.D. performed research; T.G.B. contributed new reagents/analytic tools; R.L. analyzed data; and R.L., M.v.d.B., and V.S. wrote the paper.
The authors declare no conflict of interest.
This article is a PNAS Direct Submission.
Data deposition: The sequences reported in this paper have been deposited in the GenBank database (accession nos. EU254252–EU254410 and EU213263–EU214530).
See Commentary on page 2761.
This article contains supporting information online at www.pnas.org/cgi/content/full/0709936105/DC1.
© 2008 by The National Academy of Sciences of the USA
| References |
|---|
Related Commentary in PNAS:
W. J. Kress and D. L. Erickson. DNA barcodes: Genes, genomics, and bioinformatics. PNAS, February 26, 2008; 105(8): 2761–2762.