BIOLOGICAL SCIENCES / EVOLUTION
Evolution and genetic differentiation among relatives of Arabidopsis thaliana

Heidelberg Institute of Plant Sciences, Department of Biodiversity and Plant Systematics, University of Heidelberg, Im Neuenheimer Feld 345, D-69120 Heidelberg, Germany
Communicated by June B. Nasrallah, Cornell University, Ithaca, NY, February 15, 2007 (received for review November 13, 2006)
| Abstract |
|---|
5 mya. Significant karyotype evolution in A. thaliana with base chromosome number reduction from x=8 to x=5 might indicate and favor effective genetic isolation from these other species, although hybrids are occurring naturally and have been also constituted under controlled conditions. We tested the evolutionary significance to separate the x=5 from the x=8 lineage using DNA sequence data from the plastome and the nuclear ribosomal DNA based on an extensive, representative worldwide sampling of nearly all taxonomic entities. We conclude that (i) A. thaliana is clearly separated phylogenetically from the x=8 lineage, (ii) five major lineages outside A. thaliana can be identified (A. lyrata, A. arenosa, A. halleri, Arabidopsis croatica, and Arabidopsis pedemontana) together with Arabidopsis cebennensis, and (iii) centers of genetic and morphological diversity are mostly in congruence and are located close to the Balkans in Austria and Slovakia outside glaciated and permafrost regions with few notable exceptions.
genetic diversity | phylogenetic relationships | phylogeography | reticulation
The evolutionary split between x=5 A. thaliana and x=8 Arabidopsis taxa occurred approximately 5 mya (3, 4, 7) and initiated the evolution of A. thaliana with its unique characters compared with the x=8 lineage, and also changes on the chromosome level resulting in its derived genome structure (8–10). In contrast, there is much more variation in the x=8 lineage (1), resulting in the recognition of several species and subspecies. Since eight wild relatives of A. thaliana on the species level were recognized <8 years ago (11), the number of new taxonomic combinations has been increasing (12, 13). An overview of the current status of Arabidopsis taxonomy and synonymy has been given recently (1). In general, three major lineages can be recognized, namely Arabidopsis lyrata, Arabidopsis halleri, and Arabidopsis arenosa (1), and most species or subspecies can be treated within these three lineages. In addition, three species have been described that are not closely related to one of these three species groups: Arabidopsis croatica (Croatia), Arabidopsis cebennensis (France), and Arabidopsis pedemontana (Italy). It can be expected that below the species level the number of taxa will increase further, as is the case for A. halleri, which segregates with five subspecies (1). The same will happen to A. arenosa segregates because current taxonomic treatments are not convincingly based on comparative cytological, morphological, or genetic analysis (11, 14), and we are still lacking any comparative morphometric analysis.
At the present time, evolutionary studies are restricted to single species or groups of populations (15–18), and no evolutionary framework has been provided yet that comprises the entire genus. Research completed on Arabidopsis wild relatives has addressed high levels of genetic diversity in periglacial regions (17, 19, 20–24), heavy metal tolerance (19, 25), and self-incompatibility and breeding systems (20, 24, 26–35).
In this study we present the first comprehensive phylogenetic framework for the genus. We studied genetic variation of all evolutionary lineages of A. thaliana relatives based on a representative geographic sampling by studying maternally inherited chloroplast DNA (cpDNA) haplotype variation and sequence diversity of the internal transcribed spacer region (ITS) of ribosomal RNA. The cpDNA haplotype data were analyzed phylogeographically, and gene diversity parameters were calculated. The plastid data were compared with the nuclear data, and significant differences among the various evolutionary lineages are highlighted. Finally, we comment on the systematic status of some taxonomic–nomenclatural combinations such as Arabidopsis kamchatica and Arabidopsis arenicola (12, 13).
| Results |
|---|
2 mya. It is important to mention that there are no other taxa from the mustard family that are positioned phylogenetically between A. thaliana and its relatives analyzed herein. It should also be noted that the position of A. thaliana in the TCS network indicates that ITS types of A. arenosa are more ancestral than those of the other segregates. This finding is important when comparing the data with results from the cpDNA analysis. Arabidopsis suecica has been excluded from our analysis. This taxon has been fully confirmed as a hybrid between A. thaliana and A. arenosa, and it evolved with a unique origin <400,000 years ago in Fennoscandinavia (36). Another taxon, A. lyrata ssp. kamchatica, with a proven hybrid status in Japan (A. halleri ssp. gemmifera x A. lyrata), carried exclusively A. lyrata-specific ITS types a and b. However, it should be noted that the hybrid origin of A. lyrata ssp. kamchatica accessions from Russia (type locality is Kamchatka) or Alaska and Canada is not proven yet and remains questionable. It is most likely that A. lyrata ssp. kamchatica (or A. kamchatica) from Japan, Korea, and Taiwan is a distinct taxon of hybridogenous origin and not the same species or subspecies as distributed in Russia, Alaska, and Canada.
Chloroplast Haplotypes Indicate Ancient Shared Polymorphisms. In total 34 cpDNA suprahaplotypes have been characterized. Considering length variation and additional single-nucleotide polymorphisms in the highly dynamic 3' region of the trnL-F intergenic spacer carrying the trnF pseudogenes, these 34 suprahaplotypes consist of 153 haplotypes (numbered from 1 to 153 in SI Table 1). When compared with the ITS data, the resulting network analysis provided a different, not species-specific distribution pattern of genetic variation (SI Fig. 5). The most ancestral haplotypes (interior in the network) are shared in various combinations by almost all species lineages as characterized by the ITS analysis. Only haplotypes at the various tips of the cpDNA haplotype network are species-specific. This significant incongruence between nuclear and plastid data sets can be explained in two ways. First, reticulation and hybridization among lineages have transferred cpDNA types from one lineage into the other. However, if this scenario were true we would not expect such a clear evolutionary signal as is provided by the ITS data. The second explanation is that ancestral cpDNA type diversity predates separation of the main evolutionary lineages. This scenario is more likely and correlates well with the fact that suprahaplotypes from internal positions of the network consist of more haplotypes than those at the tips of the network (SI Fig. 5). However, this hypothesis requires that an old center of genetic diversity is congruent with a center of origin of the various lineages (see below). Chloroplast types from A. cebennensis and A. pedemontana, but also A. croatica, are congruent with such a hypothesis, because these haplotypes (N, T, and Z) are directly connected to the ancestral type A.
As for A. suecica, previous extensive analysis of cpDNA variation (37) has demonstrated that this hybrid taxon (A. thaliana x A. arenosa) is carrying exclusively A. thaliana cpDNA types.
Phylogeographic Data and Gene Diversity Statistics Demonstrate High Levels of Genetic Variation. To demonstrate species-specific distribution of cpDNA variation, the suprahaplotype network (SI Fig. 5) has been redrawn for the three major lineages (A. arenosa, A. halleri, and A. lyrata) separately, highlighting only those haplotypes occurring in each lineage, respectively (Fig. 2). The northern hemisphere has been divided into nine major regions: (i) glaciated north (GN), comprising the areas of Europe that were glaciated at the maximum extent of Pleistocene glaciation cycles (Iceland, Norway, Sweden, Fennia, Denmark, coastal areas around the Baltic Sea, northern Ireland, and northern Great Britain); (ii) North America with Alaska, Canada, and the northern U.S.; (iii) Japan, Taiwan, and Korea; (iv) central European permafrost regions comprising the region in central Europe between the GN and the Alps; (v) the western Alps (WA); (vi) the glaciated part of the eastern Alps (GEA); (vii) the nonglaciated east (NGE), comprising the nonglaciated part of the eastern Alps and the nonglaciated area between the eastern Alps, the northern permafrost region, and the western Carpathians; (viii) the western Carpathians (WC); and (ix) the southeastern Carpathians (SEC).
ΦST estimates showed that the NGE was clearly differentiated from Japan, the SEC, and the WC (SI Table 3). The WA and Japan were totally different because they had no suprahaplotypes in common. Japan was significantly differentiated from all European regions except the SEC. Pairwise FST estimates showed a similar picture. Additionally, the WA were significantly differentiated from the SEC. Nucleotide diversity was highest in the NGE (0.0021), followed by the SEC (0.0014) and the WC (0.0012) (SI Table 4). In the WA and in Japan nucleotide diversity was zero because we detected only one suprahaplotype. Nearly all suprahaplotypes were found in the NGE (seven suprahaplotypes, R = 3.69). The contacting regions, the GEA and the WC, also showed high numbers of different suprahaplotypes (GEA, five suprahaplotypes, R = 2.86; WC, four suprahaplotypes, R = 2.77). Effective genetic diversity was also highest in the NGE (3.69) followed by the GEA (2.42) and the WC (2.38). Private suprahaplotypes were found in the NGE and the SEC. The corresponding diversity estimates for all haplotypes (SI Table 5) were congruent. Here the GEA shows the highest number of haplotypes (12 haplotypes, R = 4.82) and the highest effective genetic diversity (8.6). Similar high values are found for the SEC (11 haplotypes, R = 4.75, va = 6.23) and the NGE (12 haplotypes, R = 4.62, va = 6.26). The WA and Japan again have the lowest diversity estimates. Private haplotypes are found in all regions. Interestingly, 93% of the SEC haplotypes are private; in the WC and in Japan even 100% of the haplotypes are private.
In A. lyrata 13 different suprahaplotypes comprising 31 haplotypes were detected in Europe and North America (Fig. 2b). The most ancestral suprahaplotype A was found in Alaska and Canada, but also in the NGE. Two major lineages evolved from suprahaplotype A. Derived suprahaplotypes from the tips of the network were distributed only in Europe. Three rare suprahaplotypes from Austria (R, V, and AF) were directly connected to the most ancestral suprahaplotype A. In North America the most frequent and ancestral suprahaplotypes A, B, and C were found.
Regional suprahaplotype and haplotype sharing is quite low in A. lyrata (SI Table 2). A maximum of two suprahaplotypes and haplotypes, respectively, is shared among the four different regions. Pairwise ΦST and FST estimates showed a clear differentiation for all regions except the NGE and the central European permafrost region (SI Table 3), which provides some additional evidence for periglacial survival in permafrost-dominated areas during the Pleistocene (1). SI Table 4 summarizes suprahaplotype frequencies and genetic diversity indices of A. lyrata. The formerly glaciated north of Europe (GN) appears to be the region with the highest genetic diversity estimates. Nucleotide diversity (0.0028), effective diversity (2.81), and the number of different haplotypes corrected for sample size (3.8) are higher than in any other region. These estimates are different when calculating diversity parameters taking all haplotypes into account (SI Table 5). Both regions, the formerly GN and the NGE, have the same number of different haplotypes (5.3). However, the NGE shows the highest effective genetic diversity (7.09). Among the remaining regions, the lowest effective diversity and haplotype frequency for suprahaplotype estimates and haplotype estimates was observed for the northern permafrost region. In summary, we observed that in general genetic diversity in A. lyrata is lower than in A. halleri, and in both species groups the NGE, outside the glaciers and permafrost areas, played an important role as refuge area and center of genetic diversity. However, in contrast to A. halleri, periglacial survival in A. lyrata might have also played a substantial role in maintaining genetic diversity throughout its northern distribution range.
In A. arenosa 17 suprahaplotypes comprising 72 haplotypes were detected (Fig. 2c). Ancestral suprahaplotype A is widely distributed in Europe, whereas type B is restricted to Austria and Slovenia. Suprahaplotype and haplotype sharing among adjacent regions is extensive (SI Table 2). Pairwise ΦST and FST estimates showed that among the regional groups for A. arenosa the NGE is most strongly differentiated from the WA (SI Table 3). Additionally, the NGE is clearly differentiated from the other regional groups (except GN for ΦST). Among the remaining regions there is significant differentiation of the WA compared with the GN for both ΦST and FST estimates. Considering FST estimates, only the WC are differentiated from the WA and the GEA. Nucleotide diversity based on suprahaplotypes was highest in the WC (0.0016) followed by the GN (0.0015) and the NGE (0.0014) (SI Table 4). However, the differences are mostly minor and depend largely on sample size. In any case, again the NGE and adjacent areas showed the highest levels of genetic diversity. This result is much more obvious when all haplotypes are considered (SI Table 5). Here we observe extremely high levels of genetic diversity in the NGE and, although lowered, still high levels of genetic variation in the Carpathians, in the GEA, and in central European permafrost regions. In summary, it has to be concluded that genetic diversity in A. arenosa is much higher than in A. lyrata and A. halleri, and, as concluded for A. lyrata and A. halleri, the NGE plays a dominant role as the center of genetic diversity.
Taking into consideration that cpDNA haplotype variation to some extent predates the evolution of the several evolutionary lineages, the overall distribution of haplotypes among the nine defined regions is remarkable, with strong gradients of decreasing number of haplotypes in any geographical direction starting from a center in Slovakia and eastern Austria (Fig. 3). It is also obvious from ΦST/FST comparisons (SI Table 3) that for all three species groups ΦST does not generally exceed FST, which means that haplotype phylogeny does not fit haplotype distribution significantly. This again favors the assumption that an old stock of cpDNA haplotypes had existed before species diversification and Pleistocene migration.
ΦST does not exceed FST, which means that ITS phylogeny does not fit ITS type distribution significantly (SI Table 7).
Comments on A. kamchatica, A. arenicola, A. cebennensis, A. pedemontana, and A. croatica. The herein analyzed accessions of A. kamchatica from Japan are characterized by cpDNA suprahaplotype AD, which is derived from A. halleri haplotypes. The same accessions are defined by ITS type b, which is characteristic for A. lyrata. This confirms the hypothesis that A. kamchatica from Japan, Korea, and Taiwan indeed represents a hybrid between A. halleri ssp. gemmifera and A. lyrata. However, all A. kamchatica (A. lyrata ssp. kamchatica) accessions from outside these countries analyzed herein are characterized by cpDNA suprahaplotype B and ITS type b. Thus, they are very similar to any other A. lyrata accessions from North America analyzed herein, which does not favor any hybridization scenario as proven for Japanese accessions. However, the taxonomy of the two subspecies A. lyrata ssp. lyrata and A. lyrata ssp. kamchatica in Russia and North America is further complicated by descriptions that all plants of ssp. lyrata might be diploid, while all plants of ssp. kamchatica have been described as tetraploid (38, 39). Considering these data, our results favor autopolyploidization of A. lyrata ssp. lyrata resulting in the A. lyrata ssp. kamchatica currently distributed in Russia, Alaska, and Canada. Additional research with material of known ploidy level is badly needed.
A. arenicola as analyzed by Warwick et al. (12) is characterized by cpDNA suprahaplotype A and ITS type e (12), which supports closest relatedness to A. lyrata. We think that both taxa, A. arenicola and A. kamchatica from outside Japan, Taiwan, and Korea, are best summarized within a broadly defined A. lyrata, maybe best on the subspecies level. If future research will demonstrate that Japanese hybrids, also treated as A. kamchatica, are not related to North American and Russian A. lyrata ssp. kamchatica, taxonomic rules will require a new name for these Japanese hybrids, e.g., Arabidopsis kawasakiana (13). A. cebennensis and A. pedemontana are old diploids (confirmed by chromosome counts and microsatellite analysis; M.A.K. and R. Schmickl, unpublished data) and genetically well defined species with a relictual distribution in southeast France and northwest Italy, respectively. A. croatica is distantly related only to A. arenosa, but in this case secondary contact with A. arenosa in Croatia resulted in genetic admixture.
| Discussion |
|---|
5 mya (3, 4) and count the mean number of mutational steps in the ITS network (Fig. 1) from A. thaliana to any other tip of the network (30 steps), we obtain a rough estimate for the age of the inner part of the network of approximately
2 million years, which is close to the beginning of the Pleistocene and its various glaciation and deglaciation cycles. A similar value is also obtained for ITS phylogenetic reconstructions enforcing a molecular clock (data not shown). In addition, cpDNA data favored a primary center of genetic diversity in the eastern part of its European distribution. During all of the Pleistocene, this area might have served also as an important refuge area for the various segregates of A. halleri, A. lyrata, and A. arenosa. Consequently, these refuge areas have served as a genetic reservoir for the generation of new taxa mostly within the A. halleri and A. arenosa lineages, which is reflected by the numerous taxa described from this region. A few taxa have been forced very early during their evolution into relictual areas in southeastern France and northwestern Italy (such as A. cebennensis and A. pedemontana) or Croatia (A. croatica), but these taxa did not expand back into their (unknown) original distribution areas. Interestingly, all of these species are highly endemic with narrow distribution ranges and consequently exhibit less genetic variation than any other species. However, throughout the Pleistocene A. arenosa, A. lyrata, and A. halleri evolved differently in terms of ecological adaptation (1) and range expansion. A. lyrata was the most successful colonizer of northern regions, and our data demonstrate that A. lyrata survived glaciation periods north of the central European ice sheets in permafrost regions (21). High levels of genetic variation have been maintained because of large effective population sizes and the self-incompatible breeding system. However, more recent colonization of formerly glaciated areas such as in North America resulted in much lower levels of genetic variation. In the case of A. halleri we observe a significant preference for higher altitudes rather than simply harsh environments. 
This is also reflected by its mainly central to east European distribution in mountainous to subalpine habitats, with only one successful colonizer in eastern Asia in Japan and adjacent regions (A. halleri ssp. gemmifera). The situation in A. arenosa and its various segregates is more complex and not resolved in detail by our data. We can conclude that in A. arenosa neither our ITS nor the cpDNA data reflect any intraspecific differentiation as demonstrated by morphological or cytological variation (1). Furthermore, the ITS data demonstrate extensive genetic contact of A. arenosa with A. croatica but also A. lyrata (M.A.K., unpublished data), but not with A. halleri. Our data provide a first comparative overview on genetic diversity on all Arabidopsis segregates on a representative geographic scale. The most important finding here is that A. arenosa carries much higher levels of genetic diversity than any other species. This is best explained by a breeding system that is dominated by self-incompatibility (M.A.K., unpublished data). An additional alternative explanation for these high levels of genetic variation is past and ongoing hybridization and reticulation with A. lyrata. Such a complex suture zone has been circumscribed in Austria outside the range of the last maximum glaciation and outside the permafrost areas (M.A.K., R. Schmickl, and M.M., unpublished data).
Our aim is to contribute substantially to the poor knowledge on Arabidopsis wild relatives and, therefore, to stimulate further research in these nonmodel plants. Previous studies have already successfully focused on A. halleri and A. lyrata (1, 2), but other taxa such as A. arenosa offer some additional resources of genetic diversity and character variation. Despite differences in ecological niche differentiation and evolutionary history, all three species groups are represented by diploids (but in the case of A. arenosa and A. lyrata tetraploids have been also observed frequently) and a predominantly effectively working self-incompatibility system.
| Materials and Methods |
|---|
DNA Preparation and Sequencing. DNA extraction from dry leaf samples (either herbarium material or silica gel-dried material collected directly in the wild) followed a simple cetyltrimethylammonium bromide protocol (38). Amplification and sequencing of the ITS and the trnL intron–trnLF intergenic spacer followed the protocols and information provided earlier (trnL-F, refs. 40 and 41; ITS, ref. 42). GenBank accession numbers are provided in SI Table 1.
DNA-Based Phylogenetic Reconstructions and Networks. Alignments were created manually because of nearly identical sequence length, and indels were coded as binary characters. As for the plastid trnL-F region, we did not align the 3' region of the trnL-F intergenic spacer because of extensive trnF pseudogene copy number variation and resulting ambiguities in the alignment. This reduced amount of DNA variation has been used to define "suprahaplotypes," which are mostly based on single-nucleotide polymorphisms. The DNA sequence information from the pseudogene-rich region has been used to subdivide these suprahaplotypes into a significantly higher number of haplotypes (41, 43) without any further phylogenetic calculations. The cpDNA data (suprahaplotypes) have been subjected to network analysis. Suprahaplotype networks were constructed for the trnL intron–trnLF intergenic spacer. For this purpose all indels (except polyT stretches) were coded as additional single binary characters. Haplotype networks were constructed by using TCS version 1.21 (44) according to acceptance criteria outlined earlier (45). DNA sequence data from the ITS were obtained from a direct sequencing approach. In principle this DNA region is subjected to a process called concerted evolution (42), and multiple copies might indicate species-specific naturally occurring variation among loci or might demonstrate the result of more recent hybridization and reticulation. We obtained numerous sequences with ambiguous sites and not totally homogenized ITS copies. Therefore, we used the TCS program to group the 103 different sequences. TCS recognized 24 groups of ITS types (further named as "general types"), and we manually selected one representative sequence for each type with the lowest number (or even zero) of ambiguous sites. 
These 24 sequences (ITS types a–x) were used for phylogenetic reconstructions running a maximum-likelihood analysis with PAUP*4.0 (46) [options: exhaustive search, Multrees (save multiple trees), and TBR (tree bisection and reconnection) branch swapping]. Substitution models were selected by MODELTEST 3.5 (47) under the Akaike information criterion, with parameters estimated during ML exhaustive searches. The same alignment has been subjected to a network analysis by using TCS as outlined for the plastid data starting initially with a 95% confidence interval (three internal steps allowed) and adding remaining groups of sequences with the 90% confidence interval option (four steps) and finally running standard settings (allowing the maximum number of steps). Sequences from A. thaliana served as outgroup in various calculations (ITS, GenBank accession no. AJ232900; trnL intron, GenBank accession no. DQ313522; trnLF intergenic spacer, GenBank accession no. DQ528960).
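The coding of indels as binary characters described above can be illustrated with a minimal sketch. It assumes a "simple indel coding" convention in which every distinct gap interval in the alignment becomes one presence/absence character; the authors coded their alignments manually and excluded polyT stretches, which this sketch does not attempt to reproduce. The function names and the toy alignment are hypothetical.

```python
import re


def gap_intervals(seq):
    """Return the set of (start, end) column intervals covered by gap runs."""
    return {(m.start(), m.end()) for m in re.finditer(r"-+", seq)}


def code_indels(alignment):
    """Code every distinct gap interval as one binary character.

    alignment: dict mapping a sequence name to its aligned sequence.
    Returns the sorted list of gap intervals and, per sequence, a list
    with 1 where the sequence carries exactly that gap and 0 otherwise.
    """
    per_seq = {name: gap_intervals(seq) for name, seq in alignment.items()}
    all_gaps = sorted(set().union(*per_seq.values()))
    matrix = {
        name: [int(gap in gaps) for gap in all_gaps]
        for name, gaps in per_seq.items()
    }
    return all_gaps, matrix


# Hypothetical toy alignment, not data from this study.
toy_alignment = {
    "s1": "ACGT--ACGT",
    "s2": "ACGTTTACGT",
    "s3": "AC---TACGT",
}
indel_columns, indel_matrix = code_indels(toy_alignment)
```

Each resulting 0/1 column can then be appended to the nucleotide matrix as one additional character, so that an indel of any length counts as a single mutational step.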
Gene Diversity Statistics and Phylogeographic Inference. For the phylogeographic analysis, individual cpDNA sequences were divided into nine regional groups based on geography and observed haplotypes. Genetic diversity was estimated as haplotype richness (R, the number of different haplotypes corrected for sample size through rarefaction) (48), nucleotide diversity (49), and effective genetic diversity (50). We estimated genetic differentiation between all pairs of regions and among all regions with an analysis of molecular variance using the program ARLEQUIN (51). Both FST, an estimate of differentiation based on allele frequencies, and ΦST, an estimate of differentiation taking into account the molecular distance between haplotypes, were estimated. In an analysis of molecular variance framework these values are estimated as the proportion of variance among groups. These two estimators are analogous to GST and NST, respectively (52). In the case of correspondence between haplotype phylogenies and their geographic distribution, estimates for NST (ΦST) will be greater than the GST (FST) values (52, 53). The program PERMUT (www.pierroton.inra.fr/genetics/labo/Software) tests whether the difference between the two estimates is significant by a permutation test exchanging haplotypes but conserving haplotype frequencies (54). It generates a random distribution for NST, which allows determining a P value for the observed estimate. The average of the random distribution corresponds to GST. GST/NST differs from FST/ΦST in the way they treat differences in sample size (52).
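The logic of comparing NST with GST by permutation can be sketched as follows. This is a simplified illustration, not the PERMUT implementation: it omits the sample-size corrections mentioned above, weights all populations equally, and assumes at least two haplotypes are present. The function names, haplotype frequencies, and distance matrix are hypothetical.

```python
import random


def diversity(freqs, dist):
    """Mean distance between two haplotypes drawn at random from freqs."""
    k = len(freqs)
    return sum(dist[i][j] * freqs[i] * freqs[j]
               for i in range(k) for j in range(k))


def differentiation(pop_freqs, dist):
    """(total - mean within-population diversity) / total diversity.

    With unit distances between different haplotypes this is a GST-like
    value; with mutational-step distances it is an NST-like value.
    """
    n_pops = len(pop_freqs)
    k = len(pop_freqs[0])
    mean_freqs = [sum(p[i] for p in pop_freqs) / n_pops for i in range(k)]
    total = diversity(mean_freqs, dist)
    within = sum(diversity(p, dist) for p in pop_freqs) / n_pops
    return (total - within) / total


def permutation_test(pop_freqs, dist, n_perm=999, seed=0):
    """Shuffle haplotype identities in the distance matrix, keep frequencies."""
    rng = random.Random(seed)
    k = len(dist)
    observed = differentiation(pop_freqs, dist)
    at_least_as_large = 0
    for _ in range(n_perm):
        perm = list(range(k))
        rng.shuffle(perm)
        shuffled = [[dist[perm[i]][perm[j]] for j in range(k)]
                    for i in range(k)]
        if differentiation(pop_freqs, shuffled) >= observed:
            at_least_as_large += 1
    return observed, (at_least_as_large + 1) / (n_perm + 1)


# Hypothetical example: two populations, each holding two closely related
# haplotypes that are distant from those in the other population.
pops = [[0.5, 0.5, 0.0, 0.0], [0.0, 0.0, 0.5, 0.5]]
steps = [[0, 1, 5, 5], [1, 0, 5, 5], [5, 5, 0, 1], [5, 5, 1, 0]]
unit = [[0 if i == j else 1 for j in range(4)] for i in range(4)]

gst_like = differentiation(pops, unit)
nst_like, p_value = permutation_test(pops, steps)
```

In this toy case related haplotypes co-occur in the same population, so the distance-based value exceeds the frequency-based one, which is the pattern expected when haplotype phylogeny corresponds to geographic distribution.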
We performed the same analysis for the ITS data. It should be noted that sequence ambiguities due to multiple intraindividual copies might bias the calculated genetic parameters significantly, but these analyses were kept to confirm at least general trends as demonstrated by the cpDNA data.
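Two of the diversity measures named in this section, haplotype richness corrected for sample size and effective genetic diversity, can be illustrated with a short sketch. It assumes the standard rarefaction expectation (expected number of distinct haplotypes in a random subsample of n individuals) and the inverse of the summed squared haplotype frequencies as the effective number of haplotypes; the exact estimators of refs. 48 and 50 may differ in their corrections. The function names and the sample are hypothetical.

```python
from collections import Counter
from math import comb


def rarefied_richness(haplotypes, n):
    """Expected number of distinct haplotypes in a subsample of size n."""
    counts = Counter(haplotypes)
    total = sum(counts.values())
    if not 0 < n <= total:
        raise ValueError("n must be between 1 and the sample size")
    all_subsamples = comb(total, n)
    # For each haplotype: 1 - probability that it is absent from the subsample.
    return sum(1 - comb(total - c, n) / all_subsamples
               for c in counts.values())


def effective_diversity(haplotypes):
    """Effective number of haplotypes, 1 / sum(p_i ** 2)."""
    counts = Counter(haplotypes)
    total = sum(counts.values())
    return 1.0 / sum((c / total) ** 2 for c in counts.values())


# Hypothetical regional sample of suprahaplotype labels.
sample = ["A", "A", "B", "C"]
richness_full = rarefied_richness(sample, 4)
richness_one = rarefied_richness(sample, 1)
effective = effective_diversity(sample)
```

Rarefying every region to the same subsample size makes richness values comparable across regions with unequal sampling, which is why a region can hold more observed haplotypes than another yet show a similar R.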
| Footnotes |
|---|
Abbreviations: cpDNA, chloroplast DNA; ITS, internal transcribed spacer region; GN, glaciated north; WA, the western Alps; GEA, the glaciated part of the eastern Alps; NGE, nonglaciated east; WC, western Carpathians; SEC, southeastern Carpathians.
To whom correspondence should be addressed. E-mail: mkoch{at}hip.uni-heidelberg.de
Author contributions: M.A.K. designed research; M.A.K. and M.M. performed research; M.A.K. and M.M. analyzed data; and M.A.K. wrote the paper.
The authors declare no conflict of interest.
Data deposition: The sequences reported in this paper have been deposited in the GenBank database. For a list of accession numbers, see SI Table 1.
This article contains supporting information online at www.pnas.org/cgi/content/full/0701338104/DC1.
© 2007 by The National Academy of Sciences of the USA
| References |
|---|
, C, Mitchell-Olds, T & Koch, M. (2004) Mol Ecol 13, 349–370.
, C, Matschinger, M, Bleeker, W, Vogel, J & Kiefer, M. (2005) Mol Biol Evol 22, 1032–1043.
, C & Mitchell-Olds, T. (2003) Mol Biol Evol 20, 338–350.
, C, Kiefer, C, Schmickl, R, Klimes, L & Lysak, MA. (2007) Mol Biol Evol 24, 63–73.