BIOLOGICAL SCIENCES / POPULATION BIOLOGY
Just one cross appears capable of dramatically altering the population biology of a eukaryotic pathogen like Toxoplasma gondii





*Department of Microbiology and Immunology, Stanford University School of Medicine, Stanford, CA 94305;
Department of Pathology, Cambridge University, Cambridge CB2 1QP, United Kingdom;
Wellcome Trust Sanger Institute, Hinxton CB10 1SA, United Kingdom;
Institute for Genomic Research, Rockville, MD 20850; ¶Department of Biology and Penn Genomics Institute, University of Pennsylvania, Philadelphia, PA 19104; ||Department of Molecular Microbiology, Center for Infectious Diseases, Washington University School of Medicine, St. Louis, MO 63110; and **Department of Veterinary Molecular Biology, Montana State University, Bozeman, MT 59717
Edited by Louis H. Miller, National Institutes of Health, Rockville, MD, and approved May 24, 2006 (received for review November 29, 2005)
| Abstract |
|---|
clonal population structure | genetic recombination | virulence
These properties have led to an unusual population structure for Toxoplasma: in North America and Europe, at least, a majority of the isolates obtained from humans and livestock appear to fall into just three clonal lineages (5–7). Remarkably, these three clonotypes, as well as many of the less common strains, appear to represent the recent merging of two distinct gene pools (8, 9), because among these strains there are just two major allelic types for each locus (9, 10). The success of Toxoplasma as a parasite belies this genetic simplicity: it can infect almost any warm-blooded vertebrate, and it maintains a high prevalence in many host species, including humans. It appears, then, that much of the "success" of Toxoplasma in certain geographical locales is due to the emergence of only a few lines that have dramatically enhanced fitness when compared to other genotypes (8, 9). Similar clonal population structures have been described in other parasitic protozoa (1), and although the mechanisms that led to clonal propagation may differ between species (and the clonal line), identifying the genetic basis for their emergence and eventual dominance in a large geographical area has the potential to shed considerable light on the evolution of eukaryotic pathogens. To do this, a genealogy of the known dominant strains is necessary.
To date, it has not been possible to deduce whether the three dominant lines of Toxoplasma are the result of a few or many sexual crosses between the ancestral lineages. Nor has it been possible to deduce what the two ancestral genotypes looked like or whether these ancestral types might be represented in today's strains. These gaps in our knowledge have been due to a lack of sequence data across the genome and the lack of a genetic map with which to examine linkage disequilibrium. Both limitations have recently been removed (11, 12), and so we have revisited the question of how many generations separate these clonal types. The results suggest a surprising model whereby clone I and clone III are the product of just one or two crosses each between clone II (representing one lineage) and distinct but related clones (representing the other). Given this model, we deduced the other parent in each cross, and used maternally inherited plastid sequence data to sex the parents. Importantly, the data also call into question the true nature of strains previously identified as recombinants or as belonging to the type I clonotype.
| Results |
|---|
For sites with data from all three strains, type I and II SNPs represent a disproportionate fraction (84% vs. the expected 67%) of the SNPs identified. Our initial algorithm required two type III sequences at a site to identify a type III SNP, but only one sequence each from the type I and type II strains. Therefore, it was possible that fewer type III SNPs were identified because there were fewer type III sequences in the data set (see Materials and Methods). To determine the impact of this on the apparent paucity of type III SNPs, we repeated the analysis using criteria that eliminate such biases; i.e., two sequences per strain were required at each site regardless of the putative SNP type. This lowered the number of overall SNPs that could be identified to 2,209, as expected, but 81% (1,779) were still type I or II SNPs (data not shown). The biological significance of this relative lack of type III SNPs is discussed further below, but because the lower number of type III SNPs was not due to sequence coverage biases, we required only one sequence per site in each of the nondivergent strains to obtain the most complete determination of the number and location of SNPs in the three clonotypes.
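As a rough check on how far the stringent counts depart from the one-third-per-type expectation, the proportion can be compared with 2/3 using a normal approximation. This is only an illustrative sketch: the function name is hypothetical, and the test assumes SNPs are independent, which the clustering described in the next section shows is not strictly true.

```python
import math

def fraction_and_z(observed: int, total: int, expected_fraction: float):
    """Return (observed fraction, z-score) for a one-sample proportion test.

    Assumes independent sites (an assumption; SNPs are in fact clustered).
    """
    p_hat = observed / total
    # Standard error of the proportion under the null expectation.
    se = math.sqrt(expected_fraction * (1.0 - expected_fraction) / total)
    return p_hat, (p_hat - expected_fraction) / se

# Counts quoted in the text: 1,779 of 2,209 SNPs were type I or type II.
# Null expectation: two of three SNP types, i.e. 2/3 of all SNPs.
frac, z = fraction_and_z(1779, 2209, 2.0 / 3.0)
print(f"observed fraction = {frac:.3f}, z = {z:.1f}")
```

Because linked SNPs are not independent, the z-score overstates the evidence; the calculation is meant only to show that the excess of type I and II SNPs is not a small-sample effect.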
SNP Types Are Not Randomly Distributed Throughout the Genome. As shown in Fig. 1, large chromosomal regions are dominated by one of the three SNP types. For example, 98.7% (241 of 244) of the SNPs on chromosome XI are type I SNPs, whereas 90% (45 of 50) of those on chromosome IV are type III SNPs. There are also chromosomal regions dominated by type II SNPs. However, these latter regions always contain a larger number of atypical SNPs (i.e., other than type II, in this case) than do regions dominated by type I or type III SNPs. On chromosome X, for example, 67.2% (256 of 381) of the SNPs are type II, whereas type I and III SNPs make up 11.5% and 21.3% of the total, respectively. In the maps of all of the 14 T. gondii chromosomes, we find no clear examples of type II regions that do not have this "contamination" with non-type II SNPs (Figs. 1 and 2). Note that because the SNPs identified here are derived largely from protein-coding regions, analysis of more complete type I and III genomic sequence may reveal that the SNP rate for a given region differs from the results reported here, but there is no reason to expect that the relative numbers will change significantly.
0.65%, for chromosome Ia it is 0.03%. Although Ia is the only whole chromosome with these SNP characteristics, at least one other region shows a similarly low polymorphism percentage: the distal end of chromosome XI (Fig. 1). Specifically, no SNPs were detected in the last 0.63 Mb of chromosome XI, even though 2,046 sites had enough data to allow for SNP detection. We confirmed this by obtaining sequence data from all three major clonotypes from a contiguous 3,652-bp segment in this area (Table 1, which is published as supporting information on the PNAS web site), as well as from another area on the same chromosome predicted to contain comparatively more type I SNPs (3,603 bp; Table 1). In the low polymorphism region, three type I SNPs were identified, whereas in the distal area, we identified 11 SNPs (all type I; see Fig. 4 B and C). Based on existing EST sequence information, both of these regions encompass at least two gene products, and therefore it is unlikely that the difference in polymorphism is due to different gene content. Therefore, it appears that the region at the end of chromosome XI is also a region of low polymorphism. No other regions greater in size than 0.2 Mb that contain at least 1,000 bp of useful data (i.e., data from all three strains) have a similarly low level of polymorphism.
Model for the Creation of the Three Predominant T. gondii Clonotypes. We have used the pattern of SNPs in the T. gondii genome to produce a model of the genetic history of the type I, II, and III T. gondii clonotypes. Most important to this model is the fact that chromosomes contain a maximum of two different SNP type regions (e.g., chromosome VIII; Fig. 1). These regions are found in contiguous segments with, at most, a single, clear transition point, presumably due to recombination during ancestral crosses between distinct genotypes. Because many (6 of 14) chromosomes do not have observable switch points (Fig. 1), and those that do have only one, it is likely that very few crosses occurred between the distinct ancestors that generated these dominant T. gondii clonotypes.
A priori, the simplest model for the creation of types I, II, and III would be that they are F1 progeny of a single cross between two ancestral lineages (9). Such a model would explain the polymorphism patterns observed in type I and III SNP-dominated regions, but would fail to account for the comparatively higher level of "atypical" SNPs in type II-dominated regions. Taking this latter fact into account, the most parsimonious model (in terms of the number of crosses) requires two separate crosses between ancestral versions of the present day type II strain and two distinct strains, which we have named
and
. In this model, the type II parental strain is divergent from strains
and
, which are themselves distinct but more closely related to one another than they are to type II (Fig. 3A). According to this model, a cross between a type II strain (type II1) and strain
resulted in the creation of the type I clonotype and a cross between type II3 and strain
produced the type III clonotype. We also identified five polymorphisms in ESTs derived from the genome of the maternally inherited plastid organelle (3), all of which were type III SNPs. This finding predicts that, in the cross between II1 and
, the macrogamete was provided by II1, whereas in the II3 x
cross, strain
served this role (Fig. 3A).
|
, whereas the type III strain contains the chromosome from type II3 (Fig. 3B). Divergence due to random genetic drift between II3 and modern type II (II2) before the cross could explain the extremely rare non-type I SNPs in these regions. Alternatively, the minor polymorphism in these regions could have arisen after the type I strain first arose, but this seems less likely because there is essentially no intratypic polymorphism (9, 14). Similarly, for type III SNP-dominated regions (e.g., chromosome IV), type I strains contain the sequence from II1, whereas type III strains inherited this region from strain
(Fig. 3B). For predominantly type II SNP regions (e.g., the right end of chromosome VIII), type I and III strains contain sequence from strains
and
, respectively (Fig. 3B). Type II SNPs are the dominant SNP type in these regions because
and
are both more distantly related to the type II parent than they are to one another (Fig. 3A), whereas the presence of significantly more "atypical" SNPs in these particular regions is due to the presence of sequence from strains
and
in the type I and III strains, respectively (Figs. 1 and 2). The model also predicts that chromosomal regions of very low polymorphism should exist where all three modern types have inherited the type II genotype. Chromosome Ia and the extreme right end of chromosome XI are examples of this (Fig. 1).
These scenarios were tested by using genetic cross simulations (see Supporting Text, which is published as supporting information on the PNAS web site) to determine the probability of the observed data. F1 progeny from a single experimental cross (15) and F2 progeny from an experimental backcross (J.P.B., J.P.J.S., M.W.W., and J.C.B., unpublished results) were used to analyze strains with a known pedigree. The peak probability for the number of crosses likely to produce the observed genotype blocks was calculated by comparing a single cross between the parental lines, as proposed, with up to five backcrosses. The peak probability of the observed data in the type III lineage was for a single cross (Fig. 5A, which is published as supporting information on the PNAS web site), as it was for 23 of 26 experimental F1 progeny (Fig. 5B). In contrast, and as expected, the peak probability for all of the experimental backcross progeny was for at least two crosses (i.e., an original cross followed by a backcross to one of the parents, as was in fact performed; Fig. 5B).
For the type I lineage, the peak probability was associated with a two-cross (cross followed by backcross) scenario (Fig. 5A). This observation is consistent with the high percentage of non-type II sequence in the type I genome. However, note that this same overrepresentation of sequence from one parent, and the consequent peak probability at two crosses, was also observed for 3 of 26 experimental F1 progeny known to be derived from a single cross (Fig. 5B; ref. 15). Hence, as discussed further below, the data argue for two crosses being involved in the generation of type I, but in no way do they exclude the possibility that just one cross was involved.
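The intuition behind comparing one cross with a cross plus backcross can be illustrated with a toy Monte Carlo simulation. This is not the simulation described in the Supporting Text; it is a minimal sketch in which the bin resolution, the crossover probability, and the placeholder label "X" for the non-type II parent are all assumptions made for illustration.

```python
import random

N_CHROMOSOMES = 14   # chromosome count given in the text
BINS = 100           # hypothetical resolution per chromosome
P_CROSSOVER = 0.5    # hypothetical chance of a single crossover per chromosome

def meiosis(parent_a, parent_b, rng):
    """Return one haploid progeny genome from a cross of two haploid parents.

    Each chromosome is inherited whole from one parent, or as a mosaic with
    one switch point (at most one crossover, a simplifying assumption).
    """
    child = []
    for chrom_a, chrom_b in zip(parent_a, parent_b):
        if rng.random() < 0.5:
            first, second = chrom_a, chrom_b
        else:
            first, second = chrom_b, chrom_a
        if rng.random() < P_CROSSOVER:
            cut = rng.randrange(1, BINS)
            child.append(first[:cut] + second[cut:])
        else:
            child.append(list(first))
    return child

def fraction_from(genome, label):
    """Fraction of all bins in the genome that carry the given parental label."""
    total = sum(len(chrom) for chrom in genome)
    return sum(chrom.count(label) for chrom in genome) / total

def simulate(n_backcrosses, n_progeny, seed=0):
    """Non-type II fraction in progeny after one cross plus n backcrosses to X."""
    rng = random.Random(seed)
    type_ii = [["II"] * BINS for _ in range(N_CHROMOSOMES)]
    other = [["X"] * BINS for _ in range(N_CHROMOSOMES)]
    fractions = []
    for _ in range(n_progeny):
        progeny = meiosis(type_ii, other, rng)
        for _ in range(n_backcrosses):
            progeny = meiosis(progeny, other, rng)
        fractions.append(fraction_from(progeny, "X"))
    return fractions

f1 = simulate(n_backcrosses=0, n_progeny=2000)
bc1 = simulate(n_backcrosses=1, n_progeny=2000)
print(sum(f1) / len(f1), sum(bc1) / len(bc1))
# Share of single-cross progeny that look backcross-like (mostly non-type II).
print(sum(f > 0.7 for f in f1) / len(f1))
```

Under these assumptions the mean non-type II fraction is about one half after a single cross and about three quarters after one backcross, yet a minority of single-cross progeny still fall in the backcross-like range, which is the point made by the 3 of 26 experimental F1 progeny.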
Identifying Extant Versions of Strains
and
.
Using the proposed model, it is possible to describe the genotype of the hypothetical strains
and
and then ask whether any existing strain matches the prediction. For example, the model predicts that strain
would be essentially identical to type I strains in regions where type I inherited the
sequence, but would be divergent from type I strains at sites where this did not occur (e.g., chromosomes IV and Ia). Moreover, in regions like chromosome IV, strain
would be expected to be most similar, but not identical to, type III strains because these have the chromosome from strain
(which has relatively close ancestry with
; see Fig. 3A). The converse of all this would apply for strain
. Finally, both
and
should be divergent from all three of the major clonotypes at genomic locations where these latter strains are predicted to have the type II sequence (e.g., chromosome Ia).
Applying these criteria to sequence data from a number of strains outside of the type I, II, and III clonotypes (9), strain P89 was identified as a candidate for being strain
: five of five previously studied loci matched the criteria perfectly, including the BSR4 locus where P89 and a type III strain have the same allele, which is very different from the type II allele (Fig. 4J). To corroborate P89 as a candidate for strain
, we PCR-amplified and sequenced six additional, distinguishing loci from it and compared them to representatives of the type I, II, and III clonotypes. This additional genomic sequence was obtained from regions dominated by type I (two loci), type II (one locus) and type III (one locus) SNPs, and two regions of comparatively low polymorphism (chromosome Ia and the right end of chromosome XI). Phylograms for all of these loci, as well as four key loci sequenced previously (9), are shown in Fig. 4. At all loci examined, P89 bears all of the characteristics described above for strain
in the two-cross model (Fig. 3A and Table 1). We also sequenced relevant regions of the plastid of strain P89 and found it to be identical to the type III strain at all five identified SNPs (data not shown), confirming that it provided the macrogamete.
, although, as discussed further below, the close similarity of
and type I may mean that some strains previously designated type I are in fact strain
. This will require further analysis at key, distinguishing loci.
| Discussion |
|---|
. Although the data are also consistent with a single cross giving rise to type I, in this case between type II and a hypothetical strain named
, probability calculations favor a two-cross model; that is, a first cross between type II and
followed by a second cross between an F1 and another strain of T. gondii, either
itself or some other strain. However, invoking such a backcross is not necessary because experimentally derived F1 progeny can be readily isolated that are similarly dominated by one parental genotype (see Fig. 5B). The fact that one cat can produce over a million F1 progeny means that "unlikely" genotypes will certainly arise and, if some of these are particularly fit, they could easily emerge as a dominant strain. Type I could well be an example of such. More information on the genealogy of the type I lineage will depend on identifying a strain bearing the characteristics of strain
, as has been done for strain
in the present study. This work provides the information needed to predict and identify such a strain, perhaps among existing isolates that may not have been examined at multiple loci.
A notable feature of the model presented here is that the type I and III clonotypes both have an ancestral type II parent. Because a majority of strains so far isolated in Europe and North America are of this type (6), this is not surprising. Less clear is why only a limited number of strains emerged from these crosses to become dominant clonotypes today. A reasonable explanation for this is that the dominant lines were clones that had a combination of alleles that conferred the most important "type II-like" traits as well as, perhaps, other traits that allowed them to out-compete both their F1 siblings and their parents. The alleles that confer the key type II-like traits are presumably within chromosomal regions where both dominant clonotypes received the type II sequence (i.e., chromosome Ia). Their ability to out-compete their parents is likely due to (possibly unique) combinations of alleles from the
or
lineages and the type II lineage.
An extant strain (P89) bears all of the genetic features of
in our model. The existence of such a strain not only provides support for the proposed genealogy, but also represents an opportunity to study the genetics behind the emergence of the type III clonotype. Although the phenotypic properties of the type I, II, and III lineages have been extensively studied, much less is known about P89. This strain was first identified as an acutely virulent Toxoplasma isolate based on the fact that feeding mice even a single oocyst (which are produced only via the sexual cycle in cats) resulted in death (16). However, when intermediate host stages [either bradyzoite cysts (16) or tachyzoites (17)] were used to infect mice, P89 was 10-fold less virulent than when mice were infected with oocysts. This stage-specific disparity in virulence is atypical for Toxoplasma: introduction of a single organism of any life stage of the type I virulent strain GT-1 is uniformly lethal in mice (18). These phenotypic differences can now be studied in the broader context of the proposed model, and may provide clues as to the mechanism of the emergence of the type III clonotype as a dominant strain. The ancestral II x
cross can be recreated by crossing P89 with a type II strain in cats, and phenotypic differences among the parents and progeny can be genetically mapped to individual loci using forward genetics (12) and complementation cloning (19). Note that the SNPs described here can be used as genetic markers at a frequency of about one locus per 63 Kb. This, combined with the ability to carry out experimental crosses in cats and to genotype by microarray analysis, will substantially facilitate forward genetic studies.
The model presented here also has significant implications for how the results of previous studies on the population biology of T. gondii should be interpreted. It is clear from these data that the genomic location of genes used for strain-typing will have an enormous effect on the interpretation of strain relationships. For example, strain P89 was originally considered to be a "recombinant" of the type I and type III lineages based on loci derived exclusively from type I and type II SNP dominated regions (6); however, sequence from a chromosome Ia locus in P89 revealed a completely unique allele (Fig. 4D), clearly demonstrating that P89 is not a I x III recombinant. Because >90% of the genome is dominated by either type I or type II SNPs, a majority of randomly selected loci will be from these regions, which do not distinguish type I from
. This was the case in four studies (5, 6, 10, 14) that used multilocus genotyping on T. gondii isolates. Therefore, for future studies that aim to determine whether isolates are "true" members of the three major clonotypes, we recommend that, at a minimum, one locus from each of the four region "styles" be sequenced (e.g., chromosomes XI, X, IV, and Ia; see Table 1). This approach will be applied to isolates that have been categorized as type I because the loci analyzed all fall within regions where the type I clonotype and strain
are predicted to be identical. To distinguish such strains from
, it will be especially important to sequence loci from chromosomes Ia and IV. Doing this in a number of strains previously classified as "type I" might reveal one or more to be, in fact, strain
.
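The typing recommendation above amounts to a coverage check on the loci chosen for a study. The sketch below is illustrative only: it treats each of the four example chromosomes named in the text as standing for one region "style", which is a simplification (for instance, the distal end of chromosome XI is itself a low polymorphism region), and the function and dictionary names are hypothetical.

```python
# Example chromosome for each region "style", following the examples in the text.
# Whole-chromosome assignment is a simplifying assumption for illustration.
REGION_STYLE_BY_CHROMOSOME = {
    "XI": "type I SNP-dominated",
    "X": "type II SNP-dominated",
    "IV": "type III SNP-dominated",
    "Ia": "low polymorphism",
}

def missing_region_styles(locus_chromosomes):
    """Return the region styles not covered by a panel of typing loci.

    locus_chromosomes: iterable of chromosome names, one per typing locus.
    Chromosomes not in the example table are ignored rather than guessed.
    """
    covered = {
        REGION_STYLE_BY_CHROMOSOME[chrom]
        for chrom in locus_chromosomes
        if chrom in REGION_STYLE_BY_CHROMOSOME
    }
    return sorted(set(REGION_STYLE_BY_CHROMOSOME.values()) - covered)

# A panel drawn only from type I and type II SNP-dominated regions
# leaves two styles untested.
print(missing_region_styles(["XI", "X", "XI"]))
```

A panel that returns an empty list meets the stated minimum of one locus per region style; it does not by itself establish that an isolate belongs to one of the three clonotypes.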
From the results presented here, it is clear that a very small number of mating events is sufficient to dramatically alter the population structure of a sexual pathogen like Toxoplasma. The fact that within-clonotype polymorphism is almost nonexistent (8, 9) indicates that these key crosses occurred in very recent times followed by rapid clonal expansion. Toxoplasma is unusual in its ability to undergo such clonal expansion through transmission between intermediate (nonfeline) hosts and/or through self-mating within felines. Despite this distinction, punctuated change through genetic recombination is likely to be a key aspect of the evolution of this, and many other, sexual pathogens.
| Materials and Methods |
|---|
Sequence Assembly. All EST sequences were first compared en masse to the 10x coverage T. gondii genome by using BLASTN (20), and grouped based on their genomic location. Assemblies of each sequence group were carried out by using CAP3 software (maximum gap length, 50; overlap length cutoff, 21; ref. 21). To increase the number of identified SNPs we also included the 10x T. gondii genomic sequence (from a type II strain; www.toxodb.org) in the final CAP3 analysis.
SNP Detection. SNPs were identified in T. gondii sequence assemblies by using a custom Perl script (snp_miner_new_gen.pl; http://boothroydlab.stanford.edu/snp_maps). Consensus nucleotides for each strain were determined if >70% of the sequences for that strain were in agreement. Type I, II, or III SNPs were identified at sites with reliable consensus nucleotides for each of the three canonical strains if the divergent strain consensus was confirmed by at least two sequences. For example, if a potential type III SNP was identified (type III strain different from types I and II), at least two sequences were required from the type III strain, whereas one each from the type I strain and the type II strain was sufficient. We also attempted to detect SNPs where all three clonotypes were different, for which we required at least two sequences per strain.
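The calling rules in this paragraph can be sketched as follows. This is not the snp_miner_new_gen.pl script, whose internals are not described here; it is a minimal Python illustration of the stated thresholds, with hypothetical function names and an assumed input format (a list of base calls per strain at one aligned site).

```python
from collections import Counter

CONSENSUS_THRESHOLD = 0.70  # ">70% of the sequences for that strain", per the text
STRAINS = ("I", "II", "III")

def consensus(bases):
    """Return the consensus base if more than 70% of reads agree, else None."""
    if not bases:
        return None
    base, count = Counter(bases).most_common(1)[0]
    return base if count / len(bases) > CONSENSUS_THRESHOLD else None

def classify_site(reads):
    """Classify one aligned site as a type I, II, or III SNP, or None.

    reads: dict mapping 'I', 'II', 'III' to a list of base calls (assumed format).
    Sites where all three strains differ are not handled in this sketch.
    """
    cons = {strain: consensus(reads.get(strain, [])) for strain in STRAINS}
    if None in cons.values():
        return None  # no reliable consensus for at least one strain
    if len(set(cons.values())) != 2:
        return None  # monomorphic, or all three strains differ
    for strain in STRAINS:
        others = [cons[s] for s in STRAINS if s != strain]
        if others[0] == others[1] and others[0] != cons[strain]:
            # The divergent strain must be confirmed by at least two sequences.
            supporting = reads[strain].count(cons[strain])
            return strain if supporting >= 2 else None
    return None

print(classify_site({"I": ["A"], "II": ["A"], "III": ["G", "G"]}))
print(classify_site({"I": ["A"], "II": ["A"], "III": ["G"]}))
```

The first call reports a type III SNP; the second returns None because the divergent strain is supported by only one sequence, which is the asymmetry that the stringent two-sequences-per-strain reanalysis in Results was designed to remove.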
Mapping SNPs to the T. gondii Genome. The orientation and position of 63 of the largest scaffolds representing 95% of the 65-Mb genome have been determined by using BAC end sequences and genetic mapping (12). To determine the location of each SNP within the genome, the consensus sequences from the assembly were aligned to scaffolds by using BLASTN (20). To make SNP plots, the genome was examined in 500-bp windows, and if overlapping sequence data were available from each of the three clonotypes for at least 50 sites, the number of each type of SNP (I, II, or III) in that window was determined. If one SNP type was in the majority for that window, the window was colored either red (type I), green (type II), or blue (type III); if SNPs were present but no type was in the majority, the window was colored brown. For most plots, if no SNPs were identified, the window was not displayed. The height of each bar from a SNP-containing window was calculated linearly based on the overall percent polymorphism for that 500-bp window. For a subset of chromosomes, additional plots were created where regions that contained adequate data from all three strains but did not contain SNPs were colored gray. All of the maps presented here are available in an interactive format at http://boothroydlab.stanford.edu/snp_maps.
To calculate polymorphism percentages in regions of the genome dominated by different SNP types, plots were analyzed by eye and segments were grouped into four chromosomal region "styles" (see Results). Within each chromosomal segment, data where at least two sequences per strain were available were considered "informative" and were used to calculate polymorphism percentages of each SNP type within that segment. This more stringent criterion (i.e., requiring two EST reads per site) was used to remove any biases associated with different numbers of ESTs available for the different strains.
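The window-coloring rule for the SNP plots can be written out as a short function. This is a sketch under stated assumptions, not the plotting code used for the published maps: "majority" is taken to mean more than half of the SNPs in the window, and percent polymorphism is computed over the sites with three-strain data rather than over all 500 bp; both readings are assumptions, and the function names are hypothetical.

```python
WINDOW_BP = 500   # window size given in the text
MIN_SITES = 50    # minimum sites with data from all three clonotypes
COLORS = {"I": "red", "II": "green", "III": "blue"}

def window_color(snp_counts, informative_sites):
    """Return the plot color for one window, or None if the window is not scored.

    snp_counts: dict mapping 'I', 'II', 'III' to SNP counts in the window.
    informative_sites: number of sites with data from all three clonotypes.
    """
    if informative_sites < MIN_SITES:
        return None  # too little three-strain data to evaluate the window
    total = sum(snp_counts.values())
    if total == 0:
        return "gray"  # shown only in the subset of plots that display SNP-free windows
    for snp_type, count in snp_counts.items():
        if count > total / 2:  # strict majority (assumption)
            return COLORS[snp_type]
    return "brown"  # SNPs present but no single type in the majority

def percent_polymorphism(snp_counts, informative_sites):
    """Bar height: SNPs per 100 informative sites (denominator is an assumption)."""
    return 100.0 * sum(snp_counts.values()) / informative_sites

print(window_color({"I": 3, "II": 1, "III": 0}, 120))
print(window_color({"I": 1, "II": 1, "III": 0}, 120))
```

If plurality rather than strict majority was intended, only the comparison inside the loop would change; the text does not say which was used.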
DNA Sequencing. All DNA samples were obtained from strains grown in our laboratory, with the exception of strain P89 (16) which was obtained from Marie-Laure Dardé (Limoges, France). PCR products from two separate reactions were cloned and sequenced. For some P89 loci two separate linearly amplified DNA templates (Genomiphi; Amersham Pharmacia) were used in PCRs, cloning, and sequencing.
| Acknowledgements |
|---|
| Footnotes |
|---|

To whom correspondence should be addressed. E-mail: john.boothroyd{at}stanford.edu
Author contributions: J.P.B. and J.C.B. designed research; J.P.B. and B.R. performed research; J.P.B., B.R., J.P.J.S., J.W.A., M.B., I.P., D.S.R., L.D.S., and M.W.W. contributed new reagents/analytic tools; J.P.B., B.R., J.P.J.S., and J.C.B. analyzed data; and J.P.B. and J.C.B. wrote the paper.
Conflict of interest statement: No conflicts declared.
This paper was submitted directly (Track II) to the PNAS office.
Data deposition: The sequences reported in this paper have been deposited in the GenBank database (accession nos. DQ676958–DQ676977 and DQ768301–DQ768304).
© 2006 by The National Academy of Sciences of the USA
| References |
|---|