From the Cover
PLANT BIOLOGY
Gene movement by Helitron transposons contributes to the haplotype variability of maize
*The Waksman Institute, Rutgers, The State University of New Jersey, Piscataway, NJ 08855; and
Department of Plant Biology, Rutgers, The State University of New Jersey, New Brunswick, NJ 08901
Edited by Susan R. Wessler, University of Georgia, Athens, GA, and approved May 9, 2005 (received for review April 11, 2005)
| Abstract |
|---|
genome variability | Helitrons | bz locus | corn | polymorphisms
High intraspecific haplotype variability is not restricted to maize, having been recently described in barley, another species with a large amount of repetitive DNA. A comparison of the Rph7 locus in two barley cultivars established that colinearity was restricted to <35% of the two sequences, principally because of differences in retrotransposon blocks (9). Interestingly, a gene encoding a truncated helicase was present in only one of the two cultivars. On the other hand, no cases of gene acquisition or loss were found in a comparison of two different orthologous regions between rice subspecies (10, 11). This finding suggests that the type of variation detected in maize and barley may not be a general feature of plant genomes. The functional significance of the "plus-minus" type of variation is also unclear, because the genes that vary among accessions of the same species are present in multiple copies (3), and many of them are clearly pseudogenes or gene fragments (8, 9). Independent of its generality or functional significance, the described variation raises an important question: How did it arise? Evidence presented here indicates that the apparent intraspecific violations of genetic colinearity in maize and, probably, barley, arise from the movement of genes or gene fragments by Helitrons, a recently discovered type of eukaryotic transposon (12).
Helitrons were found by computational analysis of genomic sequences from Arabidopsis, rice, and Caenorhabditis elegans (12) and were later reported to be the causative agents of two spontaneous mutations in maize (13, 14). These transposons account for 2% of the genomes of Arabidopsis and C. elegans but had escaped detection because they lack structural features, such as terminal inverted repeats or target site duplications, that can be easily detected by computer-assisted searches (15). Instead, the transposons have 5'-TC and 3'-CTRR termini, carry a 16- to 20-bp palindrome of variable sequence ~10-12 bp upstream of the 3' terminus, and insert invariably between host nucleotides A and T. The putative autonomous elements reconstructed from the Arabidopsis and rice genome sequences are large (5.5-15 kb) and encode proteins with homology to a DNA helicase and an ssDNA-binding protein. Although these proteins are not similar to known transposases, the predicted helicases share motifs with the replication-initiation proteins of rolling-circle (RC) replicons, which catalyze both cleavage and ligation of DNA. Hence, Helitrons were postulated to transpose by RC replication. However, the vast majority of Helitrons are nonautonomous elements that vary greatly in size and do not encode the set of proteins encoded by the putative autonomous element. Kapitonov and Jurka (12) argued that the Helitron's helicase and ssDNA-binding protein were likely to have evolved from host proteins recruited by ancestral RC transposons, because of the conservation of their exon-intron structure and their similarity to known host proteins. Along those lines, Feschotte and Wessler (15) proposed that the acquisition of host genes by RC elements must have occurred frequently enough to permit the eventual capture of useful genes or exons and viewed them as potential "exon-shuffling machines."
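The structural hallmarks listed above can be turned into a simple checklist. The following is a minimal sketch, not the method used in this study: the function names, the 6-bp stem, the 4-bp maximum loop, the 40-bp search window, and the toy sequences are all illustrative assumptions.

```python
import re

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_hairpin(seq: str, stem: int = 6, max_loop: int = 4):
    """Return (start, end) of the rightmost perfect stem-loop in seq, or None.

    The stem length and loop size are assumed values chosen for illustration.
    """
    for start in range(len(seq) - 2 * stem, -1, -1):
        left = seq[start:start + stem]
        for loop in range(max_loop + 1):
            right_start = start + stem + loop
            right = seq[right_start:right_start + stem]
            if len(right) == stem and right == reverse_complement(left):
                return start, right_start + stem
    return None


def helitron_features(element: str, left_flank: str, right_flank: str, tail: int = 40):
    """Check a candidate insertion against the Helitron hallmarks in the text."""
    tail_seq = element[-tail:]
    hairpin = find_hairpin(tail_seq)
    return {
        "starts_with_TC": element.startswith("TC"),
        # R is the IUPAC code for a purine (A or G).
        "ends_with_CTRR": re.search(r"CT[AG][AG]$", element) is not None,
        "inserted_between_A_and_T": left_flank.endswith("A") and right_flank.startswith("T"),
        "hairpin_near_3_end": hairpin is not None,
        # Distance from the end of the hairpin to the 3' terminus of the element.
        "hairpin_to_3_end_bp": None if hairpin is None else len(tail_seq) - hairpin[1],
    }


# Toy element (hypothetical sequence, not a real Helitron).
element = "TC" + "TACTAACTGATCAA" + "GTGCCC" + "AAA" + "GGGCAC" + "ATTTTTA" + "CTAG"
features = helitron_features(element, left_flank="GGCA", right_flank="TTGC")
```

A real search would need to tolerate imperfect stems and variable palindrome lengths, which is part of why such elements are hard to find computationally.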
| Materials and Methods |
|---|
10-kb fragment containing the four genes (cdl1, hypro2, hypro3, and rlk) that are present in the bz genomic region of McC, but not of B73, was used as query to search the genome survey sequence (GSS) maize database of GenBank (Fig. 1A). Most of the sequences in this database are from the inbred B73. One of the highest-scoring hits in the BLASTN analysis was a 937-bp bacterial artificial chromosome (BAC) end sequence (GenBank accession no. CL205862) that had homology to the coding region of the predicted gene hypro2. That BAC end came from a clone (b0570E18) that had been anchored in the maize physical map (www.genome.arizona.edu). Based on the physical map, a subset of clones that overlapped with b0570E18 was selected. PCR experiments with template DNA from the selected BAC clones and primers designed according to the BAC end sequences were conducted to confirm the presence of the hypro2 sequence in these BACs and to determine which end of the b0570E18 corresponded to the hypro2 sequence. Clone b0511I12 was chosen for sequencing because the physical map and PCR experiment suggested that it had the hypro2 sequence in the middle. The BAC clone was sequenced by the shotgun sequencing strategy on an Applied Biosystems 3730xl DNA sequencer and analyzed as described in ref. 2.
Characterization of a hypro2 cDNA Clone. The B73 cDNA clone (GenBank accession no. CO522311) was obtained from the University of Arizona (www.genome.arizona.edu/orders). A transposon minilibrary was made by following the manufacturer's (Finnzymes, Helsinki) instructions, and 10 randomly selected clones were sequenced from both ends. Sequencing reactions were performed with the ABI PRISM BigDye Terminator Cycle Sequencing Ready Reaction kit V3.1 (Applied Biosystems) and analyzed on an Applied Biosystems 3730xl DNA sequencer.
PCR Amplification and DNA Sequencing. The 5' end of HelA2 and its flanking sequence were amplified from total genomic DNA of the McC line by using primer pairs cdl3/2hp-up1 and 2hp8/2hp-up2. The corresponding sequences at the 3' end were amplified with primer pairs cdl-up3/cdl7. All amplification reactions were carried out with the Expand High Fidelity PCR system (Roche). The PCR products were sequenced as described above. The following primers were used: cdl-3, ACATGGTTCCATCCACGCTT; 2hp-up1, GGTCTGTCGACTACGTTCCTT; cdl-up3, GCATCGCTGCATCAATGTCGAA; cdl7, GCAGTACAGGAGACTCGTA; 2hp8, TACAGGCACGCAGGAGCGTAGAA; and 2hp-up2, GGCTAACTGGCATGCTCTGTA.
Sequence Data Deposition. The sequences described here have been deposited in GenBank under the following accession nos.: B73 clone b0511I12, AC159612; B73 hypro2 cDNA, DQ000639; and McC HelA2, DQ003206.
| Results |
|---|
475,000 B73 BAC clones, many of which have been anchored to the genetic map (6).
The Plus-Minus Variation of the bz Genomic Region Arises from Gene Movements Mediated by Helitrons. Analysis of the BAC sequence revealed that a gene island consisting of eight genes was flanked on either side by retrotransposon clusters of the type originally described at the Adh1 locus (19) and later found to be of general occurrence in the maize genome (2, 20, 21). As shown in Fig. 2, at the left end of the gene island, there is a 6,000-bp fragment that includes cdl and hypro2 and has >99% identity with a corresponding 5,869-bp fragment from the Bz-McC genomic region. The sequences differ only in 11 SNPs, nine short indels of 1-7 bp, and one larger indel of 133 bp. Thus, the intraspecies breakdown of colinearity reported in the maize bz genomic region (3) can be partly explained by the movement of cdl and hypro2 from one location of the maize genome to another. Evidence presented below supports the argument that the movement is mediated by Helitrons. Hence, we have designated the element carrying cdl and hypro2 as HelA, the version of the element in 9S as HelA-1, and the one in 5S as HelA-2.
An examination of their end sequences revealed that both fragments have typical Helitron features (Fig. 3), beginning with TC and ending with CTAG, sequence motifs that occur at the 5' and 3' termini, respectively, of Helitrons. Both have palindromic sequences 11 bp upstream of the 3' terminus and are flanked by an A at the 5' terminus and a T at the 3' terminus. An alignment of their terminal sequences with those of previously described maize Helitrons reveals strong sequence conservation. Their 5'-terminal 12 bp, which are identical to each other, match those of the Helitron insertions in the sh2-7527 mutation (13) and the ba1-Ref mutation (14, 16) at 11 of the 12 positions. Their 3'-terminal 30 base pairs, which include the palindrome and differ from each other at only one site, are also conserved: the HelA-1 terminus matches those of the Helitrons in sh2 and ba1 at 26 and 24 positions, respectively. More importantly, the sequences flanking the insertion in the McC bz genomic region are identical to sequences flanking an AT dinucleotide in the bz genomic region of B73 (3) and of other lines, such as Mo17 (8) and A188 (Q. Wang and H.K.D., unpublished work), which also lack cdl and hypro2 at that location. The most parsimonious explanation for the polymorphism detected in today's modern inbreds is that the unoccupied site was never visited. This explanation is also in agreement with the current view that RC transposition does not result in excision of the element from the donor site (22). However, because the actual mechanism of transposition of Helitrons is not known, the sequence present in the bz genomic region of B73, Mo17, and A188 will be referred to as the "vacant site," to leave open the possibility that a Helitron may have resided there at one time in the past and to distinguish it from the footprint-bearing "empty sites" produced by the excision of most class II DNA transposons.
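The vacant-site argument can be illustrated with a short sketch: if the sequences flanking the element in the occupied haplotype are joined directly, the joined string should occur in a haplotype that lacks the element, with the junction falling inside an AT dinucleotide. The function name, the 10-bp flank length, and the toy sequences below are assumptions for illustration only; real comparisons would use alignments that tolerate SNPs and indels.

```python
def find_vacant_site(occupied: str, element_start: int, element_end: int,
                     reference: str, flank: int = 10):
    """Locate the vacant site in `reference` for an element in `occupied`.

    `element_start` and `element_end` are 0-based, end-exclusive coordinates of
    the element in `occupied`. Returns the index in `reference` where the
    element would be inserted, or None if the joined flanks are not found.
    """
    if element_start < flank or element_end + flank > len(occupied):
        return None  # not enough flanking sequence to search with
    left = occupied[element_start - flank:element_start]
    right = occupied[element_end:element_end + flank]
    pos = reference.find(left + right)  # exact match only in this sketch
    return None if pos == -1 else pos + flank


# Toy sequences (hypothetical): the element sits between an A and a T.
left_flank = "CGGATCCGTTGACA"
right_flank = "TGCCAAGTCGGATC"
element = "TCTACTAAGGCTAG"
occupied = left_flank + element + right_flank
reference = "GG" + left_flank + right_flank + "CC"

junction = find_vacant_site(occupied, len(left_flank),
                            len(left_flank) + len(element), reference)
# The two host nucleotides that straddle the junction in the vacant site.
dinucleotide = reference[junction - 1:junction + 1]
```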
Upon discovering that a Helitron accounted partly for the difference in apparent gene content of the bz genomic region of different inbred lines, an attempt was made to determine whether the plus-minus variation for the other gene sequences unique to the McC haplotype could also be explained by Helitron movement. Sequence alignments were performed between the variable genomic regions of McC and those of B73 and Mo17, both of which lack all four genes in the region (3, 8). The latter comparison proved fruitful. A 2,712-bp sequence was identified to be present in McC and absent from the corresponding location in Mo17. Similar to the sequence found in HelA, this sequence has typical Helitron features, so it has been called HelB. It begins with TC, ends with CTAG, is flanked by A and T residues at the 5' and 3' ends, respectively, and has an 8-bp palindromic sequence 10 bp upstream of the 3' end (Fig. 3). Its termini are less related to those of previously described maize Helitrons than to those of HelA and appear to be closer to those of rice Helitron2 (12). An exact vacant site was identified in Mo17 but not in B73 or A188 (Q. Wang and H.K.D., unpublished work), haplotypes that may have suffered a deletion in this region (Fig. 4). The sequence separating the two Helitrons in McC is short, just 892 bp. Together, then, HelA and HelB can account for all the genes found to be present in the bz genomic region of McC but not of B73 (Fig. 1C).
The Genic Content of the Helitrons. Fu et al. (1) concluded, on the basis of the differential hybridization of specific gene probes to RNA from either wild-type or a deletion mutant, that at least some of the genes now known to be carried by Helitrons were expressed. However, because these genes are members of multiple gene families, unambiguous evidence for their expression can be provided only by the isolation of their respective cDNAs.
A cDNA clone (GenBank accession no. CO522311) with 100% identity to the B73 hypro2 gene was identified in the maize EST database, sequenced in its entirety, and confirmed to be derived from the hypro2 gene of HelA-2 on the basis of its almost complete sequence identity (only one mismatch in 1,586 bp). The inferred hypro2 exon-intron structure is shown in Fig. 1D. The transcript begins close to the 5' end of HelA, spans more than 3.5 kb of HelA sequence, and helps to define five exons, with exons 2 and 3 being separated by a large 1.8-kb intron. Conceptual translation of the transcript revealed premature stop codons in all reading frames, indicating that the cDNA clone does not encode a functional protein. Furthermore, careful examination of the gene's exon-intron structure showed it to be chimeric: exons 3-5 correspond to exons 6-8 of a putative glycosyl hydrolase (GH) in rice [National Center for Biotechnology Information (NCBI) protein database accession no. BAD36734] and Arabidopsis (accession no. BAB09947); exon 2 corresponds to exon 2 of the GH, and exon 1 is of unknown origin. A 1.8-kb intron separates exon 2 from exon 6, but the sequences for the intervening GH exons 3-5 are completely missing from the genomic DNA. Thus, although hypro2 is expressed, it is clearly a pseudogene.
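The reading-frame check described above can be sketched in a few lines. This is an illustrative outline under stated assumptions, not the analysis pipeline used here: it scans only the three forward frames of a cDNA given in the sense orientation, the function names are hypothetical, and the test sequences are toy strings rather than the hypro2 transcript.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def first_stop_per_frame(cdna: str) -> dict:
    """Map each forward frame (0, 1, 2) to the index of its first stop codon.

    A frame with no stop codon maps to None.
    """
    result = {}
    for frame in range(3):
        result[frame] = None
        for i in range(frame, len(cdna) - 2, 3):
            if cdna[i:i + 3] in STOP_CODONS:
                result[frame] = i
                break
    return result


def interrupted_in_all_frames(cdna: str) -> bool:
    """True if every forward frame contains at least one stop codon."""
    return all(pos is not None for pos in first_stop_per_frame(cdna).values())


# Toy sequences (hypothetical).
open_in_one_frame = "ATGAAATAGCC"   # frame 2 has no stop codon
closed_everywhere = "TAGATGAATAAC"  # a stop codon in each frame

stops_a = first_stop_per_frame(open_in_one_frame)
stops_b = first_stop_per_frame(closed_everywhere)
```

Whether a stop is "premature" depends on where it falls relative to the expected coding region, so positions are reported rather than a single yes/no answer.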
No cDNAs corresponding to the three other genic sequences in HelA or HelB have been recovered, either from McC cDNA libraries (1) or by RT-PCR using mRNA templates from a diversity of McC and B73 tissues (data not shown). A reexamination of the structure of the predicted cdl gene shows that it, too, consists of the terminal exons of a gene with multiple exons. The cdl sequence is homologous to exons 8-10 from a family of genes encoding a cell-division-like protein in rice (NCBI protein database accession no. BAD53799) and Arabidopsis (accession no. AAN86163). These exons, separated by their respective introns, are present at the 3' end of HelA-1, in the orientation opposite that of hypro2. As originally annotated (1), the hypro3 gene spanned sequences that are now known to be split between HelA and HelB, but a reexamination of the sequence reveals that hypro3 is shorter and contained entirely within HelA. The hypro3 fragment is actually found within the large intron of hypro2, in the opposite transcriptional orientation, and contains coding information for a truncated protein with high similarity to a rice putative serine protease (NCBI protein database accession no. BAD82560) and an Arabidopsis hypothetical protein (accession no. BAB11289). The genes encoding both of these proteins consist of five exons, of which exons 2, 3, and 4 and part of 5 are present in HelA. Finally, the rlk gene of HelB contains only part of the first exon of a two-exon gene annotated as a putative receptor-like protein kinase in rice (NCBI protein database accession no. BAA94519) and Arabidopsis (accession no. AAO64924). Thus, HelA and HelB resemble the two previously described Helitron elements of maize in carrying only gene fragments.
| Discussion |
|---|
900 bp away from each other at a location just distal to bz in 9S. Both Helitrons are present in line McC and absent from lines B73 (3) and Mo17 (8). Both share the following structural features of Helitrons (12): (i) they begin with a TC (5' end) and end with CTAG (3' end), (ii) they have a 10- to 16-bp palindrome ~11 bp upstream of the 3' end, and (iii) they are inserted at an AT host dinucleotide. This site is referred to as the occupied site in McC and as the vacant site in B73 and Mo17. The larger of the two Helitrons, HelA-1, is 5,869 bp long and carries in it fragments from three separate genes (cdl, hypro2, and hypro3) that have homology to rice and Arabidopsis genes. As documented in Results, none of these genes is complete. cdl and hypro3 are in one orientation and hypro2 is in the opposite orientation. Interestingly, hypro3 is contained within the long second intron of hypro2. An almost identical copy of this Helitron, termed HelA-2, was isolated from a chromosome 5 BAC clone of B73. HelA-2 is slightly longer (6,000 bp) as a consequence of a 133-bp indel polymorphism, yet its overall sequence is >99% identical to that of HelA-1. A 1.6-kb hypro2 transcript from HelA-2 was identified in the B73 EST collection, but the presence of premature stop codons in every reading frame indicates that this transcript does not encode a functional protein. Unlike most other transposons, Helitrons do not have clear terminal features, such as terminal inverted repeats or target-site duplication, that mark their limits. The availability, in this instance, of two closely related Helitron copies from two different genomic locations greatly facilitated the definition of the ends of the HelA transposon and the identification of the AT target dinucleotide in the B73 vacant site.
The smaller of the two Helitrons, HelB, is 2,712 bp long and carries in it a fragment of an rlk gene that has close homologues in rice and Arabidopsis. The vacant AT site for this Helitron is missing in B73 but present in Mo17. Thus, the ends of HelB could be determined only from an alignment of the McC and Mo17 genomic sequences. This comparison highlights the value of the vertical sampling of one genomic region for the precise identification of genomic sequences, such as those from complex Helitrons, that lack the strong structural features required for global genomic computational searches. A schematic diagram outlining the possible origin of an McC-type haplotype from a Mo17-type progenitor haplotype is presented in Fig. 5.
Fu and Dooner (3) speculated that, if found to be common at other locations in the genome, the plus-minus variability uncovered at bz could contribute to the phenomenon of heterosis or hybrid vigor in maize. They reasoned that genes absent from certain polymorphic locations of the genome might be complemented by copies of those genes at other polymorphic locations. On the other hand, Song and Messing (7) showed that the level of expression of most zein genes that were present in one line and absent in another did not show simple additive patterns when the two lines were intercrossed. Recently, Brunner et al. (8) have found that plus-minus variability may be common throughout the maize genome. However, evidence presented here indicates that the sequences displaying that kind of variability are often gene fragments ferried around the genome by Helitron transposons. Therefore, this plus-minus variability, unlike that of the z1C1 locus, would contribute to heterosis, as envisioned originally, only when intact genes rather than gene fragments have been captured by Helitrons. Alternatively, the Helitron-mediated movement of large blocks of DNA into the vicinity of genes could lead to differences in gene expression from the placement of those genes within a novel sequence context.
Gene Movement by Helitrons. The Helitrons described here differ from those originally described in Arabidopsis, rice, and C. elegans, in that they lack sequences similar to replication protein A (RPA) and DNA helicases (12). They resemble, instead, other Helitron insertions previously described in maize. The Helitrons in the sh2-7527 (13) and ba1-Ref (14, 16) mutants and in a BAC clone of the 19-kDa zein gene family (16) are heterogeneous in size and contain portions of at least 11 different genes, none of which is related to RPA or DNA helicases. If, as suggested for helicase and RPA (12), these genes are being recruited from the host, the requirements for gene capture by nonautonomous maize Helitrons do not appear to be very stringent.
The capture of genes or gene fragments from the host by transposable elements has been documented in several plants (28-32). The maize Helitrons share several features with the Pack-MULEs (mutator-like elements) recently described in rice (33): Most of the sequences captured are gene fragments, not complete genes; a single element can contain fragments from multiple genes; sequence acquisition is at the DNA level, as indicated by the conservation of introns; and transcripts can initiate within the element (e.g., hypro2) or outside of the element, producing chimeric transcripts (13, 16). Based on the above features of Pack-MULEs, their abundance, and the large fraction (one-fifth) that contain fragments from multiple loci, Jiang et al. (33) have argued that Pack-MULEs have the potential to create new plant genes through the multiplication, rearrangement, and fusion of fragments from multiple genomic loci. Although maize Helitrons have just begun to be characterized (13, 16), their properties shared with Pack-MULEs suggest that they have a similar potential.
The mechanism of host-sequence acquisition by Helitrons is not known, but Feschotte and Wessler (15) have proposed a model based on the observation that the transposition of RC replicons, such as IS91 (22), has minimal cis requirements. The model postulates that RC replication initiates correctly at the 5' end but that the normal 3' palindrome termination signal is bypassed, leading to the replication and capture of adjacent sequences until a new cryptic downstream palindrome is encountered that can serve as a terminator. The capture of either complete or partial gene sequences by this mechanism would depend on where in a gene the Helitron was inserted initially. Helitrons that lose their mobilization machinery would become nonautonomous elements, although, possibly, the majority of nonautonomous elements has a different origin. Given the apparently minimal requirements for transposition, nonautonomous Helitrons could be very small, as are the nonautonomous Ds1 and dTph elements in maize and petunia, respectively (34, 35). In fact, the abundant nonautonomous Helitron elements Helitrony2 and Helitrony3 of C. elegans are just 249 and 195 bp long, respectively. It is conceivable that most of the complex Helitrons of maize are not derived from autonomous elements but from these much more numerous defective elements, which could readily pick up adjacent host sequences in the presence of an autonomous element.
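The read-through model can be pictured with a toy calculation. The sketch below is only a schematic of the idea as summarized above, assuming that candidate termination signals are already known as coordinates on the donor sequence; the function name, the `bypassed` parameter, and the sequences are hypothetical and do not model the actual biochemistry.

```python
def transposed_segment(donor: str, five_prime_start: int,
                       terminator_ends: list, bypassed: int = 0):
    """Return the segment copied under a read-through model of RC replication.

    Replication is assumed to start at `five_prime_start` and to stop at the
    first downstream terminator that is not bypassed. `terminator_ends` holds
    0-based, end-exclusive coordinates of candidate terminators in `donor`.
    Returns None if every available terminator is bypassed.
    """
    ends = sorted(e for e in terminator_ends if e > five_prime_start)
    if bypassed >= len(ends):
        return None
    return donor[five_prime_start:ends[bypassed]]


# Toy donor site (hypothetical): host A, element, adjacent host DNA that
# happens to end in a cryptic CTRR-like terminator, host T.
original_element = "TCAAACTAG"
adjacent_host = "GGGCCCGGG" + "CTAA"
donor = "A" + original_element + adjacent_host + "T"
terminators = [1 + len(original_element),
               1 + len(original_element) + len(adjacent_host)]

normal_copy = transposed_segment(donor, 1, terminators, bypassed=0)
read_through_copy = transposed_segment(donor, 1, terminators, bypassed=1)
```

In this picture the read-through copy carries the adjacent host sequence as new internal content and ends at the cryptic terminator, which then serves as its 3' end.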
| Acknowledgements |
|---|
| Footnotes |
|---|
This paper was submitted directly (Track II) to the PNAS office.
Abbreviations: BAC, bacterial artificial chromosome; GSS, genome survey sequence; RC, rolling-circle.
Data deposition: The sequences reported in this paper have been deposited in the GenBank database (accession nos. AC159612, DQ000639, and DQ003206).
To whom correspondence should be addressed. E-mail: dooner{at}waksman.rutgers.edu.
© 2005 by The National Academy of Sciences of the USA