Liu et al. 10.1073/pnas.0704258104.
Fig. 3. Number of gene islands as a function of the number of genes per island. Gene islands at the beginning or end of a BAC were not included because their boundaries could not be determined. Gene island size was evaluated both with and without hypothetical genes.
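The counting rule in this legend can be sketched as follows. The sketch is hypothetical. It assumes each BAC is reduced to an ordered list of feature labels ("gene", "hyp" for a hypothetical gene, anything else for intervening sequence such as a transposable element), and that a gene island is a maximal run of adjacent genes. The definition actually used for Fig. 3 may differ. Runs touching either end of the BAC are discarded. When hypothetical genes are excluded, they are simply removed from the list, which is also an assumption; they could instead be treated as island breakers.

```python
from collections import Counter
from typing import Dict, List


def island_sizes(features: List[str], include_hypothetical: bool = True) -> List[int]:
    """Sizes of internal gene islands along one BAC (hypothetical island definition)."""
    if include_hypothetical:
        # Hypothetical genes count as ordinary genes.
        labels = ["gene" if f == "hyp" else f for f in features]
    else:
        # Assumption: excluded hypothetical genes are dropped, not treated as breaks.
        labels = [f for f in features if f != "hyp"]
    sizes = []
    run_start = None
    # The trailing "END" sentinel closes a run that reaches the last feature.
    for i, label in enumerate(labels + ["END"]):
        if label == "gene":
            if run_start is None:
                run_start = i
        elif run_start is not None:
            touches_bac_end = run_start == 0 or i == len(labels)
            if not touches_bac_end:  # boundary islands are skipped
                sizes.append(i - run_start)
            run_start = None
    return sizes


def island_histogram(bacs: List[List[str]], include_hypothetical: bool = True) -> Dict[int, int]:
    """Number of islands for each island size, pooled over all BACs."""
    counts = Counter()
    for features in bacs:
        counts.update(island_sizes(features, include_hypothetical))
    return dict(sorted(counts.items()))


if __name__ == "__main__":
    # Toy BAC, not real data.
    toy_bac = ["gene", "TE", "gene", "hyp", "TE", "gene", "TE", "gene", "gene"]
    print(island_histogram([toy_bac]))                              # {1: 1, 2: 1}
    print(island_histogram([toy_bac], include_hypothetical=False))  # {1: 2}
```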
Figs. 4-8. Graphic representation of five gene-free BACs; see legend to Fig. 2 for details.
Fig. 9. Histogram of LTR retrotransposon insertion dates. Gene-containing regions correspond to BACs AF123535 (Adh1 region, from ref. 4), AF448416 (Bz1 region, from ref. 5), AY555142 (Orp1 region, from ref. 6), and AY664413, AY664414, and AY664415 (9002, 9008, and 9009 regions; three random gene-containing BACs from ref. 7). For the Bz1 region, however, insertion dates were estimated using data from Brunner et al. (7), because they are not listed in ref. 5. Gene-poor regions correspond to the six BACs analyzed (AC147789, AC148081, AC148159, AC148172, AC148161, and AC147809). For both sets, dates were estimated from LTR divergence (for the gene-containing regions, as described in the cited papers) and converted to million years (My) using a substitution rate of 1.3 × 10⁻⁸ substitutions per site per year (8).
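A minimal sketch of the date conversion follows. It rests on two assumptions that the legend does not spell out. First, LTR divergence K is taken to be the Kimura two-parameter distance between the two aligned LTRs of an element, K = -1/2 ln(1 - 2P - Q) - 1/4 ln(1 - 2Q), where P and Q are the proportions of transition and transversion differences (ref. 8 is Kimura's method paper). Second, the common convention T = K / (2r) is used, because the two LTRs are identical at insertion and then diverge independently. The rate r = 1.3 × 10⁻⁸ is taken from the legend.

```python
import math

# Rate from the Fig. 9 legend (substitutions per site per year).
SUBSTITUTION_RATE = 1.3e-8
PURINES = set("AG")
PYRIMIDINES = set("CT")


def k2p_distance(ltr5: str, ltr3: str) -> float:
    """Kimura two-parameter distance between two pre-aligned LTRs.

    Gaps and ambiguous bases are skipped (a simplifying assumption).
    """
    if len(ltr5) != len(ltr3):
        raise ValueError("the two LTRs must be aligned to equal length")
    sites = transitions = transversions = 0
    for a, b in zip(ltr5.upper(), ltr3.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue
        sites += 1
        if a == b:
            continue
        pair = {a, b}
        if pair <= PURINES or pair <= PYRIMIDINES:
            transitions += 1  # A<->G or C<->T
        else:
            transversions += 1
    if sites == 0:
        raise ValueError("no comparable sites")
    p = transitions / sites
    q = transversions / sites
    # Undefined (math domain error) when divergence is saturated.
    return -0.5 * math.log(1 - 2 * p - q) - 0.25 * math.log(1 - 2 * q)


def insertion_age_my(ltr5: str, ltr3: str, rate: float = SUBSTITUTION_RATE) -> float:
    """Insertion age in million years, assuming T = K / (2r)."""
    return k2p_distance(ltr5, ltr3) / (2 * rate) / 1e6


if __name__ == "__main__":
    five_prime = "A" * 100
    three_prime = "G" + "A" * 99  # toy data: one transition in 100 aligned sites
    print(round(insertion_age_my(five_prime, three_prime), 2))  # 0.39
```

With one transition in 100 aligned sites, K is about 0.0101 and the estimated age is about 0.39 My.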
SI Text
Nonrandom Genome Component Distribution Across the Maize Genome
Several observations illustrate the nonrandom distribution of LTR retrotransposon families across the maize genome: (i) Huck is dispersed along all maize chromosomes but is more abundant in centromeric regions than in the rest of the genome (1) and is absent from knobs (1-3); (ii) Grande, Prem-2, and Zeon are abundant in knobs (1, 2), whereas Cinful, Grande, Tekay/Prem-1, Opie, and Prem-2/Ji are detected in most knobs and are poorly represented around centromeres (3); and (iii) Cinful is more abundant in knobs containing the 350-bp TR-1 repeat than in euchromatin (3). Together, these results suggest that different LTR retrotransposon families differ in their insertion and/or retention biases.
SI Methods
Choice and Annotation of Gene-Free BACs
The six BAC sequences were annotated as follows:
(1) Detecting known maize LTR retrotransposons. The nucleotide sequence of each BAC was used as the query in a BlastN search against an in-house maize LTR retrotransposon database (http://data.genomics.purdue.edu/~pmiguel/projects/retros). When the boundaries of an element were unclear (often because the genomic copy and the reference copy were too distantly related), the sequence was compared with itself using BlastN2 (www.ncbi.nlm.nih.gov/blast/bl2seq/wblast2.cgi), and the alignment between the two LTRs was retrieved. This usually defined the LTR boundaries more accurately for recently inserted elements, whose two LTRs are still highly conserved in sequence.
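The boundary-refinement step can be approximated by parsing a self-comparison in BLAST tabular format. This is a sketch, not the authors' procedure. It assumes the self-comparison was exported as standard 12-column tabular output, for example from BLAST+ `blastn -query bac.fa -subject bac.fa -outfmt 6`, a command-line counterpart of the bl2seq web tool. The `max_span` limit on element length is a hypothetical parameter. The function discards the trivial self-match, inverted matches, and the mirror copy of each pair, then keeps the highest-scoring pair of same-strand copies as the two LTRs.

```python
from typing import Iterable, NamedTuple, Optional, Tuple


class LtrPair(NamedTuple):
    ltr5: Tuple[int, int]  # 1-based inclusive coordinates on the BAC
    ltr3: Tuple[int, int]
    identity: float        # percent identity reported by BLAST


def best_ltr_pair(tabular_lines: Iterable[str], max_span: int = 25_000) -> Optional[LtrPair]:
    """Pick the highest-scoring same-strand repeat pair from a self-comparison.

    Expects 12-column BLAST tabular rows: qseqid sseqid pident length mismatch
    gapopen qstart qend sstart send evalue bitscore.
    """
    best: Optional[LtrPair] = None
    best_score = float("-inf")
    for line in tabular_lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        pident = float(cols[2])
        qstart, qend, sstart, send = map(int, cols[6:10])
        bitscore = float(cols[11])
        if sstart > send:
            continue  # minus-strand match: an inverted repeat, not an LTR pair
        if sstart <= qend:
            continue  # trivial self-match, overlapping copies, or mirror image of a pair
        if send - qstart + 1 > max_span:
            continue  # copies too far apart for one element (hypothetical limit)
        if bitscore > best_score:
            best_score = bitscore
            best = LtrPair((qstart, qend), (sstart, send), pident)
    return best
```

Because a self-comparison reports every repeat pair twice, keeping only hits whose subject copy lies downstream of the query copy avoids double counting.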
(2) Detecting unknown maize LTR retrotransposons and tandem repeats. New LTR retrotransposons were identified by the presence of two colinear LTR sequences. A BlastN2 comparison (www.ncbi.nlm.nih.gov/blast/bl2seq/wblast2.cgi) of each remaining region against itself was used to retrieve these aligned fragments. The region flanked by the putative LTRs was then examined for a primer-binding site (PBS) and a polypurine tract (PPT). BlastX and Psi-Blast searches against the Viridiplantae database and a tBlastX search against the maize LTR retrotransposon database were performed to classify each element as Copia- or Gypsy-like. New elements were named according to the SanMiguel nomenclature system (www.genomics.purdue.edu/~pmiguel/name_elements/examples). Because this self-comparison reveals direct and inverted repeats located close to each other, it also allowed detection of tandem repeats.
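Once a candidate LTR pair has been found, the PBS and PPT checks in step 2 can be sketched as two window scans. The PBS lies just downstream of the 5′ LTR and is complementary to the 3′ end of a host tRNA. The PPT is a purine-rich stretch just upstream of the 3′ LTR. The window sizes, minimum purine-run length, mismatch allowance, and the PBS motif itself are hypothetical parameters that a user would have to supply. The sketch does not reproduce how the authors performed these checks.

```python
from typing import Optional, Tuple

PURINES = set("AG")


def find_ppt(seq: str, ltr3_start: int, window: int = 30,
             min_len: int = 10) -> Optional[Tuple[int, int]]:
    """Longest purine run (>= min_len) in the `window` bp upstream of the 3' LTR.

    Coordinates are 0-based; ltr3_start is the first base of the 3' LTR and the
    returned (start, end) is half-open.
    """
    best = None
    run_start = None
    for i in range(max(0, ltr3_start - window), ltr3_start + 1):
        # Position ltr3_start acts as a sentinel that closes any open run.
        if i < ltr3_start and seq[i].upper() in PURINES:
            if run_start is None:
                run_start = i
        elif run_start is not None:
            length = i - run_start
            if length >= min_len and (best is None or length > best[1] - best[0]):
                best = (run_start, i)
            run_start = None
    return best


def find_pbs(seq: str, ltr5_end: int, motif: str, window: int = 30,
             max_mismatches: int = 1) -> Optional[int]:
    """First start of `motif` (<= max_mismatches) within `window` bp after the 5' LTR.

    ltr5_end is the 0-based index just past the last base of the 5' LTR. `motif`
    is a user-supplied sequence complementary to a candidate primer tRNA's 3' end.
    """
    motif = motif.upper()
    stop = min(len(seq), ltr5_end + window) - len(motif) + 1
    for start in range(ltr5_end, stop):
        candidate = seq[start:start + len(motif)].upper()
        if sum(a != b for a, b in zip(candidate, motif)) <= max_mismatches:
            return start
    return None
```

A candidate element would pass this check only if both scans return a hit; elements lacking either signal would be flagged for manual inspection.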
(3) Detecting other known transposable elements. The remaining regions were used as queries in a BlastN search against a Poaceae repeat database (http://tigrblast.tigr.org/euk-blast) to detect known transposable elements such as MITEs, other DNA transposons, and Helitrons. To complete the search for Helitrons, an ab initio search for Helitron structures was carried out on the 74 BACs (L. Yang and J.L.B., unpublished results).
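To illustrate what a structure-based Helitron search looks for, the sketch below flags candidate Helitron 3′ ends. A candidate is a CTRR terminus (R = A or G) preceded by a short hairpin-forming palindrome, two hallmarks of Helitron 3′ termini. This is not the unpublished method cited above. The stem length, loop range, and upstream window are hypothetical. A realistic search would also require a 5′ TC terminus and would score hairpin stability rather than demand a perfect stem.

```python
import re
from typing import List

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(s: str) -> str:
    """Reverse complement of a DNA string (non-ACGT characters are left as-is)."""
    return s.upper().translate(_COMPLEMENT)[::-1]


def has_hairpin(region: str, stem: int = 6, min_loop: int = 3, max_loop: int = 8) -> bool:
    """True if `region` holds a perfect stem-loop: stem, loop, then the stem's reverse complement."""
    region = region.upper()
    for loop in range(min_loop, max_loop + 1):
        span = 2 * stem + loop
        for p in range(len(region) - span + 1):
            left = region[p:p + stem]
            right = region[p + stem + loop:p + span]
            if right == revcomp(left):
                return True
    return False


def helitron_3prime_candidates(seq: str, upstream: int = 30, **hairpin_kw) -> List[int]:
    """0-based positions just past each CTRR that has a hairpin within `upstream` bp before it."""
    seq = seq.upper()
    hits = []
    # Zero-width lookahead so that overlapping CTRR motifs are all reported.
    for m in re.finditer(r"(?=CT[AG][AG])", seq):
        ctrr_start = m.start()
        region = seq[max(0, ctrr_start - upstream):ctrr_start]
        if has_hairpin(region, **hairpin_kw):
            hits.append(ctrr_start + 4)
    return hits
```

Scanning the reverse complement of each BAC in the same way would cover elements on the other strand.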
(4) Detecting previously undescribed repeats. A final BlastN search of the remaining regions against genome survey sequence (GSS) data from maize inbred B73 (www.ncbi.nlm.nih.gov/BLAST) identified additional maize repeat sequences. A sequence was considered a possible repeat if it had at least two matches in this database at an E-value threshold of 10⁻¹⁰.
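The repeat criterion in step 4 can be expressed as a filter over BLAST tabular output (12-column format, E-value in column 11). Two choices here are assumptions because the text does not specify them. "At least two matches" is read as matches to at least two distinct GSS reads, and the threshold is applied as E ≤ 10⁻¹⁰.

```python
from collections import defaultdict
from typing import Iterable, List

E_VALUE_CUTOFF = 1e-10  # threshold stated in step 4


def candidate_repeats(tabular_lines: Iterable[str], cutoff: float = E_VALUE_CUTOFF,
                      min_hits: int = 2) -> List[str]:
    """Query IDs matching at least `min_hits` distinct GSS reads at E <= cutoff."""
    subjects = defaultdict(set)
    for line in tabular_lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        query, subject, evalue = cols[0], cols[1], float(cols[10])
        if evalue <= cutoff:
            # A set, so several HSPs against one read count once.
            subjects[query].add(subject)
    return sorted(q for q, s in subjects.items() if len(s) >= min_hits)
```

Counting distinct reads rather than HSPs keeps a single GSS read that aligns in several pieces from qualifying a region on its own.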
1. Ananiev EV, Phillips RL, Rines HW (1998) Proc Natl Acad Sci USA 95:13073-13078.
2. Ananiev EV, Phillips RL, Rines HW (1998) Genetics 149:2025-2037.
3. Mroczek RJ, Dawe RK (2003) Genetics 165:809-819.
4. SanMiguel P, Tikhonov A, Jin YK, Motchoulskaia N, Zakharov D, Melake-Berhan A, Springer PS, Edwards KJ, Lee M, Avramova Z, et al. (1996) Science 274:765-768.
5. Fu HH, Dooner HK (2002) Proc Natl Acad Sci USA 99:9573-9578.
6. Ma J, SanMiguel P, Lai J, Messing J, Bennetzen JL (2005) Genetics 170:1209-1220.
7. Brunner S, Fengler K, Morgante M, Tingey S, Rafalski A (2005) Plant Cell 17:343-360.
8. Kimura M (1980) J Mol Evol 16:111-120.
9. Ma J, Bennetzen JL (2004) Proc Natl Acad Sci USA 101:12404-12410.