EVOPRINTER, a multigenomic comparative tool for rapid identification of functionally important DNA
- *Neural Cell-Fate Determinants Section, National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, MD 20892; and ‡Office of the Scientific Director, Intramural Research Program, National Institute of Mental Health, National Institutes of Health, Bethesda, MD 20892
Communicated by Marshall Nirenberg, National Institutes of Health, Bethesda, MD, August 10, 2005
Fig. 1.
EVOPRINTER analysis of the vertebrate achaete-scute homolog 1 locus. (A) A linear cartoon of the 15-kb Ascl1 locus used in the EvoP analysis, indicating the approximate locations of sequences shown in B and C (box represents transcribed region with the red-colored inner box indicating the ORF). (B and C) EvoPs were generated with 15 kb of mouse (B) or human (C) reference DNA that included the Ascl1-transcribed sequence plus 9 kb of upstream and 3 kb of downstream flanking intergenic sequence. We searched the following test genomes: human, chimpanzee, rhesus monkey, dog, rat, mouse, opossum, chicken, and X. tropicalis. Invariant MCSs, shared by all test species, are identified with uppercase black letters. (B.1) An EvoP, using all test species, identifies clustered MCSs within the tissue- and region-specific regulatory region of the murine Mash1 CNS enhancer. Shown is the upper DNA strand of 1.9 kb corresponding to nucleotides -8692 to -6784 5′ to the murine Mash1-transcribed region. The solid-lined box denotes the 1,158-bp CNS enhancer region, and the dashed-lined inner box identifies the 472-bp domain that contains multiple tissue/region-specific regulatory elements (21). (B.2) The MCSs that are gained when X. tropicalis is excluded from the analysis are shown as uppercase red letters; when both the X. tropicalis and chicken genomes are excluded from the EvoP, the additional MCSs are shown as blue lowercase letters. Nonconserved nucleotides are indicated as lowercase gray letters. (C) EvoP analysis of the ash1 proximal promoter region, transcribed sequence, and flanking 3′ intergenic sequence reveals conserved MCSs that contain cis-regulatory and protein-encoding sequences. Shown is 3.9 kb of the human hash1 gene (nucleotides -687 to +3235). The hatched-line box denotes a 259-bp region that contains the proximal enhancer and tissue-specific repressor regulatory elements (22).
Underlined sequences are HES-1 DNA-binding sites, red-boxed sequences are potential binding sites for IA-1, and a potential FAST-1 binding site is highlighted with red-colored letters (see Results and Discussion). The 5′ UTR of the hash1 transcript is highlighted in light blue, the transcript ORF is shown with red background (the HLH coding sequence is marked with yellow background), and the 3′ untranslated sequence is indicated with a dark blue background. Yellow nucleotides in the 3′ trailer represent potential binding sites for 13 different microRNAs (see Results and Discussion). Note that the 3′ UTR is interrupted by a 359-bp intron (annotation according to the Ensembl sequence database).
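The letter-case convention used in these EvoPs (uppercase for bases shared by all test species, lowercase for nonconserved bases) can be illustrated with a minimal Python sketch. It is not the EVOPRINTER implementation: it assumes, for simplicity, that each test sequence has already been projected onto reference coordinates, with "-" wherever no alignment exists. The function name `evoprint` and the toy sequences are hypothetical.

```python
def evoprint(reference, aligned_species):
    """Return the reference with bases conserved in every test species in
    uppercase and all other bases in lowercase.

    Assumption (not from the paper): each test sequence has the same length
    as the reference and is already in reference coordinates, with '-'
    marking positions that have no aligned base.
    """
    ref = reference.upper()
    for name, seq in aligned_species.items():
        if len(seq) != len(ref):
            raise ValueError(f"{name}: length differs from the reference")
    out = []
    for i, base in enumerate(ref):
        # A position is conserved only if every test species matches it.
        conserved = all(seq[i].upper() == base for seq in aligned_species.values())
        out.append(base if conserved else base.lower())
    return "".join(out)


# Toy data, invented for illustration only.
reference = "ACGTTTTAGTCA"
species = {
    "sp1": "ACGTTTTAGTGA",  # differs from the reference at position 10
    "sp2": "ACCTTTTAGT-A",  # differs at position 2, unaligned at position 10
}

all_species = evoprint(reference, species)                    # "ACgTTTTAGTcA"
without_sp2 = evoprint(reference, {"sp1": species["sp1"]})    # "ACGTTTTAGTcA"
```

Dropping `sp2` restores position 2 to uppercase, which mirrors in miniature how excluding the more distant genomes in B.2 recovers additional MCSs.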
Fig. 2.
EVOPRINTER analysis of the Drosophila Kr gene. The 7.7 kb (upper strand) of a 12-kb genomic EvoP that corresponds to the D. melanogaster reference DNA (nucleotides -4,207 to +3,531) is shown. The EvoP was generated from BLAT readouts of the reference DNA aligned with D. simulans, D. yakuba, D. ananassae, D. pseudoobscura, and D. virilis DNAs. MCSs that are shared by all species are shown as uppercase black nucleotides. Boxed sequences represent the cis-regulatory regions described in Results and Discussion. Underlined MCSs within the CD1/Kr730 box contain known transcription-factor binding sites (35, 36). Underlined sequences in the AD2/NS2 box contain potential HB (TTTTAGT) and PDM1 (ATTTGCAT) DNA-binding sites, respectively. The D. melanogaster Kr transcribed sequence is annotated according to FlyBase as follows: 5′ untranslated leader (light blue), protein-encoding sequence (red; Zn finger domain, yellow), and the 3′ untranslated sequence (dark blue). Note that the protein-encoding sequence is interrupted by a 373-bp intron. The underlined nucleotides in the 3′ untranslated transcribed sequence correspond to E-box bHLH binding sites. EVODIF analysis of the individual test species revealed that the first two nucleotides of the first E-box (red letters) are shared by all tested species except for D. yakuba and D. ananassae.
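As an illustration of how candidate binding sites such as the HB (TTTTAGT) and PDM1 (ATTTGCAT) motifs might be located within conserved sequence, the Python sketch below scans an EvoP-style string and keeps only motif matches that lie entirely in uppercase (conserved) bases. This is an assumed post-processing step for illustration, not a described feature of EVOPRINTER. The function name `find_conserved_motifs` and the toy sequence are hypothetical, and only the strand as given is searched.

```python
def find_conserved_motifs(evoprint_seq, motifs):
    """Return (motif_name, start) pairs for motif matches that fall entirely
    within uppercase (conserved) bases of an EvoP-style string.

    Only the strand as given is searched; reverse-complement matches are
    ignored in this simplified sketch.
    """
    upper = evoprint_seq.upper()
    hits = []
    for name, motif in motifs.items():
        start = upper.find(motif)
        while start != -1:
            window = evoprint_seq[start:start + len(motif)]
            # Keep the match only if every base in it is conserved.
            if window.isupper():
                hits.append((name, start))
            start = upper.find(motif, start + 1)
    return sorted(hits, key=lambda hit: hit[1])


# Motif sequences are those quoted in the caption; the EvoP string is toy data.
motifs = {"HB": "TTTTAGT", "PDM1": "ATTTGCAT"}
toy_evop = "acgTTTTAGTccATTTGCATttttagtg"

hits = find_conserved_motifs(toy_evop, motifs)
# hits == [("HB", 3), ("PDM1", 12)]; the lowercase "ttttagt" at position 20
# matches the HB motif but is skipped because it is not conserved.
```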
Fig. 3.
EVOPRINTER identifies a small peptide gene not annotated in the Berkeley Drosophila Genome Project (BDGP) database. EvoP analysis of the intergenic 12.9-kb region between the Drosophila Appl and vnd genes uncovered a small peptide gene conserved in D. melanogaster, D. simulans, D. yakuba, D. ananassae, D. pseudoobscura, D. virilis, and D. mojavensis species. Shown is 1.5 kb of the D. melanogaster reference species (nucleotides -9,154 to -7,670 5′ to the vnd transcribed region). MCSs shared by all species are identified by uppercase black nucleotides. EVODIF analysis of individual species revealed that one 5′ upstream MCS was not conserved in D. mojavensis but is present in all other species (lowercase red nucleotides). The underlined sequence in this MCS represents a consensus Hb DNA-binding motif. A protein EvoP of the encoded 40-aa peptide is also shown. Aligned with the codons, invariant amino acid residues are shown as uppercase black letters, and residues that are different in at least one of the six species tested above are shown as lowercase gray letters.








