Yeo et al. 10.1073/pnas.0404901101.
Table 2. Method of enumeration of nonoverlapping occurrences of an oligonucleotide
| Sequence | Pattern(s) | Count |
| GGGgcccGGGccc | GGG | 2 |
| GGGgCCCGGGCCC | GGG or CCC | 4 |
| GGGGCCcGGGccc | GGG or GCC | 3 |
| AATCAAacaATCAAA | AATCAA or ATCAAA | 2 |
Consider the sequences ggggcccgggccc and aatcaaacaatcaaa; uppercase letters mark the occurrences that are counted. Note in the last two entries that overlapping occurrences are not counted.
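A minimal sketch of the counting rule in Table 2, assuming a greedy left-to-right scan that, at each position, takes the first listed pattern that matches there and then resumes immediately after it. The function name is hypothetical and the source does not spell out the exact procedure. This scan reproduces all four counts, although in the last row it counts the second AATCAA rather than the ATCAAA marked in uppercase; the count is the same.

```python
def count_nonoverlapping(seq, patterns):
    """Count nonoverlapping occurrences of any pattern with a greedy left-to-right scan."""
    seq = seq.upper()
    pats = [p.upper() for p in patterns if p]  # ignore empty patterns
    count, i = 0, 0
    while i < len(seq):
        # Take the first listed pattern that starts at position i, if any.
        hit = next((p for p in pats if seq.startswith(p, i)), None)
        if hit is None:
            i += 1
        else:
            count += 1
            i += len(hit)  # resume after the match so occurrences cannot overlap
    return count

s1, s2 = "ggggcccgggccc", "aatcaaacaatcaaa"
print(count_nonoverlapping(s1, ["GGG"]))               # 2
print(count_nonoverlapping(s1, ["GGG", "CCC"]))        # 4
print(count_nonoverlapping(s1, ["GGG", "GCC"]))        # 3
print(count_nonoverlapping(s2, ["AATCAA", "ATCAAA"]))  # 2
```

Skipping past each match is what prevents, for example, the GGG at positions 2 to 4 of ggggcccgggccc from being counted after the GGG at positions 1 to 3.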
Fig. 6. (A) Binned frequencies of the lengths of Fugu and mouse introns. We distributed 64,313 Fugu introns and 74,908 mouse introns that were <2,000 bases long into 50 bins. (B) Binned scores from the linear discriminant analysis (LDA) for an independent test set of 32,156 Fugu introns and 37,454 mouse introns (trained on an independent training set of the same size; details in Materials and Methods and Supporting Text). (C) Combined scores (LDA + intron length) for the independent test set of introns. In the independent test set, 85% of Fugu introns were correctly classified as Fugu introns and 88% of mouse introns were correctly classified as mouse introns. Genes 1, 2, and 3 are the Fugu RCN1, HD, and ARP3 genes, respectively. Lines indicate where the scores of introns 1–5 of gene 1 (1.1–1.5), introns 1–7 of gene 2 (2.1–2.7), and introns 1–11 of gene 3 (3.1–3.11) fall relative to the overall distributions. A minus sign (–) indicates intron retention, a plus sign (+) indicates correct splicing, and ± indicates partial splicing of the corresponding Fugu intron in transgenic mice or mouse cell lines, as reported in the literature or found in our experimental analyses. The inset table lists the lengths (in bp) of these Fugu introns.
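Panel A bins intron lengths below 2,000 bases into 50 bins. A minimal sketch of that step, assuming equal-width 40-bp bins starting at 0 and frequencies normalized to sum to 1 within each species; the caption gives neither the bin edges nor the normalization, and `binned_frequencies` is a hypothetical name.

```python
def binned_frequencies(lengths, max_len=2000, n_bins=50):
    """Fraction of lengths (< max_len) falling in each of n_bins equal-width bins."""
    width = max_len / n_bins  # 40 bp per bin with the defaults (assumed binning)
    counts = [0] * n_bins
    for length in lengths:
        if 0 <= length < max_len:  # introns of max_len or longer are excluded
            # min() guards against float rounding pushing a value into bin n_bins.
            counts[min(int(length // width), n_bins - 1)] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else counts

# Usage: compute one profile per species and overlay them, as in panel A.
fugu_profile = binned_frequencies([62, 75, 88, 410, 1320])
```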
Fig. 7. (A) Classical splice signals [5′ss (Left), branch (Center), 3′ss (Right)] of Homo sapiens, Mus musculus, Danio rerio, Fugu rubripes, and Ciona intestinalis. (B) Distribution of putative branch signals upstream of 3′ss in human, mouse, and Fugu. Each point represents the midpoint of a 30-bp window.
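Panel B summarizes branch-signal positions in sliding windows. A minimal sketch of such a profile, assuming that putative branch signals are matches to the YTNAY branch-point consensus (Y = C or T, N = any base), that each input string ends at the last intron nucleotide before the 3′ss, and that 30-nt windows advance in 10-nt steps. The caption specifies only the 30-bp window and the midpoint convention; the pattern, step size, and function names are hypothetical.

```python
import re

# Hypothetical branch pattern: YTNAY. The lookahead lets overlapping candidates match.
BRANCH_RE = re.compile(r"(?=[CT]T[ACGT]A[CT])")

def branch_distances(seq):
    """Distance (nt) of each candidate branch A from the 3' end of seq (1 = last base)."""
    seq = seq.upper()
    # The branch A is the 4th base of the 5-mer, i.e. index m.start() + 3.
    return [len(seq) - (m.start() + 3) for m in BRANCH_RE.finditer(seq)]

def window_profile(seqs, max_dist=150, window=30, step=10):
    """(window midpoint, candidate count) for windows of distance upstream of the 3'ss."""
    dists = [d for s in seqs for d in branch_distances(s)]
    profile = []
    for lo in range(1, max_dist - window + 2, step):
        hi = lo + window  # the window covers distances lo .. hi - 1
        count = sum(lo <= d < hi for d in dists)
        profile.append((lo + (window - 1) / 2, count))
    return profile
```

Plotting count against midpoint for each species gives one curve per organism.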
Fig. 8. Frequency difference plots of RESCUE-ESE-predicted ESEs in human, mouse, and Fugu exons.
Fig. 9. Histograms of intron lengths (log scale). Distributions were modeled as mixtures of two log-normal distributions (red traces).
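A mixture of two log-normal distributions on intron length is equivalent to a mixture of two normal distributions on log(length). A minimal sketch of fitting such a mixture with expectation maximization (EM), assuming quartile-based starting values; the caption does not say which fitting procedure was used, the function names are hypothetical, and the demo data are synthetic rather than taken from the paper.

```python
import math
import random

def _log_normal_pdf(x, mu, sd):
    """Log density of the normal distribution N(mu, sd**2) at x."""
    return -math.log(sd) - 0.5 * math.log(2 * math.pi) - 0.5 * ((x - mu) / sd) ** 2

def fit_two_lognormals(lengths, iters=100):
    """EM fit of a two-component log-normal mixture to positive lengths.

    Returns (weights, means, sds); means and sds are on the natural-log scale.
    """
    x = [math.log(v) for v in lengths]
    n = len(x)
    xs = sorted(x)
    grand_mean = sum(x) / n
    sd_all = math.sqrt(sum((v - grand_mean) ** 2 for v in x) / n)
    # Start the components at the lower and upper quartiles of log(length).
    w, mu, sd = [0.5, 0.5], [xs[n // 4], xs[3 * n // 4]], [sd_all, sd_all]
    for _ in range(iters):
        # E-step: responsibility of component 0 for each point, from log densities.
        r = []
        for v in x:
            d = (math.log(w[1]) + _log_normal_pdf(v, mu[1], sd[1])) - (
                math.log(w[0]) + _log_normal_pdf(v, mu[0], sd[0]))
            # Numerically stable form of 1 / (1 + exp(d)).
            r.append(1 / (1 + math.exp(d)) if d < 0 else math.exp(-d) / (1 + math.exp(-d)))
        # M-step: update mixing weights, means, and standard deviations.
        n0 = min(max(sum(r), 1e-9), n - 1e-9)
        n1 = n - n0
        mu0 = sum(ri * v for ri, v in zip(r, x)) / n0
        mu1 = sum((1 - ri) * v for ri, v in zip(r, x)) / n1
        var0 = sum(ri * (v - mu0) ** 2 for ri, v in zip(r, x)) / n0
        var1 = sum((1 - ri) * (v - mu1) ** 2 for ri, v in zip(r, x)) / n1
        w = [n0 / n, n1 / n]
        mu = [mu0, mu1]
        sd = [max(math.sqrt(var0), 1e-3), max(math.sqrt(var1), 1e-3)]
    return w, mu, sd

# Synthetic demo (not data from the paper): a short and a long intron component.
rng = random.Random(0)
demo = [math.exp(rng.gauss(4.4, 0.3)) for _ in range(2000)]
demo += [math.exp(rng.gauss(7.5, 1.0)) for _ in range(1000)]
weights, means, sds = fit_two_lognormals(demo)
```

Exponentiating a fitted mean gives the median length of that component in bp.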
Fig. 10. Frequency difference plots of RESCUE-ESE-predicted ESEs in human exons flanked on both sides by introns of lengths <125 bp, 125–1,000 bp, or >1,000 bp.
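A minimal sketch of the exon grouping used in Fig. 10, assuming that both 125 and 1,000 bp belong to the middle class and that exons whose two flanking introns fall in different classes are left out. The function names and class labels are hypothetical.

```python
def length_class(n):
    """Length class of an intron of n bp, using the Fig. 10 cutoffs."""
    if n < 125:
        return "<125"
    if n <= 1000:  # assumed inclusive upper bound for the middle class
        return "125-1000"
    return ">1000"

def exon_class(upstream_intron_len, downstream_intron_len):
    """Class shared by both flanking introns, or None if the classes differ."""
    up = length_class(upstream_intron_len)
    down = length_class(downstream_intron_len)
    return up if up == down else None

print(exon_class(90, 110))   # <125
print(exon_class(100, 2000)) # None: flanking introns fall in different classes
```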
Fig. 11. Frequency difference plots of RESCUE-ISE-predicted ISEs for human (GGG, CCC) and Fugu (ACAC, GTGT) in introns of three length groups, as indicated.
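The caption names the motif set used for each species. A minimal sketch of tallying those motifs in a single intron, assuming abundance is expressed as nonoverlapping occurrences per 1,000 nt; the caption does not say how the plotted frequencies were computed, and the dictionary and function names are hypothetical.

```python
import re

# Motif sets named in the Fig. 11 caption.
ISE_MOTIFS = {"human": ["GGG", "CCC"], "fugu": ["ACAC", "GTGT"]}

def ise_density_per_kb(intron, motifs):
    """Nonoverlapping motif occurrences per 1,000 nt of intron sequence."""
    # An alternation pattern scanned by re.finditer never returns overlapping matches.
    pattern = re.compile("|".join(re.escape(m) for m in motifs))
    hits = sum(1 for _ in pattern.finditer(intron.upper()))
    return 1000 * hits / len(intron) if intron else 0.0

print(ise_density_per_kb("acacacgtgt", ISE_MOTIFS["fugu"]))  # 200.0
```

Averaging this value within each intron length group would give one number per group to compare; the group boundaries are those indicated in the figure and are not reproduced here.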
Table 5. Domain conservation of U1 snRNP-specific proteins
| Accession no. | Protein name | Human | Mouse | Fugu |
| Predicted domain | | | | |
| P08621 | U1-70kD | 1 RRM | 1 RRM | No RRM |
| P09012 | U1 A | Pro-rich, 2 RRM | Pro-rich | 2 RRM |
| P09234 | U1 C | ZF, Pro-rich | ZF, Pro-rich | ZF |
| Ensembl ID | | | | |
| P08621 | U1-70kD | 104852 | 030810 | 124896 |
| P09012 | U1 A | 077312 | 040518 | 128942 |
| P09234 | U1 C | 124562 | 024217 | 120439 |
Swiss-Prot accession numbers, predicted domains, and the last six digits of the Ensembl gene identifiers are listed; the full identifiers carry the prefixes ENSG00000 (human), ENSMUSG00000 (mouse), and SINFRUG00000 (Fugu). snRNP, small nuclear ribonucleoprotein; see Table 8 for descriptions of the domain abbreviations.
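The footnote lists only the last six digits of each Ensembl gene identifier. A minimal sketch that rebuilds the full identifiers from the species prefixes and the digits given in Table 5; the dictionary and function names are hypothetical.

```python
# Ensembl gene-ID prefixes given in the Table 5 footnote.
PREFIX = {"human": "ENSG00000", "mouse": "ENSMUSG00000", "fugu": "SINFRUG00000"}

# Last six digits from the "Ensembl ID" rows of Table 5, keyed by Swiss-Prot accession.
SUFFIXES = {
    "P08621": {"human": "104852", "mouse": "030810", "fugu": "124896"},  # U1-70kD
    "P09012": {"human": "077312", "mouse": "040518", "fugu": "128942"},  # U1 A
    "P09234": {"human": "124562", "mouse": "024217", "fugu": "120439"},  # U1 C
}

def full_gene_id(accession, species):
    """Concatenate the species prefix with the six listed digits."""
    return PREFIX[species] + SUFFIXES[accession][species]

for acc in SUFFIXES:
    print(acc, [full_gene_id(acc, sp) for sp in ("human", "mouse", "fugu")])
```

Storing the digits as strings keeps leading zeros such as those in 030810 and 077312.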
Fig. 12. Splicing phenotype of Fugu Arp3N1 expressed in human 293T cells includes unspliced introns 4 and 9 (a), truncated exon 5 (b), and skipped exon 7 (c).
Fig. 13. Rescue of Fugu ARP3 intron 4 in human 293T cells. (A) Mutants of intron 4 were generated by insertion of G triples (see main text). (B) RT-PCR of mRNA. Lanes: 1, PLHC-1 transfected with wild-type Fugu Arp3N1 (PLHC-1); 2, 293T transfected with wild-type Fugu Arp3N1 (WT); 3, 293T transfected with a mutant containing a single G triple insert (M2F8); 4, 293T transfected with a mutant containing two G triple inserts (M5F2).
Fig. 14. Frequency difference plots of RESCUE-ISE-predicted ISEs in five chordates.