Variation in sequence and organization of splicing regulatory elements in vertebrate genes

Yeo et al. 10.1073/pnas.0404901101.

Supporting Information

Files in this Data Supplement:

Supporting Table 2
Supporting Figure 6
Supporting Figure 7
Supporting Table 3
Supporting Figure 8
Supporting Text
Supporting Figure 9
Supporting Table 4
Supporting Figure 10
Supporting Figure 11
Supporting Table 5
Supporting Table 6
Supporting Table 7
Supporting Table 8
Supporting Figure 12
Supporting Figure 13
Supporting Figure 14




Table 2. Method of enumeration of nonoverlapping occurrences of an oligonucleotide

Sequence           Pattern(s)          Counts
GGGgcccGGGccc      GGG                 2
GGGgCCCGGGCCC      GGG or CCC          4
GGGGCCcGGGccc      GGG or GCC          3
AATCAAacaATCAAA    AATCAA or ATCAAA    2

The example sequences are ggggcccgggccc and aatcaaacaatcaaa; uppercase letters mark the occurrences that are counted. In the last two entries, occurrences that overlap an already counted occurrence are not counted.
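The counting rule in Table 2 can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function name `count_nonoverlapping` is hypothetical, and the greedy left-to-right scan is an assumption, since the table does not state the scan order. Under that assumption the sketch reproduces the four counts in the table, although the specific occurrence it selects can differ from the one capitalized in the table (as in the last row) without changing the count.

```python
def count_nonoverlapping(sequence, patterns):
    """Count nonoverlapping occurrences of any pattern in sequence.

    Assumption: the scan is greedy and left to right. Once a pattern
    matches at a position, the scan resumes after the matched bases,
    so no base is counted in two occurrences.
    """
    seq = sequence.upper()
    pats = [p.upper() for p in patterns]
    count = 0
    i = 0
    while i < len(seq):
        # First pattern (in the order given) that matches at position i.
        hit = next((p for p in pats if seq.startswith(p, i)), None)
        if hit is None:
            i += 1            # no match here, move one base to the right
        else:
            count += 1
            i += len(hit)     # skip the matched bases to avoid overlaps
    return count


# The four rows of Table 2.
rows = [
    ("ggggcccgggccc", ["GGG"]),
    ("ggggcccgggccc", ["GGG", "CCC"]),
    ("ggggcccgggccc", ["GGG", "GCC"]),
    ("aatcaaacaatcaaa", ["AATCAA", "ATCAAA"]),
]
for seq, pats in rows:
    print(seq, " or ".join(pats), count_nonoverlapping(seq, pats))
```

In the third row, the GCC that begins at the last G of the second GGG is skipped because that G has already been used, which is why the count is 3 rather than 4.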





Supporting Figure 6

Fig. 6. (A) Binned frequencies of the lengths of Fugu and mouse introns. We distributed 64,313 Fugu introns and 74,908 mouse introns that were <2,000 bases long into 50 bins. (B) Binned scores from the linear discriminant analysis (LDA) for an independent test set of 32,156 Fugu introns and 37,454 mouse introns (the classifier was trained on an independent training set of the same size; details are in Materials and Methods and Supporting Text). (C) Combined scores (LDA + intron length) for the independent test set of introns. In the independent test set, 85% of Fugu introns are correctly classified as Fugu introns and 88% of mouse introns are correctly classified as mouse introns. Genes 1, 2, and 3 are the Fugu RCN1, HD, and ARP3 genes, respectively. Lines indicate where the scores for introns 1–5 of gene 1 (1.1–1.5), 1–7 of gene 2 (2.1–2.7), and 1–11 of gene 3 (3.1–3.11) fall within the overall distributions. A minus sign (–) indicates intron retention, a plus sign (+) indicates correct splicing, and ± indicates partial splicing of the corresponding Fugu intron in transgenic mice or mouse cell lines, as reported in the literature or found in our experimental analyses. The inset table lists the corresponding intron lengths (in bp) for the Fugu introns.





Supporting Figure 7

Fig. 7. (A) Classical splice signals [5′ss (Left), branch (Center), 3′ss (Right)] of Homo sapiens, Mus musculus, Danio rerio, Fugu rubripes, and Ciona intestinalis. (B) Distribution of putative branch signals upstream of 3′ss in human, mouse, and Fugu. Each point represents the midpoint of a 30-bp window.





Supporting Figure 8

Fig. 8. Frequency difference plots of RESCUE-ESE-predicted ESEs in human, mouse, and Fugu exons.





Supporting Figure 9

Fig. 9. Histograms of intron lengths (log scale). Distributions were modeled as mixtures of two log-normal distributions (red traces).
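A mixture of two log-normal distributions is equivalent to a mixture of two normal distributions fitted to the logarithms of the intron lengths. The sketch below fits such a mixture with expectation-maximization (EM). The caption does not state how the authors fitted the mixtures, so the use of EM, the function names, the initialization from the lower and upper halves of the sorted data, and the iteration count are all assumptions made for illustration.

```python
import math


def normal_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and s.d. sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def mean_sd(values):
    """Mean and (floored) standard deviation of a list of numbers."""
    m = sum(values) / len(values)
    var = sum((v - m) ** 2 for v in values) / len(values)
    return m, max(math.sqrt(var), 1e-3)


def fit_two_lognormals(lengths, iterations=200):
    """Fit a two-component log-normal mixture by EM on log(length).

    Returns (w, (mu1, sd1), (mu2, sd2)), where w is the weight of the
    first component and the means and s.d.s are on the natural-log scale.
    Assumption: starting values come from the lower and upper halves
    of the sorted log lengths.
    """
    logs = sorted(math.log(x) for x in lengths)
    n = len(logs)
    (mu1, sd1), (mu2, sd2) = mean_sd(logs[: n // 2]), mean_sd(logs[n // 2:])
    w = 0.5
    for _ in range(iterations):
        # E-step: probability that each point belongs to component 1.
        resp = []
        for y in logs:
            a = w * normal_pdf(y, mu1, sd1)
            b = (1.0 - w) * normal_pdf(y, mu2, sd2)
            resp.append(a / (a + b) if a + b > 0.0 else 0.5)
        # M-step: re-estimate weight, means, and standard deviations.
        r1 = sum(resp)
        r2 = n - r1
        mu1 = sum(r * y for r, y in zip(resp, logs)) / r1
        mu2 = sum((1.0 - r) * y for r, y in zip(resp, logs)) / r2
        sd1 = max(math.sqrt(
            sum(r * (y - mu1) ** 2 for r, y in zip(resp, logs)) / r1), 1e-3)
        sd2 = max(math.sqrt(
            sum((1.0 - r) * (y - mu2) ** 2 for r, y in zip(resp, logs)) / r2), 1e-3)
        w = r1 / n
    return w, (mu1, sd1), (mu2, sd2)
```

Calling `fit_two_lognormals(lengths)` on a list of positive intron lengths returns the mixing weight and the log-scale parameters of the short-intron and long-intron components; exponentiating a component mean gives that component's median length in bases.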





Supporting Figure 10

Fig. 10. Frequency difference plots of RESCUE-ESE-predicted ESEs in human exons flanked on both sides by introns of lengths <125 bp, 125–1,000 bp, or >1,000 bp.





Supporting Figure 11

Fig. 11. Frequency difference plots of RESCUE-ISE-predicted ISEs for human (GGG, CCC) and Fugu (ACAC, GTGT) in introns of three length groups as indicated.





 

Table 5. Domain conservation of U1 snRNP specific proteins

Accession no.   Protein name   Human            Mouse          Fugu

Predicted domain
P08621          U1-70kD        1 RRM            1 RRM          No RRM
P09012          U1 A           Pro-rich 2 RRM   Pro-rich       2 RRM
P09234          U1 C           ZF Pro-rich      ZF Pro-rich    ZF

Ensembl ID
P08621          U1-70kD        104852           030810         124896
P09012          U1 A           077312           040518         128942
P09234          U1 C           124562           024217         120439

Swiss-Prot accession numbers, predicted domains, and the last six digits of the Ensembl gene identifiers for human (ENSG00000), mouse (ENSMUSG00000), and Fugu (SINFRUG00000) are listed. snRNP, small nuclear ribonucleoprotein; see Table 8 for domain abbreviation descriptions.
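Because Table 5 lists only the last six digits of each Ensembl gene identifier, the full identifier is recovered by prepending the species prefix given in the note above. A minimal sketch follows; the dictionary and function names are hypothetical and chosen only for this example.

```python
# Species prefixes as given in the note to Table 5.
ENSEMBL_PREFIX = {
    "human": "ENSG00000",
    "mouse": "ENSMUSG00000",
    "fugu": "SINFRUG00000",
}


def full_ensembl_id(species, last_six):
    """Rebuild a full Ensembl gene ID from the six digits in Table 5."""
    if len(last_six) != 6 or not last_six.isdigit():
        raise ValueError("expected exactly six digits, got %r" % last_six)
    return ENSEMBL_PREFIX[species] + last_six


# U1-70kD row of Table 5.
for species, digits in [("human", "104852"), ("mouse", "030810"), ("fugu", "124896")]:
    print(species, full_ensembl_id(species, digits))
```

The digits are kept as strings so that leading zeros, as in the mouse entries, are preserved.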





Supporting Figure 12

Fig. 12. Splicing phenotype of Fugu Arp3N1 expressed in human 293T cells includes unspliced introns 4 and 9 (a), truncated exon 5 (b), and skipped exon 7 (c).





Supporting Figure 13

Fig. 13. Rescue of Fugu ARP3 intron 4 in human 293T cells. (A) Mutants of intron 4 were generated by insertion of G triples (see main text). (B) RT-PCR of mRNA. Lanes: 1, PLHC-1 transfected with wild-type Fugu Arp3N1 (PLHC-1); 2, 293T transfected with wild-type Fugu Arp3N1 (WT); 3, 293T transfected with mutant containing a single G triple insert (M2F8); 4, 293T transfected with mutant containing two G triple inserts (M5F2).





Supporting Figure 14

Fig. 14. Frequency difference plots of RESCUE-ISE-predicted ISEs in five chordates.

This Article

PNAS November 2, 2004 vol. 101 no. 44 15700-15705