CELL BIOLOGY
Diversification of stem cell molecular repertoire by alternative splicing
Department of Molecular Biology, Princeton University, Princeton, NJ 08544
Edited by Phillip A. Sharp, Massachusetts Institute of Technology, Cambridge, MA and approved August 16, 2005 (received for review March 15, 2005)
| Abstract |
|---|
genome | exons | introns | transcription
Sequencing of the human genome and collections of expressed sequence tags (ESTs) have facilitated global detection of alternatively spliced variants (5, 6). Because ESTs are generally derived from mature spliced mRNA populations, they provide a broad sample of mRNA diversity. Computational analyses of cDNA and EST sequences have suggested that alternatively spliced transcripts are produced from >50% of mammalian genes (7-9). Most alternative splicing occurs within protein-coding regions (10, 11). It has been suggested that alternative splicing is a general mechanism to increase the coding capacity and diversity of the genome in metazoans (12).
Recent studies have extensively characterized the global gene expression profiles of various stem cell populations. A large number of genes were found to be preferentially expressed in primitive stem cells (13-16). These studies provided a first estimate of the molecular repertoire available for stem cell regulation. Until now, analyses of alternative splicing in stem cells have been limited to individual genes (17-19). For instance, alternative splicing of the Ikaros gene produces a variety of functionally diverse transcription factors found in hematopoietic stem cells (HSC) and lymphoid progenitors (17). Global analyses of alternative splicing have been limited to heterogeneous cell populations from whole tissues (e.g., brain or bone marrow). Tissue-specific stem cells are very rare (1 in 10,000 whole bone-marrow cells). Therefore, it would be difficult to detect splice variants specific to stem cells in whole-tissue analyses. Herein we use a combined computational and experimental approach to analyze alternative splicing in embryonic stem (ES) cells and highly purified HSC on a genome-wide scale. In addition, we investigate genome-wide trends of alternative splicing and establish relationships between levels of transcription, tissue-specific gene expression, and the frequency of alternative splicing.
| Materials and Methods |
|---|
To measure genome-wide trends of alternative splicing, we performed a separate computational analysis that characterized the distribution of splice variants among different classes of genes. For these analyses, the reference set of 8,100 full-length cDNAs representing known human genes was extracted from SwissProt (release 41), a manually annotated database (23). The EST tissue source information was extracted from the TissueInfo database (24). Detailed descriptions and complete results of the computational analyses are included in Supporting Materials and Methods and Data Sets 1-6, which are published as supporting information on the PNAS web site; additional data also are available from the authors upon request.
Experimental Confirmation of Alternative Splicing. Murine HSCs were purified from bone marrow as described in ref. 13. Panels of total RNA from human and mouse adult tissues were purchased from BD Biosciences Clontech. Specific primers flanking predicted sites of alternative splicing were used for RT-PCR amplifications with TaqGold polymerase (Applied Biosystems). PCR products were separated by agarose gel electrophoresis, and the band intensities were quantified by using the GelDoc imaging system and QUANTITYONE software (Bio-Rad). Amplified products were extracted from gels, cloned by using the TOPO cloning kit (Invitrogen), and sequenced to confirm alternative splicing. All primers, splice junction sequences, and detailed descriptions of experimental procedures are included in Supporting Materials and Methods and Data Sets 1-6 (additional data also are available upon request).
| Results |
|---|
3% in mouse HSC and ES cells and 9% in human ES cells. These values are close to previous estimates, considering the number of available ESTs (7). The observed variations between the stem cell populations may reflect differences in preparation and characterization of corresponding EST libraries. For example, 87% of human ES cell ESTs came from a single study that did not employ normalization or amplification (25). In contrast, the mouse HSC and ES cell libraries were derived from a large number of studies that used normalization and/or amplification (13, 26).
30% of the splice variants identified in the stem cell data sets were not found in ESTs from other tissues. We chose a set of 15 genes encoding diverse proteins, such as transcription factors and transmembrane receptors, for experimental confirmation. Prior knowledge of alternative splicing was not used in the selection of the gene sets. Using cloning and sequencing, we were able to confirm that 12 of 15 (80%) of these genes were expressed with the splice variations (Fig. 2 and Fig. 7, which is published as supporting information on the PNAS web site). Lack of confirmation for 20% of the selected genes was likely due to differences in the stem cell lines, isolation protocols, and cell culture conditions. The RT-PCR profiles across ES, HSC, and eight adult tissues showed that selected genes are expressed in stem cells with various degrees of specificity. Literature searches indicate that some of the confirmed genes play important regulatory roles in stem cell biology. For instance, it was shown that the Polycomb group transcription factor EZH2 is required for maintenance of undifferentiated cells during the blastocyst stage of early mouse development (27). The Bric-a-Brac/Poxvirus and zinc-finger domain protein Zbtb20 is a less characterized transcription factor; however, it has been shown to be involved in developmental neurogenesis (28). In the group of transmembrane genes, we identified splice variants for platelet/endothelial cell adhesion molecule-1, a transmembrane protein involved in migration of hematopoietic stem cells (29). Splice variations were also confirmed for the weakly characterized putative adhesion receptor FAD104 and the apoptotic TNF receptor TNFR-7 (Fig. 7). The majority of our experimentally confirmed variants are not described in the published literature (Table 3, which is published as supporting information on the PNAS web site).
Correlations Between Transcription and Alternative Splicing. To further explore the observation that alternative splicing extensively modifies regulatory genes, we analyzed the distribution of splice variants across various expression levels. Because the available stem cell EST collection is limited, we used human EST data from all tissues. More than 4,000,000 human ESTs were aligned to a nonredundant reference set of 8,100 full-length cDNAs derived from SwissProt. This database is manually annotated and therefore reliably defines bona fide gene products. Computational analyses identified 2,471 alternative splice sites within coding regions of human genes. The complete data set is presented in Data Sets 1-3.
To describe the frequency of alternative splicing in a quantitative manner, we defined a new parameter, an exon exclusion fraction, $f_{ex}$ (Fig. 3). Computationally, the exon exclusion fraction is defined as the number of ESTs with an excluded exon divided by the total number of ESTs spanning a site of alternative splicing:
$$f_{ex} = \frac{N_{ex}}{N_{ex} + N_{in}}$$
where $N_{ex}$ is the number of ESTs with an excluded exon and $N_{in}$ is the number of ESTs with an included exon. Experimentally, the exon exclusion fraction is defined as the intensity of DNA bands representing the exon exclusion isoform divided by the intensity of all of the bands:
$$f_{ex} = \frac{I_{ex} L}{I_{ex} L + I_{in}}$$
where $I_{ex}$ is the intensity of the exon exclusion isoform (lower band), $I_{in}$ is the intensity of the exon inclusion isoform (upper band), and $L$ is a fragment length factor. The fragment length factor, defined as the ratio between the long and short isoform lengths, is needed to correct the effect of DNA fragment length on UV absorption. Because tissue-specific EST libraries are not yet sufficiently comprehensive, our calculations were performed across all tissues. For example, 12 ESTs and 14 ESTs represent cases of exon inclusion and exon exclusion in the cAMP-response element-binding protein gene, respectively. In this case, the exon exclusion fraction is 14/(12 + 14) = 0.54.
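The two definitions of the exon exclusion fraction can be illustrated with a minimal Python sketch. The function names are hypothetical and are not taken from the original analysis. The gel-based version assumes that the fragment length factor multiplies the intensity of the shorter (exon exclusion) band so that both bands are compared on a per-molecule basis; that placement is an assumption inferred from the definition of the factor as the ratio of long to short isoform lengths.

```python
def exon_exclusion_fraction(n_ex: int, n_in: int) -> float:
    """Computational estimate: ESTs with the exon excluded over all ESTs spanning the site."""
    total = n_ex + n_in
    if total == 0:
        raise ValueError("no ESTs span this splice site")
    return n_ex / total


def exon_exclusion_fraction_gel(i_ex: float, i_in: float, length_factor: float) -> float:
    """Experimental estimate from band intensities.

    Assumption: the exclusion (lower) band intensity is scaled by
    length_factor = long isoform length / short isoform length.
    """
    corrected_ex = i_ex * length_factor
    total = corrected_ex + i_in
    if total <= 0:
        raise ValueError("band intensities must sum to a positive value")
    return corrected_ex / total


# Worked example from the text: 14 exclusion ESTs and 12 inclusion ESTs.
creb = exon_exclusion_fraction(n_ex=14, n_in=12)
print(round(creb, 2))  # 0.54
```

Under these assumptions, a value near 0 or 1 indicates that one isoform dominates, whereas a value near 0.5 indicates that both isoforms are similarly represented.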
To test for correlations between the levels of transcription and alternative splicing, the exon exclusion fractions at each splice site were analyzed as a function of EST numbers covering this site. As shown in Fig. 4A, genes represented by a high number of ESTs show a low frequency of alternative splicing; that is, the majority are represented by a single isoform (i.e., exon exclusion fractions are close to 0 or 1). We chose sets of genes represented with various numbers of ESTs to confirm the computationally identified trends (Fig. 4 B and C). Prior knowledge of alternative splicing was not used in the selection of the gene set. The selected genes with a high number of ESTs encode proteasome components PSB1 and PSB3, the exosomal component RR46, and glycosyltransferase EXT2. According to computational results, the exon exclusion fractions were close to 0.01 for these genes. When tested experimentally, some of these genes do not show detectable splicing variants, whereas others show a very low frequency of alternative splicing (Fig. 4B and Fig. 8, which is published as supporting information on the PNAS web site). In contrast, a higher frequency of alternative splicing was observed computationally and confirmed experimentally for genes represented with a low number of ESTs (Fig. 4C and Fig. 9, which is published as supporting information on the PNAS web site). Selected examples include the ion channel P2X5, the hematopoietic transmembrane receptors IL-7R and CD3D, and transcription factor subunit CBFB. For these genes, the exon exclusion fraction was closer to 0.5.
To further confirm a correlation between a high frequency of alternative splicing and tissue-specific gene expression, we computationally compared the distribution of exon exclusion fractions between tissue-specific and ubiquitous genes (Supporting Materials and Methods). Because EST libraries are not comprehensive, we normalized expression levels for genes that are represented with 10 or more ESTs across 102 tissues. Genes were defined as tissue-specific if 30% or more of their ESTs originated from a single tissue (tissue specificity P value of <0.001). Genes were defined as ubiquitous if <20% of their ESTs originated from a single tissue (tissue specificity P value of >0.01). We found that the tissue-specific subset included 4-fold more genes with exon exclusion fractions within 0.5 ± 0.3, as compared with the subset of the ubiquitous genes. These experimental and computational data indicate that the frequency of alternative splicing is higher for tissue-specific genes than for ubiquitous genes. However, as shown in Fig. 4A, there are clearly examples of ubiquitously expressed genes with ratios close to 0.5.
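The thresholds described above can be sketched as a simple classifier in Python. This is only an illustration: the function and constant names are hypothetical, the input is assumed to be a list with one tissue label per EST, and the expression-level normalization and the tissue specificity P value calculation described in the text are not reproduced here.

```python
from collections import Counter

MIN_ESTS = 10                 # genes with fewer ESTs are not classified
SPECIFIC_MIN_SHARE = 0.30     # >= 30% of ESTs from one tissue
UBIQUITOUS_MAX_SHARE = 0.20   # < 20% of ESTs from any single tissue


def classify_gene(est_tissues):
    """Classify a gene from the tissue labels of its ESTs.

    Returns 'tissue-specific', 'ubiquitous', 'intermediate',
    or None when the gene has fewer than MIN_ESTS ESTs.
    """
    if len(est_tissues) < MIN_ESTS:
        return None
    # Share of ESTs contributed by the single most represented tissue.
    top_count = Counter(est_tissues).most_common(1)[0][1]
    top_share = top_count / len(est_tissues)
    if top_share >= SPECIFIC_MIN_SHARE:
        return "tissue-specific"
    if top_share < UBIQUITOUS_MAX_SHARE:
        return "ubiquitous"
    return "intermediate"


# Hypothetical example: 4 of 10 ESTs come from brain (40%).
print(classify_gene(["brain"] * 4 + ["liver"] * 3 + ["lung"] * 3))  # tissue-specific
```

Genes whose top tissue share falls between the two thresholds belong to neither class in this sketch and are labeled "intermediate".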
Low Frequency of Exon Exclusion. Upon examination of the genes by RT-PCR, we noticed that they were usually represented by a single isoform. The formation of other isoforms is usually rare and tissue-specific. To explore this trend, we analyzed the distribution of the exon exclusion and exon inclusion events at identified alternative splice sites. If exon exclusion and inclusion were occurring at a similar frequency, we would expect to obtain a Gaussian-like distribution with a peak centered at 0.5. However, as shown in Fig. 5A, the inclusion or exclusion frequencies of individual exons were distributed in a skewed, highly nonrandom fashion. The exon exclusion fraction was <0.2 in nearly 60% of the analyzed alternative splice sites. Thus, at most sites of alternative splicing, exon exclusion was a rare event. Interestingly, very few sites showed an equal representation of both alternatively spliced isoforms. Four examples are presented in Fig. 5B, and the complete data set of 25 genes is included in Fig. 10, which is published as supporting information on the PNAS web site. As expected, the majority of the experimentally tested genes showed predominant formation of the long isoforms, whereas exon exclusion was rare and tissue-specific. The apoptosis regulator MCL1 and transcription factor TF3A gene products shown in Fig. 5B represent examples of predominant exon inclusion. In comparison, the chromodomain-helicase-DNA-binding protein-1 gene CHD1 is an example of predominant exon exclusion. The gene encoding amyloid-like protein APP2 was an example of approximately equal exon inclusion and exclusion frequencies. In general, the computationally determined exon exclusion frequencies were in good agreement with the experimental data (Fig. 5C). These findings suggest that, at the majority of alternative splicing sites, exon inclusion is constitutive and exon exclusion is a rare and tissue-specific event.
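The skew described above (an exon exclusion fraction below 0.2 at nearly 60% of sites) is the kind of summary that can be obtained by binning per-site fractions. The following Python sketch shows one way to do this; the function name, the number of bins, and the input values are hypothetical and are not taken from the study's data.

```python
def summarize_distribution(fractions, n_bins=5, rare_cutoff=0.2):
    """Histogram exon exclusion fractions over [0, 1] and report the share below rare_cutoff."""
    if not fractions:
        raise ValueError("at least one splice site is required")
    counts = [0] * n_bins
    for f in fractions:
        if not 0.0 <= f <= 1.0:
            raise ValueError("fractions must lie in [0, 1]")
        # A value of exactly 1.0 is placed in the last bin.
        index = min(int(f * n_bins), n_bins - 1)
        counts[index] += 1
    rare_share = sum(1 for f in fractions if f < rare_cutoff) / len(fractions)
    return counts, rare_share


# Hypothetical per-site fractions, chosen only to illustrate a skewed distribution.
counts, rare_share = summarize_distribution([0.05, 0.1, 0.15, 0.5, 0.9])
print(counts)      # [3, 0, 1, 0, 1]
print(rare_share)  # 0.6
```

A distribution centered at 0.5 would instead place most sites in the middle bin, which is the pattern the text describes as expected if inclusion and exclusion were equally frequent.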
Alternative Splicing Patterns Are Weakly Conserved in Human and Mouse. The genomic structures, specifically, exon and intron boundaries, of orthologous genes are highly conserved between human and mouse species (30). A reasonable assumption is that the patterns of alternative splicing would be similarly conserved. We experimentally analyzed 20 pairs of orthologous genes for conservation of alternative splicing patterns across the same tissues (Figs. 6 and 10). In this group of genes, the analyzed human exons were also present in mouse genes, according to the available transcript data. Surprisingly, no conservation of alternative splicing was detected in 16 of the 20 tested genes. Thirteen of the orthologous genes do not show any evidence for alternative splicing in murine tissues. In the other three cases, the splicing patterns are different. Representative results are shown in Fig. 6. No alternative splicing was detected for the mouse orthologues of the TF3A, CD3D, and KLF6 genes. In addition, different patterns of alternative splicing were detected for the murine orthologue of the CD22 gene. The APP2 gene represents an example of conserved alternative splicing of a single exon. Note that, even in the conserved cases, the ratios of two isoforms in particular tissues were not preserved. Our results suggest that patterns of alternative splicing are not conserved in most of the human and mouse genes.
| Discussion |
|---|
To deepen our understanding of alternative splicing and its regulation and functional consequences, we subsequently analyzed its trends across all tissues. We found that ubiquitously expressed genes show a very low frequency of alternative splicing. It may be that the low-frequency splice variants represent the occasional infidelity of the splicing machinery. Theoretical arguments have estimated a 0.001 frequency of such errors (33). This value is similar to the frequencies at which we detect splicing variants in genes that encode proteasomal components. In contrast, tissue-specific genes appear to show a high frequency of alternative splicing. Previous studies of individual genes have shown that splicing is coupled to transcription by protein-protein interactions between components of the transcription and splicing complexes (34, 35). Taken together, these results suggest that, on the genome-wide level, coupling of transcription and splicing results in diversification of tissue-specific and regulatory gene products, with little effect on ubiquitous "housekeeping" genes. A supporting evolutionary argument is that ubiquitous transcripts responsible for crucial and general cellular processes have evolved not to be modified, whereas diversification is advantageous for tissue-specific gene products. This explanation is further strengthened by our observation of fast evolutionary changes in alternative splicing patterns. We found that these patterns were conserved for only 20% of the examined orthologous genes in the human and mouse species, despite the general conservation of their exon-intron boundaries. These observations are in agreement with results of a recently published study (36) and consistent with previous conclusions regarding the rapid evolution of alternatively spliced exons (37).
Lack of conservation of the alternative splicing patterns may contribute to the previously observed differences in functional properties of analogous cell types, such as mouse and human ES cells (38).
At the molecular level, alternative splicing results from blocking of constitutive splicing sites or activation of weak (cryptic) sites (2, 4). For a large set of alternatively spliced genes, we observed that exon inclusion is predominant, whereas exon exclusion is rare and often tissue-specific. A similar conclusion was obtained in previous computational studies (37). Our experiments confirmed these computationally derived trends, which implies that exon inclusion is a default option in the overall expression process of these genes. Based on these observations, we hypothesize that repression of constitutively used splice sites in primary transcripts is responsible for the formation of most splice variants. Furthermore, such blocking is likely to occur in a tissue-specific manner. However, the observed bias toward exon inclusion may partially reflect an artificial effect from accumulations of ESTs with rare splice errors in ubiquitous genes, as discussed above.
Alternative splicing has been implicated in several cell fate decision systems (2, 4). According to our observations that multiple genes in stem cells undergo alternative splicing and that these genes often encode regulatory proteins, we hypothesize that stem cell molecular networks are more generally dependent on this posttranscriptional mechanism. Thus, understanding stem cell biology will require the complete catalog of splice variations in addition to comprehensive analyses of transcription. Our studies initiate such a catalog.
| Footnotes |
|---|
This paper was submitted directly (Track II) to the PNAS office.
Abbreviations: HSC, hematopoietic stem cell; NCBI, National Center for Biotechnology Information.
* To whom correspondence should be addressed. E-mail: ilemischka{at}molbio.princeton.edu.
© 2005 by The National Academy of Sciences of the USA