GENETICS
Distinctions in the specificity of E2F function revealed by gene expression signatures




, ¶
*Duke Institute for Genome Sciences and Policy,
Department of Molecular Genetics and Microbiology, Medical Center, and
Institute for Statistics and Decision Sciences, Duke University, Durham, NC 27710
Edited by Ed Harlow, Harvard Medical School, Boston, MA and approved September 9, 2005 (received for review May 25, 2005)
| Abstract |
|---|
DNA microarray | transcriptional control
The E2F family comprises nine distinct gene products encoded by seven distinct genomic loci (7-14). The size and complexity of the E2F family of proteins reflect a complexity in function, with individual E2Fs performing both distinct and overlapping roles in proliferation, apoptosis, and development (15-17). E2F1, E2F2, and E2F3a make up one subset, with each of these E2Fs functioning as a strong transcriptional activator that can induce quiescent cells to enter S phase (18-20). As cells enter mid-to-late G1, many E2F-responsive promoters are bound by E2F1, E2F2, and E2F3a, coincident with histone acetylation and gene activation (21, 22). E2F4, E2F5, and the alternative version of E2F3, termed E2F3b (9, 11), constitute the second subset of E2F family members. They are not regulated by cell growth but instead can be found at nearly equivalent levels in both quiescent and proliferating cells (17). In contrast to the activating E2Fs, E2F4, E2F5, and E2F3b are mainly involved in the repression of growth-promoting E2F-responsive genes through the recruitment, to E2F-responsive promoter elements, of complexes that contain histone deacetylase (1, 2) or other corepressors (23).
The complexity of transcription control for the large number of protein-coding genes in a eukaryotic cell presents a major challenge in achieving specificity of transcription control with a limited number of transcription factors. A solution to this problem has been proposed based on a combinatorial mechanism of transcription control, whereby a finite number of transcription factors yield a substantial level of complexity by working in combination (24, 25). Various studies have now provided evidence for such combinatorial specificity, involving upstream binding transcription factors as well as components of the basal transcription machinery (26). Our previous work has focused on interactions involving the E2F family of transcription factors as an example of combinatorial gene control, leading to the identification of TFE3, YY1, and Myb as transcription partners for several E2F proteins (6, 22, 27, 28). Based on these observations, we have proposed that these examples of combinatorial interactions involving E2F proteins provide a basis for the specificity of transcription control in the Rb/E2F pathway. Importantly, these studies also identified a domain within the E2F family of proteins, the so-called marked box domain, that mediated the interactions between E2F proteins and the various transcription factor partners. By implication, these findings suggest a role for the marked box domain as a specificity determinant, directing a particular E2F protein to the proper promoter via protein interaction. To address this point on a more global basis, we made use of genome-wide measures of gene expression to identify patterns of gene expression that reflect the specificity of function of the E2F1 and E2F3 proteins. 
In particular, we demonstrate that chimeric E2F proteins that contain either the E2F1 or E2F3 marked box domain exhibit a gene expression signature that reflects the origin of the marked box, thus linking the biochemical mechanism for specificity of function with the specificity of gene activation.
| Materials and Methods |
|---|
RNA Preparation. Total RNA was extracted from the infected cells by using TRIzol, as described in the manufacturer's instructions (Invitrogen).
DNA Microarray Analysis. All of the experiments used Affymetrix MOE430A arrays. The targets for the Affymetrix arrays were prepared according to the manufacturer's instructions, starting with 10 µg of total RNA. Double-stranded cDNA was synthesized by using a T7-linked oligo(dT) primer followed by second-strand synthesis. Biotin-labeled complementary RNA (cRNA), produced by in vitro transcription, was synthesized and subsequently fragmented. The fragmented cRNA was hybridized to the MOE430A (Affymetrix GeneChip) arrays at 45°C for 16 h and then washed and stained with streptavidin-phycoerythrin (SAPE, Molecular Probes). Signal amplification was performed by using a biotinylated antistreptavidin antibody (Vector Laboratories). The arrays were scanned by an Affymetrix GeneChip Scanner, and hybridization patterns were detected as light emitted from the fluorescent reporter groups that had been incorporated into the target and hybridized to oligonucleotide probes. The signal intensity measurements computed in the Affymetrix MICROARRAY ANALYSIS SUITE 5.0 serve as a relative indicator of the level of expression. Scaling factors were also computed for each array based on an arbitrary target intensity of 500. Files containing the computed signal intensity value for each probe cell on the arrays (CEL files), files containing both experimental and sample information (control information files), and files providing the signal intensity values for each probe set, as derived by the Affymetrix ANALYSIS SUITE Ver. 5.0 software (pivot files), are available upon request to J.R.N. These experiments comply with the Minimum Information About a Microarray Experiment (MIAME) (32).
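The per-array scaling step can be illustrated with a minimal sketch. The text states only that scaling factors were computed against an arbitrary target intensity of 500. The trimmed-mean rule used here (dropping the lowest and highest 2% of probe set signals before averaging) is an assumption about how the factor is derived, and the function name `scaling_factor` is hypothetical.

```python
def scaling_factor(signals, target=500.0, trim=0.02):
    """Factor that brings the trimmed mean of `signals` to `target`.

    `trim` is the fraction of values dropped from each tail (assumed 2%).
    """
    ordered = sorted(signals)
    k = int(len(ordered) * trim)                  # values dropped per tail
    kept = ordered[k:len(ordered) - k] if k else ordered
    trimmed_mean = sum(kept) / len(kept)
    return target / trimmed_mean


def scale_array(signals, target=500.0, trim=0.02):
    """Multiply every signal on one array by that array's scaling factor."""
    factor = scaling_factor(signals, target, trim)
    return [s * factor for s in signals]
```

Because each array receives its own factor, arrays hybridized with different overall brightness end up with the same trimmed mean signal, which is what makes the signal values comparable across the replicate infections.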
Statistical Analysis. Microarray data were first normalized by using the GC-RMA method (33). The expression data were then analyzed as described (34). Briefly, the analysis uses binary regression models combined with singular value decompositions and stochastic regularization by using Bayesian analysis. A probability model estimates a classification probability for each of the two possible states (control (CMV) vs. HAE2F1, control (CMV) vs. HAE2F3, or HAE2F1 vs. HAE2F3) for each sample. This probability is structured as a probit regression model in which the expression levels of genes are scored by regression parameters in a regression vector b. Analysis estimates this regression vector and the resulting classification probabilities for both training and validation samples. The estimated regression itself is important not only for defining the predictive classification but also for scoring genes according to their contribution to the classification.
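The structure of such a model can be sketched as follows. This is not the published implementation (34): a Gaussian prior on the regression vector and a simple gradient ascent to the posterior mode stand in for the full Bayesian analysis, and the function names, the prior precision, and the use of NumPy are all assumptions made for illustration.

```python
import math
import numpy as np


def norm_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def norm_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def first_factors(X, k=1):
    """Sample scores on the first k SVD factors of a samples x genes matrix."""
    Xc = X - X.mean(axis=0)                       # center each gene
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)
    F = U[:, :k] * s[:k]
    return F / F.std(axis=0)                      # unit variance per factor


def fit_probit(F, y, prior_precision=1.0, n_iter=2000):
    """Posterior mode of b in Pr(y = 1) = Phi(b0 + F b), Gaussian prior on b."""
    y = np.asarray(y, dtype=float)
    Z = np.column_stack([np.ones(len(F)), F])     # intercept plus factors
    b = np.zeros(Z.shape[1])
    # Conservative step size from an upper bound on the curvature.
    step = 1.0 / (np.sum(Z ** 2) + prior_precision)
    for _ in range(n_iter):
        eta = Z @ b
        p = np.clip([norm_cdf(v) for v in eta], 1e-12, 1 - 1e-12)
        d = np.array([norm_pdf(v) for v in eta])
        grad = Z.T @ (d * (y - p) / (p * (1 - p))) - prior_precision * b
        b = b + step * grad
    return b


def classify(F, b):
    """Classification probability for each sample."""
    Z = np.column_stack([np.ones(len(F)), F])
    return np.array([norm_cdf(v) for v in Z @ b])
```

The prior keeps the regression vector finite even when the two states are perfectly separated on a factor, which is the situation in which an unregularized probit fit would diverge.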
| Results |
|---|
To both synchronize cells and reduce levels of endogenous E2Fs, wild-type primary MEFs were brought to a quiescent state after 48 h in starvation media. The cells were then infected with control Ad (Ad-CMV) or Ad expressing either HA-tagged E2F1 (AdHAE2F1) or HA-tagged E2F3 (AdHAE2F3). We allowed the infections to proceed for only 16 h to minimize levels of ectopically expressed E2F1 and E2F3 and to focus on those genes that may be the primary targets of E2F activity. Virus infections were titrated to achieve similar levels of E2F1 and E2F3 proteins by using a target multiplicity of infection of 150 focus-forming units per cell and assayed by Western blot analysis by using an antibody against the HA tag (data not shown). Total RNA was extracted from eight independent E2F1 and E2F3 infections and six replicates of the Ad-CMV infection. Cyclin E was induced in both E2F1- and E2F3-infected MEFs but not Ad-CMV infections as measured by Northern blot (data not shown). The same RNA was then used to generate target for application to MOE430A Affymetrix microarrays. Targets generated from E2F1 and E2F3 infections were hybridized to arrays and compared with arrays hybridized to target generated from the Ad-CMV-infected control MEFs.
Using the GC-robust multiarray average normalized values for each probe set on the array over multiple experiments, we identified genes whose expression was most highly correlated with the activity of either E2F1 or E2F3. We then used this group of genes in a binary regression analysis to elucidate patterns of gene expression, or principal components, that represent the underlying structure present in the data. Gene expression profiles were identified that can distinguish between a quiescent cell infected with a control virus and a cell infected with a virus expressing either E2F1 or E2F3. Illustrated in Fig. 1A are genes that differentiate the control-infected MEFs from the E2F1-expressing MEFs (Fig. 1A Left). Fig. 1A Center depicts the separation of the control samples from the E2F1 samples based on the first principal component (Factor 1). A list of the 100 genes that make up this discriminator and the estimated regression parameters are found in Table 2, which is published as supporting information on the PNAS web site. Each row in Fig. 1A represents a gene, ordered from top to bottom as a function of estimated regression coefficients. High expression is depicted as red and low expression as blue. Likewise in Fig. 1B, control MEFs can be distinguished from E2F3-expressing MEFs by a second group of 100 genes. Fig. 1B Center demonstrates the capacity of the first principal component to separate the samples. In this example, it is evident that the second principal component (Factor 2) also provides discrimination. Again, the genes that form the discriminator for E2F3 and the estimated regression parameters are found in Table 3, which is published as supporting information on the PNAS web site.
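The gene selection step described above can be sketched in a few lines. The ranking statistic is an assumption: the text says only that genes were chosen by correlation with E2F activity, so the absolute Pearson correlation with a 0/1 class label is used here for illustration, and the function and gene names are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if undefined)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def top_correlated_genes(expr, labels, n_top=100):
    """Rank genes by |correlation| with the class label and keep the top n_top.

    `expr` maps a gene name to its expression values, one per sample;
    `labels` holds 0 for control samples and 1 for E2F-expressing samples.
    """
    scored = [(abs(pearson(vals, labels)), gene) for gene, vals in expr.items()]
    scored.sort(reverse=True)
    return [gene for _, gene in scored[:n_top]]
```

Taking the absolute value keeps genes that are repressed as well as genes that are induced, which matches a heat map that displays both high (red) and low (blue) expression.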
The real test of whether a pattern reflects the phenotype of interest, rather than having been discovered by chance alone, is its ability to accurately predict the status of an unknown sample. To verify that the patterns do indeed represent genes reflecting the E2F activities, we used a leave-one-out cross-validation to assess the ability of the pattern to predict the status of the relevant samples. One sample is removed, the remainder are used for generating the patterns for prediction, and then the removed sample is used for prediction of whether it is an E2F1- or an E2F3-expressing sample. As shown in Fig. 1A Right, the E2F1 pattern did indeed accurately predict the E2F1-expressing cells, distinguishing them from control cells. The values on the horizontal axis are estimates of the signature score from the regression, and the values on the vertical axis are estimated classification probabilities, with the corresponding 95% probability intervals marked as dashed lines to indicate the uncertainty about these estimated values. All of the E2F1-expressing cells have a high probability of having the E2F1 signature, whereas the control cells have a low probability. Likewise, the E2F3 profile also accurately predicted the E2F3-expressing cells, again distinguishing them from control cells.
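The leave-one-out procedure can be written generically, as in the sketch below. The nearest-centroid classifier is a deliberately simple stand-in for the probit model used in the study, and all function names are hypothetical.

```python
def leave_one_out(samples, labels, fit, predict):
    """Return the held-out prediction for every sample."""
    predictions = []
    for i in range(len(samples)):
        train_x = samples[:i] + samples[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        model = fit(train_x, train_y)             # pattern built without sample i
        predictions.append(predict(model, samples[i]))
    return predictions


def fit_centroids(xs, ys):
    """Stand-in classifier: mean expression profile of each class."""
    centroids = {}
    for label in set(ys):
        rows = [x for x, y in zip(xs, ys) if y == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids


def predict_centroid(centroids, x):
    """Assign x to the class whose centroid is closest (squared distance)."""
    def sq_dist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))
```

For the held-out prediction to be an honest estimate, every data-dependent step, including the choice of discriminating genes, belongs inside `fit`, so that the removed sample plays no part in generating the pattern.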
Consistent with previous descriptions of functions for E2F1 and E2F3, the signatures distinguishing either E2F1 or E2F3 from control cells contain genes encoding activities necessary for cell cycle progression, including many of the previously identified E2F target genes. In addition, focusing on the role of E2F1 as an activator of apoptosis, a comparison of the 100 gene predictor lists by using FATIGO (40) at Biological Process (41) level 4 reveals that 4.8% of the annotated genes in the E2F1 gene list (Table 2) are involved in the regulation of programmed cell death, or apoptosis. In contrast, 1.8% of the annotated genes in the E2F3 predictor (Table 3) are involved in these events, and Hells is the single overlapping gene between the two predictors.
The revealing test of the training set is illustrated in Fig. 2C by using the model trained to discriminate E2F3 from E2F1 to then predict the status of both the control samples and the samples that express the E2Fs. The samples representing E2F3-expressing cells show a high probability of E2F3 activity, whereas the E2F1 and the control samples score as a low probability of E2F3 activity.
The top genes that were selected for the ability to discriminate E2F3 from E2F1 are listed in Table 1 (the full list is provided in Table 4, which is published as supporting information on the PNAS web site). Only 6% of these genes overlap with the discriminators derived from comparison of the E2F samples vs. control samples. It thus appears that the prediction of E2F1 and E2F3 activity depends on a group of genes that are largely distinct from those that distinguish E2F activity from control cells. An examination of these genes reveals a substantial enrichment for genes encoding mitotic activities: nearly 20% of the genes selected to discriminate E2F3 from E2F1 encode mitotic activities, which is also reflected in the enrichment of Gene Ontology terms, indicating mitotic functions (Fig. 4, which is published as supporting information on the PNAS web site). The link between E2F3 and control of mitotic genes is of interest given previous data that have differentiated roles for E2F1 and E2F3 as a function of initial cell cycle entry vs. continuing growth in the presence of growth factors; whereas both E2F1 and E2F3 appear to be important for the initial S phase entry after serum stimulation, only E2F3 is required in cycling cells (31). The expression profile trained to differentiate the two E2F activities would appear to emphasize this distinction, highlighting the control of G2/M transcription as the dominant characteristic that distinguishes the two E2Fs.
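The overlap figure quoted above is a simple set calculation, sketched here under the assumption that each discriminator is available as a list of gene identifiers. The function name and the example identifiers in the usage note are hypothetical.

```python
def percent_overlap(discriminator, *other_lists):
    """Percentage of genes in `discriminator` found in any of the other lists."""
    genes = set(discriminator)
    others = set().union(*other_lists)
    return 100.0 * len(genes & others) / len(genes)
```

For example, `percent_overlap(["a", "b", "c", "d"], ["a", "x"], ["y"])` evaluates to 25.0, because one of the four genes in the first list appears in the union of the other two.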
Gene Expression Patterns That Reflect the Molecular Basis for E2F Specificity. In considering the specificity of transcription function exhibited by E2F1 and E2F3, we focused on prior work that pointed to a mechanism involving protein-protein interactions as the basis for the specificity. These studies identified the marked box domain as a determinant of specificity of transcriptional activation by the E2F proteins by promoting the interaction with other transcription factors to allow recognition and binding to specific target promoters (6, 22, 27, 28). In the case of E2F1, further experiments have shown that the marked box domain confers specificity of apoptosis induction, coincident with the induction of apoptotic activities such as p53 and p73 (30).
Given the identification of gene expression profiles that distinguish E2F1 and E2F3, we made use of a series of E2F chimeric proteins to determine whether these profiles reflect the function of the marked box domain. As illustrated in Fig. 3A, HA-tagged chimeric E2Fs were generated from the human E2F1 and E2F3 cDNAs and introduced into Ad as described (30). We made use of two chimeras previously described. Ad-333113 expresses a protein that contains the E2F1 marked box in the backbone of E2F3, and Ad-111331 expresses a protein that contains the E2F3 marked box in the backbone of E2F1. MEFs were infected in four independent experiments with either Ad-111331 or Ad-333113 for 16 h, and total RNA was harvested and used to generate probes for hybridization to Affymetrix MOE430A arrays. We then examined the expression of the 100 genes selected to discriminate E2F1 from E2F3 in the cells expressing the chimeric proteins. As shown in Fig. 3B, the profiles of these genes in the chimera-expressing cells reflected the origin of the marked box domain. As shown in Fig. 3C, the training model developed to distinguish E2F1- from E2F3-expressing cells, as described in Fig. 2, accurately identified the cells expressing the chimeric protein containing the E2F1 marked box domain.
| Discussion |
|---|
We have proposed that this specificity is mediated through specific protein-protein interactions, whereby an E2F protein must physically interact with another transcription factor in a promoter-specific fashion to generate a functional outcome. In this context, the specificity of the E2F then becomes not just the 8-bp recognition sequence but the combined promoter sequence containing the E2F site and the partner protein-binding site, together constituting what we term a regulatory module. This would then provide the necessary complexity of sequence recognition to distinguish functional sites from nonfunctional binding sites. Moreover, if the protein interactions were specific for individual E2F proteins, then this mechanism would provide a basis for distinct specificities in the activation of transcription.
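The idea of a regulatory module can be made concrete with a small promoter scan. Everything specific in this sketch is an assumption chosen for illustration: the E2F site is represented by the commonly cited 8-bp consensus TTTSSCGC (S = C or G) on either strand, the partner site by the E-box CACGTG, and the 50-bp maximum spacing is an arbitrary window. The function name is hypothetical.

```python
import re

# Lookahead patterns so that overlapping sites are all reported.
E2F_SITE = re.compile(r"(?=(TTT[CG][CG]CGC|GCG[CG][CG]AAA))")
PARTNER_SITE = re.compile(r"(?=(CACGTG))")


def find_modules(promoter, max_gap=50):
    """Return (e2f_start, partner_start) pairs lying within max_gap bp."""
    seq = promoter.upper()
    e2f_hits = [m.start() for m in E2F_SITE.finditer(seq)]
    partner_hits = [m.start() for m in PARTNER_SITE.finditer(seq)]
    return [(e, p) for e in e2f_hits for p in partner_hits
            if abs(e - p) <= max_gap]
```

A promoter carrying an E2F site alone returns no module under this rule, which captures the proposal that the E2F site and the partner protein-binding site together, rather than the E2F site by itself, define a functional target.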
Our previous work has provided evidence for cooperative protein-protein interactions as a basis for such E2F specificities. Specifically, we identified the E-box-binding factor TFE3 as an E2F3-specific partner (22, 27) and the YY1 transcription factor as a partner for E2F2 and E2F3 (28). Importantly, the specificity of transcription cooperativity involving these transcription factors was shown to reflect specific interactions in which the marked box domain of the E2F proteins mediated an interaction with the appropriate partner transcription factor. The results presented here now demonstrate that a gene expression signature reflecting the distinction between E2F1 and E2F3 also distinguishes the action of chimeric proteins that differ only by the marked box domain. As such, the results link the biochemical mechanism proposed for E2F specificity with the specificity seen in gene expression signatures.
This work also highlights the power of gene expression profiling to focus on distinctions that reflect similar and potentially overlapping biological phenotypes. The ability to distinguish an E2F1- from an E2F3-expressing cell was facilitated by the ability to find patterns in the massive gene expression data that reflect subtle differences in the action of the two proteins. An examination of the genes whose expression provides this discrimination reveals many of the known E2F targets as well as additional genes not previously identified as being E2F-regulated. Importantly, a comparison of the genes identified in the E2F1 vs. E2F3 profiles does reveal differences consistent with known biology. One-third of the genes identified as distinguishing E2F1 or E2F3 from control cells are involved in the control of cellular proliferation, although there was also some enrichment for apoptotic genes in the E2F1 signature (4.8% vs. 1.8% in the E2F3 signature), as annotated by using FATIGO. However, it was the signature developed to specifically distinguish E2F1 from E2F3 that revealed the most dramatic distinction, highlighted by a substantial number of mitotic genes (see Tables 1 and 4). Previous work has highlighted the role of E2Fs in the control of both DNA replication genes at G1/S and genes encoding mitotic functions at G2/M (3-5, 43). Moreover, other work has pointed to a specific role for E2F3 in the regulation of transcription in cycling cells; whereas both E2F1 and E2F3 are important for initial cell cycle entry, only E2F3 is required once cells begin to cycle (31). The gene expression profiles that distinguish the two E2F proteins clearly emphasize this distinction, thus further highlighting the difference in function for the two E2Fs.
Although we might have anticipated that the distinction between E2F1 and E2F3 could also have included apoptotic genes, clearly the dominant difference was the mitotic genes, presumably reflecting this as the primary distinction of function of the two E2Fs in the control of gene expression in growing cells.
| Acknowledgements |
|---|
| Footnotes |
|---|
Conflict of interest statement: No conflicts declared.
This paper was submitted directly (Track II) to the PNAS office.
Abbreviations: MEF, mouse embryo fibroblast; HA, hemagglutinin; CMV, cytomegalovirus; Ad, adenovirus.
Present address: Department of Pharmaceutical Sciences, University of Kentucky, Lexington, KY 40536.
¶ To whom correspondence should be addressed. E-mail: j.nevins{at}duke.edu.
© 2005 by The National Academy of Sciences of the USA
| References |
|---|