Abstract

Mapping transcriptional regulatory networks is difficult because many transcription factors (TFs) are activated only under specific conditions. We describe a generic strategy for identifying genes and pathways induced by individual TFs that does not require knowledge of their normal activation cues. Microarray analysis of 55 yeast TFs that caused a growth phenotype when overexpressed showed that the majority caused increased transcript levels of genes in specific physiological categories, suggesting a mechanism for growth inhibition. Induced genes typically included established targets and genes with consensus promoter motifs, if known, indicating that these data are useful for identifying potential new target genes and binding sites. We identified the sequence 5′-TCACGCAA as a binding sequence for Hms1p, a TF that positively regulates pseudohyphal growth and previously had no known motif. The general strategy outlined here presents a straightforward approach to discovery of TF activities and mapping targets that could be adapted to any organism with transgenic technology.
Delineation of transcriptional control networks is critical to understanding how the physiology of cells and organisms is orchestrated. One of the most surprising results of genome sequencing from yeast to vertebrates is the large amount of conserved intergenic sequence, much of which is presumably cis-regulatory (1–3). Moreover, in most sequenced genomes, a correspondingly large proportion of genes appear to encode transcription factors (TFs), typically 3–6% of all genes (4, 5). Even in yeast, a relatively well studied organism, physiological functions and/or DNA-binding sites remain unknown for roughly half of all apparent sequence-specific DNA-binding TFs (4, 6), suggesting that there are many more transcriptional regulatory pathways than are currently known.
Several strategies have been devised to decipher regulatory codes, but none is without caveats. Algorithms that seek conserved promoter elements (1, 2) or common sequence elements in promoters of coexpressed genes (7, 8) can identify potential cis-regulatory sequences, but do not inherently identify the binding TF. Microarray-based biochemical approaches promise to rapidly identify sequence preferences of individual TFs, but additional influences apparently contribute to site occupancy in vivo (9, 10). ChIP-chip (4, 11, 12) identifies sequences bound by a TF in vivo, but positive results often depend on identifying conditions under which the TF is DNA-bound; moreover, bound sites may not be active (13).
Artificial activation of TFs by genetic modification is a promising experimental strategy for demonstrating functionality of TFs in vivo without knowing the natural condition under which the TF acts. Devaux et al. (14) showed a nearly perfect correspondence between the target genes activated by a well studied gain-of-function mutation in PDR1 (PDR1-3), and those activated by an inducible fusion protein consisting of the Pdr1p DNA-binding domain (DBD) and the Gal4p activation domain. Other studies have examined the effects of overexpressing native TFs (15–17). However, to our knowledge, this general approach has not yet been tested on a large scale to ask whether it is generally effective in specifically activating primary targets of TFs, or whether there is any way to determine which TFs are likely to be amenable to this type of experimentation.
In a systematic genetic screen using an ordered clone set overexpressing full-length ORFs from the GAL promoter (18) we found that 57 of 175 yeast TFs tested (32.6%) caused growth inhibition when overexpressed. This number is more than twice as many as would be expected by chance: over the entire genome, we found that only 769 of 5,280 genes (14.6%) caused growth inhibition, and in fact TFs are among the functional classes that are most toxic when overexpressed (18). This finding suggested that in many cases a TF might be activated by simple overexpression, even if the TF is not normally active under the specific growth condition used. To ask whether this is the case, and, if so, whether the resulting transcription profiles reflected known or apparent physiological functions of the TFs, we have now analyzed these TF overexpression strains by using DNA microarrays. Here, we show that in many cases the induced genes correspond to physiological functions and known targets and that expected binding sites of the TF can usually be identified in the promoters of these genes. Markedly fewer expression changes were observed in deletion mutants of this same collection of TFs, consistent with the view that specific regulatory events or conditions are prerequisites for activation of many TFs. We demonstrate that the basic helix–loop–helix family member Hms1p (19) binds in vitro to a cis-regulatory sequence predicted from the overexpression data and that overexpression of two of the apparent target genes causes the same pseudohyphal growth phenotype displayed by cells overexpressing HMS1. Together, these results suggest that analysis of gene expression in organisms in which TF overexpression causes a visible phenotype, a phenomenon we term “phenotypic activation,” represents a straightforward approach for rapidly characterizing TFs and mapping regulatory networks on a large scale.

Results

Overexpression of TFs Results in Diverse and Dramatic Transcriptional Responses.

Our previous analysis (18) identified 57 TFs that caused growth inhibition when overexpressed. An initial two-color microarray expression analysis of one of these, GAL-GCN4 (compared with the empty vector control; Fig. 4, which is published as supporting information on the PNAS web site), showed that many of the induced genes were known physiological targets of Gcn4p (20) and that virtually all of the catalogued Gcn4p targets were induced (see below). Gcn4p is a well characterized example of a TF whose deletion is phenotypically benign except in specific circumstances; nearly all genes encoding amino acid biosynthetic enzymes are induced by Gcn4p in amino acid-deprived cells (21). Our observation that overexpression of GCN4 was sufficient to induce a physiologically relevant response suggests that simple mass action may produce a relatively “natural” hyperactivation state and that the growth inhibition may be caused by inappropriate induction of the biosynthetic pathways controlled by Gcn4p. Moreover, Gcn4p DNA-binding sequences were enriched specifically among genes with the highest ratios (Fig. 5, which is published as supporting information on the PNAS web site) and the known Gcn4p DNA-binding site was perfectly recapitulated by seeking sequence motifs that correlated with the degree of gene induction (see below).
To ask whether Gcn4p is an exception, and whether overexpressing different TFs resulted in different transcriptional responses, we analyzed a total of 55 TF overexpression strains by using microarrays (the mating-type determinants MATα2 and HMRa2 were omitted) and corresponding deletion mutants of 51 nonessential TFs for comparison (grown under a single standard condition). A fluor-reversal strategy was used in which mRNA from each strain was compared with mRNA from an empty-vector control (in the case of overexpression strains) or WT strain (in the case of deletion mutants) twice, with the red/green fluors reversed in the replicate. Each strain was examined at a single 3-h time point, as a time course of GCN4 induction indicated that little information is gained from taking additional time points (Fig. 6, which is published as supporting information on the PNAS web site, and data not shown).
To isolate experiments in which the transcriptional alterations could not be accounted for by measurement noise or effects caused by slow growth, we identified those in which (i) the replicates had a Pearson correlation >0.3, which typically separates physiologically unrelated experiments (22), or (ii) the fluor-reversal experiments have reciprocal best-matching correlations among all dye swaps and vice versa. Forty-six TF overexpression experiments passed at least one of these criteria, indicating that the vast majority of TF overexpression microarray data contained distinctive and prominent patterns. This finding is illustrated in Fig. 1A, which shows all genes induced in any of the 46 overexpression experiments. In contrast, only 10 of the TF deletion microarray experiments passed these criteria (all of which were TFs represented among the 46 passing overexpressors) largely because there were few expression changes in these mutants beyond measurement noise, such that few experiments contained a distinctive pattern. Fig. 1B illustrates that there is less expression change in deletion mutants versus overexpressed TFs and also suggests that there is little correspondence between the genes induced upon overexpression and those whose expression is reduced in the deletion mutant (Fig. 1 A and B). Thus, it is possible that many TFs are inactive under typical unstressed growth conditions, which could account for the fact that it has been difficult to obtain meaningful ChIP-chip data for roughly half of all apparent yeast TFs (4).
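The two replicate-quality criteria above can be sketched as follows. This is a minimal illustration, not the code used in this study: the function names, the dictionary layout, and the assumption that the reversed-fluor ratios have already been sign-flipped so that induction is positive in both replicates are all hypothetical, and criterion (ii) is read here as "each replicate is the other's best-correlated partner among all dye swaps."

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equal-length sequences of numbers."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant profile carries no correlation information
    return sxy / sqrt(sxx * syy)

def passing_experiments(forward, reverse, threshold=0.3):
    """Return the TFs whose replicate pair passes criterion (i) or (ii).

    forward, reverse: dicts mapping TF name -> list of log ratios in the same
    gene order (hypothetical layout; reverse ratios assumed sign-corrected).
    """
    passed = []
    for tf in forward:
        # Criterion (i): the two replicates correlate above the threshold.
        crit_i = pearson(forward[tf], reverse[tf]) > threshold
        # Criterion (ii): each replicate is the other's best match.
        best_rev = max(reverse, key=lambda t: pearson(forward[tf], reverse[t]))
        best_fwd = max(forward, key=lambda t: pearson(forward[t], reverse[tf]))
        crit_ii = best_rev == tf and best_fwd == tf
        if crit_i or crit_ii:
            passed.append(tf)
    return passed
```

An experiment is retained if either criterion holds, which is why a weakly correlated pair can still pass when its two replicates clearly identify each other.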
Fig. 1.
Microarray expression data resulting from overexpression and/or deletion of 57 TFs that cause growth inhibition when overexpressed. Only TFs that contain expression profiles significantly above microarray noise when overproduced are shown. The diagram shows all 5,222 genes represented on the array after removal of dubious ORFs, transposable elements, mitochondria-encoded genes, and bad spots on the array. (A) Overexpression experiments. z-score-transformed data are shown (see Methods). Genes are ordered such that those with the greatest level of induction when a given TF is overexpressed are grouped, and then TFs are ordered according to the number of genes meeting this criterion. The color scale reflects z-score, which reflects noise-corrected log(ratio) (see Methods) and extends from ≈3-fold induction (red) to ≈3-fold reduction (green). (B) Microarray expression data (z-score transformed) of the corresponding deletion mutants. Rows and columns are in the same order as A, except that four essential TFs are missing. (C) Induction of specific functional classes of genes in response to TF overexpression. The columns are in the same order as A. Induction was scored with the WMW P value (see Methods).

Overexpression of TFs Induces or Represses Known Targets and Pathways.

Three lines of evidence indicate that genes induced in these experiments are likely to be physiological targets. First, most of the TF overexpression experiments displayed specific and significant induction of genes in one or more Gene Ontology categories, using the Wilcoxon–Mann–Whitney (WMW) test, which calculates a P value (Fig. 1C) for differences in the median expression ratio ranks between genes that are in a given category and those that are not. In many cases, the significant categories were related to the known specific functions of the TF. For example, whereas amino acid biosynthesis categories were induced by overexpression of GCN4, overexpression of UPC2 or ECM22 (23) resulted in a general induction of genes in the ergosterol biosynthetic pathway (Fig. 1C). We obtained similar results for known repressors (e.g., ROX1), which are much fewer in number in our study (data not shown). These trends were readily distinguished even when the experiments also contained common transcriptional alterations characteristic of growth inhibition such as induction of stress-response genes and reduction of protein biosynthesis genes; these are visible as horizontal red and green bands in Fig. 1A.
Second, among the transcriptional activators and repressors we analyzed, and for which known target genes are present in TRANSFAC (24), we generally observed induction or repression of appropriate targets. Fig. 2 shows a comparison of WMW P values obtained for TRANSFAC targets for our overexpression data and “ChIP-chip” experiments done with these same TFs (4). As above, these tests measure how well the known targets are sorted to the top of the ranked list of genes. In most cases, overexpression yielded more significant discrimination of known targets than ChIP-chip by this test. For example, the three known Adr1p targets (ACS1, CTA1, and ADH2) have significantly higher ranks among induced genes in our data (7, 13, and 15 of 5,222), in comparison to their ranks in ChIP-chip data (1,359, 2,510, and 3,148 of 6,229). Cases where ChIP-chip yields greater significance may represent instances where overexpression does not result in induction of physiological targets; Ino2p is likely such an example. However, others may involve sampling artifacts: Met4p has only two targets in TRANSFAC, and only one of them (MET16) is present in our final data set (where it is ranked 450 of 5,222).
Fig. 2.
Behavior of known TF targets in response to overexpression or deletion of the TF and compared with a similar analysis of genomewide ChIP-chip data from Harbison et al. (4). Each point indicates, for the TF indicated, the WMW P value (see Methods) for the difference of medians between the ranked TRANSFAC targets and those of all other ORFs; i.e., a point with a higher −log(P) value indicates that the median of TRANSFAC targets is shifted higher toward the top of the ranked list of genes. For our data, the z-scores are ranked; for Harbison et al. data, the P values are ranked. Only TFs with P < 0.05 in either Harbison et al. or this study are shown.
Third, among the 25 TFs in our experiments with well known DNA-binding specificities, in 15 cases the established sites with at least a 75% match (i.e., 75% of the bases in the known motif were present in the found motif, without gaps) were identified in de novo motif searches, often as the top-scoring motif (Fig. 3). We initially ran a Gibbs sampling program (BioProspector) (25) on the highest 10, 30, and 50 scoring genes in each experiment; however, these analyses were often confounded by stress response elements (CCCCT) appearing in many of the induced genes, presumably as a secondary effect. We therefore developed a probabilistic inference algorithm called RankMotif (see Methods) that seeks both a motif specific to the individual experiment and a second motif that pervades multiple experiments. In addition to identifying known motifs, RankMotif generated high-scoring predicted binding sites for several TFs without established binding specificities. The full results are available on request. Fig. 3A shows the nine top-scoring transcriptional activators for which a binding specificity is known; in eight cases, we obtained at least a partial match (underlined in purple). Fig. 3A also shows motifs predicted for nine TFs for which there is no established binding specificity but for which the RankMotif z-score is comparable to the nine known activators shown.
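The 75% ungapped match criterion above can be illustrated with a short sketch. It treats both motifs as plain base strings and slides the known motif along the found motif without gaps; the function name is hypothetical, and handling of degenerate positions is deliberately left out, so this is an assumption-laden simplification rather than the scoring used for Fig. 3.

```python
def best_ungapped_match(known, found):
    """Largest fraction of bases in `known` that agree with `found`
    over all ungapped alignments (offsets) of the two strings."""
    best = 0
    for offset in range(-len(known) + 1, len(found)):
        matches = sum(
            1
            for i, base in enumerate(known)
            if 0 <= offset + i < len(found) and found[offset + i] == base
        )
        best = max(best, matches)
    return best / len(known)

def is_match(known, found, threshold=0.75):
    """True when at least `threshold` of the known motif is recovered."""
    return best_ungapped_match(known, found) >= threshold
```

For example, a found motif that contains the whole known site at some offset scores 1.0, whereas a four-base site with one mismatch scores exactly 0.75 and still counts as a match.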
Fig. 3.
Promoter analysis of differentially regulated genes in response to TF overexpression. (A) Motifs identified by RankMotif compared with known DNA-binding motifs for overexpressed TFs. Binding sites are displayed as logos in which the height of each letter is proportional to its weight in determining the motif. The purple underlined portion indicates bases consistent with the known binding site. The likelihood of the known motif matching the RankMotif consensus is given (formula and code are available on request). The orange underlined portion of the HMS1 motif shows the six bases that match the gel-shifted segment in B. (B) Gel mobility-shift assays. The purified DBDs of Gcn4p, Upc2p, and Hms1p TFs were incubated with oligonucleotides containing two tandem copies of the motif sequence predicted by RankMotif. The same amount of purified MBP-TF DBD was used for each oligonucleotide in the binding reaction.
The fact that known TF targets, expected functional categories, and known binding sites can be readily identified in these data indicates that there is a strong tendency for TF overexpression to cause meaningful transcriptional alterations. Although we cannot assume that all of the genes induced by overexpression of a TF are primary physiological targets (they might encompass both physiological and nonphysiological secondary effects and nonphysiological primary targets that are induced by overexpression) we reasoned that these data should facilitate identification of TF functions, target genes, and DNA-binding sites.

HMS1 Overexpression Induces Pheromone-Responsive and Metabolic Genes, and Hms1p Binds 5′-TCACGCAA.

Figs. 1 and 3A contain undiscovered functions, targets, and binding sites for a variety of yeast TFs. Among the poorly characterized TFs for which overexpression yielded both induction of significant Gene Ontology categories and a predicted binding motif, and for which previous ChIP-chip experiments produced no readily interpretable results (4), was HMS1 (high copy Mep suppressor). HMS1 encodes a basic helix–loop–helix protein implicated in pseudohyphal growth because ectopic expression promotes filamentation and suppresses the pseudohyphal defect of the high-affinity ammonium permease-deficient Δmep2mep2 strain (19). However, the precise physiological role of Hms1p remains obscure: there are no known target genes or pathways of transcriptional activation by Hms1p, and no Hms1p DNA-binding sites have been identified either in vivo or in vitro. In our microarray data, HMS1 overexpression induced some of the same genes induced by STE12 in response to pheromone (17) and genes in a variety of metabolic pathways (Fig. 7A, which is published as supporting information on the PNAS web site), either of which could provide a potential mechanism for its morphological effect: STE12 is required for pseudohyphal growth (26) and nutritional cues stimulate filamentous growth (27).
Our data also led to a predicted binding consensus for Hms1p. We performed gel-shift assays with purified Hms1p DBD on specific sequences corresponding to some of the top-scoring degenerate motifs identified by RankMotif (Fig. 3A) and detected strong binding to 5′-TCACGCAA (Fig. 3B), which overlaps six of the bases shown in Fig. 3A. Binding of Hms1p-DBD to the 5′-TCACGCAA motif is specific, as we observed no binding of Hms1p-DBD to the consensus motifs of Gcn4p and Upc2p (Fig. 3B) nor to other sequences tested, including some other variants of the consensus (data not shown and Fig. 8, which is published as supporting information on the PNAS web site). We then examined whether genes that are up-regulated in response to HMS1 overexpression and contain exactly the 5′-TCACGCAA motif have a role in promoting pseudohyphal growth in a WT Σ1278 strain. We found that overexpression of either URA10 or YPC1, which encode an orotate phosphoribosyltransferase and alkaline ceramidase, respectively (28, 29), promotes pseudohyphal growth and suppresses the pseudohyphal defect of the Δmep2mep2 diploid strain (Fig. 7B), although neither URA10 nor YPC1 is by itself required for the HMS1 hyperfilamentation phenotype (Fig. 7C). Intriguingly, there is evidence that both uracil biosynthesis and sphingolipid content impact pathogenesis and/or filamentation in pathogenic yeasts (30–33).

Discussion

Our results show that phenotypic activation of TFs is feasible as a general approach to identifying TF activities, targets, and binding sites. Although further experimentation of individual cases will be required to conclusively distinguish all primary and secondary effects, the simple transient overexpression applied here yielded unique and meaningful results for the majority of TFs analyzed and these could be interpreted by objective statistical and machine learning techniques. Importantly, this approach appears to be much more fruitful than analysis of deletion mutants, possibly because most TFs are not active under typical growth conditions. Moreover, our results with Hms1p and other TFs (Fig. 3B) indicate that the approach also appears to be able to identify TF functions and targets not easily accessible by either phylogenetic footprinting or ChIP-chip. We note that overexpression is only one type of artificial activation; other groups have fused TF DBDs to constitutive activation domains (14, 34). However, our results indicate that in many cases overexpression of the native protein, which may contain domains besides the DBD that are required for proper physiological function, will suffice for phenotypic activation.
The fact that the genes induced upon overexpression of TFs tend to include the bona fide targets argues that TF occupancy can be an important factor in the rate of transcription of many genes, because the simplest explanation is that overexpression increases occupancy by mass action. The observation that overexpression of TFs often causes growth inhibition suggests that cells are sensitive to aberrant activation of a variety of different pathways, and/or that there are signals that sense inappropriate pathway activation and reduce division rate. Consistent with this idea, our original study (18) also identified many signaling molecules that cause growth inhibition when overexpressed, presumably because they activate their targets similarly, in an unregulated manner. Notably, when we generated microarray profiles of 23 well characterized TF overexpression strains that did not exhibit reduced fitness when overexpressed, a similar analysis to that shown here indicated that all were inactive (data not shown). These results indicate that overproduction of these TFs is not sufficient for their activation, although it remains possible that some of the TF-fusion proteins are nonfunctional. However, our simple initial phenotypic screen was sufficient to identify these constructs as unlikely to be worth pursuing.
Importantly, the general phenotypic activation approach described here, an initial screen for a visible phenotype upon TF overexpression, coupled to subsequent microarray analysis and a battery of statistical analyses, could be applied systematically in any organism for which an inducible transgene can be introduced, using commercially available custom oligonucleotide microarrays (35). There are already numerous instances in organisms ranging from microbes to mammals in which overexpression of TFs results in morphological abnormalities (36–38). It will be intriguing to determine whether expression profiling in these samples reveals induction of specific pathways whose genes contain binding sites for the TF in question. It will be equally fascinating to explore cases where pathways are also induced or repressed that do not appear to be direct targets of the TF, because such cases may result from transcriptional cascades. HMS1, for example, appears to positively regulate genes involved in several diverse pathways, including several that have dedicated TFs, and do not appear to contain Hms1p-binding sites in their promoters (Fig. 7C). In such situations, cause-and-effect relationships can often be determined by using epistasis analysis, a traditional genetic approach to mapping pathways that has itself been shown to be amenable to a microarray readout (17, 39).

Methods

Microarray Experiments.

Strains carrying 2μ plasmids that contain TFs regulated by the GAL1 promoter were derived from a yeast overexpression array (18). For microarray experiments, the TF overexpression and empty vector control strains were grown concurrently in selective medium supplemented with 2% raffinose before induction with 2% galactose for 3 h, whereas TF deletion mutants and the isogenic WT strain were grown in synthetic medium supplemented with 2% dextrose. Procedures detailing culturing, RNA preparation, hybridizations, image acquisitions, and data processing for microarrays are described in Grigull et al. (22).

Microarray Data Normalization.

Spatially detrended and Lowess-smoothed microarray data were obtained using protocols and microarrays as described (40). The output of this procedure is a normalized log ratio of intensity for each spot in the mutant strain versus the WT strain and the average log intensity for each spot in the two strains. The normalized log ratio itself is not a good measure of the significance of up- or down-regulation of the spot because the SD of the log ratios of unaffected spots decreases as a function of the average spot intensity. We transformed the log ratios into intensity-independent measures of significance of regulation, by calculating a z-score for each log ratio by dividing it by a robust estimate of the SD of unaffected spots (on the same array) with similar average intensities. Specifically, for each spot i with a log ratio of r_i, its z-score is z_i = (r_i − m_i)/s_i, where m_i and s_i are the median and median absolute deviation, respectively, of the log ratio of all spots with average log intensities within 0.25 log units of spot i. These z-scores typically correspond to five times the log2(ratio). Microarray data before and after normalization and transformation will be available at the National Center for Biotechnology Information GEO database.
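The intensity-dependent z-score transformation described above can be written compactly. The following is a minimal sketch under stated assumptions: the function name and argument layout are hypothetical, the neighbour search is a simple quadratic scan rather than an optimized one, and spots whose neighbourhood has zero median absolute deviation are assigned a z-score of 0 (a choice the text does not specify).

```python
from statistics import median

def intensity_dependent_z(log_ratios, log_intensities, window=0.25):
    """Compute z_i = (r_i - m_i) / s_i for every spot on one array.

    m_i and s_i are the median and median absolute deviation of the log
    ratios of all spots whose average log intensity lies within `window`
    log units of spot i.
    """
    z = []
    for r_i, a_i in zip(log_ratios, log_intensities):
        # Spots with similar average intensity (includes spot i itself).
        nbrs = [r for r, a in zip(log_ratios, log_intensities)
                if abs(a - a_i) <= window]
        m_i = median(nbrs)
        s_i = median([abs(r - m_i) for r in nbrs])
        # Zero spread is treated as "no evidence of regulation" (assumption).
        z.append((r_i - m_i) / s_i if s_i > 0 else 0.0)
    return z
```

Because both the centre and the scale are medians, a small number of strongly regulated spots in an intensity window has little effect on the z-scores of the unaffected spots around them.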

WMW Tests for TF Target Enrichment.

Lists of yeast TF targets were downloaded from TRANSFAC (24). In total, binding data from Harbison et al. (4) and overexpression data from this study were available for 25 TFs in the TRANSFAC list. For each TF, we compared the log ratios of the TRANSFAC targets versus the nontargets in the overexpression assay with a two-sided WMW test. We also compared the Harbison et al. binding P values of TRANSFAC targets versus nontargets by using a one-sided WMW test. For some TFs, Harbison et al. provide binding data for the TF under multiple growth conditions; in those cases, we assigned the TF the lowest P value among all of the conditions and then multiplied the P value by the number of conditions to correct for the multiple testing.
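For readers who wish to reproduce this type of comparison, a WMW test and the multiple-condition correction described above can be sketched as follows. The P value here uses the large-sample normal approximation without tie or continuity correction, which is an assumption of this sketch (the text does not state how P values were computed); all function names are hypothetical. The one-sided form tests whether targets rank higher than nontargets, so binding P values, where smaller means stronger evidence, would be passed as −log(P).

```python
from math import erfc, sqrt

def midranks(values):
    """Ranks starting at 1; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def wmw_test(targets, nontargets, two_sided=True):
    """Return (U, P); the one-sided P tests targets > nontargets."""
    n1, n2 = len(targets), len(nontargets)
    ranks = midranks(list(targets) + list(nontargets))
    u = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mean = n1 * n2 / 2
    sd = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean) / sd
    p_upper = 0.5 * erfc(z / sqrt(2))  # P(Z >= z) under the null
    if two_sided:
        return u, min(1.0, 2 * min(p_upper, 1 - p_upper))
    return u, p_upper

def corrected_min_p(p_values):
    """Lowest P across conditions times the number of conditions, capped at 1."""
    return min(1.0, min(p_values) * len(p_values))
```

With only a handful of known targets per TF, as is typical for TRANSFAC entries, the normal approximation is rough, and an exact test would be preferable in practice.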

WMW Tests for Gene Ontology Functional Enrichment.

Gene Ontology annotations (provided by the Saccharomyces Genome Database) were downloaded from www.geneontology.org on October 5, 2005. For each overexpression or deletion mutant, and for each Gene Ontology Biological Process (GO-BP) category containing >10 ORFs represented on our microarray, we used a two-sided WMW test to compare the z-scores of the ORFs annotated and unannotated in the given GO-BP category. We controlled for multiple testing by using the Benjamini–Hochberg procedure to calculate false discovery rate. In Fig. 1C, only the P values of significantly enriched TF/GO-BP pairs are shown; any pair with a false discovery rate > 0.01 is assigned a P value of 1 (i.e., appears as white).
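The Benjamini–Hochberg adjustment and the display rule described above (pairs with a false discovery rate above 0.01 are shown with a P value of 1) can be sketched as follows. Function names are hypothetical, and the sketch assumes that all TF/GO-BP P values are adjusted together as a single family, which the text does not state explicitly.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q[i] = running_min
    return q

def mask_for_display(p_values, fdr=0.01):
    """Keep P values whose adjusted value is within the FDR cutoff; set others to 1."""
    q = benjamini_hochberg(p_values)
    return [p if qi <= fdr else 1.0 for p, qi in zip(p_values, q)]
```

The masking step changes only what is displayed; the underlying P values are retained for any pair that passes the cutoff.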

Extraction of Yeast Promoters.

Intergenic sequences were downloaded from the Saccharomyces Genome Database on October 13, 2005 (ftp://genome-ftp.stanford.edu/pub/yeast/sequence/genomic_sequence/intergenic). Promoters were defined as the intergenic sequence spanning the region immediately upstream of the start position of a given ORF to the end position of the upstream neighboring ORF. ORFs annotated as “dubious” were omitted from analysis. A FASTA file of promoter sequences is available on request.
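The promoter definition above can be made concrete with a short sketch that works from ORF coordinates rather than from the pre-computed intergenic files. The coordinate convention (1-based, inclusive, start ≤ end), the data layout, and the function names are assumptions for illustration; ORFs with no upstream neighbour are extended to the chromosome end, and overlapping neighbours yield an empty promoter.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def revcomp(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]

def promoter(chrom_seq, orfs, name):
    """Intergenic sequence upstream of ORF `name`, read 5' to 3' on its strand.

    orfs: dict mapping ORF name -> (start, end, strand) on one chromosome,
    using 1-based inclusive coordinates with start <= end (hypothetical layout).
    """
    start, end, strand = orfs[name]
    others = [v for k, v in orfs.items() if k != name]
    if strand == "+":
        # Upstream neighbour: the closest ORF that ends before this ORF starts.
        left = max((e for s, e, _ in others if e < start), default=0)
        return chrom_seq[left:start - 1]
    # Minus strand: upstream lies at higher coordinates.
    right = min((s for s, e, _ in others if s > end), default=len(chrom_seq) + 1)
    return revcomp(chrom_seq[end:right - 1])
```

Dubious ORFs would be removed from `orfs` before calling `promoter`, so that they neither receive a promoter nor truncate the promoters of their neighbours.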

Motif Finding Using RankMotif.

RankMotif is a probabilistic inference algorithm that finds degenerate consensus sequences (taken as a motif model) that are overrepresented in the promoters of high-ranking ORFs in a ranked list. The input to RankMotif is a ranked list of ORFs and their associated promoter sequences. For a single TF, RankMotif searches for the highest-scoring degenerate consensus sequence. To model a stress response that is shared by multiple overexpression experiments, we introduced a shared motif model that is the same for all TFs. By incorporating the shared motif model, the score of an individual model is the maximum of the original score and the score calculated based on the sum of the ranks of all ORFs whose promoters contain either or both motifs. RankMotif iterates between updating the shared motif model given the current individual motif models (by modifying positions and shifting the alignment of the motif right and left by a single base), and updating the individual motif models given the current shared motif model. The search ends when the current state has a higher score than all possible updates. In the experiments described here, we also attempted to avoid some of the drawbacks of greedy search by also maintaining and updating a set of 19 suboptimal motif models for each TF and for the shared motif model. To allow for strand preference, we also ran RankMotif on three different sets of promoters for the ORFs consisting of the sense strands, antisense strands, and both. RankMotif was run for five iterations for each of these three promoter sets. We found the top specific and nonspecific motifs and scores for the three strand options. The individual motif models reported were those that had the highest score among the RankMotif output for the three promoter sets. Full technical details will be described elsewhere (Q.D.M., unpublished work).
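To make the idea of scoring a degenerate consensus against a ranked list concrete, the following sketch computes a rank-sum z-score for promoters that contain a consensus, and optionally a shared motif as well ("either or both"). It is not RankMotif: the scoring function, the IUPAC-to-regular-expression translation, the single-strand scan, and all names are assumptions chosen for illustration, and the search over candidate motifs is omitted entirely.

```python
import re
from math import sqrt

# IUPAC nucleotide codes expressed as regular-expression character classes.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "[AG]", "Y": "[CT]",
         "S": "[CG]", "W": "[AT]", "K": "[GT]", "M": "[AC]", "B": "[CGT]",
         "D": "[AGT]", "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]"}

def consensus_regex(consensus):
    """Compile a degenerate consensus such as 'TCACGYAA' into a regex."""
    return re.compile("".join(IUPAC[c] for c in consensus))

def rank_sum_z(ranked_promoters, *consensuses):
    """z-score that is positive when matching promoters sit near the top.

    ranked_promoters: promoter sequences ordered from most to least induced.
    A promoter counts as a hit if it contains any of the given consensuses.
    """
    patterns = [consensus_regex(c) for c in consensuses]
    n = len(ranked_promoters)
    hit_ranks = [rank for rank, seq in enumerate(ranked_promoters, start=1)
                 if any(p.search(seq) for p in patterns)]
    k = len(hit_ranks)
    if k == 0 or k == n:
        return 0.0  # no contrast between hits and non-hits
    expected = k * (n + 1) / 2
    sd = sqrt(k * (n - k) * (n + 1) / 12)
    return (expected - sum(hit_ranks)) / sd
```

Calling `rank_sum_z(promoters, "TCACGCAA", "CCCCT")` scores the union of an individual motif and a stress-response-like shared motif, which mirrors the role of the shared motif model in absorbing signal common to many experiments.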

Purification of DBD and Gel Mobility Shifts.

The DBDs and 10–15 flanking amino acids of Gcn4p (amino acids 206–281), Upc2p (amino acids 1–120), and Hms1p (amino acids 256–360) were PCR-amplified and fused at their N termini to the maltose-binding protein (MBP) by cloning into the pMAL-C2 vector. The fusion proteins were expressed in BL21 (DE3) cells and purified with amylose resin (NEB, Beverly, MA). The gel-shift probes consisted of two tandem copies of the 8-mer motif representing the TF binding site followed by 16 nt of nonyeast sequence common to all of the probes. Sequences were as follows: GCN4, 5′-ATGACTCAATGACTCACCTCGGCTGCAGGTAC-3′; UPC2, 5′-ATCGTTTAATCGTTTACCTCGGCTGCAGGTAC-3′; and HMS1, 5′-TCACGCAATCACGCAACCTCGGCTGCAGGTAC-3′. For the binding reaction, 0.1 pmol of 5′-32P-end-labeled probe and purified MBP-TF DBD was incubated with gel-shift reaction buffer (10 mM Hepes, pH 7.8/75 mM KCl/2.5 mM MgCl2/1 mM DTT/3% Ficoll) at room temperature in a 10-μl binding reaction. Final protein concentrations were: Gcn4p-DBD, 119 nM; Upc2p-DBD, 107 nM; and Hms1p-DBD, 129 nM. After 1 h, 3 μl of 20% Ficoll (Sigma, St. Louis, MO) was added, and the reaction was loaded onto a 5% nondenaturing acrylamide gel and then visualized with a PhosphorImager (Bio-Rad, Hercules, CA). The same amount of purified MBP-TF DBD was used for each probe in the binding reaction.
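As a consistency check on the probe design above, each probe can be rebuilt from its 8-mer and the common 16-nt tail, and the molar amount of protein per 10-μl reaction can be computed from the stated concentrations. The 8-mers and tail below are read directly from the probe sequences listed above; the function names are hypothetical.

```python
COMMON_TAIL = "CCTCGGCTGCAGGTAC"  # 16 nt of nonyeast sequence shared by all probes

# Probe sequences as listed in the text (5' to 3').
PROBES = {
    "GCN4": "ATGACTCAATGACTCACCTCGGCTGCAGGTAC",
    "UPC2": "ATCGTTTAATCGTTTACCTCGGCTGCAGGTAC",
    "HMS1": "TCACGCAATCACGCAACCTCGGCTGCAGGTAC",
}

def make_probe(motif, copies=2, tail=COMMON_TAIL):
    """Tandem copies of the motif followed by the common tail."""
    return motif * copies + tail

def pmol_in_reaction(conc_nM, volume_ul=10.0):
    """Amount of protein in picomoles: nM x microliters gives femtomoles."""
    return conc_nM * volume_ul / 1000.0
```

By this arithmetic, 129 nM Hms1p-DBD in 10 μl corresponds to about 1.29 pmol of protein, roughly a 13-fold molar excess over the 0.1 pmol of labeled probe.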

Data Availability

All microarray data (before and after z-score transformation), spreadsheets underlying the figures, lists of known TF targets, WMW scores for all functional categories in all experiments, a table of properties of the TFs, and algorithms for computing the significance of motif matches in Fig. 3A are available on request. Microarray data will be available at the National Center for Biotechnology Information GEO database.

Abbreviations

TF: transcription factor
WMW: Wilcoxon–Mann–Whitney
DBD: DNA-binding domain
MBP: maltose-binding protein

Acknowledgments

This work was supported by grants from the Natural Sciences and Engineering Research Council (to T.R.H., C.B., and B.J.F.) and funds from Genome Canada through the Ontario Genomics Institute (to T.R.H., C.B., and B.J.A.). G.C. was supported by a Charles H. Best Postdoctoral Fellowship. Q.D.M. was supported by a Natural Sciences and Engineering Research Council postdoctoral fellowship. R.S. was supported by a National Cancer Institute of Canada Terry Fox Foundation research studentship.

Supporting Information

Adobe PDF - 05140Fig4.pdf
Adobe PDF - 05140Fig5.pdf
Adobe PDF - 05140Fig6.pdf
Adobe PDF - 05140Fig7.pdf
Adobe PDF - 05140Fig8.pdf

References

1. M. Kellis, N. Patterson, M. Endrizzi, B. Birren, E. S. Lander Nature 423, 241–254 (2003).
2. P. Cliften, P. Sudarsanam, A. Desikan, L. Fulton, B. Fulton, J. Majors, R. Waterston, B. A. Cohen, M. Johnston Science 301, 71–76 (2003).
3. A. Siepel, G. Bejerano, J. S. Pedersen, A. S. Hinrichs, M. Hou, K. Rosenbloom, H. Clawson, J. Spieth, L. W. Hillier, S. Richards, et al. Genome Res. 15, 1034–1050 (2005).
4. C. T. Harbison, D. B. Gordon, T. I. Lee, N. J. Rinaldi, K. D. Macisaac, T. W. Danford, N. M. Hannett, J. B. Tagne, D. B. Reynolds, J. Yoo, et al. Nature 431, 99–104 (2004).
5. P. A. Gray, H. Fu, P. Luo, Q. Zhao, J. Yu, A. Ferrari, T. Tenzen, D. I. Yuk, E. F. Tsung, Z. Cai, et al. Science 306, 2255–2257 (2004).
6. G. Chua, M. D. Robinson, Q. Morris, T. R. Hughes Curr. Opin. Microbiol. 7, 638–646 (2004).
7. F. P. Roth, J. D. Hughes, P. W. Estep, G. M. Church Nat. Biotechnol. 16, 939–945 (1998).
8. S. Tavazoie, J. D. Hughes, M. J. Campbell, R. J. Cho, G. M. Church Nat. Genet. 22, 281–285 (1999).
9. S. Mukherjee, M. F. Berger, G. Jona, X. S. Wang, D. Muzzey, M. Snyder, R. A. Young, M. L. Bulyk Nat. Genet. 36, 1331–1339 (2004).
10. X. Liu, D. M. Noll, J. D. Lieb, N. D. Clarke Genome Res. 15, 421–427 (2005).
11. B. Ren, F. Robert, J. J. Wyrick, O. Aparicio, E. G. Jennings, I. Simon, J. Zeitlinger, J. Schreiber, N. Hannett, E. Kanin, et al. Science 290, 2306–2309 (2000).
12. V. R. Iyer, C. E. Horak, C. S. Scafe, D. Botstein, M. Snyder, P. O. Brown Nature 409, 533–538 (2001).
13. F. Gao, B. C. Foat, H. J. Bussemaker BMC Bioinformatics 5, 31 (2004).
14. F. Devaux, P. Marc, C. Bouchoux, T. Delaveau, I. Hikkel, M. C. Potier, C. Jacq EMBO Rep. 2, 493–498 (2001).
15. J. L. DeRisi, V. R. Iyer, P. O. Brown Science 278, 680–686 (1997).
16. H. D. Madhani, T. Galitski, E. S. Lander, G. R. Fink Proc. Natl. Acad. Sci. USA 96, 12530–12535 (1999).
17. C. J. Roberts, B. Nelson, M. J. Marton, R. Stoughton, M. R. Meyer, H. A. Bennett, Y. D. He, H. Dai, W. L. Walker, T. R. Hughes, et al. Science 287, 873–880 (2000).
18. R. Sopko, D. Huang, N. Preston, G. Chua, B. Papp, K. Kafadar, M. Snyder, S. G. Oliver, M. Cyert, T. R. Hughes, et al. Mol. Cell 21, 319–330 (2006).
19. M. C. Lorenz, J. Heitman Genetics 150, 1443–1457 (1998).
20. K. Natarajan, M. R. Meyer, B. M. Jackson, D. Slade, C. Roberts, A. G. Hinnebusch, M. J. Marton Mol. Cell. Biol. 21, 4347–4368 (2001).
21. A. G. Hinnebusch Annu. Rev. Microbiol. 59, 407–450 (2005).
22. J. Grigull, S. Mnaimneh, J. Pootoolal, M. D. Robinson, T. R. Hughes Mol. Cell. Biol. 24, 5534–5547 (2004).
23. A. Vik, J. Rine Mol. Cell. Biol. 21, 6395–6405 (2001).
24. V. Matys, E. Fricke, R. Geffers, E. Gossling, M. Haubrock, R. Hehl, K. Hornischer, D. Karas, A. E. Kel, O. V. Kel-Margoulis, et al. Nucleic Acids Res. 31, 374–378 (2003).
25. X. Liu, D. L. Brutlag, J. S. Liu Pac. Symp. Biocomput. 6, 127–138 (2001).
26. R. L. Roberts, G. R. Fink Genes Dev. 8, 2974–2985 (1994).
27. M. C. Lorenz, N. S. Cutler, J. Heitman Mol. Biol. Cell 11, 183–199 (2000).
28. J. de Montigny, L. Kern, J. C. Hubert, F. Lacroute Curr. Genet 17, 105–111 (1990).
29. C. Mao, R. Xu, A. Bielawska, L. M. Obeid J. Biol. Chem. 275, 6876–6884 (2000).
30. D. R. Kirsch, R. R. Whitney Infect. Immun. 59, 3297–3300 (1991).
31. A. Varma, J. C. Edman, K. J. Kwon-Chung Infect. Immun. 60, 1101–1108 (1992).
32. A. L. Goldstein, J. H. McCusker Genetics 159, 499–513 (2001).
33. T. Prasad, P. Saini, N. A. Gaur, R. A. Vishwakarma, L. A. Khan, Q. M. Haq, R. Prasad Antimicrob. Agents Chemother. 49, 3442–3452 (2005).
34. N. Webster, J. R. Jin, S. Green, M. Hollis, P. Chambon Cell 52, 169–178 (1988).
35. T. R. Hughes, M. Mao, A. R. Jones, J. Burchard, M. J. Marton, K. W. Shannon, S. M. Lefkowitz, M. Ziman, J. M. Schelter, M. R. Meyer, et al. Nat. Biotechnol. 19, 342–347 (2001).
36. M. K. Duncan, L. Xie, L. L. David, M. L. Robinson, J. R. Taube, W. Cui, L. W. Reneker Invest. Ophthalmol. Visual Sci. 45, 3589–3598 (2004).
37. D. W. Seufert, N. L. Prescott, H. M. El-Hodiri Dev. Dyn. 232, 313–324 (2005).
38. K. Hochedlinger, Y. Yamada, C. Beard, R. Jaenisch Cell 121, 465–477 (2005).
39. N. Van Driessche, J. Demsar, E. O. Booth, P. Hill, P. Juvan, B. Zupan, A. Kuspa, G. Shaulsky Nat. Genet. 37, 471–477 (2005).
40. S. Mnaimneh, A. P. Davierwala, J. Haynes, J. Moffat, W. T. Peng, W. Zhang, X. Yang, J. Pootoolal, G. Chua, A. Lopez, et al. Cell 118, 31–44 (2004).

Information & Authors

Information

Published in

Proceedings of the National Academy of Sciences
Vol. 103 | No. 32
August 8, 2006
PubMed: 16880382

Submission history

Received: March 14, 2006
Published online: August 8, 2006
Published in issue: August 8, 2006

Keywords

  1. microarray
  2. overexpression
  3. yeast

Authors

Affiliations

Gordon Chua
Banting and Best Department of Medical Research, University of Toronto, 160 College Street, Toronto, ON, Canada M5S 1A8
Quaid D. Morris
Banting and Best Department of Medical Research and Departments of Computer Science and Electrical and Computer Engineering, University of Toronto, 160 College Street, Toronto, ON, Canada M5S 1A8
Richelle Sopko
Department of Medical Genetics and Microbiology, University of Toronto, 160 College Street, Toronto, ON, Canada M5S 1A8
Mark D. Robinson
Banting and Best Department of Medical Research and Department of Electrical and Computer Engineering, University of Toronto, 160 College Street, Toronto, ON, Canada M5S 1A8
Owen Ryan
Department of Medical Genetics and Microbiology, University of Toronto, 160 College Street, Toronto, ON, Canada M5S 1A8
Esther T. Chan
Department of Medical Genetics and Microbiology, University of Toronto, 160 College Street, Toronto, ON, Canada M5S 1A8
Brendan J. Frey
Banting and Best Department of Medical Research and Department of Electrical and Computer Engineering, University of Toronto, 160 College Street, Toronto, ON, Canada M5S 1A8
Brenda J. Andrews
Banting and Best Department of Medical Research and Department of Medical Genetics and Microbiology, University of Toronto, 160 College Street, Toronto, ON, Canada M5S 1A8
Charles Boone
Banting and Best Department of Medical Research and Department of Medical Genetics and Microbiology, University of Toronto, 160 College Street, Toronto, ON, Canada M5S 1A8
Timothy R. Hughes (corresponding author)
Banting and Best Department of Medical Research and Department of Medical Genetics and Microbiology, University of Toronto, 160 College Street, Toronto, ON, Canada M5S 1A8

Notes

To whom correspondence should be addressed. E-mail: [email protected]
Communicated by Stanley Fields, University of Washington, Seattle, WA, June 19, 2006
Author contributions: G.C., R.S., B.J.A., C.B., and T.R.H. designed research; G.C., O.R., and T.R.H. performed research; G.C., Q.D.M., R.S., M.D.R., B.J.F., B.J.A., and T.R.H. contributed new reagents/analytic tools; G.C., Q.D.M., M.D.R., E.T.C., B.J.F., and T.R.H. analyzed data; and G.C., Q.D.M., M.D.R., and T.R.H. wrote the paper.

Competing Interests

Conflict of interest statement: No conflicts declared.

Identifying transcription factor functions and targets by phenotypic activation
Proceedings of the National Academy of Sciences, Vol. 103, No. 32, pp. 11815-12209