New Research In
Physical Sciences
Social Sciences
Featured Portals
Articles by Topic
Biological Sciences
Featured Portals
Articles by Topic
 Agricultural Sciences
 Anthropology
 Applied Biological Sciences
 Biochemistry
 Biophysics and Computational Biology
 Cell Biology
 Developmental Biology
 Ecology
 Environmental Sciences
 Evolution
 Genetics
 Immunology and Inflammation
 Medical Sciences
 Microbiology
 Neuroscience
 Pharmacology
 Physiology
 Plant Biology
 Population Biology
 Psychological and Cognitive Sciences
 Sustainability Science
 Systems Biology
ChIPonchip significance analysis reveals largescale binding and regulation by human transcription factor oncogenes

Edited by Barry H. Honig, Columbia University, New York, NY, and approved November 1, 2008

↵^{2}A.C., A.A.F., and G.S. contributed equally to this work. (received for review July 9, 2008)
Abstract
ChIPonchip has emerged as a powerful tool to dissect the complex network of regulatory interactions between transcription factors and their targets. However, most ChIPonchip analysis methods use conservative approaches aimed at minimizing falsepositive transcription factor targets. We present a model with improved sensitivity in detecting binding events from ChIPonchip data. Its application to human T cells, followed by extensive biochemical validation, reveals that 3 oncogenic transcription factors, NOTCH1, MYC, and HES1, bind to several thousand target gene promoters, up to an order of magnitude increase over conventional analysis methods. Gene expression profiling upon NOTCH1 inhibition shows broadscale functional regulation across the entire range of predicted target genes, establishing a closer link between occupancy and regulation. Finally, the increased sensitivity reveals a combinatorial regulatory program in which MYC cobinds to virtually all NOTCH1bound promoters. Overall, these results suggest an unappreciated complexity of transcriptional regulatory networks and highlight the fundamental importance of genomescale analysis to represent transcriptional programs.
The dysregulated activity of oncogenic transcription factors (TFs) contributes to neoplastic transformation by promoting aberrant expression of target genes involved in regulating cell homeostasis. Therefore, characterization of the regulatory networks controlled by these TFs is a critical objective in understanding the molecular mechanisms of cell transformation. ChIPonchip (ChIP^{2}) (1) has emerged as a promising technology in the dissection of transcriptional networks by providing highresolution maps of genomewide TF–chromatin interactions.
ChIP^{2} uses microarray technology to measure the relative abundance of genomic fragments derived from an immunoprecipitate (IP) sample, which is enriched in fragments bound by an immunoprecipitated protein (usually a TF), and a wholecell extract (WCE) sample, containing fragments derived from a total chromatin preparation (input control) or an immunoprecipitation with a nonspecific control antibody (2). The 2 samples may either be hybridized to different arrays or labeled with different dyes and hybridized to the same array. Correct interpretation of ChIP^{2} data depends critically on an accurate statistical model to compute the probability that a given IP/WCE ratio is produced by a binding event rather than experimental noise.
Recently, several elegant ChIP^{2} analysis methods have been proposed to tackle problems such as integrating measurements from adjacent probes (3–6) or inferring binding site locations at subprobe resolution (7). However, the lowerlevel problem of developing an accurate error model to define meaningful statistical thresholds has received comparably little attention [see SI and Fig. 1]. Thus, ChIP^{2} data analysis methods often use highly conservative approaches aimed at minimizing the rate of falsepositive predictions. Although several studies have experimentally validated novel target collections produced at a given statistical threshold (8–12), these studies likely miss a large number of true binding events, obscuring the full complexity of transcriptional processes.
Using an empirically determined model of the distribution of intensity ratios for nonIPenriched probes in ChIP^{2} experiments, we developed an analytical method called ChIP^{2} Significance Analysis (CSA). When applied to ChIP^{2} data from the NOTCH1, MYC, and HES1 protooncogenes in human T cell acute lymphoblastic leukemia (TALL) cells, CSA increased the number of detected binding sites by up to an order of magnitude compared with other routinely used methods. Both binding site analysis and biochemical validation demonstrate quantitative agreement with CSApredicted falsepositive rates. Analysis of gene expression signatures indicates functional regulation by NOTCH1 across the entire range of predicted targets. Finally, the increased sensitivity reveals that virtually all NOTCH1bound promoters are also bound by MYC. Overall, these results highlight the power of the proposed analysis framework for the identification of transcriptional networks and provide an improved and fundamentally different picture of the transcriptional programs controlled by NOTCH1, HES1, and MYC in TALL.
Results
Probe Statistics Are Accurately Modeled by CSA.
TALL is a malignant tumor characterized by the aberrant activation of oncogenic TFs (13). We recently demonstrated that constitutive activation of NOTCH1 signaling due to mutations in the NOTCH1 gene activates a transcriptional network that controls leukemic cell growth (11, 14–16). These studies also demonstrated a fundamental role for HES1 and MYC as transcriptional mediators of NOTCH1 signals (15, 17). To characterize the structure of the oncogenic transcriptional network driven by activated NOTCH1 in T cell transformation, we sought to identify the direct transcriptional targets of NOTCH1, HES1, and MYC. We hypothesized that the development of an accurate statistical model would result in improved sensitivity in the identification of TF targets and a more accurate description of the individual and combinatorial regulatory programs controlled by these TFs.
We first generated an empirical model of the distribution of IP/WCE intensity ratios for probes associated with unbound fragments (see Materials and Methods), and we used it to assign a P value to each probe in the analysis of ChIP^{2} assays representing replicate experiments for NOTCH1, MYC, and HES1. ChIP^{2} assays for these TFs were performed in HPBALL cells, a wellcharacterized TALL cell line with high expression levels of activated NOTCH1, MYC, and HES1. For NOTCH1, ChIP^{2} assays were also performed in CUTLL1 cells, another NOTCH1dependent TALL cell line. The magnitude versus amplitude plots (Fig. 2A) of the intensitydependent distributions of proberatio values showed marked differences for the four experiments. In each case CSA accurately modeled the left tail of the probe ratio probability distribution, where the contribution from bound probes is expected to be minimal (Fig. 2 A and B). We note that if boundprobe ratios are well separated from the experimental noise, the P value distribution for all probes should be uniform between zero and one (unbound probes) with a single peak near zero (bound probes). Importantly, CSA accurately captured these statistical properties (see SI).
Improved ChIP^{2} Sensitivity by CSA.
CSA then incorporates the probe significance model with an analytical method that integrates the statistics for replicate experiments and probes with nearby genomic locations (to account for ChIP^{2} fragmentation lengths, see Materials and Methods). We used CSA to compute the false discovery rate (FDR) associated with the most significant 500bp region on each of the 16,697 promoters represented on the array. Analysis of NOTCH1, MYC, and HES1 promoter occupancy in TALL showed a larger than anticipated number of candidate target genes for these TFs. Specifically, using CSA at a conservative FDR of 0.05, the number of promoters on the array bound by the TFs in this study are: MYC (8,016; 48.0%), NOTCH1 in CUTLL1 (3,154; 18.9%), HES1 (3,074; 18.4%), and NOTCH1 in HPBALL (2,471; 14.8%) (Table 1).
Although the numbers reported above are far larger than the number of predicted targets commonly reported in ChIP^{2} analysis studies, we also compared against predictions from several published analysis methods with available software. One class of methods relies heavily on analyzing the shape of ratio values from multiple probes with nearby genomic proximity (3–7). These methods are generally not applicable to the relatively sparse arrays used in this study and produced very few predicted targets. As an initial benchmark, we compared against the Single Array Error Model (SAEM) (1, 18), the standard method packaged with the Agilent analysis software. This method models the intensitydependent variance of probe ratio values, but then computes significance based on wholedataset statistics. CSA predicted approximately an order of magnitude more bound promoters than SAEM (Table 1). Two published methods, ChIPOTle (19) and Chipper (8), compute significance using only probes with low ratio values, and these methods indeed predicted more targets than SAEM. However, both methods use normalization techniques that, to varying extents, rely on wholedataset statistics, and as a result CSA predicted more targets than both. This was most apparent for MYC, which contained the largest number of high ratio probes. For the other TFs, CSA predicted roughly the same number of targets as ChIPOTle and twice as many targets as Chipper (Table 1).
Accuracy of CSA Predictions Is Supported by Binding Site Enrichment Analysis.
As a first test of the broad TF binding predictions generated by CSA, we evaluated the enrichment of MYC binding sites, using the TRANSFAC (20) positionspecific scoring matrix M00322, in the promoters of target genes identified by CSA and other analysis methods. The DNAbinding component of NOTCH1 transcriptional complexes, CSL, is not represented in TRANSFAC or JASPAR (21), and the only HES1associated matrix was found to be of low quality and a poor predictor of HES1 binding, independent of the algorithm used. For each analysis method, promoters were ranked by their P values computed from the MYC ChIP^{2} experiment, and MYC/M00322 matching sites were identified in the 600bp fixedlength window centered on the most significant probe in the highestscoring promoter region. The match threshold was set so that a negative set, S^{(−)}, of 3,000 fragments showing the least amount of MYC binding would produce a falsepositive rate of 30%. Details on the procedure are given in the SI.
Analysis of the cumulative proportion MYC/M00322matching fragments as a function of their ChIP^{2} ranking by the corresponding method showed that fragments inferred by all methods were enriched in MYC/M00322 sites and that site enrichment was correlated with the ChIP^{2} ranking (Fig. 3). However, fragments identified by CSA were more likely to contain MYC binding sites than those identified by the other methods. The highestscoring ≈2,000 sites identified by Chipper performed better than those identified by CSA; however, CSAidentified fragments beyond this ranking were significantly more enriched in MYC binding sites. Comparing nonoverlapping fragments in the top 5,000 promoters inferred by CSA versus each of the other 3 methods demonstrated a statistically significant enrichment of MYC binding sites in CSAinferred fragments (P < 10^{−10}, based on the hypergeometric distribution, for each comparison).
To compare the predicted falsepositive rate by CSA with the significance of MYC binding site enrichment, we binned the fragments based on their CSA rankings (100/bin) and assessed whether the MYC/M00322 motif could be successfully used to distinguish the fragments in each bin from those in the negative set, S^{(−)}. The classification P value based on binding site enrichment was in excellent agreement with the CSAinferred falsepositive rate of each bin, suggesting significant enrichment of MYC sites in the promoters of ≈7,000 genes, corresponding to the range of highconfidence targets predicted by CSA (Fig. 3 Inset). Beyond this threshold, both quantities degraded very rapidly, and for ranks greater than ≈8,000 the CSAinferred falsepositive rate reached 100% and, correspondingly, fragments showed no ability to be classified by MYC binding sites.
Overall, these results suggest that CSA has increased sensitivity in identifying a larger number of binding events and that meaningful statistical cutoffs can be determined from data.
Experimental Validation of CSA TF Binding Predictions.
To further test the accuracy of CSAbased TF target predictions, we performed independent chromatin immunoprecipitation (ChIP) experiments for each of the 4 ChIP^{2} conditions and tested the IP enrichment of specific promoters by quantitative PCR (qPCR). We first analyzed 8 predicted NOTCH1 targets in HPBALL cells, randomly sampled at an FDR ≤20%. Seven of these 8 predicted fragments were validated as bound by NOTCH1, and only the least significant fragment failed validation (Table 2).
We tested an additional 12 targets for HES1 and MYC in HPBALL and for NOTCH1 in CUTLL1, sampling predicted targets uniformly at an FDR of 20% (i.e., 20% expected falsepositives) (Table 2). In this analysis, 26 of 36 targets (72.2%) were positive by ChIP/qPCR, and 9 (25%) were negative. The remaining gene (the second least significant for MYC) could not be amplified by qPCR. Nonvalidated/falsepositive targets were, in general, at the end of the ranked lists (Table 2). The only outlier was the firstranked fragment for HES1 (KIAA1407 gene promoter). To obtain experimental evidence on the robustness of our validation assay, we randomly selected 10 genomic regions not identified as bound by MYC and 10 not identified as bound by HES1. Nine selected regions were within promoters, and 11 were in intergenic regions. As expected, none of these 20 regions showed evidence of binding by MYC or HES1 when tested by ChIP/qPCR.
For all experiments, numerous validated genes had CSA ranks in the thousands. The lowestranking validated genes before encountering a falsepositive were as follows: 2,223 for NOTCH1 in CUTLL1; 2,958 for NOTCH1 in HPBALL; 4,901 for MYC; and although the topranking gene for HES1 failed validation, the following 7, down to rank 3,247, were positive. Notably, many of the validated targets showed subtle ChIP^{2} signals. For example, C6orf82, a validated HES1 target, had ChIP^{2} binding ratios in replicate experiments of 1.37 and 1.68 for the most significant probe in its promoter, and there was no enrichment (ratios of 0.81 and 1.15) for its adjacent probe. However, upon ChIP/qPCR validation, this region showed binding ratios of 2.69 and 4.65. ChIP/qPCR results are available in the SI.
Overall, 33 of the 44 genes (75%) selected from those with an FDR of 20% by CSA were validated by ChIP/qPCR. These biochemical validation results support our computationally derived conclusions regarding the broad range of binding for all tested TFs and demonstrate the power of CSA for reducing the falsenegative rate in ChIP^{2} experiments.
NOTCH1 Regulates Direct Target Genes Predicted by CSA.
To test whether CSApredicted NOTCH1bound genes are also functionally regulated by this TF, we treated a panel of 10 TALL cell lines with Compound E, a γsecretase inhibitor that blocks an essential proteolytic cleavage step required for release of the intracellular domains of NOTCH1 from the membrane and their translocation to the nucleus (22). Genomewide expression profiles of cells treated for 72 h with Compound E (100 nM) or vehicle only (DMSO) were measured using microarrays, and expression changes were compared with NOTCH1 promoter occupancy identified by CSA analysis of the ChIP^{2} data. Overall, 11,606 genes were represented on both the ChIP^{2} and the expression arrays. For each gene we computed: (i) the ChIP^{2} FDR based on the highestscoring 500bp region in its promoter; (ii) the log_{2} expression ratio of the control versus treatment, averaged over the 10 cell lines and duplicate experiments; and (iii) the number of microarray experiments in which the gene was expressed (not called absent by MAS5), considering both Compound Etreated and DMSOtreated samples (because the group of expressed genes is essentially the same for both treatments, considering expressed genes using only one subgroup does not substantially change the results).
Predicted NOTCH1bound genes were more likely to be expressed than genes not identified as bound by NOTCH1 and showed clear downregulation upon NOTCH1 inhibition (Fig. 4). The 2,000 most confident NOTCH1 targets (FDR < 0.058) were expressed in 83.3% of experiments, whereas the 6,000 least confident NOTCH1 targets were expressed in 38.8% of experiments (P < 10^{−100}). The top 2,000 targets also showed coordinated downregulation upon NOTCH1 inhibition that was subtle in magnitude (mean = 12.2%) but extremely significant (P < 10^{−100}). The ChIP^{2} analysis predicted a rapid increase in falsepositives beyond the top 2,000 targets and, correspondingly, their likelihoods to be expressed and regulated by NOTCH1 decreased. However, even for genes with ChIP^{2} ranks between 4,000 and 5,000, there was significant enrichment for both the percent of expressed genes (59.4%; P < 10^{−33}) and the expression change upon NOTCH1 inhibition (P < 10^{−15}). These results demonstrate that, in contrast with previous analysis based on a limited number of targets (17), NOTCH1 directly contributes to the transcriptional activity of thousands of genes.
Interaction of NOTCH1 and MYC Regulatory Networks.
NOTCH1 and MYC operate as highly interrelated regulators of cell growth, proliferation, and survival during T cell development and transformation. In a recent study (11), we compared the regulatory networks controlled by NOTCH1 and MYC by using the ARACNE reverse engineering algorithm (23, 24) to predict 58 and 61 targets of NOTCH1 and MYC, respectively, and observed a significant overlap of 12 genes between the 2 lists (P < 10^{−51}). We went on to characterize a feedforward loop in which NOTCH1 directly regulates MYC, and both TFs regulate a common set of targets promoting leukemic cell growth. Based on these findings, we sought to further investigate the relationship between the genes bound by MYC and by NOTCH1 using the much larger list of targets inferred by CSA. Strikingly, the analysis predicted that MYC bound to 1,668 of the 1,804 (92.5%; P < 10^{−11}) genes that were bound by NOTCH1, using a ChIP^{2} FDR threshold of 0.01. In agreement with the fundamental role of NOTCH1 in controlling leukemia cell growth (11), the NOTCH1bound genes were highly enriched in gene ontology (GO) (25) categories related to cellular growth and metabolism, such as cellular metabolism (P < 10^{−41}), RNA metabolism (P < 10^{−24}), and protein biosynthesis (P < 10^{−9}). The complete output of the GO enrichment analysis is given in the SI.
Discussion
We have shown that the choice of a realistic statistical model can dramatically affect the result of ChIP^{2} data analysis and its biological interpretation and proposed the CSA algorithm to assign meaningful statistical significance scores used to predict a more complete range of TF–target interactions. The method of assessing probe statistical significance relies on minimal assumptions: that the null distribution is symmetric and that bound fragments do not significantly affect the left tail of the null hypothesis statistics. As a result, it should generalize well to ChIP^{2} experiments performed using other platforms and cellular conditions. We used an independence model for replicate experiments and adjacent probes in the null hypothesis. Although this assumption is valid for relatively sparse arrays, denser arrays may introduce correlation for unbound nearby probes that are within the DNA fragmentation length, leading to unrealistically low FDR values if independence is assumed. We therefore recommend caution that the independence assumption applies when analyzing denser arrays, in which case the CSA method may be further improved by incorporating existing, more sophisticated models for the integration of data from nearby probes (3–7). However, for the arrays used in this study, which contain an average probe spacing of more than 200 nucleotides, we show that our results are in quantitatively good agreement with biochemical validation assays and that no correction seems to be required.
The analysis of ChIP^{2} data from 3 oncogenic TFs reveals that CSA identifies far more bound gene promoters than standard analyses. Specifically, CSA predicts that each studied TF binds to several thousand target genes, with MYC binding to roughly half of the assayed promoters, providing additional insight into the extreme pluripotency of this protooncogene (26). These predictions might still be an underestimate, because only the proximal promoter regions (−0.8 kb to +0.2 kb, relative to transcription start site) are represented on the arrays used in this study.
CSA predictions were validated by 3 independent tests. ChIP/qPCR experiments are in excellent correspondence with CSAinferred FDRs, especially considering that ChIP/qPCR itself has a 10–20% falsenegative rate (9–12). Computational validation by sequence analysis further indicates that CSAinferred FDRs are in agreement with MYC binding site enrichments. Finally, gene expression analysis after NOTCH1 inhibition both provides further support for the CSA predictions and creates a stronger than expected association between bound and regulated genes. We found that NOTCH1 binds to a large number of promoters (>2,000) and that the set of corresponding genes is consistently, albeit weakly, regulated upon NOTCH1 inhibition. These results are highly consistent with a previous study performed in yeast (27) that also observed correspondence of ChIP^{2} results with both binding site enrichment and expression changes for a large number of genes.
GO enrichment analysis shows that NOTCH1 subtly regulates a large number of genes involved in the cellular growth machinery. These results add an additional layer of regulation to the effects of NOTCH1 signaling in promoting cell growth, with important implications for understanding the role of NOTCH1 signaling in development and transformation. Thus, in addition to the established role of NOTCH1 in promoting growth through its interaction with MYC (17) and the PI3KAKT (15) signaling pathway, NOTCH1 also has a direct effect in promoting cell growth. This irreversibly couples the developmental programs involved in stem cell homeostasis and lineage commitment activated upon NOTCH1 activation with the metabolic pathways needed for the expansion of stem cells and T cell progenitors.
Finally, the availability of a more complete repertoire of bound promoters allows us to truly assess the extent of a TF's regulatory program and the combinatorial overlap between independent programs. Our analysis shows that 92.5% of the promoters bound by NOTCH1 are also bound by MYC. Indeed, it appears that NOTCH1 coregulates a specific subset of the MYC regulatory program. Although this was previously hinted at by the similarity of the regulatory programs inferred for the 2 TFs by expression analysis (17), the true extent of this overlap can only be grasped after resolving a more complete map of NOTCH1 and MYC targets. While contributing to our understanding of transcriptional regulation at the genomescale, our findings suggest an even greater than expected complexity of transcriptional networks.
Materials and Methods
CSA Algorithm.
The CSA algorithm takes as input probe intensity measurements after background subtraction and correction for factors such as spatial position on the array and print tip variability. In this study we used the standard Agilent procedure to obtain backgroundcorrected probe intensity values, and we determined the statistical significance of binding regions by using the procedure described below.
SingleProbe Significance Analysis.
For each probe, the statistical significance of a measurement is inferred by computing the conditional probability of the magnitude (M) given the amplitude (A), P_{null}(MA), where M = log2(IP/WCE) and A = [log2(IP) + log2(WCE)]/2, under the null hypothesis (i.e., no enrichment in the IP compared with the WCE channel). Here, IP and WCE represent, respectively, the probe intensity measurements for the IP and WCE channels. The dependency of M on A is illustrated in Fig. 2A.
The method begins by estimating the joint probability distribution, P(M,A), using a bivariate Gaussian kernel density estimator (28). The kernel width of the estimator is calculated using the AMISE criterion (29). Conditioning on A yields the conditional distribution P(MA) = P(M,A)/P(A), where P(A) is calculated using a univariate Gaussian kernel. For a particular average intensity value, A_{0}, the conditional mean of the null distribution is inferred as
The conditional null distribution given A = A_{0} is inferred by projecting P(MA = A_{0}) across μ^_{MA0} for M < μ^_{MA0}. This procedure is used to calculate P_{null}(MA) for an evenly spaced grid of A and M values, excluding the 1% of probes with the lowest A values (which are assigned a P value of 1). In this work we used step sizes of 0.05 and 0.01 for the A and M values, respectively. The complete conditional null distribution, P_{null}(MA), is computed using 2dimensional linear interpolation. For each probe, statistical significance is assessed using a 1tailed test with reference to this distribution. Because the distribution is empirical, there is a limit to the inferable minimum P value, which depends on the number of arrayed probes. For the arrays used in this study, we set the minimum P value to 10^{−5}, which is roughly 1 divided by the number of probes on the array. We stress the importance of using an empirical distribution because we have observed that the empirical data generally display significantly nonGaussian tails.
Combining Replicates.
We use Fisher's method (30) to compute an aggregate P value for each probe based on measurements from replicate experiments, under the null hypothesis that the probe is unbound in all replicates. Let p_{i}^{j} denote the P value computed for the i^{th} probe in the j^{th} replicate experiment. Assuming that replicates are independent in the null hypothesis, a test statistic for evaluating the probability of a joint observation of P values across experiments is the product of the individual P values, s̄_{i} = Π_{j=1}^{M}p_{i}^{j}, where s̄_{i} is the test statistic and M is the number of replicate experiments. If modeled correctly, P values under the null hypothesis should be uniformly distributed (See SI). It is useful to logtransform this equation such that we evaluate −log(s̄_{i}) = −Σ_{j=1}^{M}log(p_{i}^{j}). Because the logarithm of a uniform distribution is exponentially distributed with mean 1, this equation is a sum of exponentially distributed random variables, which is a gammadistributed random variable with mean 1 and M degrees of freedom. Thus, significance can be evaluated as Γ_{CDF}^{M}(−Σ_{j=1}^{M}log(p_{i}^{j})), where Γ_{CDF}^{M} is the gamma cumulative distribution function with mean 1 and M degrees of freedom.
Combining Regions.
Because of sonication, the signal derived from a binding event may be detected by multiple probes in close genomic proximity to the binding site. To compute a combined statistic representing the probability of a binding event within the region spanned by multiple probes, we adapt a commonly used strategy (19) of using a fixedsize sliding window and integrating the values of probes falling within this window. Based on published measurements of fragmentation lengths (7), in this work we used a 500bp window and a step size of 150 bp. Assuming that measurements from adjacent probes are independent in the null hypothesis, Fisher's method can again be applied to integrate the values from nearby probes. That is, let W represent the set of probes falling within a given 500bp window. The integrated probability for this region is then calculated as To compute the probability that any region within a gene's promoter is bound, we consider the most significant window, controlling for multiple tests using Bonferroni correction based on the number of probes in the promoter. This correction is not exact, because the number of tests (i.e., the number of windows containing unique subsets of probes) is likely greater than the number of probes in a promoter, causing underestimation of the significance, and the tests are not independent (i.e., the same probe may fall within multiple windows), causing overestimation of the significance. However, because the number of probes in each promoter (and therefore the number of probes within each window) is relatively small, especially for the arrays used in this study, we expect this simplification to have little impact on the calculated statistics. For very dense arrays, a more sophisticated multipletest correction procedure, such as those described in (31), may yield more accurate results.
FDR Calculation.
After computing a corrected P value for each gene representing the probability that the most significant region on the gene's promoter is bound, we control for multiple tests across genes and compute a false discovery rate using the Benjamini–Hochberg procedure (32). Let p_{k} represent the corrected P value computed for gene k, let r_{k} represent the rank of gene k sorted by the ChIP^{2} P values, and let G represent the total number of genes on the array; then, the false discovery rate for gene k is computed as FDR_{k} = G*p_{k}/r_{k}.
Acknowledgments
A.A.M. was supported by an IBM PhD fellowship. This work was supported by National Cancer Institute Grants R01CA109755 (to A.C.) and R01CA120196 (to A.A.F.), National Institute of Allergy and Infectious Diseases Grant R01AI066116, the Alex Lemonade Stand Foundation (T.P.), The Cancer Research Institute, the WOLF Foundation, the National Centers for Biomedical Computing National Institutes of Health Roadmap Initiative (U54CA121852), and the Leukemia and Lymphoma Society (Grants 128708 and 623708). A.A.F. is a Leukemia and Lymphoma Scholar.
Footnotes
 ^{3}To whom correspondence may be addressed. Email: gustavo{at}us.ibm.com, califano{at}c2b2.columbia.edu, or af2196{at}columbia.edu

Author contributions: A.A.M., P.S., A.C., A.A.F., and G.S. designed research; A.A.M., T.P., and P.S. performed research; T.P. and A.A.F. contributed new reagents/analytic tools; A.A.M. and P.S. analyzed data; and A.A.M., T.P., P.S., A.C., A.A.F., and G.S. wrote the paper.

↵^{1}Present address: The Broad Institute of MIT and Harvard, 7 Cambridge Center, Cambridge, MA 02142.

The authors declare no conflict of interest.

This article is a PNAS Direct Submission.

Data deposition: The microarray data have been deposited in the Gene Expression Omnibus (GEO) Database, www.ncbi.nlm.nih.gov/geo (accession no. GSE12868). ChIP^{2} data is at http://wiki.c2b2.columbia.edu/califanolab/PNASAM2009/.

This article contains supporting information online at www.pnas.org/cgi/content/full/0806445106/DCSupplemental.

Freely available online through the PNAS open access option.
 © 2008 by The National Academy of Sciences of the USA
References
 ↵
 Ren B,
 et al.
 ↵
 ↵
 ↵
 ↵
 ↵
 ↵
 ↵
 ↵
 Lee TI,
 et al.
 ↵
 Polo JM,
 et al.
 ↵
 Palomero T,
 et al.
 ↵
 Odom DT,
 et al.
 ↵
 ↵
 ↵
 ↵
 Weng AP,
 et al.
 ↵
 Palomero T,
 et al.
 ↵
 ↵
 ↵
 Matys V,
 et al.
 ↵
 Sandelin A,
 et al.
 ↵
 Miele L
 ↵
 ↵
 ↵
 ↵
 ↵
 Tanay A
 ↵
 Beirlant J,
 Dudewicz E,
 Gyorfi L,
 van der Meulen E
 ↵
 Sheather SJ,
 Jones MC
 ↵
 ↵
 ↵
 Benjamini Y,
 Hochberg Y
Citation Manager Formats
Sign up for Article Alerts
Article Classifications
 Biological Sciences
 Genetics