Previous Article |
Table of Contents
| Next Article
GENETICS
Analysis of gene expression in pathophysiological states: Balancing false discovery and false negative rates


*Joslin Diabetes Center,
Children's Hospital, and Harvard Medical School, Boston, MA 02215
Contributed by C. Ronald Kahn, November 28, 2005
| Abstract |
|---|
|
|
|---|
Nucleotide-microarray technology, which allows the simultaneous measurement of the expression of tens of thousands of genes, has become an important tool in the study of disease. In disorders such as malignancy, gene expression often undergoes broad changes of sizable magnitude, whereas in many common multifactorial diseases, such as diabetes, obesity, and atherosclerosis, the changes in gene expression are modest. In the latter circumstance, it is therefore challenging to distinguish the truly changing from nonchanging genes, especially because statistical significance must be considered in the context of multiple hypothesis testing. Here, we present a balanced probability analysis (BPA), which provides the biologist with an approach to interpret results in the context of the total number of genes truly differentially expressed and false discovery and false negative rates for the list of genes reaching any significance threshold. In situations where the changes are of modest magnitude, sole consideration of the false discovery rate can result in poor power to detect genes truly differentially expressed. Concomitant analysis of the rate of truly differentially expressed genes not identified, i.e., the false negative rate, allows balancing of the two error rates and a more thorough insight into the data. To this end, we have developed a unique, model-based procedure for the estimation of false negative rates, which allows application of BPA to real data in which changes are modest.
metabolic disease | microarray analysis | multiple hypothesis testing | statistics
Many of the existing statistical methods for microarray analysis have been fashioned by using data sets from diseases such as cancer, where the changes in gene expression are abundant, with many genes having a high magnitude of change that far exceeds the observed variability in expression (6). Under these conditions, existing algorithms are able to detect numerous genes that survive statistical correction for the number of hypotheses tested. However, for many physiological and metabolic conditions, the changes in gene expression are often moderate compared to the variability in expression leading to modest p values, such that existing statistical models often miss most of the real changes (4, 5).
We therefore sought to develop an analytical approach that provides both the biologist and the bioinformatician with a more thorough understanding of the statistical significance of any list of genes produced from conditions where the real changes are of modest magnitude. Through the creation of a unique, model-based procedure that allows estimation of the false negative rate (FNR) and by adapting existing statistical methods, we have developed a balanced probability analysis that should be useful in addressing this challenging situation.
| Results |
|---|
|
|
|---|
As a first step toward a BPA, we assessed the capability to determine the three parameters of interest, TTP, FDR, and FNR, under real-life conditions. We used the modeling approach outlined in Methods to create synthetic data sets whereby we could track the distribution of the true positives in the significance list and, thus, determine the actual FDR and FNR. Simultaneously, we applied a "real-world" analysis by assuming no knowledge of any of the parameters or distributions used to create the synthetic data set. This approach allowed us to explore the capacities of algorithms to estimate the values of interest. By varying the conditions used to create the synthetic data sets, we studied the impact on algorithm performance of three parameters: the magnitude of the fold changes, the percentage of genes affected, and the number of experimental replicates.
Estimation of the Total Number of True Positives. Only with perfect knowledge of the total number of true positives can the FDR be accurately determined (7). However, for real-life data, the total number of true positives cannot be directly observed but, rather, only estimated. By using an adaptation of the algorithm of Storey and Tibshirani (7) coupled with the modeling approach outlined above, we explored the ability to estimate the total number of true positives at varying percentages of genes actually altered in their expression. When a larger fraction of the genes in the sample was affected, the estimates tended to be fairly accurate and precise, even at a low number of replicates (Fig. 2A). However, when the fold change was modest (1.5-fold), with lower numbers of replicates (four to eight), the algorithm tended to underestimate the number of total changing genes. As fewer genes were affected, the estimates exhibited markedly increased variability (Fig. 2B) and sometimes resulted in negative estimates of the number of true positive genes. This behavior is not unexpected, and other algorithms for estimating the number of total truly changing genes also perform with increased relative variability when the effect and population affected becomes small (8).
|
|
|
We therefore created a unique procedure for estimating the FNR. Rather than depending on TTP, our procedure calculates the FNR directly from the data, by using resampling to estimate the null and alternate distributions for each gene. Comparison of these distributions allowed the estimation of the number of truly changing genes on any given "significance list," leading directly to calculation of the FNR. The procedure weakly depends on the estimated FDR, and requires one model-dependent step to optimize a single parameter,
. In simulation trials, we found that, for any given condition, a suitable
could be chosen to allow the resultant FNR estimate to closely approximate the actual FNR (Fig. 4 AD, compare red versus black curves). This result was true, even under conditions where TTP tends to be substantially misestimated (Fig. 4A). In general, the best fitting
took on low values in simulations containing strong effects or higher numbers of replicates (Fig. 4 B and D) and required higher values when the effect under study was small (Fig. 4 A and C).
Exemplary Analysis of Real Data. To demonstrate the insight allowed by BPA, we performed this analysis on two illustrative public data sets: one data set representing malignant disease (bone marrow from individuals with two types of acute lymphoblastic leukemia) and the other representing metabolic disease (skeletal muscle from morbidly obese versus normal-weight individuals). To visualize the differences in the magnitude of change between these two data sets, we created contour plots of the observed p values versus the absolute fold changes across all genes. As can be seen for malignancy (Fig. 5A), many genes have experienced large fold changes with p values shifted toward significance, whereas, in metabolic disease, only a few genes have fold changes >1.52 (Fig. 5B). As a further marker of the changes in gene expression experienced in these data sets, we calculated the median absolute fold changes of the top-1%-scoring genes for both models (Table 1). Note that the median fold change in malignant disease was almost as great as the largest fold change experienced in the metabolic disease.
|
|
|
Balancing False Discovery and False Negative Rates. Over the past few years, most microarray investigations have focused on only the FDR when trying to determine which genes in a nucleotide-array analysis have altered expression. This approach is parallel to solely considering an ordinary p value in the context of testing a single hypothesis. This approach is extremely valuable, because it allows the researcher an estimate of the chances that a gene on a significance list is there accidentally. In many contexts, this is a wise approach, although it neglects statistical power.
However, when the effect under study is modest, as in the case of metabolic disorders, or when the percentage of genes truly changing is small, as in perturbations involving only a single pathway, stringent attention solely to the FDR can mask the majority, and sometimes even all, of the truly changing genes. Under these conditions, discovery of the truly changing genes requires consideration of the FNR in addition to the FDR.
The notion of balancing false positive and false negative errors has long existed in the single-hypothesis-testing context (12, 13). Recently, these ideas have been extended to the multiple-hypothesis-testing scenario (10, 11, 14). For BPA, we used the simple, informative method of assigning separate penalties for false discoveries and false negatives (15). The total penalty is then calculated across all genes, and the point that minimizes the total penalty is taken as the best cutoff.
To illustrate this approach, we performed alternate analyses on the exemplary data sets emphasizing balanced (i) control of false positives, (ii) control of false negatives, or (iii) a more equal temperament, by setting the relative "penalties" for false positives and false negatives at 10:1, 1:10, or 1:1, respectively. The resultant curves (Fig. 5 E and F) show the total penalty as one chooses an increasingly larger portion of the genes. For each data set, the point that minimizes the error sum for each discovery curve is furthest left for the 10:1 weighting, furthest right for the 1:10 weighting, and in between for the equally weighted approach (1:1). The resultant expected number of false positives and false negatives for the optimal balance at each weighting condition are listed in Table 2.
| Discussion |
|---|
|
|
|---|
|
In this study, we provide an FNR-determining procedure that can be applied to real-world data. Our FNR approach is a microarray algorithm that uses resampling to define the alternate distribution for each gene, thus allowing estimation of statistical power. Further refinement and broader application of this technique to multiple hypothesis-testing can be envisioned. Although our procedure provided robust estimates of the FNR in challenging simulations that emulated real-world conditions, the limitation of our approach is that it requires the optimization of a single modeling-parameter
for each data set studied. Thus, the deduced FNR will depend on the suitability of the modeling assumptions, including the chosen data distribution and range of affected fold changes. For example, we used a Gaussian data distribution for our synthetic data sets, but expression data for a given gene may not assume this distribution. Although the algorithms used in our approach make no assumptions about normality or data distribution, it is possible that synthetic modeling choices will affect the determined FNR.
The small fold changes incurred in metabolic disease make discovery by other methodologies such as RT-PCR or Northern blotting likewise challenging (18). Thus, the traditional experimental paradigm of confirmation of expression changes found on arrays by these methods may not be an effective means of defining false positives and false negatives. Instead, a strong knowledge of the biology of the system (19) coupled with alternate means of testing the hypotheses generated by the data analysis may be the best approach for maximizing the value of the array analysis. Clearly, as nucleotide array analysis becomes even more widely applied to increasingly more subtle situations, such as defining the differences between two normal populations or the impact of mild environmental variables, further refinements in the algorithms to determine the TTP and FNR as well as complementary scientific approaches will be needed.
| Methods |
|---|
|
|
|---|
Model "control" data of any number of replicates can thus be created from the real data's means and standard deviations. Furthermore, replicates for model "experiment" data can likewise be created, altering a portion of the means by a given fold change. In our model simulations, we assumed that half of the changing genes were altered upward by the given fold change and half downward. The standard deviations for the modeled "experimental" data were obtained in parallel from related mice (knockout group) (20) under similar but nonidentical biological conditions.
Actual false discovery and false negative error rates were calculated for the synthetic control and experimental data as follows. All genes were sorted from most to least significant by using the standard t statistic, assuming unequal variance. For any significance list of top-scoring genes, the FDR was defined as the ratio of genes inappropriately called significant to the total number of genes called significant (22). Likewise, the FNR was defined as the number of truly differentially expressed genes missing from the significance list divided by the total number of differentially expressed genes (10, 11, 22).
Balanced Probability Analysis. We adapted established resampling algorithms for the estimation of TTP (7) and FDR (9). We also explored the customary method for computation of FNR: simply FNR = [TTP #SLx (1 FDR)]/TTP, where #SL is the total number of genes on the significance list.
A Unique Procedure for the Estimation of the False Negative Rate. Our approach is outlined here and detailed in the supporting information, which is published on the PNAS web site. As defined, FNR = "number of truly changing genes missing from the significance list/TTP." Because the TTP is invariant across all possible significance lists, the crux is to determine the number of truly changing genes present on any significance list. The contribution of any single gene to this number is simply the probability that the gene is truly changing times the statistical power to observe the gene on the significance list. Thus, from the former probability (estimated as described in supporting information) and the statistical power (estimated as outlined below) calculated for each gene on the significance list, the number of identified truly changing genes can be estimated. The relative growth of this estimate across increasingly larger significance lists then defines (1 FNR).
To estimate the statistical power to observe any given gene, we used resampling to approximate the distribution under the alternate hypothesis. Specifically, random samples were drawn from the experimental or control replicates, with replacement (a modified bootstrap approach). Class labels were maintained, such that only experimental replicates contributed to the experimental resamples, and only control replicates contributed to the control resamples. Thus, a series of such randomly drawn resamples will recapitulate the range and distribution of possible true effects predicted by the data. By comparing the alternate distribution thus derived to the null distribution determined by resampling by randomization of class labels, the statistical power can be estimated.
Computation. Statistical algorithms were hand-implemented in the C++ programming language, with control scripts created in PERL and BASH. Contour plots were created with the program GRI after density extraction and gridding by using hand-implement scripts. Error bars represent standard deviations (Figs. 2 and 4).
Exemplary Data Analyses. Public data exemplifying clonal and metabolic disease were obtained from the Entrez Gene Expression Omnibus website (www.ncbi.nlm.nih.gov/geo) and were used to explore BPA in the context of additional real-world data, including an example of metabolic disease (accession no. GDS268, skeletal muscle from morbidly obese individuals) and cancer (accession no. GDS760, acute lymphoblastic leukemia). Analysis used eight microarrays per experimental group, chosen randomly in the GDS760 case.
| Acknowledgements |
|---|
| Footnotes |
|---|
Abbreviations: BPA, balanced probability analysis; FDR, false discovery rate; FNR, false negative rate; TTP, number of total true positives.
Present address: Department of Pediatrics, Roy and Lucille Carver College of Medicine, University of Iowa, Iowa City, IA 52242-1109. E-mail: andrew-norris{at}uiowa.edu. ![]()
To whom correspondence should be addressed at: Joslin Diabetes Center, One Joslin Place, Boston, MA 02215. E-mail: c.ronald.kahn{at}joslin.harvard.edu.
© 2006 by The National Academy of Sciences of the USA
| References |
|---|
|
|
|---|
This article has been cited by other articles in HighWire Press-hosted journals:
![]() |
W. Rodenburg, A. G. Heidema, J. M. A. Boer, I. M. J. Bovee-Oudenhoven, E. J. M. Feskens, E. C. M. Mariman, and J. Keijer A framework to identify physiological responses in microarray-based gene expression studies: selection and interpretation of biologically relevant genes Physiol Genomics, March 10, 2008; 33(1): 78 - 90. [Abstract] [Full Text] [PDF] |
||||
![]() |
E. C. Kaizer, C. L. Glaser, D. Chaussabel, J. Banchereau, V. Pascual, and P. C. White Gene Expression in Peripheral Blood Mononuclear Cells from Children with Diabetes J. Clin. Endocrinol. Metab., September 1, 2007; 92(9): 3705 - 3711. [Abstract] [Full Text] [PDF] |
||||
![]() |
W. R. Foster, S.-J. Chen, Aiqing He, A. Truong, V. Bhaskaran, D. M. Nelson, D. M. Dambach, L. D. Lehman-Mckeeman, and B. D. Car Invited Review: A Retrospective Analysis of Toxicogenomics in the Safety Assessment of Drug Candidates Toxicol Pathol, August 1, 2007; 35(5): 621 - 635. [Abstract] [PDF] |
||||
![]() |
T. Barrett, D. B. Troup, S. E. Wilhite, P. Ledoux, D. Rudnev, C. Evangelista, I. F. Kim, A. Soboleva, M. Tomashevsky, and R. Edgar NCBI GEO: mining tens of millions of expression profiles--database and tools update Nucleic Acids Res., January 12, 2007; 35(suppl_1): D760 - D765. [Abstract] [Full Text] [PDF] |
||||
![]() |
S. Yoon, Y. Yang, J. Choi, and J. Seong Large scale data mining approach for gene-specific standardization of microarray gene expression data Bioinformatics, December 1, 2006; 22(23): 2898 - 2904. [Abstract] [Full Text] [PDF] |
||||
| |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||