Application of a priori established gene sets to discover biologically important differential expression in microarray data

  1. Andrea Bild*,† and
  2. Phillip George Febbo*,,,§
  1. *Duke Institute for Genome Sciences and Policy and Departments of Medicine and Molecular Genetics and Microbiology, Duke University Medical Center, Duke University, Durham, NC 27710

From inception, microarray analysis has facilitated discovery by associating gene expression with biological and/or clinical sample characteristics. However, gleaning biological insight from the long lists of genes generated by microarray analysis remains a significant challenge. In this issue of PNAS, Subramanian et al. (1) describe and validate gene set enrichment analysis (GSEA), a computational method that helps rapidly connect gene expression with biology and promises to be a valuable addition to publicly available computational resources.

Early on, investigators adapted unsupervised computational methods such as hierarchical clustering (2) and self-organizing maps (3) to arrange genes and samples in groups or clusters based solely on the similarity of their gene expression. These methods successfully revealed the orchestrated gene expression underlying basic cellular processes such as yeast replication (4), fibroblast cell proliferation (5), and hematopoietic differentiation (3), and they continue to be used widely today. Unsupervised methods are unbiased and remain important tools for class discovery.

Alternatively, supervised methods of analysis use sample classifiers along with gene expression to rapidly identify hypothesis-driven correlations (e.g., tumor vs. normal, pathological grade, recurrent disease, or histological category). A few examples of supervised methods of analysis include significance analysis of microarrays (SAM) (6), class prediction (7), support vector machines (8), and probit regression analysis (9, 10). In the field of oncology, supervised analyses of gene expression have successfully identified novel marker genes for diagnosis (11), prognosis (12), and therapeutic response (13). Supervised methods can help overcome obfuscating technical or biological variation in gene expression and continue to identify important associations between sample phenotypes and gene expression.

GSEA represents an innovative method of supervised analysis. This analysis is performed by (i) ranking all genes in the data set based on their correlation to the chosen phenotype, (ii) identifying the rank positions of all members of the gene …
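The two steps that survive in the text above, ranking genes by their correlation to the phenotype and locating the members of a gene set in that ranking, can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the function names, the toy expression values, and the use of Pearson correlation as the ranking metric are hypothetical. The final scoring step is not taken from the text above; it is an assumed unweighted running sum in the style of a Kolmogorov-Smirnov statistic, and the published method may differ in its weighting and in how it assesses significance.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def rank_genes(expression, phenotype):
    """Step (i): order genes from most positively to most negatively
    correlated with the phenotype labels (one label per sample)."""
    scores = {gene: pearson(values, phenotype) for gene, values in expression.items()}
    return sorted(scores, key=scores.get, reverse=True)


def member_positions(ranked, gene_set):
    """Step (ii): zero-based rank positions of the gene set members."""
    return [i for i, gene in enumerate(ranked) if gene in gene_set]


def running_sum_score(ranked, gene_set):
    """Assumed scoring step: walk down the ranked list, stepping up at each
    member and down at each non-member, and return the running-sum value
    with the largest absolute deviation from zero."""
    hits = set(member_positions(ranked, gene_set))
    n_total, n_hits = len(ranked), len(hits)
    if n_hits == 0 or n_hits == n_total:
        return 0.0
    up = 1.0 / n_hits
    down = 1.0 / (n_total - n_hits)
    running, extreme = 0.0, 0.0
    for i in range(n_total):
        running += up if i in hits else -down
        if abs(running) > abs(extreme):
            extreme = running
    return extreme


# Hypothetical toy data: five genes measured in four samples,
# two samples of phenotype 1 followed by two of phenotype 0.
expression = {
    "A": [5, 6, 1, 2],
    "B": [4, 5, 1, 1],
    "C": [1, 2, 5, 6],
    "D": [2, 1, 6, 5],
    "E": [3, 3, 3, 4],
}
phenotype = [1, 1, 0, 0]

ranked = rank_genes(expression, phenotype)
print(ranked)
print(member_positions(ranked, {"A", "B"}))
print(running_sum_score(ranked, {"A", "B"}))
```

In this toy example, a gene set whose members cluster at the top of the ranking receives a score near +1, and one whose members cluster at the bottom receives a score near -1. A set scattered evenly through the list stays close to zero.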
