Gene set enrichment analysis: A knowledge-based approach for interpreting genome-wide expression profiles
- Aravind Subramanian (a,b)
- Pablo Tamayo (a,b)
- Vamsi K. Mootha (a,c)
- Sayan Mukherjee (d)
- Benjamin L. Ebert (a,e)
- Michael A. Gillette (a,f)
- Amanda Paulovich (g)
- Scott L. Pomeroy (h)
- Todd R. Golub (a,e)
- Eric S. Lander (a,c,i,j,k)
- Jill P. Mesirov (a,k)
- (a) Broad Institute of Massachusetts Institute of Technology and Harvard, 320 Charles Street, Cambridge, MA 02141; (c) Department of Systems Biology, Alpert 536, Harvard Medical School, 200 Longwood Avenue, Boston, MA 02446; (d) Institute for Genome Sciences and Policy, Center for Interdisciplinary Engineering, Medicine, and Applied Sciences, Duke University, 101 Science Drive, Durham, NC 27708; (e) Department of Medical Oncology, Dana–Farber Cancer Institute, 44 Binney Street, Boston, MA 02115; (f) Division of Pulmonary and Critical Care Medicine, Massachusetts General Hospital, 55 Fruit Street, Boston, MA 02114; (g) Fred Hutchinson Cancer Research Center, 1100 Fairview Avenue North, C2-023, P.O. Box 19024, Seattle, WA 98109-1024; (h) Department of Neurology, Enders 260, Children's Hospital, Harvard Medical School, 300 Longwood Avenue, Boston, MA 02115; (i) Department of Biology, Massachusetts Institute of Technology, Cambridge, MA 02142; and (j) Whitehead Institute for Biomedical Research, Massachusetts Institute of Technology, Cambridge, MA 02142
Contributed by Eric S. Lander, August 2, 2005
Fig. 1.
A GSEA overview illustrating the method. (A) An expression data set sorted by correlation with phenotype, the corresponding heat map, and the “gene tags,” i.e., location of genes from a set S within the sorted list. (B) Plot of the running sum for S in the data set, including the location of the maximum enrichment score (ES) and the leading-edge subset.
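The running sum, enrichment score, and leading-edge subset named in this caption can be illustrated with a minimal Python sketch. It assumes a weighted running sum in which each hit is incremented in proportion to |correlation| raised to a power `p` and each miss is decremented uniformly; the function names, the parameter `p`, and the toy gene list are hypothetical and are not taken from the authors' software.

```python
def running_sum(ranked_genes, correlations, gene_set, p=1.0):
    """Running-sum profile of gene_set over a list sorted by correlation.

    Assumption: hits add |r|**p normalised by the total hit weight,
    misses subtract 1 / (number of genes outside the set).
    """
    in_set = [g in gene_set for g in ranked_genes]
    weights = [abs(r) ** p if hit else 0.0
               for r, hit in zip(correlations, in_set)]
    total_hit = sum(weights)
    n_miss = len(ranked_genes) - sum(in_set)
    if total_hit == 0 or n_miss == 0:
        raise ValueError("gene set must cover some, but not all, ranked genes")
    profile, score = [], 0.0
    for w, hit in zip(weights, in_set):
        score += w / total_hit if hit else -1.0 / n_miss
        profile.append(score)
    return profile


def enrichment_score(profile):
    """Return (ES, index): the maximum deviation from zero and where it occurs."""
    peak = max(range(len(profile)), key=lambda i: abs(profile[i]))
    return profile[peak], peak


def leading_edge(ranked_genes, gene_set, profile):
    """Set members at or before a positive peak (at or after a negative one)."""
    es, peak = enrichment_score(profile)
    window = ranked_genes[:peak + 1] if es >= 0 else ranked_genes[peak:]
    return [g for g in window if g in gene_set]


# Hypothetical toy data: six genes already sorted by correlation with phenotype.
genes = ["A", "B", "C", "D", "E", "F"]
corr = [3.0, 2.0, 1.0, -1.0, -2.0, -3.0]
profile = running_sum(genes, corr, {"A", "B"})
es, peak = enrichment_score(profile)
edge = leading_edge(genes, {"A", "B"}, profile)
```

In this toy case both set members sit at the top of the list, so the profile peaks right after the second gene and both genes fall in the leading edge. Setting `p=0` gives every hit the same increment, which corresponds to an unweighted, Kolmogorov-Smirnov-like running sum.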
Fig. 2.
Original (4) enrichment score behavior. The distribution of three gene sets, from the C2 functional collection, in the list of genes in the male/female lymphoblastoid cell line example ranked by their correlation with gender: S1, a set of chromosome X inactivation genes; S2, a pathway describing vitamin C import into neurons; S3, related to chemokine receptors expressed by T helper cells. Shown are plots of the running sum for the three gene sets: S1 is significantly enriched in females, as expected; S2 is randomly distributed and scores poorly; and S3 is not enriched at the top of the list but is nonrandom, so it scores well. Arrows show the location of the maximum enrichment score and the point where the correlation (signal-to-noise ratio) crosses zero. Table 1 compares the nominal P values for S1, S2, and S3 obtained with the original and new methods. The new method reduces the significance of sets like S3.
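The ranking by signal-to-noise ratio and the zero-crossing point mentioned in this caption can be sketched as follows. This is a minimal illustration that assumes the common definition (difference of class means divided by the sum of class standard deviations) with no variance floor; the function names and expression values are hypothetical.

```python
from statistics import mean, stdev


def signal_to_noise(class_a, class_b):
    """(mean_a - mean_b) / (sd_a + sd_b), using sample standard deviations."""
    return (mean(class_a) - mean(class_b)) / (stdev(class_a) + stdev(class_b))


def rank_genes(expr_a, expr_b):
    """Return (gene, score) pairs sorted from most class-A-like to most class-B-like."""
    scores = {g: signal_to_noise(expr_a[g], expr_b[g]) for g in expr_a}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


def zero_crossing(ranked):
    """Index of the first gene with a negative score, or None if there is none."""
    return next((i for i, (_, s) in enumerate(ranked) if s < 0), None)


# Hypothetical expression values for three genes in two phenotype classes.
expr_female = {"gene_1": [9.0, 11.0], "gene_2": [5.0, 7.0], "gene_3": [1.0, 3.0]}
expr_male = {"gene_1": [1.0, 3.0], "gene_2": [5.0, 7.0], "gene_3": [9.0, 11.0]}
ranked = rank_genes(expr_female, expr_male)
crossing = zero_crossing(ranked)
```

A gene with zero variance in both classes would cause a division by zero here, so a practical implementation would need some floor on the standard deviations.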
Fig. 3.
Leading-edge overlap for the p53 study. This plot shows the ras, ngf, and igf1 gene sets correlated with P53–, clustered by their leading-edge subsets, which are indicated in dark blue. A common subgroup of genes, apparent as a dark vertical stripe, consists of MAP2K1, PIK3CA, ELK1, and RAF1 and represents a subsection of the MAPK pathway.
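The idea of a subgroup shared across leading-edge subsets can be illustrated with a short sketch that counts how many gene sets include each gene in their leading edge. The function name, the set labels, and the gene identifiers below are placeholders chosen for illustration, not the actual leading-edge memberships from the study.

```python
from collections import Counter


def shared_leading_edge_genes(leading_edges, min_sets=None):
    """Genes present in the leading edge of at least min_sets gene sets.

    leading_edges maps a gene-set name to its list of leading-edge genes.
    By default a gene must appear in every set.
    """
    if min_sets is None:
        min_sets = len(leading_edges)
    counts = Counter(g for genes in leading_edges.values() for g in set(genes))
    return sorted(g for g, n in counts.items() if n >= min_sets)


# Placeholder leading-edge subsets for three hypothetical gene sets.
edges = {
    "set_1": ["g1", "g2", "g3"],
    "set_2": ["g2", "g3", "g4"],
    "set_3": ["g3", "g2", "g5"],
}
common = shared_leading_edge_genes(edges)
```

Genes returned by such a count correspond to the vertical stripes that are visible when the sets are displayed as a clustered membership matrix.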
Footnotes
- Copyright © 2005, The National Academy of Sciences








