
Gene set enrichment analysis: A knowledge-based approach for interpreting genome-wide expression profiles

  1. Jill P. Mesirov (a, k)
  1. (a) Broad Institute of Massachusetts Institute of Technology and Harvard, 320 Charles Street, Cambridge, MA 02141; (c) Department of Systems Biology, Alpert 536, Harvard Medical School, 200 Longwood Avenue, Boston, MA 02446; (d) Institute for Genome Sciences and Policy, Center for Interdisciplinary Engineering, Medicine, and Applied Sciences, Duke University, 101 Science Drive, Durham, NC 27708; (e) Department of Medical Oncology, Dana–Farber Cancer Institute, 44 Binney Street, Boston, MA 02115; (f) Division of Pulmonary and Critical Care Medicine, Massachusetts General Hospital, 55 Fruit Street, Boston, MA 02114; (g) Fred Hutchinson Cancer Research Center, 1100 Fairview Avenue North, C2-023, P.O. Box 19024, Seattle, WA 98109-1024; (h) Department of Neurology, Enders 260, Children's Hospital, Harvard Medical School, 300 Longwood Avenue, Boston, MA 02115; (i) Department of Biology, Massachusetts Institute of Technology, Cambridge, MA 02142; and (j) Whitehead Institute for Biomedical Research, Massachusetts Institute of Technology, Cambridge, MA 02142
  1. Contributed by Eric S. Lander, August 2, 2005

Abstract

Although genome-wide RNA expression analysis has become a routine tool in biomedical research, extracting biological insight from such data remains a major challenge. Here, we describe a powerful analytical method called Gene Set Enrichment Analysis (GSEA) for interpreting gene expression data. The method derives its power by focusing on gene sets, that is, groups of genes that share a common biological function, chromosomal location, or regulation. We demonstrate how GSEA yields insights into several cancer-related data sets, including data sets on leukemia and lung cancer. Notably, where single-gene analysis finds little similarity between two independent studies of patient survival in lung cancer, GSEA reveals many biological pathways in common. The GSEA method is embodied in a freely available software package, together with an initial database of 1,325 biologically defined gene sets.
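The abstract does not spell out how a gene set is scored. As a minimal sketch, assuming the genes have already been ranked by a correlation metric r_j with the phenotype, the running-sum enrichment score (ES) for a gene set S can be computed as follows: walking down the ranked list of N genes, the sum rises by |r_j|^p / N_R at each member of S (N_R being the total of |r_j|^p over the members) and falls by 1/(N - N_H) at each of the N - N_H non-members, and ES is the maximum deviation of the sum from zero. The function and argument names are illustrative only, not the API of the GSEA software package, and the permutation-based significance testing and multiple-hypothesis correction of the full method are omitted.

```python
from typing import Sequence, Set


def enrichment_score(ranked_genes: Sequence[str],
                     correlations: Sequence[float],
                     gene_set: Set[str],
                     p: float = 1.0) -> float:
    """Weighted running-sum enrichment score for one gene set (illustrative sketch).

    ranked_genes: genes ordered from most to least correlated with the phenotype.
    correlations: the ranking metric r_j for each gene, in the same order.
    gene_set: members of the set S being tested (hypothetical input).
    p: weight exponent; p = 0 gives the unweighted Kolmogorov-Smirnov-style walk.
    """
    if len(ranked_genes) != len(correlations):
        raise ValueError("ranked_genes and correlations must have equal length")

    hits = [g in gene_set for g in ranked_genes]
    n_hits = sum(hits)
    n_miss = len(ranked_genes) - n_hits
    if n_hits == 0 or n_miss == 0:
        raise ValueError("gene set must overlap the list but not cover all of it")

    # Normaliser for the upward steps: sum of |r_j|^p over genes in S.
    n_r = sum(abs(r) ** p for r, h in zip(correlations, hits) if h)
    if n_r == 0:
        raise ValueError("all gene-set members have zero weight")

    running, best = 0.0, 0.0
    for r, h in zip(correlations, hits):
        if h:
            running += abs(r) ** p / n_r  # step up at each gene in S
        else:
            running -= 1.0 / n_miss       # step down at each gene outside S
        if abs(running) > abs(best):
            best = running                # track the maximum deviation from zero
    return best
```

For example, with hypothetical genes G1 to G6 and correlations [0.9, 0.7, 0.5, -0.2, -0.6, -0.8], the set {G1, G2} concentrated at the top of the list scores about 1.0, while {G5, G6} at the bottom scores about -1.0. A set whose members were scattered through the list would produce a score closer to zero.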

Freely available online through the PNAS open access option.
