Correction for hidden confounders in the genetic analysis of gene expression
Edited* by David Haussler, University of California, Santa Cruz, CA, and approved July 21, 2010 (received for review February 26, 2010)

Abstract
Understanding the genetic underpinnings of disease is important for screening, treatment, drug development, and basic biological insight. One way of getting at such an understanding is to find out which parts of our DNA, such as single-nucleotide polymorphisms (SNPs), affect particular intermediary processes such as gene expression. Naively, such associations can be identified using a simple statistical test on every pairing of a genetic variant with a gene transcript. However, a wide variety of confounders lie hidden in the data, leading to both spurious and missed associations if not properly addressed. We present a statistical model that jointly corrects for two particular kinds of hidden structure when these confounders are unknown: population structure (e.g., race, family relatedness) and microarray expression artifacts (e.g., batch effects). Applying our method to both real and synthetic, human and mouse data, we demonstrate the need for such a joint correction of confounders, and also the disadvantages of other possible approaches based on those in the current literature. In particular, we show that our class of models has the greatest power to detect expression quantitative trait loci (eQTL) on synthetic data, and has the best performance on a bronze standard applied to real data. Lastly, our software and the associations we found with it are available at http://www.microsoft.com/science.
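To make the naive approach and the confounding problem concrete, the following Python sketch computes a correlation t-statistic for every (SNP, transcript) pair and, optionally, the same statistic after regressing out *observed* covariates. It assumes NumPy; the helper names `residualize` and `eqtl_scan` and the toy simulation are hypothetical illustrations, not the authors' software, and the covariate adjustment is only the simplest baseline. It does not implement the article's joint model, which targets confounders that are unknown.

```python
import numpy as np

def residualize(M, C=None):
    """Return M minus its least-squares fit on an intercept plus covariates C,
    along with the number of fitted columns (intercept included)."""
    n = M.shape[0]
    design = np.ones((n, 1)) if C is None else np.column_stack([np.ones(n), C])
    beta, *_ = np.linalg.lstsq(design, M, rcond=None)
    return M - design @ beta, design.shape[1]

def eqtl_scan(G, Y, covariates=None):
    """Correlation t-statistic for every (SNP, transcript) pair.

    G: n x p genotypes (e.g. minor-allele counts 0/1/2); monomorphic SNPs
       must be filtered out first (their variance is zero).
    Y: n x q expression levels.
    covariates: optional n x k matrix of *observed* confounders.
    Returns a p x q matrix of t-statistics with n - 2 - k degrees of freedom.
    """
    Gr, d = residualize(np.asarray(G, dtype=float), covariates)
    Yr, _ = residualize(np.asarray(Y, dtype=float), covariates)
    n = Gr.shape[0]
    # Residuals have zero mean, so after dividing by their standard deviations
    # Gz.T @ Yz / n is the matrix of (partial) Pearson correlations.
    Gz = Gr / Gr.std(axis=0)
    Yz = Yr / Yr.std(axis=0)
    r = Gz.T @ Yz / n
    df = n - d - 1
    return r * np.sqrt(df / (1.0 - r**2))

# Toy simulation (hypothetical): a hidden group label, e.g. a subpopulation
# that was also processed in its own batch, shifts both allele frequency and
# expression. The SNP itself has no effect on expression.
rng = np.random.default_rng(0)
n = 200
group = rng.integers(0, 2, size=n)
snp = rng.binomial(2, np.where(group == 1, 0.7, 0.2))
expr = 2.0 * group + rng.normal(size=n)

naive = eqtl_scan(snp[:, None], expr[:, None])
adjusted = eqtl_scan(snp[:, None], expr[:, None], covariates=group[:, None])
print(f"naive t = {naive[0, 0]:.1f}, group-adjusted t = {adjusted[0, 0]:.1f}")
```

In this simulation the naive scan reports a strong association for a SNP with no direct effect, because genotype frequency and expression both depend on the same group label. Adjusting for that label removes the spurious signal, but in real data such confounders are usually unobserved, which is the situation the joint correction described above is meant to handle.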
- differential expression
- genome wide association
- microarray
- population structure
- expression heterogeneity
Footnotes
- ¹To whom correspondence may be addressed. E-mail: jennl@microsoft.com and heckerma@microsoft.com.
Author contributions: J.L., E.E.S., and D.H. designed research; J.L. performed research; J.L. and C.K. contributed new reagents/analytic tools; J.L. analyzed data; and J.L. and D.H. wrote the paper.
Conflict of interest statement: Eric E. Schadt is Chief Scientific Officer of Pacific Biosciences and owns stock in the company.
*This Direct Submission article had a prearranged editor.
This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1002425107/-/DCSupplemental.
Freely available online through the PNAS open access option.
Classification: Physical Sciences (Statistics); Biological Sciences