Research Article

Searching for missing heritability: Designing rare variant association studies

Or Zuk, Stephen F. Schaffner, Kaitlin Samocha, Ron Do, Eliana Hechter, Sekar Kathiresan, Mark J. Daly, Benjamin M. Neale, Shamil R. Sunyaev, and Eric S. Lander
PNAS first published January 17, 2014; https://doi.org/10.1073/pnas.1322563111
Or Zuk
(a) Broad Institute of Harvard and MIT, Cambridge, MA 02142;
(b) Toyota Technological Institute at Chicago, Chicago, IL 60637

Stephen F. Schaffner
(a) Broad Institute of Harvard and MIT, Cambridge, MA 02142

Kaitlin Samocha
(a) Broad Institute of Harvard and MIT, Cambridge, MA 02142;
(c) Analytic and Translational Genetics Unit, Department of Medicine, Massachusetts General Hospital and Harvard Medical School, Boston, MA 02114;
(d) Program in Genetics and Genomics, Biological and Biomedical Sciences, Harvard Medical School, Boston, MA 02114

Ron Do
(a) Broad Institute of Harvard and MIT, Cambridge, MA 02142;
(e) Center for Human Genetic Research, Massachusetts General Hospital, Boston, MA 02114

Eliana Hechter
(a) Broad Institute of Harvard and MIT, Cambridge, MA 02142

Sekar Kathiresan
(a) Broad Institute of Harvard and MIT, Cambridge, MA 02142;
(e) Center for Human Genetic Research, Massachusetts General Hospital, Boston, MA 02114;
(f) Cardiovascular Research Center, Massachusetts General Hospital, Boston, MA 02114;
(g) Department of Medicine, Harvard Medical School, Boston, MA 02115

Mark J. Daly
(a) Broad Institute of Harvard and MIT, Cambridge, MA 02142;
(c) Analytic and Translational Genetics Unit, Department of Medicine, Massachusetts General Hospital and Harvard Medical School, Boston, MA 02114

Benjamin M. Neale
(a) Broad Institute of Harvard and MIT, Cambridge, MA 02142;
(c) Analytic and Translational Genetics Unit, Department of Medicine, Massachusetts General Hospital and Harvard Medical School, Boston, MA 02114

Shamil R. Sunyaev
(a) Broad Institute of Harvard and MIT, Cambridge, MA 02142;
(h) Division of Genetics, Brigham and Women's Hospital, Harvard Medical School, Boston, MA 02115

Eric S. Lander
(a) Broad Institute of Harvard and MIT, Cambridge, MA 02142;
(i) Department of Biology, Massachusetts Institute of Technology, Cambridge, MA 02139; and
(j) Department of Systems Biology, Harvard Medical School, Boston, MA 02115
For correspondence: lander@broadinstitute.org

Contributed by Eric S. Lander, December 10, 2013 (sent for review September 23, 2013)


Significance

Discovering the genetic basis of common diseases, such as diabetes, heart disease, and schizophrenia, is a key goal in biomedicine. Genomic studies have revealed thousands of common genetic variants underlying disease, but these variants explain only a portion of the heritability. Rare variants are also likely to play an important role, but few examples are known thus far, and initial discovery efforts with small sample sizes have had only limited success. In this paper, we describe an analytical framework for the design of rare variant association studies of disease. It provides guidance with respect to sample size, as well as the roles of selection, disruptive and missense alleles, gene-specific allele frequency thresholds, isolated populations, gene sets, and coding vs. noncoding regions.

Abstract

Genetic studies have revealed thousands of loci predisposing to hundreds of human diseases and traits, revealing important biological pathways and defining novel therapeutic hypotheses. However, the genes discovered to date typically explain less than half of the apparent heritability. Because efforts have largely focused on common genetic variants, one hypothesis is that much of the missing heritability is due to rare genetic variants. Studies of common variants are typically referred to as genomewide association studies, whereas studies of rare variants are often simply called sequencing studies. Because they are actually closely related, we use the terms common variant association study (CVAS) and rare variant association study (RVAS). In this paper, we outline the similarities and differences between RVAS and CVAS and describe a conceptual framework for the design of RVAS. We apply the framework to address key questions about the sample sizes needed to detect association, the relative merits of testing disruptive alleles vs. missense alleles, frequency thresholds for filtering alleles, the value of predictors of the functional impact of missense alleles, the potential utility of isolated populations, the value of gene-set analysis, and the utility of de novo mutations. The optimal design depends critically on the selection coefficient against deleterious alleles and thus varies across genes. The analysis shows that common variant and rare variant studies require similarly large sample collections. In particular, a well-powered RVAS should involve discovery sets with at least 25,000 cases, together with a substantial replication set.

  • mapping disease genes
  • power analysis
  • statistical genetics

Footnotes

  • 1 Present address: Department of Statistics, The Hebrew University of Jerusalem, Mt. Scopus, Jerusalem 91905, Israel.

  • 2 To whom correspondence should be addressed. E-mail: lander@broadinstitute.org.
  • Author contributions: O.Z., E.H., M.J.D., B.M.N., S.R.S., and E.S.L. designed research; O.Z., S.F.S., K.S., R.D., E.H., B.M.N., S.R.S., and E.S.L. performed research; K.S., R.D., and S.K. contributed new reagents/analytic tools; O.Z., S.F.S., K.S., E.H., M.J.D., B.M.N., S.R.S., and E.S.L. analyzed data; and O.Z., M.J.D., B.M.N., S.R.S., and E.S.L. wrote the paper.

  • The authors declare no conflict of interest.

  • †We focus on risk to heterozygous carriers. Our calculations implicitly assume that the risk to individuals carrying two null alleles ($\lambda_C^{*}$) is the same as the risk to heterozygous carriers ($\lambda_C$). Although such individuals may well have higher risk, they are much rarer than heterozygous carriers (because $f_C$ is small) and thus their impact on RVAS is typically negligible. [The effective relative risk is increased by $f_C(\lambda_C^{*}/\lambda_C)$, which is $\ll 1$ unless $\lambda_C^{*}/\lambda_C$ is huge; this case would essentially be a monogenic recessive trait.]

  • ‡We focus on the number $n$ of cases needed when the frequency $f_C$ in the population is known perfectly based on an (infinitely) large population survey. We make this assumption in the belief that very large datasets will become available in the coming years and that shared population controls can be used across studies. In the meanwhile, one can estimate $f_D$ within a study based on the frequency in either unaffected or random individuals. If a study involves cases and unaffecteds in proportion $r$ and $1 - r$, one requires approximately $n/(1 - r)$ cases and $n/r$ controls to detect association, where $n$ is the number of cases given in Eq. 1 (SI Appendix, Section 3.3). For a balanced design, this corresponds to $2n$ cases and $2n$ controls.

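  • §$g(\lambda) = (\lambda + 1)\ln(1 + \lambda) - \lambda$, which is approximately $\lambda^2/2$ for small $\lambda$ ($0 \le \lambda < 1$) and $c\,\lambda\ln(\lambda)$ for larger $\lambda$ (with $c$ in the range 0.7–0.8 for $2 < \lambda < 100$).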

  • ¶The expected CAF for new neutral alleles born in a given generation is $\mu_C$ and thus the increase in the expected CAF over $k$ generations cannot exceed $k\mu_C$ (and will be lower because many newborn alleles are lost). It follows that the collection of neutral alleles born since the onset of human population cannot increase the CAF of neutral alleles by more than 5% ($= k\mu_C/(4\mu_C N_{eq})$, where $k = 1{,}000$ and $N_{eq} = 10{,}000$). The actual increase is typically much smaller due to loss of newborn alleles.

  • ||The relevant sampling frame for disease studies involves randomly selecting a chromosome and inspecting it for alleles. Alleles are thus sampled according to their frequency, yielding a frequency-weighted frequency distribution.

  • **One can also perform an analysis considering only those missense alleles with frequency ≤1% that are predicted to be probably damaging by PolyPhen-2 (that is, RVAS strategy 3). One observes 89 such missense alleles in cases vs. 32 in controls, which corresponds to an apparent excess relative risk $\lambda_{M(1\%),\mathrm{PolyPhen2}} = 2.5$. This result agrees closely with the expectation of $\lambda_{M(1\%),\mathrm{PolyPhen2}} = 2.3$, given the inferred value of $s$ and the frequency threshold of 1%. For this type of analysis (assuming $\gamma_{\mathrm{null}} = 80\%$ and $\gamma_{\mathrm{neutral}} = 20\%$), the optimal threshold turns out to be $T^{*} = 0.15\%$, which would yield a much higher apparent effect size of $\lambda_{M(T^{*}),\mathrm{PolyPhen2}} = 8.2$.

  • ††The paper reports that the best nominal P value observed across the genome is $2.7 \times 10^{-8}$ but does not address whether the value is significant. To correct for scanning the entire genome, one can apply extreme value theory for Ornstein-Uhlenbeck diffusions (SI Appendix, Section 11.1). The probability of such a P value arising by chance somewhere in the genome is ∼70%, and thus the observation is not statistically significant. Genomewide significance at the 5% level corresponds to $P \sim 2 \times 10^{-9}$ (SI Appendix, Section 11.1).

  • This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1322563111/-/DCSupplemental.
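The function g(λ) from footnote § and the case/control allocation from footnote ‡ can be explored numerically. The following is a minimal sketch, not code from the paper: the function names are hypothetical, and the value n = 1,000 used in the demonstration is purely illustrative (the paper's Eq. 1, which determines n, is not reproduced here).

```python
import math


def g(lam: float) -> float:
    """g(lambda) = (lambda + 1) * ln(1 + lambda) - lambda, as in footnote §."""
    return (lam + 1.0) * math.log1p(lam) - lam


def small_lambda_approx(lam: float) -> float:
    """Leading-order approximation lambda^2 / 2, intended for small lambda."""
    return lam ** 2 / 2.0


def large_lambda_constant(lam: float) -> float:
    """The value of c for which g(lambda) = c * lambda * ln(lambda) at this lambda.

    Only meaningful for lambda > 1, where ln(lambda) is positive.
    """
    return g(lam) / (lam * math.log(lam))


def cases_and_controls(n: float, r: float) -> tuple[float, float]:
    """Footnote ‡: if cases make up a fraction r of the study (0 < r < 1),
    roughly n / (1 - r) cases and n / r controls are needed, where n is the
    number of cases required when the population frequency is known exactly.
    """
    if not 0.0 < r < 1.0:
        raise ValueError("r must lie strictly between 0 and 1")
    return n / (1.0 - r), n / r


if __name__ == "__main__":
    # Compare g with its small-lambda approximation.
    for lam in (0.05, 0.1, 0.5):
        print(f"lambda={lam}: g={g(lam):.6f}, lambda^2/2={small_lambda_approx(lam):.6f}")

    # Tabulate the implied constant c at a few larger values of lambda.
    for lam in (5, 10, 50, 100):
        print(f"lambda={lam}: c={large_lambda_constant(lam):.3f}")

    # Illustrative n only; a balanced design (r = 0.5) doubles both groups.
    print(cases_and_controls(1000, 0.5))
```

With r = 0.5 the helper returns 2n cases and 2n controls, matching the balanced-design statement in footnote ‡. Unbalanced designs trade fewer cases for more controls (or vice versa) along the same formula.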

Freely available online through the PNAS open access option.
