New Research In
Physical Sciences
Social Sciences
Featured Portals
Articles by Topic
Biological Sciences
Featured Portals
Articles by Topic
- Agricultural Sciences
- Anthropology
- Applied Biological Sciences
- Biochemistry
- Biophysics and Computational Biology
- Cell Biology
- Developmental Biology
- Ecology
- Environmental Sciences
- Evolution
- Genetics
- Immunology and Inflammation
- Medical Sciences
- Microbiology
- Neuroscience
- Pharmacology
- Physiology
- Plant Biology
- Population Biology
- Psychological and Cognitive Sciences
- Sustainability Science
- Systems Biology
Limitations of GCTA as a solution to the missing heritability problem
Edited by Mary-Claire King, University of Washington, Seattle, WA, and approved November 20, 2015 (received for review October 9, 2015)

Significance
The genetic contribution to a phenotype is frequently measured by heritability, the fraction of trait variation explained by genetic differences. Hundreds of publications have found DNA polymorphisms that are statistically associated with diseases or quantitative traits [genome-wide association studies (GWASs)]. Genome-wide complex trait analysis (GCTA), a recent method of analyzing such data, finds high heritabilities for such phenotypes. We analyze GCTA and show that the heritability estimates it produces are highly sensitive to the structure of the genetic relatedness matrix, to the sampling of phenotypes and subjects, and to the accuracy of phenotype measurements. Plausible modifications of the method aimed at increasing stability yield much smaller heritabilities. It is essential to reevaluate the many published heritability estimates based on GCTA.
Abstract
Genome-wide association studies (GWASs) seek to understand the relationship between complex phenotype(s) (e.g., height) and up to millions of single-nucleotide polymorphisms (SNPs). Early analyses of GWASs are commonly believed to have “missed” much of the additive genetic variance estimated from correlations between relatives. A more recent method, genome-wide complex trait analysis (GCTA), obtains much higher estimates of heritability using a model of random SNP effects correlated between genotypically similar individuals. GCTA has now been applied to many phenotypes from schizophrenia to scholastic achievement. However, recent studies question GCTA’s estimates of heritability. Here, we show that GCTA applied to current SNP data cannot produce reliable or stable estimates of heritability. We show first that GCTA depends sensitively on all singular values of a high-dimensional genetic relatedness matrix (GRM). When the assumptions in GCTA are satisfied exactly, we show that the heritability estimates produced by GCTA will be biased and the standard errors will likely be inaccurate. When the population is stratified, we find that GRMs typically have highly skewed singular values, and we prove that the many small singular values cannot be estimated reliably. Hence, GWAS data are necessarily overfit by GCTA which, as a result, produces high estimates of heritability. We also show that GCTA’s heritability estimates are sensitive to the chosen sample and to measurement errors in the phenotype. We illustrate our results using the Framingham dataset. Our analysis suggests that results obtained using GCTA, and the results’ qualitative interpretations, should be interpreted with great caution.
In recent years, genome-wide association studies (GWASs) have become an important tool for investigating the genetic contribution to complex phenotypes. These studies use statistical techniques to find associations between single nucleotide polymorphisms (SNPs) and phenotype(s) (e.g., continuous traits such as height or discrete traits such as presence/absence of a disease). A widely used measure of genetic influence on a phenotype is the (narrow-sense) heritability, defined as the ratio of the additive genetic variance to the total phenotypic variance. A major conundrum revealed by many analyses of GWAS data has been that the small number of significant associations explain much less of the heritability than is estimated from correlations between relatives [i.e., much heritability is “missing” (1⇓–3)]. To address this problem, Yang et al. (4) posited that heritability is not missing but is “hidden.” The authors developed a statistical framework [genome-wide complex trait analysis (GCTA)] in which each SNP makes a random contribution to the phenotype, and these contributions are correlated between individuals who have similar genotypes. Applied to many GWASs, GCTA yields estimates of heritability far larger than those obtained using earlier analyses. GCTA has been used to estimate the heritability of many phenotypes from schizophrenia (5) to scholastic achievement (6). Despite its current wide use, recent studies (7, 8) have questioned the reliability of GCTA estimates.
We show here that the results produced using GCTA hinge on accurate estimation of a high-dimensional genetic relatedness matrix (GRM). We show that even when the assumptions in GCTA are satisfied exactly, heritability estimates produced by GCTA will be biased, and it is unlikely that the confidence intervals will be accurate. When there is genetic stratification in the population, we show that GCTA’s heritability estimates are guaranteed to be unstable and unreliable, which is especially relevant because stratification is common in human GWASs.
Our analysis has two other important consequences: (i) the heritability estimate produced by GCTA is sensitive to the choice of the sample used; and (ii) the estimate is sensitive to measurement errors in the phenotype. We argue that this instability and sensitivity are attributable to the fact that GCTA necessarily overfits typical GWASs. We show that a direct approach to eliminating this overfitting leads back to the small SNP heritability estimates derived previously from association studies. We illustrate our results using the Framingham dataset (9, 10) comprising information on 49,214 SNPs in 2,698 unrelated individuals.
We conclude that application of GCTA to GWAS data may not reliably improve our understanding of the genomic basis of phenotypic variability. Even when the assumptions for GCTA all hold, we recommend the use of diagnostic tests, and we describe one such test. We also discuss several ways of moving toward better methods.
The Data and the GCTA Model
The Data.
A typical GWAS takes phenotypic values for N individuals and assays the individuals’ genotypes at P single nucleotide sites (SNPs). Typically,
Mixed-Effect Models.
To quantify how genes influence phenotypes, mixed linear models (11, 12) of the form
What GCTA Assumes and Does.
GCTA applies Eq. 1 to GWASs, where genetic and phenotypic information are usually measured on unrelated individuals. Each individual’s genotype is given by P numbers, so the vector
The fixed effects term
GCTA obtains maximum-likelihood estimates (MLEs) of the parameters
The relatively simple structure of Eq. 3 (linear, additive, and no environmental or epigenetic effects) relies on assumptions that GCTA has in common with the animal-breeding literature. GCTA assumes that the SNPs used are in linkage equilibrium; in practice, SNPs in linkage disequilibrium are avoided. In our analysis and example, we assume that this selection has been made. However, GCTA makes important additional assumptions: assumption 1, each SNP makes a random contribution to the phenotype independent of the others; assumption 2, the distribution of these random contributions is identical for all SNPs; and assumption 3, there is no genetic stratification in the population.
In this paper, we use a rigorous mathematical analysis of GCTA to answer two key questions. How reliable are the GCTA estimates when the assumptions in the model are satisfied exactly? How robust are the GCTA estimates to violations of assumption 3 (with or without a correction)?
We begin with the key point that any observed realization of the matrix
Singular Values and GCTA.
The MLEs produced by GCTA depend on the properties of the GRM matrix (
We find (Appendix A) that the MLEs computed by GCTA are explicit functions of the singular values and singular vectors of
Case 1: GCTA’s Assumptions Are Satisfied Exactly.
When assumptions 1 through 3 hold, the matrix
The M-P distribution. (A) Plots of the M-P distribution of eigenvalues for different values of
In all available GWASs, there are more SNPs than people, so
In addition to the eigenvalues, the third term in the MLE expression depends on the eigenvectors of the GRM. Because the eigenvectors of the true GRM are unknown, GCTA proceeds by approximating the eigenvectors of the true GRM by the eigenvectors of the sample GRM. This approximation will be valid if the eigenvectors of the true GRM are “similar” to the eigenvectors of the sample GRM. Standard results from perturbation theory show that the eigenvectors will be similar only if the eigenvalues of the sample GRM are not packed close to one another. However, because
To demonstrate this bias, we simulated a dataset comprising 50,000 SNPs in linkage equilibrium for 2,000 people [using PLINK software (16)] and a phenotype with a heritability of 0.75 (using GCTA). The simulation assumes that the entire additive genetic contribution to the phenotype comes from 45,000 out of the total 50,000 SNPs (the causal SNPs) whose effect sizes are normally distributed with mean 0 and variance 1.
Using GCTA on this dataset, we estimate the genotypic variance
To test this hypothesis, we construct 500 genotype matrices, each comprising all 2,000 people but only 5,000 SNPs randomly chosen from the initial 50,000. We ran GCTA using each of these genotype matrices, and in each case, estimated
Large deviations in
Although our results guarantee that each estimate of
Case 2: There Is Genetic Stratification.
Assumption 3 is typically violated: stratification is widely observed in humans (18, 19) and animals (20, 21) and is a major reason for the high number of false discoveries in GWASs (22). GCTA claims to address this by incorporating eigenvectors of the GRM as fixed effects [as columns of
We begin by analyzing how genetic stratification influences the spectral properties of
Skew in the singular values. Notice that
It is well known (see theorem 3 in ref. 24 and section 4.3 in ref. 25) that such skews must occur in stratified populations. In essence, these studies show that if N is large, and the markers are sampled from K different populations, the first
This skew, and the long tail of near-zero singular values, have serious implications for the MLEs from GCTA. Recall that the second term in the MLE (Eq. A8) is sensitive to the near-zero singular values of
Sensitivity to the SNPs used in the study.
The heritability estimates from GCTA will be sensitive to the SNPs used in the study because the errors associated with the near-zero singular values (and in turn the MLEs) for datasets constructed using different sets of SNPs will be different. To demonstrate this, we construct 2,500 genotype matrices,
Sensitivity of GCTA estimates to SNPs retained in GWASs. (A) Heritability and corresponding
Sensitivity to the measurement errors in the phenotype.
Because
Sensitivity of GCTA estimates to measurement errors in the phenotype. Heritability estimates produced by GCTA using 2,500 “noisy” estimates of the phenotype (systolic blood pressure) vector; in our study, we used all 2,698 people and 49,214 SNPs.
Saturation of heritability estimates.
According to GCTA, each SNP makes a random contribution to the phenotype; therefore, the heritability estimate,
Bias in the heritability estimates.
We have shown that for a stratified population, the MLEs produced by GCTA are guaranteed to be biased. The bias arises because thousands of eigenvalues of the GRM are closely packed (near 0) and have large sampling errors associated with their values (Appendix B). As a result of the bias, the heritability estimates produced by GCTA are not reflective of the “true” underlying heritability. Furthermore, the SEs reported by GCTA are functions of the MLEs, and so these SEs will also be unreliable.
To demonstrate this unreliability, we first ran GCTA on the Framingham dataset with BP as the phenotype. GCTA reports that
To test this hypothesis, we computed the estimate and SEs of
Resampling techniques like the bootstrap cannot be used to correct for the bias in heritability estimates because every run of the bootstrap will produce a biased estimate of heritability, and there is no way of estimating the magnitude of the bias in any of these samples. Are there other approaches to fixing GCTA when there is genetic stratification in the population? Two common approaches to fixing this problem are (i) constraining the random effects associated with only some of the SNPs to be relevant (sparsity) (31) or (ii) denoising the matrix
Fixing GCTA by denoising the GRM. (A) Heritability estimates for a highly heritable phenotype computed using the GRM as prescribed by GCTA (the phenotype was simulated as a quantitative trait using GCTA; the desired heritability of the trait was set to 0.65). Notice that the P values suggest a significant contribution from the random effects term. (B) Heritability estimates for the same phenotype vector computed using a denoised GRM (obtained by setting the noise terms to 0). Notice that the random effects associated with the SNPs are no longer significant. This result shows that the pathologies of GCTA are general and not dataset-specific (we have verified a similar trend with the Framingham dataset; a significant random effect term loses significance upon denoising the GRM).
It is not surprising that the estimates produced by GCTA are sensitive and biased because the method estimates
The MLEs produced by GCTA will be unreliable irrespective of the number of principal components that are included in the model (see Fig. 7, where the MLEs produced by GCTA are unreliable even when five principal components are used as fixed effects; similar unreliabilities were observed when one and three principal components were used as fixed effects, respectively). Price and coworkers (23, 24) have shown that principal components are useful for identifying population stratification and when used with reliable methods like association studies, are effective in correcting for population stratification. When principal components are used in conjunction with GCTA, the analysis step is unreliable as a result of the overfitting. Therefore, although the principal components are still able to accurately identify population stratification, they only serve to compound the bias of GCTA.
Using principal components as fixed effects do not resolve problems with stratification. Resampling experiments were identical to those performed in Fig. 4, with five principal components included as fixed effects; GCTA’s predicted 95
Discussion
GCTA analyses have been widely accepted in large part because they produce heritability estimates that are many times larger than earlier estimates from GWAS data and are closer to those obtained from data with reliable pedigrees. The statistical model in GCTA assumes that each SNP makes a small random contribution to the variability in the phenotype. Our analytical and numerical results illustrate the problems with GCTA when (i) the assumptions of the model are satisfied exactly or (ii) the assumptions are violated as a result of genetic stratification. In both cases, the problems associated with GCTA stem from the fact that a high-dimensional correlation matrix is being estimated from a limited amount of data without dimensionality reduction.
When there is genetic stratification in the population, the GRM has a long tail of near-zero eigenvalues; here, GCTA will produce unreliable heritability estimates. GCTA claims that including the first few principal components as fixed effects (following ref. 23) will resolve the problem of stratification, but this is not the case; even after including principal components as fixed effects, the problems associated with the near-zero singular values of
Even when there is no genetic stratification, our analysis strongly suggests that the heritability estimates and their SEs produced by GCTA will be unreliable. These unreliabilities are illustrated by our simulations (Fig. 2). We do not prove that they will apply for all datasets where there is no genetic stratification. Our illustration does suggest a simple test of the reliability of GCTA’s estimates by a resampling procedure: first, construct many, say 500, genotype matrices by randomly sampling
Heritability estimates using methods other than GCTA are generally low
The problems in GCTA stem from the overfitting of a high-dimensional GRM. To make progress with GCTA-like mixed models, it is critical that the estimates of this matrix be refined. Some progress has been made in this direction using, for example, methods for covariance smoothing (37). There are several alternative methods of describing the relatedness between individuals (for a survey of these methods, see ref. 38), some of which could prove useful in improving the estimate of the GRM.
We have shown that a brute force approach to estimating the covariance structure of the random effects of SNPs (as in GCTA) does not resolve the problem stemming from the number of SNPs (P) being much larger than the number of subjects genotyped (N). We believe that future studies of GWAS data can make progress by incorporating prior information. Two possible ways of so doing are (i) insisting that the basis used for constructing the GRM is sparse (i.e., only some SNPs make random contributions and the rest have fixed contributions) and (ii) incorporating biological information about the relationships between elements in the random covariance matrix.
Appendices
Appendix A: The Likelihood Function and Sensitivity.
Expressing Eq. 2 in probabilistic form, we have
Instability of the second term in Eq. A3.
Using the SVD of
Instability of the third term in Eq. A3.
From Eq. A4, we have
Appendix B: The Likelihood Function and Bias.
Here, we reformulate the likelihood function in terms of the eigenvalues [
Note that J is not symmetric in
Case 1: There is no stratification in the population.
When the assumptions of GCTA are met exactly,
Because the M-P distribution is continuous, we assume without loss of generality that the eigenvalues are unique (i.e., there are no repeated eigenvalues). Because N is large, the eigenvalues of
This small separation causes the eigenvectors of the estimated GRM to be drastically different from those of the true GRM. To see why, consider an eigenvalue (say the
These differences are amplified in the expressions for the MLE and its derivatives (see the second and third summations in Eqs. A18 and A19). Therefore, the bias in the heritability estimates produced by GCTA can be large.
Case 2: The population is stratified.
In a stratified population, there are two sources of bias in the GCTA estimates. The first source of bias is identical to that described in the case when there is no stratification for a stratified population, thousands of eigenvalues of the GRM are tightly packed near 0, and therefore from Eqs. A18 and A19, there can be large errors associated with the MLEs produced by GCTA.
The second source of bias comes from the large errors associated with the eigenvalues of the GRM (26). Specifically, suppose there are sampling errors
The bootstrap and GCTA.
Here, we show that resampling techniques (e.g., the bootstrap, jackknife, etc.) cannot be used to improve the estimates produced by GCTA. Loosely put, the bootstrap estimates the parameter in question (in this case, the heritability) by resampling from the original sample and relying on the fact that the sampling errors “average out” to 0. We have shown in Appendix B that the GCTA estimates are overly (erroneously) biased by local information.
For run i of a bootstrap, let the heritability estimate obtained by GCTA be
Appendix C: Dynamics of GCTA and More Problems with Stratification.
Consider the most general case, where N random people and P random SNPs are used to construct the random genotype matrix
Suppose the first
GCTA tries to explain the variance of the phenotype vector
In Eq. A23, we showed that the random projection of
Acknowledgments
We thank Chiara Sabatti, Chris Gignoux, David Golan, David Steinsaltz, Edgar Dobriban, Jonathan Pritchard, and Kenneth Wachter for useful comments on earlier drafts of this paper. This project is funded by National Institutes of Health Grant AG22500 (to S.T.) and the Morrison Institute for Population and Resource Studies. S.K.K. is funded by the Stanford Center for Computational, Evolutionary and Human Genomics. D.H.R. is supported by National Institute on Aging Grant K01AG047280. Our use of the Framingham data has been approved by the Stanford Institutional Review Board.
Footnotes
- ↵1To whom correspondence should be addressed. Email: sidkk86{at}stanford.edu.
Author contributions: S.K.K. and S.T. designed research; S.K.K. conducted much of the analysis with assistance from M.W.F.; D.H.R. and S.T. performed the initial data filtering; and S.K.K., M.W.F., and S.T. wrote the paper.
The authors declare no conflict of interest.
This article is a PNAS Direct Submission.
Freely available online through the PNAS open access option.
References
- ↵
- ↵
- ↵
- ↵
- ↵.
- Lee SH, et al., Schizophrenia Psychiatric Genome-Wide Association Study Consortium (PGC-SCZ); International Schizophrenia Consortium (ISC); Molecular Genetics of Schizophrenia Collaboration (MGS)
- ↵
- ↵
- ↵.
- Charney E
- ↵
- ↵.
- Splansky GL, et al.
- ↵
- ↵
- ↵
- ↵
- ↵.
- Johnstone IM
- ↵
- ↵.
- Casella G,
- Berger RL
- ↵
- ↵.
- Wacholder S,
- Rothman N,
- Caporaso N
- ↵
- ↵
- ↵.
- Tian C,
- Gregersen PK,
- Seldin MF
- ↵
- ↵
- ↵
- ↵.
- Stewart GW
- ↵.
- Kannel WB
- ↵
- ↵.
- Pickering TG, et al., Subcommittee of Professional and Public Education of the American Heart Association Council on High Blood Pressure Research
- ↵
- ↵
- ↵
- ↵.
- Hastie T,
- Tibshirani R,
- Friedman J
- ↵
- ↵
- ↵
- ↵
- ↵
- ↵.
- Horn RA,
- Johnson CR
- ↵.
- Harville DA
- ↵
- ↵.
- Wang K
- ↵.
- Johnstone IM,
- Lu AY





















