## New Research In

### Physical Sciences

### Social Sciences

#### Featured Portals

#### Articles by Topic

### Biological Sciences

#### Featured Portals

#### Articles by Topic

- Agricultural Sciences
- Anthropology
- Applied Biological Sciences
- Biochemistry
- Biophysics and Computational Biology
- Cell Biology
- Developmental Biology
- Ecology
- Environmental Sciences
- Evolution
- Genetics
- Immunology and Inflammation
- Medical Sciences
- Microbiology
- Neuroscience
- Pharmacology
- Physiology
- Plant Biology
- Population Biology
- Psychological and Cognitive Sciences
- Sustainability Science
- Systems Biology

# Limitations of GCTA as a solution to the missing heritability problem

Edited by Mary-Claire King, University of Washington, Seattle, WA, and approved November 20, 2015 (received for review October 9, 2015)

## Significance

The genetic contribution to a phenotype is frequently measured by heritability, the fraction of trait variation explained by genetic differences. Hundreds of publications have found DNA polymorphisms that are statistically associated with diseases or quantitative traits [genome-wide association studies (GWASs)]. Genome-wide complex trait analysis (GCTA), a recent method of analyzing such data, finds high heritabilities for such phenotypes. We analyze GCTA and show that the heritability estimates it produces are highly sensitive to the structure of the genetic relatedness matrix, to the sampling of phenotypes and subjects, and to the accuracy of phenotype measurements. Plausible modifications of the method aimed at increasing stability yield much smaller heritabilities. It is essential to reevaluate the many published heritability estimates based on GCTA.

## Abstract

Genome-wide association studies (GWASs) seek to understand the relationship between complex phenotype(s) (e.g., height) and up to millions of single-nucleotide polymorphisms (SNPs). Early analyses of GWASs are commonly believed to have “missed” much of the additive genetic variance estimated from correlations between relatives. A more recent method, genome-wide complex trait analysis (GCTA), obtains much higher estimates of heritability using a model of random SNP effects correlated between genotypically similar individuals. GCTA has now been applied to many phenotypes from schizophrenia to scholastic achievement. However, recent studies question GCTA’s estimates of heritability. Here, we show that GCTA applied to current SNP data cannot produce reliable or stable estimates of heritability. We show first that GCTA depends sensitively on all singular values of a high-dimensional genetic relatedness matrix (GRM). When the assumptions in GCTA are satisfied exactly, we show that the heritability estimates produced by GCTA will be biased and the standard errors will likely be inaccurate. When the population is stratified, we find that GRMs typically have highly skewed singular values, and we prove that the many small singular values cannot be estimated reliably. Hence, GWAS data are necessarily overfit by GCTA which, as a result, produces high estimates of heritability. We also show that GCTA’s heritability estimates are sensitive to the chosen sample and to measurement errors in the phenotype. We illustrate our results using the Framingham dataset. Our analysis suggests that results obtained using GCTA, and the results’ qualitative interpretations, should be interpreted with great caution.

In recent years, genome-wide association studies (GWASs) have become an important tool for investigating the genetic contribution to complex phenotypes. These studies use statistical techniques to find associations between single nucleotide polymorphisms (SNPs) and phenotype(s) (e.g., continuous traits such as height or discrete traits such as presence/absence of a disease). A widely used measure of genetic influence on a phenotype is the (narrow-sense) heritability, defined as the ratio of the additive genetic variance to the total phenotypic variance. A major conundrum revealed by many analyses of GWAS data has been that the small number of significant associations explain much less of the heritability than is estimated from correlations between relatives [i.e., much heritability is “missing” (1⇓–3)]. To address this problem, Yang et al. (4) posited that heritability is not missing but is “hidden.” The authors developed a statistical framework [genome-wide complex trait analysis (GCTA)] in which each SNP makes a random contribution to the phenotype, and these contributions are correlated between individuals who have similar genotypes. Applied to many GWASs, GCTA yields estimates of heritability far larger than those obtained using earlier analyses. GCTA has been used to estimate the heritability of many phenotypes from schizophrenia (5) to scholastic achievement (6). Despite its current wide use, recent studies (7, 8) have questioned the reliability of GCTA estimates.

We show here that the results produced using GCTA hinge on accurate estimation of a high-dimensional genetic relatedness matrix (GRM). We show that even when the assumptions in GCTA are satisfied exactly, heritability estimates produced by GCTA will be biased, and it is unlikely that the confidence intervals will be accurate. When there is genetic stratification in the population, we show that GCTA’s heritability estimates are guaranteed to be unstable and unreliable, which is especially relevant because stratification is common in human GWASs.

Our analysis has two other important consequences: (*i*) the heritability estimate produced by GCTA is sensitive to the choice of the sample used; and (*ii*) the estimate is sensitive to measurement errors in the phenotype. We argue that this instability and sensitivity are attributable to the fact that GCTA necessarily overfits typical GWASs. We show that a direct approach to eliminating this overfitting leads back to the small SNP heritability estimates derived previously from association studies. We illustrate our results using the Framingham dataset (9, 10) comprising information on 49,214 SNPs in 2,698 unrelated individuals.

We conclude that application of GCTA to GWAS data may not reliably improve our understanding of the genomic basis of phenotypic variability. Even when the assumptions for GCTA all hold, we recommend the use of diagnostic tests, and we describe one such test. We also discuss several ways of moving toward better methods.

## The Data and the GCTA Model

### The Data.

A typical GWAS takes phenotypic values for *N* individuals and assays the individuals’ genotypes at *P* single nucleotide sites (SNPs). Typically,

### Mixed-Effect Models.

To quantify how genes influence phenotypes, mixed linear models (11, 12) of the form*N* individuals,

### What GCTA Assumes and Does.

GCTA applies Eq. **1** to GWASs, where genetic and phenotypic information are usually measured on unrelated individuals. Each individual’s genotype is given by *P* numbers, so the vector

The fixed effects term

GCTA obtains maximum-likelihood estimates (MLEs) of the parameters

The relatively simple structure of Eq. **3** (linear, additive, and no environmental or epigenetic effects) relies on assumptions that GCTA has in common with the animal-breeding literature. GCTA assumes that the SNPs used are in linkage equilibrium; in practice, SNPs in linkage disequilibrium are avoided. In our analysis and example, we assume that this selection has been made. However, GCTA makes important additional assumptions: assumption 1, each SNP makes a random contribution to the phenotype independent of the others; assumption 2, the distribution of these random contributions is identical for all SNPs; and assumption 3, there is no genetic stratification in the population.

In this paper, we use a rigorous mathematical analysis of GCTA to answer two key questions. How reliable are the GCTA estimates when the assumptions in the model are satisfied exactly? How robust are the GCTA estimates to violations of assumption 3 (with or without a correction)?

We begin with the key point that any observed realization of the matrix **2** is a sample from some underlying distribution of possible data. In fact, **3**. To analyze the precision and stability of these MLEs, we now develop the connection between the MLEs and the geometry of

### Singular Values and GCTA.

The MLEs produced by GCTA depend on the properties of the GRM matrix (*N* eigenvalues (and associated eigenvectors) of the symmetric matrix

We find (*Appendix A*) that the MLEs computed by GCTA are explicit functions of the singular values and singular vectors of **A3** and **A8**) is a function of**A3** and **A9**) is a function of

### Case 1: GCTA’s Assumptions Are Satisfied Exactly.

When assumptions 1 through 3 hold, the matrix *N* variate Wishart distribution with *P* degrees of freedom. Marčenko and Pastur (13) (also see refs. 14 and 15 for more readable expositions) show that for samples *A*) converges to a limiting form when *B*). Note from Fig. 1*A* that for all values of

In all available GWASs, there are more SNPs than people, so

In addition to the eigenvalues, the third term in the MLE expression depends on the eigenvectors of the GRM. Because the eigenvectors of the true GRM are unknown, GCTA proceeds by approximating the eigenvectors of the true GRM by the eigenvectors of the sample GRM. This approximation will be valid if the eigenvectors of the true GRM are “similar” to the eigenvectors of the sample GRM. Standard results from perturbation theory show that the eigenvectors will be similar only if the eigenvalues of the sample GRM are not packed close to one another. However, because *B*, where *Appendix B* for details). If these biases in the MLEs are large, the heritability estimates produced by GCTA will not be representative of the true underlying heritability of the phenotype. Furthermore, because GCTA estimates the SE of the heritability as a function of the MLEs, large biases would make this SE meaningless.

To demonstrate this bias, we simulated a dataset comprising 50,000 SNPs in linkage equilibrium for 2,000 people [using PLINK software (16)] and a phenotype with a heritability of 0.75 (using GCTA). The simulation assumes that the entire additive genetic contribution to the phenotype comes from 45,000 out of the total 50,000 SNPs (the causal SNPs) whose effect sizes are normally distributed with mean 0 and variance 1.

Using GCTA on this dataset, we estimate the genotypic variance

To test this hypothesis, we construct 500 genotype matrices, each comprising all 2,000 people but only 5,000 SNPs randomly chosen from the initial 50,000. We ran GCTA using each of these genotype matrices, and in each case, estimated

Although our results guarantee that each estimate of

### Case 2: There Is Genetic Stratification.

Assumption 3 is typically violated: stratification is widely observed in humans (18, 19) and animals (20, 21) and is a major reason for the high number of false discoveries in GWASs (22). GCTA claims to address this by incorporating eigenvectors of the GRM as fixed effects [as columns of **1**, as suggested by Eigenstrat software (23)]. Surprisingly, GCTA finds that in most cases, the fixed-effect term has little influence (i.e., the heritability estimate from GCTA is nearly independent of the stratification). We will show that the analysis provided by GCTA is flawed and that stratification will induce large errors in the MLEs. We illustrate our points using the Framingham dataset (9, 10), which is known to be stratified.

We begin by analyzing how genetic stratification influences the spectral properties of

It is well known (see theorem 3 in ref. 24 and section 4.3 in ref. 25) that such skews must occur in stratified populations. In essence, these studies show that if *N* is large, and the markers are sampled from *K* different populations, the first

This skew, and the long tail of near-zero singular values, have serious implications for the MLEs from GCTA. Recall that the second term in the MLE (Eq. **A8**) is sensitive to the near-zero singular values of **A9**) is a function of

#### Sensitivity to the SNPs used in the study.

The heritability estimates from GCTA will be sensitive to the SNPs used in the study because the errors associated with the near-zero singular values (and in turn the MLEs) for datasets constructed using different sets of SNPs will be different. To demonstrate this, we construct 2,500 genotype matrices, *A*), because of sampling errors associated with one or more of the near-zero singular values (Fig. 4*B*).

#### Sensitivity to the measurement errors in the phenotype.

Because *i*; in general, BP readings are much noisier (29)] and use GCTA to estimate the heritability using each of these vectors. Even for the modest errors in this case, GCTA shows high variability in its heritability estimates (Fig. 5).

#### Saturation of heritability estimates.

According to GCTA, each SNP makes a random contribution to the phenotype; therefore, the heritability estimate, *P* (because *P*, *P* (i.e., *P*) which violates GCTA’s assumption that the contribution of each SNP to the heritability estimate is independent of the others.

#### Bias in the heritability estimates.

We have shown that for a stratified population, the MLEs produced by GCTA are guaranteed to be biased. The bias arises because thousands of eigenvalues of the GRM are closely packed (near 0) and have large sampling errors associated with their values (*Appendix B*). As a result of the bias, the heritability estimates produced by GCTA are not reflective of the “true” underlying heritability. Furthermore, the SEs reported by GCTA are functions of the MLEs, and so these SEs will also be unreliable.

To demonstrate this unreliability, we first ran GCTA on the Framingham dataset with BP as the phenotype. GCTA reports that

To test this hypothesis, we computed the estimate and SEs of *A*. The

Resampling techniques like the bootstrap cannot be used to correct for the bias in heritability estimates because every run of the bootstrap will produce a biased estimate of heritability, and there is no way of estimating the magnitude of the bias in any of these samples. Are there other approaches to fixing GCTA when there is genetic stratification in the population? Two common approaches to fixing this problem are (*i*) constraining the random effects associated with only some of the SNPs to be relevant (sparsity) (31) or (*ii*) denoising the matrix *Appendix C*) that the contribution of the random effects term will not be significantly different from 0 (Fig. 6); these results are consistent with our findings on denoising the Framingham dataset. Because the random effects term is the sole driver of the “improved” heritability estimates produced by GCTA, in the term’s absence, the heritability estimates will be no better than those obtained using the significant SNPs in association studies.

It is not surprising that the estimates produced by GCTA are sensitive and biased because the method estimates *N* nonzero singular values and their corresponding left and right singular vectors, plus *P* values for significance and so greatly reduce the effective number of parameters being estimated (see ref. 33 for details); the resulting effective number of parameters is much smaller than the size of the dataset, so overfitting is not a problem.

The MLEs produced by GCTA will be unreliable irrespective of the number of principal components that are included in the model (see Fig. 7, where the MLEs produced by GCTA are unreliable even when five principal components are used as fixed effects; similar unreliabilities were observed when one and three principal components were used as fixed effects, respectively). Price and coworkers (23, 24) have shown that principal components are useful for identifying population stratification and when used with reliable methods like association studies, are effective in correcting for population stratification. When principal components are used in conjunction with GCTA, the analysis step is unreliable as a result of the overfitting. Therefore, although the principal components are still able to accurately identify population stratification, they only serve to compound the bias of GCTA.

## Discussion

GCTA analyses have been widely accepted in large part because they produce heritability estimates that are many times larger than earlier estimates from GWAS data and are closer to those obtained from data with reliable pedigrees. The statistical model in GCTA assumes that each SNP makes a small random contribution to the variability in the phenotype. Our analytical and numerical results illustrate the problems with GCTA when (*i*) the assumptions of the model are satisfied exactly or (*ii*) the assumptions are violated as a result of genetic stratification. In both cases, the problems associated with GCTA stem from the fact that a high-dimensional correlation matrix is being estimated from a limited amount of data without dimensionality reduction.

When there is genetic stratification in the population, the GRM has a long tail of near-zero eigenvalues; here, GCTA will produce unreliable heritability estimates. GCTA claims that including the first few principal components as fixed effects (following ref. 23) will resolve the problem of stratification, but this is not the case; even after including principal components as fixed effects, the problems associated with the near-zero singular values of *P* value criteria. We believe that stratification is responsible for many of the counterintuitive results reported by studies using GCTA (a more detailed discussion of these studies can be found in ref. 8). Furthermore, numerous studies on sensitive subjects like childhood intelligence (34), Tourette syndrome (35), and schizophrenia (36) need to be critically reviewed.

Even when there is no genetic stratification, our analysis strongly suggests that the heritability estimates and their SEs produced by GCTA will be unreliable. These unreliabilities are illustrated by our simulations (Fig. 2). We do not prove that they will apply for all datasets where there is no genetic stratification. Our illustration does suggest a simple test of the reliability of GCTA’s estimates by a resampling procedure: first, construct many, say 500, genotype matrices by randomly sampling *P* SNPs in the dataset and use GCTA to compute the *P* SNPs. Under GCTA’s null, ∼99.5% of the distribution should lie in the interval [

Heritability estimates using methods other than GCTA are generally low

The problems in GCTA stem from the overfitting of a high-dimensional GRM. To make progress with GCTA-like mixed models, it is critical that the estimates of this matrix be refined. Some progress has been made in this direction using, for example, methods for covariance smoothing (37). There are several alternative methods of describing the relatedness between individuals (for a survey of these methods, see ref. 38), some of which could prove useful in improving the estimate of the GRM.

We have shown that a brute force approach to estimating the covariance structure of the random effects of SNPs (as in GCTA) does not resolve the problem stemming from the number of SNPs (*P*) being much larger than the number of subjects genotyped (*N*). We believe that future studies of GWAS data can make progress by incorporating prior information. Two possible ways of so doing are (*i*) insisting that the basis used for constructing the GRM is sparse (i.e., only some SNPs make random contributions and the rest have fixed contributions) and (*ii*) incorporating biological information about the relationships between elements in the random covariance matrix.

## Appendices

### Appendix A: The Likelihood Function and Sensitivity.

Expressing Eq. **2** in probabilistic form, we have**A3** with

#### Instability of the second term in Eq. A3*.*

Using the SVD of **A2**, we have**A5**, setting **A3**, we must differentiate **A7** is**A8** will be extremely sensitive to small changes in the values of *w _{i}*

#### Instability of the third term in Eq. A3.

From Eq. **A4**, we have

### Appendix B: The Likelihood Function and Bias.

Here, we reformulate the likelihood function in terms of the eigenvalues [*Singular Values and GCTA*, we have **3** by **A12**,**A2**. Using Eq. **A13**, we can write the likelihood function as**A15** in Eq. **A14** and comparing with Eq. **A3**, we note that the first two terms in the likelihood function are identical. The only term whose derivative with respect to **A14** can be written as**A17** in Eq. **A16**, we get

Note that *J* is not symmetric in *J* with respect to **A19** that unless

#### Case 1: There is no stratification in the population.

When the assumptions of GCTA are met exactly, *N* variate Wishart distribution with *P* degrees of freedom (15). Marčenko–Pastur theory (13) provides an empirical distribution (which we henceforth refer to as the M-P distribution) for the eigenvalues of the variance-covariance matrix (which in our case will be the GRM, *B*). The distribution shows that most of the eigenvalues of the GRM will lie in 0–4. Note from Fig. 1*A* that when *A*).

Because the M-P distribution is continuous, we assume without loss of generality that the eigenvalues are unique (i.e., there are no repeated eigenvalues). Because *N* is large, the eigenvalues of *A*), and therefore we expect the eigenvalues near 0.5 to be a lot closer than 0.001 (for our simulation in Fig. 1*B*, the minimum spacing between the eigenvalues is

This small separation causes the eigenvectors of the estimated GRM to be drastically different from those of the true GRM. To see why, consider an eigenvalue (say the **A17**, the angle between

These differences are amplified in the expressions for the MLE and its derivatives (see the second and third summations in Eqs. **A18** and **A19**). Therefore, the bias in the heritability estimates produced by GCTA can be large.

#### Case 2: The population is stratified.

In a stratified population, there are two sources of bias in the GCTA estimates. The first source of bias is identical to that described in the case when there is no stratification for a stratified population, thousands of eigenvalues of the GRM are tightly packed near 0, and therefore from Eqs. **A18** and **A19**, there can be large errors associated with the MLEs produced by GCTA.

The second source of bias comes from the large errors associated with the eigenvalues of the GRM (26). Specifically, suppose there are sampling errors *κ* need not be close to 0. As a result, all of the error terms in Eqs. **A18** and **A19** will have additional amplification factors of the form *p* and *q*) of near-zero eigenvalues of the GRM.

#### The bootstrap and GCTA.

Here, we show that resampling techniques (e.g., the bootstrap, jackknife, etc.) cannot be used to improve the estimates produced by GCTA. Loosely put, the bootstrap estimates the parameter in question (in this case, the heritability) by resampling from the original sample and relying on the fact that the sampling errors “average out” to 0. We have shown in *Appendix B* that the GCTA estimates are overly (erroneously) biased by local information.

For run *i* of a bootstrap, let the heritability estimate obtained by GCTA be **A21** over sufficient bootstrap samples, we have

### Appendix C: Dynamics of GCTA and More Problems with Stratification.

Consider the most general case, where *N* random people and *P* random SNPs are used to construct the random genotype matrix *N* dimensions (41) which implies that with high probability, the eigenvectors of *n* = 2,698 and *P* = 49,214,

Suppose the first **3**. First, we express **A23** becomes

GCTA tries to explain the variance of the phenotype vector *k* random projections onto a plane defined by the columns of **A23**, we expressed these random projections using the left singular vectors of *N* is fixed, *c* (43)].

In Eq. **A23**, we showed that the random projection of **A23** and the heritability estimate. It appears that it is this term that is responsible for the high values of the GCTA estimate (because the first term makes nearly 0 contribution to the projection). For every set of people and SNPs that are chosen, the data matrix generates a new set of

## Acknowledgments

We thank Chiara Sabatti, Chris Gignoux, David Golan, David Steinsaltz, Edgar Dobriban, Jonathan Pritchard, and Kenneth Wachter for useful comments on earlier drafts of this paper. This project is funded by National Institutes of Health Grant AG22500 (to S.T.) and the Morrison Institute for Population and Resource Studies. S.K.K. is funded by the Stanford Center for Computational, Evolutionary and Human Genomics. D.H.R. is supported by National Institute on Aging Grant K01AG047280. Our use of the Framingham data has been approved by the Stanford Institutional Review Board.

## Footnotes

- ↵
^{1}To whom correspondence should be addressed. Email: sidkk86{at}stanford.edu.

Author contributions: S.K.K. and S.T. designed research; S.K.K. conducted much of the analysis with assistance from M.W.F.; D.H.R. and S.T. performed the initial data filtering; and S.K.K., M.W.F., and S.T. wrote the paper.

The authors declare no conflict of interest.

This article is a PNAS Direct Submission.

Freely available online through the PNAS open access option.

## References

- ↵
- ↵
- ↵
- ↵
- ↵.
- Lee SH, et al., Schizophrenia Psychiatric Genome-Wide Association Study Consortium (PGC-SCZ); International Schizophrenia Consortium (ISC); Molecular Genetics of Schizophrenia Collaboration (MGS)

- ↵
- ↵
- ↵.
- Charney E

*Independent Science News* - ↵
- ↵.
- Splansky GL, et al.

- ↵
- ↵
- ↵
- ↵
- ↵.
- Johnstone IM

- ↵
- ↵.
- Casella G,
- Berger RL

- ↵
- ↵.
- Wacholder S,
- Rothman N,
- Caporaso N

- ↵
- ↵
- ↵.
- Tian C,
- Gregersen PK,
- Seldin MF

- ↵
- ↵
- ↵
- ↵.
- Stewart GW

- ↵.
- Kannel WB

- ↵
- ↵.
- Pickering TG, et al., Subcommittee of Professional and Public Education of the American Heart Association Council on High Blood Pressure Research

- ↵
- ↵
- ↵
- ↵.
- Hastie T,
- Tibshirani R,
- Friedman J

- ↵
- ↵
- ↵
- ↵
- ↵
- ↵.
- Horn RA,
- Johnson CR

- ↵.
- Harville DA

- ↵
- ↵.
- Wang K

- ↵.
- Johnstone IM,
- Lu AY