# Selection against variants in the genome associated with educational attainment

^{a}deCODE genetics/Amgen Inc., Reykjavik 101, Iceland;^{b}School of Engineering and Natural Sciences, University of Iceland, Reykjavik 101, Iceland;^{c}Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford OX3 7BN, United Kingdom;^{d}Department of Applied Economics, Erasmus School of Applied Economics, Erasmus University Rotterdam, 3062 PA Rotterdam, The Netherlands;^{e}Institute for Behavior and Biology, Erasmus University Rotterdam, 3062 PA Rotterdam, The Netherlands;^{f}Department of Anthropology, University of Iceland, Reykjavik 101, Iceland;^{g}Faculty of Medicine, University of Iceland, Reykjavik 101, Iceland


Edited by Andrew G. Clark, Cornell University, Ithaca, NY, and approved December 5, 2016 (received for review July 22, 2016)

## Significance

Epidemiological studies suggest that educational attainment is affected by genetic variants. Results from recent genetic studies allow us to construct a score from a person’s genotypes that captures a portion of this genetic component. Using data from Iceland that include a substantial fraction of the population we show that individuals with high scores tend to have fewer children, mainly because they have children later in life. Consequently, the average score has been decreasing over time in the population. The rate of decrease is small per generation but marked on an evolutionary timescale. Another important observation is that the association between the score and fertility remains highly significant after adjusting for the educational attainment of the individuals.

## Abstract

Epidemiological and genetic association studies show that genetics play an important role in the attainment of education. Here, we investigate the effect of this genetic component on the reproductive history of 109,120 Icelanders and the consequent impact on the gene pool over time. We show that an educational attainment polygenic score, POLY_{EDU}, constructed from results of a recent study is associated with delayed reproduction (*P* < 10^{−100}) and fewer children overall. The effect is stronger for women and remains highly significant after adjusting for educational attainment. Based on 129,808 Icelanders born between 1910 and 1990, we find that the average POLY_{EDU} has been declining at a rate of ∼0.010 standard units per decade, which is substantial on an evolutionary timescale. Most importantly, because POLY_{EDU} only captures a fraction of the overall underlying genetic component, the latter could be declining at a rate that is two to three times faster.

Epidemiological studies have estimated that the genetic component of educational attainment can account for as much as 40% of the trait variance (1). Recent meta-analyses (2, 3) yielded sequence variants contributing to the underlying genetic component. A negative correlation between educational attainment and number of children has been observed in many populations (4–7). A recent study of ∼20,000 genotyped Americans born between 1931 and 1953 provided direct evidence that the genetic propensity for educational attainment is associated with reduced fertility (8, 9), supporting previously postulated notions (10) that the population average of the genetic propensity for educational attainment and related traits must be declining. Here, using a population-wide sample that is both much larger and covers a substantially greater time span, and with additional auxiliary information, we aim to estimate the change of the genetic propensity of educational attainment in the Icelandic population over the last few decades, starting with an in-depth investigation of the relationship between a measurable genetic component of educational attainment and various aspects of reproduction (11–14).

## Results

The number of living Icelanders is ∼317,000 (Fig. S1). A genealogical database of Icelanders (15–17) that is very close to complete for individuals born after 1910 (*Materials and Methods*) is used in this study. Probands used for the genetic analyses here are limited to those with both parents and all four grandparents listed in the genealogy. For the fertility studies, only children who survived their first year are counted. The first step was to use results from a recent genome-wide association study (GWAS) of educational attainment (3) to determine the per-locus allele-specific weightings of 620,000 markers used to calculate a polygenic score (18, 19), POLY_{EDU} (*Materials and Methods* for details on polygenic score construction). After excluding the Icelandic cohorts in the GWAS to avoid confounding, 278,948 samples from 62 cohorts were used to determine the weightings for POLY_{EDU}. We computed POLY_{EDU} for over 150,000 Icelanders who were directly genotyped with chip arrays and imputed for additional sequence variants discovered through whole-genome sequencing of 8,453 Icelanders (20) (*Materials and Methods*). POLY_{EDU} was scaled to an SD of 1, hereafter referred to as standard units (SUs). When applied to 46,079 Icelanders with educational attainment data, POLY_{EDU} was found to explain 3.74% of the trait variance (*P* < 10^{−300}). By contrast, the strongest single variant only explains 0.10% of the variance, indicating that educational attainment is a complex trait influenced by many variants in the genome and highlighting the increased power of using the polygenic score for our analyses. Our first analysis focused on 109,120 individuals (58,560 females and 50,560 males) with year of birth (yob) between 1910 and 1975 (Fig. S2). The genealogical database was used to obtain the number of children (NC) and, where applicable, the age at first child (AGFC) and the average age at child birth (AACB) for this set. 
The estimated effects of POLY_{EDU} on these reproductive traits, adjusted for yob and 20 principal components (21), are presented in Table 1 for females and males separately. For females, an increase of 1 SU of POLY_{EDU} corresponds to an average decrease of 0.084 children [*P* = 1.0 × 10^{−43}, calculated with genomic control adjustment (22)], and for those with children AGFC and AACB increased by 0.59 years (*P* = 5.3 × 10^{−155}) and 0.46 years (*P* = 1.0 × 10^{−117}), respectively. A similar, albeit weaker, pattern of results was observed for males. The finding of a substantially stronger association for AGFC than NC suggests that the effect of POLY_{EDU} on NC is mainly manifested through delayed reproduction. Thus, for females with children, the association between AGFC and POLY_{EDU} remains highly significant (*P* = 2.9 × 10^{−118}) after adjusting for NC, whereas the association between NC and POLY_{EDU} is not significant (*P* = 0.17) after adjusting for AGFC. This led us to examine the effect of POLY_{EDU} on NC[x], the number of children a proband had at or after age x, as a function of x. The results are presented in Fig. 1. At x = 14, the estimated effect on NC[x] per SU of POLY_{EDU}, denoted by eff[x], is −0.084 for females and −0.054 for males. These correspond to results in Table 1 because none of the probands here had children before 14 years of age. As x increases, the estimated effect becomes less negative and is essentially zero at 22 for females and 23 for males. In other words, if children born to mothers at 21 years of age or younger (18% of all children counted here, Fig. S3) and children born to males at 22 or younger (13% of all children counted here) are ignored, there is no correlation between NC and POLY_{EDU}. As x increases further, eff[x] becomes positive and continues to increase until x = 30 for females and starts to drop slowly to zero after that. 
Note that the difference eff[x] − eff[x + 1] corresponds to the estimated effect of POLY_{EDU} on children born to the proband at precisely age x. Thus, for age x > 30, females with higher POLY_{EDU} tend to have more children than those with lower POLY_{EDU}, whereas the reverse is true for x < 30. Having more children after 30 (*P* < 1 × 10^{−15}) compensates for having fewer children between 22 and 30 years of age but does not compensate for the reduced number of children at age 21 years and younger. Similar results apply to the males with the age boundaries shifting 1 to 2 years upward. The negative effect of POLY_{EDU} on NC is less for males than for females, and the difference is mainly accounted for by children born to them at 19 years or younger. The analyses performed using POLY_{EDU} maximize statistical power, but the effects on fertility traits can also be seen with individual variants. Results for 120 SNPs that are genome-wide significant (*P* < 5 × 10^{−8}) in the meta-analysis for educational attainment excluding Icelandic data (*Materials and Methods*) are given in Table S1 and Figs. S4 and S5. For example, 35 of the 120 SNPs have associations with AGFC of females that are in the same direction and nominally significant (one-sided *P* < 0.05). The minor allele of one of these SNPs, rs192818565, is associated with reduced education. It is known to tag the H2 haplotype of a common inversion on chromosome 17 that was shown to exhibit characteristics consistent with having been positive-selected (23). It has subsequently been shown that H2 is also associated with reduced intracranial volume (24, 25) and neuroticism (26). Combining our male and female data, the minor allele of rs192818565 is significantly associated with more children (*P* = 5.2 × 10^{−3}) and having children earlier (*P* = 2.2 × 10^{−3}). 
This is thus a striking case where a variant associated with a phenotype typically regarded as unfavorable could nonetheless also be associated with increased “fitness” in the evolutionary sense.
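The identity noted above, that eff[x] − eff[x + 1] equals the estimated effect on children born at exactly age x, follows from least-squares estimates being linear in the outcome. The following is a minimal sketch in Python, assuming invented probands and a plain one-predictor regression with no adjustment for yob or principal components (unlike the analyses in Table 1). The names `ols_slope`, `nc_at_or_after`, `eff`, and `eff_exactly_at` are illustrative and do not come from the study.

```python
# Toy illustration only: the scores and birth ages below are invented.

def ols_slope(xs, ys):
    """Slope of the least-squares line of ys on xs (single predictor)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

def nc_at_or_after(birth_ages, x):
    """NC[x]: number of children born when the proband was aged x or older."""
    return sum(1 for age in birth_ages if age >= x)

# (polygenic score in SU, proband's age in whole years at each birth)
probands = [
    (-1.2, [18, 21, 25]),
    (-0.3, [20, 24, 29]),
    (0.0, [22, 27, 30]),
    (0.4, [26, 31]),
    (1.1, [29, 33]),
]
scores = [score for score, _ in probands]

def eff(x):
    """Estimated effect of the score on NC[x], per SU."""
    return ols_slope(scores, [nc_at_or_after(ages, x) for _, ages in probands])

def eff_exactly_at(x):
    """Estimated effect of the score on children born at exactly age x."""
    return ols_slope(scores, [ages.count(x) for _, ages in probands])

# The two columns printed for each age agree up to floating-point error.
for x in (18, 21, 25, 29):
    print(x, round(eff(x) - eff(x + 1), 6), round(eff_exactly_at(x), 6))
```

Because no toy proband has a child before age 14, `eff(14)` here coincides with the slope for the total number of children, mirroring the remark about Table 1.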

Among the genotyped individuals with yob between 1910 and 1975, information about educational attainment is available for 25,794 females and 19,903 males. For these individuals, the effects of POLY_{EDU} and educational attainment (EDU) itself on the reproductive traits were estimated individually, through separate regressions, and jointly, through regressions including both as predictors (Table 2). We coded EDU as in a recent meta-analysis (3). Individuals fall into four categories: 10, 13, 15, and 20 years (mean = 14.0 and SD = 3.4 for males and mean = 13.4 and SD = 3.7 for females). The first category corresponds to the mandatory minimum education in Iceland and the last corresponds to a college degree. For females, when analyzed separately, each SU increase of POLY_{EDU} decreases expected NC by 0.097 (*P* = 1.7 × 10^{−23}), whereas each year increase in EDU corresponds to a reduction of 0.045 (*P* = 5.0 × 10^{−56}). When analyzed jointly, the estimated effect of POLY_{EDU} on NC adjusted for EDU reduces to −0.071, a shrinkage that is meaningful but not drastic, and remains highly significant (*P* = 7.2 × 10^{−13}). Similar results were observed for AGFC and AACB. Clearly, EDU here is not a complete measure of educational attainment (e.g., it does not include information on postcollege education). With a more comprehensive measure of educational attainment, the estimated effects for POLY_{EDU} upon adjustment might shrink further, but the changes are unlikely to be drastic. For example, limiting to females with 10 years of education (*n* = 11,055), the estimated effect of POLY_{EDU} on NC is −0.079 (*P* = 5.8 × 10^{−6}) (Table S2). These results indicate that POLY_{EDU} has a direct effect on reproduction that is independent of the amount of education that is actually attained. 
Crucially, these results indicate that the magnitude of selection acting on the underlying genetic component of educational attainment has to be estimated directly using genotype data and could be severely underestimated if one attempts to deduce it based solely on the observed negative correlation between educational attainment and fertility. For males, the results tend to be similar to those of the females, only weaker. There is one striking exception. High EDU, similar to having a high POLY_{EDU}, delays reproduction. However, high EDU, unlike high POLY_{EDU}, does not lead to having fewer children for males (27). Indeed, in the joint analysis, the estimated effect of POLY_{EDU} is 0.061 fewer children (*P* = 2.5 × 10^{−7}), whereas the estimated effect per year of EDU is 0.011 children more. This again highlights that the effect of POLY_{EDU} on reproduction is not simply manifested through educational attainment.

For 129,808 genotyped individuals born between 1910 and 1990 POLY_{EDU} shows a notable and highly significant decline with yob (−0.0182 SU per decade, *P* = 5.8 × 10^{−35}). Average polygenic scores calculated for 10-year bins are displayed in Fig. 2. The relationship between POLY_{EDU} and yob exhibits nonlinear behavior (i.e., the downward slope seems to be steeper in the earlier years). When a quadratic fit was performed (blue line), the quadratic term of yob is significant (*P* = 1.7 × 10^{−3}). A closer examination suggests that the nonlinear behavior mainly reflects a survival effect rather than a birth cohort effect. The samples studied here were collected between 1998 and 2014, with a majority (68%) ascertained before 2006. For 85,520 of the latter, survival data at 2016 are available. The death rate overall is 19.4% (16,610/85,520) and is 54.5% (13,954/25,610) for those with yob before 1940, compared with 4.4% (2,656/59,910) for those with yob ≥ 1940. After adjustment for sex, yob, and age at ascertainment, each SU of POLY_{EDU} is estimated to increase the odds of survival by a factor of 1.083 (*P* = 2.5 × 10^{−11}). The positive effect of POLY_{EDU} on survival is not surprising because it is significantly associated with many other behavioral and health-related traits in Iceland. For example, POLY_{EDU} is positively correlated with high-density lipoprotein levels, and negatively correlated with triglyceride levels, body mass index, glucose fasting levels, and amount of smoking (*P* < 1 × 10^{−30} for each of these five quantitative traits; Table S3). Because POLY_{EDU} has a substantial impact on lifespan, when the samples were ascertained, there would be a positive ascertainment bias, particularly with those born before 1940, for those with high polygenic scores due to the greater likelihood to be alive at the time of ascertainment than those with low polygenic scores. 
This survival effect has a real impact on the difference in POLY_{EDU} between the young and the old in the population at any given time. However, for the purpose of estimating the change of the average polygenic score over time with respect to birth cohorts, this can be a source of bias. This bias is expected to be small for individuals with yob ≥1940. Using the latter, the estimated rate of decline of the average polygenic score is −0.0122 SU per decade (*P* = 2.4 × 10^{−7}, SE = 0.0024) (red line in Fig. 2). For comparison, we computed two other polygenic scores based on meta-analyses for height and schizophrenia. The polygenic score for height is not significantly associated with yob (*P* ≥ 0.5). The polygenic score for schizophrenia is estimated to decline at a rate of −0.0078 SU per decade (*P* = 1.1 × 10^{−3}, SE = 0.0024) for individuals with yob ≥1940.

An alternative to estimating the rate of decline of POLY_{EDU} is to perform calculations based on the information about reproductive history. If generations were discrete, then the contribution from each parent type (mother/father) to the change of the average polygenic score for the next generation is (eff/2)/(ANC), where eff is the effect of POLY_{EDU} on number of children and ANC is the average number of children. For the females in Table 1, eff = −0.084 and ANC is 2.84, and the estimated contribution to the change per generation is (−0.084/2)/2.84 = −0.015 SU. Given that the average AACB for these females is 27.5 years, this translates to −0.015/27.5 = −0.00054 SU per year, or −0.0054 SU per decade. For the males in Table 1, eff = −0.054, ANC = 2.73, and average AACB = 30.0, translating to an effect of −0.0033 SU per decade. Combining the contributions from females and males gives a change of −0.0087 per decade. This estimate, however, does not take into account that individuals with high POLY_{EDU} tend to have their children later (Table 1), leading to a slower contribution to the generations that follow. After applying equations derived for incorporating the generation time effect (28, 29) (*Materials and Methods*), the female and male contribution is estimated, respectively, to be −0.0065 and −0.0039 SU per decade, with the sum equal to −0.0104 SU per decade. This estimate is smaller in magnitude than the −0.0122 SU per decade estimate based on the observed decline. However, because the difference is within 1 SE, the two estimates can be considered as consistent.
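The discrete-generation arithmetic in this paragraph can be retraced with a short Python sketch. It uses only the figures quoted above and deliberately ignores the generation-time correction described in *Materials and Methods*; the function name `per_decade_change` is illustrative.

```python
def per_decade_change(eff, anc, aacb):
    """Contribution of one parent type to the change in mean score, in SU per decade.

    eff  : effect of POLY_EDU on number of children (children per SU)
    anc  : average number of children
    aacb : average age at child birth in years (used as the generation time)
    """
    per_generation = (eff / 2) / anc  # each parent passes on half of the genome
    per_year = per_generation / aacb
    return per_year * 10

female = per_decade_change(eff=-0.084, anc=2.84, aacb=27.5)
male = per_decade_change(eff=-0.054, anc=2.73, aacb=30.0)
combined = female + male

print(f"female:   {female:.4f} SU per decade")
print(f"male:     {male:.4f} SU per decade")
print(f"combined: {combined:.4f} SU per decade")
```

Rounded to four decimals, the three values match the −0.0054, −0.0033, and −0.0087 SU per decade given in the text.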

Although there are challenges to getting a precise estimate of the rate of change of the average POLY_{EDU} value due to nonsampling errors that could be difficult to gauge, with the analyses taken together we consider −0.010 SU per decade to be a reasonable estimate for the period from 1910 to 1990 that is more likely to underestimate than overestimate the true decline. Most importantly, POLY_{EDU} is just a fraction of the full genetic component of educational attainment, which we denote by POLY_{FULL}. It is the rate of change of POLY_{FULL} that is of ultimate interest. Under an assumption that the part of POLY_{FULL} that is not captured by POLY_{EDU} behaves in a similar fashion in its impact on reproduction, the rate of change is proportional to the square root of the variance explained (*SI Text*). Thus, if POLY_{FULL} is assumed to account for 30% of the variance of EDU, then its estimated rate of change, by extrapolation, is −0.010 × (30/3.74)^{1/2} = −0.028 SU per decade. To test the validity of this method of extrapolation we computed a separate polygenic score for educational attainment, denoted by POLY_{-U.K.B}, which was based on the same GWAS results used to construct POLY_{EDU}, except that the contribution from 111,349 UK Biobank samples was removed (*Materials and Methods*). When we applied POLY_{-U.K.B} to the Icelandic data, it explained 2.52% of the variance of EDU, and the rate of decline estimated based on its effects on reproduction is −0.0085 SU per decade (*Materials and Methods*). Hence, with the polygenic score strengthening from POLY_{-U.K.B} to POLY_{EDU}, the estimated rate of decline increased by a factor of (0.0104/0.0085) = 1.22, nearly identical to (3.74/2.52)^{1/2} = 1.22, the square root of the variance explained ratio.
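The square-root extrapolation and its consistency check can be reproduced with a few lines of Python. This sketch only repeats the arithmetic stated above; the 30% figure is the paper's working assumption, and the function name `extrapolate_rate` is illustrative.

```python
import math

def extrapolate_rate(rate, r2_observed, r2_target):
    """Scale a rate of change by the square root of the variance-explained ratio."""
    return rate * math.sqrt(r2_target / r2_observed)

# Extrapolating from POLY_EDU (3.74% of variance) to an assumed 30% for POLY_FULL.
full_rate = extrapolate_rate(-0.010, r2_observed=3.74, r2_target=30.0)

# Consistency check: going from the score without UK Biobank (2.52%) to POLY_EDU (3.74%).
ratio_of_rates = 0.0104 / 0.0085
ratio_from_r2 = math.sqrt(3.74 / 2.52)

print(f"extrapolated rate: {full_rate:.3f} SU per decade")
print(f"ratio of estimated rates: {ratio_of_rates:.2f}")
print(f"sqrt of variance ratio:   {ratio_from_r2:.2f}")
```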

Here we explore the implications of the observed trends on the distributions of cognitive traits in the population. Based on a sample of 1,577 genotyped Icelanders (653 males and 924 females; yob, mean = 1968 and SD = 13 years) with intelligence quotient (IQ) measurements (mean = 102 and SD = 15), each SU of POLY_{EDU} is estimated to increase IQ by 3.8 points (*P* < 10^{−20}). Given that POLY_{EDU} is estimated to decline at a rate of 0.01 SU per decade, this translates to a decline of 0.038 IQ points per decade. However, under the assumptions that POLY_{FULL} accounts for 30% of the variance of EDU, and the part of POLY_{FULL} that is not captured by POLY_{EDU} behaves in a similar fashion in its impact on both reproduction and IQ, by extrapolation, the decline of POLY_{FULL} would lead to a decline of 0.038 × (30/3.74) = 0.30 IQ points per decade. This would be a very substantial effect if the trend persists for centuries. By contrast, a meta-analysis estimated that IQ scores have increased by 13.8 points between 1932 and 1978, a rate of 3.0 points per decade (30), a phenomenon referred to as the Flynn effect. This rate is 10 times the estimated effect due to the decline of the genetic component, and, more importantly, in the opposite direction. Many commentators [including Flynn himself (31)] consider the Flynn effect to be due to changes in the socioeconomic and technological environment faced by successive generations of humans. Unfortunately, we are unable to assess the Flynn effect in our IQ data, because they were measured within a narrow time interval. Assuming that a similar magnitude of the Flynn effect is found in the Icelandic population, then it is clear that such environmentally induced increases of IQ scores more than compensate for, and indeed mask, any potential decline in the genetic propensity for IQ.
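For readers who want to retrace the numbers in this paragraph, the sketch below repeats the arithmetic in Python. One reading of the full ratio (30/3.74), rather than its square root, is that a square-root factor enters once for the effect on reproduction and once for the effect on IQ; that interpretation is an assumption here, not a statement from the text. Variable names are illustrative.

```python
iq_points_per_su = 3.8                # estimated IQ points per SU of POLY_EDU
poly_edu_decline_per_decade = 0.010   # SU per decade
r2_poly_edu = 3.74                    # % of EDU variance explained by POLY_EDU
r2_full_assumed = 30.0                # % assumed for POLY_FULL

# Decline attributable to the measured score alone.
observed_component = iq_points_per_su * poly_edu_decline_per_decade

# Extrapolation to the full genetic component.
full_component = observed_component * (r2_full_assumed / r2_poly_edu)

# Flynn effect: 13.8 points between 1932 and 1978.
flynn_per_decade = 13.8 / ((1978 - 1932) / 10)

print(f"measured component: {observed_component:.3f} IQ points per decade")
print(f"extrapolated:       {full_component:.2f} IQ points per decade")
print(f"Flynn effect:       {flynn_per_decade:.1f} IQ points per decade")
print(f"ratio:              {flynn_per_decade / full_component:.1f}")
```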

## Discussion

From the results presented here it is clear that there has been a slow but steady decline in the frequency of certain variants in the Icelandic gene pool that are associated with educational attainment. It is also clear that education attained does not explain all of the effect. Hence, it seems that the effect is caused by a certain capacity to acquire education that is not always realized. We postulate that, in addition to being correlated with cognitive ability (32, 33), POLY_{EDU} is capturing a portion of the propensity to long-term planning and delayed gratification. To address the question of whether and how these results could be extended to other populations and other time periods it should first be emphasized that the negative selection observed here is likely an example of gene–environment interaction, that is, both the direction of the effect and its magnitude could and would change given a different socioeconomic environment (5, 34, 35). It is likely that in any population where educational attainment is negatively correlated with fertility the underlying genetic propensity would be in decline, but the actual magnitude and characteristics of the decline could vary substantially. Based purely on epidemiological/demographical data, there were concerns about this sort of decline in Great Britain more than eight decades ago (10). However, the possibility that such a phenomenon could be temporary or transitional was also raised (10, 29). Indeed, there might be a cyclical element to this phenomenon, because it is only reasonable to assume that alleles associated with greater educational attainment must have been under positive selection at some time during the evolutionary history of *Homo sapiens*. The main message here is that the human race is genetically far from being stagnant with respect to one of its most important traits. It is remarkable to report changes in POLY_{EDU} that are measurable across the several decades covered by this study. 
In evolutionary time, this is a blink of an eye. However, if this trend persists over many centuries, the impact could be profound.

## Materials and Methods

### Genealogical Database.

For nearly 20 years a genealogical database of Iceland has been used for genetics studies performed by deCODE genetics (15–17). This database is constantly updated. Currently, the deCODE genetics genealogical database contains essentially all of the ∼317,000 living Icelanders (some recent immigrants may not be included in this tabulation) and the vast majority of their ancestors going back to about 1650, along with a smaller portion of ancestors before that time. In total, just over 840,000 individuals are presently recorded in the genealogical database, with the earliest recorded yob being 740 AD. The database contains information about the yob and sex of each individual, and when available the year of death, the identities of the father and mother, and geographical locations, such as places of birth, residence, and death. The database was constructed from a number of different sources, the most important of which were 14 national censuses spanning the period from 1703 to 1930, parish records from 1780, and the national registry from 1994. Additional key sources include annals, genealogical publications, biographical lists of members of professional associations, and other official records. The database is particularly complete for the probands used in this study, who were all born after 1910. For the vast majority of these individuals, both parents and grandparents are recorded, as are all children who survived the first weeks of life.

### Sample Collection.

All samples and questionnaire data were collected through studies approved by the National Bioethics Committee and the Icelandic Data Protection Authority. All participants signed informed consent before blood samples were drawn and all data were analyzed under pseudonyms assigned by a third-party encryption system overseen by the Icelandic Data Protection Authority (36).

### Meta-Analysis and Polygenic Scores.

In a recent meta-analysis on educational attainment (3) the initial total sample size was 293,724, which included 76,155 samples from 23andMe, and 49,970 Icelandic samples [46,758 from deCODE and 3,212 from Age, Gene/Environment Susceptibility (AGES Reykjavik) Study]. Excluding the Icelandic samples and 23andMe, the remaining sample size was 167,599. When the manuscript was revised for final publication, an additional 111,349 UK Biobank samples were added as replication (full genome association results also available). It is important to note that the meta-analysis produces trait association results for each marker separately (i.e., joint analyses are not performed). When deriving the weights for computing POLY_{EDU} (see below for the method used), for the current study, GWAS results from 23andMe and Iceland were excluded. The 23andMe results were excluded because their policy forbids the release of full GWAS results. The Icelandic results were excluded to avoid confounding/bias and/or overfitting. Thus, the weights for computing POLY_{EDU} were derived based on results from 167,599 + 111,349 = 278,948 samples. Similarly, the weights for POLY_{-U.K.B} were based on 167,599 samples. For the 120 genomewide significant markers, the estimated effects on educational attainment (used in Figs. S4 and S5) did incorporate the 23andMe data and were based on 278,948 + 76,155 = 355,103 samples.
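The sample-size bookkeeping in this section is easy to lose track of, so the following Python sketch simply tallies the counts quoted above. Variable names are illustrative.

```python
initial_total = 293_724          # initial meta-analysis sample size
n_23andme = 76_155               # excluded: full GWAS results not released
n_iceland = 46_758 + 3_212       # deCODE + AGES Reykjavik, excluded to avoid overfitting
n_uk_biobank = 111_349           # added as replication

n_without_ukb = initial_total - n_23andme - n_iceland   # basis for POLY_{-U.K.B}
n_poly_edu = n_without_ukb + n_uk_biobank               # basis for POLY_{EDU}
n_top_snps = n_poly_edu + n_23andme                     # basis for the 120 SNP effects

print(n_iceland, n_without_ukb, n_poly_edu, n_top_snps)
```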

### Markers and Methods Used to Compute the Polygenic Score.

The basic method used to process the genotype data for Icelanders, including imputations based on full-genome sequencing results, was described in ref. 20. A framework set of ∼620,000 high-quality SNPs covering the whole genome was used to compute POLY_{EDU} and POLY_{-U.K.B}. Note that a polygenic score is constructed as a linear combination of the genotypes of the markers. In determining the weights used for the linear combination the goal is to maximize the correlation between the resulting score and the trait. This is not a trivial problem in part because, as noted above, the meta-analysis only gives association results for each marker separately, and the markers are in general correlated (i.e., in linkage disequilibrium). We adjusted for linkage disequilibrium using LDpred (19), a recently proposed method. The linkage disequilibrium between markers was estimated using the Icelandic samples. We have explored different ways of constructing the polygenic score (e.g., using a larger set of markers and different ways for adjusting linkage disequilibrium). We found the method used to give close to the best-performing score we could achieve. Most importantly, the main results in this paper are robust to the specific method (as long as it is a reasonable one) used to construct the polygenic score.
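To make "a linear combination of the genotypes" concrete, here is a minimal Python sketch. It is not LDpred and does not adjust for linkage disequilibrium; the genotypes and weights are invented, the function names are illustrative, and the use of the population SD for scaling is an assumption.

```python
import statistics

def polygenic_score(genotypes, weights):
    """Weighted sum of allele counts (0, 1, or 2) across markers."""
    return sum(g * w for g, w in zip(genotypes, weights))

def standardize(scores):
    """Rescale scores to mean 0 and SD 1 (standard units)."""
    mean = statistics.fmean(scores)
    sd = statistics.pstdev(scores)
    return [(s - mean) / sd for s in scores]

# Hypothetical per-marker weights and allele counts for five individuals.
weights = [0.02, -0.01, 0.03, 0.015]
individuals = [
    [0, 1, 2, 1],
    [1, 1, 0, 2],
    [2, 0, 1, 0],
    [1, 2, 2, 1],
    [0, 0, 1, 2],
]

raw = [polygenic_score(g, weights) for g in individuals]
in_standard_units = standardize(raw)
print([round(s, 3) for s in in_standard_units])
```

In practice the weights would come from the LD-adjusted GWAS effect estimates and the sum would run over the ∼620,000 framework SNPs.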

### Educational Attainment.

As noted above, the deCODE data on educational attainment were part of the published meta-analysis (3). The original Icelandic data were collected through various questionnaires including questions on educational attainment of adults (we used responses from adults 30 years or older assuming maximum educational attainment had been achieved by this age). Responses were then mapped to the International Standard Classification of Education (ISCED) 1997 classification (UNESCO: www.unesco.org/education/information/nfsunesco/doc/isced_1997.htm) format that was also used for the meta-analysis as described in detail in Okbay et al. (3) and briefly also reviewed below. The ISCED 1997 classification includes seven categories of educational attainment that are internationally comparable. The categories are translated into US years-of-schooling equivalents, which have a quantitative interpretation as follows:

0. Preprimary education: 1 year

1. Primary education or first stage of basic education: 7 years

2. Lower secondary or second stage of basic education: 10 years

3. (Upper) secondary education: 13 years

4. Postsecondary nontertiary education: 15 years

5. First stage of tertiary education (not leading directly to an advanced research qualification): 19 years

6. Second stage of tertiary education (leading to an advanced research qualification, e.g., a Ph.D.): 22 years.

In our data, questionnaire responses could be categorized according to the major educational levels in Iceland and were mapped to ISCED 1997 levels according to the mapping schema for Iceland maintained by UNESCO (uis.unesco.org/en/isced-mappings) and accordingly to comparable years of educational attainment in the United States as demonstrated below:

2. Compulsory basic education (10 grades): 10 years

3. (Upper) secondary education or vocational programs: 13 years

4. Postsecondary nontertiary education: 15 years

5–6. Advanced education representing A-levels and/or any university degree: 20 years.
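The Icelandic coding above amounts to a small lookup table. A minimal Python sketch, assuming responses have already been assigned an ISCED 1997 level (the names `ICELAND_ISCED_TO_YEARS` and `years_of_schooling` are illustrative):

```python
# ISCED 1997 level -> US years-of-schooling equivalent, as coded for the Icelandic data.
# Levels 5 and 6 are pooled into a single 20-year category.
ICELAND_ISCED_TO_YEARS = {2: 10, 3: 13, 4: 15, 5: 20, 6: 20}

def years_of_schooling(isced_level):
    """Return EDU in years for an ISCED 1997 level used in the Icelandic coding."""
    try:
        return ICELAND_ISCED_TO_YEARS[isced_level]
    except KeyError:
        raise ValueError(f"ISCED level {isced_level} is not part of the Icelandic coding")

print(sorted(set(ICELAND_ISCED_TO_YEARS.values())))
```

The distinct values are the four EDU categories (10, 13, 15, and 20 years) referred to in *Results*.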

### IQ Data.

IQ measurements from population controls were collected in years 2009–2016. Intelligence was measured using the Icelandic version of the Wechsler Abbreviated Scale of Intelligence (WASI^{IS}) (37, 38).

### Genomic Control.

Results in this paper are mainly based on regression analyses. The standard output of regressions assumes that the data points are statistically independent. However, because the individuals are genetically related and the trait values of individuals who are genetically closely related tend to be correlated, taking the standard output at face value would tend to produce anticonservative results (i.e., the test statistics tend to have a variance, under the null hypothesis of no effect, that is higher than assumed). Adjusting for 20 principal components reduces, but does not eliminate, this effect. Genomic control is a method that uses the observed results of a large number of SNPs in the genome (1.1 million are used here), most of them expected to have no effect, to evaluate and adjust for the overdispersion of the test statistics. The first paper to describe such an approach is by Devlin and Roeder (39), but the method described there could be somewhat conservative, particularly when many variants in the genome do actually contribute to the trait. The method used here, based on LD score regression (22), is more recent and adjusts for the conservativeness of the original method. Because genomic control is a form of variance adjustment, theoretically it should apply to a polygenic score in the same way as a single marker. This has been confirmed by simulations. For example, applying this method, the t-statistic for the correlation between POLY_{EDU} and AGFC is divided by 1.13 and 1.14 for males and females, respectively. Genomic control was also applied to the correlation between POLY_{EDU} and yob, where the null hypothesis corresponds to a scenario that changes of marker frequencies over time, if any, are a result of random genetic drift. Here, however, no adjustment was found to be necessary; for the analyses restricted to individuals with yob ≥1940, there is actually some indication that the unadjusted results could be slightly conservative. 
This is probably because, whereas values for traits such as EDU tend to be positively correlated between close relatives, that is not necessarily the case for yob. We also note that *P* values given are two-sided unless explicitly stated otherwise.
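The adjustment step itself is simple once the inflation factor is known. The Python sketch below divides a test statistic by a given factor, as described for the 1.13 and 1.14 divisors, and converts it to a two-sided *P* value. Estimating the factor via LD score regression is not shown. The normal approximation to the t-distribution is an assumption that is reasonable only for large samples, and the input statistic of 5.0 is invented.

```python
import math

def genomic_control_adjust(t_stat, inflation):
    """Deflate a t-statistic by the estimated inflation factor."""
    return t_stat / inflation

def two_sided_p_normal(z):
    """Two-sided P value for a statistic treated as standard normal."""
    return math.erfc(abs(z) / math.sqrt(2))

raw_t = 5.0  # hypothetical unadjusted statistic
adjusted_t = genomic_control_adjust(raw_t, inflation=1.14)

print(f"adjusted statistic: {adjusted_t:.3f}")
print(f"P before: {two_sided_p_normal(raw_t):.2e}")
print(f"P after:  {two_sided_p_normal(adjusted_t):.2e}")
```

Note that the divisor applies to the t-statistic; on the chi-square scale the corresponding factor would be its square.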

### Determining the Rate of Change of the Polygenic Score As a Result of Its Impact on Fertility Traits.

To derive the (approximate) relationship between the effects of a polygenic score *X* on the fertility traits and the change of the average polygenic score over time, we assume that the effects are linear and small per generation. Specifically, with *X* standardized to have mean 0 and variance 1, we assume

$$E(\mathrm{NC} \mid X) = a + bX$$

and

$$E(\mathrm{AACB} \mid X) = c + dX.$$

The main mathematical result we are going to show is that, under these assumptions, to the first order, the rate of change of the mean of *X* per year is

$$\beta = \frac{b}{ac} - \frac{d\,\log(a/2)}{c^{2}}. \tag{1}$$

(We note that Eq. **1** might have been explicitly derived in some other publications, although we are not currently aware of it.) In situations where the males and females behave differently, that is, have different values for *a*, *b*, *c*, and *d*, we have *β*_{M} for males and *β*_{F} for females, so that (*β*_{M} /2) + (*β*_{F} /2) would be the estimate of the rate of change. Note that the first term in Eq. **1**, *b/ac*, is capturing the contribution of the effect of *X* on NC to the rate of change, whereas the second term, −*d*log(*a*/2)*c*^{−2}, is capturing the contribution of the effect of *X* on AACB.

Before showing how to derive the general form (Eq. **1**), we think it is helpful to see how the result can be shown for the special case with *d* = 0. Here, to the first order, we can assume that mating is performed in discrete generations with generation time *c*. Let *X* be the (random) polygenic score for a female in generation *t*, and scaled to have mean 0 and variance 1. Let *Y* denote, for a random person in generation *t* + 1, what is inherited from the mother. It follows that

$$E(Y) = \frac{1}{2} \times \frac{E(wX)}{E(w)},$$

where *w* = *a* + *bX*. The factor (1/2) results from the fact that only one-half of the genetic material is passed on to the offspring. *E*(*wX*)*/E*(*w*) corresponds to a weighted average of *X* with weights proportional to *w*. [The absolute weight is *wt* = *w/E*(*w*) with expectation 1.] It follows from *E*(*X*) = 0 and *var*(*X*) = 1 that *E*(*wX*) = *b* and *E*(*w*) = *a*. Thus, *E*(*Y*) = (1/2) × (*b/a*). Taking into account that generation time is *c*, the contribution of the females to the change of the mean polygenic score per year is (1/2) × (*b/ac*). The same calculations apply to the fathers.
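The special case with *d* = 0 can be checked numerically. The sketch below is only an illustration under stated assumptions: the scores are simulated from a deliberately skewed distribution and standardized in-sample, the helper names `standardize` and `inherited_from_mother` are hypothetical, and the values of *a*, *b*, and *c* are the female estimates for POLY_{EDU} reported later in this section. Because only the mean and variance of *X* enter the calculation, the weighted average reproduces (1/2) × (*b/a*) whatever the shape of the distribution.

```python
import random

def standardize(values):
    """Rescale values to have in-sample mean 0 and variance 1."""
    n = len(values)
    mean = sum(values) / n
    sd = (sum((v - mean) ** 2 for v in values) / n) ** 0.5
    return [(v - mean) / sd for v in values]

def inherited_from_mother(scores, a, b):
    """(1/2) * E(wX) / E(w) with fertility weights w = a + b*X."""
    weights = [a + b * x for x in scores]
    weighted_mean = sum(w * x for w, x in zip(weights, scores)) / sum(weights)
    return 0.5 * weighted_mean

rng = random.Random(1)
# Hypothetical scores; a skewed distribution is used on purpose.
scores = standardize([rng.expovariate(1.0) for _ in range(50_000)])

a, b, c = 2.84, -0.084, 27.5           # female values quoted in the text
shift_per_generation = inherited_from_mother(scores, a, b)
shift_per_year = shift_per_generation / c   # generation time is c when d = 0

print(shift_per_generation, 0.5 * b / a)
print(shift_per_year, 0.5 * b / (a * c))
```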

Deriving the general form (Eq. **1**) where the polygenic score also has an effect on generation time (AACB) is more complicated. To do that, we start with equation 6.5 in section 6.3 of ref. 29:

$$r \approx \frac{\log(R_0)}{T},$$

where *r* is the intrinsic rate of change, *R*_{0} is the net reproductive rate, and *T* is the mean generation time. Because only one-half of the genetic material is transmitted from a parent to an offspring, we should think of *R*_{0} as the number of children divided by two. For females, based on the estimated effects of the polygenic score *X* on number of children and AACB, and assuming linearity, we have

$$r(x) = \frac{\log[(a + bx)/2]}{c + dx}.$$

The derivative is

$$r'(x) = \frac{b}{(a + bx)(c + dx)} - \frac{d\,\log[(a + bx)/2]}{(c + dx)^{2}}.$$

Evaluating at *x* = 0 gives

$$r'(0) = \frac{b}{ac} - \frac{d\,\log(a/2)}{c^{2}}.$$

From equation 6.9 of ref. 29, the relative fitness between two genotypes is

$$\frac{w_1}{w_2} = e^{(r_1 - r_2)T},$$

where *r*_{1} and *r*_{2} are the two intrinsic rates of increase and *T* = *c* here. When the relative difference in fitness is small, the relative fitness of *X* = *x* and *X* = 0 is

$$\mathit{wt}(x) = e^{[r(x) - r(0)]c} \approx 1 + [r(x) - r(0)]\,c \approx 1 + r'(0)\,c\,x.$$

Notice that *wt* is already scaled to have expectation one (approximately). Thus, the weighted average of *X*, with the weight proportional to fitness, is

$$E(\mathit{wt} \times X) \approx E\{[1 + r'(0)\,c\,X]\,X\} = r'(0)\,c,$$

using *E*(*X*) = 0 and *E*(*X*^{2}) = 1. Because this is the approximate rate of change per generation, the rate of change per year is

$$\frac{r'(0)\,c}{c} = r'(0) = \frac{b}{ac} - \frac{d\,\log(a/2)}{c^{2}},$$

giving us Eq. **1**. Here we have shown how to derive Eq. **1** from equations in ref. 29. We note that Eq. **1** can also be derived using equations from ref. 28.
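The first-order argument can be checked numerically. The sketch below assumes that the intrinsic rate is *r*(*x*) = log[(*a* + *bx*)/2]/(*c* + *dx*), that is, log(*R*_{0})/*T* with *R*_{0} equal to half the expected number of children and *T* equal to the expected AACB. It further assumes that *X* is standard normal, which the derivation itself does not require, and it uses the female values for POLY_{EDU} quoted in this section. The function names are hypothetical. It compares the closed form *b/ac* − *d*log(*a*/2)*c*^{−2} with a finite-difference slope of *r* at 0 and with the per-year change obtained from exact fitness weights exp{[*r*(*x*) − *r*(0)]*c*}.

```python
import math

def intrinsic_rate(x, a, b, c, d):
    """r(x) = log(R0(x)) / T(x), with R0 = (a + b*x) / 2 and T = c + d*x."""
    return math.log((a + b * x) / 2.0) / (c + d * x)

def eq1_rate(a, b, c, d):
    """Closed form: b/(a*c) - d*log(a/2)/c**2."""
    return b / (a * c) - d * math.log(a / 2.0) / c ** 2

def fitness_weighted_rate(a, b, c, d, lo=-8.0, hi=8.0, steps=4000):
    """Per-year change in mean X under exact fitness weights, X ~ N(0, 1)."""
    r0 = intrinsic_rate(0.0, a, b, c, d)
    dx = (hi - lo) / steps
    num = den = 0.0
    for i in range(steps + 1):
        x = lo + i * dx
        density = math.exp(-0.5 * x * x)   # unnormalized; constant cancels
        weight = math.exp((intrinsic_rate(x, a, b, c, d) - r0) * c)
        num += weight * x * density
        den += weight * density
    return (num / den) / c                 # per generation, then per year

a, b, c, d = 2.84, -0.084, 27.5, 0.46      # female values quoted in the text
h = 1e-5
numeric_slope = (intrinsic_rate(h, a, b, c, d)
                 - intrinsic_rate(-h, a, b, c, d)) / (2 * h)
closed_form = eq1_rate(a, b, c, d)
exact_weighting = fitness_weighted_rate(a, b, c, d)
print(closed_form, numeric_slope, exact_weighting)
```

Under these assumptions the closed form and the exact weighting should agree closely, which is what "to the first order" means here.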

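With POLY_{EDU}, for females *a* = 2.84, *b* = −0.084, *c* = 27.5, and *d* = 0.46, and for males *a* = 2.73, *b* = −0.054, *c* = 30.0, and *d* = 0.37. Applying these values to the equation, we get

$$\beta_F = \frac{-0.084}{2.84 \times 27.5} - \frac{0.46 \times \log(2.84/2)}{27.5^{2}} \approx -0.00129$$

and

$$\beta_M = \frac{-0.054}{2.73 \times 30.0} - \frac{0.37 \times \log(2.73/2)}{30.0^{2}} \approx -0.00079,$$

so that (*β*_{M} /2) + (*β*_{F} /2) ≈ −0.00104 SU per year.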

For POLY_{-U.K.B}, with the same *a* and *c*, for females *b* = −0.069 and *d* = 0.39, and for males *b* = −0.043 and *d* = 0.31. Similar calculations estimate the expected change to be −0.00085 SU per year.
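The arithmetic above can be reproduced with a short sketch of Eq. **1**, written as the sum of the NC term *b/ac* and the AACB term −*d*log(*a*/2)*c*^{−2}. The function and variable names are hypothetical, and reusing the POLY_{EDU} values of *a* and *c* for POLY_{-U.K.B} is an assumption of this sketch.

```python
import math

def eq1_rate(a, b, c, d):
    """Rate of change of the mean score per year (Eq. 1) for one sex."""
    nc_term = b / (a * c)                         # effect of X on NC
    aacb_term = -d * math.log(a / 2.0) / c ** 2   # effect of X on AACB
    return nc_term + aacb_term

# (a, b, c, d) for POLY_EDU as quoted in the text.
female = dict(a=2.84, b=-0.084, c=27.5, d=0.46)
male = dict(a=2.73, b=-0.054, c=30.0, d=0.37)

beta_f = eq1_rate(**female)
beta_m = eq1_rate(**male)
beta_edu = beta_m / 2 + beta_f / 2

# POLY_-U.K.B: only b and d change; a and c are assumed to stay the same.
beta_ukb = (eq1_rate(2.73, -0.043, 30.0, 0.31) / 2
            + eq1_rate(2.84, -0.069, 27.5, 0.39) / 2)

print(f"{beta_f:.5f} {beta_m:.5f} {beta_edu:.5f} {beta_ukb:.5f}")
```

The combined POLY_{EDU} value is about −0.00104 SU per year, or roughly −0.010 SU per decade, and the POLY_{-U.K.B} value rounds to the −0.00085 SU per year stated in the text.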

## SI Text

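Here we explore the relationship between the rate of decline of POLY_{EDU} and that of POLY_{FULL}, assuming each is standardized to have mean zero and variance one. Decompose POLY_{FULL} as POLY_{FULL} = A + B, where A = cor(POLY_{EDU}, POLY_{FULL}) × POLY_{EDU} and B = POLY_{FULL} − A. Notice that A, which is proportional to POLY_{EDU}, and B are uncorrelated because A corresponds to the fitted values when POLY_{FULL} is regressed on POLY_{EDU}, and B corresponds to the residuals. Suppose that

$$\frac{\mathrm{cor}(B, \mathrm{EDU})}{\mathrm{cor}(A, \mathrm{EDU})} = \frac{\mathrm{cor}(B, \mathrm{NC})}{\mathrm{cor}(A, \mathrm{NC})} = \frac{\mathrm{cor}(B, \mathrm{AACB})}{\mathrm{cor}(A, \mathrm{AACB})}. \tag{S1}$$

(The conditions for Eq. **S1** to hold will be discussed below after we demonstrate its consequences.) The fraction of the variance of EDU explained by POLY_{EDU} and POLY_{FULL} is cor(POLY_{EDU}, EDU)^{2} and cor(POLY_{FULL}, EDU)^{2}, respectively. Likewise, the fraction of the variance of NC explained by POLY_{EDU} and POLY_{FULL} is cor(POLY_{EDU}, NC)^{2} and cor(POLY_{FULL}, NC)^{2}, and the fraction of the variance of AACB explained by POLY_{EDU} and POLY_{FULL} is cor(POLY_{EDU}, AACB)^{2} and cor(POLY_{FULL}, AACB)^{2}. If Eq. **S1** holds, then

$$\frac{\mathrm{cor}(\mathrm{POLY}_{FULL}, \mathrm{EDU})^{2}}{\mathrm{cor}(\mathrm{POLY}_{EDU}, \mathrm{EDU})^{2}} = \frac{\mathrm{cor}(\mathrm{POLY}_{FULL}, \mathrm{NC})^{2}}{\mathrm{cor}(\mathrm{POLY}_{EDU}, \mathrm{NC})^{2}} = \frac{\mathrm{cor}(\mathrm{POLY}_{FULL}, \mathrm{AACB})^{2}}{\mathrm{cor}(\mathrm{POLY}_{EDU}, \mathrm{AACB})^{2}}. \tag{S2}$$

In Eq. **1**, the rate of decline, *b* and *d* are the effects of the polygenic score under study. Because variance explained is proportional to effect squared, or equivalently effect is proportional to the square root of variance explained, Eq. **S2** implies

$$\frac{b_{FULL}}{b_{EDU}} = \frac{d_{FULL}}{d_{EDU}} = \sqrt{\frac{\mathrm{cor}(\mathrm{POLY}_{FULL}, \mathrm{EDU})^{2}}{\mathrm{cor}(\mathrm{POLY}_{EDU}, \mathrm{EDU})^{2}}}.$$

Because Eq. **1** is linear in *b* and *d*, this implies

$$\frac{\beta_{FULL}}{\beta_{EDU}} = \sqrt{\frac{\mathrm{cor}(\mathrm{POLY}_{FULL}, \mathrm{EDU})^{2}}{\mathrm{cor}(\mathrm{POLY}_{EDU}, \mathrm{EDU})^{2}}}.$$

With Eq. **S1**, we are assuming that the relative effect sizes of B (the part of POLY_{FULL} that is not captured by POLY_{EDU}) on EDU and the fertility traits NC/AACB are the same as the relative effect sizes of POLY_{EDU}. Although this assumption might not be satisfied exactly, it is likely to be approximately true. Statistically, POLY_{EDU} is capturing the contributions of many variants over the genome, with noise, and hence things could average out approximately. The comparison we made between POLY_{EDU} and POLY_{-U.K.B} supports this belief.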
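The scaling argument can be sketched in a few lines, assuming that the effects *b* and *d* of the fuller score are those of POLY_{EDU} multiplied by the square root of the ratio of variance explained, and that *a* and *c* are unchanged. The variance fractions 0.04 and 0.16 below are hypothetical placeholders chosen to give a factor of two; they are not estimates from this study. The function names are also hypothetical.

```python
import math

def eq1_rate(a, b, c, d):
    """Rate of change per year: b/(a*c) - d*log(a/2)/c**2."""
    return b / (a * c) - d * math.log(a / 2.0) / c ** 2

def effect_scale(var_explained_edu, var_explained_full):
    """Effect sizes scale with the square root of variance explained."""
    return math.sqrt(var_explained_full / var_explained_edu)

# Hypothetical variance fractions, for illustration only.
k = effect_scale(0.04, 0.16)

a, b, c, d = 2.84, -0.084, 27.5, 0.46   # female POLY_EDU values from the text
beta_edu = eq1_rate(a, b, c, d)
beta_full = eq1_rate(a, k * b, c, k * d)  # scale b and d, keep a and c

# Linearity of the rate in b and d means the rate scales by the same factor.
print(k, beta_full / beta_edu)
```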

## Acknowledgments

We thank David Cesarini, Philipp Koellinger, and the Social Science Genetic Association Consortium for allowing us early access to genome-wide association study (GWAS) results.

## Footnotes

^{1}To whom correspondence may be addressed. Email: kong@decode.is or kari.stefansson@decode.is.

Author contributions: A.K. and K.S. designed research; A.K., H.S., A.I.Y., G.A.J., A.O., P.S., G.M., D.F.G., A.H., G.B., and U.T. performed research; A.K., M.L.F., G.T., and F.Z. analyzed data; A.K. derived the mathematical results in *Materials and Methods*; M.L.F. prepared the figures and tables for publication; H.S. provided IQ data and references; G.A.J. processed the IQ data to a form suitable for analyses; A.I.Y. assisted in deriving the mathematical results in *Materials and Methods*; A.O. provided meta-analysis results with various cohorts removed; P.S., G.M., and D.F.G. contributed to processing the Icelandic genotype data for analysis; A.H. provided key references and contributed to writing the *Discussion*; G.B. collected and processed Icelandic education data and provided key references; U.T. oversaw the generation of the genotype data in the laboratory; A.K. wrote the paper; and K.S. contributed to the writing of the final version of the paper.

The authors declare no conflict of interest.

This article is a PNAS Direct Submission.

This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1612113114/-/DCSupplemental.

Freely available online through the PNAS open access option.

## References

- (1) Branigan AR, McCallum KJ, Freese J
- (2) Rietveld CA, et al.; LifeLines Cohort Study
- (7) D’Addio AC, d’Ercole MM
- (8) Beauchamp JP
- (9) Courtiol A, Tropf FC, Mills MC
- (10) Fisher RA
- (14) Day FR, et al.
- (15) Arngrímsson R, et al.
- (17) Helgason A, Pálsson S, Gudbjartsson DF, Kristjánsson T, Stefánsson K
- (19) Vilhjálmsson BJ, et al.; Schizophrenia Working Group of the Psychiatric Genomics Consortium, Discovery, Biology, and Risk of Inherited Variants in Breast Cancer (DRIVE) study
- (20) Gudbjartsson DF, et al.
- (26) Okbay A, et al.
- (28) Charlesworth B
- (29) Cavalli-Sforza LL, Bodmer WF
- (31) Flynn J
- (32) Rietveld CA, et al.
- (35) Hazan MZH
- (37) Wechsler D
- (38) Gudmundsson E. *Mat á greind fullorðinna: WASI^{IS}* (Menntamalastofnun, Reykjavik, Iceland) [The Assessment of Intelligence in Adults: WASI^{IS}].

## Article Classifications

- Biological Sciences
- Evolution

- Social Sciences
- Social Sciences