Selection against variants in the genome associated with educational attainment

Edited by Andrew G. Clark, Cornell University, Ithaca, NY, and approved December 5, 2016 (received for review July 22, 2016)
January 17, 2017
114 (5) E727-E732


Epidemiological studies suggest that educational attainment is affected by genetic variants. Results from recent genetic studies allow us to construct a score from a person’s genotypes that captures a portion of this genetic component. Using data from Iceland that include a substantial fraction of the population we show that individuals with high scores tend to have fewer children, mainly because they have children later in life. Consequently, the average score has been decreasing over time in the population. The rate of decrease is small per generation but marked on an evolutionary timescale. Another important observation is that the association between the score and fertility remains highly significant after adjusting for the educational attainment of the individuals.


Epidemiological and genetic association studies show that genetics play an important role in the attainment of education. Here, we investigate the effect of this genetic component on the reproductive history of 109,120 Icelanders and the consequent impact on the gene pool over time. We show that an educational attainment polygenic score, POLYEDU, constructed from results of a recent study is associated with delayed reproduction (P < 10^−100) and fewer children overall. The effect is stronger for women and remains highly significant after adjusting for educational attainment. Based on 129,808 Icelanders born between 1910 and 1990, we find that the average POLYEDU has been declining at a rate of ∼0.010 standard units per decade, which is substantial on an evolutionary timescale. Most importantly, because POLYEDU only captures a fraction of the overall underlying genetic component, the latter could be declining at a rate that is two to three times faster.
Epidemiological studies have estimated that the genetic component of educational attainment can account for as much as 40% of the trait variance (1). Recent meta-analyses (2, 3) yielded sequence variants contributing to the underlying genetic component. A negative correlation between educational attainment and number of children has been observed in many populations (4–7). A recent study of ∼20,000 genotyped Americans born between 1931 and 1953 provided direct evidence that the genetic propensity for educational attainment is associated with reduced fertility (8, 9), supporting previously postulated notions (10) that the population average of the genetic propensity for educational attainment and related traits must be declining. Here, using a population-wide sample that is both much larger and covers a substantially greater time span, and with additional auxiliary information, we aim to estimate the change of the genetic propensity of educational attainment in the Icelandic population over the last few decades, starting with an in-depth investigation of the relationship between a measurable genetic component of educational attainment and various aspects of reproduction (11–14).


The number of living Icelanders is ∼317,000 (Fig. S1). A genealogical database of Icelanders (15–17) that is very close to complete for individuals born after 1910 (Materials and Methods) is used in this study. Probands used for the genetic analyses here are limited to those with both parents and all four grandparents listed in the genealogy. For the fertility studies, only children who survived their first year are counted. The first step was to use results from a recent genome-wide association study (GWAS) of educational attainment (3) to determine the per-locus allele-specific weightings of 620,000 markers used to calculate a polygenic score (18, 19), POLYEDU (Materials and Methods for details on polygenic score construction). After excluding the Icelandic cohorts in the GWAS to avoid confounding, 278,948 samples from 62 cohorts were used to determine the weightings for POLYEDU. We computed POLYEDU for over 150,000 Icelanders who were directly genotyped with chip arrays and imputed for additional sequence variants discovered through whole-genome sequencing of 8,453 Icelanders (20) (Materials and Methods). POLYEDU was scaled to an SD of 1, hereafter referred to as standard units (SUs). When applied to 46,079 Icelanders with educational attainment data, POLYEDU was found to explain 3.74% of the trait variance (P < 10^−300). By contrast, the strongest single variant only explains 0.10% of the variance, indicating that educational attainment is a complex trait influenced by many variants in the genome and highlighting the increased power of using the polygenic score for our analyses. Our first analysis focused on 109,120 individuals (58,560 females and 50,560 males) with year of birth (yob) between 1910 and 1975 (Fig. S2). The genealogical database was used to obtain the number of children (NC) and, where applicable, the age at first child (AGFC) and the average age at child birth (AACB) for this set. 
The estimated effects of POLYEDU on these reproductive traits, adjusted for yob and 20 principal components (21), are presented in Table 1 for females and males separately. For females, an increase of 1 SU of POLYEDU corresponds to an average decrease of 0.084 children [P = 1.0 × 10^−43, calculated with genomic control adjustment (22)], and for those with children AGFC and AACB increased by 0.59 years (P = 5.3 × 10^−155) and 0.46 years (P = 1.0 × 10^−117), respectively. A similar, albeit weaker, pattern of results was observed for males. The finding of a substantially stronger association for AGFC than NC suggests that the effect of POLYEDU on NC is mainly manifested through delayed reproduction. Thus, for females with children, the association between AGFC and POLYEDU remains highly significant (P = 2.9 × 10^−118) after adjusting for NC, whereas the association between NC and POLYEDU is not significant (P = 0.17) after adjusting for AGFC. This led us to examine the effect of POLYEDU on NC[x], the number of children a proband had at or after age x, as a function of x. The results are presented in Fig. 1. At x = 14, the estimated effect on NC[x] per SU of POLYEDU, denoted by eff[x], is −0.084 for females and −0.054 for males. These correspond to results in Table 1 because none of the probands here had children before 14 years of age. As x increases, the estimated effect becomes less negative and is essentially zero at 22 for females and 23 for males. In other words, if children born to mothers at 21 years of age or younger (18% of all children counted here, Fig. S3) and children born to males at 22 or younger (13% of all children counted here) are ignored, there is no correlation between NC and POLYEDU. As x increases further, eff[x] becomes positive and continues to increase until x = 30 for females and starts to drop slowly to zero after that. 
Note that the difference eff[x] − eff[x + 1] corresponds to the estimated effect of POLYEDU on children born to the proband at precisely age x. Thus, for age x > 30, females with higher POLYEDU tend to have more children than those with lower POLYEDU, whereas the reverse is true for x < 30. Having more children after 30 (P < 1 × 10^−15) compensates for having fewer children between 22 and 30 years of age but does not compensate for the reduced number of children at age 21 years and younger. Similar results apply to the males with the age boundaries shifting 1 to 2 years upward. The negative effect of POLYEDU on NC is less for males than for females, and the difference is mainly accounted for by children born to them at 19 years or younger. The analyses performed using POLYEDU maximize statistical power, but the effects on fertility traits can also be seen with individual variants. Results for 120 SNPs that are genome-wide significant (P < 5 × 10^−8) in the meta-analysis for educational attainment excluding Icelandic data (Materials and Methods) are given in Table S1 and Figs. S4 and S5. For example, 35 of the 120 SNPs have associations with AGFC of females that are in the same direction and nominally significant (one-sided P < 0.05). The minor allele of one of these SNPs, rs192818565, is associated with reduced education. It is known to tag the H2 haplotype of a common inversion on chromosome 17 that was shown to exhibit characteristics consistent with having been positively selected (23). It has subsequently been shown that H2 is also associated with reduced intracranial volume (24, 25) and neuroticism (26). Combining our male and female data, the minor allele of rs192818565 is significantly associated with more children (P = 5.2 × 10^−3) and having children earlier (P = 2.2 × 10^−3). 
This is thus a striking case where a variant associated with a phenotype typically regarded as unfavorable could nonetheless be also associated with increased “fitness” in the evolutionary sense.
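The relation between eff[x] and the per-age effect noted above (eff[x] − eff[x + 1]) can be illustrated with a minimal Python sketch. The function name `per_age_effects` and the example values for ages 15 and 16 are hypothetical and are not read from Fig. 1; only eff[14] = −0.084 is the female estimate quoted in the text.

```python
def per_age_effects(eff):
    """Given eff[x] (effect of POLYEDU on children born at or after age x)
    for consecutive ages, return the implied effect on children born at
    exactly age x, computed as eff[x] - eff[x + 1]."""
    return {x: eff[x] - eff[x + 1] for x in sorted(eff) if x + 1 in eff}


# Illustrative input: eff[14] is the female estimate from the text;
# the values at 15 and 16 are made up for demonstration only.
eff_example = {14: -0.084, 15: -0.083, 16: -0.080}
diffs = per_age_effects(eff_example)

for age, effect in diffs.items():
    print(f"age {age}: effect on children born at this age = {effect:+.3f}")
```

Because the differences telescope, summing the per-age effects from age a up to age b − 1 recovers eff[a] − eff[b].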
Table 1.
Estimated effects of POLYEDU on fertility traits
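Trait | Females: n | Females: effect | Females: P | Males: n | Males: effect | Males: P
NC | 58,560 | −0.084 | 1.0 × 10^−43 | 50,560 | −0.054 | 2.2 × 10^−15
AGFC | 55,208 | 0.59 | 5.3 × 10^−155 | 45,669 | 0.44 | 6.2 × 10^−57
AACB | 55,208 | 0.46 | 1.0 × 10^−117 | 45,669 | 0.37 | 6.5 × 10^−50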
POLYEDU is in standard units (SU). NC denotes number of children, AGFC denotes age at first child, and AACB denotes average age at child birth.
Fig. 1.
Effect of POLYEDU on number of children with lower bound for age. Blue, males; red, females; error bars indicate plus/minus 1 SE. Estimated effect calculated by only counting children born to the proband at or after a certain age (the x axis).
Fig. S1.
Number of living Icelanders by year.
Fig. S2.
Total number of Icelanders and number in our fertility study by birth years.
Fig. S3.
Distribution of age of child birth. For our fertility study, this shows the percentage of children born to the parent at a specific age of the (A) father and (B) mother.
Table S1.
Associations between 120 genome-wide significant markers and three reproductive traits
Zmetaedu = z-score from the educational attainment meta-analysis. The other z-scores correspond to associations between each of the variants and the three reproductive traits. Chr, chromosome; Pos, position.
Fig. S4.
Associations between 120 genome-wide significant SNPs and three reproductive traits for females. x axis: Zmetaedu = z-score from the educational attainment meta-analysis. y axes: z-scores of associations between each of the variants and the three reproductive traits. ±1.645 correspond to one-sided P = 0.05.
Fig. S5.
Associations between 120 genome-wide significant SNPs and three reproductive traits for males. Labels as in Fig. S4.
Among the genotyped individuals with yob between 1910 and 1975, information about educational attainment is available for 25,794 females and 19,903 males. For these individuals, the effects of POLYEDU and educational attainment (EDU) itself on the reproductive traits were estimated individually, through separate regressions, and jointly, through regressions including both as predictors (Table 2). We coded EDU as in a recent meta-analysis (3). Individuals fall into four categories: 10, 13, 15, and 20 years (mean = 14.0 and SD = 3.4 for males and mean = 13.4 and SD = 3.7 for females). The first category corresponds to the mandatory minimum education in Iceland and the last corresponds to a college degree. For females, when analyzed separately, each SU increase of POLYEDU decreases expected NC by 0.097 (P = 1.7 × 10^−23), whereas each year increase in EDU corresponds to a reduction of 0.045 (P = 5.0 × 10^−56). When analyzed jointly, the estimated effect of POLYEDU on NC adjusted for EDU reduces to −0.071, a shrinkage that is meaningful but not drastic, and remains highly significant (P = 7.2 × 10^−13). Similar results were observed for AGFC and AACB. Clearly, EDU here is not a complete measure of educational attainment (e.g., it does not include information on postcollege education). With a more comprehensive measure of educational attainment, the estimated effects for POLYEDU upon adjustment might shrink further, but the changes are unlikely to be drastic. For example, limiting to females with 10 years of education (n = 11,055), the estimated effect of POLYEDU on NC is −0.079 (P = 5.8 × 10^−6) (Table S2). These results indicate that POLYEDU has a direct effect on reproduction that is independent of the amount of education that is actually attained. 
Crucially, these results indicate that the magnitude of selection acting on the underlying genetic component of educational attainment has to be estimated directly using genotype data and could be severely underestimated if one attempts to deduce it based solely on the observed negative correlation between educational attainment and fertility. For males, the results tend to be similar to those of the females, only weaker. There is one striking exception. High EDU, similar to having a high POLYEDU, delays reproduction. However, high EDU, unlike high POLYEDU, does not lead to having fewer children for males (27). Indeed, in the joint analysis, the estimated effect of POLYEDU is 0.061 fewer children (P = 2.5 × 10^−7), whereas the estimated effect per year of EDU is 0.011 children more. This again highlights that the effect of POLYEDU on reproduction is not simply manifested through educational attainment.
Table 2.
Estimated effects of POLYEDU and EDU on fertility traits
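Sex | Trait | n | Individual: POLYEDU effect | P | Individual: EDU effect | P | Joint: POLYEDU effect | P | Joint: EDU effect | P
Female | NC | 25,794 | −0.097 | 1.7 × 10^−23 | −0.045 | 5.0 × 10^−56 | −0.071 | 7.2 × 10^−13 | −0.041 | 2.2 × 10^−45
Female | AGFC | 24,191 | 0.59 | 1.3 × 10^−70 | 0.35 | 1.6 × 10^−278 | 0.39 | 1.2 × 10^−31 | 0.33 | 1.3 × 10^−239
Female | AACB | 24,191 | 0.42 | 6.4 × 10^−76 | 0.28 | 1.3 × 10^−222 | 0.26 | 4.6 × 10^−18 | 0.26 | 7.5 × 10^−195
Male | NC | 19,903 | −0.053 | 3.5 × 10^−6 | 0.0063 | 0.07 | −0.061 | 2.5 × 10^−7 | 0.011 | 2.6 × 10^−3
Male | AGFC | 17,996 | 0.43 | 3.0 × 10^−22 | 0.18 | 1.3 × 10^−43 | 0.31 | 3.2 × 10^−12 | 0.16 | 4.2 × 10^−33
Male | AACB | 17,996 | 0.36 | 1.4 × 10^−19 | 0.17 | 8.1 × 10^−47 | 0.25 | 5.2 × 10^−10 | 0.16 | 1.4 × 10^−36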
Individual analyses refer to results when POLYEDU and EDU are associated with the traits in separate regressions. Joint analyses are when POLYEDU and EDU are associated jointly with each trait in one regression. POLYEDU is in standard units (SU). Units for EDU are number of years. EDU is coded into four categories: 10, 13, 15, and 20 years. Distributions of EDU for males and females separately, and information on how they change over time, are given in Fig. S6. To interpret the effect, note that the difference between the highest category of EDU (20 years) and the lowest (10 years) is 10 years. NC denotes number of children, AGFC denotes age at first child, and AACB denotes average age at child birth.
Fig. S6.
Distributions of educational attainment for males and females. The first panel includes all samples studied. The second and third panels show, for males and females, respectively, how distributions of educational attainment change over time.
Table S2.
Associations between POLYEDU and three reproductive traits stratified by four EDU categories
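Trait | Females: n | Females: effect | Females: P | Males: n | Males: effect | Males: P
EDU = 10 years
NC | 11,055 | −0.079 | 5.8 × 10^−6 | 4,894 | −0.096 | 5.9 × 10^−4
AGFC | 10,388 | 0.353 | 7.0 × 10^−14 | 4,086 | 0.379 | 1.5 × 10^−4
AACB | 10,388 | 0.229 | 1.2 × 10^−7 | 4,086 | 0.280 | 2.5 × 10^−3
EDU = 13 years
NC | 5,599 | −0.066 | 9.5 × 10^−4 | 7,633 | −0.069 | 1.3 × 10^−4
AGFC | 5,317 | 0.292 | 1.9 × 10^−5 | 7,135 | 0.362 | 5.5 × 10^−8
AACB | 5,317 | 0.182 | 3.3 × 10^−3 | 7,135 | 0.253 | 4.0 × 10^−5
EDU = 15 years
NC | 4,181 | −0.076 | 3.1 × 10^−4 | 3,475 | −0.020 | 0.47
AGFC | 3,896 | 0.358 | 2.8 × 10^−5 | 3,158 | 0.324 | 2.6 × 10^−3
AACB | 3,896 | 0.250 | 1.0 × 10^−3 | 3,158 | 0.345 | 3.9 × 10^−4
EDU = 20 years
NC | 4,959 | −0.052 | 2.7 × 10^−3 | 3,901 | −0.021 | 0.35
AGFC | 4,590 | 0.592 | 6.3 × 10^−12 | 3,617 | 0.042 | 0.68
AACB | 4,590 | 0.398 | 6.8 × 10^−8 | 3,617 | 0.069 | 0.44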
For 129,808 genotyped individuals born between 1910 and 1990, POLYEDU shows a notable and highly significant decline with yob (−0.0182 SU per decade, P = 5.8 × 10^−35). Average polygenic scores calculated for 10-year bins are displayed in Fig. 2. The relationship between POLYEDU and yob exhibits nonlinear behavior (i.e., the downward slope seems to be steeper in the earlier years). When a quadratic fit was performed (blue line), the quadratic term of yob is significant (P = 1.7 × 10^−3). A closer examination suggests that the nonlinear behavior mainly reflects a survival effect rather than a birth cohort effect. The samples studied here were collected between 1998 and 2014, with a majority (68%) ascertained before 2006. For 85,520 of the latter, survival data at 2016 are available. The death rate overall is 19.4% (16,610/85,520) and is 54.5% (13,954/25,610) for those with yob before 1940, compared with 4.4% (2,656/59,910) for those with yob ≥ 1940. After adjustment for sex, yob, and age at ascertainment, each SU of POLYEDU is estimated to increase the odds of survival by a factor of 1.083 (P = 2.5 × 10^−11). The positive effect of POLYEDU on survival is not surprising because it is significantly associated with many other behavioral and health-related traits in Iceland. For example, POLYEDU is positively correlated with high-density lipoprotein levels, and negatively correlated with triglyceride levels, body mass index, glucose fasting levels, and amount of smoking (P < 1 × 10^−20 for each of these five quantitative traits; Table S3). Because POLYEDU has a substantial impact on lifespan, when the samples were ascertained, there would be a positive ascertainment bias, particularly with those born before 1940, for those with high polygenic scores due to the greater likelihood to be alive at the time of ascertainment than those with low polygenic scores. 
This survival effect has a real impact on the difference in POLYEDU between the young and the old in the population at any given time. However, for the purpose of estimating the change of the average polygenic score over time with respect to birth cohorts, this can be a source of bias. This bias is expected to be small for individuals with yob ≥1940. Using the latter, the estimated rate of decline of the average polygenic score is −0.0122 SU per decade (P = 2.4 × 10^−7, SE = 0.0024) (red line in Fig. 2). For comparison, we computed two other polygenic scores based on meta-analyses for height and schizophrenia. The polygenic score for height is not significantly associated with yob (P ≥ 0.5). The polygenic score for schizophrenia is estimated to change at a rate of −0.0078 SU per decade (P = 1.1 × 10^−3, SE = 0.0024) for individuals with yob ≥1940.
Fig. 2.
Average educational attainment polygenic score and year of birth (yob). Results for 10-year bins are presented. Error bars indicate plus/minus 1 SE. The blue line is a quadratic fit for the full yob range indicated. The red line is a linear fit applied to individuals with yob ≥1940.
Table S3.
Association between POLYEDU and five quantitative traits
Quantitative trait | Sample size | Effect | P value
High-density lipoprotein | 88,661 | 0.057 | 6.3 × 10^−40
Triglycerides | 85,583 | −0.056 | 4.8 × 10^−43
Body mass index | 76,062 | −0.051 | 2.9 × 10^−35
Glucose fasting levels | 67,194 | −0.051 | 1.9 × 10^−31
Cigarettes per day (smokers) | 35,757 | −0.054 | 1.6 × 10^−22
General descriptions about these quantitative traits and their ascertainment can be found in previous publications (40–43). The sample sizes here correspond to current data and hence may not match the publications perfectly.
An alternative to estimating the rate of decline of POLYEDU is to perform calculations based on the information about reproductive history. If generations were discrete, then the contribution from each parent type (mother/father) to the change of the average polygenic score for the next generation is (eff/2)/(ANC), where eff is the effect of POLYEDU on number of children and ANC is the average number of children. For the females in Table 1, eff = −0.084 and ANC is 2.84, and the estimated contribution to the change per generation is (−0.084/2)/2.84 = −0.015 SU. Given that the average AACB for these females is 27.5 years, this translates to −0.015/27.5 = −0.00054 SU per year, or −0.0054 SU per decade. For the males in Table 1, eff = −0.054, ANC = 2.73, and average AACB = 30.0, translating to an effect of −0.0033 SU per decade. Combining the contributions from females and males gives a change of −0.0087 per decade. This estimate, however, does not take into account that individuals with high POLYEDU tend to have their children later (Table 1), leading to a slower contribution to the generations that follow. After applying equations derived for incorporating the generation time effect (28, 29) (Materials and Methods), the female and male contribution is estimated, respectively, to be −0.0065 and −0.0039 SU per decade, with the sum equal to −0.0104 SU per decade. This estimate is smaller in magnitude than the −0.0122 SU per decade estimate based on the observed decline. However, because the difference is within 1 SE, the two estimates can be considered as consistent.
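The discrete-generation arithmetic above can be reproduced with a short Python sketch. The function name `decline_per_decade` is ours, the inputs are the values quoted from Table 1 and the text, and the sketch deliberately omits the generation-time correction (28, 29), so it reproduces the uncorrected −0.0087 figure rather than −0.0104.

```python
def decline_per_decade(eff, anc, aacb):
    """Uncorrected discrete-generation approximation.

    eff  : effect of 1 SU of POLYEDU on number of children
    anc  : average number of children
    aacb : average age at child birth, in years (used as generation length)
    Returns the contribution of one parent type in SU per decade.
    """
    per_generation = (eff / 2) / anc   # change contributed per generation
    per_year = per_generation / aacb   # spread over one generation
    return per_year * 10               # convert to SU per decade


female = decline_per_decade(eff=-0.084, anc=2.84, aacb=27.5)
male = decline_per_decade(eff=-0.054, anc=2.73, aacb=30.0)
total = female + male

print(round(female, 4), round(male, 4), round(total, 4))
# -0.0054 -0.0033 -0.0087
```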
Although there are challenges to getting a precise estimate of the rate of change of the average POLYEDU value due to nonsampling errors that could be difficult to gauge, with the analyses taken together we consider −0.010 SU per decade to be a reasonable estimate for the period from 1910 to 1990 that is more likely to underestimate than overestimate the true decline. Most importantly, POLYEDU is just a fraction of the full genetic component of educational attainment, which we denote by POLYFULL. It is the rate of change of POLYFULL that is of ultimate interest. Under an assumption that the part of POLYFULL that is not captured by POLYEDU behaves in a similar fashion in its impact on reproduction, the rate of change is proportional to the square root of the variance explained (SI Text). Thus, if POLYFULL is assumed to account for 30% of the variance of EDU, then its estimated rate of change, by extrapolation, is −0.010 × (30/3.74)^(1/2) = −0.028 SUs per decade. To test the validity of this method of extrapolation we computed a separate polygenic score for educational attainment, denoted by POLY-U.K.B, which was based on the same GWAS results used to construct POLYEDU, except that the contribution from 111,349 UK Biobank samples was removed (Materials and Methods). When we applied POLY-U.K.B to the Icelandic data, it explained 2.52% of the variance of EDU, and the rate of decline estimated based on its effects on reproduction is −0.0085 SU per decade (Materials and Methods). Hence, with the polygenic score strengthening from POLY-U.K.B to POLYEDU, the estimated rate of decline increased by a factor of (0.0104/0.0085) = 1.22, nearly identical to (3.74/2.52)^(1/2) = 1.22, the square root of the variance explained ratio.
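The square-root extrapolation and its consistency check can be verified numerically. The following Python sketch uses only the figures quoted above; the function name `extrapolate_rate` is ours, and the 30% variance share for POLYFULL is the assumption stated in the text, not an estimate.

```python
import math


def extrapolate_rate(rate, var_explained, var_target):
    """Scale a rate of change by the square root of the ratio of variance
    explained, assuming the uncaptured genetic component behaves like the
    captured one in its impact on reproduction."""
    return rate * math.sqrt(var_target / var_explained)


# POLYEDU (3.74% of variance) extrapolated to POLYFULL (assumed 30%).
polyfull_rate = extrapolate_rate(-0.010, var_explained=3.74, var_target=30.0)

# Consistency check: the weaker score (2.52%) against POLYEDU (3.74%).
observed_ratio = 0.0104 / 0.0085
predicted_ratio = math.sqrt(3.74 / 2.52)

print(round(polyfull_rate, 3))                              # -0.028
print(round(observed_ratio, 2), round(predicted_ratio, 2))  # 1.22 1.22
```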
Here we explore the implications of the observed trends on the distributions of cognitive traits in the population. Based on a sample of 1,577 genotyped Icelanders (653 males and 924 females; yob, mean = 1968 and SD = 13 years) with intelligence quotient (IQ) measurements (mean = 102 and SD = 15), each SU of POLYEDU is estimated to increase IQ by 3.8 points (P < 10^−20). Given that POLYEDU is estimated to decline at a rate of 0.01 SU per decade, this translates to a decline of 0.038 IQ points per decade. However, under the assumptions that POLYFULL accounts for 30% of the variance of EDU, and the part of POLYFULL that is not captured by POLYEDU behaves in a similar fashion in its impact on both reproduction and IQ, by extrapolation, the decline of POLYFULL would lead to a decline of 0.038 × (30/3.74) = 0.30 IQ points per decade. This would be a very substantial effect if the trend persists for centuries. By contrast, a meta-analysis estimated that IQ scores have increased by 13.8 points between 1932 and 1978, a rate of 3.0 points per decade (30), a phenomenon referred to as the Flynn effect. This rate is 10 times the estimated effect due to the decline of the genetic component, and, more importantly, in the opposite direction. Many commentators [including Flynn himself (31)] consider the Flynn effect to be due to changes in the socioeconomic and technological environment faced by successive generations of humans. Unfortunately, we are unable to assess the Flynn effect in our IQ data, because they were measured within a narrow time interval. Assuming that a similar magnitude of the Flynn effect is found in the Icelandic population, then it is clear that such environmentally induced increases of IQ scores more than compensate for, and indeed mask, any potential decline in the genetic propensity for IQ.
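The IQ arithmetic in this paragraph can be laid out step by step. The sketch below, in Python, uses only numbers quoted in the text; the variable names are ours. The factor 30/3.74 appears without a square root because, under the stated assumptions, both the rate of decline and the per-SU effect on IQ each scale by (30/3.74)^(1/2).

```python
SU_DECLINE_PER_DECADE = 0.01   # estimated decline of POLYEDU, SU per decade
IQ_POINTS_PER_SU = 3.8         # estimated IQ increase per SU of POLYEDU

# Decline attributable to the measured score alone.
polyedu_iq_decline = SU_DECLINE_PER_DECADE * IQ_POINTS_PER_SU

# Extrapolation to POLYFULL, assuming it explains 30% of EDU variance.
polyfull_iq_decline = polyedu_iq_decline * (30 / 3.74)

# Flynn effect: 13.8 points gained between 1932 and 1978.
flynn_rate = 13.8 / ((1978 - 1932) / 10)

ratio = flynn_rate / polyfull_iq_decline

print(round(polyedu_iq_decline, 3))   # 0.038
print(round(polyfull_iq_decline, 2))  # 0.3
print(round(flynn_rate, 1))           # 3.0
print(round(ratio))                   # 10
```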


From the results presented here it is clear that there has been a slow but steady decline in the frequency of certain variants in the Icelandic gene pool that are associated with educational attainment. It is also clear that education attained does not explain all of the effect. Hence, it seems that the effect is caused by a certain capacity to acquire education that is not always realized. We postulate that, in addition to being correlated with cognitive ability (32, 33), POLYEDU is capturing a portion of the propensity to long-term planning and delayed gratification. To address the question of whether and how these results could be extended to other populations and other time periods it should first be emphasized that the negative selection observed here is likely an example of gene–environment interaction, that is, both the direction of the effect and its magnitude could and would change given a different socioeconomic environment (5, 34, 35). It is likely that in any population where educational attainment is negatively correlated with fertility the underlying genetic propensity would be in decline, but the actual magnitude and characteristics of the decline could vary substantially. Based purely on epidemiological/demographical data, there were concerns about this sort of decline in Great Britain more than eight decades ago (10). However, the possibility that such a phenomenon could be temporary or transitional was also raised (10, 29). Indeed, there might be a cyclical element to this phenomenon, because it is only reasonable to assume that alleles associated with greater educational attainment must have been under positive selection at some time during the evolutionary history of Homo sapiens. The main message here is that the human race is genetically far from being stagnant with respect to one of its most important traits. It is remarkable to report changes in POLYEDU that are measurable across the several decades covered by this study. 
In evolutionary time, this is a blink of an eye. However, if this trend persists over many centuries, the impact could be profound.

Materials and Methods

Genealogical Database.

For nearly 20 years a genealogical database of Iceland has been used for genetics studies performed by deCODE genetics (15–17). This database is constantly updated. Currently, the deCODE genetics genealogical database contains essentially all of ∼317,000 living Icelanders (some recent immigrants may not be included in this tabulation) and the vast majority of their ancestors going back to about 1650, along with a smaller portion of ancestors before that time. In total, just over 840,000 individuals are presently recorded in the genealogical database, with the earliest recorded yob 740 AD. The database contains information about the yob and sex of each individual, and when available the year of death, the identities of the father and mother, and geographical locations, such as places of birth, residence, and death. The database was constructed from a number of different sources, the most important of which were 14 national censuses spanning the period from 1703 to 1930, parish records from 1780, and the national registry from 1994. Additional key sources include annals, genealogical publications, biographical lists of members of professional associations, and other official records. The database is particularly complete for the probands used in this study, who were all born after 1910. For the vast majority of these individuals, both parents and grandparents are recorded, as are all children who survived the first weeks of life.

Sample Collection.

All samples and questionnaire data were collected through studies approved by the National Bioethics Committee and the Icelandic Data Protection Authority. All participants signed informed consent before blood samples were drawn and all data were analyzed under pseudonyms assigned by a third-party encryption system overseen by the Icelandic Data Protection Authority (36).

Meta-Analysis and Polygenic Scores.

In a recent meta-analysis on educational attainment (3) the initial total sample size was 293,724, which included 76,155 samples from 23andMe, and 49,970 Icelandic samples [46,758 from deCODE and 3,212 from Age, Gene/Environment Susceptibility (AGES Reykjavik) Study]. Excluding the Icelandic samples and 23andMe, the remaining sample size was 167,599. When the manuscript was revised for final publication, an additional 111,349 UK Biobank samples were added as replication (full genome association results also available). It is important to note that the meta-analysis produces trait association results for each marker separately (i.e., joint analyses are not performed). When deriving the weights for computing POLYEDU (see below for the method used), for the current study, GWAS results from 23andMe and Iceland were excluded. The 23andMe results were excluded because their policy forbids the release of full GWAS results. The Icelandic results were excluded to avoid confounding/bias and/or overfitting. Thus, the weights for computing POLYEDU were derived based on results from 167,599 + 111,349 = 278,948 samples. Similarly, the weights for POLY-U.K.B were based on 167,599 samples. For the 120 genomewide significant markers, the estimated effects on educational attainment (used in Figs. S4 and S5) did incorporate the 23andMe data and were based on 278,948 + 76,155 = 355,103 samples.
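The sample-size bookkeeping above can be checked with a few lines of Python. All counts are taken from the paragraph; the variable names are ours.

```python
TOTAL_INITIAL = 293_724        # initial meta-analysis sample size
N_23ANDME = 76_155
N_DECODE = 46_758
N_AGES = 3_212
N_UK_BIOBANK = 111_349         # added as replication at revision

n_iceland = N_DECODE + N_AGES                         # Icelandic samples
n_base = TOTAL_INITIAL - N_23ANDME - n_iceland        # weights for POLY-U.K.B
n_polyedu = n_base + N_UK_BIOBANK                     # weights for POLYEDU
n_top_markers = n_polyedu + N_23ANDME                 # 120 significant markers

print(n_iceland, n_base, n_polyedu, n_top_markers)
# 49970 167599 278948 355103
```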

Markers and Methods Used to Compute the Polygenic Score.

The basic method used to process the genotype data for Icelanders, including imputations based on full-genome sequencing results, was described in ref. 20. A framework set of ∼620,000 high-quality SNPs covering the whole genome was used to compute POLYEDU and POLY-U.K.B. Note that a polygenic score is constructed as a linear combination of the genotypes of the markers. In determining the weights used for the linear combination the goal is to maximize the correlation between the resulting score and the trait. This is not a trivial problem in part because, as noted above, the meta-analysis only gives association results for each marker separately, and the markers are in general correlated (i.e., in linkage disequilibrium). We adjusted for linkage disequilibrium using LDpred (19), a recently proposed method. The linkage disequilibrium between markers was estimated using the Icelandic samples. We have explored different ways of constructing the polygenic score (e.g., using a larger set of markers and different ways for adjusting linkage disequilibrium). We found the method used to give close to the best-performing score we could achieve. Most importantly, the main results in this paper are robust to the specific method (as long as it is a reasonable one) used to construct the polygenic score.
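As a minimal illustration of the statement that a polygenic score is a linear combination of marker genotypes, the following Python sketch computes a weighted sum of allele counts and rescales it to an SD of 1. It is not the pipeline used in the study: the weights are taken as given (the LDpred adjustment is not reproduced), the toy genotypes and weights are invented, and centering the score at zero is our assumption.

```python
from statistics import mean, pstdev


def polygenic_scores(genotypes, weights):
    """Weighted sum of allele counts per individual, scaled to SD 1.

    genotypes : list of rows, one per individual, of allele counts (0, 1, 2)
    weights   : per-marker weights (assumed already adjusted for LD)
    """
    if any(len(row) != len(weights) for row in genotypes):
        raise ValueError("each genotype row must have one entry per weight")
    raw = [sum(g * w for g, w in zip(row, weights)) for row in genotypes]
    mu, sd = mean(raw), pstdev(raw)
    if sd == 0:
        raise ValueError("scores have zero variance; cannot standardize")
    # Centering is a convenience assumed here; the text only specifies SD 1.
    return [(r - mu) / sd for r in raw]


# Toy data for illustration only: 4 individuals, 3 markers.
toy_genotypes = [[0, 1, 2], [1, 1, 0], [2, 0, 1], [0, 2, 2]]
toy_weights = [0.02, -0.01, 0.03]
scores = polygenic_scores(toy_genotypes, toy_weights)
print([round(s, 2) for s in scores])
```

In practice the sum would run over the ∼620,000 framework SNPs, and the standardization would be done over the full genotyped sample.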

Educational Attainment.

As noted above, the deCODE data on educational attainment were part of the published meta-analysis (3). The original Icelandic data were collected through various questionnaires including questions on educational attainment of adults (we used responses from adults 30 years or older, assuming maximum educational attainment had been achieved by this age). Responses were then mapped to the International Standard Classification of Education (ISCED) 1997 classification (UNESCO), a format that was also used for the meta-analysis, as described in detail in Okbay et al. (3) and briefly reviewed below. The ISCED 1997 classification includes seven categories of educational attainment that are internationally comparable. The categories are translated into US years-of-schooling equivalents, which have a quantitative interpretation as follows:
Preprimary education: 1 year
Primary education or first stage of basic education: 7 years
Lower secondary or second stage of basic education: 10 years
(Upper) secondary education: 13 years
Postsecondary nontertiary education: 15 years
First stage of tertiary education (not leading directly to an advanced research qualification): 19 years
Second stage of tertiary education (leading to an advanced research qualification, e.g., a Ph.D.): 22 years.
In our data, questionnaire responses could be categorized according to the major educational levels in Iceland and were mapped to ISCED 1997 levels according to the mapping schema for Iceland maintained by UNESCO and, accordingly, to comparable years of educational attainment in the United States, as shown below:
Compulsory basic education (10 grades): 10 years
(Upper) secondary education or vocational programs: 13 years
Postsecondary nontertiary education: 15 years
Advanced education representing A-levels and/or any university degree: 20 years.
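The mapping above can be sketched as a simple lookup. The category labels and the function name are hypothetical (they are not the questionnaire's wording); only the year values and the age-30 cutoff are taken from the text.

```python
# Years-of-schooling equivalents for the four Icelandic categories listed above.
# The dictionary keys are hypothetical labels chosen for this sketch.
ICELAND_EDU_YEARS = {
    "compulsory_basic": 10,
    "upper_secondary_or_vocational": 13,
    "postsecondary_nontertiary": 15,
    "advanced_or_university": 20,
}

def years_of_schooling(category, age, min_age=30):
    """Return years of schooling, or None if the respondent is under min_age."""
    if age < min_age:
        return None  # attainment may not yet be final, so the response is not used
    return ICELAND_EDU_YEARS[category]
```

For example, `years_of_schooling("upper_secondary_or_vocational", 42)` returns 13, whereas a response from a 25-year-old is excluded.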

IQ Data.

IQ measurements from population controls were collected in years 2009–2016. Intelligence was measured using the Icelandic version of the Wechsler Abbreviated Scale of Intelligence (WASIIS) (37, 38).

Genomic Control.

Results in this paper are mainly based on regression analyses. The standard output of regressions assumes that the data points are statistically independent. However, because the individuals are genetically related and the trait values of individuals who are genetically closely related tend to be correlated, taking the standard output at face value would tend to produce anticonservative results (i.e., the test statistics tend to have a variance, under the null hypothesis of no effect, that is higher than assumed). Adjusting for 20 principal components reduces, but does not eliminate, this effect. Genomic control is a method that uses the observed results of a large number of SNPs in the genome (1.1 million are used here), most of them expected to have no effect, to evaluate and adjust for the overdispersion of the test statistics. The first paper to describe such an approach is by Devlin and Roeder (39), but the method described there could be somewhat conservative, particularly when many variants in the genome do actually contribute to the trait. The method used here, based on LD score regression (22), is more recent and adjusts for the conservativeness of the original method. Because genomic control is a form of variance adjustment, theoretically it should apply to a polygenic score in the same way as a single marker. This has been confirmed by simulations. For example, applying this method, the t-statistic for the correlation between POLYEDU and AGFC is divided by 1.13 and 1.14 for males and females, respectively. Genomic control was also applied to the correlation between POLYEDU and yob, where the null hypothesis corresponds to a scenario that changes of marker frequencies over time, if any, are a result of random genetic drift. Here, however, no adjustment was found to be necessary; for the analyses restricted to individuals with yob ≥1940, there is actually some indication that the unadjusted results could be slightly conservative. 
This is probably because, whereas values for traits such as EDU tend to be positively correlated between close relatives, that is not necessarily the case for yob. We also note that P values given are two-sided unless explicitly stated otherwise.
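To make the variance adjustment concrete, the sketch below divides a test statistic by an inflation factor and recomputes a two-sided P value. It assumes a standard normal approximation for the t-statistic (reasonable only for large samples), and the unadjusted statistic is a made-up value; the factor 1.13 is the one quoted above for males. Estimating the factor itself (here via LD score regression) is not shown.

```python
import math

def adjust_t(t_stat, inflation):
    """Divide a t-statistic by an estimated inflation factor."""
    return t_stat / inflation

def two_sided_p(z):
    """Two-sided P value under a standard normal approximation."""
    return math.erfc(abs(z) / math.sqrt(2))

t_raw = 5.0                    # hypothetical unadjusted t-statistic
t_adj = adjust_t(t_raw, 1.13)  # factor reported for males (POLYEDU vs. AGFC)
p_raw, p_adj = two_sided_p(t_raw), two_sided_p(t_adj)
```

Because the statistic shrinks, the adjusted P value is always at least as large as the unadjusted one.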

Determining the Rate of Change of the Polygenic Score As a Result of Its Impact on Fertility Traits.

To derive the (approximate) relationship between the effects of a polygenic score X on the fertility traits and the change of the average polygenic score over time, we assume that the effects are linear and small per generation. Specifically, with X standardized to have mean 0 and variance 1, we assume
$$E(\mathrm{NC} \mid X) = a + bX \quad \text{and} \quad E(\mathrm{AACB} \mid X) = c + dX.$$
The main mathematical result we are going to show is that, under these assumptions, to the first order, the rate of change of the mean of X per year is
$$\beta = \frac{b}{ac} - \frac{d \log(a/2)}{c^{2}}. \tag{1}$$
(We note that Eq. 1 might have been explicitly derived in some other publications, although we are not currently aware of it.) In situations where the males and females behave differently, that is, have different values for a, b, c, and d, we have $\beta_M$ for males and $\beta_F$ for females, so that $(\beta_M/2) + (\beta_F/2)$ would be the estimate of the rate of change. Note that the first term in Eq. 1, $b/(ac)$, is capturing the contribution of the effect of X on NC to the rate of change, whereas the second term, $-d\log(a/2)/c^{2}$, is capturing the contribution of the effect of X on AACB.
Before showing how to derive the general form (Eq. 1), we think it is helpful to see how the result can be shown for the special case with d = 0. Here, to the first order, we can assume that mating is performed in discrete generations with generation time c. Let X be the (random) polygenic score for a female in generation t, and scaled to have mean 0 and variance 1. Let Y denote, for a random person in generation t + 1, what is inherited from the mother. It follows that
$$E(Y) = \frac{1}{2} \times \frac{E(wX)}{E(w)},$$
where w = a + bX. The factor (1/2) results from the fact that only one-half of the genetic material is passed on to the offspring. E(wX)/E(w) corresponds to a weighted average of X with weights proportional to w. [The absolute weight is wt = w/E(w) with expectation 1.] It follows from E(X) = 0 and var(X) = 1 that E(wX) = b and E(w) = a. Thus, E(Y) = (1/2) × (b/a). Taking into account that generation time is c, the contribution of the females to the change of the mean polygenic score per year is (1/2) × (b/ac). The same calculations apply to the fathers.
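The special case can be checked numerically. The sketch below estimates the weighted average E(wX)/E(w) by simulation and compares it with b/a. It assumes X is normally distributed, which the argument above does not require (only mean 0 and variance 1 are used), and it plugs in the female POLYEDU values of a, b, and c reported in this section purely for illustration.

```python
import random

def weighted_mean_shift(a, b, n=200_000, seed=1):
    """Monte Carlo estimate of E(wX)/E(w) with w = a + bX and X ~ N(0, 1)."""
    rng = random.Random(seed)
    num = den = 0.0
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        w = a + b * x  # expected number of children for a parent with score x
        num += w * x
        den += w
    return num / den

a, b, c = 2.84, -0.084, 27.5       # female POLYEDU values reported in the text
shift = weighted_mean_shift(a, b)  # should be close to b / a
per_year = 0.5 * shift / c         # maternal contribution per year when d = 0
```

The estimate carries simulation noise, so it agrees with b/a only approximately; increasing `n` tightens the agreement.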
Deriving the general form (Eq. 1) where the polygenic score also has an effect on generation time (AACB) is more complicated. To do that, we start with equation 6.5 in section 6.3 of ref. 29:
$$r \approx \frac{\log R_0}{T},$$
where r is the intrinsic rate of change, $R_0$ is the net reproductive rate, and T is the mean generation time. Because only one-half of the genetic material is transmitted from a parent to an offspring, we should think of $R_0$ as the number of children divided by two. For females, based on the estimated effects of the polygenic score X on number of children and AACB, and assuming linearity, we have
$$r(X) = \frac{\log[(a + bX)/2]}{c + dX}.$$
The derivative is
$$\frac{dr}{dX} = \frac{b}{(a + bX)(c + dX)} - \frac{d \log[(a + bX)/2]}{(c + dX)^{2}}.$$
Evaluating at X = 0,
$$\left.\frac{dr}{dX}\right|_{X=0} = \frac{b}{ac} - \frac{d \log(a/2)}{c^{2}} = \beta.$$
From equation 6.9 of ref. 29, the relative fitness between two genotypes is
$$\frac{w_1}{w_2} = \exp\left[(r_1 - r_2)\,\bar{T}\right],$$
where $r_1$ and $r_2$ are the two intrinsic rates of increase and $\bar{T}$ is the average generation time, which can be taken as c here. When the relative difference in fitness is small, the relative fitness of X = x and X = 0 is
$$\mathrm{wt}(x) \approx 1 + [r(x) - r(0)]\,c \approx 1 + c\beta x.$$
Notice that wt is already scaled to have expectation one (approximately). Thus, the weighted average of X, with the weight proportional to fitness, is
$$E(\mathrm{wt} \cdot X) \approx E[(1 + c\beta X)X] = c\beta.$$
Because this is the approximate rate of change per generation, the rate of change per year is
$$\frac{c\beta}{c} = \beta,$$
giving us Eq. 1. Here we have shown how to derive Eq. 1 from equations in ref. 29. We note that Eq. 1 can also be derived using equations from ref. 28.
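The differentiation step can be verified numerically. The sketch below compares a central finite difference of r(X) at X = 0 with the closed form of Eq. 1, taking log as the natural logarithm and using the female POLYEDU estimates reported in this section. The function names and the step size `h` are choices made for this illustration.

```python
import math

def intrinsic_rate(x, a, b, c, d):
    """r(x) = log((a + b*x) / 2) / (c + d*x), with log the natural logarithm."""
    return math.log((a + b * x) / 2.0) / (c + d * x)

def beta_closed_form(a, b, c, d):
    """Eq. 1: b/(a*c) - d*log(a/2)/c**2."""
    return b / (a * c) - d * math.log(a / 2.0) / c**2

a, b, c, d = 2.84, -0.084, 27.5, 0.46  # female POLYEDU values from the text
h = 1e-5                               # finite-difference step (arbitrary choice)
numeric = (intrinsic_rate(h, a, b, c, d) - intrinsic_rate(-h, a, b, c, d)) / (2 * h)
closed = beta_closed_form(a, b, c, d)
```

The two values agree to many decimal places, which supports the sign and form of both terms in Eq. 1.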
With POLYEDU, for females a = 2.84, b = −0.084, c = 27.5, and d = 0.46, and for males a = 2.73, b = −0.054, c = 30.0, and d = 0.37. Applying these values to the equation, we get
$$\beta_F \approx -0.00129, \quad \beta_M \approx -0.00079, \quad \frac{\beta_F + \beta_M}{2} \approx -0.0010 \ \text{SU per year}.$$
For POLY-U.K.B, for females b = −0.069 and d = 0.39, and for males b = −0.043 and d = 0.31. Similar calculations estimate the expected change to be −0.00085 SU per year.
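The arithmetic can be reproduced with a few lines of code. The sketch below evaluates Eq. 1 separately for females and males and averages the two, as described in the text. It assumes log is the natural logarithm and that a and c for POLY-U.K.B are the same population values as those given for POLYEDU, because only b and d are reported for that score; the function names are hypothetical.

```python
import math

def beta(a, b, c, d):
    """Rate of change of the mean score per year (Eq. 1)."""
    return b / (a * c) - d * math.log(a / 2.0) / c**2

def sex_averaged_beta(female, male):
    """Average of the female and male rates: beta_F / 2 + beta_M / 2."""
    return 0.5 * beta(**female) + 0.5 * beta(**male)

polyedu = sex_averaged_beta(
    female=dict(a=2.84, b=-0.084, c=27.5, d=0.46),
    male=dict(a=2.73, b=-0.054, c=30.0, d=0.37),
)
# Assumption: a and c are shared with POLYEDU; only b and d differ.
polyukb = sex_averaged_beta(
    female=dict(a=2.84, b=-0.069, c=27.5, d=0.39),
    male=dict(a=2.73, b=-0.043, c=30.0, d=0.31),
)
print(f"POLYEDU: {polyedu:.5f} SU/year, POLY-U.K.B: {polyukb:.5f} SU/year")
```

Under these assumptions the second value rounds to the −0.00085 SU per year quoted above, and the first corresponds to a decline of roughly 0.010 SU per decade.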

SI Text

Here we explore the relationship between the rate of decline of POLYEDU and that of POLYFULL, assuming each is standardized to have mean zero and variance one. Decompose POLYFULL as
$$\mathrm{POLYFULL} = A + B,$$
where A = cor(POLYEDU, POLYFULL) × POLYEDU and B = POLYFULL − A. Notice that A, which is proportional to POLYEDU, and B are uncorrelated because A corresponds to the fitted values when POLYFULL is regressed on POLYEDU, and B corresponds to the residuals. Let cor(A, EDU) = $r_A$, cor(B, EDU) = $r_B$, cor(A, NC) = $\pi_A$, cor(B, NC) = $\pi_B$, cor(A, AACB) = $\omega_A$, and cor(B, AACB) = $\omega_B$. Assume
$$\frac{r_B}{r_A} = \frac{\pi_B}{\pi_A} = \frac{\omega_B}{\omega_A}. \tag{S1}$$
(The conditions for Eq. S1 to hold will be discussed below after we demonstrate its consequences.) The fraction of the variance of EDU explained by POLYEDU and POLYFULL is $r_A^2$ and $(r_A^2 + r_B^2)$, respectively. Similarly, the fraction of the variance of NC explained by POLYEDU and POLYFULL is $\pi_A^2$ and $(\pi_A^2 + \pi_B^2)$, respectively, and the variance of AACB explained by POLYEDU and POLYFULL is $\omega_A^2$ and $(\omega_A^2 + \omega_B^2)$, respectively. If Eq. S1 holds, then
$$\frac{r_A^2 + r_B^2}{r_A^2} = \frac{\pi_A^2 + \pi_B^2}{\pi_A^2} = \frac{\omega_A^2 + \omega_B^2}{\omega_A^2}. \tag{S2}$$
Note that in the expression for the rate of decline β (Eq. 1), a and c are population characteristics and can be treated as constants, whereas b and d are the effects of the polygenic score under study. Because variance explained is proportional to effect squared, or equivalently effect is proportional to the square root of variance explained, Eq. S2 implies
$$\frac{b_{\mathrm{POLYFULL}}}{b_{\mathrm{POLYEDU}}} = \frac{d_{\mathrm{POLYFULL}}}{d_{\mathrm{POLYEDU}}} = \sqrt{\frac{r_A^2 + r_B^2}{r_A^2}}$$
(the subscript indicates the corresponding polygenic score). Because β is a linear function of b and d, this implies
$$\frac{\beta_{\mathrm{POLYFULL}}}{\beta_{\mathrm{POLYEDU}}} = \sqrt{\frac{r_A^2 + r_B^2}{r_A^2}}.$$
With Eq. S1, we are assuming that the relative effect sizes of B (the part of POLYFULL that is not captured by POLYEDU) on EDU and the fertility traits NC/AACB are the same as the relative effect sizes of POLYEDU. Although this assumption might not be satisfied exactly, it is likely to be approximately true. Statistically, POLYEDU is capturing the contributions of many variants over the genome, with noise, and hence things could average out approximately. The comparison we made between POLYEDU and POLY-U.K.B supports this belief.
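The scaling argument above reduces to a single square root. The sketch below computes the ratio of the two rates of decline from the fractions of EDU variance explained by each score, assuming Eq. S1 holds. The input fractions are hypothetical values chosen only to show the arithmetic; they are not estimates from the paper.

```python
import math

def decline_ratio(var_explained_polyedu, var_explained_polyfull):
    """Ratio of decline rates (POLYFULL over POLYEDU), assuming Eq. S1 holds.

    Equals the square root of the ratio of EDU variance explained.
    """
    return math.sqrt(var_explained_polyfull / var_explained_polyedu)

# Hypothetical fractions of EDU variance explained (not values from the paper).
ratio = decline_ratio(0.04, 0.25)  # about 2.5
```

Because the ratio depends on the square root, a full score that explains about six times as much variance declines only about 2.5 times as fast.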


We thank David Cesarini, Philipp Koellinger, and the Social Science Genetic Association Consortium for allowing us early access to genome-wide association study (GWAS) results.



References

1. AR Branigan, KJ McCallum, J Freese, Variation in the heritability of educational attainment: An international meta-analysis (Institute for Policy Research, Northwestern University, Evanston, IL, 2013).
2. CA Rietveld, et al.; LifeLines Cohort Study, GWAS of 126,559 individuals identifies genetic variants associated with educational attainment. Science 340, 1467–1471 (2013).
3. A Okbay, et al.; LifeLines Cohort Study, Genome-wide association study identifies 74 loci associated with educational attainment. Nature 533, 539–542 (2016).
4. JC Caldwell, Mass education as a determinant of the timing of fertility decline. Popul Dev Rev 6, 225–255 (1980).
5. T Castro Martín, Women’s education and fertility: Results from 26 demographic and health surveys. Stud Fam Plann 26, 187–202 (1995).
6. RR Rindfuss, SP Morgan, K Offutt, Education and the changing age pattern of American fertility: 1963-1989. Demography 33, 277–290 (1996).
7. AC D’Addio, MM d’Ercole, Trends and determinants of fertility rates: The role of policies. OECD Social, Employment and Migration Working Paper 27 (Organisation for Economic Co-operation and Development, Paris, 2005).
8. JP Beauchamp, Genetic evidence for natural selection in humans in the contemporary United States. Proc Natl Acad Sci USA 113, 7774–7779 (2016).
9. A Courtiol, FC Tropf, MC Mills, When genes and environment disagree: Making sense of trends in recent human evolution. Proc Natl Acad Sci USA 113, 7693–7695 (2016).
10. RA Fisher, The Genetical Theory of Natural Selection (Oxford Univ Press, New York, 1930).
11. KM Kirk, et al., Natural selection and quantitative genetics of life-history traits in Western women: A twin study. Evolution 55, 423–435 (2001).
12. HP Kohler, JL Rodgers, WB Miller, A Skytthe, K Christensen, Bio-social determinants of fertility. Int J Androl 29, 46–53 (2006).
13. FC Tropf, et al., Human fertility, molecular genetics, and natural selection in modern societies. PLoS One 10, e0126821 (2015).
14. FR Day, et al., Physical and neurobehavioral determinants of reproductive onset and success. Nat Genet 48, 617–623 (2016).
15. R Arngrímsson, et al., A genome-wide scan reveals a maternal susceptibility locus for pre-eclampsia on chromosome 2p13. Hum Mol Genet 8, 1799–1805 (1999).
16. H Gudmundsson, DF Gudbjartsson, M Frigge, JR Gulcher, K Stefánsson, Inheritance of human longevity in Iceland. Eur J Hum Genet 8, 743–749 (2000).
17. A Helgason, S Pálsson, DF Gudbjartsson, T Kristjánsson, K Stefánsson, An association between the kinship and fertility of human couples. Science 319, 813–816 (2008).
18. SM Purcell, et al.; International Schizophrenia Consortium, Common polygenic variation contributes to risk of schizophrenia and bipolar disorder. Nature 460, 748–752 (2009).
19. BJ Vilhjálmsson, et al.; Schizophrenia Working Group of the Psychiatric Genomics Consortium, Discovery, Biology, and Risk of Inherited Variants in Breast Cancer (DRIVE) study, Modeling linkage disequilibrium increases accuracy of polygenic risk scores. Am J Hum Genet 97, 576–592 (2015).
20. DF Gudbjartsson, et al., Sequence variants from whole genome sequencing a large group of Icelanders. Sci Data 2, 150011 (2015).
21. AL Price, et al., Principal components analysis corrects for stratification in genome-wide association studies. Nat Genet 38, 904–909 (2006).
22. BK Bulik-Sullivan, et al.; Schizophrenia Working Group of the Psychiatric Genomics Consortium, LD Score regression distinguishes confounding from polygenicity in genome-wide association studies. Nat Genet 47, 291–295 (2015).
23. H Stefansson, et al., A common inversion under selection in Europeans. Nat Genet 37, 129–137 (2005).
24. MA Ikram, et al.; Early Growth Genetics Consortium; Cohorts for Heart and Aging Research in Genomic Epidemiology Consortium, Common variants at 6q22 and 17q21 are associated with intracranial volume. Nat Genet 44, 539–544 (2012).
25. DP Hibar, et al.; Alzheimer’s Disease Neuroimaging Initiative; CHARGE Consortium; EPIGEN; IMAGEN; SYS, Common genetic variants influence human subcortical brain structures. Nature 520, 224–229 (2015).
26. A Okbay, et al., Genetic variants associated with subjective well-being, depressive symptoms, and neuroticism identified through genome-wide analyses. Nat Genet 48, 624–633 (2016).
27. J Weeden, MJ Abrams, MC Green, J Sabini, Do high-status people really have fewer children?: Education, income, and fertility in the contemporary U.S. Hum Nat 17, 377–392 (2006).
28. B Charlesworth, Evolution in Age-Structured Populations (Cambridge Univ Press, Cambridge, UK, 1980).
29. LL Cavalli-Sforza, WF Bodmer, The Genetics of Human Populations (Dover, Mineola, NY, 1999).
30. LH Trahan, KK Stuebing, JM Fletcher, M Hiscock, The Flynn effect: A meta-analysis. Psychol Bull 140, 1332–1360 (2014).
31. J Flynn, What Is Intelligence? Beyond the Flynn Effect (Cambridge Univ Press, Cambridge, UK, 2007).
32. CA Rietveld, et al., Common genetic variants associated with cognitive performance identified using the proxy-phenotype method. Proc Natl Acad Sci USA 111, 13790–13794 (2014).
33. G Davies, et al., Genome-wide association study of cognitive functions and educational attainment in UK Biobank (N=112 151). Mol Psychiatry 21, 758–767 (2016).
34. M Myrskylä, HP Kohler, FC Billari, Advances in development reverse fertility declines. Nature 460, 741–743 (2009).
35. MZH Hazan, Do highly educated women choose smaller families? Econ J (Oxf) 125, 1191–1226 (2015).
36. JR Gulcher, K Kristjánsson, H Gudbjartsson, K Stefánsson, Protection of privacy by third-party encryption in genetic research in Iceland. Eur J Hum Genet 8, 739–742 (2000).
37. D Wechsler, Wechsler Abbreviated Scale of Intelligence (WASI) Manual (Psychological Corp, San Antonio, 1999).
38. E Gudmundsson, Mat á greind fullorðinna: WASIIS [The Assessment of Intelligence in Adults: WASIIS] (Menntamalastofnun, Reykjavik, Iceland, 2015).
39. B Devlin, K Roeder, Genomic control for association studies. Biometrics 55, 997–1004 (1999).
40. TE Thorgeirsson, et al., A variant associated with nicotine dependence, lung cancer and peripheral arterial disease. Nature 452, 638–642 (2008).
41. G Thorleifsson, et al., Genome-wide association yields new sequence variants at seven loci that associate with measures of obesity. Nat Genet 41, 18–24 (2009).
42. J Flannick, et al.; Go-T2D Consortium; T2D-GENES Consortium, Loss-of-function mutations in SLC30A8 protect against type 2 diabetes. Nat Genet 46, 357–363 (2014).
43. A Helgadottir, et al., Variants with large effects on blood lipids and the role of cholesterol and triglycerides in coronary disease. Nat Genet 48, 634–639 (2016).

Information & Authors


Published in

Proceedings of the National Academy of Sciences
Vol. 114 | No. 5
January 31, 2017
PubMed: 28096410


Submission history

Published online: January 17, 2017
Published in issue: January 31, 2017


  1. selection
  2. educational attainment
  3. genes
  4. fertility
  5. sequence variants




This article is a PNAS Direct Submission.



Augustine Kong (corresponding author)
deCODE genetics/Amgen Inc., Reykjavik 101, Iceland;
School of Engineering and Natural Sciences, University of Iceland, Reykjavik 101, Iceland;
Michael L. Frigge
deCODE genetics/Amgen Inc., Reykjavik 101, Iceland;
Gudmar Thorleifsson
deCODE genetics/Amgen Inc., Reykjavik 101, Iceland;
Hreinn Stefansson
deCODE genetics/Amgen Inc., Reykjavik 101, Iceland;
Alexander I. Young
Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford OX3 7BN, United Kingdom;
Florian Zink
deCODE genetics/Amgen Inc., Reykjavik 101, Iceland;
Gudrun A. Jonsdottir
deCODE genetics/Amgen Inc., Reykjavik 101, Iceland;
Aysu Okbay
Department of Applied Economics, Erasmus School of Applied Economics, Erasmus University Rotterdam, 3062 PA Rotterdam, The Netherlands;
Institute for Behavior and Biology, Erasmus University Rotterdam, 3062 PA Rotterdam, The Netherlands;
Patrick Sulem
deCODE genetics/Amgen Inc., Reykjavik 101, Iceland;
Gisli Masson
deCODE genetics/Amgen Inc., Reykjavik 101, Iceland;
Daniel F. Gudbjartsson
deCODE genetics/Amgen Inc., Reykjavik 101, Iceland;
School of Engineering and Natural Sciences, University of Iceland, Reykjavik 101, Iceland;
Agnar Helgason
deCODE genetics/Amgen Inc., Reykjavik 101, Iceland;
Department of Anthropology, University of Iceland, Reykjavik 101, Iceland;
Gyda Bjornsdottir
deCODE genetics/Amgen Inc., Reykjavik 101, Iceland;
Unnur Thorsteinsdottir
deCODE genetics/Amgen Inc., Reykjavik 101, Iceland;
Faculty of Medicine, University of Iceland, Reykjavik 101, Iceland
Kari Stefansson (corresponding author)
deCODE genetics/Amgen Inc., Reykjavik 101, Iceland;
Faculty of Medicine, University of Iceland, Reykjavik 101, Iceland


To whom correspondence may be addressed. Email: [email protected] or [email protected].
Author contributions: A.K. and K.S. designed research; A.K., H.S., A.I.Y., G.A.J., A.O., P.S., G.M., D.F.G., A.H., G.B., and U.T. performed research; A.K., M.L.F., G.T., and F.Z. analyzed data; A.K. derived the mathematical results in Materials and Methods; M.L.F. prepared the figures and tables for publication; H.S. provided IQ data and references; G.A.J. processed the IQ data to a form suitable for analyses; A.I.Y. assisted in deriving the mathematical results in Materials and Methods; A.O. provided meta-analysis results with various cohorts removed; P.S., G.M., and D.F.G. contributed to processing the Icelandic genotype data for analysis; A.H. provided key references and contributed to writing the Discussion; G.B. collected and processed Icelandic education data and provided key references; U.T. oversaw the generation of the genotype data in the laboratory; A.K. wrote the paper; and K.S. contributed to the writing of the final version of the paper.

Competing Interests

The authors declare no conflict of interest.
