Mendelian randomization identifies blood metabolites previously linked to midlife cognition as causal candidates in Alzheimer’s disease

Significance
Disease-modifying therapeutics for Alzheimer’s disease (AD) are still lacking, and early, easily accessible biomarkers to inform treatment strategies remain elusive. This study uses knowledge of blood metabolites previously associated with midlife cognition (a preclinical predictor of AD) to systematically investigate causal associations with later AD status. Given that the pathological changes underlying AD are thought to develop years before clinical manifestations of the disease, developing these findings further could hold special utility in informing early treatment intervention.


Motivation and Scope
a. The primary hypothesis of interest
Metabolite sub-fractions previously shown to be associated with midlife cognitive functioning lie on the causal pathway to later Alzheimer's Disease diagnosis. As such, intervening on levels of these metabolites prior to disease onset will go some way to reducing Alzheimer's Disease risk.

b. Motivation for the study
Disease-modifying therapeutics for Alzheimer's Disease (AD) are still lacking, and understanding of early, easily accessible biomarkers to inform treatment strategies remains sparse. Taking known associations between preclinical risk factors and potential biomarkers, and assessing how well such markers translate to later clinical risk, could therefore hold special utility in informing early treatment intervention, particularly if a causal relationship can be shown. Midlife cognitive factors have consistently been shown to predict later AD risk; selecting candidate AD biomarkers based on associations with these earlier factors could thus hold particular promise. Blood metabolites are of particular interest because they are accessible via a simple blood sample, making both routine measurement and intervention possible. Past epidemiological studies have implicated metabolites, particularly lipids, in AD, but causal relationships have yet to be established. Using previously observed associations between a number of metabolites and midlife cognition in our 2014 and 2018 studies (3)(4), we therefore wanted to investigate how well such metabolic markers translate to later AD diagnosis, and hence their utility as AD-relevant biomarkers.

c. Motivation for using Mendelian Randomization
Mendelian Randomization (MR) is an accessible method for assessing causality where randomized controlled trial data are not available. Unlike alternative genetically informed methods, such as LD-score regression (1) and genetic colocalisation (5), which uncover potentially shared genetic aetiology between two traits, MR can provide information on the direction of the causal effect and on the likely magnitude of the impact of intervening on the exposure.
Given that this study is interested in investigating whether metabolites which demonstrate association with cognitive processes prior to AD onset translate to causally impact later AD risk, MR offers an appropriate method which can utilise readily available genome-wide association data to test such a causal hypothesis.

d. The study scope
Conduct a preliminary causal analysis using MR methodology, to:
i. Investigate whether any of the metabolite subfractions previously found to be associated with midlife cognition shows evidence of a causal effect on clinically diagnosed AD.
ii. Investigate whether sub-groups of the metabolites previously found to be associated with midlife cognition together show evidence of a causal effect on clinically diagnosed AD.
iii. Investigate the possibility of reverse causation, that is, whether clinically diagnosed AD shows evidence of causally affecting metabolite levels rather than vice versa.

e. The primary analysis (what and how many)
i. Univariable bidirectional Mendelian randomization to assess the causal relationship between 19 metabolite subfractions, one at a time, and clinically diagnosed Alzheimer's Disease (AD). 38 univariable analyses were conducted in total: 19 × metabolite → AD and 19 × AD → metabolite.
ii. Bayesian model averaging MR to assess potential groups of metabolites which may together be on the causal pathway to AD and, again, to assess per-metabolite-AD causal relationships using a method better equipped to handle high correlation amongst risk factors. One analysis was conducted in total, consisting of 9 metabolite risk factors, with all pairwise genetic correlations below 95%.

Data Sources
For both sets of primary analyses outlined in section 1d, a two-sample MR approach was adopted, selecting publicly available genome-wide summary statistics for each metabolite, and separately for clinically diagnosed AD. To the best of our knowledge, no sample overlap existed between the metabolite and AD datasets. Populations were also comparable across the datasets, all being of white European ancestry.
For metabolites and AD, the latest and largest peer-reviewed GWAS datasets were used in order to obtain the greatest statistical power. More specifically, Kettunen et al. (6) was selected because its metabolite subfractions and quantification method (Nuclear Magnetic Resonance spectroscopy) matched those of the observational data on the basis of which the metabolites were selected (4). This allowed a direct comparison between the metabolites previously shown to associate with midlife cognition and their potential causal association with clinically relevant AD.

Selection of genetic variants
a. GWAS selection (including p-value thresholding / clumping)
GWAS summary statistics were used to select instrumental variables using a genome-wide approach. This was favoured over a candidate gene-region strategy due to the polygenic nature of the phenotypes of interest. For both metabolites and AD, multiple SNP-phenotype associations have been shown to exist, spanning a number of regions across the genome (6) (7). With the exception of APOE, a genomic region on chromosome 19 with an unusually large effect size for its association with AD (7), per-SNP effect sizes also remain small, making a pooled IV approach which exploits the large power gains of GWAS most appropriate in this instance.
To ensure robustness of instrumental variables and to limit the introduction of pleiotropy, only SNPs associated with each exposure at genome-wide significance (p < 5 × 10⁻⁸) were considered as instruments.
Clumping procedures differed slightly between metabolite exposures and AD. For metabolites, instruments were selected using a list of pre-curated metabolite quantitative trait loci (mQTLs) extracted from Kettunen et al. (6) and made available within the MR-Base catalogue. Pre-curated instruments were not available for AD data, and thus genome-wide significant instruments were clumped using an r² threshold of 0.001.

b. Exclusion criteria
i. SNPs with a computed F statistic <10.
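The exclusion rule above can be illustrated with a minimal sketch. It assumes the common summary-statistic approximation F ≈ (beta / se)² for a single SNP; the source does not state which formula was used, and the SNP names and values below are hypothetical.

```python
def f_statistic(beta: float, se: float) -> float:
    """Approximate per-SNP F statistic from summary statistics.

    Assumption: F is approximated as (beta / se)^2, where beta and se are
    the SNP-exposure effect estimate and its standard error.
    """
    if se <= 0:
        raise ValueError("standard error must be positive")
    return (beta / se) ** 2


def keep_strong_instruments(snps, threshold=10.0):
    """Drop SNPs whose approximate F statistic falls below the threshold."""
    return [s for s in snps if f_statistic(s["beta"], s["se"]) >= threshold]


# Hypothetical SNP-exposure summary statistics (illustrative values only).
example = [
    {"snp": "rs_hypothetical_1", "beta": 0.10, "se": 0.02},
    {"snp": "rs_hypothetical_2", "beta": 0.03, "se": 0.02},
]
strong = keep_strong_instruments(example)
```

Here only the first hypothetical SNP clears the F ≥ 10 threshold and would be retained as an instrument.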

c. Assessment of instrumental validity
i. Computation of per-instrument F statistic
ii. Sensitivity analyses: leave-one-out, MR-Egger, MR-PRESSO, weighted median, Cochran's Q, Cook's distance.
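Two of the listed checks, Cochran's Q and leave-one-out, can be sketched around a standard fixed-effect inverse-variance weighted (IVW) estimator. This is a minimal illustration under assumptions: the IVW estimator with first-order weights is a common choice in two-sample MR but is not confirmed by the source as the estimator used, and the input values are hypothetical.

```python
import math


def ivw_estimate(bx, by, se_y):
    """Fixed-effect IVW estimate from per-SNP Wald ratios.

    bx: SNP-exposure effects, by: SNP-outcome effects,
    se_y: standard errors of the SNP-outcome effects.
    First-order weights (bx / se_y)^2 are assumed.
    """
    ratios = [y / x for x, y in zip(bx, by)]
    weights = [(x / s) ** 2 for x, s in zip(bx, se_y)]
    total = sum(weights)
    estimate = sum(w * r for w, r in zip(weights, ratios)) / total
    return estimate, math.sqrt(1.0 / total)


def cochran_q(bx, by, se_y):
    """Cochran's Q heterogeneity statistic and its degrees of freedom."""
    estimate, _ = ivw_estimate(bx, by, se_y)
    q = sum(
        ((x / s) ** 2) * (y / x - estimate) ** 2
        for x, y, s in zip(bx, by, se_y)
    )
    return q, len(bx) - 1


def leave_one_out(bx, by, se_y):
    """Re-estimate the IVW effect with each SNP removed in turn."""
    estimates = []
    for i in range(len(bx)):
        keep = [j for j in range(len(bx)) if j != i]
        est, _ = ivw_estimate(
            [bx[j] for j in keep], [by[j] for j in keep], [se_y[j] for j in keep]
        )
        estimates.append(est)
    return estimates


# Hypothetical, perfectly homogeneous instruments (every Wald ratio is 0.5).
bx = [0.1, 0.2, 0.4]
by = [0.05, 0.1, 0.2]
se_y = [0.01, 0.01, 0.01]
estimate, se = ivw_estimate(bx, by, se_y)
q, df = cochran_q(bx, by, se_y)
loo = leave_one_out(bx, by, se_y)
```

A large Q relative to its degrees of freedom would suggest heterogeneity across instruments, and a leave-one-out estimate that shifts markedly would point to a single influential SNP.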

Harmonization procedure
Data were harmonized prior to all MR analyses, with all inferable SNPs aligned across the exposure and outcome datasets. Palindromic SNPs were retained and assumed to be on the forward strand; additional sensitivity analyses removed all palindromic SNPs during harmonization and re-computed causal estimates.
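The harmonization logic can be sketched as follows. This is a simplified illustration, not the pipeline used in the study: the function names are hypothetical, and allele frequencies are ignored, so palindromic SNPs are simply treated as being on the forward strand, as described above.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def is_palindromic(a1: str, a2: str) -> bool:
    """True for A/T or C/G SNPs, whose alleles read the same on both strands."""
    return COMPLEMENT[a1.upper()] == a2.upper()


def align_outcome_beta(exp_ea, exp_oa, out_ea, out_oa, out_beta):
    """Align the outcome effect to the exposure effect allele.

    Returns the outcome beta (sign-flipped if the alleles are swapped),
    or None if the allele pairs cannot be reconciled. The alleles as given
    are tried before their strand complements, so a palindromic SNP is
    effectively assumed to be on the forward strand.
    """
    exposure = (exp_ea.upper(), exp_oa.upper())
    outcome = (out_ea.upper(), out_oa.upper())
    complemented = (COMPLEMENT[outcome[0]], COMPLEMENT[outcome[1]])
    for candidate in (outcome, complemented):
        if candidate == exposure:
            return out_beta
        if candidate == exposure[::-1]:
            return -out_beta
    return None


def drop_palindromic(snps):
    """Sensitivity variant: remove palindromic SNPs before estimation."""
    return [s for s in snps if not is_palindromic(s["ea"], s["oa"])]
```

Re-running the estimates on the output of `drop_palindromic` corresponds to the sensitivity analysis that excludes palindromic SNPs.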

Primary analysis and multiple testing
As stated in section 1d, the primary analyses conducted in this study were:
a. Univariable bidirectional Mendelian randomization to assess the causal relationship between 19 metabolite subfractions, one at a time, and clinically diagnosed Alzheimer's Disease (AD). 38 univariable analyses were conducted in total: 19 × metabolite → AD and 19 × AD → metabolite.
These analyses identified four metabolites to be significantly causally associated with AD at the adjusted level of p<0.009: XL.HDL.FC, XL.HDL.PL, XL.HDL.P, XL.HDL.P. A number of additional metabolites were also associated at the 5% level, including GP, a number of large HDLs, and XL.HDL.C.
b. Bayesian model averaging MR to assess potential groups of metabolites which may together be on the causal pathway to AD and to again assess per-metabolite-AD causal relationships using a method better equipped to handle high correlation amongst risk factors. One analysis was conducted in total, consisting of 9 metabolite risk factors, with all pairwise genetic correlations below 95%.
Of the four metabolites demonstrating adjusted significance in univariable analyses, only one, XL.HDL.FC, was taken forward to Bayesian analyses, as the remaining three were pruned out due to high correlation. XL.HDL.FC was identified as the third highest ranked "true causal" metabolite by Bayesian analyses, with GP ranked highest, followed by XL.HDL.C.
Multiple testing was corrected for using an adjusted alpha of 0.009. This was calculated using an independent tests package within Python (https://github.com/hagax8/independent_tests) which computes an adjusted p-value threshold whilst accounting for correlations amongst metabolites (see supplementary information (SI3)).
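The idea of correcting for an effective number of independent tests can be sketched as below. This is an illustration under assumptions, not a reproduction of the cited package: it uses the Li and Ji (2005) eigenvalue estimator followed by a Šidák correction, and whether the package applies exactly this combination is not confirmed by the source. The eigenvalues are expected to come from the metabolite correlation matrix (for example via `numpy.linalg.eigvalsh`, not imported here).

```python
import math


def li_ji_meff(eigenvalues):
    """Effective number of independent tests (Li and Ji, 2005).

    Each eigenvalue contributes I(|lambda| >= 1) + (|lambda| - floor(|lambda|)).
    """
    meff = 0.0
    for lam in eigenvalues:
        x = abs(lam)
        meff += (1.0 if x >= 1.0 else 0.0) + (x - math.floor(x))
    return meff


def sidak_threshold(alpha: float, meff: float) -> float:
    """Per-test significance threshold for meff independent tests."""
    return 1.0 - (1.0 - alpha) ** (1.0 / meff)


# Hypothetical eigenvalues: three uncorrelated traits versus three
# perfectly correlated traits (illustrative values only).
independent = li_ji_meff([1.0, 1.0, 1.0])
redundant = li_ji_meff([3.0, 0.0, 0.0])
```

With uncorrelated traits the estimator returns the full number of tests, while perfectly correlated traits collapse to a single effective test, so the adjusted threshold is less stringent than a plain Bonferroni correction over all 19 metabolites.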

Sensitivity analyses
A number of sensitivity analyses were conducted to assess the validity of primary analyses and instrumental variables. GP also demonstrated a nominally significant positive association with AD in primary univariable analysis, with p=0.0099, and was the only metabolite to replicate in a small-scale replication using individual-level data from an independent cohort. As with the XL.HDLs, however, the magnitude of effect was small (95% CI=1.045-1.375), indicating that this metabolite may explain part of the causal picture but does not alone account for it. Sensitivity analyses, which are largely conservative in comparison to primary analyses, demonstrated a consistent direction of effect for significant results but failed to retain significance. However, specific tests of pleiotropy indicated that this was due to lack of power rather than notable violations of instrumental assumptions. Our study therefore offers a number of interesting causal candidates, namely XL.HDLs and GP, which may hold value as early indicators of AD risk and as possible early targets of intervention. Future studies with greater statistical power, and which incorporate a wider network of risk factors, will however be important for building upon the foundations laid in this study.

Info. S2. Metabolite Pruning in preparation for Bayesian Model Averaging Mendelian Randomization.
Bayesian Model Averaging MR (BMA-MR) adopts a multivariable framework, whereby multiple exposures can be included within the model, provided a) they are each robustly associated with at least one SNP-instrument used within the model, and b) they do not induce multi-collinearity. As with univariable models outlined within the main text of the present paper, criterion a) was met through the inclusion of only those exposures which had at least five GWS SNPs available, each with a minimum F statistic of 10. To meet criterion b), pairwise genetic correlations (rg) across metabolites were computed using linkage-disequilibrium score regression (LDSC)(1), and any metabolites observed with rg>0.95 were assumed non-independent and pruned according to the following stepwise procedure: 3) For rg>0.95 metabolite pairs which also had an equal number of wider metabolite pairwise rg>0.95 at adjusted significance, the metabolite with the greatest number of nominally significant rgs (p<0.05) was removed.
During data preparation, MUFA was dropped from further analyses due to a low mean chi-square statistic (χ² = 1.01) computed during LDSC data munging, making it unsuitable for cross-trait LDSC analyses.
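The general idea of pruning on pairwise genetic correlation can be sketched as follows. This is a generic greedy simplification and an assumption throughout: it does not reproduce the stepwise procedure used in the study (whose tie-breaking relies on the significance of the rg estimates), and the trait names and rg values are hypothetical.

```python
def prune_by_rg(names, rg, threshold=0.95):
    """Greedily remove traits until no retained pair has rg above threshold.

    names: list of trait names.
    rg: dict mapping frozenset({a, b}) to the pairwise genetic correlation.
    At each step the trait with the most high-rg partners is removed;
    ties are broken by list order (a simplification).
    """
    kept = list(names)
    while kept:
        partners = {
            m: sum(
                1
                for o in kept
                if o != m and rg.get(frozenset((m, o)), 0.0) > threshold
            )
            for m in kept
        }
        worst = max(kept, key=lambda m: partners[m])
        if partners[worst] == 0:
            break  # no remaining pair exceeds the threshold
        kept.remove(worst)
    return kept


# Hypothetical traits and genetic correlations (illustrative values only).
traits = ["trait_A", "trait_B", "trait_C", "trait_D"]
rg_values = {
    frozenset(("trait_A", "trait_B")): 0.97,
    frozenset(("trait_A", "trait_C")): 0.96,
    frozenset(("trait_B", "trait_C")): 0.50,
    frozenset(("trait_A", "trait_D")): 0.20,
}
retained = prune_by_rg(traits, rg_values)
```

In this toy example the trait correlated above 0.95 with two others is removed first, after which no retained pair exceeds the threshold.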

Info. S3. Multiple test corrections
An adjusted p-value, correcting for multiple independent tests while accounting for correlations amongst metabolites, was computed using an "independent_tests" package in Python (https://github.com/hagax8/independent_tests). An adjusted significance of p<0.009 was calculated for primary analyses, as per the following:

Table S1 outlines the results of five logistic regression analyses performed using baseline metabolite and sample information from the Alzheimer's Disease Neuroimaging Initiative (ADNI) (8). Each model was adjusted for age, sex, and the APOE4 genotype (dummy coded, with 0 as the reference category). Samples were restricted to those which (i) had available metabolite data at baseline, and (ii) were classified as either AD cases or clinically healthy controls. As genotype information was not required within this phase of analyses, retained sample sizes were larger than those available within the same study cohort utilised for two-stage least squares Mendelian randomization (glycoprotein acetyls (GP) N=1,140, high-density lipoproteins (HDLs) N=1,116. See table S2 for confirmation of pre-processing steps).