Understanding the consequences of education inequality on cardiovascular disease: mendelian randomisation study

Abstract
Objectives To investigate the role of body mass index (BMI), systolic blood pressure, and smoking behaviour in explaining the effect of education on the risk of cardiovascular disease outcomes.
Design Mendelian randomisation study.
Setting UK Biobank and international genome-wide association study data.
Participants Predominantly participants of European ancestry.
Exposure Educational attainment, BMI, systolic blood pressure, and smoking behaviour in observational analysis, and randomly allocated genetic variants to instrument these traits in mendelian randomisation.
Main outcome measures The risk of coronary heart disease, stroke, myocardial infarction, and cardiovascular disease (all subtypes; all measured in odds ratio), and the degree to which this is mediated through BMI, systolic blood pressure, and smoking behaviour respectively.
Results Each additional standard deviation of education (3.6 years) was associated with a 13% lower risk of coronary heart disease (odds ratio 0.86, 95% confidence interval 0.84 to 0.89) in observational analysis and a 37% lower risk (0.63, 0.60 to 0.67) in mendelian randomisation analysis. As a proportion of the total risk reduction, BMI was estimated to mediate 15% (95% confidence interval 13% to 17%) and 18% (14% to 23%) in the observational and mendelian randomisation estimates, respectively. Corresponding estimates were 11% (9% to 13%) and 21% (15% to 27%) for systolic blood pressure and 19% (15% to 22%) and 34% (17% to 50%) for smoking behaviour. All three risk factors combined were estimated to mediate 42% (36% to 48%) and 36% (5% to 68%) of the effect of education on coronary heart disease in observational and mendelian randomisation analyses, respectively. Similar results were obtained when investigating the risk of stroke, myocardial infarction, and cardiovascular disease.
Conclusions BMI, systolic blood pressure, and smoking behaviour mediate a substantial proportion of the protective effect of education on the risk of cardiovascular outcomes and intervening on these would lead to reductions in cases of cardiovascular disease attributable to lower levels of education. However, more than half of the protective effect of education remains unexplained and requires further investigation.


Covariates
Variables considered as covariates were measured at the baseline assessment centres through interviews. Sex and ethnicity were confirmed according to genetic data. Place of birth was adjusted for using the northing and easting coordinates of the birth location. Although the Townsend Deprivation Index (TDI) of historic birth locations is not recorded in UK Biobank, it has been estimated from indices of multiple deprivation, using the current TDI of the birth location as a proxy for the historic TDI of the birth place. Mendelian randomisation (MR) models were also adjusted for the same confounders. Although a core assumption of MR is that the genetic variants are unrelated to confounders, there is some evidence of small associations with place of birth for the educational attainment variants in UK Biobank (5,6). These covariates were only considered in observational and one-sample MR analyses, where individual level data were available.

Outcomes
Incident cases of CVD were defined according to hospital episode statistics (HES). Dates of diagnosis are provided by HES data, which were linked with the assessment centre dates provided by UK Biobank.

One-sample MR instrument selection (including GWAS methods)
For the one-sample MR analysis using individual-level data, instruments were selected from analyses of populations that did not overlap with those considered in the outcome estimates. Accordingly, to create a weighted allele score for education, we used 74 independent single-nucleotide polymorphisms (SNPs) that attained genome-wide significance (P < 5×10⁻⁸) in the main results of the 2016 SSGAC GWAS meta-analysis of 293,723 individuals, which did not include UK Biobank participants (7). The 77 genome-wide significant SNPs reported by the GIANT consortium's meta-analysis of 322,154 individuals of European ancestry were used to create a weighted allele score for BMI (2). Five instruments for education were not available in UK Biobank, and proxy SNPs in perfect LD (r² = 1) were used instead (Supplementary Table 4).
MR studies require the SNP-exposure and the SNP-outcome associations to be estimated in independent samples, otherwise the MR estimates can be overestimated (8,9). Existing SBP and lifetime smoking GWASs have been estimated using UK Biobank data (4,10,11). To avoid participant overlap between the exposure and outcome genetic estimates in UK Biobank (8), split-sample GWASs of SBP and smoking were performed using the University of Bristol MRC Integrative Epidemiology Unit GWAS Pipeline (12). A total of 318,147 unrelated UK Biobank participants were eligible for inclusion in the GWAS (see Supplementary Figure 2). All eligible participants were randomly allocated to one of two halves (sample 1 and sample 2). A GWAS was performed on samples 1 and 2 separately, adjusted for age, sex and the first 40 principal components in UK Biobank. A BOLT-LMM model was used to account for population stratification. The top hit SNPs were determined using the 'clump_data' command in the Two-Sample MR R package (r² > 0.001, distance > 10,000 kb; the default settings of the 'clump_data' command) (13). This process was carried out for both the SBP and lifetime smoking phenotypes.
The genetic score was created for each sample independently, by weighting each SNP by its relative effect size from the GWAS results of the opposing sample (i.e. the genome-wide significant SNPs and betas identified in the GWAS of sample 1 were used to generate the genetic score in sample 2 individuals). All genetic variants were summed together in an additive model. A total of 65 and 55 genome-wide significant SNPs were identified for SBP (with 10 mmHg added for antihypertensive use) in sample 1 and sample 2, respectively (see Supplementary Table 5). In the split-sample GWAS for smoking, 18 SNPs were identified in the GWAS of sample 1 individuals and 15 SNPs in sample 2 individuals (see Supplementary Table 6).
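The cross-sample weighting described above can be sketched in a few lines of Python. This is a minimal illustration only, not the pipeline used in the study: the function name `weighted_allele_score`, the toy dosages and the toy betas are all hypothetical.

```python
import numpy as np

def weighted_allele_score(dosages, betas):
    """Additive score: effect-allele dosages (0, 1 or 2) weighted by GWAS betas."""
    dosages = np.asarray(dosages, dtype=float)  # shape (n_individuals, n_snps)
    betas = np.asarray(betas, dtype=float)      # shape (n_snps,)
    return dosages @ betas

# Hypothetical toy data: three individuals from sample 2, scored with
# betas for two SNPs discovered in the GWAS of sample 1 (the opposing half).
dosages_sample2 = [[0, 1],
                   [2, 1],
                   [1, 0]]
betas_from_sample1 = [0.5, -0.2]

score_sample2 = weighted_allele_score(dosages_sample2, betas_from_sample1)
```

The same function would be applied to sample 1 using the betas from the GWAS of sample 2, so that no individual contributes to the weights of their own score.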
Additionally, published SBP GWASs have been adjusted for BMI. Given that BMI is considered as a mediator in this analysis, and given the potential for collider bias, we carried out a further GWAS of SBP in UK Biobank, using the full eligible sample in a single GWAS, to provide instruments for two-sample MR analysis that were not adjusted for BMI. This used the same pipeline and adjustments as described previously for the split-sample analysis. As with the split-sample approach, the independent genome-wide significant SNPs were determined using the 'clump_data' command in the Two-Sample MR R package (r² > 0.001 and distance > 10,000 kb; the default settings of the 'clump_data' command) (14).

GWAS meta-analysis data: education coding
In the SSGAC education GWAS, education was assessed at or above the age of 30 years. To maximise comparability between studies with heterogeneous educational systems, major educational qualifications were mapped on to one of seven categories of the International Standard Classification of Education (ISCED) (7,17).

GWAS meta-analysis data: CHD, MI and stroke data sources
For the risk of CHD, we used publicly available genetic association estimates from the CARDIoGRAMplusC4D 1000 Genomes-based GWAS meta-analysis of 60,801 cases and 123,504 controls (15). Participants were of European, East Asian, South Asian, Hispanic and African American ancestry, and adjustment was made for population stratification using the genomic control method. For the risk of MI, we used genetic association estimates generated from the CARDIoGRAMplusC4D subgroup analysis of the approximately 70% of total cases that had a reported history of MI (15). For the risk of stroke, we used publicly available genetic association estimates from the MEGASTROKE consortium GWAS meta-analysis of 67,162 stroke cases (comprising ischaemic stroke, intracerebral haemorrhage and stroke of unknown type) and 406,111 controls (16).
All genetic association estimates used in each two-sample MR analysis are provided in Supplementary Tables 7-21.

Statistical analysis for one-sample MR
In the one-sample MR of UK Biobank data, the total effect of education on CVD outcomes was investigated using two-stage least squares regression. In the first stage, we estimated the effect of the education weighted allele score on self-reported educational attainment. We used this estimate to generate a prediction of educational attainment. In the second stage, we estimated the effect of predicted educational attainment on the CVD outcome using robust standard errors (18). Both regression stages were adjusted for age, sex, place of birth, birth distance from London, and TDI, as well as the first ten principal components (PCs).
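The two stages described above can be illustrated with a small simulation. This is a sketch under simplifying assumptions, not the study's analysis code: the outcome is continuous rather than a binary CVD outcome, only one covariate (age) is included, standard errors are not computed, and every variable name and effect size is hypothetical.

```python
import numpy as np

def ols(X, y):
    """Least-squares coefficients for y ~ X (X already contains an intercept)."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

def two_stage_least_squares(score, exposure, outcome, covariates):
    """Point estimate of the effect of the exposure on the outcome via 2SLS."""
    intercept = np.ones(len(score))
    # Stage 1: exposure on allele score plus covariates, then predict exposure.
    X1 = np.column_stack([intercept, score, covariates])
    predicted_exposure = X1 @ ols(X1, exposure)
    # Stage 2: outcome on predicted exposure plus the same covariates.
    X2 = np.column_stack([intercept, predicted_exposure, covariates])
    return ols(X2, outcome)[1]

# Simulated data (all values hypothetical): an unmeasured confounder raises
# both education and the outcome; the true causal effect is -0.3.
rng = np.random.default_rng(0)
n = 20_000
age = rng.normal(size=n)
confounder = rng.normal(size=n)
score = rng.normal(size=n)
education = 0.5 * score + confounder + 0.1 * age + rng.normal(size=n)
outcome = -0.3 * education + confounder + 0.1 * age + rng.normal(size=n)

est_2sls = two_stage_least_squares(score, education, outcome, age)

# Naive regression of outcome on observed education, for comparison.
X_naive = np.column_stack([np.ones(n), education, age])
est_naive = ols(X_naive, outcome)[1]
```

In this simulation the naive estimate is pulled away from the true effect by the confounder, whereas the two-stage estimate is not. Note that standard errors taken directly from a manually fitted second stage are incorrect, which is one reason dedicated routines are normally used in practice.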
To estimate the effect of the education weighted allele score on each of BMI, SBP and smoking, the Stata IVREG2 package was used, adjusted for covariates and PCs as above.
To then estimate the effect of each risk factor individually on the CVD outcomes, linear regression models were used to estimate the gene-exposure association between the weighted allele score for each risk factor and the observed value, whilst controlling for the weighted allele score of education. Additionally, the gene-education estimates, controlling for the allele score of the risk factor, were calculated. For both models, the predicted values were stored for use in a second stage regression, where they were regressed against each CVD outcome using logistic regression. The final estimate of interest came from the predicted value of the mediator, controlling for education and all other covariates and PCs as previously described. Where split-sample GWAS estimates were used to create the allele score for SBP and smoking, the MR analyses were run separately for each 50% sample and meta-analysed to estimate an overall effect.
These estimates were then multiplied to estimate the indirect effect, which is the amount of the association between education and CVD going via each of the three risk factors individually.
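As a sketch of the arithmetic involved, the snippet below pools two half-sample estimates and then multiplies the education-to-mediator and mediator-to-outcome estimates. The text does not specify the meta-analysis model, so fixed-effect inverse-variance weighting is an assumption here, and all function names and numbers are hypothetical.

```python
import math

def fixed_effect_meta(estimates, standard_errors):
    """Inverse-variance weighted pooled estimate and its standard error."""
    weights = [1.0 / se ** 2 for se in standard_errors]
    pooled = sum(w * b for w, b in zip(weights, estimates)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    return pooled, pooled_se

def indirect_effect(beta_education_to_mediator, beta_mediator_to_outcome):
    """Product-of-coefficients estimate of the effect going via one mediator."""
    return beta_education_to_mediator * beta_mediator_to_outcome

# Hypothetical mediator-to-outcome log odds ratios from the two 50% samples.
pooled_beta, pooled_se = fixed_effect_meta([0.30, 0.40], [0.10, 0.10])

# Hypothetical effect of 1-SD higher education on the mediator (in SD units).
beta_edu_mediator = -0.20

indirect = indirect_effect(beta_edu_mediator, pooled_beta)
```

With equal standard errors the pooled estimate is simply the mean of the two half-sample estimates; the indirect effect is then expressed on the same scale as the mediator-to-outcome estimate.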

Investigating all three risk factors combined
When investigating the role of all three risk factors together on the association between education and CVD, we used the difference method (19). This involved estimating the total effect of education on each CVD subtype, as described in the main Methods. We then estimated the direct effect of education on each CVD subtype controlling for all three risk factors together, using multivariable regression in observational analyses and multivariable MR (MVMR) in MR analyses. In two-sample MR, the direct effect of education after adjusting for the three risk factors together, estimated using MVMR, was divided by the total effect to give a proportion, and this was subtracted from one to estimate the proportion of the effect mediated indirectly through the three risk factors collectively. In observational analysis, a multivariable logistic model for the effect of education on CVD (and subtypes), adjusting for all three risk factors, was used to estimate the direct effect of education independently of the risk factors. This was subtracted from the total effect to estimate the indirect effect of education through the three risk factors collectively.
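The difference method reduces to two short calculations, sketched below. Working on the log odds ratio scale is an assumption made for illustration, as are the function names and the numerical values, which are not results from the study.

```python
import math

def indirect_by_difference(total_effect, direct_effect):
    """Indirect effect = total effect minus direct effect (same scale)."""
    return total_effect - direct_effect

def proportion_mediated(total_effect, direct_effect):
    """Proportion mediated = 1 - direct / total."""
    return 1.0 - direct_effect / total_effect

# Hypothetical log odds ratios per 1-SD higher education.
total_log_or = -0.40    # total effect of education on the outcome
direct_log_or = -0.25   # effect after adjusting for BMI, SBP and smoking

indirect_log_or = indirect_by_difference(total_log_or, direct_log_or)
mediated = proportion_mediated(total_log_or, direct_log_or)
```

The two quantities are consistent with each other: dividing the indirect effect by the total effect gives the same proportion as subtracting the direct-to-total ratio from one.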

Sensitivity analyses
MR estimates are prone to bias if the underlying assumptions of the analysis are violated (20). Horizontal pleiotropy, where a genetic variant is associated with the outcome of interest via a pathway other than the exposure, can potentially bias the MR estimates (20). MR-Egger allows for directional (unbalanced) horizontal pleiotropy under the assumption that the size of the effects of the variants on the exposure is independent of their direct effects on the outcome (i.e. there is no dose-response confounding) (21). Furthermore, the weighted median estimator is able to provide robust MR estimates when more than half of the information for the analysis comes from valid instruments (22). In the MR analysis of the total effect of education on CVD outcome risk, and of the effect of education on each risk factor, we also applied these techniques to investigate the robustness of our findings when relaxing assumptions on horizontal pleiotropy. These techniques have not yet been developed for application in MR mediation analysis.
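To make the contrast between the standard inverse-variance weighted (IVW) estimate and MR-Egger concrete, the sketch below fits both to summary statistics. It is a simplified illustration rather than the implementation used in the study: weights are taken as the inverse variance of the SNP-outcome estimates, no standard errors are produced, and the toy data are hypothetical and noise-free, with a constant pleiotropic effect of 0.02 added to every SNP-outcome association.

```python
import numpy as np

def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Weighted regression of SNP-outcome on SNP-exposure through the origin."""
    bx, by, se = (np.asarray(a, dtype=float)
                  for a in (beta_exposure, beta_outcome, se_outcome))
    w = 1.0 / se ** 2
    return np.sum(w * bx * by) / np.sum(w * bx ** 2)

def mr_egger(beta_exposure, beta_outcome, se_outcome):
    """Same weighted regression with an intercept; returns (intercept, slope)."""
    bx, by, se = (np.asarray(a, dtype=float)
                  for a in (beta_exposure, beta_outcome, se_outcome))
    # Orient every SNP so that its association with the exposure is positive.
    sign = np.sign(bx)
    bx, by = bx * sign, by * sign
    root_w = 1.0 / se
    X = np.column_stack([np.ones_like(bx), bx])
    coef = np.linalg.lstsq(X * root_w[:, None], by * root_w, rcond=None)[0]
    return coef[0], coef[1]

# Hypothetical summary statistics: true causal slope -0.5, directional
# pleiotropy of 0.02 on every SNP-outcome association.
beta_exposure = np.array([0.1, 0.2, 0.3, 0.4])
beta_outcome = 0.02 - 0.5 * beta_exposure
se_outcome = np.full(4, 0.05)

ivw = ivw_estimate(beta_exposure, beta_outcome, se_outcome)
egger_intercept, egger_slope = mr_egger(beta_exposure, beta_outcome, se_outcome)
```

In this toy example the Egger intercept absorbs the directional pleiotropy and the slope recovers the causal effect, whereas the IVW estimate, which forces the line through the origin, is pulled towards the null.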
For all analyses in UK Biobank, models were replicated on the risk difference scale using multivariable linear regression. For the one-sample MR analyses, the IVREG2 Stata package was used for this (23). Additionally, all analyses were replicated using unadjusted models, models adjusted for age and sex only, and models stratified by sex and by age dichotomised at the median (39 to 57 years compared with 58 to 72 years). On a subsample of UK Biobank participants with dietary recall questionnaires (including protein, carbohydrate, total fat, saturated fat, polyunsaturated fat, total sugar and fibre consumption) and exercise measures (weekly duration of moderate and vigorous physical activity) (N = 20,298), an observational multivariable multiple mediator model was analysed. This could not be completed using MR analyses as there are no suitable instruments for diet and exercise phenotypes. This analysis, and those stratified by age and sex, were carried out for the association between education and CVD (all subtypes) only, due to limited outcome events.

Total effect = overall effect of education on CVD, including that mediated by BMI, SBP, smoking, and all other risk factors on the causal pathway. Direct effect = the effect of education on CVD that is not mediated by the risk factor listed. For example, the direct effect of education on CHD with risk factor BMI is the effect of education on CHD that is not mediated by BMI. Analyses adjusted for: age, sex, place of birth and Townsend deprivation index at birth. BMI, SBP and smoking were measured in 1-SD units.

Supplementary Figure 8: Association between 1-SD higher education and BMI, SBP and smoking in one-sample MR.
Analyses were adjusted for: age, sex, place of birth and Townsend deprivation index at birth. BMI, SBP and smoking were measured in 1-SD units.

Supplementary Figure 12: Associations between BMI, SBP and smoking, and risk of the CVD outcomes in one-sample MR.
Analyses adjusted for: age, sex, place of birth and Townsend deprivation index at birth. BMI, SBP and smoking were measured in 1-SD units.