Vitamin D and risk of pregnancy related hypertensive disorders: mendelian randomisation study

Abstract
Objective To use mendelian randomisation to investigate whether 25-hydroxyvitamin D concentration has a causal effect on gestational hypertension or pre-eclampsia.
Design One and two sample mendelian randomisation analyses.
Setting Two European pregnancy cohorts (Avon Longitudinal Study of Parents and Children, and Generation R Study), and two case-control studies (subgroup nested within the Norwegian Mother and Child Cohort Study, and the UK Genetics of Pre-eclampsia Study).
Participants 7389 women in a one sample mendelian randomisation analysis (751 with gestational hypertension and 135 with pre-eclampsia), and 3388 pre-eclampsia cases and 6059 controls in a two sample mendelian randomisation analysis.
Exposures Single nucleotide polymorphisms in genes associated with vitamin D synthesis (rs10741657 and rs12785878) and metabolism (rs6013897 and rs2282679) were used as instrumental variables.
Main outcome measures Gestational hypertension and pre-eclampsia defined according to the International Society for the Study of Hypertension in Pregnancy.
Results In the conventional multivariable analysis, the relative risk for pre-eclampsia was 1.03 (95% confidence interval 1.00 to 1.07) per 10% decrease in 25-hydroxyvitamin D level, and 2.04 (1.02 to 4.07) for 25-hydroxyvitamin D levels <25 nmol/L compared with ≥75 nmol/L. No association was found for gestational hypertension. The one sample mendelian randomisation analysis using the total genetic risk score as an instrument did not provide strong evidence of a linear effect of 25-hydroxyvitamin D on the risk of gestational hypertension or pre-eclampsia: odds ratio 0.90 (95% confidence interval 0.78 to 1.03) and 1.19 (0.92 to 1.52) per 10% decrease, respectively. The two sample mendelian randomisation estimate gave an odds ratio for pre-eclampsia of 0.98 (0.89 to 1.07) per 10% decrease in 25-hydroxyvitamin D level, an odds ratio of 0.96 (0.80 to 1.15) per unit increase in the log(odds) of 25-hydroxyvitamin D level <75 nmol/L, and an odds ratio of 0.93 (0.73 to 1.19) per unit increase in the log(odds) of 25-hydroxyvitamin D levels <50 nmol/L.
Conclusions No strong evidence was found to support a causal effect of vitamin D status on gestational hypertension or pre-eclampsia. Future mendelian randomisation studies with a larger number of women with pre-eclampsia or more genetic instruments that would increase the proportion of 25-hydroxyvitamin D levels explained by the instrument are needed.


WEBAPPENDIX
This appendix has been provided by the authors to give readers additional information about their work.
[...] because of non-European ethnicity/relatedness/gender mismatch, which left a total of 1,875 PE cases for the final analysis. Furthermore, of the 5,667 controls that were genotyped, 406 were removed because of genotyping quality issues, and a further 173 were removed because of non-European ethnicity/relatedness/gender mismatch, which left a total of 5,088 controls available for the final analysis.

25-hydroxyvitamin D levels
In ALSPAC, serum samples taken as part of routine antenatal care were collected and stored initially at −20 °C and then at −80 °C, with no further freeze-thaw cycles. Serum samples could be from any stage of pregnancy, and dates of blood sampling were obtained from medical records and verified from the freezer storage data. Serum 25-hydroxyvitamin D2 (25(OH)D2) and 25-hydroxyvitamin D3 (25(OH)D3) levels were measured with high performance liquid chromatography-tandem mass spectrometry (LC-MS/MS) at a laboratory at the University of East Anglia (East Anglia, United Kingdom) meeting the performance target set by the Vitamin D External Quality Assessment Scheme (DEQAS). 9 Inter-assay coefficients of variation were less than 10% across a working range of 2.5-624 nmol/L for both 25(OH)D3 and 25(OH)D2.
In the Generation R Study, antenatal blood samples were collected in mid-pregnancy (median 20.3 weeks, range 18.5-23.3 weeks). 10 Plasma was stored at −80 °C from collection until the assays were conducted. Plasma levels of 25(OH)D2 and 25(OH)D3 were quantified using isotope dilution LC-MS/MS at the Queensland Brain Institute (Brisbane, Australia), also approved by DEQAS. Assay accuracy was assessed at four concentration levels for 25(OH)D3 (48.3, 49.4, 76.4 and 139.2 nmol/L) and a single level for 25(OH)D2 (32.3 nmol/L), and was excellent at all concentration levels tested (<10% and <17%, respectively). 11

Standardization of 25-hydroxyvitamin D measurement by season of blood sample collection
The 25(OH)D levels were standardized by the season of blood sample collection. The date of blood sample collection was available in ALSPAC, while the calendar week of blood sample collection was available in Generation R and MoBa. We modelled the seasonal variation in natural logarithm transformed 25(OH)D levels in a linear regression using the following sine-cosine function of the date of blood sample collection, which has been described in detail in a previous publication from the ALSPAC cohort. 9
f(t) = α + β_h sin(2hπt) + θ_h cos(2hπt)
where α, β_h and θ_h are estimated regression parameters and t is the date (or calendar week) of blood sampling. This function has been shown to adequately describe the normal seasonal variations in 25(OH)D throughout the year among individuals of European ethnicity. 12
The steps of the standardization were as follows. First, we generated the linearly predicted value of the woman's natural log transformed 25(OH)D level. We then back transformed this predicted value from the log scale to the raw scale by exponentiating it, and generated the residual as the difference between the measured and predicted 25(OH)D values on the raw scale. Finally, we generated the standardized value of the woman's 25(OH)D on the raw scale by adding this estimated residual to the population geometric mean.
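The standardization steps described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: it assumes a single harmonic (h = 1), assumes that t has been rescaled to a fraction of the year in [0, 1), and fits the regression by ordinary least squares. The function names `solve_linear_system` and `standardize_by_season` are hypothetical.

```python
import math

def solve_linear_system(a, b):
    """Solve a small dense system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def standardize_by_season(levels, times, h=1):
    """levels: 25(OH)D in nmol/L; times: sampling time as a fraction of the year (assumption)."""
    logs = [math.log(v) for v in levels]
    # Design rows for f(t) = alpha + beta_h sin(2 h pi t) + theta_h cos(2 h pi t)
    rows = [[1.0, math.sin(2 * h * math.pi * t), math.cos(2 * h * math.pi * t)]
            for t in times]
    # Normal equations (X'X) coef = X'y for the regression on the log scale
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(rows, logs)) for i in range(3)]
    coef = solve_linear_system(xtx, xty)
    geometric_mean = math.exp(sum(logs) / len(logs))
    standardized = []
    for r, v in zip(rows, levels):
        predicted_raw = math.exp(sum(c * x for c, x in zip(coef, r)))  # back-transform
        standardized.append(geometric_mean + (v - predicted_raw))  # mean plus raw-scale residual
    return standardized
```

A woman whose measured level sits exactly on the fitted seasonal curve therefore receives the population geometric mean as her standardized value, whatever the time of year her sample was taken.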

Genotyping
In ALSPAC, DNA samples were extracted from whole blood samples taken during pregnancy. The Centre National de Génotypage (Evry, France) carried out DNA genotyping on the Illumina human660W-quad array and genotypes were called with Illumina GenomeStudio. PLINK version 1.07 (http://pngu.mgh.harvard.edu/purcell/plink) 13 was used to carry out quality control (QC) measures on an initial set of 10,015 subjects and 557,124 directly genotyped single nucleotide polymorphisms (SNPs). SNPs were removed if they had more than 5% missing genotypes, a Hardy-Weinberg equilibrium (HWE) p-value of less than 1.0 × 10^-6, or a minor allele frequency (MAF) of less than 1%. Individuals were excluded if they had more than 5% missing genotypes, indeterminate X chromosome heterozygosity, extreme autosomal heterozygosity, or evidence of non-European ancestry. Multidimensional scaling of genome-wide identity-by-state pairwise distances was conducted using the four HapMap populations as a reference. Autosomal SNPs were imputed using Impute2 v2.2.2 and the 1000 Genomes Phase 1 version 3 reference panel, including 2186 haplotypes from all populations. SNP call rates for rs10741657, rs12785878, rs2282679 and rs6013897 were 99.9%, 100%, 97.2% and 99.2%, respectively. All four SNPs used as instruments were imputed in ALSPAC, with high imputation quality (R² > 0.98).
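The SNP-level filters above (missingness, HWE and MAF) can be illustrated with a short Python sketch. It is not the PLINK implementation: the HWE p-value here comes from a 1 degree of freedom chi-square approximation, which is a simplifying assumption (PLINK is understood to use an exact test), and the function name `snp_passes_qc` and the genotype coding are hypothetical.

```python
import math

def snp_passes_qc(genotypes, max_missing=0.05, min_maf=0.01, min_hwe_p=1e-6):
    """Return (keep, stats) for one SNP.

    genotypes: one entry per sample, the count of one allele (0, 1 or 2),
    or None for a missing call.
    """
    called = [g for g in genotypes if g is not None]
    n = len(called)
    if n == 0:
        return False, {"missing": 1.0, "maf": 0.0, "hwe_p": float("nan")}
    missing = 1 - n / len(genotypes)
    observed = [called.count(0), called.count(1), called.count(2)]
    p = (observed[1] + 2 * observed[2]) / (2 * n)  # frequency of the counted allele
    maf = min(p, 1 - p)
    # Genotype counts expected under Hardy-Weinberg equilibrium
    expected = [n * (1 - p) ** 2, 2 * n * p * (1 - p), n * p ** 2]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    hwe_p = math.erfc(math.sqrt(chi2 / 2))  # chi-square upper tail, 1 df (approximation)
    keep = missing <= max_missing and maf >= min_maf and hwe_p >= min_hwe_p
    return keep, {"missing": missing, "maf": maf, "hwe_p": hwe_p}
```

The default thresholds mirror the ALSPAC values quoted above; the MoBa and GOPEC filters described later use different cut-offs and could be passed as arguments.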
In the Generation R Study, DNA was derived from whole blood samples in early pregnancy. DNA was extracted, plated and normalized from 5 ml whole blood at the Human Genotyping and Sequencing Facility of the Genetic Laboratory at the Department of Internal Medicine, Erasmus MC. Genotyping was performed at LGC Genomics (UK) using KASP genotyping. KASP genotyping assays are based on competitive allele-specific polymerase chain reaction and enable bi-allelic scoring of SNPs. Assays are deemed to be working successfully if clusters are distinct and call rates are consistently high. The data undergo automatic quality control at LGC Genomics using KlusterCaller, which reviews the spatial distribution of clustering groups, genotype calling and control wells. No Template Controls (NTCs) are included on each plate to enable the detection of contamination or non-specific amplification. All plated samples are included in the QC (n = 7,675 women). SNP call rates for rs10741657, rs12785878, rs2282679 and rs6013897 were 99.4%, 99.2%, 99.6% and 99.2%, respectively. All SNPs used as genetic instruments in our study were directly genotyped. Generation R has previously reported the accuracy of KASP genotyping, observing a concordance rate of 99.4% and a non-reference discordant rate of 1.7% with other genotyping technology. 14
In MoBa, DNA was extracted manually from whole blood samples obtained at recruitment (around 18 gestational weeks) using the FlexiGene kit (Qiagen, Hilden, Germany). Mothers were genotyped by the UNC Mammalian Genotyping Core using the HumanCoreExome Bead Chip from Illumina (Illumina, Inc., San Diego, CA). Samples and SNPs were examined using PLINK 1.07 (http://pngu.mgh.harvard.edu/purcell/plink) for quality control. SNPs were excluded if the missing rate exceeded 5%, there was substantial deviation from HWE (p < 1 × 10^-3) or the MAF was <5%. Known genotype and DNA replicates were included on each plate and exhibited high genotyping quality. All subject-specific call rates were acceptable (minimum 97.2%). Sex-specific markers were inspected, and relatedness and inbreeding within the cohort were assessed by identity by descent (IBD > 0.125). For each pair of related mothers, we preferentially included the one with the most complete genetic data or, in the case of equivalence, randomly sampled between them. Quantile-quantile plots and calculation of genomic control lambda 15 (λGC = 1.01) indicated no systematic test statistic inflation, unidentified relationships, or cryptic admixture. Samples more than 3 standard deviations from the mean on any of the first three 1000 Genomes axes of variation (based on CEU, YRI, CHB, PUR, CLM, and MXL) were excluded as outliers. The post-QC dataset was pre-phased using SHAPEIT2 17 and imputed using PBWT 16 against the 1000 Genomes Phase 3 reference panel. 18 Imputation was conducted by the Sanger Imputation Service provided by the Wellcome Trust Sanger Institute. 19 Three of the four SNPs were directly genotyped (rs6013897 was imputed).
In the GOPEC study, the Illumina 660 chip was used to genotype 1,990 pregnant women with PE. A total of 594,398 variants were called with the GenCall algorithm. Quality control analysis was conducted using PLINK (http://zzz.bwh.harvard.edu/plink/) and SMARTPCA 21. Briefly, the quality control included the following subject-level exclusion criteria: individual call rate <98%; heterozygosity >3 s.d. from the mean; any of the first three HapMap (based on CEU, YRI, CHB, JPT and GIH populations) principal axes of variation >4 s.d. from the mean; and sex mismatch. Related individuals (identity by descent (IBD) > 0.1) with the lowest call rates were preferentially removed. The variant-level exclusion criteria were as follows: call rate <98%; exact Hardy-Weinberg equilibrium P < 1 × 10^-6; minor allele frequency (MAF) <1%; and non-random missingness of uncalled genotypes (PLINK --test-mishap) with Bonferroni-corrected P < 0.05. These filters left 1,882 samples and 508,748 variants. The GOPEC study obtained population controls from the National Blood Donors Cohort and the UK 1958 Birth Cohort. These samples were genotyped on the Illumina 1.2M chip and variants were called using GenCall. Strand-ambiguous markers were removed, and the standard QC procedure described above was then applied to the two control data sets. The merged control data set consisted of 5,121 samples and 860,427 variants. This control data set was merged with the case data set, resulting in 495,890 variants after quality control for 1,875 cases and 5,088 controls. SNP call rates for rs10741657, rs12785878, rs2282679 and rs6013897 were 99.8%, 99.9%, 99.8% and 98.3%, respectively.
Cases and controls were imputed together with IMPUTE2 (impute_v2.3.0) 22 and SHAPEIT 23 using the pre-phasing workflow with the 1000 Genomes Project Phase 1 reference panel (December 2013) downloaded from the IMPUTE2 website. Imputation resulted in 11,553,589 biallelic variants with MAF >0.25% that were either directly genotyped or imputed with an IMPUTE2 INFO score >0.6. This case-control dataset was densely imputed using IMPUTE2 (impute_v2.3.0) 20 and SHAPEIT2 21 using the pre-phasing workflow against the 1000 Genomes Phase 1 reference panel (Dec 2013) downloaded from the IMPUTE2 website. Three of the four genetic instruments used in the current study were imputed in the GOPEC study with a high quality score (R² > 0.98); rs2282679 was the only SNP directly genotyped. The final analysis sample consisted of 1,875 PE cases and 5,088 healthy controls. Post-imputation association analysis was carried out using SNPTEST (v2.4.1) 22 with the "expected" method, including five principal components to account for population stratification. We subsequently applied genomic control with λGC = 1.05.
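For readers unfamiliar with genomic control, the following Python sketch shows one common way to compute λGC and apply the correction. It assumes 1 degree of freedom chi-square association statistics and the usual median-based estimator; the appendix does not state exactly how the correction was applied, so the function names and the handling of standard errors are illustrative assumptions.

```python
import math
import statistics

# Median of a chi-square distribution with 1 degree of freedom
CHI2_1DF_MEDIAN = 0.4549364

def genomic_control_lambda(chi2_stats):
    """Ratio of the observed median test statistic to its expected value under the null."""
    return statistics.median(chi2_stats) / CHI2_1DF_MEDIAN

def apply_genomic_control(chi2_stats, lam):
    """Deflate test statistics by lambda (no correction is applied when lambda <= 1)."""
    lam = max(lam, 1.0)
    return [x / lam for x in chi2_stats]

def inflate_standard_error(se, lam):
    """Equivalent correction on the standard-error scale: SE * sqrt(lambda)."""
    return se * math.sqrt(max(lam, 1.0))
```

With λGC = 1.05, as reported for GOPEC, each chi-square statistic shrinks by about 5% and each standard error grows by about 2.5%.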

Gestational hypertension and pre-eclampsia
In ALSPAC, information on pre-existing hypertension was available based on self-report through a questionnaire completed at recruitment, while six trained research midwives abstracted information on all measurements of blood pressure and proteinuria that were taken as part of routine antenatal care from the women's obstetric records. There was no between-midwife variation in mean values of the data abstracted, and error rates were consistently <1% in repeated data entry checks. Blood pressure measurements were taken in the seated position with Korotkoff phase V cuff. The median (IQR) numbers of blood pressure and proteinuria measurements throughout pregnancy were 13 (11, 16) and 12 (9, 14), respectively. Gestational hypertension (GH) was defined based on systolic blood pressure ≥140 mm Hg and/or diastolic blood pressure ≥90 mm Hg on two occasions after 20 weeks' gestation among previously normotensive women. PE was defined as GH along with proteinuria of ≥1+ (≥ 300 mg/dl) on urine dipstick testing on at least two occasions after 20 weeks' gestation.
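The ALSPAC case definitions above can be expressed as a small classification rule. The Python sketch below is only an illustration under stated assumptions: the record layout (`week`, `sbp`, `dbp`, `proteinuria` as a dipstick grade) and the function name `classify_pregnancy` are hypothetical, "on two occasions" is read as at least two occasions, and hypertensive and proteinuric readings are not required to fall on the same visit, since the text does not specify this.

```python
def classify_pregnancy(visits, preexisting_hypertension=False):
    """Classify one pregnancy from its antenatal measurements.

    visits: list of dicts with keys 'week' (gestational week), 'sbp' and 'dbp'
    (mm Hg) and 'proteinuria' (dipstick grade 0-4, or None if not measured).
    """
    if preexisting_hypertension:
        # The definitions apply only to previously normotensive women
        return "pre-existing hypertension"
    late = [v for v in visits if v["week"] > 20]  # after 20 weeks' gestation
    high_bp = sum(1 for v in late if v["sbp"] >= 140 or v["dbp"] >= 90)
    protein = sum(1 for v in late if (v.get("proteinuria") or 0) >= 1)
    if high_bp >= 2 and protein >= 2:
        return "pre-eclampsia"
    if high_bp >= 2:
        return "gestational hypertension"
    return "no hypertensive disorder"
```

The definitions in the other cohorts differ in their proteinuria criteria, so the thresholds in this sketch would need adjusting before being applied elsewhere.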
In the Generation R Study, information on pre-existing hypertension was also available based on self-report through a questionnaire administered at the time of recruitment. Hypertensive disorders in pregnancy were defined by certified medical doctors who comprehensively reviewed the participants' medical charts. 22 Women were classified with GH if they had a systolic blood pressure ≥140 mmHg and/or a diastolic blood pressure ≥90 mmHg, first occurring after 20 weeks of gestation, at two time-points. These criteria and the presence of proteinuria (defined as 2 or more dipstick readings of 2 or greater, 1 catheter sample reading of 1 or greater, or a 24-hour urine collection containing at least 300 mg of protein) were used to identify women with PE.
In the MoBa PE validation study using antenatal medical records (used for MoBa3), 6 PE was defined as new-onset hypertension and proteinuria after 20 gestational weeks, with systolic blood pressure ≥140 mmHg and/or diastolic blood pressure ≥90 mmHg on at least two occasions, and proteinuria defined by urine protein ≥0.3 g/24-hour or 1+ on urine dipstick.
In the GOPEC study, the information used to define PE was exclusively from antenatal medical records. PE was defined based on new-onset hypertension after 20 gestational weeks, with systolic blood pressure ≥140 mmHg and/or diastolic blood pressure ≥90 mmHg on two occasions, in combination with proteinuria defined as ≥0.3 g/24-hour or ≥1+ on dipstick testing of urine.

Dealing with missing data in multivariable analyses
In ALSPAC, missing covariable data were imputed using multivariable multiple imputation by chained equations: 20 datasets were generated, with missing values imputed from a distribution of predicted values obtained from prediction models that included 25(OH)D, GH/PE and all covariates, together with information on the four genetic instruments. This method allows the type of regression used for each covariate to be specified. Association estimates were then obtained by combining results across these datasets using Rubin's rules. 23 Missing data in Generation R were imputed using the Fully Conditional Specification method with predictive mean matching. 24 Both of these methods assume that the data are missing at random. Distributions of observed and imputed data were consistent with each other (Supplementary Table 1).
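The pooling step can be made concrete with a short Python sketch of Rubin's rules for a single coefficient. It takes the point estimate and squared standard error from each imputed dataset (20 in ALSPAC) and returns the pooled estimate and standard error. The function name `pool_rubin` is hypothetical, and degrees-of-freedom adjustments for confidence intervals are omitted for brevity.

```python
import math

def pool_rubin(estimates, variances):
    """Pool one coefficient across m imputed datasets using Rubin's rules.

    estimates: per-dataset point estimates (e.g. log relative risks)
    variances: per-dataset squared standard errors
    """
    m = len(estimates)
    pooled = sum(estimates) / m                      # pooled point estimate
    within = sum(variances) / m                      # average within-imputation variance
    between = sum((q - pooled) ** 2 for q in estimates) / (m - 1)  # between-imputation variance
    total = within + (1 + 1 / m) * between           # total variance
    return pooled, math.sqrt(total)
```

The between-imputation term is what distinguishes this from a simple average: it widens the standard error to reflect uncertainty about the missing values themselves.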

Supplementary Tables
eTable 1 Distribution of background characteristics in the Avon Longitudinal Study of Parents and Children (ALSPAC) and the Generation R Study
a In ALSPAC, this category included individuals who were previous smokers, while in Generation R, it included individuals who smoked early in pregnancy before realizing that they were pregnant.
b Calcium and energy intake were estimated using food-frequency questionnaires in Generation R, while ionized calcium was measured during pregnancy in ALSPAC. BMI=body-mass index.

eTable 2 Distribution of background characteristics within the observed and imputed datasets in the Avon Longitudinal Study of Parents and Children (ALSPAC) and the Generation R Study
a In ALSPAC, this category included individuals who were previous smokers, while in Generation R, it included individuals who smoked early in pregnancy before realizing that they were pregnant.
b Calcium and energy intake were estimated using food-frequency questionnaires in Generation R, while ionized calcium was measured during pregnancy in ALSPAC. BMI=body-mass index; SD=standard deviation.
Measures of association obtained from multinomial logistic regression analysis.
a Adjusted for age, parity, pre-pregnancy BMI, education, smoking, calcium level/calcium intake and gestational week of blood sampling.
b Associations reflect the change in risk of the outcome per 10% decrease in 25-hydroxyvitamin D.
Approximately 15% of observations have missing information on one or more covariates in the multivariable analyses. Multiple imputation of missing covariate information was therefore conducted using chained equations, where a total of 20 imputed datasets were generated.
Measures of association obtained from multinomial logistic regression analysis. Associations reflect the additive risk of each additional copy of the risk allele associated with decreased 25-hydroxyvitamin D, and are adjusted for seven principal components to account for population stratification (ALSPAC only).

eTable 10 Causal associations of 25-hydroxyvitamin D with gestational hypertension and pre-eclampsia for each genetic instrument in a one-sample Mendelian Randomization analysis of the Avon Longitudinal Study of Parents and Children (ALSPAC) and the Generation R Study
OR=odds ratio; CI=confidence interval.
The causal association was estimated using instrumental variable probit regression, and associations reflect the change in risk per 10% decrease in 25-hydroxyvitamin D.
The associations are adjusted for gestational week of blood sampling and seven principal components to account for population stratification (ALSPAC only). The associations were estimated from ordinary logistic regression, and reflect the additive risk of each additional copy of the risk allele associated with decreased 25-hydroxyvitamin D.
The estimates are adjusted for five principal components. The associations are estimated using the Wald ratio, and reflect the change in risk per 10% decrease in 25-hydroxyvitamin D.
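The Wald ratio named in these footnotes is the SNP-outcome association divided by the SNP-exposure association. The Python sketch below illustrates it under stated assumptions: the SNP-exposure coefficient is taken to be on the natural-log 25(OH)D scale, the standard error uses a first-order approximation that ignores uncertainty in the SNP-exposure estimate, and a 10% decrease corresponds to a shift of ln(0.9) on that scale. The function names are hypothetical and no values from the study are used.

```python
import math

def wald_ratio(beta_outcome, se_outcome, beta_exposure):
    """Causal log odds ratio per unit increase in ln 25(OH)D for one SNP.

    beta_outcome: log odds ratio of the outcome per risk allele
    beta_exposure: change in ln 25(OH)D per risk allele (assumed scale)
    """
    estimate = beta_outcome / beta_exposure
    se = se_outcome / abs(beta_exposure)  # first-order approximation
    return estimate, se

def odds_ratio_per_10pct_decrease(estimate, se, z=1.96):
    """Rescale the Wald ratio to an odds ratio per 10% decrease in 25(OH)D."""
    shift = math.log(0.9)  # a 10% decrease on the natural-log scale
    odds_ratio = math.exp(estimate * shift)
    lower, upper = sorted(math.exp((estimate + k * se) * shift) for k in (-z, z))
    return odds_ratio, lower, upper
```

Because the shift is negative, the confidence limits swap order after rescaling, which is why the two bounds are sorted before being returned.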

eTable 14 The causal association between 25-hydroxyvitamin D cut-off levels and pre-eclampsia for each genetic instrument from a two-sample Mendelian Randomization of the Norwegian Mother and Child Cohort Study (MoBa) and the UK Genetics of Pre-eclampsia Study (GOPEC)
The associations are estimated using the Wald ratio.

Supplementary figures
eFigure 1 The Avon Longitudinal Study of Parents and Children (ALSPAC)
ALSPAC cohort: n=14,541
  Excluded due to multiple pregnancy: n=198
  Excluded due to no delivery details: n=69
Singleton pregnancies available for vitamin D measurements: n=14,269
Women with information on vitamin D blood levels: n=7,760
  Excluded due to no information on gestational age at blood sampling: n=6
  Excluded due to gestational age at blood sampling before 6 weeks or after 42 weeks: n=86
  Excluded due to gestational age at blood sampling higher than gestational age at delivery: n=24
Women with vitamin D measured between 6 and 42 gestational weeks: n=7,643
Women who were self-reported European who were genotyped: n=4,691
  Excluded due to no information on pregnancy complications: n=40
  Excluded due to no information on hypertension before pregnancy: n=415
  Excluded due to hypertensive disorder before pregnancy: n=170
Women with obstetric information on pregnancy complications without pre-existing hypertension: n=4,066