External validation of prognostic models to predict risk of gestational diabetes mellitus in one Dutch cohort: prospective multicentre cohort study
BMJ 2016; 354 doi: https://doi.org/10.1136/bmj.i4338 (Published 30 August 2016) Cite this as: BMJ 2016;354:i4338
- Marije Lamain-de Ruiter, PhD student1,
- Anneke Kwee, obstetrician1,
- Christiana A Naaktgeboren, assistant professor of clinical epidemiology2,
- Inge de Groot, midwife3,
- Inge M Evers, obstetrician4,
- Floris Groenendaal, neonatologist5,
- Yolanda R Hering, midwife6,
- Anjoke J M Huisjes, obstetrician7,
- Cornel Kirpestein, midwife8,
- Wilma M Monincx, obstetrician9,
- Jacqueline E Siljee, senior researcher10,
- Annewil Van ’t Zelfde, midwife11,
- Charlotte M van Oirschot, obstetrician12,
- Simone A Vankan-Buitelaar, midwife13,
- Mariska A A W Vonk, midwife14,
- Therese A Wiegers, senior researcher15,
- Joost J Zwart, obstetrician16,
- Arie Franx, professor of obstetrics1,
- Karel G M Moons, professor of clinical epidemiology2,
- Maria P H Koster, assistant professor of reproductive epidemiology1 17
- 1Department of Obstetrics, Division Woman and Baby, University Medical Centre Utrecht, KE.04.123.1, PO box 85090, 3508 AB, Utrecht, Netherlands
- 2Julius Centre for Health Sciences and Primary Care, University Medical Centre Utrecht, Utrecht, Netherlands
- 3Livive, Centre for Obstetrics, Tilburg, Netherlands
- 4Department of Obstetrics, Meander Medical Centre, Amersfoort, Netherlands
- 5Department of Neonatology, Division Woman and Baby, University Medical Centre Utrecht, Utrecht, Netherlands
- 6Department of Obstetrics, Zuwe Hofpoort Hospital, Woerden, Netherlands
- 7Department of Obstetrics, Gelre Hospital, Apeldoorn, Netherlands
- 8Department of Obstetrics, Hospital Rivierenland, Tiel, Netherlands
- 9Department of Obstetrics, St Antonius Hospital, Nieuwegein, Netherlands
- 10Centre for Infectious Diseases Research, Diagnostics and Screening (IDS), National Institute for Public Health and the Environment (RIVM), Bilthoven, Netherlands
- 11Midwifery practice Verloskundigen Amersfoort, Amersfoort, Netherlands
- 12Department of Obstetrics, St Elisabeth Hospital, Tilburg, Netherlands
- 13Midwifery practice GCM, Maarssenbroek, Netherlands
- 14Midwifery practice Het Wonder, Houten, Netherlands
- 15Netherlands Institute for health services research (NIVEL), Utrecht, Netherlands
- 16Department of Obstetrics, Deventer Hospital, Deventer, Netherlands
- 17Department of Obstetrics and Gynaecology, Erasmus MC, University Medical Centre Rotterdam, Rotterdam, Netherlands
- Correspondence to: M Lamain-de Ruiter
- Accepted 19 July 2016
Objective To perform an external validation and direct comparison of published prognostic models for early prediction of the risk of gestational diabetes mellitus, including predictors applicable in the first trimester of pregnancy.
Design External validation of all published prognostic models in a large scale, prospective, multicentre cohort study.
Setting 31 independent midwifery practices and six hospitals in the Netherlands.
Participants Women recruited in their first trimester (<14 weeks) of pregnancy between December 2012 and January 2014, at their initial prenatal visit. Women with pre-existing diabetes mellitus of any type were excluded.
Main outcome measures Discrimination of the prognostic models was assessed by the C statistic, and calibration assessed by calibration plots.
Results 3723 women were included for analysis, of whom 181 (4.9%) developed gestational diabetes mellitus in pregnancy. 12 prognostic models for the disorder could be validated in the cohort. C statistics ranged from 0.67 to 0.78. Calibration plots showed that eight of the 12 models were well calibrated. The four models with the highest C statistics included almost all of the following predictors: maternal age, maternal body mass index, history of gestational diabetes mellitus, ethnicity, and family history of diabetes. Prognostic models had a similar performance in a subgroup of nulliparous women only. Decision curve analysis showed that the use of these four models always had a positive net benefit.
Conclusions In this external validation study, most of the published prognostic models for gestational diabetes mellitus show acceptable discrimination and calibration. The four models with the highest discriminative abilities in this study cohort, which also perform well in a subgroup of nulliparous women, are easy models to apply in clinical practice and therefore deserve further evaluation regarding their clinical impact.
In the field of obstetrics, the number of publications on prognostic models has more than tripled in the past decade,1 which reflects an increasing interest in risk based medicine. Risk based medicine aims to provide the most appropriate care to each patient, often guided by outcome risk estimates based on individual patient characteristics, test results, or even genetic information.2
As a result of the obesity pandemic, the incidence of gestational diabetes mellitus, notably occurring in the second or third trimester, is rising and is increasingly contributing to perinatal complications such as macrosomia, shoulder dystocia, caesarean section, and neonatal hypoglycaemia.3 4 Moreover, long term sequelae of gestational diabetes mellitus are type 2 diabetes in mothers and obesity in their offspring.5 6 Early diagnosis and treatment of gestational diabetes mellitus have been proven to improve pregnancy outcomes.7 8 Some guidelines propose a population strategy for diagnosing the disorder9 10 11 12 (that is, an oral glucose tolerance test) in all pregnant women, whereas others opt for a high risk strategy,13 which tests for gestational diabetes mellitus only in women with known risk factors. Both strategies include oral glucose tolerance tests in substantial numbers of women, most of which will lead to negative results, and therefore place too high a burden on patients as well as on health care resources.14 Accurate prognostic models for the risk of patients developing gestational diabetes mellitus early in pregnancy could discriminate between high risk and low risk pregnancies, and move towards more tailored care in pregnancy. In particular, this tailored care could result in fewer women undergoing a burdensome diagnostic test (that is, women with a predicted low risk for gestational diabetes mellitus not having to undergo an oral glucose tolerance test).
Several prognostic models for gestational diabetes mellitus have been developed. However, these prognostic models are not commonly used in routine clinical care nor are they recommended by current guidelines. This might be because external validations of these prognostic models are scarce,15 16 17 18 and the models have never all been directly evaluated and compared on the basis of their predictive accuracy in one independent cohort by independent investigators. To acquire a fair comparison of their predictive accuracy, and thus of their clinical value, it is essential to perform a head-to-head comparison of all published prognostic models in one independent cohort.19 20 21 Thus, the aim of our study was to perform external validation and direct head-to-head comparison of all published first trimester prognostic models for gestational diabetes mellitus, in one independent cohort.
Study population of external validation cohort
We performed a large prospective multicentre cohort study (the Risk EStimation for PrEgnancy Complications to provide Tailored care (RESPECT) study) to validate prognostic models for gestational diabetes mellitus. From December 2012 to January 2014, we included pregnant women at their initial prenatal visit (<14 weeks of pregnancy) in 31 independent midwifery practices (primary care) and six hospitals (secondary or tertiary care) in the central region of the Netherlands. We excluded women with any type of pre-existing diabetes mellitus from the cohort. During their pregnancies, participants received routine antenatal care according to Dutch clinical guidelines.
This study was approved by the medical ethics committee of the University Medical Centre Utrecht (protocol no 12-432/C) and written informed consent was obtained from all participants. Results have been reported to conform with the TRIPOD statement.22 23
Predictors for gestational diabetes mellitus were all measured in the first trimester at the initial prenatal visit by caregivers or via a self-administered questionnaire. Supplementary appendix A provides detailed information on predictor definition and measurement. We did not report on the distributions of predictors among the original studies that we validated, because this information was often missing in the original publications.
Gestational diabetes mellitus was diagnosed by a 75 g, two hour, oral glucose tolerance test between 24 and 28 weeks of gestation. According to the World Health Organization 1999 guidelines, which conform with the Dutch national recommendation, the disorder is indicated by the presence of either a fasting glucose level of at least 7.0 mmol/L (126 mg/dL) or a glucose level of at least 7.8 mmol/L (140 mg/dL) after two hours.24 25 Women were offered an oral glucose tolerance test if risk factors or any signs of gestational diabetes mellitus were present. Without risk factors or signs, women were not tested and considered as not having gestational diabetes mellitus.
Body mass index in first trimester greater than 30, history of gestational diabetes mellitus, history of macrosomia (birthweight above 95th centile of the Dutch population),26 family history of diabetes mellitus (first degree), non-western ethnicity, history of unexplained intrauterine fetal death, and polycystic ovary syndrome were considered as risk factors for gestational diabetes mellitus. Polyhydramnios and macrosomia were considered as possible signs of the disorder.
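As an illustration only, the diagnostic thresholds quoted above can be written as a small rule. The following Python sketch is not part of the study's analysis code; the function name `meets_who_1999_gdm_criteria` and its arguments are hypothetical, and glucose values are assumed to be in mmol/L.

```python
def meets_who_1999_gdm_criteria(fasting_mmol_l, two_hour_mmol_l):
    """Return True if either threshold quoted in the text is reached.

    Thresholds (from the text): fasting glucose >= 7.0 mmol/L, or
    glucose >= 7.8 mmol/L two hours after a 75 g oral glucose load.
    """
    return fasting_mmol_l >= 7.0 or two_hour_mmol_l >= 7.8


# Hypothetical values for illustration, not study data.
print(meets_who_1999_gdm_criteria(5.1, 8.2))  # True: two hour value reaches 7.8
print(meets_who_1999_gdm_criteria(5.1, 6.9))  # False: neither threshold reached
```

Note that in the cohort this rule would apply only to women who were actually offered the test; untested women were classified as not having the disorder.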
For studies validating prognostic models, there is no solid sample size recommendation, but a minimum of 100 patients with events and at least 100 patients without events has been suggested.27
Selection of prognostic models for external validation
In a previous systematic review, we identified 14 published prognostic models for gestational diabetes mellitus that can apply to the first trimester of pregnancy and that only consist of routine measures that are easy to obtain (unpublished data). A short summary of this systematic review is provided in supplementary appendix B.
For proper external validation, it is preferable that the exact definitions of the predictors included in the 14 prognostic models under validation are known as well as how they were measured. Although five of the author groups of the publications of these prognostic models were contacted by email for additional information on intercepts, predictor weights (regression coefficients), and definitions of predictors in the model, none of them responded. Despite this lack of information, it was still possible to include four of these five models in our head-to-head validation study, leaving one model that had to be excluded owing to missing intercept and coefficients.28 Moreover, we excluded one prognostic model from analysis for using maternal abdominal circumference and diagnosis of polycystic ovary syndrome as predictors, which we did not collect in our validation cohort and for which there was also no proxy variable available.29 Thus, a total of 12 prognostic models remained for external validation in the current study.30 31 32 33 34 35 36 37 38 39 40 41 Supplementary appendix C shows the equations of these 12 prognostic models as applied in our cohort. Table 1 30 31 32 33 34 35 36 37 38 39 40 41 also summarises predictors that were included in the prognostic models. None of the authors of the current external validation study was involved in the development of any of these models.
Predictor and outcome information was missing for some patients in the validation cohort, and these data were not missing completely at random, as can be derived from table 2. To avoid biased validation of the models, we imputed the missing values using multiple imputation.42 All possible predictors and outcomes were used in the imputation model. Ten imputations were performed. Results shown are based on the multiply imputed data, unless otherwise specified.
Firstly, we applied the original prognostic models—that is, exactly as they were published—to our study cohort when the full prediction rule, including its intercept, was available (supplementary appendix C). Next, to allow for fair comparison of the prognostic models, we performed logistic recalibration by fitting logistic regression models using the linear predictor as the only covariate. This step resulted in an updated calibration slope and intercept.43 44
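The recalibration step described above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the authors' R code: the function names are hypothetical, the model is fitted by plain Newton-Raphson without safeguards, and the toy data are invented. It fits a logistic regression with the linear predictor of the original model as the only covariate and returns the updated intercept and slope.

```python
import math


def sigmoid(z):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-z))


def logistic_recalibration(linear_predictor, outcome, n_iter=25):
    """Fit logit P(y = 1) = a + b * lp by Newton-Raphson; return (a, b).

    linear_predictor: linear predictor of the original model per woman
    outcome: 1 if gestational diabetes mellitus developed, else 0
    Assumes the data are not perfectly separated, so that the maximum
    likelihood estimate exists.
    """
    a, b = 0.0, 1.0  # start from "no recalibration needed"
    for _ in range(n_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for lp, y in zip(linear_predictor, outcome):
            p = sigmoid(a + b * lp)
            w = p * (1.0 - p)
            g0 += y - p            # score for the intercept
            g1 += (y - p) * lp     # score for the slope
            h00 += w               # information matrix entries
            h01 += w * lp
            h11 += w * lp * lp
        det = h00 * h11 - h01 * h01
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a, b


# Invented toy data, four women at each of four linear predictor values.
toy_lp = [-3, -3, -3, -3, -2, -2, -2, -2, -1, -1, -1, -1, 0, 0, 0, 0]
toy_y = [0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 1, 0, 1, 1, 0, 1]
intercept, slope = logistic_recalibration(toy_lp, toy_y)
print(intercept, slope)
```

Because recalibration is a monotone transformation of the linear predictor (for a positive slope), it changes the predicted probabilities but not the ranking of women, and hence not the C statistic.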
We assessed discrimination using Harrell’s C statistic, which is equivalent to the area under the receiver operating characteristic curve.45 It verifies whether participants with a higher predicted risk for gestational diabetes mellitus are indeed more likely to have the disease.
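For a binary outcome, the C statistic can be computed directly from its definition as the proportion of case and non-case pairs in which the case has the higher predicted risk, with ties counted as one half. The Python sketch below is illustrative only (the function name `c_statistic` and the data are hypothetical); it uses a simple pairwise loop, which is adequate for small examples but slow for large cohorts.

```python
def c_statistic(predicted, outcome):
    """Concordance (area under the ROC curve) for a binary outcome."""
    cases = [p for p, y in zip(predicted, outcome) if y == 1]
    non_cases = [p for p, y in zip(predicted, outcome) if y == 0]
    if not cases or not non_cases:
        raise ValueError("need at least one case and one non-case")
    concordant = 0.0
    for p_case in cases:
        for p_non in non_cases:
            if p_case > p_non:
                concordant += 1.0   # case ranked above non-case
            elif p_case == p_non:
                concordant += 0.5   # tie counts as half
    return concordant / (len(cases) * len(non_cases))


# Invented predictions for four women, two of whom developed the disorder.
print(c_statistic([0.10, 0.40, 0.35, 0.80], [0, 0, 1, 1]))  # 0.75
```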
Calibration of the validated original and logistically recalibrated models was assessed by calculation of the predicted probabilities of gestational diabetes mellitus for each individual and comparison of these with their observed outcomes in calibration plots. When a model is well calibrated, the predicted probabilities equal the observed proportions for all groups of predicted probabilities. Thus, when a model is well calibrated, the calibration plot has an intercept of 0 and a slope of 1 and all groups (normally 10 groups) of predicted probabilities fit close to this line. Some calibration plots, however, have fewer than 10 points because it was not possible to split the predicted probabilities into 10 groups. This was the case for models that included only a few categorical variables (eg, sum score models) in which a limited number of predicted probabilities (<10) were possible.
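The points of such a calibration plot can be derived by sorting women on predicted probability, splitting them into groups of roughly equal size, and comparing the mean predicted probability with the observed proportion in each group. The Python sketch below shows one way to do this; the function name `calibration_groups`, the equal-size grouping rule, and the data are assumptions for illustration and may differ from the grouping used for the published plots.

```python
def calibration_groups(predicted, outcome, n_groups=10):
    """Return (mean predicted, observed proportion) per risk group."""
    pairs = sorted(zip(predicted, outcome))
    n = len(pairs)
    points = []
    for g in range(n_groups):
        chunk = pairs[g * n // n_groups:(g + 1) * n // n_groups]
        if not chunk:
            continue  # fewer women than groups: skip empty groups
        mean_predicted = sum(p for p, _ in chunk) / len(chunk)
        observed = sum(y for _, y in chunk) / len(chunk)
        points.append((mean_predicted, observed))
    return points


# Invented example with two groups; a well calibrated model would give
# points close to the line observed = predicted.
print(calibration_groups([0.1, 0.1, 0.3, 0.3], [0, 0, 1, 0], n_groups=2))
```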
The calibration intercept and slope of the linear prediction after recalibration were used to assess overestimation and underestimation of the models as well as overfitting. A calibration intercept of less than 0 indicates overestimation (the predictions from the original models are too high), whereas an intercept of greater than 0 indicates underestimation. A calibration slope of less than 1 indicates overfitting of the original prognostic models.46
A history of gestational diabetes mellitus is an important predictor in most models, but is always scored as negative (=zero) in the prognostic algorithms for nulliparous women, owing to them not having had a previous pregnancy. Therefore, we also reassessed discrimination and calibration of all 12 logistically recalibrated models in a subgroup analysis of nulliparous women to see if the results of the whole population were not merely the result of an excellent prediction of gestational diabetes mellitus in multiparous women (that is, women with a known history of the disorder).
Finally, decision curve analysis was performed for the prognostic models with the best discriminative abilities.47 Such analyses provide insight into the range of predicted risks for which the model has a higher net benefit than simply either classifying all patients as having the outcome or no (zero) patients as having the outcome. Decision curve analysis can also compare the net benefits of models.
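In decision curve analysis, the net benefit at a probability threshold pt is usually defined as the proportion of true positives minus the proportion of false positives weighted by pt/(1 − pt). The Python sketch below illustrates this definition together with the "classify all as positive" reference strategy; the "classify none" strategy has a net benefit of zero by definition. Function names and data are hypothetical, and the sketch is not the code used for figure 5.

```python
def net_benefit(predicted, outcome, threshold):
    """Net benefit of classifying women with predicted >= threshold as high risk."""
    n = len(outcome)
    true_pos = sum(1 for p, y in zip(predicted, outcome) if p >= threshold and y == 1)
    false_pos = sum(1 for p, y in zip(predicted, outcome) if p >= threshold and y == 0)
    return true_pos / n - false_pos / n * threshold / (1.0 - threshold)


def net_benefit_classify_all(outcome, threshold):
    """Net benefit of classifying every woman as high risk."""
    prevalence = sum(outcome) / len(outcome)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


# Invented data: at a 20% threshold the model beats "classify all".
toy_pred = [0.05, 0.10, 0.30, 0.60]
toy_out = [0, 0, 1, 1]
print(net_benefit(toy_pred, toy_out, 0.20))
print(net_benefit_classify_all(toy_out, 0.20))
```

A decision curve is obtained by evaluating these functions over a range of thresholds and plotting net benefit against threshold.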
All analyses were done on each of the multiple imputed datasets, and Rubin’s rules were used to combine the results into summary estimates. Analyses were performed with the mice and rms packages of R-3.1 for Windows (http://cran.r-project.org).
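Rubin's rules combine the estimates from the imputed datasets by averaging them and adding the between-imputation variance, inflated by a factor (1 + 1/m), to the average within-imputation variance. The Python sketch below illustrates the rules for a single scalar estimate; it is not the authors' R code, the function name `pool_rubin` is hypothetical, and the numbers are invented. Whether an estimate is pooled on its original scale or after transformation is a separate choice not addressed here.

```python
def pool_rubin(estimates, variances):
    """Pool m estimates and their variances with Rubin's rules.

    Returns (pooled estimate, total variance).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two estimates with matching variances")
    pooled = sum(estimates) / m
    within = sum(variances) / m
    between = sum((q - pooled) ** 2 for q in estimates) / (m - 1)
    total = within + (1.0 + 1.0 / m) * between
    return pooled, total


# Invented estimates from three imputed datasets (the study used ten).
print(pool_rubin([0.74, 0.76, 0.75], [0.0004, 0.0004, 0.0004]))
```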
The HELLP Foundation, a Dutch patient confederation for patients who had a pregnancy complicated by hypertensive disorders, was involved in defining the research question and the design of the study. Pregnant women were not involved in defining the outcome measures. Focus group interviews with pregnant women participating in the RESPECT cohort were organised to discuss the results and how these results should be implemented into routine care. The final results will be disseminated on the internet through the websites of midwifery practices and regional midwifery collaboration associations.
Of 3723 women included for analysis, 1655 (44%) were nulliparous (fig 1). Table 2 shows the baseline characteristics of these women. Gestational diabetes mellitus was diagnosed in 181 (4.9%) women, 33 (18%) of whom needed insulin for glycaemic control. In the nulliparous subgroup, 71 women (4.3%) developed gestational diabetes mellitus.
Calibration of prognostic models
Three original publications provided the full prediction rule (Gabbay-Benziv 2014, Savona-Ventura 2013, and Van Leeuwen 2010), of which two models showed good calibration, as evidenced both by the calibration plots and the calibration slopes and intercepts (Gabbay-Benziv 2014, Van Leeuwen 2010; fig 2). The model of Savona-Ventura 2013 had poor calibration, probably partly due to overfitting (slope <1), and tended to overestimate the risk of gestational diabetes mellitus (intercept <0). The calibration plot in figure 2 illustrates how the predicted risks are higher than the observed risks.
Calibration plots were also drawn for each recalibrated model, including those for which the original intercept was missing (fig 3). Eight of 12 recalibrated models showed good calibration, with the calibration line closely following the ideal calibration line. The models by Nanda 2011, Pintaudi 2014, Shirazian 2009, and Teede 2011 had sporadic overestimation and underestimation.
C statistics for the original and recalibrated models ranged from 0.67 to 0.78 (table 3). As expected, C statistics of the recalibrated models were slightly lower than those reported in the original development populations. The four models with the highest C statistic (Gabbay-Benziv 2014, Nanda 2011, Teede 2011, and Van Leeuwen 2010) included maternal age, body mass index, history of gestational diabetes mellitus, ethnicity, and family history of diabetes mellitus as predictors. The poorest discriminating models were the models containing the fewest predictors (Eleftheriades 2014, Savona-Ventura 2013, and Tran 2013).
In four prognostic models (Gabbay-Benziv 2014, Nanda 2011, Naylor 1997, and Teede 2011), discrimination for nulliparous women was worse than that for the overall population (table 3). For all other models, the C statistic was higher for nulliparous women than for all women. Calibration of the prognostic models was also acceptable to good in the nulliparous subgroup (fig 4).
Decision curve analysis
The decision curve analysis results of the four most discriminating models (Gabbay-Benziv 2014, Nanda 2011, Teede 2011, and Van Leeuwen 2010) are shown in figure 5. For predicted probability thresholds between 0% and 40%, each of the four prognostic models showed a positive net benefit.
A total of 12 first trimester prognostic models for gestational diabetes mellitus were selected by comprehensive review of literature and were compared head-to-head for their predictive accuracy in our population based cohort of 3723 women. Two prognostic models overall (Teede 2011 and Van Leeuwen 2010) had the best performance based on discrimination, calibration, and performance in the nulliparous subgroup. Predictors in these particular prognostic models (that is, maternal age, body mass index, ethnicity, parity, history of gestational diabetes mellitus, and history of macrosomia) are easy to measure and widely applicable. Calibration was good for all models and improved by recalibration of the models to our population. Although obstetric history is an important predictor in most models, the prognostic models for gestational diabetes mellitus also performed well in nulliparous women.
Strengths and limitations
This external validation study comprises almost all published, first trimester, prognostic models for gestational diabetes mellitus in one cohort study, allowing for head-to-head comparison of these models. Our study had a large sample size and many cases of gestational diabetes mellitus, used a prospective multicentre approach, and included an unselected population of women from primary care (low risk) as well as secondary or tertiary care (high risk) within a geographically defined area. Additionally, missing data were handled by multiple imputation, which is the most preferable method.48
However, some limitations of our study need to be addressed. Firstly, according to Dutch guidelines, a high risk strategy was adhered to. To prevent unnecessary testing in study participants, women without predefined risk factors only underwent an oral glucose tolerance test in case of any symptoms of gestational diabetes mellitus. This strategy could have led to an underestimation of the disorder in low risk women. A study on the performance of similar strategies estimated that 7.3% of diagnoses of gestational diabetes mellitus may have been missed.49
However, this possible underestimation is unlikely to have affected the discriminative ability of the validated models, because the C statistic is a rank order statistic that is insensitive to systematic errors in calibration such as differences in outcome incidence.50 Moreover, this potential underestimation is also unlikely to have affected our inferences on the predictive accuracy of the models because we recalibrated the models, which accounts for any differences in overall incidence of gestational diabetes mellitus between the original model development studies and our external validation study. Thus, regardless of any known differences in disease incidence, it is important that prognostic models are recalibrated before they are applied to a new population or when different diagnostic criteria are used.
Secondly, we were not able to include two published prognostic models in our external validation. For one model, information on the prediction rule was not available despite contacting the authors. The other study was published after the start of data collection for our validation cohort, and information on some predictors (such as maternal abdominal circumference and presence of polycystic ovary syndrome) was not collected.
Comparison with other studies
Validation studies on prognostic models for gestational diabetes mellitus are scarce. Our study differs from previous validation studies15 16 17 18 by performing a head-to-head comparison, whereas other studies validated a single prognostic model or only a small selection of the prognostic models for gestational diabetes mellitus. However, our findings are similar to those of these external validation studies, except for the external validation of the Van Leeuwen 2010 model by Lovati and colleagues.15 In that study, the Van Leeuwen 2010 model yielded a poor C statistic (0.60), by contrast with the other external validation studies (including our current study) that showed C statistics between 0.74 and 0.77.17 38 This difference might be due to the case-control study design chosen by Lovati and colleagues, in which it is not possible to adjust for observed outcome frequency.
Clinical implications and conclusions
The use of accurate first trimester prognostic models for gestational diabetes mellitus allows for early and personalised risk stratification in pregnancy. This approach is in contrast to current strategies in which an oral glucose tolerance test is either performed in all women9 10 11 12 or performed in the presence of any prespecified risk factors.13 Prognostic models have the advantage of being cheap and easy to implement and could avoid the need to perform an oral glucose tolerance test in women with a low risk of developing gestational diabetes mellitus, which relieves both burden and costs.9 10 A comparison of performance between the best discriminating prognostic models and current strategies allows an approach that weighs up the pros and cons (eg, missed cases), and will help choose the model to be implemented into clinical practice. The decision on which of the four best models to implement in clinical practice might also depend on population characteristics, availability of predictors, and the incidence of gestational diabetes mellitus, and could therefore be country or region specific.
Implementation of prognostic models for gestational diabetes mellitus early in pregnancy provides room for preventive measures—that is, lifestyle modification interventions such as diet and exercise counselling.51 52 Although drug treatments should probably not be the preferred primary intervention to prevent gestational diabetes mellitus, metformin could have a role in the prevention of the disorder in high risk populations.53 54 Early prevention, screening, diagnosis, and treatment of gestational diabetes mellitus, when necessary, can and will most likely reduce the rates of caesarean section, neonatal hypoglycaemia and macrosomia, and long term neonatal complications.4
In conclusion, most of the 12 previously published prognostic models for gestational diabetes mellitus that have been validated in this study show acceptable to good discrimination and calibration. Four models had C statistics of at least 0.75 (Gabbay-Benziv 2014, Nanda 2011, Teede 2011, and Van Leeuwen 2010). We recommend that these four models be further investigated for implementation in clinical practice. The models by Teede and colleagues and Van Leeuwen and colleagues, with the best overall performance, are easy to apply in clinical practice. The models consist of straightforward predictors: maternal age, body mass index, ethnicity, parity, history of gestational diabetes mellitus, and history of macrosomia. Once prognostic models for gestational diabetes mellitus are applied in routine clinical care, further research is recommended on the effects on clinical impact, actual development of the disorder, and subsequent pregnancy outcomes.
What is already known on this topic
Gestational diabetes mellitus is an increasingly common complication of pregnancy, and pregnancy outcomes can be improved through early screening, diagnosis, and treatment
Many prognostic models estimating the risk of gestational diabetes mellitus have been developed, but an external validation and direct comparison in an independent large cohort of all published models is lacking
What this study adds
This external validation study shows that in a direct comparison, most published prognostic models for gestational diabetes mellitus in the first trimester have an acceptable discrimination and good calibration
The best performing models in this group can be considered for implementation in routine clinical care
We thank all the pregnant women who participated in the RESPECT study.
Contributors: MPHK, AK, AF, KGMM, and the RESPECT study group (IdG, IME, FG, YRH, AJMH, CK, WMM, JES, AV’tZ, CMvO, SAV-B, MAAWV, TAW, JJZ) had the original idea for the study and were involved in writing the original study protocol. The RESPECT study group and ML-dR were involved in data collection. CAN and ML-dR performed data analysis. ML-dR, CAN, and MPHK wrote the first draft of the manuscript, which was subsequently revised by AF, AK, and KGMM. All authors participated in the final approval of the manuscript. MPHK and AF are the guarantors of this study. All authors had full access to all of the data (including statistical reports and tables) in the study and take responsibility for the integrity of the data and the accuracy of the data analysis.
Funding: This study has been conducted with the support of the Netherlands Organisation for Health Research and Development (project no 50-50200-98-060). The funding source had no role in the design, conduct, analyses, or reporting of the study or in the decision to submit the manuscript for publication.
Competing interests: All authors have completed the ICMJE uniform disclosure form at www.icmje.org/coi_disclosure.pdf and declare: support from the Netherlands Organisation for Health Research and Development for the submitted work; no relationships with companies that might have an interest in the submitted work in the previous three years; no other relationships or activities that could appear to have influenced the submitted work.
Ethical approval: This study was approved by the medical ethics committee of the University Medical Centre Utrecht (protocol no 12-432/C) and written informed consent was obtained from all participants.
Data sharing: Patient level data and full dataset and technical appendix and statistical code are available from the corresponding author (email@example.com). Informed consent was not obtained but the presented data are anonymised and the risk of identification is low.
The lead author affirms that the manuscript is an honest, accurate, and transparent account of the study being reported; that no important aspects of the study have been omitted; and that any discrepancies from the study as planned (and, if relevant, registered) have been explained.
This is an Open Access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 3.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/3.0/.