Introduction

Lifestyle and drug interventions in individuals at high risk of type 2 diabetes mellitus have been shown to prevent or delay the occurrence of type 2 diabetes [1, 2]. More specifically, lifestyle intervention was most effective among individuals with a high score on the Finnish diabetes risk tool that predicts the occurrence of type 2 diabetes [3]. Therefore, detection of high-risk individuals using a risk score may be relevant to identify those who may benefit most from interventions.

Next to existing risk scores to detect undiagnosed diabetes [4–6], several investigators have developed risk scores that predict incident diabetes [7–15]. Some of these risk scores include genetic information [7], laboratory measurements [8, 9] or systolic blood pressure [10] and are therefore applicable in clinical practice only. Others contain readily available information that is suitable for self-report questionnaires [11–15]. The questionnaire approach has been used as a first step in risk assessment or screening programmes because it does not require blood testing and is therefore simple, short and inexpensive [16]. The Finnish diabetes risk score is a widely applied risk score in Europid populations. The Finnish risk questionnaire has been shown to predict drug-treated type 2 diabetes in Finland [11] and type 2 diabetes as diagnosed by an OGTT in a Dutch population [17].

However, the Finnish risk questionnaire was developed to predict drug-treated type 2 diabetes without screening with an OGTT. It therefore seems appropriate to update it using a combined endpoint of clinically diagnosed (not only drug-treated) and screen-detected type 2 diabetes [18]. It is possible that the current risk questionnaire poorly identifies people whose diabetes would not be diagnosed and treated in clinical practice. For example, this may concern those with isolated elevation of post-load glucose levels, because an OGTT is usually not applied in general practice. It has been shown that there is limited overlap between fasting and post-load glucose levels; however, post-load glucose levels are strongly associated with cardiovascular risk [19]. Furthermore, the Finnish investigators had no information on family history of type 2 diabetes, hip circumference or smoking, which are potentially important predictors [20–22]. Updating an existing prediction tool instead of developing a new model may be a good alternative with the advantage of combining prior information on predictors with new data [23].

The aim of the present study was to update the Finnish diabetes risk questionnaire for the prediction of future clinically diagnosed and screen-detected type 2 diabetes and to consider additional predictors in an international population.

Methods

Study population

The Evaluation of Screening and Early Detection Strategies for Type 2 Diabetes and Impaired Glucose Tolerance (DETECT-2) project is an international data-pooling collaboration addressing issues related to screening for type 2 diabetes [24]. Minimum requirements for DETECT-2 participation were (1) population-based surveys of large cohorts (n > 500 people); (2) with all participants without known diabetes having a 75 g OGTT to classify type 2 diabetes according to the WHO 2006 criteria [18].

For the present analysis, all centres in the DETECT-2 collaboration with published incidence rates of type 2 diabetes were asked for baseline and follow-up information on oral glucose tolerance status. A total of six centres, located in Europe (Hoorn Study [25], Inter99 [26], The Northern Sweden MONICA Study [27] and Whitehall-II [28]), Australia (AusDiab [29]) and Africa (Mauritius [30]), were included in the present analysis. Participation rates at the follow-up examination ranged from 50% in NSW-MONICA to 75% in Inter99.

Of the 23,226 persons with information on glucose tolerance status at follow up, we excluded 1,334 patients with known diabetes at baseline, 985 patients with screen-detected diabetes at baseline and 343 persons with missing or incomplete information on glucose status at baseline, leaving 20,564 participants for the present analysis. Participants provided informed consent and the studies have been approved by the relevant local ethics committees in accordance with the Declaration of Helsinki (revision 2000).

Incidence of type 2 diabetes

Incident type 2 diabetes was defined as self-reported clinically diagnosed type 2 diabetes and/or screen-detected type 2 diabetes at the follow-up examination, as determined by a standard 75 g OGTT according to the WHO 2006 criteria [18]. To reach comparable follow-up periods for each centre, the follow-up period for each individual was truncated at 5 years after baseline and the assumption was made that the date of type 2 diabetes diagnosis was in the middle of the follow-up period. Hence, incident cases are all persons who developed type 2 diabetes within 5 years after the baseline examination.

The Finnish diabetes risk score in DETECT-2

The Finnish diabetes risk score predicts the incidence of drug-treated type 2 diabetes within 10 years. The variables of the Finnish diabetes risk score are: age (in categories <45; 45–54; 55–64 and ≥65 years), BMI (in categories <25; 25–29.99 and ≥30 kg/m2), waist circumference (in categories for men: <94; 94–101.99 and ≥102 cm and for women: <80; 80–87.99 and ≥88 cm), use of blood pressure medication and history of high blood glucose [11]. Family history of diabetes was proposed as an item of the original Finnish risk score but the scores for parent or sibling diabetes were based on literature reports, and no data were available to verify the attributed score [11].
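
The category boundaries listed above can be written out as a minimal sketch. The function names and return labels are hypothetical and chosen only for illustration; the score points attached to each category are not reproduced here, and placing a BMI of exactly 25 kg/m2 in the middle category is an assumption.

```python
def age_category(age_years):
    """Return the age band used by the risk score (labels are illustrative)."""
    if age_years < 45:
        return "<45"
    if age_years < 55:
        return "45-54"
    if age_years < 65:
        return "55-64"
    return ">=65"


def bmi_category(bmi):
    """Return the BMI band in kg/m2; a BMI of exactly 25 is assumed to fall in the middle band."""
    if bmi < 25:
        return "<25"
    if bmi < 30:
        return "25-29.99"
    return ">=30"


def waist_category(waist_cm, sex):
    """Return 0, 1 or 2 for the low, middle or high sex-specific waist band."""
    # Sex-specific thresholds in cm, as listed in the text.
    thresholds = {"male": (94, 102), "female": (80, 88)}
    low, high = thresholds[sex]
    if waist_cm < low:
        return 0
    if waist_cm < high:
        return 1
    return 2


# Example: a 58-year-old woman with BMI 27.4 kg/m2 and waist 88 cm.
print(age_category(58), bmi_category(27.4), waist_category(88, "female"))
```

Keeping the sex-specific thresholds in one lookup table makes it easy to check them against the published categories.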

In the present analysis, if information on the use of blood pressure medication was missing then we set this variable to ‘no’. Information on history of high blood glucose was limited to the history of gestational diabetes. Multiple imputation by chained equations was applied to fill in missing values on age (0.03%), BMI (0.4%), waist circumference (0.6%) and history of gestational diabetes (missing in four of the six centres = 65% of the study population), resulting in five imputed datasets. Regression coefficients and standard errors of the predictors of type 2 diabetes in the five imputed datasets were comparable to each other and to the pooled estimates after applying Rubin’s rules, which means that imputation variation was low. Therefore, we used one randomly chosen imputed dataset for further analysis.
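
For readers unfamiliar with Rubin’s rules, the pooling step can be sketched as follows. This is a generic illustration of the standard formulas (pooled estimate = mean of the per-dataset estimates; total variance = within-imputation variance + (1 + 1/m) × between-imputation variance), not the authors’ code; the function name and the numbers in the example are hypothetical.

```python
from statistics import mean, variance


def pool_rubin(estimates, standard_errors):
    """Pool one regression coefficient across m imputed datasets (Rubin's rules).

    Returns the pooled estimate and its pooled standard error.
    """
    m = len(estimates)
    if m < 2 or m != len(standard_errors):
        raise ValueError("need matching estimates and standard errors from >= 2 datasets")
    pooled = mean(estimates)
    within = mean([se ** 2 for se in standard_errors])  # average sampling variance
    between = variance(estimates)                       # sample variance (m - 1 denominator)
    total = within + (1 + 1 / m) * between
    return pooled, total ** 0.5


# Hypothetical coefficients for one predictor in five imputed datasets.
est, se = pool_rubin([0.50, 0.52, 0.48, 0.51, 0.49], [0.10] * 5)
print(round(est, 3), round(se, 4))
```

When the between-imputation variance is small relative to the within-imputation variance, as the text reports, the pooled standard error is close to that of any single imputed dataset.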

Updating

The Finnish diabetes risk scores were re-estimated by multiple logistic regression analysis with all variables of the Finnish model.

Thereafter, relevant predictors (identified by a literature search) were added one by one to the multiple logistic regression model. These included sex, smoking, hip circumference (in sex-specific tertiles), use of lipid-lowering medication and family history of diabetes (parent or sibling or both). Finally, in additional analyses, we assessed an item of the Rose questionnaire [31]: reported ‘shortness of breath when walking with people of the same age’ as a potential predictor that was available in two centres. Except for potential interactions with study region, interactions were not considered because we aimed to develop a risk score that was internationally applicable and adding an interaction term would hamper the practical feasibility of the model. We investigated whether these variables improved the discriminative value of the model. The discriminative value was assessed using the area under the receiver-operating characteristic (ROC) curves, plotting sensitivity against 1 – specificity. The area under the ROC curve is a global summary statistic of the discriminative value of a model, describing the probability that the score is higher in an individual developing type 2 diabetes than in an individual not developing type 2 diabetes. The difference between models in area under the ROC curve was tested by a method described by Hanley and McNeil [32]. To test model goodness-of-fit we performed the Hosmer and Lemeshow test [33]. We also calculated the net reclassification improvement of the models that included additional predictors compared with the model without the additional predictors according to the method of Pencina et al. [34]. For that procedure, three risk categories were chosen a priori: low risk (<10%), intermediate risk (10–20%) and high risk (>20%).
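
The interpretation of the area under the ROC curve as a probability can be made concrete with a small sketch. It compares every case with every non-case and counts how often the case has the higher score, with ties counted as one half; the function name and the example scores are hypothetical, and this brute-force form is for illustration only rather than the procedure used in the analysis.

```python
def concordance_auc(scores_cases, scores_noncases):
    """Probability that a randomly chosen case scores higher than a randomly chosen non-case."""
    if not scores_cases or not scores_noncases:
        raise ValueError("both groups must be non-empty")
    concordant = 0.0
    for case_score in scores_cases:
        for noncase_score in scores_noncases:
            if case_score > noncase_score:
                concordant += 1.0
            elif case_score == noncase_score:
                concordant += 0.5  # ties count as half
    return concordant / (len(scores_cases) * len(scores_noncases))


# Hypothetical risk scores for three cases and three non-cases.
print(concordance_auc([3, 5, 7], [1, 3, 4]))
```

A value of 0.5 corresponds to a score that discriminates no better than chance, and 1.0 to perfect separation of cases and non-cases.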

Sensitivity analyses

In sensitivity analyses, we assessed the ability of the model to detect undiagnosed diabetes and to predict future clinically diagnosed (i.e. without screen-detected cases) diabetes.

Validation and comparison with existing risk scores

It is well known that the apparent performance of a model is better in the derivation dataset than in another dataset, a consequence of overfitting. To estimate the amount of overfitting in the regression coefficients and the area under the ROC curves, bootstrapping techniques were used [35]. To this end, 200 bootstrap samples of equal size to the original dataset were drawn from the original dataset. The resulting bootstrap-adjusted measures represent the values that can be expected when the model is applied to future similar populations. The amount of overfitting in regression coefficients can be estimated by the slope index. The slope is obtained from a logistic regression model with the linear predictor score as the only determinant. The linear predictor scores in each bootstrap and original sample are calculated by multiplying the model’s regression coefficients by the predictor values for each patient and summing them (including the intercept). Ideally this slope is 1, which means that the model’s predicted and observed probabilities agree over the whole range of predictions. The difference in slope between the bootstrap and original sample can then be used as a shrinkage factor, by which each regression coefficient in the original model is multiplied [23]. Thereafter, these bootstrap-adjusted regression coefficients were transformed into scores, i.e. multiplied by 4 and rounded off for simplicity to make the score values comparable to those of the Finnish risk score. For each individual, the total risk score can then be calculated by adding up the scores of each separate item. The performance of the updated model was compared with existing risk models for prevalent and incident diabetes for which (almost complete) information was available. Analyses were performed in SPSS 15.0.0 (SPSS, Chicago, IL, USA) and bootstrapping was performed in R 2.9.0 for Windows.
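
The steps from regression coefficients to score points can be sketched as follows. The coefficient values, predictor names and intercept are hypothetical placeholders (the actual coefficients are reported in Table 2), and the function names are chosen for illustration only. Note also that Python's built-in `round` rounds exact halves to the nearest even integer, which may differ from the rounding convention used in the original analysis.

```python
def linear_predictor(intercept, coefficients, predictor_values):
    """Intercept plus the sum of coefficient x predictor value for one person."""
    return intercept + sum(
        beta * predictor_values[name] for name, beta in coefficients.items()
    )


def to_score_points(coefficients, shrinkage_slope, multiplier=4):
    """Shrink each coefficient, multiply by 4 and round to whole score points."""
    return {
        name: round(beta * shrinkage_slope * multiplier)
        for name, beta in coefficients.items()
    }


def total_score(points, predictor_values):
    """Add up the points of every item that applies to one person."""
    return sum(points[name] * predictor_values[name] for name in points)


# Hypothetical coefficients for three binary items (not the published values).
coefs = {"age_45_54": 0.50, "bmi_30_plus": 0.90, "smoker": 0.30}
points = to_score_points(coefs, shrinkage_slope=0.974)
person = {"age_45_54": 1, "bmi_30_plus": 0, "smoker": 1}
print(points, total_score(points, person))
```

Working with rounded integer points sacrifices a little precision compared with the linear predictor, but lets the score be added up by hand on a paper questionnaire.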

Results

Characteristics of the study population for each centre are given in Table 1. Because of interactions between almost all risk predictors and the study region Mauritius, we excluded this study cohort from further analyses. In total, the number of persons who developed type 2 diabetes during the 5 year follow up was 844 of 18,301 (4.6%). Incidence rate varied from 4.6 to 20.6 per 1,000 person-years and was highest among the older Hoorn study participants.

Table 1 Characteristics of the study populations

The Finnish diabetes risk score had an area under the ROC curve of 0.742 (95% CI 0.726–0.758) in the whole study population, ranging from 0.63 to 0.78 among centres. Re-estimating the coefficients of the original Finnish score showed that all variables of the Finnish risk model were independent and significant predictors in the multiple regression model (Table 2). Also, re-estimating the regression coefficients resulted in an improvement in the area under the ROC curve (0.766 [95% CI 0.750–0.783]) compared with the original Finnish model (p < 0.05).

Table 2 Multiple logistic regression models predicting 5 year incidence of type 2 diabetes in the DETECT-2 cohort (n = 18,301)

Information on sex, smoking and hip circumference was added one by one to the model. Male sex and smoking were both significant predictors in the multiple model, contributing significantly to the area under the ROC curve (p < 0.05), compared with the re-estimated Finnish model. However, hip circumference did not improve the area under the ROC curve significantly.

Of those variables that were available in fewer than five centres, a family history of diabetes was a significant predictor in the multiple model (Table 2) and contributed to the area under the ROC curve on top of the re-estimated model including sex and smoking (p < 0.05). In contrast, information on the use of lipid-lowering medication did not statistically significantly change the area under the ROC curve. The full, extended model added three variables to the Finnish risk model: sex, smoking and family history of diabetes. In the three centres that had information on all variables, the area under the ROC curve improved from 0.754 (95% CI 0.735–0.773) to 0.764 (95% CI 0.746–0.783, p < 0.05), ranging from 0.70 to 0.78 among centres (Fig. 1). Some heterogeneity between study region and risk predictors was found, as observed by the range in coefficients among centres, but this could not be attributed to one specific study centre (Table 2). As indicated by non-overlapping confidence intervals, the regression coefficient for age (category 45–54 years) was lower in AusDiab, the coefficient for BMI (≥30 kg/m2) was lower in Hoorn and the coefficient for history of gestational diabetes was higher in AusDiab. The Hosmer and Lemeshow test results showed good calibration (χ² = 10.0, p = 0.27, Fig. 2).

Fig. 1

ROC curve of the extended model (AUC 0.764 [95% CI 0.746–0.783]) compared with the re-estimated model (AUC 0.742 [95% CI 0.723–0.762]) and the original Finnish model (AUC 0.712 [95% CI 0.692–0.731]) in three centres (n = 11,523) with information on the extended model including family history. Light grey line, original Finnish risk score; medium grey line, re-estimated score; black line, extended score

Fig. 2

Observed and expected risk for developing type 2 diabetes (%) in deciles of expected risk according to the extended model in three centres (n = 11,523) with information on the extended model including family history; grey bars, observed risk; black bars, expected risk

Reclassification improved by 7.3% among persons who developed diabetes and worsened by 1.0% among persons who did not develop diabetes, so net reclassification was 6.3% (p < 0.001) compared with the re-estimated Finnish model.
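
The arithmetic behind the net reclassification improvement can be illustrated with a small sketch using the three a priori risk categories (<10%, 10–20%, >20%). The function names and example risks are hypothetical, and assigning risks of exactly 10% or 20% to the intermediate category is an assumption. Among persons who develop diabetes, a move to a higher category counts as an improvement; among those who do not, a move to a lower category counts as an improvement.

```python
def risk_category(risk):
    """0 = low (<10%), 1 = intermediate (10-20%), 2 = high (>20%)."""
    if risk < 0.10:
        return 0
    if risk <= 0.20:
        return 1
    return 2


def net_reclassification(old_risks, new_risks, developed_diabetes):
    """Return (net improvement in events, net improvement in non-events, NRI)."""
    up_event = down_event = up_nonevent = down_nonevent = 0
    n_event = n_nonevent = 0
    for old, new, event in zip(old_risks, new_risks, developed_diabetes):
        move = risk_category(new) - risk_category(old)
        if event:
            n_event += 1
            up_event += move > 0
            down_event += move < 0
        else:
            n_nonevent += 1
            up_nonevent += move > 0
            down_nonevent += move < 0
    events_net = (up_event - down_event) / n_event
    nonevents_net = (down_nonevent - up_nonevent) / n_nonevent
    return events_net, nonevents_net, events_net + nonevents_net


# Hypothetical predicted risks under an old and a new model for six persons.
old = [0.05, 0.15, 0.25, 0.08, 0.12, 0.30]
new = [0.12, 0.25, 0.22, 0.05, 0.08, 0.15]
event = [1, 1, 1, 0, 0, 0]
print(net_reclassification(old, new, event))
```

With the figures reported above, the two components are +7.3% among persons who developed diabetes and −1.0% among those who did not, which sum to the net reclassification of 6.3%.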

Sensitivity analyses

Because the present risk model was developed as a risk questionnaire, which is often used as a first step in risk assessment, prevalent undiagnosed type 2 diabetes may remain undiagnosed. Therefore, we assessed to what extent the newly developed model is able to detect prevalent, screen-detected type 2 diabetes. Prevalent undiagnosed type 2 diabetes was predicted as well as incident type 2 diabetes by the present model (area under the ROC curve 0.765 [95% CI 0.746–0.783] as compared with 0.764 [95% CI 0.746–0.783] for incident type 2 diabetes).

The performance of the updated risk model to predict future clinically diagnosed cases only was higher than if clinically diagnosed and screen-detected patients were predicted together (area under the ROC curve 0.787 [95% CI 0.760–0.814] for clinically diagnosed only compared with 0.764 [95% CI 0.746–0.783]).

Validation and comparison with existing risk scores

Comparison of bootstrap-adjusted areas and the original areas under the ROC curve showed only a marginal difference. The area under the ROC curve of the extended model decreased from 0.764 (95% CI 0.746–0.783) to 0.759 (95% CI 0.739–0.779). Bootstrap-adjusted regression coefficients were also marginally smaller than the original coefficients (slope index 0.974, Table 2).

The original Finnish risk score and the updated score performed better than existing risk scores for prevalent or incident type 2 diabetes. The area under the ROC curve was 0.658 (95% CI 0.637–0.679) for the Framingham Offspring Study score [13], 0.654 (95% CI 0.633–0.675) for the Aekplakorn et al. risk score [14], 0.669 (95% CI 0.647–0.691) for the Cambridge risk score [4] without information on use of steroids and 0.665 (95% CI 0.643–0.687) for the Danish diabetes risk score [5] without information on physical activity.

Cut-off points

Test characteristics for a range of cut-off values are presented in Table 3. At a score of 7 or higher, the sum of sensitivity and specificity was maximised. Of the total population, 40% had a score of 7 or higher. A score of 7 or higher captures 76% of the cases who will develop type 2 diabetes (sensitivity). Furthermore, 63% of the persons who do not develop type 2 diabetes had a score lower than 7 (specificity).
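
How such test characteristics follow from individual scores and outcomes can be sketched as below. The function names and the ten-person example are hypothetical; a person is treated as screen-positive when the score is at or above the cut-off, and the cut-off is chosen by maximising the sum of sensitivity and specificity, as described above.

```python
def cutoff_characteristics(scores, developed_diabetes, cutoff):
    """Sensitivity, specificity, PPV and fraction screen-positive at one cut-off."""
    tp = fp = tn = fn = 0
    for score, event in zip(scores, developed_diabetes):
        positive = score >= cutoff
        if event and positive:
            tp += 1
        elif event:
            fn += 1
        elif positive:
            fp += 1
        else:
            tn += 1
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        # PPV is undefined when nobody is screen-positive.
        "ppv": tp / (tp + fp) if (tp + fp) else None,
        "fraction_positive": (tp + fp) / len(scores),
    }


def best_cutoff(scores, developed_diabetes, cutoffs):
    """Cut-off that maximises sensitivity + specificity (first one wins on ties)."""
    def sens_plus_spec(cutoff):
        result = cutoff_characteristics(scores, developed_diabetes, cutoff)
        return result["sensitivity"] + result["specificity"]
    return max(cutoffs, key=sens_plus_spec)


# Hypothetical scores and 5-year outcomes for ten persons.
scores = [2, 5, 7, 8, 9, 3, 6, 7, 10, 4]
outcome = [0, 0, 1, 1, 1, 0, 0, 0, 1, 0]
print(best_cutoff(scores, outcome, range(1, 11)))
print(cutoff_characteristics(scores, outcome, 7))
```

Maximising the sum of sensitivity and specificity weights false negatives and false positives equally; a screening programme with limited capacity for follow-up testing might prefer a higher cut-off.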

Table 3 Sensitivity, specificity and positive predictive value for various cut-off values in a subsample of three centres with information on the extended model including family history

Additional analyses

Information on self-reported shortness of breath when walking was evaluated as a risk predictor in two centres (Inter99 and Hoorn). This item was a significant predictor in the multiple model but adding information on this item to the extended model did not improve discrimination (p > 0.05) or net reclassification (p > 0.05).

Discussion

The present dataset is derived from a large data pooling collaboration containing five cohort studies with information on glucose tolerance status based on an OGTT. This provides an excellent opportunity to assess screen-detected next to clinically diagnosed incident diabetes and as such to validate and update the widely used Finnish diabetes risk score.

We showed that the Finnish score was already a good predictor for clinically diagnosed and screen-detected type 2 diabetes. Also, adding easily accessible information on sex, smoking and family history of diabetes improved the test’s predictive power.

The present study differs from the Finnish study with respect to both population characteristics and the way that type 2 diabetes was diagnosed [11]; for this reason, we cannot simply ascribe the observed differences to the inclusion of screen-detected incident type 2 diabetes in our endpoint. Nevertheless, the present definition including screen-detected and clinically diagnosed type 2 diabetes involves more patients than the drug-treated patients that were predicted by the original Finnish score. Drug-treated patients probably have more advanced diabetes than screen-detected patients or patients treated by lifestyle only. This may, at least in part, explain the lower discrimination of the present risk score as compared with the original Finnish risk score in the Finnish study in which it was developed. Indeed, the present risk score also predicted clinically diagnosed patients better than screen-detected patients. We think, however, that use of the present definition of type 2 diabetes is an improvement because it will also detect future diabetes patients who are less likely to be detected in general practice, for example because of isolated elevation of post-load glucose levels. This will lead to the identification of a population that may benefit from preventive interventions given their high cardiovascular risk [19, 36].

One advantage of updating an existing model instead of developing a new one is that it combines earlier information with new information. Updating methods vary from re-calibration to extensive revision: re-estimation of regression coefficients and consideration of new predictors [23]. As we also used a different definition of the endpoint, and our dataset had sufficient power, we chose to extensively update this model.

Recent assessments, especially of new biomarkers for cardiovascular disease, have shown that measures of discrimination, calibration and reclassification are, when used in isolation, not sufficient to decide whether one model is superior to another [34]. Therefore, we used more than one of these measures to decide whether or not to include additional risk factors.

In the present analysis, we excluded patients with prevalent known or screen-detected type 2 diabetes. However, because the questionnaire usually serves as a first step in risk assessment, prevalent type 2 diabetes may remain undiagnosed. Therefore, the predictive value for prevalent undiagnosed type 2 diabetes was also evaluated, which showed equally good performance of the model for prevalent undiagnosed and incident type 2 diabetes.

We evaluated the concise Finnish risk model instead of the full risk model, which also includes physical activity and daily consumption of vegetables, fruit or berries [11]. The main reason for this is the lack of information in the present dataset. Furthermore, it is uncertain whether asking one isolated question about diet or physical activity is predictive for type 2 diabetes, taking into account that for the development of the original risk model, these items were obtained from an extensive questionnaire. Finally, even when obtained from an extensive questionnaire, the items on physical activity and diet only minimally improved prediction [11].

We previously reported that the proposed score points associated with the presence of one family member with diabetes were relatively high (5 compared with the present 3 for parent or sibling) [17]. We also found that someone with a sibling with diabetes is at similar risk to someone with a parent with diabetes. Data on this are scarce, but this finding is in line with a study among older Australians, which reported no consistent difference in risk between those with a parental and those with a sibling history of type 2 diabetes [37], but contradicts a study among ethnic Chinese people [38]. The difference between the studies may relate to age of onset of diabetes in the family members. Among younger people, a sibling with diabetes is likely to have developed diabetes at a younger age than a parent with diabetes, and hence may reflect a greater risk. However, among older adults, there is less likely to be a difference in age of onset between siblings and parents with diabetes. Furthermore, we found that the presence of both a parent and a sibling with diabetes greatly increases the risk for diabetes. This graded effect of family history on risk for diabetes was pointed out by the NHANES investigators [39].

Our study has some limitations. First, external validation is needed before a prediction model is implemented, and this holds especially for models developed in smaller datasets. It is well known that the performance of a prediction model is generally higher in the development dataset than in another dataset, especially when the population differs (in time or place) from the original population. However, to some extent, external validation can be studied by testing the performance of the model in a non-random selection of the population, which was done here by separate assessments of the performance in each study centre. This showed good performance, which is promising for external application of the model. Second, although we showed good model performance in Europids, extrapolation to other populations should be done with caution. Indeed, we found different associations between risk predictors and incident type 2 diabetes in the Mauritius cohort and therefore excluded the latter from the present analysis. Recalibration of the score in other populations would probably lead to more precise risk estimates and is therefore indicated if data are available. Ethnicity within England and Wales has indeed been shown to be an important predictor for future drug-treated type 2 diabetes [15]. In the same study, socioeconomic differences were also found to be predictive. Studies are needed to assess whether ethnic and socioeconomic measures would improve existing models in the developing world. Third, the characteristics of participants of current population-based cohort studies may differ from the characteristics of people participating in real-life step-wise screening programmes because the attendance rate for screening programmes is generally low [40]. Selective participation of those at lower risk, for example, will reduce the positive predictive value of a risk score. Selective participation was not observed in the ADDITION-Denmark study, which used a mail-distributed risk chart as a first screening step [40]. Instead, the overall participation rate was low, which indicates that one of the major challenges in implementing primary prevention programmes is to reach appropriate attendance rates. Fourth, although test characteristics of the present score show good results, follow-up measurements will be indicated for a substantial part of the population (40%), which has important implications for healthcare. However, of this high-risk population, 11% will indeed develop diabetes within the relatively short timeframe of 5 years, whereas lifetime diabetes risk will be higher. Further, diabetes risk scores have also been shown to predict cardiovascular disease [41, 42]. Thus, the 11% of the high-risk population that will develop diabetes within 5 years is the minimum that will benefit from timely intervention. Fifth, the WHO definition of type 2 diabetes requires two abnormal glucose values for the diagnosis of type 2 diabetes without symptoms [18]. Our diagnosis of type 2 diabetes is based on one OGTT only. To our knowledge, no observational studies have information on the incidence of type 2 diabetes measured by duplicate OGTT, so the present data are the best estimate of OGTT-based incidence of type 2 diabetes currently available.

The present risk questionnaire may serve as a first assessment tool identifying those who may need further blood testing, probably not limited to glucose testing alone. Integrated prevention strategies and targeted interventions should also involve blood pressure and lipid levels because these risk factors are highly inter-related. Future research may aim to develop questionnaires that predict overall risk, including cardiovascular disease risk.

To conclude, the predictive value of the original Finnish risk questionnaire for clinically diagnosed and screen-detected type 2 diabetes could be improved by re-estimation of the score points and by additional information on sex, smoking and family history of diabetes. The DETECT-2 update of the Finnish diabetes risk questionnaire is an adequate and robust risk score to predict future type 2 diabetes in Europid populations.