
CC BY-NC Open access
Research

Performance of prediction models for nephropathy in people with type 2 diabetes: systematic review and external validation study

BMJ 2021; 374 doi: https://doi.org/10.1136/bmj.n2134 (Published 28 September 2021) Cite this as: BMJ 2021;374:n2134
  1. Roderick C Slieker, postdoctoral fellow (1, 2),
  2. Amber A W A van der Heijden, assistant professor (3),
  3. Moneeza K Siddiqui, principal investigator (tenure track) (4),
  4. Marlous Langendoen-Gort, doctoral student (3),
  5. Giel Nijpels, professor emeritus (3),
  6. Ron Herings, professor (1, 5),
  7. Talitha L Feenstra, professor (6, 7),
  8. Karel G M Moons, professor (8, 9),
  9. Samira Bell, consultant nephrologist (4),
  10. Petra J Elders, professor (3),
  11. Leen M ’t Hart, associate professor (1, 2, 10),
  12. Joline W J Beulens, professor (1, 8)
  1. Department of Epidemiology and Data Science, Amsterdam Public Health Institute, Amsterdam Cardiovascular Sciences Institute, Amsterdam UMC, Location VUmc, 1081 HV, Amsterdam, Netherlands
  2. Department of Cell and Chemical Biology, Leiden University Medical Center, Leiden, Netherlands
  3. Department of General Practice, Amsterdam Public Health Institute, Amsterdam UMC, Location VUmc, Amsterdam, Netherlands
  4. Division of Population Health and Genomics, School of Medicine, University of Dundee, Dundee, UK
  5. PHARMO Institute for Drug Outcomes Research, Utrecht, Netherlands
  6. Groningen Research Institute of Pharmacy, University of Groningen, Groningen, Netherlands
  7. Centre for Nutrition, Prevention and Health Services, Institute for Public Health and the Environment, Bilthoven, Netherlands
  8. Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht University, Utrecht, Netherlands
  9. Cochrane Netherlands, University Medical Center Utrecht, Utrecht University, Utrecht, Netherlands
  10. Molecular Epidemiology section, Department of Biomedical Data Sciences, Leiden University Medical Center, Leiden, Netherlands
  Correspondence to: R C Slieker r.slieker{at}amsterdamumc.nl (or @rcslieker on Twitter)
  • Accepted 25 August 2021

Abstract

Objectives To identify and assess the quality and accuracy of prognostic models for nephropathy and to validate these models in external cohorts of people with type 2 diabetes.

Design Systematic review and external validation.

Data sources PubMed and Embase.

Eligibility criteria Studies describing the development of a model to predict the risk of nephropathy, applicable to people with type 2 diabetes.

Methods Screening, data extraction, and risk of bias assessment were done in duplicate. Eligible models were externally validated in the Hoorn Diabetes Care System (DCS) cohort (n=11 450) for the same outcomes for which they were developed. Risks of nephropathy were calculated and compared with observed risk over 2, 5, and 10 years of follow-up. Model performance was assessed based on intercept adjusted calibration and discrimination (Harrell’s C statistic).

Results 41 studies included in the systematic review reported 64 models, 46 of which were developed in a population with diabetes and 18 in the general population including diabetes as a predictor. The predicted outcomes included albuminuria, diabetic kidney disease, chronic kidney disease (general population), and end stage renal disease. The reported apparent discrimination of the 64 models varied considerably across the different predicted outcomes, from 0.60 (95% confidence interval 0.56 to 0.64) to 0.99 (not available) for the models developed in a diabetes population and from 0.59 (not available) to 0.96 (0.95 to 0.97) for the models developed in the general population. Calibration was reported in 31 of the 41 studies, and the models were generally well calibrated. 21 of the 64 retrieved models were externally validated in the Hoorn DCS cohort for predicting risk of albuminuria, diabetic kidney disease, and chronic kidney disease, with considerable variation in performance across prediction horizons and models. For all three outcomes, however, at least two models had C statistics >0.8, indicating excellent discrimination. In a secondary external validation in GoDARTS (Genetics of Diabetes Audit and Research in Tayside Scotland), models developed for diabetic kidney disease outperformed those for chronic kidney disease. Models were generally well calibrated across all three prediction horizons.

Conclusions This study identified multiple prediction models to predict albuminuria, diabetic kidney disease, chronic kidney disease, and end stage renal disease. In the external validation, discrimination and calibration for albuminuria, diabetic kidney disease, and chronic kidney disease varied considerably across prediction horizons and models. For each outcome, however, specific models showed good discrimination and calibration across the three prediction horizons, with clinically accessible predictors, making them applicable in a clinical setting.

Systematic review registration PROSPERO CRD42020192831.

Introduction

People with type 2 diabetes are at high risk of microvascular and macrovascular complications.12 About 25-35% of people with type 2 diabetes develop nephropathy,345 one of the leading causes of end stage renal disease, which is associated with a low quality of life and high mortality.6789 As renal histological changes might already be advanced by the time a decline in renal function is detected,101112 early identification of those at risk is essential to initiate targeted preventive treatment. Moreover, interventions are more effective at earlier than more advanced stages of nephropathy.713 Decisions for interventions can be achieved by regularly estimating an individual’s risk of nephropathy based on validated prediction models. These prediction models should preferably be based on routine clinical markers and not expensive biomarkers, given that routine and low cost markers will be easier to implement into clinical practice.

Preventive treatment includes intensified glycaemic control, nephroprotective glucose lowering drugs such as glucagon-like peptide 1 receptor agonists and sodium-glucose cotransporter 2 inhibitors, blood pressure control, and inhibition of the renin-angiotensin system.14151617 Moreover, treatment with glucose lowering drugs such as sodium-glucose cotransporter 2 inhibitors has been suggested to prevent the progression of renal disease.18 Treatment of end stage renal disease comprises renal replacement therapy with dialysis and transplantation.719 The efficacy of renal replacement therapy, however, has been shown to be worse in people with diabetes.4

Prediction models for nephropathy allow the accurate estimation of nephropathy risk in people with type 2 diabetes. To ensure reliability though, developed prediction models need to be externally validated in their targeted populations, and ideally compared head to head on their predictive performance.2021 Only when proven valid in external validation, can prediction models be considered relevant to clinical practice to improve decision making and improve the cost effectiveness of care. Several prediction models for renal impairment and nephropathy in people with type 2 diabetes or applicable to this population have been developed.

Known risk factors for diabetic nephropathy include urinary albumin excretion, glucose levels, blood pressure, dyslipidaemia, obesity, smoking, duration of diabetes, age, sex, and retinopathy.22 More recently, identified risk factors include oxidative stress, inflammation, genetic background, ethnicity, and glomerular hyperfiltration.22

Several systematic reviews on prognostic models for nephropathy have been performed,232425 although several years ago232526 or focusing only on the general population.24 Also, although some external validation of the models has been done, it was either limited to the general population242627 or only included a small number of (other) models as part of a model development paper.28 Given that people with type 2 diabetes have an increased risk of developing nephropathy, it is important to assess the performance of prognostic models in a head-to-head comparison in a large scale population of those with type 2 diabetes. We therefore systematically reviewed existing prediction models for nephropathy applicable to people with type 2 diabetes. The retrieved models were appraised on quality and subsequently validated and compared head to head in the Hoorn DCS cohort, a prospective cohort of more than 14 000 people from the Hoorn region in the Netherlands.29 Finally, we validated the top performing models in GoDARTS (Genetics of Diabetes Audit and Research in Tayside Scotland), a secondary prospective cohort from Scotland.30

Methods

We carried out a systematic review and external validation study. This systematic review was performed according to guidance of the Cochrane Prognosis Methods Group (methods.cochrane.org/prognosis) and reported according to the criteria of the preferred reporting items for systematic reviews and meta-analyses statement and a measurement tool to assess systematic reviews guidelines.313233 The external validation study was reported in line with the transparent reporting of a multivariable prediction model for individual prognosis or diagnosis guideline (supplementary tables S1 and S2).3435

Search strategy

We systematically searched PubMed and Embase from inception until 16 June 2020 to identify prediction models for the risk of developing nephropathy applicable to people with type 2 diabetes. This systematic literature search was based on a predefined search string that was specifically developed for both databases (supplementary table S3). The online software Covidence (Melbourne, Australia) was used to manage the review of identified development studies.

Study selection

Four researchers (MLG, AAvdH, RCS, and JWJB) independently reviewed in duplicate the title, abstract, and full text article of retrieved studies. Disagreements between two reviewers were resolved by consensus with a third reviewer. A study was included when the prediction model was developed in a population in which the majority of people had a diagnosis of type 2 diabetes, or in the general population, and included diabetes as a predictor; the risk of developing nephropathy could be calculated from the reported prediction model or rule; nephropathy was the outcome of the model; and follow-up was longer than one year. We also included studies for which the type of diabetes was not specified but was probably type 2 diabetes based on the population age. Excluded studies were those performed in populations restricted to other forms of diabetes. A study was also excluded when it was conducted in animals, it was not written in English or Dutch, the prediction model was developed in a population with other severe physical conditions or in a postsurgical population, a new predictor was added to an original model, or the model only consisted of one predictor. In addition, reference lists of all screened full text articles and relevant systematic reviews were assessed for additional eligible studies. Any study type other than cross sectional (ie, without patient follow-up) was included, namely randomised trials, cohort studies, and registry based studies.

Data extraction

Data from the included studies were systematically extracted according to the criteria of the checklist for critical appraisal and data extraction for systematic reviews of prediction modelling studies.36 This checklist includes specifics about the research design, study population, outcome of the model, predictors, sample size, missing values, model development, performance of the model, model validation, results, and interpretation of the model. MLG, JWJB, and AAvdH extracted data on all these domains in duplicate from the included studies.

Quality assessment

Risk of bias and concern for applicability was assessed with the prediction model risk of bias assessment tool, which was specifically developed to assess diagnostic and prognostic prediction models.3738 This tool consists of 20 items and includes specifics about the research design, study population, outcome of the model, predictors, handling of the data, and performance measures. MLG, AAvdH, and JWJB independently extracted data and assessed the quality of each included study in duplicate. Any disagreements were resolved in consensus meetings.

External validation

External validation of the retrieved models was performed in the Hoorn Diabetes Care System cohort, a prospective cohort study from the Hoorn region in the Netherlands.29 People with type 2 diabetes visit the Diabetes Care System in Hoorn once a year to have their diabetes monitored and multiple standardised measurements taken. Individuals were able to opt out of the anonymous use of their data. We considered the year of entry to the study as baseline and used data until 31 December 2017 in the current study. The data of 12 155 people were included. Age at diagnosis was not imputed, and those with missing diagnosis dates were excluded (n=705), resulting in 11 450 individuals for analysis. Age at diabetes diagnosis was based on registry data. All laboratory measurements were measured after the participant had fasted. Glycated haemoglobin (HbA1c) was assessed by turbidimetric inhibition immunoassay of haemolysed whole EDTA blood. Blood glucose level was assessed in fluorinated plasma with the ultraviolet test using hexokinase. Levels of triglycerides, total cholesterol, and high density lipoprotein cholesterol were determined enzymatically (Cobas c501; Roche Diagnostics, Mannheim, Germany). Low density lipoprotein cholesterol levels were calculated. Albumin was determined by the antigen’s reaction with anti-albumin antibodies and measured turbidimetrically from an overnight first voided urine sample. Creatinine concentrations were determined enzymatically in heparinised plasma and urine (Cobas c501; Roche Diagnostics).

Presence of retinopathy was based on the EURODIAB 0-5 scale, with grade 0 representing no retinopathy; grade 1, minimal non-proliferative retinopathy; grade 2, moderate non-proliferative retinopathy; grade 3, severe non-proliferative or pre-proliferative retinopathy; grade 4, photocoagulated retinopathy; and grade 5, proliferative retinopathy.39 Blood pressure was measured twice at an interval of three minutes using a random zero sphygmomanometer (Hawksley-Gelman, Lancing, Sussex, UK). Pulse pressure was calculated by subtracting the diastolic blood pressure from the systolic blood pressure. Current drug use was based on dispensing labels participants took to their annual visit. As waist circumference was not available in the Hoorn DCS cohort, we estimated this on the basis of age, body mass index, and ethnicity using a previously published method.40 Cardiovascular events, educational level, ethnic background, and smoking were self-reported.

The three models with the highest C statistics were validated in GoDARTS, a secondary validation cohort (table 1). GoDARTS is a prospective cohort comprising people from the Tayside region, Scotland, and is described elsewhere.30 Briefly, laboratory and prescribing data were obtained from medical records and smoking status from a baseline lifestyle questionnaire. Weight and height were measured at recruitment. Outcomes were defined as described in the model development studies (supplementary table S10). Given that albumin was relatively sparsely available in GoDARTS, models with microalbuminuria and macroalbuminuria were not validated, nor were models that included albumin as a predictor. A total of 8698 people from GoDARTS2 and GoDARTS3 with type 2 diabetes were included. The median time to chronic kidney disease from recruitment was 4.7 years (interquartile range 1.98-6.69) and included follow-up data until 2017.

Predicted outcomes

The included outcomes for external validation were microalbuminuria, macroalbuminuria, diabetic kidney disease, and chronic kidney disease. The outcomes defined in our cohort are as described in the individual studies. Generally, diabetic kidney disease and chronic kidney disease were defined as an estimated glomerular filtration rate of <60 mL/min/1.73 m2, microalbuminuria as a urinary albumin:creatinine ratio ≥30 mg/g, and macroalbuminuria as an albumin:creatinine ratio ≥300 mg/g. Supplementary table S10 lists the definitions used in each of the studies. All models from the original publications were considered but were excluded if a missing predictor could not be replaced by a proxy or when the frequency of events was too low to allow accurate validation. The latter was the case for the end stage renal disease models, where the incidence was low, with 8, 19, and 28 cases at the 2, 5, and 10 year horizon, respectively.
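The general thresholds above can be illustrated with a minimal Python sketch. It is not the study's code (the analyses were run in R), and the function names and the "normoalbuminuria" label are hypothetical. It assumes microalbuminuria means an albumin:creatinine ratio of 30 to <300 mg/g, so that the two albuminuria categories do not overlap; the individual development studies may define outcomes differently (supplementary table S10).

```python
def classify_albuminuria(acr_mg_per_g: float) -> str:
    """Classify a urinary albumin:creatinine ratio (mg/g).

    Assumption: microalbuminuria is treated as 30 to <300 mg/g and
    macroalbuminuria as >=300 mg/g, following the general thresholds
    in the text; the category below 30 mg/g is labelled here as
    "normoalbuminuria" for illustration only.
    """
    if acr_mg_per_g < 0:
        raise ValueError("albumin:creatinine ratio cannot be negative")
    if acr_mg_per_g >= 300:
        return "macroalbuminuria"
    if acr_mg_per_g >= 30:
        return "microalbuminuria"
    return "normoalbuminuria"


def has_reduced_egfr(egfr_ml_min_173m2: float) -> bool:
    """Return True when eGFR is below 60 mL/min/1.73 m2, the general
    threshold used for diabetic and chronic kidney disease."""
    return egfr_ml_min_173m2 < 60


if __name__ == "__main__":
    # Illustrative values only, not study data.
    for acr, egfr in [(12.0, 85.0), (45.0, 72.0), (410.0, 55.0)]:
        print(acr, classify_albuminuria(acr), egfr, has_reduced_egfr(egfr))
```

Keeping the thresholds in one place makes it straightforward to swap in a study specific definition when a development paper deviates from the general rule.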

Statistical analysis

Missing predictor data were imputed five times using the mice function from the R package mice. Missing values ranged from 1.4% for total cholesterol to 6.9% for urinary albumin; 2868 individuals had one missing variable or more in one of the included predictors. Baseline characteristics were given as means with standard deviations, and as medians with an interquartile range for not normally distributed data. For each validated model, three prediction horizons of 2, 5, and 10 years were assessed. The performance was estimated based on calibration and discrimination. Calibration was assessed in three ways: visually by plotting the predicted risk against the observed risk in a calibration plot, comparing the observed and expected outcome, and performing the Hosmer-Lemeshow χ2 test. A model is considered well calibrated when the ratio of observed to expected outcomes is close to 1 and when the observed versus predicted risks lie on the x=y line. Given that most retrieved prediction models did not report an intercept, we recalibrated all models based on the incidence of the Hoorn DCS cohort outcome. Two models could not be recalibrated because the base risk was small, close to zero (macroalbuminuria in the study by Basu et al41 and albuminuria in the study by Jardine et al43) and are reported without recalibration. Discrimination of the models was evaluated using Harrell’s C statistic. All analyses were performed in R (version 4.0.2), and figures were produced using ggplot2.
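To make the discrimination measure concrete, the following is a minimal pure Python sketch of Harrell's C statistic for right censored data. It is an illustration under stated assumptions rather than the study's implementation, which was run in R: the function name is hypothetical, pairs with tied follow-up times are simply skipped, and tied predicted risks count as half concordant.

```python
from typing import Sequence


def harrell_c(times: Sequence[float],
              events: Sequence[int],
              risks: Sequence[float]) -> float:
    """Harrell's C: the proportion of comparable pairs in which the
    person with the shorter time to event has the higher predicted risk.

    A pair is comparable when the person with the shorter follow-up
    time had the event (events[i] == 1). Simplification: pairs with
    identical follow-up times are ignored.
    """
    if not (len(times) == len(events) == len(risks)):
        raise ValueError("inputs must have the same length")
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored person cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


if __name__ == "__main__":
    # Illustrative values only: follow-up in years, 1 = event, 0 = censored.
    follow_up = [1.0, 2.0, 3.0, 4.0]
    event = [1, 1, 0, 1]
    predicted = [0.9, 0.7, 0.4, 0.2]
    print(harrell_c(follow_up, event, predicted))
```

A value of 0.5 corresponds to discrimination no better than chance and 1.0 to perfect ranking. The double loop is quadratic in the number of people, which is acceptable for a sketch but not for a cohort of this size.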

Patient and public involvement

Speaking to patient councils inspired this systematic review. Study participants and members of the public were not involved in the design, conduct, or reporting of our research.

Results

Systematic review results

Characteristics of included models

Overall, 10 883 of 11 009 articles screened on title and abstract were excluded, leaving 126 for full text review. When the full text of papers was screened, a total of 87 studies were excluded primarily because the studies did not develop a prediction model or because the study design was cross sectional (supplementary fig S1). Finally, 41 studies were included of which 29 developed one model or more specifically in people with a diagnosis of type 2 diabetes and 12 developed one model or more for use in the general population but with diabetes as predictor (see supplementary file for references of included studies). Overall, these studies accounted for 64 prediction models that predicted some nephropathy outcome, of which 46 were developed in people with diabetes and 18 in the general population (supplementary fig S1). The sample size for models developed in people with diabetes ranged from 86 to 660 856 (supplementary table S4). For the general population models, the sample size ranged from 1549 to 799 658 individuals. The models developed from data of patients with diabetes mainly predicted microalbuminuria, macroalbuminuria, diabetic kidney disease, and end stage renal disease as outcomes (supplementary table S4). General population models were developed to predict chronic kidney disease and end stage renal disease (supplementary table S6). Although various prediction horizons were used across the models (supplementary table S6), most models used a five year prediction horizon. The most common predictors for the 42 models developed in people with diabetes were age (22 models), estimated glomerular filtration rate at baseline (21 models), and systolic blood pressure (18 models, supplementary table S5). For the 20 general population models, the most frequently used predictors in addition to the inclusion criterion diabetes were age (14 models) and hypertension (eight models, supplementary table S7).

Risk of bias and applicability

Most of the studies scored a high risk of bias in the analysis domain (supplementary fig S2a-c), mainly as a result of incorrect handling of missing data and lack of adjustment for overfitting (supplementary fig S2). The participants domain—which covers the data source and selection of individuals based on inclusion and exclusion criteria—was rated as low risk of bias in 68% of the models. The domains predictors and outcomes were rated as low risk of bias in 56% and 51% of the models, respectively. In the remaining models these domains were unclearly reported (supplementary fig S2a and b). Of the included studies, 40% scored high concern for applicability to the objectives of the review, as a result of selective sampling of the participants or predictors that are not typically measured in routine diabetes care, including cystatin C and B type natriuretic peptide. Low concern for applicability was scored for 44% of the studies for all domains (supplementary fig S2a and c).

Apparent performance of the models

The discriminative ability of the prediction models was reported through the area under the curve and C statistics in 25 development studies in people with type 2 diabetes. Values ranged from 0.60 (95% confidence interval 0.56 to 0.64) to 0.99 (not available) (fig 1). Across outcomes, apparent performance varied widely. For example, for albuminuria the performance ranged from 0.62 (0.61 to 0.64) to 0.84 (0.80 to 0.88) and for diabetic kidney disease from 0.61 (0.60 to 0.63) to 0.81 (0.79 to 0.84) (fig 1). For models developed for chronic kidney disease in the general population, the performance ranged from 0.70 (not available) to 0.88 (0.87 to 0.88) (fig 2). For end stage renal disease, the performance ranged from 0.60 (0.56 to 0.64) to 0.99 (not available) in people with type 2 diabetes and 0.85 (0.83 to 0.86) to 0.96 (0.95 to 0.97) in the general population.

Fig 1

Apparent discrimination for models developed in people with type 2 diabetes across studies and outcomes (see supplementary file for complete reference list of included studies). Whiskers represent 95% confidence intervals or interquartile ranges (Nelson 2019). Two interconnected diamonds represent reported range of C statistics. In Basu 2017 composite outcomes were used. Composite 1=doubling of serum creatinine or >20 mL/min/1.73 m2 decrease in estimated glomerular filtration rate (eGFR); composite 2=macroalbuminuria, renal failure, end stage renal disease, doubling of serum creatinine or >20 mL/min/1.73 m2 decrease in eGFR; composite 3=macroalbuminuria, microalbuminuria, renal failure, or end stage renal disease. MDRD=Modification of Diet in Renal Disease study; eCrCl=estimated creatinine clearance

Fig 2

Apparent discrimination for models developed in the general population across studies and outcomes (see supplementary file for complete reference list of included studies). Two interconnected diamonds represent reported range of C statistics. Whiskers represent 95% confidence intervals

In nine studies, calibration was evaluated based on the Hosmer-Lemeshow test. One study used the Greenwood-D’Agostino-Nam test, and six studies used a calibration plot, two merely a description of the calibration, and two a calibration table; nine studies did not report on calibration (supplementary table S4). Almost all studies showed good calibration with the Hosmer-Lemeshow test P>0.05. In one study, four models did not pass the Greenwood-D’Agostino-Nam test for calibration.41

Eleven studies developed in the general population reported on discrimination, with C statistics ranging from 0.59 (confidence interval not available) to 0.96 (0.95 to 0.97), whereas one study did not report discrimination (supplementary table S6). Calibration was assessed in eight studies based on the Hosmer-Lemeshow test and one study based on a calibration plot, one study gave a description of the calibration, and two studies did not report on the calibration. The P values based on the Hosmer-Lemeshow test ranged from 0.01 to 0.99, indicating poor to excellent calibration, and all models but one passed the test (P>0.05).

External validation in Hoorn DCS cohort and GoDARTS

Of the 64 identified models, 21 could be externally validated; 15 developed in a population with type 2 diabetes and six in the general population. Studies were mainly excluded because they predicted end stage renal disease for which the number of events was too low in the Hoorn DCS cohort (26 models), with fewer than 30 events.42 A second reason for exclusion was that the variables in the prediction models were not available in the Hoorn DCS cohort, such as serum uric acid, cystatin C, and B type natriuretic peptide (16 models). Finally, one model was part of a larger simulation model (supplementary tables S8 and S9). Table 1 shows the baseline characteristics of the Hoorn DCS cohort. The average age was 62.6 years (SD 12.1 years), and the median duration of diabetes was 0.75 years (interquartile range 0.2-3.8 years). Follow-up ranged from one year to 21.7 years. At the two year horizon, 1691 people had microalbuminuria, 263 had macroalbuminuria, and 2159 had an eGFR <60 mL/min/1.73 m2 (supplementary table S10). Top performing models were validated in the secondary cohort, GoDARTS (table 1). The average age in GoDARTS was 65.2 (SD 11.1) years, and the median duration of diabetes was 7.1 (interquartile range 2.9-10.4) years.

Table 1

Characteristics of external validation cohorts. Values are means (standard deviations) unless stated otherwise


Discrimination

The discriminatory ability of the models developed in people with diabetes varied considerably, with C statistics ranging from 0.50 to 0.96 in the Hoorn DCS cohort (fig 3). The performance on the two year horizon was generally best within a model, followed by the five year and 10 year horizons. For most models, however, the difference in performance between the horizons within a model was smaller than the difference in performance between models. In the Hoorn DCS cohort, no apparent difference was found in performance between the models developed in people with diabetes and the general population. In GoDARTS, however, the models developed in the general population performed generally worse than those developed specifically in people with diabetes (fig 3).

Fig 3

Discriminatory ability of models included in external validation by outcome and prediction horizons. Whiskers represent 95% confidence intervals. See supplementary file for complete reference list of included studies. In Basu 2017 three composite outcomes were defined: composite 1=doubling of serum creatinine or >20 mL/min/1.73m2 decrease in estimated glomerular filtration rate (eGFR); composite 2=macroalbuminuria, renal failure, end stage renal disease, and doubling of serum creatinine or >20 mL/min/1.73m2 decrease in eGFR; composite 3=macroalbuminuria, microalbuminuria, renal failure, or end stage renal disease. DCS=Diabetes Care System; MDRD=Modification of Diet in Renal Disease study; eCrCl=estimated creatinine clearance

The discriminatory ability for albuminuria expressed as C statistics ranged from 0.55 (0.54 to 0.56) to 0.96 (0.95 to 0.97). For macroalbuminuria, two models performed well, with C statistics >0.8, whereas the other models showed poorer performances, with C statistics <0.7. The best performing model was that of Basu et al for macroalbuminuria, with a C statistic of 0.96 (0.95 to 0.97) at the two year horizon (fig 3).41 The model also performed well at the five and 10 year horizons, with C statistics of 0.91 (0.90 to 0.92) and 0.87 (0.85 to 0.88), respectively. The second best performing albuminuria model was that of Jardine et al, with C statistics of 0.86 (0.85 to 0.87), 0.84 (0.83 to 0.85), and 0.78 (0.77 to 0.79) at the 2, 5, and 10 year horizon, respectively (fig 3).43 Afghahi et al’s model for albuminuria performed the worst, with C statistics ranging from 0.55 to 0.59 for the three horizons (fig 3).44

Four models for diabetic kidney disease showed good performance, with C statistics >0.75, whereas the other six models showed poorer performances, with C statistics <0.7. Afghahi et al’s model performed best, especially when the eGFR was based on the Cockcroft-Gault equation, with C statistics of 0.95 (0.95 to 0.95), 0.93 (0.92 to 0.93), and 0.91 (0.91 to 0.91) at the 2, 5, and 10 year horizon, respectively (fig 3).44 The second best model was that of Nelson et al, which was based on eGFR from the Modification of Diet in Renal Disease study, with C statistics of 0.87 (0.87 to 0.88), 0.81 (0.81 to 0.82), and 0.76 (0.75 to 0.77) at the 2, 5, and 10 year horizon, respectively.28 Dagliati et al’s model performed worst, with C statistics of 0.51 (0.50 to 0.51), 0.52 (0.51 to 0.52), and 0.50 (0.49 to 0.51) at the 2, 5, and 10 year horizon, respectively.45 Observed C statistics in the current study’s data were similar to the external validation performed in the development study, with 0.81 at the five year horizon (supplementary table S11). In addition, the C statistic of Dunkler et al’s46 model was similar in the current study compared with the performance observed by Dunkler in their discovery and validation at the five year horizon, with C statistics ranging from 0.68 to 0.70 (supplementary table S11). Good performances were observed for Afghahi et al’s model based on eGFR from the Modification of Diet in Renal Disease study, with C statistics of 0.83 (0.83 to 0.84), 0.80 (0.80 to 0.81), and 0.78 (0.77 to 0.79) at the 2, 5 and 10 year horizon, respectively.44 Of the top three best performing models, Afghahi et al’s two models could be validated in GoDARTS (fig 3).44 The discrimination in GoDARTS was better than that in the Hoorn DCS cohort, with C statistics of 0.99 (0.99 to 0.99) and 0.85 (0.84 to 0.86) for the endpoint based on estimated creatinine clearance and the Modification of Diet in Renal Disease study, respectively.

Among the six models for chronic kidney disease developed in the general population, a clear distinction in performance was apparent. The C statistics for the performance of the top three models were >0.8, whereas the remaining models performed noticeably worse. Hanratty et al’s model for chronic kidney disease performed best, with C statistics of 0.94 (0.94 to 0.94), 0.90 (0.89 to 0.90), and 0.85 (0.85 to 0.86) at the 2, 5, and 10 year horizon, respectively.47 Saranburut et al’s laboratory model also performed well, with C statistics at the corresponding three horizons of 0.89 (0.89 to 0.90), 0.86 (0.86 to 0.87), and 0.84 (0.83 to 0.85).48 The worst performing model was that of Wen et al, with C statistics ranging from 0.57 to 0.59.49 The top three best performing models were additionally validated in GoDARTS, where the discrimination was considerably lower than in the Hoorn DCS cohort (fig 3), with C statistics of 0.74 (0.73 to 0.75) in Hanratty et al’s model,47 0.75 (0.74 to 0.76) in Saranburut et al’s model,48 and 0.71 (0.69 to 0.73) in O’Seaghdha et al’s model.50 The models of O’Seaghdha et al50 and Chien et al51 have been externally validated in both Nelson et al28 and in Fraccaro et al26 (supplementary table S11). In Nelson et al’s study,28 the discrimination was similar to the C statistics observed in the current study at the five year horizon, with a C statistic of 0.81. The C statistics >0.90 reported in Fraccaro et al26 for both Chien et al’s and O’Seaghdha et al’s models were much higher than those observed in the Hoorn DCS cohort and GoDARTS and in the external validation by Nelson et al28 (supplementary table S11).

When the performance of the models was compared between men and women (supplementary fig S3), a high correlation was found between the C statistics of models in men and women (Pearson’s r ≥0.96), suggestive of almost identical performance of the investigated models in both sexes.
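The C statistics reported in this section can be read as the probability that, of two people of whom only one develops the outcome within the prediction horizon, the model assigns the higher predicted risk to the person with the event. The Python sketch below illustrates that definition for a binary outcome at a fixed horizon. It is an illustration only and not the analysis code of this study: it ignores censoring (time to event data would usually call for a censoring-aware concordance measure), and the function name `c_statistic` and the example risks and outcomes are hypothetical.

```python
from itertools import product


def c_statistic(risks, events):
    """Proportion of (event, non-event) pairs in which the person with the
    event received the higher predicted risk; tied risks count as half."""
    cases = [r for r, e in zip(risks, events) if e == 1]
    controls = [r for r, e in zip(risks, events) if e == 0]
    if not cases or not controls:
        raise ValueError("need at least one event and one non-event")
    concordant = 0.0
    for case_risk, control_risk in product(cases, controls):
        if case_risk > control_risk:
            concordant += 1.0
        elif case_risk == control_risk:
            concordant += 0.5
    return concordant / (len(cases) * len(controls))


# Hypothetical predicted risks and outcomes (1 = event within the horizon)
risks = [0.05, 0.10, 0.20, 0.40, 0.60, 0.80]
events = [0, 0, 1, 0, 1, 1]
c = c_statistic(risks, events)  # 8 of the 9 pairs are concordant
```

A value of 0.5 corresponds to a model that ranks people no better than chance (as seen for the worst performing models above), and a value of 1.0 to perfect ranking.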

Calibration

The calibration of models was generally good after recalibration based on the incidence of the specific outcome in the Hoorn DCS cohort. Two models are reported without recalibration (Basu,41 for macroalbuminuria and Jardine,43 for albuminuria). Two of the five models developed for albuminuria (Basu,41 Jardine,43 supplementary fig S4) showed an overestimation of the risk, whereas the other three models showed good calibration. No differences were observed between horizons. All albuminuria models were significant in the Hosmer-Lemeshow χ2 test, except for Basu et al’s microalbuminuria model.41 The models developed for diabetic kidney disease performed well in terms of observed to expected ratios, with ratios ≥0.99 in nine models. The three models of Basu et al41 and two models of Dunkler et al46 showed the best calibration (supplementary fig S5h). The poorest calibration was observed for the models of Low et al52 and Dagliati et al45 (supplementary fig S5). Again, the model of Basu et al showed the best calibration (supplementary fig S5h). The observed to expected ratios were generally close to 1, with Nelson et al’s model having the highest deviation.28

Five of the six general population models showed good calibration based on observed to expected ratios. Calibration was best for Saranburut et al’s clinical model.48 Chien et al’s model showed an overestimation of risk51 (supplementary fig S6).
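The calibration measures used in this section can be illustrated with a short Python sketch. It shows an observed to expected ratio, one common way of recalibrating predictions to the incidence in a validation cohort (a constant shift on the logit scale, often called recalibration in the large), and the Hosmer-Lemeshow χ2 statistic. This is a sketch under stated assumptions and not the authors' code: the recalibration procedure actually used in the study is not described in this section, all function names and data are hypothetical, and no P value is computed.

```python
import math


def observed_expected_ratio(risks, events):
    """Number of observed events divided by the sum of predicted risks."""
    return sum(events) / sum(risks)


def _logit(p):
    # Assumes 0 < p < 1
    return math.log(p / (1.0 - p))


def _expit(x):
    return 1.0 / (1.0 + math.exp(-x))


def recalibrate_in_the_large(risks, target_incidence, tol=1e-10):
    """Shift all risks by one constant on the logit scale so that the mean
    recalibrated risk equals target_incidence (shift found by bisection)."""
    logits = [_logit(p) for p in risks]
    lo, hi = -20.0, 20.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        mean_risk = sum(_expit(x + mid) for x in logits) / len(logits)
        if mean_risk < target_incidence:
            lo = mid
        else:
            hi = mid
    shift = (lo + hi) / 2.0
    return [_expit(x + shift) for x in logits]


def hosmer_lemeshow_statistic(risks, events, groups=10):
    """Chi-squared statistic comparing observed and expected events within
    groups of increasing predicted risk."""
    ordered = sorted(zip(risks, events))
    n = len(ordered)
    statistic = 0.0
    for g in range(groups):
        chunk = ordered[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue
        size = len(chunk)
        expected = sum(r for r, _ in chunk)
        observed = sum(e for _, e in chunk)
        statistic += (observed - expected) ** 2 / (
            expected * (1.0 - expected / size)
        )
    return statistic


# Hypothetical predicted risks and outcomes (1 = event within the horizon)
risks = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
events = [0, 0, 0, 1, 0, 1]

oe_before = observed_expected_ratio(risks, events)
recalibrated = recalibrate_in_the_large(risks, sum(events) / len(events))
oe_after = observed_expected_ratio(recalibrated, events)
hl = hosmer_lemeshow_statistic(risks, events, groups=3)
```

An observed to expected ratio below 1 indicates that a model overestimates risk and a ratio above 1 that it underestimates risk. Because the shift is the same for everyone, this kind of recalibration changes the average predicted risk but leaves the ranking of individuals, and therefore the C statistic, unchanged.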

Discussion

We conducted a systematic review and quality assessment of prediction models for nephropathy in people with type 2 diabetes. Overall, 46 models were developed in a type 2 diabetes population and 18 in the general population. These studies accounted for a total of 64 prediction models for albuminuria, renal impairment, diabetic kidney disease, chronic kidney disease, and end stage renal disease. Of the 64 prediction models, 21 were externally validated in the Hoorn Diabetes Care System cohort. The reported discrimination of the models varied considerably across outcomes and models and, to some extent, prediction horizon. Multiple models performed well, with C statistics >0.80 for the three investigated outcomes of albuminuria, chronic kidney disease, and diabetic kidney disease. The calibration showed the same variation between outcomes and studies as the discrimination, but to a lesser extent between horizons. The models performed better in terms of discrimination at the two year horizon compared with the 5 and 10 year horizons. For most models, however, the difference in performance between the horizons within a model was smaller than the difference in performance between models. Models developed in people with diabetes did, in general, perform similarly to models developed in the general population.

Principal findings

Of the five validated models for albuminuria, those by Basu et al41 and Jardine et al43 performed best in terms of discrimination. Basu et al’s models showed better calibration than Jardine et al’s models. Jardine et al’s model has eight predictors compared with 14 in Basu et al’s model. In terms of clinical practicality, both models are equal and contain routinely available predictors. Basu et al’s model, however, has the highest number of predictors, including less commonly accessible predictors, such as cardiovascular disease and the use of anticoagulants.

Ten models for diabetic kidney disease were validated. In terms of discrimination, the two models of Afghahi et al44 and the model of Nelson et al28 performed best. The estimated creatinine clearance model of Afghahi et al44 uses six routinely available predictors and the model of Nelson et al28 uses 12. In terms of calibration, the model of Afghahi et al outperformed that of Nelson et al.28 Models with fewer variables, such as those of Afghahi et al44 and Saranburut et al,48 might be more useful, especially in resource poor settings, than models with many variables, such as Nelson et al’s model28 (supplementary table S12). Furthermore, both models of Afghahi et al44 also showed excellent discrimination in GoDARTS.

For the models developed for chronic kidney disease in the general population, Hanratty et al’s model47 and Saranburut et al’s laboratory model48 performed well in a population with type 2 diabetes. Saranburut et al’s model, however, showed a better calibration than Hanratty et al’s model. Both models contain routinely available predictors, with five (age, diabetes, peripheral vascular disease, eGFR, period of observation) predictors in Hanratty et al’s model and six predictors (age, sex, systolic blood pressure, waist circumference, diabetes, eGFR) in Saranburut et al’s model, and could therefore be applied in clinical practice (supplementary table S12). The discrimination of Hanratty et al’s and Saranburut et al’s models was lower in GoDARTS.

Implications

As the number of people with type 2 diabetes increases worldwide, the incidence of diabetic kidney disease will also increase. It is therefore vital to identify those at higher risk of nephropathy to be able to enhance monitoring and possibly to intervene at an early stage to slow down renal decline. Interventions include intensified glycaemic control; nephroprotective glucose lowering drugs, including glucagon-like peptide 1 receptor agonists and sodium-glucose cotransporter 2 inhibitors; blood pressure control; and renin-angiotensin system inhibition.14151617 More stringent glycaemic control has been suggested to reduce the development and worsening of nephropathy.13 Prediction models will help to estimate the risk of diabetic kidney disease, especially when the models are based on routine clinical markers. Our results show that for the endpoints albuminuria, diabetic kidney disease, and chronic kidney disease, prediction models are available with the capability of reliably discriminating between people with a low and a high risk. In the validation study, models that predict chronic kidney disease and diabetic kidney disease (eGFR <60 mL/min/1.73 m²) performed well across all three prediction horizons, which could give people with diabetes an estimation of risk for the near future (two year risk) and in the long term (10 year risk). For example, Saranburut et al’s model48 and Afghahi et al’s model44 both showed good discrimination and validation. However, we observed that the two year horizon outperformed the 5 and 10 year horizons, which is not unexpected given that baseline measures are more likely to predict the near future accurately. In contrast, in the long term, much can change, for example in lifestyle or treatment regimens, and thus in outcome events.

Strengths and limitations of this study

This systematic review has some limitations and strengths. One limitation is that we excluded articles that were not in English or Dutch. During full text screening, although we excluded three studies based on language,444546 we were still able to review the reference lists for additional eligible studies. Other partially comparable systematic reviews did not include any of these three studies as relevant prediction models.232553 Therefore, it can be questioned whether the excluded studies were relevant to this systematic review. A second limitation is that we could not validate models that predicted end stage renal disease because too few events occurred in our study population; more investigation is needed. The options to intervene in end stage renal disease are limited, and as such interventions ideally should occur earlier. A third limitation is that the Hoorn DCS cohort comprised people with generally well controlled diabetes. As such, models developed on high risk populations performed less well in the external validation in contrast with models developed on lower risk populations. Finally, we were unable to use several models because comparable measurements were not available in the Hoorn DCS cohort. However, the fact that these variables were not measured as part of routine care is an indication of their limited applicability.

The study’s strengths include the systematic review to identify relevant prognostic models and quality assessment of the included studies. Secondly, we externally validated the models in a large prospective cohort study with long term follow-up and detailed phenotyping based on routine care data, thus enabling a head-to-head comparison of the models. Thirdly, we validated models with the best discrimination in a secondary cohort from a different country. Future studies should evaluate the generalisability of the models in people with other forms of diabetes and to what extent these models have been implemented in day to day clinical practice.

Conclusion

Many prognostic models are available to predict nephropathy in people with type 2 diabetes. We identified 64 prediction models, 21 of which could be validated and directly compared in the Hoorn DCS cohort. For each of the three included outcomes of albuminuria, diabetic kidney disease, and chronic kidney disease, discrimination and calibration varied considerably across horizons and models. Several models showed good discrimination and calibration across various prediction horizons for each outcome, although models performed best at the two year horizon. In a secondary validation cohort, models developed for diabetic kidney disease especially showed good discrimination compared with those developed for chronic kidney disease. This study identified several suitable models that will contribute to preventing or postponing renal decline and ultimately end stage renal disease in people with diabetes.

What is already known on this topic

  • People with type 2 diabetes are at increased risk of nephropathy

  • Many studies developed prediction models to identify those at high risk of nephropathy in people with type 2 diabetes or with diabetes as a predictor in the model

  • These models can be used in clinical practice provided that they are also accurate in external target populations

What this study adds

  • This study identified many studies that developed prognostic models to predict nephropathy in people with type 2 diabetes or with diabetes as a risk factor with variable performance across time horizons and models

  • Several models showed good performance in people with type 2 diabetes across various prediction horizons for different nephropathy outcomes

  • Models, however, performed best at the two year horizon and for diabetic kidney disease as outcome

Ethics statements

Ethical approval

This study was approved by the medical ethical committee of VU University Medical Center.

Data availability statement

The steering committee of the Hoorn studies will consider reasonable requests for the sharing of deidentified patient level data. Requests should be made to the corresponding author.

Acknowledgments

This study was made possible by the collaboration with the Diabetes Care System West-Friesland. We thank the participants and staff of the Diabetes Care System West-Friesland. We acknowledge the help of the information specialists of the Vrije Universiteit in defining the search terms for the systematic review.

Footnotes

  • Contributors: RCS, AAH, and JWB designed the study. RCS, AAH, JWB, and MLG screened citations for inclusion and were involved in risk of bias assessment and data extraction and interpretation. RCS collected, cleaned, and analysed the data. All co-authors were involved in the interpretation of the data. MKS and SB performed the external validation in GoDARTS. RCS wrote the draft manuscript with input from all co-authors. All authors approved the final version of this manuscript. RCS is the guarantor of this manuscript and accepts full responsibility for the work and the conduct of the study, had access to the data, and controlled the decision to publish. The corresponding author attests that all listed authors meet authorship criteria and that no others meeting the criteria have been omitted.

  • Funding: This research was supported by the Dutch Diabetes Research Foundation (grant No 2014.00.1753). The funder had no role in the study design; collection, analysis, and interpretation of data; or preparation of the manuscript. GoDARTS is funded and supported by the Wellcome Trust Type 2 Diabetes Case Control Collection (072960/Z/03/Z, 084726/Z/08/Z, 084727/Z/08/Z, 085475/Z/08/Z, 085475/B/08/Z) and as part of the EU IMI-SUMMIT programme.

  • Competing interests: All authors have completed the ICMJE uniform disclosure form at www.icmje.org/disclosure-of-interest/ and declare: support from Dutch Diabetes Research foundation Fund, IMI-RHAPSODY, and ZorgInstitute Netherlands (Dutch Healthcare Institute) for the submitted work; no financial relationships with any organisations that might have an interest in the submitted work in the previous three years; no other relationships or activities that could appear to have influenced the submitted work.

  • The lead authors (RCS, AAH, JWB) affirm that the manuscript is an honest, accurate, and transparent account of the study being reported; that no important aspects of the study have been omitted; and that any discrepancies from the study as planned have been explained.

  • Dissemination to participants and related patient and public communities: To disseminate our results we aim to target a broad audience, including health professionals, scientists, and members of the public through written communications, social media, and the cohort’s website.

  • Provenance and peer review: Not commissioned; externally peer reviewed.

This is an Open Access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/.

References