CC BY-NC Open access

Prediction system for risk of allograft loss in patients receiving kidney transplants: international derivation and validation study

BMJ 2019; 366 doi: (Published 17 September 2019) Cite this as: BMJ 2019;366:l4923
  1. Alexandre Loupy, professor of nephrology and epidemiology1 2,
  2. Olivier Aubert, assistant professor of nephrology1 2,
  3. Babak J Orandi, assistant professor of surgery3,
  4. Maarten Naesens, associate professor of nephrology4,
  5. Yassine Bouatou, assistant professor of nephrology1,
  6. Marc Raynaud, PhD candidate1,
  7. Gillian Divard, PhD candidate1,
  8. Annette M Jackson, associate professor in surgery5,
  9. Denis Viglietti, clinical nephrologist1 6,
  10. Magali Giral, professor of nephrology7,
  11. Nassim Kamar, professor of nephrology8,
  12. Olivier Thaunat, professor of nephrology9,
  13. Emmanuel Morelon, professor of nephrology9,
  14. Michel Delahousse, clinical nephrologist10,
  15. Dirk Kuypers, professor of nephrology4,
  16. Alexandre Hertig, professor of nephrology11,
  17. Eric Rondeau, professor of nephrology11,
  18. Elodie Bailly, clinical nephrologist11,
  19. Farsad Eskandary, clinical nephrologist12,
  20. Georg Böhmig, professor of nephrology12,
  21. Gaurav Gupta, associate professor of nephrology13,
  22. Denis Glotz, professor of nephrology1 6,
  23. Christophe Legendre, professor of nephrology1 2,
  24. Robert A Montgomery, professor of surgery14,
  25. Mark D Stegall, professor of surgery15,
  26. Jean-Philippe Empana, professor of epidemiology1 16,
  27. Xavier Jouven, professor of cardiology and epidemiology1,
  28. Dorry L Segev, professor of surgery and epidemiology17,
  29. Carmen Lefaucheur, professor of nephrology1 6
  1. 1Université de Paris, INSERM, Paris Translational Research Centre for Organ Transplantation, Paris, France
  2. 2Kidney Transplant Department, Necker Hospital, Assistance Publique - Hôpitaux de Paris, Paris, France
  3. 3Department of Surgery, University of California San Francisco School of Medicine, San Francisco, CA, USA
  4. 4Department of Microbiology, Immunology and Transplantation, KU Leuven, Leuven, Belgium
  5. 5Department of Surgery, Duke University School of Medicine, Durham, NC, USA
  6. 6Kidney Transplant Department, Saint-Louis Hospital, Assistance Publique - Hôpitaux de Paris, Paris, France
  7. 7Department of Nephrology, Centre Hospitalier Universitaire de Nantes, Nantes, France
  8. 8Université Paul Sabatier, INSERM, Department of Nephrology and Organ Transplantation, CHU Rangueil & Purpan, Toulouse, France
  9. 9Department of Transplantation, Nephrology and Clinical Immunology, Hospices Civils de Lyon, France
  10. 10Department of Transplantation, Nephrology and Clinical Immunology, Foch Hospital, Suresnes, France
  11. 11Kidney transplant department, Tenon Hospital, Assistance Publique - Hôpitaux de Paris, Paris, France
  12. 12Division of Nephrology and Dialysis, Department of Medicine III, General Hospital Vienna, Vienna, Austria
  13. 13Division of Nephrology, Department of Internal Medicine, Virginia Commonwealth University School of Medicine, Richmond, VA, USA
  14. 14New York University Langone Transplant Institute, New York, NY, USA
  15. 15William J. von Liebig Centre for Transplantation and Clinical Regeneration, Mayo Clinic, Rochester, MN, USA
  16. 16Cardiology and Heart Transplant department, Pompidou hospital, Assistance Publique - Hôpitaux de Paris, Paris, France
  17. 17Department of Surgery, Johns Hopkins University School of Medicine, Baltimore, MD, USA
  1. Correspondence to: A Loupy alexandre.loupy{at} (or @AlexandreLoupy on Twitter)
  • Accepted 15 July 2019


Objective To develop and validate an integrative system to predict long term kidney allograft failure.

Design International cohort study.

Setting Three cohorts including kidney transplant recipients from 10 academic medical centres from Europe and the United States.

Participants Derivation cohort: 4000 consecutive kidney recipients prospectively recruited in four French centres between 2005 and 2014. Validation cohorts: 2129 kidney recipients from three centres in Europe and 1428 from three centres in North America, recruited between 2002 and 2014. Additional validation in three randomised controlled trials (NCT01079143, EudraCT 2007-003213-13, and NCT01873157).

Main outcome measure Allograft failure (return to dialysis or pre-emptive retransplantation). 32 candidate prognostic factors for kidney allograft survival were assessed.

Results Among the 7557 kidney transplant recipients included, 1067 (14.1%) allografts failed after a median post-transplant follow-up time of 7.12 (interquartile range 3.51-8.77) years. In the derivation cohort, eight functional, histological, and immunological prognostic factors were independently associated with allograft failure and were then combined into a risk prediction score (iBox). This score showed accurate calibration and discrimination (C index 0.81, 95% confidence interval 0.79 to 0.83). The performance of the iBox was also confirmed in the validation cohorts from Europe (C index 0.81, 0.78 to 0.84) and the US (0.80, 0.76 to 0.84). The iBox system showed accuracy when assessed at different times of evaluation post-transplant, was validated in different clinical scenarios including type of immunosuppressive regimen used and response to rejection therapy, and outperformed previous risk prediction scores as well as a risk score based solely on functional parameters including estimated glomerular filtration rate and proteinuria. Finally, the accuracy of the iBox risk score in predicting long term allograft loss was confirmed in the three randomised controlled trials.

Conclusion An integrative, accurate, and readily implementable risk prediction score for kidney allograft failure has been developed, which shows generalisability across centres worldwide and common clinical scenarios. The iBox risk prediction score may help to guide monitoring of patients and further improve the design and development of a valid and early surrogate endpoint for clinical trials.

Trial registration NCT03474003.


End stage renal disease affects an estimated 7.4 million people worldwide.12 According to data from the World Health Organization, more than 1 500 000 people live with transplanted kidneys, and 80 000 new kidneys are transplanted each year.3 Despite the considerable advances in short term outcomes, kidney transplant recipients continue to experience late allograft failure, and little improvement has been made over the past 15 years.45 Although the failure of a kidney allograft represents an important cause of end stage renal disease, robust and widely validated prognostication systems for the risk of allograft failure in individual patients are lacking.6 Accurately predicting individual patients’ risk of allograft loss would help to stratify patients into clinically meaningful risk groups, which may help to guide monitoring of patients. Moreover, regulatory agencies and medical societies have highlighted the need for an early and robust surrogate endpoint in transplantation that adequately predicts long term allograft failure.7 An enhanced ability to predict allograft outcomes would not only inform daily clinical care, counselling of patients, and therapeutic decisions but also facilitate the performance of clinical trials, which generally lack statistical power because of the low event rates during the first year after transplantation.8

Taken individually, parameters such as estimated glomerular filtration rate (eGFR),910 proteinuria,11 histology,12 or human leukocyte antigen (HLA) antibody profiles,13 fail to provide sufficient predictive accuracy. Previous efforts at developing prognostic systems in nephrology based on various combinations of parameters have been hampered by small sample sizes, the absence of proper validation, limited phenotypic details from registries, the absence of systematic immune response monitoring, and the failure to include key prognostic factors that affect allograft outcome (for example, donor derived factors, polyoma virus associated nephropathy, disease recurrence).141516 Finally, no scoring system has been evaluated in large cohorts from different countries with different transplant practices, allocation systems, and practice patterns, thereby limiting their exportability, which is an important consideration for health authorities to accept a scoring system as a surrogate endpoint.17

The objectives of this study (NCT03474003) were to develop a practical risk stratification score in a multicentre, prospective cohort of kidney transplant recipients that could be used to identify patients at high risk of future allograft loss; to validate the score on a large scale in geographically distinct independent cohorts with different allocation policies and types of transplant management; and to test the performance of the risk score for predicting graft failure in randomised controlled trials covering distinct clinical scenarios of transplant.


Study design and participants

Derivation cohort.

The derivation cohort consisted of 4000 consecutive patients over 18 years of age who were prospectively enrolled at the time of transplantation of a kidney from a living or deceased donor at Necker Hospital (n=1473), Saint-Louis Hospital (n=928), Foch Hospital (n=714), and Toulouse Hospital (n=885) in France between 1 January 2005 and 1 January 2014. We excluded patients with grafts that never functioned (primary non-functioning grafts; n=116). The clinical data were collected from each centre and entered into the Paris Transplant Group database (French data protection authority (CNIL) registration number: 363505). All data were anonymised and prospectively entered at the time of transplantation, at the time of post-transplant allograft biopsies, and at each transplant anniversary by using a standardised protocol to ensure harmonisation across study centres. We submitted data from the derivation cohort for an annual audit to ensure data quality (see the methods section and the study protocol in the supplementary material for detailed data collection procedures). We retrieved data from the database in March 2018. All patients provided written informed consent at the time of transplantation.

Validation cohorts.

The external validation cohorts comprised 3557 recipients of kidney transplants from a living or a deceased donor who were over 18 years of age and represented all patients eligible for post-transplant risk evaluation (that is, undergoing allograft biopsy as part of the standard of care of each centre with adequate biopsy according to the Banff criteria) from six centres: 2129 recipients recruited in Europe and 1428 recipients recruited in the US between 2002 and 2014. The European centres were Hôpital Hôtel Dieu, Nantes, France (n=632); Hospices Civils, Lyon, France (n=608); and the University Hospitals, Leuven, Belgium (n=889). The US centres were the Johns Hopkins Medical Institute, Baltimore, MD (n=580); the Mayo Clinic, Rochester, MN (n=556); and the Virginia Commonwealth University School of Medicine, Richmond, VA (n=292). Datasets from the validation centres were prospectively collected as part of routine clinical practice, entered in the centres’ databases in compliance with local and national regulatory requirements, and sent anonymised to the Paris Transplant Group.

In France, the transplantation allocation system followed the rules of the French National Agency for Organ Procurement (Agence de la Biomédecine). The European centre outside France (Leuven) followed the rules of the Eurotransplant allocation system, and the US centres (Johns Hopkins Hospital, Mayo Clinic, and Virginia) followed the rules of the US Organ Procurement and Transplantation System.

Additional external validation cohort.

Additional external validation was conducted in kidney transplant recipients previously recruited in three registered and published phase II and III clinical trials: a randomised, open label, multicentre trial that compared a cyclosporine based immunosuppressive regimen with an everolimus based regimen in kidney recipients (Certitem, NCT01079143); a randomised, multicentre, double blind, placebo controlled trial that investigated the efficacy of rituximab in kidney recipients with acute antibody mediated rejection (Rituxerah, EudraCT 2007-003213-13); and a randomised, double blind, placebo controlled, single centre trial that investigated the efficacy of bortezomib in kidney recipients with late antibody mediated rejection (Borteject, NCT01873157).181920 The details of the clinical trials including the population characteristics, study design, inclusion criteria, and interventions are provided in supplementary table A.

Candidate predictors

Post-transplant risk evaluation times

Risk evaluation after transplantation was conducted at the time of allograft biopsy performed for clinical indication or as per protocol, which was performed after transplantation according to the centres’ practices. In patients with multiple biopsies, risk evaluation used the date of the first biopsy. The distribution of post-transplant risk evaluation times is provided in supplementary figure A.

Risk evaluation after transplant comprised demographic characteristics (including recipients’ comorbidities, age, sex, and transplant characteristics), biological parameters (including kidney allograft function, proteinuria, and circulating anti-HLA antibody specificities and concentrations), and allograft pathology data (including elementary lesion scores and diagnoses). All these factors are commonly and routinely collected in kidney transplant centres worldwide. See supplementary methods for the list of all prognostic determinants assessed from the derivation cohort.

Measurements performed at time of risk evaluation

Kidney allograft function was assessed by the glomerular filtration rate estimated by the Modification of Diet in Renal Disease Study equation (eGFR) and proteinuria level by using the protein/creatinine ratio in the derivation and validation cohorts. Circulating donor specific antibodies against HLA-A, HLA-B, HLA-Cw, HLA-DR, HLA-DQ, and HLA-DP were assessed using single antigen flow bead assays in the derivation cohort (see supplementary methods) and according to local centres’ practice in the validation cohorts. Kidney allograft pathology data, including elementary lesion scores and diagnoses, were recorded according to the Banff classification in the derivation and validation cohorts (see supplementary methods). All the measurements (eGFR, proteinuria, histopathology, and circulating anti-HLA DSA) were performed on the day of risk evaluation.


The outcome of interest was allograft loss defined as a patient’s definitive return to dialysis or pre-emptive kidney retransplantation. This outcome was prospectively assessed in the derivation and validation cohorts at each transplant anniversary up to 31 March 2018.

Missing data

We excluded 59 (1.5%) patients in the derivation cohort from the final model owing to at least one data point being missing. We excluded 158 (7.4%) patients in the European validation cohort and 71 (5.0%) in the North American validation cohort from the final model owing to at least one data point being missing.

Statistical analysis

We followed the TRIPOD (Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis) statement (supplementary methods) for reporting the development and validation of the multivariable prediction model.21 We describe continuous variables by using means and standard deviations or medians and interquartile ranges. We compared means and proportions between groups by using Student’s t test, analysis of variance (Mann-Whitney test for mean fluorescence intensity), or the χ2 test (or Fisher’s exact test if appropriate). We used the Kaplan-Meier method to estimate graft survival. The duration of follow-up was from the patient’s risk evaluation (starting point) to the date of kidney allograft loss or the end of the follow-up (31 March 2018). For patients who died with a functioning allograft, allograft survival was censored at the time of death as a surviving or functional allograft.22
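
The censoring rule described above can be illustrated with a minimal Kaplan-Meier sketch. It is not the authors' code (the analyses were run in R); the function name and the toy data are hypothetical. Follow-up starts at risk evaluation, allograft loss is the event, and death with a functioning graft is treated as censoring.

```python
from collections import Counter

def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct allograft loss time.

    times  : follow-up in years from risk evaluation
    events : 1 = allograft loss, 0 = censored (end of follow-up or
             death with a functioning graft)
    """
    losses = Counter(t for t, e in zip(times, events) if e == 1)
    exits = Counter(times)   # every patient leaves the risk set at their own time
    at_risk = len(times)
    surv = 1.0
    curve = []
    for t in sorted(exits):
        d = losses.get(t, 0)
        if d:
            # Losses at time t are counted before patients censored at t leave
            surv *= 1.0 - d / at_risk
            curve.append((t, surv))
        at_risk -= exits[t]
    return curve

# Hypothetical toy data: five patients, two of them censored
curve = kaplan_meier([1.0, 2.0, 2.0, 3.0, 4.0], [1, 0, 1, 0, 1])
```

Censoring deaths in this way estimates death censored graft survival; it does not treat death as a competing event.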

In the derivation cohort, we used univariable Cox regression analyses to assess the associations between allograft failure and clinical, histological, functional, and immunological factors measured at the patient’s risk evaluation (see above). We used the log graphic method to test the proportional hazards assumption. The factors identified in these analyses were thereafter included in a final multivariable model.

We confirmed the internal validity of the final model by using a bootstrap procedure, which involved generating 1000 datasets derived from resampling the original dataset and permitting the calculation of optimism corrected performance estimates.23 We tested the centre effect in stratified analyses. We investigated potential non-linear relations between continuous predictors and graft loss by using fractional polynomial methods (see supplementary methods).
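
For orientation, the optimism correction obtained from bootstrap resampling is commonly written as follows. This is the standard bootstrap formulation, given here as an assumption about the procedure rather than a formula taken from the study protocol. The symbols are introduced for illustration: $C_{\text{app}}$ is the C index of the final model on the original data, $C_b^{\text{boot}}$ is the C index of the model refitted on bootstrap sample $b$ and evaluated on that same sample, and $C_b^{\text{orig}}$ is the C index of that refitted model evaluated on the original data, with $B = 1000$.

```latex
\[
  C_{\text{corrected}}
  = C_{\text{app}}
  - \frac{1}{B}\sum_{b=1}^{B}\left( C_b^{\text{boot}} - C_b^{\text{orig}} \right),
  \qquad B = 1000
\]
```

The subtracted average is the estimated optimism, that is, how much the model's apparent discrimination overstates its expected performance in new patients.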

We assessed the accuracy of the prediction model on the basis of its discrimination ability and calibration performance. We evaluated the discrimination ability (the ability to separate patients with different prognoses) of the final model by using Harrell’s concordance index (C index) (see supplementary methods).24 We assessed calibration (the ability to provide unbiased survival predictions in groups of similar patients) on the basis of a visual examination of the calibration plots by using the rms package in R. We used the survIDINRI package in R to calculate net reclassification improvement for censored survival data.2526 We then evaluated the external validity of the final model in the external validation cohorts, including discrimination tests and model calibration as mentioned above.
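
To make the discrimination measure concrete, the following is a minimal sketch of Harrell's C index for right censored data. It is an illustration only, not the implementation used in the study; the function name and the pairwise loop are assumptions chosen for readability rather than speed.

```python
def harrell_c_index(times, events, risk_scores):
    """Share of usable patient pairs in which the patient with the higher
    risk score loses the graft first; tied scores count as 0.5."""
    concordant = 0.0
    usable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is usable when patient i has an observed graft loss
            # strictly before patient j's last follow-up time
            if events[i] == 1 and times[i] < times[j]:
                usable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if usable == 0:
        raise ValueError("no usable pairs")
    return concordant / usable

# Hypothetical toy data: higher score, earlier loss
c = harrell_c_index([1, 2, 3, 4], [1, 1, 0, 1], [4.0, 3.0, 2.0, 1.0])
```

A value of 0.5 corresponds to chance level ranking and 1.0 to perfect ranking of patients by time to allograft loss.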

We calculated a risk prediction score (integrative box risk prediction score—iBox) for each patient according to the β regression coefficients estimated from the final multivariable Cox model. Allograft survival probabilities are given at three, five, and seven years after iBox risk evaluation. The seven year post-transplant iBox risk assessment was guided by the median follow-up after iBox risk assessment of 7.65 (interquartile range 5.39-8.21) years.
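
As a sketch of how a score built from Cox β coefficients can be turned into a survival probability, the example below uses the standard Cox relation S(t | x) = S0(t) ^ exp(β·x). The coefficient values, predictor names, and baseline survival are hypothetical placeholders, not the published iBox parameters, and the exact computation used by the iBox interface is not described in this section.

```python
import math

def linear_predictor(coefficients, values):
    """Sum of beta * predictor value over all predictors in the model."""
    return sum(coefficients[name] * values[name] for name in coefficients)

def survival_probability(baseline_survival, lp):
    """Cox model survival at a fixed horizon: S0(t) ** exp(lp)."""
    return baseline_survival ** math.exp(lp)

# Hypothetical coefficients and patient values, for illustration only
betas = {"egfr": -0.02, "log_proteinuria": 0.4}
patient = {"egfr": 50.0, "log_proteinuria": 0.5}

lp = linear_predictor(betas, patient)
# Hypothetical baseline allograft survival at the chosen horizon
prob = survival_probability(0.90, lp)
```

A higher linear predictor gives a lower predicted allograft survival at every horizon, which is why a single score can be reported at three, five, and seven years by changing only the baseline survival.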

We used R version 3.2.1 for all analyses and considered P values below 0.05 to be significant; all tests were two tailed. Details of the interpretation of important statistical concepts are given in the supplementary methods.

Patient and public involvement

The iBox initiative, including study design, study results, and potential for patient care, was presented and discussed among the two main French patients’ associations, involving patients, nurses, and healthcare professionals.


Characteristics of derivation and validation cohorts

The derivation cohort (n=4000) and the two validation cohorts (n=3557) comprised a total of 7557 participants with 1067 (14.1%) allograft failures after a median post-transplant follow-up time of 7.12 (interquartile range 3.51-8.77) years. The characteristics of the derivation and validation cohorts (overall, European, and US validation cohorts), as well as the transplant procedures, policies and allocation systems, are detailed in table 1 and supplementary tables B-D. The distribution of the time of the post-transplant risk evaluation is provided in supplementary figure A. The median time from kidney transplantation to post-transplant risk evaluation was 0.98 (0.27-1.07) years in the derivation cohort and 0.99 (0.18-1.04) years in the validation cohort. The median follow-up after transplantation was 7.65 (5.39-8.21) years in the derivation cohort. The cumulative numbers of graft losses in the derivation cohort were 332 at three years, 449 at five years, and 549 at seven years.

Table 1

Patients’ characteristics by cohort. Values are numbers (percentages) unless stated otherwise

Prediction of kidney allograft failure in derivation cohort

We first investigated the prognostic factors measured at the time of post-transplant risk evaluation that were associated with long term kidney allograft failure in a univariable analysis. These factors included recipient’s demographics, characteristics of transplant, allograft functional parameters, immunological parameters, and allograft histopathology (table 2). In the multivariable analysis, the following independent predictors of long term allograft failure were identified: time of post-transplant risk evaluation (P=0.005); allograft functional parameters, including eGFR (P<0.001) and proteinuria (logarithmic transformation, P<0.001); allograft histological parameters, including interstitial fibrosis and tubular atrophy (P=0.031), microcirculation inflammation defined by glomerulitis and peritubular capillaritis (P=0.001), interstitial inflammation and tubulitis (P=0.014), and transplant glomerulopathy (P=0.004); and recipient’s immunological profile as defined by the presence and concentration of the immunodominant circulating anti-HLA donor specific antibodies (P<0.001) (table 3). We used a Cox model stratified by centre to test the effect of centre. We obtained stratified estimates (with equal coefficients across centres but with a baseline hazard unique to each centre). We confirmed that the eight prognostic parameters identified in the primary analysis remained independently associated with allograft survival (supplementary table E).
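
The centre stratified model described above, with equal coefficients across centres and a baseline hazard unique to each centre, has the usual stratified Cox form. The notation is introduced here for illustration: $k$ indexes the centre, $K$ is the number of centres, $x$ is the vector of prognostic parameters, $\beta$ is the shared coefficient vector, and $h_{0k}(t)$ is the baseline hazard of centre $k$.

```latex
\[
  h_k(t \mid x) = h_{0k}(t)\,\exp\!\left(\beta^{\top} x\right),
  \qquad k = 1, \dots, K
\]
```

Because $\beta$ does not depend on $k$, the hazard ratio for each predictor is assumed to be the same in every centre, while differences in overall graft loss rates between centres are absorbed by $h_{0k}(t)$.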

Table 2

Factors assessed at time of post-transplant risk evaluation associated with kidney allograft failure in derivation cohort: univariable analysis

Table 3

Independent determinants of kidney allograft loss assessed at time of post-transplant risk evaluation in derivation cohort: multivariable analysis

We calculated the prognostic score, named iBox, for each patient according to the β regression coefficients estimated from the final multivariable Cox model. On the basis of this score, we built a ready to use online interface for the clinician to provide allograft survival estimates for individual patients. We also provide, in supplementary figure B, examples of clinical use of iBox risk prediction scoring in daily practice.

Prediction model performance in internal and external validation cohorts

We first internally validated the final multivariable model via a bootstrapping procedure with 1000 samples resampled from the original dataset of the derivation cohort (supplementary methods), which permitted the calculation of optimism corrected performance estimates. Models were fitted for each of the 1000 samples by using backwards elimination. The eight independent predictors identified in the final multivariable Cox model were replicated in more than 85% of the 1000 estimated models, which confirmed the robustness of the final model. We also confirmed the discrimination ability of the model at three, five, and seven years (C index 0.835 (95% confidence interval 0.813 to 0.856), 0.819 (0.799 to 0.839), and 0.808 (0.790 to 0.827), respectively); the optimism corrected C index obtained from bootstrap resampling was 0.831 (0.813 to 0.854), 0.816 (0.799 to 0.837), and 0.806 (0.790 to 0.827) at three, five, and seven years, respectively.

We then used several independent validation cohorts and confirmed the transportability of the iBox risk score in these geographically distinct cohorts. The cumulative number of allograft losses were 72 (3.4%), 155 (7.3%), and 206 (9.7%) in the European validation cohort and 73 (5.1%), 108 (7.6%), and 148 (10.4%) in the US validation cohort at three, five, and seven years after iBox risk evaluation.

Overall, we showed good discrimination performance in the external validation cohorts with a C statistic of 0.81 (95% bootstrap percentile confidence interval 0.78 to 0.84) in Europe and 0.80 (0.76 to 0.84) in the US. Visual inspection of the calibration plots showed good agreement between the iBox risk score predicted probabilities of allograft survival at three, five, and seven years after risk evaluation and actual kidney allograft survival (fig 1).
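
The bootstrap percentile confidence intervals quoted above can be illustrated with a generic sketch: resample patients with replacement, recompute the statistic, and take the empirical percentiles. The function name, defaults, and seed are assumptions made for this example; in the study's setting `data` would hold one record per patient and `statistic` would compute the C index.

```python
import random

def bootstrap_percentile_ci(data, statistic, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for statistic(data)."""
    rng = random.Random(seed)
    n = len(data)
    # Recompute the statistic on n_boot resamples drawn with replacement
    estimates = sorted(
        statistic([data[rng.randrange(n)] for _ in range(n)])
        for _ in range(n_boot)
    )
    lower = estimates[int((alpha / 2) * n_boot)]
    upper = estimates[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper

def mean(xs):
    return sum(xs) / len(xs)

# Hypothetical usage with a simple statistic
low, high = bootstrap_percentile_ci([float(v) for v in range(1, 101)], mean)
```

The interval width reflects sampling variability only; it does not account for differences in case mix or measurement practice between centres.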

Fig 1

Calibration plots at three, five, and seven years of iBox risk scores for validation cohorts: three year (A, B), five year (C, D), and seven year (E, F) predictions. Data are from European validation cohort (A, C, E) and US cohort (B, D, F). Vertical axis is observed proportion of grafts surviving at time of interest. Average predicted probability (predicted survival; x-axis) was plotted against Kaplan-Meier estimate (observed overall survival; y-axis). Black line represents perfectly calibrated model, and blue line represents optimism corrected iBox model

Effect of therapeutic interventions on iBox risk score

We applied the iBox risk score to patients with therapeutic interventions, including 844 kidney transplant recipients from the derivation cohort who received standard of care treatment for antibody mediated rejection, standard of care treatment for T cell mediated rejection, and calcineurin inhibitor weaning for calcineurin inhibitor toxicity with belatacept (characteristics, protocols, and treatment interventions detailed in supplementary table F). Overall, we found that the therapeutic interventions were associated with significant changes in the iBox risk scores (supplementary figure C). The iBox prediction capability after treatment was accurate in these three therapeutic scenarios (C index 0.81, 95% bootstrap percentile confidence interval 0.77 to 0.85). The calibration plot showed a good agreement between the iBox prediction model after therapeutic intervention and the actual observation of kidney allograft loss.

Performance of iBox risk prediction score in therapeutic randomised controlled clinical trials

We tested the performance of the iBox risk prediction score in three registered and published phase II and III clinical trials.181920 The details of the clinical trials including the population, intervention, clinical scenario, and follow-up times are presented in supplementary table A. We calculated the iBox risk prediction scores of all patients included in the trials and compared them with the actual allograft failures. The iBox risk prediction score applied in the three trials showed accurate discrimination overall (C index 0.87, 0.82 to 0.92). The calibration plot showed a good agreement between the risk prediction score based on predicted allograft loss and the actual observations of kidney allograft loss.

Sensitivity analyses

We did various sensitivity analyses to test the robustness and generalisability of the iBox risk score in different clinical scenarios and subpopulations.

iBox integrative risk prediction score using allograft monitoring (eGFR/proteinuria) parameters

We showed that the iBox risk score using the full model was superior in terms of prediction capability to a simplified iBox model including eGFR, proteinuria, and circulating anti-HLA DSA (C index 0.79, 0.77 to 0.81; P<0.001). This was further demonstrated by a continuous net reclassification improvement of 0.228 for the full iBox model compared with the simplified iBox model (95% confidence interval 0.174 to 0.290; P<0.001). To account for potentially different medico-economic contexts limiting the availability of allograft biopsies, we are providing a simplified iBox score based on functional-immunological parameters. The calibration plot showed a good agreement between allograft loss predicted by the simplified iBox model and the actual observations of kidney allograft loss.

Added value of iBox risk prediction score compared with previously reported risk scores

We did a systematic review (supplementary table G) and compared the iBox risk prediction score with previously published risk scores assessing long term allograft outcomes. This showed that the iBox prediction score outperformed other risk scores (supplementary table G).

Prediction model performance using histological diagnoses instead of Banff international classification histological lesion grading

When we included histological diagnoses in the multivariable model instead of histological lesions graded according to the international Banff classification, antibody mediated rejection (P<0.001), T cell mediated rejection (P=0.04), primary nephropathy recurrence (P=0.003), and BK virus nephropathy (P=0.05) showed significant and independent associations with allograft failure. In this model, the set of non-histological predictors of allograft failure identified in the primary analyses remained unchanged (hazard ratios are shown for each parameter in supplementary table H). The discrimination ability of the histological diagnosis based model showed a C index of 0.81 (0.79 to 0.83).

iBox performance when applied at time of clinically indicated biopsies versus protocol biopsies

We tested and confirmed the performance of the iBox risk prediction score when risk evaluation started at the time of clinically indicated allograft biopsies performed at any time after transplantation (n=1598; 40%), as well as at the time of one year protocol biopsies (n=2402; 60%) (table 4). Similarly, the iBox risk score showed accurate discrimination ability for long term allograft loss when risk evaluation started before one year post-transplant or after one year post-transplant (mean post-transplantation time of 0.89 (SD 0.23) years and 2.31 (1.66) years, respectively; table 4).

Table 4

iBox risk prediction score performance when assessed in different clinical scenarios and subpopulations

iBox risk score performance versus risk score based on parameters assessed at time of transplantation

When we tested the parameters assessed at time of transplantation (recipient’s age, recipient’s sex, donor’s age, donor’s sex, deceased donor, donor’s cause of death, donor’s diabetes, donor’s hypertension, expanded criteria donor, previous kidney transplant, HLA mismatches, and anti-HLA donor specific antibody), none of them remained independently associated with allograft survival after adjustment for post-transplant parameters assessed at the time of iBox risk evaluation. Similarly, when we added day 0 parameters to the multivariable model including risk factors evaluated post-transplantation, we saw no improvement in its discrimination ability. Lastly, when we ran the Cox model with these parameters assessed at the time of transplantation, the C index was 0.62 (0.593 to 0.643).

iBox assessed in other clinical scenarios and subpopulations

Finally, we confirmed the performance of the iBox risk prediction score when applied in different subpopulations and clinical scenarios including living and deceased donors, according to recipient’s ethnicity, in highly sensitised (high immunological risk) and non-highly sensitised (low immunological risk) recipients, and in patients receiving induction by anti-interleukin-2 receptor or anti-thymocyte globulin (table 4). When parameters assessed at the time of transplant (such as HLA mismatches), recipient blood pressure at the time of risk assessment (log scale), and calcineurin inhibitor trough blood concentration at the time of risk assessment were forced in the risk prediction score, we saw no significant improvement in its prognostic performance (table 4).


The iBox, a risk prediction score combining functional, histological, and immunological allograft parameters together with HLA antibody profiling, showed good performance in predicting the risk of long term kidney allograft failure. We confirmed the generalisability of the iBox risk prediction score by showing its external validity in six geographically distinct cohorts recruited in Europe and the US with distinct allocation systems, patients’ characteristics, and management practices. The iBox risk prediction score also showed its accuracy when measured at different times after transplantation, which permits updating of the score on the basis of new events that patients might encounter in their long term course. We also showed that the iBox risk prediction score outperformed other available risk scores applied in kidney transplant patients. Lastly, we confirmed the predictive accuracy of the risk score in the data reported from three published randomised therapeutic trials covering different clinical scenarios encountered after transplantation, further enhancing its value as a potential surrogate endpoint in transplantation.181920

Overall, the predictor variables used in the iBox risk prediction score are easily available after transplantation in most centres worldwide, making it feasible for implementation in routine clinical practice. The iBox risk prediction system assesses the risk at a given time point, but we have shown that it can be re-evaluated at different time points after transplantation, enabling clinicians to calculate a new risk that takes into account the updated values of eGFR, proteinuria, allograft scarring, allograft inflammation, damage, and presence and concentration of anti-HLA DSA. Therefore, we confirmed the iBox system’s transportability for additional and updated evaluations in the patient’s long term course. To account for different potential medico-economic contexts limiting the availability of allograft biopsies, we also provide an abbreviated iBox score based on clinical-functional-immunological parameters.

Comparison with other prognostic scores

Current prognostic scores implemented in clinical practice in transplant medicine mostly predict allograft survival at the time of transplantation; thus, their use is limited to allograft allocation because they do not inform post-transplant clinical decision making and monitoring of patients.27 The few attempts to develop post-transplant prognostic scores have failed to provide useful tools for transplant clinicians. According to a systematic review without date restrictions for publications up to 28 September 2018, for allograft survival scoring systems among kidney transplant recipients (see supplementary table G), no study has developed and externally validated a post-transplant prognostic score usable at any time after transplantation that shows accuracy in clinical trials. The main limitations to achieving a robust and validated scoring system depend on multiple factors including the insufficient data quality of the previously studied cohorts and the fact that no registry or database system has been primarily designed to tackle the specific aspect of prognostication. An even more important aspect is external validation in different populations, which prompted us to conduct a large external validation in multiple centres worldwide. 
Despite some expected loss of discriminative performance, models are typically considered useful for clinical decision making when the C statistic is greater than 0.70 and strong when the C statistic exceeds 0.80, suggesting that the iBox risk prediction score could support decision making.28 For prognostication systems in other fields such as oncology (for example, locally advanced pancreatic cancer and metastatic colonic cancer), the C index is typically closer to 0.60 or 0.70.29 Taken together, these results confirm not only the robustness and validity of the iBox risk prediction score but also its generalisability to other transplant cohorts with different kidney allocation systems, donor and recipient profiles, and distinct patient management and healthcare environments.
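To make the C statistic thresholds above concrete, the following is a minimal sketch of how a concordance index of the Harrell type can be computed for right censored survival data. It is an illustration only, not the code used in this study: the function name `harrell_c_index`, the toy data, and the simplified handling of tied follow-up times (such pairs are skipped) are assumptions.

```python
from itertools import combinations


def harrell_c_index(times, events, risk_scores):
    """Concordance index for right censored data (illustrative sketch).

    times:       follow-up time for each patient
    events:      1 if graft failure was observed, 0 if censored
    risk_scores: higher value means higher predicted risk
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that patient `a` has the shorter follow-up.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # Assumption: pairs with tied times are skipped. A pair is usable
        # only if the shorter follow-up ended in an observed failure.
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0   # earlier failure had the higher predicted risk
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5   # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data (hypothetical): four patients, one censored at time 6.
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]
scores = [0.9, 0.1, 0.5, 0.7]
print(harrell_c_index(times, events, scores))  # 0.6
```

A value of 0.5 corresponds to chance level ranking and 1.0 to perfect ranking, which is why values above 0.70 and 0.80 are read as useful and strong discrimination, respectively.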

Strengths of study

In this study, we have shown that the iBox risk prediction score outperformed the current gold standard (eGFR and proteinuria) for the monitoring of kidney recipients. In particular, compared with previous attempts at developing a prognostication system, we found that allograft histological lesions such as microcirculation inflammation, interstitial inflammation-tubulitis (reflecting active rejection process) and atrophy-fibrosis, and transplant glomerulopathy (reflecting chronic allograft damage), in addition to measuring allograft functional parameters and recipient antibody profiles, improved the overall discrimination capacity of the model and that a multidimensional risk prediction score performs better than its individual components. This risk prediction score reflects the main patterns of allograft deterioration leading to failure, represented by alloimmune processes and allograft scarring.30 Two other prognostic scores have attempted to combine several transplant diagnostic dimensions, including allograft function and pathology and alloantibodies; however, these scores were outperformed by the iBox risk prediction score.1631

Importantly, our results and the parameters included in the final model reinforce the potential of the iBox to be implemented into contemporary clinical practice by using automated approaches within electronic medical record systems (an online electronic risk calculator is provided at, and examples are provided in supplementary figure B).

In addition, the combination of major drivers of allograft failure in the iBox risk prediction score allowed us to evaluate the early effect of clinical interventions on long term allograft outcomes. In this study, we tested and validated the iBox risk prediction score in the setting of therapeutic clinical trials covering different clinical scenarios and showed accurate performance overall. We found that the prediction of allograft failure assessed by the iBox score accurately fits with the actual graft failures observed in these trials at five years after risk evaluation. Importantly, the accuracy of the iBox risk prediction score was conserved regardless of the therapeutic intervention and population in those trials, with accurate performance in the Certitem (NCT01079143) calcineurin inhibitor minimisation trial and in rejection treatment trials (EudraCT 2007-003213-13; NCT01873157).181920 This finding reinforced the potential of the iBox risk prediction score for defining a valid surrogate endpoint. In our study, a well validated, strong, and robust association existed between the surrogate endpoint and the true endpoint, and this association was consistent across different treatment settings. Finally, because the criteria for defining a surrogate endpoint also include the capacity of a surrogate to be modified by therapeutics, we tested the iBox across three prototypic therapeutic interventions and showed that the iBox score was significantly modified by these therapeutic interventions and showed good performance in this setting as well. Thus, the iBox risk prediction score fulfils all the Prentice criteria for a satisfactory surrogate endpoint.1732
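The comparison between predicted and observed graft failures described above can be sketched as follows. This is one common way to check agreement at a fixed horizon, assuming the observed failure probability is estimated with the Kaplan-Meier method; it is not presented as the procedure used in this study. The function names, the horizon, and the toy data are hypothetical.

```python
def kaplan_meier_failure(times, events, horizon):
    """Observed cumulative failure probability at `horizon` (1 - KM survival)."""
    survival = 1.0
    at_risk = len(times)
    for t in sorted(set(times)):
        if t > horizon:
            break
        failures = sum(1 for ti, ei in zip(times, events) if ti == t and ei)
        leaving = sum(1 for ti in times if ti == t)  # failures plus censored at t
        if failures:
            survival *= 1.0 - failures / at_risk
        at_risk -= leaving
    return 1.0 - survival


def calibration_gap(predicted_risks, times, events, horizon):
    """Return (mean predicted risk, observed risk, difference) at `horizon`."""
    mean_predicted = sum(predicted_risks) / len(predicted_risks)
    observed = kaplan_meier_failure(times, events, horizon)
    return mean_predicted, observed, mean_predicted - observed


# Toy data (hypothetical): follow-up in years, 1 = graft failure, 0 = censored.
times = [1, 2, 3, 6, 7]
events = [1, 0, 1, 0, 1]
predicted = [0.4, 0.5, 0.6, 0.4, 0.6]  # hypothetical predicted five year risks
print(calibration_gap(predicted, times, events, horizon=5))
```

A difference close to zero indicates that the average predicted risk matches the observed failure rate in that group; in practice the same comparison is usually repeated within risk strata rather than for the whole cohort only.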

As a development perspective, implementation of patient reported experience data would probably be very relevant in future, so that quality of life predictions can complement those on graft survival, around indicators such as the experience of treatments, the relationship with the transplant doctor, adherence to the therapeutic strategy, engagement, participation in decisions, fatigue, anxiety, depression, and so on. This would imply that other sources of data can be mobilised, from collections made from the patients themselves.

Limitations of study

Regarding the limitations of this study, we acknowledge that statistical significance as a criterion to select variables may not be ideal as it may exclude confounding factors. However, the multiple external validations performed consistently confirm the robustness of our final model. Emerging post-transplant predictors might also be missing in our model. Despite the already high performance achieved by the iBox risk prediction score, future studies should evaluate the added value of new non-invasive biomarkers or genetic factors in addition to those currently reported regarding discriminative capability, generalisability, and overcoming the need for an invasive procedure (kidney allograft biopsy). Although intragraft gene measurements may improve diagnostic accuracy in T cell mediated rejection and antibody mediated rejection, their additive value for allograft survival compared with classical prognostic factors has not yet been demonstrated in large unselected populations.

Another limitation is that information on the adherence to drug treatment of individual patients was lacking in our dataset. Although non-adherence is inherently difficult to capture, especially at a population level,30 the iBox score, because of its mechanistically informed design, could likely capture the consequences of non-adherence (development of de novo donor specific anti-HLA antibodies, allograft injury, scarring, inflammation, and diminished glomerular filtration rate).

Although the iBox risk prediction score was primarily generated using a large, prospective, unselected cohort, a prospective validation of the iBox in daily clinical practice remains desirable. Finally, despite the validation of the iBox risk prediction score in an interventional setting, future trials are needed to determine whether a strategy based on a systematic risk evaluation compared with an empirical approach might improve clinical management.


We have developed and validated a risk prediction score that accurately predicts allograft failure after kidney transplantation. We have shown its generalisability and transportability across centres in Europe and the US and its performance in therapeutic clinical trials. The risk prediction score provides an accurate but simple strategy that can be easily implemented to stratify patients into clinically meaningful risk groups and that can be time updated after transplant, which may help to guide monitoring of patients in everyday practice and upgrade the shared decision making process. Lastly, as the risk score fulfils the Prentice criteria, it may represent a valid surrogate endpoint that could open avenues for improving the design of clinical trials and development of drugs in transplantation.

What is already known on this topic

  • The transplant field lacks robust studies specifically designed for prediction of risk of long term allograft failure

  • Existing studies do not integrate a large spectrum of prognostic factors and validate scoring systems in multiple large cohorts worldwide with different transplant allocation systems

  • This represents a serious limitation for further improving patient care and drug development

What this study adds

  • This is the first international study of risk prediction in kidney transplant recipients, developed and validated across several large independent populations and in randomised controlled clinical trials

  • The iBox score represents a novel integration of demographic, functional, histological, and immunological factors that can be implemented in routine clinical practice

  • It has potential to upgrade the shared decision making process for transplant patients and represents a valid and early surrogate endpoint for clinical trials and drug development in transplantation


We thank the participants who made this study possible, the advocacy groups that participated in evaluating the iBox, and the two patients’ associations Renaloo and France REIN, which gave feedback on the applications of the risk score system and its translation to patient care. We plan to share the results with the wider community and our participants.


  • Contributors: AL and OA designed the study, did the data analysis, and wrote the manuscript. BO, MN, AMJ, DV, CLegendre, MG, NK, OT, EM, MD, DK, AH, ER, EB, FE, GB, GD, MR, GG, DG, JPE, XJ, RAM, MDS, DLS, and CLefaucheur contributed to data acquisition and interpretation. AL, OA, BO, MN, YB, DV, DLS, DG, CLegendre, JPE, and XJ interpreted the data. BO, MN, and YB participated in data interpretation and critically reviewed the manuscript. All authors revised the manuscript for important intellectual content. The corresponding author attests that all listed authors meet authorship criteria and that no others meeting the criteria have been omitted. AL, OA, BO, and MN contributed equally as first authors. XJ, DLS, and CLefaucheur contributed equally as last authors. AL is the guarantor.

  • Funding: INSERM–Action thématique incitative sur programme Avenir (ATIP-Avenir) provided financial support; OA received a grant from the Fondation Bettencourt Schueller; MN received grants from the Research Foundation, Flanders (FWO; IWT.150199), the Flanders Innovation and Entrepreneurship of the Flemish government (IWT.130758), and the Clinical Research Foundation of the University Hospitals Leuven.

  • Competing interests: All authors have completed the ICMJE uniform disclosure form at (available on request from the corresponding author) and declare: support as detailed above for the submitted work; AL holds shares in Cibiltech, a company that develops software; no other relationships or activities that could appear to have influenced the submitted work.

  • Ethical approval: Each patient from the Paris Transplant Group cohort provided written informed consent to be included in the Paris Transplant Group database. This database has been approved by the National French Commission for Bioinformatics, Data, and Patient Liberty: CNIL registration number: 363505, validated 3 April 1996. The institutional review boards of the Paris Transplant Group participating centres approved the study.

  • Data sharing: Technical appendix is available from the corresponding author at

  • Transparency: The lead author (the manuscript's guarantor) affirms that the manuscript is an honest, accurate, and transparent account of the study being reported; that no important aspects of the study have been omitted; and that any discrepancies from the study as planned (and, if relevant, registered) have been explained.

This is an Open Access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited and the use is non-commercial. See: