Assessing the discriminative ability of risk models for more than two outcome categories

European Journal of Epidemiology

Abstract

The discriminative ability of risk models for dichotomous outcomes is often evaluated with the concordance index (c-index). However, many medical prediction problems are polytomous, meaning that more than two outcome categories need to be predicted. Unfortunately, such problems are often dichotomized in prediction research. We present a perspective on the evaluation of the discriminative ability of polytomous risk models, which may encourage researchers to consider polytomous prediction models more often. First, we suggest a “discrimination plot” as a tool to visualize the model’s discriminative ability. Second, we discuss the use of one overall polytomous c-index versus a set of dichotomous measures to summarize the performance of the model. Third, we address several aspects to consider when constructing a polytomous c-index. These involve the assessment of concordance in pairs versus sets of patients, weighting by outcome prevalence, the value obtained for a model with random performance, the reduction to the dichotomous c-index for dichotomous problems, and interpretation. We illustrate these issues with case studies on ovarian cancer (four outcome categories) and testicular cancer (three categories). We recommend the use of a discrimination plot together with an overall c-index such as the Polytomous Discrimination Index. If the overall c-index suggests that the model has relevant discriminative ability, pairwise c-indexes for each pair of outcome categories are informative. For pairwise c-indexes we recommend the ‘conditional-risk’ method, which is consistent with the analytical approach of multinomial logistic regression, the method used to develop polytomous risk models.
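
As an illustration of the measures discussed in this abstract, the following minimal Python sketch computes the dichotomous c-index, a pairwise c-index using the conditional-risk approach, and a set-based Polytomous Discrimination Index (PDI). It assumes outcomes coded 0 to k-1, that each row of `probs` holds a model's predicted probabilities for the k categories, and that every category is observed at least once. The function names are hypothetical, and giving tied risks fractional credit is an assumption. The PDI here follows the common set-based definition: for each category i, take the proportion of sets of k patients (one per category) in which the patient from category i has the highest predicted risk of category i, then average over categories. This is not the authors' own software. All sets are enumerated by brute force, so the cost grows with the product of the category sizes and the sketch only suits small examples.

```python
from itertools import product
from statistics import mean


def c_index(risk_events, risk_nonevents):
    """Dichotomous c-index: share of (event, non-event) pairs in which the
    event patient has the higher predicted risk; tied pairs count 1/2."""
    score = 0.0
    for r_e, r_n in product(risk_events, risk_nonevents):
        if r_e > r_n:
            score += 1.0
        elif r_e == r_n:
            score += 0.5
    return score / (len(risk_events) * len(risk_nonevents))


def pairwise_conditional_c(probs, outcomes, i, j):
    """Pairwise c-index for category i versus j, restricted to patients from
    those two categories and based on the conditional risk p_i / (p_i + p_j)."""
    def conditional_risk(p):
        return p[i] / (p[i] + p[j])

    in_i = [conditional_risk(p) for p, y in zip(probs, outcomes) if y == i]
    in_j = [conditional_risk(p) for p, y in zip(probs, outcomes) if y == j]
    return c_index(in_i, in_j)


def pdi(probs, outcomes):
    """Polytomous Discrimination Index by brute-force enumeration of all sets
    containing one patient from each of the k outcome categories."""
    k = len(probs[0])
    groups = [[p for p, y in zip(probs, outcomes) if y == c] for c in range(k)]
    credit = [0.0] * k
    n_sets = 0
    for patient_set in product(*groups):  # patient_set[c] comes from category c
        n_sets += 1
        for i in range(k):
            risks_i = [p[i] for p in patient_set]  # everyone's risk of category i
            top = max(risks_i)
            if risks_i[i] == top:
                # Concordant for category i; ties at the maximum share the credit.
                credit[i] += 1.0 / risks_i.count(top)
    # Average the per-category concordance proportions over the k categories.
    return mean(c / n_sets for c in credit)


probs = [(0.6, 0.3, 0.1), (0.2, 0.5, 0.3), (0.1, 0.2, 0.7), (0.3, 0.6, 0.1)]
outcomes = [0, 1, 2, 0]
print(round(pdi(probs, outcomes), 3))                 # 0.833
print(pairwise_conditional_c(probs, outcomes, 0, 1))  # 1.0
```

For k = 2 this PDI reduces to the ordinary c-index, and when the predictions carry no information about the outcome its expected value is 1/k rather than 0.5. The conditional risk p_i / (p_i + p_j) is a monotone transformation of log(p_i / p_j), which is linear in the predictors for the i-versus-j contrast in a multinomial logistic regression model.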

Acknowledgments

This work was supported by the Research Foundation—Flanders (FWO) (grants 1.2516.09 N, 1.2516.12 N, G.0493.12 N), Agency for Innovation by Science and Technology (IWT Vlaanderen) (grant TBM070706-IOTA3), the Research Council of the KU Leuven, and the Netherlands Organization for Scientific Research (grant 9120.8004). Ben Van Calster is a postdoctoral fellow of the Research Foundation—Flanders (FWO). Vanya Van Belle is supported by a postdoctoral fellowship from KU Leuven’s Special Research Fund (BOF) and a postdoctoral fellowship from FWO.

Conflict of interest

The authors declare that they have no conflict of interest.

Author information

Correspondence to Ben Van Calster.

About this article

Cite this article

Van Calster, B., Vergouwe, Y., Looman, C.W.N. et al. Assessing the discriminative ability of risk models for more than two outcome categories. Eur J Epidemiol 27, 761–770 (2012). https://doi.org/10.1007/s10654-012-9733-3

  • DOI: https://doi.org/10.1007/s10654-012-9733-3
