BMJ 1994;309:188 (16 July)

Education and debate

Statistics Notes: Diagnostic tests 3: receiver operating characteristic plots

D G Altman, J M Bland 

Medical Statistics Laboratory, Imperial Cancer Research Fund, London WC2A 3PX; Department of Public Health Sciences, St George's Hospital Medical School, London SW17 1RE.

We have previously considered diagnosis based on tests that give a yes or no answer.1,2 Many diagnostic tests, however, are quantitative, notably in clinical chemistry. The same statistical approach can be used only if we can select a cut off point to distinguish "normal" from "abnormal," which is not a trivial problem. Firstly, we can investigate to what extent the test results differ among people who do or do not have the diagnosis of interest. The receiver operating characteristic (ROC) plot is one way to do this. These plots were developed in the 1950s for evaluating radar signal detection. Only recently have they become commonly used in medicine.

We assume that high values are more likely among those dubbed "abnormal." Figure 1 shows the values of an index of mixed epidermal cell lymphocyte reactions in bone marrow transplant recipients who did or did not develop graft versus host disease.3 The usefulness of the test for predicting graft versus host disease will clearly relate to the degree of non-overlap between the two distributions.

FIG 1 - Distribution of values of an index of mixed epidermal cell lymphocyte reactions in patients who did or did not develop graft versus host disease3

A receiver operating characteristic plot is obtained by taking each observed data value in turn as the cut off point, calculating the corresponding sensitivity and specificity, and plotting sensitivity against 1 - specificity, as in Figure 2. A test that perfectly discriminates between the two groups would yield a "curve" that coincided with the left and top sides of the plot. A test that is completely useless would give a straight line from the bottom left corner to the top right corner. In practice there is virtually always some overlap of the values in the two groups, so the curve will lie somewhere between these extremes.
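The construction of the plot can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `roc_points` is hypothetical, a result is called "abnormal" when it is at or above the cut off point (matching the assumption that high values indicate disease), and the data are invented rather than taken from Figure 1.

```python
def roc_points(diseased, healthy):
    """Return (1 - specificity, sensitivity) pairs, one per candidate cut off.

    Assumption: a value is classed as positive when it is >= the cut off.
    """
    # Every observed value is tried as a cut off, from highest to lowest.
    cutoffs = sorted(set(diseased) | set(healthy), reverse=True)
    # A cut off above all values calls nobody positive: bottom left corner.
    points = [(0.0, 0.0)]
    for c in cutoffs:
        sensitivity = sum(v >= c for v in diseased) / len(diseased)
        false_positive_rate = sum(v >= c for v in healthy) / len(healthy)
        points.append((false_positive_rate, sensitivity))
    return points


# Invented example values, not the data of Figure 1.
with_disease = [3, 4, 5]
without_disease = [1, 2, 3]
for x, y in roc_points(with_disease, without_disease):
    print(f"1 - specificity = {x:.2f}, sensitivity = {y:.2f}")
```

With completely separated groups every point returned lies on the left or top side of the plot, as described for a perfectly discriminating test.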

FIG 2 - Receiver operating characteristic curve for the data shown in fig 1

A global assessment of the performance of the test (sometimes called diagnostic accuracy4) is given by the area under the receiver operating characteristic curve. This area is equal to the probability that a random person with the disease has a higher value of the measurement than a random person without the disease. (This probability is a half for an uninformative test - equivalent to tossing a coin.)
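The probability interpretation of the area can be checked directly by comparing every person with the disease against every person without it. The sketch below is illustrative only: the name `auc_by_pairs` is hypothetical, and counting a tied pair as half is a common convention that we assume here, since the text does not say how ties are handled.

```python
def auc_by_pairs(diseased, healthy):
    """Estimate the area under the ROC curve as the proportion of
    (diseased, healthy) pairs in which the diseased value is higher.

    Assumption: a tied pair contributes one half.
    """
    score = 0.0
    for d in diseased:
        for h in healthy:
            if d > h:
                score += 1.0
            elif d == h:
                score += 0.5
    return score / (len(diseased) * len(healthy))


# Invented values for illustration only.
print(auc_by_pairs([3, 4, 5], [1, 2, 3]))  # overlapping groups
print(auc_by_pairs([4, 5], [1, 2]))        # complete separation
print(auc_by_pairs([1, 2], [1, 2]))        # identical distributions
```

Complete separation gives an area of 1, and identical distributions give a half, the coin tossing value mentioned above.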

No test will be clinically useful if it cannot discriminate,4 so a global assessment of discriminatory power is an important step. Having determined that a test does provide good discrimination, we can choose the best cut off point for clinical use. This requires the choice of a particular point, and is thus a local assessment. The simple approach of minimising "errors" (equivalent to maximising the sum of the sensitivity and specificity) is not necessarily best. Consideration needs to be given to the costs (not just financial) of false negative and false positive diagnoses and to the prevalence of the disease in the subjects being tested.4 For example, when screening the general population for cancer the cut off point would be chosen to ensure that most cases were detected (high sensitivity) at the cost of many false positives (low specificity), who could then be eliminated by a further test.
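One way to make the role of costs and prevalence concrete is to score each candidate cut off point by an expected cost per person tested. The sketch below is an assumption made for illustration, not a method given in this note: the function `best_cutoff`, its parameters `prevalence`, `cost_fn` and `cost_fp`, and the data are all hypothetical. With a prevalence of a half and equal costs, minimising this expected cost is the same as maximising the sum of sensitivity and specificity.

```python
def best_cutoff(diseased, healthy, prevalence=0.5, cost_fn=1.0, cost_fp=1.0):
    """Return (cutoff, expected_cost, sensitivity, specificity) for the
    cut off with the lowest expected cost per person tested.

    Assumptions: a value is positive when it is >= the cut off, and
    expected cost = prevalence * cost_fn * (1 - sensitivity)
                  + (1 - prevalence) * cost_fp * (1 - specificity).
    """
    best = None
    for c in sorted(set(diseased) | set(healthy)):
        sensitivity = sum(v >= c for v in diseased) / len(diseased)
        specificity = sum(v < c for v in healthy) / len(healthy)
        cost = (prevalence * cost_fn * (1 - sensitivity)
                + (1 - prevalence) * cost_fp * (1 - specificity))
        if best is None or cost < best[1]:
            best = (c, cost, sensitivity, specificity)
    return best


# Invented values for illustration only.
with_disease = [2, 5, 6, 7]
without_disease = [1, 2, 3, 4]

# Equal costs, prevalence one half: maximises sensitivity + specificity.
print(best_cutoff(with_disease, without_disease))

# Screening-like setting: rare disease, missed cases judged far more costly.
print(best_cutoff(with_disease, without_disease, prevalence=0.1, cost_fn=50.0))
```

In the second call the chosen cut off point moves down, trading specificity for sensitivity, which is the behaviour described for population screening.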

A receiver operating characteristic plot is particularly useful when comparing two or more measures. A test with a curve that lies wholly above the curve of another will be clearly better. Methods for comparing the areas under two curves for both paired and unpaired data are reviewed by Zweig and Campbell,4 who give a full assessment of this method.

  1. Altman DG, Bland M. Diagnostic tests 1: sensitivity and specificity. BMJ 1994;308:1552.
  2. Altman DG, Bland M. Diagnostic tests 2: predictive values. BMJ 1994;309:102.
  3. Bagot M, Mary J-Y, Heslan M, et al. The mixed epidermal cell lymphocyte-reaction is the most predictive factor of acute graft-versus-host disease in bone marrow graft recipients. Br J Haematol 1988;70:403-9.
  4. Zweig MH, Campbell G. Receiver-operating characteristic (ROC) plots: a fundamental evaluation tool in clinical medicine. Clin Chem 1993;39:561-77.
