BMJ 1994;308:1552 (11 June)

General practice

Statistics Notes: Diagnostic tests 1: sensitivity and specificity

D G Altman, J M Bland 

Medical Statistics Laboratory, Imperial Cancer Research Fund, London WC2A 3PX; Department of Public Health Sciences, St George's Hospital Medical School, London SW17 0RE.

The simplest diagnostic test is one where the results of an investigation, such as an x ray examination or biopsy, are used to classify patients into two groups according to the presence or absence of a symptom or sign. For example, the table shows the relation between the results of a test, a liver scan, and the correct diagnosis based on either necropsy, biopsy, or surgical inspection [1]. How good is the liver scan at diagnosis of abnormal pathology?


Relation between results of liver scan and correct diagnosis [1]
-----------------------------------------------------------
                                Pathology
              ---------------------------------------------
                Abnormal         Normal
Liver scan        (+)             (-)          Total
-----------------------------------------------------------
Abnormal(+)       231              32            263
Normal(-)          27              54             81
-----------------------------------------------------------
Total             258              86            344

One approach is to calculate the proportions of patients with abnormal and normal pathology who are correctly "diagnosed" by the scan. The terms positive and negative are used to refer to the presence or absence of the condition of interest, here abnormal pathology. Thus there are 258 true positives and 86 true negatives. The proportions of these two groups that were correctly diagnosed by the scan were 231/258=0.90 and 54/86=0.63 respectively. These two proportions have confusingly similar names.

Sensitivity is the proportion of true positives that are correctly identified by the test.

Specificity is the proportion of true negatives that are correctly identified by the test.

We can thus say that, based on the sample studied, we would expect 90% of patients with abnormal pathology to have abnormal (positive) liver scans, while 63% of those with normal pathology would have normal (negative) liver scans.
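
The arithmetic behind these two figures can be sketched in a few lines of Python. This is only an illustration using the counts from the table; the function and variable names are hypothetical and are not part of the original note.

```python
# Counts taken from the liver scan table
# (rows: scan result, columns: pathology).
scan_pos_path_abnormal = 231  # abnormal scan, abnormal pathology
scan_pos_path_normal = 32     # abnormal scan, normal pathology
scan_neg_path_abnormal = 27   # normal scan, abnormal pathology
scan_neg_path_normal = 54     # normal scan, normal pathology


def sensitivity(test_pos_with_condition, test_neg_with_condition):
    """Proportion of patients with the condition who test positive."""
    total_with_condition = test_pos_with_condition + test_neg_with_condition
    return test_pos_with_condition / total_with_condition


def specificity(test_neg_without_condition, test_pos_without_condition):
    """Proportion of patients without the condition who test negative."""
    total_without_condition = (
        test_neg_without_condition + test_pos_without_condition
    )
    return test_neg_without_condition / total_without_condition


sens = sensitivity(scan_pos_path_abnormal, scan_neg_path_abnormal)  # 231/258
spec = specificity(scan_neg_path_normal, scan_pos_path_normal)      # 54/86

print(f"Sensitivity: {sens:.2f}")  # prints "Sensitivity: 0.90"
print(f"Specificity: {spec:.2f}")  # prints "Specificity: 0.63"
```

Note that both denominators are column totals of the table (258 and 86), that is, the groups defined by pathology rather than by scan result.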

The sensitivity and specificity are proportions, so confidence intervals can be calculated for them using standard methods for proportions [2].
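
The note does not say which method to use, so the following Python sketch assumes the simple normal approximation for a single proportion, with 1.96 as the multiplier for a 95% interval. The function name `proportion_ci` is hypothetical. Other methods (for example the Wilson score interval) may be preferable when the sample is small or the proportion is close to 0 or 1.

```python
import math


def proportion_ci(successes, total, z=1.96):
    """Approximate confidence interval for a proportion.

    Assumption: normal (Wald) approximation, p +/- z * sqrt(p(1-p)/n).
    The default z=1.96 gives an approximate 95% interval.
    """
    p = successes / total
    standard_error = math.sqrt(p * (1 - p) / total)
    return p - z * standard_error, p + z * standard_error


# Sensitivity: 231 of the 258 patients with abnormal pathology
sens_low, sens_high = proportion_ci(231, 258)

# Specificity: 54 of the 86 patients with normal pathology
spec_low, spec_high = proportion_ci(54, 86)

print(f"Sensitivity 95% CI: {sens_low:.2f} to {sens_high:.2f}")
print(f"Specificity 95% CI: {spec_low:.2f} to {spec_high:.2f}")
```

Under this approximation the interval for sensitivity is roughly 0.86 to 0.93 and that for specificity roughly 0.53 to 0.73. The interval for specificity is wider because it rests on only 86 patients.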

Sensitivity and specificity are one approach to quantifying the diagnostic ability of the test. In clinical practice, however, the test result is all that is known, so we want to know how good the test is at predicting abnormality. In other words, what proportion of patients with abnormal test results are truly abnormal? This question is addressed in a subsequent note.

  1. Drum DE, Christacapoulos JS. Hepatic scintigraphy in clinical decision making. J Nucl Med 1972;13:908-15.
  2. Gardner MJ, Altman DG. Calculating confidence intervals for proportions and their differences. In: Gardner MJ, Altman DG, eds. Statistics with confidence. London: BMJ Publishing Group, 1989:28-33.
