Statistics Notes: Diagnostic tests 1: sensitivity and specificity
BMJ 1994;308:1552 (Published 11 June 1994) doi: https://doi.org/10.1136/bmj.308.6943.1552
- D G Altman, Medical Statistics Laboratory, Imperial Cancer Research Fund, London WC2A 3PX
- J M Bland, Department of Public Health Sciences, St George's Hospital Medical School, London SW17 0RE
The simplest diagnostic test is one where the results of an investigation, such as an x ray examination or biopsy, are used to classify patients into two groups according to the presence or absence of a symptom or sign. For example, the table shows the relation between the results of a test, a liver scan, and the correct diagnosis based on either necropsy, biopsy, or surgical inspection.1 How good is the liver scan at diagnosis of abnormal pathology?
One approach is to calculate the proportions of patients with normal and abnormal pathology who are correctly “diagnosed” by the scan. The terms positive and negative are used to refer to the presence or absence of the condition of interest, here abnormal pathology. Thus there are 258 true positives (patients with abnormal pathology) and 86 true negatives (patients with normal pathology). The proportions of these two groups that were correctly diagnosed by the scan were 231/258=0.90 and 54/86=0.63 respectively. These two proportions have confusingly similar names.
Sensitivity is the proportion of true positives that are correctly identified by the test.
Specificity is the proportion of true negatives that are correctly identified by the test.
We can thus say that, based on the sample studied, we would expect 90% of patients with abnormal pathology to have abnormal (positive) liver scans, while 63% of those with normal pathology would have normal (negative) liver scans.
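The arithmetic behind these two figures can be sketched in a few lines of Python. This is only an illustration: the class and field names are hypothetical, the four counts are those quoted in the text, and the numbers of misclassified patients are obtained by subtraction rather than read from the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScanResults:
    """Liver scan counts, grouped by the reference (correct) diagnosis."""

    abnormal_pathology_total: int     # patients with abnormal pathology
    abnormal_pathology_scan_pos: int  # of those, abnormal (positive) scans
    normal_pathology_total: int       # patients with normal pathology
    normal_pathology_scan_neg: int    # of those, normal (negative) scans

    def sensitivity(self) -> float:
        """Proportion of patients with the condition whom the test flags."""
        return self.abnormal_pathology_scan_pos / self.abnormal_pathology_total

    def specificity(self) -> float:
        """Proportion of patients without the condition whom the test clears."""
        return self.normal_pathology_scan_neg / self.normal_pathology_total

    def missed_cases(self) -> int:
        """Abnormal pathology but normal scan (derived by subtraction)."""
        return self.abnormal_pathology_total - self.abnormal_pathology_scan_pos

    def false_alarms(self) -> int:
        """Normal pathology but abnormal scan (derived by subtraction)."""
        return self.normal_pathology_total - self.normal_pathology_scan_neg


# Counts quoted in the text: 231 of 258 and 54 of 86.
liver_scan = ScanResults(258, 231, 86, 54)

print(f"sensitivity = {liver_scan.sensitivity():.2f}")  # 0.90
print(f"specificity = {liver_scan.specificity():.2f}")  # 0.63
```

Both quantities use the number of patients in a pathology group as the denominator, not the number with a given scan result.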
The sensitivity and specificity are proportions, so confidence intervals can be calculated for them using standard methods for proportions.2
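As an illustration, one such interval can be sketched as follows. The note does not say which method reference 2 recommends, so the choice of the Wilson score interval here is an assumption, as are the function name and the 95% level.

```python
from math import sqrt
from statistics import NormalDist
from typing import Tuple


def wilson_interval(successes: int, total: int, level: float = 0.95) -> Tuple[float, float]:
    """Wilson score confidence interval for a single proportion.

    Assumption: Wilson is used as one common method for proportions;
    the source does not specify a particular method.
    """
    if total <= 0 or not 0 <= successes <= total:
        raise ValueError("need total > 0 and 0 <= successes <= total")
    if not 0 < level < 1:
        raise ValueError("level must lie strictly between 0 and 1")

    # Two-sided normal quantile, about 1.96 for a 95% interval.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    p = successes / total

    denom = 1 + z * z / total
    centre = (p + z * z / (2 * total)) / denom
    half_width = z * sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return centre - half_width, centre + half_width


# Counts quoted in the text for the liver scan example.
sens_low, sens_high = wilson_interval(231, 258)  # sensitivity, 231/258
spec_low, spec_high = wilson_interval(54, 86)    # specificity, 54/86

print(f"sensitivity 95% CI: {sens_low:.2f} to {sens_high:.2f}")
print(f"specificity 95% CI: {spec_low:.2f} to {spec_high:.2f}")
```

Under these assumptions the interval for specificity is noticeably wider than that for sensitivity, because it rests on 86 patients rather than 258.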
Sensitivity and specificity are one approach to quantifying the diagnostic ability of the test. In clinical practice, however, the test result is all that is known, so we want to know how good the test is at predicting abnormality. In other words, what proportion of patients with abnormal test results are truly abnormal? This question is addressed in a subsequent note.