This article has a correction
- Trisha Greenhalgh, senior lecturer (email@example.com)a
- a Unit for Evidence-Based Practice and Policy, Department of Primary Care and Population Sciences, University College London Medical School/Royal Free Hospital School of Medicine, Whittington Hospital, London N19 5NF
If you are new to the concept of validating diagnostic tests, the following example may help you. Ten men are awaiting trial for murder. Only three of them actually committed a murder; the seven others are innocent of any crime. A jury hears each case and finds six of the men guilty of murder. Two of the convicted are true murderers. Four men are wrongly imprisoned. One murderer walks free.
This information can be expressed in what is known as a two by two table (table 1). Note that the “truth” (whether or not the men really committed a murder) is expressed along the horizontal title row, whereas the jury's verdict (which may or may not reflect the truth) is expressed down the vertical column.
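The counts in the story fully determine the table. As a sketch (the layout mirrors table 1, with the truth across the columns and the verdict down the rows; the labels are mine):

```python
# Two by two table from the jury example: 10 men, 3 of them true murderers.
# Keys are (jury's verdict, the truth); values are counts of men.
table = {
    ("guilty", "murderer"): 2,       # true positives: convicted murderers
    ("guilty", "innocent"): 4,       # false positives: wrongly imprisoned
    ("not guilty", "murderer"): 1,   # false negative: the murderer who walks free
    ("not guilty", "innocent"): 3,   # true negatives: correctly acquitted
}

# Print the table with row and column totals.
print(f"{'':12}{'Murderer':>10}{'Innocent':>10}{'Total':>8}")
for verdict in ("guilty", "not guilty"):
    row = [table[(verdict, truth)] for truth in ("murderer", "innocent")]
    print(f"{verdict:12}{row[0]:>10}{row[1]:>10}{sum(row):>8}")
cols = [sum(table[(v, t)] for v in ("guilty", "not guilty"))
        for t in ("murderer", "innocent")]
print(f"{'Total':12}{cols[0]:>10}{cols[1]:>10}{sum(cols):>8}")
```

The column totals recover the 3 murderers and 7 innocent men; the row totals recover the 6 convictions and 4 acquittals.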
These figures, if they are typical, reflect several features of this particular jury:
- the jury correctly identifies two in every three true murderers;
- it correctly acquits three out of every seven innocent people;
- if this jury has found a person guilty, there is still only a one in three chance that they are actually a murderer;
- if this jury found a person innocent, he or she has a three in four chance of actually being innocent; and
- in five cases out of every 10 the jury gets it right.
These five features constitute, respectively, the sensitivity, specificity, positive predictive value, negative predictive value, and accuracy of this jury's performance. The rest of this article considers these five features applied to diagnostic (or screening) tests when compared with a “true” diagnosis or gold standard. A sixth feature—the likelihood ratio—is introduced at the end of the article.
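The five features follow directly from the four cells of the table. A minimal sketch of the arithmetic, using exact fractions (the variable names are mine; `lr_positive` previews the likelihood ratio of a guilty verdict, computed here with the standard formula sensitivity / (1 − specificity)):

```python
from fractions import Fraction

# Counts from the jury example above.
tp, fp, fn, tn = 2, 4, 1, 3  # true +, false +, false -, true -

sensitivity = Fraction(tp, tp + fn)              # 2/3: murderers the jury convicts
specificity = Fraction(tn, tn + fp)              # 3/7: innocent people it acquits
ppv = Fraction(tp, tp + fp)                      # 1/3: chance a convicted man is a murderer
npv = Fraction(tn, tn + fn)                      # 3/4: chance an acquitted man is innocent
accuracy = Fraction(tp + tn, tp + fp + fn + tn)  # 5/10: verdicts that are right

# Likelihood ratio of a guilty verdict: (2/3) / (4/7) = 7/6.
lr_positive = sensitivity / (1 - specificity)
```

A likelihood ratio only slightly above 1 says a guilty verdict from this jury barely shifts the odds that the accused is really a murderer.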
Validating tests against a gold standard
Our window cleaner told me that he had been feeling thirsty recently and had …