BMJ 2005;330:1080-1083 (7 May), doi:10.1136/bmj.330.7499.1080
Christopher J Gill, assistant professor1, Lora Sabin, assistant professor1, Christopher H Schmid, associate professor2
1 Center for International Health and Development, Department of International Health, Boston University School of Public Health, Boston, MA 02118, USA, 2 Biostatistics Research Center, Division of Clinical Care Research, Department of Medicine, Tufts University-New England Medical Center, Boston, MA 02111, USA
Correspondence to: C J Gill cgill{at}bu.edu
Thought you didn't understand bayesian statistics? Read on and find out why doctors are expert in applying the theory, whether they realise it or not
In this case, the child's rapid respiratory rate and absence of fever generated a diagnosis of pneumonia with advice to immediately start antibiotics. Our presence was fortuitous. We were able to give the child antimalarial drugs and transport him to the nearest hospital, where blood smear examination confirmed that his blood was teeming with malaria parasites. How did clinical judgments prove superior to the algorithm, a diagnostic tool carefully developed over two decades of research? Was it just a lucky guess?
Bayesians interpret the test result not as a categorical probability of a false positive but as the degree to which a positive or negative result adjusts the probability of a given disease. In this way, the test acts as an opinion modifier, updating a prior probability of disease to generate a posterior probability. In a sense, the bayesian approach asks, "What is the probability that this patient has the disease, given this test result?" This question proves to be an accurate encapsulation of Bayes's theorem.1
The likelihood ratio summarises the operating characteristics of a diagnostic test as the ratio of patients with the disease to those without disease among those with either a positive or negative test result, and is derived directly from the test's sensitivity and specificity according to the following two formulas:
For a positive test result: likelihood ratio = sensitivity/(1 - specificity)
For a negative test result: likelihood ratio = (1 - sensitivity)/specificity
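The two formulas can be sketched in a few lines of Python. The function name `likelihood_ratios` and the 90%/90% input values are illustrative assumptions, not part of any published tool.

```python
def likelihood_ratios(sensitivity: float, specificity: float) -> tuple[float, float]:
    """Return (positive LR, negative LR) for a diagnostic test.

    Both inputs are proportions strictly between 0 and 1, so that
    neither denominator below can be zero.
    """
    if not (0.0 < sensitivity < 1.0 and 0.0 < specificity < 1.0):
        raise ValueError("sensitivity and specificity must lie strictly between 0 and 1")
    lr_positive = sensitivity / (1.0 - specificity)
    lr_negative = (1.0 - sensitivity) / specificity
    return lr_positive, lr_negative


# Illustrative values only: a test with 90% sensitivity and 90% specificity.
lr_pos, lr_neg = likelihood_ratios(0.90, 0.90)
print(f"positive LR = {lr_pos:.2f}, negative LR = {lr_neg:.2f}")
```

A positive likelihood ratio above 1 raises the odds of disease, and a negative likelihood ratio below 1 lowers them.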
The following example shows how Bayes's theory of conditional probability is relevant to clinical decision making. The figure shows an electrocardiogram with an abnormal pattern of ST segment and T wave changes. Because the test provides an answer, this process must start with a question, such as, "Is this patient having a heart attack?" The bayesian approach does not yield a categorical yes or no answer but a conditional probability reflecting the context in which the test is applied. This context emerges from what is generally known about heart attacks and electrocardiograms and the characteristics of the patient: for example, "Who is this patient?" "Does this patient have symptoms?" and "What was this patient doing at the time the test was done?" To illustrate this, assume this electrocardiogram was obtained from either of the following two hypothetical patients:
Logically, our opinion of heart attack before seeing the electrocardiogram should have differed greatly between these two patients. Since patient 1 sounds like exactly the kind of person prone to heart attacks, we might estimate his pre-test odds to be high, perhaps 5:1 (prior probability = 83%). If we assume that this electrocardiogram has a 90% sensitivity and 90% specificity for heart attacks,3 the positive likelihood ratio would be 9 (0.9/(1 - 0.9)) and the negative likelihood ratio 0.11 ((1 - 0.9)/0.9). With this electrocardiogram patient 1's odds of heart attack increase ninefold from 5:1 to 45:1 (posterior probability = 98%). Note that our suspicion of heart attack was so high that even normal electrocardiographic appearances would be insufficient to erase all concern: multiplying the negative likelihood ratio (0.11) by the pre-test odds of 5:1 gives posterior odds of 0.55:1 (a posterior probability of about 35%).
By contrast, our suspicion of heart attack for patient 2 was very low based on her context, perhaps 1:1000 (prior probability = 0.1%). This electrocardiogram also increased patient 2's odds of heart attack ninefold to reach 9:1000 (posterior probability = 0.89%), leaving the diagnosis still very unlikely.
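The arithmetic for the two patients can be checked with a minimal sketch of Bayes's theorem in its odds form (post-test odds = pre-test odds × likelihood ratio). The helper names are hypothetical; the odds and the likelihood ratio of 9 are the figures quoted in the text.

```python
def odds_to_probability(odds: float) -> float:
    """Convert odds in favour (for example 5:1 -> 5.0) to a probability."""
    return odds / (1.0 + odds)


def post_test_odds(pre_test_odds: float, likelihood_ratio: float) -> float:
    """Bayes's theorem in odds form: post-test odds = pre-test odds x LR."""
    return pre_test_odds * likelihood_ratio


LR_POSITIVE = 9.0  # from the text: 90% sensitivity and 90% specificity

# Pre-test odds quoted in the text for the two hypothetical patients.
pre_test = {"patient 1": 5 / 1, "patient 2": 1 / 1000}

results = {}
for name, prior_odds in pre_test.items():
    posterior_odds = post_test_odds(prior_odds, LR_POSITIVE)
    results[name] = odds_to_probability(posterior_odds)
    print(f"{name}: prior {odds_to_probability(prior_odds):.2%}, "
          f"posterior {results[name]:.2%}")
```

The same ninefold change in odds moves patient 1 to roughly 98% and patient 2 to under 1%, which is the point of the example: the test result is only interpretable alongside the prior.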
The electrocardiogram modified the prior odds by the same degree in both cases. This does not suggest that both patients would be equally likely to have this electrocardiographic result; in reality they would be unlikely to do so. The purpose of this example was to emphasise the conundrum that often arises in clinical medicine when faced with a truly unexpected test result. Diagnostic tests are mainly used in clinical medicine to answer the bayesian question, "What is the probability that the patient has the disease given an abnormal test?" not, "What is the probability of an abnormal result given that the patient has a disease?" Thus, the response to an unexpected result should be to carefully consider how it modifies the prior probability of that disease, not to second guess your original estimate of that probability.
To extend the previous example, a recent study found the sensitivity and specificity of a stress thallium scan for cardiac ischaemia were 83% and 94% respectively.4 These figures give a positive likelihood ratio of 13.8 and negative likelihood ratio of 0.18. Thus, if patient 2's post-test odds for myocardial infarction after electrocardiography are 9:1000 and she subsequently had a positive stress thallium test result, her new post-test odds would increase to 124:1000 (11%). The odds are still against patient 2 having a heart attack, but given the serious implications of a heart attack, she may now merit further and more accurate invasive testing. Conversely, a negative stress thallium result would decrease her odds from 9:1000 to 1.6:1000 (0.16%), leaving little justification for pursuing the heart attack question further.
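Chaining tests in series amounts to multiplying the pre-test odds by each likelihood ratio in turn. The sketch below uses the sensitivities and specificities quoted in the text; the function names are hypothetical, and the calculation assumes the tests are independent of one another given the patient's true disease status.

```python
from math import prod


def odds_to_probability(odds: float) -> float:
    """Convert odds in favour to a probability."""
    return odds / (1.0 + odds)


def serial_post_test_odds(pre_test_odds: float, likelihood_ratios: list[float]) -> float:
    """Multiply the pre-test odds by every likelihood ratio in the series.

    Assumes the tests are conditionally independent given disease status.
    """
    return pre_test_odds * prod(likelihood_ratios)


# Figures quoted in the text.
ECG_LR_POS = 0.90 / (1 - 0.90)        # electrocardiogram, positive result
THALLIUM_LR_POS = 0.83 / (1 - 0.94)   # stress thallium, positive result
THALLIUM_LR_NEG = (1 - 0.83) / 0.94   # stress thallium, negative result

patient_2_prior_odds = 1 / 1000

after_positive = serial_post_test_odds(patient_2_prior_odds, [ECG_LR_POS, THALLIUM_LR_POS])
after_negative = serial_post_test_odds(patient_2_prior_odds, [ECG_LR_POS, THALLIUM_LR_NEG])

print(f"ECG+ then thallium+: {odds_to_probability(after_positive):.1%}")
print(f"ECG+ then thallium-: {odds_to_probability(after_negative):.2%}")
```

Because multiplication is commutative, the order in which independent tests are applied does not change the final odds.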
This example makes clear that the only truly useless test result is one with a likelihood ratio of 1.0 (sensitivity and specificity both 50%) since multiplying pre-test odds by 1.0 changes nothing. By contrast, a test with 70% sensitivity and 70% specificity is imprecise but not useless since its result still modifies the odds slightly (positive likelihood ratio = 0.7/(1 - 0.7) = 2.3). Arbitrarily, 2.0 and 0.5 have been suggested as the minimally useful values for positive and negative likelihood ratios.5
The ability to combine test results in series achieves greater importance once we accept that each question and physical examination during a clinical encounter constitutes a diagnostic test with an attached likelihood ratio. What mainly distinguishes these from formal diagnostic tests (scans, blood tests, biopsies, etc) is that we rarely know what the sensitivities, specificities, or likelihood ratios are for these tests. At best, clinicians carry a general impression about their usefulness and, if quantified, it would not be surprising if most proved to have a positive likelihood ratio of around 2.0 or a negative likelihood ratio of around 0.5, that is, minimally useful.
It is straightforward to measure the operating characteristics of a question or examination, just as with any other diagnostic test. JAMA's rational clinical examination series has measured the accuracy of physical examination for diagnosing breast cancer,6 digital clubbing,7 abdominal aneurysms,8 streptococcal pharyngitis,9 and others. Not surprisingly, the accuracy of these tests often proved unimpressive. For example, one study determined the accuracy of eliciting "shifting dullness" for identifying ascites.10 This test operates under the assumption that gas filled intestines should float when surrounded by fluid. Accordingly, when percussing a patient's abdomen in the presence of ascites, areas of dullness and tympany should shift depending on whether the patient is lying supine or on his or her side. Using ultrasonography as the reference standard for ascites, the researchers found that shifting dullness had a sensitivity of 77% and specificity of 72%, leading to an uninspiring positive likelihood ratio of 2.75.
It would be tempting to dismiss shifting dullness as having little use, since a low likelihood ratio is virtually synonymous with a high false positive rate at a population level. However, this reasoning is flawed when applying the test to an individual patient. As with the electrocardiography example, bayesian reasoning demands that shifting dullness be interpreted in some prior context. Here, the context reflects the patient's presenting complaint "My belly has been swelling up lately," and the doctor's knowledge of things that cause bellies to swell. Considering the possibility of ascites, the doctor then refines this context further by conducting the following series of bedside diagnostic tests:
By now it should be obvious that the decision to test for shifting dullness last or to conclude the bedside evaluation at this stage was arbitrary, and the decision to test for shifting dullness at all emerged as a logical consequence of the doctor's cumulative degree of suspicion to that point. Note, in this hypothetical example, the physician violates the statistical requirement that the tests operate independently, since scleral icterus and jaundice are usually manifestations of the same thing (raised bilirubin concentration). However, this reflects the reality that there is some redundancy in our clinical evaluations.
Dismissing shifting dullness for its low likelihood ratio risks setting clinicians on a slippery slope towards clinical impotence. If we pursued this reasoning to its logical conclusion, many (perhaps most) other questions or examinations might also prove minimally useful. But this conclusion follows only by considering each test in isolation. Instead, suppose we applied the arbitrary minimally useful positive likelihood ratio of 2 to each of the above 16 tests. If all returned positive, the aggregate likelihood ratio could reach 65 536 (2 to the power 16). For comparison, a current generation rapid HIV antibody test carries a sensitivity and specificity of 99.6% and 99.4% respectively.11 This would be considered an excellent test, but it has a positive likelihood ratio of only 166 (0.996/(1 - 0.994)).
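The comparison can be reproduced with a short sketch. It treats every finding as independent with a likelihood ratio of exactly 2, which is the simplifying assumption made in the text rather than a clinical reality, and the variable names are illustrative.

```python
MIN_USEFUL_LR = 2.0   # the arbitrary "minimally useful" positive likelihood ratio
N_FINDINGS = 16       # number of bedside tests in the hypothetical evaluation

# Aggregate likelihood ratio if all findings are positive and independent.
aggregate_lr = MIN_USEFUL_LR ** N_FINDINGS

# Positive likelihood ratio of the rapid HIV antibody test quoted in the text.
hiv_lr = 0.996 / (1 - 0.994)

# Smallest number of minimally useful positive findings whose combined
# likelihood ratio exceeds that of the HIV test.
findings_needed = 0
while MIN_USEFUL_LR ** findings_needed <= hiv_lr:
    findings_needed += 1

print(f"aggregate LR of {N_FINDINGS} findings: {aggregate_lr:,.0f}")
print(f"HIV test positive LR: {hiv_lr:.0f}")
print(f"findings needed to exceed the HIV test: {findings_needed}")
```

Under these assumptions, eight concordant weak findings already outweigh a single excellent laboratory test.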
In reality, clinicians don't calculate a running tally of likelihood ratios as they evaluate patients. Rather, they interpret each positive result as "somewhat more suggestive" of the disease and each negative test as "somewhat less suggestive" and conceptualise the pretest and post-test odds in qualitative rather than quantitative terms. Nevertheless, somewhat suggestive to the power x may reach critical mass. This process was what allowed our diagnosis of malaria in the Ethiopian child to be far more than just a lucky guess.
This is not to say that clinical impressions are always better than formal diagnostic tests, particularly as clinical evaluations are rarely as definitive as in this purposefully contrived example. Moreover, single findings can occasionally be very powerful and definitive, just as the results of certain formal diagnostic tests (such as a spinal fluid Gram stain showing bacteria). Nevertheless, the history and physical examination are immensely powerful tools, potentially more powerful than many other formal tests in which clinicians place great faith.
Doctor: How long have you had a fever?
Patient: Three days.
Doctor thinks, "Sounds like an acute infection, probably just a cold."
Doctor: Where have you been recently?
Patient: Libreville, in Gabon.
Doctor thinks, "Well now, this might be a tropical infection, perhaps malaria, typhoid, tuberculosis, some kind of parasite...or possibly one of those esoteric viruses we learned about in medical school."
Doctor: What did you do there?
Patient: I was part of a compassionate relief team helping rural villagers, many of whom were dying with bleeding gums, high fever, cough, and skin rash.
Doctor thinks, "Hmm, esoteric virus quite plausible."
Doctor: Do you have these symptoms too?
Patient: Yes, my gums bleed when I brush, I have a painful skin rash, and I'm coughing blood (cough, cough).
Doctor thinks, "Nasty esoteric virus very likely. Need to get this patient isolated and call Centers for Disease Control and Prevention and Department of Homeland Security. Have I just been exposed to Ebola virus?"
After this interview, the now masked and gowned doctor examines the patient and finds a raised temperature, haemorrhages on the patient's conjunctivae, soft palate and finger nail beds, faecal occult blood, a tender swollen liver, and mild jaundice. With each new finding, the probability of nasty esoteric virus increases further despite the fact that none of these tests is remotely specific for infection with haemorrhagic fever virus. However, their poor performance individually does not diminish their importance when combined in a logical sequence. Quite the opposite, since within the span of a few minutes, our doctor has correctly shifted the differential diagnosis from influenza, sinus infection, or possibly pneumonia, to Lassa fever, filo virus infection, or yellow fever without a single blood test, x ray examination, or biopsy and without having more than an educated guess about their associated likelihood ratios. Just as importantly, the doctor's qualitative impression of the odds of nasty esoteric viral infection evolved from "possible" to "very likely," which now dictates what formal diagnostic tests should logically follow to establish the specific diagnosis and how the patient should be managed initially.
Contributors and sources: CJG is a clinical infectious disease specialist who works with an applied research unit conducting clinical trials in developing nations. The impetus for this article emerged after years of clinical practice and after taking a course in applied bayesian statistics, taught by CHS. CJG conceived and primarily wrote the paper. LS helped in writing and rewriting the paper and contributed additional ideas to the manuscript. CHS was CJG's professor of bayesian statistics during his fellowship and helped ensure that the views and arguments presented were in agreement with bayesian theory. CJG is the guarantor.
Funding: CJG was supported by grant NIH/NIAID K23 AI62208 01.
Competing interests: None declared.