On the Evaluation of symptom checkers for self diagnosis and triage: audit study
I should begin by acknowledging the authors’ important contribution to elucidating the gap between what symptom checkers hope to provide and the existing state of the art. Semigran et al adopt a pragmatic approach, both by identifying the symptom checkers patients may reasonably find and by assessing them in the most intuitive way imaginable: making them take the standardized patient tests we all take in medical school.
Using this approach, the authors mitigate many of the most vexing challenges in assessing “diagnostic accuracy.” The gold standard for diagnosis is frequently absent, or is sometimes no better than “expert judgment.” So-called “overlap syndromes” may fit two or more diagnoses. There are many diagnoses of exclusion, for which there is no test other than ruling out every other possible diagnosis. Some symptoms, like “back pain,” remain without a satisfying diagnosis and simply acquire a label such as “chronic back pain.” This list is not exhaustive. By designing cases with specific diagnoses in mind, the authors sidestep these problems.
The meaning of “diagnostic accuracy,” however, is not straightforward. Commonly, as in this study, it refers to a measure of “sensitivity”: how often does the correct diagnosis appear in the list? This is half the picture, which is why any test undergoing clinical validation is also assessed for specificity. “Specificity” captures the complementary concept: how often does the tool correctly avoid flagging a diagnosis the patient does not have? These sound similar but are distinct statistical concepts, best illustrated with an example. If you have cancer, you really want a doctor to find it: you want good sensitivity. If you don’t have cancer, you really don’t want a doctor (mistakenly) telling you that you have cancer (this happens!): you want good specificity. Unfortunately, improving specificity usually means reducing sensitivity, and vice versa.
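To make the distinction concrete, here is a minimal sketch of the two metrics computed from a 2×2 confusion matrix. The counts are entirely hypothetical, chosen only for illustration; they are not data from the study.

```python
# Sensitivity and specificity from a 2x2 confusion matrix.
# All counts below are hypothetical, for illustration only.

def sensitivity(tp: int, fn: int) -> float:
    """Of patients who truly have the disease, what fraction were flagged?"""
    return tp / (tp + fn)

def specificity(tn: int, fp: int) -> float:
    """Of patients who truly lack the disease, what fraction were cleared?"""
    return tn / (tn + fp)

# Hypothetical screen of 1000 patients, 50 of whom have the disease:
tp, fn = 45, 5      # diseased patients: correctly flagged vs missed
tn, fp = 855, 95    # healthy patients: correctly cleared vs falsely flagged

print(f"sensitivity = {sensitivity(tp, fn):.2f}")  # 0.90
print(f"specificity = {specificity(tn, fp):.2f}")  # 0.90
```

Note that the study’s headline metric (the correct diagnosis appearing in a ranked list) is a sensitivity-style measure; the false alarms captured by specificity never enter that calculation, which is exactly the gap argued here.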
I hope that future work in this area looks directly at measures of specificity, as these matter for patients too. Poor specificity contributes to the “cyberchondria” the authors refer to, a clever portmanteau for the fact that people worry when they are sick, and they often take that worry to the Internet. Further, while “diagnostic accuracy” in both its forms is no doubt a critical piece of the evaluation of symptom checkers, there is still much research to be done on how symptom checkers can better serve patients. Is making a sick patient click through 30 questions to get a short description of a few possible diagnoses really what they want from a symptom checker, or can we do better?
The answer to this question depends in large part on what the goal of a symptom checker is. If the goal is to give a diagnosis better than a doctor can, it is doomed from the start. The vignettes in this study were realistic and well constructed. Yet many of them (8, by my count) could not be diagnosed even by a doctor from the symptoms alone, and another set (7 additional cases) had diagnoses that were really “idiopathic” symptoms, those without a known cause (“idiopathic” deriving, as the old medical-school joke has it, from the Greek for “doctors are idiots and it’s pathetic”). It is easier to make a diagnosis when you have the patient in front of you and additional tests at your disposal, and it is much easier when, at the end of an evaluation, you can say, “Honestly, I don’t know what is going on yet, but I’m here with you and we’re going to figure it out together. Here’s what I suggest…”
Yet I am encouraged by the growing popularity of symptom checkers built not as a replacement for doctors, Google, or phone triage, but as a set of tools made for patients, with patient empowerment at the fore. We need studies like this one to take a magnifying glass and a clinical eye to the technology of symptom checkers, while maintaining the humility to give patients the final word on what matters. The first generation of symptom checkers has arrived at a remarkable time in medicine’s history. This is an opportunity, perhaps the opportunity, to focus on what patients want from the very beginning.
Competing interests: Co-founder of Symcat