Evaluation of symptom checkers for self diagnosis and triage: audit studyBMJ 2015; 351 doi: https://doi.org/10.1136/bmj.h3480 (Published 08 July 2015) Cite this as: BMJ 2015;351:h3480
All rapid responses
Response to: Semigran et al, 2015. Evaluation of symptom checkers for self diagnosis and triage: audit study
The study by Semigran et al.1 used 45 standardised patient vignettes to test the accuracy of 23 online symptom checkers in diagnosis and assessment of triage urgency. They found that the correct diagnosis was listed first in a third of cases and the correct triage level in less than 60% of cases on average, with accuracy increasing with the level of urgency. Online resources are undoubtedly a growing part of the normal ‘menu’ of health services for patients, and governments and healthcare providers are developing online resources such as eHealth, telemedicine and electronic health records.2 3 Further research is needed to ensure that online symptom checkers form a beneficial, and not detrimental, addition to health and health services. The results of this study are compelling but are subject to a number of limitations.
The use of clinical vignettes to test performance is a common method of assessing doctors. For the purposes of the symptom checkers, the vignettes were reduced to a list of clinical symptoms. Vignettes were chosen to cover three levels of urgency but beyond this they were simplistic, with no comorbidities. These two points may limit their applicability to the general population, as patients may have complex health problems and are unlikely to use clinical language. We could not discern whether statistical tests were used to determine the number of vignettes to include in the study design. The paper describes the process of vignette selection, but further detail on the quality assessment – e.g. by what standards and by whom – would be welcome. It would also have been useful to see an example vignette, both before and after ‘coding’ into clinical symptoms.
The methodology would have been greatly improved by the inclusion of an appropriate control group against which to compare practical effectiveness. A correct first-diagnosis rate of 34% might appear poor; however, a study of dermatology referrals by Basarab, Munn and Jones (1996) reported that GPs achieved only 47% accuracy in first diagnosis.4 The authors also briefly mention the use of generic search engines, but a more detailed analysis would be welcome, as these may be used as a first alternative to online symptom checkers.
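To illustrate the kind of comparison a control arm would enable, the sketch below runs a simple two-proportion z-test on the two headline rates (the 34% symptom-checker figure against the 47% GP figure from the dermatology study). The counts and sample sizes here are invented for illustration, and the two samples are not in fact comparable; this only shows the mechanics such a comparison would require.

```python
from math import sqrt, erf

def two_proportion_z(successes1: int, n1: int, successes2: int, n2: int):
    """Two-sided two-proportion z-test using the pooled standard error.

    Returns (z statistic, two-sided p value)."""
    p1, p2 = successes1 / n1, successes2 / n2
    pooled = (successes1 + successes2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Normal CDF via the error function; p = 2 * P(Z > |z|)
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return z, p_value

# Hypothetical counts (assumed, not from either paper):
# 262 correct first diagnoses out of 770 symptom-checker runs (~34%)
# vs 129 correct out of 274 GP referrals (~47%)
z, p = two_proportion_z(262, 770, 129, 274)
```

With samples of this assumed size the difference would be statistically significant, but the real point of the exercise is that without a control arm assessed on the same vignettes, no such test can be run at all.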
We identified several possible future roles and developments for online symptom checkers. First, they might operate as a traceable piece of evidence to support healthcare treatment: they could provide a printout of the detail entered, which could be useful to the doctor as an aide memoire. Second, they could signpost other parts of the healthcare system (pharmacists, for example), which could help reduce the burden on primary and secondary care services. Third, we concur with the authors’ suggestion to incorporate local epidemiological data. This could allow a more sophisticated assessment of symptoms, such as increasing suspicion of certain diseases or pathogens during a local increase or outbreak. Furthermore, there may be benefits in analysing symptom checker usage to monitor or model disease epidemiology proactively, in ways similar to Google Flu Trends.
Online symptom checkers are clearly here to stay and an increasingly tech-savvy population means that usage is likely to grow. From a UK perspective, we would strongly argue for a national body to take a role in quality checking available tools, applying a quality mark to highlight trusted tools. The NHS symptom checker would be the preferred tool, due to assurances about the underpinning evidence, and there may also be a role in promoting the tool alongside other actions such as search engine optimisation or even sponsorship of links to increase competitive visibility in search engine queries.
1 Semigran HL, Linder JA, Gidengil C, Mehrotra A. Evaluation of symptom checkers for self diagnosis and triage: audit study. BMJ 2015;351:h3480.
2 Hofstede J, de Bie J, van Wijngaarden B, Heijmans M. Knowledge, use and attitude toward eHealth among patients with chronic lung disease. International Journal of Medical Informatics 2014;83(12):967-74.
3 Andrews L, Gajanayake R, Sahama T. The Australian general public’s perceptions of having a personally controlled electronic health record (PCEHR). International Journal of Medical Informatics 2014;83(12):889-900.
4 Basarab T, Munn SE, Jones RR. Diagnostic accuracy and appropriateness of general practitioner referrals to a dermatology out-patient clinic. British Journal of Dermatology 1996;135(1):70-3.
Competing interests: No competing interests
I should begin by acknowledging the authors’ important contribution to elucidating the gap between what symptom checkers may hope to provide and the existing state of the art. Semigran et al adopt a pragmatic approach, both by identifying which symptom checkers patients may reasonably find and by assessing them in the most intuitive way imaginable: making them take the standardized patient tests we all take in medical school.
Using this approach, the authors mitigate many of the most vexing challenges of assessing “diagnostic accuracy.” The gold standard for diagnosis is frequently absent or is sometimes no better than “expert judgment.” So-called “overlap syndromes” may fit two or more diagnoses. There are many diagnoses of exclusion for which there is no test other than that every other possible diagnosis has been ruled out. Some symptoms like “back pain” remain without a satisfying diagnosis, simply getting a label like “chronic back pain.” This list is not exhaustive. By designing the cases with specific diagnoses in mind, the authors sidestep these problems.
The meaning of “diagnostic accuracy,” however, is not straightforward. Commonly, as in this study, it is used to refer to a test of “sensitivity,” or “how often does the correct diagnosis appear in this list.” This is half the picture, which is why any test undergoing clinical validation is also assessed for specificity. “Specificity” captures the converse: how often someone without a disease is correctly told they do not have it. These sound similar but are distinct statistical concepts, best illustrated with an example. If you have cancer, you really want a doctor to find it: you want good sensitivity. If you don’t have cancer, you really don’t want a doctor (mistakenly) telling you that you have cancer (this happens!): you want good specificity. Unfortunately, improving specificity usually means reducing sensitivity, and vice versa.
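To make the sensitivity/specificity distinction concrete, here is a minimal sketch computing both from a 2×2 confusion matrix. The counts are invented for illustration and are not data from the study.

```python
def sensitivity_specificity(tp: int, fn: int, tn: int, fp: int):
    """Sensitivity = TP / (TP + FN): of those who have the disease,
    what fraction are correctly flagged.
    Specificity = TN / (TN + FP): of those who do not have it,
    what fraction are correctly cleared."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity, specificity

# Invented example: 100 people with cancer, 900 without.
# The checker flags 90 of the 100 true cases (10 missed),
# and wrongly flags 90 of the 900 healthy people.
sens, spec = sensitivity_specificity(tp=90, fn=10, tn=810, fp=90)
```

Note that even with 90% sensitivity and 90% specificity here, the 90 false positives equal the 90 true positives, which is precisely the “cyberchondria” worry: a list of alarming diagnoses shown to people who do not have them.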
I hope that future work in this area looks directly at metrics of specificity, as these are important for patients too. Poor specificity contributes to the “cyberchondria” the authors refer to, a clever portmanteau that simply means people worry when they are sick, and they often go to the Internet to worry. Further, while “diagnostic accuracy” in both its forms is no doubt a critical piece of the evaluation of symptom checkers, there is still much research to be done on how symptom checkers can better serve patients. Is making a sick patient click through 30 different questions to get a short description of a few diagnoses they may have really what they want when they use a symptom checker, or can we do better somehow?
The answer to this question depends in large part on what the goal of a symptom checker is. If the goal of a symptom checker is to give a diagnosis better than a doctor, it is doomed from the start. The vignettes in this study were realistic cases and well-constructed. Yet, many of them (8, by my count) could not be diagnosed even by a doctor from the symptoms alone and another set (7 additional cases) had diagnoses that were really an “idiopathic” symptom, or those without a known cause--“idiopathic” deriving from the Latin for when “doctors are idiots and it’s pathetic.” It is easier to make a diagnosis when you have the patient in front of you and additional tests at your disposal and it is much easier when at the end of an evaluation you can say, “honestly, I don’t know what is going on yet, but I’m here with you and we’re going to figure it out together. Here’s what I suggest…”
Yet, I am emboldened by the growing popularity of symptom checkers built not as a replacement for doctors or Google or phone triage, but as a set of tools built for patients with the goal of patient empowerment at the fore. We need studies like this to take a magnifying glass and a clinical eye to the technology of symptom checkers while maintaining the humility to give patients the final word on what matters. The first generation of symptom checkers has arrived at a remarkable time in medicine’s history. This is an opportunity, perhaps the opportunity, to focus on what patients want from the very beginning.
Competing interests: Co-founder of Symcat
First, I commend Professor Mehrotra and his associates for completing such an arduous task of evaluating and comparing multiple symptom checkers to assess their accuracy and publishing their results.
These results are very useful in helping us assess where our deficiencies lie as a whole in the symptom checker space, and will assist us in improving our accuracy and recommendations. Whether we like it or not, users/patients will continue to use symptom checkers because of their convenience and little to no cost. It is our responsibility as healthcare professionals to provide accurate and useful information so that users/patients can better educate themselves and make appropriate decisions about their health.
I am very encouraged by our start-up DocResponse's ability to get the diagnosis correct at the number 1 spot in 50% of the cases, thereby providing the most accurate solution on the market. I do believe healthcare will continue to go down this path and symptom checkers will become more accurate in the future. We are on the cusp of revolutionizing healthcare.
Competing interests: Founder and CEO DocResponse
Many people, including medical doctors and other professionals not used to doing research, make the mistake of considering the Internet a source of “pseudo-authentic” information. The reality is that, thanks to Google, whole medical libraries are now accessible via the Internet: one no longer has to physically go to a medical library and can instead access the vast majority of medical papers published in medical journals, free of charge, in the comfort of one's home. In other words, it is not some mythical ‘internet information’ but truly authentic information.
How such information is used is a totally separate issue. One possibility is to find relevant medical papers on the Internet, print them out, and then see a medical practitioner with them. If nothing else, it would be an interesting exercise.
Competing interests: No competing interests
I applaud the design of this study and its conclusions, which homed in on what's useful and avoided various conclusions that would have missed the mark.
(I am a member of the BMJ's patient advisory panel, and serve as volunteer co-chair of the Society for Participatory Medicine.)
I blogged about this study today on our Society's site: "A Turing test for diagnosis: BMJ evaluates online symptom checkers; good Globe article" http://e-patients.net/archives/2015/07/a-turing-test-for-diagnosis-bmj-e...
I'm interested in the response from Mr Maude of Isabel (whose good results I respect), in saying "These tools are designed primarily to help the patient become better informed and be able to ask their doctor the right questions. They are not intended to encourage the patient to diagnose themselves and avoid a discussion with a clinician. This is about the patient and doctor working as partners to get to the right diagnosis and receive appropriate care and treatment as soon as possible." The Isabel site hints at that too: "The Isabel Symptom Checker puts the world’s medical knowledge at your fingertips and enables you to make sense of your symptoms. It will change the way you speak to your doctor forever." It may be useful to follow up with a description of the intention of each symptom checker tested.
Competing interests: No competing interests
Anyone is entitled to seek advice about medical symptoms from anyone. Often a neighbor, the butcher, or the hairdresser will have input, but the seeker weighs their advice appropriately. The problem is the pseudo-authenticity of the internet: people place much more trust in its content than is deserved.
The other problem is perspective. The internet gives none. The symptom "can't breathe" can connote heart failure, asthma or a stuffy nose. How to place these in the proper context often requires a health care professional.
Dr. Google is nowhere to be found when responsibility needs to be assigned.
Competing interests: No competing interests
It would be useful to know the rate of accuracy of physicians on the cases that were used in this study, rather than rely on published statistics for physicians being accurate 85-90% of the time.
In general, for studies of decision support tools, it is important to have controls that allow comparison to the best available alternative. In a study that we carried out of the efficacy of the diagnostic decision support tool SimulConsult (restricted to medical professionals, and thus excluded from this study) we compared how physicians did without the tool and with the tool (Segal MM et al. (2014) J Child Neurol. 29:487-92; http://www.ncbi.nlm.nih.gov/pubmed/23576414).
The best model for diagnostic decision support is likely to be one of enhancing the efficacy of medical professionals, not replacing them.
Competing interests: Dr. Segal is the founder of SimulConsult
The authors are to be congratulated for carrying out the first large study of symptom checkers designed for use by patients.
However, this study, like most discussions on this topic is framed in the wrong way. These tools are designed primarily to help the patient become better informed and be able to ask their doctor the right questions. They are not intended to encourage the patient to diagnose themselves and avoid a discussion with a clinician. This is about the patient and doctor working as partners to get to the right diagnosis and receive appropriate care and treatment as soon as possible.
The study also has some serious limitations.
The test cases were very medical and very different from the way real patients enter their symptoms. Many, for example, included negatives which patients would very rarely enter. From our experience at Isabel, over 90% of the cases entered by patients include 4-6 symptoms expressed in normal everyday language.
The study did not compare the ease and speed of use. Many symptom checkers require the patient to go through over 20 screens answering questions before they are finally shown the results.
The study also did not look at the knowledge presented to users alongside the results to help them learn more. This is an important aspect, as many of the diagnoses will not mean much to most patients, and they need to be shown trustworthy reference resources to encourage them to understand their symptoms better.
Competing interests: Founder of Isabel