Analysis

New diagnostic tests: more harm than good

BMJ 2017; 358 doi: https://doi.org/10.1136/bmj.j3314 (Published 18 July 2017) Cite this as: BMJ 2017;358:j3314
  1. Bjørn Hofmann, professor (1)
  2. H. Gilbert Welch, professor (2)

  (1) Department of Health Sciences in Gjøvik, Norwegian University of Science and Technology, and Centre for Medical Ethics at the University of Oslo, PO Box 1130, Blindern, N-0318 Oslo, Norway
  (2) Dartmouth Institute for Health Policy and Clinical Practice, Geisel School of Medicine at Dartmouth, Hanover, New Hampshire, USA

  Correspondence to: B Hofmann b.m.hofmann{at}medisin.uio.no

Although new diagnostics may advance the time of diagnoses in selected patients, they will increase the frequency of false alarms, overdiagnosis, and overtreatment in others. Bjørn Hofmann and H. Gilbert Welch explain how to minimise harm.

Key messages

  • Innovative technologies and ample venture capital are combining to produce new disease biomarkers and mobile monitoring devices

  • These new diagnostics are technologically advanced but do not automatically provide improvements in clinical care and population health

  • They have the potential to help some but also to increase the frequency of false alarms, overdiagnosis, and overtreatment in others

  • Excessive testing and false alarms may increase healthcare workload and shift clinicians’ focus towards the healthy

  • Misleading feedback at both the population and individual levels tends to favour further market growth

  • Clinicians must provide a strong counterbalance: educating patients, respecting baseline risk, thinking downstream, and expecting misleading feedback

Advances in technology and availability of ample venture capital are combining to produce a growing array of new medical diagnostics. New biomarkers are being identified to predict or detect a wide range of diseases, and new devices are being developed continuously to monitor biological parameters—often connecting with mobile devices to provide user friendly updates of health status (m-health). One vision is that these new diagnostics will transform medicine from treating disease to promoting health, from being reactive to being proactive, and from being general to being personal.1 Another vision is less sanguine—that new diagnostics will warrant a warning of their downsides.

Efforts to detect disease early can always be accompanied by unintended harms. These include false alarms and indeterminate findings that can worry patients, drive more testing, increase clinical workload, and distract clinicians from more important work. Overdiagnosis can lead to unnecessary treatments. Promotional campaigns will necessarily need to get people concerned about disease and indicate that the path to health is through testing—reinforcing health anxiety in some and distracting many from more important health behaviours. These “advancements” in diagnostics have real financial costs, some of which may directly fall on the patient.

In this article, we consider how clinicians could handle emerging diagnostics. We begin by investigating four case studies identified in the business database Factiva and selected for their novelty, diversity, substantial investor interest, and broad clinical appeal (directed towards common diseases) (table 1; see supplementary methods). We go on to explore the market conditions, both the investment climate and the misleading feedback favouring further market growth. We conclude with specific actions for clinicians to minimise harm.

Table 1

Short description, proposed benefits, potential harms, and estimated costs for the four case studies

Immunosignature for cancer and infections

What is it?

Immunosignature, although not currently approved by the US Food and Drug Administration (FDA), is an emerging technology to predict impending disease by analysing how an individual’s antibodies bind to proprietary arrays of random peptides (numbering from the tens of thousands to several million). The technology has been investigated in diabetes, Alzheimer’s disease, infectious diseases, and cancer. To generate immunosignatures, researchers examined the sera of patients with one of six cancers or one of six infectious diseases (10 patients with each disease).2 For each disease, researchers identified the top 50 most informative peptides—that is, those most able to discriminate between diseases.

What are the claims?

HealthTell, a company from Arizona State University that is developing immunosignature, says that immunosignature represents “a new concept in healthcare: continuous monitoring of healthy people to detect and treat disease early.” It says that, if widely used, immunosignature has the potential to reduce medical costs and “remodel our expectations of human health.”3

What are the potential benefits, and what is the evidence?

Immunosignature can potentially detect and predict a wide range of diseases and become a useful tool in clinical practice. Initial studies report high sensitivity for multiple diseases (95%).2 4 Although no high quality evidence of improved patient outcome is available, researchers report that, to date, “no diagnostic of which we are aware can simultaneously discriminate between six cancers and six infectious diseases using the same platform.”2 Reported sensitivities, however, might be overstated due to overfitting—that is, describing random error or noise instead of the underlying relationship. Although it might be possible to identify variables that discriminate among conditions in a selected dataset, those variables might be less predictive in the general population. Sensitivities might be overstated owing to spectrum bias, in which tests perform better at the extremes of the disease spectrum than in between, which is where they are typically used.5 Distinguishing overt cases of dengue fever from syphilis or myeloma from lung cancer is one thing; but distinguishing who will and will not develop disease among a group of people at similar risk is quite another.
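The overfitting concern can be illustrated with a small simulation. The sketch below is not the cited studies' method or data: it assumes two groups of 10 subjects, 2000 features of pure random noise (far fewer than a real peptide array), and a simple hypothetical scoring rule built from the 50 features that happen to separate the groups best. The function name and all parameters are illustrative.

```python
import random


def simulate_noise_classifier(n_per_group=10, n_features=2000,
                              n_selected=50, n_fresh=200, seed=0):
    """Select 'informative' features from pure noise and score them twice.

    All sizes are illustrative assumptions; only the 10 subjects per group
    and the 50 selected features echo the design described in the text.
    """
    rng = random.Random(seed)

    def noise_subject():
        # Every feature is random noise, so no feature truly separates groups.
        return [rng.gauss(0.0, 1.0) for _ in range(n_features)]

    group_a = [noise_subject() for _ in range(n_per_group)]
    group_b = [noise_subject() for _ in range(n_per_group)]

    def feature_mean(group, j):
        return sum(subject[j] for subject in group) / len(group)

    # Rank features by how well they happen to separate the training groups.
    ranked = []
    for j in range(n_features):
        mean_a, mean_b = feature_mean(group_a, j), feature_mean(group_b, j)
        direction = 1 if mean_a > mean_b else -1
        midpoint = (mean_a + mean_b) / 2
        ranked.append((abs(mean_a - mean_b), j, direction, midpoint))
    selected = sorted(ranked, reverse=True)[:n_selected]

    def predicts_a(subject):
        # Sum the signed distances from each selected feature's midpoint.
        score = sum(direction * (subject[j] - midpoint)
                    for _, j, direction, midpoint in selected)
        return score > 0

    def accuracy(a_subjects, b_subjects):
        correct = sum(predicts_a(s) for s in a_subjects)
        correct += sum(not predicts_a(s) for s in b_subjects)
        return correct / (len(a_subjects) + len(b_subjects))

    # Fresh subjects: again pure noise, with arbitrary group labels.
    fresh_a = [noise_subject() for _ in range(n_fresh // 2)]
    fresh_b = [noise_subject() for _ in range(n_fresh // 2)]
    return accuracy(group_a, group_b), accuracy(fresh_a, fresh_b)


if __name__ == "__main__":
    in_sample, out_of_sample = simulate_noise_classifier()
    print(f"accuracy on the data used for selection: {in_sample:.2f}")
    print(f"accuracy on fresh data: {out_of_sample:.2f}")
```

Under these assumptions the selected features should classify the original subjects almost perfectly while performing at about chance on fresh subjects, which is the pattern overfitting would be expected to produce.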

What are the concerns about harms and costs?

Reported false positive rates (1-2%)2 4 might be understated, as they were obtained from healthy volunteers (university students in these studies) (supplementary table 1). Adding to the complexity, immunosignatures might vary greatly between healthy people and might fluctuate over time in individuals. Adequately detecting impending disease might come at the cost of multiple false alarms and overdiagnosis. The technology will also raise a number of clinical, ethical, and health policy questions. What should be done for someone with a concerning immunosignature who is otherwise well? What are the consequences for that person’s employment and insurance? What are the downstream costs? Knowledge about these pertinent questions is lacking.

Breath test for lung cancer

What is it?

Breath testing, also characterised as a “cancer sniffing sensor,” is a technology that measures volatile organic compounds.6 The most biologically plausible application for breath testing is screening for lung cancer. Efforts have included measuring the concentrations of between four and 33 distinct compounds, which are then combined in a risk score and dichotomised into normal and abnormal. Breath tests are not approved by the FDA.

What are the claims?

One developer, Owlstone Nanotech, claims that the device “could save 10 000 lives a year and save the NHS £245m.”7

What are the potential benefits, and what is the evidence?

Breath testing could serve as an alternative to low dose computed tomography, providing simple and widespread screening without ionising radiation.8 It could also reduce screening costs, facilitate early detection of disease, and reduce mortality. As clearly expressed by Norman Edelman of the American Lung Association, “It’s really the future of medical testing in general. We are just scratching the surface on the utility of breath testing in medical diagnosis. . . . We could screen many, many more people for lung cancer.”9 Sensitivity is reported to be between 51% and 100%.10 This variation in reported sensitivity probably reflects the lack of standardisation in breath collection and risk score methodology, as well as irregular validation in independent population samples.10

What are the concerns about harms and costs?

Reported false positive rates vary widely, between 0% and 83%.10 Many people might get an incorrect and alarming notice about the risk of having a feared and deadly disease. False positive tests engender extensive follow-up testing, which comes with considerable biopsy related risks and overdiagnosis, not to mention extra costs. Were the breath test priced low enough, it could easily be marketed well beyond the current target population for lung cancer screening (to those outside of the 55 to 80 age range and, in particular, to non-smokers). If this occurs, the harms and costs would escalate, with little (or no) corresponding benefit.

Patch vital sign monitoring

What is it?

Several patches approved by the FDA or with CE marks are now commercially available for monitoring and transferring data wirelessly to other devices, including smartphones, tablets, and central monitors. Patches can continuously measure several parameters: electrocardiography, heart rate, respiratory rate, activity, and posture (using an accelerometer). Accompanying software summarises these data and provides customisable alarm thresholds and has functions to assess heart rate variability, activity, energy expenditure, and balance.11

What are the claims?

Zephyr Technology, a producer of Biopatch, says that “With Zephyr, you really can measure life . . . anywhere!”12

What are the potential benefits, and what is the evidence?

Patch systems increase the portability of existing monitoring systems and might provide more accurate diagnosis, leading to appropriate clinical management for a wide range of diseases.11 Potential benefits include the detection of falls, cardiac events, and developing bed sores. Reported electrocardiography parameters, heart rate, respiratory rate, and accelerometer measurements correlate closely with those obtained from conventional instrumentation. The performance data, however, frequently come from a small number of healthy men in controlled settings (exercise physiology laboratories) and probably overestimate the real world performance for those who are less healthy and in less ideal measurement environments.

What are the concerns about harms and costs?

System prices lie around $1500, and the cost of disposable patches varies. Little is known about performance in daily use or the accuracy and clinical importance of abnormalities detected by the systems’ functions and algorithms.11 Statistical noise and artefacts (for example, caused by inadequate sensor contact) might result in frequent false alarms and unnecessary worries, clinic visits, and costs. A randomised trial of real time monitoring of a single physiological variable—pulmonary impedance in patients with heart failure—found that monitoring was associated with three times as many clinic visits and significantly more admissions to hospital.13 Monitoring multiple variables might be expected to compound this problem, and widespread implementation would seem to increase false alarms further, resulting in more visits, more testing, more referral, and more fear.
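How monitoring several variables could compound false alarms can be sketched with simple probability. The example assumes, purely for illustration, that each monitored variable raises a false alarm independently with the same probability in a given period; the 5% rate is hypothetical, not a reported figure, and real signals are unlikely to be fully independent.

```python
def prob_any_false_alarm(per_variable_rate, n_variables):
    """Chance of at least one false alarm across independent variables."""
    return 1 - (1 - per_variable_rate) ** n_variables


# Hypothetical 5% false alarm rate per variable per monitoring period.
for n in (1, 2, 5):
    print(n, round(prob_any_false_alarm(0.05, n), 3))
```

With five variables (for example electrocardiography, heart rate, respiratory rate, activity, and posture), the chance of at least one false alarm in this sketch is more than four times that of monitoring a single variable.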

Biomarkers for Alzheimer’s disease

What is it?

Multiple blood based biomarkers have been shown to differentiate Alzheimer’s disease from healthy controls14 and to predict onset and progression of Alzheimer’s disease many years before symptoms start.15

What are the claims?

One research group says, “This new blood test can accurately reflect development of Alzheimer’s disease up to 10 years prior to clinical onset.”16

What are the potential benefits, and what is the evidence?

Sensitivities over 95% for detection of Alzheimer’s disease have been reported,17 and over 90% for predicting Alzheimer’s disease over a time frame of 2-3 years.18 But published results have been hard to replicate.14 Variability in the biomarkers included in different assays and how biomarkers in the same person change over time are challenges yet to be accounted for.15 Many tests are developed and verified on the same population, thus lacking external validation. Test performance is frequently measured in two distinct populations: sensitivity among patients with overt Alzheimer’s disease and false positives among normal controls, which again introduces spectrum bias.5 Adding to the complexity, different assays are tested against different gold standards for what constitutes Alzheimer’s disease.19

What are the concerns about harms and costs?

Reported false positive rates are high (10-30%),5 implying that many people might be given a false diagnosis. Even those given the correct diagnosis or prediction will face the challenge of what to do with a positive result, as the disease is not currently actionable. Although early detection might help people plan and prepare, it can also result in emotional despair, stigma, and discrimination. With clear definitions lacking, clinicians might be tempted to use biomarkers as a quantitative and objective gold standard for the diagnosis of Alzheimer’s disease. This might not only increase the prevalence of Alzheimer’s disease, but also have serious implications for the rights of people to drive, make a will, and handle financial affairs. If biomarkers genuinely produce long lead times, such as 10 years before clinical onset, they will simultaneously create ample potential for overdiagnosis, as many will die from other diseases before they develop overt Alzheimer’s disease.

Market conditions

Investment climate

The producers of these four diagnostic tests are enthusiastic about the size of their potential markets. HealthTell sees a substantial market for immunosignature: “With over 170 million people in the US being affected by neurological, autoimmune, oncologic, metabolic, and infectious diseases, the importance of early detection and monitoring is paramount.”20 Investors apparently agree as the company has raised $40m to develop and commercialise immunosignature.21 There are multiple potential producers of breath tests, but Grand View Research expects the global breath test market to reach $11.3bn by 2024.22

Qualcomm Life also sees a big market for patch based monitoring systems: “There are 300 million people in Europe and North America, and 860 million worldwide, with at least one chronic disease. It is estimated that 25% of patients would immediately benefit from wireless home monitoring solutions.”23 And producers of biomarkers for Alzheimer’s disease envision sales to primary care and neurology practices “for individuals 65 years and older, representing a population of about 45 million people in the US, for a potential yearly market cap of roughly $3bn.”24

The enthusiasm goes well beyond these four diagnostic tests, and there are bullish expectations for the diagnostic industry in general (fig 1). Revenues for the global biomarker market are expected to more than double from 2012 to 2018 (from $22.4bn to $53.6bn) and are expected to reach $100bn by 2020.27 Global m-health market revenues are expected to increase more than 10-fold from 2012 to 2018, albeit from a lower baseline (from $1.5bn to $21.5bn). Clearly, some see a vast market for new diagnostic tests, as we are all potential customers for early diagnosis.

Fig 1 Global mobile health and biomarker market revenues in 2012 and estimated revenues for 2018 (references 25 and 26)

Misleading feedback favouring market growth

Successful marketing of tests for early diagnosis can rapidly produce misleading positive feedback. Testing tends to promote the demand for more testing, regardless of the genuine utility of the test itself (fig 2).28 At the population level, testing tends to increase the apparent prevalence of disease and abnormalities, fostering more concern and more testing. At the same time, it tends to identify patients with milder disease and abnormalities. These patients invariably do better than those diagnosed as having the disease in the past, apparently reinforcing the value of testing.

Fig 2 Misleading feedback favouring market growth both at the population and individual level
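The population level feedback can be shown with a small worked example. All numbers below are hypothetical and chosen only to make the arithmetic easy to follow; the function name is illustrative.

```python
def apparent_survival(diagnosed, deaths):
    """Share of diagnosed people still alive at follow-up."""
    return 1 - deaths / diagnosed


# Hypothetical population before wider testing: 100 people diagnosed, 50 die.
before = apparent_survival(diagnosed=100, deaths=50)

# After wider testing: the same 100 people plus 100 mild or overdiagnosed
# cases who would never have died of the disease. Nobody's outcome changes.
after = apparent_survival(diagnosed=100 + 100, deaths=50)

print(before, after)  # 0.5 0.75
```

In this sketch the number of diagnoses doubles and survival among those diagnosed rises from 50% to 75%, yet the number of deaths is unchanged.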

Feedback is equally positive at the individual level, regardless of the test result. Because most tests are negative, most people will have the positive experience of being reassured by testing. Those whose results are shown to be falsely positive by subsequent testing might experience a sense of relief.29 The finding of abnormalities with real consequences provides the strongest positive feedback, as these patients are presumed to have benefited from the test and any subsequent interventions. Ironically, those who have experienced the most substantial harm of testing, overdiagnosis, view themselves to be in the benefiting group and are enthusiastic about testing.

Apparently favourable feedback strengthens the enthusiasm of investors, patients, and health policy makers. But the feedback also reinforces the harms of testing: increasing health anxiety, false alarms, and overdiagnosis. The resulting increased workload distracts clinicians from more important work, and the focus on testing distracts patients from more important health behaviours. Misleading feedback needs a strong counterbalance.

Counterbalancing actions

In countries where healthcare is market driven, payers might want to encourage more prudent testing by having patients share the associated costs—known as having “skin in the game.” For healthcare systems using cost sharing strategies, we suggest that they bundle the cost of expected downstream testing into one price. If a $100 test, for example, leads to a $2000 test 10% of the time, then the bundled test price would be $300 (100+(2000/10)). Bundled pricing has the dual benefits of motivating careful consideration before testing and covering patients’ downstream costs.
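The bundled price in this example is the upfront price plus the probability weighted cost of the follow-up test. A minimal sketch of that arithmetic, with a hypothetical function name and a single downstream test assumed:

```python
def bundled_price(test_price, followup_price, followup_probability):
    """Upfront test price plus the expected cost of downstream testing."""
    return test_price + followup_probability * followup_price


# Figures from the example in the text: a $100 test that leads to a
# $2000 test 10% of the time.
print(bundled_price(100, 2000, 0.10))  # 300.0
```

A test that triggers follow-up more often, or triggers more expensive follow-up, would carry a correspondingly higher bundled price.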

In countries where healthcare is better regulated, diagnostic tests should be rigorously assessed before they are approved, and manufacturers should explicitly state how the new tests add clinical value. Approval of new diagnostics would ideally be contingent on randomised trials showing improvement of a patient centred outcome. Practically, given the large sample size and long follow-up required, such trials will rarely occur.

In that case, new diagnostics should be rigorously assessed by researchers who represent the public’s interest, not that of industry. Researchers should try to answer three questions. Firstly, does the test reliably predict a health event that matters to patients? Secondly, can that risk be lowered by an effective action? Many tests will fail here,30 but for those that don’t, the final question is, what happens to those who do not benefit? Answering this question requires routine surveillance of excessive false positive rates and excessive diagnostic yields, which are a warning sign of overdiagnosis. See supplementary table S2 for stakeholder responses.

Ultimately, however, we think that clinicians will be the most important counterbalance to these favourable market conditions (box 1).

Box 1: Actions for clinicians to assure proper testing

  • Educate patients—Inform patients not only about the potential benefits but also of the dilemmas and harms that testing may entail. Prepare them for unexpected findings and inconclusive results

  • Respect baseline risk—Avoid testing people at low risk of the disease, particularly when false positives are common or require invasive follow-up testing and when the risk of overdiagnosis is high

  • Think downstream—Consider all downstream implications before testing, in particular whether the test is actionable and whether it leads to distress or stigma or has implications for patients’ insurance. Avoid unnecessarily increasing the healthcare workload

  • Expect misleading feedback—Expect incidence and prevalence to rise when trying to detect disease early or applying more sensitive tests. Expect outcomes to improve if you treat milder cases.

Shared decision making

Clinicians need to communicate both the potential benefits and harms so that patients can make informed decisions about tests.31 Self-testing of healthy people should be discouraged. When testing is warranted, clinicians should prepare patients for unexpected findings, such as a concerning immunosignature, and the possibility that ignoring them might be the best course of action.

Respect baseline risk

Diagnostics have traditionally been directed towards people with symptoms, which indicate elevated risk. In this case the harms of testing are typically small relative to the benefits. But extending testing and monitoring to people who have no symptoms offers less potential for benefit with a similar potential for harm. New diagnostics should focus on those at the highest risk (for example, giving breath tests to heavy smokers) and avoid testing in those at low risk.
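Why baseline risk matters can be sketched by calculating the share of positive results that are true positives at different levels of risk. The sensitivity (95%) and false positive rate (2%) below echo the figures reported for immunosignature, which may themselves be optimistic; the two baseline risks are hypothetical.

```python
def positive_predictive_value(prevalence, sensitivity, false_positive_rate):
    """Share of positive results that are true positives."""
    true_positives = prevalence * sensitivity
    false_positives = (1 - prevalence) * false_positive_rate
    return true_positives / (true_positives + false_positives)


# Hypothetical baseline risks: 1 in 1000 (low risk) and 1 in 20 (high risk).
for prevalence in (0.001, 0.05):
    ppv = positive_predictive_value(prevalence, 0.95, 0.02)
    print(f"baseline risk {prevalence}: {ppv:.0%} of positives are true")
```

Under these assumptions, fewer than one in 20 positive results in the low risk group would be a true positive, compared with roughly seven in 10 in the high risk group.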

Think downstream

Before testing, consider the downstream implications. What will you do differently? If the answer is “nothing,” avoid testing. Consider not only whether a positive result is genuinely actionable30—such as the patch alarm or a positive biomarker for Alzheimer’s disease—but also whether the result may lead to stigma and distress, unnecessary subsequent testing, overdiagnosis, and overtreatment.

Expect misleading feedback

Prepare yourself, colleagues, and patients for apparently concerning reports after testing. Expect reports of rising disease prevalence after additional testing and recognise that epidemics might be deceptive. Be prepared for optimistic reports too. Expect outcomes for the typical patient to improve and provide the alternative explanation for powerful stories of patients who attribute their life to the test—they were overdiagnosed and needlessly treated.

Summary

Innovative technologies and ample venture capital are combining to produce new disease biomarkers and mobile monitoring devices. These new diagnostics represent tremendous technological advances, but do not automatically provide improvements in clinical care and population health. Diagnostic efforts can start a cascade of events that turn well people into ill patients. We must develop new diagnostic tests to tackle real health problems, not to generate them.

Footnotes

  • Contributors and sources: HGW has studied and reported extensively on overdiagnosis. BH has scrutinised the role of technology in healthcare. Both authors have contributed to the design of the study, data collection, data analysis, revision of the manuscript and both have approved the final manuscript. BH is the guarantor for this study.

  • Competing interests: We have read and understood BMJ policy on declaration of interests and declare the following: BH has received funding from the Commonwealth Fund through the Harkness Fellowship at the Dartmouth Institute for Health Policy and Clinical Practice for part of this work. The views presented here are those of the author and not necessarily those of the Commonwealth Fund, their directors, officers, or staff or of the Dartmouth Institute. Neither BH nor HGW has had any relationships with any companies that might have an interest in the submitted work in the previous three years; their spouses, partners, or children have no financial relationships that may be relevant to the submitted work; and BH and HGW have no non-financial interests that may be relevant to the submitted work.

  • Provenance and peer review: Not commissioned; externally peer reviewed.
