Evaluation of diagnostic procedures
BMJ 2002; 324 doi: https://doi.org/10.1136/bmj.324.7335.477 (Published 23 February 2002) Cite this as: BMJ 2002;324:477
Editor–I applaud the series of papers on the evidence base of
clinical diagnosis being published in the BMJ. In the first paper
of the series,1 Dr Knottnerus and colleagues underline that the
"development of diagnostic techniques has greatly accelerated but the
methodology of diagnostic research lags far behind that for evaluating
treatments", implying that many diagnostic procedures are used despite
little evidence of their effectiveness. The use of such procedures is
therefore liable to be heavily influenced by non-clinical factors, raising
regulatory and political questions. We have found this to be the case for
imaging in the diagnostic pathway for cognitive disturbances in the
elderly.
Present guidelines require that at least one structural imaging exam
(computed tomography (CT) or magnetic resonance (MR)) be performed on
patients with cognitive disturbances, but no indication is given as to
which should be prescribed.2,3 In March 2001 a questionnaire was sent to
the clinical directors of the 164 Alzheimer's centres (Unità di Valutazione
Alzheimer, UVA) in the regions of Piemonte, Lombardia, Emilia-Romagna,
Trentino, and Veneto in northern Italy; 46 (28%) completed it (list
available at www.centroAlzheimer.it, Gruppo Italiano di Studio per l'Uso
dell'Imaging Diagnostico nei Disturbi Cognitivi). One question asked
which factors directed the choice between CT and MR. The results
indicated that 32% to 53% of responders were influenced by clinical
factors (patient's age, severity of cognitive impairment, and clinical
suspicion of cerebrovascular disease), but a similar proportion of
responders were influenced by organizational factors (availability of a
local scanner and waiting time).
Ideally, organizational factors should not affect physicians' choice
of diagnostic procedures. That local scanner availability and waiting
time weigh so heavily in the choice between CT and MR indicates
considerable uncertainty on the part of physicians about the diagnostic
value of these exams. Since MR is about three times as expensive as CT in
Italy, these results imply that the excess expenditure may buy
disproportionately little diagnostic information. I believe that these
findings can probably be extended beyond northern Italy to most other EU
countries.
High tech options for diagnosis are expanding at a faster pace than
diagnostic and implementation research can keep up with, and the gap
between the supply of technological procedures and the evidence of their
usefulness is likely to widen progressively. This argues for the urgent
need for a health technology assessment agency to review and advise on
the efficient use of present and future biomedical technology.4
References
1. Knottnerus JA, van Weel C, Muris JW. Evaluation of diagnostic
procedures. BMJ 2002;324:477-80.
2. Knopman DS, DeKosky ST, Cummings JL, Chui H, Corey-Bloom J, Relkin N,
et al. Practice parameter: diagnosis of dementia (an evidence-based
review). Report of the Quality Standards Subcommittee of the American
Academy of Neurology. Neurology 2001;56:1143-53.
3. Bonavita V, Caltagirone C, Musicco M, Sorbi S. Linee guida sulla
diagnosi di demenza e di malattia di Alzheimer [Guidelines on the
diagnosis of dementia and Alzheimer's disease]. Available from:
http://www.neuro.it/lg.htm.
4. Banta HD. Health policy, health technology assessment, and screening in
Europe. Int J Technol Assess Health Care 2001;17:409-17.
Competing interests: No competing interests
Knottnerus et al's definition of medical diagnosis agrees with the
broad definition generally given in the current biomedical literature
[1]. According to this definition, diagnosis means not only the detection
or exclusion of disease (the classical definition of medical diagnosis)
but also the evaluation of disease risk, prognostic assessment,
therapeutic monitoring, and so on. This broad definition of medical
diagnosis is not universally acknowledged [2].
In the meantime, standards for reporting studies of diagnostic
accuracy have been defined, and guidelines should soon ensue (details of
the STARD project can be found on the CONSORT website:
http://www.consort-statement.org/). It can be foreseen that editors of
leading medical journals will soon ask authors of diagnostic studies to
comply with the STARD criteria, since some of them already ask authors of
randomised trials to comply with the CONSORT statement [3, 4]. Their
argument for doing so would presumably be that this should eventually
lead to better quality primary diagnostic studies, which would in turn
allow higher quality systematic reviews in the field of diagnosis.
One of the problems is that Knottnerus et al, the STARD group, and
most "thinkers" in medical diagnosis are evidently much more interested
in the aforementioned classical definition of medical diagnosis than in
its broader one. Consider, for example, prognostic assessment, which
Knottnerus et al regard as the starting point for clinical follow-up and
for informing patients. That is only a small part of what prognostic
assessment means. Above all, independent prognostic co-variables
constitute the basis of any staging or stratification system. In lung
cancer, for example, the most powerful prognostic co-variables are
disease extent and performance status, which therefore constitute the
basis of lung cancer staging. Staging has major implications for
patients: patients with advanced stage non-small cell lung cancer cannot
be operated on and have a much shorter life expectancy than those who
can. In lung cancer, as in many other diseases, independent prognostic
factors must also be identified before valid clinical trials can be
designed, conducted, and interpreted [3]. In lung cancer patients, for
example, independent prognostic co-variables cannot be identified without
survival studies coupled with multivariate statistical analyses that take
disease extent and performance status into account (a minimal sketch of
such an analysis follows below). The notions of sensitivity, specificity,
positive and negative predictive values, and likelihood ratios are
fundamental to the aforementioned classical definition of medical
diagnosis; they matter far less in prognostic studies.
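To make the point concrete, here is a minimal sketch of the kind of
multivariate survival analysis alluded to above: a Cox proportional
hazards regression using the open source lifelines library. The cohort,
variable names, and effect sizes are all invented for illustration; they
come from neither the letter nor any study.

```python
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

# Synthetic cohort: every number below is invented for illustration.
rng = np.random.default_rng(0)
n = 200
extent = rng.integers(0, 2, n)   # 0 = limited, 1 = extensive disease
ps = rng.integers(0, 3, n)       # performance status, 0 (good) to 2 (poor)

# Survival times whose hazard rises with disease extent and with
# worse performance status.
scale = 24.0 / np.exp(0.9 * extent + 0.5 * ps)
time_to_death = rng.exponential(scale)
censoring = rng.exponential(36.0, n)
died = (time_to_death <= censoring).astype(int)
months = np.minimum(time_to_death, censoring)

df = pd.DataFrame({"months": months, "died": died,
                   "extent": extent, "performance_status": ps})

# Multivariate analysis: each co-variable's hazard ratio is adjusted
# for the other, which is what "independent prognostic co-variable"
# refers to in the letter.
cph = CoxPHFitter()
cph.fit(df, duration_col="months", event_col="died")
cph.print_summary()
```

The fitted hazard ratios for disease extent and performance status are
what would qualify them as independent prognostic co-variables;
sensitivity and specificity play no role in such an analysis.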
Might we not conclude, therefore, that the STARD criteria are less
appropriate for prognostic studies than for more classical diagnostic
studies? And should editors of medical journals not keep this in mind
before deciding to ask authors of prognostic studies to comply with the
STARD criteria?
References:
[1] Knottnerus JA, van Weel C, Muris JW. Evaluation of diagnostic
procedures. BMJ 2002;324:477-80.
[2] Bruns DE, Huth EJ, Magid E, Young DS. Toward a checklist for
reporting of studies of diagnostic accuracy of medical tests. Clin Chem
2000;46:893-5 (and ensuing e-responses at http://www.clinchem.org/).
[3] Moher D, Schulz KF, Altman DG; CONSORT GROUP (Consolidated
Standards of Reporting Trials). The CONSORT statement: revised
recommendations for improving the quality of reports of parallel-group
randomised trials. Ann Intern Med 2001 Apr 17;134(8):657-62. Details
published in: Ann Intern Med 2001; 134(8):663-94.
[4] Smith R. A plea to authors: ensure your studies comply with
guidelines. BMJ 2002;324:314 (and ensuing e-responses on:
http://www.bmj.com/)
Competing interests: No competing interests
There seems to be a curse on those who wish to expound and explain
the mysteries of the 2 x 2 table. Specifically, I have become alert to
the likelihood (always positive) of finding errors in the work of experts
such as Sackett, Wulff, and Greenhalgh; now we have an arithmetic error
in calculating prior probability (the figure gives "200/800 = 0.2"; it
should be 200/1000 = 0.2). This is hardly excusable in a didactic paper,
especially as the paper has considerable merit otherwise. The example of
x raying for fracture might have been used to advantage to demonstrate
the effect of selection bias. Suppose 20 compound fractures had been
included in the trial but excluded from analysis (x ray could add
nothing). Then the sensitivity of the physical findings would fall from
0.95 to 0.89. Quite what the average clinician would make of this is open
to speculation, of course. Pundits might make a start by making sure that
their lessons are correct and that confidence limits (or the lack of
them) for the various parameters are provided.
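As a check on the corrected arithmetic, and to illustrate the selection
bias the letter describes, here is a minimal sketch in Python. The 2 x 2
counts behind the quoted sensitivities of 0.95 and 0.89 are not given in
the letter, so the counts below are hypothetical and show only the
direction of the effect.

```python
# Prior probability from the paper's fracture example: 200 fractures
# among 1000 patients (correcting the figure's "200/800").
prior = 200 / 1000
print(f"prior probability = {prior:.2f}")  # 0.20

def sensitivity(true_positives, diseased):
    """Proportion of diseased patients detected by the test."""
    return true_positives / diseased

# Hypothetical counts: 200 fractures, 190 picked up on physical
# examination, of which 20 are compound fractures detected with
# certainty. Excluding those 20 removes guaranteed true positives,
# so the apparent sensitivity drops.
print(f"all fractures analysed:      {sensitivity(190, 200):.2f}")  # 0.95
print(f"compound fractures excluded: {sensitivity(170, 180):.2f}")  # 0.94
```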
Test results are too often used to "add an air of verisimilitude to an
otherwise bald and unconvincing narrative", especially if they come on
imposing notepaper and are typewritten.
Competing interests: No competing interests
Lacking information about useless tests.
Only the odds ratio of a useless test is given, but there are other
simple formulas for characterising a useless test in terms of
sensitivity, specificity, predictive values, and likelihood ratios.
When the proportion testing positive in the diseased group
(sensitivity) equals the proportion testing positive in the control group
(1 - specificity), the test is useless; that is, sensitivity +
specificity = 1. Similarly, when the proportion diseased among those with
a positive test (PPV) equals the proportion diseased among those with a
negative test (1 - NPV), the test is useless; that is, PPV + NPV = 1.
Finally, a useless test has LR(+) = LR(-) = 1, so that the diagnostic
odds ratio LR(+)/LR(-) = OR = 1.
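These identities are easy to verify numerically. The sketch below uses
hypothetical counts for a test whose positive rate is the same (50%) in
the diseased and non-diseased groups; it computes the standard 2 x 2
summary measures and confirms all three uselessness conditions.

```python
def summary_measures(tp, fp, fn, tn):
    """Standard summary measures from a 2 x 2 diagnostic table."""
    sens = tp / (tp + fn)          # sensitivity
    spec = tn / (tn + fp)          # specificity
    ppv = tp / (tp + fp)           # positive predictive value
    npv = tn / (tn + fn)           # negative predictive value
    lr_pos = sens / (1 - spec)     # LR(+)
    lr_neg = (1 - sens) / spec     # LR(-)
    odds_ratio = lr_pos / lr_neg   # equals (tp * tn) / (fp * fn)
    return sens, spec, ppv, npv, lr_pos, lr_neg, odds_ratio

# Hypothetical useless test: 50% positive rate in both the 100
# diseased and the 200 non-diseased patients.
sens, spec, ppv, npv, lr_pos, lr_neg, oratio = summary_measures(
    tp=50, fp=100, fn=50, tn=100)
print(sens + spec)               # 1.0 -> sensitivity = 1 - specificity
print(ppv + npv)                 # 1.0 -> PPV = 1 - NPV
print(lr_pos, lr_neg, oratio)    # 1.0 1.0 1.0
```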
Competing interests: No competing interests