Original Article
Evaluation of Diagnostic Imaging Tests: Diagnostic Probability Estimation

https://doi.org/10.1016/S0895-4356(98)00127-9

Abstract

In the evaluation of a diagnostic imaging test for the diagnosis of a particular illness in a particular category of patients, the test should be construed as leading to a test result in the sense of a set of descriptive readings from the image(s), not interpretation of these; and in the evaluation of the test, therefore, the first challenge is the translation of each test result (set of readings) into the corresponding probability that the illness is present. This interpretive translation should not be subjective, nor should it be based on an objective algorithm founded on clinical judgments. Instead, a suitable diagnostic probability function (of the elements in the test result) should be derived empirically by logistic regression analysis of suitable data. We illustrate this alternative outlook by reanalysis of the data from the Prospective Investigation of Pulmonary Embolism Diagnosis.

Introduction

Any evaluation of a diagnostic test has to do with a particular generic context of its potential application: concern to learn about the presence of a particular illness in a particular domain of presentation for testing. Thus, for ventilation-perfusion (V-Q) scanning of the lungs, the evaluation might focus on the diagnosis of pulmonary embolism (PE) in patients who, by a specified set of domain-defining criteria, give rise to suspicion of this illness.

Whatever the context, evaluation must focus on a particular conceptual variant of the test. Thus, for V-Q scanning in this context, the concept of the test without further specification is so vague that one does not know even the broadest nature of its results: are they images per se, descriptive readings or data based on these (such as number of mismatched defects), or interpretation of the images or data with respect to presence of the illness (such as “low probability” of PE)? In other words, without such specification, it is unclear even where the test ends and the interpretation of its result begins. The choice among these three conceptualizations of an imaging test is, in and of itself, already a major basis for divergent outlooks on the evaluation of imaging tests.

Another important basis for divergence of outlooks relates to the theoretical framework for diagnosis and, hence, for diagnostic research. It was the radiologist Lusted who, in collaboration with Ledley, introduced the Bayes’ theorem framework for this [1]. Yet, an alternative theoretical framework [2] deserves attention, one that in the context of diagnostic tests has particular merit with respect to imaging tests on the grounds that they produce descriptive readings or data on multiple aspects of the image(s).
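For orientation, the Bayes' theorem framework mentioned above can be sketched in its odds form, in which the posttest odds equal the pretest odds multiplied by the likelihood ratio of the test result. This is a minimal sketch, not taken from the article: the function name and the example numbers are hypothetical.

```python
def posttest_probability(pretest_probability: float, likelihood_ratio: float) -> float:
    """Bayes' theorem in odds form: posttest odds = pretest odds * likelihood ratio.

    Both arguments are illustrative inputs, not values from any study.
    """
    if not 0.0 < pretest_probability < 1.0:
        raise ValueError("pretest_probability must lie strictly between 0 and 1")
    if likelihood_ratio < 0.0:
        raise ValueError("likelihood_ratio must be non-negative")
    # Convert the pretest probability to odds.
    pretest_odds = pretest_probability / (1.0 - pretest_probability)
    # Update the odds with the likelihood ratio of the observed result.
    posttest_odds = pretest_odds * likelihood_ratio
    # Convert the posttest odds back to a probability.
    return posttest_odds / (1.0 + posttest_odds)


if __name__ == "__main__":
    # Hypothetical example: pretest probability 0.2, likelihood ratio 4.
    print(posttest_probability(0.2, 4.0))
```

In this framework each possible result needs its own likelihood ratio, which becomes unwieldy when a result consists of readings on several aspects of the image(s).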

In what follows, we outline very briefly the outlook that now prevails in the evaluation of diagnostic imaging tests, present critical questions about it, and then outline and justify the proposed alternative approach to setting diagnostic probabilities. We illustrate the prevailing outlook by the Prospective Investigation of Pulmonary Embolism Diagnosis (PIOPED) [3] and the alternative by reanalysis of the PIOPED data.

The prevailing outlook

The PIOPED was an eminent, multicenter study about the presence of PE in the domain of adults in whom symptoms suggestive of PE were present within the most recent 24 hours and prompted a request for radiologic assessment. The radiologic test at issue was V-Q scanning in conjunction with chest roentgenography [3, 4].

The definition of the V-Q test under evaluation involved three sequential elements:

  1. Production of the images (imaging proper): when to produce them (recency of symptoms) and how

Critical questions

Taking some distance from this prevailing outlook and culture in the evaluation of diagnostic imaging tests, two important, interrelated questions arise. First, would it not be much more natural to take the development of categories of illness probability (“high probability,” etc.)—insofar as they are of interest at all—to be the first-order objective of the study rather than an a priori constraint for it? In other words, why define the readings-based categories of illness probability in

The alternative outlook: elements

The PIOPED “interpretation categories” were defined on the basis of the following input readings/data [3]:

  • Number of large segmental (i.e., 75% or more of a segment) perfusion defects that were mismatched (i.e., without corresponding ventilation or roentgenographic abnormalities or substantially larger than these)

  • Number of moderate segmental (i.e., 25%–75%) mismatched perfusion defects

  • Number (0, 1–3, 4+) of small segmental (i.e., 25% or less) mismatched perfusion defects with normal roentgenogram
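A diagnostic probability function of readings such as those listed above might take a logistic form. The following is a minimal sketch under stated assumptions: the variable names, the 0/1/2 coding of the small-defect category, and all coefficient values are hypothetical placeholders and are not estimates from the PIOPED data.

```python
import math

# Hypothetical values for illustration only; NOT fitted to PIOPED or any other data.
INTERCEPT = -2.0
COEFFICIENTS = {
    "large_mismatched": 0.8,      # number of large mismatched perfusion defects
    "moderate_mismatched": 0.4,   # number of moderate mismatched perfusion defects
    "small_mismatched_cat": 0.2,  # assumed coding: 0 = none, 1 = 1-3, 2 = 4+
}


def diagnostic_probability(readings, intercept=INTERCEPT, coefficients=COEFFICIENTS):
    """Return the logistic-model probability of illness for one set of readings.

    `readings` maps each reading name in `coefficients` to its observed value.
    """
    # Linear score: intercept plus the weighted sum of the readings.
    score = intercept + sum(
        coefficients[name] * readings[name] for name in coefficients
    )
    # The logistic function maps the score to a probability between 0 and 1.
    return 1.0 / (1.0 + math.exp(-score))


if __name__ == "__main__":
    example = {
        "large_mismatched": 2,
        "moderate_mismatched": 1,
        "small_mismatched_cat": 0,
    }
    print(diagnostic_probability(example))
```

In practice the intercept and coefficients would be estimated by logistic regression on data in which both the readings and the true illness status are known, so each distinct set of readings receives its own probability rather than being forced into a predefined category.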

The alternative outlook: extensions

In accordance with the spirit of the PIOPED, addressed earlier here was the situation in which the radiologist expresses diagnostic probability on the basis of the radiologic data alone. Yet, ultimately the diagnostic probability that guides the decision about intervention is based on added inputs from the patient’s history and physical examination as well as tests other than imaging. Some aspects of history are relevant to differential risks for the illness at issue and its

Discussion

Our orientational proposition is that diagnostic interpretation of the readings from a (set of) diagnostic image(s) should not be construed as part of the test itself. Instead, the test should be construed as ending with the readings (descriptive) constituting the test result.

Given this conceptualization of an imaging test in diagnosis, we strongly propose that a priori definition of a (unidimensional) scale of result interpretation should be replaced by logistic regression analysis of the data

Acknowledgements

We thank H. Dirk Sostman, M.D., for providing us with access to the Prospective Investigation of Pulmonary Embolism Diagnosis database and for helpful discussions on the manuscript.
