Evaluation of diagnostic tests when there is no gold standard. A review of methods

A W S Rutjes; J B Reitsma; A Coomarasamy; K S Khan; P M M Bossuyt

doi:10.3310/hta11500

Health Technol Assess. 2007 Dec;11(50):iii, ix-51. doi: 10.3310/hta11500.

Authors

A W S Rutjes¹, J B Reitsma, A Coomarasamy, K S Khan, P M M Bossuyt

Affiliation

¹ Department of Clinical Epidemiology, Biostatistics and Bioinformatics, Academic Medical Center, University of Amsterdam, The Netherlands.

PMID: 18021577
DOI: 10.3310/hta11500

Abstract

Objective: To generate a classification of methods to evaluate medical tests when there is no gold standard.

Methods: Multiple search strategies were employed to obtain an overview of the different methods described in the literature, including searches of electronic databases, contacting experts for papers in personal archives, exploring databases from previous methodological projects and cross-checking of reference lists of useful papers already identified.

Results: All available methods were classified into four main groups. The first group, impute or adjust for missing data on reference standard, requires careful attention to the pattern and fraction of missing values. The second group, correct imperfect reference standard, can be useful if there is reliable information about the degree of imperfection of the reference standard and about the correlation of the errors between the index test and the reference standard. The methods in the third group, construct reference standard, combine multiple test results to construct a reference standard outcome; they include deterministic predefined rules, consensus procedures and statistical modelling (latent class analysis). In the final group, validate index test results, the diagnostic test accuracy paradigm is abandoned and research instead examines, using a number of different methods, whether the results of an index test are meaningful in practice, for example by relating index test results to other relevant clinical characteristics and to future clinical events.
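As an illustration of the first group (adjusting for missing data on the reference standard), the sketch below shows one commonly cited correction for partial verification, often attributed to Begg and Greenes. It is not taken from this abstract and is not necessarily the form the review discusses. It assumes that whether a patient receives the reference standard depends only on the index test result (missing at random given the test result); the function name, argument names and counts are hypothetical.

```python
def correct_partial_verification(n_test_pos, n_test_neg,
                                 ver_pos_diseased, ver_pos_healthy,
                                 ver_neg_diseased, ver_neg_healthy):
    """Sensitivity and specificity corrected for partial verification.

    Assumption (not from the source): verification depends only on the
    index test result, so P(disease | test result) estimated in the
    verified patients also holds for the unverified ones.

    n_test_pos, n_test_neg : all patients by index test result,
                             verified or not
    ver_*                  : verified patients, by index test result
                             and reference standard outcome
    """
    n_total = n_test_pos + n_test_neg
    p_pos = n_test_pos / n_total   # P(T+), from all patients
    p_neg = 1.0 - p_pos            # P(T-)

    # P(D+ | T+) and P(D+ | T-), estimated in verified patients only
    p_d_pos = ver_pos_diseased / (ver_pos_diseased + ver_pos_healthy)
    p_d_neg = ver_neg_diseased / (ver_neg_diseased + ver_neg_healthy)

    # Bayes' theorem: invert P(D | T) into P(T | D)
    sensitivity = (p_d_pos * p_pos) / (p_d_pos * p_pos + p_d_neg * p_neg)
    specificity = ((1 - p_d_neg) * p_neg) / (
        (1 - p_d_neg) * p_neg + (1 - p_d_pos) * p_pos
    )
    return sensitivity, specificity


if __name__ == "__main__":
    # Hypothetical study: 200 test-positives (all verified) and
    # 800 test-negatives (only 100 verified).
    se, sp = correct_partial_verification(200, 800, 160, 40, 5, 95)
    print(f"corrected sensitivity={se:.2f}, specificity={sp:.2f}")
```

With these made-up counts, an analysis restricted to verified patients would give a sensitivity of 160/165 (about 0.97) and a specificity of 95/135 (about 0.70), whereas the corrected values are 0.80 and 0.95. The correction is only as good as the missing-at-random assumption, which is why the pattern and fraction of missing values need careful attention.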

Conclusions: The majority of methods try to impute, adjust or construct a reference standard in an effort to obtain the familiar diagnostic accuracy statistics, such as sensitivity and specificity. In situations that deviate only marginally from the classical diagnostic accuracy paradigm, these are valuable methods. However, in situations where an acceptable reference standard does not exist, applying the concept of clinical test validation can provide a significant methodological advance. All methods summarised in this report need further development. Some methods, such as the construction of a reference standard using panel consensus methods and validation of tests outside the accuracy paradigm, are particularly promising but have received little methodological research. These methods deserve particular attention in future research.

Publication types

Review

MeSH terms

Diagnostic Techniques and Procedures*
Humans
Process Assessment, Health Care*
Reference Standards
Sensitivity and Specificity*