BMJ 1997;315:540-543 (30 August)

Education and debate

How to read a paper: Papers that report diagnostic or screening tests

Trisha Greenhalgh, senior lecturer

Unit for Evidence-Based Practice and Policy, Department of Primary Care and Population Sciences, University College London Medical School/Royal Free Hospital School of Medicine, Whittington Hospital, London N19 5NF p.greenhalgh@ucl.ac.uk


Ten men in the dock

If you are new to the concept of validating diagnostic tests, the following example may help you. Ten men are awaiting trial for murder. Only three of them actually committed a murder; the seven others are innocent of any crime. A jury hears each case and finds six of the men guilty of murder. Two of the convicted are true murderers. Four men are wrongly imprisoned. One murderer walks free.




PETER BROWN

This information can be expressed in what is known as a two by two table (table 1). Note that the "truth" (whether or not the men really committed a murder) is expressed along the horizontal title row, whereas the jury's verdict (which may or may not reflect the truth) is expressed down the vertical title column.


 
Table 1 Two by two table showing outcome of trial for 10 men accused of murder

These figures, if they are typical, reflect several features of this particular jury:

  • the jury correctly identifies two in every three true murderers;

  • it correctly acquits three out of every seven innocent people;

  • if this jury has found a person guilty, there is still only a one in three chance that they are actually a murderer;

  • if this jury found a person innocent, he or she has a three in four chance of actually being innocent; and

  • in five cases out of every 10 the jury gets it right.

These five features constitute, respectively, the sensitivity, specificity, positive predictive value, negative predictive value, and accuracy of this jury's performance. The rest of this article considers these five features applied to diagnostic (or screening) tests when compared with a "true" diagnosis or gold standard. A sixth feature—the likelihood ratio—is introduced at the end of the article.
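As a rough check on the arithmetic, the ten verdicts can be tallied into the four cells of a two by two table and the five proportions read off. The sketch below is illustrative only: the variable names are my own, and the cell labels a, b, c, and d (true positives, false positives, false negatives, true negatives) are an assumption chosen to match the formulas given later in this article.

```python
from collections import Counter
from fractions import Fraction

# Each defendant as (truly a murderer?, found guilty?), taken from the example:
# 2 murderers convicted, 1 murderer acquitted,
# 4 innocent men convicted, 3 innocent men acquitted.
defendants = (
    [(True, True)] * 2
    + [(True, False)] * 1
    + [(False, True)] * 4
    + [(False, False)] * 3
)

cells = Counter(defendants)
a = cells[(True, True)]    # guilty verdict, true murderer
b = cells[(False, True)]   # guilty verdict, innocent
c = cells[(True, False)]   # acquitted, true murderer
d = cells[(False, False)]  # acquitted, innocent

jury = {
    "sensitivity": Fraction(a, a + c),
    "specificity": Fraction(d, b + d),
    "positive predictive value": Fraction(a, a + b),
    "negative predictive value": Fraction(d, c + d),
    "accuracy": Fraction(a + d, a + b + c + d),
}

for name, value in jury.items():
    print(f"{name}: {value} ({float(value):.0%})")
```

Keeping the results as fractions makes it easy to compare them with the "two in every three" style of wording used in the list above.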


Validating tests against a gold standard

Our window cleaner told me that he had been feeling thirsty recently and had asked his general practitioner to be tested for diabetes, which runs in his family. The nurse in his surgery had asked him to produce a urine specimen and dipped a stick in it. The stick stayed green, which meant, apparently, that there was no sugar in his urine. This, the nurse had said, meant that he did not have diabetes.


Summary points

  • New tests should be validated by comparison against an established gold standard in an appropriate spectrum of subjects

  • Diagnostic tests are seldom 100% accurate (false positives and false negatives will occur)

  • A test is valid if it detects most people with the target disorder (high sensitivity) and excludes most people without the disorder (high specificity), and if a positive test usually indicates that the disorder is present (high positive predictive value)

  • The best measure of the usefulness of a test is probably the likelihood ratio: how much more likely a positive test is to be found in someone with, as opposed to without, the disorder

I had trouble explaining that the result did not necessarily mean this, any more than a guilty verdict necessarily makes someone a murderer. The definition of diabetes, according to the World Health Organisation, is a blood glucose level above 8 mmol/l in the fasting state, or above 11 mmol/l two hours after a 100 g oral glucose load, on one occasion if the patient has symptoms and on two occasions if he or she does not.1 These stringent criteria can be termed the gold standard for diagnosing diabetes (although purists have challenged this notion2 ).

The dipstick test, however, has some distinct practical advantages over the fullblown glucose tolerance test. To assess objectively just how useful the dipstick test for diabetes is, we would need to select a sample of people (say 100) and do two tests on each of them: the urine test (screening test) and a standard glucose tolerance test (gold standard). We could then see, for each person, whether the result of the screening test matched the gold standard (see table 2). Such an exercise is known as a validation study.


 
Table 2 Two by two table notation for expressing the results of validation study for diagnostic or screening test

The validity of urine testing for glucose in diagnosing diabetes has been looked at by Andersson and colleagues,3 whose data I have adapted for use (expressed as a proportion of 1000 subjects tested) in table 3.


 
Table 3 Two by two table showing results of validation study of urine glucose testing for diabetes against gold standard3

From the calculations of important features of the urine dipstick test for diabetes (box), you can see why I did not share the window cleaner's assurance that he did not have diabetes. A positive urine glucose test is only 22% sensitive, which means that the test misses nearly four fifths of people who have diabetes. In the presence of classical symptoms and a family history, the window cleaner's baseline chances (pretest likelihood) of having the condition are pretty high, and they are reduced to only about four fifths of this (the negative likelihood ratio, 0.78; see below) after a single negative urine test. This man clearly needs to undergo a more definitive test.


Features of diagnostic test that can be calculated by comparison with gold standard in validation study

| Feature of the test | Alternative name | Question addressed | Formula (see table 2) |
|---|---|---|---|
| Sensitivity | True positive rate (positive in disease) | How good is this test at picking up people who have the condition? | a/(a+c) |
| Specificity | True negative rate (negative in health) | How good is this test at correctly excluding people without the condition? | d/(b+d) |
| Positive predictive value | Post-test probability of a positive test | If a person tests positive, what is the probability that he or she has the condition? | a/(a+b) |
| Negative predictive value | Post-test probability of a negative test | If a person tests negative, what is the probability that he or she does not have the condition? | d/(c+d) |
| Accuracy | | What proportion of all tests have given the correct result? (true positives and true negatives as a proportion of all results) | (a+d)/(a+b+c+d) |
| Likelihood ratio of a positive test | | How much more likely is a positive test to be found in a person with the condition than in a person without it? | sensitivity/(1-specificity) |
| Likelihood ratio of a negative test | | How much more likely is a negative test to be found in a person without the condition than in a person with it? | (1-sensitivity)/specificity |
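The formulas in the box translate directly into a few lines of code. The following is a minimal sketch rather than a validated tool: the function name `diagnostic_features` is hypothetical, the cells are assumed to be a = true positives, b = false positives, c = false negatives, and d = true negatives (as the formulas imply), and the counts passed in are those implied by the worked urine glucose figures in this article (6/27, 6/13, and 966/973).

```python
def diagnostic_features(a, b, c, d):
    """Return the seven features in the box for one two by two table.

    Assumed cell layout: a = true positives, b = false positives,
    c = false negatives, d = true negatives.
    A test with no false positives (b == 0) would make the positive
    likelihood ratio undefined; this sketch does not handle that case.
    """
    sensitivity = a / (a + c)
    specificity = d / (b + d)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "positive predictive value": a / (a + b),
        "negative predictive value": d / (c + d),
        "accuracy": (a + d) / (a + b + c + d),
        "likelihood ratio (positive test)": sensitivity / (1 - specificity),
        "likelihood ratio (negative test)": (1 - sensitivity) / specificity,
    }


# Urine glucose testing against the gold standard, per 1000 subjects.
urine = diagnostic_features(a=6, b=7, c=21, d=966)
for name, value in urine.items():
    print(f"{name}: {value:.3f}")
```

Because the function works from unrounded proportions, the positive likelihood ratio comes out at about 31, slightly lower than the figure of 32 obtained from the rounded percentages 22.2 and 0.7.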


Does the paper validate the test?

The 10 questions below can be asked about a paper that claims to validate a diagnostic or screening test. In preparing these tips, I have drawn on several sources.4 5 6 7 8

Question 1: Is this test potentially relevant to my practice?
Sackett and colleagues call this the utility of the test.6 Even if this test were 100% valid, accurate, and reliable, would it help me? Would it identify a treatable disorder? If so, would I use it in preference to the test I use now? Could I (or my patients or the taxpayer) afford it? Would my patients consent to it? Would it change the probabilities for competing diagnoses sufficiently for me to alter my treatment plan?

Question 2: Has the test been compared with a true gold standard?
You need to ask, firstly, whether the test has been compared with anything at all. Assuming that a "gold standard" test has been used, you should verify that it merits the description, perhaps by using the questions listed in question 1. For many conditions, there is no gold standard diagnostic test. Unsurprisingly, these tend to be the conditions for which new tests are most actively sought. Hence, the authors of such papers may need to develop and justify a combination of criteria against which the new test is to be assessed. One specific point to check is that the test being validated in the paper is not being used to define the gold standard.

Question 3: Did this validation study include an appropriate spectrum of subjects?
Although few investigators would be naive enough to select only, say, healthy male medical students for their validation study, only 27% of published studies explicitly define the spectrum of subjects tested in terms of age, sex, symptoms or disease severity, and specific eligibility criteria.7 Importantly, the test should be verified on a population which includes mild and severe disease, treated and untreated subjects, and those with different but commonly confused conditions.6


Calculating the important features of screening test

| Feature | Formula | Data (see table 3) | Value |
|---|---|---|---|
| Sensitivity | a/(a+c) | 6/27 | 22.2% |
| Specificity | d/(b+d) | 966/973 | 99.3% |
| Positive predictive value | a/(a+b) | 6/13 | 46.2% |
| Negative predictive value | d/(c+d) | 966/987 | 97.9% |
| Accuracy | (a+d)/(a+b+c+d) | 972/1000 | 97.2% |
| Likelihood ratio, positive test | sensitivity/(1-specificity) | 22.2/0.7 | 32 |
| Likelihood ratio, negative test | (1-sensitivity)/specificity | 77.8/99.3 | 0.783 |

Although the sensitivity and specificity of a test are virtually constant whatever the prevalence of the condition, the positive and negative predictive values depend crucially on prevalence. This is why general practitioners are sceptical of the utility of tests developed exclusively in a secondary care population, and why a good diagnostic test is not necessarily a good screening test.
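The dependence of predictive values on prevalence can be seen by holding sensitivity and specificity fixed and varying the pretest probability. The sketch below applies Bayes' theorem to the urine dipstick figures (sensitivity 22.2%, specificity 99.3%). The function name and the three prevalence values are my own illustrative assumptions; only 2.7% corresponds to the validation sample described in this article.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Positive and negative predictive values via Bayes' theorem."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    true_neg = specificity * (1 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Same test, three hypothetical settings: rare, as in the study, common.
for prevalence in (0.001, 0.027, 0.30):
    ppv, npv = predictive_values(0.222, 0.993, prevalence)
    print(f"prevalence {prevalence:.1%}: PPV {ppv:.1%}, NPV {npv:.1%}")
```

As prevalence rises, the positive predictive value rises and the negative predictive value falls, even though the test itself has not changed.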

Question 4: Has workup bias been avoided?
This is easy to check. It simply means, "Did everyone who got the new diagnostic test also get the gold standard, and vice versa?" There is clearly a potential bias in studies where the gold standard test is performed only on people who have already tested positive for the test being validated.7

Question 5: Has expectation bias been avoided?
Expectation bias occurs when pathologists and others who interpret diagnostic specimens are subconsciously influenced by the knowledge of the particular features of the case—for example, the presence of chest pain when interpreting an electrocardiogram. In the context of validating diagnostic tests against a gold standard, all such assessments should be "blind."

Question 6: Was the test shown to be reproducible?
If the same observer performs the same test on two occasions on a subject whose characteristics have not changed, they will get different results in a proportion of cases. Similarly, it is important to confirm that reproducibility between different observers is at an acceptable level.9

Question 7: What are the features of the test as derived from this validation study?
All the above standards could have been met, but the test might still be worthless because the sensitivity, specificity, and other crucial features of the test are too low—that is, the test is not valid. What counts as acceptable depends on the condition being screened for. Few of us would quibble about a test for colour blindness that was 95% sensitive and 80% specific, but nobody ever died of colour blindness. The Guthrie heel-prick screening test for congenital hypothyroidism, performed on all babies in Britain soon after birth, is over 99% sensitive but has a positive predictive value of only 6% (it picks up almost all babies with the condition at the expense of a high false positive rate),10 and rightly so. It is more important to pick up every baby with this treatable condition who would otherwise develop severe mental handicap than to save hundreds the minor stress of a repeat blood test.

Question 8: Were confidence intervals given?
A confidence interval, which can be calculated for virtually every numerical aspect of a set of results, expresses the possible range of results within which the true value will probably lie. If the jury in the first example had found just one more murderer not guilty, the sensitivity of its verdict would have gone down from 67% to 33%, and the positive predictive value of the verdict from 33% to 20%. This enormous (and quite unacceptable) sensitivity to a single case decision is, of course, because we validated the jury's performance on only 10 cases. The larger the sample, the narrower the confidence interval, so it is particularly important to look for confidence intervals if the paper you are reading reports a study on a relatively small sample.11
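To make the effect of sample size concrete, an approximate 95% confidence interval for a proportion can be computed by hand. The sketch below uses the Wilson score method, which is my choice for illustration and is not a method prescribed in this article; the function name is hypothetical. It compares the jury's sensitivity (2 of 3) with the urine test's sensitivity (6 of 27).

```python
import math


def wilson_interval(successes, total, z=1.96):
    """Approximate 95% confidence interval for a proportion (Wilson score)."""
    p = successes / total
    denom = 1 + z**2 / total
    centre = (p + z**2 / (2 * total)) / denom
    half_width = (
        z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    )
    return centre - half_width, centre + half_width


# Jury sensitivity: 2 of 3 true murderers convicted.
jury_low, jury_high = wilson_interval(2, 3)
print(f"jury sensitivity: {jury_low:.0%} to {jury_high:.0%}")

# Urine test sensitivity: 6 of 27 people with diabetes detected.
urine_low, urine_high = wilson_interval(6, 27)
print(f"urine test sensitivity: {urine_low:.0%} to {urine_high:.0%}")
```

With only three murderers in the sample, the interval for the jury's sensitivity covers most of the range between 0 and 1, whereas 27 cases give a visibly narrower one.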

Question 9: Has a sensible "normal range" been derived?
If the test gives non-dichotomous (continuous) results—that is, if it gives a numerical value rather than a yes/no result—someone will have to say what values count as abnormal. Defining relative and absolute danger zones for a continuous variable (such as blood pressure) is a complex science, which should take into account the actual likelihood of the adverse outcome which the proposed treatment aims to prevent. This process is made considerably more objective by the use of likelihood ratios (see below).

Question 10: Has this test been placed in the context of other potential tests in the diagnostic sequence?
In general, we treat high blood pressure simply on the basis of a series of resting blood pressure readings. Compare this with the sequence we use to diagnose coronary artery stenosis. Firstly, we select patients with a typical history of effort angina. Next, we usually do a resting electrocardiogram, an exercise electrocardiogram, and, in some cases, a radionuclide scan of the heart. Most patients come to a coronary angiogram only after they have produced an abnormal result on these preliminary tests.

If you sent 100 ordinary people for a coronary angiogram, the test might show very different positive and negative predictive values (and even different sensitivity and specificity) than it did in the ill population on which it was originally validated. This means that the various aspects of validity of the coronary angiogram as a diagnostic test are virtually meaningless unless these figures are expressed in terms of what they contribute to the overall diagnostic work up.


A note on likelihood ratios

Question 9 above described the problem of defining a normal range for a continuous variable. In such circumstances, it can be preferable to express the test result not as "normal" or "abnormal" but in terms of the actual chances of a patient having the target disorder if the test result reaches a particular level. Take, for example, the use of the prostate specific antigen (PSA) test to screen for prostate cancer. Most men will have some detectable antigen in their blood (say, 0.5 ng/ml), and most of those with advanced prostate cancer will have high concentrations (above about 20 ng/ml). But a concentration of, say, 7.4 ng/ml may be found either in a perfectly normal man or in someone with early cancer. There simply is not a clean cutoff between normal and abnormal.12

We can, however, use the results of a validation study of this test against a gold standard for prostate cancer (say a biopsy of the prostate gland) to draw up a whole series of two by two tables. Each table would use a different definition of an abnormal test result to classify patients as "normal" or "abnormal." From these tables, we could generate different likelihood ratios associated with an antigen concentration above each different cutoff point. When faced with a test result in the "grey zone" we would at least be able to say, "This test has not proved that the patient has prostate cancer, but it has increased [or decreased] the odds of that diagnosis by a factor of x."
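The idea of drawing up one two by two table per cutoff can be sketched in code. The patient data below are invented purely for illustration and do not come from any study; the function name and the choice of cutoffs are likewise hypothetical. Each cutoff reclassifies the same patients and so yields its own likelihood ratio for a result above that level.

```python
# Hypothetical (PSA in ng/ml, biopsy shows cancer?) pairs.
# INVENTED FOR ILLUSTRATION ONLY: not real patient data.
patients = [
    (0.5, False), (1.2, False), (2.0, False), (3.1, False),
    (4.5, False), (6.0, False), (7.4, False),
    (3.8, True), (7.4, True), (9.0, True), (12.0, True), (25.0, True),
]


def likelihood_ratio_positive(patients, cutoff):
    """Likelihood ratio for a result above `cutoff`, from one two by two table."""
    a = sum(1 for psa, cancer in patients if psa > cutoff and cancer)        # true positives
    b = sum(1 for psa, cancer in patients if psa > cutoff and not cancer)    # false positives
    c = sum(1 for psa, cancer in patients if psa <= cutoff and cancer)       # false negatives
    d = sum(1 for psa, cancer in patients if psa <= cutoff and not cancer)   # true negatives
    sensitivity = a / (a + c)
    false_positive_rate = b / (b + d)  # equals 1 - specificity
    if false_positive_rate == 0:
        return float("inf")  # no false positives at this cutoff
    return sensitivity / false_positive_rate


for cutoff in (2.0, 4.0, 10.0):
    ratio = likelihood_ratio_positive(patients, cutoff)
    print(f"PSA > {cutoff} ng/ml: likelihood ratio {ratio:.2f}")
```

In a real validation study the sample would need to be far larger, and each likelihood ratio would be reported with a confidence interval.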

The likelihood ratio thus has enormous practical value, and it is becoming the preferred way of expressing and comparing the usefulness of different tests.6 For example, if a person enters my consulting room with no symptoms at all, I know that they have a 5% chance of having iron deficiency anaemia, since I know that one person in 20 in the population has this condition (in the language of diagnostic tests, the pretest probability of anaemia is 0.05).13



Fig 1 Use of likelihood ratios to calculate post-test probability of someone being a smoker6

Now, if I do a diagnostic test for anaemia, the serum ferritin concentration, the result will usually make the diagnosis of anaemia either more or less likely. A likelihood ratio multiplies the pretest odds rather than the pretest probability; here the pretest odds are 0.05/0.95, or about 0.053. A moderately reduced serum ferritin concentration (between 18 and 45 µg/l) has a likelihood ratio of 3, so the odds of a patient with this result having iron deficiency anaemia are 0.053x3, or about 0.16, which corresponds to a probability of about 0.14 (14%). This value is known as the post-test probability of the serum ferritin test. The likelihood ratio of a very low serum ferritin concentration (below 18 µg/l) is 41, making the odds of iron deficiency anaemia in a patient with this result greater than unity (a post-test probability of about 68%). On the other hand, a very high concentration (above 100 µg/l; likelihood ratio 0.13) would reduce the chances of the patient being anaemic from 5% to less than 1%.13

Figure 1 shows a nomogram, adapted by Sackett and colleagues from an original paper by Fagan,14 for working out post-test probabilities when the pretest probability (prevalence) and likelihood ratio for the test are known. The lines A, B, and C, drawn from a pretest probability of 25% (the prevalence of smoking among British adults), are the trajectories through likelihood ratios of 15, 100, and 0.015, respectively—three different tests for detecting whether someone is a smoker.15 Actually, test C detects whether the person is a non-smoker, since a positive result in this test leads to a post-test probability of only 0.5%.
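The nomogram is a graphical shortcut for Bayes' theorem in its odds form: convert the pretest probability to odds, multiply by the likelihood ratio, and convert back. Strictly, the likelihood ratio multiplies the odds rather than the probability, although at low pretest probabilities the two give similar answers. The sketch below (the function name is my own) reproduces the three smoking trajectories and, for comparison, the serum ferritin likelihood ratios at a pretest probability of 5%.

```python
def post_test_probability(pretest_probability, likelihood_ratio):
    """Convert probability to odds, apply the likelihood ratio, convert back."""
    pretest_odds = pretest_probability / (1 - pretest_probability)
    post_test_odds = pretest_odds * likelihood_ratio
    return post_test_odds / (1 + post_test_odds)


# Lines A, B, and C of the nomogram: pretest probability 25%.
for label, ratio in (("A", 15), ("B", 100), ("C", 0.015)):
    print(f"test {label}: {post_test_probability(0.25, ratio):.1%}")

# Serum ferritin bands: pretest probability 5%.
for ratio in (3, 41, 0.13):
    print(f"likelihood ratio {ratio}: {post_test_probability(0.05, ratio):.1%}")
```

For test C the function returns a value that rounds to the 0.5% quoted above, which is a useful check that the odds conversion matches the nomogram.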


The articles in this series are excerpts from How to read a paper: the basics of evidence based medicine. The book includes chapters on searching the literature and implementing evidence based findings. It can be ordered from the BMJ Publishing Group: tel 0171 383 6185/6245; fax 0171 383 6662. Price £13.95 UK members, £14.95 non-members.


Acknowledgements

Thanks to Dr Sarah Walters and Dr Jonathan Elford for advice, and in particular to Dr Walters for the jury example.


References

  1. WHO Study Group. Diabetes mellitus. WHO Tech Report Ser 1985:No 727.
  2. McCance DR, Hanson RL, Charles M-A, Jacobsson LTH, Pettitt DJ, Bennett PH, et al. Comparison of tests for glycated haemoglobin and fasting and two-hour plasma glucose concentrations as diagnostic measures for diabetes. BMJ 1994;308:1323-8.
  3. Andersson DKG, Lundblad E, Svardsudd K. A model for early diagnosis of type 2 diabetes mellitus in primary health care. Diabet Med 1993;10:167-73.
  4. Jaeschke R, Guyatt G, Sackett DL. Users' guides to the medical literature. III. How to use an article about a diagnostic test. A. Are the results of the study valid? JAMA 1994;271:389-91.
  5. Jaeschke R, Guyatt G, Sackett DL. Users' guides to the medical literature. III. How to use an article about a diagnostic test. B. What were the results and will they help me in caring for my patients? JAMA 1994;271:703-7.
  6. Sackett DL, Haynes RB, Guyatt GH, Tugwell P. Clinical epidemiology—a basic science for clinical medicine. London: Little, Brown, 1991:51-68.
  7. Reid MC, Lachs MS, Feinstein AR. Use of methodological standards in diagnostic test research: getting better but still not good. JAMA 1995;274:645-51.
  8. Mant D. Testing a test: three critical steps. In: Jones R, Kinmonth A-L, eds. Critical reading for primary care. Oxford: Oxford University Press, 1995:183-90.
  9. Bush B, Shaw S, Cleary P, Delbanco TL, Aronson MD. Screening for alcohol abuse using the CAGE questionnaire. Am J Med 1987;82:231-6.
  10. Verkerk PH, Derksen-Lubsen G, Vulsma T, Loeber JG, de Vijlder JJ, Verbrugge HP. Evaluation of a decade of neonatal screening for congenital hypothyroidism in the Netherlands. Ned Tijdschr Geneesk 1993;137:2199-205.
  11. Gardner MJ, Altman DG, eds. Statistics with confidence: confidence intervals and statistical guidelines. London: BMJ Books, 1989.
  12. Catalona WJ, Hudson MA, Scardino PT, Richie JP, Ahmann FR, Flanigan RC, et al. Selection of optimal prostate specific antigen cutoffs for early diagnosis of prostate cancer: receiver operator characteristic curves. J Urol 1994;152:2037-42.
  13. Guyatt GH, Patterson C, Ali M, Singer J, Levine M, Turpie I, Meyer R. Diagnosis of iron deficiency anaemia in the elderly. Am J Med 1990;88:205-9.
  14. Fagan TJ. Nomogram for Bayes' theorem. N Engl J Med 1975;293:257-61.
  15. How good is that test—using the result. Bandolier 1996;3:6-8.
