Analysis And Comment Medical education

Evidence based checklists for objective structured clinical examinations

BMJ 2006; 333 doi: (Published 07 September 2006) Cite this as: BMJ 2006;333:546
  1. Christopher Frank, clinical programme leader (frankc{at}
  1. 1 Providence Continuing Care Centre, 340 Union Street, PO Box 3600, Kingston, ON, Canada K7L 5A2
  • Accepted 17 July 2006

How doctors examine a patient is often influenced more by tradition than by evidence. But trainees should be assessed on what works and not on personal preferences.

The objective structured clinical examination has been used to evaluate the clinical skills of medical trainees for more than 25 years. Many examinations use checklists as the main indicator of performance, although some people advocate global ratings.13 The development of these checklists is challenging. Doctors devising checklists often disagree about what should be included and the weighting given to items. This is particularly true with checklists for physical examinations, where tradition, different training sites, and specialty backgrounds influence opinion.

Reasons for concern

The physical examination represents a link with the history of medicine, and many clinicians have strong opinions on the merits of specific clinical signs. Clinicians have begun to critically review many aspects of physical examination, but this critical approach does not seem to have been applied to the development of checklists for objective structured examinations. Gorter and colleagues reviewed the literature on developing these examinations and found that only 41% of the 29 papers described the process of checklist development. None of the papers reported that checklists were based on the available literature, and only three reported use of published evidence.4


I reviewed the checklists for physical examination stations from undergraduate and postgraduate examinations at a Canadian health sciences centre and from a multicentre examination to provide examples. I also reviewed the checklists in an examination preparation book.5 The items discussed below were chosen because there was controversy about the clinical value of a particular aspect of a physical examination or the possible weighting of marks.

I used JAMA's rational clinical examination series as a distillation of evidence for specific procedures.67 For each of the procedures potentially included in an examination checklist, I searched Medline for articles published after the JAMA articles. My approach was intended to simulate an evidence based approach that a committee could use and did not include steps used to answer clinically based questions, such as evaluating the effect after application of the evidence.

Respiratory examination for pneumonia

Students' skills in performing the respiratory examination are often tested by using scenarios in which patients have pneumonia or shortness of breath. The checklists commonly use items that reflect longstanding traditions for chest examination and include several controversial manoeuvres, most notably vocal and tactile fremitus and aegophony.

Studies have found the precision of respiratory examinations to be poor for various conditions, including community acquired pneumonia.810 Several manoeuvres that might be included on examination checklists had poor measures of precision: tactile fremitus (κ value of agreement = 0.01),8 bronchial breath sounds (κ = 0.19), and whispered pectoriloquy (κ = 0.11).8 Even the most reliable items (percussion dullness and wheezes) had κ values of 0.51 and 0.52 respectively, which means that agreement between clinicians was only about halfway between chance and perfect agreement. These values are partly explained by the rarity of findings like aegophony and the fluctuating presence of clinical signs. For large scale examinations, differences of opinion between examiners about physical examination techniques can become an issue because of the need for multiple examiners.
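As a rough illustration of how a κ value is derived and why rare findings tend to score poorly, the following sketch computes Cohen's κ for two examiners who each record a sign as present or absent. The function name and the counts are hypothetical and are not taken from the cited studies.

```python
def cohen_kappa(both_present, only_a, only_b, both_absent):
    """Cohen's kappa for two examiners rating one sign as present or absent.

    Arguments are counts of patients in each cell of the 2x2 agreement table.
    """
    n = both_present + only_a + only_b + both_absent
    if n == 0:
        raise ValueError("no observations")
    # Proportion of patients on whom the examiners agreed.
    observed = (both_present + both_absent) / n
    # Agreement expected by chance, from each examiner's own rate of "present".
    a_present = (both_present + only_a) / n
    b_present = (both_present + only_b) / n
    expected = a_present * b_present + (1 - a_present) * (1 - b_present)
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)


# Hypothetical counts: a fairly common sign, 80% raw agreement.
print(round(cohen_kappa(20, 10, 10, 60), 2))  # 0.52
# Hypothetical counts: a rare sign, 92% raw agreement but a much lower kappa.
print(round(cohen_kappa(1, 4, 4, 91), 2))     # 0.16
```

In the second example the examiners agree on most patients only because the sign is almost always absent, so little of the agreement exceeds what chance alone would produce.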

Gennis and colleagues found that abnormal chest findings, including fremitus, decreased breath sounds, crackles, wheezes, aegophony, and percussion dullness, were present in fewer than half of patients with pneumonia on radiography.9 Studies have found that percussion dullness, aegophony, and altered fremitus had good specificity but poorer sensitivity (< 50%).910 Aegophony, fremitus, and whispered pectoriloquy were uncommon in patients with pneumonia. Individually, these findings had positive predictive values of 50-55% and negative predictive values of 62-63%.9 None of the studies differentiated between anterior and posterior auscultation, and the importance of including anterior examination as a separate checklist item is unclear.
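The relation between sensitivity, specificity, and predictive values can be sketched from a 2x2 table of examination finding against radiographic diagnosis. The counts below are hypothetical, chosen only to show a sign with good specificity, low sensitivity, and middling predictive values; they are not the data from the cited studies.

```python
def diagnostic_summary(tp, fp, fn, tn):
    """Accuracy measures for one sign from a 2x2 table.

    tp: sign present, pneumonia on radiograph
    fp: sign present, no pneumonia
    fn: sign absent, pneumonia on radiograph
    tn: sign absent, no pneumonia
    """
    if min(tp + fn, fp + tn, tp + fp, fn + tn) == 0:
        raise ValueError("each row and column of the table needs at least one patient")
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (fp + tn),
        "ppv": tp / (tp + fp),  # positive predictive value
        "npv": tn / (fn + tn),  # negative predictive value
    }


# Hypothetical counts: 40 patients with pneumonia, 60 without.
summary = diagnostic_summary(tp=12, fp=10, fn=28, tn=50)
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

Predictive values depend on how common pneumonia is in the group examined, so the same sign would give different values in a setting with a different prevalence.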

Overall, it seems that auscultation of the chest, percussion for dullness, and vital signs are the most important aspects of the physical examination of the chest. The relevance of traditional techniques such as aegophony, fremitus, and examination for whispered pectoriloquy is less clear, and they are likely to lead to poor inter-rater reliability. For this reason checklist weighting should emphasise the more basic aspects of the respiratory examination and should include temperature and vital signs, especially if the examination station is intended to evaluate assessment of pneumonia rather than a generic chest examination. Assigning separate marks for anterior and posterior examination may be much less important than assigning marks for comparing sides or considering examination in the lateral position.

Figure: Examination should be more science than art


Capillary refill for volume status or shock

The assessment of hypovolaemia can be difficult, especially with older patients. Examining the rate of capillary refill is often considered part of the assessment of volume status and shock, as well as peripheral arterial disease. Assessment of hypovolaemia in adult patients may be included in examination stations dealing with trauma or the management of shock.

The definition and range of normal values for capillary refill time are not commonly cited. The test involves compressing the distal phalanx of the middle finger (at heart level) for five seconds and then timing the return of normal colour to the finger. Measurements done by two examiners using stopwatches have been shown to be within 0.3 seconds of each other. The normal range depends on age, temperature of the room, and technique. At 21°C the upper limit of normal is 2 seconds for children and men and 3 seconds for women, although it can rise to 4 seconds in very elderly people.7 It is unclear if most clinical examinations provide these criteria for examiners. Low lighting has been found to make it harder to detect capillary refill. This may be relevant to examinations, where the physical environment may vary.
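If an examination did want to give examiners explicit criteria, the limits quoted above could be written down as a simple lookup. This is only a sketch: the group labels and function name are hypothetical, and the limits apply only at a room temperature of about 21°C.

```python
# Upper limits of normal capillary refill time in seconds at about 21°C,
# as quoted in the text. Group labels are hypothetical names for this sketch.
UPPER_LIMIT_SECONDS = {
    "child": 2.0,
    "man": 2.0,
    "woman": 3.0,
    "very_elderly": 4.0,  # the text says the limit "can rise to" 4 seconds
}


def is_prolonged(refill_seconds, group):
    """Return True if the measured refill time exceeds the limit for the group."""
    if group not in UPPER_LIMIT_SECONDS:
        raise ValueError(f"unknown group: {group!r}")
    return refill_seconds > UPPER_LIMIT_SECONDS[group]


print(is_prolonged(2.5, "man"))    # True
print(is_prolonged(2.5, "woman"))  # False
```

The same measurement is classified differently depending on the patient group, which is one reason a checklist item that simply says "checks capillary refill" gives examiners little to mark against.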

I found only one study that looked at the role of capillary refill in people with possible hypovolaemia (patients in emergency departments and people who had just donated blood). The test did not have good sensitivity (6%) for identification of 450 ml blood loss (specificity of 93% and a likelihood ratio for a positive result of 1.0). The use of an arbitrary cut-off point (2 seconds) did not improve diagnostic performance.11 Concerns about the use of capillary refill led to its exclusion from the trauma scale and are echoed in the literature about its use in the assessment of peripheral arterial disease.
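A likelihood ratio close to 1 means that the test result barely changes the probability of disease. The sketch below computes likelihood ratios from the rounded sensitivity and specificity quoted above and applies them to a pretest probability. The function names and the 20% pretest probability are assumptions for illustration. With the rounded inputs the positive likelihood ratio comes out near 0.86 rather than the reported 1.0; the difference presumably reflects rounding of the published figures.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-) for a test with the given sensitivity and specificity."""
    if not (0 < specificity < 1 and 0 <= sensitivity <= 1):
        raise ValueError("sensitivity must be in [0, 1] and specificity in (0, 1)")
    lr_positive = sensitivity / (1 - specificity)
    lr_negative = (1 - sensitivity) / specificity
    return lr_positive, lr_negative


def post_test_probability(pretest_probability, likelihood_ratio):
    """Convert a pretest probability to a post-test probability via odds."""
    pretest_odds = pretest_probability / (1 - pretest_probability)
    post_odds = pretest_odds * likelihood_ratio
    return post_odds / (1 + post_odds)


lr_pos, lr_neg = likelihood_ratios(sensitivity=0.06, specificity=0.93)
print(round(lr_pos, 2), round(lr_neg, 2))  # 0.86 1.01

# Assumed pretest probability of 20%, chosen only for illustration.
print(round(post_test_probability(0.20, lr_pos), 2))
print(round(post_test_probability(0.20, lr_neg), 2))
```

Whether the refill time is prolonged or normal, the post-test probability stays close to the pretest value, which is the quantitative case against awarding marks for the manoeuvre.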

Strategies that may help

Assessment is linked closely to the content of teaching. Many undergraduate examinations are developed to assess students' progress in a clinical skills course. For this reason, shortcomings in the checklist may reflect a lack of evidence being used in the development of the clinical skills curriculum1213 as well as decreased attention to teaching clinical skills,813 especially at the bedside.14 The interest in studying and critically reviewing the validity and reliability of physical examination procedures is relatively recent. It is only in the past five years that I have heard trainees citing recommendations on physical examination from journal articles or seen citations referred to in examination answers.

Clinicians still seem resistant to dropping unreliable examination procedures from teaching or structured examinations, perhaps because of their medical education, clinical experience, or lack of awareness of existing literature. One solution might be to teach techniques such as whispered pectoriloquy or capillary refill as part of the history of medicine. However, they should not be included in objective structured clinical examinations.

Use of the available evidence is particularly important for checklists in national or multicentre examinations, when it is more difficult to ensure a match between teaching and examination content than it is for examinations in individual centres. Although trainees from different centres may have been taught a wide range of manoeuvres, it is reasonable to assume that those backed by evidence will be included in the core curriculum. Using evidence based checklists may also improve the quality of clinical skills teaching if universities attempt to “teach to the exam.”

Recognition of the potential pitfalls of development of checklists without formal processes is a first step to improving quality. The recommendations of Gorter and colleagues for developing checklists included the use of evidence based performance standards and use of a documented consensus approach to combine expert opinion with published evidence.4 By considering the evidence behind checklist items, people preparing examination checklists can feel more confident justifying their choices to trainees and examiners.

Help with assessing evidence

Reviewing evidence can be time consuming and challenging. As most practitioners become familiar with using the tools of evidence based medicine and with available resources, this task will presumably become easier. The rational clinical examination series is an excellent starting point for clinicians trying to improve their clinical skills as well as for reviewing individual examination manoeuvres. The Standards for Reporting Diagnostic Accuracy initiative (STARD) has published a checklist to assess the quality of new studies of tests of diagnostic accuracy.15 The Society of General Internal Medicine clinical examination research interest group16 and the American College of Physicians17 have compiled resources on the web that may help to optimise checklist items more quickly and more accurately than a review done by an individual. Evidence based textbooks on clinical skills have also been published.18

Evidence is also needed about other aspects of objective structured clinical examinations. History taking is a major focus of these exams, but there is little research on the merits of the questions used in taking a history of common medical problems. Studies of the effect of an evidence based approach on the success of these exams would be important in promoting the use of this approach for examinations and for the teaching of clinical skills.

Summary points

Objective structured examinations are an important tool for testing clinical performance

Students' performance is usually assessed by using a checklist

The contents of the checklist are often controversial, especially for physical examinations

Development of checklists should be based on available evidence of effectiveness


Contributors and sources: CF has been involved in undergraduate and postgraduate medical education in geriatrics for many years. He has been involved in development of the content of objective structured clinical examinations at a university and national level and has been an examiner on many occasions. This article arose at a committee meeting preparing a multicentre OSCE exam.


  • Competing interests None declared.

