- Lucas M Bachmann, senior research fellow (a)
- Esther Kolb, research fellow (a)
- Michael T Koller, research fellow (a)
- Johann Steurer, professor (a)
- Gerben ter Riet, clinical epidemiologist (b)
- (a) Horten Centre, Zurich University, Postfach Nord, CH-8091 Zurich, Switzerland
- (b) Academic Medical Center, Department of General Practice, Meibergdreef 15, 1105 AZ Amsterdam, Netherlands
- Correspondence to: L M Bachmann
- Accepted 2 December 2002
Objective: To summarise the evidence on accuracy of the Ottawa ankle rules, a decision aid for excluding fractures of the ankle and mid-foot.
Design: Systematic review.
Data sources: Electronic databases, reference lists of included studies, and experts.
Review methods: Data were extracted on the study population, the type of Ottawa ankle rules used, and methods. Sensitivities, but not specificities, were pooled using the bootstrap after inspection of the receiver operating characteristics plot. Negative likelihood ratios were pooled for several subgroups, correcting for four main methodological threats to validity.
Results: 32 studies met the inclusion criteria and 27 studies reporting on 15 581 patients were used for meta-analysis. The pooled negative likelihood ratios for the ankle and mid-foot were 0.08 (95% confidence interval 0.03 to 0.18) and 0.08 (0.03 to 0.20), respectively. The pooled negative likelihood ratio for both regions in children was 0.07 (0.03 to 0.18). Applying these ratios to a 15% prevalence of fracture gave a less than 1.4% probability of actual fracture in these subgroups.
Conclusion: Evidence supports the Ottawa ankle rules as an accurate instrument for excluding fractures of the ankle and mid-foot. The instrument has a sensitivity of almost 100% and a modest specificity, and its use should reduce the number of unnecessary radiographs by 30-40%.
What is already known on this topic
Although most patients with ankle sprains who present to emergency departments undergo radiography, fewer than 15% have a fracture
The Ottawa ankle rules is a clinical decision aid designed to avoid unnecessary radiography
What this paper adds
The Ottawa ankle rules is highly accurate at excluding ankle fractures after sprain injury
The number of acute ankle sprains managed by lay people at sporting activities is unknown; however, general practitioners frequently encounter such injuries.1 The management of ankle sprains is daily routine at emergency departments, and although most patients undergo radiography, fracture of the ankle or mid-foot occurs in less than 15%.2–6 This small yield triggered the development of the Ottawa ankle rules in 1992.7 This instrument consists of a questionnaire for assessment of the ankle and foot.8 The ankle assessment covers the ability to walk four steps (immediately after the injury or at the emergency department) and notes localised tenderness of the posterior edge or tip of either malleolus (four spots). The mid-foot assessment covers the ability to walk and notes localised tenderness of the navicular or the base of the fifth metatarsal (fig 1). The instrument is designed to rule out fractures of the malleolus and the mid-foot. It has been validated and modified in several clinical settings.
When almost every patient entering the emergency department with an ankle sprain undergoes radiography, even modest values for specificity may imply large reductions in the number of radiographs needed. The instrument is therefore calibrated towards high sensitivity, at the expense of specificity to some extent. We conducted a systematic review on the accuracy of the Ottawa ankle rules.
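The link between specificity and radiographs avoided can be illustrated with a short calculation. The sketch below assumes that every patient would otherwise be radiographed and that a negative result on the rules always leads to no radiograph. The function name and the input values (sensitivity 98%, specificity 40%, prevalence 15%) are illustrative assumptions, not pooled estimates from this review.

```python
def fraction_radiographs_avoided(sensitivity, specificity, prevalence):
    """Fraction of all patients who test negative on the rules.

    Assumption: without the rules every patient is radiographed, and a
    negative result always means no radiograph is requested.
    """
    true_negatives = specificity * (1.0 - prevalence)
    false_negatives = (1.0 - sensitivity) * prevalence
    return true_negatives + false_negatives


# Illustrative inputs only (hypothetical, not taken from the tables).
avoided = fraction_radiographs_avoided(
    sensitivity=0.98, specificity=0.40, prevalence=0.15
)
print(f"Radiographs avoided: {avoided:.1%}")
```

Under these assumptions almost all of the reduction comes from true negative results, which is why even a modest specificity matters when nearly every patient is radiographed.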
We focused on studies in which the Ottawa ankle rules was used to diagnose fractures of the ankle or mid-foot. We electronically searched databases, checked the reference lists of included studies, and contacted experts and authors in the specialty (see appendix on bmj.com).
We searched Medline and Premedline (Ovid version; 1990 to present), Embase (Datastar version; 1990-2002), CINAHL (Winspirs version; 1990-2002), and the Cochrane Library (2002, issue 2). We explored the Science Citation Index database (Web of Science by Institute for Scientific Information), entering reference 7 of this paper. The search had no language restrictions.
We selected studies in a two stage process. Firstly, all abstracts or titles found by the electronic searches were independently scrutinised by JS and LMB. If a paper's eligibility was disputed, the paper was obtained and scrutinised. Next, we obtained copies of eligible papers. We used a checklist to check whether the criteria for inclusion had been met. Minimal requirements for inclusion were assessment of the Ottawa ankle rules and the possibility of constructing at least a 2×2 table specifying the false positive rate and the false negative rate. Disagreements on eligibility of studies were resolved by consensus.
Methodological quality and statistical analysis
EK and LMB independently assessed the methods of data collection, patient selection, blinding and prevention of verification bias, and description of the instrument and reference standard.9–14 Disagreements were resolved by consensus.
We calculated several pooled estimates of the negative likelihood ratio by successively increasing the number of methodological criteria required (table 1).
We calculated sensitivities, specificities, likelihood ratios, and their standard errors. Because the Ottawa ankle rules is calibrated towards high sensitivity, we were particularly interested in the pooled sensitivity (using the bootstrap) and in the pooled likelihood ratio of a negative result (using a random effects model)—that is, how many times more likely it is to find a negative result among people with a fracture (1−sensitivity) than among those without (specificity). To investigate sources of variation in the negative likelihood ratios, we looked at this variable in analyses stratified by variables related to clinical subgroups and study design. We calculated the Spearman rank correlation to assess variation in diagnostic threshold. We tested heterogeneity of sensitivities and specificities using χ2 tests, but the interpretation was hampered by small numbers of false negative results.15 After inspection of the receiver operating characteristics plot we decided to pool sensitivities, but not specificities (fig 2). We analysed the data with Stata 7.0.
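For readers who want to reproduce the basic quantities, the negative likelihood ratio of a single study can be computed from its 2×2 table as sketched below. The counts are hypothetical, the function names are ours, and the confidence interval uses the usual large sample standard error of the log likelihood ratio, which is an assumption about method rather than a description of the Stata analysis. The formula is undefined when a study has no false negative results.

```python
import math


def negative_likelihood_ratio(tp, fp, fn, tn):
    """LR- = (1 - sensitivity) / specificity for one 2x2 table."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (fp + tn)
    return (1.0 - sensitivity) / specificity


def lr_negative_confidence_interval(tp, fp, fn, tn, z=1.96):
    """Approximate 95% interval on the log scale.

    Assumption: large sample standard error of log(LR-);
    not defined when fn == 0 or tn == 0.
    """
    log_lr = math.log(negative_likelihood_ratio(tp, fp, fn, tn))
    se = math.sqrt(1 / fn - 1 / (tp + fn) + 1 / tn - 1 / (fp + tn))
    return math.exp(log_lr - z * se), math.exp(log_lr + z * se)


# Hypothetical study: 100 fractures (2 missed), 850 patients without fracture.
tp, fp, fn, tn = 98, 510, 2, 340
lr_neg = negative_likelihood_ratio(tp, fp, fn, tn)
ci_low, ci_high = lr_negative_confidence_interval(tp, fp, fn, tn)
print(f"LR- = {lr_neg:.3f} ({ci_low:.3f} to {ci_high:.3f})")
```

With so few false negative results the interval is wide, which illustrates why small numbers of false negatives hamper the interpretation of single studies.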
We identified 1085 studies from the electronic search, and we obtained full papers for 116. The reference lists of these studies revealed 15 additional articles. Overall, we analysed 32 studies meeting our inclusion criteria. 7 16–46 Contact with the first authors of these studies yielded no additional data.
Overall, 32 studies investigated the accuracy of the Ottawa ankle rules: sixteen assessed the ankle, 7 16 18 26 28 30 31 33 34 37 39–43 46 eleven assessed the mid-foot, 7 16 18 28 30 33 40–43 46 and ten investigated global accuracy, which included a combination of both assessments. 17 21–23 25 27 35 38 44 45
The Ottawa ankle rules was developed to assist decision making in adults, but six studies reported on the accuracy of the instrument in children. 19 20 24 29 32 36 Several studies selectively included patients admitted to the hospital within 48 hours of a sprain instead of within one week. 21 24 31 36
We excluded from the pooled estimates studies in which data were collected non-prospectively and blinding of the radiologist was unknown, 17 37 as well as one abstract.40 If studies compared the performance of different specialties using the rules, we analysed only the data on doctors' judgments. 31 35 We also excluded from the pooled analysis data on modifications of the rules. 27 28 41 44
Overall, 27 studies were available for the pooled analysis: twelve on assessment of the ankle (thirteen 2×2 tables), 7 16 18 26 30 31 33 34 39 42 43 46 eight on assessment of the mid-foot (nine 2×2 tables), 7 16 18 30 33 42 43 46 ten on assessment of both the ankle and the mid-foot (ten 2×2 tables), 21–23 25 27 32 38 44 45 and six on assessment of the ankle or mid-foot in children (seven 2×2 tables). 19 20 24 29 32 36
Among these 27 studies describing 15 581 patients, 47 patients (0.3%) had a false negative result. Table 2 shows the studies' characteristics stratified by ankle, mid-foot, or combined assessment.
Sensitivity and specificity
Table 3 shows the pooled sensitivities and the distribution of specificities stratified by several characteristics. Sensitivities were consistently high but ranged from 99.6% (95% confidence interval 98.2% to 100.0%) in studies on application of the rules within 48 hours of injury to 96.4% (93.8% to 98.6%) in studies of combined assessment. The specificities ranged from 47.9% (interquartile range 42.3%-77.1%) in studies with a prevalence of fracture below the 25th centile of all studies to 26.3% (19.4%-34.3%) in studies of combined assessment.
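A pooled sensitivity with a bootstrap confidence interval could be obtained along the following lines. This is a minimal sketch under stated assumptions: studies (rather than patients) are resampled with replacement, the interval is a simple percentile interval, and the study counts are hypothetical. The bootstrap scheme actually used for table 3 may have differed.

```python
import random


def pooled_sensitivity(studies):
    """Pooled sensitivity from (true positives, false negatives) pairs."""
    tp = sum(study[0] for study in studies)
    fn = sum(study[1] for study in studies)
    return tp / (tp + fn)


def bootstrap_sensitivity(studies, n_resamples=2000, seed=0):
    """Point estimate and percentile 95% interval.

    Assumption: whole studies are resampled with replacement.
    """
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_resamples):
        resample = [rng.choice(studies) for _ in studies]
        estimates.append(pooled_sensitivity(resample))
    estimates.sort()
    lower = estimates[int(round(0.025 * n_resamples))]
    upper = estimates[int(round(0.975 * n_resamples)) - 1]
    return pooled_sensitivity(studies), lower, upper


# Hypothetical (true positive, false negative) counts for six studies.
hypothetical_studies = [(50, 0), (120, 2), (75, 1), (33, 0), (210, 5), (64, 1)]
point, lower, upper = bootstrap_sensitivity(hypothetical_studies)
print(f"Pooled sensitivity {point:.1%} ({lower:.1%} to {upper:.1%})")
```

Resampling avoids the normal approximation, which behaves poorly when sensitivities are close to 100% and false negative counts are very small.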
Negative likelihood ratio
Table 4 shows pooled negative likelihood ratios for clinical subgroups and probabilities of fracture after a negative result, assuming a 15% prevalence of fracture. The post-test probability of fracture was lowest in those studies with prevalences below the 25th centile of all studies (0.7%, 0.35% to 1.90%) and highest in those studies with prevalences above the 75th centile of all studies (3.74%, 1.73% to 8.26%). As the pretest probability of fracture increases, the pooled negative likelihood ratio gets worse. In studies assessing the Ottawa ankle rules in children, the probability of fracture after a negative result was 1.22% (0.53% to 3.08%). A worse negative likelihood ratio was found in the studies that assessed both the ankle and the mid-foot.
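The post-test probabilities follow from the pretest probability and the negative likelihood ratio by converting to odds and back. The sketch below applies this standard conversion to the pooled ratios reported in the abstract at the assumed 15% prevalence; the function name is ours.

```python
def post_test_probability(pretest_probability, likelihood_ratio):
    """Convert probability to odds, apply the likelihood ratio, convert back."""
    pretest_odds = pretest_probability / (1.0 - pretest_probability)
    post_test_odds = pretest_odds * likelihood_ratio
    return post_test_odds / (1.0 + post_test_odds)


# Pooled negative likelihood ratios from the abstract, at 15% prevalence.
adults = post_test_probability(0.15, 0.08)    # ankle or mid-foot
children = post_test_probability(0.15, 0.07)  # both regions in children
print(f"Adults: {adults:.2%}, children: {children:.2%}")
# Prints: Adults: 1.39%, children: 1.22%
```

The same function reproduces the interval quoted for children when it is applied to the limits of the confidence interval of the likelihood ratio (0.03 and 0.18).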
Table 5 shows the likelihood ratios for three criteria believed to affect the accuracy of diagnosis. The features of ideal study design, such as consecutive entry and applying a radiography reference standard in all patients, were associated with slightly worse likelihood ratios.
Table 1 shows pooled negative likelihood ratios stratified for delay of patients being assessed (within or after 48 hours) and according to four quality items: prospective data collection, enrolment of consecutive patients, blinding of the assessor of radiographs, and definite diagnosis with radiography in all patients. Data on the use of the Ottawa ankle rules within 48 hours in adults are scarce. In children, the pooled negative likelihood ratio was 0.07, which seems low enough to be useful, although the evidence is sparse and the confidence interval correspondingly wide. The pooled likelihood ratios for assessment of the ankle and mid-foot are similar irrespective of methodological quality. Nevertheless, the estimates further towards the right side of the table are more likely to be valid.
We summarised the accuracy of the Ottawa ankle rules for excluding fractures of the ankle and mid-foot in patients presenting to emergency departments with an acute ankle sprain. In most subgroups, fewer than 2% of patients who were negative for fracture according to the Ottawa ankle rules actually had a fracture.
As the Ottawa ankle rules is an instrument that is calibrated towards high sensitivity, we were particularly interested in the pooled sensitivity and the pooled likelihood ratio of a negative result. Specificity, however, is an indicator of the number of unnecessary radiographs that may be avoided with this decision rule. The variability in the specificities, which ranged from 10% to 79%, is surprising. 35 42
We hypothesise that differences in clinical skills, interpretation of the test, and experience of staff performing the test influenced the accuracy of the Ottawa ankle rules. Only a few studies reported particulars of staff performing the test, stating, for instance, the number of years worked at a trauma emergency department. In addition, the expression of pain, which is crucial for the interpretation of the test, may have a cultural dimension. This could result in a higher false positive rate among patients with a relatively vivid expression of pain or a higher false negative rate among stoical individuals, unless the clinician shares the patient's cultural background. The subtlety of palpation technique might explain some of the large variation in false positive rates—the percentages of patients who apparently indicated pain (or were unable to walk four steps) but had no fracture.
The Ottawa ankle rules was developed to avoid unnecessary radiography. The economic aspect of the test may be more complex. An obvious requirement for saving costs with the test is that it is applied in clinical practice. A study on techniques for dissemination investigated the impact on requests for radiography of the ankle and foot in clinical practice after use of the instrument.47 The study found that although clinicians widely recognised the test as a decision tool, its use and the change in clinical behaviour were limited. Clinicians aim to minimise the number of missed fractures and would therefore maximise sensitivity at all costs. Fear of a bad professional reputation or litigation might be an explanation. In contrast, a health insurer would be interested in the optimal balance between sensitivity and specificity of the instrument. Therefore, the practical question from the health authorities' point of view is, how should the instrument behave in order that clinicians will use it? Suppose, for example, that a sensitivity of 92% with a specificity of 85% maximised cost effectiveness. Suppose also that clinicians simply refuse to use the instrument at such a low sensitivity. In that case, it may be more useful to design the instrument such that, for example, 90% of clinicians will use it. To do this calculation would require knowing the distribution of the minimal sensitivities that the relevant clinicians are prepared to work with. Then the optimal cut-off point for sensitivity at which just enough clinicians would actually use it to make the test cost effective could be calculated.
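The calculation described here can be sketched as a quantile of the distribution of minimal acceptable sensitivities. The data below are entirely hypothetical, as no such survey of clinicians is reported in this review, and the function name is ours.

```python
import math


def sensitivity_for_uptake(minimal_sensitivities, target_uptake):
    """Lowest sensitivity that at least target_uptake of clinicians accept.

    A clinician is assumed to use the instrument whenever its sensitivity
    is at least as high as his or her personal minimum.
    """
    ordered = sorted(minimal_sensitivities)
    # round() guards against floating point noise before taking the ceiling.
    needed = max(1, math.ceil(round(target_uptake * len(ordered), 9)))
    return ordered[needed - 1]


# Hypothetical survey: minimal sensitivity each of ten clinicians would accept.
survey = [0.90, 0.92, 0.95, 0.95, 0.97, 0.98, 0.98, 0.99, 0.99, 1.00]
required = sensitivity_for_uptake(survey, target_uptake=0.90)
print(f"Sensitivity needed for 90% uptake: {required:.0%}")
```

In this hypothetical survey a sensitivity of 92% would be acceptable to only two of the ten clinicians, whereas 99% would be needed before nine of the ten used the instrument.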
Immediate access to radiography may further trigger requests for radiographs. So far the usefulness of the Ottawa ankle rules as a decision tool in primary care has not been assessed. Dissemination among general practitioners and people supervising sports activities may therefore be pertinent.
We thank Pius Estermann (information specialist, University Hospital Zurich) for doing the literature searches and Afina Glas and Patrick Bossuyt (department of clinical epidemiology and biostatistics, University of Amsterdam) for commenting on an earlier draft.
Contributors: LMB and JS initiated the project. LMB, MTK, EK, and JS screened and extracted the data. LMB and MTK cross checked the extracted data. LMB and GtR analysed the data. All authors participated in discussing the results and in writing the paper. LMB will act as guarantor for the paper.
Editorial by Heyworth
Competing interests: None declared.
Examples of the search strategy and details of the included studies appear on bmj.com