Red flags to screen for malignancy and fracture in patients with low back pain: systematic review
BMJ 2013; 347 doi: http://dx.doi.org/10.1136/bmj.f7095 (Published 11 December 2013) Cite this as: BMJ 2013;347:f7095
- Aron Downie, PhD student12,
- Christopher M Williams, honorary research fellow1,
- Nicholas Henschke, research fellow13,
- Mark J Hancock, senior lecturer4,
- Raymond W J G Ostelo, professor5,
- Henrica C W de Vet, professor of clinimetrics6,
- Petra Macaskill, professor of biostatistics7,
- Les Irwig, professor of epidemiology8,
- Maurits W van Tulder, professor9,
- Bart W Koes, professor10,
- Christopher G Maher, director1
- 1George Institute for Global Health, University of Sydney, Sydney, NSW, 2050, Australia
- 2Faculty of Science, Macquarie University, Sydney, Australia
- 3Institute of Public Health, University of Heidelberg, Germany
- 4Faculty of Human Sciences, Macquarie University, Sydney, Australia
- 5Department of Health Sciences, EMGO Institute for Health and Care Research, VU University, Amsterdam, Netherlands
- 6Department of Epidemiology and Biostatistics, EMGO Institute for Health and Care Research, VU University Medical Centre, Amsterdam
- 7Screening and Test Evaluation Program (STEP), School of Public Health, Sydney
- 8School of Public Health, University of Sydney, Sydney, Australia
- 9Department of Health Sciences, Faculty of Earth and Life Sciences, VU University, Amsterdam, Netherlands
- 10Department of General Practice, Erasmus Medical Centre, Rotterdam, Netherlands
- Correspondence to: A Downie, George Institute for Global Health, University of Sydney, PO Box M201, Camperdown, Sydney, NSW, 2050, Australia
- Accepted 18 November 2013
Objective To review the evidence on diagnostic accuracy of red flag signs and symptoms to screen for fracture or malignancy in patients presenting with low back pain to primary, secondary, or tertiary care.
Design Systematic review.
Data sources Medline, OldMedline, Embase, and CINAHL from earliest available up to 1 October 2013.
Inclusion criteria Primary diagnostic studies comparing red flags for fracture or malignancy to an acceptable reference standard, published in any language.
Review methods Assessment of study quality and extraction of data were conducted by three independent assessors. Diagnostic accuracy statistics and post-test probabilities were generated for each red flag.
Results We included 14 studies (eight from primary care, two from secondary care, four from tertiary care) evaluating 53 red flags; only five studies evaluated combinations of red flags. Pooling of data was not possible because of index test heterogeneity. Many red flags in current guidelines provide virtually no change in probability of fracture or malignancy or have untested diagnostic accuracy. The red flags with the highest post-test probability for detection of fracture were older age (9%, 95% confidence interval 3% to 25%), prolonged use of corticosteroid drugs (33%, 10% to 67%), severe trauma (11%, 8% to 16%), and presence of a contusion or abrasion (62%, 49% to 74%). Probability of spinal fracture was higher when multiple red flags were present (90%, 34% to 99%). The red flag with the highest post-test probability for detection of spinal malignancy was history of malignancy (33%, 22% to 46%).
Conclusions While several red flags are endorsed in guidelines to screen for fracture or malignancy, only a small subset of these have evidence that they are indeed informative. These findings suggest a need for revision of many current guidelines.
Low back pain is a major cause of disability,1 leading to considerable healthcare expenditure around the world, especially in high income countries.2 The difficulty in providing a definitive diagnosis for most presentations of back pain has given rise to the term “non-specific low back pain,” which is generally considered to be benign and can be managed in a primary care setting.3 Some patients, however, present with low back pain as the initial manifestation of a more serious pathology, such as malignancy, spinal fracture, infection, or cauda equina syndrome. Spinal fracture and malignancy are the most common serious pathologies affecting the spine. In patients with low back pain presenting to primary care, between 1% and 4% will have a spinal fracture4 and in less than 1% malignancy, whether primary tumour or metastasis, will be the underlying cause.5
Identification of serious pathologies, when they exist, is an important part of the clinical assessment because further assessment and specific treatment are usually required, particularly for malignancy.6 7 For instance, early detection of spinal malignancy could prevent further spread of metastatic disease.8 Identification of spinal fracture prevents the prescription of treatments such as manual therapy, which is contraindicated,9 and moves the patient towards further testing and treatment of underlying disease (such as osteoporosis). Despite the potential consequences of a late or missed diagnosis of these serious pathologies, their low prevalence in primary care settings does not justify routine ancillary testing of patients presenting with low back pain. For this reason, accurate screening tools to aid clinical decisions about when to refer for further testing are paramount.
Most clinical practice guidelines for back pain recommend the use of red flags to help identify those patients with a higher likelihood of spinal fracture or malignancy who then become candidates for more extensive diagnostic investigations. There is confusion, however, as the guidelines have produced different lists of red flags to screen for spinal fracture and malignancy. Eight of the guidelines6 10 11 12 13 14 15 16 investigated by Koes and colleagues in their review of back pain guidelines,3 endorsed 26 red flags for fracture and 27 for malignancy. None of the eight guidelines endorsed the same set of red flags, for either condition, so it is unclear what clinicians should use in clinical care. Additionally, guidelines generally provide no information on diagnostic accuracy of the endorsed red flags, which limits their value in clinical decision making. Adding to the uncertainty, the same agency can provide inconsistent information on red flags. For example, the National Institute for Health and Care Excellence clinical guideline on the early management of persistent non-specific low back pain4 does not endorse red flags, whereas the group’s clinical knowledge summary for the management of low back pain does.17
To resolve the uncertainty around application of red flags in clinical practice, we conducted two Cochrane diagnostic test accuracy reviews assessing the accuracy of red flags to screen for the most common forms of serious pathology—spinal fracture and malignancy—in patients with low back pain.4 5 We have provided a distilled summary of both reviews to help guide clinical decision making.
For the Cochrane diagnostic test accuracy reviews, we searched Medline, OldMedline, Embase, and CINAHL (Ebsco) for eligible studies from the earliest record up to 7 March 2012. The search was updated to include studies up to 1 October 2013. The searches used combinations of terms related to the patient population, history taking, physical examination, and the target condition and were developed in collaboration with a medical information specialist. Forward and backward citation searches were completed.
Primary diagnostic studies were considered if they looked at the results of history taking or physical examination compared with those of an acceptable reference standard, to identify spinal fracture or spinal malignancy in adult patients presenting with low back pain. Examples of reference standards include diagnostic imaging to confirm the presence of spinal fracture, primary malignancy, or metastases in the spine. Long term follow-up (more than six months) of patients after the initial consultation was also considered an appropriate reference standard for both fracture and malignancy if suspected cases identified during the follow-up period were confirmed by medical review. Cohort and cross sectional studies of a consecutive series of patients that presented sufficient data to allow estimates of diagnostic accuracy (such as sensitivity and specificity) were considered for eligibility. We considered studies reported in abstracts or conference proceedings and full journal publications in all languages and excluded studies in which “global clinician judgment” was the only red flag investigated. Four review authors (CMW, NH, and AD for fracture; NH, RWJGO, and AD for malignancy) independently applied the selection criteria to retrieved citations. Final selection was based on a review of full publications. Disagreements were resolved by consensus or by consulting other review authors in cases of persisting disagreement.
Data extraction and quality assessment
Three review authors (CMW and NH for fracture; NH and RWJGO for malignancy) independently extracted data on selection of participants, the index test, reference test, flow of participants, timing of the study, and diagnostic accuracy. Review authors independently assessed the risk of bias of each study using the QUality Assessment of Diagnostic Accuracy Studies (QUADAS) checklist.18 We used the 11 item version of the QUADAS recommended by the Cochrane Diagnostic Test Accuracy Working Group.19 This checklist (see table A in appendix) was used to classify each item as “yes” (adequately dealt with); “no” (inadequately dealt with); or “unclear” (inadequate detail presented to allow a judgment to be made). Disagreements were resolved by consensus. Authors of the current review who were involved in the original study were not involved in data extraction or quality assessment of that study.
Data synthesis and analysis
We generated diagnostic 2×2 tables to characterise diagnostic accuracy. Because of the heterogeneity of tests, study settings, and methods we could not pool results. We used Review Manager 5.220 to calculate study specific estimates of sensitivity and specificity with 95% confidence intervals and MetaDisc 1.421 to calculate likelihood ratios with 95% confidence intervals. Post-test probability was determined with standard methods22 23 with the 95% confidence interval for post-test probability determined with point estimate of pre-test probability and the 95% confidence interval of the likelihood ratio.
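As an illustration of how a diagnostic 2×2 table yields the accuracy statistics described above, the following minimal sketch computes sensitivity, specificity, and both likelihood ratios. The function name and the counts are hypothetical and are not taken from any included study; the reviews themselves used Review Manager and MetaDisc, which also produce confidence intervals that this sketch omits.

```python
def accuracy_from_2x2(tp, fp, fn, tn):
    """Return (sensitivity, specificity, LR+, LR-) from 2x2 table counts.

    tp: red flag present, condition present
    fp: red flag present, condition absent
    fn: red flag absent, condition present
    tn: red flag absent, condition absent
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    # LR+ is undefined when specificity is 1; LR- when specificity is 0.
    lr_positive = sensitivity / (1 - specificity)
    lr_negative = (1 - sensitivity) / specificity
    return sensitivity, specificity, lr_positive, lr_negative


# Hypothetical counts, for illustration only (not data from the review).
sens, spec, lr_pos, lr_neg = accuracy_from_2x2(tp=8, fp=92, fn=2, tn=898)
print(f"sensitivity={sens:.2f} specificity={spec:.2f} "
      f"LR+={lr_pos:.1f} LR-={lr_neg:.2f}")
```

A positive likelihood ratio well above 1 indicates that the presence of the red flag raises the probability of the target condition; a value close to 1 indicates an uninformative red flag.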
We compared the diagnostic accuracy of red flags against the clinical practice guideline developed by Chou and colleagues6 24 for the American College of Physicians and American Pain Society. The guideline provides advice on diagnostic investigation based on various patterns of risk factors (red flags).
The electronic database search and citation tracking identified 13 669 unique titles for fracture and 3092 for malignancy (fig 1). After screening of titles and abstracts, we retrieved full text copies of 68 articles relating to fracture and 70 full text articles relating to malignancy. Fourteen discrete studies were included in the reviews, of which eight related to fracture and nine related to malignancy (three studies dealt with both conditions).
For each included study, details on the design, setting, population, reference standard, index tests used, and definition of the target condition are provided in tables B and C in the appendix. Most studies (eight) were set in primary care,7 25 26 27 28 29 30 31 two in secondary care,32 33 and the four remaining in tertiary care (three in emergency departments34 35 36 and one in a spinal surgical unit37). Three studies investigated red flags for both fracture and malignancy.7 27 34 Five of the eight primary care studies were prospective in design, and three were retrospective chart reviews (two for fracture, one for malignancy). Standard radiographs were the most common reference standard for diagnosing fracture, and the most common reference standards used for malignancy were magnetic resonance imaging or long term clinical follow-up.
Appendix table D shows the individual results of quality assessment for included studies. The items that were not adequately dealt with were: an acceptable delay between index and reference tests, partial verification, differential verification, reference standard blinding, reporting uninterpretable results, and explaining withdrawals. Four criteria (acceptable delay between tests, differential verification, reference blinding, and explaining withdrawals) were often scored as “unclear,” indicating that studies provided insufficient information. The specific details and criteria used for imaging reference standards to diagnose either condition (such as views taken or reporting procedure), though present, were usually poorly described.
Prevalence varied among studies for both fracture (primary care: median 3.6%, interquartile range 1.8-4.3%; secondary/tertiary care: 6.5%, 2.9-9.1%), and malignancy (primary care: 0.2%, 0.1-0.7%; secondary care: one study 7%; tertiary care: two studies 1.5% and 5.9%). Point prevalence used to calculate post-test probability in this review was determined by extracting prevalence from a reduced set of methodologically robust studies for fracture7 33 35 and cancer,7 27 and by considering a value that could be readily applied in the clinical setting (fracture: 1% for primary care, 5% for secondary and tertiary care; malignancy: 0.5% for primary care, 1.5% for secondary and tertiary care).
The reviews identified 29 red flags to screen for fracture7 25 26 27 33 34 35 36 and 24 to screen for malignancy,7 27 28 29 30 31 32 34 37 with 13 (25%) evaluated in more than one study. Only six studies reported on the accuracy of combinations of red flags,7 25 33 35 36 37 and few studies provided precise definitions for each red flag. Figures 2-5 show the positive and negative likelihood ratios and associated post-test probabilities for each red flag. Figures 2 and 4 are confined to the red flags endorsed in the American College of Physicians clinical practice guideline with additional red flags identified in our review reported in figures 3 and 5. The full diagnostic accuracy data, both raw data and calculated statistics, are provided in appendix tables E-G.
Red flags for spinal fracture
Figure 2 presents the diagnostic accuracy data for red flags recommended in the American College of Physicians guideline. The first column contains the specific wording of the red flag from the guideline and the next column the specific wording of similar red flags evaluated in the diagnostic studies. For each evaluated red flag we have provided the positive and negative likelihood ratios, typical prevalence, and post-test probability if the red flag is present or absent. We found inconsistent descriptions between the red flags evaluated in primary studies and those presented in the guideline. For example, the American College of Physicians guideline red flag “older age” (men aged >65, women aged >75) is compared with the similar red flag (age >64, van den Bosch and colleagues25) that had a positive likelihood ratio of 2.5 (95% confidence interval 2.2 to 2.8) and negative likelihood ratio of 0.3 (0.2 to 0.5). As this study was conducted in primary care, the typical prevalence was set at 1% and is represented by the vertical line in the figure. The probability of fracture given the presence of the red flag is 2% (95% confidence interval 2% to 3%). A second study using the same red flag (age >64, Henschke and colleagues7) found the probability of fracture given presence of the red flag to be 7% (4% to 13%).
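The worked example above (age >64, positive likelihood ratio 2.5, pre-test prevalence 1%) can be reproduced with the standard odds form of Bayes' theorem. The sketch below is illustrative only: the function and variable names are our own, and the interval is obtained, as described in the methods, by applying the confidence limits of the likelihood ratio to the point estimate of pre-test probability.

```python
def post_test_probability(pre_test_probability, likelihood_ratio):
    """Convert a pre-test probability to a post-test probability via odds."""
    pre_test_odds = pre_test_probability / (1 - pre_test_probability)
    post_test_odds = pre_test_odds * likelihood_ratio
    return post_test_odds / (1 + post_test_odds)


prevalence = 0.01  # typical prevalence of fracture in primary care (from the text)
lr_pos, lr_pos_low, lr_pos_high = 2.5, 2.2, 2.8  # age >64, from the text

point = post_test_probability(prevalence, lr_pos)
low = post_test_probability(prevalence, lr_pos_low)
high = post_test_probability(prevalence, lr_pos_high)
print(f"{point:.1%} ({low:.1%} to {high:.1%})")
```

Rounded to whole percentages, the result is 2% (2% to 3%), in agreement with the value reported for this red flag.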
Of the four red flags endorsed in the American College of Physicians guideline, we could find data only on older age, trauma, and corticosteroid use and not on history of osteoporosis. In general, the presence of these red flags increased the likelihood of fracture by up to 15% (older age, trauma). The exception was the red flag “prolonged corticosteroid use,” which, when present, suggested a post-test probability of 33% (95% confidence interval 10% to 67%).7 By contrast, the American College of Physicians guideline uses the red flag “corticosteroid use,” as studied by Deyo and Diehl,27 which, when present, results in a post-test probability of 4% (0% to 44%).
Figure 3 shows the red flags not endorsed in the American College of Physicians guideline. In general, when present, these produce trivial increases in probability of fracture; few produced a precise estimate beyond the pre-test prevalence. The exceptions are contusion or abrasion reported by Patrick and colleagues (62%, 95% confidence interval 49% to 74%, post-test probability),35 the combination of red flags (any four of leg or buttock pain, female, older age, BMI <23, gait abnormality, no regular exercise, sitting pain, osteoarthritis) reported by Roman and colleagues (34%, 24% to 45%, post-test probability),33 the combination of red flags (any three of female, age >70, severe trauma, prolonged use of corticosteroids) reported by Henschke and colleagues (90%, 34% to 99%, post-test probability),7 and trauma with neurological signs reported by Gibson and Zoltie (43%, 11% to 82%, post-test probability).36
Red flags for spinal malignancy
Figure 4 shows that of the red flags classified as “major risk” in the American College of Physicians guideline, only data on the red flag “history of cancer” are available. When present, “history of cancer” suggests a post-test probability of 7% (95% confidence interval 3% to 16%) in primary care,31 and 33% (22% to 46%) in the emergency setting.34 Red flags classified as “minor risk” in the American College of Physicians guideline (older age, unexplained weight loss, and failure to improve after one month) have post-test probability point estimates below 3%.27 29 31 32 37 Figure 5 shows red flags not endorsed in the American College of Physicians guideline; all seem uninformative.30 31 37 The exception is neurological symptoms, for which there are contradictory data.28 31
The informativeness of the large number of red flags endorsed in guidelines for the management of non-specific low back pain in primary care3 varies substantially and many have poor or untested diagnostic accuracy. Of the red flags for fracture, older age, prolonged steroid use, severe trauma, and contusion or abrasion increased the probability of fracture to between 10% and 62%, while the presence of multiple red flags increased the probability of fracture to between 42% and 90%. Of the red flags for malignancy, “history of cancer” increased the probability of malignancy to between 7% and 33% while older age, unexplained weight loss, and failure to improve after one month have post-test probabilities below 3%.
Our results support the approach taken in the American College of Physicians guideline, which provides a more focused list of red flags than other guidelines, and emphasises consideration of the low probability of disease (given the specific red flags present) when making decisions about the need for, and timing of, further diagnostic investigation. While older age, steroid use, and severe trauma are endorsed by the guideline to screen for fracture, the combined red flags prolonged steroid use and contusion or abrasion are absent. Our results suggest consideration of their inclusion when the guideline is revised. Of red flags endorsed by the guideline to screen for malignancy, a history of malignancy was the only red flag that was found to increase the chance of malignancy to greater than 7%.7 31 Our data show that all red flags endorsed in the guideline as “minor risk” for cancer had post-test probabilities below 3%, supporting the guideline’s distinction between major risk and minor risk red flags for cancer.
Many guidelines contain no information on diagnostic accuracy for individual red flags.12 13 14 16 38 For example, the European guideline for the management of chronic non-specific low back pain12 endorses 10 red flags for conditions including fracture and malignancy: patient aged <20 or >55, non-mechanical pain, thoracic pain, history of cancer, steroid use, structural changes, general unwellness, loss of weight, and diffuse neurological deficit. Of these, we found no evidence for the red flags age <20 or general unwellness. The red flags age >55, thoracic pain, non-mechanical pain (with movement), structural change (scoliosis, kyphosis), and loss of weight were all uninformative, with contradictory data on the red flag “diffuse neurological deficit.” Only history of cancer and steroid use (prolonged) were found to be informative.
In addition to endorsing red flags with low or no diagnostic accuracy, we found guidelines that recommended immediate referral to imaging if any red flag was present.13 14 39 For example, the West Australian Diagnostic Imaging Pathways Guideline39 advocates a larger list of red flags and directs clinicians to image patients when even a single red flag is present. If this advice were followed it would lead to substantial and arguably unwarranted referrals for imaging as some of the endorsed red flags are common (such as age <20 or >55)9 or uninformative (such as thoracic pain, which has a positive likelihood ratio of about 1 in screening for malignancy).31
Strengths and weaknesses of the review
We combined two previous reviews4 5 that followed published pre-specified protocols40 41 and adopted the methods endorsed by the Cochrane Collaboration to search for, appraise, and summarise the evidence (http://srdta.cochrane.org/handbook-dta-reviews). By illustrating the pre- and post-test probabilities in figures we provide a simple method to describe the utility of red flags to inform clinical decisions about their use in practice. Also, the ability to view recommendations of guidelines in conjunction with our diagnostic accuracy data provides insight into the lack of a standard for what constitutes a red flag.
A challenge in applying the results of diagnostic research in practice is that some key statistics might be misunderstood by clinicians.42 43 Provision of a list of red flags with sensitivity and specificity values might therefore not be optimal. We have graphically portrayed the post-test probability and 95% confidence intervals for investigated red flags. The figures enable clinicians to easily interpret the informativeness of red flags to screen for spinal fracture and malignancy.44 A limitation of this approach is that prevalence of fracture and malignancy varied considerably between studies (fracture from 0.7% to 11.0%; malignancy from 0% to 7.0%) and depended on study methods and setting. Clinical and artefactual variability could also have contributed to the differences in prevalence.45 Because of study heterogeneity and variability in quality, prevalence per care setting was determined by using only the most methodologically robust studies. Therefore values for prevalence and post-test probability in our review might not generalise to every setting. We have included sensitivities, specificities, and likelihood ratios in the appendix for all red flags to enable calculation of post-test probability for different prevalence rates. There is still work to be done in determining the best approach to bringing this information to the point of care.
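To show how post-test probability can be recalculated for a different prevalence from sensitivity and specificity, the sketch below applies Bayes' theorem directly. The function name and the sensitivity and specificity values are hypothetical and chosen only for illustration; the two prevalence values are the typical fracture prevalences used in this review for primary care (1%) and for secondary and tertiary care (5%).

```python
def post_test_probability_from_accuracy(sensitivity, specificity, prevalence):
    """Probability of the condition given a positive red flag (Bayes' theorem)."""
    true_positive_rate = sensitivity * prevalence
    false_positive_rate = (1 - specificity) * (1 - prevalence)
    return true_positive_rate / (true_positive_rate + false_positive_rate)


# Hypothetical accuracy values, for illustration only (not from the appendix).
sensitivity, specificity = 0.5, 0.9

for setting, prevalence in [("primary care", 0.01),
                            ("secondary and tertiary care", 0.05)]:
    probability = post_test_probability_from_accuracy(
        sensitivity, specificity, prevalence)
    print(f"{setting}: prevalence {prevalence:.0%}, "
          f"post-test probability {probability:.1%}")
```

The same red flag yields a markedly higher post-test probability in the higher prevalence setting, which is why the values in the figures might not generalise to every setting.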
Possible explanations and implications for clinicians and policymakers
Unnecessary imaging is a concern in many settings, with adverse consequences for the patient and society.6 7 46 47 48 Our review shows that the advice, provided in some guidelines, to refer all patients with a single positive red flag for imaging is unwise. Given that 80% of patients in primary care might have at least one positive red flag,7 such a course would mean that most patients with low back pain would receive diagnostic imaging. In contrast, our review is consistent with the approach of the American College of Physicians guideline and others7 27 46 to first consider the probability of serious disease (given the specific red flags present) when making decisions about the need for and timing of imaging. Our results suggest that many guidelines will need revision so that they provide advice for practice that is more firmly grounded in the relevant diagnostic research.
Unanswered questions and future research
Our review has highlighted the need for more high quality diagnostic research on the topic. Much of the existing research has evaluated single clinical features and our results suggest it might be more useful to evaluate combinations of clinical features, as is done with decision rules such as the Ottawa ankle rule49 and Canadian C-Spine Rule.50 51 While some decision rules are available,7 33 all are at the derivation stage and await external validation. There is a need for systematic reviews of red flags to screen for other conditions such as infection and ankylosing spondylitis. Finally, consideration of the optimal way to provide this information so that it is available to the clinician at the point of care, in a readily understood format, is an important topic for future research.
What is already known on this topic
Most clinical practice guidelines provide “red flags” to suggest a need to screen for spinal fracture or malignancy in patients presenting with low back pain but do not agree on which ones to use
The total number of red flags endorsed in clinical guidelines is large
What this study adds
We identified the evidence for diagnostic accuracy of red flags for spinal fractures and spinal malignancies
Older age, prolonged corticosteroid use, severe trauma, and presence of a contusion or abrasion increased the likelihood of spinal fracture; likelihood was higher with multiple red flags
Only a history of malignancy increased the likelihood of spinal malignancy
Contributors: Conception and design: AD, CGM, MJH, CMW, NH. Analysis and interpretation of the data: AD, CGM, MJH, CMW, NH, RWO, HCWdV, PM, LI. Drafting of the article: AD, CGM, MJH, CMW, NH. All authors critically revised the article for important intellectual content and approved the final article. Statistical expertise: RWO, HCWdV, PM, LI, CGM. Administrative, technical, or logistic support: CGM, MJH. Collection and assembly of data: AD, CGM, MJH, CMW, NH. CGM is guarantor.
Funding: This research received no specific grant from any funding agency in the public, commercial, or not-for-profit sectors.
Competing interests: All authors have completed the ICMJE uniform disclosure form at www.icmje.org/coi_disclosure.pdf and declare: no support from any organisation for the submitted work; no financial relationships with any organisations that might have an interest in the submitted work in the previous three years; no other relationships or activities that could appear to have influenced the submitted work.
Ethical approval: Not required.
Declaration of transparency: The lead author (Aron Downie) affirms that the manuscript is an honest, accurate, and transparent account of the study being reported; that no important aspects of the study have been omitted; and that any discrepancies from the study as planned have been explained.
Data sharing: No additional data available.
This is an Open Access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 3.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/3.0/.