Systematic review of randomised controlled trials of interventions for painful shoulder: selection criteria, outcome assessment, and efficacy
BMJ 1998; 316 doi: http://dx.doi.org/10.1136/bmj.316.7128.354 (Published 31 January 1998) Cite this as: BMJ 1998;316:354
- Sally Green, PhD scholar (a)
- Rachelle Buchbinder, senior lecturer (a)
- Richard Glazier, assistant professor (b)
- Andrew Forbes, senior lecturer (a)
- (a) Department of Epidemiology and Preventive Medicine, Monash University, Melbourne 3186, Australia
- (b) Family and Community Medicine, Preventive Medicine and Biostatistics, University of Toronto, Ontario, Canada
- Correspondence to: Dr Buchbinder
- Accepted 23 September 1997
Objective: To review the efficacy of common interventions for shoulder pain.
Design: All randomised controlled trials of non-steroidal anti-inflammatory drugs, intra-articular and subacromial glucocorticosteroid injection, oral glucocorticosteroid treatment, physiotherapy, manipulation under anaesthesia, hydrodilatation, and surgery for shoulder pain that were identified by computerised and hand searches of the literature and had a blinded assessment of outcome were included.
Main outcome measures: Methodological quality (score out of 40), selection criteria, and outcome measures. Effect sizes were calculated and combined in a pooled analysis if study population, end point, and intervention were comparable.
Results: Thirty one trials met inclusion criteria. Mean methodological quality score was 16.8 (9.5-22). Selection criteria varied widely, even for the same diagnostic label. There was no uniformity in the outcome measures used, and their measurement properties were rarely reported. Effect sizes for individual trials were small (range −1.4 to 3.0). The results of only three studies investigating “rotator cuff tendinitis” could be pooled. The only positive finding was that subacromial steroid injection is better than placebo in improving the range of abduction (weighted difference between means 35° (95% confidence interval 14 to 55)).
Conclusions: There is little evidence to support or refute the efficacy of common interventions for shoulder pain. As well as the need for further well designed clinical trials, more research is needed to establish a uniform method of defining shoulder disorders and developing outcome measures which are valid, reliable, and responsive in affected people.
This systematic review found little evidence to support the use of any of the common interventions in managing shoulder pain
There is currently no uniformity in the way shoulder disorders are labelled or defined
Measurement of outcome varies widely between clinical trials and, in general, the reliability, validity, and responsiveness of these outcome measures are not established
Further clinical trials are needed to determine the optimal treatment strategies for shoulder pain
There are many accepted standard forms of conservative treatment for shoulder disorders, including non-steroidal anti-inflammatory drugs, corticosteroid injections, and physiotherapy, yet evidence of their efficacy is not well established. Furthermore, the interpretation of results of studies is often hampered by the fact that these disorders are labelled and defined in diverse and often conflicting ways. To determine the efficacy of common interventions for shoulder pain, we performed a systematic review of randomised controlled trials investigating these treatments. To determine whether the results of the different studies could be compared or pooled, or both, we also undertook a methodological review of the selection criteria and outcome assessment used in these studies.
Identification and selection of studies
One of us (SG) searched computerised bibliographic databases (MEDLINE, EMBASE, CINAHL) without language restrictions from 1966 to September 1995 using the Cochrane Collaboration search strategy, which aims to identify all randomised controlled trials.1 This was combined with the medical subject heading “shoulder” (exploded) and other keywords pertaining to shoulder disorders or their treatment. SG also hand searched relevant conference proceedings and reviewed textbooks and reference lists of all retrieved articles.
To determine whether a study should be included, two of us (RB and RG) independently reviewed the retyped methods sections of all identified trials according to predetermined criteria (that the trial be randomised, that the outcome assessment be blinded, and that the intervention be one of those under review). Randomised controlled trials which investigated common interventions for shoulder pain in adults (age greater than or equal to 18 years) were included provided that there was a blinded assessment of outcome. For the purposes of this review, interventions were broadly categorised as non-steroidal anti-inflammatory drugs, intra-articular and subacromial glucocorticosteroid injection, oral glucocorticosteroid treatment, physiotherapy, manipulation under anaesthesia, hydrodilatation (shoulder distension), and surgery. All studies which primarily concerned pain arising from the shoulder were included irrespective of diagnostic label. Studies that included various rheumatological disorders were considered if the results on shoulder pain were presented separately or if 90% or more of patients in the study had shoulder pain.
Assessment of validity
To determine the methodological quality, RB and RG independently reviewed the retyped methods and results sections of the included studies using predetermined criteria.2 An overall score for methodological quality (out of a possible 40) was calculated to assess the overall standard of each trial, but these scores were not used to weight the pooled analysis.
Assessment of selection criteria and outcome
RB and SG reviewed the selection criteria and outcome measures. They determined whether distinct diagnoses were specified and whether definitions of these diagnoses were recorded. Inclusion and exclusion criteria were also examined. For outcome measures they recorded the types of measures used, the method of measurement, and whether the clinimetric properties had been considered. (Clinimetric or measurement properties refer to the reliability, validity, and responsiveness to change of an outcome measure, clinimetrics being the measurement of clinical phenomena.) The timing of outcome assessment was also noted.
Analysis of data
To assess efficacy, raw data (means and standard deviations) were extracted for reported outcomes when data were available in the published reports and entered into the Cochrane Collaboration's Review Manager software program.1 For studies not reporting the required data, further details were requested of first authors, but no additional information was obtained. Range of motion scores were entered as degrees of movement, and all pain and overall effect scores were transformed to 100 point scales.
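The rescaling step can be sketched as follows. This is a minimal illustration that assumes a simple linear transformation, which the report does not specify; the function names and the example 10 cm visual analogue scale are hypothetical.

```python
def to_100_point(score, scale_min, scale_max):
    """Linearly rescale a score from [scale_min, scale_max] to [0, 100].

    Assumption: a linear mapping; the review does not state its method.
    """
    if scale_max <= scale_min:
        raise ValueError("scale_max must be greater than scale_min")
    return (score - scale_min) * 100.0 / (scale_max - scale_min)


def sd_to_100_point(sd, scale_min, scale_max):
    """Rescale a standard deviation: it is stretched by the same factor
    as the scores but is not shifted by the scale minimum."""
    if scale_max <= scale_min:
        raise ValueError("scale_max must be greater than scale_min")
    return sd * 100.0 / (scale_max - scale_min)


# Hypothetical example: a pain score of 6.5 on a 0 to 10 cm visual analogue scale.
print(to_100_point(6.5, 0, 10))
```

The standard deviation is rescaled separately because an effect size computed after transformation needs both the means and the spread on the same 100 point scale.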
Effect sizes: individual trials
To determine the treatment effect for each end point assessed we calculated the effect size as the difference between the mean of the treatment group at follow up and the mean of the control group at follow up, divided by the pooled standard deviation:
$\text{Effect size} = (x_t - x_c)/\text{PSD}$, where $x_t$ = treatment group mean, $x_c$ = control group mean, and PSD = pooled standard deviation.
In this equation, effect size is expressed as a function of standard deviation. For example, an effect size of 0.43 reflects a difference between means of 0.43 of one standard deviation.
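The calculation above can be sketched in a few lines. The report does not say how the pooled standard deviation was obtained, so the usual degrees-of-freedom weighted formula is assumed here; the function names and the numbers in the example are hypothetical.

```python
import math


def pooled_sd(sd_t, n_t, sd_c, n_c):
    """Pooled standard deviation of two independent groups.

    Assumption: the conventional formula weighted by (n - 1);
    the review does not state which formula it used.
    """
    numerator = (n_t - 1) * sd_t ** 2 + (n_c - 1) * sd_c ** 2
    return math.sqrt(numerator / (n_t + n_c - 2))


def effect_size(mean_t, mean_c, sd_t, sd_c, n_t, n_c):
    """Difference between follow up means in units of pooled SD."""
    return (mean_t - mean_c) / pooled_sd(sd_t, n_t, sd_c, n_c)


# Hypothetical trial: follow up means of 60 and 50, SD 20 in both groups.
print(effect_size(60, 50, 20, 20, 20, 20))
```

Because the result is dimensionless, trials that measured the same end point on different instruments can be placed side by side, which is the purpose the effect size serves in this review.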
In the five studies in which standard deviation was not reported, effect size was calculated by conversion of the reported P value to a t statistic with appropriate degrees of freedom and use of the formula
$\text{Effect size} = t\sqrt{1/n_t + 1/n_c}$
where $n_t$ = number of subjects in treatment group, $n_c$ = number of subjects in control group, and $t$ = observed t statistic.
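A sketch of this conversion, under stated assumptions: the P value is taken as two sided, the degrees of freedom as n_t + n_c − 2 (a two sample t test), and the effect size as t·sqrt(1/n_t + 1/n_c), the standard relation between a t statistic and a standardised mean difference. None of these details is confirmed by the report, and all function names are hypothetical. The inverse of the t distribution is found by numerical integration and bisection so that only the standard library is needed.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def two_sided_p(t, df, steps=2000):
    """Two sided P value for |t|, using Simpson's rule on [0, |t|]."""
    t = abs(t)
    if t == 0:
        return 1.0
    h = t / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return max(0.0, 1.0 - 2.0 * total * h / 3.0)


def t_from_p(p, df):
    """Find the positive t whose two sided P value equals p (bisection).

    The search is limited to t in [0, 50], which covers ordinary P values.
    """
    lo, hi = 0.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if two_sided_p(mid, df) > p:
            lo = mid  # P still too large, so t must be larger
        else:
            hi = mid
    return (lo + hi) / 2


def effect_size_from_p(p, n_t, n_c):
    """Magnitude of the effect size implied by a reported P value.

    The sign (direction of benefit) must be taken from the trial report.
    """
    t = t_from_p(p, n_t + n_c - 2)
    return t * math.sqrt(1 / n_t + 1 / n_c)


# Hypothetical trial with 6 subjects per group reporting P=0.05.
print(effect_size_from_p(0.05, 6, 6))
```

A P value fixes only the size of the t statistic, so the direction of the effect has to be read from the text of the trial.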
In the three studies which reported P values only as less than a value (for example, P<0.05 or P<0.01), a conservatively small effect size was calculated as though the P value equalled the quoted value (for example, P<0.05 was taken as P=0.05). When only the change in scores from baseline was reported (three studies), the within group standard deviation after intervention was computed assuming a (likely) conservative correlation of 0.3 between measurements before and after intervention. One study provided only the range (maximum and minimum) of scores for each group, from which the standard deviation was estimated as one quarter of the range.
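The two approximations for missing standard deviations can be sketched as below. The range rule follows the text directly. The change score conversion additionally assumes that the standard deviation is the same before and after intervention, which the report does not state; under that assumption, Var(change) = 2·SD²·(1 − r). Function names and example values are hypothetical.

```python
import math


def sd_from_range(minimum, maximum):
    """Estimate a standard deviation as one quarter of the range."""
    return (maximum - minimum) / 4.0


def sd_from_change_sd(sd_change, r=0.3):
    """Recover the within group SD from the SD of change scores.

    Assumption (not stated in the review): equal SD before and after
    intervention, so that Var(change) = 2 * SD**2 * (1 - r), where r is
    the correlation between the two measurements.
    """
    if not -1.0 <= r < 1.0:
        raise ValueError("r must satisfy -1 <= r < 1")
    return sd_change / math.sqrt(2.0 * (1.0 - r))


# Hypothetical examples.
print(sd_from_range(20, 80))
print(sd_from_change_sd(12.0))
```

Under the equal SD assumption, a lower assumed correlation yields a smaller denominator in the variance identity and therefore a smaller recovered standard deviation for a given spread of change scores only if r is raised; with r fixed at 0.3 the recovered SD is about 0.85 of the change score SD.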
After reviewing the outcomes assessed in the individual trials, we examined three end points which together provided the most detailed data: pain (100 point scale), restriction of abduction (degrees), and overall effect as reported by the patient (100 point scale).
Only trials investigating the same intervention in similar populations and using the same outcome measure at the same or similar times of follow up were considered for pooling. Pooling was carried out using the Revman 3.0 software program1 to calculate weighted difference between means, which is the difference between treatment and comparison group means at follow up, and (in contrast to effect size) is measured in the dimension of the outcome variable being combined. A random effects model was used for pooling to allow for heterogeneity between studies.3 Estimates and in particular confidence intervals from this model reduce to those from a fixed effects model when study results are homogeneous.
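The pooling step can be illustrated with a short sketch of a random effects weighted difference between means. It assumes the DerSimonian and Laird moment estimator for the between study variance and a normal approximation for the 95% confidence interval; this is an illustration only, not a reproduction of the Review Manager implementation, and the function name and example data are hypothetical.

```python
import math


def pooled_mean_difference(studies):
    """Random effects pooled difference between means.

    studies: list of (mean_t, sd_t, n_t, mean_c, sd_c, n_c) tuples.
    Returns (pooled difference, lower 95% limit, upper 95% limit, tau2).
    Assumption: DerSimonian and Laird estimate of between study variance.
    """
    diffs, variances = [], []
    for mean_t, sd_t, n_t, mean_c, sd_c, n_c in studies:
        diffs.append(mean_t - mean_c)
        variances.append(sd_t ** 2 / n_t + sd_c ** 2 / n_c)

    # Fixed effects (inverse variance) estimate, needed for the Q statistic.
    w = [1.0 / v for v in variances]
    fixed = sum(wi * d for wi, d in zip(w, diffs)) / sum(w)
    q = sum(wi * (d - fixed) ** 2 for wi, d in zip(w, diffs))

    # Between study variance; zero when results are homogeneous.
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (len(studies) - 1)) / c) if c > 0 else 0.0

    # Random effects weights add tau2 to each study's own variance.
    w_re = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * d for wi, d in zip(w_re, diffs)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    return pooled, pooled - 1.96 * se, pooled + 1.96 * se, tau2


# Hypothetical data: range of abduction (degrees) in two small trials.
trials = [(110, 30, 20, 80, 30, 20), (120, 35, 15, 85, 35, 15)]
print(pooled_mean_difference(trials))
```

When the estimated between study variance is zero the random effects weights equal the fixed effects weights, which is the sense in which the two models coincide for homogeneous results.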
Identification and selection of studies
Thirty one out of 58 identified studies met our selection criteria.4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 A list of excluded studies is available from us. Reasons for exclusion were lack of randomisation (23 studies), lack of blinding (12 studies), not the study population of interest (5 studies), and not the intervention of interest (6 studies). Despite blinding, 8 of the 58 trials were identified by RB or RG, or both.4 8 14 18 20 21 25 30
The included studies assessed non-steroidal anti-inflammatory drugs (17 trials),4 5 6 7 8 10 11 12 13 16 17 18 19 22 26 29 33 intra-articular or subacromial glucocorticosteroid injection (10 trials),10 11 13 14 18 23 24 25 31 32 physiotherapy (5 trials),9 15 18 20 21 oral glucocorticosteroid treatment (2 trials),31 34 hydrodilatation (2 trials),24 32 and manipulation under anaesthesia (1 trial).28 There were no randomised controlled trials of surgical interventions.
Twenty four trials specified distinct diagnoses to characterise their study population. These included periarthritis (7 trials),4 5 16 27 29 30 31 adhesive capsulitis (6 trials),14 18 24 25 26 33 frozen shoulder (3 trials),28 29 31 rotator cuff tendinitis (7 trials),8 10 11 17 18 24 30 supraspinatus tendonitis (3 trials),29 30 32 bursitis (2 trials),8 14 infraspinatus tendonitis (1 trial),14 subscapularis tendonitis (1 trial),14 acromioclavicular joint sprain (1 trial),14 and rotator cuff lesion (1 trial).19 Definitions of distinct diagnoses were specified in 16 of the 24 trials. On the basis of review of the diagnostic labels or definitions of the study populations, or both, most trials could be broadly categorised as studying adhesive capsulitis (including periarthritis and frozen shoulder) (23 trials) or rotator cuff tendinitis (including supraspinatus, infraspinatus, and subscapularis tendonitis) (12 trials), or both. Six trials gave no selection criteria or population definition.6 11 21 28 30 35
The box shows the selection criteria used to define the shoulder disorders studied. In general, adhesive capsulitis was defined as the presence of pain with restriction of active and passive glenohumeral joint movements, and rotator cuff tendinitis was defined by the presence of painful arc and pain with resisted movements or a normal passive range of motion. However, no standard definitions were used, and conflicting criteria often defined the same condition in different trials. Exclusion criteria were specified in 28 studies, although these also varied widely between studies.
Selection criteria used in shoulder pain trials
Rotator cuff disease
No definition given (2 trials)
Shoulder pain and painful arc (1 trial)
At least two of painful abduction, painful arc, and tenderness of supraspinatus insertion (1 trial)
Painful arc between 40° and 120° abduction (1 trial)
Pain on resisted abduction, tenderness over supraspinatus tendon, and normal passive motion (1 trial)
Pain with resistance in abduction and external and internal rotation (1 trial)
Pain with resisted movements of the shoulder and loss of passive abduction (1 trial)
Pain on resisted abduction with or without resisted external rotation (1 trial)
Pain exacerbated by resisted movement, passive range > active range, normal passive range (1 trial)
Full range of passive motion and pain on resisted abduction (1 trial)
Adhesive capsulitis
No definition given (8 trials)
External rotation <30° and abduction < 90° (1 trial)
Total passive movement <50% of normal with pain worse at night (1 trial)
Generalised limitation of glenohumeral motion with pain at rest or on movement (5 trials)
Pain, active and passive motion limited to >20°, pain on resisted abduction or rotation, and impaired glenohumeral joint (1 trial)
Loss of passive motion of glenohumeral joint (1 trial)
Pain, restriction of movement, loss of full function, and pain at night with inability to lie on affected side (1 trial)
Pain at night, inability to lie on affected side, restriction of active and passive motion, restriction of external rotation (2 trials)
Restriction of abduction and external rotation (1 trial)
Appreciable restriction of both active and passive motion (1 trial)
Abduction and flexion <90° and external rotation <20° (1 trial)
Abduction and flexion <70% and external rotation <20% (1 trial)
Assessment of outcome
Table 1 summarises the outcomes assessed. Pain and range of motion were recorded in most trials (29 and 27 respectively). Only four studies included an adequate description of how range of motion was assessed (measurement tool and definition of end of range).16 21 22 35 Function was assessed in eight studies and was measured either by visual analogue scale14 15 16 17 18 19 26 or return to work.20 No study included a disability index. Although two studies cited a reliability study for their method of range of motion assessment,21 35 no other reference was made to the reliability, validity, or responsiveness of the outcome measures used. Timing of assessment for the primary efficacy analysis varied between 1 and 24 weeks.
Validity of individual trials
Table 2 shows the overall scores for the quality of methods and for each category for each trial. The mean quality score of all trials combined was 16.8 out of a possible 40 points (42%). No trial scored greater than 22 out of a possible score of 40 (range 9.5-22.0).
Table 2 shows effect sizes according to intervention. In general, the effect sizes were small, suggesting a lack of clear benefit for any of the treatments investigated.
Most results of individual trials could not be pooled because of a lack of similarity in study population, outcome measures, or timing of outcome assessment, or because insufficient data were reported. Pooling of results was performed for two studies of non-steroidal anti-inflammatory drugs and placebo in rotator cuff tendinitis (table 3).10 11 The weighted difference between means suggested that non-steroidal anti-inflammatory drugs may be superior to placebo in improving the degree of restriction of abduction (26° (95% confidence interval −9 to 61)), although the confidence interval includes zero. The weighted difference between means for pain score was 3 (−19 to 25). Data from the same two trials were pooled to determine the efficacy of subacromial steroid injection compared with placebo in rotator cuff tendinitis (table 3).10 11 The weighted difference between means suggested injection was superior to placebo in improving range of abduction (35° (95% confidence interval 14 to 55)).
Pooling of results was also performed for two studies which compared non-steroidal anti-inflammatory drugs plus steroid injection with non-steroidal anti-inflammatory drugs alone in rotator cuff tendinitis (table 3).10 18 The weighted difference between means showed no added benefit of injection for either restriction of abduction (4° (−14 to 22)) or pain (−2 (−11 to 7)).
This review has confirmed the lack of uniformity in the way shoulder disorders are labelled and defined. It has also highlighted the wide variation in assessment of outcome in clinical trials investigating the efficacy of interventions for painful shoulder. These factors limit the degree to which the results of different trials can be compared and pooled. In addition, the heterogeneity of the interventions studied, the timing of outcome assessment, the overall poor methodological quality, inadequate reporting of results, and small sample sizes preclude the drawing of firm conclusions about the efficacy of any of the interventions studied.
On the basis of our review, the only conclusions that may be drawn about efficacy are that non-steroidal anti-inflammatory drugs and subacromial glucocorticosteroid injection may be superior to placebo in improving range of abduction in rotator cuff tendinitis and that the addition of corticosteroid injection to non-steroidal anti-inflammatory drugs does not seem to confer further benefit. These pooled results should be interpreted cautiously. In addition to the above concerns, statistical manipulation of inadequately presented results was necessary to perform these analyses. No conclusions can be drawn about the efficacy of the interventions studied for adhesive capsulitis.
Previous separate reviews of non-steroidal anti-inflammatory drugs, steroid injections, and physiotherapy have been performed by investigators in the Netherlands.35 36 37 Although our study confirms their conclusions on the poor overall methodological quality of reviewed trials, our study differs in several important respects.
Firstly, we tried to differentiate studies on the basis of the nature of the populations being studied, recognising that the benefits of treatment may vary for different underlying causes of shoulder pain.
Secondly, we calculated effect sizes for the same reported outcome measures in different trials. This enables a direct comparison between studies using the same outcome measurement, although it is important to note that if one effect size is larger than another it may be because in the different studies the numerator (the treatment effect) is larger, the denominator (the variability between subjects in each group) is smaller, or there is some combination of the two. In the previous reviews the overall efficacy of interventions was described by calculation of success rates for each intervention group. These were determined by dividing the number of documented successes (defined as recovery or substantial improvement from baseline, according to the patient) at the end of the intervention period by the number allocated to the intervention by randomisation. The exact definition of success therefore differed between papers and is, in essence, subjective.36
Finally, we performed a detailed assessment of the outcome measures used in previous trials. While most studies incorporated a measure of pain and range of motion, the accompanying disability, which may be of utmost importance to the patient and therefore a more appropriate end point, was neglected in most studies. Furthermore, the reliability and responsiveness of the outcome measures used in most trials have not been established. To improve our ability to interpret and compare the results of different studies, further work is needed to address these issues.
Another recent systematic review of corticosteroid injection for rotator cuff tendinitis concluded that steroid injection is effective in treating it.38 However, this conclusion cannot be verified by our results. The other review included non-randomised studies, the results of primary studies were reported only as significant or not significant, and no attempt was made to measure effect sizes or pool results.
Systematic reviews of randomised controlled trials serve to identify parts of clinical practice where further research is required. We could not draw firm conclusions about the efficacy of any of the common interventions currently being used to treat painful shoulders from this review. Therefore further clinical trials are needed to justify or censure current treatment strategies. These trials should carefully consider their study population and measures of outcome. Adoption of a uniform method of labelling and defining shoulder disorders and incorporation of a standard set of outcome measures may greatly enhance these efforts. This review will be updated through the Musculoskeletal Review Group of the Cochrane Collaboration as further trials are performed.
We thank the Australasian Cochrane Centre and the Musculoskeletal Cochrane Review Group for their methodological support.
Conflict of interest: None.
Contributors: RB initiated and lodged the protocol for the project with the Cochrane Collaboration. SG coordinated the project under RB's supervision. SG searched the literature, retrieved data, and performed blinding. RB and RG performed the methodological review and determined inclusion. SG entered and synthesised the data and analysed and interpreted the results. AF, RG, and RB also analysed and interpreted the results. All authors wrote the paper. RB is the guarantor for the study.