- Margaret P Staples, biostatistician1,
- David F Kallmes, professor2,
- Bryan A Comstock, operations director3,
- Jeffrey G Jarvik, professor of radiology and neurological surgery and director4,
- Richard H Osborne, professor of public health and director5,
- Patrick J Heagerty, professor6,
- Rachelle Buchbinder, director and professor1
- 1Department of Clinical Epidemiology, Cabrini Hospital, and Department of Epidemiology and Preventive Medicine, School of Public Health and Preventive Medicine, Monash University, Victoria, Australia
- 2Department of Radiology, Mayo Clinic College of Medicine, MN, USA
- 3Center for Biomedical Statistics, Department of Radiology, University of Washington, USA
- 4Comparative Effectiveness, Cost and Outcomes Research Center, University of Washington, USA
- 5Public Health Innovation, School of Health and Social Development, Deakin University, Victoria, Australia
- 6Department of Biostatistics, University of Washington, USA
- Correspondence to: R Buchbinder, Suite 41, Cabrini Medical Centre, 183 Wattletree Road, Malvern Victoria 3144, Australia
- Accepted 10 May 2011
Objective To determine whether vertebroplasty is more effective than placebo for patients with pain of recent onset (≤6 weeks) or severe pain (score ≥8 on 0-10 numerical rating scale).
Design Meta-analysis of combined individual patient level data.
Setting Two multicentred randomised controlled trials of vertebroplasty; one based in Australia, the other in the United States.
Participants 209 participants (Australian trial n=78, US trial n=131) with at least one radiographically confirmed vertebral compression fracture. 57 (27%) participants had pain of recent onset (vertebroplasty n=25, placebo n=32) and 99 (47%) had severe pain at baseline (vertebroplasty n=50, placebo n=49).
Intervention Percutaneous vertebroplasty versus a placebo procedure.
Main outcome measure Scores for pain (0-10 scale) and function (modified, 23 item Roland-Morris disability questionnaire) at one month.
Results For participants with pain of recent onset, between group differences in mean change scores at one month for pain and disability were 0.1 (95% confidence interval −1.4 to 1.6) and 0.2 (−3.0 to 3.4), respectively. For participants with severe pain at baseline, between group differences for pain and disability scores at one month were 0.3 (−0.8 to 1.5) and 1.4 (−1.2 to 3.9), respectively. At one month those in the vertebroplasty group were more likely to be using opioids.
Conclusions Individual patient data meta-analysis from two blinded trials of vertebroplasty, powered for subgroup analyses, failed to show an advantage of vertebroplasty over placebo for participants with recent onset fracture or severe pain. These results do not support the hypothesis that selected subgroups would benefit from vertebroplasty.
Two recent randomised placebo controlled trials of vertebroplasty for osteoporotic vertebral compression fractures, the only published randomised trials with a placebo control, blinded treatment allocation, and blinded outcome assessment, failed to confirm the efficacy of vertebroplasty.1 2 These results have generated considerable controversy.3 4 5 6 7 8 9 10 11 12 13 14 15 Despite the negative overall findings of the two placebo controlled trials, it has been suggested that there may be subgroups of patients who would benefit from vertebroplasty, and the suitability of the procedure for some of the included trial participants has been questioned. Some commentators have maintained that vertebroplasty should only be offered to patients with acute vertebral fractures of recent onset (<6 weeks),6 11 whereas others have claimed it is more effective for those with severe pain.6 11 16
Since publication of these trials, the results from the vertebroplasty versus conservative treatment in acute osteoporotic vertebral compression fractures (Vertos II) trial, a large open label randomised trial, have also been reported.17 The participants’ mean duration of pain was slightly less than six weeks and all reported baseline scores for pain of 5 or more on a 0 to 10 visual analogue scale. Compared with usual care, vertebroplasty resulted in an additional benefit of 2.6 units on the pain scale at one month, a result that was clinically and statistically significant. However, these results may be explained by the fact that neither the participants nor the outcome assessors or investigators were blinded to treatment allocation. Empirical evidence from meta-epidemiological studies has shown that lack of blinding results in an average 25% over-estimate of relative treatment benefit.18 19
Acutely painful osteoporotic vertebral fractures generally heal quickly, as was well illustrated in the Vertos II trial in which more than 50% of those who initially qualified for the study were deemed ineligible owing to spontaneous resolution of pain.17 This implies that most people would be unlikely to benefit from early invasive intervention, and it would need to be postulated that a subgroup with more persistent symptoms would derive benefit.
In contrast with what has been claimed, both placebo controlled trials included participants with recent onset fractures (<6 weeks): one third in the Australian study and about 20% in the US trial. Although neither trial found evidence that duration of symptoms was a treatment effect modifier, individually they lacked sufficient power to draw definitive conclusions from subgroup analyses. They also lacked sufficient power to investigate smaller benefits in pain or function than those originally hypothesised. By combining the data from both trials, the larger overall sample size provides an opportunity to undertake select subgroup analyses and to investigate smaller benefits in pain or function.
We combined the individual patient level data from both placebo controlled blinded trials to determine whether vertebroplasty is more effective than placebo for patients with fracture pain of recent onset (≤6 weeks) compared with pain of longer duration, and whether vertebroplasty is more effective for patients with severe pain (≥8 on a 0 to 10 numerical rating scale). Using the combined data we also investigated whether smaller benefits in pain or function were detectable.
In a post specified analysis, the US trial found a trend towards a higher rate of participants with a clinically meaningful improvement in pain at one month in the vertebroplasty group, although the treatment groups did not differ for physical disability related to back pain.2 With the combined trials we had sufficient power to test this trend. Using the whole dataset we also carried out responders’ analyses based on a 3 unit improvement in pain scores, a 3 unit improvement in score on the modified, 23 item Roland-Morris disability questionnaire, and 30% improvement in each of the pain and disability outcomes.
The methods and results for both studies are presented in detail elsewhere.1 2 20 21 Up to the one month outcome both trials were done according to similar but not identical protocols for entry criteria, randomisation, and how the treatments in the treatment and control arms were administered.
The Australian study was a multicentre, randomised controlled trial in which participants with one or two painful osteoporotic vertebral fractures of less than 12 months’ duration and unhealed, as confirmed by magnetic resonance imaging, were randomly assigned to vertebroplasty or to a sham procedure.1 20 Participants were stratified according to treatment centre, sex, and duration of symptoms (<6 weeks or ≥6 weeks) and outcomes were assessed at one week and one, three, and six months. The US trial was also a multicentre, randomised controlled trial in which participants with one to three painful osteoporotic vertebral compression fractures of less than 12 months’ duration and unhealed were randomly assigned to vertebroplasty or to a sham procedure.2 21 Participants were stratified according to treatment centre, and outcomes were assessed at three days, two weeks, and one and three months. Both trials blinded participants, investigators, and outcome assessors. The Australian trial was a parallel group design, whereas in the US trial participants were allowed to cross over to the other study group after one month or later if pain relief was not adequate.
The sham procedures in each trial attempted to simulate vertebroplasty as much as possible. In both, the skin, subcutaneous tissues, and periosteum of the pedicles were infiltrated with local anaesthetic. The Australian trial inserted a needle to rest on the lamina of the vertebral body, and after removal of the central sharp stylet the vertebral body was gently tapped with a blunt stylet. The US trial did not insert a needle but used verbal and physical cues such as pressure on the patient’s back to simulate the procedure. Both trials prepared the medical cement (polymethylmethacrylate) so that its smell permeated the room.
The primary outcome for the Australian trial was pain, measured on an 11 point numerical rating scale,1 20 and for the US trial was disability, measured by the modified, 23 item Roland-Morris disability questionnaire.2 21 Pain outcomes were assessed slightly differently in each trial. Participants in the Australian trial were asked to rate their average pain over the previous week, whereas participants in the US trial were asked to rate their average pain over the previous 24 hours. Based on empirical data which show that asking about the previous week or previous 24 hours yields similar results,22 we considered the measures of pain to be comparable for the purpose of this meta-analysis.
Disability was assessed in both trials using the modified Roland-Morris disability questionnaire, and generic health status was assessed using the EQ-5D. These two measures were included in the Australian trial several months after enrolment began and therefore were not available for all participants at the earlier time points. Although the Roland-Morris disability questionnaire was originally developed for use in patients with non-specific low back pain,23 the modified version has been shown to be valid and responsive for studying vertebroplasty for painful osteoporotic vertebral fractures.24
Outcome measures and time points
Common time points for data collection in both trials were baseline, one month, and three months. Crossover was allowed in the US trial after one month and, as 35 participants crossed over after this time, for the current analyses we have included follow-up to one month only. For both trials we obtained individual patient data for outcomes up to one month. Baseline data included study centre, race, sex, age in years, pain score measured on a 0 to 10 numerical rating scale, duration of pain in weeks, scores on the modified Roland-Morris disability questionnaire, and EQ-5D scores based on the US value sets.
Outcomes between baseline and one month were not measured at identical times in each study. In the Australian trial, assessments of pain, disability, and generic health status (EQ-5D) were carried out at one week, whereas in the US trial pain and disability were assessed at one day, three days, and two weeks and no EQ-5D assessments were made between baseline and one month.
We compared all outcome measurements between groups at baseline and one month. For the intermediate time points at three days, one week, and two weeks, we carried out three separate comparisons: the outcomes at one week for the Australian trial with the outcomes at three days for the US trial, the outcomes at one week for the Australian trial with a score computed as the mean of the scores at three days and two weeks for the US trial, and the outcomes at one week for the Australian trial with the outcomes at two weeks for the US trial. The results were comparable for each of the intermediate time point analyses and we therefore only present the third intermediate comparison (see web extra table 1 for the first and second intermediate comparisons).
For the primary analysis, a priori subgroups were participants with pain of recent onset (≤6 weeks or >6 weeks) and participants with severe pain scores at baseline (≥8 or <8). For the responders’ analyses we used the combined dataset to consider the proportion showing at least a 3 unit improvement in pain score from baseline, the proportion showing at least a 3 unit improvement in disability score from baseline (modified Roland-Morris disability questionnaire), the proportion showing a 30% improvement in pain score, and the proportion showing a 30% improvement in disability score. To assess whether these results could be subject to confounding we also compared the groups for self reported opioid use at one month.
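The responder definitions above can be sketched in a few lines. This is only an illustration, assuming that lower scores are better on both scales, so that improvement is baseline minus follow-up. The function names and the example scores are hypothetical and are not taken from either trial.

```python
def improvement(baseline, follow_up):
    """Improvement from baseline; positive means the score went down."""
    return baseline - follow_up


def is_absolute_responder(baseline, follow_up, threshold=3.0):
    """True if the score improved by at least `threshold` units."""
    return improvement(baseline, follow_up) >= threshold


def is_relative_responder(baseline, follow_up, fraction=0.30):
    """True if the score improved by at least `fraction` of the baseline score."""
    if baseline <= 0:
        # Assumption: a participant with a zero baseline cannot improve further.
        return False
    return improvement(baseline, follow_up) >= fraction * baseline


def responder_proportion(pairs, rule):
    """Proportion of (baseline, follow_up) pairs classified as responders."""
    flags = [rule(b, f) for b, f in pairs]
    return sum(flags) / len(flags)


# Hypothetical (baseline, one month) pain scores, for illustration only.
example_pairs = [(8, 4), (9, 8), (6, 4), (10, 9)]
absolute_share = responder_proportion(example_pairs, is_absolute_responder)
relative_share = responder_proportion(example_pairs, is_relative_responder)
print(absolute_share, relative_share)
```

The two rules can disagree: a participant moving from 6 to 4 improves by only 2 units but by a third of the baseline score, which is why both definitions are reported.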
Sample size calculations
The original sample size calculations for the Australian trial indicated that 24 participants would be needed in each treatment group to show a 2.5 unit advantage in pain scores, assuming a standard deviation of 3.0, a significance level of 5%, and 80% power. The observed standard deviation for pain was about 2.0 rather than 3.0 as assumed for study planning and therefore the power calculations are conservative. From the combined trials 25 participants in the control group and 32 in the vertebroplasty group had recent onset pain and 50 in the control group and 49 in the vertebroplasty group had severe pain at baseline. Thus the combined data provided greater than 80% power to assess whether vertebroplasty had a 2.5 unit advantage over control for patients with acute fractures or severe pain. The combined data would also have 53% power to detect a 3 unit difference (assuming a standard deviation of 5.5) in disability scores for the subgroup with recent onset pain and 77% power for the subgroup with severe pain at baseline.
The accepted minimum clinically important difference for pain is 1.5 units on an 11 point scale.25 26 With 209 participants we had over 94% power to detect a 1.5 unit advantage for vertebroplasty over control, assuming a standard deviation of 3.0 for each group. For the modified Roland-Morris disability questionnaire the accepted minimum clinically important difference is 2 or 3 points on a 0 to 23 scale,24 and with 209 participants we had 88% power to show a 3 point advantage for the vertebroplasty group, assuming a standard deviation of 6.7.
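The figures in this section can be roughly checked with a normal approximation for a two sided comparison of two means. The sketch below is not the authors' calculation. It assumes equal standard deviations in both groups and uses hypothetical function names. The 104/105 split of the 209 participants is also an assumption made only for illustration. The formula is symmetric in the two group sizes, so it does not matter which arm has 25 and which has 32 participants.

```python
from math import ceil, sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal distribution


def n_per_group(delta, sd, alpha=0.05, power=0.80):
    """Approximate participants per group needed to detect a difference `delta`."""
    z_alpha = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_beta = _STD_NORMAL.inv_cdf(power)
    return 2 * ((z_alpha + z_beta) * sd / delta) ** 2


def power_two_groups(delta, sd, n1, n2, alpha=0.05):
    """Approximate power to detect `delta` with group sizes n1 and n2."""
    se = sd * sqrt(1 / n1 + 1 / n2)  # standard error of the difference in means
    z_alpha = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    return _STD_NORMAL.cdf(delta / se - z_alpha)


# Planning scenario: 2.5 unit difference in pain, SD 3.0, 5% significance, 80% power.
print(ceil(n_per_group(2.5, 3.0)))

# Subgroup scenarios described in the text.
print(round(power_two_groups(2.5, 3.0, 25, 32), 2))  # pain, recent onset
print(round(power_two_groups(3.0, 5.5, 25, 32), 2))  # disability, recent onset
print(round(power_two_groups(3.0, 5.5, 50, 49), 2))  # disability, severe pain

# Whole dataset, 1.5 unit difference in pain, assumed 104/105 split.
print(round(power_two_groups(1.5, 3.0, 104, 105), 2))
```

Under these assumptions the approximation gives about 23 participants per group for the planning scenario, close to the 24 reported (the small difference is presumably due to a t based calculation), and roughly 53% and 77% power for the two disability subgroup comparisons.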
To assess baseline differences between the Australian and US trial populations we used χ² tests for categorical variables and t tests for continuous variables.
We calculated the mean change scores for each group for the main outcomes and a priori subgroups at each time point. Analysis of covariance was used to generate estimates of treatment effect on the outcome measures of pain using a numerical rating scale, the modified Roland-Morris disability questionnaire, and EQ-5D. As study centre was nested within trial, we adjusted for study centre as a fixed effect to account for differences between both study centre and trial. The fixed main effects were intervention group, the relevant primary outcome, subgroup variables, and interactions as appropriate. Age and sex were not included as evidence is lacking that these influence outcome.
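To illustrate what an analysis of covariance estimates, the sketch below computes a baseline adjusted between group difference for two groups and a single covariate. It is a deliberately simplified stand-in, not the model described above: it omits study centre, subgroup variables, and interaction terms, and it returns no confidence interval. The function name and the data are hypothetical.

```python
from statistics import mean


def ancova_adjusted_difference(baseline_t, outcome_t, baseline_c, outcome_c):
    """Treatment minus control difference in outcome, adjusted for baseline.

    Uses the pooled within-group slope of outcome on baseline, which equals
    the treatment coefficient from ordinary least squares with an intercept,
    a group indicator, and the baseline score as predictors.
    """
    def sums(x, y):
        mx, my = mean(x), mean(y)
        sxx = sum((xi - mx) ** 2 for xi in x)
        sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
        return mx, my, sxx, sxy

    mx_t, my_t, sxx_t, sxy_t = sums(baseline_t, outcome_t)
    mx_c, my_c, sxx_c, sxy_c = sums(baseline_c, outcome_c)
    slope = (sxy_t + sxy_c) / (sxx_t + sxx_c)  # pooled within-group slope
    return (my_t - my_c) - slope * (mx_t - mx_c)


# Hypothetical data built so that outcome = 2 + 0.5 * baseline (+ 1 if treated).
baseline_treated = [6, 7, 8, 9]
outcome_treated = [6.0, 6.5, 7.0, 7.5]
baseline_control = [5, 7, 9, 10]
outcome_control = [4.5, 5.5, 6.5, 7.0]

adjusted = ancova_adjusted_difference(
    baseline_treated, outcome_treated, baseline_control, outcome_control
)
print(adjusted)
```

In this constructed example the unadjusted difference in mean outcome is 0.875, whereas the adjusted difference recovers the built-in effect of 1 because the groups differ slightly at baseline.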
We compared the proportions of participants in each group who showed at least a 3 unit improvement in pain or disability scores or a 30% improvement in pain or disability scores from baseline using relative risks estimated from a generalised linear binomial model with a log link and robust error variance to account for the clustering of trial site.27 A relative risk greater than 1 indicates that a higher proportion of participants in the vertebroplasty group achieved at least a 3.0 unit or 30% improvement in pain or disability scores. We used logistic regression with adjustment for baseline opioid use and study centre to compare self reported opioid use at one month for each treatment group; a relative risk greater than 1 indicates that a higher proportion of participants in the vertebroplasty group were using opioids one month after the procedure.
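As a rough illustration of how a relative risk is read, the sketch below computes a crude relative risk from two proportions with a Wald confidence interval on the log scale. This is a swapped-in, simpler technique than the one described above: it does not fit a generalised linear model, adjust for baseline values, or use a robust variance for clustering by site. The function name and the counts are hypothetical.

```python
from math import exp, log, sqrt
from statistics import NormalDist


def relative_risk(events_t, n_t, events_c, n_c, level=0.95):
    """Crude relative risk (treatment v control) with a log scale Wald interval."""
    rr = (events_t / n_t) / (events_c / n_c)
    # Standard error of log(RR) for two independent binomial proportions.
    se_log = sqrt(1 / events_t - 1 / n_t + 1 / events_c - 1 / n_c)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return rr, exp(log(rr) - z * se_log), exp(log(rr) + z * se_log)


# Hypothetical counts: 30 of 50 responders with treatment v 20 of 50 with control.
rr, lower, upper = relative_risk(30, 50, 20, 50)
print(round(rr, 2), round(lower, 2), round(upper, 2))
```

A relative risk above 1 with an interval that includes 1, as in this made-up example, corresponds to a trend that does not reach conventional statistical significance.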
Data on 209 participants (Australian trial n=78, US trial n=131) were available for analysis. Table 1 shows the characteristics of the participants in each trial. The average duration of pain was shorter for participants in the Australian trial, with a higher proportion reporting pain of recent onset (P<0.001). Self reported use of opioids at baseline was also higher for participants in the Australian trial (P=0.001). All other measures were comparable between trials. The baseline characteristics of combined participants by treatment group (table 1) and of the subgroups by treatment group (table 2) were also similar.
Table 3 shows the mean change scores from baseline and adjusted between group differences in pain and disability scores at two weeks/one week and one month and the EQ-5D at one month. The two groups did not differ significantly for pain, disability, or health status at any time point. The adjusted between group differences for pain and disability scores were all below the minimum clinically important difference for each respective measure and below the more stringent 1.5 units for pain and 2 points for disability.
Outcomes did not differ between the groups for participants with pain of recent onset or of longer duration (>6 weeks), or for participants with severe pain at baseline or mild to moderate pain (score <8), at either the two weeks/one week or one month time points (table 3 and figure). The adjusted mean between group differences were all below the minimum clinically important differences for pain and disability scores.
The vertebroplasty and placebo groups did not differ in the proportions showing a 3 unit improvement in pain score or a 3 point improvement in disability score at either the two weeks/one week or one month time points (table 4). Only at one month was there a trend towards a higher proportion of the vertebroplasty group achieving at least 30% improvement in pain scores (relative risk 1.32, 95% confidence interval 0.98 to 1.76, P=0.07). At neither time point did the groups differ in the proportions achieving at least 30% improvement in disability scores (table 4).
In view of the observed trend, the analyses were repeated based on a priori subgroups. At the two weeks/one week time point, patients with severe pain at baseline were more likely to show at least a 3 unit improvement or at least a 30% improvement in pain but the proportion with a clinically meaningful improvement did not differ between the groups (see web extra tables 2 and 3). The subgroups with recent onset of pain or pain of longer duration did not differ (see web extra tables 4 and 5).
At baseline, 68% of participants in each treatment group were using opioids for pain. At one month the vertebroplasty group was more likely to report opioid use (64% vertebroplasty group v 46% placebo group; P=0.018). After adjusting for baseline opioid use, patients randomised to vertebroplasty were 25% more likely to be taking opioids at one month than patients randomised to placebo (relative risk 1.25, 1.14 to 1.36, P<0.001; table 5).
In an individual patient meta-analysis of two randomised, placebo controlled, blinded trials of vertebroplasty, the results from the combined data did not differ significantly from those of the individual trials, despite the larger sample size and increased power. In particular, no smaller benefits in pain or function were detected with vertebroplasty. Subgroup analyses also failed to show an advantage of vertebroplasty for participants with pain of recent onset (≤6 weeks) or severe pain at baseline (score ≥8). At one month the proportion of participants who had achieved improvements of at least 3 units in pain or disability (modified Roland-Morris disability questionnaire) scores did not differ significantly between the groups. There was a trend towards a higher proportion in the vertebroplasty group achieving at least a 30% improvement in pain scores at one month, but this group was more likely to be using opioids at one month; this may have influenced reported pain severity in favour of vertebroplasty and also suggests that pain may have been less well controlled in the vertebroplasty group.
By combining data from the Australian1 and US2 trials, which were carried out using similar protocols and outcome measures, we were able to perform subgroup analyses to assess whether vertebroplasty is indicated for patients with severe or acute pain only. This analysis provides no evidence to support the use of the procedure for these selected patient groups. Individual patient meta-analysis is the ideal method for exploring variability in effectiveness,28 yet it is rarely done.29 Challenges include acquiring and harmonising the data.30 When each group became aware of the other’s trial, efforts were made to harmonise outcome assessment in case further research questions required a combined analysis. This paper shows the benefits of this approach.
Commentaries comparing blinded and unblinded trials of vertebroplasty have focused on patient selection.3 6 10 11 12 16 The commentators suggested that the “negative” outcomes reported in the blinded trials resulted from enrolment of patients who would not generally be expected to benefit from spinal augmentation based on duration of pain, the use of magnetic resonance imaging, or other factors. The current individual patient meta-analysis, based on the only two double blind randomised placebo controlled trials published to date, refutes these arguments on duration and severity of pain by providing improved power compared with the individual trials.
Logically it is unlikely that factors predicting a more favourable outcome from vertebroplasty will be identified, as the net overall effect of vertebroplasty in both placebo controlled trials was close to zero.9 Furthermore, the only way that vertebroplasty could have a large benefit for a proportion of patients would be if the condition of a substantial proportion was made worse—a scenario that is not reflected in the available data.
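The arithmetic behind this argument can be made explicit. The sketch below assumes that the overall mean effect is a weighted average of the effect in a subgroup and the effect in everyone else, and solves for the latter. The function name is hypothetical, and the inputs (an overall effect of exactly zero, a subgroup making up 27% of participants, and a 2.5 unit benefit in that subgroup) are illustrative assumptions rather than trial results.

```python
def implied_effect_in_rest(overall_effect, subgroup_share, subgroup_effect):
    """Mean effect in the remaining participants implied by a weighted average.

    overall = share * subgroup_effect + (1 - share) * rest_effect,
    with positive values meaning benefit from the intervention.
    """
    if not 0 < subgroup_share < 1:
        raise ValueError("subgroup_share must be strictly between 0 and 1")
    return (overall_effect - subgroup_share * subgroup_effect) / (1 - subgroup_share)


# Illustrative inputs only: zero overall effect, 27% subgroup, 2.5 unit benefit.
rest_effect = implied_effect_in_rest(0.0, 0.27, 2.5)
print(round(rest_effect, 2))
```

With these inputs the remaining participants would have to be worse off by about 0.9 units on average for the overall effect to stay at zero.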
In view of the recent publication of the open label Vertos II trial,17 we extended our analysis to include the specific enrolment criteria, pain of recent onset (≤6 weeks) and pain score of 5 or more, from that trial, and again were unable to show a treatment benefit for vertebroplasty (data not shown). This adds further weight to the likelihood that lack of blinding and an inadequate control group accounted for the treatment benefit observed in Vertos II.
This meta-analysis was based on the evidence from the two most rigorous trials applied to vertebroplasty to date. Although both trials infiltrated local anaesthetic under the skin, subcutaneous tissues, and periosteum in the control group, which some have argued may be an active control, a recent open cohort study using a local anaesthesia regimen identical to this failed to show any benefit of local anaesthesia into the periosteum.31 This suggests that the role of local anaesthesia in the observed improvement in the control groups of both trials was likely to be insignificant.
This meta-analysis adds to the available literature suggesting that the benefits of vertebroplasty, even in selected subgroups, are overstated. Future trials should be informed by these results and, if any doubts remain about the value of vertebroplasty for specific subgroups of patients, these should be examined in randomised blinded placebo controlled trials.
What is already known on this topic
Two double blind randomised placebo controlled trials failed to confirm the efficacy of vertebroplasty for osteoporotic vertebral compression fractures
Despite these findings, it has been suggested that there may be subgroups of patients who would benefit from vertebroplasty
Some commentators maintain the procedure should only be offered to patients with fractures of recent onset (<6 weeks), whereas others have claimed it is effective for those with severe pain
What this study adds
Individual patient data meta-analysis from two randomised trials of vertebroplasty failed to show an advantage of vertebroplasty over placebo for participants with recent onset fracture or severe pain
These results do not support the hypothesis that selected subgroups would benefit from vertebroplasty
Cite this as: BMJ 2011;343:d3952
Contributors: RB and DFK planned the work. MPS carried out the analysis and MPS and RB drafted the manuscript. All authors contributed to the design of the analysis, interpretation of findings, and writing of the final manuscript. RB is the guarantor.
Funding: RHO is supported in part by an Australian National Health and Medical Research Council (NHMRC) population health career development award and RB is supported in part by an NHMRC practitioner fellowship.
Competing interests: All authors have completed the ICMJE uniform disclosure form at www.icmje.org/coi_disclosure.pdf (available on request from the corresponding author) and declare: RHO is supported in part by an Australian National Health and Medical Research Council (NHMRC) population health career development award and RB is supported in part by an NHMRC practitioner fellowship; DFK has received research support from Stryker and ArthroCare and is a consultant for CareFusion; JGJ has received an honorarium for lecturing at a course sponsored by Synthes in 2010, is on the GE Healthcare comparative effectiveness advisory board, is a consultant to HealthHelp, and is cofounder and patent holder of PhysioSonics; no other relationships or activities that could appear to have influenced the submitted work.
Ethical approval: Not required.
Data sharing: No additional data available.
This is an open-access article distributed under the terms of the Creative Commons Attribution Non-commercial License, which permits use, distribution, and reproduction in any medium, provided the original work is properly cited, the use is non commercial and is otherwise in compliance with the license. See: http://creativecommons.org/licenses/by-nc/2.0/ and http://creativecommons.org/licenses/by-nc/2.0/legalcode.