Reverse total shoulder replacement versus anatomical total shoulder replacement for osteoarthritis: population based cohort study using data from the National Joint Registry and Hospital Episode Statistics for England

Abstract
Objectives
To answer a national research priority by comparing the risk-benefit and costs associated with reverse total shoulder replacement (RTSR) and anatomical total shoulder replacement (TSR) in patients having elective primary shoulder replacement for osteoarthritis.
Design
Population based cohort study using data from the National Joint Registry and Hospital Episode Statistics for England.
Setting
Public hospitals and publicly funded procedures at private hospitals in England, 2012-20.
Participants
Adults aged 60 years or older who underwent RTSR or TSR for osteoarthritis with intact rotator cuff tendons. Patients were identified from the National Joint Registry and linked to NHS Hospital Episode Statistics and civil registration mortality data. Propensity score matching and inverse probability of treatment weighting were used to balance the study groups.
Main outcome measures
The main outcome measure was revision surgery. Secondary outcome measures included serious adverse events within 90 days, reoperations within 12 months, prolonged hospital stay (more than three nights), change in Oxford Shoulder Score (preoperative to six months postoperative), and lifetime costs to the healthcare service.
Results
The propensity score matched population comprised 7124 RTSR or TSR procedures (126 were revised), and the inverse probability of treatment weighted population comprised 12 968 procedures (294 were revised), with a maximum follow-up of 8.75 years. RTSR had a reduced hazard ratio of revision in the first three years (hazard ratio local minimum 0.33, 95% confidence interval 0.18 to 0.59) with no clinically important difference in revision-free restricted mean survival time, and a reduced relative risk of reoperations at 12 months (odds ratio 0.45, 95% confidence interval 0.25 to 0.83) with an absolute risk difference of −0.51% (95% confidence interval −0.89 to −0.13). Serious adverse events and prolonged hospital stay risks, change in Oxford Shoulder Score, and modelled mean lifetime costs were similar. Outcomes remained consistent after weighting.
Conclusions
This study’s findings provide reassurance that RTSR is an acceptable alternative to TSR for patients aged 60 years or older with osteoarthritis and intact rotator cuff tendons. Despite a significant difference in the risk profiles of revision surgery over time, no statistically significant and clinically important differences between RTSR and TSR were found in terms of long term revision surgery, serious adverse events, reoperations, prolonged hospital stay, or lifetime healthcare costs.
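Revision-free restricted mean survival time (RMST) is the area under the revision-free survival curve up to a chosen horizon tau; the between-arm contrast is the RMST of one arm minus that of the other at the same tau. The sketch below is a minimal illustration only, assuming an unweighted Kaplan-Meier estimate and hypothetical follow-up data. The study's own estimates came from matched or weighted cohorts and flexible parametric survival models, which this sketch does not reproduce, and the horizon used here is an arbitrary assumption.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time.

    times  -- follow-up time for each procedure (years)
    events -- 1 if revised at that time, 0 if censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        revised = 0
        removed = 0
        # Group all observations tied at time t before updating the estimate.
        while i < len(data) and data[i][0] == t:
            revised += data[i][1]
            removed += 1
            i += 1
        if revised:
            surv *= 1.0 - revised / at_risk
            curve.append((t, surv))
        at_risk -= removed
    return curve


def rmst(curve, tau):
    """Area under the step survival curve from 0 to tau."""
    area = 0.0
    prev_t, prev_s = 0.0, 1.0
    for t, s in curve:
        if t >= tau:
            break
        area += prev_s * (t - prev_t)
        prev_t, prev_s = t, s
    area += prev_s * (tau - prev_t)
    return area


# Hypothetical follow-up (years) and revision flags for two small arms.
tsr = kaplan_meier([0.5, 1.2, 2.0, 3.1, 4.0], [1, 0, 1, 0, 0])
rtsr = kaplan_meier([0.7, 1.5, 2.5, 3.5, 4.2], [0, 0, 1, 0, 0])
tau = 4.0  # assumed horizon for illustration
difference = rmst(rtsr, tau) - rmst(tsr, tau)
```

Because RMST is expressed in time units (years free of revision), a small difference can be judged directly for clinical importance even when the hazard ratio varies over time.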


Negative control outcomes
Treatment effects for sensitivity analyses

Parametric model parameters
Parametric model fit was assessed using the Akaike Information Criterion (AIC), the Bayesian Information Criterion (BIC) and visual inspection. The Weibull distribution fit best in all but one case, where the log normal distribution was marginally better (TSR, revision, matched; Table S6). Given the established methodological recommendation to use the same parametric distribution for different treatment arms unless there is strong evidence that an alternative is more plausible, we used the Weibull distribution for both the mortality and revision models in the matched and weighted cohorts to inform transition probabilities for the cost analysis.1
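The AIC and BIC comparison described above can be sketched as follows, using AIC = 2k - 2 ln L and BIC = k ln(n) - 2 ln L, where k is the number of model parameters, n the number of observations and L the maximised likelihood. This is a minimal sketch assuming the maximised log-likelihoods have already been obtained from fitted models; the candidate names, log-likelihoods and sample size below are hypothetical, not values from Tables S5 to S8.

```python
import math


def aic(log_lik, n_params):
    """Akaike Information Criterion: 2k - 2 ln L (lower is better)."""
    return 2 * n_params - 2 * log_lik


def bic(log_lik, n_params, n_obs):
    """Bayesian Information Criterion: k ln(n) - 2 ln L (lower is better)."""
    return n_params * math.log(n_obs) - 2 * log_lik


def rank_fits(fits, n_obs):
    """Sort candidate distributions by AIC, then BIC; the first entry fits best.

    fits maps distribution name -> (maximised log-likelihood, number of parameters).
    """
    table = []
    for name, (log_lik, k) in fits.items():
        table.append((aic(log_lik, k), bic(log_lik, k, n_obs), name))
    table.sort()
    return table


# Hypothetical maximised log-likelihoods (not the study's values).
candidates = {
    "exponential": (-520.0, 1),
    "weibull": (-500.0, 2),
    "lognormal": (-500.5, 2),
    "gompertz": (-503.0, 2),
}
ranking = rank_fits(candidates, n_obs=1000)
best = ranking[0][2]
```

When two candidates are this close (here Weibull and log normal differ by one AIC unit), the information criteria alone are weak evidence, which is why the text falls back on using the same distribution in both arms.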
Flexible parametric survival models (FPSM) were used to model revision for the clinical effectiveness section of this study, and offered model fit comparable to that of the other parametric models reported below.
Given the need to extrapolate survival probabilities beyond the 8.75 years of follow-up, the more parsimonious parametric models were preferred over FPSM for the base case analysis. However, a sensitivity analysis using separate FPSM for revision yielded consistent results (Figure S5).
For mortality, hazards between the TSR and RTSR groups were proportional (see log-log plot), so a single model was used with treatment group added as a covariate. For revision, hazards between the TSR and RTSR groups were not proportional, so separate models were used for each treatment group.
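The following is a minimal sketch of how a fitted Weibull model could be turned into per-cycle transition probabilities for a cost model. It assumes the proportional hazards parameterisation S(t) = exp(-lam * t**gamma), so a hazard ratio for treatment simply multiplies lam (the single-model case with treatment as a covariate); where hazards are not proportional, each arm would instead get its own lam and gamma with hazard_ratio left at 1. The probability of the event in the cycle ending at t is 1 - S(t) / S(t - cycle). The cycle length, parameter values and function names are hypothetical, not the study's.

```python
import math


def weibull_survival(t, lam, gamma):
    """S(t) = exp(-lam * t**gamma), proportional hazards parameterisation."""
    return math.exp(-lam * t ** gamma)


def transition_probability(t, cycle, lam, gamma):
    """Probability of the event in (t - cycle, t], given event-free at t - cycle."""
    return 1.0 - weibull_survival(t, lam, gamma) / weibull_survival(t - cycle, lam, gamma)


def per_cycle_probabilities(n_cycles, cycle, lam, gamma, hazard_ratio=1.0):
    """Transition probabilities for each model cycle.

    hazard_ratio scales lam, which scales the Weibull hazard
    h(t) = lam * gamma * t**(gamma - 1) by the same factor.
    """
    lam_arm = lam * hazard_ratio
    return [
        transition_probability((i + 1) * cycle, cycle, lam_arm, gamma)
        for i in range(n_cycles)
    ]


# Hypothetical parameters: annual cycles over 10 years.
tsr_probs = per_cycle_probabilities(10, 1.0, lam=0.01, gamma=0.8)
rtsr_probs = per_cycle_probabilities(10, 1.0, lam=0.01, gamma=0.8, hazard_ratio=0.7)
```

With gamma below 1 the hazard falls over time, so the per-cycle probabilities decrease; with gamma equal to 1 the model reduces to a constant (exponential) hazard.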

Hospital costs
Primary costs: sum of the index primary procedure, serious adverse events (SAEs) within 90 days and reoperations within 12 months.
Revision costs: sum of the revision procedure and SAEs within 90 days.
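The two cost definitions above can be expressed as windowed sums over linked hospital episodes. This is a minimal sketch: the Episode structure, the "sae" and "reoperation" labels, the inclusive window boundaries and the reading of "within 12 months" as 365 days are all assumptions for illustration, not the study's costing code.

```python
from dataclasses import dataclass


@dataclass
class Episode:
    """One costed hospital episode linked to a procedure (hypothetical structure)."""
    kind: str        # "sae" or "reoperation"
    days_after: int  # days since the index primary (or revision) procedure
    cost: float      # episode cost, e.g. in GBP


SAE_WINDOW_DAYS = 90
REOP_WINDOW_DAYS = 365  # assumption: "within 12 months" taken as 365 days


def primary_cost(index_cost, episodes):
    """Index primary procedure + SAEs within 90 days + reoperations within 12 months."""
    total = index_cost
    for ep in episodes:
        if ep.kind == "sae" and ep.days_after <= SAE_WINDOW_DAYS:
            total += ep.cost
        elif ep.kind == "reoperation" and ep.days_after <= REOP_WINDOW_DAYS:
            total += ep.cost
    return total


def revision_cost(revision_procedure_cost, episodes):
    """Revision procedure + SAEs within 90 days (reoperations are not added)."""
    return revision_procedure_cost + sum(
        ep.cost
        for ep in episodes
        if ep.kind == "sae" and ep.days_after <= SAE_WINDOW_DAYS
    )


# Hypothetical episodes after one index procedure.
episodes = [
    Episode("sae", 30, 1000.0),
    Episode("sae", 120, 500.0),          # outside the 90-day window
    Episode("reoperation", 200, 2000.0),
    Episode("reoperation", 400, 3000.0),  # outside the 12-month window
]
```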
Figure S1: Relative and absolute risk for negative control outcomes
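A negative control outcome is one that the choice of implant should not plausibly affect, so a relative risk near 1 and an absolute risk difference near 0 are reassuring about residual confounding. The sketch below computes both measures with Wald 95% confidence intervals from a simple unweighted 2x2 table. It is an illustration under stated assumptions (no matching or weighting, non-zero event counts in both groups), not the estimation method behind Figure S1.

```python
import math

Z_95 = 1.959963984540054  # standard normal quantile for a two-sided 95% interval


def risk_measures(events_a, n_a, events_b, n_b):
    """Relative risk and absolute risk difference (group a vs group b) with Wald 95% CIs.

    Unweighted sketch; assumes at least one event in each group.
    """
    p_a, p_b = events_a / n_a, events_b / n_b
    rr = p_a / p_b
    # Standard error of log(RR) for two independent binomial proportions.
    se_log_rr = math.sqrt(1 / events_a - 1 / n_a + 1 / events_b - 1 / n_b)
    rr_ci = (rr * math.exp(-Z_95 * se_log_rr), rr * math.exp(Z_95 * se_log_rr))
    rd = p_a - p_b
    se_rd = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    rd_ci = (rd - Z_95 * se_rd, rd + Z_95 * se_rd)
    return {"rr": rr, "rr_ci": rr_ci, "rd": rd, "rd_ci": rd_ci}


# Hypothetical counts for one negative control outcome.
example = risk_measures(events_a=12, n_a=1000, events_b=11, n_b=1000)
```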

Figure S10: Effect of discount rate
The National Institute for Health and Care Excellence (NICE) recommends a discount rate of 3.5%, which was used for the base case analysis. The graph below (matched population) shows the effect of varying the discount rate: the results are robust to discount rates from 0% to 10%.2
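Discounting converts future costs to present values by dividing a cost incurred in year i by (1 + r)**i, and the sensitivity analysis repeats the calculation across a range of rates r. This minimal sketch assumes annual cost streams with year 0 undiscounted and a 1% grid from 0% to 10%; the timing convention, grid and cost figures are hypothetical, and only the 3.5% base case rate comes from the text.

```python
def present_value(annual_costs, rate):
    """Discounted total of a cost stream; costs in year 0 are not discounted."""
    return sum(cost / (1.0 + rate) ** year for year, cost in enumerate(annual_costs))


def discount_sweep(costs_a, costs_b, rates):
    """Incremental discounted cost (a minus b) at each discount rate."""
    return {
        rate: present_value(costs_a, rate) - present_value(costs_b, rate)
        for rate in rates
    }


BASE_CASE_RATE = 0.035  # NICE base case rate quoted in the text
SWEEP_RATES = [step / 100 for step in range(0, 11)]  # assumed grid: 0% to 10% in 1% steps

# Hypothetical annual cost streams (GBP) for two treatment arms.
rtsr_costs = [7000.0, 150.0, 150.0, 900.0, 150.0]
tsr_costs = [6800.0, 200.0, 200.0, 1100.0, 200.0]
base_case = present_value(rtsr_costs, BASE_CASE_RATE) - present_value(tsr_costs, BASE_CASE_RATE)
sweep = discount_sweep(rtsr_costs, tsr_costs, SWEEP_RATES)
```

Results are robust in this sense when the incremental cost keeps the same sign and a similar magnitude across the whole sweep.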

Figure S11: Oxford Shoulder Score change histograms
Distributions of the change in Oxford Shoulder Score (6-month postoperative score minus preoperative score) for the subset of non-missing change scores within the base case matched and weighted (IPTW) populations.
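The change score and the non-missing subset used for these histograms can be sketched as below. This assumes each patient contributes a (preoperative, 6-month postoperative) pair with None marking a missing score; the bin width and example scores are hypothetical, and the sketch ignores the IPTW weights.

```python
def oss_change(pre, post_6m):
    """Change in Oxford Shoulder Score: 6-month postoperative minus preoperative.

    Returns None when either score is missing.
    """
    if pre is None or post_6m is None:
        return None
    return post_6m - pre


def non_missing_changes(pairs):
    """Keep only the non-missing change scores, as in the histogram subset."""
    changes = (oss_change(pre, post) for pre, post in pairs)
    return [c for c in changes if c is not None]


def histogram(values, bin_width):
    """Count of values per bin, keyed by each bin's lower edge."""
    counts = {}
    for v in values:
        lower = (v // bin_width) * bin_width
        counts[lower] = counts.get(lower, 0) + 1
    return dict(sorted(counts.items()))


# Hypothetical (preoperative, 6-month) score pairs; None marks a missing score.
pairs = [(20, 38), (None, 40), (15, None), (30, 28)]
changes = non_missing_changes(pairs)
```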

Table S2: ICD-10 codes to identify serious adverse events

Table S4: Covariate balance for sensitivity analyses

Table S5: Model fit statistics for different distributions of mortality (matched)

Table S6: Model fit statistics for different distributions of revision (TSR) (matched)

Table S7: Model fit statistics for different distributions of revision (RTSR) (matched)

Table S8: Model fit statistics for different distributions of mortality (IPTW)

Table S13: Oxford Shoulder Score change distributions

Table S14: Covariate balance for Oxford Shoulder Score responders vs non-responders
The table below shows the covariate balance, within the matched and weighted cohorts, between patients with a non-missing Oxford Shoulder Score (responders) and those with a missing score (non-responders). Most covariates were well balanced, with an absolute standardised mean difference (ASMD) below 10%, but a couple of categories of certain variables had a slightly higher ASMD, suggesting some minor imbalance.
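The ASMD for a covariate is the absolute difference in group means divided by a pooled standard deviation; for a 0/1 indicator of one category the mean is the proportion p and the variance is p(1 - p). This minimal sketch optionally applies weights (for example IPTW weights) to both the means and the variances. Using weighted variances in the denominator is an assumption, since the exact formula behind Table S14 is not stated here, and the example indicators are hypothetical.

```python
import math


def weighted_mean_var(values, weights):
    """Weighted mean and (population) weighted variance."""
    total_w = sum(weights)
    mean = sum(w * x for x, w in zip(values, weights)) / total_w
    var = sum(w * (x - mean) ** 2 for x, w in zip(values, weights)) / total_w
    return mean, var


def asmd(x_a, x_b, w_a=None, w_b=None):
    """Absolute standardised mean difference between groups a and b.

    Works for 0/1 category indicators or continuous covariates; weights are optional.
    """
    w_a = w_a or [1.0] * len(x_a)
    w_b = w_b or [1.0] * len(x_b)
    m_a, v_a = weighted_mean_var(x_a, w_a)
    m_b, v_b = weighted_mean_var(x_b, w_b)
    pooled_sd = math.sqrt((v_a + v_b) / 2.0)
    if pooled_sd == 0:
        return 0.0 if m_a == m_b else math.inf
    return abs(m_a - m_b) / pooled_sd


ASMD_THRESHOLD = 0.10  # the 10% balance threshold quoted in the text

# Hypothetical 0/1 indicators for one covariate category.
responders = [1, 1, 0, 0]
non_responders = [1, 0, 0, 0]
imbalanced = asmd(responders, non_responders) > ASMD_THRESHOLD
```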