- Art Sedrakyan, associate professor (1, 3),
- Sharon-Lise T Normand, professor (2),
- Stefan Dabic, fellow, FDA (3),
- Samantha Jacobs, fellow, FDA (3),
- Stephen Graves, professor (4),
- Danica Marinac-Dabic, director of epidemiology, FDA (3)
- (1) Weill Cornell Medical College, New York, NY, USA
- (2) Harvard Medical School and Harvard School of Public Health, Boston, MA, USA
- (3) Office of Surveillance and Biometrics, Center for Devices and Radiological Health, FDA, Silver Spring, MD, USA
- (4) University of Melbourne and the Australian Orthopaedic Association’s National Joint Replacement Registry in Adelaide, Australia
- Correspondence to: A Sedrakyan, Weill Cornell Medical College, 402 East 67th Street, Suite 223, New York, NY 10065, USA
- Accepted 14 November 2011
Objective To determine comparative safety and effectiveness of combinations of bearing surfaces of hip implants.
Design Systematic review of clinical trials, observational studies, and registries.
Data sources Medline, Embase, Cochrane Controlled Trials Register, reference lists of articles, annual reports of major registries, summaries of safety and effectiveness for pre-market application and mandated post-market studies at the United States Food and Drug Administration.
Study selection Criteria for inclusion were comparative studies in adults reporting information for various combinations of bearings (such as metal on metal and ceramic on ceramic). Data search, abstraction, and analyses were independently performed and confirmed by at least two authors. Qualitative data syntheses were performed.
Results There were 3139 patients and 3404 hips enrolled in 18 comparative studies and over 830 000 operations in national registries. The mean age range in the trials was 42-71, and 26-88% were women. Disease specific functional outcomes and general quality of life scores were no different or they favoured patients receiving metal on polyethylene rather than metal on metal in the trials. While one clinical study reported fewer dislocations associated with metal on metal implants, in the three largest national registries there was evidence of higher rates of implant revision associated with metal on metal implants compared with metal on polyethylene. One trial reported fewer revisions with ceramic on ceramic compared with metal on polyethylene implants, but data from national registries did not support this finding.
Conclusions There is limited evidence regarding comparative effectiveness of various hip implant bearings. Results do not indicate any advantage for metal on metal or ceramic on ceramic implants compared with traditional metal on polyethylene or ceramic on polyethylene bearings.
Every year over 700 000 joint replacements are performed in the United States alone,1 of which over 270 000 are hip replacements. The annual volume of hip joint replacement is projected to double over the next decade. Moreover, these surgeries are expected to become more expensive with total costs tripling in just five years.2 While joint replacement is a successful operation and deals with a great public health burden, substantial numbers of patients who receive hip implants require revision surgery within 10 years to replace the implant because of infection, dislocation, wear, instability, loosening, or other mechanical failures.3 4 5 6 The bearing/articulating surface is designed to endure the contact stress and is one of the key design factors to reduce the complications and the chance of revision. Hip implants with metal femoral heads with polyethylene cups as articulating surfaces are associated with low rates of revision.7 National registries throughout the world continue reporting low risk of revision with this traditional bearing.8 9 10 On the other hand, rapid growth of technology introduced several alternative bearings to the market that aimed to further reduce implant wear and subsequently the time to revision surgery. These alternatives include metal on metal and ceramic on ceramic bearings.11
Metal on metal bearings are particularly attractive to surgeons as they allow use of larger femoral heads (>32 mm v <32 mm) and supposedly reduce the risk of dislocation and improve the functional outcomes in younger patients.12 13 They were quickly adopted by surgeons and often used even in older patients. In one study, one out of three older patients undergoing hip surgery received metal on metal hip implants.2 Recently, however, the United Kingdom regulatory agency (Medicines and Healthcare Products Regulatory Agency) alerted the public about severe cases of metallosis (accumulation of metal ions in the tissues) related to the release of metal ions from the implants,14 and the British Orthopaedic Association developed specific recommendations.11 Furthermore, in August 2010 Johnson and Johnson recalled over 93 000 metal on metal implants called “ASR.”15 16 The recall received widespread coverage in the New York Times,17 and the British Medical Journal published papers related to device regulation, clinician involvement, and the need to develop evidence.16 18
In the US, the Food and Drug Administration has been closely monitoring reports of failure of hip implants related to various bearing surfaces and, in November 2009, initiated a comprehensive evaluation and synthesis of evidence of reported outcomes for approved implants. We systematically reviewed the evidence to determine the short and long term outcomes reported by patients undergoing hip replacement and the rates of revision after use of implants with various bearings.
Identification of studies
We worked with the Food and Drug Administration to identify the summaries of safety and effectiveness for all pre-market application trials and relevant Food and Drug Administration mandated post-market studies reporting comparative information for hip implants. Summaries of safety and effectiveness of all ceramic on ceramic hip implants are publicly available as these were the only pre-market application hip replacement devices. We then identified all relevant publications related to pre-market application trials in Medline, Embase, and the Cochrane Controlled Trials Register to learn about long term results (any follow-up) reported. Additionally, we performed a comprehensive search of all other publications in Medline, Embase, and the Cochrane Controlled Trials Register from January 1995 to June 2011. We used all MeSH terms corresponding to hip bearing surface as well as any text words that were applicable to locate the studies. The systematic search strategy is outlined in appendix 1 on bmj.com. We also searched the reference lists of trials and reviews for additional studies. Finally, we reviewed the online annual reports of all registries (within and outside the US) that report information from their registry (see appendix 2 on bmj.com for details).
We limited our study to papers in English. Our inclusion criteria were clinical studies, only adults enrolled, reporting any one of the clinical outcomes of interest (any functional outcomes or revisions, or both), and conventional hip replacement.
Abstraction of data
The data were independently abstracted and checked multiple times by two abstractors (SD, SJ). The senior author (AS) resolved any discrepancies. Both SD and SJ were trained by using a large learning sample at the stage of abstract screening and manuscript review. One of the senior investigators (AS) reviewed a 20% random sample of excluded studies and all included abstracts for quality control. The process led to 100% agreement among the three authors.
We abstracted information on study design, quality, number of hips (patients), date of procedure, age, sex, number of surgeons and centres, diagnosis (osteoarthritis or avascular necrosis), mean follow-up, percentage follow-up, and information on the manufacturer. The main outcome measures included any functional outcome and occurrence of revision.
To evaluate quality of included studies we used four important criteria that lead to conclusions of overall perception of the risk of bias. As masking/blinding of surgeons in these trials is not applicable, we considered “randomised” study description; description of a correct randomisation procedure; masking of patients and outcome assessors; intention to treat analysis; and allocation concealment (methods such as sealed envelope, central telephone) including the time of the announcement of the allocation (in the operating room v before). We used the STROBE criteria to assess the quality of the observational studies.19 Quality was classified as high (low risk of bias), intermediate (moderate risk of bias), and low (high risk of bias). No formal tool or score was developed. The high quality classification was based on adequate description of randomisation or comparator group, units of randomisation (patient v hip), masking (participants, outcome assessors), and allocation concealment. If studies did not report any of the criteria and described potentially problematic designs (such as questionable exclusion of patients, not describing the group balance regarding prognostic factors), we categorised these studies as low quality. Other classifications (moderate, moderate to high, moderate to low) were based on overall clinical and critical judgments of proximity to high or low quality. Thus, the decision to classify studies into the three categories was based on overall perception of risk of bias after evaluation of the quality criteria, descriptions related to design, and presentations of data.
While publication bias is applicable to any topic area, we believe that comparative surgical trials in orthopaedics are relatively rare, and papers for such comparisons are likely to be submitted and accepted by journals regardless of the findings. Another possible publication bias might be related to selective reporting of outcomes. Many studies did not report all outcomes of interest to us (particularly revision surgery).
Qualitative and quantitative analyses
We evaluated clinical heterogeneity (diversity) among the trials by abstracting data on included populations and evaluating definitions of outcomes (often not reported). Functional outcomes included disease specific Harris hip score and general quality of life measures (SF-12). We collected data on number of events (categorical data), scores (continuous data), relative risks (when reported) and their 95% confidence intervals, and P values. We report evidence for alternative bearings (such as metal on metal and ceramic on ceramic) compared with traditional bearings (such as ceramic on polyethylene and metal on polyethylene). We also report any comparative evidence between two traditional bearings (such as ceramic on polyethylene versus metal on polyethylene) and between alternative bearings (ceramic on ceramic versus metal on metal). Formal meta-analysis was considered in one instance of relatively complete reporting of functional outcome. The results from each study are expressed as a mean difference with 95% confidence intervals and combined with a random effects method. Statistical heterogeneity was evaluated with I² estimates. RevMan 5.1 software was used for statistical analyses.
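The pooling of mean differences described above can be illustrated with a minimal Python sketch. It assumes the DerSimonian and Laird estimator of between-study variance, a common random effects method; the review states only that a random effects method was applied in RevMan 5.1, so the choice of estimator, the function name `pool_random_effects`, and the example inputs are illustrative assumptions and not data from the included studies.

```python
import math


def pool_random_effects(effects, std_errors):
    """Pool study-level mean differences with a random effects model.

    Assumption: DerSimonian-Laird estimator for the between-study
    variance (tau^2). The source does not name the estimator used.
    """
    k = len(effects)
    variances = [se ** 2 for se in std_errors]

    # Fixed effect (inverse variance) weights and pooled estimate.
    w = [1.0 / v for v in variances]
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w

    # Cochran's Q measures dispersion of the studies around the fixed estimate.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    df = k - 1

    # DerSimonian-Laird between-study variance, truncated at zero.
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0

    # Random effects weights add tau^2 to each within-study variance.
    w_star = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    se_pooled = math.sqrt(1.0 / sum(w_star))

    # I^2: percentage of total variation attributed to heterogeneity.
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0

    return {
        "pooled": pooled,
        "ci_low": pooled - 1.96 * se_pooled,
        "ci_high": pooled + 1.96 * se_pooled,
        "tau2": tau2,
        "i2": i2,
    }


# Hypothetical mean differences and standard errors, for illustration only.
example = pool_random_effects([0.0, 2.0], [1.0, 1.0])
print(example)
```

When the studies agree closely, tau² is truncated to zero and the result matches a fixed effect inverse variance analysis; larger I² values widen the pooled confidence interval.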
Studies and populations
We identified and reviewed 3254 abstracts (see appendix 3 on bmj.com). We identified 18 randomised trials and comparative observational studies.20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 One study (Capello and colleagues) included both randomised and non-randomised arms and is represented twice in the tables.30 31 32 37 There were 3139 patients and 3404 hips enrolled in 18 comparative studies (table 1). The mean age ranged from 42 to 71, and only five out of 18 comparative studies reported standard deviation. The proportion of women was 26-88%; four studies did not report information on sex.28 34 36 39 None of the studies reported age specific or sex specific analyses, and none reported race or patients’ comorbidity, such as diabetes and obesity. Only 13 reported information on diagnosis, and osteoarthritis was the most prevalent diagnosis in 12 studies (range 41-100%). One study predominantly enrolled patients with avascular necrosis,35 which was the second most common diagnosis in the trials. Reported outcomes included functional outcomes such as disease specific Harris hip score, general quality of life measure (such as SF-12), and revision surgery. Follow-up ranged from three months to 8.1 years (table 1).
Four studies were classified as moderate to high quality, five studies as moderate quality, and six studies as low quality. Subgroup analyses based on quality were not informative because of homogeneous results and relatively few studies contributing to each comparison. We report the quality information to illustrate the overall lack of appreciation of design and analysis reporting in the comparative studies of bearing surface.
Patients’ outcomes: Harris hip score
The Harris hip score (with higher scores indicating better outcomes) was the most commonly used functional outcome measurement and was reported by 16 out of 18 studies (table 2). Only 10 studies, however, reported both baseline and follow-up measurements; six reported only postoperative scores.
Metal on metal v traditional bearing surfaces—All studies had follow-up measurements, but one study did not report preoperative scores (table 2). Four studies compared metal on metal bearing with metal on polyethylene at two years. The reporting was relatively complete in these studies, and the scores were combined. The combined estimate favoured metal on polyethylene bearing: the score was a significant 2.4 points higher (95% confidence interval 0.3 to 4.5) with metal on polyethylene than with metal on metal bearing (fig 1). Three studies reported Harris hip scores beyond two years: two studies compared metal on metal with metal on polyethylene and another with ceramic on polyethylene (fig 2). The individual and combined scores were similar in all three studies for all bearings.
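As a rough check on the pooled two year result, the confidence bounds of 0.3 and 4.5 points can be converted back into a standard error, z statistic, and two sided P value. The Python sketch below assumes a symmetric interval based on the normal distribution; the function name `summarise_interval` is hypothetical, and the values it produces are approximations derived from the reported bounds, not figures stated in the review.

```python
import math


def summarise_interval(lower, upper, z_crit=1.959964):
    """Recover estimate, standard error, z and two-sided P from a 95% CI.

    Assumption: the interval is symmetric and normal-based, so the
    point estimate is its midpoint and its half-width is z_crit * SE.
    """
    estimate = (lower + upper) / 2.0
    std_error = (upper - lower) / (2.0 * z_crit)
    z = estimate / std_error
    # Two-sided P value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return estimate, std_error, z, p_value


# Bounds reported for the pooled Harris hip score difference at two years.
estimate, std_error, z, p_value = summarise_interval(0.3, 4.5)
print(f"estimate={estimate:.1f} SE={std_error:.2f} z={z:.2f} P={p_value:.3f}")
```

The midpoint of the interval reproduces the reported 2.4 point difference, and an interval that excludes zero corresponds to a two sided P value below 0.05.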
Ceramic on ceramic v traditional bearing surfaces—Five studies compared ceramic on ceramic bearings with ceramic on polyethylene bearing. Four studies had both baseline and follow-up measurements, and one study had measurements only after surgery. Two studies compared ceramic on ceramic with metal on polyethylene bearing and had measurements only after surgery. One of the studies was part of the pre-market application process and had a randomised and non-randomised arm (and hence is represented twice in table 2).30 31 32 37 The Harris hip scores were similar at baseline and follow-up in all groups that compared ceramic on ceramic with other bearing surfaces.
Ceramic on polyethylene v metal on polyethylene—Two studies compared ceramic on polyethylene with metal on polyethylene, and one of the studies had only follow-up measurement. There were no differences in Harris hip scores.
Metal on metal v ceramic on ceramic—One study compared both alternative bearing surfaces and did not find any difference in scores at follow-up.24 The study included only 28 hips (patients) and was likely to be underpowered to determine any differences between the groups.
Patients’ outcomes: other functional scores
Five studies compared metal on metal with other bearing surfaces (table 3). Four of these compared metal on metal with metal on polyethylene and one study compared metal on metal with ceramic on ceramic. One study reported substantially better physical functioning after surgery associated with metal on polyethylene compared with metal on metal.28
One study compared ceramic on ceramic with ceramic on polyethylene, and there was no difference between the groups.
Patients’ outcomes: revision occurrence after hip replacement
Evidence from comparative studies
Ten comparative evaluations reported information on revisions, and only seven reported data on dislocations. No study specifically reported aseptic loosening.
Metal on metal v traditional bearing surfaces—Two studies compared metal on metal with metal on polyethylene bearing and found no difference in occurrence of revisions. Two other studies compared metal on metal with ceramic on polyethylene bearing, and one of these studies33 reported substantially higher occurrence of dislocation in the ceramic on polyethylene group. Occurrence of dislocation was not different in the other three studies.
Ceramic on ceramic v traditional bearing surfaces—Seven comparative studies reported this information. Two studies compared ceramic on ceramic with metal on polyethylene bearing. As described before, one of the studies (Capello and colleagues) included both randomised and non-randomised arms (represented twice in table 4).30 31 32 37 This study reported substantially lower occurrence of revision in the ceramic on ceramic arms compared with metal on polyethylene arm. Five studies compared ceramic on ceramic with ceramic on polyethylene and found no qualitative or quantitative differences between the groups in terms of revision occurrence.
Ceramic on polyethylene v metal on polyethylene—One comparative study compared ceramic on polyethylene with metal on polyethylene and found no difference in revision occurrence.
Metal on metal v ceramic on ceramic—There were no comparative studies discussing occurrence of revision.
Emerging evidence on revision from national registries
We reviewed the annual reports of 29 national and regional registries. Five national registries8 9 10 41 42 including over 830 000 surgeries reported information on bearing surface, and one US study including over 57 000 surgeries reported data from the Medicare database.2 Some of the large national registries, such as the Swedish registry,43 reported information on every implant used in the country but did not stratify by bearing surface. National registries provided important quantitative data, but they varied in their methods of risk adjustment. While we considered national registry reports as comparative studies, we did not evaluate quality because of lack of information on methods.
Metal on metal was associated with higher occurrence of revision compared with metal on polyethylene in the adjusted analyses of three national registries: Australian, New Zealand, and England and Wales National Registries (including over 720 000 patients).8 9 10 Three other smaller registries (including the Centers for Medicare and Medicaid Services (CMS) database) did not report such results (fig 3).
Ceramic on ceramic was associated with a higher occurrence of revision than metal on polyethylene in the adjusted analyses of the New Zealand National Registry.9 Five registries (including the CMS database) did not report any difference.
Ceramic on polyethylene was associated with a higher occurrence of revision than metal on polyethylene in the New Zealand National Registry but a lower occurrence in the England and Wales National Registry. Three other national registries did not report any difference (ceramic on polyethylene implant category was not available in the CMS database).
Functional outcomes, traditionally thought of as primary effectiveness outcomes, were no different among patients receiving hip replacements with various bearings. We conducted this systematic evidence review as part of the new project initiated by the Food and Drug Administration to create a framework for post-market evaluation of orthopaedic implants. This systematic evaluation helps to deal with evidence gaps in comparisons of hip bearing surfaces.
We found some evidence for lower Harris hip scores (by 2.4 points) at two year follow-up for metal on metal bearing compared with metal on polyethylene. There was also evidence from one trial that patients receiving metal on metal implants have a lower quality of life (functional component) than patients receiving metal on polyethylene bearing.28 Though these differences are small and might not be clinically relevant, they are still valuable findings to be considered for future hypothesis generation.
Evidence on implant revision did not favour metal on metal implants. While one study reported fewer dislocations associated with metal on metal implants, three other trials did not support this finding, and the three largest national registries reported substantially higher occurrence of revision associated with metal on metal implants compared with metal on polyethylene bearing (fig 3).
One medium size comparative clinical trial reported a substantially lower rate of revision associated with ceramic on ceramic implants compared with metal on polyethylene implants.30 This conflicts with results reported in national registries (fig 3). Five national data sources/registries did not report any difference between the ceramic on ceramic and traditional bearings, and a large registry report from New Zealand favoured ceramic on polyethylene (traditional bearing) compared with ceramic on ceramic surfaces.
Comparison with other studies
To our knowledge this is the first systematic appraisal of clinical outcomes after hip replacement with various bearing surfaces. We did not summarise the evidence related to metal sensitivity or toxicity, but our findings related to clinical outcomes of metal on metal implants are augmented by reports that show severe metallosis associated with such implants.14 Concerns regarding the effects of metal ions have been raised since a landmark paper by MacDonald et al36 that noted a 5.3-fold increase in erythrocyte cobalt, a 35-fold increase in urinary cobalt, and a 17-fold increase in urinary chromium concentrations in patients who received metal on metal hip replacements compared with those who received metal on polyethylene.
Strengths and limitations of study
We found 18 comparative studies of bearing surface and six national data source/registry reports and comprehensively summarised the strengths and limitations of current data. The overall quality of reporting in the comparative studies of bearing surface was less than adequate. Many investigators did not report their power to detect minimally important differences among groups and often omitted standard deviations and range of scores observed. In addition, few studies were adequately powered or reported subgroup analyses by age, sex, underlying aetiology, or femoral head size. Classification of polyethylene inserts used in hip replacement also needs harmonisation; the various terms used to define cross linked polyethylene hinder subgroup analyses. Our review was limited to English language publications, and there is a chance of publication bias. There was substantial heterogeneity in registry evidence. Formal data pooling from registries will require harmonisation of data definitions and analytical methods.
Changing technology, the need to have large numbers of patients, long term follow-up to establish the safety, and short and long term outcomes reported by patients to establish the effectiveness all lead to a lack of strong evidence in orthopaedics. For example, patients who receive hip implants often require revision surgery within 10 years, when the implants fail because of infection, dislocation, wear, instability, loosening, or other mechanical failures. In addition, clinical trials assessing bearing surfaces are conducted in relatively unique environments defined by skilled surgeons operating at high volume centres and often concentrating on clinical and radiological indirect measures of device safety and effectiveness. In general this is a reflection of the current state of research in implantable devices, where comparative trials are rare and often not applicable.44
Registries are likely to fill the evidence gap in the immediate future. Large registries or networks of registries capturing various orthopaedic devices are particularly important for evaluation of comparative outcomes and active surveillance. Only large longitudinal multinational registries can provide denominator data for adverse events related to specific implants and allow proper conduct of comparative safety and effectiveness studies, particularly for rare end points. Current registry annual reports that include comparative evaluations, however, are not harmonised in terms of methods and reporting. Accordingly, the findings are tentative. To deal with this limitation the US Food and Drug Administration Center for Devices and Radiological Health started an important initiative called the International Consortium of Orthopaedic Registries, which aims to build the foundations for a worldwide research consortium of orthopaedic registries. The consortium represents more than 15 countries that have existing registries with a mission to improve the safety and effectiveness of orthopaedic devices and procedures through collaboration. Currently, these international registries include more than 3 500 000 orthopaedic surgeries capturing all implantable devices on the market.
There is limited evidence regarding comparative effectiveness of various hip implant bearings, and the results do not indicate any advantage for metal on metal or ceramic on ceramic implants compared with traditional bearings. A large and high quality randomised controlled trial of bearing surfaces in total hip replacement needs to be conducted before any claims of benefit are made. Until then national registries provide important real world data that are critical for safety monitoring and for future comparative safety and effectiveness evaluation.
What is already known on this topic
There have been severe cases of accumulation of metal ions in tissues of patients with metal on metal hip implants
Metal on metal and ceramic on ceramic hip implants might not be associated with any advantage compared with traditional bearings such as metal on polyethylene or ceramic on polyethylene
What this study adds
Disease specific functional outcomes such as Harris hip score and general quality of life measures were no different between patients with metal on metal or ceramic on ceramic hip implants compared with traditional hip implants
There is some evidence of higher rates of revision surgery associated with metal on metal implants compared with metal on polyethylene implants
While an investigational device exemption trial reported fewer revisions associated with ceramic on ceramic implants, the emerging evidence from national registries does not support these results
Cite this as: BMJ 2011;343:d7434
Contributors: AS and DM-D were responsible for study design and concept. AS, DM-D, SD, and SJ acquired the data, which were analysed by AS, SG, S-LTN, SD, and SJ. AS, SG, DM-D, and S-LTN interpreted the results. AS drafted the manuscript and is guarantor. AS and S-LTN provided statistical expertise. AS and DM-D provided administrative, technical, or material support. All authors critically revised the manuscript and revision for important intellectual content.
Funding: The study is funded by the US Food and Drug Administration Center for Devices and Radiological Health. The authors accept full responsibility for the conduct of the study, had access to the data, and controlled the decision to publish. The opinions expressed in this article are those of the authors and not necessarily those of the Food and Drug Administration.
Competing interests: All authors have completed the ICMJE uniform disclosure form at www.icmje.org/coi_disclosure.pdf (available on request from the corresponding author) and declare: no support from any organisation for the submitted work; no financial relationships with any organisations that might have an interest in the submitted work in the previous three years; no other relationships or activities that could appear to have influenced the submitted work.
Ethical approval: Not required.
Data sharing: No additional data available.
This is an open-access article distributed under the terms of the Creative Commons Attribution Non-commercial License, which permits use, distribution, and reproduction in any medium, provided the original work is properly cited, the use is non commercial and is otherwise in compliance with the license. See: http://creativecommons.org/licenses/by-nc/2.0/ and http://creativecommons.org/licenses/by-nc/2.0/legalcode.