Applicability and generalisability of published results of randomised controlled trials and non-randomised studies evaluating four orthopaedic procedures: methodological systematic review
BMJ 2009; 339 doi: https://doi.org/10.1136/bmj.b4538 (Published 17 November 2009) Cite this as: BMJ 2009;339:b4538
- Leslie Pibouleau, research project manager in health technology assessment (1, 2),
- Isabelle Boutron, assistant professor of epidemiology (1, 3),
- Barnaby C Reeves, professorial research fellow in health services research (4),
- Rémy Nizard, professor of orthopaedic surgery (5),
- Philippe Ravaud, professor of epidemiology (1)
- (1) INSERM U738, Paris, France; Assistance Publique des Hôpitaux de Paris, Hôpital Hôtel Dieu, Centre d’Epidémiologie Clinique, Paris, France; Université Paris Descartes, Faculté de Médecine, Paris, France
- (2) Department of Assessment of Medical Devices, Haute Autorité de Santé (French National Authority for Health), Saint Denis, France
- (3) Centre for Statistics in Medicine, University of Oxford, Wolfson College Annexe, Oxford OX2 6UD
- (4) Clinical Trials and Evaluation Unit, University of Bristol, Bristol Royal Infirmary, Bristol
- (5) Assistance Publique des Hôpitaux de Paris, Hôpital Lariboisière, Service de Chirurgie Orthopédique et Traumatologique, Paris, France; Université Paris Diderot, Faculté de Médecine, Paris, France
- Correspondence to: I Boutron
- Accepted 8 September 2009
Objective To compare the reporting of essential applicability data from randomised controlled trials and non-randomised studies evaluating four new orthopaedic surgical procedures.
Data sources Medline and the Cochrane central register of controlled trials.
Study selection All articles of comparative studies assessing total hip or knee arthroplasty carried out by a minimally invasive approach or computer assisted navigation system.
Data extraction Items judged to be essential for interpreting the applicability of findings about such procedures were identified by a survey of a sample of orthopaedic surgeons (77 of 512 completed the survey). Reports were evaluated for data describing these “essential” items and the number of centres and surgeons involved in the trials. When data on the number of centres and surgeons were not reported, the corresponding author of the selected trials was contacted.
Results 84 articles were identified (38 randomised controlled trials, 46 non-randomised studies). The median percentage (interquartile range) of essential items reported for non-randomised studies compared with randomised controlled trials was 38% (25-63%) versus 44% (38-45%) for items about patients, 71% (43-86%) versus 71% (57-86%) for items considered essential for all interventions, and 38% (25-50%) versus 50% (25-50%) for items about the context of care. More than 80% of both study types were single centre studies, with one or two participating surgeons.
Conclusion The reporting of data related to the applicability of results was poor in published articles of both non-randomised studies and randomised controlled trials and did not differ by study design. The applicability of results from the trials and studies was similar in terms of number of centres and surgeons involved and the reproducibility of the intervention.
Randomised controlled trials provide the most reliable evidence for quantifying treatment effect sizes.1 2 In the specialty of surgery, however, results of such trials are often criticised for being poorly applicable. The results of non-randomised studies are believed to have better applicability.3 4 5 6 7 8
Applicability (also called external validity or generalisability)9 concerns a multidimensional concept depending on the extent to which participants, the context of care, and the interventions (and comparators) evaluated in studies are representative of, or can be reproduced in, usual care. The applicability of a trial’s results could be limited if patients represent only a small proportion of those being treated in normal practice.10 The participation of centres with different resources and surgeons with different skills may mean that treatment effects observed in research may not be applicable, or at worst are irrelevant, to non-research settings.11 12 13 14 15 Surgical procedures are complex interventions that can be difficult to describe, standardise, and reproduce consistently in clinical practice.16
Appraising the applicability of the results of a study is intertwined with the quality of reporting—that is, the extent to which an article provides information about the patients, the intervention, and the context of care (centres and surgeons’ expertise). Articles often omit important details. Poor reporting of applicability data by researchers may be a barrier to applying research findings in clinical practice.
We tested empirically the hypothesis that non-randomised studies yield results that are more applicable than those of randomised controlled trials. For this purpose we identified items considered by surgeons to be essential for appraising applicability in research articles, compared the reporting of these data in published articles of randomised controlled trials and non-randomised studies, and compared the context of care (number of centres and surgeons involved) in published reports of randomised controlled trials and non-randomised studies.
We focused on minimally invasive and computer assisted navigation techniques for total hip arthroplasty and total knee arthroplasty. These surgical procedures were chosen because they have been developed recently, are complex, and their success depends on patient selection, surgeon experience, and volume of procedures undertaken by a centre.12
We identified and selected eligible published articles of randomised controlled trials and non-randomised studies that assessed four surgical procedures: minimally invasive and computer assisted navigation techniques for total hip arthroplasty and total knee arthroplasty. Next, we surveyed orthopaedic surgeons to identify items considered essential in assessing the applicability of evidence for these procedures to clinical practice. We extracted data on the reporting of essential applicability items using standardised methods and compared the quality of reporting for non-randomised studies and randomised controlled trials. Finally, we extracted data on the context of care (number of centres and surgeons involved) and compared the applicability of the context of care for non-randomised studies and randomised controlled trials.
Search for and selection of eligible studies
We searched for all English language articles of trials that evaluated minimally invasive or computer assisted total hip arthroplasty or total knee arthroplasty in Medline and the Cochrane central register of controlled trials (see web extra appendix 1 for details of the search strategy). One author (LP) screened the titles and abstracts of retrieved citations to select the relevant articles. The a priori inclusion criteria were all randomised and non-randomised studies that compared total hip arthroplasty or total knee arthroplasty done by a minimally invasive approach or a computer assisted navigation system with one or more conventional procedures. We also included trials that evaluated minimally invasive procedures involving computer assisted navigation techniques.
A priori exclusion criteria were uncontrolled studies, non-therapeutic studies (in vitro, biomechanical, and epidemiological studies), pathophysiological studies, letters, ancillary studies such as a subgroup analysis, studies that compared two minimally invasive procedures or two computer assisted navigation procedures, cost effectiveness evaluations, and systematic reviews or meta-analyses. We also excluded studies that assessed the organisation of the healthcare system or interventions provided to care providers. When more than one article was retrieved for the same study, we considered only the earliest publication as eligible.
We used a standardised form to extract data for each eligible study (see web extra appendix 2): year of publication, type of surgical procedure (total hip arthroplasty or total knee arthroplasty, minimally invasive or navigation procedure), study design (randomised controlled trial, non-randomised historically controlled study, case-control study, or other non-randomised comparative study), sample size, whether a statistician or methodologist was included among the authors, the risk of bias, and items essential to interpreting the applicability of the findings.
Identification of items essential for interpreting applicability
To identify items relevant to applicability, we carried out a literature search, relying especially on criteria proposed by the CONSORT statement and its extension for non-pharmacological treatments17 18 and by Rothwell et al.8 Selected items (see web extra appendix 3) were classified into three main domains: the description of the patients, the description of the experimental intervention (for practical reasons we did not focus on the description of the comparator), and the context of care (centres and care providers).
In a second step we invited experts by email to participate in a web based survey. The experts were all corresponding authors of published articles of studies (with no restriction on the design) that assessed knee arthroplasty or hip arthroplasty, identified by an electronic search strategy (see web extra appendix 4), and all members of the French Hip and Knee Society (SFHG, created in 1997 and consisting of 100 orthopaedic surgeons specialising in hip and knee surgery). For each item, surgeons had to indicate whether they agreed, on a scale of 1 (totally disagree) to 9 (totally agree), that the item should be reported in a published article of a trial. Surgeons could also indicate any other items that were not listed but were deemed important. An item was classified as “essential” for adequate appraisal of the applicability of the published results of a study if 50% or more of respondents gave it a score of 7 or more.
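The classification rule described above can be sketched in a few lines of Python. This is only an illustration of the stated criterion (a score of 7 or more from 50% or more of respondents); the function name `is_essential`, its parameter names, and the example scores are hypothetical and do not come from the study.

```python
def is_essential(scores, threshold=7, min_share=0.5):
    """Return True when at least `min_share` of respondents gave the item
    a score of `threshold` or more on the 1 to 9 agreement scale."""
    if not scores:
        raise ValueError("at least one respondent score is required")
    high_scores = sum(1 for s in scores if s >= threshold)
    return high_scores / len(scores) >= min_share


# Hypothetical scores from six respondents for one candidate item.
example_scores = [9, 8, 7, 5, 3, 2]
print(is_essential(example_scores))  # three of six respondents scored 7 or more
```

Exactly half of the respondents meeting the threshold counts as essential, because the criterion is "50% or more".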
We did not invite other important stakeholders such as patients or policymakers because surgeons are usually the first line in appraising to whom and in which context trial results should be applied.
Extracting data on essential applicability items
One author (LP) appraised the reporting of essential applicability items using a standardised data extraction form (see web extra appendix 2). The author also assessed whether applicability was considered in the discussion section. A random sample of 15% of the selected articles was reviewed independently by another author (IB) for quality assurance (see web extra appendix 5 for the proportion of agreement between the two reviewers). Items with a low level of agreement were discussed and, if necessary, all selected articles were reappraised after discussion.
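As an illustration of the quality assurance step, the proportion of agreement between two reviewers on a single item can be computed as the share of doubly reviewed articles on which both gave the same rating. The sketch below assumes each reviewer's ratings are stored as a list of booleans (item reported or not) in the same article order; the function name and the example ratings are hypothetical.

```python
def proportion_agreement(reviewer_a, reviewer_b):
    """Share of articles on which two reviewers gave the same rating for one item."""
    if len(reviewer_a) != len(reviewer_b):
        raise ValueError("both reviewers must rate the same articles")
    if not reviewer_a:
        raise ValueError("at least one article is required")
    matches = sum(1 for a, b in zip(reviewer_a, reviewer_b) if a == b)
    return matches / len(reviewer_a)


# Hypothetical ratings of one item across four doubly reviewed articles.
ratings_lp = [True, False, True, True]
ratings_ib = [True, True, True, True]
print(proportion_agreement(ratings_lp, ratings_ib))  # 0.75
```

Raw agreement does not correct for agreement expected by chance; it is shown here only because it is the measure the text names.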
We calculated the proportion of essential items reported for three components of applicability: description of the patients, description of the experimental intervention, and context of care.
Context of care
As well as evaluating the reporting of applicability data, we aimed to appraise the actual applicability of the results of the selected trials. Because appraising the applicability of published results of a study is difficult, we focused on only some components related to the context of care—the number of centres and number of surgeons involved in the randomised controlled trials and non-randomised studies, assuming that studies with a low number of participating centres and surgeons had low applicability. When the number of centres and participating surgeons was not reported in selected articles, we contacted the corresponding author by email for this information. When authors did not respond we assumed that the number of centres corresponded to the number of orthopaedic centres reported in the affiliations of the article, and the number of surgeons was treated as missing.
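The fallback rule for missing context-of-care data can be expressed as a small function. This is a sketch under stated assumptions: the names `ContextOfCare` and `resolve_context` and all parameter names are hypothetical, and applying the rule separately to centres and surgeons is an interpretation of the paragraph above rather than a documented detail of the study.

```python
from typing import NamedTuple, Optional


class ContextOfCare(NamedTuple):
    centres: Optional[int]
    surgeons: Optional[int]  # None means the value is treated as missing


def resolve_context(reported_centres=None, reported_surgeons=None,
                    author_centres=None, author_surgeons=None,
                    affiliation_centres=None):
    """Prefer the value reported in the article, then the corresponding
    author's reply; fall back to the affiliation count for centres only."""
    if reported_centres is not None:
        centres = reported_centres
    elif author_centres is not None:
        centres = author_centres
    else:
        # Assumption stated in the text: count the orthopaedic centres
        # listed in the article's affiliations.
        centres = affiliation_centres

    if reported_surgeons is not None:
        surgeons = reported_surgeons
    else:
        # No fallback for surgeons: stays None (missing) without an author reply.
        surgeons = author_surgeons
    return ContextOfCare(centres, surgeons)


# Hypothetical article: nothing reported, no author reply, one affiliated centre.
print(resolve_context(affiliation_centres=1))
```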
Categorical variables are described with frequencies and percentages, and quantitative variables with medians (interquartile ranges).
To compare the reporting of applicability of the results of the two study types, we calculated the percentage of applicability items reported, from 0 (no item reported) to 100 (all items reported), for each trial for the three domains of patients, experimental intervention, and context of care. We compared the percentage of applicability items reported for randomised controlled trials and non-randomised studies by an unpaired Wilcoxon (rank sum) test. The level of significance was set at P<0.05.
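The two calculations described above can be sketched as follows. The study's own analysis was done in R and its code is not reproduced here; this Python version is a stand-in that uses the normal approximation to the rank sum statistic with a tie correction and no continuity correction, so its P values may differ slightly from those of other software. The function names and the example data are hypothetical.

```python
import math


def percent_reported(item_flags):
    """Percentage (0 to 100) of essential items reported in one article."""
    return 100.0 * sum(item_flags) / len(item_flags)


def rank_sum_test(x, y):
    """Unpaired Wilcoxon rank sum test (two sided, normal approximation).

    Returns (U statistic for x, P value). Ties receive midranks and the
    variance is corrected for ties; no continuity correction is applied.
    """
    n1, n2 = len(x), len(y)
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * len(pooled)
    tie_term = 0
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        midrank = (i + 1 + j) / 2  # mean of the 1-based ranks i+1 .. j
        for k in range(i, j):
            ranks[k] = midrank
        t = j - i
        tie_term += t ** 3 - t
        i = j
    w = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u = w - n1 * (n1 + 1) / 2
    n = n1 + n2
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:  # every observation identical: no evidence of a difference
        return u, 1.0
    z = (u - mean_u) / math.sqrt(var_u)
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return u, p_value


# Hypothetical per-article percentages for the two study designs.
non_randomised = [25.0, 38.0, 38.0, 50.0, 63.0]
randomised = [38.0, 44.0, 50.0, 50.0, 63.0]
u_stat, p = rank_sum_test(non_randomised, randomised)
print(u_stat, round(p, 2), p < 0.05)
```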
Applicability assessments are described with frequencies and percentages. All data analyses were done using the R 2.8.0 software package (R Foundation for Statistical Computing, Vienna, Austria).
The search strategy generated 207 articles: 84 were eligible and appraised (fig 1). Thirty eight studies were randomised controlled trials and 46 were non-randomised studies. Thirty four studies assessed total hip arthroplasty and 50 total knee arthroplasty. The experimental procedure was a minimally invasive one in 32 studies, a computer assisted navigation technique in 42, and a computer assisted navigation technique associated with a minimally invasive procedure in 10.
Characteristics of selected studies
Table 1 details the general characteristics of the selected articles. Articles were published between 2001 and 2008, with the highest number of publications in 2005 and 2006.
The median (interquartile range) number of patients for non-randomised studies was 92 (60-131) and for randomised controlled trials was 90 (60-120). Thirty (65%) non-randomised studies were controlled cohort studies, 14 (30%) historically controlled studies, and 2 (4%) case-control studies. Eleven (37%) controlled cohort studies were clearly reported as being prospective. The comparator was systematically another surgical procedure.
A primary outcome was clearly reported for 39% of non-randomised studies (n=18) and 61% of randomised controlled trials (n=23) and, when reported, was radiographic in 93% of reports (n=37). The duration of follow-up, when reported, was no longer than one year in 84% (56/67) of the articles. Adverse events were reported for 70% of non-randomised studies (n=32) and 61% of randomised controlled trials (n=23). A definition of severe adverse events was given in the reports of only three non-randomised studies (7%) and two randomised controlled trials (5%).
Survey of surgeons
Of the 512 experts contacted by email, 87 completed the web based survey. Respondents who were not orthopaedic surgeons (n=10) were excluded (see web extra appendix 6 for the flow of experts and web extra appendix 7 for a description of these participants). The results of the survey are summarised in web extra appendix 8. Eight items were classified as essential for patient characteristics and four for context of care (centres and surgeons). These items did not differ according to the procedure evaluated. Essential items describing the intervention varied by procedure: seven generic items were selected for all of the interventions (minimally invasive and computer navigated total hip arthroplasty and total knee arthroplasty) and nine items were selected specifically for minimally invasive procedures and seven for navigated procedures.
Reporting of essential applicability items
The median proportion (interquartile range) of essential items reported for non-randomised studies and for randomised controlled trials was 38% (25-63%) and 44% (38-45%; P=0.60) for the description of participants. For the description of the experimental intervention it was 71% (43-86%) and 71% (57-86%; P=0.68) for the generic items and 50% (33-75%) and 67% (49-75%; P=0.27) for the specific items.
The median proportion (interquartile range) of essential items describing the context of care for non-randomised studies and for randomised controlled trials was 38% (25-50%) and 50% (25-50%; P=0.17). The number of centres was reported for 35% of non-randomised studies (n=16) and 50% of randomised controlled trials (n=19). Details such as volume of care in the centre were reported for only two non-randomised studies (4%). Details on surgeons’ expertise were reported for 35% of non-randomised studies (n=16) and 50% of randomised controlled trials (n=19). When reported, these details described years of practice for only one non-randomised comparative study (6%) and one randomised controlled trial (5%) and the number of experimental interventions carried out before the start of the study for 50% of non-randomised studies (n=8) and 55% of randomised controlled trials (n=11). For 38% of non-randomised studies (n=6) and 30% of randomised controlled trials (n=6) the surgeons were reported as “experts,” without any further detail.
Finally, issues with applicability were discussed in the discussion section of 22% of the articles of non-randomised studies (n=10) and 21% of those of randomised controlled trials (n=8).
Context of care
The context of care was evaluated by comparing the number of surgeons and centres involved in the randomised controlled trials and non-randomised studies. After we contacted the corresponding authors, the number of participating surgeons was known for 81% of the studies (n=68). Data on the number of centres were available for 58 studies (69%). For the remaining 26 studies, the number of orthopaedic centres reported in the affiliations was considered.
Figure 3 describes the reported and actual number of participating centres and surgeons in the trials. The actual number of centres did not differ according to study design because most trials were carried out in only one centre: 82% of non-randomised studies (n=37) and 87% of randomised controlled trials (n=33). The actual number of participating surgeons was comparable between the two study types. One or two surgeons participated in 80% of the non-randomised studies (n=37) and in 82% of the randomised controlled trials (n=31).
Our appraisal of 84 articles of non-randomised studies (n=46) and randomised controlled trials (n=38) that assessed four orthopaedic interventions (total hip arthroplasty or total knee arthroplasty carried out by a minimally invasive approach or computer assisted navigation system) does not support the hypothesis that, in general, results of non-randomised studies have better applicability than those of randomised controlled trials. The reporting of items judged “essential” for determining applicability did not differ between the two study designs. Important components of the intervention itself, such as protocols for preoperative care or management of pain, were rarely described. The proxy used to evaluate the applicability related to the context of care—the number of surgeons and centres—was similar between the trial types as well. Other factors potentially affecting actual applicability, such as the relevance of a radiographic primary outcome and duration of follow-up of less than one year, also did not differ by study design and limited the applicability of the results of the selected studies. Our results suggest that some reports of both non-randomised studies and randomised controlled trials may be of uncertain value to surgeons, researchers, systematic reviewers, and decision makers.
These results inevitably prompt the question of why. Controlled studies in other healthcare specialties vary on a spectrum of “pragmatism” or “efficacy/effectiveness,” addressing research questions focused on clinical or policy decisions or mechanisms of action.19 Are our findings evidence of a general lack of interest among surgeons in pragmatic questions, or were the interventions reviewed here too new? Some examples of nationally representative studies on surgical outcomes, such as those involving national arthroplasty registers, may provide useful data, but such studies make up a tiny fraction of the surgical literature. In fact, no published articles evaluating the selected procedures used data from a national register or similar database with wide coverage. Furthermore, studies carried out in other specialties highlight the challenges of interpreting the findings of non-randomised studies involving a nationally representative sample.20 21
How could we improve the situation? Applicability should be considered, as internal validity usually is, at different steps of a trial: in the protocol, when deciding the eligibility criteria for the centres, surgeons, and patients, and when reporting the trial results by following the CONSORT statement,22 particularly the extension for non-pharmacological treatments.17 To tackle the question of the impact of the surgical learning curve, for instance, one author recommended that “surgical trials should report explicitly and informatively on the prior expertise of the participating surgeons.”23
Our results on applicability reporting are consistent with those for other trials, highlighting that authors pay insufficient attention to applicability in their published articles of randomised controlled trials.8 24 However, to our knowledge this is the first study to compare the reporting of applicability data in reports of randomised controlled trials and non-randomised studies. Furthermore, we took into account that applicability criteria vary depending on the procedure evaluated. In our study, orthopaedic surgeons contributed to the selection of relevant applicability items for each of the four interventions.
Limitations of the study
The study has several limitations. Firstly, we focused on studies assessing the specific procedures of total hip arthroplasty and total knee arthroplasty, and these results should be confirmed in other surgical areas. However, we chose recent interventions that are increasingly being used in clinical practice. This choice also allowed for a detailed and precise assessment of applicability. Secondly, we focused on the reporting of essential applicability information for randomised controlled trials and non-randomised studies and evaluated the actual applicability of the results of the studies mainly from data related to the context of care. We were unable to compare the representativeness of patients in reports of both study designs because essential information was often missing. Thirdly, for practical reasons we evaluated the context of care by focusing only on the centres and surgeons. Finally, we assumed that involvement of more centres and surgeons implies better applicability of results, but this assumption is not true when all participating centres and surgeons have high expertise. However, our results highlighted that most trials involved only one centre and one or two surgeons, and the applicability of results from such trials is probably debatable.
In conclusion, the study highlights the poor reporting of data related to the applicability and generalisability of results in published articles of both non-randomised studies and randomised controlled trials. Furthermore, the appraisal of the applicability of results from the two trial types did not differ in terms of number and expertise of centres and surgeons involved and the reproducibility of the intervention. From these articles we were unable to conclude whether the patients who participated were representative. The results of this study need confirmation in other disciplines.
What is already known on this topic
In the specialty of surgery, results from randomised controlled trials are criticised for having poor applicability to clinical practice
This argument is often used to justify the use of observational studies rather than randomised controlled trials
What this study adds
Our results do not support the hypothesis that results from non-randomised studies of surgery have better applicability than those from randomised controlled trials
The reporting of applicability data was poor with both designs
Both study types were mainly single centre studies, with one or two participating surgeons
We thank the experts who completed the online survey: J N Argenson, J N Auyeung, T Baad-Hansen, D L Back, A R Barrett, M Bercovy, D Biau, R Bourne, P Boyer, K J Bozic, I J Brenkel, J L Briard, J Bruns, M Buttaro, P Calas, P Cartier, I B De Groot, F H De Man, C Delaunay, L Descamps, R Eisele, R H Emerson Jr, S A Ender, J A Epinette, M C Forster, F Frihagen, R Gandhi, L E Gayet, F Genet, A Gonzalez Della Valle, W L Griffin, R A Hall, M Hamadouche, D Hannouche, D Hernandez-Vaquero, B E Heyworth, C Hulet, C A Jacobs, P K Jaiswal, J Y Jenny, T Judet, R L Kane, V Karatosun, L Kerboull, Y S Kim, S Kohler, P Kort, J M Laffosse, G Lecerf, E A Lingard, S J MacDonald, O M Mahoney, A Martin, D Matlock, T Matsumoto, C W McBryde, H Mizu-Uchi, F D Naal, J M Naylor, R G Nelissen, M A Newman, R Nizard, V Oztuna, R Padua, J Parvizi, M K Petersen, A Phadnis, R Philippot, F Picard, P Piriou, R W Poolman, S Procyk, T A Radcliff, O Robertsson, A R Rochwerger, A Roth, O Sadr Azodi, D Saragaglia, A P Schulz, R J Sierra, J P Simon, M Soubeyrand, L M Specht, M Stevens, F Thorey, I Van den Akker-Scheek, C Vielpeau, R Wagenmakers, P Weinrauch, H Wu, P J Yates, and F Zadegan; Laura Heraty (BioMedEditing East York, ON Canada) who edited the manuscript for submission; and Samira Laribi who designed the website for the survey.
Contributors: LP, IB, BR, and PR conceived and designed the study. LP and IB acquired the data. All authors analysed and interpreted the data. LP drafted the manuscript. IB, BR, RN, and PR critically revised the manuscript for important intellectual content. PR provided administrative, technical, and material support. All authors saw and approved the final manuscript. LP, IB, and PR are guarantors, had full access to the data in the study, and take responsibility for the integrity of the data and the accuracy of the data analysis.
Funding: IB is supported by a grant from the Société Française de Rhumatologie and the Lavoisier Program (Ministère des Affaires étrangères et européennes). The funders were not involved in the conduct of the study or preparation of the manuscript.
Competing interests: None declared.
Ethical approval: Not required.
Data sharing: The technical appendix, statistical code, and dataset are available from the corresponding author.
This is an open-access article distributed under the terms of the Creative Commons Attribution Non-commercial License, which permits use, distribution, and reproduction in any medium, provided the original work is properly cited, the use is non commercial and is otherwise in compliance with the license. See: http://creativecommons.org/licenses/by-nc/2.0/ and http://creativecommons.org/licenses/by-nc/2.0/legalcode.