
Education And Debate

Randomised trials in surgery: problems and possible solutions

BMJ 2002; 324 doi: (Published 15 June 2002) Cite this as: BMJ 2002;324:1448
  1. Peter McCulloch (petermcculloch{at}, senior lecturer in surgery (a),
  2. Irving Taylor, professor of surgery (b),
  3. Mitsuru Sasako, professor of surgery (c),
  4. Bryony Lovett, lecturer in surgery (d),
  5. Damian Griffin, clinical reader (e)
  1. a Academic Unit of Surgery, University of Liverpool, Clinical Sciences Centre, University Hospital Aintree, Liverpool L9 7AL
  2. b Department of Surgery, Royal Free and University College Medical School, Charles Bell House, London W1W 7EJ
  3. c Gastric Surgery Division, National Cancer Centre Hospital, Tsukiji, 5-1-1 Chuo-Ku, Tokyo, Japan
  4. d Basildon Hospital, Nethermayne, Basildon SS16 5NL
  5. e Nuffield Department of Orthopaedic Surgery, Orthopaedic Centre, Oxford OX3 7LD
  Correspondence to: P McCulloch, Academic Unit of Surgery, University of Liverpool, Clinical Sciences Centre, University Hospital Aintree, Long Lane, Liverpool L9 7AL

    The quality and quantity of randomised trials of surgical techniques are acknowledged to be limited. According to Peter McCulloch and colleagues, however, some aspects of surgery present special difficulties for randomised trials. In this article they analyse these difficulties and propose some solutions for improving the standards of clinical research in surgery.

    The improvement in the quality of clinical research in the past decade is to be welcomed, but it carries its own dangers. Some have extrapolated the advantages of the randomised controlled trial (RCT) into the dogma that it is the only valid method for comparing treatments,1 ignoring the difficulties that have hampered the use of RCTs in some disciplines. The RCT has theoretical advantages over other study designs, but experimental studies comparing treatment effect estimates in randomised and non-randomised studies have not consistently confirmed this, 2 3 w1-w3 and the superiority of RCTs should not therefore be accepted as axiomatic.

    Small, poorly conducted RCTs are more likely to result when RCTs are difficult to conduct, and these may then be misleading because their design affords them unwarranted credibility. Surgery seems to be such an area. Until recently, most studies of operations were retrospective case series, with RCTs accounting for less than 10% of the total.w4-w6 RCTs declined from 14% of research articles in the British Journal of Surgery in 1985 to 5% in 1992. 4 5 Treatments in general surgery are half as likely to be based on RCT evidence as treatments in internal medicine. 6 7 Methodological quality was poor in 56% of RCTs comparing cancer surgery techniques.8 Only 58% of these studies described satisfactory randomisation, and few significant outcome differences were found, probably because of type II statistical errors.
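
    The type II error problem can be made concrete with a rough power calculation. The sketch below uses the standard normal approximation for comparing two proportions. The complication rates (30% v 20%) and the arm sizes are hypothetical values chosen for illustration, not figures from the studies cited above, and the function name is illustrative.

```python
from math import sqrt
from statistics import NormalDist


def power_two_proportions(p1, p2, n_per_arm, alpha=0.05):
    """Approximate power of a two-sided z-test comparing two proportions.

    Normal approximation; the negligible contribution of the opposite
    tail is ignored.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    p_bar = (p1 + p2) / 2
    # Standard error under the null hypothesis (pooled) and under the alternative.
    se_null = sqrt(2 * p_bar * (1 - p_bar) / n_per_arm)
    se_alt = sqrt((p1 * (1 - p1) + p2 * (1 - p2)) / n_per_arm)
    return z.cdf((abs(p1 - p2) - z_alpha * se_null) / se_alt)


# Hypothetical example: complication rates of 30% v 20%.
for n in (50, 300):
    print(n, round(power_two_proportions(0.30, 0.20, n), 2))
```

    Under these assumptions a trial with 50 patients per arm would detect the difference only about one time in five, whereas 300 per arm gives roughly 80% power.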

    Why is surgery so deficient? Some of the obstacles militate against all scientific studies, but in view of previous specific criticism,w7 we focus on randomised trials and try to evaluate the problems and suggest potential solutions.

    Summary points

    Research in surgery is disadvantaged by the limited quality and quantity of randomised trials of surgical techniques

    Some aspects of surgery present special difficulties for randomised trials

    The existence and nature of these difficulties need to be recognised, with strategies developed to overcome them

    A proposed strategy involves the integration of modified randomised trials with prospective audit and quality control studies

    Obstacles to randomised trials in surgery

    Historical, structural, and cultural

    History


    History did not favour the validation of surgery by RCTs. After the invention of anaesthesia and antiseptic techniques, surgical treatments were rapidly developed for many previously untreatable conditions. Many current operations were therefore introduced well before randomised trials became established in medicine—unlike most modern drugs. Once a treatment is accepted as standard, testing it against placebo becomes difficult. Rarely, treatment benefits are so obvious that a trial would clearly be unethical,9 but often lack of equipoise (see below) simply prevents studies. This problem applies equally to old drugs—for example, digoxin—which are also difficult to study in RCTs using placebo. For fields such as cardiac surgery, transplantation, orthopaedics, and neurosurgery, however, which have developed rapidly since 1950, surgeons cannot fall back on history to explain the lack of rigour in surgical research.

    Commercial competition and personal prestige

    Doctors can be tempted to ignore evidence that threatens their personal interests. Objectivity about procedures central to a surgeon's reputation is difficult, and RCTs may seem threatening. Private sector competition may affect surgeons particularly strongly, and it arguably influenced the introduction of laparoscopic cholecystectomy. A 1994 consensus conference10 quoted many reports of increased bile duct injuries and only two RCTs. 11 12 The benefits that these trials showed were not overwhelming when set against this evidence of possible harm, but further RCTs were declared infeasible because the technique was already so widespread. Surgeons' eagerness to learn the operation seemed related more to commercial concerns than to concern for patients.

    Surgeons' equipoise

    Other doctors regard surgeons as making up in self confidence for what they lack in patience, a stereotype containing a kernel of truth. Career surgeons are selected for traits that include comfort with making important clinical decisions quickly with incomplete information. This quality, required for decisive action during operations, may make it difficult for them to be consciously uncertain which of two treatments is better. This state of equipoise, however, is a prerequisite for performing RCTs.

    Box 1: Problems of performing randomised trials in surgery

    • Structural, cultural, and psychological resistance exists to the use of randomisation

    • The inherent variability of surgery requires precise definition of interventions and close monitoring of quality

    • Surgical learning curves cause difficulty in timing and performing randomised trials of new techniques

    • Comparisons of surgical and non-surgical treatments with greatly different risks cause difficulties with patients' equipoise

    • Rare conditions and urgent and life threatening situations cause difficulties with recruitment, consent, and randomisation

    Lack of funding, infrastructure, and experience of data collection

    These are real and major problems for surgical trials.w8 The difficulty is partly self inflicted as funding bodies are influenced by the poor quality of much previous surgical research.w9

    Lack of education in clinical epidemiology

    Subjectively, surgeons' knowledge of clinical epidemiology remains poor despite relevant publications in surgical journals,w10-w17 but we have no objective evidence that they receive less specific education than other doctors.13 w15 Surgeons recruit patients for cancer chemotherapy trials14 w18 but less readily for trials of surgical technique. Whether lack of education can explain this is unclear.

    Rare conditions and life threatening and urgent situations

    Emergency surgery often occurs outside normal working hours and involves urgent lifesaving treatment, making consent and randomisation difficult. Uncommon conditions are difficult to investigate when accrual of patients takes over two years.13

    Special technical problems

    The learning curve

    Some authors suggest that RCTs of new operations should begin with the first patient.15 w19 Operations, however, are complex procedures, and quality in performance requires frequent repetition over time. Learning curves of similar lengths are reported for disparate operations. 16 17 w20 During the learning curve, errors and adverse outcomes are more likely. Randomising between a familiar and an unfamiliar operation therefore introduces bias against the latter, as observed for gastrectomy.18 This problem for surgical RCTs has few parallels in drug trials.
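
    The bias can be illustrated with a toy model. The sketch below assumes, purely for illustration, that a surgeon's complication risk with a new operation decays exponentially towards a plateau equal to the risk of the familiar operation. The functional form and every number are assumptions, not estimates from the studies cited.

```python
from math import exp


def complication_risk(case_number, plateau, initial_excess, cases_to_learn):
    """Hypothetical risk for a surgeon's nth case (0-based) of a new operation."""
    return plateau + initial_excess * exp(-case_number / cases_to_learn)


def mean_risk_over_trial(n_cases, plateau, initial_excess, cases_to_learn):
    """Average risk across the first n_cases performed by one surgeon."""
    risks = [
        complication_risk(k, plateau, initial_excess, cases_to_learn)
        for k in range(n_cases)
    ]
    return sum(risks) / n_cases


# Assumed values: plateau risk 10% (same as the familiar operation),
# 15 percentage points of excess risk at the first case, decaying over ~10 cases.
print(round(mean_risk_over_trial(30, 0.10, 0.15, 10), 3))
print(round(mean_risk_over_trial(300, 0.10, 0.15, 10), 3))
```

    In this hypothetical setting the two operations are equally safe once learnt, yet a trial that enrols each surgeon's first 30 cases would observe an average risk of about 15% for the new operation against 10% for the familiar one. The gap shrinks as more post-learning cases are included.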


    Definition of intervention

    Variations on an operation are common and may influence success rates. When comparing operations, clear definitions are therefore needed of the limits on acceptable technical variation. A standard description may be necessary, proscribing all modifications. If definitions are not precise, the treatments delivered may overlap, whereas in drug trials, treatments are usually simple to define exactly.

    Quality control monitoring

    The technical quality of operations undoubtedly affects outcome. Poor quality surgery represents failure to deliver the intended treatment, causing a difference between efficacy and effectiveness. Trials then measure deliverability, not efficacy.w21 Quality control failures may narrow important differences in the surgery received—for example, for gastric cancer 19 20 —and may influence outcomes.w22 w23 Defining and enforcing minimum quality standards may be difficult for surgical trials.

    Development versus research

    RCTs consume substantial resources and are therefore not justified for some questions about small modifications to treatments. Surgical technique typically progresses via such modifications, which individually are unlikely to produce detectable benefits, but which collectively may do so. During the historical progression through hand washing via the use of antiseptics to the aseptic surgical environment, the change in morbidity from surgical infection was huge, but the increment with each step was small enough to allow persistent scepticism.21 Small randomised trials of components of this progression showed no benefit.22w24 If a positive RCT were required before adopting each small improvement, most would be rejected, and progress would be slowed. RCTs are appropriate where a clear, clinically important choice exists between contrasting alternatives. For smaller changes, an industrial paradigm may be needed.

    Patients' equipoise

    Three types of RCT are commonly described as “surgical.” Type 1 trials—standard RCTs comparing medical treatments in surgical patients—account for 75% of “surgical trials.”23 Type 2 trials—comparing surgical techniques—pose the problems described above. Type 3 trials—comparing surgical and non-surgical treatments—pose particular difficulties with the equipoise of patientsw25: patients often reject RCTs because they do not wish their treatment to be decided by chance.w26 Type 3 trials increase this discomfort because the adverse effects of the options often differ enormously and the surgical option is irreversible. Eighty two per cent of problems preventing type 3 trials are related to patients' equipoise.13 Examples of choices include aspirin versus carotid endarterectomy to prevent embolic stroke24 and goserelin versus castration for prostate cancer.25w27 Such trials may recruit slowly, or select an unusual subgroup of patients, making them impractical or their results difficult to generalise.w28


    Blinding

    Blinding is particularly difficult in surgical trials, although creative solutions, such as the use of standardised wound dressings, can succeed.w29 Only a third of the surgical trials examined by Solomon et al had adequate blinding of patients, surgeons, or both.23

    Proposed solutions

    History—A comprehensive review of the evidence base is needed to indicate areas warranting new trials of old techniques.

    Commercial competition and prestige may be less obstructive in a framework of comprehensive continuous performance evaluation (see below).

    Surgeons' equipoise, if confirmed, may need to be accommodated by including parallel, non-randomised, preference arms alongside RCTs.

    Lack of funding, infrastructure, and experience of data collection require a change to a culture of cooperation rather than competition. This would facilitate the creation of large groups to perform specific trials, thereby attracting funding and developing the infrastructure. This change would require support from bodies responsible for funding clinical research.

    Lack of education in clinical epidemiology needs to be investigated and if necessary corrected through the bodies responsible for postgraduate surgical education and training.

    Rare conditions and life threatening and urgent situations will always be challenging areas for RCTs, but they have been successfully studied in other disciplines.26 w30 Paediatric oncologists have illustrated the enormous value of cooperation through their success in trials on childhood leukaemia.27 w31

    The learning curve needs to be recognised and evaluated using appropriate statistical techniques.28 Trial methodology will need modification—for example, to show completion of the curve before beginning randomisation,w32 as in two recent trials. 29 30 In theory, patients could also be randomised not to operations but to surgeons, who would perform their operation of preference, although this option remains untested in practice.

    Definition of intervention and quality control monitoring— Precisely defined photographic or video evidence and/or pathological specimens could document the nature and quality of the treatment delivered, as in a recent trial of total mesorectal excision in rectal cancer.31 Norms for pre-trial success rates and complications could provide a basis for defining acceptable quality, making reliable surgical audit data essential for participation in RCTs.

    Development v research— Surgeons should adopt industrial quality assessment techniques to evaluate changes in technique where RCTs are inappropriate.32 The Japanese term “kaizen” defines an evaluative system akin to the classical audit loop.w33 Sequential approaches such as CUSUM33 and the “control curve”32 are also applicable to surgical innovation.
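
    As a minimal sketch of the sequential idea, the simplest form of CUSUM chart adds, for each consecutive operation, the difference between the observed outcome (1 for a failure, 0 for a success) and an acceptable failure rate. The acceptable rate and the alert boundary below are hypothetical; published surgical CUSUM methods derive their boundaries from prespecified error rates, which this sketch does not attempt.

```python
def cusum_path(outcomes, acceptable_rate):
    """Running sum of (observed failure - acceptable failure rate)."""
    total = 0.0
    path = []
    for failed in outcomes:
        total += (1 if failed else 0) - acceptable_rate
        path.append(total)
    return path


def first_alert(path, boundary):
    """1-based case number at which the sum first reaches the boundary, or None."""
    for case_number, value in enumerate(path, start=1):
        if value >= boundary:
            return case_number
    return None


# Hypothetical series: 1 = failure, 0 = success; acceptable failure rate 25%.
series = [0, 0, 1, 0, 1, 1, 0, 1]
path = cusum_path(series, acceptable_rate=0.25)
print(path)
print(first_alert(path, boundary=1.5))
```

    A path that stays flat or drifts downward indicates performance at or better than the acceptable rate; a sustained rise suggests a change worth auditing.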

    Patients' equipoise in type 3 trials may be helped by decision analysis techniquesw34 and carefully designed composite end pointsw35 to reflect the contrasting possible outcomes of trial arms.

    Blinding will always be difficult for surgical treatments,34 but blinded observers should be used routinely for evaluating outcomes.w36

    Proposed framework for clinical research in surgery

    This analysis of the problems shows why current practices are not working. We need a framework that reflects the difficulties of evaluation in surgery.



    Audit data collection

    The baseline for the scientific study of surgery is routine collection of comprehensive data about practice and outcomes. The culture and organisation necessary for this should permit easy participation in trials; where these are absent, trialists have to build the trial infrastructure and run the trial at the same time. Surgeons need the resources to record a meaningful audit dataset, which entails considerable investment in data acquisition and management.

    Continuous performance evaluation

    Systems for continuous quality control, using instruments such as CUSUM, CRAM or VLAD plots 33 35 36 or control curves32 should be used for the analysis of technical innovations. Indications of outcome changes from this surveillance should lead to an audit or kaizen assessment, using decision analysis techniques to determine whether an RCT is warranted.w37 Where it is not, continuing prospective data collection and regular re-evaluation using bayesian analysisw38 provide the best available data on outcome changes and allow reconsideration of the need for an RCT.
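
    The regular re-evaluation step could, for example, use a simple beta-binomial model. The sketch below is one possible approach rather than the method of the cited reference: it assumes a Beta prior on the complication rate, updates it with accumulating audit counts, and estimates by simulation the probability that the rate lies below a benchmark. The prior, the counts, and the benchmark are hypothetical.

```python
import random


def update_beta(prior_a, prior_b, events, patients):
    """Posterior Beta parameters after observing `events` in `patients`."""
    return prior_a + events, prior_b + (patients - events)


def prob_rate_below(a, b, benchmark, draws=20000, seed=0):
    """Monte Carlo estimate of P(rate < benchmark) under Beta(a, b)."""
    rng = random.Random(seed)  # fixed seed so the estimate is reproducible
    hits = sum(rng.betavariate(a, b) < benchmark for _ in range(draws))
    return hits / draws


# Hypothetical audit data: uniform Beta(1, 1) prior, then 3 complications in 40 cases.
a, b = update_beta(1, 1, events=3, patients=40)
print(a, b)
print(prob_rate_below(a, b, benchmark=0.15))
```

    Each new batch of audit data can be fed back through `update_beta`, so the posterior from one review becomes the prior for the next. A probability that stays indecisive over successive reviews is one possible signal that an RCT should be reconsidered.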

    Conduct of RCTs

    When RCTs are necessary, they should routinely be preceded by preliminary phase 2S (phase 2 surgical) studies. These would develop satisfactory definition criteria for the procedure, test measures of surgical quality, define suitable end points, estimate the required sample size, and analyse the learning curve of participants. Such studies would reduce the problems of timing surgical RCTs, and randomisation could be introduced early using “tracker” designs if desired.w39 During randomised data entry, continuous quality control should be linked to preplanned interim analyses by the trial review committee and appropriate stopping rules. Objective validation of quality should evaluate images, pathological specimens, and outcome data against criteria drawn up in the phase 2S study. Parallel preference arms may be used to improve overall power and evaluate generalisability. For type 3 trials, end point design and decision analysis tools to help patients understand their choices may be important.
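
    The sample size step might look like the following sketch, which applies the usual normal approximation formula for comparing two proportions to event rates estimated in a phase 2S study. The rates, significance level, and power shown are hypothetical, and no continuity correction is applied.

```python
from math import ceil, sqrt
from statistics import NormalDist


def sample_size_per_arm(p_control, p_new, alpha=0.05, power=0.80):
    """Patients needed in each arm to compare two proportions (two-sided test)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    p_bar = (p_control + p_new) / 2
    numerator = (
        z_alpha * sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * sqrt(p_control * (1 - p_control) + p_new * (1 - p_new))
    ) ** 2
    return ceil(numerator / (p_control - p_new) ** 2)


# Hypothetical phase 2S estimates: 30% events with the standard operation,
# 20% (or, more cautiously, 25%) with the new one.
print(sample_size_per_arm(0.30, 0.20))
print(sample_size_per_arm(0.30, 0.25))
```

    With the illustrative rates of 30% v 20%, the formula gives just under 300 patients per arm, and halving the expected difference roughly quadruples the requirement. This is one reason why the estimate of effect size from the preliminary phase matters so much.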

    Other sources of evidence

    Historically, the surgical literature is poor in RCTs. Meta-analysis of non-randomised evidence should therefore be used wherever appropriate. Where RCTs are difficult for sound reasons, prospective non-randomised designs that minimise known biases should be considered sympathetically by journals and funding bodies.


    Conclusions

    The substantial obstacles to RCTs of surgical techniques should be recognised. Alternative methods of studying operations should be based on comprehensive prospective audit data. Where RCTs are appropriate they require attention to the issues of the learning curve, intervention definition, and quality control; a preliminary non-randomised phase is also recommended.

    Box 2: Suggestions for progress in surgical research

    • Detailed prospective “audit” data collection is essential for surgical research

    • Continuous quality control techniques should be used to help determine whether randomised trials are appropriate

    • Larger randomised trials are needed, requiring better cooperation

    • Learning curves and variations in technique and in quality of surgery must be measured and controlled

    • Trials should incorporate a non-randomised initial phase to permit these evaluations, determine suitable end points, and allow sample size calculations

    • The need for study types other than randomised trials should be recognised


    This work was partly inspired by interactions with members of the Cochrane Non-randomised Studies Methodology Group and by the activities of its surgical subgroup. We thank Laurent Audige and Barney Reeves in particular for their helpful criticisms. The final article is the responsibility of the authors and not of the surgical subgroup.


    • Funding: None.

    • Competing interests: PMcC and DG are members of the Cochrane Non-randomised Studies Methodology Group and its surgical subgroup. PMcC is a member of the Centre for Evidence Based Medicine and is paid to facilitate at its Oxford teaching courses once a year.

    • References cited in the text with the prefix “w” are available on

