IDEAL framework for surgical innovation 2: observational studies in the exploration and assessment stages

BMJ 2013; 346 doi: (Published 18 June 2013)
Cite this as: BMJ 2013;346:f3011
  1. Patrick L Ergina, assistant professor of surgery (1, 2),
  2. Jeffrey S Barkun, professor of surgery (3),
  3. Peter McCulloch, clinical reader in surgery (4),
  4. Jonathan A Cook, methodologist (5),
  5. Douglas G Altman, director (6)
  6. On behalf of the IDEAL group
  1. Cardiothoracic Surgery Division, McGill University Health Centre, Royal Victoria Hospital, Montreal, Quebec, Canada H3A 1A1
  2. Oxford International Programme in Evidence-Based Health Care, University of Oxford, Oxford, UK
  3. Department of Surgery, McGill University, Montreal, Canada
  4. Nuffield Department of Surgical Science, University of Oxford, UK
  5. Health Services Research Unit, University of Aberdeen, Aberdeen, UK
  6. Centre for Statistics in Medicine, University of Oxford, UK
  1. Correspondence to: P L Ergina patrick.ergina{at}
  • Accepted 15 March 2013

The IDEAL framework describes the stages of evaluation for surgical innovations. This paper considers the role of observational studies in the exploration and assessment stages. At the exploration stage, the surgical intervention is usually more widely used, and observational studies should collect prospective data from multiple surgeons, deal with factors such as case mix and learning, and prepare for a definitive evaluation at the next stage of assessment. Although a randomised controlled trial is preferable, a high quality observational study would be acceptable if a randomised trial is not feasible or, on rare occasions, deemed unnecessary.


The evaluation of innovations, from initial idea to accepted practice, has been less orderly in surgery and other interventional therapies than in clinical pharmacology. The IDEAL framework and recommendations for surgical innovation have been designed to describe the stages of evaluation for these interventional therapies (idea, development, exploration, assessment, and long term study), and to highlight the study designs and reporting standards that are likely to prove most useful at each stage.1 2 The first two IDEAL stages are covered in the first paper in this series.3 This second article focuses on the IDEAL recommendations for the use of observational studies in the exploration and assessment stages, and discusses the options for observational study designs and reporting protocols (box 1), using examples of surgical innovations. The final paper in the series covers the undertaking of a definitive randomised controlled trial, mainly in the assessment stage, as well as the long term stage.4

Box 1: Recommendations for observational studies at stages 2b (exploration) and 3 (assessment)

  • Observational studies should generally be prospective and have a protocol

  • A range of outcomes should be collected using standardised definitions

  • Observational studies that are uncontrolled (for example, those based on registry and routine data collection) should be diagnosis based rather than procedure based whenever possible

  • Important patient risk factors and variations in the interventions should be explored

  • Studies should record and report surgeon experience (including any specific training received). Where possible, the effect of skill differences and learning should be assessed using appropriate data analysis

  • Prospective, collaborative observational studies should be designed with a definite evaluation in mind (preferably a randomised controlled trial)

  • Definitive observational studies should use a quasi-experimental study design: protocol driven, controlled studies with standardised eligibility and prospective data collection

  • Possible designs include non-randomised controlled trials and interrupted time series

  • Key patient and centre characteristics likely to confound the analysis should be considered before the study is conducted, and appropriate data should be collected; this facilitates assessment and adjustment of the case mix, and helps matching to control for potential confounding

Reaching the exploration stage (IDEAL stage 2b)

By the exploration stage, the innovation is usually already practised by many surgeons on an increasing number of less carefully selected patients. Promising evidence of safety and beneficial short term outcomes, without unacceptable complications, will have been generated, or further development would have been halted. Under the IDEAL framework, use of retrospective studies should be limited to hypothesis generation in the earliest stages. Typically, the early evaluation in the development stage (2a) will use small observational studies without contemporaneous comparison groups in highly selected cohorts of patients. The exploration stage (2b) offers the opportunity to obtain higher quality evidence in a more representative patient population and to deal with factors that could hinder the conduct of a proper methodological evaluation. Although developmental refinement of the intervention will probably not cease completely by this point, its adoption by multiple surgeons across different sites will increase variation in the patient case mix, driven by surgeons’ practices and centre infrastructure and policies. One focus of studies in this stage should be to capture variation in practice. In addition, careful tabulation of patient characteristics could suggest potential covariates and confounders influencing outcomes.

Nature and challenges of the exploration stage: preparing for a definitive evaluation

In 1987, Martin Buxton observed that “It’s always too early (to do a randomised trial) until, unfortunately, it’s suddenly too late.”5 Past innovations (for example, laparoscopic procedures) show that the exploration stage is often the “tipping point” of a surgical innovation, as described by Everett Rogers, where adopters’ characteristics act as drivers or barriers (figs 1 and 2).6 Factors such as whether the technique is too complex or too onerous to learn, and the strength of physician or patient preferences, might critically affect its adoption.7 This point could also be described as a time of “clinical equipoise,” because further exponential adoption of this innovation by “early majority” and “late majority” adopters is consistent with a conviction of likely efficacy (for example, trends in diffusion of laparoscopic surgery8). It is at this stage that changes in regulatory structure might have the most profound effects in promoting randomised controlled trials in surgery (for example, approval requirements from the US Food and Drug Administration for drug trial phases for proof of safety and efficacy).

Fig 1 Theoretical adoption curve showing the different stages (according to adopter type) in the diffusion of a surgical innovation6

Fig 2 Example of surgical innovation: laparoscopic procedure adoption.8 Reproduced from reference 8 with permission. Data are percentage of operations carried out using a laparoscopic approach in 1989-2003, from the Nationwide Inpatient Sample, a nationally representative annual sample of hospital admissions in the United States

Several factors are needed to facilitate a definitive evaluation (preferably a randomised controlled trial). These include gathering practical information and fully evaluating the effect of the innovation (benefits and harms) that earlier evaluations would be ill equipped to represent. In the meantime, the new intervention still needs appropriate evaluation, and the highest possible methodological quality of evidence from observational studies should be sought at this stage. Prospective (and possibly controlled) observational studies are the most likely design at stage 2b, and their value can be maximised by following four recommendations.

Firstly, observational studies should collect data for consecutive patients from multiple surgeons (and preferably multiple centres) undertaking the new intervention.9 Ideally, these studies would also be based on disease or diagnosis rather than solely on a new procedure, so that patients are included irrespective of subsequent treatment. Such a prospective design is a substantial advance on the usual single surgeon (or single centre) retrospective case series of selected patients undergoing a novel intervention, which has predominated in the surgical literature. There is evidence that retrospective designs can be more susceptible to bias than prospective designs when comparing randomised studies with non-randomised (including both prospective and retrospective) studies.10

A well conducted, large prospective observational study can form the basis for identifying important patient characteristics (the case mix), technical intervention variables (including potential co-interventions), and clinical outcomes of interest. A recent example of this type of collaboration is the International Registry of Acute Aortic Dissection, which uses this design for evidence to guide surgical, endovascular, and medical practice in acute aortic dissection (box 2).11 Data collection sponsored by professional organisations or the government can also help the conduct of later comparative observational studies (for example, the American College of Surgeons’ national programme for surgical quality improvement).12

Box 2: Example of observational study at exploration stage (2b)

International Registry of Acute Aortic Dissection study13
Clinical background at time of conduct
  • Aortic dissection is defined as a tear in the aorta

  • Acute aortic dissection (within 14 days of onset) needs urgent treatment because it is associated with increased mortality and morbidity

  • There are two types of aortic dissection (A and B), according to location

  • The effect of developments in surgical and medical management is uncertain

  • Observational study with registry data collection

  • Eligibility was based on diagnosis—all patients with an acute aortic dissection in 12 large referral centres (six countries)

  • Study included 464 patients between 1 January 1996 and 31 December 1998

  • Data were collected at presentation and from routine hospital records until discharge

  • Physical findings at presentation were diverse, and classic findings were often absent

  • For patients with a type A dissection, medical management was associated with a hospital mortality of 58%, compared with 26% mortality for surgical management

  • For patients with a type B dissection, medical management was associated with a hospital mortality of 11%, compared with 31% for surgical management

Secondly, studies at this stage should collect data for a range of outcomes using standardised definitions as well as key patient characteristics. Not only benefits but also harms should be assessed. Surgical research has focused considerably on the risks of short term harm (surgical complications), although with varying extensiveness and clarity. Standardised frameworks should be used—for example, the Dindo-Clavien system14 for postoperative complications.

Thirdly, surgical skill differences and associated learning curves can affect outcomes,15 and an evaluation of surgical variation and learning should be incorporated into study designs at this stage whenever possible.16 We recommend identifying relevant variables that can measure the effect of skill and learning (for example, surgeon or centre “volume,” operating times, quality measures, and appropriate outcomes), and analysing the data sequentially to assess learning, where possible.17 In a sequential statistical analysis of cases (using a cumulative sum control chart) early in the use of robotic beating heart surgery, researchers detected several complications needing further investigation.18
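The sequential analysis recommended above can be illustrated with a minimal Python sketch of a one-sided cumulative sum (CUSUM) over consecutive cases. This is not the method of the cited study: the outcome sequence, the acceptable complication rate, and the alert threshold are all hypothetical values chosen for illustration, and a real analysis would normally use risk adjusted weights and formally derived control limits.

```python
def cusum(outcomes, acceptable_rate, alert_threshold):
    """One-sided CUSUM over consecutive binary outcomes (1 = complication).

    Each complication raises the running sum by (1 - acceptable_rate);
    each uneventful case lowers it by acceptable_rate, never below zero.
    Returns the CUSUM path and the first case number at which the sum
    reaches the alert threshold (None if it never does).
    """
    running = 0.0
    path = []
    first_alert = None
    for case_number, outcome in enumerate(outcomes, start=1):
        running = max(0.0, running + outcome - acceptable_rate)
        path.append(running)
        if first_alert is None and running >= alert_threshold:
            first_alert = case_number
    return path, first_alert


# Hypothetical sequence of 20 consecutive cases for one surgeon.
outcomes = [0, 0, 1, 0, 0, 0, 1, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0]

# Assumed values: 10% acceptable complication rate, alert at a sum of 2.0.
path, first_alert = cusum(outcomes, acceptable_rate=0.1, alert_threshold=2.0)
print("First alert at case:", first_alert)
```

In this made-up series the cluster of complications around cases 7 to 11 triggers the alert, and the sum then drifts back towards zero as uneventful cases accumulate, which is the pattern one might expect during a learning period.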

Finally, studies should be conducted not necessarily to be definitive, but rather to prepare for a definitive evaluation study (preferably a randomised controlled trial). We suggest that professional or government bodies promote collaborative multicentre observational studies to evaluate important new interventions in their specialty, and incorporate their work as a strong foundation towards a definitive randomised controlled trial, as a secondary aim. Collected information can inform the timing of a trial (or another type of high quality, prospective study) with respect to equipoise, the key research question, and the appropriate study population. In addition, standardisation of the intervention, quality assurance techniques, and appropriate validated and measurable outcome measures can be assessed. Several successful examples of this approach to consensus development of a trial have been published.19 In some circumstances, a feasibility or pilot trial could be a natural intermediate step between a prospective observational study and a definitive randomised controlled trial,20 which can identify specific enablers and barriers.

Nature and challenges of assessment (IDEAL stage 3)

Use of observational studies as a definitive evaluation in lieu of a randomised controlled trial

Assessment is the stage in the IDEAL framework that requires a definitive evaluation, preferably a randomised controlled trial. On rare occasions, a randomised comparison might be considered unnecessary, owing to the magnitude of evidence from early evaluations (for example, the parachute scenario21). However, the risks of error due to bias are easily underestimated; therefore, as the magnitude of the treatment effect becomes smaller, one should be cautious about relying on such evidence. Criteria based on the signal to noise ratio have been proposed, suggesting that at least a 5-fold to 10-fold increase in the rate of improvement or cure is needed for a randomised controlled trial to be considered unnecessary.22 Few new interventions achieve such striking results, and most will need a randomised controlled trial to give confidence of their efficacy. A more likely reason for not using a randomised trial is that it is considered impractical, because of anticipated recruitment difficulties, a low likelihood of timely completion (for example, key technology becoming outdated by the end of the trial), or prohibitive expense. In this scenario, careful consideration of how to obtain observational data of the greatest value and quality is particularly important.

Any observational study conducted as an alternative to a high quality, randomised controlled trial should have as many positive design features of such a trial as possible.23 The study should have a prospective design with a detailed research protocol (ideally published at the outset) that clearly describes and defines a standardised intervention, the eligibility criteria and characteristics for patients being treated with the novel intervention, and the incorporation of quality control measures regarding delivery of the intervention. A unique circumstance when an observational study might be needed is if there is no viable alternative therapeutic option (for example, organ transplantation of the heart24 or liver25 for severe advanced stage disease). Many examples of prospective, uncontrolled observational studies have successfully provided evidence to guide practice in surgery.26

We consider in turn two quasi-experimental designs: non-randomised controlled trials and interrupted time series. These designs are methodologically stronger options than uncontrolled prospective observational studies,27 and could fulfil the role of a definitive evaluation when a randomised controlled trial is infeasible (box 3 shows an example).

Box 3: Example of observational study at assessment stage (3)

Minimally invasive versus open radical prostatectomy, with and without robotic assistance28
Clinical background at time of conduct
  • Open retropubic radical prostatectomy (RRP) is commonly used to treat prostate cancer

  • Use of minimally invasive radical prostatectomy (MIRP) with or without robotic assistance had been proposed as an alternative, and its use is increasing

  • Non-randomised controlled trial nested within data collection from a population based registry

  • Men diagnosed with prostate cancer as their first and only cancer were eligible

  • Men who underwent MIRP between 2002 and 2005 (n=1938) were compared with men who underwent RRP (n=6899) using a propensity score adjusted statistical analysis

  • Registry data were linked with US Medicare administrative data

  • Compared with RRP, MIRP resulted in a shorter length of stay, fewer strictures, and fewer 30 day respiratory and miscellaneous surgical complications, but a higher occurrence of incontinence, erectile dysfunction, and 30 day genitourinary complications

  • Use of additional postoperative cancer treatments was similar for both approaches

Non-randomised controlled trials

The preferred observational design is a non-randomised controlled trial: a study in which a cohort of patients undergoing a novel surgical intervention is compared with a concurrent control group undergoing standard treatment (standard surgical, medical, or no treatment). The study should incorporate the positive design features associated with a randomised controlled trial (for example, a prospective design and standardised data collection), with the exception of randomisation and blinding. Such studies provided the first convincing prospective evidence for benefits in coronary artery bypass surgery29 and laparoscopic cholecystectomy.30

In a randomised controlled trial, random allocation will probably achieve balance for known and unknown risk factors and minimise bias. Selection bias in a non-randomised controlled trial can be addressed by controlling for known risk factors (case mix) in the analysis. Relevant risk factors, how they should be documented, and potential for bias should be considered before starting data collection. Patient characteristics at study entry should be thoroughly documented. Treatment groups can, however, have different risk patterns at baseline, and this can lead to groups being less comparable after statistical adjustment (for example, regression) owing to the “constant risk fallacy,” where the assumption of constant risk across different organisations (for example, hospitals) may be inappropriate.31

Nevertheless, adjustment or matching for known prognostic factors should generally be done where possible (for example, using propensity scoring and corresponding analysis, which is an increasingly popular approach32), while recognising the limitations of such analyses. The estimated treatment effects can be assumed to be unbiased only if matching, stratified analyses, or regression techniques are sufficient to fully deal with risk imbalance; that is, when treatment allocation is ignorable in terms of baseline risk.33 A cautionary example of the importance of risk adjustment is the Veterans Affairs National Surgical Quality Improvement programme’s study of long term outcomes after bariatric surgery. The survival advantage observed in the unmatched cohort disappeared when researchers used propensity scoring to analyse a matched cohort.34 In some instances, results from randomised and observational studies have corresponded with respect to the magnitude of the effect size, although in general, observational studies have a greater risk of bias.35 36
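How an apparent treatment advantage can vanish after adjustment for case mix can be shown with a short Python sketch. It uses a stratified analysis (a Mantel-Haenszel pooled risk ratio) rather than propensity scoring, because stratification needs no model fitting and is easier to follow by hand. All counts are invented for illustration and do not come from the bariatric surgery study or any other real dataset; the class and function names are likewise hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Stratum:
    """Outcome counts for one baseline risk stratum (hypothetical data)."""
    label: str
    treated_events: int
    treated_total: int
    control_events: int
    control_total: int


def crude_risk_ratio(strata):
    """Risk ratio ignoring case mix: pool all patients, then compare risks."""
    treated_risk = (sum(s.treated_events for s in strata)
                    / sum(s.treated_total for s in strata))
    control_risk = (sum(s.control_events for s in strata)
                    / sum(s.control_total for s in strata))
    return treated_risk / control_risk


def mantel_haenszel_risk_ratio(strata):
    """Mantel-Haenszel risk ratio: compare within strata, then pool."""
    numerator = 0.0
    denominator = 0.0
    for s in strata:
        stratum_total = s.treated_total + s.control_total
        numerator += s.treated_events * s.control_total / stratum_total
        denominator += s.control_events * s.treated_total / stratum_total
    return numerator / denominator


# Invented cohort: the new procedure is offered mainly to low risk patients.
strata = [
    Stratum("low risk", treated_events=10, treated_total=200,
            control_events=5, control_total=100),
    Stratum("high risk", treated_events=10, treated_total=50,
            control_events=40, control_total=200),
]

crude = crude_risk_ratio(strata)
adjusted = mantel_haenszel_risk_ratio(strata)
print(f"Crude risk ratio: {crude:.2f}")
print(f"Stratified (Mantel-Haenszel) risk ratio: {adjusted:.2f}")
```

Within each stratum the risk of the outcome is identical in the two groups (5% in low risk patients, 20% in high risk patients), so the stratified estimate is 1.0, whereas the crude comparison suggests that the new procedure roughly halves the risk. Stratification of this kind deals only with the measured factor; unmeasured differences between groups remain a possible source of bias.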

Interrupted time series

The interrupted time series is an alternative quasi-experimental design for an observational study that could potentially be used at the assessment stage.37 The design uses a temporal rather than concurrent control group. A key outcome of interest (such as anastomotic leakage, graft failure, or death) is measured sequentially during a time period before the new intervention is introduced (that is, the interruption) and measured again during the same period afterwards.
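A minimal Python sketch of this design, under simplifying assumptions, fits one least squares line to the measurements before the interruption and another to those after it, then compares the two at the moment of interruption. The quarterly complication rates and the function names are hypothetical. The sketch ignores autocorrelation, seasonality, and any changes in case mix, all of which a real analysis would need to consider.

```python
def fit_line(times, values):
    """Ordinary least squares fit of values = intercept + slope * time."""
    n = len(times)
    if n < 2:
        raise ValueError("need at least two time points per segment")
    mean_t = sum(times) / n
    mean_y = sum(values) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times, values))
    slope = sxy / sxx
    return mean_y - slope * mean_t, slope


def interrupted_time_series(values, interruption_index):
    """Estimate the change in level and in trend at the interruption.

    values: outcome measured at equally spaced times 0, 1, 2, ...
    interruption_index: first time point after the intervention began.
    """
    times = list(range(len(values)))
    k = interruption_index
    pre_intercept, pre_slope = fit_line(times[:k], values[:k])
    post_intercept, post_slope = fit_line(times[k:], values[k:])
    # Level change: gap between the two fitted lines at the interruption.
    expected_without = pre_intercept + pre_slope * k
    observed_with = post_intercept + post_slope * k
    return {
        "level_change": observed_with - expected_without,
        "slope_change": post_slope - pre_slope,
    }


# Hypothetical quarterly complication rates (%): six quarters before and
# six quarters after a new technique was introduced.
rates = [10.0, 9.8, 9.6, 9.4, 9.2, 9.0,
         6.8, 6.6, 6.4, 6.2, 6.0, 5.8]

result = interrupted_time_series(rates, interruption_index=6)
print(result)
```

In this invented series the rate was already falling by 0.2 percentage points per quarter before the interruption, so a simple before and after comparison of means would overstate the effect. Extrapolating the pre-intervention trend attributes a drop of about 2 percentage points to the interruption, with no change in trend.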

Interrupted time series may be particularly suited to evaluating interventions that can be implemented at a centre with a long history of treating a particular disease (such as congenital heart disease). Although the design has been used to evaluate the effect of new interventions, it has not typically been used for evaluating clinical intervention efficacy. This design can be more susceptible to bias than non-randomised controlled trials, if not enough patient data are available to investigate and control risk factors. The design is particularly useful to assess secular trends in clinical care; that is, changes with time that could affect outcomes for all patients. The design should, whenever possible, be strengthened by adding a control group (that is, a parallel time series from a group where the new intervention is not used).

An interrupted time series has been used to track the effect of new surgical interventions (such as laparoscopic cholecystectomy on rates of bile duct injury38), evaluate quality of care (for example, in relation to rates of cardiac surgery mortality39), and estimate associated healthcare costs.40 Surgical studies are often complicated by the nature of complex interventions and potential co-intervention effects (for example, medical and anaesthesia treatment of surgical patients), and an interrupted time series could isolate these effects by tracking the onset of factors (that is, interruptions) other than the surgical intervention itself.


After the refinement and definition of the innovation in small studies with short endpoints for preliminary investigations at the IDEAL development stage, the evaluation of a new surgical intervention enters the exploration stage. At this stage, researchers should obtain the highest possible quality of evidence from prospective observational studies and prepare for a definitive evaluation (preferably with a randomised trial design). Key factors for the evaluation to address include defining patient prognostic variables, characterising and standardising the surgical intervention, assessing learning, and identifying appropriate outcomes. Studies should use clear standardised definitions of key concepts, and be designed to promote a definitive evaluation at the assessment stage. Observational studies at the exploration stage should be based on a disease or indication, rather than just on the new procedure or technology of interest.

A randomised controlled trial is the preferred study design for definitive evidence and should be used wherever possible. But a high quality observational study may be acceptable if a trial is not feasible or, on rare occasions, deemed unnecessary. Observational studies should be carefully designed and conducted to reduce the risk of bias as far as possible. In such cases, quasi-experimental study designs should be considered (in particular non-randomised controlled trials or controlled interrupted time series).

Summary points

  • Observational studies at IDEAL exploration stage (2b) should collect data for consecutive patients from multiple surgeons, including key case mix characteristics that are likely to influence outcome. Where appropriate, adjustment or matching should be used to control for potential confounding in the statistical analysis

  • Studies at the exploration stage should investigate the effect of technical parameters as well as skill and learning, assess the full range of outcomes, and prepare for a definitive (preferably randomised) evaluation at the assessment stage (3)

  • A non-randomised controlled trial or interrupted time series could fulfil the role of a definitive evaluation at the assessment stage if a randomised controlled trial is not feasible or, on rare occasions, considered unnecessary




  • Research Methods & Reporting, doi:10.1136/bmj.f3012
  • Research Methods & Reporting, doi:10.1136/bmj.f2820
  • The Health Services Research Unit is core funded by the Chief Scientist Office of the Scottish Government Health and Social Care Directorates. Views expressed are those of the authors and do not necessarily reflect the view of the Chief Scientist Office or the funders.

  • Contributors: JAC and PM formulated the IDEAL series to which this paper belongs. PE and JB wrote the first draft of this paper, and JC, DA, and PM all commented on the draft. All authors approved the final version, and PE is guarantor. The papers were informed by the IDEAL workshop in December 2010.

  • IDEAL workshop participants (December 2010): Doug Altman, Jeff Aronson, David Beard, Jane Blazeby, Bruce Campbell, Andrew Carr, Tammy Clifford, Jonathan Cook, Pierre Dagenais, Philipp Dahm, Peter Davidson, Hugh Davies, Markus Diener, Jonothan Earnshaw, Patrick Ergina, Shamiram Feinglass, Trish Groves, Sion Glyn-Jones, Muir Gray, Alison Halliday, Judith Hargreaves, Carl Heneghan, Jo Carol Hiatt, Sean Kehoe, Nicola Lennard, Georgios Lyratzopoulos, Guy Maddern, Danica Marinac-Dabic, Peter McCulloch, Jon Nicholl, Markus Ott, Art Sedrakyan, Dan Schaber, Frank Schuller, Bill Summerskill.

  • Funding: The IDEAL group meeting in December 2010 was funded by the National Institute for Health Research’s Health Technology Assessment programme, Johnson & Johnson, Medtronic and Zimmer (all unrestricted grants). JAC holds a Medical Research Council Methodology Fellowship (G1002292).

  • Competing interests: All authors have completed the ICMJE uniform disclosure form at and declare: PM received financial support from the National Institute for Health Research’s Health Technology Assessment programme, Johnson & Johnson, Medtronic, and Zimmer for the IDEAL collaboration and for a workshop; no other financial relationships with any organisations that might have an interest in the submitted work in the previous three years; no other relationships or activities that could appear to have influenced the submitted work.

  • Provenance and peer review: Not commissioned; externally peer reviewed.

This is an Open Access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 3.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited and the use is non-commercial. See: