New approaches to evaluating complex health and care systems

BMJ 2016; 352 doi: (Published 01 February 2016) Cite this as: BMJ 2016;352:i154
  1. Tara Lamont, scientific adviser1,
  2. Nicholas Barber, director of research2,
  3. John de Pury, assistant director of policy3,
  4. Naomi Fulop, professor of healthcare organisation and management4,
  5. Stephanie Garfield-Birkbeck, assistant director5,
  6. Richard Lilford, chair in public health6,
  7. Liz Mear, chief executive7,
  8. Rosalind Raine, professor of healthcare evaluation4,
  9. Ray Fitzpatrick, professor of public health and primary care8
  1. 1National Institute for Health Research Health Services and Delivery Research Programme, University of Southampton, SO16 7NS, UK
  2. 2The Health Foundation, London, UK
  3. 3Universities UK, London, UK
  4. 4University College London, UK
  5. 5University of Southampton, UK
  6. 6University of Warwick, Coventry, UK
  7. 7North West Coast Academic Health Science Network, Warrington, UK
  8. 8University of Oxford, UK
  1. Correspondence to: T Lamont t.lamont{at}
  • Accepted 11 December 2015

Tara Lamont and colleagues discuss how researchers can help service leaders to evaluate rapidly changing models of care, with a range of approaches depending on needs and resources

The NHS has many examples of effective service changes that took too long to implement, from structured patient education in diabetes1 to enhanced recovery programmes in surgery.2 Other initiatives have seemed promising but didn’t deliver—or made things worse. For example, telephone triage and some types of case management increase demand for services rather than divert pressure from urgent care.3 Without the right evaluation, it is difficult to know which innovations are worth adopting. The scale of opportunity and real costs of implementing untested innovations and ignoring lessons learnt elsewhere are substantial.

In 2015 a large international summit was held in London, convened by the National Institute for Health Research, the Health Foundation, the Medical Research Council (MRC), Universities UK, and AcademyHealth, which led to an authoritative overview of the array of methods available to evaluate healthcare services.4 Here we summarise a parallel discussion that took place between research funders, practitioners, and leaders to identify the institutional barriers to healthcare evaluation and potential solutions. We argue for closer partnership between service leaders and researchers, based on a shared culture of basic principles and awareness of a range of options for evaluation.

Time to evaluate

At a time of straitened resources we cannot afford to make poor choices. As Twain said, “Supposing is good, but finding out is better.” This is the right time for researchers to get more engaged in supporting service change. In 2014 the NHS Five Year Forward View set out clearly the case for major system innovations and new ways of working.5 It suggests that future gains will come as much from changes in process and service delivery as from technological fixes. We need to better understand the ways in which care can be organised to improve quality and reduce costs.

Continuum of evaluation

Research and service innovation have not always been aligned, often seeming like two different cultures, with researchers focused on rigour and reliability and service leaders needing immediate, clear answers. But these polarised positions are not helpful, and the debate is tired. Investigators increasingly understand that research evidence is only one factor in decision making. Managers have a real appetite for evidence to underpin service change, with high demand for recent university workshops on evaluation for healthcare staff.

The MRC published a framework for complex evaluations in 2000,6 updated in 2008,7 which provided welcome recognition of the need for multiple methods and variants on experimental design. Further useful MRC guidance has been published on process evaluation.8 But the guidance is often focused on rigorous assessment of single services—it is more difficult to apply to complex, emerging services spanning organisational boundaries. A continuum of evaluation activity exists, depending on resources, need and purpose (fig 1).


Fig 1 Continuum of evaluation activity, from local to national effort (developed from an evaluation spectrum used by North Thames CLAHRC)9

There are five essential questions for evaluations at any point on the spectrum:

  • Why—Clarify aims and establish what we already know from evidence

  • Who—Identify and engage stakeholders and likely users of research at outset

  • How—Think about study design, using an appropriate mix of methods, and adjust for bias where possible (or at least acknowledge)

  • What—Consider what to measure (activity, costs, outcomes) and combine data from different sources

  • When—Pay attention to timing of results to maximise impact

What are the aims of the intervention and who are the main stakeholders?

Complex changes, such as reconfiguring clinical services, involve many people with different goals that are not always fully articulated. Early dialogue between service and research is crucial. The Nuffield Trust has commended the approach taken to evaluate pioneer accountable care organisations in the United States, where objectives were coproduced by the centre and participating sites.10 A model of change was created to specify how the interventions could plausibly produce the desired outcomes and how to assess their effects.

Evaluating service innovations is politically charged. Researchers need to be sensitive—and robust—in managing relationships with service leaders who may be invested in particular outcomes. A review for the European Commission noted the weak evidence base on the economic effects of integrated care, counter to policy assumptions of cost containment as well as quality improvement.11 The authors also questioned “whether integrated care is an intervention that, by implication, ought to be cost effective and support financial sustainability or whether it is a complex strategy to innovate and implement longlasting change . . . at multiple levels.”

At the same time, policy initiatives can provide natural experiments12—for example, comparisons were made between the different approaches by the four UK countries to implementing patient choice.13

What approaches can we use?

A range of new approaches and study designs take account of the complexities of changing services and systems (box).4 Most evaluations benefit from mixed methods.14

Examples of approaches and methods used in evaluating complex services4

  • Mixed methods research—Bringing together quantitative and qualitative research and integrating findings14

  • Quasi-experimental design—Controlled studies, such as interrupted time series, without randomisation of intervention and control7

  • Realist evaluation and programme theory—Examining relations between context, mechanism, and outcome to find out what works, for whom, in what context15 16

  • Complex adaptive theory—Understanding the behaviour of diverse, interconnected agents and processes from a system-wide perspective17

  • Stepped wedge trials—Interventions are rolled out sequentially, in random order, to patient cohorts18

  • Embedded or pragmatic trials—To test effectiveness of interventions in real world clinical practice using broad criteria and flexible approach19

  • Process evaluation—To understand how an intervention is enacted (often alongside outcome evaluation) with focus on implementation, mechanism of effect, and context8

  • Difference in difference analysis—Statistical method for comparing intervention group with reference population20

  • Propensity score matching—Non-randomised method of matching treatment group with control, according to distribution of observed variables19 21

  • Natural experiment—Observational studies to assess impact by comparing countries, regions, or organisations where different policies have been enacted12

  • Normalisation process theory—Sociological framework to study how and why some activities or interventions, but not others, become embedded (normalised) in routine practice22
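To make one of the methods in the box concrete, the following is a minimal sketch of propensity score matching, not a method taken from any of the cited studies. It assumes that a propensity score (the estimated probability of receiving the intervention, given observed variables) has already been calculated for every unit, for example by logistic regression. The function name `match_on_propensity`, the `caliper` parameter, and the scores are hypothetical and purely illustrative.

```python
def match_on_propensity(treated, controls, caliper=0.05):
    """Greedy 1:1 nearest-neighbour matching without replacement.

    treated, controls: dicts mapping a unit id to its propensity score
    (0 to 1), assumed to have been estimated beforehand.
    Returns a list of (treated_id, control_id) pairs. Treated units with
    no control within the caliper are left unmatched.
    """
    available = dict(controls)  # copy, so the caller's dict is not modified
    pairs = []
    # Process treated units in order of score so results are reproducible
    for t_id, t_score in sorted(treated.items(), key=lambda kv: kv[1]):
        if not available:
            break
        # Closest remaining control on the propensity score
        c_id = min(available, key=lambda c: abs(available[c] - t_score))
        if abs(available[c_id] - t_score) <= caliper:
            pairs.append((t_id, c_id))
            del available[c_id]  # each control is used at most once
    return pairs


# Illustrative, invented scores
treated = {"T1": 0.30, "T2": 0.62, "T3": 0.95}
controls = {"C1": 0.28, "C2": 0.60, "C3": 0.33, "C4": 0.50}
pairs = match_on_propensity(treated, controls)
print(pairs)
```

In this invented example T3 stays unmatched because no control lies within the caliper. Matching of this kind balances only the observed variables, so unmeasured differences between sites can still bias the comparison.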

In clinical trials, more attention is now given to the heterogeneity of treatment effects and the need to evaluate interventions that may be neither stable nor fixed (such as non-manualised psychotherapies). Service innovations are even more complex, and this complexity needs to be embraced, not eliminated.17 The simple question “does it work?” may not always be enough—we need to link data on outcome and costs with qualitative methods23 to tackle the questions of “how” and “why.”23 24 Realist evaluation,15 16 which uses programme theory to identify likely causes and mechanisms of change, has been used in health services research—for example, to evaluate interventions to manage referrals.25 This shifts the focus from what works to which preconditions make certain outcomes more likely, for which people, and in which context. A simple binary of success or failure is not always helpful, especially if it precludes learning from multiple sources. An evaluation of virtual wards, for example, showed a limited effect on reducing emergency admissions but highlighted the importance of dedicated ward clerks and organising schemes around groups of practices.26 Research can help show which elements of context are most important for wider implementation.27

Observational studies can be used to compare settings and models. For example, the landmark birthplace in England study of 60 000 births provided strong evidence on the relative quality, safety, and costs of birth in different settings.28 This study informed changes in national guidelines on intrapartum care.29

But observational studies may not always provide the right evidence to support decisions. Recent evaluations have used pragmatic and naturalistic designs. Stepped wedge designs, which have built in contemporaneous controls, can be a powerful way of evaluating policy and practice as they are introduced.18 Each practice can be assessed against itself before and after change and against peer practices. For example, stepped wedge designs have been used to study predictive risk tools to reduce avoidable admission,30 targeted case finding for cardiovascular disease prevention,31 and GP led medication review of older people. Further afield, this method has been used to assess the effect of tuberculosis screening in Brazil and of school breakfast programmes in New Zealand.32 Other forms of embedded and pragmatic trial design are now being used to assess pressing problems in the NHS, such as inequalities in access to cancer screening.19 33
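The logic of a stepped wedge rollout can be sketched in a few lines. This is an illustrative sketch only, assuming that clusters (here, general practices) are randomised to the step at which they cross from control to intervention and that every cluster starts in the control condition. The function name `stepped_wedge_schedule` and the practice labels are hypothetical and do not come from the studies cited.

```python
import random


def stepped_wedge_schedule(clusters, n_steps, seed=0):
    """Randomly assign clusters to crossover steps.

    Returns a dict mapping each cluster to a list of exposures, one per
    period (n_steps + 1 periods; period 0 is an all-control baseline).
    0 = control, 1 = intervention.
    """
    rng = random.Random(seed)  # fixed seed so the allocation is reproducible
    order = list(clusters)
    rng.shuffle(order)         # random order of rollout
    schedule = {}
    for i, cluster in enumerate(order):
        # Spread clusters as evenly as possible across steps 1..n_steps
        step = 1 + (i * n_steps) // len(order)
        schedule[cluster] = [1 if period >= step else 0
                             for period in range(n_steps + 1)]
    return schedule


schedule = stepped_wedge_schedule(["A", "B", "C", "D", "E", "F"], n_steps=3, seed=1)
for practice in sorted(schedule):
    print(practice, schedule[practice])
```

Each row shows a practice acting as its own control before crossover, while practices that have not yet crossed over provide the contemporaneous comparison in each period.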

What will we measure?

Linking particular changes to particular outcomes can be challenging, given many influences inside and outside organisations. And complex policy dependent interventions are likely to have many and diffuse effects.34 For example, the effects of complex safety interventions can be charted by a range of measures, from complication rates to complaints and process measures.35 Managers will look at the qualitative and quantitative data to check that they point in the same direction.

Data availability and quality present a challenge and an opportunity for researchers. Some studies have made imaginative use of a range of routine data, from hospital episode statistics to prescribing information.36 This could be exploited further, with greater use of clinical audit data and routine costing and financial information. Data linkage provides a powerful way to assess system-wide changes using multiple data sources, such as in a recent study that combined data from general practice, hospital, and cancer registries.36 Evaluations can use national reference data and difference in difference analysis20 or propensity score matching21 to compare interventions at sites with similar cohorts.37 38 Local services can use routine data to check findings and to test emerging assumptions. New types of analysis are starting to become available, from text mining to use of big data, which may provide opportunities for future evaluations.
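The arithmetic behind the difference in difference analysis mentioned above can be shown with a simple two group, two period sketch. The function name and the figures are hypothetical and invented for illustration. Published evaluations typically use regression models with adjustment for covariates rather than this bare calculation.

```python
from statistics import mean


def difference_in_differences(treated_pre, treated_post, control_pre, control_post):
    """Two group, two period difference in difference estimate.

    Each argument is a list of outcome values (for example, monthly
    emergency admissions per site). The estimate is the change in the
    intervention group minus the change in the comparison group.
    """
    change_treated = mean(treated_post) - mean(treated_pre)
    change_control = mean(control_post) - mean(control_pre)
    return change_treated - change_control


# Invented figures: admissions fall by 10 at intervention sites
# and by 2 at comparison sites over the same period
effect = difference_in_differences(
    treated_pre=[100, 110], treated_post=[90, 100],
    control_pre=[105, 115], control_post=[103, 113],
)
print(effect)
```

The estimate can be read as the effect of the intervention only if the two groups would have followed parallel trends in its absence, which is why the choice of reference population matters.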

When should we evaluate?

One of the key challenges to effective evaluation is timing. To paraphrase Martin Buxton, it’s always too early to evaluate a new technology until suddenly it’s too late.39 Normal timelines for research proposals, including governance steps, are sometimes far longer than service planning cycles. But service changes do not always happen at the pace managers would like, and some have long lead-in times. Whatever the timelines, early engagement with service leaders is essential to capture baseline data, to start clarifying aims, and to agree best approaches within time and resource constraints.

Another challenge is that planned service changes can change over the lifetime of a project. This requires an imaginative approach, as seen in the evaluation conducted alongside a large scale service transformation (modernising stroke, kidney, and sexual health services in London). This involved repeated iterations of testing and refining theory against emerging findings.40 The five year evaluation of a region-wide pay for performance scheme similarly had to adapt as the intervention changed mid-scheme with the introduction of new commissioning targets.41

These contemporaneous evaluations require different rules of engagement between the service and researchers, challenging traditional assumptions about objectivity and independence. Critical distance is important, but good evaluation teams will work closely with study sites, sharing findings to test the validity of emerging data. These relationships need to be constantly negotiated and carefully managed by senior, experienced field staff. Emergent literature on new forms of collaborative and participatory research highlights these challenges.42 The new partnerships of healthcare organisations and universities in England embody different kinds of collaboration and co-production.43 Features include matched funding to service innovations and their evaluation and joint working between research and service staff to formulate research problems and to implement solutions. Mechanisms also exist for spreading innovation through active networks of service, research, and industry partner organisations.44

Evaluations may be formative, using findings to optimise implementation, or summative, producing evidence of ultimate impact. Many studies combine both, but careful thought is needed to protect the integrity of summative evaluations.45 Sometimes timing is all—studies can maximise impact by timely release of findings without compromising scientific standards. For example, the evaluation of the reconfiguration of acute stroke services published important interim findings that influenced decisions for more radical centralisation of services in Manchester.46

These new ways of working pose a challenge for research funding bodies. Commitment to open and fair commissioning with expert review takes time. This can seem out of kilter with service needs and pace of change. In response, many funding bodies are experimenting with new ways to streamline processes for funding and publishing research, including decision gates for larger projects to take account of accumulating data and changing context without undue delay.47

More could be done to maximise the impact of evaluative research, at local and national levels. We know that managers place greater emphasis on personal experience and learning from other sites than on more formal sources of evidence.48 The growing science of evidence use and implementation underlines some key points for research design, including making connections to local context, belief, and values. New theoretical frameworks, including normalisation process theory, help examine why some changes are more readily adopted than others,22 such as in a recent study of secondary fracture prevention services.49

We have always known the importance of opinion leaders in sharing learning.50 This is now enhanced by social media. These new platforms can reach wider audiences, adding context and commentary to findings in a way that can engage leaders, managers, and frontline staff in understanding evaluative research.


Evaluation is becoming democratised. Service leaders and managers are keen to assess the effects of changes and to learn from others. Any organisation can carry out a simple online survey of patient satisfaction (and this can be done well or badly), but more leaders now recognise that this will not tell you enough about the impact and sustainability of a complex service development. Large scale changes, which could have lessons for others at a national level, need independently funded controlled research. We have described some of the powerful new methods for doing this. But sometimes local audits and simple measurement are good enough. We have identified some key principles for good evaluation, which can be applied at local and national level depending on need. Researchers can help by working with service leaders to articulate the goals and describe the components of planned change; to synthesise helpful evidence on related interventions; to identify key stakeholders, appropriate methods, and outcome measures; to test early findings with target audiences; and to consider the best ways to share results. Whatever the resources and timescale, careful thought at the start of a project will pay dividends.

Key messages

  • We need to move beyond the unhelpful notion of service and research being two separate cultures

  • A spectrum of study designs and methods is now available to tackle challenges in evaluating complex and emergent services

  • Researchers can help service leaders to clarify goals, gather relevant evidence, and identify proportionate approaches for evaluating planned changes



  • Contributors and sources: This paper is the result of a roundtable discussion between the named authors in May 2015 funded by the National Institute for Health Research, Medical Research Council, Health Foundation, and Universities UK. The views expressed in this paper are not necessarily those of these organisations. TL wrote the first draft. All commented on subsequent drafts and all read and agreed the final version. Further helpful comments were received from Peter Brocklehurst, University College London. TL is guarantor.

  • Competing interests: We have read and understood BMJ policy on competing interests and declare: NF is current recipient of research grant from NIHR; TL, SGB, RR, RL, and LM have NIHR funded posts and positions.

  • Provenance and peer review: Not commissioned; externally peer reviewed.

