Education And Debate Methods in health service research

Evaluation of health interventions at area and organisation level

BMJ 1999; 319 doi: (Published 07 August 1999) Cite this as: BMJ 1999;319:376
  1. Obioha C Ukoumunne, research associate (a),
  2. Martin C Gulliford, senior lecturer (martin.gulliford{at},
  3. Susan Chinn, reader (a),
  4. Jonathan A C Sterne, senior lecturer (a),
  5. Peter G J Burney, professor (a),
  6. Allan Donner, chairman (b)
  1. (a) Department of Public Health Sciences, Guy's, King's, and St Thomas's School of Medicine, King's College, London SE1 3QD
  2. (b) Department of Epidemiology and Biostatistics, University of Western Ontario, London, Ontario, Canada N6A 5C1.
  1. Correspondence to: MC Gulliford

    This is the second of four articles

    Healthcare interventions are often implemented at the level of the organisation or geographical area rather than at the level of the individual patient or healthy subject. For example, screening programmes are delivered to residents of a particular area; health promotion interventions might be delivered to towns or schools; general practitioners deliver services to general practice populations; hospital specialists deliver health care to clinic populations. Interventions at area or organisation level are delivered to clusters of individuals.

    The evaluation of interventions based in an area or organisation may require the allocation of clusters of individuals to different intervention groups (see box 1).1 2 Cluster based evaluations present special problems both in design and analysis.3 Often only a small number of organisational units of large size are available for study, and the investigator needs to consider the most effective way of designing a study with this constraint. Outcomes may be evaluated either at cluster level or at individual level (table).4 Often cluster level interventions are aimed at modifying the outcomes of the individuals within clusters, and it will then be important to recognise that outcomes for individuals within the same organisation may tend to be more similar than for individuals in different organisational clusters (see box 2). This dependence between individuals in the same cluster has important implications for the design and analysis of organisation based studies.2 This paper addresses these issues.

    Summary points

    Health interventions are often implemented at the levels of health service organisational unit or of geographical or administrative area

    The unit of intervention is then a cluster of individual patients or healthy subjects

    Evaluation of cluster level interventions may be difficult because only a few units of large size may be available for study, evaluation may be at either individual or cluster level, and individuals' responses may be correlated within clusters

    At the design stage, it is important to randomise clusters whenever possible, adapt sample size calculations to allow for clustering of responses, and choose between cohort and repeated cross sectional designs

    Methods chosen for analysis of individual data should take into account the correlation of individual responses within clusters

    Nature of the evidence

    We retrieved relevant literature using computer searches of the Medline, BIDS (Bath Information and Data Services), and ERIC (Education Resources Information Centre) databases and hand searches of relevant journals. The papers retrieved included theoretical statistical studies and studies that applied these methods. Much of the relevant work has been done on community intervention studies in coronary heart disease prevention. We retrieved the content of the papers, made qualitative judgments about the validity of different approaches, and synthesised the best evidence into methodological recommendations.


    We identified 10 key considerations for evaluating organisation level interventions.

    1. Recognise the cluster as the unit of intervention or allocation

      Healthcare evaluations often fail to recognise, or use correctly, the different levels of intervention which may be used for allocation and analysis.5 Failure to distinguish individual level from cluster level intervention or analysis can result in studies that are inappropriately designed or give incorrect results.3

    2. Justify the use of the cluster as the unit of intervention or allocation

      For a fixed number of participants, studies in which clusters are randomised to groups are not as powerful as traditional clinical trials in which individual patients are randomised.2 The decision to allocate at organisation level should therefore be justified on theoretical, practical, or economic grounds (box 1).

      Comparison of levels of intervention and levels of evaluation (adapted from McKinlay4)

      Box 1: Reasons for carrying out evaluations at cluster level

      • Public health and healthcare programmes are generally implemented at organisation rather than individual level, so cluster level studies are more appropriate for assessing the effectiveness of such programmes

      • It may not be appropriate, or possible in practice, to randomise individuals to intervention groups since all individuals within a general practice or clinic may be treated in the same way

      • “Contamination” may sometimes be minimised through allocation of appropriate organisational clusters to intervention and control groups. For example, individuals in an intervention group might communicate a health promotion message to control individuals in the same cluster. This might be minimised by randomising whole towns to different interventions

      • Studies in which entire clusters are allocated to groups may sometimes be more cost effective than individual level allocation, if locating and randomising individuals is relatively costly

    3. Include enough clusters

      Evaluation of an intervention that is implemented in a single cluster will not usually give generalisable results. For example, a study evaluating a new way of organising care at one diabetic clinic would be an audit study which may not be generalisable. It would be better to compare control and intervention clinics, but studies with only one clinic per group would be of little value, since the effect of intervention is completely confounded with other differences between the two clinics. Studies with only a few (fewer than four) clusters per group should generally be avoided as the sample size will be too small to allow a valid statistical analysis with appreciable chance of detecting an intervention effect. Studies with as few as six clusters per group have been used to show effects from cluster based interventions,6 but larger numbers of clusters will often be needed, particularly when relevant intervention effects are small.

    4. Randomise clusters wherever possible

      Random allocation has not been used as often as it should in the evaluation of interventions at the level of area or organisation. Randomisation should be used to avoid bias in the estimate of intervention effect as a result of confounding with known or unknown factors. Sometimes the investigator will not be able to control the assignment of clusters (for instance, when evaluating an existing service),7 but because of the risk of bias, randomised designs should always be preferred. If randomisation is not feasible, then the chosen study design should allow for potential sources of bias.8 Non-randomised studies should include intervention and control groups with observations made before and after the intervention. If only a single group can be studied, observations should be made on several occasions both before and after the intervention.8

    5. Allow for clustering when estimating the required sample size

      When observations made at the individual level are used to evaluate interventions at the cluster level, standard formulas for sample size will not be appropriate for obtaining the total number of participants required. This is because they assume that the responses of individuals within clusters are independent (box 2).2 9-11 Standard sample size formulas underestimate the number of participants required because they allow for variation within clusters but not between clusters.

      Box 2: Three reasons for correlation of individual responses within area or organisational clusters

      • Healthy subjects or patients may have chosen the social unit to which they belong. For example, individuals may select their general practitioners on the basis of characteristics such as age, sex, or ethnic group. Individuals who choose the same social or organisational unit might be expected to have something in common

      • Cluster level attributes may have a common influence over all individuals in that cluster, thus making them more similar. For example, outcomes of surgery may vary systematically between surgeons, so that outcomes for patients treated by one surgeon tend to be more similar to each other than to those of another surgeon

      • Individuals may interact within the cluster, leading to similarities between individuals for some health related outcomes. This might occur, for example, when individuals within a community respond to health promotion messages communicated through news media

      To allow for the correlation between subjects, the required standard sample size derived from formulas for individually randomised trials should be multiplied by a quantity known as the design effect or variance inflation factor.2 9 This will give a cluster level evaluation with the same power to detect a given intervention effect as a study with individual allocation. The design effect is estimated as

      $$\mathrm{Deff} = 1 + (n_0 - 1)\rho$$

      where Deff is the design effect, n0 is the average number of individuals per cluster and ρ is the intraclass correlation coefficient for the outcome of interest.

      The intraclass correlation coefficient is the proportion of the total variation in the outcome that is between clusters; this measures the degree of similarity or correlation between subjects within the same cluster. The larger the intraclass correlation coefficient—that is, the more the tendency for subjects within a cluster to be similar—the greater the size of the design effect and the larger the additional number of subjects required in an organisation based evaluation, compared with an individual based evaluation.
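      As a rough illustration of how the design effect inflates a sample size, the sketch below uses the usual form Deff = 1 + (n0 - 1)ρ and the definition of ρ as the between cluster share of total variance. The function names and the example figures (400 participants, 50 per cluster, ρ = 0.02) are hypothetical and chosen only for illustration.

```python
import math


def icc_from_variance_components(between_var: float, within_var: float) -> float:
    """Intraclass correlation: share of total variance lying between clusters."""
    total = between_var + within_var
    if between_var < 0 or within_var < 0 or total <= 0:
        raise ValueError("variance components must be non-negative with a positive sum")
    return between_var / total


def design_effect(mean_cluster_size: float, icc: float) -> float:
    """Deff = 1 + (n0 - 1) * rho."""
    if mean_cluster_size < 1:
        raise ValueError("mean cluster size must be at least 1")
    if not 0.0 <= icc <= 1.0:
        raise ValueError("icc must lie between 0 and 1")
    return 1.0 + (mean_cluster_size - 1.0) * icc


def inflated_sample_size(n_individual: int, mean_cluster_size: float, icc: float) -> int:
    """Participants needed under cluster allocation for the same power.

    n_individual is the size given by a standard formula for an
    individually randomised trial. The product is rounded to six
    decimals before taking the ceiling to avoid floating point noise.
    """
    return math.ceil(round(n_individual * design_effect(mean_cluster_size, icc), 6))


if __name__ == "__main__":
    rho = icc_from_variance_components(between_var=0.2, within_var=9.8)  # 0.02
    print(design_effect(50, rho))              # about 1.98
    print(inflated_sample_size(400, 50, rho))  # 792
```

      Even a small intraclass correlation nearly doubles the required number of participants here, because the average cluster is large.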

      Sample size calculations require the intraclass correlation coefficient to be known or estimated before the study is carried out.12 If the intraclass coefficient is not available, plausible values must be estimated. A range of components of variance and intraclass correlations is reported elsewhere.13 14

      The number of clusters required for a study can be estimated by dividing the total number of individuals required by the average cluster size. When sampling of individuals within clusters is feasible, the power of the study may be increased either by increasing the number of individuals within clusters or by increasing the number of clusters. Increasing the number of clusters will usually enhance the generalisability of the study and will give greater flexibility at the time of analysis,15 but the relative cost of increasing the number of clusters in the study, rather than the number of individuals within clusters, will also be an important consideration.
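      The trade-off between more clusters and larger clusters can be sketched as follows, assuming the design effect takes the usual form 1 + (n0 - 1)ρ and that all clusters have the same size. The function name and the example inputs are hypothetical.

```python
import math


def clusters_needed(n_individual: int, cluster_size: int, icc: float) -> int:
    """Number of equal-sized clusters giving the same power as an
    individually randomised study of n_individual participants.

    Total participants = n_individual * Deff, then divided by cluster size.
    """
    if cluster_size < 1:
        raise ValueError("cluster size must be at least 1")
    if not 0.0 <= icc <= 1.0:
        raise ValueError("icc must lie between 0 and 1")
    deff = 1.0 + (cluster_size - 1) * icc
    return math.ceil(round(n_individual * deff / cluster_size, 6))


if __name__ == "__main__":
    # Hypothetical inputs: 400 participants under individual allocation, rho = 0.02.
    for size in (10, 50, 200):
        print(size, clusters_needed(400, size, 0.02))
```

      With these inputs the number of clusters falls from 48 to 16 to 10 as cluster size rises from 10 to 50 to 200. Under these assumptions the number of clusters can never drop below n_individual × ρ (8 here) however large the clusters become, which is one reason adding clusters is usually more effective than enlarging them.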

    6. Consider the use of matching or stratification of clusters where appropriate

      Stratification entails assigning clusters to strata classified according to cluster level prognostic factors. Equal numbers of clusters are then allocated to each intervention group from within each stratum. Some stratification or matching will often be necessary in area based or organisation based evaluations because simple randomisation will not usually give balanced intervention groups when a small number of clusters is randomised. However, stratification is useful only when the stratifying factor is fairly strongly related to the outcome.
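      A minimal sketch of stratified random allocation of clusters is given below. It assumes each cluster has already been assigned a stratum label based on a cluster level prognostic factor; the function name, the arm labels, and the practice identifiers are hypothetical.

```python
import random
from collections import defaultdict


def stratified_allocation(clusters, arms=("intervention", "control"), seed=None):
    """Randomly allocate clusters to arms within strata.

    clusters: dict mapping cluster id -> stratum label.
    Returns a dict mapping cluster id -> arm. Arms are balanced within a
    stratum when its size is a multiple of the number of arms; otherwise
    the arm order is shuffled so that no arm systematically gets the extra
    cluster.
    """
    rng = random.Random(seed)
    by_stratum = defaultdict(list)
    for cluster_id, stratum in clusters.items():
        by_stratum[stratum].append(cluster_id)

    allocation = {}
    for stratum in sorted(by_stratum):
        members = sorted(by_stratum[stratum])  # fixed order before shuffling
        rng.shuffle(members)
        order = list(arms)
        rng.shuffle(order)
        for position, cluster_id in enumerate(members):
            allocation[cluster_id] = order[position % len(order)]
    return allocation


if __name__ == "__main__":
    practices = {
        "practice_1": "urban", "practice_2": "urban",
        "practice_3": "urban", "practice_4": "urban",
        "practice_5": "rural", "practice_6": "rural",
        "practice_7": "rural", "practice_8": "rural",
    }
    for cluster_id, arm in sorted(stratified_allocation(practices, seed=1).items()):
        print(cluster_id, arm)
```

      Recording the seed makes the allocation reproducible for audit, although in practice allocation would normally be concealed from those recruiting clusters.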

      The simplest form of stratified design is the matched pairs design, in which each stratum contains just two clusters. We advise caution in the use of the matched pairs design for two reasons. Firstly, the range of analytical methods appropriate for the matched design is more limited than for studies which use unrestricted allocation or stratified designs in which several clusters are randomised to each intervention group within strata.16 Secondly, when the number of clusters is less than about 20, a matched analysis may have less statistical power than an unmatched analysis.17 If matching is thought to be essential at the design stage, an unmatched cluster level analysis is worth considering.18 Stratified designs in which there are four or more clusters per stratum do not suffer from the limitations of the paired design.

    7. Consider different approaches to repeated assessments in prospective evaluations

      Two basic sampling designs may be used for follow up: the cohort design, in which the same subjects from the study clusters are used at each measurement occasion, and the repeated cross sectional design, in which a fresh sample of subjects is drawn from the clusters at each measurement occasion.19 20 The cohort design is more appropriate when the focus of the study is on the effect of the programme at the level of the individual subject. The repeated cross sectional design, on the other hand, is more appropriate when the focus of interest is a cluster level index of health such as disease prevalence. The cohort design is potentially more powerful than the repeated cross sectional design because repeated observations on the same individuals tend to be correlated over time and may be used to reduce the variation of the estimated intervention effect. However, the repeated cross sectional design is more likely to give results that are representative of the clusters at the later measurement occasions, particularly for studies with long follow up.
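      The gain in precision from the cohort design can be illustrated with a simplified worked equation. It ignores clustering and assumes n subjects measured on each of two occasions, a common outcome variance σ² on both occasions, and a correlation r between repeated measurements on the same subject. These symbols are introduced here for illustration only.

```latex
\operatorname{Var}\left(\bar{d}_{\text{cohort}}\right) = \frac{2\sigma^2 (1 - r)}{n},
\qquad
\operatorname{Var}\left(\bar{d}_{\text{cross sectional}}\right) = \frac{2\sigma^2}{n}
```

      Here d̄ is the estimated mean change between occasions. With r = 0.5, for example, the variance of the estimated change in the cohort design is half that in the repeated cross sectional design under these assumptions.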

    8. Allow for clustering at the time of analysis

      Standard statistical methods are not appropriate for the analysis of individual level data from organisation based evaluations because they assume that the responses of different subjects are independent.2 Standard methods may underestimate the standard error of the intervention effect, resulting in confidence intervals that are too narrow and P values that are too small.

      Outcomes can be compared between intervention groups at the level of the cluster, applying standard statistical methods to the cluster means or proportions, or at the level of the individual, using formulas that have been adjusted to allow for the similarity between individuals.2
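      A cluster level analysis of this kind might be sketched as below: individual outcomes are reduced to one mean per cluster, and the cluster means are then compared between groups with an ordinary pooled two sample t statistic. This is an unweighted sketch that treats every cluster equally regardless of size; the function name and the data layout are hypothetical.

```python
import math
from statistics import mean, variance


def cluster_level_comparison(intervention, control):
    """Compare two groups using cluster means as the unit of analysis.

    Each argument is a list of clusters; each cluster is a non-empty list
    of individual outcomes. Returns the difference in mean of cluster
    means, its standard error, the t statistic, and degrees of freedom
    (number of clusters minus 2).
    """
    means_i = [mean(cluster) for cluster in intervention]
    means_c = [mean(cluster) for cluster in control]
    k_i, k_c = len(means_i), len(means_c)
    if k_i < 2 or k_c < 2:
        raise ValueError("at least two clusters per group are required")

    difference = mean(means_i) - mean(means_c)
    df = k_i + k_c - 2
    pooled_var = ((k_i - 1) * variance(means_i) + (k_c - 1) * variance(means_c)) / df
    se = math.sqrt(pooled_var * (1 / k_i + 1 / k_c))
    t_stat = difference / se if se > 0 else math.inf
    return {"difference": difference, "standard_error": se, "t": t_stat, "df": df}


if __name__ == "__main__":
    intervention = [[1, 3], [2, 4], [3, 5]]  # cluster means 2, 3, 4
    control = [[0, 2], [1, 3], [2, 4]]       # cluster means 1, 2, 3
    print(cluster_level_comparison(intervention, control))
```

      The degrees of freedom depend on the number of clusters, not the number of individuals, which is why studies with few clusters have little power however many individuals are measured.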

      Individual level analyses allow for the similarity between individuals within the same cluster, by incorporating the design effect into conventional standard error formulas that are used for hypothesis testing and estimating confidence intervals.2 21 For adjusted individual level analyses the intraclass correlation coefficient can be estimated from the study data in order to calculate the design effect. About 20-25 clusters are required to estimate the intraclass correlation coefficient with a reasonable level of precision and a cluster level analysis is to be preferred when there are fewer clusters than this.
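      One way to carry out such an adjustment is sketched below. It assumes the intraclass correlation is estimated with the one way analysis of variance estimator, that negative estimates are truncated at zero, and that the design effect has the form 1 + (n0 - 1)ρ; these choices and the function names are illustrative assumptions rather than the only valid approach.

```python
import math
from statistics import mean


def anova_icc(clusters):
    """One way ANOVA estimate of the intraclass correlation coefficient.

    clusters: list of non-empty lists of individual outcomes.
    Negative estimates are truncated at zero.
    """
    sizes = [len(cluster) for cluster in clusters]
    k = len(clusters)
    total_n = sum(sizes)
    if k < 2 or total_n <= k:
        raise ValueError("need at least two clusters and more individuals than clusters")

    cluster_means = [mean(cluster) for cluster in clusters]
    grand_mean = sum(sum(cluster) for cluster in clusters) / total_n
    ss_between = sum(n * (m - grand_mean) ** 2 for n, m in zip(sizes, cluster_means))
    ss_within = sum((y - m) ** 2 for cluster, m in zip(clusters, cluster_means) for y in cluster)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (total_n - k)

    # Adjusted average cluster size, allowing for unequal clusters.
    n0 = (total_n - sum(n * n for n in sizes) / total_n) / (k - 1)
    denominator = ms_between + (n0 - 1) * ms_within
    if denominator <= 0:
        return 0.0
    return max(0.0, (ms_between - ms_within) / denominator)


def adjusted_standard_error(naive_se: float, mean_cluster_size: float, icc: float) -> float:
    """Inflate a conventional standard error by the square root of the design effect."""
    return naive_se * math.sqrt(1.0 + (mean_cluster_size - 1.0) * icc)


if __name__ == "__main__":
    data = [[4.1, 4.5, 3.9], [5.2, 5.0, 5.6], [4.4, 4.8, 4.6]]  # made-up outcomes
    rho = anova_icc(data)
    print(rho, adjusted_standard_error(0.2, 3, rho))
```

      With only three clusters, as in this toy example, the estimate of ρ would be far too imprecise to rely on; the code is meant only to show the arithmetic.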

    9. Allow for confounding at both individual and cluster levels

      When confounding variables need to be controlled for at individual level or the cluster level, regression methods for clustered data should be used. The method of generalised estimating equations treats the dependence between individual observations as a nuisance factor and provides estimates that are corrected for clustering. Random effects models (multilevel models) explicitly model the association between subjects in the same cluster. These methods may be used to estimate intervention effects, controlling for both individual level and cluster level characteristics.22 23 Regression methods for clustered data require a fairly large number of clusters but may be used with clusters that vary in size.
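      For a continuous outcome, a simple random effects model of this kind can be written as follows. The notation is an assumption introduced for illustration: y_ij is the outcome for individual i in cluster j, x_j indicates the intervention group of cluster j, z_ij is an individual level confounder, w_j is a cluster level confounder, and u_j is a cluster specific random effect.

```latex
y_{ij} = \beta_0 + \beta_1 x_j + \beta_2 z_{ij} + \beta_3 w_j + u_j + e_{ij},
\qquad u_j \sim N(0, \sigma_u^2), \quad e_{ij} \sim N(0, \sigma_e^2)

\rho = \frac{\sigma_u^2}{\sigma_u^2 + \sigma_e^2}
```

      In this formulation β1 is the intervention effect adjusted for both confounders, and the intraclass correlation coefficient ρ follows directly from the two components of variance.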

    10. Include estimates of intracluster correlation and components of variance in published reports

      To aid the planning of future studies, researchers should publish estimates of the intracluster correlation for key outcomes of interest, for different types of subjects, and for different levels of geographical and organisational clustering.12-14


    Investigators will need to consider the circumstances of their own evaluation and use discretion in applying these guidelines to specific circumstances. Conducting cluster based evaluations may present unusual difficulties. The issue of informed consent needs careful consideration.24 Interventions and data management within clusters should be standardised, and the delivery of the intervention should usually be monitored through the collection of both qualitative and quantitative information, which may help to interpret the outcome of the study.


    This article is adapted from Health Services Research Methods: A Guide to Best Practice, edited by Nick Black, John Brazier, Ray Fitzpatrick, and Barnaby Reeves, published by BMJ Books.

    We thank Kate Hann for commenting on the manuscript.


    • Edited by Nick Black

    • Funding This work was supported by a contract from the NHS R&D Health Technology Assessment Programme. The views expressed do not necessarily reflect those of the NHS Executive.

    • Competing interests None declared.

