Creative use of existing clinical and health outcomes data to assess NHS performance in England: Part 2—more challenging aspects of monitoring
BMJ 2005; 330 doi: https://doi.org/10.1136/bmj.330.7506.1486 (Published 23 June 2005) Cite this as: BMJ 2005;330:1486
- Azim Lakhani, director1,
- James Coles, director2,
- Daniel Eayres, senior analyst1,
- Craig Spence, analyst3,
- Colin Sanderson, reader in health services research4
- 1National Centre for Health Outcomes Development, London School of Hygiene and Tropical Medicine, London WC1E 6AZ
- 2CASPE Research, London W1G 0AN
- 3Northgate Information Solutions, Hemel Hempstead HP2 7HU
- 4Department of Public Health and Policy, London School of Hygiene and Tropical Medicine
- Correspondence to: A Lakhani
- Accepted 14 March 2005
In the second of their two articles about using existing routine data to assess performance in the NHS, the authors make practical suggestions about using data for mental health care, potentially avoidable deaths, and forecasting coronary heart disease outcomes, and raise issues about assumptions and technical aspects for discussion
There have been recent calls for better data on NHS outputs and outcomes in England.1–4 However, this will require new data collection that could take several years. In the meantime, creative and informed use of existing data, with clear admission of the known shortcomings, may give some indication of how outcomes are changing.5 The main challenges are to measure health validly and to judge how much any improvements are due to NHS interventions.
In this, the second of our two articles, we explore some of the technical issues involved and make practical illustrative suggestions about how best to use existing data regarding mental health care, potentially avoidable deaths, and forecasting coronary heart disease outcomes.6 Many other quality indicators could be produced along similar lines.7
Suggestions for indicators illustrating a range of methodological issues
Patterns of mental health care
Monitoring the quality of mental health care is problematic. Case fatality rates are low, so indicators based on numbers of deaths are inadequate. Much of mental health care occurs outside hospital without direct data on activity or outcomes. Also, there are few explicit standards. The challenge is to find a way of using data from hospitals to make some inferences about both hospital and community care and to identify aspects of care that are materially below optimum.
The national service framework for mental health highlights the preference for community over hospital care.8 Because this policy is expected to produce better outcomes, the way mental health services are delivered may be used as a proxy for quality of care. Mentally ill patients vary in numbers of readmissions to hospital and cumulative lengths of stay, and these may reflect variations in the quality and availability of care and support in the community. Too many readmissions and long cumulative lengths of stay may reflect inadequate community care, but too few readmissions and short cumulative lengths of stay may reflect inadequate provision of necessary hospital care.
A combination of the total number of admissions and the total time spent in hospital by patients during a year could be used to assess this. There may well be a trade-off between time spent in hospital and frequency of admission, and it is therefore important to consider the balance between the two variables as well as the individual measures. Studies of observed variation between populations could be used to derive target ranges for acceptable patterns of care.
To test this approach, we followed individual patients aged 17-64 admitted to hospital in April of each year in mental health specialties throughout the financial year, using continuous inpatient spells and the linkage methods described in our first article.6 We calculated the total length of inpatient stay per person during the year and the total number of admissions per person during the year. Readmissions could be to any NHS hospital in England and for any condition, such as injury, not just mental illness. We attributed the values to the primary care organisation that covered the patient's place of residence at the time of first admission.
We found substantial variation between primary care organisations for each of the two variables. To give each component equal weight, we transformed them into z scores (by measuring distance from their reference points and dividing by the standard deviations of the individual distributions). Conventionally, the mean of a distribution is the reference point for z scores. However, the mean is not necessarily a suitable target for a performance score because it is partly a reflection of “poor performance” at the tail end of the distribution.9 We used modified z scores in which the reference points were 1.65 admissions per person and 45.0 days total stay in hospital during the year. We chose these points as a realistic joint target because they had been achieved by a strategic health authority in 2000-1 (the year selected as standard). This “best achieved combination” of a low rate of admissions per person and a low total stay per person, giving each the same level of importance, equates to the lowest composite z score at the level of strategic health authority (see Year 3 in table 1), although at the level of primary care organisation, some trusts achieved even better z scores. We used these reference points to calculate modified z scores across all five years.
The z scores measure how far each primary care organisation is from the defined optimum for total admissions and for total stay. We then produced a composite score for each organisation by adding its two z scores, equating to a summary of that organisation's mix of experience on two fronts. A high composite score would be considered undesirable as it would indicate more hospitalisation than expected. Weighting the z scores before adding them would be a refinement if there was particular concern about the relative importance of the two variables.
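The calculation described above can be sketched in a few lines of Python. The reference points (1.65 admissions and 45.0 days) are those quoted in the text; the organisation names, their values, and the use of the sample standard deviation are illustrative assumptions, not details taken from the study.

```python
from statistics import stdev

# Reference points quoted in the text (the "best achieved combination").
REF_ADMISSIONS = 1.65   # admissions per person per year
REF_STAY_DAYS = 45.0    # total days in hospital per person per year

# Hypothetical primary care organisations:
# (admissions per person, total days in hospital per person).
pcos = {
    "PCO A": (1.65, 45.0),
    "PCO B": (1.90, 52.0),
    "PCO C": (1.75, 70.0),
    "PCO D": (2.10, 48.0),
}

def modified_z(value, reference, sd):
    """Distance from the chosen reference point, in standard deviation units."""
    return (value - reference) / sd

def composite_scores(data, weight_admissions=1.0, weight_stay=1.0):
    """Weighted sum of the two modified z scores for each organisation."""
    sd_adm = stdev([v[0] for v in data.values()])
    sd_stay = stdev([v[1] for v in data.values()])
    return {
        name: weight_admissions * modified_z(adm, REF_ADMISSIONS, sd_adm)
        + weight_stay * modified_z(stay, REF_STAY_DAYS, sd_stay)
        for name, (adm, stay) in data.items()
    }

scores = composite_scores(pcos)
```

With the default weights of 1 the two components count equally, as in the text; passing different weights corresponds to the refinement mentioned above. An organisation sitting exactly on both reference points scores zero, and higher scores indicate more hospitalisation than the reference combination.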
Figure 1 shows modified z score isolines at primary care organisation level. Two organisations with a different mix of values for admissions and length of stay may have the same composite z score, reflecting the same performance but different trade-offs. Improvement over time should lead to lower overall composite z scores and a reduction in variation. Table 1 shows non-significant worsening trends in composite z scores at the England level and variation by strategic health authority for each of five financial years. Caution is needed in interpreting these trends because of the substantial recent organisational change in mental health services and disruptions to data collection. However, our analyses show the feasibility of examining patterns of care, rather than focusing on a single variable. The indicator could be refined further by standardising for diagnostic mix, subject to development and testing.
A possible cause of variation between primary care organisations might be differences in the prevalence of illness and in the level of support provided in the community, both by professional bodies and informal networks such as families. A full assessment of the effect of such factors is beyond the scope of this article, but we examined whether variation might be reduced when looking at similar geographical areas. For example, the populations of big cities can be more transient, have a greater incidence of some mental illnesses, and have fewer informal networks. The Office for National Statistics has used cluster analyses to create an area classification for grouping primary care organisations that are most similar in terms of 42 demographic, socioeconomic, housing, and other Census 2001 variables.10 Figure 2 shows the composite modified z score for each primary care organisation grouped within its area group. Although it shows some differences between groups (a slowly increasing z score across the chart), there is much greater residual variation within each group, suggesting that influences other than demography and socioeconomic conditions are largely responsible for the variation, such as service availability and clinical practice.
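One way to quantify the comparison of variation between and within area groups is to split the total variance of the composite scores into a between-group and a within-group part. The sketch below does this for made-up scores and group labels; it is an assumed illustration of the idea, not the analysis behind figure 2.

```python
from statistics import mean, pvariance

# Hypothetical composite z scores, keyed by made-up area group labels.
scores_by_group = {
    "Group 1": [0.2, 1.9, 3.1, 0.8],
    "Group 2": [0.5, 2.6, 1.1, 3.4],
    "Group 3": [1.0, 3.8, 0.3, 2.9],
}

def variance_components(groups):
    """Split the total (population) variance into between and within group parts."""
    all_scores = [s for scores in groups.values() for s in scores]
    n = len(all_scores)
    grand_mean = mean(all_scores)
    # Between: spread of the group means around the overall mean.
    between = sum(len(s) * (mean(s) - grand_mean) ** 2 for s in groups.values()) / n
    # Within: size-weighted average of the variances inside each group.
    within = sum(len(s) * pvariance(s) for s in groups.values()) / n
    return between, within, pvariance(all_scores)

between, within, total = variance_components(scores_by_group)
share_within = within / total
```

A `share_within` close to 1 would correspond to the pattern described above, in which most of the variation remains after grouping similar areas together.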
A measure of wider population mortality attributable to health care
Attempts to assess the contribution of health services to the entire population (not just those using health services) have relied on population based indicators of potentially avoidable mortality. Causes of death are included if there is evidence that they are amenable to healthcare interventions and—given timely, appropriate, and high quality care—death rates should be low among the age groups specified.11 Healthcare intervention includes preventing disease onset as well as treating existing disease.
Two such indicators based on potentially avoidable mortality are published annually for the NHS in the Compendium of Clinical and Health Indicators.7 Nolte and McKee reviewed the use of this concept and proposed an updated list of conditions and age bands for international comparisons, based on more recent evidence of amenability to healthcare interventions.12 We have used their list but have added asthma at ages 0-44 years, which they excluded because of lack of comparability in international studies.
In England 138 346 such deaths occurred in people aged less than 75 during 2001 and 2002, of which 48% were from ischaemic heart disease, 16% from cerebrovascular disease, 9% from colorectal cancer, 9% from female breast cancer, and 6% from pneumonia.
Table 1 shows the trend and geographical variation in deaths that were amenable to healthcare interventions and those that were not in people aged less than 75 years during 1998-2002. Mortality from amenable causes (including ischaemic heart disease) fell from 164 to 132 per 100 000—an average annual improvement of 5.7%. Mortality from ischaemic heart disease improved by an average of 6.5% a year, and mortality from other amenable causes improved by 5.0% a year. This compares with an annual improvement of only 1.0% for mortality from causes not considered amenable.
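The article does not state how the average annual improvement was calculated. One simple possibility, shown here purely as an assumption, is a compound rate of change between the first and last years. A trend fitted across all five annual rates (for example, by log-linear regression) can give a different answer, so this sketch is not expected to reproduce the published 5.7% exactly; applied to the two endpoint rates quoted above it gives about 5.3% a year.

```python
def average_annual_change(start_rate, end_rate, n_intervals):
    """Compound average annual proportional change between two rates."""
    return (end_rate / start_rate) ** (1.0 / n_intervals) - 1.0

# Rates quoted in the text for amenable causes (per 100 000), 1998 and 2002.
# Four annual intervals separate the two years.
change = average_annual_change(164.0, 132.0, n_intervals=4)
annual_fall_percent = -100.0 * change
```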
Figure 3 shows the trends for the 10 year period 1993-2002. There is a discontinuity in the trends as a result of a change in 2001 from ICD-9 to ICD-10 for coding cause of death. The faster improvement in mortality from amenable causes than from non-amenable causes suggests improvements in the effectiveness of health care. The main concern about this approach is that trends and geographical variation may partly be due to factors other than the quality of health care, in particular improving socioeconomic conditions.
Forecasting future outcomes attributable to current investments
The observation that today's survival and death rates are at least partly a reflection of the quality of earlier health care applies particularly to primary and secondary prevention of conditions such as heart disease, stroke, diabetes, some cancers, and diseases of childhood. The converse of this is that many of the benefits to health from improved care today will not be seen for many years. One of the implications of this is that a comprehensive assessment of the quality of a healthcare system should include formal forecasts of the longer term effects of recent changes in provision and activity.
Several mathematical models have been and are being developed for doing this. For example, long term relative survival can be predicted for patients with recently diagnosed cancer.13 Another example is a microsimulation model that provides estimates of the annual benefits and costs over the middle and longer term (up to 20 years) of different patterns of healthcare provision and use for coronary heart disease.14 In terms of primary prevention, the model allows exploration of the population effect of improvements in the control of blood pressure and cholesterol and of changes in rates of cigarette smoking. In terms of treatment, it can explore the effects of changing ambulance response times, thrombolysis, and revascularisation rates. The model can produce, for example, estimates of the likely impact of meeting national service framework activity targets for coronary heart disease. Tables 2 and 3 show some illustrative examples of this, extracted from a report on this developmental work to the Department of Health,14 to demonstrate how simulation could be used to inform policy, subject to various assumptions and constraints.14 Currently, the model is being extended to other clinical conditions such as stroke and diabetes. Such models could be used to show the likely impact of new investments in prevention, incremental shifts from treatment to prevention, or alternative mixes of interventions.
Models of this kind are inevitably very demanding of data and assumptions, and there may be a trade-off between rigour and transparency. Their requirements include estimates of baseline levels of risk factors, disease prevalence, and healthcare use; estimates of trends over the forecasting period in exogenous factors (those not determined within the model); and, for cost effectiveness analyses, estimates of how treatment costs vary as levels of activity change. As well as modelling relationships between risk factors and outcomes, they have to be able to deal with combinations of changes in risk factors (such as reducing blood pressure and cholesterol concentrations) and interactions between risk factors (such as the effect of stopping smoking on blood pressure). Management of heart disease is one of the best researched aspects of health care, and, as well as the scientific literature, this model is based on new analyses of data from the health survey for England, the Framingham cohort study, and the British heart survey.14 However, different studies define variables in different ways, and substantial gaps in the literature remain, such as the effects of stopping treatment. Also, such models need maintenance, with new research findings needing to be incorporated regularly.
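To make the idea of a microsimulation concrete, the toy sketch below follows simulated individuals year by year and compares a baseline scenario with one in which a risk factor is removed. Every number, the multiplicative combination of relative risks, and the constant annual risk are hypothetical simplifications chosen for illustration. The sketch ignores ageing, costs, treatment, and the interactions between risk factors that the model described above has to handle.

```python
import random

# All values are hypothetical placeholders, not estimates from the model in the text.
BASELINE_ANNUAL_RISK = 0.01      # annual probability of a first coronary event
RELATIVE_RISK = {                # multiplicative effect of each risk factor if present
    "smoker": 2.0,
    "high_blood_pressure": 1.5,
    "high_cholesterol": 1.4,
}
MAX_YEARS = 20

def annual_risk(person):
    """Combine risk factors multiplicatively (an assumption), capped at 1."""
    risk = BASELINE_ANNUAL_RISK
    for factor, rr in RELATIVE_RISK.items():
        if person[factor]:
            risk *= rr
    return min(risk, 1.0)

def make_cohort(n, rng):
    """Create n simulated people with random risk factors and fixed yearly draws."""
    cohort = []
    for _ in range(n):
        person = {factor: rng.random() < 0.3 for factor in RELATIVE_RISK}
        # Storing the draws lets both scenarios reuse the same random numbers.
        person["draws"] = [rng.random() for _ in range(MAX_YEARS)]
        cohort.append(person)
    return cohort

def count_events(cohort, years, removed_factor=None):
    """Count people with a first event within the horizon, optionally removing one factor."""
    events = 0
    for person in cohort:
        scenario = dict(person)
        if removed_factor is not None:
            scenario[removed_factor] = False
        risk = annual_risk(scenario)
        if any(draw < risk for draw in person["draws"][:years]):
            events += 1
    return events

rng = random.Random(2005)
cohort = make_cohort(10_000, rng)
baseline_events = count_events(cohort, years=20)
no_smoking_events = count_events(cohort, years=20, removed_factor="smoker")
events_avoided = baseline_events - no_smoking_events
```

Reusing the same random draws in both scenarios means that the difference in event counts reflects only the change in risk, not simulation noise. This is a common design choice in such comparisons, though the source does not say whether the model described here uses it.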
For discussion and debate
Our two articles are confined to health outcome measures and their proxies. There are, however, many other types of outputs that could be included in assessments of productivity. The following issues and assumptions require further discussion.
Selection of indicators and targets
Attribution of changes in health status to healthcare activity would normally require experimentation such as randomised controlled trials. Since this is not feasible as part of routine delivery of health services, judgment must be used, based on three criteria:
- Research evidence or consensus (expressed in policies) suggests that health services (including public health, health partnerships, health advocacy) can have a significant influence on the outcome being measured
- Variation between organisations in current performance suggests scope for improvement, with the best showing what is realistically achievable given optimum circumstances9
- Variation between organisations in changes in performance over time suggests scope for improvement, with the greatest improvement showing what is realistically achievable given optimum circumstances.
The first criterion is essential for selection of indicators. The other two may not be, as the services may already be performing at an optimal level. Even if all three criteria are met, the outcomes may still reflect interventions not attributable to health services.
Aspects such as quality of life may be of more concern to patients than clinical measures, and therefore more appropriate as measures of outcome, albeit with greater problems of attribution. Absence of routine data on health related quality of life is a serious gap in our knowledge.
For annual cross sectional monitoring, it is important to select indicators that reflect short term impact unless long term impact is clear or can be forecast. Indicators such as incidence of stroke and deaths may reflect the cumulative effect of several natural events and interventions or resource use in the past. Some of these, such as prevalence of obesity and high blood pressure, may also act as proxies for future adverse outcomes and may therefore have a dual role in annual cross sectional monitoring.
Where there is clear evidence of the relation between intervention and health, such evidence may be used to create explicit standards for performance audit, and measures of the level of intervention may be used as proxies for future outputs or health outcomes.
Table 4 shows that the measure of what is achievable depends on the level at which analysis is undertaken. In large populations (regions, strategic health authorities) indicators are likely to be stable but may mask the true “optimum.” Smaller populations (primary care organisations, local authorities) will have fewer events and more inherent variability, and extreme values may reflect atypical circumstances. The reference point that is ultimately selected will be a matter of judgment, as not all events being measured (admissions, readmissions, deaths, etc) are likely to be avoidable. What we are seeking, in the absence of an evidence based standard, is an aspirational target based on reality, towards which we would expect the NHS to be moving. In some cases, this could be informed by what is shown to be achievable in international studies, as has been suggested for cancer survival.
Ideally, numerators and denominators should match—for example, case fatality rates for stroke should be based on all deaths among all patients with stroke, including those not admitted to hospital and who may have either mild disease with lower case fatality or severe disease and death before admission. This is not always possible, and the limitations of what is feasible must be acknowledged. Some indicators measure what happens to known patients, with a risk that those needing care but not receiving it, possibly with poorer outcomes, are excluded.
Any measure of geographical variation or time trends needs to ensure comparability of numerator and denominator data. This may require adjustment of indicators for differences in age, sex, case mix (mix or severity of conditions), etc. A major constraint with existing routinely collected national data is the lack of grouping systems for case mix that are based on prognosis. Grouping systems, such as healthcare resource groups, were designed to create subgroups for comparison that are homogeneous with respect to resource use but not necessarily outcomes. Standardisation also raises questions about what adjustment is legitimate. People in deprived populations might have relatively poor outcomes because of relatively intractable health problems or because of substandard care, or both. Standardisation is undiscriminating and would “protect” the providers against both kinds of effects. Likewise, where there is sex variation, there is a choice between using sex standardised person rates or sex specific rates.
When standardising rates for age (and other variables) the choice of method (direct or indirect) and of the standard population used may affect the results, particularly when comparing sub-national rates. We tested this for hospital case fatality, calculating trends at the England level using both direct and indirect methods and using various years as the standard, and found little difference (table 2 in our first article6). However, this should be monitored in any new approach to measuring performance. For the correct analysis of trends, data for all years should be adjusted with the same standard and time period.
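The two methods can be compared in a short sketch. The age bands, death counts, and populations below are invented for illustration. They are chosen so that the area's age specific rates are a constant multiple (1.25) of the standard rates, in which case direct and indirect standardisation give the same answer; the two diverge when the ratio differs between age bands.

```python
# Hypothetical area being assessed: age band -> (deaths, population).
area = {"0-44": (10, 50_000), "45-64": (60, 30_000), "65-74": (150, 20_000)}
# Hypothetical standard population: age band -> (deaths, population).
standard = {
    "0-44": (400, 2_500_000),
    "45-64": (2_400, 1_500_000),
    "65-74": (6_000, 1_000_000),
}

def crude_rate(pop):
    """Total deaths divided by total population."""
    deaths = sum(d for d, _ in pop.values())
    people = sum(n for _, n in pop.values())
    return deaths / people

def directly_standardised_rate(study, std):
    """Apply the study area's age specific rates to the standard age structure."""
    std_total = sum(n for _, n in std.values())
    weighted = sum((d / n) * std[band][1] for band, (d, n) in study.items())
    return weighted / std_total

def indirectly_standardised_rate(study, std):
    """Apply the standard age specific rates to the study population."""
    observed = sum(d for d, _ in study.values())
    expected = sum((std[band][0] / std[band][1]) * n for band, (_, n) in study.items())
    ratio = observed / expected  # standardised mortality ratio
    return ratio * crude_rate(std)

dsr = directly_standardised_rate(area, standard) * 100_000
isr = indirectly_standardised_rate(area, standard) * 100_000
```

In this constructed case both rates come to 220 per 100 000, against a crude standard rate of 176 per 100 000.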
The stability of the indicator needs to be taken into account. For example, data on strategic health authorities are less prone to yearly fluctuations in rankings than data on primary care organisations because of their larger populations.
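The effect of population size on stability can be illustrated with a rough calculation. If event counts are assumed to follow a Poisson distribution (an assumption made for this sketch, not a method stated in the article), the relative standard error of a rate is approximately one over the square root of the number of events. The population sizes below are hypothetical; the rate is the 2002 figure for amenable mortality quoted earlier.

```python
import math

def relative_standard_error(rate_per_100k, population):
    """Approximate relative standard error of a rate, assuming Poisson counts."""
    expected_events = rate_per_100k / 100_000 * population
    return 1.0 / math.sqrt(expected_events)

# Hypothetical populations for a smaller and a larger organisation.
rse_small = relative_standard_error(132, 150_000)
rse_large = relative_standard_error(132, 1_800_000)
```

Under this approximation, a population 12 times larger has a relative standard error smaller by a factor of the square root of 12, or about 3.5.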
Interpretation of data
Variation in data quality (in levels of missing records and missing or invalid codes) could influence trends, particularly if there were biases in such records compared with the rest. In the extra technical material on bmj.com, table 1.1 shows that the levels of incompleteness for indicators based on hospital episode statistics are too small to affect England indicator values and do not vary much between years. Within each year, however, completeness varies by strategic health authority, requiring caution in interpreting comparative strategic health authority data. The accuracy of seemingly valid diagnostic codes has been a source of concern.15 There are now local routine audits of the quality of clinical coding (personal communication, NHS Information Authority) but no national reporting system, which remains a serious gap.
National aggregate values may mask variation in component parts that could be important for productivity assessment. Table 2.2 in the extra technical material on bmj.com shows that there are age and sex specific variations in hospital case fatality and varying time trends. For example, there is convergence between sexes in the 0-5 year old age group over the five years but persistent sex differences in the 60-64 year age group. There are falling trends in deaths in the 75-79 year group but not in the 45-49 year group.
The potential for competition for resources between types of care and conditions needs to be acknowledged at national and local level, because the “best” achieved in one locality for one indicator may have been at the expense of poorer performance in other aspects of health care, reflecting local priorities.
Most service based indicators are incomplete, as data from the independent healthcare sector are missing.
Geographical monitoring is useful, as data can then be interpreted in the context of the strategic roles of strategic health authorities, the commissioning roles of primary care organisations, and local demographic and socioeconomic conditions. Local conditions may explain (although should not justify) poorer outcomes if there are known effective interventions. We found variations (some statistically significant) both between and within the area groups created by the Office for National Statistics for grouping healthcare organisations that are most similar in terms of a range of demographic and socioeconomic conditions. Significant variation within these groups probably reflects influences other than demography and socioeconomic conditions, such as quality of health care.
Summary points
- More rigorous analysis of existing routine clinical data would allow assessment of NHS performance across a wide range of services
- Examples of such performance indicators include mental health care, potentially avoidable deaths, and forecasting coronary heart disease outcomes
- Various assumptions and technical issues need discussion and debate: the selection of indicators and targets, methods, interpretation of data, and application in productivity measurement
Application in productivity assessment
Practical ways need to be found to incorporate multiple indicators in productivity assessment: they may overlap or interact; some may be more important or relevant than others and may need to be weighted; some may reflect mismatching performance for a given time (see the above discussion on stroke). Techniques for dealing with these issues, such as weighted scores, profiles, etc, are beyond the scope of our two articles but need consideration.
High levels of activity in treatment, rehabilitation, and long term care may show desirable high productivity and improvement, but would be considered an undesirable or negative output of preventive activity for a preventable condition such as stroke.
A cross sectional approach does not take account of sequentially linked events over time, such as patients with myocardial infarction having further infarcts in due course.
Reality is even more complex than the approach taken here, and this should be acknowledged explicitly in any output to avoid sweeping simplistic generalisations during interpretation.
We have shown the feasibility of a variety of ways of measuring health related outputs and outcomes. Data from initiatives such as the new mental health minimum dataset and the new general practice contract should lead to better measurement. Any assessment of productivity requires careful matching of outcomes to the inputs used to achieve them, and this brings in a separate set of issues and assumptions that are beyond the scope of our articles.
Extra technical details of the methods described appear on bmj.com
AL's contributions to the study were made within his role at the Oxford branch of the National Centre for Health Outcomes Development, based at Oxford University, Headington, Oxford.
Contributors AL conceived of the study, drafted the article, and produced the hospital episode statistics based indicators. JC helped draft the article and produced the mental health indicators. DE helped draft the article and produced the population mortality indicators. CSp analysed hospital episode statistics data. CSa helped draft the article and produced the forecasting models. Lee Mellers helped analyse the hospital episode statistics data. Bernard Rachet contributed information on the forecasting models (cancer survival). David Rudrum provided editorial support. AL is guarantor for the study.
Competing interests All authors are involved in the work of the National Centre for Health Outcomes Development, either directly or via subcontracts. The centre is funded by the Department of Health and commissioned by it and the Healthcare Commission to develop and produce clinical and health indicators for them and the NHS. The views expressed here are those of the authors and not necessarily of the commissioners.
Ethical approval Not needed.