Education And Debate

Comparing healthcare outcomes

BMJ 1994; 308 doi: (Published 04 June 1994) Cite this as: BMJ 1994;308:1493
C Orchard
National Casemix Office, Information Management Group of the NHS Executive, Winchester, Hampshire SO23 9JA.

    Governments are increasingly concerned to compare the quality and effectiveness of healthcare interventions but find this a complex matter. Crude hospital statistics can be dangerously misleading and need adjusting for case mix, but identifying and weighting the patient characteristics which affect prognosis are problematical for conceptual, methodological, and practical reasons. These include the inherently uncertain nature of prognosis itself and the practical difficulties of collecting and quantifying data on the outcomes of interest for specific healthcare interventions and known risk factors such as severity.

    Need for outcome measures

    The combined pressures of “medical inflation,”1 fiscal constraints, and a shift in attitudes towards publicly funded services during the 1980s have made the search for measures of quality, efficiency, and effectiveness in health care a government priority in industrialised countries. Providers of health services are increasingly required to account for the resources they use. In order to do so they must devise ways of measuring the outcomes of their activities and the extent to which these meet specified objectives, be they medical, social, or financial.

    The complexity of what hospitals do makes such measurement difficult even in the narrow specialty of acute care, not least because the resources used, the choice of treatments, and the observed results (Donabedian's structure, process, and outcome2) are so dependent on the sorts and conditions of patients admitted - the hospital “case mix.” Attempts to measure this to date have focused largely on resource consumption, by collating diagnosis and procedure codes in order to form groups which are homogeneous in costs (proxied by length of hospital stay). However, if comparisons of performance between hospitals are to be meaningful, diagnosis related groups or their English equivalents, healthcare resource groups, will have to be supplemented by measures of input and output. In other words, data about patients will need to be made manageable by being grouped in terms of needs for treatment and the prognoses arising from it.

    Casemix developers in the United States and elsewhere have found that data about patients (as opposed to treatments) are difficult to organise in this way, but one approach is to work from the assumption that patients' needs for health care are best defined in terms of the expected outcomes of specified treatment options. Developing casemix groups on these lines will then entail addressing the conceptual and practical problems surrounding (a) the nature of prognosis, (b) the sources of data about patients, (c) the statistical analysis of those data, (d) the identification and weighting of patient risk factors, and (e) the definition and measurement of severity.


    The nature of prognosis itself is a major difficulty - firstly, because differential rates of illness and recovery are substantially associated with non-medical factors such as socioeconomic class, diet, smoking, family support, genetic predispositions, and physiological reserve. In other words, there is a need to distinguish between health outcomes and healthcare outcomes.3 Even for factors whose effects on general health outcomes are established, little is known about their relative weight and the reasons they affect some people more than others. Secondly, prognosis for specific conditions, taking account of medical interventions, is unlikely ever to be other than uncertain and inherently probabilistic4 even if accurate data on all known risk factors could be collected. Finally, estimating these probabilities is complicated by the nature of the outcomes (see box).

    Assessing medical outcomes

    • Outcomes are multidimensional

    • Most outcomes are qualitative

    • Assessment of outcomes will be affected by timing

    • Subgroups of diseases may have differing outcomes

    • Outcomes may not be attributable to specific treatments

    For example, prognosis is multidimensional. At least five aspects have been identified of even a restricted definition of “outcomes” which focuses on the negative results for patients - namely, death, disease, disability, discomfort, and dissatisfaction.5 Many outcomes are qualitative, raising the question of how and by whom they are to be quantified. Their assessment will be affected by the time period over which they are made. For instance, over what period of weeks or years should outcomes for hip replacements be evaluated? Outcomes are likely to vary across subgroups of a disease and may or may not be attributable to specific treatments.4

    Even if consensus can be reached about the anticipated results that particular interventions are intended to achieve, variations between observed and expected outcomes are difficult to assess. Quality of care is commonly assessed on the basis of negative outcomes, but as death is unlikely in most cases, other, more ambiguous outcomes will have to be measured. Quantifying such qualitative concerns as pain or discomfort, loss of mobility, or other kinds of disablement is fraught with difficulty, and the methods by which this is done need to be rigorous - not least in making explicit the assumptions and values on which they are based.

    Sources of data

    More difficulties surround the sources from which patient related data are obtained. The only unambiguous method of equalising risk among groups of patients having different treatments is through randomised controlled trials.1 The collection of data from these under the auspices of the Department of Health's research and development strategy6 and their publication as meta-analyses are most welcome. However, extensive use of randomised controlled trials is prohibited by practical difficulties (for instance, achieving true randomisation and the cost in time and money), doubts about the extent to which generalised conclusions about “typical” patients can be applied to particular groups (such as the very old or very young), and ethical problems surrounding informed consent and the withholding of beneficial or application of potentially harmful treatments.1 Moreover, randomised controlled trials are likely to be better tests of effectiveness (whether the chosen treatment “works”) than of quality (whether the treatment is being delivered properly).

    In the United States the need for valid outcome measures is regarded as particularly urgent for economic reasons. The Office of Health Economics (personal communication) estimates that by 1992 healthcare expenditure accounted for over 14% of gross domestic product (more than twice the percentage of the United Kingdom), 40% of this being reimbursed by the government. Because of the limitations of randomised controlled trial data, resort has been made to routine observational data from hospital discharge summaries and health insurance claims for comparing performance between providers. These are adjusted for risk under the Health Care Financing Administration's prospective payment system based on diagnosis related groups (in some states refined by severity scoring, as diagnosis related groups explain resource use rather than prognosis).

    Sources of data on patient outcomes

    • Results of randomised controlled trials (“gold standard”)

    • Meta-analysis (combined results of randomised controlled trials)

    • Routine observational data (surveys, hospital discharge summaries, etc)

    • Cross design synthesis (combined randomised controlled trial results and observational data)

    However, as doubts are widespread about the accuracy of such data and the bias and confounding inherent in their use, the United States General Accounting Office has advocated an alternative source - namely, the so called cross design synthesis, which combines the results of randomised controlled trials with the contents of large databases. The intention is to compensate for the weaknesses of both sources of data by combining their strengths.7 Unfortunately, it is doubtful whether the problems associated with using observational data from routine sources can be overcome so easily.3,8

    More accurate assessments of prognosis and the risk factors concerned will call for additional data to be collected, either from medical records or from patient surveys. With this in mind many patient surveys have been designed, both generic (such as those using the SF 36 questionnaire and the Nottingham health profile) and disease specific (such as the quality of life index for cancer patients or the Stanford health assessment questionnaire for those suffering from arthritis), though more research is needed to validate them.9 Meanwhile, in the United Kingdom better access to clinical data is being brought nearer by the clinical terms project,10 Read codes, and other information management improvements taking shape under the information management and technology strategy of the NHS.11

    Statistical problems

    Nevertheless, major practical difficulties need to be overcome before analyses based on observational data - whether or not combined with randomised controlled trials - could be regarded as credible. The first and most obvious problem, as noted above, is the accuracy of the data themselves. The coding, recording, and measurement of routine patient data in hospitals may be adequate for internal management but not for outcomes evaluation. Secondly, allowance must be made for random variations, which may account for much of the difference between providers' results. When numbers of cases are large enough it is possible to discover whether or not these differences are statistically significant, but when numbers are very small it is difficult to be sure that any variations are not entirely due to chance.12
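    The point about chance can be illustrated with a minimal sketch, which is not drawn from the article's sources. It computes an approximate 95% Wilson score interval for a death rate; the function name and the case counts are hypothetical and chosen only to show how the same crude rate carries very different uncertainty at different volumes.

```python
import math


def wilson_interval(deaths, cases, z=1.96):
    """Approximate 95% Wilson score interval for a proportion (deaths / cases)."""
    if cases <= 0:
        raise ValueError("cases must be positive")
    p = deaths / cases
    denom = 1 + z ** 2 / cases
    centre = (p + z ** 2 / (2 * cases)) / denom
    half_width = z * math.sqrt(p * (1 - p) / cases + z ** 2 / (4 * cases ** 2)) / denom
    return centre - half_width, centre + half_width


# Hypothetical figures: both providers have a crude death rate of 10%.
small_provider = wilson_interval(2, 20)
large_provider = wilson_interval(200, 2000)

print("small provider:", small_provider)
print("large provider:", large_provider)
```

    Under these assumed figures the interval for the small provider runs from roughly 3% to 30%, whereas that for the large provider runs from roughly 9% to 11%, so a difference between two small providers' rates could easily be due to chance alone.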

    Less tractably, observational data are inevitably prone to bias and confounding. Biases arise because in normal medical practice patients are not selected for treatment randomly (as they would be in randomised controlled trials) but on the basis of a range of factors to which different clinicians attach different weights. Decisions about referrals and treatments are made on the basis of individual prognostic assessments, which are likely to be influenced by non-medical factors as well as by estimates of disease severity, comorbidity, and the anticipated effectiveness and acceptability of the intervention.

    Confounding - that is, the presence of additional factors which render apparent associations spurious - is particularly difficult to eliminate, as several recent studies illustrate. For example, controversy surrounded the respective outcomes for transurethral resection of the prostate compared with open surgery for prostatectomy,13,14 and for angioplasty compared with coronary artery bypass grafting for patients with severe angina.15,16 In both cases it was argued that what appeared to be better outcomes for transurethral resection of the prostate and coronary artery bypass grafting had been confounded by patient factors - namely, age and frailty in one case and severity in the other. At the same time researchers need to beware of cancelling out the effects of one confounding variable by overadjusting for another. Data on respiratory illness might be adjusted for cigarette smoking, for instance, and mask the effects of socioeconomic class.
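    How a patient factor can reverse an apparent difference between treatments can be shown with a small worked example. The counts below are invented for illustration and do not come from the studies cited; "procedure A" and "procedure B" are hypothetical labels, and the two risk strata stand in for a confounder such as frailty.

```python
# Hypothetical (deaths, patients) counts by procedure and risk stratum.
# Procedure A is assumed to be given mostly to high risk patients.
counts = {
    "procedure A": {"low risk": (1, 100), "high risk": (30, 400)},
    "procedure B": {"low risk": (8, 400), "high risk": (10, 100)},
}


def crude_rate(strata):
    """Death rate ignoring the risk stratum."""
    deaths = sum(d for d, _ in strata.values())
    patients = sum(n for _, n in strata.values())
    return deaths / patients


def stratum_rates(strata):
    """Death rate within each risk stratum."""
    return {name: d / n for name, (d, n) in strata.items()}


crude = {proc: crude_rate(strata) for proc, strata in counts.items()}
by_stratum = {proc: stratum_rates(strata) for proc, strata in counts.items()}

for proc in counts:
    print(proc, "crude:", round(crude[proc], 3), "by stratum:", by_stratum[proc])
```

    With these assumed counts procedure A has the higher crude death rate (6.2% against 3.6%) yet the lower rate within each stratum (1% against 2% in low risk patients and 7.5% against 10% in high risk patients). The crude comparison reflects who was selected for each procedure rather than the procedures themselves.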

    Risk adjustment

    Notwithstanding these difficulties, the need for information about outcomes makes the trend towards using risk adjusted observational data for this purpose unstoppable. The research implications may be considerable, as the efforts devoted to researching the aetiology of diseases and the relative effectiveness of different treatments have not been matched by similarly extensive research into patient characteristics associated with good or bad outcomes.
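    One common form of risk adjustment, offered here only as an illustrative sketch and not as the method of any system discussed in this article, is to compare the deaths observed at a hospital with the number expected if reference rates applied to its own case mix. The strata, reference rates, and hospital figures below are all hypothetical.

```python
# Hypothetical reference death rates for three risk strata.
reference_rates = {"low": 0.01, "medium": 0.05, "high": 0.20}


def expected_deaths(cases_by_stratum, rates):
    """Deaths expected if the reference rates applied to this case mix."""
    return sum(n * rates[stratum] for stratum, n in cases_by_stratum.items())


def observed_to_expected(observed, cases_by_stratum, rates):
    """Ratio of observed to expected deaths (1.0 means 'as expected')."""
    expected = expected_deaths(cases_by_stratum, rates)
    if expected == 0:
        raise ValueError("expected deaths are zero; ratio is undefined")
    return observed / expected


# Hypothetical hospitals: X admits mostly low risk patients, Y mostly high risk.
hospital_x = {"cases": {"low": 800, "medium": 150, "high": 50}, "deaths": 25}
hospital_y = {"cases": {"low": 100, "medium": 300, "high": 600}, "deaths": 120}

results = {}
for name, h in (("X", hospital_x), ("Y", hospital_y)):
    crude = h["deaths"] / sum(h["cases"].values())
    ratio = observed_to_expected(h["deaths"], h["cases"], reference_rates)
    results[name] = (crude, ratio)
    print(name, "crude rate:", round(crude, 3), "observed/expected:", round(ratio, 2))
```

    On these invented figures hospital Y has a crude death rate nearly five times that of hospital X (12% against 2.5%) but a lower ratio of observed to expected deaths (about 0.88 against about 0.98). The adjustment is only as good as the strata and reference rates behind it, which is precisely where the problems of data quality, bias, and confounding arise.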

    To date, the growing body of published evidence on outcomes mainly reflects debates about conceptual and methodological matters. Most of the papers examining prognostic factors associated with specific diseases concern acute manifestations or surgical interventions. Comparatively few address the problem of chronic or intractable conditions such as AIDS, arthritis, or diabetes - perhaps understandably, as research into outcomes for chronic conditions such as rheumatoid arthritis is bedevilled both by the problem of agreeing on and measuring the outcomes of interest17 and by the nature of the disease itself, whose clinical course is unpredictable and in which improvements may occur regardless of therapy.18

    For acute conditions there is evidence that the associations between patient characteristics and results of treatments vary, not only between conditions but between outcomes. In the United States a study of Medicare patients undergoing elective cholecystectomy or transurethral resection of the prostate found that “the factors associated with death were not the same factors associated with adverse occurrence [complications] or failure to rescue [death after an adverse occurrence].”19

    A similar conclusion about differential effects of patient risk factors was reached in another American study of five acute conditions - stroke, lung cancer, pneumonia, acute myocardial infarction, and congestive heart failure. It was found that blood urea nitrogen value, consciousness level, and systolic blood pressure were key clinical findings for all five conditions, and arterial pH and respiratory rate were key findings for four of them. However, about a fifth of the key clinical findings were condition specific. The researchers concluded that generic factors may be helpful in predicting hospital mortality but that they are not the sole predictors.12,20

    Published work suggests that the following factors are predictive for many acute conditions: age of the patient, severity of the illness at presentation, comorbidity, physiological test results, and previous history or frailty of the patient, or both. Their effects are differential and may be difficult to quantify - even age, as there are suggestions that biological rather than chronological age is important.21 Severity (of secondary as well as primary diagnoses) is often significant, but its definition and measurement are problematical, as discussed below. Like age, comorbidity may or may not be relevant to the condition or treatment being assessed; moreover the recording of secondary diagnoses is notoriously difficult to standardise. Physiological tests will vary according to the condition (for example, antigens or nuclear shape of cancer cells for neoplasms, weight loss before surgery) and may need to be very detailed and specific. Finally, recording of previous history, like comorbidity, is difficult to standardise.

    Common patient risk factors in acute illnesses

    • Age

    • Severity

    • Comorbidity

    • Physiological findings

    • Previous history or frailty


    It seems incontrovertible that allowance must be made for severity when assessing healthcare outcomes, but doing this is far from straightforward. Firstly, clinicians' subjective opinions of severity are not only important determinants of the type of care received but may also be prognostic factors in their own right because of their influence on interactions with patients. Secondly, failure to respond to treatment may be an indication of severity - or an indication that the chosen treatment is inappropriate. Thus for outcome measurements severity needs to be assessed solely in terms of patients - that is, excluding aspects of the structure or process of health care.3

    Several severity scoring systems have been developed, the best known from the United States, but each uses a different definition of what has been described as an “important but nebulous concept.”22 These range from “the treatment difficulty presented to physicians due to the extent and interactions of a patient's diseases”23 to the risk of imminent death in an intensive care unit24 or the risk of imminent organ failure.25 Some systems correlate increasing case complexity with intensity and costs or, using cancer staging as a model, rate the relevant disease rather than the patient.26

    Some of these severity measures are computerised and use retrospective data from hospital discharge summaries. This has obvious drawbacks if the intention is to assess the quality of care, not least because it becomes difficult to separate iatrogenic events from those occurring because of the disease process. The prospective and exogenous (that is, patient rather than treatment focused) systems entail collecting additional data from medical records. This will mean extra expense and compound the difficulties of achieving quality and consistency of data within and between hospitals.


    The problems discussed above are a formidable list. Nevertheless, in the present philosophical and economic setting they are likely to be viewed as challenges rather than insurmountable barriers. Clearly casemix comparisons based on similar needs for care and expected prognoses are essential if purchasers of health care are to assess the appropriateness of existing referral and treatment patterns and make decisions based on quality and equity as well as cost. However, the benefits could be much wider. Doctors and patients would surely gain from having manageable data about the outcomes associated with particular prognostic factors, so that discussions about treatment options could take place on the basis of shared knowledge about probable risks.

    Realising these benefits will entail balancing the ease of collection and doubtful validity of generalised routine data against the better prognostic estimations and greater costs of disease specific physiological data.27 Until more is known about costs, benefits, and valid methods it will be advisable to proceed with caution, building up information about prognostic groupings condition by condition and using empirical data from medical audit systems and outcomes research to validate them.

    While this is taking place the conclusion reached in the United States, notwithstanding the sums already spent on outcomes research there, is that for the time being risk adjusted outcomes data are best used for quality management by hospitals themselves, flagging up areas for internal investigation. Publication of crude mortality figures inadequately adjusted for risk would be counter productive if it resulted here, as it did in the United States, in the most egregious rates being found at a hospice for terminally ill patients somewhere in the West.28

    Summary points

    • Hospital activities are complex and difficult to measure and their outcomes data need adjusting for case mix

    • Existing casemix systems focus on treatments, but outcomes data will need to focus on patients

    • Prognosis itself is inherently uncertain and substantially affected by non-medical factors

    • Medical prognoses are multidimensional, qualitative, time specific, and disease specific and may be difficult to attribute to specific treatments

    • Using observational data for outcomes measurement means allowing for inaccuracy, bias, confounding, and chance

    • Identifying and weighting patient characteristics associated with poor outcomes entail rigorous research

    • Prognostic factors, including severity, are difficult to quantify

    • The benefits of casemix adjusted outcomes data should outweigh the problems and costs in producing them


    I thank Mr Trevor Sheldon, of the Centre for Health Economics, University of York, and Drs Phil Anthony, Siva Prakash, and Hugh Sanderson, of the National Casemix Office, for helpful comments.