
Chapter 3. Comparing disease rates

“Is this disease increasing in incidence? Does it occur with undue frequency in my local community? Does its incidence correlate with some suspected cause? Has the outcome changed since control measures were instituted?” Answering such questions means setting two sets of rates side by side and making sense of the comparison. This chapter examines some of the problems that may arise.

Terminology and classifications of disease

Diagnostic labels and groupings are many and various, and in continual flux: in the interests of communication some standardisation is necessary, even though no single system can meet all requirements.
The ICD system
The International Classification of Diseases, Injuries, and Causes of Death, published by the World Health Organization, assigns a three-character alphanumeric code to every major condition. Often a fourth character is added for more exact specification: for example, ICD C92 is “myeloid leukaemia”, which may additionally be specified as C92.0 (“acute”) or C92.1 (“chronic”). Broader groupings are readily formed – for example, ICD C81-C96 consists of all malignant neoplasms of lymphatic and haematopoietic tissue. This system is used for coding death certificates. It also determines the presentation of results in the registrar general’s reports and in the diagnostic registers of most hospitals.
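Because every code starts with a three-character category, broad groupings such as C81-C96 can be formed mechanically. A minimal Python sketch, assuming WHO ICD-10 style codes (one letter followed by two digits, optionally a point and a fourth character); the function names are hypothetical and not part of any official ICD tool.

```python
def icd_category(code: str) -> str:
    """Return the three-character category of a code, e.g. 'C92.1' -> 'C92'."""
    return code.strip().upper()[:3]


def in_group(code: str, first: str, last: str) -> bool:
    """True if the code's category lies in the block first..last inclusive.

    Assumes every category is one letter plus two digits, so plain string
    comparison orders categories correctly ('C81' <= 'C92' <= 'C96').
    """
    return first <= icd_category(code) <= last


# Hypothetical list of coded deaths; keep those in the C81-C96 block.
codes = ["C92.0", "C92.1", "C81.9", "C50.4", "I21.0"]
lymph_haem = [c for c in codes if in_group(c, "C81", "C96")]
print(lymph_haem)
```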
The system has to be revised periodically to keep pace with medical usage. The ninth revision came into general use in 1979, and has now been superseded by the tenth revision for many applications. When the classification changes from one revision to the next, disease rates may not be directly comparable before and after the change. For example, the eighth revision included separate categories for gastric ulcer and for peptic ulcer of unspecified site, whereas the seventh revision did not make this distinction. In this situation some aggregation of categories is needed before valid comparisons can be made.
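The aggregation step amounts to mapping the finer categories of one revision onto a broader category that both revisions share, then summing. A minimal Python sketch, assuming counts are held as a mapping from category label to number of deaths; the labels, the bridging table and the counts are all hypothetical illustrations, not actual ICD rubrics.

```python
from collections import Counter

# Hypothetical bridging table: categories that one revision distinguishes
# but the other does not are mapped to a common, broader category.
BRIDGE = {
    "gastric ulcer": "peptic ulcer (any site)",
    "peptic ulcer, site unspecified": "peptic ulcer (any site)",
    "peptic ulcer": "peptic ulcer (any site)",
}


def aggregate(counts: dict) -> Counter:
    """Collapse category counts onto the common categories in BRIDGE."""
    merged = Counter()
    for label, n in counts.items():
        merged[BRIDGE.get(label, label)] += n  # unmapped labels pass through
    return merged


# Hypothetical death counts coded under an earlier and a later revision.
earlier = {"peptic ulcer": 95}
later = {"gastric ulcer": 60, "peptic ulcer, site unspecified": 40}
print(aggregate(earlier), aggregate(later))
```

After aggregation both revisions report the same category, so the two totals can be compared directly.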

Measures of association

Several measures are commonly used to summarise comparisons of disease rates between populations, each with its special applications. The definitions given here assume that rates in an “exposed” population are being compared with those in “unexposed” people. The exposure might be to “risk factors” suspected of causing the disease (for example, being bottle fed or owning a cat) or of protecting against it (for example, immunisation). Parallel definitions can be used to compare disease rates between people with different levels of exposure to a risk factor (for example, people with high or low aluminium concentrations in their drinking water).
Attributable risk is the disease rate in exposed persons minus that in unexposed persons. It is the measure of association that is most relevant when making decisions for individuals. For example, in deciding whether or not to indulge in a dangerous sport such as rock climbing, it is the attributable risk of injury which must be weighed against the pleasures of participation.
Relative risk is the ratio of the disease rate in exposed persons to that in people who are unexposed. It is related to attributable risk by the formula:
Attributable risk = rate of disease in unexposed persons × (relative risk − 1)
Relative risk is less relevant to making decisions in risk management than is attributable risk. For example, given a choice between a doubling in their risk of death from bronchial carcinoma and a doubling in their risk of death from oral cancer, most informed people would opt for the latter. The relative risk is the same (two), but the corresponding attributable risk is lower because oral cancer is a rarer disease.
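The two measures, and the formula linking them, can be checked numerically. A minimal Python sketch; the baseline rates below are hypothetical stand-ins for a common and a rare cancer, not actual bronchial or oral cancer figures.

```python
def relative_risk(rate_exposed: float, rate_unexposed: float) -> float:
    """Ratio of the disease rate in exposed to that in unexposed persons."""
    return rate_exposed / rate_unexposed


def attributable_risk(rate_exposed: float, rate_unexposed: float) -> float:
    """Difference between the disease rates in exposed and unexposed persons."""
    return rate_exposed - rate_unexposed


# Hypothetical baseline death rates per 100 000 per year, each then doubled.
for disease, baseline in [("common cancer", 100.0), ("rare cancer", 5.0)]:
    doubled = 2 * baseline
    rr = relative_risk(doubled, baseline)
    ar = attributable_risk(doubled, baseline)
    # Identity from the text: AR = unexposed rate x (RR - 1).
    assert abs(ar - baseline * (rr - 1)) < 1e-9
    print(f"{disease}: RR = {rr:.1f}, AR = {ar:.1f} per 100 000 per year")
```

Both diseases show the same relative risk of 2, but the attributable risk of the common one is 20 times larger.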
Nevertheless, relative risk is the measure of association most often used by epidemiologists. One reason for this is that it can be estimated by a wider range of study designs. In particular, relative risk can be estimated from case-control studies (see Chapter 8) whereas attributable risk cannot. Another reason is the empirical observation that where two risk factors for a disease act in concert, their relative risks often come close to multiplying. Table 3.1 shows risks of lung cancer in smokers and non-smokers according to whether or not they had worked with asbestos. Risk in smokers was about 10 times that in non-smokers, irrespective of exposure to asbestos. Attributable risk does not show this convenient invariance as often as relative risk does.
Table 3.1 Relative risks of lung cancer according to smoking habits and exposure to asbestos
Exposure to asbestos | Cigarette smoking
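The contrast between relative and attributable risk under joint exposure can be illustrated with a small calculation. A minimal Python sketch using hypothetical rates (not the Table 3.1 data), constructed so that the two relative risks multiply exactly; it compares the joint relative risk with what an additive model, in which excess risks simply add, would predict.

```python
# Hypothetical lung cancer rates per 100 000 per year, keyed by
# (smoker, asbestos worker). They follow a multiplicative model exactly.
RATES = {
    (False, False): 10.0,
    (True, False): 100.0,
    (False, True): 50.0,
    (True, True): 500.0,
}

base = RATES[(False, False)]
rr_smoking = RATES[(True, False)] / base
rr_asbestos = RATES[(False, True)] / base
rr_joint = RATES[(True, True)] / base

# Joint relative risk predicted by each model from the separate effects.
multiplicative = rr_smoking * rr_asbestos
additive = rr_smoking + rr_asbestos - 1  # excess (attributable) risks add

# Effect of smoking within each asbestos stratum.
for asbestos in (False, True):
    unexposed = RATES[(False, asbestos)]
    exposed = RATES[(True, asbestos)]
    print(f"asbestos={asbestos}: RR={exposed / unexposed:.1f}, "
          f"AR={exposed - unexposed:.1f} per 100 000")

print(f"observed joint RR {rr_joint}, multiplicative {multiplicative}, "
      f"additive {additive}")
```

The relative risk for smoking is 10 in both strata, while its attributable risk is five times larger among asbestos workers.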
Closely related to relative risk is the odds ratio, defined as the odds of disease in exposed persons divided by the odds of disease in unexposed persons. People who bet on horses will be aware that a rate or chance of one in 100 corresponds to odds of 99 to one against; in general, a rate of one in x is equivalent to odds of one to x – 1. In most circumstances the disease is rare, and the odds ratio is then a close approximation to relative risk.
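The conversion from risk to odds, and the way the odds ratio departs from the relative risk as a disease becomes common, can be seen in a few lines. A minimal Python sketch; the risks used are hypothetical.

```python
def odds(risk: float) -> float:
    """Convert a risk to odds: a risk of 1 in x gives odds of 1 to x - 1."""
    return risk / (1 - risk)


def odds_ratio(risk_exposed: float, risk_unexposed: float) -> float:
    """Odds of disease in exposed persons divided by odds in unexposed persons."""
    return odds(risk_exposed) / odds(risk_unexposed)


# Hypothetical risks, each pair with the same relative risk of 2.
rare = odds_ratio(0.002, 0.001)   # rare disease: close to 2
common = odds_ratio(0.4, 0.2)     # common disease: 8/3, well above 2
print(f"1 in 100 -> odds {odds(0.01):.5f} (= 1/99)")
print(f"rare disease OR {rare:.4f}, common disease OR {common:.4f}")
```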
Population attributable risk is the attributable risk multiplied by the prevalence of exposure to the risk factor in the population:
Population attributable risk = attributable risk × prevalence of exposure to risk factor in population
It measures the potential impact of control measures in a population, and is relevant to decisions in public health. Attributable proportion is the proportion of disease that would be eliminated in a population if its disease rate were reduced to that of unexposed persons. It is used to compare the potential impact of different public health strategies.
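Both population measures follow directly from the two group rates and the prevalence of exposure. A minimal Python sketch with hypothetical inputs; the attributable proportion is computed as the population attributable risk divided by the overall disease rate in the population.

```python
def population_attributable_risk(rate_exposed, rate_unexposed, prevalence):
    """Attributable risk multiplied by the prevalence of exposure."""
    return (rate_exposed - rate_unexposed) * prevalence


def attributable_proportion(rate_exposed, rate_unexposed, prevalence):
    """Share of the population's disease removed if everyone had the
    unexposed rate."""
    overall = prevalence * rate_exposed + (1 - prevalence) * rate_unexposed
    return (overall - rate_unexposed) / overall


# Hypothetical: risk 3% in exposed, 1% in unexposed, 25% of population exposed.
par = population_attributable_risk(0.03, 0.01, 0.25)
ap = attributable_proportion(0.03, 0.01, 0.25)
print(f"PAR = {par:.4f}, attributable proportion = {ap:.3f}")
```

Equivalently, with p the prevalence of exposure and RR the relative risk, the attributable proportion is p(RR − 1)/[1 + p(RR − 1)], which here gives 0.5/1.5, or one third.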


Confounding

In an ideal laboratory experiment the investigator alters only one variable at a time, so that any effect he observes can only be due to that variable. Most epidemiological studies are observational, not experimental, and compare people who differ in all kinds of ways, known and unknown. If such differences determine risk of disease independently of the exposure under investigation, they are said to confound its association with the disease.
For example, several studies have indicated high rates of lung cancer in cooks. Though this could be a consequence of their work (perhaps caused by carcinogens in fumes from frying), it may be simply because professional cooks smoke more than the average. In other words, smoking might confound the association with cooking.
Confounding affects the extent to which observed associations can be interpreted as causal. It may give rise to spurious associations when in fact there is no causal relation, or, at the other extreme, it may obscure the effects of a true cause.
Two common confounding factors are age and sex. Crude mortality from all causes in males over a five year period was higher in Bournemouth than in Southampton. However, this difference disappeared when death rates were compared for specific age groups (Table 3.2). It occurred not because Bournemouth is a less healthy place than Southampton but because, being a town to which people retire, it has a more elderly population.
Table 3.2 Deaths in males in Bournemouth and Southampton during a five year period
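Age group (years) | Bournemouth: No of deaths | Bournemouth: Population | Bournemouth: Annual death rate per 100 000 | Southampton: No of deaths | Southampton: Population | Southampton: Annual death rate per 100 000
< 1 | 116 | 919 | 2524 | 223 | 1897 | 2351
… | … | … | … | … | … | …
All ages | 5648 | 66 674 | 1694 | 5922 | 99 547 | 1190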
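The annual rates in Table 3.2 can be reproduced from the counts. A minimal Python sketch, assuming (as the published figures imply) that each rate is deaths divided by five times the population, that is, by person-years over the five-year period, expressed per 100 000.

```python
def annual_rate_per_100k(deaths: int, population: int, years: int = 5) -> float:
    """Deaths per 100 000 person-years, assuming the population was roughly
    constant over the period so that person-years = population x years."""
    return deaths / (population * years) * 100_000


# Counts taken from Table 3.2.
rows = {
    "Bournemouth < 1": (116, 919),
    "Southampton < 1": (223, 1897),
    "Bournemouth all ages": (5648, 66674),
    "Southampton all ages": (5922, 99547),
}
for label, (deaths, population) in rows.items():
    print(f"{label}: {annual_rate_per_100k(deaths, population):.0f}")
```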
The above example shows the dangers of drawing aetiological conclusions from comparisons of crude rates. The problem can be overcome by comparing age and sex specific rates as in Table 3.2, but presenting such data is rather cumbersome, and it is often helpful to derive a single statistic that summarises the comparison while allowing for differences in the age and sex structure of the populations under study. Standardised or adjusted rates meet this need. Two techniques are available: direct and indirect standardisation.
Direct standardisation
Direct standardisation entails comparison of weighted averages of age and sex specific disease rates, the weights being equal to the proportion of people in each age and sex group in a convenient reference population. Table 3.3 shows the method of calculation, based on mortality from coronary heart disease in men in the USA aged 35–64 during 1968. Table 3.4 gives standardised rates for men and women in the ensuing years, calculated in the same way, and shows a remarkable fall.
Table 3.3 Example of direct standardisation, based on mortality from coronary heart disease (CHD) in men in the USA aged 35 – 64, 1968
Age (years) | CHD deaths/100 000 (1) | % of reference population in age group (2) | (1) × (2)
35–44 | 93 | 34.4 | 3 199.2
45–54 | 355 | 36.0 | 12 780.0
55–64 | 961 | 29.5 | 28 349.5
Total | | 100 | 44 328.7/100 = 443
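The calculation in Table 3.3 is a weighted sum of the age specific rates. A minimal Python sketch using the table's own figures; the weights are taken as the stated proportions of the reference population and are not renormalised, matching the table's division by 100 (the printed percentages sum to 99.9 because of rounding).

```python
def direct_standardised_rate(rates, proportions):
    """Weighted average of stratum-specific rates, each weighted by the
    proportion of the reference population in that stratum."""
    if len(rates) != len(proportions):
        raise ValueError("need one proportion per rate")
    return sum(r * p for r, p in zip(rates, proportions))


# Figures from Table 3.3 (US men, 1968; ages 35-44, 45-54, 55-64).
chd_rates = [93, 355, 961]          # column (1), deaths per 100 000
reference_pct = [34.4, 36.0, 29.5]  # column (2), % of reference population
rate = direct_standardised_rate(chd_rates, [p / 100 for p in reference_pct])
print(f"{rate:.1f} per 100 000")
```

Repeating the calculation with another year's age specific rates and the same reference proportions gives directly comparable standardised rates, as in Table 3.4.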
Table 3.4 Coronary heart disease in American men and women aged 35-64: changes in age standardised mortality (deaths/100 000/year) during 1968-1974
Indirect standardisation
The direct method is suited to large studies. In most surveys the indirect method yields more stable risk estimates, because it relies on the reference population’s age specific rates rather than on the study’s own age specific rates, which may be based on only a few cases. Suppose that a general practitioner wants to test his impression of a local excess of chronic bronchitis. Using a standard questionnaire, he examines a sample of middle-aged men from his list, and finds that 45 have persistent cough and phlegm. Is this excessive? The calculation is shown in Table 3.5.
Table 3.5 Example of indirect standardisation
Age (years) | No in study (1) | Symptom prevalence in reference group (2) | Expected cases = (1) × (2)
35–44 | 150 | 8% | 12
45–54 | 100 | 9% | 9
55–64 | 90 | 10% | 9
Total | 340 | | 30
First the numbers of subjects in each age class are listed (column 1). The doctor must then choose a suitable reference population in which the class specific rates are known (column 2). (In mortality studies this would usually be the nation or some subset of it, such as a particular region or social class; in multicentre studies it could be the pooled data from all centres.) Multiplying columns 1 and 2 for each class gives the expected number of cases in a group of that age and size, based on the reference population’s rates. Summing over all classes yields the total expected number of cases, given the size and age structure of that particular study sample. Where 30 cases were expected he has observed 45, giving an age adjusted relative risk, or standardised prevalence ratio, of 45/30 = 150%. (Conventionally, standardised ratios are often expressed as percentages.)
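The steps above reduce to a short calculation. A minimal Python sketch using the figures from Table 3.5; the function name is hypothetical.

```python
def indirect_standardisation(observed, study_numbers, reference_rates):
    """Return (expected cases, standardised ratio as a percentage).

    Expected cases apply the reference population's stratum-specific rates
    to the study sample's own numbers in each stratum.
    """
    if len(study_numbers) != len(reference_rates):
        raise ValueError("need one reference rate per stratum")
    expected = sum(n * r for n, r in zip(study_numbers, reference_rates))
    return expected, 100 * observed / expected


# Figures from Table 3.5: ages 35-44, 45-54, 55-64.
expected, ratio = indirect_standardisation(
    observed=45,
    study_numbers=[150, 100, 90],
    reference_rates=[0.08, 0.09, 0.10],
)
print(f"expected {expected:.0f}, standardised prevalence ratio {ratio:.0f}%")
```

With death rates in place of symptom prevalences and observed deaths in place of cases, the same function gives a standardised mortality ratio.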
A comparable statistic, the standardised mortality ratio (SMR), is widely used by the registrar general in summarising time trends and regional and occupational differences. For example, in 1981 the standardised mortality ratio for death by suicide in male doctors was 172%, indicating a large excess relative to the general population at the time. To analyse time trends, as with the cost of living index, an arbitrary base year is taken, and its rates serve as the reference against which later years are compared.
Other methods of adjusting for confounders
The techniques of standardisation are usually used to adjust for age and sex, although they can be applied to control for other confounders. Other methods, which are used more generally to adjust for confounding, include mathematical modelling techniques such as logistic regression. These assume that a person’s risk of disease is a specified mathematical function of his exposure to different risk factors and confounders. For example, it might be assumed that his odds of developing lung cancer are a product of a constant and three parameters – one determined by his age, one by whether he smokes, and the third by whether he has worked with asbestos. A computer program is then used to calculate the values of the parameters that best fit the observed data. These parameters estimate the odds ratios for each risk factor (age, smoking, and exposure to asbestos), and each is adjusted for the others. Such modelling techniques are powerful and readily available to users of personal computers. They should be used with caution, however, as the mathematical assumptions in the model may not always reflect the realities of biology.
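To show what such a program does, here is a minimal sketch in plain Python that fits a logistic model by Newton-Raphson (maximum likelihood) to grouped counts. The counts are hypothetical, constructed so that the odds follow the multiplicative model exactly, and age is omitted to keep the example short; in practice a statistics package would be used.

```python
import math


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fit_logistic(groups, iterations=25):
    """Maximum likelihood fit of log odds = b0 + b1*x1 + ... to grouped data.

    groups: list of (covariates, cases, non_cases). Returns [b0, b1, ...];
    exp(b_k) estimates the odds ratio for covariate k, adjusted for the others.
    """
    k = len(groups[0][0]) + 1  # intercept plus one coefficient per covariate
    beta = [0.0] * k
    for _ in range(iterations):
        score = [0.0] * k                       # gradient of log likelihood
        info = [[0.0] * k for _ in range(k)]    # information matrix
        for covariates, cases, non_cases in groups:
            x = [1.0] + [float(v) for v in covariates]
            n = cases + non_cases
            p = 1.0 / (1.0 + math.exp(-sum(b * v for b, v in zip(beta, x))))
            for i in range(k):
                score[i] += (cases - n * p) * x[i]
                for j in range(k):
                    info[i][j] += n * p * (1.0 - p) * x[i] * x[j]
        step = solve(info, score)  # Newton-Raphson update
        beta = [b + s for b, s in zip(beta, step)]
    return beta


# Hypothetical counts: ((smoker, asbestos worker), cases, non-cases).
groups = [
    ((0, 0), 10, 1000),
    ((1, 0), 100, 1000),
    ((0, 1), 50, 1000),
    ((1, 1), 500, 1000),
]
b0, b_smoking, b_asbestos = fit_logistic(groups)
print(f"odds ratio smoking {math.exp(b_smoking):.2f}, "
      f"asbestos {math.exp(b_asbestos):.2f}")
```

Because these invented counts fit the model perfectly, the estimated odds ratios come out at exactly the values used to construct them (10 and 5); with real data the fit is only approximate, which is where the caution about model assumptions applies.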