Chapter 2. Quantifying disease in populations

What is a case?

Measuring disease frequency in populations requires the stipulation of diagnostic criteria. In clinical practice the definition of "a case" generally assumes that, for any disease, people are divided into two discrete classes - the affected and the non-affected. This assumption works well enough in the hospital ward, and at one time it was also thought to be appropriate for populations. Cholera, for instance, was identified only by an attack of profuse watery diarrhoea, which was often fatal; but we now know that infection may be subclinical or cause only mild diarrhoea. Similarly for non-infectious diseases today we recognise the diagnostic importance of premalignant dysplasias, in situ carcinoma, mild hypertension, and presymptomatic airways obstruction. Increasingly it appears that disease in populations exists as a continuum of severity rather than as an all or none phenomenon. The rare exceptions are mainly genetic disorders with high penetrance, like achondroplasia; for most acquired diseases the real question in population studies is not "Has the person got it?" but "How much of it has he or she got?"

One approach, therefore, is to use measures that take into account the quantitative nature of disease. For example, the distribution of blood pressures in a population can be summarised by its mean and standard deviation. For practical reasons, however, it is often helpful to dichotomise the diagnostic continuum into "cases" and "non-cases". In defining the cut off point for such a division, four options may be considered:

Statistical - "Normal" may be defined as being within two standard deviations of the age specific mean, as in conventional laboratory practice. This is acceptable as a simple guide to the limits of what is common, but it must not be given any other importance because it fixes the frequency of "abnormal" values of every variable at around 5% in every population. More importantly, what is usual is not necessarily good.

Clinical - Clinical importance may be defined by the level of a variable above which symptoms and complications become more frequent. Thus, in a study of hip osteoarthritis, cases were defined as subjects with a joint space of less than 2 mm on x ray, as this level of narrowing was associated with a clear increase in symptoms.

Prognostic - Some clinical findings such as high systolic blood pressure or poor glucose tolerance may be symptomless and yet carry an adverse prognosis. Sometimes, as with glucose tolerance, there is a threshold value below which level and prognosis are unrelated. "Prognostically abnormal" is then definable by this level.

Operational - For some disorders, none of the above approaches is satisfactory. In men of 50, a systolic pressure of 150 mm Hg is common (that is, "statistically normal"), and it is clinically normal in the sense of being without symptoms. It carries an adverse prognosis, with a risk of fatal heart attack about twice that associated with a low blood pressure, but there is no threshold below which differences in blood pressure have no influence on risk. Nevertheless, practical people require a case definition, even if somewhat arbitrary, as a basis for decisions. An operational definition might be based on a threshold for treatment. This will take into account symptoms and prognosis but will not be determined consistently by either. A person may be symptom free yet benefit from treatment or alternatively may have an increased risk that cannot be remedied.

Each of these four approaches to case definition is suitable for a different purpose, so an investigator may need to define the purposes before cases can be defined.
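The limitation of the statistical definition can be checked numerically. The sketch below assumes, purely for illustration, that a variable is normally distributed in two hypothetical populations with different means and spreads; the figures are invented and do not come from any survey.

```python
from statistics import NormalDist


def fraction_abnormal(dist: NormalDist, k_sd: float = 2.0) -> float:
    """Share of a normally distributed variable lying more than k_sd
    standard deviations from its own mean (both tails combined)."""
    lower = dist.mean - k_sd * dist.stdev
    upper = dist.mean + k_sd * dist.stdev
    return dist.cdf(lower) + (1.0 - dist.cdf(upper))


# Hypothetical systolic pressure distributions (mm Hg); values are invented.
population_a = NormalDist(mu=120.0, sigma=12.0)
population_b = NormalDist(mu=140.0, sigma=20.0)

for name, dist in [("A", population_a), ("B", population_b)]:
    print(f"population {name}: {fraction_abnormal(dist):.1%} 'abnormal'")
```

Under this assumption both populations yield the same share of "abnormal" values, a little under 5%, even though pressures in population B are far higher. The cut off describes what is common in each population, not what is healthy.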

Whatever approach is adopted, the case definition should as far as possible be precise and unambiguous. A standard textbook of cardiology proposes these electrocardiographic criteria for left bundle branch block: "The duration of QRS commonly measures 0.12 to 0.16 seconds... V5 or V6 exhibits a large widened R wave..." (our italics). As a basis for epidemiological comparisons this is potentially disastrous, because each investigator could interpret the italicised words differently. By contrast, the epidemiological "Minnesota Code" defines it like this: "QRS duration ≥ 0.12 seconds in any one or more limb leads and R peak duration ≥ 0.06 seconds in any one or more of leads I, II, aVL, V5, or V6; each criterion to be met in a majority of technically adequate beats." If different studies are to be compared, case definitions must be rigorously standardised and free from ambiguity. Conventional clinical descriptions do not meet this requirement.

It is also essential to define and standardise the methods of measuring the chosen criteria. An important feature in diagnosing rheumatoid arthritis, for example, is early morning stiffness of the fingers; but two interviewers may emerge with different prevalence estimates if one takes an ordinary clinical history whereas the other uses a standard questionnaire. Cases in a survey are defined not by theoretical criteria, but in terms of response to specific investigative techniques. These, too, need to be defined, standardised, and reported adequately. As a result, epidemiological case definitions are narrower and more rigid than clinical ones. This loss of flexibility has to be accepted as the price of standardisation. 

Measures of disease frequency

For epidemiological purposes the occurrence of cases of disease must be related to the "population at risk" giving rise to the cases. Several measures of disease frequency are in common use.


The incidence of a disease is the rate at which new cases occur in a population during a specified period. For example, the incidence of thyrotoxicosis during 1982 was 10/100 000/year in Barrow-in-Furness compared with 49/100 000/year in Chester.

When the population at risk is roughly constant, incidence is measured as:

Incidence = Number of new cases / (Population at risk × time during which cases were ascertained)

Sometimes measurement of incidence is complicated by changes in the population at risk during the period when cases are ascertained, for example, through births, deaths, or migrations. This difficulty is overcome by relating the numbers of new cases to the person years at risk, calculated by adding together the periods during which each individual member of the population is at risk during the measurement period.

It should be noted that once a person is classified as a case, he or she is no longer liable to become a new case, and therefore should not contribute further person years at risk. Sometimes the same pathological event happens more than once to the same individual. In the course of a study, a patient may have several episodes of myocardial infarction. In these circumstances the definition of incidence is usually restricted to the first event, although sometimes (for example in the study of infectious diseases) it is more appropriate to count all episodes. When ambiguity is possible reports should state whether incidence refers only to first diagnosis or to all episodes, as this may influence interpretation. For example, gonorrhoea notification rates in England and Wales increased dramatically during the 1960s, but no one knows to what extent this was due to more people getting infected or to the same people getting infected more often. 
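The person years calculation described above can be sketched as follows. The `Person` record, its field names, and the five-member cohort are hypothetical, chosen only to show how entry, exit, and a first event each limit the time at risk. Incidence here is restricted to first events.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Person:
    entry: float                         # years after study start when follow-up began
    exit: float                          # years after study start when follow-up ended
    first_event: Optional[float] = None  # time of first diagnosis, if any


def is_new_case(p: Person) -> bool:
    """A first event counts only if it occurred while the person was observed."""
    return p.first_event is not None and p.entry <= p.first_event <= p.exit


def years_at_risk(p: Person) -> float:
    """Time at risk stops at the first event, or at exit if there was none."""
    end = p.exit if p.first_event is None else min(p.exit, p.first_event)
    return max(0.0, end - p.entry)


def incidence_rate(cohort: List[Person]) -> float:
    """New (first) cases per person year at risk."""
    new_cases = sum(1 for p in cohort if is_new_case(p))
    return new_cases / sum(years_at_risk(p) for p in cohort)


# Hypothetical five-year study.
cohort = [
    Person(0, 5),                 # observed throughout, never a case
    Person(0, 5, first_event=2),  # case at year 2: contributes 2 years only
    Person(1, 5),                 # migrated in after one year
    Person(0, 3),                 # died or moved away at year 3
    Person(0, 5, first_event=4),  # case at year 4: contributes 4 years
]

print(f"{incidence_rate(cohort) * 1000:.0f} new cases per 1000 person years")
```

If all episodes rather than first events were to be counted, the time at risk would no longer stop at the first diagnosis, and the report should say so.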


The prevalence of a disease is the proportion of a population that are cases at a point in time. The prevalence of persistent wheeze in a large sample of British primary school children surveyed during 1986 was approximately 3 per cent, the symptom being defined by response to a standard questionnaire completed by the children's parents. Prevalence is an appropriate measure only in such relatively stable conditions, and it is unsuitable for acute disorders.

Even in a chronic disease, the manifestations are often intermittent. In consequence, a "point" prevalence, based on a single examination at one point in time, tends to underestimate the condition's total frequency. If repeated or continuous assessments of the same individuals are possible, a better measure is the period prevalence, defined as the proportion of a population that are cases at any time within a stated period. Thus, the 12 month period prevalence of low back pain in a sample of British women aged 30-39 was found to be 33.6%.
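The difference between the two measures can be illustrated with a small sketch. It assumes that each person's symptomatic episodes are recorded as (start, end) intervals in months; the five-person sample is invented.

```python
from typing import List, Tuple

Episodes = List[Tuple[float, float]]  # (start, end) of each episode, in months


def point_prevalence(sample: List[Episodes], t: float) -> float:
    """Proportion of the sample who are cases at time t."""
    cases = sum(1 for eps in sample if any(s <= t <= e for s, e in eps))
    return cases / len(sample)


def period_prevalence(sample: List[Episodes], start: float, end: float) -> float:
    """Proportion of the sample who are cases at any time in [start, end]."""
    cases = sum(
        1 for eps in sample if any(s <= end and e >= start for s, e in eps)
    )
    return cases / len(sample)


# Hypothetical 12 month follow-up of five people.
sample: List[Episodes] = [
    [(1, 2), (8, 9)],  # two short episodes
    [],                # never symptomatic
    [(5, 7)],          # one episode in mid-year
    [],                # never symptomatic
    [(0, 12)],         # symptomatic throughout
]

print("point prevalence at month 6:", point_prevalence(sample, 6))
print("12 month period prevalence: ", period_prevalence(sample, 0, 12))
```

The first person has intermittent symptoms and is missed by the single examination at month 6, so the point prevalence is lower than the period prevalence.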


Mortality is the incidence of death from a disease.

Interrelation of incidence, prevalence, and mortality

Each new (incident) case enters a prevalence pool and remains there until either recovery or death:

Incidence -> prevalence -> recovery or death

If recovery and death rates are low, then chronicity is high and even a low incidence will produce a high prevalence:

Prevalence = incidence × average duration
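A short worked example may make the relation concrete. The figures are hypothetical, and the relation is treated as an approximation that assumes incidence and duration have been stable for some time and that the disease is uncommon.

```python
def approximate_prevalence(incidence_per_year: float, mean_duration_years: float) -> float:
    """Prevalence (a proportion) from incidence per person per year and the
    average duration of disease in years, under steady-state assumptions."""
    return incidence_per_year * mean_duration_years


# Hypothetical chronic disease: rare onset, long duration.
chronic = approximate_prevalence(5 / 100_000, 20.0)

# Hypothetical acute disease: frequent onset, lasting about a week (0.02 years).
acute = approximate_prevalence(500 / 100_000, 0.02)

print(f"chronic: {chronic * 100_000:.0f} per 100 000")
print(f"acute:   {acute * 100_000:.0f} per 100 000")
```

Although the acute disease arises a hundred times more often in this example, the chronic disease is ten times more prevalent because each case stays in the prevalence pool for so much longer.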

In studies of aetiology, incidence is the most appropriate measure of disease frequency. Mortality is a satisfactory proxy for incidence if survival is not related to the risk factors under investigation. However, patterns of mortality can be misleading if survival is variable. A recent decline in mortality from testicular cancer is attributable to improved cure rates from better treatment, and does not reflect a fall in incidence.

Prevalence is often used as an alternative to incidence in the study of rarer chronic diseases such as multiple sclerosis, where it would be difficult to accumulate large numbers of incident cases. Again, however, care is needed in interpretation. Differences in prevalence between different parts of the world may result from differences in survival and recovery as well as in incidence.

Crude and specific rates

A crude incidence, prevalence, or mortality (death rate) is one that relates to results for a population taken as a whole, without subdivision or refinement. The crude mortality from lung cancer in men in England and Wales during 1985-89 was 1034/million/year compared with 575/million/year during 1950-54. However, this bald fact masks a more complex pattern of trends in which mortality from lung cancer was declining in younger men while going up in the elderly.

Mortality from lung cancer in men in England and Wales, 1950-89, by five year age groups

It is often helpful to break down results for the whole population to give rates specific for age and sex, but it is frustrating if results are given for 35-44 years in one report, 30-49 in another, and 31 to 40 in another. When feasible, decade classes should be 5-14, 15-24, and so on, and quinquennia should be 5-9, 10-14, and so on. Overlapping classes (5-10, 10-15) should be avoided.
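The following sketch shows how a crude rate can move in the opposite direction to an age specific rate, and how ages might be assigned to the non-overlapping decade classes recommended above. The function names and all counts are hypothetical and are not the lung cancer data for England and Wales.

```python
from typing import Dict, Tuple

Table = Dict[str, Tuple[int, int]]  # age band -> (deaths per year, population)


def decade_band(age: int) -> str:
    """Assign an age in completed years to a non-overlapping decade class
    (5-14, 15-24, ...); under-fives are kept as a separate class."""
    if age < 5:
        return "0-4"
    lower = 5 + 10 * ((age - 5) // 10)
    return f"{lower}-{lower + 9}"


def per_million(deaths: int, population: int) -> float:
    return deaths / population * 1_000_000


def crude_rate(table: Table) -> float:
    """Rate for the whole population, ignoring age."""
    deaths = sum(d for d, _ in table.values())
    population = sum(n for _, n in table.values())
    return per_million(deaths, population)


def specific_rates(table: Table) -> Dict[str, float]:
    """One rate per age band."""
    return {band: per_million(d, n) for band, (d, n) in table.items()}


# Hypothetical annual deaths and populations in two periods.
earlier: Table = {"45-54": (50, 100_000), "65-74": (200, 50_000)}
later: Table = {"45-54": (30, 100_000), "65-74": (480, 80_000)}

for label, table in [("earlier", earlier), ("later", later)]:
    print(label, round(crude_rate(table)), specific_rates(table))
```

In these invented figures the crude rate rises between the two periods while the rate at ages 45-54 falls, which is the kind of pattern a crude rate conceals.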

Extensions and alternatives to incidence and prevalence

The terms incidence and prevalence have been defined in relation to the onset and presence of disease, but they can be extended to encompass other events and states. Thus, one can measure the incidence of redundancy in an employed population (the rate at which people are made redundant over time) or the prevalence of smoking in it (the proportion of the population who currently smoke).

Some health outcomes do not lend themselves to description by an incidence or prevalence, because of difficulties in defining the population at risk. For these outcomes, special rates are defined with a quasi population at risk as denominator.

Some special rates

Birth rate = Number of live births / Mid-year population
Fertility rate = Number of live births / Number of women aged 15-44 years
Infant mortality rate = Number of infant (< 1 year) deaths / Number of live births
Stillbirth rate = Number of intrauterine deaths after 28 weeks / Total births
Perinatal mortality rate = (Number of stillbirths + deaths in 1st week of life) / Total births

NB These rates are usually related to one year.
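As a worked illustration, the special rates can be computed from one year's counts. All counts below are hypothetical. Two assumptions go beyond the definitions above: "total births" is taken to mean live births plus stillbirths, and the rates are expressed per 1000.

```python
def per_thousand(numerator: float, denominator: float) -> float:
    return numerator / denominator * 1000


# Hypothetical counts for one calendar year.
live_births = 10_000
stillbirths = 50            # intrauterine deaths after 28 weeks
first_week_deaths = 30      # deaths in the first week of life
infant_deaths = 60          # deaths under 1 year of age
mid_year_population = 800_000
women_15_44 = 160_000

# Assumption: total births = live births + stillbirths.
total_births = live_births + stillbirths

rates = {
    "birth rate": per_thousand(live_births, mid_year_population),
    "fertility rate": per_thousand(live_births, women_15_44),
    "infant mortality rate": per_thousand(infant_deaths, live_births),
    "stillbirth rate": per_thousand(stillbirths, total_births),
    "perinatal mortality rate": per_thousand(
        stillbirths + first_week_deaths, total_births
    ),
}

for name, value in rates.items():
    print(f"{name}: {value:.1f} per 1000")
```

Note that the denominators differ from rate to rate. None of them is a true population at risk in the sense used for incidence, which is why these are treated as special rates.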

Sometimes the population at risk can be satisfactorily defined, but it cannot be enumerated. For example, a cancer registry might collect information about the occupations of registered cancer cases, but not have data on the number of people in each occupation within its catchment area. Thus, the incidence of different cancers by occupation could not be calculated. An alternative in these circumstances would be to derive the proportion of different types of cancer in each occupational group. However, care is needed in the interpretation of proportions. A high proportion of prostatic cancers in farmers could reflect a high incidence of the disease, but it could also occur if farmers had an unusually low incidence of other types of cancer. Incidence and prevalence are preferable to proportions if they can be adequately measured.
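The pitfall with proportions can be shown with invented numbers. In the sketch below the registry sees only the case counts. The person years at risk are included solely to show what the registry cannot calculate. Both occupational groups and all figures are hypothetical.

```python
from typing import Dict


def proportion(counts: Dict[str, int], site: str) -> float:
    """Share of a group's registered cancers that arise at the given site."""
    return counts[site] / sum(counts.values())


def incidence_per_100k(cases: int, person_years: int) -> float:
    return cases / person_years * 100_000


# Registered cases by occupational group (hypothetical).
registry_counts = {
    "farmers": {"prostate": 50, "other": 150},
    "comparison": {"prostate": 50, "other": 450},
}

# Person years at risk, which a registry would not normally know (hypothetical).
person_years = {"farmers": 100_000, "comparison": 100_000}

for group, counts in registry_counts.items():
    share = proportion(counts, "prostate")
    rate = incidence_per_100k(counts["prostate"], person_years[group])
    print(f"{group}: {share:.0%} of cancers are prostatic; "
          f"incidence {rate:.0f} per 100 000 person years")
```

Here the proportion of prostatic cancers is much higher in farmers, yet the incidence of prostatic cancer is identical in the two groups. The difference in proportions arises entirely from the lower incidence of other cancers in farmers.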