Using data from the 1991 census
BMJ 1995; 310 doi: https://doi.org/10.1136/bmj.310.6993.1511 (Published 10 June 1995) Cite this as: BMJ 1995;310:1511
- F Azeem Majeed, lecturer in public health medicine (a),
- Derek G Cook, senior lecturer in epidemiology (a),
- Jan Poloniecki, lecturer in medical statistics (a),
- David Martin, lecturer (b)
- (a) Department of Public Health Sciences, St George's Hospital Medical School, London SW17 0RE
- (b) Department of Geography, University of Southampton, Southampton SO17 1BJ
- Correspondence to: Dr Majeed.
- Accepted 13 April 1995
Box 3—Effect of data modification on small area statistics
SAS table 12—Long term illness in households: residents in households with limiting long term illness. Data taken from an enumeration district in Wandsworth Health Authority
DATA SUPPRESSION FOR AREAS WITH SMALL POPULATIONS
Small area statistics are released only for enumeration districts with 50 or more residents and 16 or more households. For enumeration districts below either of these thresholds, the only data that are available are the total number of people present, the total number of residents, and the total number of resident households. All other small area statistics for these enumeration districts are combined with those for a neighbouring enumeration district, provided that the combined total number of people and households is greater than the minimum thresholds. Local base statistics are released only for those wards with 1000 or more residents and 320 or more households. As with enumeration districts, data from wards which fall below either threshold are combined with data from a neighbouring ward. Enumeration districts or electoral wards that contain data from neighbouring areas are categorised as importing zones.
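The release rules above can be expressed as a simple check. The following is a minimal sketch only: the thresholds are those quoted in the text, but the `Area` record, the area codes, and the function names are hypothetical and are not part of any OPCS software. Treating a combined area as releasable once it meets the same thresholds is also an assumption.

```python
from dataclasses import dataclass
from typing import Tuple

# Release thresholds quoted in the text: (minimum residents, minimum households).
SAS_THRESHOLD = (50, 16)      # small area statistics, enumeration districts
LBS_THRESHOLD = (1000, 320)   # local base statistics, electoral wards


@dataclass
class Area:
    code: str        # hypothetical area identifier
    residents: int
    households: int


def is_released(area: Area, threshold: Tuple[int, int]) -> bool:
    """True if the area meets both the resident and the household threshold."""
    min_residents, min_households = threshold
    return area.residents >= min_residents and area.households >= min_households


def combine(suppressed: Area, neighbour: Area) -> Area:
    """Merge a suppressed area into a neighbouring (importing) area."""
    return Area(
        code=neighbour.code,
        residents=suppressed.residents + neighbour.residents,
        households=suppressed.households + neighbour.households,
    )


# Hypothetical enumeration districts.
small_ed = Area("ED-X", residents=49, households=30)
neighbour_ed = Area("ED-Y", residents=300, households=120)
importing_zone = combine(small_ed, neighbour_ed)
```

In this sketch `small_ed` falls below the resident threshold, so only its headline totals would be available, while `importing_zone` carries the combined counts for both areas.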
SPECIAL ENUMERATION DISTRICTS
Enumeration districts that were expected to contain 100 or more people living in one or more communal establishments (defined as an establishment in which some kind of communal catering is provided, such as nursing homes, hospitals, and prisons) on the night of the census were defined as special enumeration districts. The residents of such establishments often have different social and demographic characteristics from people living in the surrounding area. Therefore, to prevent the residents of these establishments distorting the small area statistics for the enumeration district in which they are located, these areas were treated as special enumeration districts. The total population and total number of households for special enumeration districts are published, and if the enumeration district has 50 or more residents and 16 or more households, the other small area statistics are also published. If the number of residents or households falls below either of these thresholds, then these other small area statistics for the enumeration district are suppressed. The small area statistics for special enumeration districts are not combined with those for a neighbouring enumeration district but are included in electoral ward totals.
SHIPPING ENUMERATION DISTRICTS
Every electoral ward contains one shipping enumeration district (a separate category of special enumeration district). Apart from houseboats on inland waterways (which are classified as households), vessels are treated in a similar way to communal establishments, with data on people enumerated while living on vessels published in a separate shipping enumeration district. For most areas, there are no residents of shipping enumeration districts, and no data will be present for shipping enumeration districts.
Because of data modification and data suppression, statistics derived by aggregating data for enumeration districts will differ from counts obtained directly from the small area statistics or local base statistics for the larger area. For example, if enumeration district counts of people with chronic illness are summed to calculate the total number of people with chronic illness living in a health authority, this total will differ from the total obtained directly from the local base statistics or small area statistics for the health authority. The difference arises because many people with limiting long term illness may live in special enumeration districts for which census data are suppressed, and also because of OPCS's policy of adding +1, -1, or 0 counts in enumeration district tables. To minimise the effect of this suppression and modification of data, census data from tables for the largest geographical area of interest should be used. For example, if census data for a health authority are required, then these data should be obtained from a table of census data for the health authority and not by aggregating data for enumeration districts or electoral wards within the health authority.
There was a greater problem with underenumeration in the 1991 census than in previous censuses: no census data were obtained for 2.2% of the population.9 Although this is a relatively small percentage of the population, underenumeration was not random and was highest in inner city areas and among men aged 20-29 years; about 9% of men in this age group nationally, and nearly 20% in inner London, were not enumerated.10 Underenumeration will therefore lead to errors in census data, especially for inner London, and to a lesser extent for other inner city areas.11
The census is carried out every 10 years and because of inward and outward migration, 1991 census data for small areas will become progressively more inaccurate with time. OPCS does attempt to estimate the effects of internal migration within England and Wales by using data from family health services authority age-sex registers, and of international migration by using the international passenger survey. However, these estimates are unreliable at small area level.12
Using postcodes to assign census data to individuals
One of the most common ways in which census data are used is to estimate the social and ethnic characteristics of people on the basis of their postcode. This is usually done by assigning a postcode to an enumeration district and using the census data for this enumeration district to characterise people living at the postcode. This process requires a method to assign postcodes to enumeration districts. After the 1981 census, the most common method was to use the grid references of postcodes and enumeration districts. However, Gatrell13 and Reading and Openshaw14 suggested that this method is often inaccurate, with about 40% of people being assigned to an incorrect enumeration district. This applied in England and Wales; in Scotland, postcodes are mapped directly to enumeration districts, with no boundary problems.15
To overcome this problem and improve the geographical referencing of data from the 1991 census, the postcode of each household and communal establishment was collected on the census form and entered on the computer record. This allowed OPCS to produce a new look up table (the postcode-enumeration district directory) by directly linking the postcode on the census form to the enumeration district in which that postcode fell.16 17 Some postcodes, however, lie in more than one enumeration district, which can make it difficult to assign a postcode to an enumeration district. To help overcome this problem, the look up table contains some additional information: the “pseudo-enumeration district” of the postcode (the enumeration district in which most people at that postcode live); and the number of households in each postcode-enumeration district intersection (known as part postcode units). The table also contains the Ordnance Survey grid reference of the postcode, but this is given only to a precision of 100 metres. The mapping of postcodes should improve in the future as a result of the Ordnance Survey's Address-Point initiative, which aims to allocate a grid reference to every address in England and Wales to an accuracy of one metre.18 The main limitation of the look up table is that it contains only the postcodes that existed at the time of the 1991 census and will become progressively more out of date as new postcodes are created (for example, because of the construction of new residential estates).
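One way to picture the look up table is as a list of part postcode units, each recording a postcode, an enumeration district, and a household count. The sketch below is illustrative only: the record layout, field names, postcodes, and enumeration district codes are all invented for the example and do not reproduce the actual format of the OPCS directory.

```python
from collections import defaultdict
from typing import NamedTuple


class PartPostcodeUnit(NamedTuple):
    """One postcode-enumeration district intersection (hypothetical layout)."""
    postcode: str
    ed_code: str
    households: int


def households_by_postcode(units):
    """Group part postcode units into {postcode: {ed_code: households}}."""
    table = defaultdict(dict)
    for unit in units:
        table[unit.postcode][unit.ed_code] = unit.households
    return dict(table)


def split_postcodes(table):
    """Postcodes that fall in more than one enumeration district."""
    return sorted(pc for pc, eds in table.items() if len(eds) > 1)


# Invented records: the first postcode lies wholly in one enumeration
# district, the second is split across two.
units = [
    PartPostcodeUnit("ZZ1 1AA", "ED01", 25),
    PartPostcodeUnit("ZZ1 1AB", "ED01", 30),
    PartPostcodeUnit("ZZ1 1AB", "ED02", 10),
]
directory = households_by_postcode(units)
needs_weighting = split_postcodes(directory)
```

Only the postcodes returned by `split_postcodes` need the household counts or the pseudo-enumeration district; the others map to a single enumeration district.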
Census data have an important role in helping to plan health services and in monitoring how health services are being used, and they are freely available to both NHS employees and employees of academic institutions
Census data for small areas contain inaccuracies, but the effect of these errors is substantially reduced by either aggregating data or by using data for larger areas
Data from the 1991 census were used to produce a new postcode to enumeration district look up table, which overcomes many of the problems encountered with older methods of assigning postcodes to enumeration districts
ASSIGNING ENUMERATION DISTRICTS TO INDIVIDUALS
Assigning an enumeration district to an individual living at a postcode which relates to only one enumeration district is straightforward. However, if the postcode relates to more than one enumeration district then assigning an enumeration district can be done in two ways: by using the number of households in a part postcode unit to determine the probability of individuals living in each part postcode unit (this is the preferred option); or by assigning all individuals living at a postcode to its pseudo-enumeration district (box 4).
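The two approaches can be compared with a short calculation. This is a sketch under stated assumptions: the household counts, enumeration district codes, and census percentages are invented, and the function names are hypothetical. The pseudo-enumeration district is passed in explicitly, as it would be read from the look up table rather than derived.

```python
def ed_probabilities(households_by_ed):
    """Probability of living in each part postcode unit, by household share."""
    total = sum(households_by_ed.values())
    if total == 0:
        raise ValueError("postcode has no households recorded")
    return {ed: n / total for ed, n in households_by_ed.items()}


def weighted_value(households_by_ed, ed_values):
    """Household-weighted average of an enumeration district statistic."""
    probabilities = ed_probabilities(households_by_ed)
    return sum(p * ed_values[ed] for ed, p in probabilities.items())


def pseudo_ed_value(pseudo_ed, ed_values):
    """Statistic taken from the pseudo-enumeration district alone."""
    return ed_values[pseudo_ed]


# Invented example: a postcode with 30 households in ED01 and 10 in ED02,
# and the percentage of households without a car in each district.
households = {"ED01": 30, "ED02": 10}
percent_no_car = {"ED01": 40.0, "ED02": 60.0}

weighted = weighted_value(households, percent_no_car)   # preferred option
simple = pseudo_ed_value("ED01", percent_no_car)        # pseudo-ED option
```

With these figures the weighted method gives a value between the two district percentages, whereas the pseudo-enumeration district method ignores the quarter of households that lie in the second district.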
Box 4—Obtaining weighted average of census data for individuals by using their postcode
The 1991 census for England and Wales provides a substantial amount of data on demography, ethnicity, housing tenure, employment status, and other social factors for geographical areas ranging in size from enumeration districts upwards. Many in the health service and in the academic community are making use of the data in the 1991 census. However, users of census data need to be aware of the problems and limitations of these data, which include the format of the data, data modification and suppression, sampling error, and underenumeration. An important innovation of the 1991 census was that the census form included a question on the postcode of respondents; this allowed the Office of Population Censuses and Surveys to produce a postcode-enumeration district look up table which overcomes many of the problems previously encountered in trying to assign postcodes to enumeration districts. The new look up table also includes the grid reference of postcodes, and this will improve the geographical referencing of census data.
The 1991 census provides the most detailed data on the demography of England and Wales. It also provides a substantial amount of data on deprivation, housing, and employment and was the first census to contain questions on ethnicity and chronic illness. Many in the health service will make use of the information obtained from the census; purchasers will use the data to help them improve their purchasing plans and their planning of health services, providers to help them tailor the services they offer to their catchment population, and health services researchers to compare measures of health need and the use of health services with demographic, social, and ethnic factors.1 2 In this paper we discuss the data available from the census and describe the main limitations of census data (see box 1 for a list of definitions). More detailed guidance is available in publications from the Office of Population Censuses and Surveys (OPCS).3 4
Box 1—Definitions of census terms
Enumeration district—The smallest geographical area for which census data are published
Local base statistics (LBS)—Tables of census data published for electoral wards and all larger areas
Small area statistics (SAS)—Tables of census data published for enumeration districts and all larger areas
Data modification—The random addition of +1, -1 or 0 to cells in SAS tables, and the random addition of +2, +1, 0, -1 or -2 to cells in LBS tables
Data suppression—The suppression of data in SAS and LBS tables for areas with small populations
Importing zones—Enumeration districts or electoral wards that contain data from neighbouring areas with small populations for which data have been suppressed
Special enumeration districts—Enumeration districts which were expected to contain 100 or more people living in communal establishments (such as hospitals or prisons) at the time of the 1991 census
Shipping enumeration districts—A special category of enumeration district which contains data for people living on ships at the time of the 1991 census
Part-postcode unit—The intersection of a postcode and an enumeration district
Pseudo-enumeration district—The enumeration district in which most people at a postcode live
Data available from the census
The census is carried out every 10 years by OPCS, and aims to collect data on the entire population. (However, there was a problem with underenumeration in the 1991 census; this is discussed elsewhere in this article.) The 1991 census contained about 20 questions including questions on housing, race, limiting long term illness, car ownership, and occupation. Responses were collected for all 113 196 enumeration districts (the basic geographical building blocks of census data and the smallest areas for which census data are made available) in England and Wales. The responses were then analysed by OPCS to produce a number of predefined tables, available for different geographical areas including enumeration districts, electoral wards, and health and local authorities. Users of census data are confined to using the information in these tables unless they are prepared to commission OPCS to produce additional tables. Census data have been purchased by health authorities and are available to NHS employees. Census data are also freely accessible to employees of academic institutions.
The census data published in the available tables are in two formats: local base statistics and small area statistics.5 The local base statistics contain 99 tables and are published for geographical areas down to electoral ward level. The small area statistics are a cut down version of the local base statistics and are published for all geographical areas down to enumeration district level. Because the small area statistics are available for enumeration districts, there are fewer tables in them, and the tables that are present usually contain fewer categories (for example, broader age bands) than the equivalent tables in the local base statistics; this is done to maintain confidentiality and to ensure that individuals cannot be identified. However, the small area statistics are the only source of census data for enumeration districts and the data in them have to be used for all analyses that require census data for small areas. Most of the commonly used tables are present in both the local base statistics and small area statistics, and a restricted range of tables in the small area statistics is not usually a problem. Some of the more commonly used tables from the small area statistics are listed in box 2.
Box 2—Examples of data available from commonly used small area statistics tables
Tables derived from 100% of census records
Age and marital status of residents (table 2)
People living in communal establishments (table 3)
Ethnic group (by age) of residents (table 6)
Economic position of residents (table 8)
Residents (by age) in households with limiting long term illness (table 12)
Household composition and housing (table 42)
Household space type: tenure and amenities (table 58)
Tables derived from 10% sample of census records
Socioeconomic group of households and families (table 86)
Social class of household (table 90)
Age breakdown of population
Number of people living in nursing and residential homes
Ethnic breakdown of population
Number of people with chronic illness
Number of overcrowded households
Number of households without a car
Number of households that are not owner occupied
Number of people living in households where head of household is unskilled
Number of people living in households where the head of household is in social class V
Both the local base statistics and small area statistics are available on disks for IBM personal computers, and the data in them can be accessed through several computer programs. The most commonly used program, SASPAC,6 can extract census data from local base statistics and small area statistics tables and save the data in a variety of file formats (such as dBase IV or Lotus 1-2-3) to facilitate transfer of the data to other computer programs. People who are inexperienced in working with census data often find census tables confusing and have difficulty in extracting data from them. Advice on what tables to use and how to extract data from them (such as unemployment rates) is published by OPCS in Key statistics. Definitions and cell numbers.7
Data available for a 10% sample of records
Most tables of census data are derived from an analysis of all census records. However, some of the responses to the questions in the census were only coded for 10% of the population and the corresponding census tables (some of which are shown in box 2) are derived from an analysis of this 10% sample of census records. These tables are based on questions that are classified as “hard to code,” such as occupation. Because they are based on a 10% sample of records, the data in these tables are estimates and will be subject to sampling error.
Table I illustrates the effect of sampling error on the estimate of the percentage of people in different geographical areas living in a household where the head of household is in socioeconomic group 11 (unskilled manual), using data from the Wandsworth Health Authority. As the table shows, for small areas such as enumeration districts the 95% confidence intervals can be very wide for percentages derived from tables based on the 10% sample of census records. For larger areas such as electoral wards and health authorities, the 95% confidence intervals are much narrower.
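The widening of confidence intervals in small areas can be illustrated with a standard normal approximation for a proportion. This is a sketch only and is not necessarily the method used to produce table I: it ignores any finite population correction, the normal approximation is poor for very small samples, and the sample counts below are invented.

```python
import math


def sample_percentage_ci(sample_count, sample_size, z=1.96):
    """Percentage and approximate 95% confidence interval from a sample.

    sample_count: sampled records in the category of interest
    sample_size:  all sampled records for the area (the 10% sample)
    Returns (percentage, lower, upper), with limits truncated to 0-100.
    """
    p = sample_count / sample_size
    standard_error = math.sqrt(p * (1 - p) / sample_size)
    lower = max(0.0, p - z * standard_error)
    upper = min(1.0, p + z * standard_error)
    return 100 * p, 100 * lower, 100 * upper


# Invented figures: the same 10% estimate from a small and a large sample.
enumeration_district = sample_percentage_ci(2, 20)    # about 20 sampled records
electoral_ward = sample_percentage_ci(40, 400)        # about 400 sampled records
```

Both areas give the same point estimate, but the interval for the enumeration district is several times wider than that for the ward, which is the pattern the text describes.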
Problems with using small area census data
To maintain confidentiality and prevent inadvertent disclosure of census data about individuals, OPCS adds +1, -1, or 0 randomly to the numbers contained in each cell of the small area statistics tables for enumeration districts (except for cells that contain true zeros).8 For the local base statistics tables for wards, this procedure is carried out twice, so that each number in each cell in these tables is altered by adding a number between -2 and +2.8 Data in tables based on the 10% sample of records are not modified in this way as the sampling of only one in 10 records is thought to be sufficient to maintain confidentiality.
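The modification procedure can be mimicked in a few lines. The sketch below makes two assumptions that the text does not state: that -1, 0, and +1 are equally likely, and that modified counts are never allowed to fall below zero. The function names and the cell values are hypothetical.

```python
import random


def modify_cell(true_count, rng, passes=1):
    """Randomly perturb one table cell.

    passes=1 mimics small area statistics; passes=2 mimics local base
    statistics, where the procedure is carried out twice.
    True zeros are left unchanged, as described in the text.
    """
    if true_count == 0:
        return 0
    modified = true_count
    for _ in range(passes):
        # Assumption: -1, 0 and +1 are chosen with equal probability.
        modified += rng.choice((-1, 0, 1))
    # Assumption: counts are not allowed to become negative.
    return max(modified, 0)


def modify_table(cells, rng, passes=1):
    """Apply the perturbation independently to every cell of a table."""
    return [modify_cell(count, rng, passes) for count in cells]


# The same true counts appearing in two different tables are modified
# separately, so the published figures need not agree.
rng = random.Random(1991)
true_cells = [12, 0, 7, 3]
table_a = modify_table(true_cells, rng)
table_b = modify_table(true_cells, rng)
```

Because each table is modified independently, `table_a` and `table_b` may differ from each other and from `true_cells`, except in the cell holding a true zero.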
These modifications introduce errors into the row and column totals in census tables and have three main effects. Firstly, the same information may be contained in more than one census table and because the process of modifying data is carried out separately for each table, counts of variables that are contained in more than one table may differ from each other. For example, the number of people from ethnic minorities may not be the same in different tables. This will lead to differences in rates calculated for ethnic minorities (and for other groups of the population such as people living in non-owner occupied households) depending on which table the estimate of the ethnic minority population is taken from. However, these differences are likely to be significant only when areas with small populations are being investigated.
Secondly, counts derived by aggregating data for small areas to larger geographical areas will differ from the counts in the actual tables for these larger areas. For example, it is possible to obtain the number of people living in households without a car in a health authority by three methods, all of which are likely to give slightly different results: by summing data for enumeration districts within the health authority, summing data for electoral wards within the health authority, and obtaining the number of people living in households without a car directly from a table of census data for the health authority (the preferred option).
Thirdly, the modified counts in tables will differ from the “true” counts for an area, but it is possible to calculate confidence intervals for the true count—these depend on the number of modified cells in the table (table II). Data modification can introduce errors into percentages calculated for enumeration districts with small populations; census data for small areas should therefore be interpreted with caution. If, however, data for enumeration districts are aggregated, the effect of data modification on percentages is substantially reduced, and much more reliable estimates obtained. Box 3 illustrates the effect of data modification on one of the small area statistics tables (table 12—long term illness in households: residents in households with limiting long term illness) for an enumeration district in Wandsworth Health Authority.
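The dependence of the confidence interval on the number of modified cells can be sketched as follows. This is an illustration under an explicit assumption, not a reproduction of table II: if -1, 0, and +1 are equally likely and cells are modified independently, each modification has mean 0 and variance 2/3, so a total built from n modified cells has variance 2n/3 (or 4n/3 where the procedure is applied twice). The function name is hypothetical.

```python
import math


def modification_interval(modified_total, n_cells, passes=1, z=1.96):
    """Approximate 95% interval for a true total given a modified total.

    n_cells: number of independently modified cells summed into the total
    passes:  1 for small area statistics, 2 for local base statistics
    Assumes each modification is -1, 0 or +1 with equal probability,
    giving a variance of 2/3 per cell per pass.
    """
    variance = passes * n_cells * (2.0 / 3.0)
    half_width = z * math.sqrt(variance)
    return modified_total - half_width, modified_total + half_width


# A total of 100 built from 6 modified cells, compared with 600 cells.
few_cells = modification_interval(100, 6)
many_cells = modification_interval(100, 600)
```

The half width grows only with the square root of the number of cells, which is why the relative effect of data modification falls as counts are aggregated over larger populations.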
Although using the number of households in a part postcode unit as the weighting factor is the preferred method of assigning social and ethnic characteristics to individuals on the basis of their postcode, it still has some limitations because the number of households in a part postcode unit is only an approximate measure of the number of people living in that unit. This is because the number of people per household is not constant within or between enumeration districts. Moreover, in some enumeration districts, some people live in communal establishments such as nursing or residential homes; the number of households in a part postcode unit is increased by between one and four to take these individuals into account. Another limitation of the look up table is that it gives only the probability that an individual falls into a certain category (for example, lives in a household without a car). Whether the individual has been correctly assigned to a category depends on how representative the individual is of the enumeration district in which he or she lives. However, when estimated data for individuals are aggregated to larger populations, the effect of this limitation on the estimated values is reduced.19
The 1991 census offers valuable data to academics and to people working in the health service. However, users of census data should be aware of its limitations, chiefly underenumeration, data modification, and data suppression, the effects of which are reduced substantially when census data are aggregated. The new postcode-enumeration district directory produced by OPCS overcomes many of the problems previously encountered in trying to assign postcodes to enumeration districts and will improve the geographical referencing of census data.
We thank Ivor Evans and Andrew Frost of Wandsworth Health Authority's information department for their help.