Education And Debate

Ethnicity as a variable in epidemiological research

BMJ 1994;309:327 (Published 30 July 1994)
P A Senior, R Bhopal
Department of Epidemiology and Public Health, The Medical School, Newcastle upon Tyne NE2 4HH
Correspondence to: Dr Senior c/o Professor Bhopal.
Accepted 20 June 1994

Ethnicity is used increasingly as a key variable to describe health data, and ethnic monitoring in the NHS will further stimulate this trend. We identify four fundamental problems with ethnicity in this type of research: the difficulties of measurement, the heterogeneity of the populations being studied, lack of clarity about the purpose of the research, and ethnocentricity affecting the interpretation and use of data. Ethnicity needs to be used carefully to be a useful tool for health research. We make nine recommendations for future practice, one of which is that ethnicity and race should be recognised and treated as distinct concepts.

Epidemiology is the study of the distribution and determinants of disease. The main method of study, particularly for investigating the causes of disease, is to compare populations with different risks of disease. Ethnicity is a variable that is used increasingly to define populations for epidemiological studies. Differences by ethnicity in both the characteristics of populations and their experience of disease have been easy to describe, and the literature on ethnicity and health is large and growing.1 We consider here the nature of ethnicity, the attributes of sound epidemiological variables, the measurement and value of ethnicity as an epidemiological variable, and how ethnicity might best be used in future research. By reviewing critically ethnicity as a variable in epidemiology we hope to facilitate better research. This review is relevant to ethnic monitoring in the NHS.

What is ethnicity?

Ethnicity is derived from a Greek word meaning a people or tribe. The concept of ethnicity is neither simple nor precise,2-6 but it implies one or more of the following: shared origins or social background; shared culture and traditions that are distinctive, maintained between generations, and lead to a sense of identity and group; and a common language or religious tradition.3-6 Ethnic boundaries are imprecise and fluid.3,7 The cultural “melting pot” changes and blurs ethnic distinctions, but ethnic groups may remain distinct while becoming different from the original migrant group.3

Ethnicity should not be confused with nationality or with migrant status.2,3 For example, immigrants from the Indian subcontinent to the United Kingdom may be British nationals but be members of a particular ethnic group such as Sikh Punjabis. Their children born in the United Kingdom are members of their parents' ethnic group but may perceive themselves part of a larger ethnic group such as Indian, Asian, or black. They may also perceive themselves to have an additional ethnic identity relating to the host community (such as British, Scottish, or Irish). Ethnicity, particularly self defined ethnicity, depends on the context in which the definition is made.

Ethnicity and race

Ethnicity should be differentiated from race, which in the biological sciences means one of the divisions of humankind as differentiated by physical characteristics. The concept of race was first applied to humans in the eighteenth century as an arbitrary classification to aid understanding of evolution and examination of variation. The aim then was to extend to humans a taxonomic classification below the level of species. Cooper, among others, has questioned the application of the concept of race in epidemiology and the scientific validity of its presuppositions.8 While race has importance as a social and political phenomenon and may have a practical value in medicine (for example, in evaluating the probability of a diagnosis of sickle cell disease or glucose-6-phosphate dehydrogenase deficiency in a patient with African origins9), its biological significance has been deeply undermined since the second world war.

Describing human variation by racial group ought, in principle, to help clarify the genetic and environmental basis of disease. In practice, geographical variation in gene frequency is great, and, however races are defined, large numbers of populations straddle the boundaries. No race possesses a discrete package of genetic characteristics.8 Genetic diseases are not confined to specific racial groups, although the risk varies by origin. Furthermore, there is more genetic variation within than between races, and the genes responsible for morphological features such as skin colour (which are the basis of racial groupings) are few, atypical, and not associated with genes responsible for disease.9 The conclusion that race is more useful for social than for biological explanations of variations in the prevalence of disease is now widely agreed.

While race leaves numerous unclassified groups, such as Kalahari bushmen or pygmies,8 ethnicity creates a separate category for each group. Ethnicity is a socially constructed phenomenon. Since ethnic boundaries are imprecise and fluid,3-7 their definition must be made explicit before research can be done.3 Definitions will, of necessity, vary according to the requirements of the research. The terms ethnicity and race are often used interchangeably,1 with the inference that variations in the prevalence of disease between ethnic or racial groups are due, at least in part, to genetic differences. Ethnicity is also used as a euphemism for race,1 as this term has been partly discredited by association with racism. It is important, however, that researchers are clear about the differences between the two terms as they are currently used.

What characterises a sound epidemiological variable?

Standard epidemiological texts outline the nature of epidemiological variables.10 These are measures that aid the analysis of patterns of disease within and between populations (for example, sex, age, occupation, and social class). Most variables used in epidemiology indicate underlying phenomena of interest that cannot be measured easily if at all. For example, social (occupational) class is a proxy indicator of various factors such as income, education, and styles of consumption.11 Even simple variables may reflect complex differences - for example, sex may act as a proxy for genetic, hormonal, psychological, or social factors in different studies. These variables may be used to define the distribution of disease and to plan for the provision of services or to generate hypotheses about the causes of disease. Such observations do not in themselves explain disease processes, and any hypotheses generated need to be tested.

We therefore propose that the attributes of a sound epidemiological variable are as follows.

  • It should be measurable accurately

  • It should differentiate populations in some underlying characteristic relevant to health (such as income, childhood circumstances, hormonal status, genetic inheritance, and lifestyle)

  • Observed differences in patterns of disease should generate testable aetiological hypotheses or be applicable to the planning and delivery of health care.

Is ethnicity a sound epidemiological variable?

We perceive four reasons why ethnicity has not always been a valuable and sound variable: errors of measurement, heterogeneity, ambiguity about the purpose of ethnicity and health research, and ethnocentricity.

Problem of measuring ethnicity

As ethnicity is not easily measured, several methods are in use. Skin colour, which is genetically determined, is clearly based on race, and observers have classified subjects' ethnicity by means of skin colour.12,13 This method is subjective, imprecise, and unreliable. For example, an observer could not accurately distinguish by observation alone between Muslim and Hindu Punjabis, who are in several important respects culturally distinct. Given an opportunity to define their own ethnicity in health studies, they would probably not place themselves in the same ethnic group. They are, however, likely to be in the same racial group.

Country of birth, as coded on birth and death certificates, has commonly been used as an index of ethnicity.14-18 It is objective but crude. For example, India is culturally diverse with innumerable distinct ethnic groups, a complex caste system, at least eight major religions, and 15 official languages.19 Yet Indians are grouped as one by this method, a classification comparable to European. Immigrants' children are not identified by this method, which therefore becomes more inaccurate with time.2

The St James survey in Trinidad used concordance of grandparents' national origin to ascribe ethnicity.20,21 While accurately reflecting individual origins, it is rigid, ignores current lifestyle or self perception, and yielded a large heterogeneous “mixed” group (people with fewer than three grandparents from the same country).
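The concordance rule described above can be sketched in a few lines of Python. This is only an illustration, assuming that the origins of all four grandparents are recorded and that three or more from the same country ascribe that origin; the function name `ascribe_ethnicity`, the `threshold` parameter, and the placeholder countries "A" and "B" are hypothetical and are not taken from the survey.

```python
from collections import Counter


def ascribe_ethnicity(grandparent_origins, threshold=3):
    """Ascribe an origin when at least `threshold` grandparents share it.

    Hypothetical helper illustrating a concordance rule; anyone with fewer
    than `threshold` grandparents from the same country is labelled "mixed".
    """
    if len(grandparent_origins) != 4:
        raise ValueError("expected the origins of exactly four grandparents")
    # most_common(1) returns the single most frequent origin and its count.
    origin, count = Counter(grandparent_origins).most_common(1)[0]
    return origin if count >= threshold else "mixed"


# Placeholder countries, not survey data.
three_of_four = ascribe_ethnicity(["A", "A", "A", "B"])
two_of_four = ascribe_ethnicity(["A", "A", "B", "B"])
```

The sketch makes the rigidity of the rule visible: a single grandparent changes the category, and nothing about current lifestyle or self perception enters the classification.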

Hazuda et al identified Mexican Americans of Spanish origin by means of a complex algorithm that incorporated father's surname, mother's maiden name, place of birth, self assessed ethnic identity, and stated ethnicity of grandparents.22 The method was valid in reflecting both common origin and current identification but required a lot of data.

Names analysis has been used to identify people with origins in the Indian subcontinent in several studies.23-26 South Asian names are distinctive and often indicate religion,27 and endogamy is the norm.28 This simple method has been confirmed to be sensitive and specific27,29 and permits both prospective sampling and retrospective studies. However, south Asian Christians share names with white populations,27 and exogamy does occur.28 If exogamy increases, as anticipated,28 the method's discriminatory ability may decline, and the method's validity may not extend to other ethnic groups.27
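Sensitivity and specificity, the two properties by which names analysis has been validated, can be computed as follows. The sketch assumes a reference classification is available against which the name based classification is compared; the function name and all counts are hypothetical and are not taken from the cited validation studies.

```python
def sensitivity_specificity(true_pos, false_neg, true_neg, false_pos):
    """Return (sensitivity, specificity) of a classification method.

    sensitivity: proportion of true group members that the method identifies
    specificity: proportion of non-members that the method correctly excludes
    """
    sensitivity = true_pos / (true_pos + false_neg)
    specificity = true_neg / (true_neg + false_pos)
    return sensitivity, specificity


# Hypothetical counts for illustration only.
sens, spec = sensitivity_specificity(
    true_pos=90, false_neg=10, true_neg=950, false_pos=50
)
```

In these terms, people who share names with another population would be missed and so lower sensitivity, while rising exogamy would be expected to lower both measures.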

Heath has emphasised the emerging view that ethnicity is fundamentally a matter of self perception.30 Voluntary self classified ethnicity is acceptable31 (and this principle guided the classification used in the 1991 census) and has been advocated by the Commission for Racial Equality. The concept will guide the introduction of ethnic monitoring in the NHS.31 However, self assessed ethnicity is changeable over short periods and is not subject to the control of the investigator, characteristics that are counter to the principles of scientific measurement.

Problem of heterogeneous populations

The populations identified by current methods of measuring ethnicity are often too diverse to provide useful information. For example, the term Asian is too broad and masks important variations by country of origin, religion, language, diet, and other factors relevant to health and disease.7,19,32 The same criticism applies to categories such as Indian, Chinese, Pakistani, and Afro-Caribbean. A study of diet as a risk factor for coronary artery disease would give only limited insight if it compared risks of Indians and non-Indians since Indian diets are extremely diverse.33 By contrast, the findings of a study of first generation Punjabi Muslims could probably be generalised to other such populations in the United Kingdom and, though more limited in scope, would arguably be more valuable. Even within one Hindu community of Indian origin in Dar es Salaam substantial variations were found in lifestyle and socioeconomic characteristics and in risk factors for and prevalence of disease.34 In addition, there may be important variations by social class8,19 and differences between generations2,35 in ethnic groups. Unless research techniques can cope with the extreme heterogeneity of ethnic populations misleading conclusions may be drawn. Studies of broad, heterogeneous groups such as Afro-Caribbeans and Asians may have value as exploratory or pilot studies - a first step to deeper understanding.

Problem of testing aetiological hypotheses and research for planning of health services

Studies emphasising ethnic differences have drawn much attention to the potential for aetiological inquiry but have rarely led to studies that have tested hypotheses or extended our knowledge of the causes of disease; this is partly because of the problems outlined above and partly for other reasons.36 Differences in patterns of disease do not always yield testable hypotheses, but even testable hypotheses have often remained untested or have been examined only superficially. For example, environmental differences such as diet were postulated as the reason for the higher perinatal mortality observed among Indians compared with non-Indians,37 but to our knowledge this hypothesis has not yet been refuted or confirmed. A similar criticism could be made about the observations of high rates of liver cancer and oropharyngeal cancer and low rates of colorectal cancer in Asians17,18,26 and many differences in hospital morbidity.23 There are numerous published reports of ethnic differences in the pattern of diseases that have been little studied beyond the initial observation of difference.16-19,23,24,26 An important exception to this general shortcoming is the hypothesis of a causative role for insulin resistance in the pathogenesis of coronary heart disease.38,39 Asians around the world are at high risk of coronary heart disease and of diabetes40 and have been noted to be predisposed to abdominal obesity, which is associated with insulin resistance.39 After several descriptive studies of general risk factors25 and those associated with insulin resistance,38,39 randomised trials of exercise and dietary interventions are under way and the results are imminent.41 The protracted research effort required to study this hypothesis shows why so few of the ethnic variations in the pattern of diseases that are now known are likely to be pursued to a conclusion.

Aetiological research requires detailed information and focuses on relative risks, whereas research for the planning of health services requires a broad view and simple data and focuses on absolute risks.36 Research data cannot easily be collected and presented so as to meet simultaneously the needs of aetiological inquiry and the planning of health care. Most researchers of ethnicity and health have emphasised the potential to understand aetiology rather than to develop health policies and services.16-19,23,24,26 Aetiological research emphasises relative excess in different populations, describing patterns of disease with relative risks and odds ratios. Simple counts of cases, rankings of disease frequency, and disease rates are not central to the aetiological approach. For example, a paper describing morbidity among Asians in hospital does not give either number of cases or rates but focuses solely on odds ratios.23
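The contrast between relative and absolute measures can be made concrete with a small calculation. The following sketch assumes simple counts of cases and population sizes for two groups; the function name and every figure are hypothetical and do not come from any of the studies cited.

```python
def relative_and_absolute_measures(cases_a, total_a, cases_b, total_b):
    """Compare group A with reference group B on relative and absolute scales."""
    risk_a = cases_a / total_a
    risk_b = cases_b / total_b
    odds_a = cases_a / (total_a - cases_a)
    odds_b = cases_b / (total_b - cases_b)
    return {
        "risk_a": risk_a,                 # absolute risk in group A
        "risk_b": risk_b,                 # absolute risk in group B
        "relative_risk": risk_a / risk_b,
        "odds_ratio": odds_a / odds_b,
        "cases_a": cases_a,               # simple count, relevant to planning
        "cases_b": cases_b,
    }


# Hypothetical figures: a small minority group and a large reference group.
measures = relative_and_absolute_measures(
    cases_a=30, total_a=1000, cases_b=1500, total_b=100000
)
```

With these invented figures the relative risk is 2, which an aetiological report would emphasise, yet the minority group contributes only 30 cases against 1500 in the reference group, which is the information a service planner would need.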

Few published studies have focused on the provision of services (such as investigations of the acceptability and accessibility of health services to ethnic minority groups42 or the use of the numbers and rates of disease to assess the need for resources to tackle the health problems identified43). None the less, ethnic group can be a key variable for implementing health policy - for example, the need to identify infants of Indian subcontinent origin for the purpose of BCG immunisation. One possible reason for the lack of emphasis on the value of ethnicity in planning health services is ethnocentricity, which we discuss further below.

Problem of ethnocentricity

Ethnocentricity is the inherent tendency to view one's own culture as the standard against which others are judged.11 This has implications for all aspects of research on ethnicity and health. It will impinge on the design, aims, and methods of studies and the presentation and interpretation of results, making “value free” observation impossible.11 The impact of researchers' values on the presentation and interpretation of results is shown by the table, which contains data originally presented by Marmot et al.17 The authors ranked diseases in male immigrants from the Indian subcontinent by means of standardised mortality ratios relative to the male population of England and Wales (see left half of table). However, a very different impression of the importance of various diseases emerges when the diseases are ranked according to number of deaths. The primary perspective of Marmot et al was that of the performance of ethnic minorities compared with that of the general population, which is the standard approach in ethnicity and health research. The emphasis of this and most other reports has been on diseases that were more prevalent among ethnic minorities.17,18,23,24,26 Less emphasis has been given to diseases that were more prevalent in the general population in relation to the minority population and still less to conditions that were not different. In contrast, Bhopal considered the main problems of ethnic groups from the Indian subcontinent without concern for excesses or deficits but to find the common health problems confronting these groups.43,45 Both perspectives are based on valid research values but they lead to different emphases and interpretations of the same data.

Ranking of mortality in male immigrants from Indian subcontinent by standardised mortality ratios and numbers of deaths. (Total number of deaths=4352)*
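How the two rankings can diverge is easy to show with a toy calculation. The sketch below uses the standard definition of the standardised mortality ratio (observed deaths divided by expected deaths, multiplied by 100); the cause labels and all numbers are hypothetical and are not the data of Marmot et al.

```python
def standardised_mortality_ratio(observed, expected):
    """SMR: observed deaths as a percentage of those expected in the standard population."""
    return 100 * observed / expected


# Hypothetical (observed, expected) deaths by cause; not data from the table.
deaths = {
    "cause A": (20, 5.0),
    "cause B": (400, 350.0),
    "cause C": (60, 80.0),
}

# Rank causes by relative excess (SMR), highest first.
ranked_by_smr = sorted(
    deaths, key=lambda c: standardised_mortality_ratio(*deaths[c]), reverse=True
)

# Rank the same causes by the absolute number of deaths, highest first.
ranked_by_deaths = sorted(deaths, key=lambda c: deaths[c][0], reverse=True)
```

Here the cause with the highest ratio ranks last by number of deaths, so the same data yield different priorities depending on which ranking is presented.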

Ethnocentricity may influence the development and interpretation of hypotheses on the causes of variations in the prevalence of diseases. Several recent reviews have concluded that past research has too readily implied that differences in risk of disease or in uptake of services were due to cultural or genetic factors without accounting for confounding factors such as poor socioeconomic or educational status.1,36,46-48 Cooper argued that racial differences in almost all diseases have been attributed to genetic factors at some time and that the substitution of an environmental explanation for a genetic one is usual in the progress of understanding (for example, kuru).8 Alper and Natowicz considered the allure of genetic explanations and the conscious and subconscious mechanisms involved49: genetic and cultural explanations seem to attract some researchers, while environmental and social ones attract others. The explanation may lie in the values of the researcher.

Other variables used in epidemiology may be subject to some of the same problems. However, we propose that ethnicity is unusual because it suffers from the problem of measurement error, together with heterogeneity of the measured populations, and the additional complexity of cross cultural research. While social class produces heterogeneous groups there are agreed, explicit criteria by which it is measured.

Improving the value of ethnicity as an epidemiological variable

There are benefits in describing a population's health in terms of ethnicity. Consideration of ethnicity can help in planning health services and sometimes leads to new insights into the causes of disease. In addition, ethnicity is useful to doctors in the differential diagnosis of disease.

To improve the value of ethnicity as an epidemiological variable we recommend the following.

  • Ethnicity should be perceived as different from race and should not be used as a synonym for the latter biologically discredited term

  • Ethnicity's complex and fluid nature should be more widely appreciated

  • The limitations of all current methods of classifying ethnic groups should be recognised, and all reports should state explicitly how such classifications were made (definitions of ethnicity may need to be devised to suit the needs of a particular research project)

  • Investigators should recognise the potential influence of their personal values, including ethnocentricity, on scientific research and policy making

  • Socioeconomic differences should be considered at the same time, and with at least equal weight, as cultural or genetic factors when explaining differences in health between ethnic groups

  • Research on methods for ethnic classification should be given higher priority

  • Ethnicity's fluid and dynamic nature means that results of research may rapidly become out of date - results should not be generalised across time, generations, or populations with different histories of migration, except with great caution

  • Results from studies of ethnicity and health should be analysed and applied to the planning of health services

  • Observations of variations in the prevalence of a disease should be followed by detailed examination of the relative importance of environmental, lifestyle, cultural, and genetic influences.

We thank Drs Bruce Charlton, Martin White, and Waqar Ahmed for their helpful comments and Mrs Lorna Hutchinson for help in preparation of the manuscript. We also thank the anonymous assessors for their helpful comments.

