Association between population mean and distribution of deviance in demographic surveys from 65 countries: cross sectional study
BMJ 2018; 362 doi: https://doi.org/10.1136/bmj.k3147 (Published 03 August 2018) Cite this as: BMJ 2018;362:k3147
- Fahad Razak, assistant professor1,
- SV Subramanian, professor2,
- Shohinee Sarma, internal medicine medical resident3,
- Ichiro Kawachi, John L Loeb and Frances Lehman Loeb professor of social epidemiology and chair2,
- Lisa Berkman, Thomas D Cabot professor of public policy, epidemiology, and global health and population4,
- George Davey Smith, professor of clinical epidemiology5,
- Daniel J Corsi, scientist6
- 1Division of General Internal Medicine, Li Ka Shing Knowledge Institute, St Michael’s Hospital, 209 Victoria Street, Toronto, ON M5B 1W8, Canada
- 2Department of Social and Behavioral Sciences, Harvard T H Chan School of Public Health, 677 Huntington Avenue, Kresge Building 7th Floor, 716, Boston, MA 02115-6096, USA
- 3McMaster University, Hamilton, ON, Canada
- 4Harvard Center for Population and Development Studies, 9 Bow Street, Cambridge, MA 02138, USA
- 5MRC Integrative Epidemiology Unit, University of Bristol, Bristol BS8 2BN, UK
- 6OMNI Research Group, Clinical Epidemiology Program, Ottawa Hospital Research Institute, Ottawa, ON K1Y 4E9, Canada
- Correspondence to: F Razak
- Accepted 4 July 2018
Objectives To examine whether conditions related to scarcity at the left side of the distribution (anaemia, severe chronic energy deficiency, and underweight) are as strongly related to population means as conditions of excess at the right side of the distribution (overweight and obesity).
Design Observational study.
Setting 65 countries, with nationally representative cross sectional data from 1994 to 2014 obtained from the Demographic and Health Surveys.
Participants Non-pregnant women aged 20-49. Sample of 65 countries and n=524 380 for analysis of BMI; sample of 44 countries and n=316 465 for analysis of haemoglobin.
Main outcome measures The association between mean and prevalence of each category. For BMI, prevalence of severe chronic energy deficiency (SCED, BMI <16.0), underweight (BMI <18.5), overweight (BMI >25.0), and obesity (BMI >30.0) were measured; for haemoglobin, prevalence of anaemia (haemoglobin <12.0 g/dL) and severe anaemia (haemoglobin <8.0 g/dL) were examined.
Results There was a strong association between mean BMI and prevalence of overweight (r2=0.98; r=0.99; β=8.3 (8.0 to 8.6)) and obesity (r2=0.93; r=0.97; β=4.2 (3.9 to 4.5)). For left sided conditions, a moderate to strong association was found between mean BMI and prevalence of underweight (r2=0.67; r=−0.82; β=−2.7 (−3.1 to −2.2)), and a weaker association for SCED (r2=0.38; r=−0.61; β=−0.32 (−0.43 to −0.22)). There was a moderate association between mean haemoglobin and prevalence of anaemia (r2=0.46; r=−0.68; β=−10.8 (−14.5 to −7.1)) and a weaker association with severe anaemia (r2=0.30; r=−0.55; β=−0.55 (−0.81 to −0.29)).
Conclusions The associations between population means and prevalence of conditions of scarcity such as low BMI and anaemia were substantially weaker than the associations of mean BMI with conditions of excess such as overweight and obesity.
Nearly 30 years ago, Geoffrey Rose and Simon Day published a paper in The BMJ exploring the nature of “normality” and illness using data from the InterSalt study.1 They concluded that “distributions of health related characteristics move up and down as a whole: the frequency of ‘cases’ can be understood only in the context of a population’s characteristics. The population thus carries a collective responsibility for its own health and wellbeing, including that of its deviants.” They used “deviant” in a statistical sense, as the tail of a continuous distribution, but also to highlight how medicine and society create categories of otherness or difference to shift responsibility and to reassure the “normal” masses. Their statement was based on the strong correlation between population means and prevalence of deviant values for blood pressure, salt intake, body mass index (BMI), and alcohol intake in InterSalt and provided empirical support for Rose’s more expansive population strategy for prevention (box 1; fig 1),2 greatly influencing subsequent public health paradigms.6 In subsequent years, supportive evidence for the relation between population mean and prevalence of deviance has been generated across a range of measures, such as psychiatric morbidity,3 dental caries,4 educational achievement,7 and problem gambling.8
Rose’s population strategy for prevention
Rose and Day found a very strong correlation between population means and prevalence of right sided deviance using data from 52 populations in the InterSalt study (fig 1).1 This finding spanned biological measures (r=0.94 for mean BMI to the prevalence of overweight, r=0.85 for mean systolic blood pressure to prevalence of hypertension), dietary measures (r=0.78 for mean sodium intake to prevalence of high sodium intake), and behavioural measures (r=0.97 for mean alcohol intake to prevalence of heavy drinking).1
Rose and Day wrote: “Heavy drinkers of alcohol are condemned, but moderation is beyond criticism. Obesity is bad, but average weight is socially acceptable (even in overweight populations). Football hooligans are deviant reprobates, but, in a market economy especially, less conspicuous aggression is usual and actually encouraged. In each case the population as a whole disowns the tail of its own distribution: hypertension, obesity, alcoholism, and other behavioural problems can then be considered in isolation.”
Rose later outlined the implications of the strong association between mean and deviant, arguing that the problems of the deviant minority are strongly related to the characteristics of the rest of the population.2 Using the examples of blood pressure and body weight, he said: “Clearly, given the average level of blood pressure in a particular population anywhere in the world, one can infer precisely the prevalence of hypertension. Similarly, the prevalence of obesity is a function of the population’s average weight.”
The term “function” in this description implies an almost causal relation between the mean and prevalence of deviance, and Rose felt that this was supported by the observation that the dispersion (or the distance between the tails) of risk factor distributions was preserved even as the centre moved. He said that “within one population the range of variation between individuals is closely regulated by the balance between diversifying and unifying forces, and as a result it is to be expected that changes in the central tendency (average) of a population will be accompanied by a general shift, with the dispersion being more stable.” He also argued that the strength of the correlation between mean and deviance was a statistical measure of the cohesive tendencies of a population, stating that “the more uniform or across-the-board the shift, the closer must be the correlation between population average and the prevalence of deviance.”
These findings supported Rose’s population strategy for prevention and his framework for understanding disease incidence and the ideal approach to prevention. Rose described his population strategy as “the attempt to control the determinants of incidence, to lower the mean level of risk factors, to shift the whole distribution of exposure in a favourable direction.”34 This was in contrast to the conventional medical approach to disease prevention, the high risk strategy, in which only individuals identified as having increased risk are treated.
To our knowledge, Rose and Day’s original hypothesis, and most subsequent work, have focused almost entirely on deviance that occurs on the right side or upper tail of distributions (for example hypertension, overweight, and so on), which are essentially conditions of excess. Examining the relation between population means and the lower tail of deviance is critical if Rose and Day’s postulate of “coherent” populations (box 1) truly applies to all members of that population, including those who may experience deprivation and scarcity. Evidence shows that inequality between individuals in measures such as income is growing globally,9 so consideration of left sided deviance is particularly important given persistent scarcity among marginalised groups.10 The prevalence of severe chronic energy deficiency (SCED, BMI <16.0), for example, has not declined in most countries, even though obesity rates have risen globally.5
In this paper, we replicate Rose and Day’s original analyses using data on global changes in BMI and haemoglobin concentrations across low and middle income countries. We examine both tails of deviance in the BMI distribution: obesity (BMI >30.0) and SCED.11 BMI is an ideal parameter for examining both tails of the distribution, as both extremes are important to public health,12 are associated with adverse health outcomes,13141516 and highlight issues of global health and economic inequality. For haemoglobin, we focus on the left side of the distribution, examining anaemia (capillary haemoglobin concentration <12.0 g/dL) and severe anaemia (haemoglobin <8.0 g/dL).17 Our hypothesis was that the association of the mean with left sided deviance would be weaker than that with right sided deviance.
We analysed data from cross sectional surveys conducted as part of the Demographic and Health Surveys (DHS) programme. These are nationally representative household sample surveys from more than 85 low and middle income countries with a focus on child and maternal health, fertility, and nutrition.18 The target sample is women of reproductive age (15-49 years) who are selected using multistage probabilistic sampling, where primary sampling units and households are drawn from geographical sampling frames that cover the entire territory of each country.19 The response rates in DHS are very high, in many cases exceeding 90% participation in sampled households. DHS are one of the best resources available for examining population health metrics, and all data contained in this analysis are publicly available.
DHS routinely include anthropometric assessments of height and weight of adults and children, which are undertaken by dedicated and trained health investigators who accompany the interview teams. Adults are weighed wearing light clothing and without shoes using digital Seca scales with a precision of 0.01 kg. Standing height is measured without shoes using Shorr stadiometers designed for use in survey settings and recorded to the nearest 0.1 cm. In many of the surveys, haemoglobin is measured using a finger prick blood test at the time of the survey. Samples are analysed immediately in the field by health investigators using a portable HemoCue analyzer (201+).
Study population and sample size
We used the most recent available survey from each country with at least one completed survey, covering 65 countries from 1994 to 2014.5 Non-pregnant women aged 20-49 with complete data on height and weight were selected from these surveys (n=524 380). Analyses on haemoglobin were conducted among 316 465 women in 44 countries. This study uses publicly available de-identified data and was considered exempt by the ethics board at the Harvard School of Public Health.
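The inclusion criteria for the BMI sample can be expressed as a simple filter. The sketch below assumes a hypothetical `Respondent` record with fields `age`, `pregnant`, `weight_kg`, and `height_cm`; real DHS recode files use their own variable names and coding, which are not reproduced here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Respondent:
    # Hypothetical record layout for illustration; not the DHS variable names.
    age: int
    pregnant: bool
    weight_kg: Optional[float]
    height_cm: Optional[float]


def eligible(r: Respondent) -> bool:
    """Non-pregnant women aged 20-49 with complete height and weight data."""
    return (
        20 <= r.age <= 49
        and not r.pregnant
        and r.weight_kg is not None
        and r.height_cm is not None
    )


sample = [
    Respondent(25, False, 55.0, 158.0),
    Respondent(18, False, 50.0, 155.0),  # excluded: younger than 20
    Respondent(30, True, 60.0, 160.0),   # excluded: pregnant
    Respondent(40, False, None, 150.0),  # excluded: weight missing
]
analytic = [r for r in sample if eligible(r)]
print(len(analytic))
```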
BMI was calculated by taking the weight in kilograms and dividing by the square of height in metres. We measured two outcomes at the “high” end (right hand side) of the distribution, overweight (BMI >25.0) and obesity (BMI >30.0), and two at the “low end” (left hand side), underweight (BMI <18.5) and SCED (BMI <16.0).
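As a minimal sketch of the definitions above, the functions below compute BMI from weight in kilograms and height in centimetres and flag each category using the cut-offs stated in the text, including the haemoglobin thresholds for anaemia (<12.0 g/dL) and severe anaemia (<8.0 g/dL). The function names are illustrative. Note that the categories overlap by construction: every SCED case is also underweight, and every obese case is also overweight.

```python
def bmi(weight_kg: float, height_cm: float) -> float:
    """BMI = weight (kg) divided by the square of height (m)."""
    height_m = height_cm / 100.0
    return weight_kg / height_m ** 2


def bmi_flags(value: float) -> dict[str, bool]:
    # Cut-offs as stated in the text; categories are nested, not mutually exclusive.
    return {
        "sced": value < 16.0,
        "underweight": value < 18.5,
        "overweight": value > 25.0,
        "obese": value > 30.0,
    }


def haemoglobin_flags(hb_g_dl: float) -> dict[str, bool]:
    # Haemoglobin in g/dL, with the anaemia thresholds used in the study.
    return {"anaemia": hb_g_dl < 12.0, "severe_anaemia": hb_g_dl < 8.0}


value = bmi(80.0, 160.0)
print(round(value, 2), bmi_flags(value))
```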
For haemoglobin, we focus on the left side of the distribution: anaemia (haemoglobin <12.0 g/dL) and severe anaemia (haemoglobin <8.0 g/dL).17 Unlike the relation between net overconsumption and high BMI, high haemoglobin (erythrocytosis or polycythaemia) is most often a secondary effect of underlying illness (for example, obstructive lung disease with impairment of oxygenation) and less frequently a pathological state related to a primary haematological malignancy or to a range of rare genetic conditions.20 Adverse health consequences (such as thromboembolism) outside the population with haematological malignancy have been poorly studied, but, importantly, empirical evidence indicates no harm.21 Therefore, we did not examine polycythaemia in this analysis, as it does not mirror the type of continuous distributions with deviant tails associated with risk that Rose and Day were focusing on (such as blood pressure, body weight, or alcohol intake).
All means and prevalence estimates were weighted using sampling weights provided by the DHS. The relation between mean and prevalence of high and low values by country was summarised using scatter plots, Pearson correlation coefficients, and linear regression.1 Analyses were performed using Stata statistical software (version 15.1). The strength of the association between mean and deviant was assessed through the r2 value from linear regression models.
Rose and Day tried to protect against what they described as the “autocorrelation” that would occur if distributional skew affected the mean.1 To account for this effect, they performed a sensitivity analysis in which “correlations were also calculated between the prevalence of high values and the mean of the remainder of the population, excluding those high values.”1 Rose and Day did not describe this further, so to test whether this effect explained our results, we performed sensitivity analyses that accounted for the effect of extremes at both the left and right tails on mean estimation. We excluded the right 0.5% and 2.5% tails of the country level BMI distribution before calculating the mean when examining overweight and obesity. Similarly, we excluded the left 0.5% and 2.5% tails of the BMI distribution when examining underweight and SCED, and of the haemoglobin distribution when examining anaemia. The mean of this truncated distribution was then correlated against the prevalence of deviance. In a second sensitivity analysis, we removed the deviant cases for each distribution (for example, obese individuals) and calculated a new mean (for example, mean BMI) of this truncated sample. We used this new mean to correlate against the prevalence of deviance.
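The two sensitivity analyses described above differ only in how the mean is recalculated. The sketch below is unweighted for brevity (the study used DHS sampling weights), and the helper names are hypothetical. The first function drops a fixed fraction of observations from one tail, rounded to the nearest whole number of observations; the second drops the deviant cases themselves.

```python
from typing import Callable, Sequence


def mean_excluding_tail(values: Sequence[float], fraction: float, tail: str) -> float:
    """Mean after dropping `fraction` of observations from the left or right tail."""
    ordered = sorted(values)
    k = round(len(ordered) * fraction)  # number of observations to drop
    if tail == "right":
        kept = ordered[: len(ordered) - k]  # safe when k == 0 (keeps everything)
    elif tail == "left":
        kept = ordered[k:]
    else:
        raise ValueError("tail must be 'left' or 'right'")
    return sum(kept) / len(kept)


def mean_excluding_cases(values: Sequence[float], is_case: Callable[[float], bool]) -> float:
    """Mean of the non-deviant remainder, for example mean BMI among non-obese women."""
    kept = [v for v in values if not is_case(v)]
    return sum(kept) / len(kept)


values = list(range(1, 201))  # toy data, not BMI
print(mean_excluding_tail(values, 0.025, "right"))  # drop top 2.5% (5 values)
print(mean_excluding_tail(values, 0.025, "left"))   # drop bottom 2.5% (5 values)
print(mean_excluding_cases(values, lambda v: v > 150))
```

Each recalculated country mean would then be correlated against the unchanged prevalence of deviance, as in the main analysis.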
Rose and Day noted that the association between mean and deviant may not be linear in some cases (such as for their analysis of sodium intake, fig 1) but did not perform any additional analysis around this. To allow for non-linear effects, we included a sensitivity analysis adding a quadratic term for the mean to the linear regression model.
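A quadratic sensitivity model regresses prevalence on both the mean and the mean squared, and then compares r2 with the linear fit. The following standard-library sketch solves the least squares normal equations directly; the helper names and the toy curve (a convex decline towards zero, loosely resembling how underweight prevalence flattens at higher mean BMI) are assumptions for illustration, not study data. For real data, centring the mean before squaring it improves numerical conditioning.

```python
from typing import Sequence


def fit_polynomial(x: Sequence[float], y: Sequence[float], degree: int) -> list[float]:
    """OLS fit of y on 1, x, ..., x**degree via the normal equations (X'X) b = X'y."""
    n = degree + 1
    xtx = [[sum(xi ** (i + j) for xi in x) for j in range(n)] for i in range(n)]
    xty = [sum(yi * xi ** i for xi, yi in zip(x, y)) for i in range(n)]
    a = [row[:] + [rhs] for row, rhs in zip(xtx, xty)]  # augmented matrix
    for col in range(n):  # Gaussian elimination with partial pivoting
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n + 1):
                a[r][c] -= factor * a[col][c]
    coef = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        coef[i] = (a[i][n] - sum(a[i][j] * coef[j] for j in range(i + 1, n))) / a[i][i]
    return coef  # coef[k] multiplies x**k


def r_squared(x: Sequence[float], y: Sequence[float], coef: Sequence[float]) -> float:
    fitted = [sum(c * xi ** k for k, c in enumerate(coef)) for xi in x]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


x = [18.0 + i for i in range(9)]            # hypothetical country means, 18 to 26
y = [0.8 * (26.0 - xi) ** 2 for xi in x]    # hypothetical convex prevalence curve
linear = fit_polynomial(x, y, 1)
quadratic = fit_polynomial(x, y, 2)
print(round(r_squared(x, y, linear), 3), round(r_squared(x, y, quadratic), 3))
```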
No patients were involved in setting the research question or the outcome measures, nor were they involved in developing plans for design or implementation of the study. No patients were asked to advise on interpretation or writing up of results. There are no plans to disseminate the results of the research to study participants or the relevant patient community.
The dataset contained 65 countries and 524 380 participants. Participation rates among eligible women exceeded 90% in 61 countries.18 Country level data are provided in supplementary table 1.
Association of mean and deviance
For right sided deviance, there was a strong and nearly linear association of mean BMI with overweight (r2=0.98) and with obesity (r2=0.94). For left sided deviance, there was a moderate association with underweight (r2=0.69) and a weaker association with SCED (r2=0.41) (fig 2, table 1). At higher mean BMIs, the prevalence of underweight and SCED approached zero. The β coefficients show that each unit increase in mean BMI was associated with a 4.2 percentage point increase in obesity prevalence, compared with a 0.32 percentage point decline in SCED prevalence, a 13-fold difference in magnitude. Overweight prevalence increased by 8.3 percentage points per unit change in mean BMI, the largest magnitude of change in any category.
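The 13-fold figure follows directly from the reported slopes. The short calculation below uses the β values from the abstract (percentage points of prevalence per unit of mean BMI) and also shows the analogous ratio for the less extreme categories, overweight versus underweight, which is derived here rather than reported in the text.

```python
# Linear regression slopes (beta) reported in the text:
# percentage points of prevalence per 1 unit increase in mean BMI.
betas = {"overweight": 8.3, "obesity": 4.2, "underweight": -2.7, "sced": -0.32}

# Magnitude of change at the right tail relative to the left tail.
obesity_vs_sced = abs(betas["obesity"]) / abs(betas["sced"])
overweight_vs_underweight = abs(betas["overweight"]) / abs(betas["underweight"])
print(round(obesity_vs_sced, 1), round(overweight_vs_underweight, 1))
```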
We found a moderate association between mean haemoglobin and anaemia (r2=0.46) and a weaker relation with severe anaemia (r2=0.30) (fig 3).
Supplementary tables 2 and 3 report the correlations of mean BMI and mean haemoglobin with each category of deviance through a series of sensitivity analyses. On the full sample, the correlation was moderate to strong for left sided entities, ranging in magnitude from 0.55 to 0.82; correlation for right sided entities was extremely strong and of a substantially higher magnitude, ranging from 0.97 to 0.99. After eliminating possible autocorrelation, the relative strength of correlations compared with the full sample was unchanged (that is, correlations were still weaker for left sided entities than for right sided entities), although correlations were weaker across all outcomes in the sensitivity analyses with cases removed. In regression models (table 1), adding a quadratic term increased r2 for SCED, underweight, and obesity; the stronger relation between mean and right sided deviance than between mean and left sided deviance persisted.
This paper has two important findings. Firstly, the relation between mean BMI and right sided deviance (conditions of excess: overweight and obesity) across 65 low and middle income countries was extremely strong, similar to the results observed by Rose and Day for BMI and other biological and behavioural measures examined in the InterSalt study.1 Secondly, the relation between mean and left sided deviance (conditions of scarcity) was modest for underweight, and substantially weaker for the most extreme forms of undernutrition (SCED) and anaemia.
Comparison with other studies
A commentary by Khaw and Marmot listed several studies that showed a strong association between right sided deviance and population mean, for measures such as psychiatric morbidity,3 dental caries,4 educational achievement,7 and problem gambling.28 They provided a single example of left sided deviance correlated with the mean—“low achievement” on educational testing was correlated with the mean educational achievement scores.27 The construction of education scores used a complex statistical method: a matrix sampling design was used, in which very few questions were asked of individual students; imputation of all remaining questions used demographic data (described as “conditioning”); principal components analysis of these conditioned data was performed to estimate the latent quantity of “proficiency;” proficiency scores were generated; and linear transformation of these data (described as “concurrent calibration”) was used to link scores between cycles.722 Consequently, the final outcome of low achievement relies heavily on theoretical distributions and data smoothing rather than directly measured data at the individual level. This is in stark contrast to the direct measurement in individuals of haemoglobin, height, and weight that we examine here.
Since Khaw and Marmot’s commentary, authors citing Rose and Day’s work have continued to show the empirical association between mean and right sided deviance for outcomes such as gambling,23 substance abuse,24 and BMI.25 Although some studies examined outcomes at the left side of the curve (such as whether more fractures occur with decreasing bone density),26 they did not statistically examine and interpret the strength of correlations between mean and deviants. The relation of mean to right sided deviance has also been used in statistical modelling, for example to impute the prevalence of hypertension from mean blood pressure27 and to explore population strategies of prevention that assume the strong relation of mean to right sided deviance continues to hold.2829 We are not aware of other literature that has examined the correlation between population means and the prevalence of left sided deviance for parameters with implications for human health. Underweight and anaemia are core metrics in population health reporting, and examining how their prevalence changes is crucial for a comprehensive understanding of population health in low and middle income countries.51330 These findings build on evidence generated from observational data from various low, middle, and high income countries, which show that as average BMI increases, the spread of the BMI distribution widens, with disproportionate gains at the right side of the distribution.3132333435 Similar analysis has not been conducted for haemoglobin, but anaemia is strongly related to poverty and low education,30 as has been shown for SCED.5
This paper has several limitations. Firstly, in contrast to haemoglobin, BMI is a measure for which both tails of the distribution have clear negative consequences for health and are related to overconsumption and underconsumption, respectively. Future research should examine other measures such as bone density that share this property.36 Secondly, high prevalence conditions (such as overweight) probably have a greater effect on the mean than low prevalence conditions (such as SCED), and this may partially explain our findings. But even among conditions with similar prevalence, we found a weaker correlation between mean and the left sided tail (underweight and moderate anaemia) than the right tail (obesity). In addition, the confidence intervals around correlations (supplementary tables 2 and 3) support the finding that the correlation of mean with left sided entities is weaker than that with right sided entities. Thirdly, DHS data only contain information on women, so trends for men may be different.
In the discussion section of their original paper, Rose and Day described the implications of their findings separately for research, prevention, and society and government, and we follow that framing here.
Rose and Day begin this section by stating: “It is now clear that the problem of the high-risk deviant minority can be understood only when considered in the context of the whole population. The prevalence of hypertension, and many other markers of deviance, is a secondary phenomenon whose underlying explanation must be sought among the population as a whole.” Global changes in BMI are an important counterexample to this statement: as mean BMI shifts, the magnitude of change in the prevalence of SCED and underweight is substantially smaller than the magnitude of increase in the prevalence of overweight and obesity. Rose noted that the distributional changes in BMI did not seem to fit the uniform population change hypothesis,2 but did not, to our knowledge, explore the effect of this on the relation of mean to deviant.37 The weaker relation between mean BMI and undernutrition is mirrored in the association of mean haemoglobin with anaemia, and especially severe anaemia. As has been argued previously, changes in the BMI distribution, with disproportionate increases in the right tail and marked increases in dispersion,33 question Rose’s focus on the mean for many measures3738 and a similar focus on the mean in public health reporting.39404142 For example, interventions that improved population health by reducing the prevalence of both SCED and obesity would reduce dispersion in the BMI distribution, though they might not change mean BMI.3337 Other health parameters where left sided deviance influences human health (such as folate deficiency, low height, or low bone mineral density) should be examined in future research to determine whether the findings for anaemia and underweight apply to other conditions of scarcity. Right sided conditions that are desirable, such as increased lung function or high IQ, could also be examined.
Critically, research and reporting around population trends in health metrics must include measures of dispersion along with measures of centrality.
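A toy example makes concrete the point that an intervention reducing both SCED and obesity can leave the mean untouched while shrinking dispersion. The five BMI values and the intervention (raising the SCED case and lowering the obese case by the same 5 units) are entirely hypothetical; the calculation reports the mean, the population standard deviation, and the prevalence of each tail before and after.

```python
from statistics import fmean, pstdev

# Hypothetical BMI values: one SCED case (<16.0) and one obese case (>30.0).
before = [15.0, 20.0, 22.0, 24.0, 34.0]
# Hypothetical intervention: SCED case rises by 5 units, obese case falls by 5 units.
after = [20.0, 20.0, 22.0, 24.0, 29.0]

for label, data in (("before", before), ("after", after)):
    sced = 100 * sum(v < 16.0 for v in data) / len(data)
    obese = 100 * sum(v > 30.0 for v in data) / len(data)
    # The mean is identical in both cases; only the spread and the tails change.
    print(label, fmean(data), round(pstdev(data), 2), sced, obese)
```

Reporting only the mean would show no change here, even though both tails of deviance have been eliminated.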
Previous literature has shown a widening of the BMI distribution; for example, an increase in standard deviation of 0.3 units per 1.0 unit increase in mean BMI across low and middle income countries.33 This translates to the 95th centile of the BMI distribution rising at 2.5 times the rate of the 5th centile and a 13-fold greater increase in obesity prevalence than decline in SCED in these settings.5 The imbalance in the evolution of the BMI distribution may be more consistent with a log normal distribution than with a normal distribution38 or may be a distributional shape generated by a dynamic balance between drift towards a natural setpoint and diffusion from external effects (such as change in exercise or diet).43 Recent data from twin studies indicate that the proportion of these changes explained by the genetic component of variance differs across segments of the BMI distribution.44 Rose’s observation about the preserved distance between distributional tails assumed a symmetric bell shaped distribution for most measures,3738 and the weaker correlation of left sided deviance with the mean may be driven by the progressive right dominant imbalance that has emerged as the BMI distribution has evolved.
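To illustrate why a log normal shape produces a right dominant imbalance, the sketch below assumes (hypothetically) that BMI is log normally distributed with a fixed spread on the log scale, so that every centile scales by the same factor when the median rises. In absolute BMI units the 95th centile then moves further than the 5th. The median values and the log-scale SD of 0.15 are illustrative assumptions, not estimates from this study or from the cited work, so the resulting ratio should not be compared with the 2.5 figure above.

```python
from math import exp, log
from statistics import NormalDist

Z95 = NormalDist().inv_cdf(0.95)  # standard normal 95th centile, about 1.645


def lognormal_centiles(median: float, log_sd: float) -> tuple[float, float]:
    """5th and 95th centiles of a log normal distribution with given median and log-scale SD."""
    mu = log(median)
    return exp(mu - Z95 * log_sd), exp(mu + Z95 * log_sd)


# Hypothetical scenario: median BMI rises from 21 to 25 with a constant log-scale SD.
p5_before, p95_before = lognormal_centiles(21.0, 0.15)
p5_after, p95_after = lognormal_centiles(25.0, 0.15)

rise5 = p5_after - p5_before
rise95 = p95_after - p95_before
print(round(rise5, 2), round(rise95, 2), round(rise95 / rise5, 2))
```

Under this assumption the ratio of the two rises equals exp(2 × 1.645 × 0.15), so a wider log-scale spread would make the imbalance larger.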
In his broader body of work, Rose was remarkably forward thinking in advocating a population strategy for prevention.45 He criticised the traditional preventive strategy in public health for focusing only on the right sided tail and missing the importance of shifting average population levels. His theories galvanised the public health community, emphasising the need to think of health at the population level.6 In the 25 years since his death, increasing evidence has shown that the distributions of some population metrics such as BMI are pulling apart,37 and underweight and anaemia in low and middle income countries are important examples of situations where focusing on the tails may be critical to enhancing equity and improving population health. The determinants of these conditions of scarcity are profoundly different from those of conditions of excess such as obesity and high cholesterol, and a single population strategy focused on the mean would not be plausible or effective.3745 Emerging evidence on the change in height distributions over the past half century indicates a near complete dissociation between changes in the mean and in dispersion,46 providing another example of how changes in the mean may be an insufficient measure of population change. These are examples of human risk factor distributions at odds with the shared underlying cause and preserved distributional widths of the risk factors that Rose and Day were examining, and given these findings Rose and Day would have been unlikely to argue for a focus on the mean. The potential for “vulnerable populations” has been discussed previously in an exploration of how Rose’s population strategy may widen disparities,47 and this analysis provides empirical evidence for the phenomenon of a left behind population in an observational setting.
Dispersion of the BMI distribution is also increasing in high income settings,3132 though the left tail there is unlikely to be related to the poverty driven chronic caloric deficiency that is a dominant factor in low income countries,5 and the right tail is associated with low socioeconomic status overall (though results are more complex within population subgroups).48 Theoretical approaches to prevention that could narrow the BMI distribution in such settings, through a focus on increasing fruit and vegetable intake and increasing physical activity, have been proposed but require empirical testing.49
Whether the left tail of a distribution has adverse consequences for human health is critical when analysing distributional changes and strategies for prevention. For example, there is no convincing evidence that low concentrations of cholesterol are associated with harm,50 so focusing on mean population reductions in cholesterol may be important independent of changes in dispersion. For BMI, however, although the association between the mean and left sided deviance is weaker than for right sided deviance, a population approach focused on reduction of mean BMI would still be predicted to increase the prevalence of SCED and underweight (table 1). For some populations this could have severe consequences: in India, for example, the prevalence of underweight is tenfold greater than the prevalence of obesity.5 Consequently, we propose that, for measures where increases in the mean are accompanied by rising dispersion, and where both tails of the distribution have consequences for health, the case for a single population intervention may be compromised.
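The prediction that lowering mean BMI would raise the prevalence of SCED and underweight can be read off the linear slopes. The sketch applies the β values reported in the abstract to a hypothetical 1 unit reduction in mean BMI. It is a linear extrapolation only: it ignores the quadratic terms examined in the sensitivity analysis and the fact that prevalence cannot fall below zero, so it is indicative within the observed range of country means rather than a forecast.

```python
# Linear slopes (beta) reported in the text: percentage points per 1 unit of mean BMI.
betas = {"overweight": 8.3, "obesity": 4.2, "underweight": -2.7, "sced": -0.32}


def predicted_change(slopes: dict[str, float], shift_in_mean: float) -> dict[str, float]:
    """Predicted change in prevalence (percentage points) for a shift in mean BMI."""
    return {name: beta * shift_in_mean for name, beta in slopes.items()}


# Hypothetical population strategy: mean BMI lowered by 1 unit.
change = predicted_change(betas, -1.0)
print(change)
```

Under these slopes, a 1 unit fall in mean BMI predicts falls of 8.3 and 4.2 percentage points in overweight and obesity, alongside rises of 2.7 and 0.32 percentage points in underweight and SCED.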
For society and government
Rose and Day said: “It suits society to alienate its problem minorities and to regard them as independently responsible for their problems.” In an era of rising income inequality between individuals,9 this statement is undoubtedly true, as societies have shown a tolerance for rising dispersion. Using the example of changes in BMI in low and middle income countries, although 60% of these countries showed no decline in SCED prevalence in over a decade of follow-up,5 90% of these countries simultaneously experienced an increase in obesity rates.10 Rose and Day concluded their article by stating, “What is needed is an acceptance of the collective responsibility for the population’s health and social well-being.” Their argument was based on the strong relation of mean to deviant, but it remains as true today in light of our tolerance for rising dispersion, whether economic or for fundamental measures of health such as BMI or haemoglobin.
What is already known on the topic
In The BMJ nearly 30 years ago, Geoffrey Rose and Simon Day showed that the population mean of a risk factor was strongly associated with prevalence of “deviance” or illness, for example mean BMI to obesity or mean blood pressure to hypertension
Rose and Day’s findings, and many subsequent papers extending these findings to other measures, focused almost entirely on how the population mean relates to illness at the right side of the distribution. These are often conditions of excess, such as excess alcohol intake
What this study adds
Using nationally representative data from 65 countries, this paper shows that the relation between population mean and deviance is weaker for conditions of scarcity at the left side of the distribution. Two critical population health parameters, BMI and haemoglobin, are used to demonstrate this finding
Rose’s expansive population strategy for prevention is central to many current perspectives on population health, and was supported, in part, by the empirical work of Rose and Day. This study places a critical lens on how growing inequalities and distributional changes affect population health and the relation of mean to deviance
Contributors: All authors reviewed the literature, provided critical revisions and drafted and edited the manuscript. Dataset download, merging and statistical analysis performed by DC. FR is the overall guarantor. The corresponding author attests that all listed authors meet authorship criteria and that no others meeting the criteria have been omitted.
Funding: No specific funding was received for this work. GDS works in the Medical Research Council Integrative Epidemiology Unit at the University of Bristol (MC_UU_00011/1), which is supported by the Medical Research Council and the University of Bristol.
Competing interests: All authors have completed the ICMJE uniform disclosure form (available on request from the corresponding author) and declare: no support from any organisation for the submitted work; no financial relationships with any organisations that might have an interest in the submitted work; no other relationships or activities that could appear to have influenced the submitted work.
Data sharing: All data available from: https://dhsprogram.com/data/
Transparency: The lead author (FR) affirms that the manuscript is an honest, accurate, and transparent account of the study being reported; that no important aspects of the study have been omitted; and that any discrepancies from the study as planned (and, if relevant, registered) have been explained.
This is an Open Access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/.