Experimentation and social interventions: a forgotten but important history

BMJ 1998;317:1239 (Published 31 October 1998) doi: https://doi.org/10.1136/bmj.317.7167.1239
- Ann Oakley, professor
- Accepted 1 October 1998
The research design of the randomised controlled trial is primarily associated today with medicine. It tends either to be ignored or regarded with suspicion by many in such disciplines as health promotion, public policy, social welfare, criminal justice, and education. However, all professional interventions in people's lives are subject to essentially the same questions about acceptability and effectiveness. As the social reformers Sidney and Beatrice Webb pointed out in 1932, there is far more experimentation going on in “the world sociological laboratory in which we all live” than in any other kind of laboratory, but most of this social experimentation is “wrapped in secrecy” and thus yields “nothing to science.”1
- Many social scientists argue that randomised controlled trials are inappropriate for evaluating social interventions, but they ignore a considerable history, mainly in the United States, of the use of randomised controlled trials to assess different approaches to public policy and health promotion
- A tradition of experimental sociology was well established by the 1930s, built on the early use of controlled experiments in psychology and education
- From the early 1960s to early 1980s randomised experiments were considered the optimal design for evaluating public policy interventions in the United States, and major evaluations using this design were carried out
- This approach became less popular as policy makers reacted negatively to evidence of “near zero” effects
- Lessons to be learnt about implementing randomised controlled trials in real life settings include the difficulty of assessing complex multi-level interventions and the challenge of integrating qualitative data
The Webbs argued for a more “scientific” social policy, with social scientists being trained in experimental methods and evaluations of social interventions being carried out by independent investigators. They were apparently unaware that a strong tradition in experimental sociology had already been established, mainly in the United States. This was a precursor to a period between the early 1960s and the late 1980s when randomised controlled trials became the ideal for American evaluators assessing a wide range of public policy interventions. This history is conveniently overlooked by those who contend that randomised controlled trials have no place in evaluating social interventions. It shows clearly that prospective experimental studies with random allocation to generate one or more control groups are perfectly possible in social settings. Notably, too, the history of experimentation in social science predates that in medicine in certain key respects.
A short history of control groups
The original meaning of “control” is “check”—the word comes from “counter-roll,” a duplicate register or account made to verify an official account.2 The term “control” entered scientific language in the 1870s in the sense of a standard of comparison used to check inferences deduced from an experiment. The main use of the term was in experimental psychology.3
In 1901 the American educationalists Thorndike and Woodworth identified the need for a control group in their experiments on the use of training to improve mental function.4 A series of experiments with schoolchildren that addressed questions about the transferability of memory skills from one subject to another, reported by Winch in 1908,5 were among the first to use the design of pretest, intervention, post-test in the experimental group and pretest, nothing, post-test in the control group. These educational and psychology researchers invented randomised assignment to experimental treatments and Latin square designs independently of, and considerably earlier than, R A Fisher's work at the Rothamsted Agricultural Research Station.6 The psychologist C S Peirce introduced both the idea of randomisation and that of “blindness” into psychology experiments in the 1880s.7
Selection of experimental and control subjects by means of the principle of chance is described in McCall's How to Experiment in Education, published in 1923: “Representativeness [of research subjects] can be secured by making a chance selection from the total group, or a chance selection from a chance portion of the total group …. Just as representativeness can be secured by the method of chance, so equivalence may be secured by chance …. One method of equating by chance is to mix the names of the subjects to be used. Half may be drawn at random. This half will constitute one group while the other half will constitute the other group.”8 McCall's book also describes the Latin square design under the name of the “rotation experiment”; this had been used in educational experiments as early as 1916.9
The major impetus driving these new approaches to assessing effectiveness was not the desire to imitate natural science, but, rather, to respond to an uneasiness within the research community of educational psychology about the inability of existing evaluation methods to rule out plausible rival hypotheses. Similar methodological developments were occurring in other spheres. For example, in 1924-5 an experiment using a mail campaign to increase electoral turnout was carried out in Chicago, in which housing precincts were assigned either to receive individual mail appeals or not.10 This experiment followed earlier research which had suggested that the strength of local party organisation was the main factor distinguishing voters from non-voters, but the research design used in the first study had made it impossible to have confidence in this finding. Thus, in the social field as well as later in medicine, the advantages of prospective experimental studies with randomly chosen controls were seen to offer an important solution to the problem of linking intervention with outcomes.
Two other American social scientists, Ernest Greenwood at Columbia University and F Stuart Chapin at the University of Minnesota, pioneered the application of experimental methods to the study of social problems in the early decades of the 20th century. Chapin first wrote on this theme in 1917; his Experimental Designs in Sociological Research, published in 1947, details nine experimental studies carried out by his research team and a number undertaken elsewhere covering such topics as rural health education, the social effects of public housing, recreation programmes for “delinquent” boys, and the effects of student participation in extracurricular activities.11 Chapin was particularly interested in reviewing the use of experimental research designs in “the normal community situation” because of the objection, voiced at the time, that experimental studies could only be done in “laboratory” settings.
Ernest Greenwood's Experimental Sociology, published in 1945, outlined the theoretical rationale for applying experimental methods to social issues.12 He defined an experiment as “the proof of a causal hypothesis through the study of two controlled contrasting situations,” recommended the use of case studies as a prelude to experimental research, and supported Fisher's strategy of randomisation as the best way of securing equivalent study groups. Chapin's and Greenwood's interest in experimental research designs was stimulated by the social reform concerns of the Depression, and informed by a desire to establish the most effective methods of improving people's lives. Their work was part of a general move in the United States to make social science more experimental; by 1931 at least 26 universities there were offering courses in experimental sociology.13
A golden age of evaluation
Donald Campbell and Julian Stanley's Experimental and Quasi-experimental Designs for Research, published in 1966,14 is to social research what Fisher's Design of Experiments (1935) is to medical research. Campbell's paper “Reforms as experiments” established an explicit link between social reform and the use of rigorous experimental design.15 His complaint that the randomised control group design had not often been used in the social arena prompted another American experimentalist, Robert Boruch, to publish a bibliography of such studies in 1974.16 This listed 83 “randomised field experiments” in such areas as criminal justice, legal policy, social welfare, education, mass communications, and mental health. A revised version of the bibliography produced four years later updated the total in these areas to 245.17
This period in the United States has been nicknamed the “golden age of evaluation.”18 It was one in which there was an enormous burst of activity in applying the randomised controlled trial design to the evaluation of public policy. The table shows nine of the major evaluations of broadly based social programmes initiated between the 1960s and early 1980s. Four of the studies were income maintenance experiments,19–23 one focused on an experimental housing allowance scheme, 24 25 two examined programmes for supporting disadvantaged workers, 19 26 and two examined interventions for former prison inmates.27 All the studies included one or more prospectively generated control groups, formed either by some method of random allocation or by matching. Supporting all this effort was a government mandate specifying that 1% of budgets for social programmes had to be spent on evaluation. There was widespread recognition that social services were in a mess while expenditure on them was rising exponentially; and, for a time at least, there was a consensus in policy circles that randomised controlled experiments provided the best way of assessing effectiveness.
Other evaluations (not shown in the table) carried out during this period included the Manhattan bail bond experiment with pre-trial release for prisoners,28 the Rand Corporation's well known study of health insurance (several components of which used a randomised controlled trial design),29 and studies of educational performance contracting.30
The reasons why the use of randomised controlled trials in evaluating policy interventions has become less attractive in the United States over the past 20 years are as interesting as those explaining its acceptance in the first place. A primary one was disenchantment with the apparent ineffectiveness (sometimes seemingly damaging effects) of the interventions in some of the evaluations. Secondly, policy makers were often impatient with the length of time it took for evaluations of their favoured approaches to provide answers: this was particularly marked in the case of the income maintenance experiments. As Senator Moynihan appositely said, “The bringing of systematic inquiry to bear on social issues is not an easy thing. There is no guarantee of pleasant and simple answers, but if you make a commitment to an experimental mode it seems to me … something larger is at stake when you begin to have to deal with the results.”31
All claims to successful expertise need to tackle the issue of causal inference—how do people know that what they do works, and how can they reasonably demonstrate this to others? As Stanley noted in 1957, “Expert opinions, pooled judgements, brilliant intuitions, and shrewd hunches are frequently misleading.”32 Among the reasons why randomised controlled trials gained legitimacy in medicine was the realisation that the decisions of the medical profession need to be regulated.33 The history of social experimentation indicates clearly that all the same issues have attended attempts to evaluate the impact of social interventions.
Experts in the social domain, like those in medicine, have resisted the notion that rigorous evaluation of their work is more likely to give reliable answers than their own individual preferences. When randomised controlled trials find that new “treatments” are no better than old ones, a retreat to other methods of evaluation is particularly likely, as though the prime task is not to identify whether anything works but to prove that something does.
The forgotten history of social experimentation also shows that, as in clinical research, implementing randomised controlled trials in real life settings commonly carries a number of hazards: low participation rates or high attrition, problems with “informed consent,” unanticipated side effects of the intervention, and a problematic relation between research and policy.
There are many lessons to be learnt from this experience about the challenges of randomised controlled trials, including the difficulty of establishing the effectiveness of complex multi-level interventions and the problem of integrating ethnographic or qualitative data. But, as Chapin wrote in 1931, “Experimental method in sociology does not mean interference with individual movement or freedom. It does not endanger life or limb or moral character.”34 On the contrary, what randomised controlled trials offer in the social domain is exactly what they promise to medicine: protection of the public from potentially damaging uncontrolled experimentation and a more rational knowledge about the benefits to be derived from professional intervention.