- Ted J Kaptchuk, associate professor of medicine¹,
- John M Kelley, assistant professor of psychology and statistics²,
- Lisa A Conboy, instructor of medicine¹,
- Roger B Davis, associate professor of medicine and biostatistics³,
- Catherine E Kerr, instructor of medicine¹,
- Eric E Jacobson, lecturer⁴,
- Irving Kirsch, professor of psychology⁵,
- Rosa N Schyner, research associate¹,
- Bong Hyun Nam, research fellow¹,
- Long T Nguyen, research fellow¹,
- Min Park, research coordinator¹,
- Andrea L Rivers, research coordinator¹,
- Claire McManus, research coordinator¹,
- Efi Kokkotou, assistant professor of medicine³,
- Douglas A Drossman, professor of medicine⁶,
- Peter Goldman, professor emeritus⁷,
- Anthony J Lembo, assistant professor of medicine³
- ¹Osher Research Center, Harvard Medical School, 401 Park Drive, Boston, MA 02215, USA
- ²Endicott College, 376 Hale Street, Beverly, MA 01915, USA
- ³Beth Israel Deaconess Medical Center, 330 Brookline Avenue, Boston, MA 02215, USA
- ⁴Department of Social Medicine, Harvard Medical School, 641 Huntington Avenue, Boston, MA 02215, USA
- ⁵Department of Psychology, University of Hull, Hull HU6 7RX
- ⁶Center for Functional GI and Motility Disorders, University of North Carolina School of Medicine, Chapel Hill, NC 27699, USA
- ⁷Harvard Medical School, 25 Shattuck Street, Boston, MA 02115, USA
- Correspondence to: T J Kaptchuk
- Accepted 2 March 2008
Objective To investigate whether placebo effects can experimentally be separated into the response to three components—assessment and observation, a therapeutic ritual (placebo treatment), and a supportive patient-practitioner relationship—and then progressively combined to produce incremental clinical improvement in patients with irritable bowel syndrome. To assess the relative magnitude of these components.
Design A six week single blind three arm randomised controlled trial.
Setting Academic medical centre.
Participants 262 adults (76% women), mean (SD) age 39 (14), diagnosed with irritable bowel syndrome by Rome II criteria and with a score of ≥150 on the symptom severity scale.
Interventions For three weeks either waiting list (observation), placebo acupuncture alone (“limited”), or placebo acupuncture with a patient-practitioner relationship augmented by warmth, attention, and confidence (“augmented”). At three weeks, half of the patients were randomly assigned to continue in their originally assigned group for an additional three weeks.
Main outcome measures Global improvement scale (range 1-7), adequate relief of symptoms, symptom severity score, and quality of life.
Results At three weeks, scores on the global improvement scale were 3.8 (SD 1.0) v 4.3 (SD 1.4) v 5.0 (SD 1.3) for waiting list versus “limited” versus “augmented,” respectively (P<0.001 for trend). The proportion of patients reporting adequate relief showed a similar pattern: 28% on waiting list, 44% in limited group, and 62% in augmented group (P<0.001 for trend). The same trend in response existed in symptom severity score (30 (63) v 42 (67) v 82 (89), P<0.001) and quality of life (3.6 (8.1) v 4.1 (9.4) v 9.3 (14.0), P<0.001). All pairwise comparisons between augmented and limited patient-practitioner relationship were significant: global improvement scale (P<0.001), adequate relief of symptoms (P<0.001), symptom severity score (P=0.007), quality of life (P=0.01). Results were similar at six week follow-up.
Conclusion Factors contributing to the placebo effect can be progressively combined in a manner resembling a graded dose escalation of component parts. Non-specific effects can produce statistically and clinically significant outcomes and the patient-practitioner relationship is the most robust component.
Trial registration Clinical Trials NCT00065403.
Aside from the provision of a specific therapeutic regimen, a medical encounter might elicit non-specific or contextual benefits, most often called placebo effects. Experimental settings seek to contain these “nuisance” effects with placebo controls. Such non-specific effects in a clinical setting can theoretically be separated into three components: a patient’s response to observation and assessment (Hawthorne effects), the patient’s response to the administration of a therapeutic ritual (placebo treatment), and the patient’s response to the patient-practitioner interaction.1 2 3 We tested this by determining whether these distinct potential contributions to clinical care can be separated and then combined incrementally under controlled conditions to produce progressive improvement in clinical outcomes in a manner resembling a graded dose escalation of component parts. We also quantified the extent to which the patient-practitioner relationship enhances the effects of a placebo treatment alone and whether a placebo intervention is more effective than no treatment (that is, the natural course of the illness) alone.
We carried out the trial on patients with irritable bowel syndrome. This is a chronic, functional gastrointestinal disorder characterised by recurrent abdominal pain and disturbed bowel function (diarrhoea, constipation, or alternation between the two).4 Irritable bowel syndrome is one of the top 10 reasons for seeking primary care and is the reason for nearly a third of all consultations with gastroenterologists,5 with an estimated direct and indirect cost in the eight major industrial countries of over $41bn (£20bn, €27bn).6 Irritable bowel syndrome seemed a suitable disease to study because previous randomised controlled trials of treatments have shown a large positive response (about 40%) in placebo groups.7 This also suggests that it might be possible to show a graded response when the three hypothetical non-specific components of the clinical encounter were added individually or in combination.
We conducted this randomised controlled trial in a single centre in 262 participants over two study periods of three weeks (fig 1). For the first three week period, participants were randomised to one of three groups: a “waiting list” that controlled for any effects of assessment and observation (Hawthorne effects) as well as the effects of the natural course of the illness and regression to the mean; “limited interaction,” providing placebo treatment with minimal interaction with the practitioner; or “augmented interaction,” providing placebo treatment with a defined positive patient-practitioner relationship. Our placebo treatment was delivered with a validated sham acupuncture device. We therefore assumed that the three study groups represented the successive addition of the three postulated elements of the non-specific clinical interactions: group 1 (waiting list) having observation alone, group 2 (limited) adding a dummy treatment, and group 3 (augmented) adding a warm, empathetic, and confident patient-practitioner relationship. All participants were evaluated at entry to the trial and after three and six weeks.
At the end of the first three week period, participants in groups 2 (limited) and 3 (augmented) were, without their knowledge, randomised a second time in equal numbers either to continue with sham acupuncture or to receive genuine acupuncture. Patient-practitioner relationships for these groups, however, remained the same. (Results of this nested secondary study, comparing acupuncture and sham acupuncture in the second three week period, will be reported elsewhere.) As planned prospectively, data from patients in groups 2 and 3 who remained on placebo for the second period are included in this report. Participants in group 1 (waiting list) remained on the list for the second three week period. Results at three weeks provided data for the primary end point; those who remained on placebo for the additional three weeks provided observations on non-specific effects over time.
We randomly assigned participants to the three study arms using permuted block randomisation with variable block sizes and assignments provided in sequentially numbered opaque sealed envelopes. An administrative assistant, not otherwise involved in the study, opened the assignment envelopes and recorded the assignment of each participant in a confidential log. At three weeks, we used similar methods to randomise patients in the sham acupuncture groups to continue sham acupuncture or to switch to genuine acupuncture. This randomisation was stratified by the level of abdominal pain at the three week visit (<30 v ≥30 on a 100 point visual analogue scale).
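The two randomisation steps described above can be sketched as follows. This is an illustrative sketch only, not the trial's actual procedure: the block sizes of 3 and 6, the seeds, and the helper names `permuted_block_sequence` and `make_stratified_rerandomiser` are hypothetical, and the sealed-envelope handling is not modelled. The second function keeps a separate queue of permuted blocks per pain stratum (<30 v ≥30 on the 100 point visual analogue scale), which is what keeps sham and genuine assignments balanced within each stratum.

```python
import random
from collections import Counter

ARMS = ("waiting list", "limited", "augmented")


def permuted_block_sequence(n, arms=ARMS, multiples=(1, 2), seed=None):
    """Allocation list made of randomly permuted blocks of variable size.

    Each block holds every arm k times (k drawn from `multiples`, giving
    block sizes of 3 or 6 here, an assumption), so arm totals are equal at
    the end of every block, while the varying block size makes the next
    assignment hard to predict.
    """
    rng = random.Random(seed)
    sequence = []
    while len(sequence) < n:
        block = list(arms) * rng.choice(multiples)
        rng.shuffle(block)
        sequence.extend(block)
    return sequence[:n]


def make_stratified_rerandomiser(cutoff=30, seed=None):
    """Return a function assigning sham v genuine acupuncture within pain strata.

    Each stratum (pain < cutoff, pain >= cutoff) draws from its own queue of
    permuted blocks, so the two arms stay balanced within each stratum.
    """
    rng = random.Random(seed)
    queues = {False: [], True: []}  # key: pain_vas >= cutoff

    def assign(pain_vas):
        queue = queues[pain_vas >= cutoff]
        if not queue:
            block = ["sham", "genuine"] * rng.choice((1, 2))
            rng.shuffle(block)
            queue.extend(block)
        return queue.pop(0)

    return assign


# Illustrative use: 262 first-stage assignments, then a few second-stage ones.
allocation = permuted_block_sequence(262, seed=2003)
counts = Counter(allocation)
assign_second = make_stratified_rerandomiser(seed=2006)
second = [assign_second(pain) for pain in (12, 45, 28, 60, 33, 5, 71, 29)]
print(counts)
print(second)
```

Because arm totals are equal at every block boundary, truncating the list at 262 leaves the arms differing by at most two participants in this sketch.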
Participants were recruited from advertisements in the media, fliers, and referrals from health professionals, were all at least 18 years old, and met the Rome II criteria for irritable bowel syndrome8 with a score of ≥150 on the symptom severity scale.9 We excluded patients if they had unexplained findings such as weight loss >10% of body weight, fever, blood in stools, family history of colon cancer, or inflammatory bowel disease; they were also excluded if they had previously received acupuncture. The diagnosis of irritable bowel syndrome was based on typical symptoms and confirmed by a board certified gastroenterologist experienced in functional bowel disorders (AL), who also decided on the exclusion of patients with alarm symptoms.10 11 Participants were allowed to continue medications for irritable bowel syndrome taken before entering the study (such as fibre, anti-spasmodics, and loperamide) if this therapeutic regimen had remained constant for at least the previous 30 days and they agreed to keep the regimen constant during the trial.
Group 1 (waiting list)
Participants had neither placebo treatment nor interaction with a healthcare practitioner but, like other participants, were assessed at baseline and at three and six weeks.
Group 2 (limited interaction)
Participants received a placebo intervention and “limited” interaction with a practitioner (see below).
We chose dummy acupuncture for our placebo because evidence suggests that acupuncture elicits large placebo effects.12 The validated sham device is indistinguishable from genuine acupuncture.13 (The shaft of the sham needle does not actually pierce the skin but creates the illusion of doing so by retracting into a hollow handle; a small plastic mount and surgical tape hold the sham needle in place.) Placebo treatments were given twice a week, a schedule similar to that used by many acupuncturists. At each session, six to eight dummy needles were placed for 20 minutes over predetermined non-acupuncture points on the arms, legs, and abdomen; this intervention was the same for groups 2 (limited) and 3 (augmented).
The limited patient-practitioner relationship was established at the initial visit (duration <5 minutes), during which practitioners introduced themselves and stated they had reviewed the patient’s questionnaire and “knew what to do.” They then explained that this was “a scientific study” for which they had been “instructed not to converse with patients.” The placebo needles were then placed, and the patient was left alone in a quiet room for 20 minutes, a common acupuncture practice, after which the practitioner returned to remove the “needles.” Subsequent visits were scheduled twice a week for 20 minutes. At week three, participants completed assessments, and those randomised to continue the placebo treatment received an additional six sham treatments.
Group 3 (augmented interaction)
Participants in group 3 (augmented) also received six sessions of placebo acupuncture under the same conditions and in the same room(s) as group 2. Unlike participants in group 2 (limited), however, they received an augmented patient-practitioner relationship that began at the initial visit (45 minutes’ duration) and was structured with respect to both content (four primary discussions) and style (five primary points). Content included questions concerning symptoms, how irritable bowel syndrome related to relationships and lifestyle, possible non-gastrointestinal symptoms, and how the patient understood the “cause” and “meaning” of his or her condition. The interviewer incorporated at least five primary behaviours including: a warm, friendly manner; active listening (such as repeating patient’s words, asking for clarifications); empathy (such as saying “I can understand how difficult IBS must be for you”); 20 seconds of thoughtful silence while feeling the pulse or pondering the treatment plan; and communication of confidence and positive expectation (“I have had much positive experience treating IBS and look forward to demonstrating that acupuncture is a valuable treatment in this trial”). We based this intervention model on research concerning an optimal patient-practitioner relationship.14 15 Only after completing this nine item agenda did the acupuncturist place the placebo needles and leave the participant in a quiet room for 20 minutes. On returning, the practitioner “removed” the placebo needles and exchanged a few words of encouragement. Specific cognitive and behavioural interventions that might be beneficial for irritable bowel syndrome (such as relaxation,16 cognitive behavioural therapy,17 or education/counselling18) were not allowed.
Practitioners for groups 2 and 3
The practitioners in this study consisted of four licensed acupuncturists, all of whom had participated in previous randomised placebo controlled trials on acupuncture. The practitioners’ training followed methods described in earlier studies of structured patient-physician interactions.19 Practitioners received 20 hours of training to ensure they were able to create the two different clinical contexts. They were instructed in advance on the “scripts” for their interactions with the two treated groups by means of a training manual, a video of model sessions, and by role playing with both simulated and real patients. During the trial, practitioners also received routine feedback from the videotaping of all sessions, which was used to score adherence to protocol (see below). Practitioners never had contact with participants in group 1 (waiting list).
Informed consent and blinding
All participants gave written informed consent, but the consent disclosure omitted certain descriptors of the trial to protect the study’s scientific validity. Thus, participants were told that the trial was a placebo controlled study of acupuncture for irritable bowel syndrome and were completely unaware of the study’s primary aim to examine placebo effects.
Although the trial was prospectively designed to investigate non-specific effects in irritable bowel syndrome, its design included a nested acupuncture substudy that allowed potential participants in the “treatment” arms to be told, truthfully, that they had a 50% chance of receiving genuine acupuncture during the trial. When the study ended, a letter was sent to all participants explaining the exact purpose of the study and offering them the opportunity to withdraw their original consent to use their data. All study personnel, except the practitioners, were blinded to participant assignment. Blinded registered nurses who were otherwise unconnected to the study conducted assessments.
Adherence to treatment
We evaluated the adherence of practitioners to protocols by videotaping all treatment sessions, of which 102 (10% of the sample) were randomly selected for evaluation. We used a well established procedure.19 20 Two research assistants otherwise unconnected with the trial separately rated each session. Reliability between raters was high (κ=0.92), and 97% of sessions were rated as adherent.
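The inter-rater reliability figure above is a kappa statistic, that is, agreement corrected for the agreement expected by chance. The text does not name the variant; the sketch below assumes Cohen's kappa for two raters, and the function name and six example ratings are hypothetical, not trial data.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Chance-corrected agreement between two raters over the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must score the same, non-empty set of items")
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    # Agreement expected by chance if each rater labelled items independently
    # at their own observed category rates.
    categories = freq_a.keys() | freq_b.keys()
    expected = sum(freq_a[c] * freq_b[c] for c in categories) / n**2
    if expected == 1:
        return 1.0  # both raters used one identical category throughout
    return (observed - expected) / (1 - expected)


# Hypothetical ratings of six sessions (1 = adherent, 0 = not adherent).
rater_1 = [1, 1, 1, 1, 0, 0]
rater_2 = [1, 1, 1, 0, 0, 0]
kappa = cohens_kappa(rater_1, rater_2)
print(round(kappa, 3))  # 0.667
```

Here the raters agree on five of six sessions (83%), but half of that agreement would be expected by chance alone, so kappa is 0.667 rather than 0.83.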
Following validated procedures in research on irritable bowel syndrome, our a priori primary outcome was the change since baseline at three weeks as rated on the global improvement scale, which asks participants, “Compared to the way you felt before you entered the study, have your IBS symptoms over the past 7 days been: (1)=substantially worse, (2)=moderately worse, (3)=slightly worse, (4)=no change, (5)=slightly improved, (6)=moderately improved, or (7)=substantially improved.”21 22 Our other main outcome was adequate relief, a single dichotomous item that asks participants, “Over the past week have you had adequate relief of your IBS symptoms?”23 24 Neither of these primary outcomes was measured at baseline. Our other two outcomes were the symptom severity scale and the quality of life scale. The symptom severity scale is a questionnaire that sums the participant’s ratings, each on a 100 point scale, of five items: severity of abdominal pain, frequency of abdominal pain, severity of abdominal distension, dissatisfaction with bowel habits, and interference with quality of life.9 All five components contribute equally to the score, yielding a theoretical range of 0-500, in which a higher score indicates a more severe condition. The quality of life scale is a 34 item assessment of the degree to which the condition interferes with a patient’s quality of life. Each item is rated on a five point Likert scale, and a linear transformation yields a summed score with a theoretical range of 0 to 100, a higher score indicating better quality of life.25 Side effects were recorded at each assessment.
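The scoring of the two continuous outcomes can be sketched as follows. The item key names are hypothetical, and the quality of life function assumes every item is already coded so that 5 is the best response; the published instrument's item wording and any reverse coding are not reproduced here. The transformation maps the lowest possible raw sum (34) to 0 and the highest (170) to 100.

```python
# Hypothetical item keys for the five symptom severity items.
SSS_ITEMS = (
    "pain_severity",
    "pain_frequency",
    "distension_severity",
    "bowel_dissatisfaction",
    "life_interference",
)


def symptom_severity_score(ratings):
    """Sum five 0-100 item ratings into a 0-500 score (higher = more severe)."""
    total = 0
    for item in SSS_ITEMS:
        value = ratings[item]
        if not 0 <= value <= 100:
            raise ValueError(f"{item} must lie in 0-100, got {value}")
        total += value
    return total


def qol_score(item_scores, low=1, high=5):
    """Linearly rescale a summed 34 item Likert score to 0-100.

    Assumes each item is coded so that `high` is the best response.
    """
    if len(item_scores) != 34:
        raise ValueError("expected 34 items")
    n = len(item_scores)
    raw = sum(item_scores)
    return 100 * (raw - n * low) / (n * (high - low))


sss = symptom_severity_score({
    "pain_severity": 60,
    "pain_frequency": 40,
    "distension_severity": 50,
    "bowel_dissatisfaction": 70,
    "life_interference": 30,
})
qol = qol_score([3] * 34)
print(sss, qol)  # 250 50.0
```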
We estimated a priori that a sample size of 262 would provide 95% power for finding a significant difference in scores on the global improvement scale at three weeks if the augmented, limited, and waiting list groups reported improvements of 50%, 40%, and 25%, respectively. Even if the rates were only 20%, 15%, and 10%, respectively, however, a sample size of 87 per group would afford a power of 55%. We replaced missing data from dropouts using the last observation carried forward method. We did not, however, carry forward a baseline observation to week six if a participant missed assessments at both three and six weeks.
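The last observation carried forward rule above, including its exception, can be sketched for one participant and one continuous outcome. The function name is hypothetical, and one point is an assumption the text does not state: that a missing week three value may be filled from baseline. The stated exception is implemented directly: baseline is never carried to week six, so a participant missing both later assessments keeps a missing week six value.

```python
from typing import Optional, Tuple


def locf_fill(
    baseline: float, week3: Optional[float], week6: Optional[float]
) -> Tuple[float, Optional[float]]:
    """Fill missing week 3 and week 6 values by last observation carried forward."""
    # Assumption: a missing week 3 value is filled from baseline.
    filled3 = week3 if week3 is not None else baseline
    if week6 is not None:
        filled6 = week6
    elif week3 is not None:
        filled6 = week3  # carry the observed week 3 value forward
    else:
        filled6 = None  # baseline is never carried forward to week six
    return filled3, filled6


# Hypothetical symptom severity scores for four participants.
cases = [
    (200, 150, 130),   # complete follow-up
    (200, 150, None),  # missed week six only
    (200, None, 120),  # missed week three only
    (200, None, None), # missed both assessments
]
filled = [locf_fill(*case) for case in cases]
print(filled)  # [(150, 130), (150, 150), (200, 120), (200, None)]
```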
The primary test for each outcome measure was a test of trend examining the ordered alternative hypothesis, waiting list (group 1)<limited (group 2)<augmented (group 3). For dichotomous measures, we used Cochran-Armitage tests. For continuous measures, we used a Wald test from ordinary least squares regression models with two independent variables: a treatment group variable (coded waiting list=1; limited=2; augmented=3) and the baseline (before treatment) value of the outcome variable. Using a Bonferroni correction for the four outcome measures (0.05/4), we considered P<0.0125 (two sided) to be significant for each test of trend. To better describe the association between group and outcome, if the trend test was significant, we conducted pairwise comparisons of the groups (augmented v limited and limited v waiting list). For the dichotomous outcomes we used Pearson χ² tests, and for the continuous outcomes we used Tukey tests from analysis of covariance (ANCOVA), again using the baseline measures of the outcome variables as the covariate. All analyses were carried out on an intention to treat basis.
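For the dichotomous outcomes, the Cochran-Armitage test of trend with group scores 1, 2, 3 and the Bonferroni threshold can be sketched as follows. This uses the common asymptotic form of the variance without a finite-sample correction; statistical packages may differ slightly. The responder counts are hypothetical, not the trial's data.

```python
import math

ALPHA = 0.05 / 4  # Bonferroni correction over four outcome measures: 0.0125


def cochran_armitage(successes, totals, scores=(1, 2, 3)):
    """Two-sided Cochran-Armitage test for a linear trend in proportions."""
    n_total = sum(totals)
    p_bar = sum(successes) / n_total
    if p_bar in (0, 1):
        raise ValueError("trend test undefined when all or no patients respond")
    # Score-weighted deviation of observed responders from the pooled rate.
    t = sum(s * (r - n * p_bar) for s, r, n in zip(scores, successes, totals))
    s1 = sum(s * n for s, n in zip(scores, totals))
    s2 = sum(s * s * n for s, n in zip(scores, totals))
    variance = p_bar * (1 - p_bar) * (s2 - s1 * s1 / n_total)
    z = t / math.sqrt(variance)
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p_value


# Hypothetical responder counts (waiting list, limited, augmented), 80 per arm.
z, p = cochran_armitage(successes=(20, 35, 50), totals=(80, 80, 80))
print(f"z = {z:.2f}, P = {p:.2g}, significant: {p < ALPHA}")
```

A positive z indicates that the proportion of responders rises with the group score, matching the ordered hypothesis waiting list<limited<augmented.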
Between December 2003 and February 2006, we screened 350 prospective participants of whom 289 were eligible. We randomised 262 people into the three groups. (Simultaneously, we randomly selected an additional 27 patients to participate in a parallel qualitative study of identical assessments and treatments that also included a series of interviews on their experiences. Prospectively, these participants were considered a separate study.) At baseline the three groups were well balanced with regard to demographics, psychiatric symptoms (as measured by the Beck anxiety index and the Maier subscale of the Carroll depression scale), type of irritable bowel syndrome, and quality of life score (table 1), though the limited group had lower symptom severity scale scores. Our data analysis plan included the use of analysis of covariance for continuous measures such as the symptom severity scale. This adjusts for baseline differences between individuals and thus provides a statistical control for group differences when randomisation does not succeed in producing completely balanced groups on baseline measures.
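The baseline-adjusted trend model for continuous outcomes (outcome regressed on the group code and the baseline value) can be sketched with NumPy. This is a sketch under stated assumptions: the function name is hypothetical, the data are simulated rather than taken from the trial, and the P value uses a normal reference for the Wald statistic, whereas a t reference would differ slightly in small samples. Because baseline enters the model as a covariate, the group slope is estimated after adjusting for any baseline imbalance between groups.

```python
import math

import numpy as np


def baseline_adjusted_trend(group_code, baseline, outcome):
    """OLS of outcome on group code (1, 2, 3) and baseline; Wald test on the group slope."""
    g = np.asarray(group_code, dtype=float)
    b = np.asarray(baseline, dtype=float)
    y = np.asarray(outcome, dtype=float)
    X = np.column_stack([np.ones_like(g), g, b])  # intercept, group, baseline
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ beta
    dof = len(y) - X.shape[1]
    sigma2 = residuals @ residuals / dof
    cov = sigma2 * np.linalg.inv(X.T @ X)
    slope = float(beta[1])
    se = math.sqrt(float(cov[1, 1]))
    z = slope / se
    return slope, se, z, math.erfc(abs(z) / math.sqrt(2))


# Simulated (not trial) data with a true group slope of 20.
rng = np.random.default_rng(0)
group = np.repeat([1, 2, 3], 60)
base = rng.normal(250, 60, size=group.size)
follow = 0.3 * base + 20 * group + rng.normal(0, 30, size=group.size)
slope, se, z, p = baseline_adjusted_trend(group, base, follow)
print(f"slope = {slope:.1f} (SE {se:.1f}), P = {p:.2g}")
```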
Outcomes at three weeks
The observed values for all outcome measures were consistent with our prediction of a progressive improvement in symptoms among the three groups such that waiting list was less effective than limited, which was less effective than augmented. As indicated in table 2 and figure 2, the test of trend for each of the outcome measures was significant (P<0.001). For the global improvement scale and the adequate relief of symptoms, each of the pairwise comparisons (augmented v limited and limited v waiting list) was significant (P<0.001). For the symptom severity score, the augmented group improved significantly more than the limited group (P=0.007), but the limited and waiting list groups were not significantly different (P=0.20). We observed the same pattern for quality of life (augmented v limited P=0.01; limited v waiting list P=0.58). The proportions of patients reporting moderate or substantial improvement on the global improvement scale were 3% (waiting list), 20% (limited), and 37% (augmented) (P<0.001).
Outcomes at six weeks
For participants in the augmented and limited groups, the follow-up evaluation was limited to those who were randomised to continue placebo treatments. As can be seen in table 2 and figure 3, each of the tests for trend at week six was significant. Moreover, except for quality of life, where improvement in the waiting list group was similar to that in the limited group, the observed values for all outcome measures were consistent with our a priori prediction of order of improvement.
More than 80% of patients reported no side effects. The most common side effects included pain during needle placement (10%) and redness or swelling (6%) or pain (5%) after needle removal. At the three week assessment, 2% of patients reported that they considered increased constipation, increased diarrhoea, and dry mouth as probably caused by the treatment. Also, up to 1% of patients reported bad dreams, loss of appetite, sleepiness, fatigue, insomnia, nausea, giddiness, weakness, dizziness, and headache as possibly related to their treatment.
At the three week end point, 76% of the limited group and 84% of the augmented group thought that they had been treated with genuine acupuncture. This difference was not significant (P=0.21), suggesting that blinding was successful. In contrast, at the six week follow-up, 56% of the limited group and 84% of the augmented group thought that they had been treated with genuine acupuncture. This difference was significant (P=0.02). We could not ask participants which patient-practitioner relationship they believed they had been assigned to, because they were not told that the study included different patient-practitioner relationships until the study was over.
In this large prospective study of placebo effects we found that such effects can be disentangled into three components that can then be recombined to produce incremental improvement in symptoms in a manner resembling a graded dose escalation of component parts. In the pairwise comparisons, we also found that an enhanced relationship with a practitioner, together with the placebo treatment, provides the most robust effect in terms of the four measures we used. Placebo treatment with only limited interaction with practitioners was superior to staying on a waiting list with respect to only two of the four measures, suggesting that the supportive interaction with a practitioner is the most potent component of non-specific effects.
The magnitude of non-specific effects in the augmented arm is not only statistically significant but also clearly clinically significant in the management of irritable bowel syndrome. A decrease in the symptom severity score of 50 reliably indicates improvement in symptoms,9 and our study indicates that 61% and 59% of patients in the augmented arm achieved this level of improvement at three and six weeks, respectively. Likewise, the changes we observed in quality of life indicate at least moderate clinical improvement in symptoms.25 Finally, the percentage of patients reporting adequate relief (62% and 61% at three and six weeks, respectively) is comparable with the responder rate in clinical trials of drugs currently used in the treatment of irritable bowel syndrome.26 27 These results indicate that such factors as warmth, empathy, duration of interaction, and the communication of positive expectation might indeed significantly affect clinical outcome. Future investigations will have to determine the relative importance of each of these elements of the patient-practitioner relationship.
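The responder proportion above (patients whose symptom severity score fell by at least 50 points) can be computed as follows. The sketch assumes a decrease of exactly 50 counts as a response, and it drops patients with a missing score. The function name and the four patients' scores are hypothetical.

```python
def responder_rate(baseline, followup, threshold=50):
    """Share of patients whose symptom severity score fell by >= threshold points."""
    pairs = [
        (b, f) for b, f in zip(baseline, followup) if b is not None and f is not None
    ]
    if not pairs:
        raise ValueError("no complete baseline/follow-up pairs")
    responders = sum(1 for b, f in pairs if b - f >= threshold)
    return responders / len(pairs)


# Hypothetical scores for four patients: decreases of 60, 20, 60, and 10 points.
rate = responder_rate([300, 280, 250, 310], [240, 260, 190, 300])
print(rate)  # 0.5
```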
One limitation of our study is that we could not separate out the effects of observation and assessment (and related issues such as reporting bias). An additional control might therefore have been a waiting list group whose participants were followed without their knowledge. Because such a control group would have been operationally and ethically difficult to arrange, we believe that our waiting list group is the best feasible baseline control for estimating the effects of placebo treatment.
Our outcome measures were subjective rather than objective. None the less, these measures, including the global improvement scale and adequate relief of symptoms, are consistent with the recommendations of the Rome committees for use in trials of irritable bowel syndrome, because no objective measures of severity are currently available.28 We chose irritable bowel syndrome for this study because we suspected that non-specific effects are most likely to be demonstrable in disorders defined by subjective symptoms rather than more objective measures of disease.29 Whether our findings apply to other illnesses, including those with biochemical or other objective outcome measures, awaits further study. None the less, our study has important implications for routine clinical care and suggests that routine medical care would be less effective if patient-practitioner interactions were reduced. Based on the results of the present study, a positive patient-practitioner relationship can make a difference.
Additionally in terms of limitations, it is unclear whether our placebo outcomes correspond to biological changes in irritable bowel syndrome or have any of the biochemical, neuroendocrine, or neuroanatomical correlates of placebo response found in recent laboratory experiments30 31 or whether our outcomes are mainly related to shifts in selective attention to diffuse symptoms.32 In either case, our study represents an incremental step in placebo studies and shows that non-specific effects have a considerable clinical impact.
What is already known on this topic
In theory, the placebo effect of the clinical encounter can be divided into the response to three main components: assessment/observation, therapeutic ritual (placebo), and patient-physician relationship
What this study adds
Three components of the medical encounter can be progressively added to produce incremental improvement in symptoms
A therapeutic ritual (placebo treatment) has a modest benefit beyond no treatment
Placebo effects produce statistically and clinically significant improvement and the patient-physician relationship is the most robust component of the placebo effect
We thank Franklin Miller, Kate Stoney, and Jongbae Park for scientific mentorship; Mary Quilty, Oriana Rodrigues, Gabriel Kaptchuk, Elizabeth Morey, and Patricia Wilkinson for research assistance; and the research nurses at the BIDMC, under the direction of Mary Williams and Jamie Vickers, and acupuncturists Stephanie Prady, Bella Rosner, and Lisa Desrosiers for all their hard work. We also thank J Thomas LaMont for his thoughtful review of the manuscript and the university seminar on effective and affordable health care at Harvard University for input on study design and analysis.
Contributors: TJK is guarantor and led the conception, design, and analysis of the study. AJL, LAC, JMK, PG, RBD, CEK, EEJ, IK, RNS, DAD, and EK contributed to conception, design, and analysis. BHN, MP, ALR, and CMcM contributed to design and implementation. JMK, RBD, LAC, BHN, and LTN performed statistical analyses.
Funding: NIH grant No 1R01 AT001414-01 from the National Center for Complementary and Alternative Medicine (NCCAM) and the National Institute of Diabetes and Digestive and Kidney Diseases (NIDDK), grant No 1R21 AT002860-01 from NCCAM and the Office of Behavioral and Social Science Research (OBSSR), and grants No 1 R21 AT002564 and 1K24 AT004095 from NCCAM. This research was also supported in part by grant RR 01032 to the Beth Israel Deaconess Medical Center (BIDMC) General Clinical Research Center from the NIH.
Ethical approval: Institutional review boards at the Beth Israel Deaconess Medical Center and Harvard Medical School.
Competing interests: TJK is a consultant for Kan Herbal Company, Scotts Valley, CA. AL has served on the scientific advisory boards and served as a consultant for Novartis, Takeda, Sucampo, Schwarz, Salix, Microbia, and GSK. PG is a consultant for Tsumura.
Provenance and peer review: Not commissioned; externally peer reviewed.