Rapid Responses to:

EDUCATION AND DEBATE:
Andrew Kotaska
Inappropriate use of randomised trials to evaluate complex phenomena: case study of vaginal breech delivery
BMJ 2004; 329: 1039-1042

Rapid Responses published:

Are the results of the Term Breech Trial generalisable?
Mary E Hannah, for the Term Breech Trial Steering Committee and Collaborative Group (5 November 2004)
No Photographic Evidence
Yincent Tse (9 November 2004)
Recovering vaginal breech delivery
Andrew J Kotaska (10 November 2004)
Uncritical Acceptance of Clinical Trials
Harold Ewart Woolley (15 November 2004)
PLEASE DON'T THROW THE BABY OUT WITH THE BATHWATER
Isabelle Boutron, Bruno Giraudeau PhD, Philippe Ravaud MD,PhD (17 November 2004)
Randomized Trials Should Be Used for Complex Procedures
George D. Carson (17 November 2004)
Continuing vaginal breech birth: a German perspective
Michael Krause (19 November 2004)
External Validity and Internal Trial Conditions
Michael C Klein (24 November 2004)
Another reason why randomised controlled trials do not necessarily tell the whole truth in clinical medicine
Petra AMM De Sutter (10 January 2005)
Petra AMM De Sutter   (10 January 2005)

Are the results of the Term Breech Trial generalisable? 5 November 2004
Mary E Hannah,
Professor, Department of Obstetrics and Gynaecology
Sunnybrook and Women's, CRWH, MIRU, 790 Bay St, Toronto, ON, M6G 1N8, Canada,
for the Term Breech Trial Steering Committee and Collaborative Group

The dramatic results of the Term Breech Trial, which have influenced clinical practice worldwide, are continuing to stimulate discussion and debate. Dr Kotaska’s article questions the generalisability or external validity of the trial results.

Dr Kotaska suggests that a vaginal delivery rate of 57% in the planned vaginal birth group of the Term Breech Trial (68% in countries with a high national perinatal mortality rate; 45% in countries with a low perinatal mortality rate) is higher than would occur in experienced centres. We disagree. Published rates of vaginal breech delivery among those women having a trial of labour (thus excluding the many women having an elective caesarean) are actually very similar to or higher than what occurred in the planned vaginal birth arm of the Term Breech Trial.1-3 Dr Kotaska also suggests that practitioners in the Term Breech Trial increased their vaginal delivery rates beyond their comfort level. Does Dr Kotaska mean to imply that a practitioner would undertake a vaginal breech delivery that he or she did not feel comfortable doing, simply because the woman was in the Term Breech Trial? We believe that such a practice would be unethical, unprofessional, and unlikely given the current medico-legal climate in many of our countries.

We agree that surgical and other complex procedures are more difficult to evaluate than medical interventions, and that operator expertise is of crucial importance. Indeed, in the Term Breech Trial, the presence of a skilled and experienced practitioner at vaginal breech delivery was significantly associated with better outcome for the baby.4 However, although variation in operator expertise and technique is inherent in surgical procedures, randomised controlled trials continue to provide the best evidence as to whether these procedures result in more good than harm. We sympathize with the many practitioners who do not believe that the results of the Term Breech Trial apply to them. No one was more disappointed with the findings of the trial than the Term Breech Trial collaborating clinicians who were hoping to prove the safety of vaginal breech delivery, and who, by participating in the trial, collectively put their vaginal breech delivery skills to the test.

Dr Kotaska criticizes the selection criteria and the intrapartum management protocol of the Term Breech Trial as being too broad and suggests that the results of the Term Breech Trial do not apply in centres that have more restrictive criteria and protocols. The Term Breech Trial protocol was developed by a group of obstetricians who were recognised in their communities as experts at vaginal breech delivery, and was vetted by experienced obstetricians worldwide prior to beginning the trial and while it was in progress.5 However, if others support the view that the question of how best to deliver some women with a singleton fetus in breech presentation at term, using a more restrictive protocol, is still unanswered, we would encourage them to enroll their patients in a randomised controlled trial, so that we can learn whether a policy of caesarean is (or is not) better than a policy of vaginal breech delivery for these more highly selected women using these more restrictive protocols.

Finally, women preferring a vaginal breech birth should be reassured that although planned caesarean section reduced the risk of perinatal or neonatal mortality or serious neonatal morbidity, compared to planned vaginal birth, in the Term Breech Trial, 95% of babies in the planned vaginal birth arm did well. Also, although our statistical power was limited, we did not find planned caesarean to be associated with better outcomes for the children at 2 years of age.6

References

1. Sanchez-Ramos L, Wells TL, Adair CD, Arcelin G, Kaunitz AM, Wells DS. Route of breech delivery and maternal and neonatal outcomes. Int J Gynaecol Obstet 2001;73:7-14.

2. Schiff E, Friedman SA, Mashiach S, Hart O, Barkai G, Sibai BM. Maternal and neonatal outcome of 846 term singleton breech deliveries: seven-year experience at a single center. Am J Obstet Gynecol 1996;175:18-23.

3. Irion O, Almagbaly PH, Morabia A. Planned vaginal delivery versus elective cesarean section: a study of 705 singleton term breech presentations. Br J Obstet Gynaecol 1998;105:710-7.

4. Su M, McLeod L, Ross S, Willan A, Hannah WJ, Hutton E, Hewson S, Hannah M for the Term Breech Trial Collaborative Group. Factors associated with adverse perinatal outcome in the Term Breech Trial. Am J Obstet Gynecol 2003;189:740-5.

5. Hannah WJ, Allardice J, Amankwah K, Baskett T, Cheng M, Fallis B, Farquharson D, Gauthier R, Hannah M, Hewson S, Lalonde A, Lange I, Milne K, Mitchell B, Penkin P, Ritchie K, Hackett G, Walkinshaw S, Turner M. The Canadian consensus on breech management at term. J SOGC 1994; 16(4):1839-58.

6. Whyte H, Hannah ME, Saigal S, Hannah WJ, Hewson S, Amankwah K, Cheng M, Gafni A, Guselle P, Helewa M, Hodnett E, Hutton E, Kung R, McKay D, Ross S, Willan A for the 2 year infant follow-up Term Breech Trial Collaborative Group. Outcomes of children at 2 years after planned cesarean birth vs planned vaginal birth for breech presentation at term: the international randomized Term Breech Trial. Am J Obstet Gynecol 2004;191:864-71.

Competing interests: None declared

No Photographic Evidence 9 November 2004
Yincent Tse,
SpR Paediatrics
Paediatric Intensive Care unit, Newcastle General Hospital, Newcastle Upon Tyne NE4 6BE

One of the articles in your recent themed issue of Evidence Based Medicine featured a photograph (page 1039) illustrating a vaginal breech delivery. In the photograph the midwife is preparing to suction meconium from the baby’s oropharynx using a device that generates suction from her mouth. Ironically there is ample evidence that oropharyngeal suction does not prevent meconium aspiration syndrome and there is also an infection risk to the midwife from accidental ingestion of the said meconium.

References:

Vain NE. Szyld EG. Prudent LM. Wiswell TE. Aguilar AM. Vivas NI. Oropharyngeal and nasopharyngeal suctioning of meconium-stained neonates before delivery of their shoulders: multicentre, randomised controlled trial. Lancet. 364(9434):597-602, 2004

Ballard JL. Musial MJ. Myers MG. Hazards of delivery room resuscitation using oral methods of endotracheal suctioning. Pediatric Infectious Disease. 5(2):198-200, 1986

Competing interests: None declared

Recovering vaginal breech delivery 10 November 2004
Andrew J Kotaska,
Senior Registrar, University of BC; Visiting Registrar, South Nürnberg Perinatal Clinic
BC Women's Hospital, Vancouver BC V6H 3V5 Canada

The Term Breech Trial Collaborative Group are to be commended for their comprehensive two-year follow-up of their trial.1 The widely touted short-term advantage of elective cesarean section over planned vaginal birth has disappeared. Eighteen infants with “serious (early) morbidity” as defined by experts were followed, and seventeen were neurologically normal at two years of age. Even with the average level of care and the bias of license present in the trial, 97% of women in both groups had a normal child, rendering further debate about the trial’s external validity academic.

Professor Hannah brings attention to the varying rates of vaginal delivery in the literature. She and I both cite Irion and Schiff, who participated in the term breech trial and who, in their own retrospective series, delivered 39% of all breeches vaginally, with less mortality and short-term morbidity than the term breech trial.2,3 She misses the point. Such experienced centres exist, but were under-represented in the term breech trial (Irion and Schiff out of 121 centres). Why did none of the published centres from Austria, France, Ireland, Germany, Sweden, or Norway participate? In Norway, 40% of breech deliveries are vaginal vs. 12% in North America. Some of these European centres with demonstrated safety and expertise in vaginal breech delivery, including the most experienced unit in Germany, deemed the trial protocol unsafe and declined to participate in the trial. This constitutes a selection bias that resulted in a lower than average level of care within the trial.

It is difficult to compare overall vaginal delivery rates because individual units allow vastly different proportions of women with a breech to labour (32% to 94%).4,5 Regardless, our maternity unit, the largest in Canada, delivers 200+ breech babies from 7000 births annually. Before the term breech trial, 12% of these delivered vaginally: one per consultant and per registrar per year and one per labour nurse every 5 years. This rate is average for North America, and reassurances about “the presence of a skilled practitioner” aside, it does not represent a body of expertise with vaginal breech delivery. Varying levels of expertise would be expected to result in varying rates of successful, safe vaginal delivery. Demanding all units achieve a 40+% vaginal delivery rate in a trial setting certainly invited some of them to push their limits. Despite Professor Hannah’s doubts about the existence of a bias of license, comfort level at the edge of one’s capabilities is a delicate and ethereal entity.
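The unit-level arithmetic in the paragraph above can be checked with a short sketch. It uses only the figures quoted there (200 breech babies, 7000 births, 12% delivered vaginally); because the text says "200+", the results are lower bounds, and the variable names are illustrative assumptions.

```python
# Figures quoted in the text; "200+" is treated as exactly 200 (a lower bound).
BREECH_PER_YEAR = 200
BIRTHS_PER_YEAR = 7000
VAGINAL_PERCENT = 12  # percent of breech babies delivered vaginally

# Integer arithmetic avoids floating-point rounding in the headline count.
vaginal_breech_per_year = BREECH_PER_YEAR * VAGINAL_PERCENT // 100
vaginal_breech_per_month = vaginal_breech_per_year / 12
breech_share_of_births = BREECH_PER_YEAR / BIRTHS_PER_YEAR

print(f"Vaginal breech deliveries per year: {vaginal_breech_per_year}")
print(f"Per month across the whole unit: {vaginal_breech_per_month:.1f}")
print(f"Breech as a share of all births: {breech_share_of_births:.1%}")
```

On these figures the whole unit sees roughly two vaginal breech deliveries a month, which is the scarcity of experience the paragraph describes.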

Professor Hannah’s suggestion that the safety and success of experienced units can be defined in a reductionist fashion with selection criteria and restrictive protocols also misses the point. These units are by no means uniform, having developed a variety of strategies and safety measures and some unifying philosophies over years. Success rates vary widely from 14% to 66% of all breech presentations.5,6 Like many complex human skills, expertise is acquired through mentorship and attention to detail and cannot be adequately defined or exported via written protocols. The recent Norwegian audit suggests that even for experienced units, pushing the envelope will at some point compromise safety.7 Each practitioner (and unit) has his/her own level of comfort and capability. Pushing this level demands caution, whether with vaginal hysterectomy or breech delivery; safety will be compromised if one pushes too hard.

Professor Hannah’s call to centres demonstrating expertise and safety with vaginal breech delivery to mount another randomised trial suggests an uncritical enthusiasm for large-scale randomisation. Maternity units that have proven safety with adequate numbers through meticulous self-audit have no need for a randomised trial. They understand the complex nature of the phenomenon, the dependence of success and safety on careful development and maintenance of expertise, and the inability to describe, teach or ensure this expertise with written protocols, even those from Professor Hannah’s research unit. Consequently, some have also declined to participate in her randomised study of elective cesarean section in twin gestations: another selection bias.

The rapid transformation of the early term breech trial results into clinical practice guidelines has effectively eliminated the option of vaginal breech delivery for most women in North America and the U.K.8,9 With publication of the 2-year follow-up showing no difference in long-term outcome, many women will again desire a trial of labour. Our task is to learn from the remaining centres demonstrating expertise in vaginal breech delivery and carefully re-establish the skill in motivated centres. New guidelines are urgently needed to release the choke-hold of the early term breech trial results on clinical practice and re-establish choice for women. Despite the unquestionable utility and power of randomised trials, the safety of some complex procedures may be more effectively demonstrated through careful self-audit than through large-scale randomisation. Perhaps new guidelines should reflect this.

References:

1.Whyte H, Hannah ME, Saigal S, Hannah WJ, Hewson S, Amankwah K, et al. for the 2 year infant follow-up Term Breech Trial Collaborative Group. Outcomes of children at 2 years after planned cesarean birth vs. planned vaginal birth for breech presentation at term: the international randomized Term Breech Trial. Am J Obstet Gynecol 2004;191:864-71

2.Irion O, Almagbaly PH, Morabia A. Planned vaginal delivery versus elective cesarean section: a study of 705 singleton term breech presentations. Br J Obstet Gynaecol 1998;105:710-7

3.Schiff E, Friedman SA, Mashiach S, Hart O, Barkai G, Sibai BM. Maternal and neonatal outcome of 846 term singleton breech deliveries: seven year experience at a single centre. Am J Obstet Gynecol 1996;175:18-23

4.Sanchez-Ramos L, Wells TL, Adair CD, Arcelin G, Kaunitz AM, Wells DS. Route of breech delivery and maternal and neonatal outcomes. Int J Gynaecol Obstet 2001;73:7-14

5.Krause PM. (Nürnberg breech study: is cesarean section the better mode of delivery for the child?) Hebamme 2001;14(3):137-147

6.Hauth JC, Cunningham FG. Vaginal breech delivery is still justified. Obstet Gynecol 2002;99:1115-6

7.Haheim LL, Albrechtsen S, Berge LN, Bordahl PE, Egeland T, Henri Oian P. Breech birth at term: vaginal delivery or elective cesarean section. A systematic review of the literature by a Norwegian review team. Acta Obstet Gynecol Scand. 2004 Feb;83(2):121-3

8.ACOG Committee Opinion No. 265. Mode of term singleton breech delivery. December 2001. American College of Obstetricians and Gynecologists. Int J Gynaecol Obstet 2002;77(1):65-6

9.RCOG Guidelines: Clinical Green Top Guidelines: The Management of Breech Presentation

Competing interests: None declared

Uncritical Acceptance of Clinical Trials 15 November 2004
Harold Ewart Woolley,
Obstetrics Dept. Head (retired)
Burnaby Hospital Vancouver.

In his penetrating analysis of the Breech Trial reported by Hannah ME et al (Lancet 2000;356:1375-83), Dr Andrew Kotaska examines, with devastating lucidity, the defects in the report and the potentially serious implications of its uncritical acceptance. It seems remarkable that an article which may subject thousands of women to a major surgical procedure should have elicited no editorial response. That a prestigious journal published so significant a report apparently without careful assessment by experienced obstetricians (there is no other obvious explanation), and that its shortcomings had to be brought to the attention of the profession through the report of a Senior Registrar, Dr Andrew Kotaska (BMJ 2004;329:1039-1042, 30 October 2004), is unfortunate, to say the least.

Competing interests: None declared

PLEASE DON'T THROW THE BABY OUT WITH THE BATHWATER 17 November 2004
Isabelle Boutron,
MD
Dpt Epidémiologie, Biostatistique Recherche Clinique, INSERM E057, Hospital Bichat, Paris,
Bruno Giraudeau PhD, Philippe Ravaud MD,PhD

Isabelle Boutron, MD*, Bruno Giraudeau PhD**, Philippe Ravaud, MD, PhD*

*Département d’Epidémiologie, Biostatistique et Recherche Clinique, INSERM EMI 0357, Groupe Hospitalier Bichat-Claude Bernard (AP-HP), Faculté de Médecine Xavier Bichat, Université Paris VII, 75018 Paris, France **INSERM CIC 202, Tours, France

The article by Kotaska(1) outlines the important issue related to care providers’ skill and settings’ experience when assessing nonpharmacological treatments such as vaginal breech delivery in randomised controlled trials (RCT). We agree that the evaluation of nonpharmacological treatments raises specific methodological issues, among them care providers’ skill(2). Actually, in RCTs of nonpharmacological treatment, care providers are also part of the intervention to be tested, and bias could occur with (i) highly skilled or experienced care providers in one arm and low-skilled or less-experienced care providers in the other or (ii) care providers with more experience in performing one of the interventions tested than another. However, appropriate methodological organisation and planning of RCTs in this field could circumvent this bias. Thus, care providers in a trial of surgery could be trained and selected only if they achieved a predetermined standard(3) or selected according to their experience of the procedure. For example, in a trial assessing carotid endarterectomy, each surgeon had their last 50 operations assessed and if more than 2 of the operations resulted in complications the surgeons could not join the trial(4). Such a prerequisite then allows the surgical procedure to be assessed in the context of the skills required to achieve it. As well, patients could be randomised not to operations but to care providers, who would perform their treatment of preference, thus ensuring similar skill levels in the two arms of the trial(5).
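The surgeon-selection rule cited above from the carotid endarterectomy trial (last 50 operations assessed, exclusion if more than 2 resulted in complications) can be sketched in a few lines of Python. This is an illustration only: the function name, the data layout (one boolean per operation, True meaning a complication) and the treatment of surgeons with fewer than 50 audited operations are assumptions made here, not details of the trial.

```python
def eligible_for_trial(complications, window=50, max_complications=2):
    """Return True if a surgeon passes the audit rule described in the text.

    complications: list of booleans, oldest operation first; True marks an
    operation that resulted in a complication (hypothetical data layout).
    """
    if len(complications) < window:
        # Assumption: too few audited operations means not yet eligible.
        return False
    recent = complications[-window:]  # only the last `window` operations count
    return sum(recent) <= max_complications


# Hypothetical audit records, not real data.
surgeon_a = [False] * 48 + [True] * 2  # 2 complications in the last 50
surgeon_b = [False] * 47 + [True] * 3  # 3 complications in the last 50

print(eligible_for_trial(surgeon_a))
print(eligible_for_trial(surgeon_b))
```

An entry criterion of this kind fixes a minimum skill level before randomisation, so that the procedure is assessed in the hands of practitioners who meet a stated standard.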

Thus, although this article was an interesting example of potential bias linked to care providers’ experience in an RCT assessing nonpharmacological treatment, the author generalizes sweepingly when concluding that “complex procedures are poorly amenable to the methods of large multicentre randomised trials”. To condemn multicentre RCTs assessing complex interventions because of one imperfect RCT appears as inappropriate as defining a new standard of care for vaginal breech delivery on the basis of a unique and potentially biased trial.

1.Kotaska A. Inappropriate use of randomised trials to evaluate complex phenomena: case study of vaginal breech delivery. BMJ. Oct 30 2004;329:1039-1042.

2.Boutron I, Tubach F, Giraudeau B, Ravaud P. Methodological differences in clinical trials evaluating nonpharmacological and pharmacological treatments of hip and knee osteoarthritis. JAMA. Aug 27 2003;290:1062-1070.

3.Feldon SE, Scherer RW, Hooper FJ, et al. Surgical quality assurance in the Ischemic Optic Neuropathy Decompression Trial (IONDT). Controlled Clinical Trials. 2003;24:245-354.

4.Barnett HJ, Taylor DW, Eliasziw M, et al. Benefit of carotid endarterectomy in patients with symptomatic moderate or severe stenosis. North American Symptomatic Carotid Endarterectomy Trial Collaborators. N Engl J Med. Nov 12 1998;339:1415-1425.

5.McCulloch P, Taylor I, Sasako M, Lovett B, Griffin D. Randomised trials in surgery: problems and possible solutions. BMJ. 2002;324:1448-1451.

Competing interests: None declared

Randomized Trials Should Be Used for Complex Procedures 17 November 2004
George D. Carson,
Head of Obstetrics and Gynecology
Regina Qu'Appelle Health Region, Regina, SK, S4P 0W5

Not every decision requires a randomized trial. For example, the use of parachutes! (1) But this powerful method of investigation has often usefully informed our practices and added evidence to help with the choices that doctors and patients can make in consultation. The randomization controls for the biases of which we are aware and, even more importantly, the ones of which we are not aware.

Now Andrew Kotaska has argued in a BMJ article (BMJ 2004; 329:1039-42) that randomized trials cannot appropriately be used to evaluate complex phenomena, using the recent Term Breech Trial as an example. He further alleges that the conduct of the trial may have led to an inappropriately poor standard of care.

I think he is wrong on both counts. Here is why:

I fully acknowledge the complexities of both judgment and skill that go into the decision of whether or not to attempt a vaginal delivery when the fetus is in breech presentation, the further judgment needed during the conduct of labour, if that is chosen, and the skill needed to perform the manoeuvres that assist the delivery. Despite the claim that this constitutes a limitation on the proper use of a randomized trial, it is precisely this degree of complexity, particularly in judgment, that may allow bias to creep in unnoticed, and that is why a randomized trial is required for evaluation.

Cohort and other studies of vaginal breech delivery have demonstrated, in some instances, a remarkably good outcome. This may simply illustrate very good judgment by a few unusually good physicians in selecting which patients should attempt a vaginal breech birth. It may also reflect judgment as to which groups of patients to report in the literature.

Randomized trials may indeed be applied to complex conditions and reveal differences that otherwise might not be apparent. An example is the application of fetal endoscopic tracheal occlusion for management of congenital diaphragmatic hernia. This intervention was biologically highly plausible, and initial, uncontrolled observations, as is often the case when new procedures are performed by enthusiasts, yielded optimistic results. The eventual trial (2) compared the new intervention to standard care. Both were highly complex packages of care. Despite the expectations of the investigators at the beginning of the trial, it was stopped after 24 patients had been enrolled because there was no improvement in survival or morbidity. There was a substantial initial intervention and then a cascade of various events associated with the intervention (have we heard that before?). The randomized trial provided the basis for decision-making as to whether to continue this intervention or, wisely, not.

Evaluation of interventions with respect to their effectiveness has been increasingly called for in surgery. (3) In calling for a most effective evaluation as to whether complex procedures are effective (4), it is noted that “although clinical observations can provide important insights, they are limited by lack of objectivity.”

Examples of contradictory observational and randomized trial results are not restricted to medical interventions and are found for procedural interventions. Because most interventions have moderate rather than large treatment effects, they are the ones most susceptible to misleading conclusions from observational studies. The performance of an episiotomy is a less complex matter of both judgment and skill than breech delivery but this intervention was usefully studied in randomized trials. It is the results of those randomized trials that contributed to a desirable (because of what we know from the randomized trial results) reduction in the previously "normal routine" provision of episiotomy.

Andrew Kotaska uses the concept of “bias of licence” to suggest that perhaps there was some suspension of the normal devotion to good care and safety of patients, which he alleges practitioners may have done by increasing too much the number of breech deliveries they did. It is implied that the quality of care may have been inferior in the breech trial. I must acknowledge my bias at this point. Ours was one of the participating centres in the Term Breech Trial, and I was one of the practitioners who delivered women who had been randomized. I was hoping to achieve a good outcome for the patients randomized to vaginal delivery of a breech (which is not actually different from what I hope for all of the patients for whom I have the privilege to provide care). I can speculate that there may have been more than the usual pressure to resort to a caesarean section if it seemed that proceeding with a vaginal delivery might not end well, so as to avoid bad outcomes with fetal entrapment in the vaginal delivery group.

The comparison of the rates of vaginal deliveries of women with fetuses in breech presentation may be spurious. The calculation of 13 percent of all breeches being delivered vaginally in centres participating in the Term Breech Trial would include in the denominator women with breeches who were not suitable for vaginal delivery and others who declined having an attempted vaginal delivery. By important contrast, the minimum 40 percent actual vaginal births sought in the limb of the Term Breech Trial for women randomized to attempt a vaginal delivery would have included only women who met the stringent inclusion criteria defined by a consensus of experts and who were willing to accept randomization to the possibility of vaginal delivery. The subsequent conduct of the labour for each individual woman randomized to attempt a vaginal delivery would then have been done with the judgment, skill, and commitment to good care by individual obstetricians.
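The denominator point made above can be illustrated with a small worked example. The counts below are hypothetical, chosen only so that the overall rate comes out near the 13 percent mentioned in the text; the function name and arguments are likewise assumptions for illustration.

```python
def vaginal_delivery_rates(all_breech, attempted_labour, delivered_vaginally):
    """Return (rate among all breeches, rate among women attempting labour)."""
    if not 0 <= delivered_vaginally <= attempted_labour <= all_breech:
        raise ValueError("counts must satisfy delivered <= attempted <= all")
    if attempted_labour == 0:
        raise ValueError("no women attempted labour")
    overall = delivered_vaginally / all_breech
    among_attempts = delivered_vaginally / attempted_labour
    return overall, among_attempts


# Hypothetical unit: 200 breech presentations, 60 women selected for and
# accepting a trial of labour, 26 of whom deliver vaginally.
overall, among_attempts = vaginal_delivery_rates(200, 60, 26)

print(f"Rate among all breech presentations: {overall:.0%}")
print(f"Rate among women attempting labour: {among_attempts:.0%}")
```

The same 26 births give 13% of all breech presentations but about 43% of the women who attempted labour, so a 13 percent overall rate and a 40 percent rate in a selected group need not conflict.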

Achievement of at least a 40 percent actual vaginal birth rate in this selected group is consistent with that of cohort studies from other centres with similar maternal consent and expert selection and conduct of the labour and delivery as noted by Dr. Kotaska. Achievement of such numbers should be not only unsurprising but reasonably expected.

In any case, the evidence from a variety of trials is that the outcome of care provided within a trial is generally better than the outcome of comparable care for comparable conditions provided outside of the trial. (5,6) The first of these references notes that “clinical trials have a positive rather than negative effect on the outcome of patients.” In larger trials where effective treatment already exists and is included in the trial protocol (and that surely applied to the Term Breech Trial) whichever limb one would regard as “effective” has better results for trial participants.

It is, of course, a matter of judgment whether to apply the results of any randomized trial to the care of one’s own patients. Included in that judgment is the determination of the individual practitioner as to whether the trial itself was well done and whether the patients studied are sufficiently like the practitioner’s own patients for the results to be applicable to them.

One can always argue that each individual is unique, and yet in the practice of medicine we must also be appropriately guided by what works for most of the people most of the time since that is most likely to apply to the work that we are doing. It is advanced as a criticism of the Term Breech Trial that this involved multiple practitioners and multiple sites, and yet it is just that fact that may make its findings more generalizable. The speculation by Dr. Kotaska in his paper that practitioners were somehow induced “to exceed their comfort level with vaginal breech delivery” is only speculation and is inconsistent with the results from the evaluation of the outcomes of care in other trials.

Of the two options within the term breech trial, I argue that the judgment and skill to perform a vaginal breech delivery is probably more complex and demanding than doing a caesarean section. Thus, the improvement in performance, which is a characteristic of the outcome of care within the context of a clinical trial, is more likely to have been beneficial for the vaginal breech deliveries in the trial than for the caesarean section limb. In other words, the short term disadvantage of intended vaginal delivery of a breech fetus is likely to be wider in the real world than in the trial results.

The two-year follow-up results of the term breech trial have, of course, just recently been published and have demonstrated no difference in the long-term outcomes for a subgroup in the trial which is different from the composite short-term results. This is consistent with much data about the inherent toughness of newborns who tend to do quite well in the long-run, even with unfavourable immediate assessments.

I, too, was very disappointed in the outcome of the term breech trial. I like delivering breeches vaginally, and I am still willing to do so for women who appear likely to succeed and who, after consideration of the available information and their own particular situation, want to try for a vaginal delivery. I had hoped that the term breech trial would have provided an outcome more consistent with my pre-trial hopes and contribute to reversing the trend to increasing caesarean section delivery of breeches. Disappointment at the message, however, should not lead to disparaging comments about the messengers or denigration of the methodology leading to the message. We owe our patients and our professional integrity a greater devotion to the acquisition and use of the best available relevant information, so that we can use that information when we are making decisions with our patients.

References

1. Gordon C.S. Smith, Jill D. Pell. Parachute Use to Prevent Death in Major Trauma Related to Gravitational Challenge: Systematic Review of Randomized Controlled Trials BMJ 2003; 327: 1450.

2. Michael R. Harrison, Roberta L. Keller, Samuel B. Hawgood et al N Engl J Med 2003; 349: 1916-24.

3. Moritz N. Wente, Christoph M. Seiler, Waldemar Uhl, Markus Buchler Dig Surg 2003;20: 263-269.

4. P.J. Devereaux, Michael D. McKee, Salim Yusuf. Methodologic Issues in Randomized Controlled Trials of Surgical Interventions, Clin Orthopedics and Related Research 2003; 413: 25-32.

5. David A. Braunholtz, Sarah JL Edwards, Richard J Lilford Are randomized clinical trials good for us (in the short term)? Evidence for a "trial effect." J Clin Epidemiology 2001; 54: 217-224.

6. Al Hallstrom, Lawrence Friedman, Pablo Denes, Carlos Rizo-Patron, Mary Morris and the CAST and AVID Investigators Do arrhythmia patients improve survival by participating in randomized clinical trials? Observations from the cardiac arrhythmia suppression trial (CAST) and the Antiarrhythmics Versus Implantable Defibrillators Trial (AVID) Controlled Clinical Trials 2003; 24: 341-352.

Competing interests: None declared

Continuing vaginal breech birth: a German perspective 19 November 2004
Michael Krause,
Perinatologist and Clinical Director
Nürnberg Perinatal Unit, D - 90471 Nürnberg

At the beginning of the Term Breech Trial, our centre deliberated extensively over the trial protocol. Over 15 years, we have accumulated a large volume of experience with breech birth (1500 births). Through audit of our results, we determined that vaginal delivery is safer than planned cesarean section and does not lead to higher perinatal mortality or neonatal morbidity. After long discussions, we decided on ethical grounds not to participate in the Term Breech Trial. For years, our vaginal delivery rate for breech presentation has been over 60%.1 To have taken part would have meant that 90 to 100% of women with a breech presentation randomised to elective cesarean section would receive a cesarean section. Without the trial, over 60% of these women delivered vaginally. The excess surgical intervention without neonatal benefit was unjustifiable.

For us, the question whether a planned cesarean section or a planned vaginal birth is safer for the newborn had already been answered. Through close neonatal follow-up, we have observed that the excess early morbidity associated with vaginal breech delivery stems primarily from respiratory acidosis (pH < 7.2, Base deficit < 15), of no long term consequence for normally grown term newborns. A German group demonstrated in 621 breech children followed to 56 months of age that the mode of delivery had no influence on long-term neurological morbidity.2 Other groups have reached the same conclusion, including the Term Breech Trial Collaborative Group in their recently published 2-year follow-up.3,4,5

The complexity of birth, particularly birth from a breech presentation, does not lend itself easily to investigation through a protocol in which many deciding factors can only be described. Experience and expertise cannot be transferred through a protocol. For a safe vaginal breech birth, meticulous risk assessment, case selection, and peripartum management are prerequisites, and the structure of the birthing unit and immediate availability of an experienced operative obstetrician are of paramount importance. As this is also true for the safe vaginal delivery of twins, we have similarly declined to participate in the Twin Birth Study, also led by Professor Hannah.6 Our vaginal delivery rate for dichorionic twins, irrespective of the presentation of Twin 1, is 75%, without increased risk to Twin B in an audit of over 500 twin births. The Twin Birth Study will have the same short-term endpoint and the same problems with external validity as outlined in Dr. Kotaska’s article.7 Again, large-scale randomisation is an inappropriate method of investigation.

In contrast, our unit has eagerly embraced the large, multicenter randomised Multiple Antenatal Corticosteroids (MACS) trial, also from Professor Hannah’s unit.8 The methodology, including a blinded placebo control arm, is well suited to the question of whether multiple courses of steroids improve short and long-term neonatal outcome in the context of threatened premature birth.

In Bavaria, 3845 term breech deliveries occurred in 2003. Despite the results of the Term Breech Trial, the proportion of breeches delivered by elective cesarean section has dropped from 71.1% in 2000 to 63.7% in 2003.9 In Nürnberg, we have not altered our management and continue to recommend planned vaginal birth in over 90% of breech presentations.

References:

1.Feige, A; Krause, M: Beckenendlage (Breech Presentation), Urban & Schwarzenberg, 1998

2.Wolke, D; Söhne, B; Schulz, J; Ohrt, B; Riegel, K: Die kindliche Entwicklung nach vaginaler und abdominaler Entbindung bei Beckenendlage. In: Beckenendlage. Feige, A, Krause, M (eds.), Urban & Schwarzenberg 1998, p.186-206

3.Danielian, PJ; Wang, J; Hall, MH: Long term outcome by method of delivery of fetuses in breech presentation at term: population based follow up. BMJ 1996;312:1451-3

4.Munstedt, K; von Georgi, R; Reucher, S; Zygmunt, M; Lang, U: Term breech and long-term morbidity – caesarean section versus vaginal breech delivery. Eur J Obstet Gynecol Reprod Biol 2001;96(2):163-7

5.Whyte, H et al. for the 2-year infant follow-up Term Breech Trial Collaborative Group: Outcomes of children at 2 years after planned cesarean birth versus planned vaginal birth for breech presentation at term: The International Randomized Term Breech Trial. Am J Obstet Gynecol 2004;191:864-71

6.www.utoronto.ca/miru/tbs/main.htm

7.Kotaska, A: Inappropriate use of randomised trials to evaluate complex phenomena: case study of vaginal breech delivery. BMJ 2004;329:1039-42

8.www.utoronto.ca/miru/macs/main.htm

9.Bayerische Arbeitsgemeinschaft für Qualitätssicherung in der stationären Betreuung. Qualitätsbericht Geburtshilfe, Jahresauswertungen 2000, 2003

Competing interests: None declared

External Validity and Internal Trial Conditions 24 November 2004
Michael C Klein,
Emeritus Professor of Family Practice and Pediatrics, University of British Columbia
BC Research Institute for Children's and Women's Health, Vancouver BC Canada, V0N2W0

To the editor: It is not a question of rejecting RCTs as a methodology. It is only about the methodology of particular RCTs: whether they were well constructed, and whether the conditions of the trial give its results external validity. RCTs are "is" studies. They say that something is or is not apparently true for the conditions under which the trial was constructed. For other conditions/settings/groups of practitioners that truth may not be a reality. "Why" studies try to determine why something appears to be true or not. RCTs are not "why" studies.

For some German and Scandinavian practitioners who did not join the Term-Breech Trial, the "conditions" of the trial did not represent their reality, so they declined to participate, as they determined in advance that the results could not have enough external, or in their case local, validity to induce them to participate. From the discussion so far we can probably deduce that the Term-Breech Trial had truth for the conditions of the trial. The discussion is only about whether the conditions of the trial represented most clinical realities or only the realities of that trial.

I would like to use some material from our own RCT of episiotomy to illustrate some of these points. Obviously we thought that we had constructed a good trial and most people seemed happy with the results, i.e. they confirmed what they believed. But let’s look a little deeper. For the most part outcomes in the two arms of our classical RCT analyzed by intention to treat were equivalent, except for a little extra benefit for multips in the restricted episiotomy arm. That was enough for most people, as the proponents of routine episiotomy had claimed superiority of routine episiotomy. But further study showed that for nulliparous women, 3rd/4th degree tear rates ran about 12-13%, which seemed rather high as few people had rates so high outside the trial (sound familiar?). This caused us to look inside the trial for a "why" answer. What we found was that some practitioners, regardless of trial arm, had 3rd/4th degree tear rates in excess of 20% and others rarely if ever had such tears. Similarly some practitioners (the same ones) almost never had a patient with an intact perineum while others had them almost all the time.

Hence if we looked at the trial not by intention to treat but by the behaviour of the practitioners within the trial, we found out another truth. Those practitioners, regardless of trial arm, who truly limited their episiotomy use had superior results (no severe tears and lots of intact perineums), while those who had trouble following the trial protocol, and either could not randomize very often or when randomized did episiotomy almost all the time, "owned" the severe tears and none of the intact perineums (sound familiar?).

We do not reject the RCT that we ran; we just found that additional interesting information came from a detailed analysis of what was going on inside the trial. If the trial were reanalyzed after eliminating "non-compliant" physicians, who after all really did not know how to attend birth without episiotomy, the results would dramatically favour episiotomy limitation--even by intention to treat. Without thinking hard about the meaning of the trial, we would know less than we now know.

I think that the Term-Breech Trial could perhaps give us additional important information if it were further analyzed by subgroups according to the proportion of breech births delivered vaginally at each participating study site, or group of sites, before the trial was launched.

Our sub analyses were published in various articles, perhaps the most interesting being the one from the CMAJ on beliefs and behaviour:

1. Klein MC, Gauthier R, Robbins J, Kaczorowski J, Jorgensen S, Franco E, Johnson B, Waghorn K, Gelfand M, Guralnick M, Luskey G, Joshi J: Relation of Episiotomy to Perineal Trauma and Morbidity, Sexual Dysfunction and Pelvic Floor Relaxation. American Journal of Obstetrics and Gynecology. 1994:171 (3):591-8.

2. Klein MC, Kaczorowski J, Robbins JM, Gauthier RJ, Jorgensen SH, Joshi AK: Physician Beliefs and Behaviour within a Randomized Controlled Trial of Episiotomy: Consequences for Women under their Care. Can Med Assoc J. 1995; 153(6):769-79. See also supporting editorial: Schulz KF: Unbiased Research and the Human Spirit: The Challenges of Randomized Controlled Trials [editorial]. Can Med Assoc J. 153 (6):783-786, September 15, 1995, and Graham I: I Believe Therefore I Practise [commentary on our work]. Lancet 347:4-5, January 6, 1996

3. Klein M: Studying Episiotomy: When Beliefs Conflict with Science. J Fam Practice. 1995: 41(5):483-8.

4. Klein M, Janssen PA, MacWilliams L, Kaczorowski J, Johnson B: Determinants of vaginal/perineal integrity and pelvic floor functioning in childbirth. Am J Obstet & Gynecol. 1997;176:403-10.

Michael C. Klein, MD, CCFP, FAAP(Neonatal-Perinatal),FCFP, ABFP
Emeritus Professor of Family Practice and Pediatrics
University of British Columbia, Room L309B Shaughnessy Building, Faculty Centre Community and Child Health Research, BC Research Institute for Children's and Women's Health, 4500 Oak Street, Vancouver, V6H 3N1
Email: mklein@interchange.ubc.ca

Competing interests: None declared

Another reason why randomised controlled trials do not necessarily tell the whole truth in clinical medicine 10 January 2005
Petra AMM De Sutter,
Professor in reproductive medicine
University hospital Gent, Belgium

The randomised controlled trial (RCT) is considered to be the gold standard (or at least the silver standard, Simon, 2001) in clinical research and by far superior to all other forms of study design. There are good reasons to accept this. Non-randomised trials may generate all kinds of bias. Observed outcomes may be caused by differences among the patients given the two treatments, rather than the treatments alone (Barton, 2000). The only way to avoid known differences (selection bias) as well as concealed differences (confounding bias) between treatment and control groups is to let fate determine to which group a given patient will be allocated (Grimes and Schulz, 2002). Blinding furthermore precludes information bias. One can expect that, if the sample size is large enough, the play of chance will lead to similar groups, in which possible confounding factors are equally distributed. Any difference in outcome may then be attributed to the intervention under study. If no significant differences in outcome are detected, either the intervention is not effective, or the power of the trial is not sufficiently large to detect a real difference. Of course, in the latter case one may question the clinical relevance of a possible small difference.

Some limitations of the RCT have recently been demonstrated by Kotaska (2004), who showed that in heterogeneous populations and for complex phenomena the conclusions of an RCT may not necessarily be valid for each individual patient.

In this letter, we want to further argue that the RCT indeed has this limitation, which has to be kept in mind by readers of medical literature. Besides recognised restrictions concerning the external validity and other forms of bias, we will also dispute the internal validity of the RCT, with respect to patient subgroups. A few examples will demonstrate that the conclusions of RCTs may not be valid for every patient and therapies should not be a priori rejected because they are not proven efficacious.

One of the recognised problems of the RCT is its limited external validity. It may not always be possible to generalise its results to the broader community, because screened volunteers for RCTs are often good prognosis patients, in contrast to the general population (Grimes and Schulz, 2002). In recent years two papers challenged the concept of the superiority of the RCT over well-designed observational studies (Benson and Hartz, 2000; Concato et al., 2000), and one of the arguments they used to explain their conclusion was the limited external validity of RCTs, in contrast with observational studies.

Although the RCT undoubtedly remains a very powerful tool to study possible effects of a treatment, its conclusions are often used in an incorrect way, for instance because of exaggerated enthusiasm for a positive result, or the tendency of clinicians to overestimate benefit and underestimate harm. The dangers of interpretation bias and dissemination bias of the results of RCTs have been convincingly exposed by McCormack and Greenhalgh (2000), using the 'United Kingdom prospective diabetes study' as an example. In day-to-day practice, clinicians who try to follow the principles of evidence-based medicine are at risk of only accepting and applying treatments that have been validated by RCTs, and rejecting therapies whose efficacy has not been proven by an RCT.

In contrast to the problem of limited external validity, the internal validity of the RCT usually goes unquestioned because of the design of the RCT. This design follows the rules of controlled experiments in basic science. In basic science, however, experiments are usually performed on subjects (cell cultures or laboratory animals) that are behaving in a predictable and identical way. Human beings usually do not behave in an identical and predictable way. The RCT design assumes that all interventions are effective (or not) to the same extent for all patients, who should all behave in the same way, since they have been properly randomised. Just as meta-analysis suffers from heterogeneity between studies, RCTs suffer from (often concealed) heterogeneity between patients. The inference of a properly designed RCT is that a certain intervention is more, less or as effective as another, for an average population of patients. It actually does not tell us anything about a given individual patient. Stephen Jay Gould, a Harvard evolutionary biologist, has nicely elaborated on this in his text “The Median Isn't the Message” (http://www.cancerguide.org/median_not_msg.html), dealing with his own survival prognosis after he received a fatal cancer diagnosis.

In reproductive medicine the main outcome of many trials is pregnancy, and it is easily acknowledged that the successful establishment of a pregnancy is a complex and multifactorial process. Even in controlled assisted reproductive technology (ART) trials, establishment of pregnancy still is determined by many factors, such as oocyte quality, post-fertilisation embryonic events that are not quite understood (Tucker, 1990), and implantation (Denker, 1993; Lessey, 2000). Some therapeutic trials may affect the outcome of all patients in one way or another, e.g. trials comparing stimulation protocols, because the outcome of the treatment strongly correlates with the number of oocytes obtained. But some interventions, in which pregnancy is the main outcome measure, may be effective for some patients, but not for others, and a randomised trial will not allow this to be detected.

Many clinicians feel that interventions that have not been proven effective overall in an RCT must be avoided. We believe this can be wrong in some instances, because interventions may work for only certain subgroups of patients and if these minority subgroups are not identified before the trial starts (and they cannot be identified because they are unknown), it is impossible to detect this effect. In reproductive medicine there are many examples of treatments considered to be controversial, because some trials do not demonstrate an effect whereas other (often smaller) trials do. How to interpret these results? Wait for a meta-analysis? This will not solve the problem. We will try to demonstrate that it is impossible to prove the efficacy of a treatment, if it does not work for the majority of patients, but only for a subgroup, or vice versa. We will consider two opposite situations. In the first scenario a new treatment is tested that is beneficial for a subgroup of patients, but has no effect whatsoever for the other patients. Scenario 2 is a situation in which a new treatment is beneficial for most patients but not for a subgroup.

Scenario 1: Take two groups of 1000 patients each, who undergo a new therapeutic intervention and a placebo treatment, respectively. Assume that in both groups 90% of the patients are standard patients for whom the new treatment will make no difference, because they are good prognosis patients (G-patients). Both groups therefore contain a minority of 10% bad prognosis patients (B-patients), who will benefit from the new treatment. The proportion of G-patients and B-patients in each group should be the same, thanks to the randomisation process. The pregnancy rate for G-patients is 40% in the treatment group as well as in the control group. The pregnancy rate for the B-patients, however, increases from 10% in the control group to 40% in the treatment group. Table 1 shows the statistics of this scenario. It is clear that the conclusion of the trial will be that there is no significant difference in pregnancy rates between the two groups (40% in the treatment group and 37% in the control group). Expressed as the risk of not becoming pregnant in the control group relative to the treatment group, the relative risk (RR) is 1.05 (95% confidence interval or CI: 0.98-1.12). In the subgroup of B-patients, however, the pregnancy rate increases from 10% to 40%, yielding a RR of 1.50 on the same scale (95% CI: 1.26-1.78), which is highly significant (P<0.0001). Because this group of patients (for whom the treatment works) is diluted in the total population, the effect is not detected and the treatment is rejected. Wrongly so for the bad prognosis group!

Scenario 2: Two groups of 1000 patients undergo a treatment that is beneficial for the G-patients (the pregnancy rate increases from 30% to 50%), but not for the minority of B-patients (the pregnancy rate is 10% and remains 10%). Table 2 depicts this situation. Now, the difference for the G-patients is significant (RR of pregnancy, treatment versus control, 1.67; 95% CI 1.48-1.88), as well as in the total group (RR 1.64; 95% CI 1.46-1.85). The trial fails to detect that there is a subgroup of patients for whom the treatment does not work and we wrongly conclude that it may work in all instances…
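The relative risks quoted in the two scenarios can be checked against the counts in Tables 1 and 2. The sketch below is an illustration only: the letter does not say how its confidence intervals were derived, so the log-based (Katz) interval and the function name `relative_risk` are assumptions. Under that assumption the figures agree with the quoted ones to within rounding, provided Scenario 1 is read as the risk of not becoming pregnant (control versus treatment) and Scenario 2 as the chance of pregnancy (treatment versus control).

```python
import math


def relative_risk(events_a, n_a, events_b, n_b, z=1.96):
    """Risk ratio of group A versus group B with a log-based (Katz) 95% CI.

    Assumption: the letter does not name its interval method; this is one
    standard choice, used here purely for illustration.
    """
    rr = (events_a / n_a) / (events_b / n_b)
    se_log = math.sqrt(1 / events_a - 1 / n_a + 1 / events_b - 1 / n_b)
    lower = math.exp(math.log(rr) - z * se_log)
    upper = math.exp(math.log(rr) + z * se_log)
    return rr, lower, upper


# Scenario 1 (Table 1): risk of NOT becoming pregnant, control vs treatment.
s1_total = relative_risk(630, 1000, 600, 1000)
s1_b = relative_risk(90, 100, 60, 100)

# Scenario 2 (Table 2): chance of pregnancy, treatment vs control.
s2_g = relative_risk(450, 900, 270, 900)
s2_total = relative_risk(460, 1000, 280, 1000)

for label, (rr, low, high) in [
    ("Scenario 1, all patients", s1_total),
    ("Scenario 1, B-patients", s1_b),
    ("Scenario 2, G-patients", s2_g),
    ("Scenario 2, all patients", s2_total),
]:
    print(f"{label}: RR {rr:.2f} (95% CI {low:.2f}-{high:.2f})")
```

In Scenario 1 the interval for all patients spans 1, while the interval for the B-patients does not, which is the dilution effect described above.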

Of course, many other scenarios are possible. We can vary the proportion of good and bad prognosis patients (but they will remain the same between groups, because of the randomisation process). We can analyse many more subgroups of patients with intermediate prognoses, or we can think of treatments that are beneficial for one group of patients but harmful to others… It is clear from this exercise that RCTs only tell part of the truth, since they do not account for this effect.
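To illustrate how varying the proportion of bad prognosis patients changes the picture, the sketch below repeats Scenario 1 with different B-patient fractions. It is a simplified illustration under stated assumptions: expected rates are treated as if they were observed exactly (no sampling noise), the overall comparison uses a pooled two-proportion z statistic (the letter does not name its test), and `diluted_trial` and its parameters are hypothetical names.

```python
import math


def diluted_trial(frac_b, n_per_arm=1000, rate_g=0.40,
                  rate_b_control=0.10, rate_b_treated=0.40):
    """Overall pregnancy rates and z statistic when only B-patients respond.

    Assumptions: rates are taken as exactly observed, and the overall
    comparison is a pooled two-proportion z-test with equal arm sizes.
    """
    p_treated = (1 - frac_b) * rate_g + frac_b * rate_b_treated
    p_control = (1 - frac_b) * rate_g + frac_b * rate_b_control
    pooled = (p_treated + p_control) / 2
    se = math.sqrt(pooled * (1 - pooled) * 2 / n_per_arm)
    z = (p_treated - p_control) / se
    return p_treated, p_control, z


for frac_b in (0.05, 0.10, 0.20, 0.30):
    p_treated, p_control, z = diluted_trial(frac_b)
    verdict = "significant" if abs(z) > 1.96 else "not significant"
    print(f"B-patients {frac_b:.0%}: treated {p_treated:.1%}, "
          f"control {p_control:.1%}, z = {z:.2f} ({verdict} at the 5% level)")
```

With the Scenario 1 rates and 1000 patients per arm, a 10% responsive subgroup stays below the conventional threshold, whereas a 20% subgroup would cross it; the cut-off depends on the assumed rates and sample size.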

Absence of a significant difference between treatments and controls in an RCT does not mean that treatment and control are equivalent ("lack of evidence of a difference does not mean that there is evidence of a lack of difference"; Altman and Bland, 1995). In the same way, absence of significant differences between treatments and controls does not rule out that the treatment is beneficial in certain subgroups. This means that therapies which are not “proven” to be efficacious should not a priori be rejected for all patients. In the same way conclusions from positive studies should not always be accepted and implemented in all instances. Indeed, patient heterogeneity is not only a problem in negative studies, but also in positive studies… It is important, however, to realize that the RCT itself is not to blame here, but rather our lack of knowledge of how to identify possible subgroups of patients and when to apply the conclusions of the RCT rightfully.

In order to identify a possible subgroup of patients, for whom an otherwise ineffective treatment may be effective (or vice versa), one can apply discriminant analysis or some other multivariate statistical technique to all known entry variables, and look for discriminators which distinguish responders from non-responders. When these discriminators have been identified, a second, independent randomised trial may be performed, stratifying for the distinguished features. If there is a responsive subgroup, their stratum will show a greater response and the subgroup will have been identified.

Another solution is to wait for new techniques which will enable proper subgroup identification. A good example of this is the discovery of polymorphisms in cytochrome p450 genes, which account for differences between individuals in cancer risk and also in risk for adverse drug reactions (this is the subject of pharmacogenomics, Tribut et al., 2002). Another example is the subtypes of leukaemia which have been identified using microarray cell type analysis and which strongly correlate with prognosis (Stevenson et al., 2001). New techniques may thus lead to subgroup identifications which were not possible before. Concerning reproductive medicine, however, many of the patient characteristics which we may need to identify the subgroup, are lacking or simply unknown, due to our limited knowledge about all factors that lead to pregnancy.

There are quite a few examples in reproductive medicine illustrating the problem of subgroup identification. Assisted hatching is definitely not beneficial for all patients undergoing ART (Hellebaut et al., 1996), but maybe it is in a subgroup? Our 1996 RCT on assisted hatching illustrates the hypothesis of subgroup effectiveness. We obtained a pregnancy rate of 42.1% in the assisted hatching group and 38.1% in the control group. These figures nicely fit the figures of Table 1, which could mean that there indeed is a subgroup for which assisted hatching would be beneficial. We are eagerly awaiting the results of the ongoing Cochrane review on this subject (Seif et al., 2002).

Another example is the often controversial and non-validated treatment for unexplained and repetitive implantation failure after ART. Many therapies have been offered to these patients: zygote intrafallopian transfer (Levran et al., 2002), autologous endometrial coculture (Spandorfer et al., 2002), intravenous immunoglobulins (Coulam and Goodman, 2000), cytoplasmic transfer (Barritt et al., 2001), blastocyst transfer (Simon and Pellicer, 2000) and many others. The large number of therapies offered to patients suffering from unexplained recurrent miscarriage is another well known example (aspirin, heparin, intravenous immunoglobulins (Branch et al., 2001), leukocyte immunotherapy (Clark et al., 2001); for further review see Kutteh (1999) and Lee and Silver (2000)). Both unexplained implantation failure after ART and unexplained recurrent abortion leave us with a sense of helplessness, and in front of the patient imploring us to do something, many clinicians will offer controversial and unproven therapies. Believers in evidence based medicine will reject most of these, but we believe that evidence itself shows that the truth often remains covert.

References

Altman DG and Bland JM (1995) Absence of evidence is not evidence of absence. BMJ 311, 485.

Barritt J, Willadsen S, Brenner C and Cohen J (2001) Cytoplasmic transfer in assisted reproduction. Hum Reprod Update 7, 428-435.

Barton S (2000) Which clinical studies provide the best evidence? BMJ 321, 255-256.

Benson K and Hartz AJ (2000) A comparison of observational studies and randomized, controlled trials. N Engl J Med 342, 1878-1886.

Branch DW, Porter TF, Paidas MJ, Belfort MA and Gonik B (2001) Obstetric uses of intravenous immunoglobulin: successes, failures, and promises. J Allergy Clin Immunol 108 (4 Suppl), S133-138.

Clark DA, Coulam CB, Daya S and Chaouat G (2001) Unexplained sporadic and recurrent miscarriage in the new millennium: a critical analysis of immune mechanisms and treatments. Hum Reprod Update 7, 501-511.

Concato J, Shah N and Horwitz RI (2000) Randomized, controlled trials, observational studies, and the hierarchy of research designs. N Engl J Med 342, 1887-1892.

Coulam CB and Goodman CC (2000) Increased pregnancy rates after IVF/ET with intravenous immunoglobulin treatment in women with elevated circulating C56+ cells. Early Pregnancy 4, 90-98.

Denker HW (1993) Implantation: a cell biological paradox. J Exp Zool 266, 541-558.

Grimes DA and Schulz KF (2002) An overview of clinical research: the lay of the land. The Lancet 359, 57-61.

Hellebaut S, De Sutter P, Dozortsev D, Onghena A, Qian C and Dhont M (1996) Does assisted hatching improve implantation rates after in vitro fertilization or intracytoplasmic sperm injection in all patients? A prospective randomized study. J Assist Reprod Genet 13, 19-22.

Kotaska A (2004) Inappropriate use of randomised trials to evaluate complex phenomena: case study of vaginal breech delivery. BMJ 329, 1039-1042.

Kutteh WH (1999) Recurrent pregnancy loss: an update. Curr Opin Obstet Gynecol 11, 435-439.

Lee RM and Silver RM (2000) Recurrent pregnancy loss: summary and clinical recommendations. Semin Reprod Med 18, 433-440.

Lessey BA (2000) The role of the endometrium during embryo implantation. Hum Reprod 15 Suppl 6, 39-50.

Levran D, Farhi J, Nahum H, Royburt M, Glezerman M and Weissman A (2002) Prospective evaluation of blastocyst stage transfer vs. zygote intrafallopian tube transfer in patients with repeated implantation failure. Fertil Steril 77, 971-977.

McCormack J and Greenhalgh T (2000) Seeing what you want to see in randomised controlled trials: versions and perversions of UKPDS data. BMJ 320, 1720-1723.

Sackett DL, Rosenberg WM, Gray JA, Haynes RB and Richardson WS (1996) Evidence based medicine: what it is and what it isn't. BMJ 312, 71-72.

Seif MW, Edi-Osagie ECO, McGinlay P, Blake D and Brison D (2002) Assisted hatching for assisted conception (IVF & ICSI) (Protocol for a Cochrane Review). In: The Cochrane Library, Issue 2. Oxford: Update Software, 2002.

Simon C and Pellicer A (2000) Blastocyst transfer for couples with multiple IVF failures? Fertil Steril 73, 872.

Simon SD (2001) Is the randomized clinical trial the gold standard of research? J Androl 22, 938-943.

Spandorfer SD, Barmat LI, Navarro J, Liu HC, Veeck L and Rosenwaks Z (2002) Importance of the biopsy date in autologous endometrial cocultures for patients with multiple implantation failures. Fertil Steril 77, 1209-1213.

Stevenson FK, Sahota SS, Ottensmeier CH, Zhu D, Forconi F and Hamblin TJ (2001) The occurrence and significance of V gene mutations in B cell-derived human malignancy. Adv Cancer Res 83, 81-116.

Tribut O, Lessard Y, Reymann JM, Allain H and Bentue-Ferrer D (2002) Pharmacogenomics. Med Sci Monit 8, 152-163.

Tucker MJ (1990) Periimplantational events post-IVF. Int J Fertil 35, 100-105.

Table 1 - New treatment is beneficial in a subgroup, but not for other patients.

Table legend: Scenario in which a new treatment is tested that is beneficial for a subgroup of patients (B-patients), but has no effect for the other patients (G-patients) (Pr = pregnant; N Pr = not pregnant; Pr% = pregnancy rate).

Group        Arm         Total   N Pr    Pr      Pr%
G-patients   Treatment   900     540     360     40%
G-patients   Control     900     540     360     40%
B-patients   Treatment   100     60      40      40%
B-patients   Control     100     90      10      10%
Total        Treatment   1000    600     400     40%
Total        Control     1000    630     370     37%

Table 2 - New treatment is beneficial for most patients, but not for a subgroup.

Table legend: Scenario in which a new treatment is tested that is beneficial for most patients (G-patients), but has no effect for the other patients (B-patients) (Pr = pregnant; N Pr = not pregnant; Pr% = pregnancy rate).

Group        Arm         Total   N Pr    Pr      Pr%
G-patients   Treatment   900     450     450     50%
G-patients   Control     900     630     270     30%
B-patients   Treatment   100     90      10      10%
B-patients   Control     100     90      10      10%
Total        Treatment   1000    540     460     46%
Total        Control     1000    720     280     28%

Competing interests: None declared