Strategy for randomised clinical trials in rare cancers
BMJ 2003; 327 doi: https://doi.org/10.1136/bmj.327.7405.47 (Published 03 July 2003). Cite this as: BMJ 2003;327:47. Data supplement
Towards a strategy for randomised clinical trials in rare cancers: an example in childhood S-PNET
Say-Beng Tan (1), Keith BG Dear (2), Paolo Bruzzi (3), David Machin (1, 4)
(1) Division of Clinical Trials & Epidemiological Sciences, National Cancer Centre, Singapore
(2) National Centre for Epidemiology and Population Health, Australian National University, Canberra, Australia
(3) Unit of Clinical Epidemiology and Trials, National Cancer Research Institute, Genoa, Italy
(4) United Kingdom Children’s Cancer Study Group, University of Leicester, United Kingdom
Address for correspondence:
Dr Say-Beng Tan
Division of Clinical Trials & Epidemiological Sciences
National Cancer Centre
11 Hospital Drive
Singapore 169610.
Tel: (65) 6436 8255
Fax: (65) 6225 0047
Email: ctetsb{at}nccs.com.sg
Introduction
In areas such as paediatrics, certain cancers are very rare. For example, childhood supratentorial primitive neuroectodermal tumours (S-PNET) account for less than 2.5% of childhood brain tumours, which are themselves rare.1 In the United Kingdom, there were only 89 such cases among children aged under 15 years during the 10-year period 1992-2001 (C Stiller, personal communication, 2003).
The evaluation of new therapies for such rare cancers poses some difficulties. Randomised controlled clinical trials (RCTs) are regarded as the gold standard for comparing a new therapy with the standard treatment for a particular cancer. However, well-designed RCTs typically require a large number of patients. For example, the trial of chemotherapy given before radiotherapy in childhood medulloblastoma accrued 364 patients.2
Such a trial size is often unrealistically large for rare cancers. On the other hand, naïvely conducting small, underpowered RCTs will typically give outcome estimates with unacceptably wide confidence intervals, failing to provide clear-cut answers to the questions of interest.
In this context, Honkanen et al 3 proposed a three-stage clinical trial design which gives improved statistical power compared with traditional designs. However, they acknowledge that their design is only applicable to "chronic conditions where both response to therapy and flare upon withdrawal of therapy can be assessed". Their design also makes no use of potentially relevant information external to the trial. In contrast, Lilford et al 4 suggested that a Bayesian statistical approach, making use of external information, would be useful in conducting small clinical trials. However, they give little detail on how such an approach might be implemented in practice.
In our companion paper,5 we have proposed, itemised and described the key components of the design process for a ‘small’ randomised trial. The method follows the Bayesian approach suggested by Lilford et al.4 To illustrate the method, we now consider the design of a possible randomised trial for the treatment of childhood S-PNET. We indicate how, once the trial is completed, its information would be combined, using Bayesian methods, with that available at the design stage and with any new external information that emerged during the course of the trial.
S-PNET forms a subset of the primitive neuroectodermal tumours (PNET), the most common malignant brain tumour in children. PNET occurs most frequently in the cerebellum, where it is referred to as medulloblastoma (MB). S-PNET and MB are histologically identical,6 and the same treatments have been used for both. However, a worse prognosis has consistently been reported for S-PNET, and biological differences are beginning to emerge. Accordingly, the need for treatment refinements specific to S-PNET has been stressed.6
Methods
A detailed description of the proposed approach has already been given.5 Briefly, we begin with a clear specification of the treatments to be tested and the outcome of interest. A detailed literature search is then carried out to identify relevant published studies. These are then assessed for their pertinence to the question of interest, their validity and precision (as quantified by the study’s sample size or number of events). Appropriate pertinence (PS) and validity (VS) scores are then attached to each of these "prior studies".
As part of this process, Adjustment Factors (AF) 5 may also be computed. An AF is calculated as the ratio of two hazard ratios (HRs) reported in the same study for different endpoints, for example event-free survival (EFS) and overall survival (OS). It can then be used to convert EFS-based HR values into OS-based ones for studies that report only EFS when OS is the endpoint of interest.
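A minimal sketch of how an AF might be applied, assuming it is defined on the HR scale as HR(OS) / HR(EFS) from a study reporting both endpoints, and that this ratio carries over unchanged to a study reporting EFS only. The function names and all input values are hypothetical and are not taken from any study in this example.

```python
def adjustment_factor(hr_os: float, hr_efs: float) -> float:
    """AF = HR(OS) / HR(EFS), from one study that reports both endpoints.

    Assumption: the AF is defined on the HR scale, not the log-HR scale.
    """
    if hr_os <= 0 or hr_efs <= 0:
        raise ValueError("hazard ratios must be positive")
    return hr_os / hr_efs


def efs_to_os(hr_efs: float, af: float) -> float:
    """Convert an EFS-based HR from another study into an approximate OS-based HR."""
    return hr_efs * af


# Hypothetical values, for illustration only.
af = adjustment_factor(hr_os=0.90, hr_efs=0.75)
hr_os_estimate = efs_to_os(hr_efs=0.70, af=af)
print(f"AF = {af:.2f}, converted OS HR = {hr_os_estimate:.2f}")
```

Whether the ratio should instead be taken on the log-HR scale is a modelling choice the investigators would need to fix in advance.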
The PS and VS scores are used as correction factors to down-weight the information contained in each study. This is done by multiplying the number of events in the study by each of the scores in turn. The resulting adjusted numbers of events, $m_i$, are then used as weights to calculate the weighted mean prior log hazard ratio, $\text{LHR}_{\text{Prior}} = \sum_i m_i \log(\text{HR}_i) / \sum_i m_i$. The prior distribution is assumed to be Normal with mean $\mu_{\text{Prior}} = \text{LHR}_{\text{Prior}}$ and standard deviation $\sigma_{\text{Prior}} = \sqrt{4/m_{\text{Prior}}}$, where $m_{\text{Prior}} = \sum_i m_i$ is the total adjusted number of events from all studies.
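The weighting and pooling steps can be sketched in a few lines. This is an illustration only, assuming each study supplies an estimated event count, a single (overall) PS, a VS and an HR; the class and function names, and the two studies in the usage line, are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class PriorStudy:
    events: float  # estimated number of events in the study
    ps: float      # overall pertinence score, between 0 and 1
    vs: float      # validity score, between 0 and 1
    hr: float      # hazard ratio, experimental vs control


def build_prior(studies: list[PriorStudy]) -> tuple[float, float, float]:
    """Return (mu_prior, sigma_prior, m_prior) for a Normal prior on the log HR."""
    adjusted = [s.events * s.ps * s.vs for s in studies]  # down-weighted event counts
    m_prior = sum(adjusted)
    # Event-weighted mean of the study log hazard ratios (natural log).
    mu_prior = sum(m * math.log(s.hr) for m, s in zip(adjusted, studies)) / m_prior
    sigma_prior = math.sqrt(4.0 / m_prior)
    return mu_prior, sigma_prior, m_prior


# Two hypothetical studies, for illustration only.
mu, sigma, m = build_prior([PriorStudy(40, 1.0, 0.8, 0.8), PriorStudy(100, 0.4, 1.0, 1.2)])
print(f"m_prior = {m:.1f}, mu_prior = {mu:.3f}, sigma_prior = {sigma:.3f}")
```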
Having constructed the prior distribution, we then update it with trial data to give the posterior distribution. If $\mu_{\text{Data}}$ is the LHR based on $m_{\text{Data}}$ deaths, then the posterior distribution is Normal with mean $\mu_{\text{Posterior}} = (m_{\text{Prior}}\mu_{\text{Prior}} + m_{\text{Data}}\mu_{\text{Data}}) / (m_{\text{Prior}} + m_{\text{Data}})$ and standard deviation $\sigma_{\text{Posterior}} = \sqrt{4/(m_{\text{Prior}} + m_{\text{Data}})}$. At the planning stage, $\mu_{\text{Data}}$ and $m_{\text{Data}}$ are obtained from hypothetical scenario values, whereas once the trial is completed they are obtained from the actual data.
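The update is the usual Normal-Normal conjugate step with each source's variance taken as 4 divided by its number of events, so precisions are proportional to event counts. A minimal sketch, with a hypothetical function name and hypothetical inputs:

```python
import math


def update(mu_prior: float, m_prior: float, mu_data: float, m_data: float) -> tuple[float, float]:
    """Posterior (mean, SD) of the log HR after combining prior and trial data.

    Variance of each source is 4 / events, so the posterior mean is an
    event-weighted average and the events simply add.
    """
    m_post = m_prior + m_data
    mu_post = (m_prior * mu_prior + m_data * mu_data) / m_post
    sigma_post = math.sqrt(4.0 / m_post)
    return mu_post, sigma_post


# Hypothetical inputs: a neutral prior worth 60 events and a trial with 40 deaths, HR 0.7.
mu_post, sigma_post = update(mu_prior=0.0, m_prior=60, mu_data=math.log(0.7), m_data=40)
print(f"posterior log HR: mean {mu_post:.3f}, SD {sigma_post:.3f}")
```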
A series of scenarios is then considered, representing conjectured outcomes of the planned trial were it to be conducted. By combining this ‘information’ with the prior distribution, we show that worthwhile conclusions can be drawn following the conduct of such a small trial. In particular, we can compute the probabilities of concluding that the experimental treatment has a clinically useful advantage over the control, is equivalent to it, or is inferior to it (adverse outcome). By clinically useful advantage, we mean an advantage in efficacy large enough to compensate for any increased cost or toxicity associated with the experimental treatment. If there is some advantage but it is not sufficient, that is, it is "traded off" against the increased cost or toxicity, we say the treatments are equivalent. Otherwise, if there is no advantage at all, we say that there is an adverse outcome.
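Given a Normal posterior for the log HR, the three conclusions correspond to three regions of the HR axis. The sketch below assumes the clinically useful threshold is supplied by the investigators as an HR below 1 (the parameter `hr_useful` is a hypothetical name), with the equivalence zone running from that threshold up to HR = 1.

```python
import math


def normal_cdf(x: float, mu: float, sigma: float) -> float:
    """P(X < x) for X ~ Normal(mu, sigma), via the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def classify(mu_post: float, sigma_post: float, hr_useful: float) -> dict[str, float]:
    """Posterior probabilities of the three conclusions (log-HR scale)."""
    if not 0.0 < hr_useful < 1.0:
        raise ValueError("hr_useful should lie between 0 and 1")
    p_useful = normal_cdf(math.log(hr_useful), mu_post, sigma_post)  # true HR < hr_useful
    p_better = normal_cdf(0.0, mu_post, sigma_post)                  # true HR < 1
    return {
        "clinically useful advantage": p_useful,
        "equivalent": p_better - p_useful,
        "adverse": 1.0 - p_better,
    }


# Hypothetical posterior: log HR ~ Normal(0.0, 0.2), useful advantage if HR < 0.8.
probs = classify(mu_post=0.0, sigma_post=0.2, hr_useful=0.8)
for label, p in probs.items():
    print(f"{label}: {p:.3f}")
```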
Results
Thus far, there has been no clear consensus on what constitutes a standard treatment for childhood S-PNET 7 and few trials have been conducted on the disease (see Table 1 for a summary). The most widely used approach to treat these tumours is the same as that used to treat MB. This comprises surgery followed by radiotherapy, followed by polychemotherapy.
Table 1 Summary of trials and studies considered in constructing the prior
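| Study | Study type | Size | Design | Disease | Treatments | Endpoint | HR (PFS) |
|---|---|---|---|---|---|---|---|
| Cohen (1995) | RCT | 55 | Subset | S-PNETs | RX → CTX1 vs CTX2 → RX → CTX2 | PFS | 0.63 |
| Zeltzer (1999) | RCT | 203 | ITT | MB | RX → CTX1 vs CTX2 → RX → CTX2 | PFS | 1.7 |
| Timmermann (2002) | CS | 31 | Retrosp. | S-PNETs | Various | OS, PFS | - |
| Timmermann (2002), randomised subset | RCT | 32 | Unclear | S-PNETs | RX → CTX1 vs CTX2 → RX | PFS | 2.0 |
| Kortmann (2000) | RCT | 137 | Unclear | MB | RX → CTX1 vs CTX2 → RX | PFS | 1.7 |
| Reddy (2000) | CS | 22 | Retrosp. | S-PNETs | RX → CTX | PFS | - |
| Bailey (1995) | RCT | 364 | ITT | MB | RX → CTX1 vs CTX2 → RX → CTX1 | PFS | 1.0 |
RX, radiotherapy; CTX, chemotherapy (CTX1 and CTX2 denote different regimens); →, followed by; CS, case series; ITT, intention-to-treat; Retrosp., retrospective.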
Evidence of Relative Treatment Efficacy
The majority of trials, in both MB and S-PNET, have focussed on the potential usefulness of neoadjuvant chemotherapy (i.e. postoperative chemotherapy given before radiotherapy), and some of these are summarised in Table 1. The most important study in this area has been a Children’s Cancer Group RCT in all childhood PNETs. This compared a ‘standard’ regimen of radiotherapy followed by 8 cycles of a 3-drug chemotherapy treatment with an ‘experimental’ regimen in which two more cycles of chemotherapy were given before radiotherapy and a different (8-in-1) chemotherapy was used. The analysis of S-PNETs, which focussed on the 44 patients (out of 55) whose diagnosis was confirmed at central review, was published in 1995.7 Cohen et al 7 reported a 3-year progression-free survival (PFS) of 35% for the control arm and 52% for the experimental arm, with 5 and 11 patients, respectively, still in follow-up. However, the survival curves during the first 2 years were very similar. This suggests that the experimental arm may not be as promising as the 3-year PFS data alone would indicate, and we take this into account when estimating the HR for this trial.
It was also noted, in the larger group of 203 patients with MB included in this trial,8 that the 5-year PFS of 63% for the control arm was better than the 45% for the experimental arm, implying an adverse HR = 1.7. This difference was attributed to the experimental chemotherapy being less intense in two crucial drugs.
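The text does not say how HRs were derived from the reported survival percentages. One standard approximation, assuming proportional hazards so that S_exp(t) = S_ctrl(t)^HR, is HR = ln S_exp(t) / ln S_ctrl(t). The sketch below (function name hypothetical) applies it to the 5-year PFS figures just quoted.

```python
import math


def hr_from_survival(s_experimental: float, s_control: float) -> float:
    """Approximate HR from survival proportions at a common time point.

    Assumes proportional hazards, so S_exp(t) = S_ctrl(t) ** HR.
    """
    for s in (s_experimental, s_control):
        if not 0.0 < s < 1.0:
            raise ValueError("survival proportions must lie strictly between 0 and 1")
    return math.log(s_experimental) / math.log(s_control)


# 5-year PFS in the MB group: 45% experimental vs 63% control.
print(round(hr_from_survival(0.45, 0.63), 2))
```

The same calculation applied to the 3-year PFS figures of Kortmann et al (65% versus 78%) also gives an HR of about 1.7.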
Another report, of 63 children with S-PNET enrolled over a 10-year period in two consecutive single-arm protocols using different combinations of radiation and chemotherapy,1 gave a 3-year PFS of 39% and a 3-year OS of 48%. Thirty-two of these children had been included in a randomised trial comparing standard post-radiotherapy chemotherapy with experimental pre-radiotherapy chemotherapy (2 cycles). A worse PFS was observed in children assigned to pre-irradiation chemotherapy (3-year PFS 36%) than in those assigned to the standard arm (3-year PFS 63%), suggesting HR = 2.0. However, no (maintenance) chemotherapy was given after radiotherapy in the experimental group. In their report, Timmermann et al1 also listed several other, mostly small, single-arm or retrospective studies that reported either survival or response rate information for various radiation and chemotherapy combinations in childhood S-PNET. In particular, Reddy et al 6 reported a 3-year PFS of 47%.
Kortmann et al 9 reported results for 137 randomised patients with MB, which showed a worse PFS in the experimental arm than in the standard arm (3-year PFS of 65% and 78%, respectively, HR = 1.7).
In the SIOP II trial,2 364 children with MB were randomly assigned to receive or not receive a 6-week module of chemotherapy before radiotherapy. All high-risk patients received six cycles of chemotherapy after radiotherapy. In the intention-to-treat analysis of all 364 randomised patients, little difference in 5-year event-free survival (EFS) was seen after a median follow-up of 6.3 years (HR = 1.06).
Constructing a prior distribution
Suppose we planned to design an RCT to compare a new radiation and chemotherapy combination with the control arm used by Cohen et al 7 for the treatment of childhood S-PNET. The trial aims to evaluate whether neoadjuvant chemotherapy, in addition to standard post-radiation chemotherapy, improves OS. Suppose also that the only prior information available is that contained in the papers listed in Table 1.
For the purpose of constructing our prior distribution, we adopt the following criteria. The cancer PS is 1 for studies involving S-PNET and 0.4 for studies involving MB. The treatment PS is 1 for studies in which the only difference between the two arms (if any) is the addition of some chemotherapy in the experimental arm before ‘standard’ radiotherapy followed by chemotherapy; progressively lower treatment PS values are given the further a study’s treatment contrast departs from this ideal. The endpoint PS is 1 for studies reporting OS and 0.8 for studies reporting PFS. The VS is 1 for well-designed RCTs, 0.8 for RCTs with potential flaws and 0.2 for single-arm studies.
Although the article by Timmermann et al1 reports both OS and PFS, and so would allow an AF to be computed, we chose not to do this. We consider the study to be of such low pertinence and validity (for the reasons given below) that any AF derived from it would be very unreliable. No other study provided both OS and PFS information.
As no study reported the "number of events", we approximated it by multiplying the proportion of patients with an event (that is, 1 − OS or 1 − PFS, as appropriate) by the study size. For studies published without sufficient follow-up, for example those reporting soon after the completion of patient accrual, a further reduction factor accounting for follow-up duration was applied. Table 2 presents, for each study, the estimated and adjusted (see below) numbers of events, the VS and PS values, and the HR estimate provided by that study. These scores and estimates were arrived at based on the following considerations.
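A minimal sketch of this approximation, assuming events = study size × (1 − reported survival proportion), optionally scaled by a follow-up reduction factor. The text does not state the reduction factors used, so `followup_factor` is a hypothetical placeholder and the example study is invented.

```python
def estimated_events(n_patients: int, survival: float, followup_factor: float = 1.0) -> float:
    """Approximate the number of events in a study that does not report it.

    survival: reported OS or PFS proportion at the quoted time point.
    followup_factor: hypothetical extra reduction (at most 1) for studies
    reported with short follow-up; the values used in the paper are not given.
    """
    if not 0.0 <= survival <= 1.0:
        raise ValueError("survival must be a proportion")
    if not 0.0 < followup_factor <= 1.0:
        raise ValueError("followup_factor must lie in (0, 1]")
    return n_patients * (1.0 - survival) * followup_factor


# Hypothetical study: 100 patients, 3-year PFS of 60%, no follow-up reduction.
print(estimated_events(100, 0.60))
```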
Table 2 Estimates and scores for each study
| Study | PS: disease | PS: treatment | PS: endpoint | PS: overall | VS | Weight (W) | HR (adjusted) | Events (estimated) | Adjusted number of events |
|---|---|---|---|---|---|---|---|---|---|
| Cohen (1995) | 1 | 0.8 | 0.8 | 0.8 | 0.8 | 0.64 | 0.8 | 23 | 14.7 |
| Zeltzer (1999) | 0.4 | 0.8 | 0.8 | 0.4 | 1 | 0.4 | 1.7 | 90 | 36.0 |
| Timmermann (2002) [non-randomised] | 1 | 0.3 | 1 | 0.3 | 0.2 | 0.06 | 0.9 | 15 * | 0.9 |
| Timmermann (2002) [randomised] | 1 | 0.3 | 0.8 | 0.3 | 0.8 | 0.24 | 2 | 15 | 3.6 |
| Kortmann (2000) | 0.4 | 0.3 | 0.8 | 0.3 | 0.8 | 0.24 | 1.7 | 30 | 7.2 |
| Reddy (2000) | 1 | 0.6 | 0.8 | 0.6 | 0.2 | 0.12 | 0.9 | 13 | 1.6 |
| Bailey (1995) | 0.4 | 1 | 0.8 | 0.4 | 1 | 0.40 | 1.0 | 120 | 48.0 |
*Calculated by finding the estimated number of events for the whole Timmermann (2002) study (30 events), then subtracting the 15 events already included in the RCT subset.
The first study considered, by Cohen et al,7 was given a VS of 0.8 because results were presented for only the 44 patients with central pathology review, out of 55 randomised patients with S-PNET. The overall PS was 0.8, obtained by taking the minimum of the cancer PS (same cancer: score = 1), the endpoint PS (PFS: score = 0.8) and the treatment PS (0.8). The last score reflects the fact that, although the study design fitted the question of interest well (standard: radiotherapy followed by chemotherapy versus experimental: chemotherapy followed by radiotherapy, followed by chemotherapy), different chemotherapy regimens were used in the two arms. Hence, the estimated 23 events were down-weighted by a factor of W = PS × VS = 0.8 × 0.8 = 0.64, giving an adjusted number of 23 × 0.64 = 14.7 events. The estimate of the (mortality) HR provided by this study was set at 0.8. Although a difference in PFS was seen at 3 years (35% standard, 52% experimental), which might suggest an HR of 0.63, this difference was based on few events, and the PFS curves during Years 1 and 2 were similar.
Similar considerations on pertinence apply to the larger CCG trial on children with MB,8 with the only exception being in cancer pertinence (cancer in a closely related site, with the same histology but genetic, biological and prognostic differences; hence cancer PS = 0.4). A VS of 1 was given as this was a properly conducted RCT. The overall weight was thus found to be W = 0.4, and the estimated number of 90 events was correspondingly downsized to 36.0. A clear indication of an increased mortality in the experimental group was seen (estimated HR = 1.7).
The German studies on children with PNET reported the results for children with S-PNET and those with MB separately. However, the report on 63 patients with S-PNET,1 in which 3 groups of patients receiving different treatments were inextricably mixed in a case-series report, was considered of little pertinence and validity (W = 0.06). As this was not a controlled trial, the HR was computed by comparing the reported 3-year PFS of 39% with that of the control arm of the Cohen study (35%),7 giving HR = 0.9. We could not use the 3-year OS reported by Timmermann et al1 because the corresponding OS for the control arm was not available from the other studies.
Higher validity and relevance were attributed to the results reported in the subset of 32 patients enrolled in the RCT (W = 0.24, HR = 2.0). However, the pertinence to the primary trial question of both this subset and the larger group of MB patients9 (W = 0.24, HR = 1.7) is greatly undermined. This is because their treatment contrast (chemotherapy before radiotherapy only versus after radiotherapy only) cannot isolate the role of pre-radiotherapy chemotherapy if post-radiotherapy chemotherapy itself affects survival and PFS.
The report by Reddy et al6 on 22 consecutive patients treated with a ‘standard’ regimen indicates a 3-year PFS of 47%, somewhat better than that observed in ‘other control groups’, and yet lower than that reported by Cohen et al7 in the experimental arm. This provides some, though weak, support for the study hypothesis (W = 0.12, approximated HR = 0.9).
Finally, the study by Bailey et al2 was appropriately designed, conducted and analysed (VS = 1), and the treatment contrast was ideal for the question of interest (treatment PS = 1). However, it was conducted in patients with MB (cancer PS = 0.4) and only EFS was presented (endpoint PS = 0.8). The overall weight is therefore W = 0.4, giving an adjusted number of events of 48.0, considerably fewer than the 120 events originally estimated.
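The scoring rule applied in the paragraphs above (overall PS = smallest of the disease, treatment and endpoint PS; W = overall PS × VS; adjusted events = W × estimated events) can be checked against Table 2 with a short script. The values are copied from Table 2; the code is only an illustrative check, not the authors’ software.

```python
# (study, disease PS, treatment PS, endpoint PS, VS, estimated events), from Table 2
ROWS = [
    ("Cohen (1995)", 1.0, 0.8, 0.8, 0.8, 23),
    ("Zeltzer (1999)", 0.4, 0.8, 0.8, 1.0, 90),
    ("Timmermann (2002) non-randomised", 1.0, 0.3, 1.0, 0.2, 15),
    ("Timmermann (2002) randomised", 1.0, 0.3, 0.8, 0.8, 15),
    ("Kortmann (2000)", 0.4, 0.3, 0.8, 0.8, 30),
    ("Reddy (2000)", 1.0, 0.6, 0.8, 0.2, 13),
    ("Bailey (1995)", 0.4, 1.0, 0.8, 1.0, 120),
]


def weight(disease_ps: float, treatment_ps: float, endpoint_ps: float, vs: float) -> float:
    """W = overall PS x VS, where the overall PS is the smallest component score."""
    return min(disease_ps, treatment_ps, endpoint_ps) * vs


adjusted = {}
for name, d_ps, t_ps, e_ps, vs, events in ROWS:
    w = weight(d_ps, t_ps, e_ps, vs)
    adjusted[name] = w * events
    print(f"{name:35s} W = {w:.2f}  adjusted events = {adjusted[name]:.1f}")
```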
Combining all the 7 HR estimates summarised in Table 2, we have as our overall estimate
$$\text{LHR}_{\text{Prior}} = \frac{14.7\log(0.8) + 36.0\log(1.7) + 0.9\log(0.9) + 3.6\log(2.0) + 7.2\log(1.7) + 1.6\log(0.9) + 48.0\log(1.0)}{14.7 + 36.0 + 0.9 + 3.6 + 7.2 + 1.6 + 48.0} = 0.20.$$
The prior mean, $\mu_{\text{Prior}}$, is therefore 0.20, which corresponds to $\text{HR}_{\text{Prior}} = 1.22$. The prior standard deviation is $\sigma_{\text{Prior}} = \sqrt{4/m_{\text{Prior}}} = 0.19$, where $m_{\text{Prior}} = 14.7 + 36.0 + 0.9 + 3.6 + 7.2 + 1.6 + 48.0 = 112.0$.
Our prior distribution for LHR is thus a Normal distribution with mean 0.20 and standard deviation 0.19, suggesting that the evidence at this stage reflects a preference for the standard regimen.
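The arithmetic for the prior can be reproduced as follows, assuming natural logarithms (consistent with exp(0.20) ≈ 1.22). The final line also computes the prior probability that the experimental arm is better (true HR < 1); that quantity is not reported in the text and is included only to illustrate the stated preference for the standard regimen.

```python
import math

# (adjusted number of events, HR) for each prior study, copied from Table 2
PRIOR_STUDIES = [
    (14.7, 0.8),   # Cohen (1995)
    (36.0, 1.7),   # Zeltzer (1999)
    (0.9, 0.9),    # Timmermann (2002), non-randomised
    (3.6, 2.0),    # Timmermann (2002), randomised
    (7.2, 1.7),    # Kortmann (2000)
    (1.6, 0.9),    # Reddy (2000)
    (48.0, 1.0),   # Bailey (1995)
]

m_prior = sum(m for m, _ in PRIOR_STUDIES)
mu_prior = sum(m * math.log(hr) for m, hr in PRIOR_STUDIES) / m_prior  # natural log
sigma_prior = math.sqrt(4.0 / m_prior)
# Prior probability that the experimental arm is better (LHR < 0).
p_better = 0.5 * (1.0 + math.erf((0.0 - mu_prior) / (sigma_prior * math.sqrt(2.0))))

print(f"m_prior = {m_prior:.1f}, mu_prior = {mu_prior:.2f} (HR {math.exp(mu_prior):.2f}), "
      f"sigma_prior = {sigma_prior:.2f}, P(HR < 1) = {p_better:.2f}")
```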
Designing the new trial
Having constructed the prior distribution, we now use it to design the new trial. Suppose the clinical investigators indicated that, within their time horizon, they expect to recruit sufficient patients to observe 50 deaths over three years. We then consider three possible scenarios for the hypothetical outcome of the trial once conducted. In the Enthusiastic scenario (E), the experimental treatment is better than the control (observed HR = 0.5, LHR = -0.69); in the Neutral scenario (N), it is equivalent (HR = 1, LHR = 0); and in the Sceptical scenario (S), it is worse (HR = 2.0, LHR = 0.69).
For each of these scenarios, we update the prior with the (hypothetical) data to give a posterior distribution. As the posterior distributions are all Normal with known mean and standard deviation, we can plot them and compute the probability that the LHR falls in any region of interest. Table 3 summarises the scenarios and gives the corresponding posterior means and standard deviations, along with selected probabilities of falling in various regions of interest. Figure 1 plots the prior distribution and the posterior distributions obtained from the three scenarios.
Table 3 Summary of posterior distributions corresponding to three scenarios
| Scenario | Hypothetical data: HR | Hypothetical data: log HR | Posterior log HR: mean (SD) | Probability (%) that true HR < 1 | Probability (%) that true HR < 0.8 |
|---|---|---|---|---|---|
| Enthusiastic (E) | 0.5 | -0.69 | -0.08 (0.16) | 69 | 19 |
| Neutral (N) | 1.0 | 0.00 | 0.13 (0.16) | 21 | 1 |
| Sceptical (S) | 2.0 | 0.69 | 0.35 (0.16) | 1 | 0.02 |
For Scenario E (Table 3), the hypothetical trial data, once combined with the prior, do indeed show an advantage for the experimental treatment. We would be able to conclude at the end of the trial that there is a 69% probability that the experimental treatment is better (true HR < 1, LHR < 0). There is also a 19% probability that the experimental treatment is sufficiently better (true HR < 0.8, LHR < -0.22) to compensate for the anticipated extra toxicity implied by the equivalence zone. In Figure 1, the posterior distribution lies "to the left" of the prior, reflecting how the hypothetical data have shifted the probability distribution of the true HR in favour of the experimental treatment.
With Scenario N, the conclusions from the trial favour the control arm, with only a 21% probability that the true HR < 1. This reflects the information in the prior distribution, which suggests an advantage for the control arm. For Scenario S, we would conclude that there is only a 0.02% probability that the experimental arm is sufficiently better than the control, and about a 1% probability that it is better at all. In this case, the posterior distribution falls almost entirely to the right of LHR = 0 (true HR = 1).
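The scenario calculations behind Table 3 can be sketched as below, using the update formula from the Methods. It assumes the unrounded pooled prior mean of 0.1953 (reported as 0.20) and 112.0 prior events. Small differences from Table 3, of up to about two percentage points, can arise from rounding of the posterior means and SDs.

```python
import math

MU_PRIOR = 0.1953   # pooled prior log HR from Table 2 (reported as 0.20)
M_PRIOR = 112.0     # total adjusted prior events
M_DATA = 50         # deaths the new trial is expected to observe
HR_USEFUL = 0.8     # lower edge of the equivalence zone used in Table 3


def p_below(hr: float, mu: float, sigma: float) -> float:
    """P(true HR < hr) when the log HR is Normal(mu, sigma)."""
    return 0.5 * (1.0 + math.erf((math.log(hr) - mu) / (sigma * math.sqrt(2.0))))


m_post = M_PRIOR + M_DATA
sigma_post = math.sqrt(4.0 / m_post)
posterior = {}
for name, hr_data in [("Enthusiastic", 0.5), ("Neutral", 1.0), ("Sceptical", 2.0)]:
    mu_post = (M_PRIOR * MU_PRIOR + M_DATA * math.log(hr_data)) / m_post
    posterior[name] = (mu_post, p_below(1.0, mu_post, sigma_post),
                       p_below(HR_USEFUL, mu_post, sigma_post))
    print(f"{name:12s} mean {mu_post:+.2f} (SD {sigma_post:.2f})  "
          f"P(HR<1) = {100 * posterior[name][1]:.1f}%  "
          f"P(HR<0.8) = {100 * posterior[name][2]:.3f}%")
```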
Discussion
We have illustrated how a Bayesian approach could be used to design a potential RCT for the treatment of childhood supratentorial primitive neuroectodermal tumours. This suggests that, even under an enthusiastic view of the efficacy of the experimental treatment, the posterior probability that the new treatment is usefully better (true HR < 0.8) is only 19%. However, this is based on an ‘equivalence’ zone which assumes that the new therapy is more toxic, or perhaps more costly, than the control. If instead the toxicity and cost profiles of the two treatments were likely to be the same, there would be no equivalence zone. In these circumstances, we would be able to conclude at the end of the trial that there is a 69% probability that the experimental treatment is better. Thus the investigators, by summarising the information from relevant prior studies and discussing posterior distributions from a wide range of scenarios, may judge that worthwhile conclusions can be drawn, even from a small trial.
We also note that, in deciding on the pertinence and validity scores for the prior studies, we have not restricted ourselves to the categories and scores given in Tables 1 and 2 of Tan et al.5 The categories given in those tables are intended only as a general guide; in practice, investigators should define their own categories and scores as necessary.
Our example is only illustrative and does not summarise an exhaustive search of the literature for studies in children with S-PNET. In practice, such a search should be conducted to identify all relevant previous published and unpublished studies. Once identified, they should be formally assessed using a written protocol to ascertain their relationship to the proposed new therapy to be tested.
More generally, although we have focused on paediatric tumours, the approach clearly extends to other rare cancers and diseases. In such situations, we agree with others that small randomised trials are the "only way that any unbiased measurements of effectiveness can be made".10 Our next step is to use the approach to design and conduct an actual clinical trial involving a rare cancer. Doing that will put us in a better position to ascertain the practical usefulness of the method proposed.
References
1. Timmermann B, Kortmann RD, Kühl J, et al. Role of radiotherapy in the treatment of supratentorial primitive neuroectodermal tumors in childhood: results of the prospective German brain tumor trials HIT 88/89 and 91. J Clin Oncol 2002; 20: 842-9.
2. Bailey CC, Gnekow A, Wellek S, et al. Prospective randomised trial of chemotherapy given before radiotherapy in childhood medulloblastoma. International Society of Paediatric Oncology (SIOP) and the (German) Society of Paediatric Oncology (GPO): SIOP II. Med Pediatr Oncol 1995; 25: 166-78.
3. Honkanen VEA, Siegel AF, Szalai JP, Berger V, Feldman BM, Siegel JN. A three-stage clinical trial design for rare disorders. Stat Med 2001; 20: 3009-21.
4. Lilford RJ, Thornton JG, Braunholtz D. Clinical trials and rare diseases: a way out of the conundrum. BMJ 1995; 311: 1621-5.
5. Tan SB, Dear KBG, Bruzzi P, Machin D. Towards a strategy for randomised clinical trials in rare cancers.
6. Reddy AT, Janss AJ, Philips PC, Weiss HL, Packer RJ. Outcome for children with supratentorial primitive neuroectodermal tumors treated with surgery, radiation and chemotherapy. Cancer 2000; 88: 2189-93.
7. Cohen ME, Zeltzer PM, Boyett JM, et al. Prognostic factors and treatment results for supratentorial primitive neuroectodermal tumors in children using radiation and chemotherapy: a children’s cancer group randomized trial. J Clin Oncol 1995; 13: 1687-96.
8. Zeltzer PM, Moilanen B, Yu JS, Black KL. Immunotherapy of malignant brain tumors in children and adults: from theoretical principles to clinical application. Childs Nerv Syst 1999; 15: 514-28.
9. Kortmann RD, Kuhl J, Timmermann B, et al. Postoperative neoadjuvant chemotherapy before radiotherapy as compared to immediate radiotherapy followed by maintenance chemotherapy in the treatment of medulloblastoma in childhood: results of the German prospective randomized trial HIT '91. Int J Radiat Oncol Biol Phys 2000; 46: 269-79.
10. Lilford R, Stevens AJ. Underpowered studies. Br J Surg 2002; 89: 129-31.
Fig 1 Prior, likelihood and posterior distributions corresponding to Enthusiastic (E), Neutral (N) and Sceptical (S) scenarios