Evaluating how clear the questions being investigated in randomised trials are: systematic review of estimands

Abstract
Objectives To evaluate how often the precise research question being addressed about an intervention (the estimand) is stated or can be determined from reported methods, and to identify what types of questions are being investigated in phase 2-4 randomised trials.
Design Systematic review of the clarity of research questions being investigated in randomised trials in 2020 in six leading general medical journals.
Data source PubMed search in February 2021.
Eligibility criteria for selecting studies Phase 2-4 randomised trials, with no restrictions on medical conditions or interventions. Cluster randomised, crossover, non-inferiority, and equivalence trials were excluded.
Main outcome measures Number of trials that stated the precise primary question being addressed about an intervention (ie, the primary estimand), or for which the primary estimand could be determined unambiguously from the reported methods using statistical knowledge. Strategies used to handle post-randomisation events that affect the interpretation or existence of patient outcomes, such as intervention discontinuations or uses of additional drug treatments (known as intercurrent events), and the corresponding types of questions being investigated.
Results 255 eligible randomised trials were identified. No trials clearly stated all the attributes of the estimand. In 117 (46%) of 255 trials, the primary estimand could be determined from the reported methods. Intercurrent events were reported in 242 (95%) of 255 trials, but the handling of these could only be determined in 125 (49%) of 255 trials. Most trials that provided this information considered the occurrence of intercurrent events as irrelevant in the calculation of the treatment effect and assessed the effect of the intervention regardless (96/125, 77%); that is, they used a treatment policy strategy. Four (4%) of 99 trials with treatment non-adherence owing to adverse events estimated the treatment effect in a hypothetical setting (ie, the effect as if participants continued treatment despite adverse events), and 19 (79%) of 24 trials where some patients died estimated the treatment effect in a hypothetical setting (ie, the effect as if participants did not die).
Conclusions The precise research question being investigated in most trials is unclear, mainly because of a lack of clarity on the approach to handling intercurrent events. Clear reporting of estimands is necessary in trial reports so that all stakeholders, including clinicians, patients and policy makers, can make fully informed decisions about medical interventions.
Systematic review registration PROSPERO CRD42021238053.


Appendix 2: Classification of Intercurrent events
Intercurrent events were classified as: (1) treatment non-adherence/discontinuation where no reason was specified, (2) treatment non-adherence/discontinuation due to an adverse event, (3) treatment non-adherence/discontinuation due to a specified reason excluding adverse event, (4) use of additional non-trial treatment that is not part of usual care (i.e. rescue/prohibited therapy), (5) treatment switching (to another randomised treatment), (6) mortality, (7) other terminal events excluding mortality, when measurement becomes impossible and the event is not part of the outcome (e.g. ankle amputation in a trial assessing ankle function), and (8) other. These categories were selected because they are events referenced within the ICH-E9 R1 addendum [1].

[Flow diagram of trial selection; box contents as extracted, grouping of the exclusion boxes not recoverable]
Eligible and included in review based on full text assessment: n=255.
Exclusions as listed: Not randomised study/more than one study n=252; Commentary/letter n=85; Cluster RCT n=40; Secondary analysis n=37; Non-inferiority/equivalence n=35; Phase 1/Pilot n=13; Interim analysis n=14; Cost effectiveness n=3; Crossover n=1; Not randomised study/more than one study n=3; Interim analysis n=4; Cluster RCT n=1; Secondary analysis n=6; Non-inferiority n=1; Phase 1/Pilot/Feasibility n=3; Interim analysis n=12; Cost effectiveness n=3.

Participants with other post-randomisation event excluded, but unclear whether target population was all patients (under hypothetical scenario) or subset: 12/82 (15%)

Population-level summary measure
Inferred from type of analysis model: 77/224 (34%)
Stated type of summary measure they would estimate: 147/224 (66%)
Not inferable (N=31)
Analysis strategy not clearly described: 16/31 (52%)
Statistical test only: 15/31 (48%)

a Statements included: (i) "Target population consists of persons with type 1 diabetes that meets the inclusion/exclusion criteria", (ii) "The 'trial product' estimand evaluates the treatment effect (difference in change of A1C from baseline to week 26) between once-weekly insulin icodec and once-daily IGlar U100 for all randomised participants", (iii) "The primary objective of the trial was to assess the benefit of maintenance avelumab therapy over control therapy in prolonging overall survival among all the patients who had undergone randomisation (overall population)", and (iv) "The primary estimand of interest is called the efficacy estimand and is the effect of the randomised treatments in all subjects assuming continuation of randomised treatments for the duration of the study regardless of actual compliance".
b Statement: "a treatment policy/de-facto estimand approach was applied".
c Statement: "Non-adherence to study drug schedule" and "Permanent discontinuation of study drug" and "use of prohibited medication" were "Ignored" and "ESKD diagnosis or treatment" are "imputed under MNAR assumption".
d Statement: "the trial-product estimand, defined as the between-group difference in the change in glycated hemoglobin level from baseline to week 26 among all patients who underwent randomisation, had all patients continued to receive the trial product without receiving ancillary therapies."
e Statement: "The primary estimand of interest is called the efficacy estimand and is the effect of the randomised treatments in all subjects assuming continuation of randomised treatments for the duration of the study regardless of actual compliance".
f n=4 specified a per-protocol analysis but did not further indicate what this meant regarding treatment deviations and n=1 specified an ITT analysis with LOCF for post treatment deviation data, unclear exactly what treatment condition of interest (treatment policy or

Handling of intercurrent event statements

Non-adherence - no reason (N=5)
"The primary estimand of interest was the efficacy estimand, which assumed continuation of randomised treatments for the trial duration, regardless of actual compliance."
"the trial-product estimand, defined as the between-group difference in the change in glycated hemoglobin level from baseline to week 26 among all patients who underwent randomisation, had all patients continued to receive the trial product without receiving ancillary therapies."
"The composite strategy, which was applied to categorical efficacy and health outcome variables, indicated that any intercurrent events-eg, discontinuing treatment or switching to open-label treatment-were to be assigned unfavourable values (ie, non-responder). Thus, patients were considered non-responders at timepoints when they did not meet the clinical response criteria or when they had missing clinical response data."
"a treatment policy/de-facto estimand approach was applied for the primary analysis that included all observed data irrespective of subject adherence to the randomised treatment. Patients who withdrew from their randomised treatment before week 24 were followed up"
[Group A event includes Non-adherence to study drug schedule/permanent discontinuation of study drug] "Group A IC events were considered as directly interpretable. Effectively, IC events in this group are ignored, in agreement with the intention-to-treat (ITT) principle"

Non-adherence - due to adverse event (N=5)
"The primary estimand of interest was the efficacy estimand, which assumed continuation of randomised treatments for the trial duration, regardless of actual compliance."
"the trial-product estimand, defined as the between-group difference in the change in glycated hemoglobin level from baseline to week 26 among all patients who underwent randomisation, had all patients continued to receive the trial product without receiving ancillary therapies."
"The composite strategy, which was applied to categorical efficacy and health outcome variables, indicated that any intercurrent events-eg, discontinuing treatment or switching to open-label treatment-were to be assigned unfavourable values (ie, non-responder). Thus, patients were considered non-responders at timepoints when they did not meet the clinical response criteria or when they had missing clinical response data."
"a treatment policy/de-facto estimand approach was applied for the primary analysis that included all observed data irrespective of subject adherence to the randomised treatment. Patients who withdrew from their randomised treatment before week 24 were followed up"
[Group A event includes Non-adherence to study drug schedule/permanent discontinuation of study drug] "Group A IC events were considered as directly interpretable. Effectively, IC events in this group are ignored, in agreement with the intention-to-treat (ITT) principle"

Non-adherence - due to other reasons (N=3)
"The primary estimand of interest was the efficacy estimand, which assumed continuation of randomised treatments for the trial duration, regardless of actual compliance."
"the trial-product estimand, defined as the between-group difference in the change in glycated hemoglobin level from baseline to week 26 among all patients who underwent randomisation, had all patients continued to receive the trial product without receiving ancillary therapies."
"The composite strategy, which was applied to categorical efficacy and health outcome variables, indicated that any intercurrent events-eg, discontinuing treatment or switching to open-label treatment-were to be assigned unfavourable values (ie, non-responder). Thus, patients were considered non-responders at timepoints when they did not meet the clinical response criteria or when they had missing clinical response data."
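
The composite strategy quoted above can be illustrated with a minimal sketch. It is not taken from any reviewed trial: the function names, the tuple layout, and the four example patients are all hypothetical, and the rule encoded is only the one stated in the quotation (an intercurrent event or a missing response is counted as non-response).

```python
def composite_response(met_criteria, had_intercurrent_event):
    """Classify one patient under a composite (non-responder) rule.

    met_criteria: True, False, or None when the response is missing.
    had_intercurrent_event: True if the patient discontinued or switched.
    """
    if had_intercurrent_event:
        return False  # the event itself is assigned the unfavourable value
    return met_criteria is True  # missing (None) also counts as non-response


def responder_rate(patients):
    """Proportion of responders among all randomised patients in the list."""
    responders = sum(composite_response(m, e) for m, e in patients)
    return responders / len(patients)


# Hypothetical arm: (met clinical response criteria, had intercurrent event)
example_arm = [
    (True, False),   # responder
    (True, True),    # met criteria but switched treatment: non-responder
    (None, False),   # missing response data: non-responder
    (False, False),  # did not meet criteria: non-responder
]
example_rate = responder_rate(example_arm)
```

In this sketch the denominator stays at all randomised patients, so the intercurrent event becomes part of the outcome definition rather than a reason for exclusion.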

Additional non-trial treatment (N=2)
"A treatment policy/de-facto estimand approach was applied for the primary analysis that included all observed data irrespective of subject adherence to the randomised treatment. Patients who withdrew from their randomised treatment before week 24 were followed up through to week 24, and their week 24 data were considered as primary, irrespective of treatment discontinuation, treatment interruptions, or use of rescue medication (WLL or other)."
[Group A event includes use of prohibited medication] "Group A IC events were considered as directly interpretable. Effectively, IC events in this group are ignored, in agreement with the intention-to-treat (ITT) principle"

Treatment switching (N=1)
"The composite strategy, which was applied to categorical efficacy and health outcome variables, indicated that any intercurrent events-eg, discontinuing treatment or switching to open-label treatment-were to be assigned unfavourable values (ie, non-responder). Thus, patients were considered non-responders at timepoints when they did not meet the clinical response criteria or when they had missing clinical response data."
"Patients who discontinued the masked study treatment to which they were originally assigned and switched to open-label ixekizumab Q2W were considered non-responders after switching."

Mortality (N=1)
[Group C event = Terminal event, i.e., death] "Group C IC events were assumed to conform to a hypothetical scenario, in which post-IC iGFR values have a similar distribution to other non-ESKD subjects with similar characteristics and pre-IC iGFR values."

Other intercurrent event 1 (N=3)
[Group B event = ESKD diagnosis or treatment] "Group B IC events were assumed to follow a hypothetical scenario, in which iGFR values after developing ESKD take on biologically plausible values that are not confounded by the IC event, i.e., by ESKD treatments such as dialysis or kidney transplant."
[Other IE = ancillary treatment] "The 'trial product' estimand evaluates the treatment effect (difference in change of A1C from baseline to week 26) between once-weekly insulin icodec and once-daily IGlar U100 for all randomised participants, under the assumption that all participants had adhered to treatment for the entire planned duration of the trial and did not receive ancillary treatment. This is a 'hypothetical' estimand intended to provide an estimation of the achievable treatment effect of insulin icodec without any confounding effect of ancillary treatment for participants that are actually able to take the drug during the intended treatment period."
[Other IE = supplemental oxygen through a nasal cannula during blood gas analysis] "a treatment policy/de-facto estimand approach was applied for the primary analysis that included all observed data irrespective of subject adherence to the randomised treatment. Patients who withdrew from their randomised treatment before week 24 were followed up through to week 24, and their week 24 data were considered as primary, irrespective of treatment discontinuation, treatment interruptions, or use of rescue medication (WLL or other)."
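
As a rough illustration of the treatment policy statements quoted above, the sketch below computes a difference in mean week 24 outcome using every observed value, whether or not an intercurrent event occurred. The records, arm labels, and function name are hypothetical and only serve to show that the intercurrent-event flag plays no part in the calculation; real analyses would typically use a regression model and handle missing data.

```python
from statistics import mean

# Hypothetical records: (arm, week 24 outcome, intercurrent event before week 24)
records = [
    ("active", 5.0, False),
    ("active", 3.0, True),    # discontinued treatment but was followed up
    ("active", 4.0, False),
    ("control", 2.0, False),
    ("control", 1.0, True),   # used rescue medication
    ("control", 3.0, False),
]


def treatment_policy_difference(records):
    """Mean outcome in the active arm minus the control arm, using all data."""
    by_arm = {"active": [], "control": []}
    for arm, outcome, _had_event in records:
        # The intercurrent-event flag is deliberately ignored.
        by_arm[arm].append(outcome)
    return mean(by_arm["active"]) - mean(by_arm["control"])


difference = treatment_policy_difference(records)
```

This only works as intended when outcomes continue to be collected after the intercurrent event, which is why the quoted trial followed patients through to week 24 after they withdrew from treatment.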
eTable 7 - Subgroup analysis by sponsor (academic/not for profit versus pharmaceutical/for profit)

eTable 9 - Analysis where inferring the strategy for handling intercurrent events was difficult
Columns: Analysis; Details to inform inferability

Primary estimand

Analysis: One trial specified that the analysis followed the ITT principle and used a joint model for a continuous outcome (change in the estimated glomerular filtration rate (eGFR) from baseline, modelled using a linear mixed model) and time to trial discontinuation due to death or end stage kidney disease before the end of the 104 week follow-up (Weibull parametric survival model).
Details: We reached consensus that this inferred a hypothetical estimand strategy with respect to death/end stage kidney disease, since the continuous outcome is not collected after the occurrence of death/end stage kidney disease (terminal IEs where the measurement no longer exists) and the resulting treatment effect is estimated conditional on a patient-specific frailty (random effect) that models the correlation between the continuous outcome and the occurrence of death/end stage kidney disease. Following the ITT principle inferred treatment policy with respect to non-terminal intercurrent events.

Analysis: Four trials used an outcome of time to recovery or clinical improvement over a pre-defined follow-up period (28 or 29 days) and, for patients who did not recover and died prior to the end of the follow-up period, right-censored the data at the last follow-up day. Analysis followed the ITT principle.
Details: This is equivalent to setting death to an infinite recovery/improvement time, as clarified in a reference given by one of the three trials: "We note that, with time-to-improvement/recovery models, the competing event of death requires special handling. Patients who die during follow-up should not be censored at time of death, as that assumes their recovery time would be like all who remain alive and unrecovered at that time. To state the obvious, once dead, a patient cannot recover. A death must be set to an infinite recovery time, so that at the end of follow-up, the patient is counted as 'not recovered'. We achieve the same objective by censoring deaths at the last observation day. Therefore, patients censored on the last observation day reflect two different states: death and failure to recover by day 28" [2]. This handles the competing event of death in a similar manner to the Fine and Gray competing risk approach (see Table 4). Therefore we reached consensus that this approach and the Fine and Gray approach (Table 4) inferred a composite strategy with respect to death. Following the ITT principle inferred treatment policy with respect to non-terminal intercurrent events.

Analysis: Two trials specified that the analysis followed the ITT principle and used competing-risk marginal models for recurrent events to handle terminal competing events.
Details: We reached consensus that this inferred a hypothetical strategy with respect to the handling of the specified terminal intercurrent events, since the recurrent event can no longer occur after the occurrence of the terminal competing events and the resulting treatment effect is estimated conditional on a patient-specific frailty that models the correlation between the count outcome and the occurrence of the competing event. This concurs with Krol et al., who explored the use of the joint frailty model with recurrent events when targeting the hypothetical estimand [3]. Following the ITT principle inferred treatment policy with respect to non-terminal intercurrent events.
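
The effect of censoring deaths at the last follow-up day, as described above for the time-to-recovery trials, can be seen in a small worked sketch. Everything here is an assumption made for illustration: the five patients, the 28 day horizon, and the helper names are hypothetical, and the estimator is a plain Kaplan-Meier calculation rather than the models the trials actually fitted.

```python
FOLLOW_UP_DAY = 28


def km_cumulative_recovery(observations, horizon):
    """Kaplan-Meier estimate of the probability of recovery by `horizon`.

    observations: list of (day, recovered); recovered=False means censored.
    """
    unrecovered = 1.0  # running probability of remaining unrecovered
    event_days = sorted({d for d, rec in observations if rec and d <= horizon})
    for day in event_days:
        at_risk = sum(1 for d, _ in observations if d >= day)
        events = sum(1 for d, rec in observations if rec and d == day)
        unrecovered *= 1 - events / at_risk
    return 1 - unrecovered


def to_observations(patients, censor_deaths_at_end):
    """Convert (day, status) records into (day, recovered) pairs."""
    observations = []
    for day, status in patients:
        if status == "recovered":
            observations.append((day, True))
        elif status == "died" and censor_deaths_at_end:
            # Death kept in the risk set to the end: counted as not recovered.
            observations.append((FOLLOW_UP_DAY, False))
        else:
            observations.append((day, False))
    return observations


# Hypothetical patients: (day of recovery, death, or end of follow-up; status)
patients = [(5, "recovered"), (10, "died"), (15, "recovered"),
            (20, "recovered"), (28, "unrecovered")]

deaths_at_end = km_cumulative_recovery(
    to_observations(patients, censor_deaths_at_end=True), FOLLOW_UP_DAY)
deaths_at_death = km_cumulative_recovery(
    to_observations(patients, censor_deaths_at_end=False), FOLLOW_UP_DAY)
```

With deaths censored at day 28 the estimate equals the simple proportion recovered (3 of 5), so the death counts as a failure to recover, which matches the composite reading. Censoring at the day of death removes that patient from the risk set and gives a higher estimate (11/15 in this example), which answers a different question.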

Supplementary estimand

Analysis: Fine and Gray model
Details: We reached consensus that this approach inferred a composite strategy with respect to the handling of deaths as described by [2]. When implemented with an ITT analysis strategy we inferred treatment policy with respect to the non-terminal intercurrent events.

Referenced ICH-E9 R1 only; n=1 said would use "ITT estimand" (primary)/"on-treatment estimand" (supplement) but did not define either of these estimands further; n=1 said "a de-facto estimand approach" will be applied but did not define this estimand; n=1 said would use an "effectiveness estimand" but did not define this estimand.
*Eight trials attempted a population definition but we classified this as not stated as this referred to the analysis population as follows: a) Subjects who are randomised and received at least 1 dose of investigational product (FAS), b) All randomised subjects, c) All randomised subjects (FAS), d) Full analysis set, e) All randomised subjects who received at least one dose (FAS), f) Modified Intent-to-Treat population, g) All randomised subjects who received at least one dose of double-blinded BMN 111 or placebo (Protocol) v FAS (defined as all randomised subjects in the SAP), h) In all subjects.
¥ One trial specified the population-level summary measure attribute as "Population-average treatment effect on eGFR at 4 months after randomization", but we classified this as not stated as information on the estimator was required to infer the actual population-level summary measure.

eTable 13 Estimand attributes stated by the trial authors in protocol/SAP populations:
Population stated by trialists
Individuals with T1D meeting the inclusion/exclusion criteria specified in the Study Protocol. Adults aged 18 years and older in circumstances at a high risk of SARS-CoV-2 infection but without medical conditions that pose additional risk of developing severe disease. [& list of other exclusions.] Population of Patients with heart failure with reduced ejection fraction Healthy adults after 1 or 2 doses Population: Defined through appropriate inclusion/exclusion criteria (see Section 6.1 and 6.2) to reflect the targeted patient population

Treatment conditions stated by trialists
Treatment effect due to the initially randomised treatments as actually taken The initially assigned and dosed investigational product (anifrolumab and placebo) Initiating treatment with semaglutide as compared to placebo If had adhered to treatment and did not receive ancillary (1/wk v 1/daily Iglar) Regardless of treatment adherence (evolocumab vs placebo) The effect of the initially assigned randomised study drug, Test: mRNA-1273. Reference: Placebo [given] receiving the second dose of IP per protocol schedule Empagliflozin 10 mg and placebo regardless of changes of treatment (including discontinuation of trial medication) until completion of the planned treatment phase Regardless of adherence to treatment and subsequent therapies Randomised treatments……assuming continuation of randomised treatments for the duration of the study regardless of actual compliance Complying with receipt of second dose Measurement of intervention effect: Regardless of stopping study treatment or adherence to study treatment.

Intercurrent events handling stated by trialists
"Group A will be considered as directly interpretable. Effectively, IC events in this group are ignored, which is consistent with the ITT principle."
Group B are assumed to follow a hypothetical scenario, in which variable of interest after developing ESRD takes on biologically plausible values that are not confounded by IC event i.e. by ESRD treatment
Group C are assumed to conform to a hypothetical scenario in which post-IC values of the variable of interest (or endpoint) have a similar distribution to other non-ESRD subjects
[Group A events = Non-adherence to study drug schedule, Permanent discontinuation of study drug, use of prohibited medication, missed scheduled visit. Group B = ESRD treatment, Group C = Early discontinuation from the study,

7 54%
*Eight trials attempted a population definition but we classified this as not stated as this referred to the analysis population.
¥ One trial specified the population-level summary measure attribute as "Population-average treatment effect on eGFR at 4 months after randomization", but we classified this as not stated as information on the estimator was required to infer the actual population-level summary measure.