Evaluation of pathological complete response as surrogate endpoint in neoadjuvant randomised clinical trials of early stage breast cancer: systematic review and meta-analysis
BMJ 2021; 375 doi: https://doi.org/10.1136/bmj-2021-066381 (Published 21 December 2021) Cite this as: BMJ 2021;375:e066381
- Fabio Conforti, medical oncologist1,
- Laura Pala, medical oncologist1,
- Isabella Sala, masters student in statistics2,
- Chiara Oriecuia, doctoral student in statistics3,
- Tommaso De Pas, medical oncologist1,
- Claudia Specchia, professor of statistics4,
- Rossella Graffeo, medical oncologist5,
- Eleonora Pagan, postdoctoral fellow in statistics2,
- Paola Queirolo, medical oncologist1,
- Elisabetta Pennacchioli, surgical oncologist1,
- Marco Colleoni, medical oncologist6,
- Giuseppe Viale, professor of pathology7,8,
- Vincenzo Bagnardi, professor of statistics2,
- Richard D Gelber, professor of statistics9
- 1Division of Melanoma, Sarcomas and Rare Tumors, IEO, European Institute of Oncology, IRCCS, Milan, Italy
- 2Department of Statistics and Quantitative Methods, University of Milan-Bicocca, Milan, Italy
- 3Department of Clinical and Experimental Sciences, University of Brescia, Brescia, Italy
- 4Department of Molecular and Translational Medicine, University of Brescia, Brescia, Italy
- 5Breast Unit of Southern Switzerland, Oncology Institute of Southern Switzerland, Bellinzona, Switzerland
- 6Division of Medical Senology, IEO, European Institute of Oncology IRCCS, Milan, Italy
- 7Department of Pathology, IEO, European Institute of Oncology, IRCCS, Milan, Italy
- 8University of Milan, Milan, Italy
- 9Medical School, Harvard T H Chan School of Public Health, and Frontier Science and Technology Research Foundation, Boston, MA, USA
- Correspondence to: F Conforti (or @ieoufficiale and @fabioconforti1 on Twitter)
- Accepted 16 November 2021
Objective To evaluate pathological complete response as a surrogate endpoint for disease-free survival and overall survival in regulatory neoadjuvant trials of early stage breast cancer.
Design Systematic review and meta-analysis.
Data sources Medline, Embase, and Scopus to 1 December 2020.
Eligibility criteria for study selection Randomised clinical trials that tested neoadjuvant chemotherapy given alone or combined with other treatments, including anti-human epidermal growth factor receptor 2 (anti-HER2) drugs, targeted treatments, antivascular agents, bisphosphonates, and immune checkpoint inhibitors.
Data extraction and synthesis Trial level associations between the surrogate endpoint pathological complete response and disease-free survival and overall survival.
Methods A weighted regression analysis was performed on log transformed treatment effect estimates (hazard ratio for disease-free survival and overall survival and relative risk for pathological complete response), and the coefficient of determination (R2) was used to quantify the association. The secondary objective was to explore heterogeneity of results in preplanned subgroup analyses, stratifying trials according to treatment type in the experimental arm, the definition used for pathological complete response (breast and lymph nodes v breast only), and biological features of the disease (HER2 positive or triple negative breast cancer). The surrogate threshold effect was also evaluated, indicating the minimum relative risk for pathological complete response necessary to confidently predict a non-null effect on the hazard ratio for disease-free survival or overall survival.
Results 54 randomised clinical trials comprising a total of 32 611 patients were included in the analysis. A weak association was observed between the log(relative risk) for pathological complete response and log(hazard ratio) for both disease-free survival (R2=0.14, 95% confidence interval 0.00 to 0.29) and overall survival (R2=0.08, 0.00 to 0.22). Similar results were found across all subgroups evaluated, irrespective of the definition used for pathological complete response, treatment type in the experimental arm, and biological features of the disease. The surrogate threshold effect was 5.19 for disease-free survival but was not estimable for overall survival. Consistent results were confirmed in three sensitivity analyses: excluding small trials (<200 patients enrolled), excluding trials with short median follow-up (<24 months), and replacing the relative risk for pathological complete response with the absolute difference in pathological complete response rates between treatment arms.
Conclusion A lack of surrogacy of pathological complete response was identified at trial level for both disease-free survival and overall survival. The findings suggest that pathological complete response should not be used as the primary endpoint in regulatory neoadjuvant trials of early stage breast cancer.
The US Food and Drug Administration (FDA) and European Medicines Agency support the use of pathological complete response as a surrogate endpoint for patients’ long term clinical outcomes (event-free or disease-free survival and overall survival) in neoadjuvant randomised clinical trials of early stage breast cancer, within the accelerated approval process for new drugs; the current FDA table of surrogate endpoints includes pathological complete response for breast cancer.12 This decision addressed the need to expedite drug approvals, giving patients access to effective treatments sooner, more efficiently, and more economically than if regulators waited for the final results of adjuvant or neoadjuvant randomised clinical trials.12
Evidence supporting the regulatory agencies’ decision came mainly from an FDA sponsored meta-analysis of individual patient data from 12 randomised controlled trials.3 This analysis showed a strong correlation between pathological complete response and both disease-free survival and overall survival at patient level, but it failed to show a statistically significant association at trial level.3
Buyse et al proposed that “a good surrogate endpoint must be shown to be causally linked to the true endpoint” and “to capture the whole effect of treatment upon the true endpoint.”45 Operationally this means that a good surrogate endpoint should fulfil the condition of a meaningful association with the true endpoint at both patient and trial level.45 A strong association at patient level indicates that the surrogate and true endpoint are likely causally linked, whereas a strong association at trial level indicates that the surrogate captures a large proportion of the treatment effect on the true endpoint.45 The trial level association between endpoints, however, does not simply follow from the patient level association.45
Several surrogate endpoints in oncology show a statistically significant association with patients’ overall survival at both individual and trial level, such as disease-free survival for human epidermal growth factor receptor 2 (HER2) positive early stage breast cancer, and disease-free survival and progression-free survival for early and advanced stage colorectal cancer, respectively.67 Berry and Hudis suggested that the absence of trial level surrogacy for pathological complete response in the FDA meta-analysis could be explained by the limited number of trials analysed, especially given the small spread of treatment effects across the trials (ie, the narrow range of pathological complete response odds ratios and of disease-free survival and overall survival hazard ratios reported in the 12 trials included in the analysis).8 Furthermore, the power of the surrogacy analysis could be affected by the use of the hazard ratio to measure treatment effects on disease-free survival and overall survival. Although the hazard ratio is the gold standard measure of treatment effect in adjuvant and neoadjuvant trials of breast cancer, it relies on the proportional hazards assumption, which may not hold. To offset the potential loss of power owing to non-proportional hazards, a surrogacy analysis should include many trials.
Berruti et al’s subsequent meta-analysis of aggregate data from 29 randomised controlled trials also failed to show statistically significant trial level surrogacy for pathological complete response.9 This analysis, however, had several limitations, one of the most important being that potential heterogeneity of results according to the biological features of the disease was not evaluated.9 Another important limitation shared by the FDA sponsored analysis and that of Berruti et al is that both included only randomised clinical trials that tested chemotherapy, in most cases old regimens, with the exception of two trials that tested anti-HER2 targeted treatment.39
The FDA guidance on the use of pathological complete response as an endpoint to support accelerated approval highlighted these limitations of the previous analyses and recognised the value of further analyses to overcome them.1 The controversy about the surrogacy value of pathological complete response was also discussed at the St Gallen International Breast Cancer Consensus Conference (Vienna, 2021), where only 40% of panellists supported its use as an appropriate endpoint for defining standard adjuvant or neoadjuvant systemic regimens for early stage breast cancer.
Because a larger number of trials is now available, we performed a meta-analysis of all the randomised clinical trials that tested neoadjuvant treatments for early stage breast cancer, to assess the utility of pathological complete response as a trial level surrogate for patients’ long term outcomes.
In this study we followed the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) and Reporting of Surrogate Endpoint Evaluation using Meta-analyses (ReSEEM) guidelines.10 We systematically searched PubMed, Medline, Embase, and Scopus to 1 December 2020 for all randomised clinical trials that tested neoadjuvant chemotherapy given alone or combined with other treatments. The search terms were “breast cancer”, “neoadjuvant therapy”, “preoperative therapy”, and “pathologic complete response”.
Trials were considered eligible for inclusion if they were randomised clinical trials that tested chemotherapy administered alone or in combination with other treatments in the neoadjuvant setting; reported pathological complete response rates and survival outcomes (ie, a combination of disease-free survival, event-free survival, relapse-free survival, and overall survival) for the different treatment arms; and reported an explicit definition of pathological complete response based on excision histology.
Trials in which additional postsurgical adjuvant treatments were delivered were considered eligible if all participants received the same treatment. We excluded neoadjuvant trials that tested endocrine treatment because of its low associated rate of pathological complete response. Two investigators (FC and LP) independently reviewed the list of retrieved articles for relevance, and two investigators (IS and CO) independently extracted data from the studies, with discrepancies resolved by consensus among all investigators. Data were extracted on study design, number of patients enrolled, type of treatment, pathological complete response rate, definition of pathological complete response, number of disease-free survival and overall survival events, and duration of follow-up. The Cochrane Collaboration’s risk of bias tool was used to determine study methodological quality.11
The primary objective was to assess the trial level association between pathological complete response as the surrogate endpoint and long term outcome in patients (ie, disease-free survival, overall survival, or both). The secondary objective was to explore heterogeneity of results according to the type of treatment in the experimental arm, the definition of pathological complete response (breast and lymph nodes v breast only), and biological features of the disease (HER2 positive and triple negative breast cancer).
We used the classification reported in the original paper to define treatment arms as experimental or control for each trial. In all analyses we used the endpoints for long term outcome as provided by the trial investigators. Because endpoint definitions were not standardised in most neoadjuvant trials, we considered several time-to-event endpoints to be equivalent to disease-free survival: relapse-free survival, event-free survival, and progression-free survival. When studies reported results for more than one time-to-event endpoint, we selected only one for the regression analysis, using the hierarchical order disease-free survival, event-free survival, relapse-free survival, and progression-free survival. The hazard ratio for disease-free survival (or equivalent endpoint) and overall survival between the experimental and control arms was used as the treatment effect estimate for patients’ long term clinical outcome (the true endpoint).
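The endpoint hierarchy described above amounts to a first-match lookup. The following Python sketch is illustrative only: the abbreviations, the dictionary layout, and the function name `select_time_to_event_endpoint` are hypothetical and are not taken from the authors’ analysis code, and the hazard ratios in the example are invented.

```python
# Hierarchy described in the text: DFS > EFS > RFS > PFS
ENDPOINT_PRIORITY = ("DFS", "EFS", "RFS", "PFS")


def select_time_to_event_endpoint(reported):
    """Return (endpoint, hazard_ratio) for the highest ranked endpoint a trial reports.

    `reported` is a hypothetical mapping from endpoint abbreviation to the
    experimental versus control hazard ratio published for that endpoint.
    """
    for endpoint in ENDPOINT_PRIORITY:
        if endpoint in reported:
            return endpoint, reported[endpoint]
    raise ValueError("trial reports none of the eligible time-to-event endpoints")


# A trial reporting both relapse-free and event-free survival contributes its EFS hazard ratio.
print(select_time_to_event_endpoint({"RFS": 0.84, "EFS": 0.79}))  # ('EFS', 0.79)
```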
From each trial we extracted the proportion of patients with a pathological complete response in each treatment arm as the surrogate endpoint for the analysis. The relative risk of pathological complete response between the experimental and control arms was used as the treatment effect estimate on the surrogate outcome. We distinguished four definitions of pathological complete response: ypT0-ypN0 indicates absence of invasive and intraductal disease in breast and nodes; ypT0/is-ypN0 indicates absence of invasive disease in breast and nodes; ypT0-ypN0/+ indicates absence of invasive and intraductal disease in breast, irrespective of nodes; and ypT0/is-ypN0/+ indicates absence of invasive disease in breast, irrespective of nodes. When studies used more than one definition to report pathological complete response rates, we recorded all information and selected the definition for the primary analysis using the hierarchical order ypT0-ypN0, ypT0/is-ypN0 (both applying to breast and lymph nodes), ypT0-ypN0/+, and ypT0/is-ypN0/+ (both applying to breast only).
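A minimal sketch of how the definition hierarchy and the relative risk could be applied to one trial’s extracted counts. The data layout, the function name `pcr_relative_risk`, and the counts in the example are hypothetical; the relative risk is computed as the ratio of the two arms’ pathological complete response proportions, as described above.

```python
# Hierarchy from the text: the first two apply to breast and lymph nodes, the last two to breast only.
PCR_PRIORITY = ("ypT0-ypN0", "ypT0/is-ypN0", "ypT0-ypN0/+", "ypT0/is-ypN0/+")
BREAST_AND_NODES = {"ypT0-ypN0", "ypT0/is-ypN0"}


def pcr_relative_risk(counts):
    """Pick the highest ranked pCR definition and return (definition, scope, relative risk).

    `counts` is a hypothetical mapping from definition to a tuple
    (pcr_experimental, n_experimental, pcr_control, n_control).
    """
    for definition in PCR_PRIORITY:
        if definition in counts:
            pcr_e, n_e, pcr_c, n_c = counts[definition]
            # Ratio of pCR proportions, experimental over control
            relative_risk = (pcr_e / n_e) / (pcr_c / n_c)
            scope = "breast and nodes" if definition in BREAST_AND_NODES else "breast only"
            return definition, scope, relative_risk
    raise ValueError("no recognised pathological complete response definition")


# Invented trial reporting two definitions; the breast and nodes definition takes priority.
trial = {"ypT0/is-ypN0/+": (70, 200, 50, 200), "ypT0/is-ypN0": (50, 200, 40, 200)}
print(pcr_relative_risk(trial))
```

Here the trial is classified under ypT0/is-ypN0 with a relative risk of (50/200)/(40/200) = 1.25.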
Finally, we performed three sensitivity analyses: in the first we excluded small trials enrolling fewer than 200 patients; in the second we excluded trials with a median follow-up shorter than 24 months; and in the third we used the absolute difference in pathological complete response rates between the experimental and control arms (rate in experimental arm minus rate in control arm), instead of the relative risk, as the estimate of treatment effect on the surrogate endpoint.
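The third sensitivity analysis is informative because the two effect measures can order trials differently. In the sketch below (hypothetical function name `pcr_effect_measures`, invented counts), trial A has the larger relative risk while trial B has the larger absolute difference.

```python
def pcr_effect_measures(pcr_e, n_e, pcr_c, n_c):
    """Return (relative risk, absolute difference) of pCR, experimental versus control."""
    rate_e, rate_c = pcr_e / n_e, pcr_c / n_c
    # Relative risk: ratio of rates; absolute difference: rate_e minus rate_c
    return rate_e / rate_c, rate_e - rate_c


# Trial A: 10% v 5% pCR; trial B: 60% v 40% pCR (both hypothetical)
print(pcr_effect_measures(20, 200, 10, 200))
print(pcr_effect_measures(120, 200, 80, 200))
```

Trial A yields a relative risk of 2.0 but an absolute difference of only 0.05, whereas trial B yields a relative risk of 1.5 with an absolute difference of 0.20.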
We used a correlation approach to assess surrogacy, as previously described.4512 To quantify the association between the effect of treatment on the reference endpoints (disease-free survival and overall survival) and the effect of treatment on the surrogate endpoint (pathological complete response), we used a weighted linear regression model. From each trial we extracted treatment effects, expressed as hazard ratios for disease-free survival and overall survival and relative risks for pathological complete response, and entered them into the model on a log scale. Weights were defined as the number of disease-free survival or overall survival events reported in, or derived from, each trial. In addition, as sensitivity analyses, we evaluated two alternative weighting systems, based on the inverse of the variance of the log of the pathological complete response relative risk and on the trial sample size.
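The weighted regression described above can be sketched with the Python standard library. This is not the authors’ implementation (they used the R package Surrogate, as described below): it is a plain weighted least squares fit of log(hazard ratio) on log(relative risk), assuming event counts as weights, and it does not reproduce the confidence intervals. The function name and the trial data are hypothetical.

```python
import math


def weighted_trial_level_fit(rr_pcr, hr, weights):
    """Weighted least squares of log(HR) on log(RR for pCR).

    Returns (intercept, slope, r2), where r2 is the weighted coefficient of determination.
    """
    x = [math.log(v) for v in rr_pcr]
    y = [math.log(v) for v in hr]
    total_w = sum(weights)
    x_bar = sum(w * xi for w, xi in zip(weights, x)) / total_w
    y_bar = sum(w * yi for w, yi in zip(weights, y)) / total_w
    s_xx = sum(w * (xi - x_bar) ** 2 for w, xi in zip(weights, x))
    s_yy = sum(w * (yi - y_bar) ** 2 for w, yi in zip(weights, y))
    s_xy = sum(w * (xi - x_bar) * (yi - y_bar) for w, xi, yi in zip(weights, x, y))
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    # For a weighted simple regression this equals 1 - SSE/SST (squared weighted correlation)
    r2 = s_xy ** 2 / (s_xx * s_yy)
    return intercept, slope, r2


# Hypothetical trials: RR for pCR, DFS hazard ratio, and number of DFS events (the weights)
rr = [1.05, 1.20, 1.10, 1.45, 0.95, 1.30]
hr = [0.97, 0.88, 1.02, 0.80, 1.05, 0.93]
events = [120, 310, 85, 150, 60, 210]
print(weighted_trial_level_fit(rr, hr, events))
```

Swapping `events` for sample sizes or for inverse variances of log(relative risk) gives the two alternative weighting systems used in the sensitivity analyses.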
The coefficient of determination (R2) was used to measure the proportion of variation in the weighted treatment effects explained by the model and to quantify the surrogacy level of pathological complete response. We used the TrialLevelMA function of the R package Surrogate to calculate R2 and associated 95% confidence intervals.13 According to the ReSEEM (Systematic Review and Recommendation for Reporting of Surrogate Endpoint Evaluation using Meta-analyses) guidelines, R2 values ≥0.7 represent strong correlations (and thus suggest surrogacy), values between 0.5 and 0.69 represent moderate correlations, and values <0.5 represent weak correlations.10 The slope of the regression line was also reported as an alternative measure of surrogacy.
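The ReSEEM cut-offs quoted above can be written as a small classification rule. In this sketch (hypothetical function name `reseem_strength`), the R2 values are those reported in the results of this study; the text does not say how values between 0.69 and 0.70 should be labelled, so the sketch treats them as moderate by assumption.

```python
def reseem_strength(r2):
    """Label a trial level R2 using the cut-offs quoted in the text."""
    if not 0.0 <= r2 <= 1.0:
        raise ValueError("R2 must lie in [0, 1]")
    if r2 >= 0.7:
        return "strong"
    if r2 >= 0.5:  # text gives 0.5-0.69; values in (0.69, 0.7) are treated as moderate here
        return "moderate"
    return "weak"


for label, value in [("DFS, all trials", 0.14), ("OS, all trials", 0.08), ("DFS, dose dense", 0.62)]:
    print(label, value, reseem_strength(value))
```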
Leave-one-out cross validation was performed to validate results obtained in the main analysis. Each trial was left out once, and the surrogate model was built with the other trials; this model was then reapplied to the left out trial to predict the effect of treatment on the reference endpoints (disease-free survival or overall survival). The leave-one-out cross validated R2 was calculated as the correlation between the individual predictions made by the model over all trials and the actual treatment effects.
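One reading of the leave-one-out procedure is sketched below. Two details are assumptions not stated in the text: the predicted and observed log hazard ratios are compared with an unweighted Pearson correlation, and that correlation is squared so that it is on the same scale as R2. Function names and data are hypothetical.

```python
def _weighted_fit(x, y, w):
    """Intercept and slope of a weighted least squares line of y on x."""
    sw = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    s_xx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    s_xy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    slope = s_xy / s_xx
    return y_bar - slope * x_bar, slope


def loo_cross_validated_r2(log_rr, log_hr, weights):
    """Leave each trial out, refit on the rest, predict the left-out log(HR), then correlate."""
    n = len(log_rr)
    predictions = []
    for i in range(n):
        keep = [j for j in range(n) if j != i]
        intercept, slope = _weighted_fit(
            [log_rr[j] for j in keep], [log_hr[j] for j in keep], [weights[j] for j in keep]
        )
        predictions.append(intercept + slope * log_rr[i])
    # Unweighted Pearson correlation between predicted and observed effects, squared (assumption)
    p_bar = sum(predictions) / n
    o_bar = sum(log_hr) / n
    cov = sum((p - p_bar) * (o - o_bar) for p, o in zip(predictions, log_hr))
    var_p = sum((p - p_bar) ** 2 for p in predictions)
    var_o = sum((o - o_bar) ** 2 for o in log_hr)
    return cov ** 2 / (var_p * var_o)


# Hypothetical log relative risks, log hazard ratios, and event counts for six trials
log_rr = [0.0, 0.1, 0.2, 0.35, 0.5, 0.15]
log_hr = [0.02, -0.05, -0.01, -0.12, -0.08, 0.03]
events = [120, 310, 85, 150, 60, 210]
print(loo_cross_validated_r2(log_rr, log_hr, events))
```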
To assess homogeneity of slopes across the levels of a given factor, we included the interaction term between log(relative risk) for pathological complete response and that factor in a multivariable meta-regression model and calculated the associated F statistic. Moreover, to adjust the R2 for trial level covariates, we fitted a multivariable weighted linear model that included the trial level covariates as additional explanatory variables. We report the adjusted R2, that is, the square of the partial correlation coefficient obtained from the multivariable model.
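The homogeneity-of-slopes F test and the adjusted (partial) R2 can both be expressed as comparisons of weighted residual sums of squares from nested models. The sketch below assumes a single binary factor coded 0/1 (a multilevel factor such as treatment type would need one dummy column and one interaction column per additional level) and solves the weighted normal equations directly. It returns only the F statistic; a P value would come from the F distribution with 1 and n - 4 degrees of freedom, for example via `scipy.stats.f.sf`. Function names and data are hypothetical.

```python
def wls_sse(design, y, w):
    """Fit weighted least squares via the normal equations; return the weighted SSE."""
    p = len(design[0])
    # Augmented matrix [X'WX | X'Wy]
    aug = [
        [sum(wk * row[i] * row[j] for row, wk in zip(design, w)) for j in range(p)]
        + [sum(wk * row[i] * yk for row, yk, wk in zip(design, y, w))]
        for i in range(p)
    ]
    # Gaussian elimination with partial pivoting
    for col in range(p):
        pivot = max(range(col, p), key=lambda k: abs(aug[k][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, p):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, p + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back substitution for the coefficients
    beta = [0.0] * p
    for i in reversed(range(p)):
        beta[i] = (aug[i][p] - sum(aug[i][j] * beta[j] for j in range(i + 1, p))) / aug[i][i]
    return sum(
        wk * (yk - sum(b * xkj for b, xkj in zip(beta, row))) ** 2
        for row, yk, wk in zip(design, y, w)
    )


def slope_homogeneity_f(log_rr, log_hr, group, w):
    """F statistic for the log(RR) x group interaction (group coded 0/1)."""
    reduced = [[1.0, x, g] for x, g in zip(log_rr, group)]
    full = [[1.0, x, g, x * g] for x, g in zip(log_rr, group)]
    sse_reduced, sse_full = wls_sse(reduced, log_hr, w), wls_sse(full, log_hr, w)
    df_den = len(log_hr) - 4
    return (sse_reduced - sse_full) / (sse_full / df_den)


def partial_r2(log_rr, log_hr, covariate, w):
    """Squared partial correlation of log(RR) with log(HR), adjusting for one covariate."""
    sse_without = wls_sse([[1.0, c] for c in covariate], log_hr, w)
    sse_with = wls_sse([[1.0, c, x] for x, c in zip(log_rr, covariate)], log_hr, w)
    return (sse_without - sse_with) / sse_without


# Hypothetical example: ten trials in two subgroups (coded 0/1), equal weights
log_rr = [0.0, 0.1, 0.2, 0.3, 0.4] * 2
group = [0] * 5 + [1] * 5
noise = [0.01, -0.02, 0.015, -0.005, 0.0, -0.01, 0.02, -0.015, 0.005, 0.0]
same_slope = [-0.5 * x + e for x, e in zip(log_rr, noise)]
diff_slope = [(-0.5 if g == 0 else 0.5) * x + e for x, g, e in zip(log_rr, group, noise)]
w = [100] * 10
print(slope_homogeneity_f(log_rr, same_slope, group, w))  # small F
print(slope_homogeneity_f(log_rr, diff_slope, group, w))  # large F
print(partial_r2(log_rr, same_slope, group, w))
```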
Finally, we calculated the surrogate threshold effect, defined as the minimum relative risk for pathological complete response necessary to predict a statistically significant disease-free survival or overall survival benefit in a future trial. The surrogate threshold effect was located at the intersection of the upper limit of the 95% prediction band with the horizontal line representing a predicted hazard ratio for disease-free survival or overall survival equal to 1 (null effect).14 The 95% prediction band was calculated from the weighted regression model used to derive the coefficient of determination R2 and was based on the weight assigned to the hazard ratio of a future trial. Because the regression model in the main analyses was weighted by the number of events, when calculating the prediction band, and hence the surrogate threshold effect, we assumed that a future trial would have an expected number of events equal to the average number of events observed in the trials included in the model. As a sensitivity analysis, the surrogate threshold effect was computed for different scenarios, varying the expected number of events in a future trial.
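A sketch of the surrogate threshold effect under the assumptions described above: trials are weighted by their number of events, so a trial with w events is assumed to have residual variance s2/w, and a future trial is given its own weight `new_weight` (for example, the average number of events). The upper 95% prediction limit for log(hazard ratio) is then searched for its crossing of 0 by bisection. The critical value `t_crit` must be supplied by the caller (for instance from `scipy.stats.t.ppf(0.975, n - 2)`), and the search range is capped at a hypothetical `rr_max`; when the limit never falls below 0, the function returns None, mirroring the "not estimable" results. Names and data are hypothetical, and this is not the authors’ code.

```python
import math


def surrogate_threshold_effect(log_rr, log_hr, weights, new_weight, t_crit, rr_max=100.0):
    """Smallest pCR relative risk whose upper prediction limit for log(HR) lies below 0."""
    n = len(log_rr)
    sw = sum(weights)
    x_bar = sum(w * x for w, x in zip(weights, log_rr)) / sw
    y_bar = sum(w * y for w, y in zip(weights, log_hr)) / sw
    s_xx = sum(w * (x - x_bar) ** 2 for w, x in zip(weights, log_rr))
    s_xy = sum(w * (x - x_bar) * (y - y_bar) for w, x, y in zip(weights, log_rr, log_hr))
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    # Residual variance per unit weight: a trial with weight w has variance s2 / w
    s2 = sum(w * (y - intercept - slope * x) ** 2
             for w, x, y in zip(weights, log_rr, log_hr)) / (n - 2)

    def upper_limit(x0):
        # Prediction variance for a future trial with weight new_weight at log(RR) = x0
        var = s2 * (1.0 / new_weight + 1.0 / sw + (x0 - x_bar) ** 2 / s_xx)
        return intercept + slope * x0 + t_crit * math.sqrt(var)

    lo, hi = 0.0, math.log(rr_max)
    if upper_limit(hi) >= 0.0:
        return None  # not estimable within the searched range
    if upper_limit(lo) < 0.0:
        return 1.0  # band already excludes HR = 1 at RR = 1 (threshold at or below 1)
    # The upper limit is convex in x0, so it crosses zero once between lo and hi
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if upper_limit(mid) < 0.0:
            hi = mid
        else:
            lo = mid
    return math.exp(hi)


log_rr = [math.log(v) for v in (0.8, 1.0, 1.1, 1.3, 1.5, 1.8, 2.2, 2.6)]
strong = [-0.8 * x + e for x, e in zip(log_rr, (0.02, -0.03, 0.01, 0.025, -0.02, 0.015, -0.01, -0.01))]
weak = [0.1, -0.1, 0.05, -0.05, 0.1, -0.1, 0.05, -0.05]
events = [100] * 8
t_975_df6 = 2.447  # 97.5th percentile of the t distribution with n - 2 = 6 df
print(surrogate_threshold_effect(log_rr, strong, events, 100, t_975_df6))
print(surrogate_threshold_effect(log_rr, weak, events, 100, t_975_df6))  # None
```

Rerunning the function with larger values of `new_weight` mirrors the sensitivity analysis in which the expected number of events in a future trial was varied: more events narrow the prediction band and lower the threshold.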
Patient and public involvement
Members of the study group have regular meetings with patient representatives about ongoing scientific projects and activities. During these meetings the project and its objectives were discussed, and we accepted the patients’ suggestions, which focused mainly on making the final version of the paper as clear and non-technical as possible and on disseminating the results widely, given their relevant implications for research and clinical practice.
Characteristics and quality assessment
Overall, 54 randomised clinical trials comprising a total of 32 611 patients were included in the analysis (supplementary fig S1 and table S1; references in table S1 are cited in the full paper only). Seven trials had three arms and three trials had four arms, for a total of 67 comparisons analysed (references 15-82).
The trials tested different regimens or schedules of neoadjuvant chemotherapy. Ten trials evaluated an anthracycline based regimen versus an anthracycline or taxane based chemotherapy, or the two combined (references 15-25, 82). Ten compared a dose dense or intensified chemotherapy regimen, or both, with a standard dose regimen (references 26-36). Six trials tested the addition of capecitabine (references 37-43, 78), three of carboplatin (references 57-60, 75), and two of nab-paclitaxel, two of gemcitabine, and one of vinorelbine (references 61, 62, 71, 77, 78) to a standard anthracycline or taxane based regimen, or the two combined. Twelve trials tested the combination of chemotherapy with anti-HER2 targeted treatment (references 44-53, 68, 69, 74, 80, 83), five with bevacizumab (references 57, 70, 72, 78, 81), two with anti-programmed death 1 (PD1) or anti-PDL1 drugs (references 63, 64), two with everolimus (references 73, 76), and one with zoledronic acid (reference 65).
Forty studies applied a pathological complete response definition to breast and lymph nodes and 13 to breast only. Study specific pathological complete response rates ranged between 2% and 68%, relative risks for pathological complete response ranged between 0.52 and 3.0 (pooled relative risk 1.21, 95% confidence interval 1.15 to 1.27) and the hazard ratios for disease-free survival ranged between 0.26 and 2.61 (pooled hazard ratio 0.91, 95% confidence interval 0.85 to 0.96) and for overall survival ranged between 0.19 and 2.27 (pooled hazard ratio 0.89, 0.84 to 0.94; supplementary table S1).
The endpoint for time to recurrence was disease-free survival in 35 trials, event-free survival in nine trials, relapse-free survival in seven trials, and progression-free survival in three trials (supplementary table S2). The median follow-up across trials was 56 months (range 15.5-120 months).
Randomised treatment allocation sequences were generated in all trials. Six trials were double blinded. Supplementary table S3 lists the quality scores for each trial according to the risk of bias tool. No trial was scored as low quality.
We compared the effects of breast cancer treatment on pathological complete response with those on disease-free survival and overall survival, estimating a regression equation from all the trials included in the analysis. A weak association was found between the log(relative risk) for pathological complete response and the log(hazard ratio) for disease-free survival (R2=0.14, 95% confidence interval 0.00 to 0.29), with a regression slope of −0.27 (fig 1 and table 1). The corresponding association for overall survival was similarly weak (R2=0.08, 0.00 to 0.22), with a regression slope of −0.20 (fig 1 and table 1). After adjustment for trial level covariates such as definition of pathological complete response, type of treatment, and trial size, the R2 values did not materially change (disease-free survival R2=0.11, 0.00 to 0.25; overall survival R2=0.08, 0.00 to 0.22).
The leave-one-out cross validation analysis confirmed that the surrogacy of pathological complete response was weak for both disease-free survival and overall survival: the leave-one-out cross validated R2 was 0.07 for disease-free survival and 0.02 for overall survival. The R2 values obtained in the leave-one-out models ranged from 0.11 to 0.20 for disease-free survival and from 0.06 to 0.12 for overall survival (supplementary fig S2A and B).
Subgroup and sensitivity analyses
The surrogacy of pathological complete response was explored in preplanned analyses stratifying trials according to the type of treatment in the experimental arm, the definition of pathological complete response, and the biological features of the disease. The systemic treatments administered in the experimental arm were classified into six groups (supplementary table S1): anthracycline or taxane based chemotherapy, or both (10 trials; fig 2A and 2D); dose dense or intensified chemotherapy regimens (10 trials; fig 2B and 2E); regimens containing capecitabine (six trials; fig 2G and 2J); chemotherapy in combination with anti-HER2 targeted treatments (12 trials; fig 2C and 2F); chemotherapy in combination with bevacizumab (five trials; fig 2H and 2K); and other treatments (17 trials; fig 2I and 2L). The association between the log(relative risk) for pathological complete response and log(hazard ratio) for both disease-free survival and overall survival was weak in all the treatment subgroups explored (F test for homogeneity of slopes: P=0.03 for disease-free survival and P=0.17 for overall survival; table 1), with two exceptions. In trials testing dose dense or intensified regimens, the association was moderate for disease-free survival (R2=0.62, 95% confidence interval 0.18 to 1.00; fig 2B) but weak for overall survival (R2=0.45, 0.00 to 0.99; fig 2E and table 1). In trials testing regimens containing capecitabine, the association was weak for disease-free survival (R2=0.21, 0.00 to 1.00; fig 2G) but moderate for overall survival (R2=0.64, 0.00 to 1.00; fig 2J and table 1).
Both definitions of pathological complete response showed a weak association with disease-free survival and with overall survival: R2 for disease-free survival and overall survival was, respectively, 0.02 (0.00 to 0.19) and 0.01 (0.00 to 0.12) for pathological complete response applied to breast only (fig S3A and C), and 0.15 (0.00 to 0.34) and 0.10 (0.00 to 0.28) for pathological complete response applied to breast and lymph nodes (fig S3B and D; F test for homogeneity of slopes: P=0.23 for disease-free survival and P=0.35 for overall survival; table 1).
The association between the log(relative risk) for pathological complete response and log(hazard ratio) for both disease-free survival and overall survival was weak in both triple negative and HER2 positive breast cancer: R2 for disease-free survival and overall survival was, respectively, 0.42 (0.05 to 0.79; fig 3A) and 0.17 (0.00 to 0.55; fig 3C) for triple negative breast cancer, and 0.37 (0.05 to 0.69; fig 3B) and <0.01 (0.00 to 0.05; fig 3D) for HER2 positive disease (F test for homogeneity of slopes: P=0.56 for disease-free survival and P=0.33 for overall survival; table 1).
A post hoc analysis was also performed with trials stratified according to the type of time-to-event endpoint used (disease-free survival, event-free survival, or other endpoints, including relapse-free survival and progression-free survival; table 1): the R2 for the association between the log(relative risk) for pathological complete response and log(hazard ratio) was 0.03 (0.00 to 0.15) for disease-free survival (supplementary fig S4A), 0.40 (0.00 to 0.85) for event-free survival (fig S4B), and 0.39 (0.00 to 0.82) for the other endpoints (fig S4C).
Sensitivity analyses were performed excluding small trials (11 trials with a sample size <200 patients; supplementary fig S5A and B and table 1); excluding trials with short median follow-up (five trials with median follow-up <24 months; table 1); using the absolute difference in pathological complete response rates between treatment arms instead of the relative risk for pathological complete response (supplementary fig S6A and B, table S4, and table 1); using the sample size as the weighting system in the regression model instead of the number of disease-free survival and overall survival events (table 1); and using the inverse of the variance of the log of the pathological complete response relative risk as the weighting system in the regression model (table 1). The results of the sensitivity analyses were comparable to those of the main analysis.
Assessment of surrogate threshold effect of pathological complete response
We calculated the surrogate threshold effect, the minimum relative risk for pathological complete response necessary to confidently predict a non-null effect on the hazard ratio for disease-free survival or overall survival in a future randomised trial. In this calculation, a future trial was assumed to have an expected number of events equal to the average number of events observed in the main analysis including all the trials (131 disease-free survival events and 91 overall survival events). Because of the weak association observed between pathological complete response and disease-free survival and overall survival, the surrogate threshold effect was either high or not estimable in the main analysis including all the trials (fig 4A and 4B), as well as in all the subgroups explored (supplementary figs S7-S10 and table 1). The surrogate threshold effect calculated for the absolute difference in pathological complete response rates, instead of the relative risk, was 0.22 for disease-free survival and 0.31 for overall survival (table 1). Supplementary table S5 and figures S11-S14 show the surrogate threshold effect obtained by varying the number of expected events in a future randomised controlled trial. For example, in a trial with 800 disease-free survival events, a relative risk for pathological complete response greater than 1.67 predicts significant gains in disease-free survival, and in a trial with 800 overall survival events, a relative risk greater than 1.51 predicts significant gains in overall survival.
The findings from this meta-analysis do not support the use of pathological complete response as a surrogate endpoint for disease-free survival and overall survival in neoadjuvant trials of early stage breast cancer. We found that the coefficient of determination for the association between pathological complete response and overall survival was 0.08 (95% confidence interval 0.00 to 0.22), indicating that only 8% of the variability among treatment effects on overall survival is explained by the treatment effects on pathological complete response. This coefficient was even lower when estimated using leave-one-out cross validation. Much has been discussed about when a surrogate endpoint can be considered validated, but the consensus is that a candidate surrogate endpoint is valid only if the coefficient of determination (R2) is at least 0.7.4510 Furthermore, our subgroup analysis confirmed that the weak association between pathological complete response and long term clinical outcomes was evident in all the subgroups explored, irrespective of the type of treatment, the definition of pathological complete response, and the biological features of the disease. Finally, the surrogate threshold effect analysis suggested that a statistically significant effect on overall survival could be confidently predicted only if a very high relative risk for pathological complete response was observed.
Several explanations might account for the lack of trial level surrogacy of pathological complete response. One hypothesis is that pathological complete response measures the effect of a treatment only on the primary tumour and not on micrometastatic systemic disease, which is the main target of adjuvant and neoadjuvant treatments. The surrogacy assumption is that primary tumours and micrometastases respond comparably, but the validity of this assumption could depend on the disease itself and on the type of neoadjuvant treatment.8485 In our opinion, the strong association observed between pathological complete response and long term outcomes at patient level in early stage breast cancer, and the excellent prognosis of patients achieving a pathological complete response, do not support such a hypothesis.3
Another potential explanation is that patients who do not achieve a pathological complete response might not be disadvantaged, as shown by those with endocrine responsive breast cancers who derive important survival benefit from endocrine treatments but rarely obtain a pathological complete response.86 In fact, several more granular definitions of pathological response that could capture treatment effects better than pathological complete response have been proposed as surrogate endpoints, such as residual cancer burden in breast cancer or major pathological response (<10% vital tumour cells) in lung cancer and melanoma. Although a strong association between such surrogate endpoints and long term outcomes has been found at patient level, no evidence has been provided yet on their surrogacy value at trial level.8788
Another explanation could be that a surrogate endpoint relying exclusively on a comparison of pathological complete response rates between treatment arms overlooks relevant information from the majority of patients. These patients do not achieve a pathological complete response and might experience a broad spectrum of responses, including primary resistance and disease progression during neoadjuvant treatment. This spectrum of responses might not be equally distributed between treatment arms, and could affect the overall prognosis of the population more than pathological complete response rates do. Fleming et al described this scenario.89 In their account, false negative and false positive conclusions about the clinical efficacy of a new intervention compared with standard treatment can arise when a surrogate endpoint captures the effects of interventions on only one causal pathway of the disease process (ie, a substantial reduction of relapses in patients achieving a pathological complete response). Meanwhile, the interventions also affect other principal causal pathways (ie, the ability of treatments to modify the clinical course of disease, and thus the risk of relapse, independently of achieving a pathological complete response). This could explain results such as those observed in the large GeparTrio trial, which found no difference in pathological complete response rates between treatment arms but reported a survival advantage for the experimental treatments.40 A composite surrogate endpoint might have greater surrogacy value at trial level, if it accounts for differences between arms not only in pathological complete response rates but also in the rates of other types of response, including disease progression.45 In a recent retrospective analysis of 938 women treated in the neoadjuvant I-SPY2 trial, event-free survival worsened significantly with each unit increase in residual cancer burden, regardless of tumour subtype and type of neoadjuvant treatment. Comparing distributions of residual cancer burden, as a continuous measure of response, between treatment arms in randomised controlled trials would probably add information beyond the pathological complete response rate. It would also better capture the effect of treatments on patients’ long term clinical outcomes.83 Furthermore, meta-analytical methods that use multiple surrogate endpoints jointly have recently been proposed, with the potential benefit of reducing the uncertainty around predictions.90
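A continuous measure of response can retain information that a binary one discards. The sketch below compares two hypothetical arms with identical pathological complete response rates but different distributions of residual cancer burden. It treats a residual cancer burden index of 0 as pathological complete response. The patient values are invented. The summary statistic is a Mann-Whitney probability of lower burden, and it is our illustrative choice, not the method used in the I-SPY2 analysis.

```python
# Hypothetical illustration: same pCR rate, different residual cancer burden (RCB).
# RCB index 0 is treated as pCR. All values are invented.
from statistics import mean
from typing import Sequence


def pcr_rate(rcb: Sequence[float]) -> float:
    """Proportion of patients with no residual disease (RCB index 0)."""
    return sum(1 for v in rcb if v == 0) / len(rcb)


def prob_lower_burden(arm_a: Sequence[float], arm_b: Sequence[float]) -> float:
    """Mann-Whitney probability that a random arm A patient has lower RCB
    than a random arm B patient, counting ties as one half."""
    wins = 0.0
    for a in arm_a:
        for b in arm_b:
            if a < b:
                wins += 1.0
            elif a == b:
                wins += 0.5
    return wins / (len(arm_a) * len(arm_b))


arm_a = [0, 0, 0, 0.8, 1.2, 1.5, 2.0, 2.2, 2.5, 3.1]
arm_b = [0, 0, 0, 1.6, 2.4, 2.9, 3.3, 3.8, 4.2, 4.6]

print(pcr_rate(arm_a), pcr_rate(arm_b))   # identical pCR rates (0.3 and 0.3)
print(mean(arm_a), mean(arm_b))           # but lower mean RCB in arm A
print(prob_lower_burden(arm_a, arm_b))    # 0.675 for these numbers
```

A comparison based only on pathological complete response would call these arms equivalent, whereas the full distribution shows a shift towards lower residual disease in arm A.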
All such hypotheses to explain the lack of trial level surrogacy in the presence of strong patient level surrogacy are speculative and remain unproven. Moreover, this discrepancy can arise simply through confounding. Known and unknown prognostic factors that have a similar influence on both the surrogate and the final endpoint create a correlation between them at the individual level, even when the association at trial level is weak.45
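The confounding argument can be made concrete with a toy simulation. All parameters are hypothetical and not calibrated to any trial. A latent prognostic factor raises the chance of pathological complete response and lowers the chance of relapse. Treatment changes only the probability of pathological complete response. Relapse is simplified to a yes/no event at a fixed horizon that depends on the latent factor alone.

```python
# Toy simulation (all parameters hypothetical): a latent prognostic factor z
# confounds pCR and relapse. Treatment shifts only the pCR probability, and
# relapse (a yes/no event at a fixed horizon, a simplification) depends on z alone.
import math
import random
from typing import Iterable, List, Tuple


def logistic(v: float) -> float:
    return 1.0 / (1.0 + math.exp(-v))


def simulate_arm(rng: random.Random, n: int, treated: bool) -> List[Tuple[bool, bool]]:
    """Return (pcr, relapse) pairs for n simulated patients."""
    patients = []
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)  # higher z = better prognosis
        p_pcr = logistic(-1.0 + (1.0 if treated else 0.0) + 1.5 * z)
        p_relapse = logistic(-1.0 - 1.5 * z)  # no treatment or pCR term
        patients.append((rng.random() < p_pcr, rng.random() < p_relapse))
    return patients


def rate(flags: Iterable[bool]) -> float:
    values = list(flags)
    return sum(values) / len(values)


rng = random.Random(2021)
control = simulate_arm(rng, 20_000, treated=False)
experimental = simulate_arm(rng, 20_000, treated=True)
everyone = control + experimental

# Patient level: relapse is much less frequent among patients with pCR.
relapse_pcr = rate(r for p, r in everyone if p)
relapse_no_pcr = rate(r for p, r in everyone if not p)

# Trial level: the arms differ in pCR rate but not in relapse rate.
pcr_gain = rate(p for p, _ in experimental) - rate(p for p, _ in control)
relapse_diff = rate(r for _, r in experimental) - rate(r for _, r in control)

print(f"relapse with pCR {relapse_pcr:.2f}, without pCR {relapse_no_pcr:.2f}")
print(f"pCR gain {pcr_gain:+.2f}, relapse difference {relapse_diff:+.3f}")
```

In this model, patients who achieve pathological complete response relapse far less often than those who do not, yet a treatment that raises the pathological complete response rate leaves the relapse rate essentially unchanged.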
To date, the FDA has approved two drugs in the neoadjuvant setting for breast cancer under the accelerated approval pathway, based on results on surrogate endpoints: pertuzumab for HER2 positive disease and pembrolizumab for triple negative breast cancer. The overall survival follow-up of the KEYNOTE-522 trial, which led to the accelerated approval of pembrolizumab, is too short to draw any conclusions. The pertuzumab experience, however, points to the risk of using pathological complete response as a surrogate endpoint. A large, statistically significant improvement in pathological complete response rate in the neoadjuvant setting contrasts with the lack of evidence of a survival benefit in the Adjuvant Pertuzumab and Herceptin IN Initial TherapY in Breast Cancer (APHINITY) trial. In oncology, many drugs were originally approved on the basis of a substantial improvement in a putative, but not fully validated, surrogate endpoint, and later studies then failed to show evidence of survival benefit. Examples include bevacizumab for breast cancer, olaratumab for sarcoma, and atezolizumab for urothelial carcinoma.9192 These and numerous other examples suggest a fundamental flaw in the use of surrogate endpoints for drug approvals and the need for rigorous evidence of the surrogacy value of an endpoint before it is used.9192 Despite these caveats, the reliance of regulatory agencies on surrogate endpoints for drug approval has increased considerably in recent years.9192
The lack of trial level surrogacy shown here substantially limits the possibility of using pathological complete response for three purposes: to confidently anticipate the results of randomised controlled trials, to predict the long term outcomes of the populations enrolled, and thus to support accelerated drug approval. However, this does not undermine the value of pathological complete response for other purposes, nor the importance of neoadjuvant trials.93 Indeed, given the strong association between pathological complete response and overall survival shown at patient level, pathological complete response is the best biomarker available to predict patients’ residual risk of relapse after neoadjuvant therapy. It is also useful for identifying patients at substantial risk who require escalation of adjuvant therapy, as shown for HER2 positive disease in the KATHERINE trial and for triple negative breast cancer in the Capecitabine for Residual Cancer as Adjuvant Therapy (CREATE-X) trial.9495
Limitations of this study
Our study has several limitations. Our analysis is based on aggregate data from trials rather than on individual patient data (IPD). IPD analyses allow several things:
- checking the plausibility of randomisation sequences
- verifying data integrity and consistency
- fitting bivariate and copula based models, which are among the preferred methods for assessing trial level associations
- adjusting the analyses for baseline prognostic covariates
- accounting for the fact that the treatment effect on the surrogate endpoint within each trial is estimated with error

Nevertheless, the specific aim of our analysis was to assess surrogacy at trial level, and we used only data from high quality randomised clinical trials. This makes it unlikely that an IPD analysis would substantially change our conclusions.969798 We also did not explore potential differences in the surrogacy value of pathological complete response within the subgroups of HER2 positive disease defined by hormone receptor status. Finally, the terminology of time to recurrence endpoints is heterogeneous across trials. However, in many cases, particularly in the earliest trials, the definitions of both disease-free survival and progression-free survival provided by authors in the original papers substantially resembled the FDA definition of event-free survival.150 Our analyses would be complemented by an IPD meta-analysis of a large number of randomised controlled trials, assessing the surrogacy value of pathological complete response and of other intermediate endpoints in adjuvant and neoadjuvant trials. We hope that the Early Breast Cancer Trialists’ Collaborative Group might support such an analysis in the future.
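Even in an aggregate data analysis, the size of this estimation error can be gauged from published counts. The sketch below assumes that the trial level effect on the surrogate is summarised as an odds ratio for pathological complete response. It computes the log odds ratio and its Woolf standard error from arm level counts. The function name and counts are hypothetical. A bivariate model would then propagate this error into the trial level association, which the sketch does not attempt.

```python
import math
from typing import Tuple


def log_or_with_se(pcr_exp: int, n_exp: int, pcr_ctl: int, n_ctl: int) -> Tuple[float, float]:
    """Log odds ratio for pCR (experimental vs control) with Woolf's standard error.

    Assumes every cell of the 2x2 table is non-zero; otherwise a continuity
    correction (for example adding 0.5 to each cell) would be needed.
    """
    a, b = pcr_exp, n_exp - pcr_exp      # experimental arm: pCR, no pCR
    c, d = pcr_ctl, n_ctl - pcr_ctl      # control arm: pCR, no pCR
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return log_or, se


# Two hypothetical trials with the same odds ratio but different sizes.
for label, counts in [("small trial", (30, 100, 15, 100)), ("large trial", (300, 1000, 150, 1000))]:
    est, se = log_or_with_se(*counts)
    low, high = math.exp(est - 1.96 * se), math.exp(est + 1.96 * se)
    print(f"{label}: OR {math.exp(est):.2f} (95% CI {low:.2f} to {high:.2f})")
```

Both hypothetical trials have the same point estimate, but the smaller trial's confidence interval is much wider. Treating the two estimates as equally precise would overstate what the smaller trial contributes to a trial level regression.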
Our meta-analysis found a lack of trial level surrogacy of pathological complete response for patients’ long term outcomes. This finding does not affect the role of pathological complete response in estimating patients’ residual risk of relapse after neoadjuvant treatment and in identifying patients who are candidates for further adjuvant treatment. However, the use of pathological complete response to predict long term outcomes of patient populations enrolled in neoadjuvant randomised clinical trials is questionable. For this reason, we suggest that pathological complete response should not be used as a primary endpoint in regulatory neoadjuvant trials in early stage breast cancer.
What is already known on this topic
Pathological complete response is a US Food and Drug Administration approved surrogate endpoint for disease-free survival and overall survival in randomised clinical trials testing neoadjuvant treatments in early stage breast cancer
Previous meta-analyses including a limited number of trials showed a strong correlation between pathological complete response and disease-free survival and overall survival at patient level but not trial level
The surrogacy value of pathological complete response is controversial
What this study adds
This meta-analysis showed a weak association between pathological complete response and disease-free survival and overall survival at trial level
The findings suggest that pathological complete response should not be used as a surrogate endpoint in regulatory neoadjuvant randomised clinical trials of early stage breast cancer
Better surrogate endpoints are needed
Data availability statement
Detailed data on the included studies are available on reasonable request to the corresponding author.
FC and LP thank Aron Goldhirsch for his mentorship.
Contributors: FC and LP are joint first authors. VB and RDG are joint last authors. FC, LP, VB, and RDG conceived, designed, planned, and managed the study, acquired data, interpreted the results, drafted the manuscript, and critically reviewed or revised the manuscript for important intellectual content. VB, FC, LP, IS, CO, and CS managed the study, acquired data, and performed the statistical analyses. All other authors supervised the data analysis, provided the interpretation of results, and contributed to the drafting and critical review of the manuscript. All authors approved the final draft. FC is the guarantor. The corresponding author had full access to all the data in the study and had final responsibility for the decision to submit for publication. The corresponding author attests that all listed authors meet authorship criteria and that no others meeting the criteria have been omitted.
Funding: None received.
Competing interests: All authors have completed the ICMJE uniform disclosure form at www.icmje.org/disclosure-of-interest/ and declare: no support from any organisation for the submitted work; no financial relationships with any organisations that might have an interest in the submitted work in the previous three years; no other relationships or activities that could appear to have influenced the submitted work.
The lead author (FC) affirms that this manuscript is an honest, accurate, and transparent account of the study being reported; that no important aspects of the study have been omitted; and that any discrepancies from the study as planned have been explained.
Dissemination to participants and related patient and public communities: We plan to disseminate the findings and conclusions from this study through a lay language summary of our findings (see supplementary file), which will be widely promoted by our respective institutions (the European Institute of Oncology, Bicocca University of Milan, and Harvard University) through press releases, social media (such as Twitter), and the websites of our institutions. We also plan to present the results of our study at international scientific conferences.
Provenance and peer review: Not commissioned; externally peer reviewed.
This is an Open Access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/.