
CC BY-NC Open access
Research Christmas 2023: Marginal Gains

Projecting complete redaction of clinical trial protocols (RAPTURE): redacted cross sectional study

BMJ 2023; 383 doi: https://doi.org/10.1136/bmj-2023-077329 (Published 14 December 2023) Cite this as: BMJ 2023;383:e077329
  1. Nir Balaban, student (1)
  2. Ghulam Rehman Mohyuddin, assistant professor (2)
  3. Adi Kashi, intern (1)
  4. Amir Massarweh, research physician (3)
  5. Gal Markel, professor of medicine (1, 3, 4)
  6. David Bomze, research physician (1)
  7. Daniel A Goldstein, senior lecturer (1, 3)
  8. Tomer Meirson, research physician (1, 3, 4)
  1. Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel
  2. Division of Hematology, Huntsman Cancer Institute, University of Utah, Salt Lake City, UT, USA
  3. Davidoff Cancer Center, Rabin Medical Center-Beilinson Hospital, Petah Tikva, Israel
  4. Samueli Integrative Cancer Pioneering Institute, Rabin Medical Center-Beilinson Hospital, Petah Tikva, Israel
  • Correspondence to: T Meirson tomermrsn@gmail.com (or @TomerMeirson on Twitter)
  • Accepted 7 November 2023

Abstract

Objectives To characterise redactions in clinical trial protocols and estimate when all protocols will be fully redacted (RAPTURE).

Design Redacted cross sectional study.

Setting Published phase 3 randomised controlled trials from 1 January 2010 to ██████████████.

Participants New England Journal of Medicine, ██████████, and Journal of the American Medical Association.

Main outcome measures █████ ████████ ██████████████ ██████ ██████████ ████████ ████████ ██████████ ███████████ ████████████ ████████████ ████████████████████████ ██████████████████

Results ████████████████████ met the inclusion criteria, with 268 (56.7%) research protocols available and accessible. The proportion of protocols containing redactions increased from 0% in 2010 to 60.8% in 2021 (P<0.001). The extent of redaction also increased: among industry funded trials, the mean cumulative redaction per protocol rose from 0 in 2010 to 3.5 pages in 2021 (P<0.001). Modelling predicts that RAPTURE will occur between 2073 and 2136. Redactions featured predominantly in ████████ sponsored trials and mostly occurred in the statistical design.

Conclusions This study highlights the rise in protocol redactions and predicts that, ██████████████████████████████████████████ will be entirely redacted between 2073 and 2136. A legitimate rationale for the redactions could ███ be found. A multipronged strategy against protocol redactions is required to maintain the integrity of science.

Availability This paper is partially redacted, but for the sake of ███████████, a version without any redactions can be found in the supplementary material.

Introduction

Redactions in clinical trial protocols are █████████████████████ and interfere with ██████████████████████████████████████ █████████ a clinical trial [1-3]. Sponsors give various reasons for redacting protocols, including fear of releasing commercially sensitive information or trademarked intellectual property [1, 4]. However, the authors of this study have yet to stumble upon any hidden trade secrets while reviewing many protocols during their research endeavours.

Given the ██████████████████████████████████████████, we fear that a time will come when protocols will be fully redacted, which would pose a problem for all aspects of science. In religious cultures, the term rapture is often used to signify the end of time. Therefore, we aimed to predict when all protocols and amendments would become fully redacted (ie, removal of amendments and protocols of trials using redactions—RAPTURE), which would signify ████████████████████████████████ ███████████████████████████████, justifying the use of the word RAPTURE in this context.

In this cross sectional study, we aimed ████████████████████████████ ███████████████████████ by assuming a constant increase in rate of redactions, predict the time when all protocols would become fully redacted (RAPTURE). Our approach was to assess the prevalence and characteristics of data redactions in phase 3 clinical trial protocols published in high impact factor journals. An unredacted version of this paper can be found in the supplementary material.

Methods

Search strategy

This redacted cross sectional study followed the ██████████████████████ ████████████████████████████████████████████████████ ██████████████████ [5]. We searched PubMed for phase 3 randomised clinical trials published between 1 January 2010 and 1 January 2022. Our analysis included studies published in the New England Journal of Medicine (NEJM), The Lancet, and the Journal of the American Medical Association (JAMA). The search terms included randomized, randomised, phase 3, phase III, Lancet (London, England), The New England journal of medicine, and JAMA (see appendix). We excluded studies reporting pooled analyses, patient reported outcomes, or post hoc analyses. When several publications for the same trial were identified (eg, long term follow-up), we included the trial only once, favouring the publication with a published protocol.
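The exclusion and deduplication rules above can be sketched in Python as follows. This is an illustration under assumptions, not the authors' pipeline: the record fields (`trial_id`, `report_type`, `has_protocol`) and the report type labels are hypothetical, publications of the same trial are assumed to share a trial identifier, and when several publications of a trial all have protocols, the first one encountered is kept.

```python
from dataclasses import dataclass

# Report types excluded by the search strategy (hypothetical labels)
EXCLUDED_REPORT_TYPES = {"pooled analysis", "patient reported outcomes", "post hoc analysis"}


@dataclass
class Publication:
    trial_id: str       # hypothetical shared identifier, eg a registry number
    journal: str
    year: int
    report_type: str    # eg "primary", "long term follow-up", "post hoc analysis"
    has_protocol: bool


def select_trials(publications):
    """Drop excluded report types, then keep one publication per trial,
    preferring a publication that has a protocol."""
    chosen = {}
    for pub in publications:
        if pub.report_type in EXCLUDED_REPORT_TYPES:
            continue
        kept = chosen.get(pub.trial_id)
        # Replace the kept record only when it lacks a protocol and the new one has one
        if kept is None or (pub.has_protocol and not kept.has_protocol):
            chosen[pub.trial_id] = pub
    return list(chosen.values())


if __name__ == "__main__":
    pubs = [  # invented records, for illustration only
        Publication("TRIAL-A", "NEJM", 2015, "primary", False),
        Publication("TRIAL-A", "NEJM", 2018, "long term follow-up", True),
        Publication("TRIAL-A", "Lancet", 2019, "post hoc analysis", True),
        Publication("TRIAL-B", "JAMA", 2016, "primary", True),
    ]
    for pub in select_trials(pubs):
        print(pub.trial_id, pub.year, pub.has_protocol)
```

Here TRIAL-A is represented by its 2018 follow-up, because that publication has a protocol and the post hoc report is excluded outright.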

Data extraction

████████████████████ ██████████████████████████ ██ ███████████████████████ █████████████████████ ████████ ███████████████████████████████████████████ ██████ ████ ███ ██████████ █████████████ ███ ███████ ███ ████████ █████ ███████████████ ████████████ ███████████ ██████████ ██ ███ ███████████ █████ █████████████ █████ ████████ ████████████████ Funding of the clinical trial was classified into industry, academic, or mixed if at least one of the funding parties was non-academic. For each publication ██████████ █████████████████████████ ███████ ████████ ████████████ section and the manuscript’s appendix. Two observers (NB and AK) screened the studies and consulted an additional observer (TM) when there was uncertainty about eligibility or the data.
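The paper gives only a one-sentence definition of the funding categories, so the sketch below assumes, following the Results, that "mixed" means joint academic and industry funding: a trial is academic if all funding parties are academic, industry if none are, and mixed otherwise. The function name and its boolean input are hypothetical.

```python
from enum import Enum


class Funding(Enum):
    INDUSTRY = "industry"
    ACADEMIC = "academic"
    MIXED = "mixed"


def classify_funding(party_is_academic):
    """Classify a trial from one flag per funding party (True = academic funder).

    Assumption: 'mixed' means at least one academic and at least one
    non-academic funder, matching 'joint academic and industry funding'.
    """
    flags = list(party_is_academic)
    if not flags:
        raise ValueError("a trial needs at least one funding party")
    if all(flags):
        return Funding.ACADEMIC
    if not any(flags):
        return Funding.INDUSTRY
    return Funding.MIXED
```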

Redaction analysis

The primary outcome of this study was to ███████████████ ████████████ ██████████ ██████████████████ ███████ ██████████ ███████ ███████████████ ██████████ █████████████ ███████████████████ ███████████ ███████████████ ███████████████████ ██████ ███████████████ █████████████ ███████████ ████████████████ ██████████████████ ██████████████ ███████████████ ██████████████████████ █████████████ ███████████ ███████ ████████████████ ███████████ The average protocol size (in lines) was calculated by multiplying the mean number of lines per page by the mean number of pages of the sampled protocols. To assess the extent of concealment in each section, one observer (NB), helped by a second observer (AK), went through the protocols and estimated the total amount of redacted text in lines (eg, 0.5, 1, 2 lines). Large sections of redacted text were estimated in pages (eg, 0.5, 1, 2 pages). Because the number of lines per page varied relatively little across protocols (mean 39.5, interquartile range 36.8-42.0), ███████ ██████████ ██████████ ██████ ████████ ███████████ ████████████████ ████████████ ███████████ ███████████ ██████████████ ███████████████ ███████ ██████████████████ █████████████████ ████████ █████████████ █████████████████ ███████████████ ██████ ██████████████████ ██████████ ███████████ █████████████████ █████████████████ ██████████
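The arithmetic behind the protocol size and the line and page conversions can be sketched as follows, using the means reported in the Results (202.5 pages per protocol, 39.5 lines per page). Converting page estimates to lines with the mean of 39.5 lines per page is an assumption about the redacted step above, and the helper names are hypothetical.

```python
MEAN_LINES_PER_PAGE = 39.5   # mean lines per page reported in the Results
MEAN_PAGES = 202.5           # mean pages per protocol reported in the Results

# Average protocol size in lines: mean lines per page x mean pages (7998.75 lines)
AVERAGE_PROTOCOL_LINES = MEAN_LINES_PER_PAGE * MEAN_PAGES


def to_lines(amount, unit):
    """Convert a redaction estimate recorded in lines or pages into lines."""
    if unit == "lines":
        return float(amount)
    if unit == "pages":
        return amount * MEAN_LINES_PER_PAGE
    raise ValueError(f"unknown unit: {unit!r}")


def to_pages(lines):
    """Express a number of lines as pages of average length."""
    return lines / MEAN_LINES_PER_PAGE


def share_of_protocol(lines):
    """Fraction of an average-sized protocol covered by a number of redacted lines."""
    return lines / AVERAGE_PROTOCOL_LINES


# One section estimated as 2 lines plus another estimated as 0.5 page (21.75 lines)
section_total = to_lines(2, "lines") + to_lines(0.5, "pages")
```

For example, the 2021 mean of 137.4 redacted lines for industry funded trials converts to about 3.5 pages, or roughly 1.7% of an average protocol.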

███████ ███████████ █████████████, we used curve fitting on redaction patterns over time with linear and nonlinear regression. We fitted several models: a simple linear model, a second order polynomial model, and an exponential model fitted with nlsLM() from the R package minpack.lm [6] to the equation █████████, where y(t) represents mean redaction at time t, and a and b are coefficients. Additionally, we applied a generalised additive model with the default basis dimension for the smooth function using the R package mgcv. We calculated the coefficient of determination (R²) to assess the goodness of fit of the models. The 95% confidence intervals for the regression models were obtained using the R package insight. For the projection, we assumed that the increase in redaction extent would persist at the current rate until the entire protocol became completely redacted. To predict the time of RAPTURE, we calculated the time at which the modelled mean redaction would reach 100% of the average protocol size, assuming that this size remains constant. We established the time range by identifying where the lower and upper bounds of the 95% confidence interval of the fitted model intersect the line representing 100% redaction of the protocol.
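The crossing-point logic can be sketched with NumPy for the second order polynomial model. This is not the authors' R code (which used minpack.lm, mgcv, and insight): it fits the quadratic by ordinary least squares, builds an approximate 95% confidence band for the fitted mean using a normal quantile of 1.96 rather than a t quantile, and scans forward year by year until each curve reaches the average protocol size. The yearly study data are not reported in this section, so the example uses synthetic values.

```python
import numpy as np

AVERAGE_PROTOCOL_LINES = 39.5 * 202.5  # 100% redaction threshold, in lines


def fit_quadratic(years, mean_lines):
    """OLS fit of y = b0 + b1*t + b2*t**2, with t = years since the first year."""
    origin = years[0]
    t = np.asarray(years, dtype=float) - origin
    y = np.asarray(mean_lines, dtype=float)
    X = np.column_stack([np.ones_like(t), t, t**2])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (len(y) - X.shape[1])   # residual variance
    cov = sigma2 * np.linalg.inv(X.T @ X)            # covariance of the coefficients
    return beta, cov, origin


def band(beta, cov, origin, year, z=1.96):
    """(lower, fit, upper) for the mean at `year`; normal approximation to a 95% CI."""
    t = year - origin
    x = np.array([1.0, t, t * t])
    mean = float(x @ beta)
    se = float(np.sqrt(max(float(x @ cov @ x), 0.0)))  # guard against tiny negatives
    return mean - z * se, mean, mean + z * se


def crossing_years(beta, cov, origin, threshold, start, stop):
    """First year in [start, stop] at which each curve reaches `threshold` (None if never)."""
    hits = {"lower": None, "fit": None, "upper": None}
    for year in range(start, stop + 1):
        for name, value in zip(("lower", "fit", "upper"), band(beta, cov, origin, year)):
            if hits[name] is None and value >= threshold:
                hits[name] = year
    return hits


if __name__ == "__main__":
    years = list(range(2010, 2022))
    # Synthetic data: a quadratic trend with small alternating noise (not study data)
    noise = np.array([3.0, -3.0] * 6)
    lines = 2.0 * (np.array(years) - 2010) ** 2 + noise
    beta, cov, origin = fit_quadratic(years, lines)
    print(crossing_years(beta, cov, origin, AVERAGE_PROTOCOL_LINES, 2022, 2300))
```

Because the upper bound lies above the fitted curve in every year, it reaches the threshold first and gives the early end of the range (2073 in the study), while the lower bound gives the late end (2136).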

██████████████ ██████████████████ ███████████ ███████ █████████ ███████████████ ███████████████ ███████ ██████████████ ████████████████ █████ ███████████ ████████████ ███████ ███████ ████████████ ████████ ██████████ ██████████ ████████████ It was usually possible to infer what had been redacted from the heading, the table of contents, or nearby unredacted text. When it was impossible to tell what kind of information had been concealed, we categorised it as unknown.

Statistical analysis

██████████████████████████████████████████████████ ████████████████████ [7] ███████████████████████████ [8] ██ ███████████████████████████████████████████████████ █████████████████████████████████████████████████████ ██████████

Patient and public involvement statement

Although patients and the public were not directly involved in the study, we did speak to patients about the study, and we asked a member of the public to read our manuscript before submission.

Results

Between 1 January 2010 and ██████████████, we reviewed █████████ ██████████████████ that met the inclusion criteria (NEJM 157, Lancet 261, and JAMA 55; supplementary figure 1S). A total of 268 (56.7%) research protocols were available and accessible. Overall, all but one study published in the NEJM had a protocol (99.4%), whereas protocols were included in 72.7% and 27.6% of the studies published in JAMA and the Lancet, respectively (fig 1). Most studies were funded by industry (71.2%); the remainder had academic funding (18.0%) or mixed funding (10.8%). The most common topics were oncology (41.2%), infectious diseases (17.3%), and neurology (5.9%). We estimated the average number of pages in a protocol as 202.5 (interquartile range 130.5-240.2), with a mean of 39.5 lines per page (interquartile range 36.8-42.0).
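The reported counts and percentages can be cross-checked with a few lines of arithmetic. Back-calculating the number of protocols per journal from the percentages (a reconstruction for illustration, not counts stated in the paper) recovers the 268 protocols (56.7%) among the 473 included studies.

```python
included = {"NEJM": 157, "Lancet": 261, "JAMA": 55}
pct_with_protocol = {"NEJM": 99.4, "Lancet": 27.6, "JAMA": 72.7}

# Back-calculated protocol counts per journal, rounded to whole studies
protocols = {j: round(n * pct_with_protocol[j] / 100) for j, n in included.items()}

total_studies = sum(included.values())      # 473
total_protocols = sum(protocols.values())   # 268
share = 100 * total_protocols / total_studies

print(protocols, total_studies, total_protocols, f"{share:.1f}%")
# {'NEJM': 156, 'Lancet': 72, 'JAMA': 40} 473 268 56.7%
```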

Fig 1

General properties of publications and research protocols. Upper figure: distribution of publications in New England Journal of Medicine (NEJM), Lancet, and Journal of the American Medical Association (JAMA). The additional layer depicts the presence of published and accessible protocols within each journal. Middle figure: distribution of publications by topic. Lower figure: distribution of publications by funding type

Publication of protocols ████████ (fig 2) from 17.6% in 2010 to 86.4% in 2021. Up until 2012, none of the protocols were redacted; from 2013 onwards, a gradually increasing proportion contained concealed information (P<0.001; fig 3). By 2021, 60.8% of protocols had redacted data and the extent of redaction ███████████████ (fig 3, fig 4). For industry funded trials, the mean number of cumulative redacted lines per protocol increased from 0 in 2010 to 137.4 (3.5 pages) in 2021 (P<0.001). Redactions featured predominantly in industry funded trials and in only a small proportion of those with joint academic and industry funding (fig 3). In 2021, ████████ funded trials (excluding those with joint academic and industry funding) had an average of 162.0 lines (4.1 pages) of redacted data per protocol, compared with 7.2 lines (0.2 page) for those with joint academic and industry funding (P<0.001). In contrast, no studies with only academic funding had redactions between 2010 and 2021. These findings were consistent for both the proportion of redacted protocols and the extent of redactions per protocol.
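The yearly summaries plotted in figure 3 (the proportion of protocols with any redaction and the mean redacted lines per protocol, by funder) amount to a simple group-by, sketched below. The record layout is hypothetical, the example records are invented for illustration, and the significance tests behind the P values are not reproduced here.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Protocol:
    year: int
    funding: str           # "industry", "academic" or "mixed"
    redacted_lines: float  # cumulative redacted lines in the protocol (0 if none)


def yearly_summary(protocols):
    """Per (year, funding) group: count, share with any redaction, mean redacted lines."""
    groups = defaultdict(list)
    for p in protocols:
        groups[(p.year, p.funding)].append(p.redacted_lines)
    summary = {}
    for key in sorted(groups):
        lines = groups[key]
        summary[key] = {
            "n": len(lines),
            "share_redacted": sum(1 for x in lines if x > 0) / len(lines),
            "mean_redacted_lines": sum(lines) / len(lines),
        }
    return summary


if __name__ == "__main__":
    example = [  # invented records, for illustration only
        Protocol(2021, "industry", 0.0),
        Protocol(2021, "industry", 200.0),
        Protocol(2021, "mixed", 0.0),
        Protocol(2021, "academic", 0.0),
    ]
    for key, stats in yearly_summary(example).items():
        print(key, stats)
```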

Fig 2

Presence of protocols over time

Fig 3

Presence and extent of data redaction in research protocols over time in studies with published protocols and by funder of study

Fig 4

Curve fitting of redaction patterns over time between 2010 and 2021 (upper panel), extrapolated to predict the date (lower panel) at which the entire protocol (100%, horizontal dashed line) will be concealed if the increase in rate of redactions and the average protocol size remain constant among industry funded studies. The time range (vertical dashed lines) was estimated by identifying where the lower and upper bounds of the 95% confidence interval of the fitted model (shown in orange) intersect the line representing 100% redaction of the protocol. RAPTURE (removal of amendments and protocols of trials using redactions) is projected for 2088, with a 95% confidence interval of 2073 to 2136. Average protocol size was calculated by multiplying the mean number of lines per page by the mean number of pages of the sampled protocols

Considering that the extent of redactions has █████████████████ over time, larger sections of the protocol are expected to be concealed in the future. We sought to predict the time when entire protocols of industry sponsored studies will be fully redacted (RAPTURE). Linear, polynomial, exponential, and generalised additive models were used to fit the data. Supplementary figure 2S shows the results of linear and nonlinear regression analysis of mean redacted lines. The second order polynomial and generalised additive models yielded the best fit, each with an R² value of 0.92. Because both models performed equally well, we selected the simpler polynomial model to predict RAPTURE. If the increase in redaction rates remains constant, we predict RAPTURE to occur between 2073 and 2136 for industry funded studies (fig 4). To test the accuracy of our model, we sampled data from 1 January to 1 June of 2022 and 2023. This search identified 37 studies with protocols for 2022 and 45 for 2023. Although the model predicted a mean number of redacted lines greater than the 2021 value of 137.4, the observed values for 2022 and 2023 were 35.2 and 112.8 lines, respectively.
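The model selection rule described above (compare candidate fits by R² and prefer the simpler model when they perform equally well) can be sketched with NumPy for polynomial candidates. The exponential and generalised additive models are omitted because they need nonlinear or spline fitting, which the study did in R, and the tolerance `tol` that defines "equally well" is a hypothetical choice.

```python
import numpy as np


def r_squared(y, y_hat):
    """Coefficient of determination: 1 - residual SS / total SS."""
    y = np.asarray(y, dtype=float)
    ss_res = float(np.sum((y - y_hat) ** 2))
    ss_tot = float(np.sum((y - y.mean()) ** 2))
    return 1.0 - ss_res / ss_tot


def select_polynomial(years, mean_lines, degrees=(1, 2), tol=0.005):
    """Fit each polynomial degree and return (R² by degree, chosen degree).

    The chosen degree is the lowest one whose R² is within `tol` of the best.
    """
    t = np.asarray(years, dtype=float) - years[0]
    y = np.asarray(mean_lines, dtype=float)
    scores = {}
    for degree in degrees:
        coeffs = np.polyfit(t, y, degree)
        scores[degree] = r_squared(y, np.polyval(coeffs, t))
    best = max(scores.values())
    # Prefer the simplest model whose fit is effectively as good as the best one
    chosen = min(d for d, r2 in scores.items() if best - r2 <= tol)
    return scores, chosen
```

On a purely quadratic trend the linear model's R² falls well short and degree 2 is chosen; on a purely linear trend both fits are perfect and the simpler degree 1 wins the tie.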

Supplementary figure 3S shows the patterns of data redaction in the protocols of studies by topic. High proportions of redacted protocols were found in dermatology, paediatrics, pulmonology, and haematology. The greatest extent of redacted text was found in rheumatology studies, followed by pulmonology, paediatrics, nephrology, and oncology studies. Similar patterns were observed for studies published in the past three years, but the extent of redactions had risen substantially (supplementary figure 3S).

We calculated the extent of redactions according to a set of defined categories (fig 5, fig 6, fig 7; see also supplementary figure 4S). Most redactions occurred in the ████████ ███████ (46.4% of protocols), concealing a total of 6685 lines (169 pages). Within the statistical analysis section, most redactions occurred in the statistical analysis plan (95.2%). Extensive redactions were also found in the protocol’s appendix (4445 lines, 113 pages), background (1638 lines, 41.4 pages), and study design (1128 lines, 28.5 pages). Redactions also occurred in sections classified as unknown (400 lines, 10.1 pages); these sections were so heavily redacted that their original content or category could not be identified. Redactions in the statistical analysis plan were further classified into subcategories (fig 5). Redactions affecting the efficacy analysis were more common in rheumatology and oncology, whereas other subcategories, such as exploratory analyses, power calculations, interim analyses, and subgroup analyses, were largely specific to oncology studies. To describe how redactions varied across protocol sections and between protocols, we examined their distribution and the strength of association between sections (fig 6, fig 7). The analysis shows █████ ███████████████████, with some protocols containing minimal redactions limited to specific sections and others containing extensive redactions spanning several sections, notably the statistical analysis. In most studies where parts of the appendix, background, or study design were redacted, sections of the statistical analysis were also redacted.
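Section totals and the pattern noted in the last sentence (redactions elsewhere tending to come with redactions in the statistical analysis) can be tabulated as sketched below. The study does not state which association measure underlies the circos plot, so the conditional proportion used here is a swapped-in, illustrative measure; the section names and example data are hypothetical.

```python
from collections import Counter


def section_totals(protocols):
    """Total redacted lines per section across protocols.

    Each protocol is a dict mapping section name -> redacted lines.
    """
    totals = Counter()
    for sections in protocols:
        totals.update(sections)  # adds each section's lines to the running total
    return totals


def conditional_share(protocols, given, target):
    """Among protocols with any redaction in `given`, the share also redacted in `target`."""
    with_given = [s for s in protocols if s.get(given, 0) > 0]
    if not with_given:
        return None
    return sum(1 for s in with_given if s.get(target, 0) > 0) / len(with_given)


if __name__ == "__main__":
    example = [  # invented protocols, for illustration only
        {"statistical analysis": 120.0, "appendix": 40.0},
        {"statistical analysis": 10.0, "background": 5.0},
        {"appendix": 8.0},
    ]
    print(section_totals(example))
    print(conditional_share(example, "appendix", "statistical analysis"))
```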

Fig 5

Presence and extent of data redaction in research protocols by selected protocol variables and in statistical analysis section of protocols classified into subcategories. Upper panels show distribution of topics for each variable. Visits denote visit schedule and assessments; PK/PD=pharmacokinetics and pharmacodynamics

Fig 6

Distribution of redactions among protocol variables. Each row represents a study protocol. Visits denote visit schedule and assessments; PK/PD=pharmacokinetics and pharmacodynamics

Fig 7

Circos plot depicting associations between different sections of protocol. Broader chords indicate stronger average association among variables within protocols. Visits denote visit schedule and assessments; PK/PD=pharmacokinetics and pharmacodynamics

We ranked studies and companies according to the amount of redacted text (fig 8, fig 9, fig 10). The studies that concealed the most lines include █████████████████████ ██████████████████████████ [9], SOLAR-1 (alpelisib for breast cancer) [10], and VOYAGE (dupilumab in children with asthma) [11] (fig 8). The companies with the highest number of redacted lines were ██████████████████████████ and █████████ (fig 8).

Fig 8

Extent of data redaction in research protocols for top 30 studies with redacted information and by company

Fig 9

Presence of redacted protocols by company, for companies with at least six trials

Fig 10

Distribution of studies in dataset by company. Companies with five or fewer publications are classified as other
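The rankings in figures 8 to 10 reduce to sorting and counting. The sketch below orders studies by redacted lines (figure 8 shows the top 30) and relabels companies with five or fewer trials as "other" (figure 10). The input dictionaries and names are hypothetical, and ties keep their input order.

```python
from collections import Counter


def top_studies(redacted_lines_by_study, k=30):
    """Studies sorted by redacted lines, largest first (stable for ties)."""
    ranked = sorted(redacted_lines_by_study.items(), key=lambda item: item[1], reverse=True)
    return ranked[:k]


def group_small_companies(company_by_study, min_trials=6):
    """Relabel companies with fewer than `min_trials` trials as 'other'."""
    trials_per_company = Counter(company_by_study.values())
    return {
        study: company if trials_per_company[company] >= min_trials else "other"
        for study, company in company_by_study.items()
    }
```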

Discussion

In this ██████████████ study of data redaction in phase 3 clinical trials published in ██████████ journals, we found █████████████████████████ from 0 in 2010 to 60.8% of protocols in 2021. We also found that the amount of redacted text within protocols increased. Our mathematical model predicts that RAPTURE (the date when all protocols are fully concealed) will occur between 2073 and 2136 for industry funded trials, a worrying possibility with serious implications for the scientific community.

We acknowledge that there is levity involved in our prediction of RAPTURE, and that our intention is to highlight the rise in redactions rather than to claim that all protocols will be 100% redacted in the predicted timeframe. As such, our mathematical model is meant to be provocative and not ███████████████████. Throughout history, humanity has repeatedly failed to predict with precision when RAPTURE will occur, mostly believing that the end was imminent, only to be proven wrong (remember Y2K?). To avoid a similar predicament and to stay true to our scientific methods, we only provide a range for when RAPTURE might occur, rather than a precise date.

However, the rise in redactions we observed is worrying in itself, even if RAPTURE does not occur. In a strange twist of events, there is a glimmer of hope: 2022 and 2023 saw a slight drop in redactions. We are not sure whether this is the result of a secret society of antiredaction advocates who have successfully led their campaign. Perhaps the ██████████████████ industry has suddenly paid heed to antiredaction efforts and is now willing to publicly share some of its trade secrets, that is, obscure statistical calculations. In all seriousness, we are cautious about inferring too much from this drop in redactions, and we do not know what impact the covid-19 pandemic might have had on redaction practices. We believe that advocacy against these redaction practices remains essential, even if RAPTURE appears to be less of a mathematical certainty based on recent trends.

Our work is a recent characterisation of redactions in the scientific literature. This characterisation shows █████████████████████████████████████ █████████████████████████████████████████████████████ ████████████████████████████████ We also found that protocol redaction appears to be unique to ███████ funded trials, and that redactions mostly occur in the statistical analysis section of the protocol, particularly the statistical analysis plan. We struggled to see the logic in redacting statistical analysis sections to conceal trade secrets. Surely no commercial advantage can be gained from hiding statistical power calculations.

Previous work has shown a lack of clear, objective reasoning explaining why protocol redaction occurs [4]. While redaction of confidential commercial information is understandable, in our study most redactions were found in the ███████████ and ████████. In contrast, only a small proportion of redactions occurred in the biomarkers section, which might include commercial and innovative information. Unless the research and statistical design are fully transparent and adequately prespecified, redacting the methods might leave enough room to manipulate the results [12-16]. We cannot think of other reasons why these redactions occurred. It seems ironic when companies justify redactions by stating that disclosure could greatly affect their commercial interests; they are being truthful, but not in the way they intended.

What would be the scientific implications if RAPTURE were to occur? Science as we know it would ████████████, as would reproducibility of scientific experiments. Regulatory authorities would struggle with drug approvals, physicians would not know the specifics when obtaining patient consent, and patients would be at risk of harm.

How can we avert this RAPTURE? The solutions are ██████. Editors can refuse to publish research unless a fully unredacted protocol is available for review. Governments can mandate public access to fully unredacted protocols. Although previous guidelines by regulatory authorities have addressed redaction [17], further oversight is needed.

Strengths and limitations of study

Our work has several ████████. The degree of redaction across lines and pages was visually approximated using the dimensions of the obscured sections in relation to the text, so small discrepancies might exist between estimated and actual concealment. The extent of redactions within a protocol does not necessarily correlate with their significance; concealing a small section or even a single phrase within the study's statistical design could hinder the ability to accurately appraise and reproduce a clinical trial. Apart from contact information, which was excluded, we regarded all sections of the protocol as potentially important because the choice of concealment was not arbitrary; however, different sections carry varying degrees of significance. Although our search strategy was designed to capture randomised controlled trials published in top journals during the given timeframe, it inherently selects industry funded studies because most large randomised controlled trials are funded by industry [18]. Therefore, our study is subject to selection bias because we used a convenience sample, and our findings might not reflect redaction practices across all clinical trials.

Statistical power for findings from earlier years is limited, and the apparent increase in redactions might partly reflect the growing number of protocols published over time rather than solely an increase in redaction. Although there are fewer protocols from the early years, the total number is still substantial, with 45 protocols published until 2014. The absence of redactions until that time further indicates that they are a recent phenomenon. However, the type and extent of redaction vary, and many protocols remain redaction free. Notably, academically funded trials consistently show no redactions.

The model used to predict the time of RAPTURE assumes that the increase in redactions will persist at the same rate as it did until 2021. If the rate of concealment stabilises or decreases in the future, as suggested by data from 2022 and 2023, the projected date will move later. Other limitations include █████████████ ██████████████████████████████████████. The dataset does not encompass all relevant published studies, particularly those that do not explicitly mention their design in the title or abstract with terms such as randomised or phase 3. We could not determine why protocol redaction occurred and can only speculate on the reasoning. We did not request protocols that were missing. Authors have previously faced lawsuits when requesting access to full protocols during their analysis of redaction [4], and industry lawsuits might have served as a successful deterrent.

Conclusions

In summary, this cross sectional study has found ███████████ protocol redactions. If redactions increase at the same rate as ███████████████████, we estimate protocols will be fully redacted between 2073 and 2136 (RAPTURE), although a limited decrease in the amount of redactions was observed in 2022 and 2023. A multipronged strategy against protocol redactions is necessary to maintain the integrity of science.

What is already known on this topic

  • Sponsors claim to redact protocols to protect sensitive commercial and intellectual property

  • Increasing redactions in clinical trial protocols hinder accurate appraisal and replication of trials

  • If redaction rates continue to rise, there might come a time when protocols are fully redacted (RAPTURE—removal of amendments and protocols of trials using redactions)

What this study adds

  • Protocol redactions appear to be unique to industry funded trials, are rising consistently, and mainly occur in the statistical analysis section

  • At the current rate of redaction, RAPTURE is estimated to occur between 2073 and 2136

  • A multipronged strategy against protocol redactions is necessary to maintain the integrity of science

Ethics statements

Ethical approval

██████████████████████████████████████████ ███████████████████████████████████

Data availability statement

Data and a fully redacted version of the manuscript are available from the corresponding author (tomermrsn@gmail.com) on reasonable request.

Acknowledgments

Nir Balaban and Ghulam Rehman Mohyuddin contributed to this study equally.

Footnotes

  • Contributors: NB and GRM contributed equally and share first authorship. TM conceived the idea of the study and was primarily responsible for its design and analysis of the data. TM and DB contributed to the study conceptualisation. NB and AK contributed to data collection and curation. TM, GRM, and NB cowrote the first draft of the manuscript. TM, GM, and DAG supervised the project. All authors contributed to the interpretation of the results and reviewed and approved the final draft of the manuscript. TM and NB are the guarantors. The corresponding author attests that all listed authors meet authorship criteria and that no others meeting the criteria have been omitted.

  • Funding: None declared.

  • Competing interests: All authors have completed the ICMJE uniform disclosure form at www.icmje.org/disclosure-of-interest/ and declare: no support from any organisation for the submitted work; TM reports receiving personal fees from Purple Biotech, outside the submitted work. GRM reports receiving royalties from MashupMD for writing, outside the submitted work. GM reports receiving personal fees from MSD and Roche; grants and personal fees from BMS and Novartis; personal fees and stock options from 4C Biomed; and stock options from Nucleai, Biond Biologics, and Ella Therapeutics, outside the submitted work.

  • The lead authors (the manuscript’s guarantors) affirm that the manuscript is an honest, accurate, and transparent account of the study being reported; that no important aspects of the study have been omitted; and that any discrepancies from the study as planned (and, if relevant, registered) have been explained.

  • Dissemination to participants and related patient and public communities: The findings of this study will be shared publicly through lay press coverage, social media, and presentations at virtual and in-person conferences.

  • Provenance and peer review: Not commissioned; externally peer reviewed.


This is an Open Access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/.

References