TRIPOD+AI statement: updated guidance for reporting clinical prediction models that use regression or machine learning methods

The TRIPOD (Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis) statement was published in 2015 to provide the minimum reporting recommendations for studies developing or evaluating the performance of a prediction model. Methodological advances in the field of prediction have since included the widespread use of artificial intelligence (AI) powered by machine learning methods to develop prediction models. An update to the TRIPOD statement is thus needed. TRIPOD+AI provides harmonised guidance for reporting prediction model studies, irrespective of whether regression modelling or machine learning methods have been used. The new checklist supersedes the TRIPOD 2015 checklist, which should no longer be used. This article describes the development of TRIPOD+AI and presents the expanded 27 item checklist with more detailed explanation of each reporting recommendation, and the TRIPOD+AI for Abstracts checklist. TRIPOD+AI aims to promote the complete, accurate, and transparent reporting of studies that develop a prediction model or evaluate its performance. Complete reporting will facilitate study appraisal, model evaluation, and model implementation.
Prediction models are used across different healthcare settings to estimate an outcome value or risk. Most models estimate the probability of the presence of a particular health condition (diagnostic) or whether a particular outcome will occur in the future (prognostic).1 Their primary use is to support clinical decision making, such as whether to refer patients for further testing, monitor disease deterioration or treatment effects, or initiate treatment or lifestyle changes. Examples of well known prediction models include EuroSCORE II (cardiac surgery),2 the Gail model (breast cancer),3 the Framingham risk score (cardiovascular disease),4 IMPACT (traumatic brain injury),5 and FRAX (osteoporotic and hip fractures).6 Prediction models are abundant in the biomedical literature, with thousands of models published annually (and increasing), and have been developed for many outcomes and health conditions.7 8 At least 731 diagnostic and prognostic prediction model studies on covid-19 were published during the first 12 months of the pandemic.9

Summary points
• There has been considerable interest and financial investment in developing prediction models by applying artificial intelligence (AI) methods, typically powered by advances in machine learning
• To ensure that a prediction model study is valuable to users, authors should prepare a transparent, complete, and accurate account of why the research was done, what they did, and what they found
• An update of the TRIPOD (Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis) statement aims to harmonise the landscape of prediction model studies using AI methods and to provide guidance regardless of whether regression models or machine learning methods have been used
• The TRIPOD+AI statement consists of a 27 item checklist, an expanded checklist with more detailed explanation of each reporting recommendation, and the TRIPOD+AI for Abstracts checklist

Despite this interest in developing prediction models, there have been longstanding concerns about transparency and completeness of reporting in the field,10 11 and the resulting usability. For readers (including peer reviewers, editors, health professionals, regulators, patients, and the general public), incomplete or inaccurate reporting impairs the ability to critically appraise the study design and methods, have confidence in the findings, and further evaluate or implement a prediction model. Poor reporting of a model might also mask flaws in the design, data collection, or conduct of a study that, if the model was implemented in the clinical pathway, could cause harm. Harm can be perceived to occur when insufficient measures are in place to mitigate bias. Better reporting can create more trust and influence patient and public acceptability of the use of prediction models in healthcare. Authors have an ethical and scientific obligation to honestly report their research in a complete and transparent manner. As noted by the late Doug Altman and colleagues, "Good reporting is not an optional extra; it is an essential component of research"12; anything less is little more than avoidable research waste.13

In response to concerns about incomplete reporting,10 11 14 15 the TRIPOD statement was published in 2015 (TRIPOD 2015) to provide minimum reporting recommendations.16 17 TRIPOD 2015 comprises a checklist of 37 items, which includes 25 items to report in both development and validation studies, an additional six items for model development studies, and six items for validation studies. Accompanying the checklist is an explanation and elaboration document that provides the rationale behind each reporting item; published examples of good reporting; and a discussion of issues relating to the design, conduct, and analysis of prediction model studies.17 TRIPOD 2015 mainly focused on models developed using regression modelling, which was the prevailing approach at the time. Additional guidance has since been created for reporting abstracts of prediction model studies (TRIPOD for Abstracts18), studies developing or validating prediction models using clustered data (TRIPOD-Cluster19 20), systematic reviews and meta-analyses of prediction model studies (TRIPOD-SRMA21), and study protocols (TRIPOD-P,22 in preparation). All available guidance, as well as template checklists for filling out separately, can be found on the TRIPOD website (https://www.tripod-statement.org/).
Since the publication of TRIPOD 2015, there have been numerous methodological advances in prediction modelling, including sample size guidance for developing models23-27 and evaluating their performance,28-32 and greater recognition of operationalising fairness,33 reproducibility,34 and adopting open science principles.35 However, interest and financial investment in applying methods ascribed to artificial intelligence (AI), typically powered by advances in machine learning methods (eg, random forests, deep learning), is where we have seen the most progress and change. With increasing access to data and availability of off-the-shelf software to apply machine learning methods, developing a prediction model has become faster and easier. Vast numbers of prediction models are now entering the scientific literature across many clinical settings, for a wide range of outcomes and health conditions, with multiple models often available for the same outcome, health condition, and target population.7 8 36 The ability to critically evaluate the quality of prediction models, and to understand how well they can serve in a particular setting or for a particular use case, is therefore even more important. This ability is predicated on complete and transparent reporting.
However, systematic reviews evaluating studies of prediction models have shown that they are often poorly conducted (including deficiencies in study design or data collection37 38); use poor methodology37 38; are incompletely reported, with key details missing39-54; are consequently at high risk of bias41 49 55-57; rarely adhere to open science practices58; and are susceptible to overinterpretation or so-called spin.59 60 These deficiencies cast considerable doubt on models' usefulness and safety, and raise concerns about their potential to create or widen healthcare disparities.61 While TRIPOD 2015 is largely agnostic to the type of modelling approach, and much of its reporting recommendations apply equally to non-regression approaches, additional reporting considerations are needed for the growing class of machine learning methods. For example, unlike regression based models, the flexibility and complexity underpinning other machine learning approaches typically mean that the resulting prediction models cannot be expressed as a simple equation, and sometimes even the predictors used remain unclear. Additional reporting considerations are therefore needed that are not currently covered in TRIPOD 2015. Alongside these methodological advancements, considerations of fairness,62 wider acceptance of open science practices,63 and public and patient involvement in research and the implementation of research64 65 mean that an update to the TRIPOD 2015 statement is needed to capture these developments and their consequences for reporting.
The aim of this paper is to describe the development of the updated TRIPOD guidance, present the new TRIPOD+AI checklist, and discuss how to use it. TRIPOD+AI aims to harmonise the landscape of prediction model studies and provide guidance regardless of whether regression models or machine learning methods have been used.66 The "+" in TRIPOD+AI indicates that it provides consolidated reporting recommendations for studies of prediction models developed using regression modelling or machine learning (eg, deep learning, random forests) approaches. We also use the additional term "AI" to be consistent with existing reporting guidelines for studies broadly labelled as involving AI. However, for readability, this article will refer to the methods underpinning them as machine learning (table 1). A glossary of terms (box 1) clarifies key concepts used within the TRIPOD+AI reporting guideline.

Development of TRIPOD+AI
We describe the development of the TRIPOD+AI statement, a guideline to aid the reporting of studies developing prediction models for diagnosis or prognosis using machine learning or regression methods, or evaluating (validating) their performance. There is no such thing as a validated prediction model.76 To avoid ambiguity and harmonise terminology, we refer to validation as evaluation74 in this article (box 1). Existing reporting guidelines, and those in development, for other types of biomedical studies involving a machine learning component are detailed in table 1. Literature reviews and consensus exercises were used to develop the TRIPOD+AI checklist, as recommended by the EQUATOR Network.77 A steering group was convened by GSC and KGMM to oversee the guideline development process, with members selected to cover a broad range of expertise and experience (comprising GSC, KGMM, RDR, ALB, JBR, BVC, XL, and PD).
In April 2019, a commentary was published announcing the TRIPOD+AI initiative.78 The guideline was registered as a reporting guideline under development with the EQUATOR Network on 7 May 2019 (https://www.equator-network.org/). A study protocol describing the process and methods used to develop the TRIPOD+AI reporting guideline was made available on 25 March 2021 on the Open Science Framework (https://osf.io/zyacb/). The protocol, which also describes the development of a quality assessment and risk of bias tool for prediction models developed using machine learning methods (PROBAST+AI), was published in 2021.79 The reporting of the consensus based methods used in the development of TRIPOD+AI followed the ACCORD (Accurate Consensus Reporting Document) recommendations.80

Ethics
This study was approved by the Central University Research Ethics Committee, University of Oxford, on 10 December 2020 (R73034/RE001). Participant information was provided to the Delphi survey participants electronically before starting the survey, and to the consensus participants before the consensus meeting. Delphi survey participants provided electronic informed consent before completing the survey.

Candidate item list generation
An initial list of items was drafted by GSC and KGMM using TRIPOD 2015.16 17 Additional items were identified from TRIPOD-Cluster,19 20 TRIPOD for Abstracts,18 CAIR,81 MI-CLAIM,82 CLAIM,68 MINIMAR,83 SPIRIT-AI,71 and CONSORT-AI,72 along with additional literature identified by the steering group.34 84-89 The list of items was also informed by the findings of systematic reviews evaluating the reporting, methods, and overinterpretation of prediction model studies using machine learning.37-39 48 51 54 59 60 The steering group harmonised the initial list of items to form a final list of 65 unique candidate items covering the title (one item), abstract (one item), introduction (three items), methods (37 items), results (15 items), discussion (five items), and other (three items). This list was used in a modified Delphi exercise, as described below.

Recruitment of Delphi panellists
Delphi participants were identified by the steering committee, from authors of relevant publications, via a call to participate on social media (eg, Twitter), and through personal recommendations, including experts recommended by other Delphi participants. The steering group identified participants to achieve geographical and disciplinary diversity and to include key stakeholder groups: for example, researchers (statisticians/data scientists, epidemiologists, machine learning researchers/scientists, clinicians, radiologists, and ethicists), healthcare professionals, journal editors, funders, policymakers, healthcare regulators, patients, and the general public as end users of prediction models, from a range of settings (eg, universities, hospitals, primary care, biomedical journals, non-profit organisations, and for-profit organisations).
No minimum sample size was set for the Delphi panel. A steering group member checked the expertise or experience of each identified person. Individuals were then invited to participate via email and were sent an information pack with the study description, aims, and contact details. Once participants accepted, they were added to the Delphi panel and received the link to the survey. Delphi panellists did not receive any financial incentive or gift to participate.

Delphi process
The Delphi surveys were designed and delivered electronically using the Welphi online platform (www.welphi.com), to be completed individually, online, and in English. The platform ensures responses are anonymous by sending a different link to each participant and applying codes to respondents. The panellists received a package of information clarifying the study's objectives and scope and explaining how to participate, use the platform, and contact the development team with any questions.

Box 1: Glossary of terms used in TRIPOD+AI

The definitions and descriptions given below relate to the specific context of the TRIPOD+AI* guideline; they do not necessarily apply to other areas of research.

Artificial intelligence
Field of computer science that focuses on developing models and algorithms capable of performing tasks that typically require human intelligence.

Calibration
Agreement between observed outcomes and estimated values from the model. Calibration is best assessed graphically with a plot of the estimated values on the x axis and observed values on the y axis, with a smoothed flexible calibration curve in the individual data.
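As a minimal sketch of this definition in Python (assuming numpy, matplotlib, and statsmodels are available; the simulated risks and outcomes are illustrative placeholders, not a recommended workflow):

```python
# Minimal calibration plot sketch: estimated risks against observed outcomes,
# with a lowess smoother approximating the flexible calibration curve.
import matplotlib.pyplot as plt
import numpy as np
from statsmodels.nonparametric.smoothers_lowess import lowess

rng = np.random.default_rng(0)
y_prob = rng.uniform(0, 1, 500)      # estimated risks from an already fitted model
y_true = rng.binomial(1, y_prob)     # observed binary outcomes (simulated here)

smoothed = lowess(y_true, y_prob, frac=0.6)   # sorted (estimated, smoothed observed) pairs
plt.plot(smoothed[:, 0], smoothed[:, 1], label="Flexible calibration curve")
plt.plot([0, 1], [0, 1], "--", label="Perfect calibration")
plt.xlabel("Estimated risk")          # estimated values on the x axis
plt.ylabel("Observed proportion")     # observed values on the y axis
plt.legend()
plt.show()
```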

Care pathway
Structured and coordinated plan of care for managing a specific health condition or dealing with a patient's healthcare needs throughout their healthcare journey.

Class imbalance
When the frequency of individuals with and without the outcome event is unequal.

Discrimination
How well the predictions from the model differentiate between individuals with and without the outcome. Discrimination is typically quantified by the c statistic (sometimes referred to as the area under the curve (AUC) or area under the receiver operating characteristic curve (AUROC)) for binary outcomes, and by the c index for time-to-event outcomes.
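A small sketch of how these measures are commonly computed in Python: scikit-learn for the c statistic (AUROC) and, as one assumed third-party choice, the lifelines library for the time-to-event c index; all data values are illustrative.

```python
import numpy as np
from sklearn.metrics import roc_auc_score       # c statistic for binary outcomes
from lifelines.utils import concordance_index   # c index for time-to-event outcomes

# Binary outcome: c statistic equals the AUROC
y_true = np.array([0, 1, 1, 0, 1])              # observed outcomes (illustrative)
y_prob = np.array([0.2, 0.7, 0.6, 0.3, 0.9])    # model estimated risks
c_statistic = roc_auc_score(y_true, y_prob)

# Time-to-event outcome: c index over follow-up times and censoring indicators
event_times = np.array([5.0, 2.1, 3.3, 8.0, 1.2])
event_observed = np.array([1, 1, 0, 0, 1])      # 1=event occurred, 0=censored
risk_scores = np.array([0.4, 0.9, 0.5, 0.1, 0.8])
# concordance_index expects higher predictions to mean longer survival,
# so higher-risk individuals get negated scores
c_index = concordance_index(event_times, -risk_scores, event_observed)
print(c_statistic, c_index)
```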

Evaluation or test data
Data used to estimate the performance of a prediction model, sometimes referred to as test data or validation data.† Evaluation data should be distinct from the data used to train the model, tune hyperparameters, or do model selection, such that there is no overlap in participants between the training and evaluation data.Evaluation data should be representative of the population in whom the model is to be used.
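One way to operationalise the "no overlap in participants" requirement is a group-aware split. The sketch below uses scikit-learn's GroupShuffleSplit; the dataframe and column names are hypothetical assumptions for illustration.

```python
import pandas as pd
from sklearn.model_selection import GroupShuffleSplit

# Hypothetical data: one row per observation, possibly several per participant
df = pd.DataFrame({
    "participant_id": [1, 1, 2, 3, 3, 4, 5, 6, 7, 8],
    "predictor":      [0.1, 0.2, 0.5, 0.3, 0.4, 0.9, 0.7, 0.6, 0.8, 0.2],
    "outcome":        [0, 0, 1, 0, 0, 1, 1, 0, 1, 0],
})

# Splitting on groups keeps all rows from a participant on one side of the split
splitter = GroupShuffleSplit(n_splits=1, test_size=0.25, random_state=42)
train_idx, eval_idx = next(splitter.split(df, groups=df["participant_id"]))
train_data, eval_data = df.iloc[train_idx], df.iloc[eval_idx]

# Sanity check: no participant appears in both training and evaluation data
assert set(train_data["participant_id"]).isdisjoint(eval_data["participant_id"])
```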

Fairness
Property of prediction models that do not discriminate against individuals or groups of individuals based on attributes such as age, race/ethnicity, sex/gender, or socioeconomic status.

Hyperparameters
Values that control the model development or learning process.

Hyperparameter tuning
Finding the best (hyper)parameter settings for a particular model building strategy.
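A minimal sketch of hyperparameter tuning via cross validated grid search in Python with scikit-learn; the synthetic data, model, and grid are illustrative assumptions, not recommendations.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# Candidate hyperparameter settings for one model building strategy
param_grid = {"n_estimators": [100, 300], "max_depth": [3, 5, None]}

search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    cv=5,                 # tune on cross validated folds of the training data
    scoring="roc_auc",
)
search.fit(X, y)
print(search.best_params_)  # best (hyper)parameter settings found
```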

Internal validation
Evaluating the performance of a prediction model on the same population on which the model was developed (eg, train test split, cross validation, or bootstrapping).
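For example, the cross validation approach mentioned above can be sketched in Python as follows (a simplified illustration on synthetic data; the model choice is an assumption):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Development data drawn from the population of interest (simulated here)
X, y = make_classification(n_samples=500, n_features=8, random_state=0)

# Repeatedly refit and evaluate within the development data; the mean AUROC
# across folds is an internally validated estimate of discrimination
model = LogisticRegression(max_iter=1000)
auc_per_fold = cross_val_score(model, X, y, cv=5, scoring="roc_auc")
print(auc_per_fold.mean())
```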

Machine learning
A subfield of artificial intelligence that focuses on developing models capable of learning and making predictions or decisions from data, without being explicitly programmed.

Model evaluation
Evaluating the predictive accuracy of a model by estimating model discrimination (eg, c statistic), model calibration (eg, calibration plot, calibration slope), and clinical utility (eg, decision curve analysis). This process is referred to as evaluating a prediction model.74 75

Outcome
Diagnostic or prognostic event that is being predicted. In machine learning, this event is often referred to as the target value, response variable, or label.
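The clinical utility component cited in the model evaluation entry above is often assessed with decision curve analysis, where the net benefit of the model at a chosen threshold probability pt is commonly computed as TP/n − (FP/n) × pt/(1 − pt). A minimal Python sketch with hypothetical data:

```python
import numpy as np

def net_benefit(y_true, y_prob, threshold):
    """Net benefit of a model at a given threshold probability."""
    n = len(y_true)
    treat = y_prob >= threshold          # classified as high risk at this threshold
    tp = np.sum(treat & (y_true == 1))   # true positives
    fp = np.sum(treat & (y_true == 0))   # false positives
    return tp / n - (fp / n) * threshold / (1 - threshold)

# Illustrative outcomes and estimated risks; comparing against treat-all and
# treat-none strategies across thresholds traces out a decision curve
y_true = np.array([0, 1, 1, 0, 1, 0, 0, 1])
y_prob = np.array([0.1, 0.8, 0.6, 0.3, 0.7, 0.2, 0.4, 0.9])
print(net_benefit(y_true, y_prob, threshold=0.5))
```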

Predictor
Characteristic that can be measured or attributed at an individual level (eg, age, systolic blood pressure, sex, disease stage, radiomics features) or group level (eg, country). It is also often referred to as an input, feature, independent variable, or covariate.

Training or development data
Data used to train or develop a prediction model. The training data are ideally representative of the population in whom the model is to be used.
*TRIPOD=Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis; AI=artificial intelligence.
†Validation data often has different meanings. For example, in machine learning studies, validation data can refer to data used for parameter tuning or to data used to evaluate model performance (often referred to as external validation). To avoid any ambiguity, we refer to data used to evaluate model performance as evaluation data.
Participants were asked to rate each item as "can be omitted," "possibly include," "desirable for inclusion," or "essential for inclusion." Participants were also invited to comment on any item and to suggest new items. Free text responses were collated and analysed by PL. The themes generated were used by GSC and KGMM to inform item rephrasing, merging, or suggesting new items. All members of the steering group were invited to participate in the Delphi surveys.

Checklist item evolution from round 1 to round 2
In round 1 of the modified Delphi, participants rated the 65 initial candidate items generated from the literature reviews and other reporting checklists, as described above. Agreement was defined as a participant rating an item as desirable or essential for inclusion. Items with agreement of 70% or higher, the threshold defined in the protocol,79 were carried over to round 2. Items with an agreement rate lower than 70% were excluded, merged, or rephrased and presented to panellists for re-evaluation. These modifications were based on, or inspired by, the hundreds of comments added by panellists.
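To make the threshold arithmetic concrete, a hypothetical sketch in Python (the rating labels follow the survey options described above; the example ratings are invented):

```python
# Agreement = proportion of panellists rating an item desirable or essential
ratings = ["essential for inclusion", "desirable for inclusion",
           "possibly include", "essential for inclusion", "can be omitted"]
agreement = sum(
    r in ("desirable for inclusion", "essential for inclusion") for r in ratings
) / len(ratings)
carried_to_round_2 = agreement >= 0.70  # protocol defined 70% threshold
print(f"{agreement:.0%}", carried_to_round_2)
```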
In round 2, survey participants were given a link to the aggregated ratings from round 1 (https://osf.io/zyacb/; supplementary table 3) and rated 59 candidate items, covering the title (one item), abstract (one item), introduction (four items), methods (32 items), results (11 items), discussion (eight items), and other (two items). The item relating to patient and public involvement received 69% agreement for inclusion (supplementary table 4). Despite this falling just below the 70% threshold, the steering group agreed to retain the item for discussion during the consensus meeting.

Patient and public involvement and engagement meeting
An online meeting was held on 8 April 2022 with nine members of the Health Data Research UK patient and public involvement and engagement (PPIE) group (https://www.hdruk.ac.uk/about-us/involving-and-engaging-patients-and-the-public/), chaired by Sophie Staniszewska (University of Warwick, UK). This meeting was not planned in the study protocol and was the only deviation from the published protocol.79 Before the meeting, the PPIE group was sent a summary of the TRIPOD+AI project (available at https://osf.io/zyacb/), including an executive summary drafted by one member of the PPIE group, and the draft checklist. At the meeting, GSC presented details of the TRIPOD+AI initiative, the project status, and the draft guidance resulting from round 2 of the Delphi survey. Participants then asked questions and discussed the project aims and scope. Following feedback received at the PPIE meeting, and through correspondence after the meeting, the draft checklist was revised to improve clarity. Three members of the PPIE group were invited to the online consensus meeting with the wider group of stakeholders on 5 July 2022, and two subsequently attended. The manuscript was circulated to the three PPIE members for their input and approval.

Consensus meeting
An online consensus meeting was held on 5 July 2022, chaired by GSC and KGMM. Participants were chosen to ensure balanced representation of the key stakeholder groups, disciplines, and geographies. Twenty eight participants attended part or all of the meeting, including one non-voting attendee (PL).
Before the meeting, invited participants were emailed a document (available at https://osf.io/zyacb/) containing a brief overview of TRIPOD+AI, the consensus meeting format and instructions, a summary of the aggregated responses from round 2 of the Delphi survey (supplementary table 3), and the draft TRIPOD+AI checklist. The checklist circulated to the consensus meeting participants included 59 items covering the title (one item), abstract (one item), introduction (four items), methods (32 items), results (11 items), discussion (eight items), and other (two items).
Given the high endorsement achieved for many items in round 2, a subset of 17 items was highlighted for plenary discussion and voting during the consensus meeting. After discussion, participants were given one minute to vote to include or exclude each item from the TRIPOD+AI checklist. Votes were registered using the poll function of the online meeting program. The 17 items comprised one item that had not achieved consensus in round 2 and 16 items that had been reworded after round 2 or were new items not included in TRIPOD 2015. After discussion and voting on these 17 items, the final TRIPOD+AI checklist was formed.

TRIPOD+AI statement
TRIPOD+AI comprises a checklist of items considered essential for good reporting of studies developing or evaluating (validating) a prediction model using any statistical or machine learning methods (table 2). Box 2 summarises noteworthy additions and changes to TRIPOD 2015. The TRIPOD+AI checklist comprises 27 main items covering the title (item 1), abstract (item 2), introduction (items 3 and 4), methods (items 5-17), open science practices (item 18), patient and public involvement (item 19), results (items 20-24), and discussion (items 25-27). Some items include multiple subitems, totalling 52 checklist subitems. TRIPOD+AI covers studies that describe the development of a prediction model, the evaluation (validation) of prediction model performance, or both. Items denoted D;E apply to all studies, regardless of whether they are developing a prediction model or evaluating the performance of a prediction model (table 2). Items denoted D apply to studies that describe the development of a prediction model, while items denoted E apply to studies that evaluate the performance of a prediction model. For studies both developing and evaluating the performance of a prediction model, all checklist items apply.
A separate checklist for journal or conference abstracts of prediction model studies is included in TRIPOD+AI. This checklist updates the TRIPOD for Abstracts statement,18 reflecting new content and maintaining consistency with TRIPOD+AI (table 3). The recommendations in TRIPOD+AI are for transparently reporting how prediction model research was conducted; they do not prescribe how to develop or evaluate a prediction model. The checklist is not a quality appraisal tool. Readers are referred to PROBAST90 91 and the forthcoming PROBAST+AI79 to assess the quality and risk of bias of prediction models (https://www.probast.org/).

How to use TRIPOD+AI
The TRIPOD+AI checklist supersedes the TRIPOD 2015 checklist, which should no longer be used. For prediction model studies that have accounted for clustering (eg, multiple hospitals, multiple datasets), authors should consult TRIPOD-Cluster for additional reporting recommendations.19 20 The 2015 explanation and elaboration document remains an important resource providing background and examples for most of the TRIPOD+AI reporting items17 (because many items are unchanged or minimally changed), while we produce a detailed and updated document for TRIPOD+AI. We recommend using TRIPOD+AI early in the writing process to ensure that all key details are addressed and reported. An expanded checklist in a bullet point structure has been developed (supplementary table 1) to facilitate implementation of TRIPOD+AI by providing a brief rationale and guidance for each item in the checklist.
Although many of the items in the TRIPOD+AI checklist have a natural order and sequence in a report, some do not. We do not stipulate a structured format or dictate where each individual reporting recommendation should appear in a prediction model report or publication, because this order might also depend on journal formatting policies.
The recommendations contained within TRIPOD+AI are minimum reporting recommendations, and authors may provide additional information. If journal word limits and restrictions on the number of tables and figures in the main body of the manuscript complicate reporting, authors can report and reference some of the requested or additional information in supplementary material. If the information required is already reported in a publicly accessible study protocol, then referring to that document may suffice. If a particular checklist item cannot be discussed in the report because the information is unknown or irrelevant, then this should be acknowledged and clearly stated. Additional files and study materials not included in the supplementary material should be deposited in an openly accessible repository.

Table 3 | Essential items to include for the reporting of prediction model studies in a journal or conference abstract (TRIPOD+AI for Abstracts*)

Title
1 Identify the study as developing or evaluating the performance of a multivariable prediction model, the target population, and the outcome to be predicted
Background
2 Provide a brief explanation of the healthcare context and rationale for developing or evaluating the performance of all models
Objectives
3 Specify the study objectives, including whether the study describes model development, evaluation, or both
Methods
4 Describe the sources of data
5 Describe the eligibility criteria and setting where the data were collected
6 Specify the outcome to be predicted by the model, including the time horizon of predictions in the case of prognostic models
7 Specify the type of model, a summary of the model building steps, and the method for internal validation†
8 Specify the measures used to assess model performance (eg, discrimination, calibration, clinical utility)
Results
9 Report the number of participants and outcome events
10 Summarise the predictors in the final model†
11 Report model performance estimates (with confidence intervals)
Discussion
12 Give an overall interpretation of the main results
Registration
13 Give the registration number and name of the registry or repository

TRIPOD=Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis; AI=artificial intelligence.
*This checklist is based on the TRIPOD for Abstracts statement published in 2020,18 but has been revised and updated for consistency with the TRIPOD+AI statement.
†Relevant only to studies describing the development of a prediction model.
Box 2: Noteworthy changes and additions to TRIPOD 2015
• New checklist of reporting recommendations to cover prediction model studies using any regression or machine learning method (eg, random forests, deep learning), and to harmonise nomenclature between the regression and machine learning communities
• New TRIPOD+AI checklist supersedes the TRIPOD 2015 checklist, which should no longer be used
• Particular emphasis on fairness (box 1) to raise awareness and ensure that reports mention whether specific methods were used to deal with fairness; aspects of fairness are embedded throughout the checklist
• Inclusion of TRIPOD+AI for Abstracts for guidance on reporting abstracts
• Modification of the model performance item, recommending that authors evaluate model performance in key subgroups (eg, sociodemographic)
• Inclusion of a new item on patient and public involvement to raise awareness and prompt authors to provide details on any patient and public involvement during the design, conduct, reporting (and interpretation), and dissemination of the study
• Inclusion of an open science section with subitems on study protocols, registration, data sharing, and code sharing

We recommend that authors submit a completed checklist indicating the page or line where each requested item can be found, to help the editorial and peer review process. A template for the TRIPOD+AI checklist for filling out separately can be found in supplementary table 2 and is available to download from www.tripod-statement.org.
News, announcements, and information relating to TRIPOD+AI can be found on the TRIPOD website (www.tripod-statement.org) and on social media accounts such as X (formerly known as Twitter; @TRIPODStatement). The EQUATOR (Enhancing the Quality and Transparency Of health Research) Network (https://www.equator-network.org/) will also disseminate and promote the TRIPOD+AI statement. Translation of TRIPOD+AI into different languages is welcomed and encouraged; please contact the corresponding author. Translations should use the structured and predefined process that includes the authors of the original publication and receives their approval. The TRIPOD website contains further details on translation (www.tripod-statement.org).

Discussion
TRIPOD+AI has been developed through an international, multistakeholder consensus process. It provides minimum reporting recommendations for studies describing the development or evaluation (validation) of prediction models using any regression or machine learning methods. At the time of guideline development, foundation models and large language models (such as ChatGPT), which are rapidly gaining momentum, were not considered; the TRIPOD+AI guidance is primarily aimed at non-generative models. However, many of the principles are applicable for driving transparency in generative AI studies in health. Periodic updating of TRIPOD+AI will be needed for it to remain relevant and to reflect advancements in AI and machine learning methods, for example, by explicitly considering generative approaches.
TRIPOD+AI was developed by updating TRIPOD 2015, with recommendations informed by systematic reviews of the literature, a Delphi survey, and an online consensus meeting. Reporting the TRIPOD+AI items can help users to understand and appraise the quality of the study methods, increase transparency around the study findings, reduce overinterpretation of those findings, facilitate replication and reproducibility, and aid implementation of the prediction model. The checklist items are minimum reporting recommendations, and authors will typically provide additional details on the data, study design, methods, analysis, results, and discussion.
TRIPOD+AI emphasises fairness issues throughout the checklist; these were lacking or not explicitly stated in TRIPOD 2015.33 Fairness in prediction model research is particularly important in healthcare and has gained prominence as AI and machine learning methods are increasingly used to develop models that assist decision making. Fairness in this context means that prediction models are designed and used in a way that does not adversely discriminate against any particular group of individuals and does not create or exacerbate (and ideally mitigates or reduces) existing inequalities in healthcare provision or patient outcomes.92 One important aspect of fairness is ensuring that the data used to develop or evaluate prediction models are representative and diverse, and that limitations of data bias are acknowledged, dealt with, and mitigated during model development. The STANDING Together initiative is in the process of developing standards for data diversity, inclusivity, and generalisability to tackle bias in AI health datasets.62 Data should ideally include information from individuals of different ages, sexes/genders, and races/ethnicities, with different health conditions or comorbidities, and from different geographical locations. These differences should be representative of the population in whom the prediction model is intended to be used. If the data used to develop the models do not adequately represent the full diversity of the intended use population, the resulting model might not perform as expected in those missing from the data, which should be clearly stated. If the data used to evaluate a model are not representative of the target population, then the estimated predictive accuracy in subgroups (eg, defined by relevant personal, social, or clinical attributes) could be biased and misleading.
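For instance, checking discrimination within subgroups can be sketched in a few lines of Python with pandas and scikit-learn; the dataframe, column names, and values below are hypothetical assumptions for illustration only.

```python
import pandas as pd
from sklearn.metrics import roc_auc_score

# Hypothetical evaluation data: observed outcomes, model estimated risks,
# and a sociodemographic attribute defining subgroups
eval_df = pd.DataFrame({
    "outcome": [0, 1, 1, 0, 1, 0, 1, 0],
    "risk":    [0.2, 0.7, 0.8, 0.4, 0.6, 0.1, 0.9, 0.3],
    "group":   ["A", "A", "A", "A", "B", "B", "B", "B"],
})

# Discrimination within each subgroup; large differences can flag performance
# that is misleading for groups under-represented in the data
for group, sub in eval_df.groupby("group"):
    print(group, roc_auc_score(sub["outcome"], sub["risk"]))
```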
Fairness in healthcare also means involving diverse stakeholders, including patients, the general public, and clinicians, in the development, evaluation, implementation, and deployment of a prediction model into the clinical pathway.94 Involving a variety of perspectives will help to ensure that the prediction model is, in principle, designed to meet the needs of all individuals and is used in a way that is fair and equitable, promoting health equity. TRIPOD+AI includes item 19 on patient and public involvement to incentivise the integration of patient involvement in prediction model studies beyond a mere tick box exercise, to encourage and promote the principles of open science and engagement, and to ensure better clinical and public acceptability of the work.
TRIPOD+AI prominently features open science practices.35 By registering research and making study materials such as protocols, data, code, and the prediction model openly available, other researchers can verify the findings, evaluate model performance in new data to ensure that models are accurate, and evaluate models for safety. Open science practices also enable researchers to build on each other's work, leading to more efficient progress in healthcare. These practices can have a considerable impact on patient outcomes by improving the accuracy, integrity, and reliability of prediction models. If data are openly shared, clinicians and researchers can develop or evaluate models on larger and more diverse sets of patient data, potentially leading to more accurate predictions and better informed decisions for patient healthcare. Therefore, TRIPOD+AI includes a section on open science, covering issues such as funding declarations (item 18a), conflicts of interest (18b), protocol availability (18c), study registration (18d), data sharing (18e), and code sharing (18f).
We anticipate that the key users and beneficiaries of TRIPOD+AI will be researchers writing papers, journal editors and peer reviewers who evaluate research papers, and other stakeholders (eg, academic institutions, policy makers, funders, regulators, patients, study participants, and the broader public) who will benefit from the increased quality of prediction model research (table 4). The guideline is relevant for any reports of clinical prediction model development and validation studies, including medical research articles and other settings where evidenced reports are needed, for example, to accompany software and tools. We encourage editors and publishers to support adherence to TRIPOD+AI by referring to it in journals' instructions to authors, enforcing its use during the submission and peer review process, and making adherence to the recommendations an expectation. We also encourage funders to require that funding applications for prediction model studies include a plan to report the prediction model according to the TRIPOD+AI recommendations, thereby minimising avoidable research waste.

Table 1 | Reporting guidelines for healthcare studies using machine learning

DECIDE-AI: Early stage clinical evaluation (including safety, human factors evaluation) of decision support systems driven by artificial intelligence 69
CHEERS-AI: Studies describing health economic evaluations to estimate the value for money (cost effectiveness) of artificial intelligence interventions 70
SPIRIT-AI: Protocols for clinical trials evaluating an intervention with an artificial intelligence component 71
CONSORT-AI: Clinical trial reports evaluating an intervention with an artificial intelligence component 72
PRISMA-AI: Systematic reviews and meta-analyses of artificial intelligence interventions (in preparation) 73

STARD=Standards for Reporting of Diagnostic Accuracy; TRIPOD=Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis; AI=artificial intelligence; CLAIM=Checklist for Artificial Intelligence in Medical Imaging; DECIDE=Decisions in health Care to Introduce or Diffuse innovations using Evidence; CHEERS=Consolidated Health Economic Evaluation Reporting Standards; SPIRIT=Standard Protocol Items: Recommendations for Interventional Trials; CONSORT=Consolidated Standards of Reporting Trials; PRISMA=Preferred Reporting Items for Systematic Reviews and Meta-Analyses.

Table 2 | TRIPOD+AI checklist for the reporting of prediction model studies

Section/topic | Item | Development/evaluation* | Checklist item
… Describe if and how any heterogeneity in estimates of model parameter values and model performance was handled and quantified across clusters (eg, hospitals, countries). See TRIPOD-Cluster for additional considerations‡
12e D;E Specify all measures and plots used (and their rationale) to evaluate model performance (eg, discrimination, calibration, clinical utility) and, if relevant, to compare multiple models
12f E Describe any model updating (eg, recalibration) arising from the model evaluation, either overall or for particular sociodemographic groups or settings
12g E For model evaluation, describe how the model predictions were calculated (eg, formula, code, object, application programming interface)
(Continued)

Table 2 | Continued

Section/topic | Item | Development/evaluation* | Checklist item
… Describe the flow of participants through the study, including the number of participants with and without the outcome and, if applicable, a summary of the follow-up time. A diagram may be helpful
20b D;E Report the characteristics overall and, where applicable, for each data source or setting, including the key dates, key predictors (including demographics), treatments received, sample size, number of outcome events, follow-up time, and amount of missing data. A table may be helpful. Report any differences across key demographic groups
20c E For model evaluation, show a comparison with the development data of the distribution of important predictors (demographics, predictors, and outcome)
… Discuss any next steps for future research, with a specific view to applicability and generalisability of the model

TRIPOD=Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis; AI=artificial intelligence.
*D=items relevant only to the development of a prediction model; E=items relating solely to the evaluation of a prediction model; D;E=items applicable to both the development and evaluation of a prediction model.
†Separately for all model building approaches.
‡TRIPOD-Cluster is a checklist of reporting recommendations for studies developing or validating models that explicitly account for clustering or explore heterogeneity in model performance (eg, at different hospitals or centres).19 20
§Relates to the analysis code, for example, any data cleaning, feature engineering, model building, and evaluation.
¶Relates to the code to implement the model to get estimates of risk for a new individual.

Table 4 | Adherence to the TRIPOD+AI reporting guideline: potential benefits from stakeholders' actions

Academic institutions
Actions: Enhance a culture of transparency in the design, analysis, and reporting of prediction model research; provide training for early career researchers on the importance and benefits of transparent and complete reporting, including requiring doctoral students to write their thesis and manuscripts in accordance with the full TRIPOD+AI guideline
Potential benefits: Improve the quality, accountability, reproducibility, replicability, and usefulness of produced research

Researchers
Action: Adhere to TRIPOD+AI when writing studies for publication
Potential benefits: Improved completeness and quality of reporting; increased awareness of the minimal detail required and expected when writing a prediction model publication; improved quality, accountability, reproducibility, replicability, and usefulness of produced research; improved reporting of details that facilitate independent evaluation of the model

Journal editors
Actions: Require and enforce authors to use TRIPOD+AI and submit a completed checklist with the manuscript; recommend that peer reviewers use TRIPOD+AI
Potential benefits: Improved understanding of journal requirements and expectations for prediction model publications; increased efficiency of peer review resulting from improved author understanding of journal requirements; improved quality, accountability, reproducibility, replicability, and usefulness of published research

Peer reviewers
Action: Use TRIPOD+AI to evaluate completeness of reporting
Potential benefits: Improve the efficiency and quality of peer review; facilitate and direct specific feedback to authors on where important details are missing

Funders
Action: Recommend or mandate use of TRIPOD+AI by investigators when receiving a grant for prediction model research
Potential benefits: Increase the usefulness of research outputs; reduce avoidable research waste due to incomplete reporting; ensure that funded research can be used by others

Patients, public, and study participants
Action: Advocate use of TRIPOD+AI by authors, peer reviewers, journals, and funders
Potential benefits: Improved trust in research findings; improved understanding of prediction model research; promote health equity considerations in research; align patient reported outcomes and patient experience with clinical research outcomes for precision medicine and personalised disease management

Systematic reviewers and meta-researchers
Actions: Use TRIPOD+AI to assess completeness of reporting; use TRIPOD+AI as an aid when assessing quality and risks of bias
Potential benefits: Improved evaluation of study quality when used alongside risk of bias tools (eg, PROBAST); improved availability of data needed for meta-analysis

Policy makers
Action: Use or promote TRIPOD+AI to ensure research is transparently and completely reported
Potential benefits: Ensure decisions to evaluate or implement a prediction model are based on complete and transparently reported information; add integrity for evidence based policy recommendations

Regulators
Action: Clinical reviewers use TRIPOD+AI to assess completeness of clinical investigation reporting for "software as a medical device" regulatory submissions where the operating principle of the product is a prediction model
Potential benefits: Align reported intended use with regulatory intended purpose; align medical device regulatory review and pivotal investigational reporting with best practice; encourage manufacturers to publish clinical investigation reports by encouraging one common standard

Technology and medical device manufacturers
Action: Verify whether sufficient details about a model are available to enable development and manufacturing of technology and devices
Potential benefit: Encourages manufacturers to publish clinical investigation reports by encouraging one common standard

Healthcare professionals
Action: Verify whether sufficient details about a model are available before purchasing or using a model to support clinical use
Potential benefits: Improved understanding of the target population of a model and the clinical decision it is intended to support; improved understanding of model predictions and awareness of limitations; improved trust in research findings

TRIPOD=Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis; AI=artificial intelligence.