Artificial intelligence versus clinicians: systematic review of design, reporting standards, and claims of deep learning studies
BMJ 2020; 368 doi: https://doi.org/10.1136/bmj.m689 (Published 25 March 2020) Cite this as: BMJ 2020;368:m689
I would like to stress the importance of this review and suggest, as a direction for further research, that other fields of AI in healthcare could and should be assessed as carefully as medical imaging, the field examined here. One example I would like to highlight is the emerging field of self-diagnosis apps that are supposedly driven by machine learning algorithms [1]. These apps are used by patients themselves and could have a substantial impact on public health.
Some manufacturers have begun publishing assessments of such systems [2], yet, as noted in this study, neither has the TRIPOD statement update for the evaluation of machine learning software been released, nor has the FDA updated its regulations to reflect the increasing use of machine learning in medical devices, which includes such software. Without proper attention from policy makers and regulators, public health may suffer from poor quality, opaque algorithms, and the drain of valuable patient data into corporate hands rather than the public domain. As with the chain of adverse events that eventually prompted improvements to pharmaceutical approval procedures [3], rules for the safe, unbiased, and thorough evaluation of such devices would then be created only after the fact.
This review clearly demonstrates the lack of robust data in a field where claims have long been staked by various companies while objective evaluation and regulation lag behind, as seems to be the case in many areas of online commerce. Any responsible physician should therefore advocate for more studies like this one.
1. Ćirković A. AI in self-diagnosis - history, theoretical foundations, potentials and current status. ResearchGate. doi:10.13140/RG.2.2.33461.83684
2. Gilbert S, Mehl A, Baluch A, et al. How accurate are digital symptom assessment apps for suggesting conditions and urgency advice? A clinical vignettes comparison to GPs. BMJ Open 2020;10:e040269. doi:10/ghtkkg
3. Rägo L, Santoso B. Drug regulation: history, present and future. In: Drug benefits and risks: international textbook of clinical pharmacology. 2008;2:65-77.
Competing interests: No competing interests