AI fails to pass radiology qualifying examination
AI candidate could not outperform radiologists, but further training may improve results, say researchers
Artificial intelligence (AI) is currently unable to pass one of the qualifying radiology examinations, suggesting that this promising technology is not yet ready to replace doctors, finds a study in the Christmas issue of The BMJ.
AI is increasingly being used for some tasks that doctors do, such as interpreting radiographs (x-rays and scans) to help diagnose a range of conditions.
But can AI pass the Fellowship of the Royal College of Radiologists (FRCR) examination, which UK trainees must pass to qualify as radiology consultants?
To find out, researchers compared the performance of a commercially available AI tool with that of 26 radiologists (mostly aged between 31 and 40 years; 62% female), all of whom had passed the FRCR exam the previous year.
They developed 10 ‘mock’ rapid reporting exams, based on the module, one of three that make up the qualifying FRCR examination, that is designed to test candidates for speed and accuracy.
Each mock exam consisted of 30 radiographs at a level of difficulty and breadth of knowledge equal to or higher than that expected for the real FRCR exam. To pass, candidates had to correctly interpret at least 27 (90%) of the 30 images within 35 minutes.
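The pass rule described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the researchers' marking code; the function names `pass_mark` and `passes` are hypothetical, and the 35 minute time limit is not modelled.

```python
IMAGES_PER_EXAM = 30   # radiographs in each mock exam (from the text)
PASS_PERCENT = 90      # minimum percentage correct to pass (from the text)


def pass_mark(n_images=IMAGES_PER_EXAM, pass_percent=PASS_PERCENT):
    """Smallest number of correct answers that meets the pass percentage.

    Integer arithmetic (ceiling division) avoids floating point rounding.
    """
    return (n_images * pass_percent + 99) // 100


def passes(n_correct, n_images=IMAGES_PER_EXAM):
    """Return True if a candidate's score meets the pass mark."""
    return n_correct >= pass_mark(n_images)


print(pass_mark())   # 27
print(passes(27))    # True
print(passes(26))    # False
```

With 30 images, a single additional error (26 correct rather than 27) is enough to fail the exam.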
The AI candidate had been trained to assess chest and bone (musculoskeletal) radiographs for several conditions including fractures, swollen and dislocated joints, and collapsed lungs.
Allowances were made for images relating to body parts that the AI candidate had not been trained in, which were deemed “uninterpretable.”
When uninterpretable images were excluded from the analysis, the AI candidate achieved an average overall accuracy of 79.5% and passed two of 10 mock FRCR exams, while the radiologists achieved an average accuracy of 84.8% and passed four of 10 mock examinations on average.
The sensitivity (ability to correctly identify patients with a condition) for the AI candidate was 83.6% and the specificity (ability to correctly identify patients without a condition) was 75.2%, compared with 84.1% and 87.3%, respectively, across all radiologists.
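For readers unfamiliar with these measures, the standard definitions can be sketched as follows. The counts used in the example are made up for illustration and are not taken from the study; the function names are hypothetical.

```python
def sensitivity(true_positives, false_negatives):
    """Share of abnormal radiographs correctly flagged as abnormal."""
    return true_positives / (true_positives + false_negatives)


def specificity(true_negatives, false_positives):
    """Share of normal radiographs correctly reported as normal."""
    return true_negatives / (true_negatives + false_positives)


# Illustrative counts only (NOT study data): 10 abnormal and 20 normal images.
print(sensitivity(true_positives=8, false_negatives=2))    # 0.8
print(specificity(true_negatives=15, false_positives=5))   # 0.75
```

A lower specificity, as reported for the AI candidate, means more normal images were wrongly called abnormal.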
Across 148 out of 300 radiographs that were correctly interpreted by more than 90% of radiologists, the AI candidate was correct in 134 (91%) and incorrect in the remaining 14 (9%).
In 20 out of 300 radiographs that over half of radiologists interpreted incorrectly, the AI candidate was incorrect in 10 (50%) and correct in the remaining 10.
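The percentages quoted in the two comparisons above follow directly from the reported counts. A minimal sketch of the arithmetic, using only figures from the text (the helper name `percent` is hypothetical):

```python
def percent(part, whole):
    """Percentage of `whole` represented by `part`, rounded to a whole number."""
    return round(100 * part / whole)


# Radiographs that more than 90% of radiologists interpreted correctly (148 of 300)
print(percent(134, 148))  # 91 (AI correct)
print(percent(14, 148))   # 9  (AI incorrect)

# Radiographs that over half of radiologists interpreted incorrectly (20 of 300)
print(percent(10, 20))    # 50 (AI incorrect)
```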
Interestingly, the radiologists slightly overestimated the likely performance of the AI candidate, assuming that it would perform almost as well as themselves on average and outperform them in at least three of the 10 mock exams.
However, this was not the case. The researchers say: “On this occasion, the artificial intelligence candidate was unable to pass any of the 10 mock examinations when marked against similarly strict criteria to its human counterparts, but it could pass two of the mock examinations if special dispensation was made by the RCR to exclude images that it had not been trained on.”
These are observational findings, and the researchers acknowledge that they evaluated only one AI tool and used mock exams that were not timed or supervised, so radiologists may not have felt as much pressure to do their best as they would in a real exam.
Nevertheless, this study is one of the more comprehensive cross comparisons between radiologists and artificial intelligence, providing a broad range of scores and results for analysis.
Further training and revision are strongly recommended, they add, particularly for cases the artificial intelligence considers “non-interpretable,” such as abdominal radiographs and those of the axial skeleton.
AI may facilitate workflows, but human input is still crucial, argue researchers in a linked editorial.
They acknowledge that using artificial intelligence “has untapped potential to further facilitate efficiency and diagnostic accuracy to meet an array of healthcare demands” but say doing so appropriately “implies educating physicians and the public better about the limitations of artificial intelligence and making these more transparent.”
Research on this subject is buzzing, they add, and this study highlights that one foundational aspect of radiology practice, passing the FRCR examination necessary for the licence to practise, still benefits from the human touch.
Notes for editors
Research: Can artificial intelligence pass the fellowship of the Royal College of Radiologists examination? Multi-reader diagnostic accuracy study doi: 10.1136/bmj-2022-072826
Editorial: Robots, radiologists, and results doi: 10.1136/bmj.o2853
Journal: The BMJ
Funding: SCS is funded by a National Institute for Health Research (NIHR) advanced fellowship award (NIHR-301332). JWM is supported by the NIHR Cambridge Biomedical Research Centre (BRC-1215-20014). This article presents independent funded research. The views expressed are those of the authors and not necessarily those of the National Health Service (NHS), NIHR, or Department of Health.
Link to Academy of Medical Sciences press release labelling system:
Externally peer reviewed? Yes (research); No (linked editorial)
Evidence type: Observational; Opinion
Subjects: AI technology and radiologists