Artificial intelligence models that analyse medical images have the potential to help doctors make faster, more accurate diagnoses while also reducing their workload, letting physicians concentrate on critical cases and delegate routine reads to AI.
Problematic, however, are AI models that offer no insight into how or why a diagnosis was reached. This opaque reasoning, often called “black box” AI, can erode clinicians’ confidence in a tool’s reliability and discourage its use. The lack of transparency can also lead doctors to misinterpret what the tool’s output actually means.
In the field of medical imaging, saliency methods have been used to make AI models more interpretable and to demystify their decision-making. These techniques produce heat maps intended to show whether a tool is focusing on the clinically relevant parts of an image or on irrelevant regions.
Heat maps work by highlighting the regions of an image that most influenced the AI model’s interpretation. This can help physicians judge whether the model is attending to the same areas they would, or whether it is erroneously fixating on irrelevant parts of the image.
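As an illustration, one simple saliency technique computes the gradient of a classifier’s predicted-class score with respect to the input pixels and uses the gradient magnitude as the heat map. The sketch below uses a generic PyTorch image classifier and a random placeholder input as stand-ins; it is not one of the specific methods or models benchmarked in the study.

```python
# Minimal sketch of gradient-based saliency. The classifier (a torchvision
# DenseNet-121) and the random "image" are placeholders, not the models or
# data used in the study.
import torch
import torchvision.models as models

model = models.densenet121(weights="DEFAULT")
model.eval()

# Placeholder standing in for a preprocessed chest X-ray, shape (1, 3, H, W).
image = torch.rand(1, 3, 224, 224, requires_grad=True)

# Forward pass: take the score of the top predicted class.
logits = model(image)
top_class = logits.argmax(dim=1).item()
score = logits[0, top_class]

# Backward pass: gradient of that score with respect to the input pixels.
score.backward()

# Per-pixel gradient magnitude, collapsed over colour channels -> (H, W).
saliency = image.grad.detach().abs().max(dim=1).values.squeeze(0)
# Overlaying this map on the X-ray gives the heat map a clinician would
# inspect: brighter pixels are the ones that most influenced the prediction.
```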
Despite their promise, a new study published in Nature Machine Intelligence on October 10 indicates that saliency heat maps may not be ready for prime time.
Pranav Rajpurkar of Harvard Medical School, Matthew Lungren of Stanford, and Adriel Saporta of New York University quantified the validity of seven widely used saliency methods, assessing how reliably and accurately they could localise pathologies associated with 10 conditions commonly diagnosed on chest X-rays, including lung lesions, pleural effusion, oedema, and enlarged heart structures. To gauge performance, the researchers compared the tools’ localisations against expert human judgement.
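One common way to score this kind of localisation agreement is intersection-over-union (IoU) between a thresholded heat map and an expert-drawn region. The sketch below is meant only to illustrate the general idea of benchmarking a saliency map against human annotations; it is not necessarily the exact metric or pipeline used in the paper.

```python
# Hedged sketch: scoring a saliency heat map against an expert-drawn region
# with intersection-over-union (IoU). Illustrative only.
import numpy as np

def saliency_iou(saliency, expert_mask, threshold=0.5):
    """IoU between the high-saliency region and a binary expert annotation."""
    # Normalise the heat map to [0, 1], then keep only the strongest pixels.
    s = (saliency - saliency.min()) / (saliency.max() - saliency.min() + 1e-8)
    predicted_region = s >= threshold
    intersection = np.logical_and(predicted_region, expert_mask).sum()
    union = np.logical_or(predicted_region, expert_mask).sum()
    return float(intersection) / float(union) if union > 0 else 0.0

# Toy usage with random placeholders for a real heat map and annotation.
heatmap = np.random.rand(224, 224)
expert_mask = np.zeros((224, 224), dtype=bool)
expert_mask[80:140, 90:160] = True  # hypothetical radiologist-drawn region
print(f"IoU vs. expert annotation: {saliency_iou(heatmap, expert_mask):.3f}")
```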
In the final analysis, the saliency-based heat maps consistently underperformed human radiologists at assessing the images and pinpointing the problematic lesions.
This is the first study to compare the performance of saliency maps against human experts across multiple pathologies on X-ray. The work also provides a granular view of whether and how the pathological features in an image affect how well the AI tools perform.
Clinical workflows that use AI-based computer-aided detection, such as reading chest X-rays, currently rely on saliency maps as a quality-assurance tool. In light of these findings, however, the researchers say that feature should be applied with caution and a healthy dose of scepticism.
Given the significant limitations uncovered in the study, the researchers argue that saliency-based heat maps need to be refined before they are widely adopted in clinical AI models.
The team’s code, data, and analyses are openly available to anyone interested in studying this crucial aspect of clinical machine learning for medical imaging.
Saporta, A., et al. (2022). Benchmarking saliency methods for chest X-ray interpretation. Nature Machine Intelligence. https://doi.org/10.1038/s42256-022-00536-x