New Method Improves AI’s Ability to Explain Its Predictions
Researchers have developed a new technique that could make artificial intelligence models more transparent, helping users better understand and trust AI decisions in high-stakes fields such as healthcare and autonomous driving.
Scientists from the Massachusetts Institute of Technology created an approach that enables computer vision models to explain their predictions using concepts that humans can easily understand.
The method builds on a technique known as concept bottleneck modeling, which requires an AI system to identify specific concepts before making a final prediction. For example, a medical AI analyzing an image might highlight features such as irregular pigmentation or clustered spots before predicting a disease like melanoma.
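The bottleneck idea can be sketched in a few lines: the final prediction is allowed to see only the concept scores, never the raw image features. This is a minimal illustration, not the authors' code; the concept names, weights, and threshold are invented for the example.

```python
import numpy as np

# Minimal concept-bottleneck sketch (illustrative; concept names and
# weights are invented). The model first scores human-readable concepts,
# and the final prediction may use ONLY those scores -- that restriction
# is the "bottleneck".

CONCEPTS = ["irregular_pigmentation", "clustered_spots", "asymmetry"]

def concept_layer(image_features, concept_weights):
    """Map raw image features to a score per human-readable concept."""
    logits = image_features @ concept_weights
    return 1.0 / (1.0 + np.exp(-logits))  # sigmoid keeps scores in (0, 1)

def predict_from_concepts(concept_scores, head_weights, threshold=0.5):
    """Final prediction sees only the concept scores, not the image."""
    risk = float(concept_scores @ head_weights)
    return "melanoma" if risk > threshold else "benign"

rng = np.random.default_rng(0)
features = rng.normal(size=4)                  # stand-in for CNN features
W_concepts = rng.normal(size=(4, len(CONCEPTS)))
w_head = np.array([0.6, 0.5, 0.4])             # invented head weights

scores = concept_layer(features, W_concepts)
label = predict_from_concepts(scores, w_head)
for name, s in zip(CONCEPTS, scores):
    print(f"{name}: {s:.2f}")
print("prediction:", label)
```

Because a user can inspect the intermediate concept scores, a wrong prediction can be traced back to a wrongly scored concept.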
However, traditional concept bottleneck models rely on concepts defined beforehand by human experts, which may not always match the task the model is performing. This mismatch can reduce accuracy or lead the model to rely on hidden information that users cannot see.
To address this, the researchers developed a system that extracts concepts the AI has already learned during training and converts them into explanations written in plain language.
“In a sense, we want to be able to read the minds of these computer vision models,” said Antonio De Santis, lead author of the study and a graduate student at the Polytechnic University of Milan who conducted the research while visiting MIT’s Computer Science and Artificial Intelligence Laboratory.
The technique uses two specialized machine-learning models. One model identifies the most relevant features the AI learned during training, while a multimodal large language model converts those features into clear text explanations.
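The two-stage pipeline might look roughly like the following sketch. This is a hypothetical illustration, not the study's code: stage one ranks learned features by how strongly they drive the prediction, and the second stage is a stub standing in for the multimodal language model that names each feature.

```python
import numpy as np

# Hypothetical sketch of the two-model pipeline (function names and the
# captioning stub are invented, not the authors' code).

def rank_relevant_features(activations, head_weights, k=3):
    """Stage 1: indices of the k features contributing most to the output."""
    contributions = np.abs(activations * head_weights)
    return np.argsort(contributions)[::-1][:k]

def describe_feature(index):
    """Stage 2 stand-in: a real system would query a multimodal LLM with
    exemplar images that activate this feature; here we return a placeholder."""
    return f"learned feature #{index} (plain-language name from LLM)"

activations = np.array([0.1, 2.0, -1.5, 0.3, 0.9])  # invented activations
weights = np.array([0.5, 0.8, 0.7, 0.1, 0.2])       # invented head weights

top = rank_relevant_features(activations, weights, k=3)
explanations = [describe_feature(i) for i in top]
print(explanations)
```

The key design point is the division of labor: the ranking model decides *which* internal features matter, while the language model decides *what to call* them.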
To make the system more understandable, the researchers also limited the AI to using only five concepts when making a prediction. This forces the model to focus on the most relevant features and produce simpler explanations.
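One simple way to impose such a limit is to keep only the strongest concept scores and zero out the rest before the final prediction. The selection rule below is an assumption for illustration; the paper may enforce sparsity differently.

```python
import numpy as np

# Sketch of a five-concept limit (illustrative; keeping the 5 largest
# scores is an assumed selection rule, not necessarily the authors').

def sparsify_concepts(scores, k=5):
    """Keep only the k largest concept scores; zero the others."""
    keep = np.argsort(scores)[::-1][:k]
    sparse = np.zeros_like(scores)
    sparse[keep] = scores[keep]
    return sparse

scores = np.array([0.9, 0.1, 0.7, 0.05, 0.6, 0.8, 0.2, 0.65])
sparse = sparsify_concepts(scores, k=5)
print(sparse)  # only the 5 strongest concepts remain active
```

With at most five active concepts, every explanation stays short enough for a person to read and check against the image.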
In tests involving tasks such as identifying bird species and detecting skin lesions in medical images, the method produced more accurate predictions and clearer explanations compared with existing concept bottleneck models.
Still, the researchers note that highly accurate “black-box” AI models—systems whose internal decision-making is difficult to interpret—can sometimes outperform explainable models.
In the future, the team plans to further improve the system by addressing issues such as information leakage, where models rely on hidden concepts that users cannot observe.
Experts say the research represents a promising step toward making AI systems more transparent and accountable.
“This work pushes interpretable AI in a very promising direction,” said Andreas Hotho of the University of Würzburg, who was not involved in the research.
The study will be presented at the International Conference on Learning Representations.