Explainable Disease Classification: Exploring Grad-CAM Analysis of CNNs and ViTs

Ali Alqutayfi*, Wadha Almattar*, Sadam Al-Azani, Fakhri Alam Khan, Abdullah Al Qahtani, Solaiman Alageel, Mohammed Alzahrani

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

Deep learning architectures, such as Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs), are playing an increasingly crucial role in early diagnosis and treatment across medical fields. As these AI models are integrated into clinical practice, the need for explainable AI tools, like Gradient-weighted Class Activation Mapping (Grad-CAM), becomes paramount to building clinician trust and ensuring the reliability of AI-driven diagnoses. However, a gap exists in the literature regarding comprehensive, quantitative, and qualitative comparisons of CNN and ViT performance across diverse medical imaging tasks, particularly those involving variations in object scale. This study compares CNN-based and ViT-based models for two medical imaging tasks: diabetic retinopathy detection from fundus images (small objects) and pneumonia detection from chest X-rays (large objects). We evaluate popular CNN architectures (ResNet, EfficientNet, VGG, Inception) and ViT models (ViT-Base, ViT-Large, ViT-Huge), using both quantitative metrics and expert qualitative assessments. We also analyze Grad-CAM’s effectiveness for visualizing regions of interest in these models. Our results show that ViT-Large outperforms other models on X-rays, while EfficientNet excels on fundus images. However, Grad-CAM struggles to highlight small regions of interest, particularly in diabetic retinopathy, revealing a limitation in current explainable AI methods. This work underscores the need for optimization of explainability tools and contributes to a better understanding of CNN and ViT strengths in medical imaging.
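As context for the analysis described above, the Grad-CAM heatmaps evaluated in this study are computed by weighting a convolutional layer's feature maps by the spatially averaged gradients of the target class score, then applying a ReLU. The following is a minimal, framework-free sketch of that weighting step; the function name `grad_cam` and the plain-list inputs are illustrative assumptions (in practice the activations and gradients would be tensors captured from a trained CNN or ViT via backpropagation hooks), not the authors' implementation.

```python
def grad_cam(activations, gradients):
    """Sketch of the Grad-CAM heatmap computation.

    activations: list of K feature maps A^k, each an H x W list of lists,
                 taken from a chosen convolutional layer.
    gradients:   list of K gradient maps dY_c/dA^k for the target class c,
                 same shape as `activations`.
    Returns an H x W heatmap (list of lists).
    """
    k = len(activations)
    h, w = len(activations[0]), len(activations[0][0])

    # alpha_k: global-average-pool each gradient map over the spatial dims
    alphas = [sum(sum(row) for row in g) / (h * w) for g in gradients]

    # Weighted sum of feature maps, then ReLU to keep only regions
    # with a positive influence on the target class score
    heatmap = [
        [
            max(0.0, sum(alphas[c] * activations[c][i][j] for c in range(k)))
            for j in range(w)
        ]
        for i in range(h)
    ]
    return heatmap
```

Because the heatmap inherits the coarse spatial resolution of the chosen layer, small lesions (as in diabetic retinopathy fundus images) can fall below the resolvable scale, which is consistent with the limitation the abstract reports.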

Original language: English
Pages (from-to): 264-273
Number of pages: 10
Journal: Journal of Advances in Information Technology
Volume: 16
Issue number: 2
DOIs
State: Published - 2025

Bibliographical note

Publisher Copyright:
© 2025 by the authors.

Keywords

  • Convolutional Neural Network (CNN)
  • Gradient-weighted Class Activation Mapping (Grad-CAM)
  • Vision Transformer (ViT)
  • explainable AI
  • medical imaging

ASJC Scopus subject areas

  • Software
  • Information Systems
  • Computer Science Applications
  • Computer Networks and Communications
  • Artificial Intelligence
