
A novel deep semantic- and vision-based self-attention architecture for skin cancer classification

  • Junaid Aftab
  • Muhammad Attique Khan*
  • Sobia Arshad
  • Amir Hussain*
  • Shrooq Alsenan
  • Yongwon Cho
  • Yunyoung Nam*
  • *Corresponding author for this work

Research output: Contribution to journal, Article, peer-review

1 Scopus citation

Abstract

Objectives: Skin cancer is a significant health concern worldwide, and early diagnosis plays a key role in improving patient outcomes. Early detection reduces the death rate, but the complexity of diagnosis means that experts can make incorrect detections and predictions. It is therefore essential to develop a computer-aided diagnostic system, based on deep learning and explainable artificial intelligence (XAI) techniques, that can serve as a second opinion in clinics and help physicians detect and predict this type of cancer more accurately.

Methods: This work proposes a deep learning architecture consisting of two modules: skin lesion segmentation and lesion type classification. The architecture is interpreted using XAI techniques to better evaluate the black-box model. In the segmentation phase, we implemented the DeepLab V3 architecture for semantic segmentation, using ResNet-18 as the backbone and optimizing the hyperparameters with Bayesian Optimization (BO). In the classification phase, we designed a FusedNet architecture called Inverted Self-Attention with Vision Transformer (ISAwViT), which combines an inverted self-attention residual architecture with a vision transformer. The fused network extracts deeper feature information to support accurate prediction at a later stage. The designed model is trained, and in the testing phase the extracted features are classified using Softmax and several other classifiers.

Results: The lesion segmentation and classification experiments were conducted on the HAM10000 dataset. The proposed approach achieved an accuracy of 95.16% for lesion segmentation and 97.5% for lesion classification.

Conclusion: Compared with recent techniques, the proposed model is more effective and efficient. In addition, the model was interpreted using LIME and Grad-CAM, which show how the fused model arrives at correct classifications.
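The abstract mentions Softmax classification of extracted features and reports a segmentation accuracy, without giving implementation details. The following is a minimal, dependency-free sketch of those two generic steps, not the authors' code. It assumes that the segmentation accuracy is a per-pixel accuracy over binary masks, which the abstract does not state. The class list uses the seven HAM10000 diagnostic category codes; the scores, masks, and function names are hypothetical and chosen only for illustration.

```python
import math

# HAM10000 diagnostic category codes (not listed in the abstract itself).
CLASSES = ["akiec", "bcc", "bkl", "df", "mel", "nv", "vasc"]


def softmax(scores):
    """Convert raw class scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict(scores, classes=CLASSES):
    """Return the most probable class label and its probability."""
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    return classes[best], probs[best]


def pixel_accuracy(predicted_mask, true_mask):
    """Fraction of pixels whose predicted label matches the ground truth.

    Assumption: the reported segmentation accuracy is per-pixel accuracy.
    """
    matches = total = 0
    for pred_row, true_row in zip(predicted_mask, true_mask):
        for p, t in zip(pred_row, true_row):
            matches += int(p == t)
            total += 1
    return matches / total


# Hypothetical raw scores for one image, one value per class.
label, prob = predict([0.2, 1.1, 0.3, -0.5, 2.4, 1.9, -1.0])

# Hypothetical 3x3 binary masks (1 = lesion, 0 = background).
pred = [[0, 1, 1], [0, 1, 1], [0, 0, 0]]
true = [[0, 1, 1], [0, 1, 0], [0, 0, 0]]
acc = pixel_accuracy(pred, true)
```

In a full pipeline the scores would come from the final layer of the trained classifier and the masks from the segmentation network; here they are hard-coded so the sketch runs on its own.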

Original language: English
Journal: Digital Health
Volume: 12
State: Published - 1 Jan 2026
Externally published: Yes

Bibliographical note

Publisher Copyright:
© The Author(s) 2026. This article is distributed under the terms of the Creative Commons Attribution-NonCommercial 4.0 License (https://creativecommons.org/licenses/by-nc/4.0/) which permits non-commercial use, reproduction and distribution of the work without further permission provided the original work is attributed as specified on the SAGE and Open Access page (https://us.sagepub.com/en-us/nam/open-access-at-sage).

Keywords

  • digital health
  • interpretation
  • lesion classification
  • lesion segmentation
  • models fusion
  • skin cancer

ASJC Scopus subject areas

  • Health Policy
  • Health Informatics
  • Computer Science Applications
  • Health Information Management
