Explainable AI in Feature Selection: Improving Classification Performance on Imbalanced Datasets

  • Shahriar Siddique Ayon
  • Muhammad Ebrahim Hossain
  • Md Saef Ullah Miah
  • M. Mostafizur Rahman
  • Mufti Mahmud*
  • *Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; peer-review

Abstract

Feature selection plays a critical role in enhancing the interpretability and efficacy of machine learning models, particularly in biomedical applications. This work examines the performance of Explainable AI (XAI), Information Gain (IG), and Principal Component Analysis (PCA) methods for feature selection on an imbalanced stroke-prediction dataset. The investigation uses 4,603 patient records from the National Health and Nutrition Examination Survey, of which 362 are stroke cases. Methodologically, IG is used for feature ranking, PCA for dimensionality reduction, and XAI techniques for model transparency. The selected features are used to assess the classification performance of several machine learning models, including Random Forest, Support Vector Machine, k-Nearest Neighbours, and Logistic Regression. Our experimental results show that the combined PCA-IG approach significantly improves classification, achieving an accuracy of 91.75%. Furthermore, feature selection based on Local Interpretable Model-agnostic Explanations (LIME) outperformed the other approaches in precision, recall, and F1 score, and achieved the highest accuracy, 91.86%. LIME identified nine features with a positive impact, highlighting the top contributors in the dataset. We also applied the same feature selection techniques to datasets from other domains. These findings highlight the robustness of the PCA-IG and XAI approaches, each used on its own, for building reliable and understandable machine learning models for healthcare and other applications. By offering insights into the optimal use of PCA, IG, and XAI to enhance the accuracy and practicality of machine learning models in healthcare and other domains, this paper contributes to feature selection research across data analysis more broadly.
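The abstract states that IG is used to rank features. As a minimal sketch, assuming discrete (or already binned) features, the information gain of a feature X with respect to the class label Y is IG(Y; X) = H(Y) - sum over v of P(X = v) * H(Y | X = v), and features can be ranked by this score. The function names, feature names, and toy data below are hypothetical illustrations; they are not taken from the authors' code or from the NHANES stroke records, and continuous attributes would need binning first.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy, in bits, of a sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(feature, labels):
    """IG(Y; X) = H(Y) - sum_v P(X=v) * H(Y | X=v) for a discrete feature X."""
    n = len(labels)
    conditional = 0.0
    for value in set(feature):
        # Labels of the rows where the feature takes this value.
        subset = [y for x, y in zip(feature, labels) if x == value]
        conditional += (len(subset) / n) * entropy(subset)
    return entropy(labels) - conditional


def rank_features(columns, labels):
    """Return (feature_name, IG) pairs sorted from most to least informative."""
    scores = {name: information_gain(col, labels) for name, col in columns.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical, imbalanced toy data: 6 negatives, 2 positives.
    y = [0, 0, 0, 0, 0, 0, 1, 1]
    toy_columns = {
        "informative": [0, 0, 0, 0, 0, 0, 1, 1],  # mirrors the label exactly
        "noise": [0, 1, 0, 1, 0, 1, 0, 1],        # same class mix in every bin
    }
    for name, score in rank_features(toy_columns, y):
        print(f"{name}: {score:.4f}")
```

In this toy case the "informative" column recovers the full label entropy (about 0.8113 bits), while the "noise" column scores zero because each of its values leaves the class distribution unchanged. Library routines such as scikit-learn's `mutual_info_classif` estimate a related quantity and also handle continuous features.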

Original language: English
Title of host publication: Neural Information Processing - 31st International Conference, ICONIP 2024, Proceedings
Editors: Mufti Mahmud, Maryam Doborjeh, Kevin Wong, Andrew Chi Sing Leung, Zohreh Doborjeh, M. Tanveer
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 303-318
Number of pages: 16
ISBN (Print): 9789819666058
State: Published - 2025
Event: 31st International Conference on Neural Information Processing, ICONIP 2024 - Auckland, New Zealand
Duration: 2 Dec 2024 - 6 Dec 2024

Publication series

Name: Lecture Notes in Computer Science
Volume: 15296 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 31st International Conference on Neural Information Processing, ICONIP 2024
Country/Territory: New Zealand
City: Auckland
Period: 2/12/24 - 6/12/24

Bibliographical note

Publisher Copyright:
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2025.

Keywords

  • Principal Component Analysis
  • Classification
  • Explainable AI
  • Feature Selection
  • Imbalanced Data

ASJC Scopus subject areas

  • Theoretical Computer Science
  • General Computer Science
