Abstract
Feature selection plays a critical role in enhancing the interpretability and efficacy of machine learning models, particularly in biomedical applications. This work examines the performance of Explainable AI (XAI), Information Gain (IG), and Principal Component Analysis (PCA) methods on an imbalanced dataset for stroke prediction. The investigation uses 4,603 patient records, including 362 instances of stroke, from the National Health and Nutrition Examination Survey. Methodologically, IG is used for feature ranking, PCA is used to reduce dimensionality, and XAI techniques are used to improve model transparency. The selected features are used to assess the classification performance of several machine learning models, including Random Forest, Support Vector Machine, k-Nearest Neighbours, and Logistic Regression. Our experimental results show that the combined PCA-IG approach significantly enhances classification accuracy, achieving 91.75%. Furthermore, LIME-based feature selection performed best in precision, recall, and F1 score, and reached the highest accuracy at 91.86%. LIME identified nine features with a positive impact, highlighting the top contributors in the dataset. We also applied the same feature selection technique to datasets from other domains. These findings highlight the robustness of using PCA-IG and XAI approaches separately to create reliable and understandable machine learning models for healthcare and other applications. By offering insights into the optimal use of PCA, IG, and XAI to enhance the accuracy and practicality of machine learning models, this paper contributes to feature selection in healthcare and other domains.
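As an illustration of the IG feature-ranking step mentioned above, the following is a minimal sketch rather than the authors' implementation. It assumes discrete features and the standard definition IG(Y; X) = H(Y) - H(Y | X). The feature names and toy records are hypothetical and are not taken from the study data; only the class counts (362 stroke cases out of 4,603 records) come from the abstract.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of discrete labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(feature, labels):
    """IG(Y; X) = H(Y) - H(Y | X) for a single discrete feature."""
    total = len(labels)
    # Group the labels by the feature value they co-occur with.
    groups = {}
    for x, y in zip(feature, labels):
        groups.setdefault(x, []).append(y)
    # Conditional entropy: group entropies weighted by group size.
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


# Hypothetical toy records (NOT the study data): 2 positives out of 10.
labels = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
features = {
    "hypothetical_risk_flag": [1, 1, 1, 0, 0, 0, 0, 0, 0, 0],
    "hypothetical_noise_flag": [0, 1, 0, 1, 0, 1, 0, 1, 0, 1],
}

# Score every feature, then rank from most to least informative.
scores = {name: information_gain(values, labels)
          for name, values in features.items()}
ranking = sorted(scores, key=scores.get, reverse=True)

# Class-label entropy for the class balance reported in the abstract.
prior_entropy = entropy([1] * 362 + [0] * (4603 - 362))

print(ranking)
print(round(prior_entropy, 3))
```

In this toy example the noise flag splits the positives evenly across its two values, so it carries no information about the label and ranks last. In practice, continuous features would need discretisation or a mutual-information estimator before a ranking of this kind could be computed.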
| Original language | English |
|---|---|
| Title of host publication | Neural Information Processing - 31st International Conference, ICONIP 2024, Proceedings |
| Editors | Mufti Mahmud, Maryam Doborjeh, Kevin Wong, Andrew Chi Sing Leung, Zohreh Doborjeh, M. Tanveer |
| Publisher | Springer Science and Business Media Deutschland GmbH |
| Pages | 303-318 |
| Number of pages | 16 |
| ISBN (Print) | 9789819666058 |
| State | Published - 2025 |
| Event | 31st International Conference on Neural Information Processing, ICONIP 2024 - Auckland, New Zealand. Duration: 2 Dec 2024 → 6 Dec 2024 |
Publication series
| Name | Lecture Notes in Computer Science |
|---|---|
| Volume | 15296 LNCS |
| ISSN (Print) | 0302-9743 |
| ISSN (Electronic) | 1611-3349 |
Conference
| Conference | 31st International Conference on Neural Information Processing, ICONIP 2024 |
|---|---|
| Country/Territory | New Zealand |
| City | Auckland |
| Period | 2/12/24 → 6/12/24 |
Bibliographical note
Publisher Copyright: © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2025.
Keywords
- Principal Component Analysis
- Classification
- Explainable AI
- Feature Selection
- Imbalanced Data
ASJC Scopus subject areas
- Theoretical Computer Science
- General Computer Science