Abstract
Imbalanced datasets with overlapping classes are a challenging problem in medical research. Imbalanced data lead to overfitting to the majority class, while overlapping classes cause misclassification in both classes. Together, these make it difficult for classic machine learning algorithms to define a decision boundary between the minority and majority classes. In this study, different algorithms are compared in combination with different techniques for handling class imbalance together with class overlap: oversampling, undersampling, combined over- and undersampling, and ensemble methods. Two well-known, highly imbalanced and overlapped medical datasets are used to compare the approaches, and performance is evaluated by sensitivity and specificity. On the sleep apnea dataset, oversampling combined with the ensemble classifier AdaBoost achieved a specificity of 0.72 and a sensitivity of 0.46, which proved better than the other techniques and classifiers. On the diabetes dataset, SMOTE-Tomek oversampling combined with the Random Forest classifier achieved a specificity of 0.91 and a sensitivity of 0.77, which proved better than all other combinations tried for classification with a minimal number of features.
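To make the two evaluation metrics and the idea of oversampling concrete, here is a minimal sketch using only the Python standard library. It is not the authors' pipeline: it uses plain random oversampling (duplicating minority rows) rather than SMOTE or SMOTE-Tomek, and the function names `random_oversample` and `sensitivity_specificity`, as well as the toy data, are hypothetical names chosen for illustration.

```python
import random
from collections import Counter


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen rows of each smaller class until every
    class has as many rows as the largest one.

    Illustrative stand-in for the oversampling step; this is plain random
    oversampling, not SMOTE or SMOTE-Tomek.
    """
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        rows = [x for x, lbl in zip(X, y) if lbl == label]
        for _ in range(target - n):
            X_out.append(rng.choice(rows))
            y_out.append(label)
    return X_out, y_out


def sensitivity_specificity(y_true, y_pred, positive=1):
    """Return (sensitivity, specificity) for binary labels.

    sensitivity = TP / (TP + FN), the recall on the positive class
    specificity = TN / (TN + FP), the recall on the negative class
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    sens = tp / (tp + fn) if (tp + fn) else 0.0
    spec = tn / (tn + fp) if (tn + fp) else 0.0
    return sens, spec


if __name__ == "__main__":
    # Toy data (made up): four majority rows, one minority row.
    X = [[0.1], [0.2], [0.3], [0.4], [0.9]]
    y = [0, 0, 0, 0, 1]
    X_bal, y_bal = random_oversample(X, y)
    print(Counter(y_bal))

    # Toy predictions (made up) to show the two metrics.
    y_true = [1, 1, 0, 0, 0, 0, 0, 0]
    y_pred = [1, 0, 0, 0, 0, 0, 1, 0]
    print(sensitivity_specificity(y_true, y_pred))
```

In practice, resampling is normally applied to the training split only, so that sensitivity and specificity are measured on data with the original class distribution.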
| Original language | English |
|---|---|
| Title of host publication | 2022 5th International Conference on Multimedia, Signal Processing and Communication Technologies, IMPACT 2022 |
| Publisher | Institute of Electrical and Electronics Engineers Inc. |
| ISBN (Electronic) | 9781665476478 |
| State | Published - 2022 |
| Externally published | Yes |
| Event | 5th International Conference on Multimedia, Signal Processing and Communication Technologies, IMPACT 2022 - Aligarh, India. Duration: 26 Nov 2022 → 27 Nov 2022 |
Publication series
| Name | 2022 5th International Conference on Multimedia, Signal Processing and Communication Technologies, IMPACT 2022 |
|---|---|
Conference
| Conference | 5th International Conference on Multimedia, Signal Processing and Communication Technologies, IMPACT 2022 |
|---|---|
| Country/Territory | India |
| City | Aligarh |
| Period | 26/11/22 → 27/11/22 |
Bibliographical note
Publisher Copyright: © 2022 IEEE.
UN SDGs
This output contributes to the following UN Sustainable Development Goals (SDGs)
- SDG 3 Good Health and Well-being
Keywords
- Imbalanced Data
- Machine Learning
- Machine Learning Algorithms
- Medical Dataset
- Overlapped Data
- Oversampling
- SMOTE
- Undersampling
ASJC Scopus subject areas
- Computer Networks and Communications
- Hardware and Architecture
- Signal Processing
- Media Technology
- Instrumentation
Fingerprint
Dive into the research topics of 'Handling Imbalanced and Overlapped Medical Datasets: A Comparative Study'. Together they form a unique fingerprint.