Handling Imbalanced and Overlapped Medical Datasets: A Comparative Study

  • Mohammad Sarosh Basit*
  • Adeeba Khan
  • Omar Farooq
  • Yusuf Uzzaman Khan
  • Mohammad Shameem

  *Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

9 Scopus citations

Abstract

An imbalanced dataset with class overlap is a challenging problem in medical research. Imbalanced data lead to overfitting to the majority class, while overlapping classes cause misclassification in both classes. Together, these make it difficult for classic machine learning algorithms to define a decision boundary between the minority and majority classes. In our study, we compare different algorithms combined with different techniques, namely oversampling, undersampling, combined over- and undersampling, and ensemble methods, for dealing with class imbalance together with class overlap. Two well-known, highly imbalanced and overlapped medical datasets are used to compare the approaches, and performance is evaluated by sensitivity and specificity. On the sleep apnea dataset, oversampling combined with the AdaBoost ensemble classifier achieved a specificity of 0.72 and a sensitivity of 0.46, which proved better than the other techniques and classifiers. On the diabetes dataset, SMOTE-Tomek oversampling combined with the Random Forest classifier achieved a specificity of 0.91 and a sensitivity of 0.77, which proved better than all the other combinations tried for classification with a minimal number of features.
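As a minimal sketch of the two evaluation metrics named in the abstract, the following Python computes sensitivity and specificity from true and predicted labels. The function name, the toy labels, and the two hypothetical classifiers are illustrative assumptions and do not come from the paper or its datasets. The example shows why plain accuracy can mislead on imbalanced data: always predicting the majority class scores 0.8 accuracy while detecting no minority cases.

```python
def sensitivity_specificity(y_true, y_pred, positive=1):
    """Return (sensitivity, specificity) for binary labels.

    sensitivity = TP / (TP + FN), the recall on the positive class
    specificity = TN / (TN + FP), the recall on the negative class
    """
    tp = fn = tn = fp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            if pred == positive:
                tp += 1
            else:
                fn += 1
        else:
            if pred == positive:
                fp += 1
            else:
                tn += 1
    sens = tp / (tp + fn) if (tp + fn) else float("nan")
    spec = tn / (tn + fp) if (tn + fp) else float("nan")
    return sens, spec


# Hypothetical toy labels: 2 positives (minority), 8 negatives (majority).
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]

# Hypothetical classifier A: always predicts the majority class.
y_majority = [0] * 10

# Hypothetical classifier B: finds one positive, raises two false alarms.
y_balanced = [1, 0, 1, 1, 0, 0, 0, 0, 0, 0]

accuracy_a = sum(t == p for t, p in zip(y_true, y_majority)) / len(y_true)
sens_a, spec_a = sensitivity_specificity(y_true, y_majority)
sens_b, spec_b = sensitivity_specificity(y_true, y_balanced)

print(f"A: accuracy={accuracy_a:.2f} sens={sens_a:.2f} spec={spec_a:.2f}")
print(f"B: sens={sens_b:.2f} spec={spec_b:.2f}")
```

Reporting the two values as a pair, as the abstract does, makes the trade-off between detecting minority cases and avoiding false alarms visible in a way a single accuracy figure does not.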

Original language: English
Title of host publication: 2022 5th International Conference on Multimedia, Signal Processing and Communication Technologies, IMPACT 2022
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9781665476478
State: Published - 2022
Externally published: Yes
Event: 5th International Conference on Multimedia, Signal Processing and Communication Technologies, IMPACT 2022 - Aligarh, India
Duration: 26 Nov 2022 → 27 Nov 2022

Publication series

Name: 2022 5th International Conference on Multimedia, Signal Processing and Communication Technologies, IMPACT 2022

Conference

Conference: 5th International Conference on Multimedia, Signal Processing and Communication Technologies, IMPACT 2022
Country/Territory: India
City: Aligarh
Period: 26/11/22 → 27/11/22

Bibliographical note

Publisher Copyright:
© 2022 IEEE.

UN SDGs

This output contributes to the following UN Sustainable Development Goals (SDGs)

  1. SDG 3 - Good Health and Well-being

Keywords

  • Imbalanced Data
  • Machine Learning
  • Machine Learning Algorithms
  • Medical Dataset
  • Overlapped Data
  • Oversampling
  • SMOTE
  • Undersampling

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Hardware and Architecture
  • Signal Processing
  • Media Technology
  • Instrumentation
