A Data-Driven Heart Disease Prediction Model Through K-Means Clustering-Based Anomaly Detection

Rony Chowdhury Ripan, Iqbal H. Sarker*, Syed Md Minhaz Hossain, Md Musfique Anwar, Raza Nowrozy, Mohammed Moshiul Hoque, Md Hasan Furhad

*Corresponding author for this work

Research output: Contribution to journal, Article, peer-review

70 Scopus citations

Abstract

Heart disease, also known as cardiovascular disease, has been the leading cause of death worldwide over the past few decades. For early diagnosis, a data-driven prediction model that considers the associated risk factors of heart disease can play a significant role in the healthcare domain. However, building an effective model with machine learning techniques requires high-quality data, e.g., data without “anomalies” or outliers. This research investigates anomaly detection in the healthcare domain to effectively predict heart disease using the unsupervised K-means clustering algorithm. Our proposed model first determines an optimal value of K using the Silhouette method and forms the clusters used to find the anomalies. We then eliminate the identified anomalies from the data and employ five popular machine learning classification techniques, namely K-nearest neighbor, random forest, support vector machine, naive Bayes, and logistic regression, to build the resultant prediction model. The efficacy of the proposed methodology is demonstrated using a standard heart disease dataset. We also use data plots to check the accuracy of the anomaly detection in our experimental analysis.
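
The anomaly-detection stage described in the abstract (choose K by the Silhouette method, cluster with K-means, drop the flagged records) can be illustrated with a minimal sketch. Several parts are assumptions that do not come from the paper: the abstract does not say how a point inside a cluster is judged anomalous, so the sketch flags points whose distance to their centroid exceeds three times the median distance; the farthest-first seeding, the candidate range for K, the toy two-feature data, and all function names are likewise hypothetical. It uses only the Python standard library (3.8+) to stay self-contained; a real experiment would more likely rely on an established machine learning library.

```python
import math
import statistics

# Toy data (hypothetical, NOT the heart disease dataset): two tight groups
# plus one stray record at index 8.
TOY_POINTS = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (2.0, 2.0),
              (8.0, 8.0), (8.0, 9.0), (9.0, 8.0), (9.0, 9.0),
              (1.5, 5.5)]

def init_centroids(points, k):
    """Deterministic farthest-first seeding (an assumption, for reproducibility)."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    return centroids

def kmeans(points, k, max_iter=100):
    """Plain Lloyd iterations; returns (labels, centroids)."""
    centroids = init_centroids(points, k)
    labels = []
    for _ in range(max_iter):
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        updated = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                updated.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                updated.append(centroids[c])  # keep the centroid of an empty cluster
        if updated == centroids:
            break
        centroids = updated
    return labels, centroids

def silhouette(points, labels):
    """Mean silhouette coefficient; singleton clusters contribute 0."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        return -1.0
    total = 0.0
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            continue
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        b = min(sum(math.dist(p, q) for q in other) / len(other)
                for other_lab, other in clusters.items() if other_lab != lab)
        total += (b - a) / max(a, b)
    return total / len(points)

def detect_anomalies(points, k_values=range(2, 5), factor=3.0):
    """Pick K by silhouette, then flag points far from their centroid.

    The rule `distance > factor * median distance` is an assumed criterion,
    not the one used in the paper.
    """
    runs = {k: kmeans(points, k) for k in k_values}
    best_k = max(runs, key=lambda k: silhouette(points, runs[k][0]))
    labels, centroids = runs[best_k]
    dists = [math.dist(p, centroids[lab]) for p, lab in zip(points, labels)]
    threshold = factor * statistics.median(dists)
    anomaly_idx = [i for i, d in enumerate(dists) if d > threshold]
    cleaned = [p for i, p in enumerate(points) if i not in anomaly_idx]
    return best_k, anomaly_idx, cleaned

if __name__ == "__main__":
    best_k, anomaly_idx, cleaned = detect_anomalies(TOY_POINTS)
    print("best K:", best_k, "anomalies:", anomaly_idx, "rows kept:", len(cleaned))
```

In the pipeline the abstract outlines, the `cleaned` records would then be passed to the classifiers it lists (K-nearest neighbor, random forest, support vector machine, naive Bayes, logistic regression); that step is left out of the sketch.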

Original language: English
Article number: 112
Journal: SN Computer Science
Volume: 2
Issue number: 2
State: Published - Apr 2021
Externally published: Yes

Bibliographical note

Publisher Copyright:
© 2021, The Author(s), under exclusive licence to Springer Nature Singapore Pte Ltd. part of Springer Nature.

Keywords

  • Anomaly detection
  • Healthcare
  • Heart disease prediction
  • K-means clustering

ASJC Scopus subject areas

  • General Computer Science
  • Computer Science Applications
  • Computer Networks and Communications
  • Computer Graphics and Computer-Aided Design
  • Computational Theory and Mathematics
  • Artificial Intelligence
