Scalable parallel SVM on cloud clusters for large datasets classification

  • Md Sarwar M. Haque
  • Ghazanfar Latif
  • Md Rafiul Hasan
  • Md Arifuzzaman
  • Shakib S. Shafin
  • Quazi A. Rahman
Research output: Contribution to conference › Paper › peer-review

Abstract

This paper proposes a new parallel support vector machine (PSVM) that is efficient in terms of time complexity. The support vector machine is a popular classifier for data analysis and pattern classification. However, a stand-alone SVM requires large memory (on the order of 100 GB or more) to process big data (on the order of 1 TB or more). This paper proposes executing SVMs in parallel on several clusters to analyze and classify big data. In this approach, the data are divided into n equal partitions, and each partition is used by an individual cluster to train an SVM. The outcomes of the SVMs executed on the clusters are then combined by another SVM, referred to as the final SVM. The inputs to this final SVM are the support vectors (SVs) of the SVMs executed on the different clusters, while the desired output is the corresponding output of the respective SV. We evaluated the proposed method on high-performance computing (HPC) clusters and Amazon cloud clusters (ACC) using different benchmark datasets. Experimental results show that the proposed method is efficient in terms of training time, with a minimal error rate and memory requirement, compared to the existing stand-alone SVM.
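The partition-train-combine scheme described in the abstract can be sketched as follows. This is a minimal single-machine illustration, not the authors' implementation: the per-partition SVMs are trained sequentially here as a stand-in for separate clusters, and the function name, partition count, and kernel choice are illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC

def parallel_svm(X, y, n_partitions=4, kernel="linear", seed=0):
    """Sketch of the PSVM idea: train one SVM per data partition,
    then train a final SVM on the pooled support vectors."""
    rng = np.random.default_rng(seed)
    # Divide the data into n roughly equal partitions.
    parts = np.array_split(rng.permutation(len(X)), n_partitions)
    sv_X, sv_y = [], []
    for part in parts:
        # In the paper, each partition is trained on a separate cluster.
        clf = SVC(kernel=kernel).fit(X[part], y[part])
        sv = clf.support_  # indices of this partition's support vectors
        sv_X.append(X[part][sv])
        sv_y.append(y[part][sv])
    # The final SVM sees only the pooled support vectors with their labels,
    # which is far less data than the full training set.
    return SVC(kernel=kernel).fit(np.vstack(sv_X), np.concatenate(sv_y))

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
model = parallel_svm(X, y)
print(model.score(X, y))
```

Because only support vectors (a small fraction of each partition) reach the final stage, the memory and time cost of the combining step stays small even when the full dataset is large.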

Original language: English
State: Published - 2019

Bibliographical note

Publisher Copyright:
© 2019 Institution of Engineering and Technology. All rights reserved.

Keywords

  • Amazon Web Services
  • Cloud Computing
  • Cluster Computing
  • Parallel SVM
  • Support Vector Machine

ASJC Scopus subject areas

  • Electrical and Electronic Engineering
