Comparative Study of Ensemble Learning Techniques for Text Classification

Yusuf Ibrahim, Emmanuel Okafor, Basira Yahaya, Shehu Mohammed Yusuf, Zainab Mukhtar Abubakar, Umar Yusuf Bagaye

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

6 Scopus citations

Abstract

The task of assigning thematic labels to textual data has become increasingly important. This is driven by the large volumes of such data generated in recent times and the need to search these data effectively for meaning using flexible queries. Text classification is therefore a key focus of machine learning applications and research, as it removes the need for manual labelling of text data, which is both time-consuming and tedious. This study comprehensively compares the performance of several ensemble learning techniques with that of base supervised machine learning techniques for text classification. For the experiments, we used two encoding paradigms for feature engineering: Bag-of-Words and Term Frequency-Inverse Document Frequency (TF-IDF). The resulting feature vectors were passed as input to the supervised learning algorithms (ensemble learning methods and others). Each algorithm was trained on the YouTube Spam Collection Dataset using 5-fold and 10-fold cross-validation. The results show that AdaBoost and LightGBM outperform the other approaches under both evaluations. This indicates that some ensemble learning methods often yield better text classification performance than the base techniques examined.
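The two encodings and the k-fold protocol named in the abstract can be illustrated with a minimal, standard-library-only sketch. It is not the authors' implementation: the tokenizer, the unsmoothed weighting idf = ln(N / df), the unshuffled contiguous folds, the function names, and the toy comments are all assumptions made for illustration, and library implementations typically smooth and normalise the weights.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()


def bag_of_words(docs):
    """Return (vocabulary, count vectors) for a list of documents."""
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        vectors.append([counts[term] for term in vocab])
    return vocab, vectors


def tf_idf(count_vectors):
    """Reweight raw counts by idf = ln(N / df), with no smoothing (assumption)."""
    n_docs = len(count_vectors)
    n_terms = len(count_vectors[0])
    # Document frequency: number of documents containing each term.
    df = [sum(1 for vec in count_vectors if vec[j] > 0) for j in range(n_terms)]
    idf = [math.log(n_docs / df_j) for df_j in df]
    return [[vec[j] * idf[j] for j in range(n_terms)] for vec in count_vectors]


def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) for k contiguous folds, without shuffling."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size


# Hypothetical toy comments, not taken from the YouTube Spam Collection Dataset.
docs = [
    "check out my channel",
    "great song love it",
    "subscribe to my channel now",
    "love this song",
    "free gift check my page",
]
vocab, counts = bag_of_words(docs)
weights = tf_idf(counts)
folds = list(k_fold_indices(len(docs), 5))
```

In a full experiment, each `(train_idx, test_idx)` pair would select the rows of `counts` or `weights` used to fit and score one classifier, and the per-fold scores would be averaged.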

Original language: English
Title of host publication: 2021 1st International Conference on Multidisciplinary Engineering and Applied Science, ICMEAS 2021
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9781665434935
State: Published - 2021
Externally published: Yes

Publication series

Name: 2021 1st International Conference on Multidisciplinary Engineering and Applied Science, ICMEAS 2021

Bibliographical note

Publisher Copyright:
© 2021 IEEE.

Keywords

  • Ensemble Learning
  • Machine Learning
  • Natural Language Processing
  • Text Classification

ASJC Scopus subject areas

  • Artificial Intelligence
  • Computer Science Applications
  • Control and Optimization
  • Engineering (miscellaneous)
