Abstract
Assigning thematic labels to textual data has become increasingly important, driven by the large volumes of such data generated in recent years and the need to search them effectively with flexible queries. Text classification is therefore a key focus of machine learning research and applications, since it replaces manual labelling of text data, which is both time-consuming and tedious. This study comprehensively compares the performance of several ensemble learning techniques with that of base supervised machine learning techniques for text classification. For the experiments, we used two encoding paradigms for feature engineering: Bag-of-Words and Term Frequency-Inverse Document Frequency (TF-IDF). The resulting feature vectors were passed as input to the supervised learning algorithms (ensemble learning methods and others). Each algorithm was trained on the YouTube Spam Collection Dataset and evaluated using 5-fold and 10-fold cross-validation. The results show that AdaBoost and LightGBM outperform the other approaches under both evaluations. This indicates that some ensemble learning methods often yield better text classification performance than the base techniques examined.
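The two encoding paradigms and the k-fold evaluation scheme named in the abstract can be illustrated with a minimal standard-library sketch. Everything in it is an assumption made for illustration: the whitespace tokenizer, the unsmoothed weighting idf = ln(N / df), the unshuffled fold split, and the four toy comments, which are not drawn from the YouTube Spam Collection Dataset. The abstract does not state which TF-IDF variant or which implementation the study used, so the helper names `bag_of_words`, `tf_idf`, and `k_fold_indices` are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def bag_of_words(docs):
    """Return (vocabulary, count vectors) for a list of raw documents."""
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({tok for doc in tokenized for tok in doc})
    vectors = []
    for doc in tokenized:
        counts = Counter(doc)
        vectors.append([counts[term] for term in vocab])
    return vocab, vectors


def tf_idf(count_vectors):
    """Reweight raw counts by idf = ln(N / df), an unsmoothed textbook form."""
    n_docs = len(count_vectors)
    n_terms = len(count_vectors[0])
    # Document frequency: number of documents containing each term.
    df = [sum(1 for vec in count_vectors if vec[j] > 0) for j in range(n_terms)]
    idf = [math.log(n_docs / df[j]) for j in range(n_terms)]
    return [[vec[j] * idf[j] for j in range(n_terms)] for vec in count_vectors]


def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) pairs for plain, unshuffled k-fold CV."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n_samples) if i < start or i >= start + size]
        yield train, test
        start += size


# Hypothetical toy comments, used only to exercise the functions above.
docs = [
    "check out my channel",
    "great song love it",
    "subscribe to my channel now",
    "love this song",
]
vocab, counts = bag_of_words(docs)
weights = tf_idf(counts)
folds = list(k_fold_indices(len(docs), 2))
```

In this sketch, Bag-of-Words keeps raw term counts, while TF-IDF scales each count down for terms that appear in many documents. Either set of vectors could then be fed to a classifier and scored once per fold, with k set to 5 or 10 to mirror the two evaluations described above.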
| Original language | English |
|---|---|
| Title of host publication | 2021 1st International Conference on Multidisciplinary Engineering and Applied Science, ICMEAS 2021 |
| Publisher | Institute of Electrical and Electronics Engineers Inc. |
| ISBN (Electronic) | 9781665434935 |
| DOIs | |
| State | Published - 2021 |
| Externally published | Yes |
Publication series
| Name | 2021 1st International Conference on Multidisciplinary Engineering and Applied Science, ICMEAS 2021 |
|---|---|
Bibliographical note
Publisher Copyright: © 2021 IEEE.
Keywords
- Ensemble Learning
- Machine Learning
- Natural Language Processing
- Text Classification
ASJC Scopus subject areas
- Artificial Intelligence
- Computer Science Applications
- Control and Optimization
- Engineering (miscellaneous)