Using Word Embedding and Ensemble Learning for Highly Imbalanced Data Sentiment Analysis in Short Arabic Text

Sadam Al-Azani, El Sayed M. El-Alfy*

*Corresponding author for this work

Research output: Contribution to journal › Conference article › peer-review

123 Scopus citations

Abstract

Sentiment analysis has grown in importance with the massive increase of online content. Although several studies have been conducted for Western languages, not much has been done for the Arabic language. The purpose of this study is to compare the performance of different classifiers for polarity determination in highly imbalanced short text datasets using features learned by word embedding rather than hand-crafted features. Several base classifiers and ensembles have been investigated with and without SMOTE (Synthetic Minority Over-sampling Technique). Using a dataset of tweets in dialectal Arabic, the results show that applying word embedding with ensemble and SMOTE can achieve more than 15% improvement on average in F1 score over the baseline. The F1 score, the harmonic mean of precision and recall, is considered a better performance measure than accuracy for imbalanced datasets.
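To illustrate two ideas mentioned in the abstract, the following is a minimal sketch, not the authors' implementation. It shows why accuracy can mislead on a highly imbalanced dataset while the F1 score does not, and how a SMOTE-style synthetic sample can be formed by interpolating between two minority-class feature vectors. The function names, the toy labels, and the two-dimensional vectors are assumptions made for illustration. The neighbour is passed in directly, whereas SMOTE proper selects it from the k nearest minority neighbours.

```python
import random


def precision_recall_f1(y_true, y_pred, positive=1):
    """Return (precision, recall, F1) for the given positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def smote_like_sample(sample, neighbour, rng):
    """Create one synthetic point on the segment between two minority vectors.

    Simplification: the neighbour is supplied by the caller rather than
    chosen among the k nearest minority neighbours.
    """
    gap = rng.random()  # uniform in [0, 1)
    return [x + gap * (y - x) for x, y in zip(sample, neighbour)]


# Hypothetical toy labels: 95 majority (0) and 5 minority (1) examples.
y_true = [0] * 95 + [1] * 5
# A degenerate classifier that always predicts the majority class.
y_majority = [0] * 100

acc = accuracy(y_true, y_majority)
_, _, f1 = precision_recall_f1(y_true, y_majority)

# Hypothetical 2-D embedding vectors for two minority-class tweets.
rng = random.Random(0)
synthetic = smote_like_sample([0.0, 1.0], [1.0, 3.0], rng)
```

On these toy labels the always-majority classifier has high accuracy but an F1 score of zero for the minority class, which is the motivation for reporting F1 on imbalanced data.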

Original language: English
Pages (from-to): 359-366
Number of pages: 8
Journal: Procedia Computer Science
Volume: 109
State: Published - 2017

Bibliographical note

Publisher Copyright:
© 2017 The Authors. Published by Elsevier B.V.

Keywords

  • Arabic sentiment analysis
  • Ensemble learning
  • Imbalanced dataset
  • Polarity classification
  • SMOTE
  • Tweets
  • Word embedding

ASJC Scopus subject areas

  • General Computer Science
