Empirical study on imbalanced learning of Arabic sentiment polarity with neural word embedding

El Sayed M. El-Alfy*, Sadam Al-Azani

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

10 Scopus citations

Abstract

With the proliferation of social media and mobile technology, a huge amount of unstructured data is posted online every day. Consequently, sentiment analysis has gained increasing importance as a tool for understanding the opinions of particular groups of people on contemporary political, cultural, social, or commercial issues. Unlike that for Western languages, research on sentiment analysis for dialectal Arabic is still in its early stages, with several challenges yet to be addressed. The goal of this study is twofold. First, it compares the performance of core machine learning algorithms for detecting polarity in imbalanced Arabic tweet datasets, using neural word embedding as a feature extractor rather than hand-crafted or traditional features. Second, it examines the impact of various oversampling techniques for handling the highly imbalanced nature of the sentiment data. An intensive empirical analysis of nine machine learning methods and six oversampling methods has been conducted, and the results are discussed in terms of a wide range of performance measures.

Original language: English
Pages (from-to): 6211-6222
Number of pages: 12
Journal: Journal of Intelligent and Fuzzy Systems
Volume: 38
Issue number: 5
State: Published - 2020

Bibliographical note

Publisher Copyright:
© 2020 - IOS Press and the authors. All rights reserved.

Keywords

  • Arabic tweets
  • social network
  • imbalanced dataset
  • machine learning
  • polarity detection
  • sentiment analysis
  • word embedding

ASJC Scopus subject areas

  • Statistics and Probability
  • General Engineering
  • Artificial Intelligence
