Abstract
With the proliferation of social media and mobile technology, a huge amount of unstructured data is posted online every day. Consequently, sentiment analysis has gained increasing importance as a tool for understanding the opinions of particular groups of people on contemporary political, cultural, social or commercial issues. Compared with research on Western languages, research on sentiment analysis for dialectal Arabic is still in its early stages, and several challenges remain to be addressed. This study has two main goals. First, it compares the performance of core machine learning algorithms for detecting polarity in imbalanced Arabic tweet datasets, using neural word embeddings as the feature extractor rather than hand-crafted or traditional features. Second, it examines how well various oversampling techniques handle the highly imbalanced nature of the sentiment data. An intensive empirical analysis of nine machine learning methods and six oversampling methods was conducted, and the results are discussed in terms of a wide range of performance measures.
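The abstract does not spell out how tweet features are built or which six oversampling methods were compared. Purely as an illustration, the sketch below assumes each tweet is represented by the average of its word vectors, and it balances a binary dataset with SMOTE-style interpolation. SMOTE is one common oversampling technique and is not necessarily among those used in the study. The toy 2-dimensional embeddings, the helper names (`tweet_vector`, `smote_like`, `balance`) and the labels are hypothetical.

```python
import math
import random

# Hypothetical toy embeddings (2-d); a real pipeline would load pre-trained
# Arabic word vectors with hundreds of dimensions.
EMBEDDINGS = {
    "جميل": [0.9, 0.1],  # "beautiful"
    "رائع": [0.8, 0.2],  # "wonderful"
    "سيء": [0.1, 0.9],   # "bad"
    "ممل": [0.2, 0.7],   # "boring"
}
DIM = 2


def tweet_vector(tokens, embeddings, dim):
    """Average the vectors of in-vocabulary tokens; zero vector if none are known."""
    vecs = [embeddings[t] for t in tokens if t in embeddings]
    if not vecs:
        return [0.0] * dim
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]


def smote_like(minority, n_new, k=3, seed=0):
    """Generate n_new synthetic points, each on the segment between a random
    minority sample and one of its k nearest minority neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        others = minority[:i] + minority[i + 1:]
        others.sort(key=lambda m: math.dist(x, m))  # Euclidean distance
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(x, neighbour)])
    return synthetic


def balance(X, y, minority_label, seed=0):
    """Oversample the minority class of a binary dataset up to the majority size."""
    minority = [x for x, label in zip(X, y) if label == minority_label]
    n_majority = len(y) - len(minority)
    n_new = max(0, n_majority - len(minority))
    new = smote_like(minority, n_new, seed=seed) if n_new else []
    return X + new, y + [minority_label] * len(new)


# Hypothetical imbalanced toy data: 2 positive vs 4 negative tweets.
tweets = [
    (["رائع", "جميل"], "pos"),
    (["جميل"], "pos"),
    (["ممل", "سيء"], "neg"),
    (["سيء"], "neg"),
    (["ممل"], "neg"),
    (["سيء", "ممل", "جميل"], "neg"),
]
X = [tweet_vector(tokens, EMBEDDINGS, DIM) for tokens, _ in tweets]
y = [label for _, label in tweets]
X_bal, y_bal = balance(X, y, "pos")
print(y_bal.count("pos"), y_bal.count("neg"))  # 4 4
```

Averaging is only one simple way to turn a variable-length tweet into a fixed-length feature vector for a classifier. In practice, oversampling would be applied to the training split only, so that synthetic points derived from test tweets do not leak into evaluation.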
Original language | English |
---|---|
Pages (from-to) | 6211-6222 |
Number of pages | 12 |
Journal | Journal of Intelligent and Fuzzy Systems |
Volume | 38 |
Issue number | 5 |
State | Published - 2020 |
Bibliographical note
Publisher Copyright: © 2020 IOS Press and the authors. All rights reserved.
Keywords
- Arabic tweets
- social network
- imbalanced dataset
- machine learning
- polarity detection
- sentiment analysis
- word embedding
ASJC Scopus subject areas
- Statistics and Probability
- General Engineering
- Artificial Intelligence