Abstract
Twitter has become one of the most popular social networking platforms for sharing activities and opinions. In this study, we explore applying word-embedding-based features with machine-learning techniques to detect Arabic spam tweets. In addition, the effect of the text domain of the corpus used to learn the word embeddings is analyzed. The approach is evaluated on a publicly available dataset of 3503 tweets with three popular binary classifiers: Naïve Bayes, decision trees, and SVM. The experimental results reveal that the proposed method outperforms the baseline approach in distinguishing between machine-generated and human-generated tweets. An accuracy of 87.33% is achieved using the skip-gram word2vec technique with SVM.
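The abstract does not say how word vectors are combined into a single feature vector per tweet. One common option, assumed here purely for illustration, is to average the embeddings of the words in the tweet. The sketch below uses a hypothetical three-dimensional embedding table with placeholder English tokens; in the setting described above, the vectors would instead be learned with skip-gram word2vec on an Arabic corpus, and the resulting tweet vectors would be passed to a classifier such as an SVM.

```python
from typing import Dict, List

# Hypothetical toy embedding table (values invented for illustration only).
# In practice these vectors would be learned, e.g. with skip-gram word2vec.
EMBEDDINGS: Dict[str, List[float]] = {
    "free": [0.9, 0.1, 0.0],
    "offer": [0.7, 0.3, 0.0],
    "hello": [0.0, 0.2, 0.8],
    "friend": [0.2, 0.0, 1.0],
}
DIM = 3  # dimensionality of the toy vectors


def tweet_vector(tokens: List[str]) -> List[float]:
    """Average the embeddings of known tokens; unknown tokens are skipped."""
    known = [EMBEDDINGS[t] for t in tokens if t in EMBEDDINGS]
    if not known:
        # No token has an embedding: fall back to the zero vector.
        return [0.0] * DIM
    return [sum(vec[i] for vec in known) / len(known) for i in range(DIM)]


# "now" has no embedding in the toy table, so it is ignored.
spam_like = tweet_vector(["free", "offer", "now"])
ham_like = tweet_vector(["hello", "friend"])
```

Averaging yields a fixed-length vector regardless of tweet length, which is what classifiers such as Naïve Bayes, decision trees, and SVM expect as input; other pooling schemes are equally possible.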
Original language | English |
---|---|
Title of host publication | 2018 International Conference on Innovation and Intelligence for Informatics, Computing, and Technologies, 3ICT 2018 |
Publisher | Institute of Electrical and Electronics Engineers Inc. |
ISBN (Electronic) | 9781538692073 |
State | Published - Nov 2018 |
Publication series
Name | 2018 International Conference on Innovation and Intelligence for Informatics, Computing, and Technologies, 3ICT 2018 |
---|
Bibliographical note
Publisher Copyright: © 2018 IEEE.
ASJC Scopus subject areas
- Safety, Risk, Reliability and Quality
- Information Systems and Management
- Artificial Intelligence
- Computer Networks and Communications
- Computer Science Applications
- Information Systems