Detection of Arabic spam tweets using word embedding and machine learning

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

18 Scopus citations

Abstract

Twitter has become one of the most popular social networking platforms for sharing activities and opinions. In this study, we explore applying word-embedding-based features with machine-learning techniques to detect Arabic spam tweets. In addition, the effect of the text domain of the corpus used to learn the word embeddings is analyzed. The approach is evaluated on a publicly available dataset of 3503 tweets with three popular binary classifiers: Naïve Bayes, decision trees, and SVM. The experimental results reveal that the proposed method outperforms the baseline approach in distinguishing between machine-generated and human-generated tweets. An accuracy of 87.33% is achieved using the skip-gram word2vec technique with SVM.
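As a rough illustration of what "word embedding based features" can mean, the sketch below turns a tokenized tweet into a fixed-length vector by averaging the vectors of its words. This is only an assumption: the abstract does not state how word vectors are combined into tweet-level features, and mean pooling is simply a common choice. The embedding table `EMBEDDINGS`, its two-dimensional values, and the helper `tweet_vector` are hypothetical and stand in for a skip-gram word2vec model trained on an Arabic corpus.

```python
from typing import Dict, List

# Hypothetical toy embedding table (assumption, not from the paper).
# In practice these vectors would come from a word2vec model
# trained on an Arabic corpus and would have far more dimensions.
EMBEDDINGS: Dict[str, List[float]] = {
    "مجانا": [1.0, 0.0],   # "free"
    "اربح": [0.8, 0.2],    # "win"
    "صباح": [0.0, 1.0],    # "morning"
    "الخير": [0.2, 0.8],   # "goodness"
}
DIM = 2


def tweet_vector(tokens: List[str],
                 embeddings: Dict[str, List[float]] = EMBEDDINGS,
                 dim: int = DIM) -> List[float]:
    """Mean-pool the vectors of known tokens.

    Tokens missing from the embedding table are skipped; a tweet with
    no known tokens maps to the all-zero vector.
    """
    known = [embeddings[t] for t in tokens if t in embeddings]
    if not known:
        return [0.0] * dim
    return [sum(vec[i] for vec in known) / len(known) for i in range(dim)]


# Two toy tweets; the third token of the first one is out of vocabulary.
spam_like = tweet_vector(["اربح", "مجانا", "الآن"])
ham_like = tweet_vector(["صباح", "الخير"])
```

Vectors produced this way could then be passed, together with spam/non-spam labels, to any of the classifiers named in the abstract, such as an SVM.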

Original language: English
Title of host publication: 2018 International Conference on Innovation and Intelligence for Informatics, Computing, and Technologies, 3ICT 2018
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9781538692073
State: Published - Nov 2018

Publication series

Name: 2018 International Conference on Innovation and Intelligence for Informatics, Computing, and Technologies, 3ICT 2018

Bibliographical note

Publisher Copyright:
© 2018 IEEE.

ASJC Scopus subject areas

  • Safety, Risk, Reliability and Quality
  • Information Systems and Management
  • Artificial Intelligence
  • Computer Networks and Communications
  • Computer Science Applications
  • Information Systems
