Statistical comparison of opinion spam detectors in social media with imbalanced datasets

El Sayed M. El-Alfy*, Sadam Al-Azani

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

2 Scopus citations

Abstract

Sentiment analysis is a growing research area that analyzes people’s opinions towards a specific target using posts shared in social media. However, spammers can inject false opinions to skew sentiment-oriented decisions; for example, low-quality products or policies can be promoted over better alternatives. Identifying and removing spam posts is therefore a crucial data-cleaning step for text mining tasks, including sentiment analysis. An inherent problem in spam detection is class imbalance. In this paper, we explore the impact of the imbalance ratio on the performance of Twitter spam detection using multiple single and ensemble classifiers. Besides ensemble-based learning (Bagging and Random Forest), we apply the SMOTE oversampling technique to improve detection performance, especially for classifiers that are sensitive to imbalanced datasets.
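The two ingredients named in the abstract, the imbalance ratio and SMOTE oversampling, can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes that the imbalance ratio is the majority-to-minority count ratio (one common definition), that distances are Euclidean, and that base samples are drawn at random (a common SMOTE variant). The function names `imbalance_ratio` and `smote` are hypothetical. In practice, a library such as imbalanced-learn (`imblearn.over_sampling.SMOTE`) would normally be used instead.

```python
import math
import random


def imbalance_ratio(labels, minority=1):
    """Majority-to-minority count ratio (assumed definition)."""
    n_min = sum(1 for y in labels if y == minority)
    n_maj = len(labels) - n_min
    return n_maj / n_min


def smote(minority_points, n_synthetic, k=5, seed=0):
    """Create synthetic minority samples by interpolating between a
    randomly chosen minority point and one of its k nearest minority
    neighbours (Euclidean distance)."""
    if len(minority_points) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority_points) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority_points))
        x = minority_points[i]
        # k nearest neighbours of x among the other minority samples
        neighbours = sorted(
            (p for j, p in enumerate(minority_points) if j != i),
            key=lambda p: math.dist(x, p),
        )[:k]
        nn = rng.choice(neighbours)
        gap = rng.random()  # uniform in [0, 1): position along the segment x -> nn
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(x, nn)))
    return synthetic
```

For example, a dataset with 90 non-spam and 10 spam posts gives `imbalance_ratio([0] * 90 + [1] * 10) == 9.0`. Generating 80 synthetic spam points would bring the two classes to parity. Because each new point lies on a segment between two real minority samples, SMOTE adds variety to the minority class instead of simply duplicating existing posts.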

Original language: English
Title of host publication: Security in Computing and Communications - 6th International Symposium, SSCC 2018, Revised Selected Papers
Editors: Sabu M. Thampi, Danda B. Rawat, Jose M. Alcaraz Calero, Sanjay Madria, Guojun Wang
Publisher: Springer Verlag
Pages: 157-167
Number of pages: 11
ISBN (Print): 9789811358258
State: Published, 2019

Publication series

Name: Communications in Computer and Information Science
Volume: 969
ISSN (Print): 1865-0929

Bibliographical note

Publisher Copyright:
© Springer Nature Singapore Pte Ltd. 2019.

Keywords

  • Imbalanced dataset
  • Opinion spam detection
  • SMOTE
  • Sentiment analysis
  • Social big data
  • Social media security

ASJC Scopus subject areas

  • General Computer Science
  • General Mathematics
