Robust Benchmark for Propagandist Text Detection and Mining High-Quality Data

  • Pir Noman Ahmad*
  • Yuanchao Liu
  • Gauhar Ali
  • Mudasir Ahmad Wani*
  • Mohammed ElAffendi

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

20 Scopus citations

Abstract

Social media, fake news, and a variety of propaganda strategies have all contributed to an increase in online misinformation over the past ten years. Because high-quality data are scarce, existing datasets cannot be used to train a deep-learning model, making propaganda identification impossible. We take a natural language processing approach to this problem and build a deep-learning system that automatically identifies propaganda in news items. To assist the scholarly community in identifying propaganda in news text, this study proposes the propaganda texts (ProText) library. Truthfulness labels are assigned to ProText repositories after manual and automatic verification with fact-checking methods. In addition, this study proposes a fine-tuned Robustly Optimized BERT Pre-training Approach (RoBERTa) with word embeddings for multi-label, multi-class text classification. Through experimentation and comparative analysis, we address critical issues and work toward solutions. We achieved accuracies of 90%, 75%, 68%, and 65% on ProText, PTC, TSHP-17, and Qprop, respectively. The big-data method, particularly with deep-learning models, can help compensate for unsatisfactory big data in a novel text classification strategy. We urge collaboration to encourage researchers to acquire and exchange datasets and to develop a standard for organizing, labeling, and fact-checking them.

Original language: English
Article number: 2668
Journal: Mathematics
Volume: 11
Issue number: 12
State: Published - Jun 2023
Externally published: Yes

Bibliographical note

Publisher Copyright:
© 2023 by the authors.

Keywords

  • ProText
  • big data
  • fact-check
  • misinformation
  • propaganda
  • social media

ASJC Scopus subject areas

  • Computer Science (miscellaneous)
  • General Mathematics
  • Engineering (miscellaneous)
