WASM: A Dataset for Hashtag Recommendation for Arabic Tweets

Maged S. Al-Shaibani, Hamzah Luqman*, Abdulaziz S. Al-Ghofaily, Abdullatif A. Al-Najim

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

2 Scopus citations

Abstract

As one of the largest microblogging websites in the world, Twitter generates a huge amount of information daily. The sheer volume of this data makes it difficult for users to follow and receive information relevant to their interests. Twitter therefore allows users to annotate and categorize their tweets with hashtags. However, finding an appropriate hashtag for a tweet is not always straightforward, and many users disrupt a hashtag's stream by posting content that is irrelevant to its topic. These problems increase the need for hashtag recommendation and classification systems. The topic has received considerable attention for some languages, such as English and Chinese, but has not yet been explored for Arabic owing to the lack of datasets. In this study, we bridge this gap by proposing WASM, an Arabic Twitter hashtag recommendation dataset consisting of more than 100,000 tweets annotated with 87 hashtags. The dataset underwent several rounds of automatic and manual filtering to ensure that it is suitable for tasks related to tweets and hashtags. We also propose three systems for hashtag recommendation and classification, each approaching the task differently: as a classification problem, a generation problem, and a named entity recognition problem, respectively. The results obtained using these systems are promising and can serve as baselines for benchmarking on the WASM dataset. The data and code are available at https://github.com/Hamzah-Luqman/wasm.
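To make the three task framings named in the abstract concrete, the following minimal sketch shows how a single hashtagged tweet could be turned into a classification example, a generation example, and a token-tagging example. It is an illustration only and not the authors' pipeline: the function name `frame_examples`, the `B-HASHTAG`/`O` tag names, and the whitespace tokenization are all assumptions, and the actual WASM file format and preprocessing are described in the repository linked above.

```python
import re

# '#' followed by word characters. In Python 3, \w is Unicode-aware for str
# patterns, so Arabic letters, digits, and underscores are all matched.
HASHTAG_RE = re.compile(r"#(\w+)")


def frame_examples(tweet: str) -> dict:
    """Hypothetical helper: build one training example per task framing.

    Assumes whitespace tokenization and that each hashtag is a single token.
    """
    words, tags, labels = [], [], []
    for token in tweet.split():
        match = HASHTAG_RE.fullmatch(token)
        if match:
            # Hashtag token: keep the word without '#', mark it for tagging.
            words.append(match.group(1))
            tags.append("B-HASHTAG")
            labels.append(match.group(1))
        else:
            words.append(token)
            tags.append("O")

    # Tweet text with hashtag tokens removed, used as the model input for
    # the classification and generation framings.
    plain = " ".join(w for w, t in zip(words, tags) if t == "O")

    return {
        # Classification: text -> one label out of a fixed hashtag set.
        "classification": (plain, labels),
        # Generation: text -> target string containing the hashtag(s).
        "generation": (plain, " ".join("#" + label for label in labels)),
        # Tagging (NER-style): every token gets a tag.
        "tagging": list(zip(words, tags)),
    }


example = frame_examples("مباراة قوية اليوم #الهلال")
print(example["classification"])
print(example["generation"])
print(example["tagging"])
```

In this sketch the classification framing restricts predictions to a closed label set (87 hashtags in WASM), whereas the generation and tagging framings can in principle produce hashtags outside that set.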

Original language: English
Pages (from-to): 12131-12145
Number of pages: 15
Journal: Arabian Journal for Science and Engineering
Volume: 49
Issue number: 9
State: Published - Sep 2024

Bibliographical note

Publisher Copyright:
© King Fahd University of Petroleum & Minerals 2024.

Keywords

  • Arabic Tweets
  • Hashtag Generation
  • Hashtag Recommendation
  • Hashtags
  • Tweets Classification
  • Twitter

ASJC Scopus subject areas

  • General
