Abstract
Sentiment analysis, which is crucial for discerning emotional tone in text, relies on manually annotated data to train machine learning models, and manual annotation is considered the gold standard for creating annotated corpora. However, manual annotation is time-consuming, labour-intensive, and prone to bias. This paper proposes an automatic annotation approach for the Twitter-based Arabic Mental Illness (AMI) dataset, which covers both Modern Standard Arabic and Dialectal Arabic. The approach applies transfer learning, using existing manually annotated datasets and three advanced Arabic language models, to automate annotation and thereby enrich Arabic, a low-resource language, with labelled sentiment data. The automatically generated annotations were validated against manual annotations of the same dataset, achieving strong inter-annotator agreement with a Cohen's kappa of κ = 0.8457. Additionally, several baseline models were evaluated on the AMI dataset; AraBERT performed best, with the highest F1 score and accuracy.
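The agreement statistic reported in the abstract can be illustrated with a minimal sketch of Cohen's kappa, computed as (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e is the agreement expected by chance. The function name `cohen_kappa` and the toy labels are hypothetical and are not taken from the paper or the AMI dataset; the authors' actual tooling is not stated in the abstract.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two equal-length sequences of categorical labels."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label by both.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement, from each annotator's marginal label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (p_o - p_e) / (1 - p_e)


# Hypothetical toy labels (assumption: three sentiment classes).
manual = ["pos", "neg", "neg", "neu", "pos", "neg", "pos", "neu"]
automatic = ["pos", "neg", "neu", "neu", "pos", "neg", "neg", "neu"]
kappa = cohen_kappa(manual, automatic)
print(f"kappa = {kappa:.4f}")
```

Here one sequence stands in for the manual annotations and the other for the automatically generated ones. Values close to 1 indicate agreement well beyond chance, and 0 indicates agreement no better than chance.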
| Original language | English |
|---|---|
| Article number | e70128 |
| Journal | Expert Systems |
| Volume | 42 |
| Issue number | 10 |
| State | Published - Oct 2025 |
| Externally published | Yes |
Bibliographical note
Publisher Copyright: © 2025 The Author(s). Expert Systems published by John Wiley & Sons Ltd.
Keywords
- Arabic sentiment analysis
- Corpus
- data annotation
- language models
ASJC Scopus subject areas
- Control and Systems Engineering
- Theoretical Computer Science
- Computational Theory and Mathematics
- Artificial Intelligence
Title
Transfer Learning-Based Automatic Sentiment Annotation of a Twitter-Based Arabic Mental Illness (AMI) Dataset