Abstract
This study introduces a character-level approach designed specifically for Arabic NLP tasks, offering a novel and highly effective solution to the challenges inherent in Arabic language processing. It presents a thorough comparative study of character-level models, including Convolutional Neural Networks (CNNs), pre-trained transformers (CANINE), and Bidirectional Long Short-Term Memory networks (BiLSTMs), assessing their performance and exploring how different data augmentation techniques affect it. It also introduces two Arabic-specific data augmentation methods, vowel deletion and style transfer, and rigorously evaluates their effectiveness. The proposed approach was evaluated on an Arabic privacy policy classification task as a case study, where it achieved a micro-averaged F1-score of 93.8%, surpassing the state of the art. Our code is publicly available at https://github.com/mohanad-hafez/char_models_arabic_nlp.
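For readers unfamiliar with the reported metric, the following is a minimal sketch of how a micro-averaged F1-score is computed: true positives, false positives, and false negatives are pooled across all classes before precision and recall are combined. It is not taken from the linked repository. The function name `micro_f1` and the example labels are hypothetical, and representing each document's labels as a set is an assumption that covers both single-label and multi-label settings.

```python
from typing import Sequence, Set


def micro_f1(gold: Sequence[Set[str]], pred: Sequence[Set[str]]) -> float:
    """Micro-averaged F1 over per-document label sets (illustrative helper)."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same number of documents")

    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        tp += len(g & p)  # labels predicted and correct
        fp += len(p - g)  # labels predicted but not in the gold set
        fn += len(g - p)  # gold labels that were missed

    # F1 = 2*TP / (2*TP + FP + FN), using counts pooled over all classes.
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Hypothetical labels, for illustration only.
gold_labels = [{"data_collection"}, {"data_sharing", "data_security"}, {"data_collection"}]
pred_labels = [{"data_collection"}, {"data_sharing"}, {"data_security"}]
score = micro_f1(gold_labels, pred_labels)
print(f"{score:.4f}")
```

Because the counts are pooled, frequent classes contribute more to the score than rare ones, which is the main difference from macro-averaging.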
| Original language | English |
|---|---|
| Title of host publication | Main Conference |
| Editors | Owen Rambow, Leo Wanner, Marianna Apidianaki, Hend Al-Khalifa, Barbara Di Eugenio, Steven Schockaert |
| Publisher | Association for Computational Linguistics (ACL) |
| Pages | 2744-2757 |
| Number of pages | 14 |
| ISBN (Electronic) | 9798891761964 |
| State | Published - 2025 |
| Event | 31st International Conference on Computational Linguistics, COLING 2025 - Abu Dhabi, United Arab Emirates; Duration: 19 Jan 2025 → 24 Jan 2025 |
Publication series
| Name | Proceedings - International Conference on Computational Linguistics, COLING |
|---|---|
| ISSN (Print) | 2951-2093 |
Conference
| Conference | 31st International Conference on Computational Linguistics, COLING 2025 |
|---|---|
| Country/Territory | United Arab Emirates |
| City | Abu Dhabi |
| Period | 19/01/25 → 24/01/25 |
Bibliographical note
Publisher Copyright: © 2025 Association for Computational Linguistics.
ASJC Scopus subject areas
- Theoretical Computer Science
- Computer Science Applications
- Computational Theory and Mathematics