Abstract
Arabic letters in their early stages were only shapes (Rasm) without dots. Dots were added later to ease reading and reduce ambiguity. Thereafter, diacritics were introduced for phonetic guidance, mainly for nonnative speakers. Many studies have been conducted to automatically diacritize Arabic texts using machine learning techniques. However, to the best of our knowledge, automatically adding dots to Arabic Rasms has not been reported in the literature. In this work, we present the automatic addition of dots to Arabic Rasms using deep recurrent neural networks. Different design choices were explored, including the use of character sequences and word sequences as tokens. The presented techniques were evaluated on four diverse publicly available datasets. Character-level models with stacked BiGRU architecture outperformed all the other architectures with character error rates ranging from 2.0% to 5.5% and dottization error rates ranging from 4.2% to 11.0% on independent test sets.
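As a rough illustration of the task setup described above, the following minimal sketch strips dots from Arabic text to obtain a Rasm-like skeleton and computes a character error rate. It is not the authors' implementation. The mapping `RASM_MAP` is a partial, hypothetical simplification that ignores position-dependent letter shapes, the function names are invented for illustration, and the character error rate is assumed to follow the common definition of edit distance divided by reference length.

```python
# Hypothetical, partial mapping from dotted Arabic letters to dotless
# skeleton code points. Assumption: position-dependent shapes are ignored.
RASM_MAP = {
    "\u0628": "\u066E",  # beh   -> dotless beh
    "\u062A": "\u066E",  # teh   -> dotless beh
    "\u062B": "\u066E",  # theh  -> dotless beh
    "\u062C": "\u062D",  # jeem  -> hah
    "\u062E": "\u062D",  # khah  -> hah
    "\u0630": "\u062F",  # thal  -> dal
    "\u0632": "\u0631",  # zain  -> reh
    "\u0634": "\u0633",  # sheen -> seen
    "\u0636": "\u0635",  # dad   -> sad
    "\u0638": "\u0637",  # zah   -> tah
    "\u063A": "\u0639",  # ghain -> ain
    "\u0641": "\u06A1",  # feh   -> dotless feh
    "\u0642": "\u066F",  # qaf   -> dotless qaf
    "\u0646": "\u06BA",  # noon  -> noon ghunna (dotless)
    "\u064A": "\u0649",  # yeh   -> alef maksura (dotless yeh shape)
}


def to_rasm(text: str) -> str:
    """Replace each dotted letter with its dotless skeleton; keep the rest."""
    return "".join(RASM_MAP.get(ch, ch) for ch in text)


def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings (row-by-row dynamic programming)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def character_error_rate(ref: str, hyp: str) -> float:
    """Assumed definition: edit distance divided by reference length."""
    if not ref:
        raise ValueError("reference must be non-empty")
    return edit_distance(ref, hyp) / len(ref)
```

In this framing, a dottization model would take the output of `to_rasm` as input and be scored against the original dotted text with `character_error_rate`. The dottization error rate reported in the abstract is a separate metric and is not reproduced here.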
| Original language | English |
|---|---|
| Pages (from-to) | 47-55 |
| Number of pages | 9 |
| Journal | Pattern Recognition Letters |
| Volume | 162 |
| State | Published - Oct 2022 |
Bibliographical note
Publisher Copyright: © 2022 Elsevier B.V.
Keywords
- Arabic NLP
- Bidirectional RNNs
- Deep learning
- Dottization of Arabic Rasms
ASJC Scopus subject areas
- Software
- Signal Processing
- Computer Vision and Pattern Recognition
- Artificial Intelligence
Fingerprint
Dive into the research topics of 'Automatic dottization of Arabic text (Rasms) using deep recurrent neural networks'. Together they form a unique fingerprint.