Automatic dottization of Arabic text (Rasms) using deep recurrent neural networks

  • Zainab Alhathloul
  • Irfan Ahmad*
  • *Corresponding author for this work

Research output: Contribution to journal, Article, peer-review

Abstract

Arabic letters in their early stages were only shapes (Rasm) without dots. Dots were added later to ease reading and reduce ambiguity. Thereafter, diacritics were introduced for phonetic guidance, mainly for nonnative speakers. Many studies have been conducted to automatically diacritize Arabic texts using machine learning techniques. However, to the best of our knowledge, automatically adding dots to Arabic Rasms has not been reported in the literature. In this work, we present the automatic addition of dots to Arabic Rasms using deep recurrent neural networks. Different design choices were explored, including the use of character sequences and word sequences as tokens. The presented techniques were evaluated on four diverse publicly available datasets. Character-level models with stacked BiGRU architecture outperformed all the other architectures with character error rates ranging from 2.0% to 5.5% and dottization error rates ranging from 4.2% to 11.0% on independent test sets.
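
As a minimal sketch of two ideas the abstract mentions, the block below derives a dotless (Rasm-like) input from dotted text and computes a character error rate (CER) between a reference and a predicted string. The names `RASM_MAP`, `strip_dots`, `edit_distance`, and `character_error_rate` are hypothetical, and the letter mapping is a partial assumption chosen for illustration; it is not taken from the paper. CER is computed here in the common way, as Levenshtein distance divided by reference length, which may differ from the paper's exact evaluation protocol. The dottization error rate is not sketched because the abstract does not define it.

```python
# Hypothetical, partial mapping from dotted Arabic letters to a dotless
# skeleton (rasm) representative. Assumption for illustration only; the
# mapping used in the paper may differ and would cover more letters.
RASM_MAP = {
    "\u0628": "\u066E",  # beh   -> dotless beh
    "\u062A": "\u066E",  # teh   -> dotless beh
    "\u062B": "\u066E",  # theh  -> dotless beh
    "\u062C": "\u062D",  # jeem  -> hah
    "\u062E": "\u062D",  # khah  -> hah
    "\u0630": "\u062F",  # thal  -> dal
    "\u0632": "\u0631",  # zain  -> reh
    "\u0634": "\u0633",  # sheen -> seen
    "\u0636": "\u0635",  # dad   -> sad
    "\u0638": "\u0637",  # zah   -> tah
    "\u063A": "\u0639",  # ghain -> ain
}


def strip_dots(text: str) -> str:
    """Replace each mapped dotted letter with its dotless counterpart."""
    return "".join(RASM_MAP.get(ch, ch) for ch in text)


def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance (insertions, deletions, substitutions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,           # deletion
                            curr[j - 1] + 1,       # insertion
                            prev[j - 1] + cost))   # substitution or match
        prev = curr
    return prev[-1]


def character_error_rate(ref: str, hyp: str) -> float:
    """Edit distance normalized by the reference length."""
    if not ref:
        raise ValueError("reference must be non-empty")
    return edit_distance(ref, hyp) / len(ref)
```

In a setup like this, `strip_dots` would produce the model input from dotted text, and `character_error_rate` would compare the model's dotted output against the original dotted text.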

Original language: English
Pages (from-to): 47-55
Number of pages: 9
Journal: Pattern Recognition Letters
Volume: 162
State: Published - Oct 2022

Bibliographical note

Publisher Copyright:
© 2022 Elsevier B.V.

Keywords

  • Arabic NLP
  • Bidirectional RNNs
  • Deep learning
  • Dottization of Arabic Rasms

ASJC Scopus subject areas

  • Software
  • Signal Processing
  • Computer Vision and Pattern Recognition
  • Artificial Intelligence
