Context-Sensitive Arabic Spell Checker Using Context Words and N-Gram Language Models

Majed M. Al-Jefri, Sabri A. Mahmoud

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

16 Scopus citations

Abstract

This paper addresses real-word spell checking using context words and n-gram language models. A corpus covering a range of Arabic topics is collected. Real-word errors are normally addressed with a collection of confusion sets; twenty-eight confusion sets are chosen for our experiments. These sets were collected from the words most commonly confused by non-native Arabic speakers and from words misrecognized by OCR. The probabilities of the context words of the confusion sets are estimated using a window-based technique. N-gram language models are used to detect real-word errors and to choose the best correction for each error once it is found. An automatic context-sensitive spell checking prototype that detects and corrects real-word errors in Arabic text is implemented. The experimental results showed promising correction accuracy.
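
The two techniques named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy English corpus, the single confusion set, the window size `k=3`, the add-one smoothing, the bigram order, and every function name below are assumptions chosen for illustration, whereas the paper works with an Arabic corpus and 28 confusion sets.

```python
from collections import Counter
import math

# Toy corpus and confusion set: hypothetical stand-ins for the paper's
# Arabic corpus and its 28 confusion sets.
CORPUS = [
    "they signed a peace treaty after the war",
    "the two nations want peace and stability",
    "she ate a piece of cake",
    "he cut a piece of bread",
]
CONFUSION_SETS = [{"peace", "piece"}]

def train_context_counts(sentences, confusion_sets, k=3):
    """Count words seen within +/-k positions of each confusion-set member."""
    members = set().union(*confusion_sets)
    context = {w: Counter() for w in members}
    totals = Counter()
    for sent in sentences:
        toks = sent.split()
        for i, tok in enumerate(toks):
            if tok in members:
                totals[tok] += 1
                context[tok].update(toks[max(0, i - k):i] + toks[i + 1:i + 1 + k])
    return context, totals

def context_logscore(cand, window, context, totals):
    """Naive-Bayes-style log score of a candidate given its window words
    (add-one smoothing is an assumption of this sketch)."""
    vocab = len(set().union(*context.values()))
    denom = sum(context[cand].values()) + vocab
    score = math.log((totals[cand] + 1) / (sum(totals.values()) + len(context)))
    for w in window:
        score += math.log((context[cand][w] + 1) / denom)
    return score

def train_bigrams(sentences):
    """Collect unigram and bigram counts with sentence boundary markers."""
    unigrams, bigrams = Counter(), Counter()
    for sent in sentences:
        toks = ["<s>"] + sent.split() + ["</s>"]
        unigrams.update(toks)
        bigrams.update(zip(toks, toks[1:]))
    return unigrams, bigrams

def bigram_logprob(toks, unigrams, bigrams):
    """Add-one smoothed bigram log-probability of a token sequence."""
    vocab = len(unigrams)
    padded = ["<s>"] + toks + ["</s>"]
    return sum(
        math.log((bigrams[(a, b)] + 1) / (unigrams[a] + vocab))
        for a, b in zip(padded, padded[1:])
    )

def correct(sentence, confusion_sets, score):
    """Replace each confusion-set word with the best-scoring set member."""
    toks = sentence.split()
    for i, tok in enumerate(toks):
        for cset in confusion_sets:
            if tok in cset:
                toks[i] = max(sorted(cset), key=lambda c: score(toks, i, c))
    return " ".join(toks)

context, totals = train_context_counts(CORPUS, CONFUSION_SETS)
unigrams, bigrams = train_bigrams(CORPUS)

def by_context(toks, i, cand, k=3):
    """Score a candidate at position i from its surrounding window words."""
    window = toks[max(0, i - k):i] + toks[i + 1:i + 1 + k]
    return context_logscore(cand, window, context, totals)

def by_bigrams(toks, i, cand):
    """Score a candidate at position i by the whole sentence's bigram score."""
    return bigram_logprob(toks[:i] + [cand] + toks[i + 1:], unigrams, bigrams)

print(correct("she ate a peace of cake", CONFUSION_SETS, by_bigrams))
print(correct("they signed a piece treaty after the war", CONFUSION_SETS, by_context))
```

In this sketch a word counts as a real-word error when another member of its confusion set scores higher in the same position, so detection and correction reduce to a single comparison. Passing the scorer as an argument keeps the context-word and n-gram variants interchangeable.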

Original language: English
Title of host publication: Proceedings - 2013 Taibah University International Conference on Advances in Information Technology for the Holy Quran and Its Sciences, NOORIC 2013
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 258-263
Number of pages: 6
ISBN (Electronic): 9781479928231
State: Published - 25 Sep 2015

Publication series

Name: Proceedings - 2013 Taibah University International Conference on Advances in Information Technology for the Holy Quran and Its Sciences, NOORIC 2013

Bibliographical note

Publisher Copyright:
© 2015 IEEE.

Keywords

  • confusion sets
  • n-gram language models
  • real-word errors
  • spell checking

ASJC Scopus subject areas

  • Information Systems
