Abstract
This paper addresses real-word spell checking using context words and n-gram language models. A corpus covering a range of Arabic topics is collected. Real-word errors are normally addressed with a collection of confusion sets; twenty-eight confusion sets are chosen for our experiments. These sets were drawn from the words most commonly confused by non-native Arabic speakers and from words misrecognized by OCR. The probabilities of the context words of the confusion sets are estimated using a window-based technique. N-gram language models are used to detect real-word errors and to choose the best correction for each error found. An automatic context-sensitive spell-checking prototype that detects and corrects real-word errors in Arabic text is implemented. The experimental results showed promising correction accuracy.
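To make the n-gram step concrete, here is a minimal sketch of confusion-set correction with a bigram language model. It is not the authors' implementation: the add-one smoothing, the function names (`train_bigram`, `log_prob`, `correct`), and the toy English corpus and confusion set are all assumptions chosen for illustration. The paper's own confusion sets are Arabic, and its window-based context-word estimation is not shown.

```python
from collections import Counter
import math

def train_bigram(sentences):
    """Count unigrams and bigrams over tokenized sentences (hypothetical helper)."""
    unigrams, bigrams = Counter(), Counter()
    for tokens in sentences:
        padded = ["<s>"] + tokens + ["</s>"]
        unigrams.update(padded)
        bigrams.update(zip(padded, padded[1:]))
    return unigrams, bigrams

def log_prob(tokens, unigrams, bigrams):
    """Bigram log-probability with add-one smoothing (an assumed smoothing choice)."""
    vocab = len(unigrams)
    padded = ["<s>"] + tokens + ["</s>"]
    return sum(
        math.log((bigrams[(a, b)] + 1) / (unigrams[a] + vocab))
        for a, b in zip(padded, padded[1:])
    )

def correct(tokens, confusion_sets, unigrams, bigrams):
    """Replace each confusable word with the set member that scores highest in context."""
    result = list(tokens)
    for i, word in enumerate(tokens):
        for conf_set in confusion_sets:
            if word in conf_set:
                # sorted() makes tie-breaking deterministic
                result[i] = max(
                    sorted(conf_set),
                    key=lambda c: log_prob(
                        result[:i] + [c] + result[i + 1:], unigrams, bigrams
                    ),
                )
    return result

# Toy English data standing in for the Arabic corpus and confusion sets.
corpus = [
    "a piece of cake".split(),
    "a piece of paper".split(),
    "peace and quiet".split(),
    "war and peace".split(),
]
confusion_sets = [{"piece", "peace"}]
unigrams, bigrams = train_bigram(corpus)

print(correct("a peace of cake".split(), confusion_sets, unigrams, bigrams))
# → ['a', 'piece', 'of', 'cake']
```

In this sketch a word counts as a detected error whenever another member of its confusion set yields a higher sentence score; a practical system would likely use a larger corpus, higher-order n-grams, and a threshold before changing a word.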
| Original language | English |
|---|---|
| Title of host publication | Proceedings - 2013 Taibah University International Conference on Advances in Information Technology for the Holy Quran and Its Sciences, NOORIC 2013 |
| Publisher | Institute of Electrical and Electronics Engineers Inc. |
| Pages | 258-263 |
| Number of pages | 6 |
| ISBN (Electronic) | 9781479928231 |
| State | Published - 25 Sep 2015 |
Publication series
| Name | Proceedings - 2013 Taibah University International Conference on Advances in Information Technology for the Holy Quran and Its Sciences, NOORIC 2013 |
|---|---|
Bibliographical note
Publisher Copyright: © 2015 IEEE.
Keywords
- confusion sets
- n-gram language models
- real-word errors
- spell checking
ASJC Scopus subject areas
- Information Systems