Abstract
This paper presents a robust lexicon reduction technique for Arabic handwritten text based on segment descriptors. The method segments an Arabic word into graphemes and adaptively generates a descriptor that records the presence or absence of dots in each segment. The segmentation algorithm exploits a characteristic of Arabic script: its characters segment in predictable ways. This in turn yields novel canonical segment descriptors for the lexicon entries. These descriptors are then used for lexicon reduction by a matching algorithm adapted to Arabic handwriting. Unlike the features used by other methods, features based on segment descriptors can be computed for both word images and lexicon entries. Experimental results on the IfN/ENIT database compare favorably with other lexicon reduction approaches.
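The general idea of descriptor-based lexicon reduction can be illustrated with a minimal sketch. It is not the paper's algorithm: it assumes one descriptor symbol per letter rather than per grapheme, a simplified hand-written dot table, and plain edit distance as the matcher. The names `dot_descriptor`, `edit_distance`, `reduce_lexicon`, and `max_dist` are hypothetical, and the descriptor observed from a word image is taken as given.

```python
# Simplified, hypothetical dot table (an assumption, not the paper's canonical descriptors).
DOTS_ABOVE = set("تثخذزشضظغفقنة")  # letters carrying dots above the body
DOTS_BELOW = set("بجي")            # letters carrying dots below (or inside) the body


def dot_descriptor(word):
    """Encode each letter as 'A' (dots above), 'B' (dots below) or 'N' (no dots)."""
    return "".join(
        "A" if ch in DOTS_ABOVE else "B" if ch in DOTS_BELOW else "N"
        for ch in word
    )


def edit_distance(a, b):
    """Standard Levenshtein distance between two descriptor strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(
                prev[j] + 1,               # deletion
                curr[j - 1] + 1,           # insertion
                prev[j - 1] + (ca != cb),  # substitution (free if symbols match)
            ))
        prev = curr
    return prev[-1]


def reduce_lexicon(observed, lexicon, max_dist=1):
    """Keep lexicon entries whose descriptor is within max_dist of the observed one."""
    return [w for w in lexicon if edit_distance(observed, dot_descriptor(w)) <= max_dist]


lexicon = ["كتاب", "باب", "نور", "دار"]
# Descriptor assumed to have been extracted from a handwritten word image.
candidates = reduce_lexicon("NANB", lexicon, max_dist=0)
```

Because the same descriptor can be computed from a lexicon string and, in principle, from a segmented word image, the two can be compared directly. A nonzero `max_dist` leaves room for segmentation or dot-detection errors at the cost of a larger reduced lexicon.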
| Original language | English |
|---|---|
| Article number | 6628817 |
| Pages (from-to) | 1265-1269 |
| Number of pages | 5 |
| Journal | Proceedings of the International Conference on Document Analysis and Recognition, ICDAR |
| State | Published - 2013 |
Keywords
- canonical descriptor
- dot assignment
- lexicon reduction
- segment descriptor
ASJC Scopus subject areas
- Computer Vision and Pattern Recognition