KHATT: An open Arabic offline handwritten text database

Sabri A. Mahmoud*, Irfan Ahmad, Wasfi G. Al-Khatib, Mohammad Alshayeb, Mohammad Tanvir Parvez, Volker Märgner, Gernot A. Fink

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

160 Scopus citations

Abstract

A comprehensive Arabic handwritten text database is an essential resource for Arabic handwritten text recognition research. This is especially true due to the lack of such a database for Arabic handwritten text. In this paper, we report our comprehensive Arabic offline Handwritten Text database (KHATT), consisting of 1000 handwritten forms written by 1000 distinct writers from different countries. The forms were scanned at 200, 300, and 600 dpi resolutions. The database contains 2000 randomly selected paragraphs from 46 sources, 2000 minimal-text paragraphs covering all the shapes of Arabic characters, and optionally written paragraphs on open subjects. The 2000 random text paragraphs consist of 9327 lines. The database forms were randomly divided into 70%, 15%, and 15% sets for training, testing, and verification, respectively. This enables researchers to use the database and compare their results. A formal verification procedure is implemented to align the handwritten text with its ground truth at the form, paragraph, and line levels. The verified ground truth database contains metadata describing the written text at the page, paragraph, and line levels in text and XML formats. Tools to extract paragraphs from pages and segment paragraphs into lines were developed. In addition, we present our experimental results on the database using two classifiers, viz. Hidden Markov Models (HMM) and our novel syntactic classifier. The database is made freely available to researchers worldwide for research on various handwriting-related problems such as text recognition, writer identification and verification, form analysis, pre-processing, and segmentation. Several international research groups and researchers have so far acquired the database for use in their research.
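The random 70%/15%/15% division of forms mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' procedure, which is not described on this page. The function name `split_forms`, the integer form IDs 1 to 1000, and the fixed seed are all assumptions made for illustration.

```python
import random


def split_forms(form_ids, seed=0):
    """Shuffle form IDs and cut them into 70% / 15% / 15% subsets.

    Hypothetical helper: the seed and the ID scheme are assumptions,
    not details taken from the KHATT paper.
    """
    ids = list(form_ids)
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    rng.shuffle(ids)

    # Integer arithmetic avoids floating-point rounding at the boundaries.
    n_train = len(ids) * 70 // 100
    n_test = len(ids) * 15 // 100

    train = ids[:n_train]
    test = ids[n_train:n_train + n_test]
    verification = ids[n_train + n_test:]  # whatever remains (about 15%)
    return train, test, verification


# Assumed form IDs 1..1000, one per form.
train, test, verification = split_forms(range(1, 1001))
print(len(train), len(test), len(verification))
```

Splitting at the form level keeps all paragraphs and lines of one form in the same subset. For results comparable with other work, the split distributed with the database would presumably be used rather than a regenerated one.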

Original language: English
Pages (from-to): 1096-1112
Number of pages: 17
Journal: Pattern Recognition
Volume: 47
Issue number: 3
State: Published - Mar 2014

Bibliographical note

Funding Information:
The authors would like to acknowledge the support provided by King Abdul-Aziz City for Science and Technology (KACST) through the Science & Technology Unit at King Fahd University of Petroleum & Minerals (KFUPM) for funding this work through project no. 08-INF99-4 as part of the National Science, Technology and Innovation Plan. In addition, we would like to thank all the writers and persons who contributed to this database. We also thank the anonymous reviewers whose comments have improved this paper.

Keywords

  • Arabic OCR
  • Arabic handwritten text database
  • Document analysis
  • Form processing

ASJC Scopus subject areas

  • Software
  • Signal Processing
  • Computer Vision and Pattern Recognition
  • Artificial Intelligence
