A Multi-Layer Holistic Approach for Cursive Text Recognition

  • Muhammad Umair*
  • Muhammad Zubair*
  • Farhan Dawood
  • Sarim Ashfaq
  • Muhammad Shahid Bhatti
  • Mohammad Hijji
  • Abid Sohail

* Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

13 Scopus citations

Abstract

Urdu is a widely spoken language in several South Asian countries and in communities worldwide. Its cursive writing style makes Urdu text harder to recognize than text in many other languages. The Urdu script belongs to a non-Latin cursive script family, like Arabic, Hindi and Chinese. Urdu is written in several styles, among which ‘Nastaleeq’ is the most popular and widely used. Localization/detection and recognition of Urdu Nastaleeq text remains an open challenge because the script follows a modified version of the Arabic script. This study presents a methodology to recognize and classify Urdu text in the Nastaleeq font, regardless of the position of the text in the image. The proposed solution follows a two-step methodology. In the first step, text detection is performed using Connected Component Analysis (CCA) and a Long Short-Term Memory (LSTM) neural network. In the second step, a hybrid Convolutional Neural Network and Recurrent Neural Network (CNN-RNN) architecture is used to recognize the detected text. The image containing Urdu text is binarized and segmented into single-line text images, which are fed to the hybrid CNN-RNN model; the model recognizes the text and saves it in a text file. The proposed technique outperforms existing ones, achieving an overall accuracy of 97.47%.
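The binarization, connected-component and line-segmentation stages mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the fixed threshold, the 8-connectivity rule, the grouping of components by vertical overlap, and all function names (`binarize`, `connected_components`, `group_into_lines`) are assumptions chosen for illustration, and the LSTM-based detection and CNN-RNN recognition stages are not covered.

```python
from collections import deque


def binarize(gray, threshold=128):
    """Return a 0/1 grid where 1 marks dark (ink) pixels.

    The fixed threshold is an assumption; the abstract does not say
    which binarization method the authors used.
    """
    return [[1 if px < threshold else 0 for px in row] for row in gray]


def connected_components(binary):
    """Find 8-connected ink regions with a breadth-first flood fill.

    Returns one bounding box (top, left, bottom, right) per region.
    """
    h = len(binary)
    w = len(binary[0]) if h else 0
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y in range(h):
        for x in range(w):
            if binary[y][x] != 1 or seen[y][x]:
                continue
            seen[y][x] = True
            queue = deque([(y, x)])
            top, left, bottom, right = y, x, y, x
            while queue:
                cy, cx = queue.popleft()
                top, bottom = min(top, cy), max(bottom, cy)
                left, right = min(left, cx), max(right, cx)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and binary[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes


def group_into_lines(boxes):
    """Merge boxes whose vertical extents overlap into line-level boxes.

    A deliberate simplification: real Nastaleeq lines slope and carry
    detached dots and diacritics, which need more careful grouping.
    """
    lines = []
    for top, left, bottom, right in sorted(boxes):
        if lines and top <= lines[-1][2]:
            lt, ll, lb, lr = lines[-1]
            lines[-1] = (lt, min(ll, left), max(lb, bottom), max(lr, right))
        else:
            lines.append((top, left, bottom, right))
    return lines


# Toy 6x7 grayscale image: two blobs on the upper line, one on the lower.
W, B = 255, 0
image = [
    [W, W, W, W, W, W, W],
    [W, B, B, W, B, W, W],
    [W, B, W, W, B, B, W],
    [W, W, W, W, W, W, W],
    [W, W, B, B, B, W, W],
    [W, W, W, W, W, W, W],
]
line_boxes = group_into_lines(connected_components(binarize(image)))
print(line_boxes)  # [(1, 1, 2, 5), (4, 2, 4, 4)]
```

Each resulting box could then be cropped from the page image to produce the single-line inputs that a recognizer expects.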

Original language: English
Article number: 12652
Journal: Applied Sciences (Switzerland)
Volume: 12
Issue number: 24
State: Published - Dec 2022
Externally published: Yes

Bibliographical note

Publisher Copyright:
© 2022 by the authors.

Keywords

  • deep learning applications
  • machine learning
  • natural language processing
  • natural language understanding
  • text detection
  • text recognition

ASJC Scopus subject areas

  • General Materials Science
  • Instrumentation
  • General Engineering
  • Process Chemistry and Technology
  • Computer Science Applications
  • Fluid Flow and Transfer Processes
