Detection and Recognition of Bilingual Urdu and English Text in Natural Scene Images Using a Convolutional Neural Network–Recurrent Neural Network Combination with a Connectionist Temporal Classification Decoder

  • Khadija Tul Kubra
  • Muhammad Umair
  • Muhammad Zubair
  • Muhammad Tahir Naseem*
  • Chan Su Lee*

*Corresponding author for this work

Research output: Contribution to journal, Article, peer-review

Abstract

Urdu and English are widely used for visual text communication in public spaces worldwide, for example on signboards and navigation boards. Text in such natural scenes carries information useful for applications such as language translation for foreign visitors, robot navigation, and autonomous vehicles, which makes extracting it important. Previous studies focused on Urdu alone or on printed text pasted manually onto images, and they lacked datasets large enough for effective model training. Herein, a pipeline for Urdu and English (bilingual) text detection and recognition in complex natural scene images is proposed. Additionally, a unilingual dataset is converted into a bilingual dataset and augmented using various techniques. In the implementation, a customized convolutional neural network (CNN) is used for feature extraction, a recurrent neural network (RNN) is used for feature learning, and connectionist temporal classification (CTC) is employed for text recognition. Experiments are conducted using different RNNs and hidden-unit configurations, and they yield satisfactory results. Ablation studies are performed on the two best models by removing model components. The proposed pipeline is also compared with existing text detection and recognition methods. The proposed models achieve average accuracies of 98.5% for Urdu character recognition, 97.2% for Urdu word recognition, and 99.2% for English character recognition.
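
The CTC recognition step named in the abstract can be illustrated with a minimal best-path (greedy) decoding sketch. This is not the authors' implementation: the function name `ctc_greedy_decode`, the blank index of 0, the toy alphabet, and the per-frame score layout are all assumptions made for illustration. The sketch takes the per-frame class scores that an RNN would emit, picks the highest-scoring class in each frame, merges consecutive repeats, and drops blanks.

```python
from typing import List, Sequence

BLANK = 0  # assumption: class index 0 is reserved for the CTC blank symbol


def ctc_greedy_decode(frame_scores: Sequence[Sequence[float]], alphabet: str) -> str:
    """Best-path CTC decoding: argmax per frame, merge repeats, drop blanks.

    frame_scores[t][k] is the score of class k at time step t.
    alphabet[k - 1] is the character for class k (class 0 is the blank).
    """
    # Step 1: pick the highest-scoring class index in every frame.
    best_path: List[int] = [
        max(range(len(scores)), key=lambda k: scores[k]) for scores in frame_scores
    ]

    # Step 2: collapse consecutive duplicates, then remove blanks.
    decoded: List[str] = []
    previous = BLANK
    for index in best_path:
        if index != BLANK and index != previous:
            decoded.append(alphabet[index - 1])
        previous = index
    return "".join(decoded)


if __name__ == "__main__":
    # Hypothetical scores for 5 frames over the classes [blank, 'a', 'b'].
    frames = [
        [0.1, 0.8, 0.1],    # 'a'
        [0.1, 0.8, 0.1],    # 'a' again (merged with the previous frame)
        [0.9, 0.05, 0.05],  # blank separates the two 'a' characters
        [0.1, 0.7, 0.2],    # 'a'
        [0.1, 0.2, 0.7],    # 'b'
    ]
    print(ctc_greedy_decode(frames, "ab"))
```

The blank symbol is what lets the decoder tell a genuinely doubled character apart from one character spread over several frames, which is why the blank frame in the toy input keeps the two 'a' outputs separate.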

Original language: English
Article number: 5133
Journal: Sensors
Volume: 25
Issue number: 16
State: Published - Aug 2025

Bibliographical note

Publisher Copyright:
© 2025 by the authors.

Keywords

  • bidirectional gated recurrent unit
  • bidirectional long short-term memory
  • bilingual
  • connectionist temporal classification
  • convolutional neural network
  • convolutional recurrent neural network
  • multilingual
  • natural scene images
  • recurrent neural network
  • text recognition

ASJC Scopus subject areas

  • Analytical Chemistry
  • Information Systems
  • Atomic and Molecular Physics, and Optics
  • Biochemistry
  • Instrumentation
  • Electrical and Electronic Engineering
