Enhancing Object Detection for VIPs Using YOLOv4_Resnet101 and Text-to-Speech Conversion Model

  • Tahani Jaser Alahmadi*
  • Atta Ur Rahman
  • Hend Khalid Alkahtani
  • Hisham Kholidy
  • *Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

23 Scopus citations

Abstract

Vision impairment affects an individual’s quality of life, posing challenges for visually impaired people (VIPs) in areas such as object recognition and daily tasks. Previous research has focused on developing visual navigation systems to assist VIPs, but further improvements are needed in accuracy, speed, and coverage of the wider range of object categories that may obstruct VIPs’ daily lives. This study presents a modified YOLOv4 model with a ResNet-101 backbone network (YOLOv4_Resnet101), trained on multiple object classes to assist VIPs in navigating their surroundings. Compared with the Darknet backbone used in YOLOv4, the ResNet-101 backbone in YOLOv4_Resnet101 offers a deeper and more powerful feature extraction network. ResNet-101’s greater capacity enables better representation of complex visual patterns, which increases object detection accuracy. The proposed model is validated using the Microsoft Common Objects in Context (MS COCO) dataset. Image pre-processing techniques are employed to enhance the training process, and manual annotation ensures accurate labeling of all images. The module incorporates text-to-speech conversion, providing VIPs with auditory information to assist in obstacle recognition. The model achieves an accuracy of 96.34% on test images drawn from the dataset after 4000 iterations of training, with a loss error rate of 0.073%.
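
The abstract does not specify how detections are turned into speech. As a rough illustration only, the sketch below shows one way detector output could be condensed into a single sentence before being passed to a text-to-speech engine. The function name `describe_detections`, the `(label, confidence)` input format, and the 0.5 confidence threshold are all assumptions made for this example and are not taken from the paper.

```python
from collections import Counter


def describe_detections(detections, min_confidence=0.5):
    """Build one announcement sentence from (label, confidence) pairs.

    Hypothetical helper: the input format and the threshold are
    assumptions for illustration, not the authors' implementation.
    """
    # Keep only detections at or above the assumed confidence threshold.
    labels = [label for label, conf in detections if conf >= min_confidence]
    if not labels:
        return "No obstacles detected."

    # Counter keeps first-seen order, so the sentence follows detector order.
    counts = Counter(labels)
    parts = [label if n == 1 else f"{label} times {n}"
             for label, n in counts.items()]
    return "Detected: " + ", ".join(parts) + "."


# Example input with made-up confidences; "dog" falls below the threshold.
sample = [("person", 0.91), ("chair", 0.80), ("chair", 0.75), ("dog", 0.30)]
print(describe_detections(sample))
```

The returned string could then be handed to whichever speech engine the system uses. Grouping repeated labels keeps the spoken message short, which may matter when many objects appear in one frame.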

Original language: English
Article number: 77
Journal: Multimodal Technologies and Interaction
Volume: 7
Issue number: 8
State: Published - Aug 2023
Externally published: Yes

Bibliographical note

Publisher Copyright:
© 2023 by the authors.

Keywords

  • YOLOv4_Resnet101
  • disabilities
  • object detection
  • recognition
  • text-to-speech conversion
  • tracking
  • visual impairments

ASJC Scopus subject areas

  • Neuroscience (miscellaneous)
  • Human-Computer Interaction
  • Computer Science Applications
  • Computer Networks and Communications
