Abstract
In this work, we propose a new technique for visual recognition of sign-language fingerspelling that fuses multiple spatial and spectral representations of manual gesture images using a convolutional neural network. This problem is gaining prominence in communication among hearing-impaired people and in human-machine interaction. The proposed technique computes Gabor spectral representations of spatial images of hand sign gestures and uses an optimized convolutional neural network to classify the gestures in the joint space into their corresponding classes. Several ways of combining the two modalities are explored to identify the model that improves robustness and recognition accuracy. The proposed system is evaluated on three databases (MNIST-ASL, ArSL, and MUASL) under different conditions, and the results outperform state-of-the-art techniques.
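To make the idea of a joint spatial and spectral input concrete, the following is a minimal sketch, not the authors' implementation. It builds a small Gabor filter bank using the standard textbook formula and stacks the filter responses with the original image as extra channels, which is one possible (early-fusion) way to combine the two modalities before a CNN. The function names (`gabor_kernel`, `filter_same`, `fuse_spatial_and_gabor`), the filter parameters, the number of orientations, and the 28x28 placeholder image are all assumptions made for illustration; the abstract does not specify any of them.

```python
import numpy as np

def gabor_kernel(size, sigma, theta, wavelength, gamma=0.5, psi=0.0):
    """Real part of a 2-D Gabor filter (textbook form; parameters are assumed)."""
    half = size // 2  # size is expected to be odd
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    # Rotate the coordinate grid by the filter orientation theta.
    x_r = x * np.cos(theta) + y * np.sin(theta)
    y_r = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-(x_r ** 2 + (gamma * y_r) ** 2) / (2.0 * sigma ** 2))
    carrier = np.cos(2.0 * np.pi * x_r / wavelength + psi)
    return envelope * carrier

def filter_same(image, kernel):
    """Circular convolution via FFT; the output has the same shape as the image."""
    h, w = image.shape
    kh, kw = kernel.shape
    padded = np.zeros((h, w))
    padded[:kh, :kw] = kernel
    # Move the kernel centre to the origin so the response is not shifted.
    padded = np.roll(padded, (-(kh // 2), -(kw // 2)), axis=(0, 1))
    return np.real(np.fft.ifft2(np.fft.fft2(image) * np.fft.fft2(padded)))

def fuse_spatial_and_gabor(image, n_orientations=4, size=7, sigma=2.0, wavelength=4.0):
    """Stack the spatial image with its Gabor responses along a channel axis."""
    thetas = np.arange(n_orientations) * np.pi / n_orientations
    responses = [
        filter_same(image, gabor_kernel(size, sigma, t, wavelength))
        for t in thetas
    ]
    # Channel 0 is the spatial image; channels 1..n are the spectral responses.
    return np.stack([image] + responses, axis=-1)

# Placeholder input: a random 28x28 grayscale array standing in for a hand image.
image = np.random.default_rng(0).random((28, 28))
fused = fuse_spatial_and_gabor(image)
```

The resulting array has one channel for the spatial image and one per Gabor orientation, and could be passed to any CNN that accepts multi-channel input. Other fusion strategies, such as separate network branches merged at a later layer, would keep the two representations apart for longer; the abstract indicates that several such combinations were compared.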
Original language | English |
---|---|
Pages (from-to) | 10213-10234 |
Number of pages | 22 |
Journal | Multimedia Tools and Applications |
Volume | 80 |
Issue number | 7 |
State | Published - Mar 2021 |
Bibliographical note
Publisher Copyright: © 2020, Springer Science+Business Media, LLC, part of Springer Nature.
Keywords
- Deep learning
- Gabor filter
- Hand gesture
- Human-machine interaction
- Multimodal recognition systems
- Sign language
ASJC Scopus subject areas
- Software
- Media Technology
- Hardware and Architecture
- Computer Networks and Communications