Abstract
Sign language recognition (SLR) has been an active research area because hand and upper-body movements are difficult to interpret in real-life settings. Dynamic fingerspelling recognition is particularly challenging because algorithms must infer the meaning of fingerspelling from real-time videos. In this research, we propose a fusion-based convolutional recurrent neural network (CRNN) that fuses a three-dimensional convolutional neural network (3D-CNN) and a CNN model to extract robust spatiotemporal features from the sequential images in a video. The fusion-based CRNN framework was divided into deep feature extraction and sequence learning modules. In the deep feature extraction module, frames were extracted from each video and only 32 frames were selected. Additionally, we trained a YOLOv5 model to detect and localize the upper body of the signer, which was designated as the region of interest (ROI). After the ROI was computed, it was sent to the 3D-CNN and the CNN to extract robust sequential features. Furthermore, an addition operator was used to merge the sequential features, and the resulting features were passed to a sequence learning mechanism (bidirectional long short-term memory) to create a robust model for recognizing dynamic fingerspelling. In the experiments, we evaluated the fusion-based CRNN on the dynamic Thai fingerspelling dataset, which comprises 3,025 short videos in 42 classes. The experimental results indicated that the fusion-based CRNN achieved an accuracy of 91.73% on the dynamic Thai fingerspelling dataset and outperformed the existing method.
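As an illustration of two steps named in the abstract (selecting 32 frames and merging the two feature streams with an addition operator), the following is a minimal sketch in plain Python. It is not the authors' implementation: the abstract does not say how the 32 frames are chosen, so evenly spaced sampling is an assumption, and the function names `sample_frame_indices` and `fuse_by_addition` are hypothetical. The feature sequences stand in for the 3D-CNN and CNN outputs, which are assumed here to share the same shape.

```python
from typing import List, Sequence

NUM_FRAMES = 32  # the abstract states that 32 frames are selected per video


def sample_frame_indices(total_frames: int, num_frames: int = NUM_FRAMES) -> List[int]:
    """Return num_frames indices spread evenly over a video.

    Evenly spaced sampling is an assumption; the abstract only gives the count.
    """
    if total_frames <= 0 or num_frames <= 0:
        raise ValueError("total_frames and num_frames must be positive")
    # Take the midpoint of each of num_frames equal segments.
    # Videos shorter than num_frames end up repeating some frames.
    return [
        min(total_frames - 1, int((i + 0.5) * total_frames / num_frames))
        for i in range(num_frames)
    ]


def fuse_by_addition(
    features_a: Sequence[Sequence[float]],
    features_b: Sequence[Sequence[float]],
) -> List[List[float]]:
    """Merge two per-frame feature sequences by element-wise addition.

    Both inputs must have the same number of time steps and feature size.
    """
    if len(features_a) != len(features_b):
        raise ValueError("sequences must have the same number of time steps")
    fused = []
    for step_a, step_b in zip(features_a, features_b):
        if len(step_a) != len(step_b):
            raise ValueError("feature vectors must have the same length")
        fused.append([x + y for x, y in zip(step_a, step_b)])
    return fused


if __name__ == "__main__":
    indices = sample_frame_indices(total_frames=64)
    print(len(indices), indices[:4])
    print(fuse_by_addition([[1.0, 2.0]], [[0.5, 0.5]]))
```

Addition keeps the fused feature size equal to that of each stream, unlike concatenation, which would double it; the fused sequence would then be passed to the bidirectional LSTM described in the abstract.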
| Original language | English |
|---|---|
| Pages (from-to) | 201-210 |
| Number of pages | 10 |
| Journal | ICIC Express Letters, Part B: Applications |
| Volume | 16 |
| Issue number | 2 |
| State | Published - Feb 2025 |
Bibliographical note
Publisher Copyright: © 2025 ICIC International.
Keywords
- 3D convolutional neural network
- Bidirectional long short-term memory
- Dynamic fingerspelling recognition
- Fusion strategy
- Spatiotemporal feature
ASJC Scopus subject areas
- General Computer Science