Abstract
Pose-based approaches to sign language recognition provide lightweight, fast models that can be adopted in real-time applications. This article presents a framework for isolated Arabic sign language recognition using hand and face keypoints. We employed the MediaPipe pose estimator to extract the keypoints of sign gestures from the video stream. Using the extracted keypoints, three models were proposed for sign language recognition: Long Short-Term Memory (LSTM), Temporal Convolutional Network (TCN), and Transformer-based models. Moreover, we investigated the importance of non-manual features for sign language recognition systems; the results showed that combining hand and face keypoints boosted recognition accuracy by around 4% compared with hand keypoints alone. The proposed models were evaluated on Arabic and Argentinian sign languages. On the KArSL-100 dataset, the proposed pose-based Transformer achieved the highest accuracy, 99.74% and 68.2% in signer-dependent and signer-independent modes, respectively. The Transformer was also evaluated on the LSA64 dataset, obtaining accuracies of 98.25% and 91.09% in signer-dependent and signer-independent modes, respectively. Consequently, the pose-based Transformer outperformed state-of-the-art techniques on both datasets using keypoints from the signer's hands and face.
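To make the extraction step concrete, below is a minimal sketch (not the authors' released code) of per-frame hand and face keypoint extraction with MediaPipe Holistic. The function name `extract_keypoints`, the zero-filling of undetected landmarks, and the flat (x, y, z) feature layout are illustrative assumptions rather than details taken from the paper.

```python
# Sketch: per-frame hand and face keypoint extraction with MediaPipe Holistic.
import cv2
import numpy as np
import mediapipe as mp

mp_holistic = mp.solutions.holistic

def extract_keypoints(video_path: str) -> np.ndarray:
    """Return an array of shape (num_frames, feature_dim) of (x, y, z) keypoints."""
    def flatten(landmarks, count):
        # Zero-fill when a hand or the face is not detected in the frame
        # (an assumed convention, common in pose-based pipelines).
        if landmarks is None:
            return np.zeros(count * 3, dtype=np.float32)
        return np.array([[p.x, p.y, p.z] for p in landmarks.landmark],
                        dtype=np.float32).flatten()

    frames = []
    cap = cv2.VideoCapture(video_path)
    with mp_holistic.Holistic(static_image_mode=False) as holistic:
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            # MediaPipe expects RGB input; OpenCV decodes frames as BGR.
            results = holistic.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
            frames.append(np.concatenate([
                flatten(results.left_hand_landmarks, 21),   # 21 hand keypoints
                flatten(results.right_hand_landmarks, 21),  # 21 hand keypoints
                flatten(results.face_landmarks, 468),       # 468 face-mesh keypoints
            ]))
    cap.release()
    return np.stack(frames)
```

Under these assumptions, each frame yields a fixed-length vector of (21 + 21 + 468) keypoints × 3 coordinates = 1,530 values, and the per-video sequence of such vectors is what an LSTM, TCN, or Transformer classifier would consume.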
| Original language | English |
| --- | --- |
| Article number | 3 |
| Journal | ACM Transactions on Asian and Low-Resource Language Information Processing |
| Volume | 23 |
| Issue number | 1 |
| DOIs | |
| State | Published - 15 Jan 2024 |
Bibliographical note
Publisher Copyright: © 2024 Association for Computing Machinery. All rights reserved.
Keywords
- Sign language recognition
- TCN
- Arabic sign language
- gesture recognition
- pose recognition
- transformer
ASJC Scopus subject areas
- General Computer Science