Abstract
Sign languages are the main visual communication medium between hard-of-hearing people and their societies. Similar to spoken languages, they are not universal and vary from region to region, yet they remain relatively under-resourced. Arabic Sign Language (ArSL) is one such language and has attracted increasing attention in the research community. However, most existing work on sign language recognition focuses on manual gestures, ignoring non-manual information, such as facial expressions, that carries essential linguistic signals. One of the main obstacles to incorporating these modalities is the lack of suitable datasets. In this paper, we propose a new multi-modality ArSL dataset that integrates various types of modalities. It consists of 6748 video samples of fifty signs performed by four signers and collected using Kinect V2 sensors. The dataset will be freely available for researchers to develop and benchmark their techniques and further advance the field. In addition, we evaluated the fusion of spatial and temporal features of different modalities, manual and non-manual, for sign language recognition using state-of-the-art deep learning techniques. This fusion boosted the accuracy of the recognition system in signer-independent mode by 3.6% compared with manual gestures alone.
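The multi-modal fusion mentioned in the abstract can be illustrated with a minimal sketch: per-frame features from a manual stream (hand gestures) and a non-manual stream (facial expressions) are each encoded over time and concatenated before classification. This is not the authors' architecture; the GRU encoders, layer sizes, and feature dimensions below are assumptions chosen only to make the idea concrete (the 50-class output matches the fifty signs in the dataset).

```python
# Illustrative sketch of late fusion of manual and non-manual feature streams.
# Assumptions: pre-extracted per-frame features per modality, GRU temporal encoders,
# and fusion by concatenation -- not the exact method used in the paper.
import torch
import torch.nn as nn

class FusionSignClassifier(nn.Module):
    def __init__(self, manual_dim=256, nonmanual_dim=128, hidden=128, num_signs=50):
        super().__init__()
        # One temporal encoder per modality.
        self.manual_enc = nn.GRU(manual_dim, hidden, batch_first=True)
        self.nonmanual_enc = nn.GRU(nonmanual_dim, hidden, batch_first=True)
        # Classifier over the fused (concatenated) representation.
        self.classifier = nn.Linear(2 * hidden, num_signs)

    def forward(self, manual_seq, nonmanual_seq):
        # Inputs: (batch, time, feature_dim); keep the final hidden state of each stream.
        _, h_manual = self.manual_enc(manual_seq)
        _, h_nonmanual = self.nonmanual_enc(nonmanual_seq)
        fused = torch.cat([h_manual[-1], h_nonmanual[-1]], dim=-1)
        return self.classifier(fused)

# Usage with random tensors standing in for per-frame features of the two modalities.
model = FusionSignClassifier()
manual = torch.randn(8, 40, 256)     # e.g. hand/body features over 40 frames
nonmanual = torch.randn(8, 40, 128)  # e.g. facial-expression features over 40 frames
logits = model(manual, nonmanual)    # (8, 50) scores over the fifty signs
```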
| Original language | English |
| --- | --- |
| Article number | 1739 |
| Journal | Electronics (Switzerland) |
| Volume | 10 |
| Issue number | 14 |
| DOIs | |
| State | Published - 2 Jul 2021 |
Bibliographical note
Publisher Copyright: © 2021 by the authors. Licensee MDPI, Basel, Switzerland.
Keywords
- Facial expressions
- Gesture recognition
- Non-manual gestures
- Sign language database
- Sign language recognition
- Sign language translation
ASJC Scopus subject areas
- Control and Systems Engineering
- Signal Processing
- Hardware and Architecture
- Computer Networks and Communications
- Electrical and Electronic Engineering