Multi-Input CNN-LSTM for End-to-End Indian Sign Language Recognition: A Use Case With Wearable Sensors

Rinki Gupta
Copyright: © 2022 | Pages: 19
DOI: 10.4018/978-1-7998-9434-6.ch008
Abstract

Sign language predominantly involves the use of various hand postures and hand motions to enable visual communication. However, signing is largely unfamiliar to hearing people, so signers often rely on sign language interpreters. In this chapter, a novel multi-input deep learning model is proposed for end-to-end recognition of 50 common signs from Indian Sign Language (ISL). The ISL dataset is developed using multiple wearable sensors on the dominant hand that record surface electromyogram, tri-axial accelerometer, and tri-axial gyroscope data. Multi-channel data from these three modalities are processed in a multi-input deep neural network with stacked convolutional neural network (CNN) and long short-term memory (LSTM) layers. The performance of the proposed multi-input CNN-LSTM model is compared with the traditional single-input approach in terms of quantitative performance measures. The multi-input approach yields around a 5% improvement in classification accuracy over the traditional single-input approach.
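The chapter's exact layer configuration is not given in this preview, but the data flow it describes — one CNN branch per sensor modality, feature-level fusion, then an LSTM over time — can be sketched in plain NumPy. All dimensions below (8 sEMG channels, kernel size 5, 16 filters, 32 hidden units) are illustrative assumptions, not the chapter's actual hyperparameters:

```python
import numpy as np

rng = np.random.default_rng(0)

def conv1d(x, w):
    # Valid 1-D convolution over time: x is (T, C_in), w is (k, C_in, F)
    # -> feature map of shape (T - k + 1, F)
    k = w.shape[0]
    return np.stack([np.tensordot(x[t:t + k], w, axes=([0, 1], [0, 1]))
                     for t in range(x.shape[0] - k + 1)])

def lstm_last_hidden(seq, Wx, Wh, b):
    # Minimal LSTM cell unrolled over the sequence; returns the final hidden state
    H = Wh.shape[0]
    h, c = np.zeros(H), np.zeros(H)
    for x in seq:
        z = x @ Wx + h @ Wh + b                 # gate pre-activations, shape (4H,)
        i, f, g, o = np.split(z, 4)
        sig = lambda a: 1.0 / (1.0 + np.exp(-a))
        c = sig(f) * c + sig(i) * np.tanh(g)    # cell state update
        h = sig(o) * np.tanh(c)                 # hidden state update
    return h

T, k, F, H, n_classes = 100, 5, 16, 32, 50      # assumed sizes; 50 ISL signs
# Three modality streams from the dominant hand (channel counts assumed)
semg  = rng.standard_normal((T, 8))             # surface EMG
accel = rng.standard_normal((T, 3))             # tri-axial accelerometer
gyro  = rng.standard_normal((T, 3))             # tri-axial gyroscope

# One CNN branch per modality, then feature-level fusion by concatenation
feats = [conv1d(x, rng.standard_normal((k, x.shape[1], F)) * 0.1)
         for x in (semg, accel, gyro)]
fused = np.concatenate(feats, axis=1)           # (T - k + 1, 3 * F)

Wx = rng.standard_normal((3 * F, 4 * H)) * 0.1
Wh = rng.standard_normal((H, 4 * H)) * 0.1
h = lstm_last_hidden(fused, Wx, Wh, np.zeros(4 * H))

# Softmax over the 50 sign classes
logits = h @ (rng.standard_normal((H, n_classes)) * 0.1)
probs = np.exp(logits - logits.max())
probs /= probs.sum()
print(fused.shape, probs.shape)
```

In a single-input baseline, all 14 channels would instead enter one shared CNN branch; the multi-input design lets each modality learn its own filters before fusion.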

Introduction

Sign language is primarily used by deaf and mute people for communication. A deaf person often finds it difficult to communicate with other people in written form, since that requires knowledge of verbal languages such as English and Hindi, which are difficult to learn without the sense of hearing. Sign language, on the other hand, combines hand gestures, facial expressions, and body language within a well-defined semantics and lexicon that is easy for a deaf person to learn and use (Kudrinko, 2018; Gupta, & Kumar, 2021). However, sign language is not common among the hearing community, making it difficult for a deaf person to communicate with hearing people through signing. An electronic translator that converts sign language to spoken language could greatly enhance communication between a sign language user and a non-signer. With this motivation, several researchers have reported on the development of sign language recognition systems. One hundred and seventeen papers on recognition systems for 25 different sign languages, reported over the past decade, are reviewed in (Wadhawan, 2021). Sign language recognition has been performed using non-wearable devices that capture images, videos, and depth or color information, as well as wearable sensors such as flex sensors, motion sensors, electromyography, and even Wi-Fi signals. The reported research explores the use of either only the dominant hand or both hands during signing. Moreover, the considered signs are either static postures alone, or information about the dynamic motion of the hands is also captured. The review concludes that over 54% of the work is dedicated to reliably recognizing static hand postures during isolated signs (Wadhawan, 2021). Around 75% of the research was found to focus on processing information from only the dominant hand. Vision-based approaches were found to be the most popular, with 44% of studies using a camera and 23% using a Kinect or Leap Motion sensor for data acquisition. Recognition may be carried out using machine learning or deep learning techniques.

A survey of the literature on vision-based sign language recognition using deep learning techniques is presented in (Rastgoo, 2020). The authors report that isolated hand gestures without motion, such as those used for signing numerals and most alphabets, may be classified from still images. However, identifying dynamic signs at the word or sentence level requires video data. Convolutional neural networks (CNNs) and recurrent neural networks (RNNs) have been found to be the most prominent choices for classifying hand and body postures and for hand tracking in dynamic signing. For instance, Wadhawan et al. evaluated 50 CNN models for determining hand posture from RGB images of 100 different signs from Indian Sign Language (ISL), achieving classification accuracy as high as 99.9% (Wadhawan, 2020). Hand pose has been determined using depth sensors for American Sign Language (ASL) in (Kolivand, 2021): first, the hand and forearm are segmented from the depth image of the hand sign; thereafter, denoising and extraction of geometrical features are carried out, achieving classification accuracy of up to 96.78% with an artificial neural network (ANN). Videos have also been used for recognition of dynamic signs. In (Masood, 2018), the authors used a combination of CNN and RNN to extract spatial and temporal features from video inputs, classifying 46 gestures from Argentinean Sign Language with 95.2% accuracy. Despite their successful use for sign language recognition, major challenges encountered with vision-based approaches include partial occlusion, poor illumination, and background clutter (Rastgoo, 2020).

Key Terms in this Chapter

Electromyogram: A recording of muscle electrical potentials, used to detect muscle activity or to monitor muscle health.

Long Short-Term Memory: Long short-term memory (LSTM) is a recurrent neural network architecture that contains feedback connections and is commonly used with time-series data.

Classification Accuracy: The number of correct predictions made by a classification model compared to the total number of predictions made by the classifier, stated as a percentage.
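As a minimal worked example of this definition (the label values below are made up for illustration):

```python
# Classification accuracy = correct predictions / total predictions, as a percentage
y_true = [3, 1, 4, 1, 5, 9, 2, 6]   # ground-truth sign labels (illustrative)
y_pred = [3, 1, 4, 0, 5, 9, 2, 0]   # classifier outputs: 6 of 8 are correct
accuracy = 100.0 * sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(accuracy)  # 75.0
```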

Gyroscope: A sensor that records turn rate in terms of angle per unit time.

Convolutional Neural Network: A convolutional neural network, also known as a ConvNet or CNN, is a deep feed-forward neural network that uses the convolution operation with a sliding kernel to generate representative feature maps of its input.
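A tiny illustration of the sliding-kernel convolution this definition refers to, in one dimension (the signal and kernel values are made up):

```python
import numpy as np

# 1-D convolution of a signal with a sliding kernel (valid mode) --
# the core operation behind each CNN feature map
signal = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
kernel = np.array([1.0, 0.0, -1.0])   # simple difference kernel
feature_map = np.convolve(signal, kernel, mode='valid')
print(feature_map)  # [2. 2. 2.]
```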

Wearable Sensors: Devices worn on the body containing sensors that record a physical phenomenon as an electrical signal.

Accelerometer: A sensor that records acceleration due to gravity as well as linear acceleration caused by motion of the sensor.
