1. Introduction
Gestures are an essential part of day-to-day human communication. For Deaf and hard-of-hearing people, gestures are the primary mode of communication: the Deaf community uses sign language to interact with each other and with hearing people. Sign language is a natural language for the Deaf, and not all Deaf people can read or write. When hearing people do not understand sign language, communication becomes difficult. It becomes even more complicated in an emergency, when others cannot understand the message a Deaf person is attempting to convey. Human interpreters are one of the best alternatives in such situations. However, interpreters are not always available: there are only about 300 certified Indian Sign Language (ISL) interpreters in India (ISLRTC n.d.). Furthermore, the presence of an interpreter means the conversation loses its privacy.
Automated sign language recognition and translation can be one of the best alternatives for filling this gap. There are two approaches to automating this process: (i) sign-language-to-text conversion, which recognizes signs and converts them into text so that others can understand the words communicated by the Deaf, and (ii) text-to-sign-language conversion, which helps the Deaf understand what hearing people are trying to communicate. A system implemented using either approach should work across different backgrounds and with signers of different ages. Such a system could then be used in public places and in emergencies, for example at police stations, hospitals, airports, and railway stations.
In this paper, the problem of converting ISL to English text is addressed, as work in this area is still in its infancy. The task of converting sign language to text is divided into two parts: (i) recognizing sign language gestures and classifying them into a gloss sequence, and (ii) translating the ISL gloss sequence into text. This distinction is critical because the grammars of spoken and sign languages are very different. These differences include word ordering, the use of multiple channels to convey concurrent information, and the use of direction and space to indicate relationships between objects. Simply put, there is no one-to-one word-to-sign mapping; the mapping between speech and sign is complicated. Until now, most researchers have treated sign language recognition as a gesture recognition problem. Furthermore, most research on ISL recognition addresses only manual components, ignoring non-manual components such as facial expressions, body movement, and head movement. Research has also been limited to recognizing letters and words in ISL, with little work on sentence recognition, and work on ISL sentence recognition and translation has focused on local variants of ISL rather than standardized ISL.
To establish a common sign language at the national level, the Indian Sign Language Research and Training Centre (ISLRTC) launched an official sign language dictionary (ISLRTC n.d.). ISLRTC released two editions of the ISL word dictionary, in 2018 and 2019, containing 3000 and 6000 words, respectively, to promote ISL as a medium of education for the Deaf community and to train people to facilitate communication between the Deaf and others. To the best of our knowledge, no work has been reported on Indian sign language recognition using this official dictionary. Therefore, the problem of national-level recognition and translation of ISL sentences into semantically equivalent English text is addressed by (i) creating an ISL dataset of 47,880 videos of 13,720 ISL sentences and (ii) creating an ISL corpus from the Brown corpus (Brown Corpus n.d.) to motivate research in the field of machine translation. These ISL glosses can, for example, be converted into the Hamburg Notation System (HamNoSys) and other notations to facilitate spoken language generation in a variety of languages.