Enhanced Assistive Technology on Audio-Visual Speech Recognition for the Hearing Impaired

DOI: 10.4018/978-1-6684-8851-5.ch017

Abstract

People with hearing difficulties can use speech recognition software as an alternative means of communication. This chapter addresses audio-visual speech recognition to improve lip-reading comprehension. Audio speech recognition is the process of turning spoken words into text. The neural network model is trained on the LibriSpeech dataset. The input audio signal is split into frames using a window size of 20-25 milliseconds and a stride of 10 milliseconds, and feature extraction derives informative features from these frames. A visual speech recognition system automatically recognizes spoken words by observing how the speaker moves their lips. The proposed approach also considers body language to understand the speaker's words, increasing interpretation accuracy by 5.05%.
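
The framing step described in the abstract can be illustrated with a minimal sketch. This is not the chapter's implementation: the function name `frame_signal`, the 16 kHz sample rate, and the choice of a 25 ms window (the upper end of the stated 20-25 ms range) are assumptions made for illustration only.

```python
import numpy as np

def frame_signal(signal, sample_rate, window_ms=25.0, stride_ms=10.0):
    """Split a 1-D audio signal into overlapping frames, one frame per row.

    Hypothetical helper: window and stride follow the values quoted in the
    abstract; a trailing partial frame is dropped rather than padded.
    """
    signal = np.asarray(signal)
    window = int(round(sample_rate * window_ms / 1000.0))  # samples per frame
    stride = int(round(sample_rate * stride_ms / 1000.0))  # samples between frame starts
    if len(signal) < window:
        return np.empty((0, window), dtype=signal.dtype)
    n_frames = 1 + (len(signal) - window) // stride
    starts = stride * np.arange(n_frames)
    # Broadcasting builds an (n_frames, window) index matrix.
    idx = starts[:, None] + np.arange(window)[None, :]
    return signal[idx]

sample_rate = 16000                    # assumed sample rate in Hz
signal = np.arange(sample_rate)        # placeholder for one second of audio
frames = frame_signal(signal, sample_rate)
print(frames.shape)                    # (98, 400)
```

With these assumed values each frame holds 400 samples and consecutive frames start 160 samples apart, so neighbouring frames overlap by 15 ms. Feature extraction would then operate on each row.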

Introduction

Speech recognition is an interdisciplinary field that draws on natural language processing (Chowdhary, 2022), signal processing (Orfanidis, 1995), and artificial intelligence (Ambika, 2022). Speech is a continuous sound signal made up of a sequence of phonemes, and it is the primary means of human communication. Hearing-impaired individuals perceive spoken words by reading lips.

The range of hearing within this group is partly determined by the sounds its members can perceive. Sound is measured in decibels (dB), and a healthy sound level is considered to be one that does not exceed 85 dB over a given period of time. The hearing threshold refers to the minimum intensity perceptible by the ear. With a hearing loss between 40 and 70 dB, a language delay will affect how we communicate with the student.

The auditory system is one of the most intricate mechanisms of human sensation. The fluid-filled inner ear converts the mechanical energy of sound waves into electrical stimuli, which the brain eventually interprets. Anatomically, the inner ear is divided into the auditory and vestibular systems. The vestibular system is responsible for three-dimensional orientation and gravity perception, while the auditory system is responsible for sound sensation. Hearing-impaired individuals frequently develop balance disorders due to the similarities between these two systems. The auditory system consists of the snail-shaped cochlea, a fluid-filled tube that wraps around the modiolus in a spiral. Hearing impairment (Hogan & Phillips, 2015; Cremers & Smith, 2002) has been classified into the five categories listed below.

  • Stage 1 of Partial Deafness: The individual has difficulty understanding speech in a church, at the theater, or in a group discussion. However, they can hear speech at close range without any artificial aid.

  • Stage 2 of Partial Deafness: The individual has trouble hearing direct conversation at close range, but they can hear clearly over the phone or when spoken to loudly.

  • Stage 3 of Partial Deafness: The individual can hear amplified speech through hearing aids, trumpets, or other amplification devices, but they have trouble hearing over the phone at average intensities.

  • Complete Speech Inadequacy: The individual cannot hear speech, having acquired the hearing impairment after learning to speak the language by conventional means.

  • Mute Deaf: The individual was born deaf or experienced severe deafness early enough to prevent them from learning to speak normally.

Figure 1 shows examples of generated gestures with emotion.

Figure 1. Examples of generated gestures with emotion

Source: Zabala et al. (2021)
