Language Classification and Recognition From Audio Using Deep Belief Network

Santhi Selvaraj, Raja Sekar J., Amutha S.
DOI: 10.4018/978-1-7998-2566-1.ch011

Abstract

The main objective of this chapter is to recognize the spoken language of chat audio from social media using a deep belief network (DBN). Language classification is currently one of the main applications of natural language processing, artificial intelligence, and deep learning. It is the process of determining which natural language a piece of information is presented in, here by recognizing the language from audio. At present, most language recognition systems are based on hidden Markov models and Gaussian mixture models, which support both acoustic and sequential modeling. This chapter presents a DBN-based recognition system for three languages: English, Hindi, and Tamil. The evaluation is performed on a self-built database of recorded speech, from which mel-frequency cepstral coefficient (MFCC) features are extracted. These features are fed into the DBN, which is trained with a backpropagation learning algorithm, for the recognition process. The chapter reports good recognition accuracy for the chosen languages, with system performance assessed on all three.

Types Of Language Recognition

Language recognition can be divided into two main types:

  • Audio language recognition

  • Visual language recognition

Audio Language Recognition

Audio language recognition is a mature technology. It can discriminate quite reliably between tens of spoken languages, from speakers unknown to the system, using just a few seconds of representative speech.

Visual Language Recognition

This method uses information derived from the visual appearance and movement of the mouth to recognize the spoken language, without using audio information.

Characteristics Of Languages

The characteristics of languages are known as language identification cues. The following characteristics differ from one language to another:

  • Phonology

  • Morphology

  • Syntax

  • Prosody

Phonology

A phoneme is the basic phonological unit of a language. A phone is the realization of an acoustic-phonetic unit or segment. Phonotactics are the rules governing the allowable sequences of phones and phonemes, and these rules can also differ between languages.

  • Example

  • Word - celebrate

  • Phoneme - /s eh l ix b r ey t/

  • Phone - [s eh l ax bcl b r ey q]
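
As a rough illustration of how phonotactic information could be collected, the sketch below counts adjacent symbol pairs (bigrams) in the phoneme string from the example above. The function name `phone_bigrams` and the use of bigram counts are assumptions made for illustration only; the chapter does not state that it models phonotactics this way.

```python
from collections import Counter


def phone_bigrams(transcription):
    """Split a space-separated transcription into adjacent symbol pairs."""
    symbols = transcription.split()
    return list(zip(symbols, symbols[1:]))


# Phoneme string for "celebrate", taken from the example above.
phonemes = "s eh l ix b r ey t"

# Count how often each adjacent pair occurs (hypothetical phonotactic statistic).
bigram_counts = Counter(phone_bigrams(phonemes))

for pair, count in bigram_counts.items():
    print(pair, count)
```

Counts of this kind, gathered over many utterances, would differ between languages because each language allows different symbol sequences.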

Morphology

Word roots and lexicons usually differ from language to language. Each language has its own vocabulary and its own way of forming words.

  • Example

  • “Pigs like mud” is a sentence containing three words - pigs, like, mud

Syntax

Sentence patterns differ among languages. Even when two or more languages share a word, the sets of words that may precede and follow it differ.

  • Example

  • The word “bin” in English and German

Key Terms in this Chapter

GMM: A Gaussian mixture model is a probabilistic model that assumes all data points are generated from a mixture of a finite number of Gaussian distributions.
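
The GMM definition can be made concrete with a small one-dimensional sketch: the mixture density is a weighted sum of Gaussian densities. The function names and the two-component parameters below are hypothetical values chosen for illustration, not taken from the chapter.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of a 1-D Gaussian with the given mean and standard deviation."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def gmm_pdf(x, weights, means, stds):
    """Mixture density: sum over components of weight * Gaussian density."""
    return sum(w * gaussian_pdf(x, m, s) for w, m, s in zip(weights, means, stds))


# Hypothetical two-component mixture; the weights must sum to 1.
weights = [0.3, 0.7]
means = [-1.0, 2.0]
stds = [0.5, 1.0]

density = gmm_pdf(0.0, weights, means, stds)
print(density)
```

In speech systems the same idea is applied to multi-dimensional feature vectors, with a mean vector and covariance per component.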

MFCC: Mel-frequency cepstral coefficients (MFCCs) are features derived from a cepstral representation of the speech audio on the mel frequency scale.
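
The "mel" part of MFCC refers to a perceptual frequency scale. The sketch below uses one widely used conversion formula, mel = 2595 * log10(1 + f / 700), to place filter centre frequencies evenly on that scale. The chapter does not say which mel formula or filter settings it uses, so the formula choice, the function names, and the values of `n_filters`, `f_min`, and `f_max` are assumptions; this covers only the frequency-warping step, not the full MFCC pipeline.

```python
import math


def hz_to_mel(freq_hz):
    """Convert frequency in Hz to mels (one common convention)."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_center_frequencies(n_filters, f_min, f_max):
    """Centre frequencies (Hz) of n_filters spaced evenly on the mel scale."""
    mel_min, mel_max = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (mel_max - mel_min) / (n_filters + 1)
    return [mel_to_hz(mel_min + step * (i + 1)) for i in range(n_filters)]


# Hypothetical settings: 10 filters between 0 Hz and 8000 Hz.
centers = mel_center_frequencies(10, 0.0, 8000.0)
print([round(c, 1) for c in centers])
```

The resulting centres are densely packed at low frequencies and spread out at high frequencies, which is the property the mel scale is meant to capture.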

Language Recognition: Language recognition is the method of identifying a language from its speech audio and extracting the information presented in the speech.

DBN: Deep belief networks are probabilistic models composed of multiple layers of latent (hidden) variables. A deep belief network can be built as a stack of restricted Boltzmann machines, one on top of another.
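
To show what "stacked" means in practice, the sketch below passes an input vector upward through two restricted Boltzmann machine layers, using the standard conditional p(h_j = 1 | v) = sigmoid(b_j + sum_i v_i W_ij) at each layer. The layer sizes, weights, and function names are hypothetical, and feeding probabilities (rather than sampled binary states) to the next layer is a simplifying assumption. Training, whether layer-wise pretraining or the backpropagation step mentioned in the abstract, is not shown.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def rbm_hidden_probs(visible, weights, hidden_bias):
    """p(h_j = 1 | v) for each hidden unit j; weights[i][j] links visible i to hidden j."""
    return [
        sigmoid(b + sum(v * row[j] for v, row in zip(visible, weights)))
        for j, b in enumerate(hidden_bias)
    ]


def dbn_upward_pass(visible, layers):
    """Feed each layer's hidden probabilities to the next RBM as its visible input."""
    activations = visible
    for weights, hidden_bias in layers:
        activations = rbm_hidden_probs(activations, weights, hidden_bias)
    return activations


# Hypothetical stack: 3 visible units -> 2 hidden units -> 1 hidden unit.
layers = [
    ([[0.5, -0.2], [0.1, 0.4], [-0.3, 0.2]], [0.0, 0.1]),
    ([[0.7], [-0.6]], [0.05]),
]

top = dbn_upward_pass([1.0, 0.0, 1.0], layers)
print(top)
```

In a recognition system the input vector would hold acoustic features such as MFCCs, and the top-layer activations would feed a classifier over the candidate languages.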
