Deep Stacked Autoencoder-Based Automatic Emotion Recognition Using an Efficient Hybrid Local Texture Descriptor

Shanthi Pitchaiyan, Nickolas Savarimuthu
Copyright © 2022 | Pages: 26
DOI: 10.4018/JITR.2022010103

Abstract

Extracting an effective facial feature representation is the critical task for an automatic expression recognition system. The Local Binary Pattern (LBP) is a popular texture feature for facial expression recognition. However, only a few approaches exploit the relationships among the local neighborhood pixels themselves. This paper presents a Hybrid Local Texture Descriptor (HLTD), derived from the logical fusion of Local Neighborhood XNOR Patterns (LNXP) and LBP, to investigate the potential of positional pixel relationships in automatic emotion recognition. LNXP encodes texture information based on the two nearest vertical and/or horizontal neighbors of the current pixel, whereas LBP encodes the relationship of the neighboring pixels to the center pixel. After logical feature fusion, a Deep Stacked Autoencoder (DSA) is trained on the CK+, MMI, and KDEF-dyn datasets, and the results show that the proposed HLTD-based approach outperforms many state-of-the-art methods, with an average recognition rate of 97.5% on CK+, 94.1% on MMI, and 88.5% on KDEF.

Introduction

Emotions are automatic processes of the brain caused directly by events in the environment. They are difficult to define precisely, yet they convey information to others and can be expressed in different forms, such as speech, facial expressions, and actions. In human interaction, facial expressions create a communication channel alongside the voice, conveying primary evidence about a person's internal emotional state during conversation. A facial expression is produced by a coordinated pattern of muscle movements triggered by particular brain areas. Machine understanding of facial expressions can modernize user interfaces in areas such as robotics and car driving. In the last few decades, many methods have been proposed for Automatic Facial Expression Recognition (AFER), covering both feature extraction and recognition. Nevertheless, achieving high accuracy in AFER remains challenging because of minor interpersonal variations combined with substantial intra-personal variations arising from illumination, posture, expression, and other factors.

In general, an AFER system contains three primary stages: preprocessing, feature extraction, and classification. The essential stage for AFER is extracting standard features from the given image that effectively discriminate between different emotions. Facial expressions involve changes in local texture. Appearance-based approaches are good at capturing transient features, which can be extracted from the entire face or from specific facial regions. Region-based feature extraction omits useful correlations among different features and enlarges the feature space, so the computational cost is high and the risk of overfitting increases. The most widely used appearance-based methods are Gabor wavelets (Bartlett et al., 2005), the Local Binary Pattern (LBP) (Huang, Wang, & Ying, 2010), and its variants. Among these, LBP has been studied extensively for AFER. This type of code generation considers the relationship between the center pixel and its neighbors but neglects the relationships among the neighboring pixels themselves. Consequently, such methods are less discriminative in distinguishing different textures, as they may be insufficient to define the local substructure accurately. Moreover, intensity variations and noise in the local region may produce false feature codes.
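To make the center-versus-neighbor encoding concrete, the basic 3x3 LBP operator can be sketched as follows. This is a minimal illustration of the standard operator; the function name and the clockwise bit ordering are illustrative choices, not necessarily the exact convention used in the cited works.

```python
import numpy as np

def lbp_code(patch):
    """Basic 3x3 LBP: threshold each of the 8 neighbors against the
    center pixel and pack the resulting bits (clockwise from the
    top-left neighbor) into a single 8-bit code."""
    center = patch[1, 1]
    # clockwise neighbor order starting at the top-left pixel
    neighbors = [patch[0, 0], patch[0, 1], patch[0, 2], patch[1, 2],
                 patch[2, 2], patch[2, 1], patch[2, 0], patch[1, 0]]
    code = 0
    for bit, n in enumerate(neighbors):
        if n >= center:       # neighbor >= center contributes a 1-bit
            code |= 1 << bit
    return code

patch = np.array([[6, 5, 2],
                  [7, 6, 1],
                  [9, 8, 7]])
print(lbp_code(patch))  # 241: bits 0, 4, 5, 6, 7 are set
```

Note that the code depends only on comparisons with the center pixel; two neighbors that differ greatly from each other can still produce identical bits, which is exactly the neighbor-to-neighbor information the hybrid descriptor aims to recover.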

This paper introduces a new intensity-variation-based hybrid local texture descriptor to overcome the limitations mentioned above and to increase the discriminative capability of feature patterns for AFER. The proposed hybrid local texture descriptor encodes the local shape by means of the relationship of the neighboring pixels with the center pixel as well as the relationships among the neighboring pixels. This type of feature fusion excludes featureless variations in the local region and also reduces the feature space. In recent years, deep learning approaches have performed well in AFER and have become popular in the computer vision community (Chen et al., 2018; Jain, S. Kumar, A. Kumar, Shamsolmoali, & Zareapoor, 2018; Xie & Hu, 2019). However, they require extensive training data to train a deep network model properly. Because of the limited data in current expression datasets, most deep learning methods employ image augmentation at the cost of high computational power (Christou & Kanojiya, 2019; D. Liang, H. Liang, Yu, & Zhang, 2019; Nguyen et al., 2019). Without such augmented data, the effectiveness of the proposed hybrid feature is evaluated using a deep stacked autoencoder. The Deep Stacked Autoencoder (DSA) is a deep learning model that has been applied broadly across applications (Chen et al., 2018; Zeng et al., 2018). Its strength is that it can extract useful, reliable, and specific features after identifying and eliminating redundant features in an unsupervised fashion.
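The logical-fusion idea can be sketched roughly as follows. This is a hedged illustration only: it assumes a bitwise OR as the fusion operator and a plain normalized histogram as the feature summary, whereas the paper's exact HLTD formulation and LNXP encoding may differ.

```python
import numpy as np

def fuse_and_histogram(lbp_map, lnxp_map, bins=256):
    """Hypothetical sketch of the fusion step: combine an LBP code map
    and an LNXP code map (both 8-bit codes per pixel) with a bitwise OR,
    then summarize the fused map as a normalized histogram. Such a
    histogram would then serve as the input feature vector for the
    deep stacked autoencoder."""
    fused = np.bitwise_or(lbp_map, lnxp_map)
    hist, _ = np.histogram(fused, bins=bins, range=(0, bins))
    return hist / hist.sum()

# Toy 1x2 code maps, one 8-bit code per pixel
lbp = np.array([[0b00001111, 0b11110000]], dtype=np.uint8)
lnxp = np.array([[0b00110011, 0b11110000]], dtype=np.uint8)
feat = fuse_and_histogram(lbp, lnxp)
# Fused codes are 0b00111111 (63) and 0b11110000 (240),
# so the histogram has mass 0.5 in bins 63 and 240.
```

A logical fusion of this kind keeps the feature length fixed at one code per pixel (rather than concatenating two full descriptors), which matches the paper's claim that the fusion reduces the feature space.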
