Facial Emotion Recognition Using Ensemble Learning

DOI: 10.4018/979-8-3693-1738-9.ch007

Abstract

Facial emotion recognition (FER) is the task of identifying human emotions from facial expressions. The purpose of this chapter is to improve the accuracy of FER through ensemble learning with lightweight networks, without increasing network complexity or depth; the resulting ensemble significantly outperforms single lightweight models. The authors propose an ensemble of mini-Xception models in which each expert is trained for a specific emotion and votes with its confidence score. Each expert thus transforms the original multiclass task into a binary one: the model only has to distinguish its target emotion from all others, which simplifies learning. The principal innovation lies in the confidence-based voting mechanism, in which the experts “vote” based on their confidence scores rather than binary decisions.

Introduction

Facial Emotion Recognition (FER), lying at the crossroads of psychology and computer science, has grown immensely with the advent of machine learning and, more specifically, deep learning. Historically, understanding and interpreting human emotions was subjective, relying heavily on human intuition and judgment. With the increasing integration of technology into daily life, however, objective identification of emotions by machines has become not only desirable but, in many scenarios, essential.

Delving into the nature of emotions, basic emotion theory posits that humans universally experience a set of foundational emotions, namely happiness, sadness, fear, anger, disgust, and surprise. These fundamental emotional states can be seen as building blocks from which more nuanced emotions, such as fatigue, anxiety, or satisfaction, emerge. Practical applications of FER are vast and varied. In human-computer interaction, deep learning systems can adapt and respond to the user's emotional state, creating a more intuitive and empathetic user experience. In healthcare, FER can be employed to monitor patients for signs of pain or distress, especially when they cannot communicate verbally. In the automotive industry, it can monitor a driver's emotions and alertness, potentially preventing accidents caused by drowsiness or distress.

In 2006, Hinton introduced the groundbreaking theory of deep learning and subsequently applied it innovatively to image processing. Deep learning is a specialized subset of artificial neural networks, and its foundation rests on the research progress in that field: by increasing the number of hidden layers, one derives a neural network model with multiple hidden layers. Such deeper networks are able to learn more effectively, mirroring the cognitive processes of the human brain and facilitating the efficient extraction of image features (Feng et al., 2020).

Among deep learning architectures, CNNs became the poster child for FER. They consist of convolutional layers that automatically and adaptively learn spatial hierarchies of features from input images, alleviating the need for the hand-crafted features that limited traditional methods. Pooling layers reduce spatial dimensions while retaining crucial information, and activation functions introduce the nonlinearity that enables the network to capture complex relationships.
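The two operations above can be sketched in a few lines of numpy. This is a minimal illustration, not the chapter's implementation: the 4×4 feature map, the 2×2 pooling window, and ReLU as the activation are all assumptions chosen for brevity.

```python
import numpy as np

def relu(x):
    """Elementwise non-linearity: negative responses are zeroed."""
    return np.maximum(x, 0.0)

def max_pool_2x2(x):
    """2x2 max pooling with stride 2: halves each spatial dimension
    while keeping the strongest activation in each window."""
    h, w = x.shape
    return x[:h - h % 2, :w - w % 2].reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

# A toy 4x4 feature map, as a convolutional layer might produce.
feature_map = np.array([[1.0, -2.0, 3.0, 0.5],
                        [4.0,  0.0, -1.0, 2.0],
                        [-3.0, 1.0,  0.0, 1.5],
                        [2.0,  0.5,  2.5, -0.5]])

pooled = max_pool_2x2(relu(feature_map))
print(pooled.shape)  # (2, 2): spatial dimensions halved
print(pooled)
```

Note how the 4×4 map collapses to 2×2 while each window's strongest (post-ReLU) response survives, which is the "reduce dimensions, retain crucial information" behavior described above.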

The mini-Xception model draws inspiration from the original Xception architecture, short for “Extreme Inception” (Li et al., 2022), which was designed to improve upon the Inception architecture by using depthwise separable convolutions and is available in the Keras deep learning library. The mini-Xception model comprises four depthwise separable convolution blocks. Batch normalization processes each block's output to stabilize and accelerate training, complemented by the ReLU activation function, which infuses the model with the necessary non-linearity. In the forward pass, the softmax function is invoked to produce the multi-class classification result.
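A depthwise separable convolution factors a standard convolution into a per-channel spatial filter followed by a 1×1 channel-mixing step, which is what makes Xception-style models lightweight. The numpy sketch below shows the mechanics only; the tensor shapes and random weights are illustrative assumptions, not the chapter's trained model.

```python
import numpy as np

def depthwise_separable_conv(x, depth_kernels, point_weights):
    """Depthwise separable convolution, the building block that Xception
    and mini-Xception use in place of a standard convolution.

    x:             input feature map, shape (H, W, C_in)
    depth_kernels: one k x k spatial filter per input channel, shape (k, k, C_in)
    point_weights: 1x1 "pointwise" mixing matrix, shape (C_in, C_out)
    """
    h, w, c_in = x.shape
    k = depth_kernels.shape[0]
    oh, ow = h - k + 1, w - k + 1
    # Depthwise step: each channel is filtered independently (no channel mixing).
    depth_out = np.zeros((oh, ow, c_in))
    for c in range(c_in):
        for i in range(oh):
            for j in range(ow):
                depth_out[i, j, c] = np.sum(x[i:i+k, j:j+k, c] * depth_kernels[:, :, c])
    # Pointwise step: a 1x1 convolution mixes channels, i.e. a matrix
    # product over the channel axis.
    return depth_out @ point_weights

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 8, 4))                # toy 8x8 input with 4 channels
out = depthwise_separable_conv(x,
                               rng.standard_normal((3, 3, 4)),   # 3x3 depthwise filters
                               rng.standard_normal((4, 16)))     # 1x1 pointwise, 16 outputs
print(out.shape)  # (6, 6, 16)
```

In practice this would be a Keras `SeparableConv2D` layer followed by batch normalization and ReLU; the loop version above only makes the two-step factorization explicit.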

In the vast landscape of ensemble learning, the central idea is to leverage multiple models to make a collective decision, and one of the most intuitive and widely employed ways to reach this consensus is a voting mechanism. Because deep neural networks capture intricate patterns, even a procedure as simple as unweighted averaging can significantly enhance performance: averaging across multiple networks effectively reduces model variance. This is especially impactful given that deep artificial neural networks (ANNs) are characterized by high variance but low bias; if the underlying models are sufficiently diverse or uncorrelated, averaging markedly diminishes their collective variance.
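Unweighted averaging of softmax outputs can be sketched as follows. The three probability vectors are hypothetical model outputs invented for illustration, over an assumed seven-emotion label set; they are not results from the chapter.

```python
import numpy as np

# Hypothetical softmax outputs of three diverse models for one face, over
# the classes (angry, disgust, fear, happy, sad, surprise, neutral).
predictions = np.array([
    [0.10, 0.05, 0.05, 0.55, 0.10, 0.05, 0.10],
    [0.05, 0.05, 0.30, 0.40, 0.10, 0.05, 0.05],
    [0.15, 0.05, 0.05, 0.50, 0.15, 0.05, 0.05],
])

# Unweighted averaging: the ensemble probability is the mean of the
# individual model probabilities, which dampens each model's variance.
ensemble = predictions.mean(axis=0)
print(ensemble.argmax())  # 3 -> "happy"
```

The averaged vector is still a valid probability distribution, and the per-class fluctuations of the individual models (e.g. the second model's spurious mass on "fear") are smoothed out, which is the variance-reduction effect described above.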

In hard voting, each model in the ensemble “votes” for a specific class, and the class that receives the majority of votes is chosen as the final prediction. It is straightforward and requires no probability estimates. The main advantage of voting mechanisms is that, by aggregating predictions, the ensemble smooths out the biases and variances of the individual models, yielding a model less prone to overfitting.
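The contrast between hard voting and the confidence-based voting described in the abstract can be sketched for an ensemble of binary experts. The confidence scores, the 0.5 vote threshold, and the emotion labels are illustrative assumptions, not values from the chapter.

```python
# Hypothetical confidence scores from seven binary "expert" models, each
# trained to separate its own emotion from all the others.
confidences = {"angry": 0.20, "disgust": 0.10, "fear": 0.65,
               "happy": 0.65, "sad": 0.30, "surprise": 0.70, "neutral": 0.25}

# Hard voting: each expert casts a binary vote (here: confidence > 0.5).
hard_votes = {emotion: int(c > 0.5) for emotion, c in confidences.items()}
# Three experts each cast one positive vote, so hard voting produces a
# three-way tie and cannot decide on its own.

# Confidence-based voting: each expert votes with its confidence score and
# the highest-scoring expert wins, resolving ties that defeat binary votes.
winner = max(confidences, key=confidences.get)
print(winner)  # surprise
```

Voting with the raw confidence scores preserves the information that "surprise" was the most certain expert, which a binary vote discards.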
