Learning Framework for Real-World Facial Emotion Recognition

Rohan Appasaheb Borgalli, Sunil Surve

Source Title: AI-Enabled Social Robotics in Human Care Services

DOI: 10.4018/978-1-6684-8171-4.ch003


Abstract

Facial expression recognition (FER) is an important research area in computer vision and artificial intelligence because of its applications in both academia and industry. Research shows that facial images and videos are well suited to recognizing expressions, because visual expressions carry much of the information through which emotions are conveyed. Past research on FER has focused on seven basic emotions; however, humans exhibit many more facial expressions, which are considered compound emotions. State-of-the-art results show that machine learning and deep learning-based approaches outperform conventional FER approaches. This chapter surveys past work on real-world compound facial emotion recognition and implements several learning frameworks, including machine learning and deep learning, for real-world FER systems that detect compound emotions, using the in-the-wild facial expression image dataset RAF-DB.


1 Introduction

Facial expression recognition (FER) is a vital research area because it supports many applications. The FER system model described in this chapter detects a human facial expression in a static image or a sequence of images and identifies the corresponding emotion. Although much research has been done, recognizing basic and compound facial expressions with high accuracy remains difficult because of the complexity and variety of facial expressions.

Human beings generally convey emotions and intentions through nonverbal means such as facial expressions, gestures, and involuntary language. According to Darwin & Prodger (1998), a person's facial expressions reflect their emotional states and motives. Because of its importance in fields such as machine vision and machine learning, many attempts have been made to build effective FER systems, and various systems have been developed that integrate facial information. Because of its accessibility and efficiency, facial expression recognition has become one of the most important methods for detecting emotions. It is widely used in lie detection, medical assessment, driver safety, human-machine interaction, and other complex real-world problems.

Automatic facial expression recognition was first reported by Suwa (1978). Since then, many researchers have worked on developing robust and accurate FER methods. FER falls into two categories according to the environment in which the image is captured. The first is lab-controlled, with facial images that are usually frontal, posed, and under fixed illumination. The second is in the wild. In the early years, researchers focused on FER in the lab-controlled environment.

For lab-controlled FER, several databases have been proposed, such as FER2013 (Wolfram Research, 2018), CK+ (Lucey et al., 2010), MMI (Pantic et al., 2005), and JAFFE (Lyons et al., 1998). Recently, research attention has shifted from lab-controlled databases to real-world databases. Facial expression recognition in the wild is more challenging because the facial images have arbitrary illumination, non-frontal poses, and partial occlusions. From a research point of view, an FER system that must generalize to the real world is best studied in the wild, so researchers now focus on developing in-the-wild FER systems. To bridge the gap between lab-controlled and in-the-wild scenarios, several in-the-wild databases, such as RAF-DB (Li et al., 2017), RAF-ML (Li & Deng, 2019), and AffectNet (Fabian Benitez-Quiroz et al., 2016), have been collected and made available for research. Even as the use of in-the-wild data has increased, FER remains challenging because of non-frontal and partially occluded faces.

In the early 1970s, Ekman & Friesen (1971) defined six basic emotions based on a cross-cultural study: anger, disgust, fear, happiness, sadness, and surprise. The study showed that people perceive certain basic emotions in the same way regardless of culture. Most of the past works covered in the survey by Li & Deng (2020) focus on these six basic emotions plus neutral.

However, some facial expressions exhibited by humans are more complex and cannot be treated as a single basic emotion; these are known as compound emotions (Lindquist et al., 2016).

Recently, advanced research on FER has turned to compound emotions, with Gunes & Schuller (2013), Du et al. (2014), Guo et al. (2018), and Li & Deng (2019) describing a variety of compound emotions alongside the basic ones.

Key Terms in this Chapter

Deep Learning: Deep learning is a subset of machine learning that involves the use of artificial neural networks with multiple layers to process and analyze data. Deep learning algorithms are designed to automatically learn representations of data through the use of multiple layers of artificial neurons, each layer building upon the previous one to create more complex and abstract representations.

Compound Emotion: Compound emotions, also known as blended emotions, are emotional experiences that result from the simultaneous or sequential activation of two or more primary emotions. They can be thought of as complex emotional states that are composed of multiple emotions.

Feature Extraction: Feature extraction is the process of selecting and transforming raw data into a set of features, or measurable attributes, that are relevant and useful for a specific task or application. It is a common technique used in machine learning, computer vision, and signal processing to reduce the dimensionality and complexity of data while retaining important information.

Machine Learning: Machine learning is a subfield of artificial intelligence that focuses on building algorithms and models that can learn from and make predictions or decisions based on data, without being explicitly programmed. It involves the use of statistical and computational techniques to enable machines to learn from experience and improve their performance on a given task.

Confusion Matrix: A confusion matrix is a cross table that counts instances by their true (actual) class and their predicted class. By convention, rows show the actual classes and columns show the model's predictions. Correctly classified instances lie on the diagonal from top left to bottom right, where the actual and predicted classes agree. A confusion matrix is used to evaluate how accurately the predicted labels match the actual labels.
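
As a minimal sketch of this definition, the snippet below builds a confusion matrix with NumPy and reads the accuracy off the diagonal. The three class indices and the label lists are hypothetical values chosen for illustration; they are not taken from RAF-DB or from the chapter's experiments.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, num_classes):
    """Count samples per (actual, predicted) pair; rows = actual, columns = predicted."""
    cm = np.zeros((num_classes, num_classes), dtype=int)
    for actual, predicted in zip(y_true, y_pred):
        cm[actual, predicted] += 1
    return cm

def accuracy(cm):
    """Share of samples on the diagonal, i.e. correctly classified."""
    return np.trace(cm) / cm.sum()

# Hypothetical labels for illustration: 0 = happy, 1 = sad, 2 = surprised
y_true = [0, 0, 1, 1, 2, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0, 2]

cm = confusion_matrix(y_true, y_pred, num_classes=3)
print(cm)
print(accuracy(cm))
```

Off-diagonal cells show which classes are confused with each other; here one "happy" sample is predicted as "sad" (row 0, column 1).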

Facial Emotion Recognition: Facial emotion recognition (FER) is the technology that analyses facial expressions from sensors, static images, speech, videos, and other kinds of available data to reveal information about the emotional state of humans.

Generative Adversarial Network (GAN): A generative adversarial network trains models through a minimax two-player game between a generator G(z), which generates synthesised data by mapping latent variables z ~ p(z) to the data space, and a discriminator D(x), which assigns a probability y = D(x) ∈ [0, 1] that x is a real training sample, in order to distinguish real from fake input data.
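
The minimax game described in this definition is usually written as the value function below. This is the standard formulation from the GAN literature rather than an equation quoted from the chapter, and the symbol p_data (the distribution of real training samples) is an assumption introduced here for illustration.

```latex
\min_{G} \max_{D} \; V(D, G)
  = \mathbb{E}_{x \sim p_{\mathrm{data}}(x)} \big[ \log D(x) \big]
  + \mathbb{E}_{z \sim p(z)} \big[ \log \big( 1 - D(G(z)) \big) \big]
```

The discriminator D maximises V by pushing D(x) toward 1 on real samples and D(G(z)) toward 0 on generated ones, while the generator G minimises V by making D(G(z)) approach 1.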

Deep Convolutional Generative Adversarial Network (DC-GAN): The deep convolutional generative adversarial network is one of the most popular and successful GAN designs, consisting mainly of convolutional layers without any max pooling or fully connected layers. It uses strided convolutions for downsampling and transposed convolutions for upsampling.
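
To illustrate how stride replaces pooling, the sketch below computes the spatial output size of a strided convolution and of a transposed convolution using the usual size formulas (dilation 1, no output padding). The kernel size 4, stride 2, and padding 1 are assumed example settings, not values reported in the chapter.

```python
def conv_out(size, kernel, stride, padding):
    """Output side length of a strided convolution (dilation 1)."""
    return (size + 2 * padding - kernel) // stride + 1

def conv_transpose_out(size, kernel, stride, padding):
    """Output side length of a transposed convolution (dilation 1, no output padding)."""
    return (size - 1) * stride - 2 * padding + kernel

# Assumed settings for illustration: kernel 4, stride 2, padding 1
down = conv_out(64, kernel=4, stride=2, padding=1)            # discriminator-style downsampling
up = conv_transpose_out(down, kernel=4, stride=2, padding=1)  # generator-style upsampling
print(down, up)
```

With these settings a stride-2 convolution halves the side length and the matching transposed convolution doubles it again, which is how such a network can change resolution without max pooling.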

Learning Framework: A learning framework is a structured approach or methodology that provides a systematic process for designing, developing, delivering, and evaluating learning programs or activities. It outlines the key principles, strategies, and methods used to facilitate learning and ensure that learning objectives are achieved.

Convolutional Neural Network (CNN): A convolutional neural network (CNN) is a type of deep learning neural network that is specifically designed for image processing and computer vision tasks. CNNs are based on the idea of applying convolutional filters (also known as kernels or feature detectors) to the input image to extract meaningful features.
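
The idea of applying a filter to an image can be sketched in a few lines of NumPy. The function below slides a kernel over a single-channel image with stride 1 and no padding (computed as cross-correlation, which is what deep learning libraries commonly call convolution). The toy image and the difference kernel are hypothetical; a trained CNN learns its kernels from data.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide the kernel over the image (stride 1, no padding) and sum the products."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# Toy 4x4 image: dark left half, bright right half (a vertical edge)
image = np.array([[0, 0, 1, 1]] * 4, dtype=float)
# Hypothetical hand-made filter: difference between horizontal neighbours
kernel = np.array([[-1.0, 1.0]])

features = conv2d_valid(image, kernel)
print(features)
```

The resulting feature map is non-zero only in the column where the brightness changes, so this filter acts as a simple vertical-edge detector.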
