Introduction
Facial expressions are a primary means by which human beings communicate emotions and intentions; the basic expressions are anger, disgust, fear, happiness, sadness, and surprise. Facial expression recognition (FER) is a useful task that serves as a basis for many applications in computer vision and pattern recognition, for example video surveillance, patient monitoring in the medical field, identification of people's opinions in marketing, sentiment analysis, advanced driver assistance systems (ADAS), and e-learning. Numerous FER algorithms have been proposed in the literature over the past several years. The goal of FER is to identify the facial expression of a human being; an expression can be conveyed either through the face or through verbal communication. FER is a difficult problem due to the complexity and variety of faces (Hsu, Chen, & Huang, 2015). To recognize facial expressions, it is necessary to compute descriptors that represent them. However, factors such as lighting variations, pose, alignment, and occlusion make faces inherently variable. Moreover, expressions are differentiated by very fine variations in muscle movements, so extracting local features to represent them is a critical task. Expression recognition therefore remains difficult.
The features extracted from facial expression images should minimize expression variation within a class and maximize variation between classes. Recognizing facial expressions with high precision remains difficult because of their subtlety, complexity, and variability. To overcome this problem, an efficient method for capturing the spatial information of facial expressions is needed. One such method is the Hierarchical Spatial Pyramid (HSP) representation, which encodes the spatial information of facial expressions in a pyramid structure and yields good recognition results.
Lazebnik, Schmid, and Ponce (2006) proposed to subdivide the image into four quadrants, each of which is subdivided into four further quadrants, and so on. The purpose of these subdivisions is to take the spatial layout of features in the image into account, and the results obtained are encouraging. However, this method does not consider the relationships between features across the spatial layout of different quadrants, even though these relationships can provide relevant information about the image. This motivated a new approach to remedy the problem: the image is split in height at several levels and then, in the same way, in width. To further improve facial expression recognition, we propose to use the Local Binary Patterns (LBP) descriptor to extract local features. The idea behind combining LBP with the hierarchical spatial pyramid is to more easily account for variations in texture resolution.
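The quadrant subdivision of Lazebnik et al. can be sketched as follows. This is an illustrative implementation, not the authors' code; the function name `pyramid_regions` and the choice of returning `(y0, y1, x0, x1)` cell bounds are our own assumptions.

```python
def pyramid_regions(h, w, levels):
    """Quadrant-style spatial pyramid (Lazebnik et al., 2006 scheme):
    at level l the image is split into a 2^l x 2^l grid of cells.
    Returns a list of (y0, y1, x0, x1) bounds, coarsest level first.
    Illustrative sketch only -- names and layout are assumptions."""
    regions = []
    for l in range(levels):
        n = 2 ** l
        # cell boundaries; rounding makes the cells tile the image exactly
        ys = [round(i * h / n) for i in range(n + 1)]
        xs = [round(j * w / n) for j in range(n + 1)]
        for i in range(n):
            for j in range(n):
                regions.append((ys[i], ys[i + 1], xs[j], xs[j + 1]))
    return regions
```

With three levels this produces 1 + 4 + 16 = 21 cells; a local descriptor computed in each cell is then concatenated into one vector, which is how the pyramid preserves spatial layout.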
The LBP descriptor was proposed by Ojala, Pietikäinen, and Harwood (1996) and has emerged as one of the most important texture analysis descriptors; it has become increasingly popular for face recognition. LBP is simple, computationally efficient, and robust, and it extracts local features that are resilient to luminance variations. For these reasons, several researchers have chosen this descriptor to represent the characteristics of facial expression images (Yaddaden, Adda, & Bouzouane, 2021; Lakshmi & Ponnusamy, 2021; Huang, Ardabilian, Wang, & Chen, 2012; Wang et al., 2020). The principle of the LBP descriptor is to study the relationship between a pixel and its neighbors, building a binary description of the pixel from the information in its neighborhood. High-dimensional extracted features are increasingly common in FER.
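The basic 3x3 LBP operator described above can be sketched as follows: each pixel is compared with its eight neighbors, and the eight comparison results form an 8-bit code. This is a minimal vectorized sketch, not the exact formulation used in any of the cited works; the function names are assumptions.

```python
import numpy as np

def lbp_image(gray):
    """Basic 3x3 LBP: each interior pixel is encoded as an 8-bit code
    where bit k is 1 iff the k-th neighbor is >= the center pixel.
    Illustrative sketch only."""
    # 8 neighbor offsets, enumerated clockwise from the top-left
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    h, w = gray.shape
    center = gray[1:h - 1, 1:w - 1]
    codes = np.zeros((h - 2, w - 2), dtype=np.int32)
    for bit, (dy, dx) in enumerate(offsets):
        neighbor = gray[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        codes |= (neighbor >= center).astype(np.int32) << bit
    return codes.astype(np.uint8)

def lbp_histogram(gray):
    """256-bin normalized histogram of LBP codes: the texture feature
    typically used to represent a region."""
    hist = np.bincount(lbp_image(gray).ravel(), minlength=256).astype(float)
    return hist / hist.sum()
```

Because the codes depend only on the sign of local intensity differences, a monotonic change in luminance leaves them unchanged, which is the robustness property mentioned above.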
Our goal in this paper is to collect the maximum amount of information that can represent the facial expressions in an image using HSP combined with LBP. The features extracted in this way are large; the proposed method both extracts features and captures the spatial information of the image. At the same time, we remain committed to developing robust and geometrically invariant representations of facial expressions. Reducing this dimensionality considerably, and in an interpretable way, is therefore a critical task: the difficulty lies in preserving most of the information contained in the features.
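To make the dimensionality problem concrete, a back-of-the-envelope count helps: if a 256-bin LBP histogram is computed in every cell of a quadrant pyramid, the descriptor length grows quickly with the number of levels. The figures below are illustrative arithmetic for the quadrant scheme, not the dimensions reported for the proposed height/width splitting.

```python
def pyramid_lbp_dims(levels, bins=256):
    """Total descriptor length when a `bins`-bin LBP histogram is taken
    in every cell of a quadrant pyramid (1 + 4 + 16 + ... cells).
    Illustrative arithmetic only."""
    cells = sum(4 ** l for l in range(levels))
    return cells, cells * bins

# e.g. 3 levels -> 21 cells -> 21 * 256 = 5376 dimensions,
#      4 levels -> 85 cells -> 85 * 256 = 21760 dimensions
```

Even a modest pyramid depth thus yields thousands of dimensions, which is why interpretable dimensionality reduction is the critical step discussed above.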