One, Five, and Ten-Shot-Based Meta-Learning for Computationally Efficient Head Pose Estimation

Manoj Joshi, Dibakar Raj Pant, Jukka Heikkonen, Rajeev Kanth
DOI: 10.4018/IJERTCS.316877

Abstract

Many real-world applications rely on head pose estimation. Techniques such as convolutional neural networks (CNNs) have significantly improved head pose estimation performance. However, CNNs require a large amount of training data. This article presents a new framework for head pose estimation based on computationally efficient first-order model-agnostic meta-learning (FO-MAML) and compares its performance with existing MAML-based approaches. Experiments were conducted in one-shot, five-shot, and ten-shot settings using both MAML and FO-MAML. Using MAML, an average mean absolute error (MAEavg) of 7.72, 6.30, and 5.32 was achieved in predicting head pose in the one-, five-, and ten-shot settings, respectively. Using FO-MAML, the corresponding MAEavg values were 8.33, 6.84, and 6.23. The computational complexity of an outer-loop update is O(n²) for MAML and O(n) for FO-MAML.
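MAEavg is not formally defined in this section. In head pose work it commonly denotes the mean of the per-angle mean absolute errors over yaw, pitch, and roll. The following is a minimal sketch under that assumption; the function name `mae_avg` and the (N, 3) layout of angles in degrees are hypothetical, not taken from the article.

```python
import numpy as np


def mae_avg(pred_deg: np.ndarray, true_deg: np.ndarray):
    """Return per-angle MAE and their mean (assumed definition of MAEavg).

    Both arrays have shape (N, 3): columns are (yaw, pitch, roll) in degrees.
    The column layout and the averaging over three angles are assumptions.
    """
    if pred_deg.shape != true_deg.shape or pred_deg.shape[-1] != 3:
        raise ValueError("expected matching (N, 3) arrays")
    per_angle = np.mean(np.abs(pred_deg - true_deg), axis=0)  # MAE per angle, shape (3,)
    return per_angle, float(per_angle.mean())                 # MAEavg = mean of the three


# Usage: two illustrative predictions compared with ground truth (made-up numbers).
pred = np.array([[10.0, 5.0, 0.0], [20.0, -5.0, 2.0]])
true = np.array([[12.0, 5.0, 1.0], [18.0, -2.0, 2.0]])
per_angle, avg = mae_avg(pred, true)
print(per_angle, avg)  # per-angle MAE 2.0, 1.5, 0.5; MAEavg 4/3
```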

Introduction

In the last few years, significant advances have been made in computer vision, robotics, and human-machine interaction. With growing applications in gaze estimation, self-driving cars, and assistance for impaired people, a reliable head pose estimation framework has become important. Head pose estimation has been studied as a means of understanding how human attention works (Bergasa et al., 2008). It also supports applications such as analyzing human behavior and social interactions (Ba & Odobez, 2011). In driver assistance systems, head pose estimation is crucial for slowing down the vehicle when pedestrians are unaware of its presence (Geronimo, López, Sappa & Graf, 2010). Because of this significance, head pose estimation has been thoroughly investigated across various fields.

Head pose estimation plays a prominent role in use cases such as anomaly detection, surveillance, human-computer interaction (HCI), and understanding behavioral dynamics in crowds (Baxter, Leach, Mukherjee & Robertson, 2015). Extreme facial orientations, varying illumination and resolution, makeup, and hair covering the face make head pose difficult to predict. Traditional methods based on image processing techniques achieved some success; for example, Histogram of Oriented Gradients (HOG) methods successfully predicted head poses in images and videos (Tran & Lee, 2011). These traditional methods were founded on discriminative/landmark-based or parameterized appearance-based models. They worked well in estimating head pose but were neither flexible nor robust to extreme variations in head pose.

Convolutional neural networks (CNNs) have become a popular choice for estimating head pose (Patacchiola & Cangelosi, 2017) because of their effectiveness. However, their performance depends on the amount of well-annotated training data: the more annotated data available, the better a CNN performs. Capturing a large, well-annotated dataset is difficult in most cases. Moreover, although CNNs trained on large volumes of data predict head pose well, they lack generalization. A good head pose estimator should be data-efficient while performing on par with CNNs. It should also adapt to unseen faces and improve as more evidence of head pose features becomes available.

In recent years, few-shot learning techniques have become popular for settings where little data is available. Meta-learning-based techniques in particular have gained popularity, as they can be applied in few-shot settings and adapt well to unseen data (Sun, Liu, Chua & Schiele, 2018). These techniques use knowledge gained from previous experience to boost future performance. A meta-learner can learn a novel task from a limited training dataset and generalize to unseen tasks it encounters in the future. This learning paradigm is called learning-to-learn. Meta-learning can therefore offer better data and computational efficiency.
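To make the few-shot setting concrete, a meta-learning episode splits one task's data into a small support set, used for inner-loop adaptation, and a query set, used for the meta-loss. The following is a minimal sketch assuming each task is one subject's frames stored as feature vectors with (yaw, pitch, roll) targets. The function name `sample_episode`, the query-set size, and the placeholder data are hypothetical and not taken from the article.

```python
import numpy as np


def sample_episode(features, angles, k_shot, q_query, rng):
    """Split one task's data into a k-shot support set and a disjoint query set.

    Hypothetical layout: features is (N, D), angles is (N, 3) with
    (yaw, pitch, roll) targets, and one task = one subject's frames.
    """
    n = features.shape[0]
    if k_shot + q_query > n:
        raise ValueError("task has too few samples for the requested episode")
    idx = rng.permutation(n)                      # shuffle sample indices
    s_idx = idx[:k_shot]                          # k support samples (inner-loop adaptation)
    q_idx = idx[k_shot:k_shot + q_query]          # query samples (outer-loop meta-loss)
    return (features[s_idx], angles[s_idx]), (features[q_idx], angles[q_idx])


# Usage with placeholder data (not real head pose data).
rng = np.random.default_rng(0)
features = rng.normal(size=(50, 16))              # placeholder feature vectors
angles = rng.uniform(-60, 60, size=(50, 3))       # placeholder angles in degrees
for k in (1, 5, 10):                              # one-, five-, and ten-shot episodes
    (xs, ys), (xq, yq) = sample_episode(features, angles, k, 10, rng)
    print(k, xs.shape, xq.shape)
```

Drawing support and query indices from a single permutation guarantees that the two sets never share a sample.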

This article extends previous work (Joshi, Pant, Karn, Heikkonen & Kanth, 2022). It revisits the existing MAML-based approach and then proposes a novel approach based on computationally efficient first-order model-agnostic meta-learning (FO-MAML). The novel approach performs well in head pose estimation and is computationally more efficient. One-, five-, and ten-shot experiments were performed on the BIWI head pose dataset using MAML and FO-MAML, and the two approaches were compared in terms of accuracy and time complexity.
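MAML and FO-MAML differ in how they compute the outer-loop gradient. MAML differentiates through the inner update, which multiplies the query gradient by the n×n matrix I − αH, where H is the Hessian of the support loss. FO-MAML drops that term and uses the query gradient at the adapted weights directly. The following is a minimal sketch assuming a linear regression model with mean-squared-error loss and a single inner gradient step; the article does not specify these choices here. The names `mse_grad`, `mse_hessian`, and `meta_gradient`, the step sizes, and the random data are hypothetical. The sketch forms H explicitly so that the n×n term, and hence the O(n²) versus O(n) outer-loop cost, is visible.

```python
import numpy as np


def mse_grad(w, X, y):
    # Gradient of L(w) = mean((X @ w - y) ** 2) with respect to w.
    return 2.0 / X.shape[0] * X.T @ (X @ w - y)


def mse_hessian(X):
    # Hessian of the same loss; constant for a linear model, shape (n, n).
    return 2.0 / X.shape[0] * X.T @ X


def meta_gradient(w, support, query, alpha, first_order):
    """Outer-loop gradient for one task after a single inner SGD step."""
    Xs, ys = support
    Xq, yq = query
    w_adapted = w - alpha * mse_grad(w, Xs, ys)     # inner loop: adapt on support set
    g_query = mse_grad(w_adapted, Xq, yq)           # query-loss gradient at adapted weights
    if first_order:
        return g_query                              # FO-MAML: treat d(w_adapted)/dw as I, O(n)
    # MAML: chain rule through the inner step, d(w_adapted)/dw = I - alpha * H_support.
    jac = np.eye(w.size) - alpha * mse_hessian(Xs)  # explicit n x n matrix
    return jac.T @ g_query                          # n x n matrix-vector product, O(n^2)


# Usage: one meta-update over a small batch of random (hypothetical) tasks.
rng = np.random.default_rng(0)
n = 4
w = rng.normal(size=n)
tasks = [((rng.normal(size=(5, n)), rng.normal(size=5)),
          (rng.normal(size=(10, n)), rng.normal(size=10))) for _ in range(3)]
for first_order in (False, True):
    g = np.mean([meta_gradient(w, s, q, 0.1, first_order) for s, q in tasks], axis=0)
    w_new = w - 0.01 * g                            # outer-loop (meta) update, beta = 0.01
    print("FO-MAML" if first_order else "MAML", w_new)
```

With α = 0 the correction matrix reduces to the identity, and the two variants return the same gradient.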
