Silhouette Pose Feature-Based Human Action Classification Using Capsule Network


A. F. M. Saifuddin Saif, Md. Akib Shahriar Khan, Abir Mohammad Hadi, Rahul Proshad Karmoker, Joy Julian Gomes
Copyright: © 2021 | Pages: 19
DOI: 10.4018/JITR.2021040106

Abstract

Recent years have seen a rise in the use of various machine learning techniques in computer vision, particularly in pose feature-based human action recognition, which includes convolutional neural networks (CNN) and recurrent neural networks (RNN). CNN-based methods are useful in recognizing human actions for combined motions (i.e., standing up, hand shaking, walking). However, under uncertainty from camera motion, occlusion, and multiple people, CNNs suppress important feature information and are not efficient enough to recognize variations in human action. Besides, an RNN with long short-term memory (LSTM) requires more computational power to retain memories when classifying human actions. This research proposes an extended framework based on a capsule network that uses silhouette pose features to recognize human actions. The proposed extended framework achieved a high accuracy of 95.64%, which is higher than that of previous research methodologies. Extensive experimental validation of the proposed extended framework reveals an efficiency that is expected to contribute significantly to action recognition research.

Introduction

Human action recognition has been one of the most studied problems in the computer vision research community. In previous research, action recognition has been addressed as an extended problem comprising spatial features, temporal features, and motion modeling, and numerous studies have been performed to solve these sub-problems. Spatial features include appearance, pose, background, and foreground information. Pose feature-based action recognition has been addressed both through hardcoded feature extractors and, more recently, through skeleton information-based descriptors. In this context, Zhou, Shi, and Wu (2015) used hardcoded features following the Trajectory, Histogram of Gradients, Histogram of Optical Flow, and Motion Boundary Histogram methods to jointly learn spatial and temporal extents to recognize human sports actions. Datta, Shah, and Lobo (2002) also used trajectory data to classify fights in surveillance footage, where a person's limbs were identified and tracked successfully. Common problem factors for any hardcoded feature extractor are the strong variations of people and background scenes in motion and appearance. This research proposes an extended framework for pose feature-based human action classification that overcomes shortcomings of traditional convolutional and recurrent neural networks.

Every human action contains some distinct but subtle pose features; e.g., boxing has stance, standing, and jumping as collective pose features (Borges, Conci, and Cavallaro, 2013; Jin, Do, Liu, and Kim, 2018). Recent years have seen a rise in the use of various machine learning techniques in computer vision, particularly in pose-based human action recognition, including Convolutional Neural Networks (CNN), Recurrent Neural Networks (RNN), and deep learning in general (Angelini, Fu, Long, Shao, and Naqvi, 2018). CNN-based methods are very successful at recognizing combined motions, e.g., standing up, hand shaking, and walking. However, under uncertainty from camera motion, occlusion, and multiple people, these approaches lose a large amount of information and may not be efficient enough to recognize variations. Previously, Jégou, Drozdzal, Vazquez, Romero, and Bengio (2017) and Karpathy et al. (2014) utilized convolutional neural networks, which yielded much better results for human-centered segmentation and classification tasks. In addition, Chen, Wu, Konrad, and Ishwar (2017) used a CNN to analyze spatial and temporal information in two separate streams to classify actions. However, CNNs have drawbacks of their own arising from their pooling strategies (Zeiler and Fergus, 2013). Convolutional Neural Networks use the max-pooling technique, which suppresses important feature information, and they also require large amounts of training data. On the other hand, RNNs with Long Short-Term Memory (LSTM) need more computational power to retain memories. Cui, Hua, Zhu, Wu, and Liu (2019) proposed a model connecting an LSTM and a CNN sequentially to implement identification and action recognition, using geometric features of skeletal information extracted from global, local, and detailed feature-related data to represent actions.
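The information loss caused by max-pooling can be illustrated with a minimal NumPy sketch (a hypothetical toy example, not taken from any of the cited works): two feature maps whose single strong response falls at different positions inside the same pooling window become indistinguishable after pooling, so the exact location of the feature is discarded.

```python
import numpy as np

def max_pool_2x2(x):
    # Non-overlapping 2x2 max-pooling on a 4x4 map:
    # reshape into 2x2 windows, then reduce each window to its maximum.
    return x.reshape(2, 2, 2, 2).max(axis=(1, 3))

# Two 4x4 activation maps: the strong response sits at different
# positions inside the same top-left 2x2 pooling window.
a = np.zeros((4, 4)); a[0, 0] = 1.0
b = np.zeros((4, 4)); b[1, 1] = 1.0

# Both maps pool to the identical 2x2 output, so the positional
# information inside each window is lost.
print(np.array_equal(max_pool_2x2(a), max_pool_2x2(b)))  # True
```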
However, due to the drawback of requiring large training data and the disadvantages of CNN pooling strategies, the model proposed by Cui, Hua, Zhu, Wu, and Liu (2019) cannot be considered a reliable investigation for action recognition research (Zeiler and Fergus, 2013). The problem with pooling strategies such as max-pooling (Hinton, Srivastava, Krizhevsky, Sutskever, and Salakhutdinov, 2012) remained unanswered until Sabour, Frosst, and Hinton (2017) introduced the Capsule Network architecture with its dynamic routing method. This newly established network uses several nested capsule layers that generate feature vectors from segmented image features in a frame. These feature vectors are invariant to the orientation and magnitude of the pose features of an entity. The network uses a vector squashing activation function to normalize the output of the primary capsule layers, which mitigates the problems that arise from pooling strategies in conventional CNNs. This research proposes an extended framework for human action classification that combines a silhouette pose feature-based preprocessing algorithm with a Capsule Network.
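The vector squashing activation of Sabour, Frosst, and Hinton (2017), v = (||s||² / (1 + ||s||²)) · (s / ||s||), can be sketched as follows. This is a minimal NumPy version for illustration only, not the authors' or this paper's implementation: it preserves a capsule vector's orientation while compressing its length into [0, 1), so the length can serve as a presence probability.

```python
import numpy as np

def squash(s, eps=1e-8):
    # v = (||s||^2 / (1 + ||s||^2)) * (s / ||s||)
    # Long vectors approach unit length; short vectors shrink toward zero.
    # eps guards against division by zero for the all-zero vector.
    sq_norm = np.sum(s ** 2, axis=-1, keepdims=True)
    return (sq_norm / (1.0 + sq_norm)) * s / np.sqrt(sq_norm + eps)

long_vec = squash(np.array([3.0, 4.0]))   # ||s|| = 5 -> length 25/26 ~ 0.96
short_vec = squash(np.array([0.1, 0.0]))  # short inputs shrink toward zero
```

Note that, unlike max-pooling, this nonlinearity rescales the whole vector rather than discarding components, which is why the pose information encoded in the vector's direction survives.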

The rest of this paper is organized as follows: the Background study section presents critical reviews of previous research to justify the problems addressed by this research; the Proposed research methodology section gives an extensive illustration of the extended framework using the Capsule Network; the Experimental results section presents comprehensive validation of the proposed extended framework; and finally, the Conclusion section presents concluding remarks.
