Multiple Feature Fusion in Particle Filter Framework for Visual Tracking

Singaravelan Shanmugasundaram, V. Selvakumar, S. Balaganesh, P. Gopalsamy, R. Arun
DOI: 10.4018/979-8-3693-2141-6.ch012

Abstract

Vision-based human activity recognition in smart homes has become a significant issue in the development of next-generation technologies. Recently, deep learning models, which automatically extract low-level to high-level features from input data instead of relying on complicated conventional feature extraction methods, have achieved significant improvements in the classification of large amounts of data, especially vision-based datasets. Therefore, in this study, a convolutional neural network (CNN) architecture is proposed as a deep learning model to recognize human actions in a smart home video dataset. Moreover, instead of using commonplace CNNs, a special CNN architecture has been designed to recognize human activity. Additionally, the performance of the proposed method has been compared with that of previously used methods on the same dataset.

1. Introduction

Tracking is a challenging task in computer vision because it requires generating a motion model of the target in a given sequence of images. The goal is to establish the trajectory of the target over the frame sequence as it moves along the image plane. This goal can be accomplished by using either a deterministic or a probabilistic filter for tracking. Two of the most popular probabilistic filters used in single-object state estimation are the Kalman filter and the particle filter. When the object state is assumed to evolve linearly with Gaussian noise, the Kalman filter can be used to estimate the state of the system. The Kalman filter also has lower computational requirements than the particle filter. However, the assumption that all state transitions are linear is very unlikely to hold in real-world scenarios. Hence, the particle filter can be used for state estimation of the object, owing to its ability to represent arbitrary state densities and not just Gaussian ones. Recently, there have been advances in Bayesian estimation approaches, in which a generic spatio-temporal Gaussian process has been proposed to track a non-rigid and irregular object. Liu et al. also used Bayesian estimation to track an extended target across a network of multiple sensors using a random matrix framework. Generating the motion model of the target is a daunting task if the target moves abruptly or becomes occluded. In most applications, the motion model is generated using either the conventional optical flow method or the SUVAT motion equations.
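As an illustration of the ideas above, the following is a minimal sketch of a bootstrap particle filter that tracks a one-dimensional position using the SUVAT relations s = ut + ½at² and v = u + at as the motion model. It is not the chapter's method: the function names, the Gaussian likelihood, the systematic resampling step, and all parameter values (number of particles, noise levels, initial velocity spread) are assumptions chosen for illustration only.

```python
import math
import random


def predict(particle, dt, accel_std, rng):
    """Propagate one (position, velocity) particle with the SUVAT equations."""
    s, u = particle
    a = rng.gauss(0.0, accel_std)  # random acceleration stands in for unknown target motion
    return (s + u * dt + 0.5 * a * dt * dt, u + a * dt)


def likelihood(z, s, meas_std):
    """Unnormalised Gaussian likelihood of measurement z given position s (an assumed model)."""
    return math.exp(-0.5 * ((z - s) / meas_std) ** 2)


def systematic_resample(particles, weights, rng):
    """Draw len(particles) samples in proportion to the normalised weights."""
    n = len(particles)
    start = rng.random() / n
    resampled = []
    cumulative = weights[0]
    i = 0
    for k in range(n):
        u = start + k / n
        while u > cumulative and i < n - 1:
            i += 1
            cumulative += weights[i]
        resampled.append(particles[i])
    return resampled


def particle_filter(measurements, n_particles=500, dt=1.0,
                    accel_std=0.5, meas_std=1.0, seed=0):
    """Return one position estimate (weighted particle mean) per measurement."""
    rng = random.Random(seed)
    # Initialise around the first measurement; the velocity spread is a guess.
    particles = [(rng.gauss(measurements[0], meas_std), rng.gauss(0.0, 2.0))
                 for _ in range(n_particles)]
    estimates = []
    for k, z in enumerate(measurements):
        if k > 0:
            particles = [predict(p, dt, accel_std, rng) for p in particles]
        weights = [likelihood(z, p[0], meas_std) for p in particles]
        total = sum(weights)
        if total == 0.0:
            # All particles are far from the measurement: fall back to uniform weights.
            weights = [1.0 / n_particles] * n_particles
        else:
            weights = [w / total for w in weights]
        estimates.append(sum(w * p[0] for w, p in zip(weights, particles)))
        particles = systematic_resample(particles, weights, rng)
    return estimates


if __name__ == "__main__":
    truth = [2.0 * k for k in range(30)]  # target moving at constant velocity
    noise = random.Random(1)
    zs = [s + noise.gauss(0.0, 0.5) for s in truth]
    for t, e in zip(truth[-3:], particle_filter(zs)[-3:]):
        print(f"true={t:.1f} estimate={e:.2f}")
```

Because the posterior is carried by a set of weighted samples rather than a mean and covariance, the same loop works unchanged if the Gaussian likelihood is replaced by a non-Gaussian one, which is the property that motivates the particle filter over the Kalman filter in the text.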

It is inevitable that the smart home will play a significant role in providing intelligent dwellings in the future. Smart home technology not only controls lighting, heating, electrical systems, and other domestic components, but can also recognize the activity of all residents. Moreover, in conjunction with recognizing the activity of occupants, it can use machine learning techniques to make decisions and prepare suitable devices and services based on the user's needs. Therefore, owing to the increasing demand for human activity recognition in security and health care, especially elderly and child care, it has become a noteworthy issue in recent years. Activity recognition methods fall into sensor-based and vision-based categories. Sensor-based methods, which rely on wearable and ambient sensors, are the traditional approach to human activity recognition in smart homes.

In both sensor-based approaches, data are collected and conveyed by sensors in order to analyze human motion. Because wearable sensors are attached to the body and ambient sensors are installed all around the home, they can be annoying for residents. In addition, sensors can produce noise and false alarms, which can lead to inaccurate results. Nevertheless, sensor-based methods can achieve satisfactory results in human activity recognition in smart homes. In recent years, vision-based methods have gained popularity in human activity recognition research as a way to overcome the shortcomings of sensors in collecting accurate and sufficient data. Moreover, vision-based methods can take advantage of diverse camera types to provide more accurate and adequate data than sensor-based methods.

In this regard, most vision-based methods have used traditional pattern recognition and machine learning methods to perform data classification and detection. Because traditional methods rely on complicated handcrafted techniques to extract features from video frames or images, vision-based human activity recognition is a complex process. Handcrafted local descriptors such as Histogram of Oriented Gradients (HOG) and Scale-Invariant Feature Transform (SIFT) can yield acceptable low-level features for some fixed datasets. However, since handcrafted features are tied to a particular dataset, extracting effective features from a new dataset and adapting the manually selected low-level features to new data and conditions is a challenging task. Nevertheless, there are significant studies that have used traditional pattern recognition and conventional machine learning methods for human action recognition on video datasets. Amiri et al. used conventional feature extraction methods, namely the Harris3D feature detector and the STIP feature descriptor, to extract spatiotemporal features from the DML Smart Actions dataset for activity recognition.
