K-Means and DNN-Based Novel Approach to Human Identification in Low Resolution Thermal Imagery

Mohit Dua, Abhinav Mudgal, Mukesh Bhakar, Priyal Dhiman, Bhagoti Choudhary
DOI: 10.4018/978-1-7998-4444-0.ch002

Abstract

This chapter proposes a human detection system for thermal imagery that combines an unsupervised learning method, K-means clustering, with a deep learning detector, You Only Look Once (YOLO). Human detection is generally performed on visible-spectrum images, which are unsuitable at night because of low visibility. Long-wave infrared (LWIR) images are therefore used to implement and evaluate the proposed system. The system follows a two-step approach: anchor boxes are generated with K-means clustering and then used in a 252-layer single-shot detector (YOLO) to predict bounding boxes. The dataset, provided by FLIR, contains 6822 images for training and 757 images for validation. The proposed system can be used for real-time object detection, since YOLO achieves a much higher processing rate on LWIR imagery than traditional methods such as the HAAR cascade classifier.
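
The anchor-box step can be illustrated with a minimal sketch. It assumes that each ground-truth box is reduced to its (width, height) pair and that similarity is measured by intersection over union (IoU), the convention commonly used with YOLO; the chapter preview does not state the distance metric or the number of anchors, so the function names `iou_wh` and `kmeans_anchors` and the parameters `k`, `iters`, and `seed` are illustrative assumptions rather than the authors' implementation.

```python
import numpy as np

def iou_wh(boxes, centroids):
    """IoU between (N, 2) box sizes and (K, 2) centroid sizes.

    Boxes are compared as if they shared the same top-left corner,
    so only width and height matter (an assumption of this sketch).
    """
    inter_w = np.minimum(boxes[:, None, 0], centroids[None, :, 0])
    inter_h = np.minimum(boxes[:, None, 1], centroids[None, :, 1])
    inter = inter_w * inter_h
    area_b = (boxes[:, 0] * boxes[:, 1])[:, None]
    area_c = (centroids[:, 0] * centroids[:, 1])[None, :]
    return inter / (area_b + area_c - inter)

def kmeans_anchors(boxes, k, iters=100, seed=0):
    """Cluster (width, height) pairs into k anchor boxes.

    Each box is assigned to the centroid with the highest IoU,
    which is equivalent to minimising the distance 1 - IoU.
    """
    rng = np.random.default_rng(seed)
    # Initialise centroids from k distinct training boxes.
    centroids = boxes[rng.choice(len(boxes), size=k, replace=False)].astype(float)
    for _ in range(iters):
        assign = np.argmax(iou_wh(boxes, centroids), axis=1)
        new = centroids.copy()
        for j in range(k):
            members = boxes[assign == j]
            if len(members):  # keep the old centroid if a cluster is empty
                new[j] = members.mean(axis=0)
        if np.allclose(new, centroids):
            break
        centroids = new
    # Return anchors sorted by area, smallest first.
    return centroids[np.argsort(centroids[:, 0] * centroids[:, 1])]
```

Calling `kmeans_anchors(boxes, k)` on an array of labelled box sizes returns `k` representative (width, height) pairs, which would then be supplied to the detector as its anchor boxes.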

1. Introduction

With the increased interest in automation, a lot of work has been done on detecting humans and pedestrians for self-driving cars (autonomous vehicle systems), security surveillance systems, and autonomous search and rescue missions. Researchers have proposed many algorithms for human detection using RGB images (Spinello et al., 2011; Vondrick et al., 2013). However, achieving a high true positive rate on RGB images is still challenging because of low resolution, moving objects, changing backgrounds, and real-time requirements. Hence, infrared (IR) images are a better option than RGB images for pedestrian detection under such conditions. IR cameras capture a brightness intensity corresponding to the temperature and radiated heat of the objects in the scene, so they are less susceptible to illumination variations, occlusions, or background noise. Hence, the work proposed in this chapter uses an additional source of information: IR (thermal) images, captured through thermal cameras or sensors, which provide a grayscale mapping corresponding to each location in the scene.

A substantial amount of work has been done in the field of human detection (Davis et al., 2015; Spinello et al., 2011; Qi et al., 2014). The majority of this work tracks people in well-controlled surroundings, for example by detecting hotspots with Maximally Stable Extremal Regions (MSER) applied to Long-Wave Infrared (LWIR) images (Teutsch et al., 2014). MSER relies on the assumption that the temperature of the human body is much greater than that of the background. Under this assumption, MSER detects humans better than background subtraction or sliding window techniques. Human detection relies on the features observed for a person in an image. However, factors such as clothing, background, body temperature, and illumination also affect the appearance of the person, which directly affects the feature descriptor. Bin et al. (2014) proposed a Scattered Difference of Directional Gradients (SDDG) descriptor, which uses the local gradient distribution of a thermal image to describe an object in certain directions. SDDG has demonstrated performance comparable to that of Histogram of Oriented Gradients (HOG) and HAAR wavelets.

For real-time detection, a cluttered environment must also be considered. Real-time object detection has long been studied from many angles; for example, the Histogram of Oriented Gradients (HOG) technique combined with a Support Vector Machine (SVM) classifier is used in Kachouane et al. (2012). The idea behind HOG descriptors is that the distribution of gradient intensities and the direction of contours can describe the shape of an object. The image is divided into small connected regions, and a histogram of gradient directions is computed for each region. The combination of these histograms is the HOG descriptor. The final classification is done with a linear-kernel SVM classifier. HOG descriptors are invariant to geometric and photometric transformations, and hence this technique has shown good results in the field of robotics. An extension of the HOG approach used in Foong et al. (2006) has been proposed for thermal images by Rujikietgumjorn and Watcharapinchai (2017); it takes foreground segmentation and human shape similarity into account, significantly reduces processing time, and can run in real time on a Raspberry Pi embedded system while maintaining high accuracy (Foong et al., 2006).
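
The HOG computation described above can be sketched in a few lines. This is a simplified illustration, not the pipeline of any cited work: it assumes unsigned gradient orientations in [0, 180) degrees, a hypothetical cell size of 8 pixels with 9 bins, and a single global normalisation in place of the overlapping block normalisation used in full HOG implementations. The resulting vector is what a linear SVM would receive as input; the classifier itself is omitted.

```python
import numpy as np

def hog_descriptor(image, cell=8, bins=9):
    """Simplified HOG: per-cell orientation histograms, concatenated.

    `cell` and `bins` are illustrative defaults, not values from the chapter.
    """
    img = image.astype(float)
    # np.gradient returns derivatives along axis 0 (rows, y) then axis 1 (columns, x).
    gy, gx = np.gradient(img)
    magnitude = np.hypot(gx, gy)
    # Unsigned orientation in degrees, folded into [0, 180).
    angle = np.rad2deg(np.arctan2(gy, gx)) % 180.0
    bin_idx = np.minimum((angle / (180.0 / bins)).astype(int), bins - 1)

    rows, cols = img.shape[0] // cell, img.shape[1] // cell
    hist = np.zeros((rows, cols, bins))
    for r in range(rows):
        for c in range(cols):
            window = (slice(r * cell, (r + 1) * cell),
                      slice(c * cell, (c + 1) * cell))
            # Each pixel votes for its orientation bin, weighted by gradient magnitude.
            hist[r, c] = np.bincount(bin_idx[window].ravel(),
                                     weights=magnitude[window].ravel(),
                                     minlength=bins)
    descriptor = hist.ravel()
    # Global L2 normalisation (a simplification of block normalisation).
    return descriptor / (np.linalg.norm(descriptor) + 1e-9)
```

For a 16 x 16 image this yields 2 x 2 cells with 9 bins each, so the descriptor has 36 entries; larger detection windows produce proportionally longer vectors.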
