Introduction
Object recognition is an important research topic in computer vision, and work is progressing on many fronts. Important tasks include, for example, analysing the impact of training-image quality on recognition performance, securing the images used in recognition systems, and training recognition models when only limited training data is available. Studies such as (Alsmirat et al., 2019) analyse the impact of image quality on the recognition performance of a fingerprint-based biometric recognition system. In (Chuying et al., 2018), algorithms are proposed for securing images used in systems and devices. In this work, we analyse the problem of training a recognition model when only limited data is available.
Nowadays, many object recognition systems use 3D data instead of 2D. In such systems, the limited availability of 3D data makes it challenging to achieve satisfactory recognition performance. 3D data is used because recognition performance on it is significantly better than on 2D data. In face recognition, for example, 2D approaches are hindered by pose, expression, and illumination variations; these limitations are overcome with 3D data, since 3D-based approaches process all the information about the face geometry. Given the significance and wide application of 3D data in areas such as object recognition and biometrics, it becomes important to address the issues faced during the training of deep neural network models. Although 3D object recognition achieves high accuracy, collecting 3D data from objects is time-consuming, so relatively limited 3D data is available. With limited data, the model learns the details and noise of the few training samples so closely that its performance suffers when evaluated on new data. To avoid this overfitting, we must increase the variability of the 3D data by enlarging the database through data augmentation.

There are different ways to represent 3D data and input it to a model. Two common and popular representations of a 3D object are 3D voxels and point clouds. The 3D voxel representation is highly regular: a 3D object is represented by discretizing its volume, where the unit cubic volume is called a voxel. This representation has the advantage of simplifying weight sharing and other kernel optimizations. However, it is bulky, with sparse data spaces, and involves convolution operations that render it computationally and spatially expensive.
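The voxel discretization described above can be sketched briefly. The function below is an illustrative implementation, not taken from the paper: it maps each point of a cloud into a fixed-size binary occupancy grid, which shows both why the representation is regular and why it is sparse and memory-hungry.

```python
import numpy as np

def voxelize(points, resolution=32):
    """Discretize a point cloud into a binary occupancy grid.

    points: (N, 3) array of xyz coordinates.
    Returns a (resolution, resolution, resolution) boolean array in
    which a cell (voxel) is True if any point falls inside it.
    """
    # Normalize coordinates into the unit cube [0, 1].
    mins = points.min(axis=0)
    span = points.max(axis=0) - mins
    span[span == 0] = 1.0  # guard against degenerate (flat) axes
    unit = (points - mins) / span
    # Map to voxel indices; clip so points on the max face stay in range.
    idx = np.clip((unit * resolution).astype(int), 0, resolution - 1)
    grid = np.zeros((resolution,) * 3, dtype=bool)
    grid[idx[:, 0], idx[:, 1], idx[:, 2]] = True
    return grid

# A cloud of 10,000 points becomes a fixed-size 32x32x32 grid of
# 32,768 cells, most of them empty -- the sparsity noted in the text.
cloud = np.random.rand(10000, 3)
grid = voxelize(cloud, resolution=32)
print(grid.shape, int(grid.sum()))
```

Note that the memory cost grows cubically with `resolution`, regardless of how many points were actually scanned.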
Further, capturing fine structures requires a very high voxel resolution, which consumes a massive amount of memory. Point clouds, on the other hand, are the rawest form of 3D data and the direct outcome of the object scanning process. In a point cloud, a 3D object is represented by digitizing its surface into an unordered set of data points, which can be consumed directly as input by a deep neural network without first being transformed into a regular 3D representation such as voxels.

As stated above, the 3D input data for an object in point cloud form is an unordered set of 3D points. The original point cloud of an object typically contains a huge number of points; however, due to the computational and memory limitations of the system, we often cannot process the entire point cloud of a single sample. To mitigate this problem, the original point cloud is usually sub-sampled, and a reduced-size cloud is used for processing. In this process, however, the number of samples per subject remains the same as before sampling. We exploit sampling in a different way and propose its use for data augmentation, increasing the number of samples per subject. In this paper, we propose three sampling techniques for creating sub-samples from an original point cloud. We use the Iterative Closest Point (ICP) algorithm (Chetverikov et al., 2005; Procházková & Martišek, 2018; Wang & Zhao, 2017) to show that the samples created from the original data all carry the same information. Then, we use the Central Limit Theorem (CLT) (Heyde, 2014) to argue that the information carried by the sub-samples is the same as that carried by the original sample, that is, that they have the same discriminative power. Finally, we compare the three sampling techniques based on the results.
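The augmentation idea above can be sketched as follows. This is a minimal illustration using plain random sub-sampling; the function name is ours, and the paper's three actual sampling techniques are not specified here. The key point is that drawing several independent subsets turns one scanned sample into many training samples, rather than merely shrinking it once.

```python
import numpy as np

def augment_by_subsampling(cloud, n_points=1024, n_samples=8, seed=0):
    """Create several reduced-size clouds from one original point cloud.

    cloud: (N, 3) array, the original scanned point cloud (N >> n_points).
    Returns a list of n_samples arrays, each of shape (n_points, 3).
    Random sub-sampling shown here is one plausible scheme; the paper
    proposes and compares three sampling techniques.
    """
    rng = np.random.default_rng(seed)
    subs = []
    for _ in range(n_samples):
        # Draw n_points distinct indices from the full cloud.
        idx = rng.choice(len(cloud), size=n_points, replace=False)
        subs.append(cloud[idx])
    return subs

# One original cloud of 50,000 points yields 8 training samples of
# 1,024 points each -- augmentation, not just size reduction.
original = np.random.rand(50000, 3)
augmented = augment_by_subsampling(original, n_points=1024, n_samples=8)
print(len(augmented), augmented[0].shape)
```

Because each subset is drawn from the same underlying surface, the sub-samples are expected to carry the same discriminative information as the original, which is what the ICP and CLT analyses in the paper set out to verify.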