Introduction
Object recognition is an important research topic in computer vision, and work is progressing on many fronts. Important tasks include, for example, analysing the impact of training-image quality on recognition performance, securing the images used in recognition systems, and training recognition models when only limited training data is available. Studies such as (Alsmirat et al., 2019) analyse the impact of image quality on the recognition performance of a fingerprint-based biometric recognition system. In (Chuying et al., 2018), algorithms are proposed for securing images used in systems and devices. In this work, we analyse the problem of training a recognition model when only limited data is available.
Nowadays, many object recognition systems use 3D data instead of 2D. In such systems, the limited availability of 3D data makes it challenging to achieve satisfactory recognition performance. 3D data is used because recognition performance on it is significantly better than on 2D data. In face recognition, for example, 2D approaches are hindered by pose, expression, and illumination variations; these limitations are overcome with 3D data, since 3D-based approaches process all the information about the face geometry. Given the significance and wide application of 3D data in areas such as object recognition and biometrics, it becomes important to address the issues faced during the training of deep neural network models. Although 3D object recognition achieves high accuracy, collecting 3D data from objects is time-consuming, so relatively limited 3D data is available. With limited data, the model learns the details and noise of the few training samples so closely that its performance suffers when evaluated on new data. To avoid this overfitting, we must increase the variability of the 3D data by enlarging the database through data augmentation.

There are different ways to represent 3D data and input it to a model. Two common and popular representations of a 3D object are 3D voxels and point clouds. The 3D voxel representation is highly regular: a 3D object is represented by discretizing its volume, where the unit cubic volume is called a voxel. This representation has the advantage of simplifying weight sharing and other kernel optimizations. However, it is bulky, with sparse data spaces, and involves convolution operations that render it computationally and spatially expensive.
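The voxel discretization described above can be sketched briefly. The function below is an illustrative implementation, not taken from the paper: it maps each point of a cloud into a fixed-size binary occupancy grid, which shows both why the representation is regular and why it is sparse and memory-hungry.

```python
import numpy as np

def voxelize(points, resolution=32):
    """Discretize a point cloud into a binary occupancy grid.

    points: (N, 3) array of xyz coordinates.
    Returns a (resolution, resolution, resolution) boolean array in
    which a cell (voxel) is True if any point falls inside it.
    """
    # Normalize coordinates into the unit cube [0, 1].
    mins = points.min(axis=0)
    span = points.max(axis=0) - mins
    span[span == 0] = 1.0  # guard against degenerate (flat) axes
    unit = (points - mins) / span
    # Map to voxel indices; clip so points on the max face stay in range.
    idx = np.clip((unit * resolution).astype(int), 0, resolution - 1)
    grid = np.zeros((resolution,) * 3, dtype=bool)
    grid[idx[:, 0], idx[:, 1], idx[:, 2]] = True
    return grid

# A cloud of 10,000 points becomes a fixed-size 32x32x32 grid of
# 32,768 cells, most of them empty -- the sparsity noted in the text.
cloud = np.random.rand(10000, 3)
grid = voxelize(cloud, resolution=32)
print(grid.shape, int(grid.sum()))
```

Note that the memory cost grows cubically with `resolution`, regardless of how many points were actually scanned.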
Further, capturing fine structures requires a very high voxel resolution, which consumes a massive amount of memory. Point clouds, on the other hand, are the rawest form of 3D data and the direct outcome of the object scanning process. In a point cloud, a 3D object is represented by digitizing its surface into an unordered set of data points, which can be consumed directly as input by a deep neural network without first being transformed into a regular 3D representation such as voxels.

As stated above, the 3D input data for an object in point cloud form is an unordered set of 3D points. The original point cloud of an object typically contains a huge number of points; however, due to the computational and memory limitations of the system, we often cannot process the entire point cloud of a single sample. To mitigate this problem, the original point cloud is usually sub-sampled, and a reduced-size cloud is used for processing. In this process, however, the number of samples per subject remains the same as before sampling. We exploit sampling in a different way and propose its use for data augmentation, increasing the number of samples per subject. In this paper, we propose three sampling techniques for creating sub-samples from an original point cloud. We use the Iterative Closest Point (ICP) algorithm (Chetverikov et al., 2005; Procházková & Martišek, 2018; Wang & Zhao, 2017) to show that the samples created from the original data all carry the same information. Then, we use the Central Limit Theorem (CLT) (Heyde, 2014) to argue that the information carried by the sub-samples is the same as that carried by the original sample, that is, that they have the same discriminative power. Finally, we compare the three sampling techniques based on the results.
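The augmentation idea above can be sketched as follows. This is a minimal illustration using plain random sub-sampling; the function name is ours, and the paper's three actual sampling techniques are not specified here. The key point is that drawing several independent subsets turns one scanned sample into many training samples, rather than merely shrinking it once.

```python
import numpy as np

def augment_by_subsampling(cloud, n_points=1024, n_samples=8, seed=0):
    """Create several reduced-size clouds from one original point cloud.

    cloud: (N, 3) array, the original scanned point cloud (N >> n_points).
    Returns a list of n_samples arrays, each of shape (n_points, 3).
    Random sub-sampling shown here is one plausible scheme; the paper
    proposes and compares three sampling techniques.
    """
    rng = np.random.default_rng(seed)
    subs = []
    for _ in range(n_samples):
        # Draw n_points distinct indices from the full cloud.
        idx = rng.choice(len(cloud), size=n_points, replace=False)
        subs.append(cloud[idx])
    return subs

# One original cloud of 50,000 points yields 8 training samples of
# 1,024 points each -- augmentation, not just size reduction.
original = np.random.rand(50000, 3)
augmented = augment_by_subsampling(original, n_points=1024, n_samples=8)
print(len(augmented), augmented[0].shape)
```

Because each subset is drawn from the same underlying surface, the sub-samples are expected to carry the same discriminative information as the original, which is what the ICP and CLT analyses in the paper set out to verify.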