Virtual Sample Generation and Ensemble Learning Based Image Source Identification With Small Training Samples

Shiqi Wu, Bo Wang, Jianxiang Zhao, Mengnan Zhao, Kun Zhong, Yanqing Guo
Copyright: © 2021 | Pages: 13
DOI: 10.4018/IJDCF.20210501.oa3

Abstract

Source camera identification, which aims to determine the camera that captured a given image, is an important task in digital forensics. A problem that cannot be ignored is that existing methods become unreliable, or fail altogether, when only a small number of training samples is available. To address this problem, this paper proposes a method based on virtual sample generation combined with ensemble learning. After constructing subsets of LBP features, the authors generate virtual samples using the mega-trend-diffusion (MTD) method, which calculates the diffusion range of the samples according to trend diffusion theory and then randomly generates virtual samples from a uniform distribution within this range. On the classifier side, an ensemble learning scheme trains multiple SVM-based classifiers to improve the accuracy of image source identification. Experimental results demonstrate that, when only a small number of samples is used as the training set, the proposed method achieves higher average accuracy than state-of-the-art methods.

Introduction

Digital images are now easy to produce and share, which makes it possible for some individuals to upload inappropriate images for their own interests or to steal others' images for commercial purposes. Image source identification is therefore very important in the judicial field, where it can help bring offenders to justice. The task is usually modeled as a classification problem, which means that decent results can be expected when enough training samples are available. However, it is well known that obtaining a sufficiently large number of training samples can be very difficult, and classifiers perform very poorly when training samples are scarce. It therefore remains a major challenge in practical forensic applications when only a small set of labeled images is available as reference.

In recent years, many methods have been proposed for the small training sample problem. They fall mainly into three categories. The first category comprises methods based on active learning and semi-supervised learning; these usually require a large number of unlabeled samples as auxiliary information, which is sometimes unrealistic in practical forensic applications. The second category comprises methods based on the gray prediction model, such as BGM (Chang, Li, Huang, & Chen, 2015), GBM (Wang, Wang, Sun, & Zhang, 2014), and ANGM (Chang, Li, & Chen, 2014), which operate on the raw samples. However, these methods usually ignore the internal mechanism, which makes the generated virtual samples unsuitable. The third category consists of methods based on virtual sample generation, an idea proposed by Poggio and Vetter (Poggio & Vetter, 1992). When training samples are insufficient, appropriate virtual samples are generated from the prior information of the training samples to increase their number. Adding the virtual samples expands the training set and is expected to improve the generalization ability of the classifier.

There has also been considerable research on virtual sample generation in recent years. To improve energy prediction accuracy in the small training sample setting, He et al. (He, Wang, Zhang, Zhu, & Xu, 2018) propose a nonlinear interpolation virtual sample generation method based on the highly nonlinear characteristics of the input and output data. After virtual sample generation, the images are classified by the extreme learning machine (ELM) (Huang, Zhu, & Siew, 2004), and the experimental results are promising. Li and Fang (Li & Fang, 2009) propose a nonlinear virtual sample generation technique (NVSG) and report an average classification accuracy of 76% for camera models in the Iris data set. Virtual sample generation methods based on the distribution of the original samples are also widely used. Yang et al. (Yang, Yu, Xie, & Zhang, 2011) assume that the samples obey a Gaussian distribution and calculate the mean and variance of this distribution from the original training set. Experiments on the Iris data set show that the classification accuracy increases by 18%.
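The distribution-based idea described above can be illustrated with a minimal Python sketch. It is not the implementation of Yang et al.; it only assumes that each attribute is modeled by an independent Gaussian whose mean and standard deviation are estimated from the original samples. The function name `gaussian_virtual_samples` and its parameters are hypothetical.

```python
import random
import statistics


def gaussian_virtual_samples(samples, n_virtual, seed=0):
    """Draw virtual samples from per-attribute Gaussians fitted to `samples`.

    Assumption (not from the paper): attributes are treated as independent,
    so only a mean and a standard deviation are estimated per attribute.
    """
    rng = random.Random(seed)
    columns = list(zip(*samples))  # one tuple of values per attribute
    means = [statistics.fmean(col) for col in columns]
    # The sample standard deviation needs at least two values per attribute.
    stdevs = [statistics.stdev(col) if len(col) > 1 else 0.0 for col in columns]
    return [
        [rng.gauss(mu, sigma) for mu, sigma in zip(means, stdevs)]
        for _ in range(n_virtual)
    ]
```

The generated vectors would then be appended to the original training set with the label of the class they were fitted on. A full covariance estimate could replace the independence assumption when enough samples are available.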

In this paper, an MTD-based virtual sample generation method is introduced to identify the image source when only a small number of training samples is available. Using box-plot-based MTD together with a method based on the correlation between sample attributes, a reasonable range for virtual sample generation is obtained, and virtual samples are generated within it according to a uniform distribution. Because virtual sample generation is random, multiple groups of virtual samples are generated and combined with the original training samples. Multiple weak SVM-based classifiers are then trained and integrated to obtain the final classifier.
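The pipeline outlined in this paragraph can be sketched in Python under several loudly stated assumptions. The bounds in `mtd_range` follow a commonly used formulation of MTD, not the box-plot variant or the attribute-correlation step of this paper, which are not reproduced here. Virtual samples are drawn per class and per attribute, the base learner is a nearest-centroid stand-in rather than an SVM (an SVM such as scikit-learn's `SVC` could be substituted), and integration by majority vote is assumed. All function names are hypothetical.

```python
import math
import random
import statistics
from collections import Counter


def mtd_range(values, eps=1e-20):
    """Estimate a diffusion range (lower, upper) for one attribute."""
    lo, hi = min(values), max(values)
    center = (lo + hi) / 2.0
    n_l = sum(1 for v in values if v < center)
    n_u = sum(1 for v in values if v > center)
    if n_l == 0 or n_u == 0:  # all values identical: nothing to diffuse
        return lo, hi
    spread = -2.0 * statistics.variance(values) * math.log(eps)
    skew_l = n_l / (n_l + n_u)
    skew_u = n_u / (n_l + n_u)
    lower = center - skew_l * math.sqrt(spread / n_l)
    upper = center + skew_u * math.sqrt(spread / n_u)
    # The range is never allowed to exclude an observed value.
    return min(lower, lo), max(upper, hi)


def generate_virtual(samples, n_virtual, rng):
    """Draw vectors whose attributes are uniform within their MTD ranges."""
    ranges = [mtd_range(col) for col in zip(*samples)]
    return [[rng.uniform(lo, hi) for lo, hi in ranges] for _ in range(n_virtual)]


def train_centroid(X, y):
    """Stand-in base learner (NOT an SVM): nearest class centroid."""
    groups = {}
    for x, label in zip(X, y):
        groups.setdefault(label, []).append(x)
    centroids = {
        c: [statistics.fmean(col) for col in zip(*rows)]
        for c, rows in groups.items()
    }
    return lambda x: min(centroids, key=lambda c: math.dist(x, centroids[c]))


def train_ensemble(X, y, n_models=5, n_virtual=20, seed=0):
    """Train one base learner per virtual-sample group; vote at prediction."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        X_aug, y_aug = list(X), list(y)
        for c in sorted(set(y)):
            rows = [x for x, label in zip(X, y) if label == c]
            X_aug.extend(generate_virtual(rows, n_virtual, rng))
            y_aug.extend([c] * n_virtual)
        models.append(train_centroid(X_aug, y_aug))

    def predict(x):
        votes = Counter(model(x) for model in models)
        return votes.most_common(1)[0][0]

    return predict
```

Calling `train_ensemble(X, y)` with a list of feature vectors `X` and a list of labels `y` returns a `predict` function for a single feature vector. Each ensemble member sees the same original samples but a different random group of virtual samples, which is what gives the members their diversity.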

The rest of this paper is organized as follows. Section 2 describes related work on LBP features and virtual sample generation methods. Section 3 presents the proposed method based on virtual sample generation and ensemble learning. Section 4 describes the experimental design and discusses the results. Finally, Section 5 concludes the paper.