1. Introduction
Image steganography is an important covert communication technology that conceals secret messages in images through slight changes to pixel values or DCT coefficients. Currently, the most secure steganographic algorithms are content-adaptive ones, such as HUGO (Pevný et al., 2010), UNIWARD (Holub & Fridrich, 2014), and WOW (Holub & Fridrich, 2012). They tend to hide the secret data in regions of complicated texture and show excellent resistance to detection.
With the rapid development of image steganography, steganalysis, which aims to detect the existence of hidden messages in images, has also made great progress. The popular methods extract features that help to reveal the presence of a hidden message and then design a suitable classifier to separate cover images from stego images. To improve detection performance, the feature dimensionality keeps increasing. The state-of-the-art feature sets are the Spatial Rich Model (SRM) (Fridrich & Kodovský, 2012) and its variants (Holub & Fridrich, 2013; Denemark et al., 2014), which may contain more than 30,000 features. For such large-scale, high-dimensional training sets, ensemble classifiers, such as the Fisher Linear Discriminant (FLD) ensemble classifier (Kodovský et al., 2012), have been shown to be successful. Later, Cogranne et al. (2015) demonstrated that a simple, well-regularized FLD or a ridge regression can achieve performance comparable to that of the ensemble classifiers. In addition, they showed that ridge regression implemented with LSMR achieves almost the same detection accuracy as an ensemble classifier with a computational time up to 10 times smaller.
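To make the ridge regression approach concrete, the following is a minimal sketch of fitting such a classifier with an iterative LSMR solver. It is not the implementation of Cogranne et al. It assumes SciPy's `scipy.sparse.linalg.lsmr`, labels coded as -1 (cover) and +1 (stego), roughly centered features with no bias term, and synthetic stand-in data. The names `fit_ridge_lsmr`, `predict`, `lam`, and the toy dimensions are illustrative only.

```python
import numpy as np
from scipy.sparse.linalg import lsmr


def fit_ridge_lsmr(X, y, lam=1.0):
    """Solve min ||X b - y||^2 + lam * ||b||^2 with LSMR.

    LSMR's `damp` argument adds damp^2 * ||b||^2 to the least-squares
    objective, so damp = sqrt(lam) yields the ridge penalty.
    """
    result = lsmr(X, y, damp=np.sqrt(lam), atol=1e-10, btol=1e-10, maxiter=1000)
    return result[0]  # the first element of the returned tuple is the solution


def predict(X, b):
    """Label a sample +1 (stego) if its projection is non-negative, else -1."""
    return np.where(X @ b >= 0, 1, -1)


# Synthetic stand-in for cover/stego features (assumption: not real SRM data).
rng = np.random.default_rng(0)
n_per_class, dim = 200, 50
shift = 0.5 * np.ones(dim)
X = np.vstack([rng.normal(size=(n_per_class, dim)) - shift,
               rng.normal(size=(n_per_class, dim)) + shift])
y = np.concatenate([-np.ones(n_per_class), np.ones(n_per_class)])

b = fit_ridge_lsmr(X, y, lam=1.0)
accuracy = np.mean(predict(X, b) == y)
```

Because LSMR only needs matrix-vector products with the feature matrix, it never forms the feature covariance matrix explicitly, which is one plausible reason it scales well to tens of thousands of features.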
However, the widely used linear Support Vector Machine (SVM) and Gaussian SVM have become difficult to train since the advent of rich media models (Kodovský et al., 2012). Compared with the FLD and ridge regression, the major difficulty in training these standard SVMs is that solving the linear or quadratic program involved requires considerably long computational time. In addition, besides the regularization parameter that all of these machine learning algorithms share, the Gaussian SVM also needs to search for an optimal kernel parameter during training, which is time consuming.
Quite different from the standard SVMs, the linear Proximal Support Vector Machine (PSVM) (Fung & Mangasarian, 2001), which was proposed based on the much more generic regularization networks (Evgeniou et al., 2000), can be implemented quickly without extensive computation. The linear PSVM separates two classes of data points through proximal hyperplanes with the maximum margin. The strong convexity of the formulation leads to simple proximal code, which is not always the case for the standard SVMs. Motivated by the PSVM, a simplified nonlinear method referred to as the Extreme Learning Machine (ELM) (Huang et al., 2012) has been presented for learning single-hidden-layer feedforward neural networks. ELM can handle nonlinear feature construction through the ELM kernel matrix without the selection of parameters (Huang et al., 2015).
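The simplicity of the linear PSVM can be illustrated with a short sketch. In the formulation of Fung & Mangasarian (2001), training reduces to solving a single linear system of size (n + 1) for n features, instead of a quadratic program. The code below is a hedged illustration under stated assumptions: it uses NumPy, synthetic two-class data rather than steganalysis features, and an arbitrarily chosen regularization value. The function names `train_linear_psvm` and `predict_linear_psvm` and the parameter `nu` are hypothetical.

```python
import numpy as np


def train_linear_psvm(A, d, nu=1.0):
    """Fit a linear PSVM in closed form.

    A  : (m, n) feature matrix
    d  : (m,) labels in {-1, +1}
    nu : regularization weight on the squared error term

    Solves (I / nu + E^T E) [w; gamma] = E^T D e with E = [A, -e],
    where D = diag(d) and e is the all-ones vector, so E^T D e = E^T d.
    """
    m, n = A.shape
    E = np.hstack([A, -np.ones((m, 1))])
    H = np.eye(n + 1) / nu + E.T @ E
    z = np.linalg.solve(H, E.T @ d)
    return z[:n], z[n]  # weight vector w and offset gamma


def predict_linear_psvm(A, w, gamma):
    """Classify by the side of the plane x^T w - gamma = 0."""
    return np.where(A @ w - gamma >= 0, 1, -1)


# Synthetic two-class data (assumption: stands in for cover/stego features).
rng = np.random.default_rng(1)
m_per_class, n_features = 100, 5
A = np.vstack([rng.normal(loc=-2.0, size=(m_per_class, n_features)),
               rng.normal(loc=+2.0, size=(m_per_class, n_features))])
d = np.concatenate([-np.ones(m_per_class), np.ones(m_per_class)])

w, gamma = train_linear_psvm(A, d, nu=1.0)
train_accuracy = np.mean(predict_linear_psvm(A, w, gamma) == d)
```

The only tunable quantity in this sketch is `nu`, and the system matrix depends on the number of features rather than the number of training samples, which is what keeps training cheap for large training sets.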