Steganalysis classifier training via minimizing sensitivity for different imaging sources

INS 10897, No. of Pages 14, Model 3G, 2 June 2014
Information Sciences xxx (2014) xxx–xxx
Contents lists available at ScienceDirect
Information Sciences, journal homepage: www.elsevier.com/locate/ins
Wing W.Y. Ng, Zhi-Min He *, Daniel S. Yeung, Patrick P.K. Chan
School of Computer Science and Engineering, South China University of Technology, Guangzhou, China

Article info
Article history: Received 7 January 2014; Received in revised form 8 May 2014; Accepted 19 May 2014; Available online xxxx.
Keywords: Steganalysis; Neural network; Quantization table; Robustness

Abstract

Owing to the ever-growing proliferation of digital cameras and image editing software, a large variety of JPEG quantization tables are used to compress JPEG images. As a result, learning-based steganalysis methods trained on images with a pre-selected quantization table degrade significantly when the quantization table of the testing images differs from the one used for training. Recognizing that it would be undesirable and impractical to train a steganalysis classifier with all possible quantization tables, we propose an approach in which the differences between features extracted from images with different quantization tables are formulated as perturbations of those features. We then define a stochastic sensitivity, the expected square of the classifier output change with respect to these feature perturbations, to measure the robustness of a classifier against such perturbations. A Radial Basis Function Neural Network based steganalysis classifier trained by minimizing this sensitivity is proposed. Experimental results show that the proposed method outperforms learning methods such as the Support Vector Machine and the Radial Basis Function Neural Network trained without considering feature perturbations.

© 2014 Published by Elsevier Inc.


1. Introduction


Steganography presents a potential security threat to society in general and to corporations in particular. Embedded messages, called stego messages, are hidden in digital media such as images [1–3], audio [4] and video [5] for secret communication. Steganalysis is a technique used to determine whether a digital medium carries a stego message. Current learning-based steganalysis methods consist of two major components: a feature extractor and a classifier. Steganalysis classifiers are trained on a set of images consisting of both clean and stego images. Among the different types of digital media, the JPEG image is the most widely used on the Internet, and JPEG is therefore a favorable carrier for steganography. In most JPEG steganalysis, both the training and testing datasets use JPEG images compressed by the same quantization table. When different quantization tables are used for the training and testing image sets, the performance of the steganalysis classifier degrades significantly [6,7]. With the proliferation of digital cameras and image editing software available today, JPEG images on the Internet are compressed by many different quantization tables [8]. Moreover, a growing number of digital camera manufacturers, such as Sony, Nikon and Pentax, adopt variable quantization tables which are computed dynamically based on the image content.


* Corresponding author. Tel.: +86 20 39380285. E-mail addresses: [email protected] (W.W.Y. Ng), [email protected] (Z.-M. He), [email protected] (D.S. Yeung), [email protected] (P.P.K. Chan). http://dx.doi.org/10.1016/j.ins.2014.05.028 0020-0255/© 2014 Published by Elsevier Inc.

Please cite this article in press as: W.W.Y. Ng et al., Steganalysis classifier training via minimizing sensitivity for different imaging sources, Inform. Sci. (2014), http://dx.doi.org/10.1016/j.ins.2014.05.028


Re-compressing an image that was compressed by an unseen quantization table with the training quantization table cannot recover the image directly: the extra quantization step changes the steganalysis features of the JPEG image [9]. It would therefore be unreasonable and impractical to assume prior knowledge of the compression quantization table of an unseen image examined for a stego message. In real-world situations, both the images and the quantization tables can differ from those used for training the classifier. In addition to quantization table differences, differences in image content also affect the performance of steganalysis [10]: the extracted steganalysis feature values will differ from those of the training images [7]. These differences can be treated as perturbations in features and are unavoidable, so the robustness of a steganalysis classifier with respect to feature perturbations is essential to its performance. Current steganalysis methods make use of off-the-shelf classification methods such as neural networks [11,12], Support Vector Machines (SVM) [13,14], dynamic evolving neural fuzzy inference systems [15] and ensembles of classifiers [16,17]. However, none of them addresses the issue of perturbations between training and testing images. In this work, we propose a Localized Generalization Error Steganalysis classifier (LG-Steganalyzer), which is robust to images compressed by quantization tables different from those of the training images. Such a situation is unavoidable in real-world applications. A Radial Basis Function Neural Network (RBFNN) is trained via a minimization of a training error and a stochastic sensitivity. The RBFNN is selected because of its fast training speed on large data, which is important for dealing with network security problems.

The stochastic sensitivity is proposed to capture the influence of feature perturbations, created by changes in the JPEG quantization table of testing images, on the RBFNN classification. With the proposed training method, the LG-Steganalyzer provides better robustness of steganalysis in real-world applications. The major contributions of the LG-Steganalyzer are:

1. An RBFNN trained by the LG-Steganalyzer is robust to real-world situations, e.g. differences in quantization tables and differences in content between training and testing images.
2. The proposed LG-Steganalyzer can be used with any compression quantization table and any steganalysis feature extraction technique.


This paper is organized as follows: Section 2 provides a brief introduction to steganalysis and JPEG quantization tables. The LG-Steganalyzer is described in Section 3. Experimental results are presented in Section 4. Section 5 concludes this work.


2. Steganalysis and quantization table


We first provide a brief introduction to current steganalysis methods in Section 2.1. Section 2.2 discusses the importance of quantization tables in steganalysis. Section 2.3 demonstrates input perturbations caused by changes in quantization tables and image contents.


2.1. Steganalysis



In [18], at least one hundred and ten steganographic and steganalysis tools were reported in 2007, and the number is still growing. It is therefore desirable to train a steganalysis classifier that works with multiple types of steganographic methods. The current trend is to adopt off-the-shelf classifiers together with the different feature extraction methods proposed for JPEG steganalysis. Shi et al. [19] suggested extracting Markov features based on differences among neighboring Discrete Cosine Transform (DCT) coefficients using transition probability matrices. Steganalysis based on features extracted by a Markov approach using the DCT and Discrete Wavelet Transform (DWT) coefficients was proposed in [20]. Pevny et al. [14] proposed extracting features based on the average of the Markov features in four directions and combining them with DCT features for multiclass JPEG steganalysis. They also proposed an ensemble method to deal with compression by different quantization tables: each base classifier is trained for a particular quantization table, and an image is classified by the base classifier trained on the quantization table most similar to the one that compressed the image. Its performance depends on both the selection of the base classifiers and their robustness with respect to unseen quantization tables. In [21], Kodovsky et al. showed that the calibration method in [14] yielded negative effects for several steganographic schemes and thus proposed a Cartesian Calibration method to extract CC-Pevny features. Chen and Shi [13] proposed steganalysis features based on both intra-block and inter-block correlations in the DCT coefficient difference matrices of images.


2.2. Quantization table in steganalysis


JPEG compresses a raw image by dividing each DCT-transformed 8×8 block of pixels by a quantization table (the Quantization box in Fig. 1). A hidden message is then embedded into the JPEG image using a steganography method (the Steganography box in Fig. 1). The Independent JPEG Group provides 100 quantization tables, each corresponding to a distinct compression quality; these are known as standard quantization tables. We denote quantization tables other than these 100 as non-standard quantization tables.
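As an illustration of the quantization step described above, the following sketch (not the authors' code) scales the Independent JPEG Group's base luminance table to a chosen quality factor and quantizes one 8×8 block of DCT coefficients; the IJG scaling rule is standard, but the random block is only a placeholder:

```python
import numpy as np

# IJG standard luminance quantization table (quality 50).
BASE_TABLE = np.array([
    [16, 11, 10, 16, 24, 40, 51, 61],
    [12, 12, 14, 19, 26, 58, 60, 55],
    [14, 13, 16, 24, 40, 57, 69, 56],
    [14, 17, 22, 29, 51, 87, 80, 62],
    [18, 22, 37, 56, 68, 109, 103, 77],
    [24, 35, 55, 64, 81, 104, 113, 92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103, 99],
])

def scale_table(base, quality):
    """IJG rule for scaling the base table to a given quality factor."""
    s = 5000 // quality if quality < 50 else 200 - 2 * quality
    return np.clip((base * s + 50) // 100, 1, 255)

def quantize(dct_block, table):
    """Divide DCT coefficients element-wise by the table and round."""
    return np.round(dct_block / table).astype(int)

q75 = scale_table(BASE_TABLE, 75)          # quality-75 table
block = np.random.default_rng(0).normal(0, 50, (8, 8))  # placeholder DCT block
print(quantize(block, q75))
```

Different cameras and editors ship different tables (or scale them differently), which is exactly what changes the quantized coefficients and, downstream, the steganalysis features.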



Fig. 1. Steganography and steganalysis.

Fig. 2. The two largest principal components of feature vectors of images compressed by different quantization tables of different cameras.


Steganalysis methods must cope with images compressed by different unseen quantization tables. In this work, we formulate the differences in steganalysis feature values caused by differences in quantization tables as feature perturbations. To show the effect of quantization tables on JPEG steganalysis, an image is compressed using quantization tables from 30 different cameras to create 30 JPEG images. Chen's steganalysis features [13] are then extracted. For visualization, the two largest principal components of the Principal Component Analysis (PCA) of the extracted features are shown in Fig. 2. The red star in Fig. 2 marks a reference point (the compressed image produced by a Sony DSC-H9) and its neighboring points (compressed images produced by various cameras). The differences among quantization tables cause perturbations in the feature values extracted for steganalysis. The Sony DSC-H9 uses a dynamic quantization table which varies with the image being processed. One way to deal with this problem is to enhance the robustness of the steganalysis classifier with respect to changes of quantization tables.
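The projection used for Fig. 2 can be sketched as follows; since we do not have the original feature vectors, random data stands in for Chen's features of the 30 compressed images, and the feature dimensionality is a placeholder:

```python
import numpy as np

# Sketch (not the paper's code): project feature vectors onto their two
# largest principal components via SVD. `features` stands in for Chen's
# steganalysis features of one image compressed by 30 camera tables.
rng = np.random.default_rng(1)
features = rng.normal(size=(30, 486))   # hypothetical dimensionality

centered = features - features.mean(axis=0)      # PCA requires centered data
_, _, vt = np.linalg.svd(centered, full_matrices=False)
projected = centered @ vt[:2].T                  # coordinates in the top-2 PC plane
print(projected.shape)
```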


2.3. Feature perturbations


We divide all cases of JPEG files into six possible scenarios that lead to steganalysis feature perturbations:

1. The content of training and testing image is the same and they are compressed by the same quantization table.
2. The content of training and testing image is the same and they are compressed by different quantization tables.
3. The content of training and testing image is similar and compressed by the same quantization table.
4. The content of training and testing image is similar and compressed by different quantization tables.
5. The content of training and testing image is very different and compressed by the same quantization table.


6. The content of training and testing image is very different and compressed by different quantization tables.


Case 1 rarely occurs, and such stego images can easily be detected by a classifier; therefore Case 1 will not be discussed further. Most research works focus on Case 3 [13,21]. However, all cases may occur when a steganalysis system is deployed in the real world. In this section, examples are presented to show that Cases 2 to 6 all lead to steganalysis feature perturbations. A raw image is compressed by the quantization table of quality factor 75 and converted into a stego image. This image is treated as the training image and marked by a red star in the figures throughout this section. We then simulate Cases 2 to 6 to demonstrate the neighborhoods around the training image formed by the perturbed images. To simulate Case 2 artificially, we randomly select elements of the quality-factor-75 quantization table and randomly add a "+1" or a "−1" to each selected element, repeating this 100 times; hence, on average, half of the elements are altered in each of the 100 random perturbations. We then compress the same raw image using the 100 perturbed quantization tables and convert the results into stego images to simulate testing images. Steganalysis features are extracted from these 101 images using Chen's features [13]. To visualize the similarity among the JPEG images, each image is represented by MPEG-7 low-level features. Fig. 3(a) shows the two largest principal components of the PCA of the MPEG-7 features of the 100 perturbed images and the training image. An image located closer to the training image (red star) in Fig. 3(a) has a smaller overall difference from the training image. Fig. 4(a) shows the first and second largest principal components of Chen's steganalysis feature vectors for each perturbed sample (blue cross) and the training sample (red star). Figs. 3(a) and 4(a) show that perturbations in quantization tables lead to perturbations in steganalysis features. Moreover, Fig. 4(a) shows that the steganalysis feature vectors extracted from images compressed by perturbed quantization tables form a neighborhood around that of the training image. Figs. 3(b) and (c) show the MPEG-7 features of 100 images (in the same way as Fig. 3(a)) that are similar to the training image and compressed by the same (Case 3) and different (Case 4) quantization tables, respectively. Visually similar images are selected as similar images for Cases 3 and 4. For Cases 5 and 6, images with significant differences in visual content are selected. The principal components of the MPEG-7 features of the training image and perturbed images that are visually different and compressed by the same quantization table (Case 5) are shown in Fig. 3(d); those of Case 6 are shown in Fig. 3(e). Fig. 4 shows the first and second largest principal components of the steganalysis features of the training image and the unseen images for Cases 3 to 6. Fig. 5 shows the training image and examples of perturbed images in the different cases. Cases 2 to 6 are unavoidable in real applications, and all of them lead to perturbations in the steganalysis features extracted from images. These perturbations form neighborhoods in the steganalysis feature space around the original image (red star). It follows that a minimization of the error among perturbed images located within a neighborhood of the training images

Fig. 3. The first and the second largest principal components of MPEG-7 features of images in different cases: (a) Case 2; (b) Case 3; (c) Case 4; (d) Case 5; (e) Case 6.


Fig. 4. The first and the second largest principal components of steganalysis features of images in different cases: (a) Case 2; (b) Case 3; (c) Case 4; (d) Case 5; (e) Case 6.

Fig. 5. Training image and examples of perturbed images in Cases 2 to 6: (a) training image; (b)–(f) perturbed images of Cases 2 to 6.


(red stars) is needed to deal with Cases 2 to 6. With this observation, we propose to train a classifier via a minimization of both the training error and the sensitivity within a Q-neighborhood to enhance the robustness of steganalyzers in different situations. Instead of creating real unseen perturbed images for the different scenarios, we perturb the steganalysis features of the training images directly and train the classifier by minimizing both the training error and the sensitivity at the perturbed samples within the neighborhood.
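The Case 2 simulation in Section 2.3 (randomly adding +1 or −1 to elements of the quality-75 table, repeated 100 times) can be sketched as follows; the table values here are random placeholders, not the actual quality-75 table:

```python
import numpy as np

# Sketch of the Case 2 simulation: pick elements of a quantization table at
# random and add +1 or -1 to each picked element; repeat to get 100 tables.
rng = np.random.default_rng(7)
q75 = rng.integers(2, 100, size=(8, 8))  # placeholder for the quality-75 table

def perturb(table, rng):
    mask = rng.integers(0, 2, size=table.shape).astype(bool)  # ~half the elements
    signs = rng.choice([-1, 1], size=table.shape)             # +1 or -1 each
    return np.clip(table + mask * signs, 1, 255)              # keep values valid

perturbed = [perturb(q75, rng) for _ in range(100)]
print(len(perturbed))
```

Compressing the same raw image with each perturbed table and extracting features then yields the neighborhood of feature vectors shown in Fig. 4(a).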


3. LG-Steganalyzer


Fig. 6 shows the functional blocks of the LG-Steganalyzer. The two-phase training component is used only when the LG-Steganalyzer is first trained or whenever an update to the RBFNN is necessary. The steganalysis feature extraction is


Fig. 6. Functional blocks of the LG-Steganalyzer.


selected by the user and is transparent to the LG-Steganalyzer. The binary RBFNN classifier is trained via a minimization of the Localized Generalization Error Model (L-GEM) [22] to learn the two-class classification of stego and clean images with a given set of steganalysis features. The L-GEM consists of the training error, the sensitivity and constant terms. Sections 3.1 and 3.2 present the sensitivity and the two-phase RBFNN training method of the LG-Steganalyzer, respectively. The LG-Steganalyzer can be applied with any set of steganalysis features and any classifier for which the L-GEM is defined. In addition to the RBFNN, the L-GEM is available for ensemble classifiers [23], multilayer perceptron neural networks [24] and least squares support vector machines [25].


3.1. Sensitivity


The sensitivity measures the robustness of an RBFNN with respect to changes in its inputs. As shown in Section 2.3, both changes in the quantization table and in the image content can be formulated as perturbations of the inputs of training samples. In addition to minimizing the training error to learn the steganalysis concept embedded in the set of training images, the LG-Steganalyzer also minimizes its sensitivity to enhance the robustness of its RBFNN with respect to feature perturbations created by changes in quantization tables and image content. As shown in Section 2.3, images with different perturbations create a set of neighboring images centered at the original training image. We define the set of perturbed images located within its Q-neighborhood as follows [22]:


$$S_Q(x_b) = \{\, x \mid x = x_b + \Delta x,\ \|\Delta x\|_\infty \le Q \,\} \quad (1)$$

where $x_b$ and $\Delta x$ denote the steganalysis feature vector extracted from the $b$th image and the feature perturbation, respectively. $S_Q(x_b)$ is called the Q-neighborhood of the $b$th training image $x_b$. A Q-Union ($S_Q$) is defined as the union of the Q-neighborhoods of all training images. Every unseen sample (non-training image) in $S_Q$ represents a perturbed training image caused by a difference from the training images either in the quantization table or in the content. The sensitivity of a classifier is defined as the average of the squared output differences ($\Delta y$) between the training images and the unseen samples in their Q-neighborhoods. As we have no knowledge of the real distribution of the perturbed images, we assume that they follow a uniform distribution within the Q-neighborhoods; this assumption reflects the intuitive idea that any unseen image has the same chance of occurring in the future. The value of $Q$ provides an upper bound on the expected maximum differences in the feature values. A larger $Q$ yields a Q-Union with larger coverage of the input feature space, which can cover more unseen quantization tables. In contrast, a $Q$ approaching zero reduces the Q-Union to the training samples only, and the proposed algorithm reduces to the standard training method minimizing the training error of a neural network. Covering too many unseen quantization tables may worsen the training accuracy, while an extremely small $Q$ may yield an RBFNN too sensitive to differences in quantization tables. Therefore, the choice of $Q$ is not trivial; it depends on the training datasets and the robustness requirements of the user. Similar to the parameter selection problem for the Support Vector Machine (SVM) and other classifiers, the value of $Q$ is selected by cross-validation. As shown in [22], an RBFNN trained via a minimization of the L-GEM yields good generalization capability for unseen samples located both within and outside the Q-Union. However, if many unseen samples under investigation are located far from the training samples, i.e. outside the Q-Union, the training dataset does not represent the steganalysis problem well, and re-sampling should be performed instead. Since the RBFNN is suitable for large data and trains quickly, it is adopted as the classifier ($f$) in our steganalysis method. An RBFNN is defined as follows:
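Under the uniform-distribution assumption just stated, perturbed samples in a Q-neighborhood can be drawn as in this sketch; the feature dimensionality, the feature vector and the value of Q are all hypothetical placeholders:

```python
import numpy as np

# Sketch: draw perturbed samples uniformly from the Q-neighborhood of a
# training feature vector x_b, i.e. x_b + dx with ||dx||_inf <= Q.
rng = np.random.default_rng(0)
n = 486                     # hypothetical feature dimensionality
x_b = rng.normal(size=n)    # placeholder training feature vector
Q = 0.05

dx = rng.uniform(-Q, Q, size=(1000, n))  # uniform over the Q-neighborhood
samples = x_b + dx
print(samples.shape)
```

Note that each perturbation component then has variance Q^2/3, the value used later in the closed-form sensitivity.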

$$f(x) = \sum_{j=1}^{M} w_j \exp\!\left(-\frac{\sum_{i=1}^{n}(x_i - u_{ji})^2}{2v_j^2}\right) \quad (2)$$

where $M$, $n$, $x_i$, $w_j$, $v_j$ and $u_{ji}$ denote the number of neurons in the hidden layer, the number of steganalysis features, the $i$th feature value of the image $x$, the connection weight between the output neuron and the $j$th hidden neuron, the width parameter of the $j$th hidden neuron, and the $i$th feature value of the center parameter of the $j$th hidden neuron, respectively. The sensitivity of the RBFNN-based steganalyzer is defined as the expectation of the squared output differences caused by input perturbations:

$$E\big((\Delta y)^2\big) = \frac{1}{N}\sum_{b=1}^{N}\int_{S_Q}\big(f(x_b) - f(x_b + \Delta x)\big)^2\, p(\Delta x)\, d\Delta x \approx \frac{Q^2}{3}\sum_{j=1}^{M}\varphi_j \sum_{i=1}^{n}\frac{\sigma_{x_i}^2 + (\mu_{x_i} - u_{ji})^2}{v_j^4} + \frac{nQ^4}{45}\sum_{j=1}^{M}\frac{\varphi_j}{v_j^4} \quad (3)$$

where $\Delta y = f(x_b) - f(x_b + \Delta x)$; $\varphi_j = (w_j)^2 \exp\!\left(\frac{\operatorname{var}(\|x-u_j\|^2)}{2v_j^4} - \frac{E(\|x-u_j\|^2)}{v_j^2}\right)$; $\operatorname{var}(\|x-u_j\|^2) = \sum_{i=1}^{n}\big(E((x_i-\mu_{x_i})^4) + 4\sigma_{x_i}^2(\mu_{x_i}-u_{ji})^2 + 4E((x_i-\mu_{x_i})^3)(\mu_{x_i}-u_{ji}) - (\sigma_{x_i}^2)^2\big)$; and $E(\|x-u_j\|^2) = \sum_{i=1}^{n}\big(\sigma_{x_i}^2 + (\mu_{x_i}-u_{ji})^2\big)$. $\Delta x$, $N$, $p(\Delta x)$ and $\sigma_{\Delta x_i}^2$ denote the input perturbation subject to a uniform distribution, the number of training samples, the probability density function of the input perturbations, and the variance of the input perturbation, which equals $Q^2/3$ for a uniform distribution, respectively. $\mu_{x_i}$ and $\sigma_{x_i}^2$ denote the mean and variance of the $i$th steganalysis feature. Detailed derivations of (3) can be found in [22].

A smaller sensitivity value (Eq. (3)) leads to a higher robustness of the RBFNN. Eq. (3) shows that the robustness of a classifier depends on both its parameters (the weight, center and width parameters of the RBFNN) and the distribution of the training samples (the mean and variance of the input features). Either large weight values or small width parameter values can lead to a sensitive, i.e. less robust, LG-Steganalyzer. As with all learning-based steganalysis methods, the performance of the trained classifier depends on the quality of the training samples; one cannot expect a classifier to correctly classify unseen samples that are very different from its training samples. Therefore, the sensitivity is combined with the training error to form the L-GEM, which provides an upper bound on the Mean Square Error (MSE) of an RBFNN for unseen samples located within the Q-Union of its training samples [22,26]. The L-GEM is defined as follows:
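A minimal numerical sketch of Eqs. (2) and (3), under our reading of the reconstructed formulas; all sizes and parameter values are hypothetical placeholders, not the paper's data:

```python
import numpy as np

rng = np.random.default_rng(0)
n, M, N = 10, 4, 50                  # features, hidden neurons, samples
X = rng.normal(size=(N, n))          # placeholder training features
u = rng.normal(size=(M, n))          # centers u_j
v = np.full(M, 2.0)                  # widths v_j
w = rng.normal(size=M)               # output weights w_j
Q = 0.1

def rbfnn(x):
    """Eq. (2): f(x) = sum_j w_j exp(-||x - u_j||^2 / (2 v_j^2))."""
    d2 = ((x - u) ** 2).sum(axis=1)
    return np.dot(w, np.exp(-d2 / (2 * v ** 2)))

def stochastic_sensitivity():
    """Closed form of Eq. (3) with sigma^2_dx = Q^2/3 (uniform perturbation)."""
    mu, var = X.mean(axis=0), X.var(axis=0)
    E = (var + (mu - u) ** 2).sum(axis=1)          # E(||x - u_j||^2) per neuron
    m3 = ((X - mu) ** 3).mean(axis=0)              # 3rd and 4th central moments
    m4 = ((X - mu) ** 4).mean(axis=0)
    V = (m4 + 4 * var * (mu - u) ** 2 + 4 * m3 * (mu - u) - var ** 2).sum(axis=1)
    phi = w ** 2 * np.exp(V / (2 * v ** 4) - E / v ** 2)
    return (Q ** 2 / 3) * np.sum(phi * E / v ** 4) + (n * Q ** 4 / 45) * np.sum(phi / v ** 4)

print(rbfnn(X[0]), stochastic_sensitivity())
```

As the text notes, larger weights or smaller widths inflate the sensitivity value returned here.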

$$R_{SM}(Q) = \int_{S_Q} \big(F(x) - f(x)\big)^2\, p(x)\, dx \quad (4)$$

where $F(x)$ and $p(x)$ denote the true label of the image $x$ and the true, unknown probability density function of $x$, respectively. Since $p(x)$ is unknown, Eq. (4) cannot be computed exactly. In [22,26], its upper bound is estimated using Hoeffding's inequality. With probability $1-\eta$, we have:

$$R_{SM}(Q) \le \left(\sqrt{R_{emp}} + \sqrt{E\big((\Delta y)^2\big)} + A\right)^2 + \varepsilon = R^{*}_{SM}(Q) \quad (5)$$


where $\varepsilon = B\sqrt{\ln(1/\eta)/(2N)}$, and $R_{emp}$, $A$, $B$ and $N$ denote the training MSE, the difference between the maximum and minimum values of the outputs, the maximum value of the MSE, and the number of training samples, respectively. The minimization of the training error allows an RBFNN to learn the underlying relationship between the input features and the steganalysis results, while the minimization of the sensitivity maximizes the robustness of the RBFNN with respect to input perturbations. The constants $A$ and $\varepsilon$ are derived from the characteristics of the training image set and the training parameters; they remain the same for all RBFNNs trained for a given steganalysis problem, so they can be ignored. Eq. (5) shows the relationship among the robustness of the RBFNN, the training error and the sensitivity. It follows that minimizing only the training error is not enough to yield a classifier with good generalization capability for steganalysis: a steganalysis classifier with good generalization should minimize the L-GEM, i.e. both the training error and the sensitivity.
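Putting the pieces of Eq. (5) together is a one-liner; the sketch below uses hypothetical placeholder values for every input:

```python
import numpy as np

# Sketch of the bound R*_SM(Q) = (sqrt(R_emp) + sqrt(E((dy)^2)) + A)^2 + eps.
R_emp = 0.08   # training MSE (placeholder)
sens = 0.02    # stochastic sensitivity E((dy)^2) from Eq. (3) (placeholder)
A = 1.0        # max minus min of the classifier outputs (placeholder)
B = 1.0        # maximum possible MSE (placeholder)
N = 5000       # number of training samples (placeholder)
eta = 0.05     # the bound holds with probability 1 - eta

eps = B * np.sqrt(np.log(1.0 / eta) / (2 * N))
R_star = (np.sqrt(R_emp) + np.sqrt(sens) + A) ** 2 + eps
print(R_star)
```

Since A and eps are the same for every candidate RBFNN on a fixed training set, only the first two terms matter when comparing architectures.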


3.2. Two-phase RBFNN training method of LG-Steganalyzer


Given a training dataset of N images, each described by an n-vector of steganalysis features, an RBFNN is trained using a two-phase algorithm. In the first phase, the RBFNN architecture yielding the smallest L-GEM value is found by a linear search. In the second phase, the connection weights of the RBFNN are fine-tuned by a gradient descent method minimizing the L-GEM of the RBFNN. The training algorithm combines the benefits of the fast training in [22,26] with the better generalization capability obtained by fine-tuning the connection weights in [27].



3.2.1. Phase 1: Determining the RBFNN architecture
The number of hidden neurons (M) is the most important parameter defining the architecture of an RBFNN. After the number of hidden neurons is fixed, the center and width of the Gaussian activation function of each hidden neuron can be determined using any off-the-shelf clustering algorithm. In this work, the center vectors are found by a standard k-means algorithm and the width parameters are determined by the distances between these centers. Owing to the linear relationship between the connection weights and the RBFNN outputs, the connection weights can be computed quickly by a pseudo-inverse method [26]. The training procedure is as follows:

1. Set M = 2.
2. Train an RBFNN with M hidden neurons using the training images:
   (a) k-means clustering with k = M is performed to find the center vectors (u_j);
   (b) the width parameter (v_j) is computed as the average of the distances between all centers;
   (c) the connection weights are computed by a pseudo-inverse method minimizing the least square error.
3. Calculate R*_SM(Q) using Eq. (5) for the newly trained RBFNN.
4. If M < N/3, then set M = M + 1 and go to Step 2.
5. Output the RBFNN yielding the minimum R*_SM(Q) value.
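The steps above can be sketched as follows. This is not the authors' implementation: `lgem` is a stand-in that uses only the training-MSE part of R*_SM(Q), and the training data are random placeholders:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 5))             # placeholder training features
y = rng.integers(0, 2, size=60) * 2.0 - 1.0  # labels in {-1, +1}

def kmeans(X, k, iters=20):
    """Plain k-means: find k center vectors (Step 2a)."""
    centers = X[rng.choice(len(X), k, replace=False)].copy()
    for _ in range(iters):
        labels = np.argmin(((X[:, None] - centers) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return centers

def fit_rbfnn(X, y, M):
    u = kmeans(X, M)
    d = np.sqrt(((u[:, None] - u[None]) ** 2).sum(-1))
    v = d[d > 0].mean() if np.any(d > 0) else 1.0   # Step 2b: avg inter-center distance
    H = np.exp(-((X[:, None] - u) ** 2).sum(-1) / (2 * v ** 2))
    w = np.linalg.pinv(H) @ y                       # Step 2c: pseudo-inverse weights
    return w, H

def lgem(H, w, y):
    # Placeholder for R*_SM(Q); only the training MSE is used here.
    return np.mean((H @ w - y) ** 2)

best_M, best_err = None, np.inf
for M in range(2, len(X) // 3):          # Steps 1, 3, 4: linear search over M
    w, H = fit_rbfnn(X, y, M)
    err = lgem(H, w, y)
    if err < best_err:
        best_M, best_err = M, err        # Step 5: keep the best architecture
print(best_M, best_err)
```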


3.2.2. Phase 2: Adjusting the connection weights of the RBFNN
In this work, we improve the RBFNN training algorithm in [22,26] by adding a gradient descent search to further minimize the generalization error, i.e. to maximize the robustness of the RBFNN [27]. As shown in [27], the constants A and ε do not change with the connection weights and can be ignored during the gradient descent. The connection weights (w_j) are updated via gradient descent as follows:

$$w_j^{t+1} = w_j^t - \lambda\left(\frac{1}{2\sqrt{R_{emp}}}\frac{\partial R_{emp}}{\partial w_j} + \frac{1}{2\sqrt{E\big((\Delta y)^2\big)}}\frac{\partial E\big((\Delta y)^2\big)}{\partial w_j}\right) \quad (6)$$

$$\frac{\partial R_{emp}}{\partial w_j} = \frac{2}{N}\sum_{b=1}^{N}\big(f(x_b) - F(x_b)\big)\exp\!\left(-\frac{\|x_b - u_j\|^2}{2v_j^2}\right) \quad (7)$$

$$\frac{\partial E\big((\Delta y)^2\big)}{\partial w_j} = \frac{2Q^2 w_j}{3v_j^4}\left(\sum_{i=1}^{n}\big(\sigma_{x_i}^2 + (\mu_{x_i} - u_{ji})^2\big) + \frac{nQ^2}{15}\right)\exp\!\left(\frac{\operatorname{var}(\|x-u_j\|^2)}{2v_j^4} - \frac{E(\|x-u_j\|^2)}{v_j^2}\right) \quad (8)$$


where λ is a constant controlling the step size of the gradient descent. iRprop+ [28] is adopted as the gradient descent method owing to its fast convergence. Eq. (6) is the weight update formula, which iteratively updates the connection weights to minimize the L-GEM. Eqs. (7) and (8) give the two components of Eq. (6), i.e. the partial derivatives of the training error and of the sensitivity with respect to the connection weights. With these formulae, the robustness of the RBFNN architecture selected by the procedure in Section 3.2.1 is further enhanced by fine-tuning its connection weight values via a minimization of the L-GEM.
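A plain-gradient sketch of the Phase-2 update, under our reading of Eqs. (6)-(8). iRprop+ is replaced by a fixed step λ for brevity, and all sizes and values are hypothetical placeholders:

```python
import numpy as np

rng = np.random.default_rng(0)
n, M, N, Q, lam = 5, 3, 40, 0.1, 0.01
X = rng.normal(size=(N, n))                  # placeholder training features
y = rng.integers(0, 2, size=N) * 2.0 - 1.0   # labels in {-1, +1}
u = X[rng.choice(N, M, replace=False)]       # centers from Phase 1 (placeholder)
v = np.full(M, 2.0)                          # widths (placeholder)
w = rng.normal(size=M)                       # initial weights

# Moments of the training features (fixed during the descent).
mu, var = X.mean(axis=0), X.var(axis=0)
E = (var + (mu - u) ** 2).sum(axis=1)        # E(||x - u_j||^2)
m3 = ((X - mu) ** 3).mean(axis=0)
m4 = ((X - mu) ** 4).mean(axis=0)
V = (m4 + 4 * var * (mu - u) ** 2 + 4 * m3 * (mu - u) - var ** 2).sum(axis=1)
expo = np.exp(V / (2 * v ** 4) - E / v ** 2)  # shared exponential factor

H = np.exp(-((X[:, None] - u) ** 2).sum(-1) / (2 * v ** 2))  # hidden outputs

for _ in range(100):
    resid = H @ w - y
    R_emp = np.mean(resid ** 2)
    g_emp = (2.0 / N) * (H.T @ resid)                               # Eq. (7)
    g_sen = (2 * Q ** 2 * w) / (3 * v ** 4) * (E + n * Q ** 2 / 15) * expo  # Eq. (8)
    phi = w ** 2 * expo
    sens = (Q ** 2 / 3) * np.sum(phi * E / v ** 4) + (n * Q ** 4 / 45) * np.sum(phi / v ** 4)
    # Eq. (6): descend on the square-root terms of the L-GEM.
    w = w - lam * (g_emp / (2 * np.sqrt(R_emp)) + g_sen / (2 * np.sqrt(sens)))

print(R_emp)
```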


3.3. Enhancement of robustness by the LG-Steganalyzer


To demonstrate the robustness of the proposed method in different scenarios, we select one of the training samples as an example and compare the LG-Steganalyzer with Chen's existing method, which uses a standard SVM as the classifier. Both the proposed LG-Steganalyzer and the SVM use the same feature set proposed by Chen [13]. For visualization, the Manhattan distances of both the Intra Features and the Inter Features between the training sample (red star) and ten testing images are plotted in Fig. 7. The ten images are categorized into four groups: very similar (*), similar (triangle), dissimilar (square) and totally different (circle). The LG-Steganalyzer (marked as LGEM in the figures) recognizes stego images except for those using totally different images as cover. In contrast, the SVM used by Chen's method fails to recognize stego images for both the dissimilar and the totally different groups, as well as for one of the similar images. Since both classifiers use the same set of steganalysis features, the improvement in robustness comes from the proposed RBFNN training method.

Q1 Please cite this article in press as: W.W.Y. Ng et al., Steganalysis classifier training via minimizing sensitivity for different imaging sources, Inform. Sci. (2014), http://dx.doi.org/10.1016/j.ins.2014.05.028

Fig. 7. Robustness of LG-Steganalyzer versus Chen's method on images with different degrees of dissimilarity.

Fig. 8. Robustness of the LG-Steganalyzer and Chen's method on the training image compressed by different quantization tables of different cameras.

We then repeat the experiment on the same training image compressed by nine quantization tables different from the one used to compress the training image, which simulates the differences arising when an image is taken by different cameras. Steganalysis features are extracted from these images, and the Manhattan distances between their features and those of the training image (red star) are shown in Fig. 8. The steganalysis features are greatly influenced by the quantization tables (i.e. camera models), and Fig. 8 shows that the SVM with Chen's features is less robust than the LG-Steganalyzer with the same features.

Fig. 9 shows four images selected from the four similarity categories with respect to the training image. The SVM using Chen's features recognizes only 2 out of 10 images, and only when the image is very similar to the training image and compressed by different quantization tables. It fails to recognize stego images in all other cases, with one exception in Fig. 9(b), where an image compressed by an unseen quantization table is recognized correctly by Chen's method but not by the LG-Steganalyzer. As the dissimilarity between the unseen image and the training image increases, changes in the quantization tables degrade the steganalysis performance further. In this scenario, the LG-Steganalyzer yields more robust steganalysis results for images both similar and dissimilar to the training images. It is not reasonable to expect the classifier to correctly classify a stego image whose steganalysis features are totally different from those of the training images.
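The distance comparison underlying Figs. 7 and 8 can be sketched as follows; the low-dimensional feature vectors are made-up stand-ins for the 486-dimensional Chen feature vectors, not data from the experiments.

```python
import numpy as np

def manhattan_distance(f_a, f_b):
    """L1 (Manhattan) distance between two steganalysis feature vectors."""
    return float(np.sum(np.abs(np.asarray(f_a) - np.asarray(f_b))))

# Made-up low-dimensional stand-ins for intra/inter-block feature vectors.
train_feat   = np.array([0.20, 0.50, 0.10, 0.70])
very_similar = np.array([0.21, 0.49, 0.11, 0.72])
dissimilar   = np.array([0.90, 0.05, 0.80, 0.20])

d_near = manhattan_distance(train_feat, very_similar)
d_far  = manhattan_distance(train_feat, dissimilar)
```

A testing image whose feature vector lies far from every training sample under this distance (the "totally different" group) is beyond what any classifier trained on those samples can be expected to handle.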


Fig. 9. Robustness comparisons on images with different degrees of dissimilarity to the training image, compressed by different quantization tables: (a) very similar, (b) similar, (c) dissimilar, (d) totally different.

Table 1
Comparisons of different classifiers with different features for different steganographic methods and different quantization tables.

Stego | Training | Testing | Chen-SVM | Chen-NN | Chen-LG | CC-PEV-SVM | CC-PEV-NN | CC-PEV-LG | CF*-NN | CF*-LG
nsF5 | 75s | 75s | 60.12±1.80] | 59.13±1.59] | 62.72±0.62 | 79.67±1.10] | 75.91±1.71] | 81.42±0.65 | 75.33±0.99] | 78.63±0.67
nsF5 | 80s | 75s | 58.67±3.25] | 57.99±2.48] | 62.09±0.98 | 60.33±5.48] | 59.64±3.44] | 73.19±2.28 | 59.88±1.62* | 63.64±4.16
nsF5 | 80s | 80s | 60.75±1.83* | 59.79±1.47] | 62.92±1.71 | 71.29±0.85] | 67.41±1.04] | 72.57±0.91 | 75.90±1.20] | 78.79±1.10
nsF5 | 75s | 80s | 58.86±2.34] | 58.38±1.89] | 62.59±0.92 | 67.56±1.52] | 64.82±2.05] | 70.97±1.61 | 57.65±1.43* | 61.74±4.53
MB1 | 75s | 75s | 75.61±1.10] | 74.16±1.61] | 77.26±0.59 | 79.34±1.93] | 74.60±1.64] | 82.29±1.13 | 81.78±2.39] | 86.21±0.85
MB1 | 80s | 75s | 72.08±4.85] | 71.24±4.27] | 77.08±0.94 | 64.85±6.45] | 62.29±4.44] | 79.13±1.72 | 66.38±3.95] | 74.48±4.80
MB1 | 75ns | 75ns | 76.72±2.08 | 74.40±1.39] | 78.02±0.55 | 79.04±1.46] | 74.13±1.80] | 82.20±0.80 | 80.94±2.09] | 85.45±1.20
MB1 | 75s | 75ns | 74.49±1.58 | 71.87±1.96] | 75.21±1.54 | 75.87±2.40] | 72.17±1.94] | 80.57±1.13 | 80.75±1.77] | 83.17±1.22
MB1 | 80s | 80s | 80.03±1.99* | 77.75±0.85] | 82.08±0.86 | 81.32±1.23] | 77.01±1.37] | 84.07±0.89 | 81.80±1.27] | 85.45±0.98
MB1 | 75s | 80s | 73.08±3.85* | 73.16±2.47] | 76.48±1.49 | 75.52±3.71] | 73.08±2.46] | 81.71±0.77 | 67.84±2.41] | 72.72±3.00
MB1 | 80ns | 80ns | 80.42±1.80 | 77.77±1.47] | 81.53±1.39 | 81.32±1.32] | 76.24±1.31] | 84.42±0.80 | 79.60±0.75] | 83.95±1.07
MB1 | 80s | 80ns | 78.58±1.56* | 76.93±1.34] | 79.81±0.84 | 77.72±4.24] | 75.58±2.36] | 82.25±0.92 | 79.37±1.49] | 82.13±1.06
MB2 | 75s | 75s | 80.49±1.95 | 78.82±2.01] | 81.51±0.67 | 86.54±1.37] | 83.58±1.87] | 89.36±0.52 | 84.58±1.64] | 88.46±1.21
MB2 | 80s | 75s | 70.01±6.38] | 68.98±3.95] | 79.70±1.02 | 66.50±6.35] | 66.53±6.06] | 83.07±1.81 | 71.05±3.18] | 77.46±2.07
MB2 | 75ns | 75ns | 80.25±1.08 | 78.82±1.48] | 80.90±1.03 | 83.43±1.49* | 78.86±1.48] | 84.79±0.53 | 83.62±1.97] | 88.11±1.39
MB2 | 75s | 75ns | 69.76±2.64] | 68.55±4.00] | 76.33±1.82 | 79.39±2.17] | 75.76±2.15] | 86.03±1.03 | 82.87±1.57] | 85.85±0.99
MB2 | 80s | 80s | 84.22±1.10 | 82.43±1.18] | 84.94±0.96 | 83.86±0.85] | 80.52±0.94] | 85.54±0.88 | 84.03±0.93] | 87.77±0.65
MB2 | 75s | 80s | 69.63±4.50] | 72.01±4.63] | 78.47±1.19 | 78.65±2.46] | 75.15±3.79] | 83.08±1.40 | 71.90±2.55] | 74.97±1.23
MB2 | 80ns | 80ns | 84.22±1.61 | 81.48±1.22] | 84.67±0.86 | 84.58±2.05] | 80.66±1.07] | 86.75±0.68 | 81.86±1.05] | 86.69±1.52
MB2 | 80s | 80ns | 82.86±0.81] | 81.03±1.15] | 83.88±0.52 | 81.47±2.75] | 80.18±1.94] | 84.39±0.63 | 81.75±0.71] | 83.40±1.09


Table 2
Overall comparisons of different classifiers with different features for different steganographic methods and different quantization tables. The "Same QT" row shows the average results when training and testing images use the same quantization tables, while the "Different QT" row shows the average results when the quantization tables differ between training and testing images.

Training → testing | Chen-SVM | Chen-NN | Chen-LG | CC-PEV-SVM | CC-PEV-NN | CC-PEV-LG | CF*-NN | CF*-LG
Overall | 73.54±8.47 | 72.23±7.92 | 76.41±7.60 | 76.91±7.18 | 73.71±6.47 | 81.89±4.76 | 76.44±8.09 | 80.45±7.72
Same QT | 76.28±8.78 | 74.46±8.32 | 77.66±8.18 | 81.04±4.22 | 76.89±4.47 | 83.34±4.45 | 80.94±3.18 | 84.95±3.56
Different QT | 70.80±7.60 | 70.01±7.24 | 75.17±7.19 | 72.79±7.30 | 70.52±6.76 | 80.44±4.83 | 71.94±9.11 | 75.96±8.25

Table 3
Comparisons of different steganalysis methods with and without re-compression of testing images. "80s → 75s" denotes testing images compressed by 80s and then re-compressed by 75s.

Stego | Training | Testing | Chen-SVM | Chen-NN | Chen-LG | CC-PEV-SVM | CC-PEV-NN | CC-PEV-LG | CF*-NN | CF*-LG
nsF5 | 75s | 80s | 58.86±2.34] | 58.38±1.89] | 62.59±0.92 | 67.56±1.52] | 64.82±2.05] | 70.97±1.61 | 57.65±1.43 | 61.74±4.53
nsF5 | 75s | 80s → 75s | 56.66±3.66* | 55.49±2.59] | 59.90±1.64 | 63.74±3.01] | 61.03±2.12] | 71.54±2.57 | 73.55±2.02 | 75.00±2.13
MB1 | 75s | 75ns | 74.49±1.58 | 71.87±1.96] | 75.21±1.54 | 75.87±2.40] | 72.17±1.94] | 80.57±1.13 | 80.75±1.77] | 83.17±1.22
MB1 | 75s | 75ns → 75s | 72.17±1.81* | 71.84±1.51] | 73.73±1.26 | 73.64±2.59] | 69.90±1.86] | 77.02±1.89 | 79.40±2.14] | 82.41±2.40
MB1 | 75s | 80s | 73.08±3.85* | 73.16±2.47] | 76.48±1.49 | 75.52±3.71] | 73.08±2.46] | 81.71±0.77 | 67.84±2.41] | 72.72±3.00
MB1 | 75s | 80s → 75s | 62.75±3.79* | 62.71±3.77] | 66.80±2.20 | 55.15±2.52] | 56.53±1.9] | 69.01±2.14 | 76.48±2.85* | 79.91±3.29
MB2 | 75s | 75ns | 69.76±2.64] | 68.55±4.00] | 76.33±1.82 | 79.39±2.17] | 75.76±2.15] | 86.03±1.03 | 82.87±1.57] | 85.85±0.99
MB2 | 75s | 75ns → 75s | 74.52±1.83* | 72.84±2.24] | 76.86±1.98 | 75.10±2.46] | 72.38±3.24] | 78.83±1.28 | 82.11±1.72] | 84.72±2.18
MB2 | 75s | 80s | 69.63±4.50] | 72.01±4.63] | 78.47±1.19 | 78.65±2.46] | 75.15±3.79 | 83.08±1.40 | 71.90±2.55 | 74.97±1.23
MB2 | 75s | 80s → 75s | 68.04±3.36 | 67.37±2.44] | 70.43±2.09 | 53.85±1.34] | 55.70±3.15 | 71.54±2.57 | 79.69±2.05 | 84.41±2.41


4. Experimental results


As aforementioned, it is impossible to restrict the imaging source of the JPEG images investigated by a trained steganalysis system, so the quantization tables used to compress unseen images are rarely the same as those used to compress the training images. In the experiments, we first compare testing accuracies when both the training and testing images are compressed by the same quantization table. To simulate the diversity of quantization tables in the real world, random perturbations are added to randomly selected elements of the standard quantization tables to create non-standard ones; for instance, 75s and 75ns denote the standard and the non-standard quantization table of quality factor 75, respectively. In this work, we use quality factors 75 and 80 as a demonstration because they are widely used [21,16,14]. Three steganographic methods, nsF5 [2], MB1 [3] and MB2 [1], are used for comparison. The embedding ratios of MB1 and MB2 are 0.05 bits per non-zero coefficient (bpnc), and the ratio for nsF5 is 0.15 bpnc. The nsF5 method works on standard quantization tables only, so only images compressed by 75s and 80s are used in its tests. The UCID image dataset [29] is adopted in our experiments: its 1338 images are randomly divided into 50% training and 50% testing ten times, and both the average and the standard deviation of the testing accuracies are reported.

Three popular feature sets are used for comparison: CC-PEV [21], Chen's [13] and CF* [16]. The CC-PEV method uses a Gaussian kernel SVM with 548 DCT and Markov based features obtained via Cartesian calibration. Chen's method uses 486 Markov features based on both the intra-block and the inter-block correlations among DCT coefficient difference matrices, with a polynomial kernel SVM. Both the features and the classifiers follow the setups in the original papers. CF* extracts 7850 features from the co-occurrence matrices of the DCT coefficients. The SVMs trained as in the original CC-PEV and Chen papers are compared in our experiments. The proposed method is also compared with RBFNNs trained using the training MSE only, to show the improvement due to the proposed L-GEM based RBFNN training method. In this work we focus on single classifiers; multiple classifier systems optimized by the L-GEM for steganalysis will be studied in future work. Section 4.1 compares the results yielded by the LG-Steganalyzer with those of the Chen, CC-PEV and CF* features for different JPEG quantization tables, demonstrating the improvement in testing accuracy achieved by the new two-phase RBFNN training algorithm minimizing the L-GEM. Section 4.2 reports experiments in which testing images are re-compressed with the quantization table used in classifier training, demonstrating that such re-compression does not necessarily produce the optimal steganalysis result.
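The construction of the non-standard tables can be sketched as below. The number of perturbed entries and the perturbation magnitude are illustrative assumptions (the paper does not specify them), and the uniform 8×8 table is a stand-in rather than the real 75s luminance table.

```python
import numpy as np

def make_nonstandard_qt(qt, n_elements=8, max_delta=2, seed=0):
    """Perturb randomly chosen entries of an 8x8 quantization table to
    simulate a non-standard table (e.g. 75ns derived from 75s)."""
    rng = np.random.default_rng(seed)
    qt = np.array(qt, dtype=int).copy()
    idx = rng.choice(qt.size, size=n_elements, replace=False)
    delta = rng.integers(-max_delta, max_delta + 1, size=n_elements)
    flat = qt.ravel()
    # JPEG quantization table entries must stay in [1, 255]
    flat[idx] = np.clip(flat[idx] + delta, 1, 255)
    return qt

qt75_stand_in = np.full((8, 8), 16)   # stand-in, not the real 75s table
qt75ns = make_nonstandard_qt(qt75_stand_in)
```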

4.1. Comparison to current methods

Table 1 gives individual comparisons on different quantization tables and different steganographic methods. The first to third columns in Table 1 show the steganographic method, the quantization table used to compress the training images and the quantization table used to compress the testing images, respectively. Results of directly applying Chen's and the CC-PEV method are shown in columns 4 and 7, respectively. The proposed method yields the best testing classification accuracies among all steganalysis methods for the three feature sets, shown in columns 6, 9 and 11. Results yielded by RBFNNs trained using the training MSE only, without any consideration of feature perturbation, are shown in columns 5, 8 and 10. For each independent run of each steganographic method, each steganalysis method trains one classifier using training images compressed by a given quantization table (75s, 75ns, 80s or 80ns) and a given set of steganalysis features. Table 2 provides the overall results of each method on all three steganographic methods in the "Overall" row; the "Same QT" and "Different QT" rows show results summarized over cases where the training and testing images use the same and different quantization tables, respectively. T-tests are performed to compare our method against the other methods on the same dataset in Tables 1 and 3; for example, results in column 6 are compared with results in columns 4 and 5. In Tables 1 and 3, "*" and "]" indicate that the proposed method outperforms the corresponding method with statistical significance at the 95% and 99% level, respectively.

Overall, all steganalysis methods perform well when the quantization tables of the training and the testing images are the same, but their performance drops when the tables differ. From Table 1, the three classifiers using Chen's features yield the smallest average drop in testing accuracy when moving from the same to different quantization tables. The CF* features yield good testing accuracies when the training and testing images share a quantization table; however, they suffer drops of 16.85% and 16.02% with the RBFNN and the LG-Steganalyzer, respectively, when classifying nsF5 stego images under different quantization tables. On average, the testing accuracies of all methods are least sensitive to quantization table changes for stego images embedded by MB1. The t-tests show that the proposed method outperforms the RBFNN without perturbation consideration for all three feature sets, and the SVM using CC-PEV features, with at least 95% statistical significance. Compared with the SVM using Chen's features, the LG-Steganalyzer with Chen's features (Chen-LG) performs better in 13 out of 20 experiments with at least 95% significance.

From Table 2, in comparison to the polynomial kernel SVM adopted in Chen's work, the proposed L-GEM based RBFNN is better by 2.87% overall and by 4.37% when the training and testing quantization tables differ. The improvement over the CC-PEV method is larger: compared to its Gaussian kernel SVM, the L-GEM based RBFNN improves testing accuracy by about 4.98% overall and by 7.65% under different quantization tables. Compared with RBFNNs trained using the training MSE only, RBFNNs trained by the proposed method yield 5.16%, 9.92% and 4.02% improvements under different quantization tables for Chen's, CC-PEV and CF* features, respectively. Detection accuracies of Chen's and the CC-PEV method decrease by 5.48% and 8.25%, respectively, when the quantization tables of the training and testing images differ; RBFNNs without perturbation consideration trained with Chen's and CC-PEV features lose 4.45% and 6.37%, respectively, in the same case, whereas RBFNNs trained using the L-GEM lose only 2.49% and 2.90%. This smallest reduction of steganalysis accuracy shows the robustness of the trained RBFNN with respect to differences between the training and the testing quantization tables.

This set of experiments shows that the proposed L-GEM based two-phase RBFNN training method performs well even when different feature sets are used: the proposed method is robust both to changes of quantization tables in testing images and to changes of feature sets. The LG-Steganalyzer using CC-PEV features (CC-PEV-LG) yields the best overall performance, while the LG-Steganalyzer using CF* features (CF*-LG) is best when the training and testing images are compressed by the same quantization table. Classifiers trained without minimizing sensitivity suffer from changes in quantization tables; the minimization of sensitivity in our method improves robustness to such changes. The final steganalysis performance is limited by both the classifier and the feature set adopted, and the proposed RBFNN training method improves steganalysis performance even when compared with the SVM. Two future works are worth considering: first, investigating an L-GEM based SVM to enhance its steganalysis performance; second, proposing a sensitivity-based feature selection method to find better feature combinations among existing feature sets and improve steganalysis accuracy.

Table 4
Overall comparisons of different steganalysis methods with and without re-compression of testing images.

Train → test | Chen-SVM | Chen-NN | Chen-LG | CC-PEV-SVM | CC-PEV-NN | CC-PEV-LG | CF*-NN | CF*-LG
Overall | 68.00±6.44 | 67.42±6.41 | 71.68±6.51 | 69.85±9.39 | 67.65±7.59 | 77.03±5.93 | 75.22±7.84 | 78.49±7.50
Without re-compression | 69.16±6.13 | 68.79±6.07 | 73.82±6.38 | 75.40±4.70 | 72.20±4.38 | 80.47±5.69 | 72.20±10.22 | 75.69±9.53
Re-compression | 66.83±7.23 | 66.05±7.14 | 69.54±6.56 | 64.30±9.96 | 63.11±7.66 | 73.59±4.14 | 78.25±3.30 | 81.29±4.01
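A minimal Monte Carlo sketch of the stochastic sensitivity being minimized, i.e. the expected squared output change under feature perturbations, is given below. The uniform perturbation model, the sample sizes and the toy linear "classifiers" are illustrative assumptions, not the paper's experimental setup.

```python
import numpy as np

def stochastic_sensitivity(classifier, X, q=0.05, n_draws=100, seed=0):
    """Monte Carlo estimate of E[(f(x + dx) - f(x))^2], with the feature
    perturbation dx drawn uniformly from [-q, q] per dimension."""
    rng = np.random.default_rng(seed)
    base = classifier(X)
    sq_changes = [
        np.mean((classifier(X + rng.uniform(-q, q, size=X.shape)) - base) ** 2)
        for _ in range(n_draws)
    ]
    return float(np.mean(sq_changes))

# Two toy "classifiers": a steep one is far more sensitive than a flat one.
X = np.zeros((20, 4))
steep = lambda Z: 10.0 * Z.sum(axis=1)
flat = lambda Z: 0.1 * Z.sum(axis=1)
```

A classifier whose output changes little under such perturbations is, by this measure, robust to the feature shifts caused by unseen quantization tables.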

4.2. Experimental results of re-compressing images using the training quantization table

Table 3 shows the experimental results of re-compressing testing images using the quantization table used in training. The nsF5 method works on standard quantization tables only, so we train the steganalysis methods using images compressed by the standard quantization table of quality factor 75 (75s). For testing images compressed by the standard quantization table of quality factor 80 (80s) and a classifier trained on 75s images, we compare the performance of the classifier on the 80s images with its performance on the same images after re-compression by 75s. For MB1 and MB2, we also perform the same test on testing images compressed by 75ns; in these tests, all methods are evaluated after re-compression by 75s. As shown in Table 3, the proposed method outperforms the other methods, regardless of the feature set, in most experiments with at least 95% statistical significance.

Table 3 shows that the overall testing accuracy after re-compression is worse for Chen's and the CC-PEV feature sets, but not for CF*. The exceptions are nsF5 with CF* features for both RBFNNs, nsF5 with CC-PEV features for the LG-Steganalyzer, MB1 and MB2 with CF* features for both RBFNNs on testing images compressed by 80s, and MB2 with Chen's features on testing images compressed by 75ns for all classifiers. This shows that re-compressing testing images with the training quantization table does not necessarily yield good steganalysis results: re-compression changes the statistical characteristics of the image, and the outcome depends on the steganographic method, the steganalysis classifier and the steganalysis features. A robust classifier is still necessary for a good steganalysis result. Rows "Overall", "Without re-compression" and "Re-compression" in Table 4 denote the averages of, respectively, all rows of Table 3, the 80s and 75ns testing rows, and the 80s → 75s and 75ns → 75s rows. From Table 4, steganalysis without re-compression performs 2.33% to 11.10% better than steganalysis after re-compressing images with the training quantization table for Chen's and CC-PEV features, respectively. However, with the CF* feature set, the LG-Steganalyzer yields 5.60% better testing accuracy with re-compression than without. For CC-PEV features, the proposed method yields 7.18% and 9.38% better overall testing accuracies than the original SVM and the RBFNN trained using the MSE, respectively. The LG-Steganalyzer thus trains a more robust steganalysis classifier with high testing accuracy both with and without re-compression.

Our experimental results show that re-compressing testing images with the training quantization table does not guarantee a good steganalysis result; both settings, with and without re-compression, should be tested to learn which is better for a particular pair of steganographic method and steganalysis feature set. In contrast, the LG-Steganalyzer consistently outperforms the other classifiers for the same pair of features and steganographic method. This further supports our claim that a steganalysis classifier robust to images compressed by quantization tables other than the training one is essential for real-world steganalysis systems. In summary, the performance of a steganalysis classifier depends on its robustness to changes of quantization tables in the testing images; in real applications, dealing with different quantization tables between training and testing images is unavoidable. Overall, the proposed method is robust to these changes and yields better accuracies for testing images compressed by both the same and different quantization tables. On the other hand, we also show that re-compressing an image with the training quantization table may not yield the best steganalysis result. All these experiments show that the sensitivity proposed in this work significantly enhances the generalization capability and the robustness of the steganalysis method, regardless of the feature set adopted.
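The statistical effect of re-compression can be sketched at the DCT-coefficient level: coefficients quantized with one table are dequantized and re-quantized with the training table, an inherently lossy rounding step. The uniform tables below are stand-ins, not real JPEG quantization tables.

```python
import numpy as np

def requantize(coeffs, qt_old, qt_new):
    """Simulate JPEG re-compression at the DCT-coefficient level:
    dequantize with the old table, then quantize with the new one."""
    return np.round(coeffs * qt_old / qt_new).astype(int)

rng = np.random.default_rng(1)
coeffs = rng.integers(-10, 11, size=(8, 8))   # quantized DCT coefficients
qt_80 = np.full((8, 8), 10)                   # stand-in for an 80s table
qt_75 = np.full((8, 8), 14)                   # stand-in for a 75s table

recompressed = requantize(coeffs, qt_80, qt_75)
```

Because the rounding is not invertible, the re-compressed coefficients generally cannot be mapped back to the originals, which is why re-compression alters the feature statistics rather than restoring the training distribution.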

5. Conclusion

In real-world applications of a steganalysis system, we cannot restrict the imaging sources of the JPEG images transmitted over the Internet. This work focuses on the different quantization tables used by different software and cameras. The steganalysis feature differences created by different quantization tables are formulated as feature perturbations, and a sensitivity is defined to measure the effect of these perturbations on a classifier. A two-phase training algorithm for the RBFNN is proposed based on the minimization of the L-GEM, which consists of the training error and the sensitivity. Experimental results show good robustness of the proposed method: the proposed RBFNN training method can be applied with different feature sets and yields better performance than the SVM and the RBFNN trained by minimizing the training error only. The major contribution of this work is a new RBFNN training method for steganalysis that is robust to changes in quantization tables. Experimental results also show that the selection of the feature set is vital to steganalysis accuracy. A future research problem is to develop an automatic feature selection method for steganalysis using the L-GEM with minimal expert knowledge involvement.

Acknowledgement


This work is supported by the National Natural Science Foundation of China (61272201) and the Program for New Century Excellent Talents in University of China (NCET-11-0162).

References

[1] P. Sallee, Model-based methods for steganography and steganalysis, Int. J. Image Graph. 5 (1) (2005) 167–190.
[2] J. Fridrich, T. Pevný, J. Kodovský, Statistically undetectable JPEG steganography: dead ends, challenges, and opportunities, in: Proceedings of the 9th Workshop on Multimedia & Security, ACM, 2007, pp. 3–14.
[3] P. Sallee, Model based steganography, in: Lecture Notes in Computer Science, Springer-Verlag, 2004, pp. 154–167.
[4] D. Yan, R. Wang, X. Yu, J. Zhu, Steganography for MP3 audio by exploiting the rule of window switching, Comput. Security 31 (5) (2012) 704–716.
[5] O. Cetin, A.T. Ozcerit, A new steganography algorithm based on color histograms for data embedding into raw video streams, Comput. Security 28 (7) (2009) 670–682.
[6] Z. Khan, A.B. Mansoor, An analysis of quality factor on image steganalysis, in: The 7th International Conference on Informatics and Systems (INFOS), 2010, pp. 1–5.
[7] I. Lubenko, A.D. Ker, Steganalysis with mismatched covers: do simple classifiers help?, in: Proceedings of the Multimedia and Security Workshop, ACM, 2012, pp. 11–18.
[8] H. Farid, Digital image ballistics from JPEG quantization: a followup study, Tech. Rep. TR2008-638, Department of Computer Science, Dartmouth College, 2008.
[9] W. Luo, Y. Wang, J. Huang, Detection of quantization artifacts and its applications to transform encoder identification, IEEE Trans. Inform. Forensics Security 5 (4) (2010) 810–815.
[10] Q. Liu, A.H. Sung, B. Ribeiro, M. Wei, Z. Chen, J. Xu, Image complexity and feature mining for steganalysis of least significant bit matching steganography, Inform. Sci. 178 (1) (2008) 21–36.
[11] W.-N. Lie, G.-S. Lin, A feature-based classification technique for blind image steganalysis, IEEE Trans. Multimedia 7 (6) (2005) 1007–1020.
[12] V. Sabeti, S. Samavi, M. Mahdavi, S. Shirani, Steganalysis and payload estimation of embedding in pixel differences using neural networks, Pattern Recogn. 43 (1) (2010) 405–415.
[13] C. Chen, Y.Q. Shi, JPEG image steganalysis utilizing both intrablock and interblock correlations, in: IEEE International Symposium on Circuits and Systems, 2008, pp. 3029–3032.
[14] T. Pevný, J. Fridrich, Merging Markov and DCT features for multiclass JPEG steganalysis, in: Security, Steganography, and Watermarking of Multimedia Contents IX, vol. 6505, SPIE, 2007.
[15] Q. Liu, A.H. Sung, Z. Chen, J. Xu, Feature mining and pattern classification for steganalysis of LSB matching steganography in grayscale images, Pattern Recogn. 41 (1) (2008) 56–66.
[16] J. Kodovský, J. Fridrich, V. Holub, Ensemble classifiers for steganalysis of digital media, IEEE Trans. Inform. Forensics Security 7 (2) (2012) 432–444.
[17] J. Dong, X. Chen, L. Guo, T. Tan, Fusion based blind image steganalysis by boosting feature selection, in: IWDW, Lecture Notes in Computer Science, vol. 5041, Springer, 2007, pp. 87–98.
[18] P. Hayati, V. Potdar, E. Chang, A survey of steganographic and steganalytic tools for the digital forensic investigator, in: Workshop of Information Hiding and Digital Watermarking, 2007.
[19] Y.Q. Shi, C. Chen, W. Chen, A Markov process based approach to effective attacking JPEG steganography, in: Proceedings of the 8th International Conference on Information Hiding, Springer-Verlag, 2007, pp. 249–264.
[20] Q. Liu, A.H. Sung, M. Qiao, Z. Chen, B. Ribeiro, An improved approach to steganalysis of JPEG images, Inform. Sci. 180 (9) (2010) 1643–1655.
[21] J. Kodovský, J. Fridrich, Calibration revisited, in: Proceedings of the 11th ACM Multimedia and Security Workshop, 2009, pp. 63–74.
[22] D.S. Yeung, W.W.Y. Ng, D. Wang, E.C.C. Tsang, X.-Z. Wang, Localized generalization error model and its application to architecture selection for radial basis function neural network, IEEE Trans. Neural Netw. 18 (5) (2007) 1294–1305.
[23] P.P.K. Chan, D.S. Yeung, W.W.Y. Ng, C.-M. Lin, J.N.K. Liu, Dynamic fusion method using localized generalization error model, Inform. Sci. 217 (2012) 1–20.
[24] D.S. Yeung, J. Li, W.W.Y. Ng, P.P.K. Chan, MLPNN training via a multiobjective optimization of training error and stochastic sensitivity, IEEE Trans. Neural Netw. Learn. Syst., submitted for publication.
[25] B. Sun, W.W. Ng, D.S. Yeung, P.P. Chan, Hyper-parameter selection for sparse LS-SVM via minimization of its localized generalization error, Int. J. Wavelets, Multiresol. Inform. Process. 11 (03) (2013) 1350030.
[26] W.W.Y. Ng, A. Dorado, D.S. Yeung, W. Pedrycz, E. Izquierdo, Image classification with the use of radial basis function neural networks and the minimization of the localized generalization error, Pattern Recogn. 40 (1) (2007) 19–32.
[27] D.S. Yeung, P.P.K. Chan, W.W.Y. Ng, Radial basis function network learning using localized generalization error bound, Inform. Sci. 179 (19) (2009) 3199–3217.
[28] C. Igel, M. Hüsken, Empirical evaluation of the improved Rprop learning algorithms, Neurocomputing 50 (2003) 105–123.
[29] UCID, (accessed August 2013).
