Signal Processing: Image Communication 39 (2015) 75–83
Images steganalysis using GARCH model for feature selection Saeed Akhavan, Mohammad Ali Akhaee, Saeed Sarreshtedari Department of Electrical and Computer Engineering, College of Eng., University of Tehran, Tehran 1458889694, Iran
Article history: Received 5 January 2015; Received in revised form 12 August 2015; Accepted 12 August 2015; Available online 24 August 2015

Abstract
This paper introduces a new blind steganalysis method. The required image features are extracted based on the generalized autoregressive conditional heteroskedasticity (GARCH) model and higher-order statistics of the images. These features are exploited during the classification stage to detect a variety of steganographic methods in both the spatial and transform domains. The GARCH features are extracted from the non-approximate wavelet coefficients, whose heavy-tailed distribution makes the GARCH model applicable. In addition, second-order statistics are used to develop features that are very sensitive to minor changes in natural images. The experimental results demonstrate that the proposed feature-based steganalysis method outperforms state-of-the-art methods while using the same order of features.
Keywords: Steganalysis; Feature extraction; GARCH model; Higher-order statistics
1. Introduction

With the rapid growth of the Internet and its applications during the last decade, the need for secure communication has become ever more crucial. For instance, steganography techniques are developed to conceal the existence of a secret message within a seemingly normal communication, without raising any suspicion among observers. On the other hand, steganalysis methods have evolved simultaneously to combat this threat and detect the abuse of digital media. From a general viewpoint, steganalysis schemes are divided into two categories, namely non-blind (method-specific) and blind (universal) techniques. The non-blind techniques are developed and optimized to detect only specific steganography schemes, such as the least significant bit (LSB) embedding methods [1-6]. The Chi-square [2] and RS [1] steganalyses detect simple LSB replacement steganography by finding certain abnormalities imposed on the histogram of the image due to LSB embedding. In 2002, Dumitrescu proposed a statistical analysis of neighboring samples to efficiently analyze LSB embedding and estimate the message length. In fact, Dumitrescu's method is a special case of RS analysis. Her work was valuable in deriving the detector using a different methodology that enabled further development of structural LSB detectors [7]. The histogram characteristic function (HCF) is also used in HCF-based steganalysis schemes to expose LSB embedding [3,4]. On the other hand, in steganalysis systems based on machine learning, or blind steganalysis methods, there exists no prior information on the applied steganography technique. The blind approach to steganalysis was initiated by two major works of Avcibas [8] and Farid [9], and has been developed significantly in recent years [10-16]. Blind steganalysis methods are built on machine learning and classification principles and do not exploit any prior knowledge of the embedding method. In order to design a universal steganalyzer, several steganographic methods are applied to a collection of training images. Afterwards, appropriate features representing a digest of the total information of the images are extracted. In the next step, classifiers are applied to find the optimum classification boundary between the cover and stego
images based on the feature values extracted from the cover and stego training images. New images are classified as cover or stego by simply comparing their feature values to the classification boundary. Therefore, there are two important steps in every blind steganalyzer, namely feature extraction and classification. The first important stage of every blind steganalyzer is to extract appropriate features. These features must be designed carefully to reflect every subtle modification of the original image. In other words, these features summarize the important and sensitive information of the images. The main goal of this paper is to find a set of such features based on the generalized autoregressive conditional heteroskedasticity (GARCH) model and the second-order statistics of the images. As mentioned in the previous paragraph, the first set of features is based on GARCH modeling. Autoregressive conditional heteroskedasticity (ARCH) processes were introduced by Engle in 1982 [17]. The samples of ARCH processes are zero-mean and serially uncorrelated, with conditional variances depending on the previous samples. In 1986, Bollerslev introduced a generalized ARCH model called GARCH [18]. In the GARCH model, the conditional variances depend not only on the previous sample values but also on their conditional variances. Although this model was initially developed to analyze financial time series [17,18], it has recently been shown to be an appropriate tool for modeling the non-approximate wavelet coefficients, which have heavy-tailed and non-stationary distributions [19]. In this paper, we apply the GARCH model in the wavelet domain to extract features that boost the steganography footprints. The other set of features is extracted based on the second-order statistics of the images in the transform domain, which are very sensitive to subtle modifications.
These types of features have already been exploited in the steganalysis literature, offering significant results [11,15,20]. The next step in designing a universal (blind) steganalyzer is to select a suitable classifier. Several classifiers, such as support vector machines (SVM) [21], neural networks [22], and ensemble classifiers [16], have been exploited in the steganalysis literature. In this paper, we choose the ensemble classifier because of its simplicity and efficiency. It uses Fisher linear discriminants (FLD) as base learners trained on random subspaces of the feature space. The out-of-bag estimate of the testing error on bootstrap samples of the training set is used to automatically specify the random subspace dimensionality and the number of base learners, as described in [16]. The final classifier decision is made by combining the decisions of its base learners. By changing the threshold on the number of positive base-learner decisions, the false alarm and missed detection probabilities can be adjusted. Efficient analysis of the recent sophisticated and diverse steganographic algorithms necessitates a large number of features to reflect even the weakest dependencies in the images. However, the computational complexity of classifiers such as SVM restricts the steganalyzer to a limited number of features. This has been the motivation behind choosing the ensemble classifier, which decreases the complexity of the classification algorithm significantly while keeping the ability to deal with a large number of features and providing approximately the same performance. Moreover, the ease of dealing with large feature sets makes it possible to avoid numerous experiments for choosing the optimal features for each specific embedding method. The rest of this paper is organized as follows. Section 2 introduces the GARCH model. Extracting the features based on the GARCH model is explained in Section 3, while the statistical features are introduced in Section 4. Section 5 is dedicated to the experimental results, and Section 6 concludes the paper.
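The actual classifier used in this work is the ensemble of [16]; the following is only a minimal sketch of its core idea (FLD base learners on random feature subspaces combined by majority vote), run on synthetic cover/stego feature vectors. The subspace size and learner count are fixed here, whereas [16] selects both by minimizing the out-of-bag error.

```python
import numpy as np

rng = np.random.default_rng(0)

def fld_fit(X, y):
    # Fisher linear discriminant: w = (S0 + S1)^{-1} (mu1 - mu0)
    X0, X1 = X[y == 0], X[y == 1]
    Sw = np.cov(X0, rowvar=False) + np.cov(X1, rowvar=False)
    w = np.linalg.solve(Sw + 1e-6 * np.eye(X.shape[1]), X1.mean(0) - X0.mean(0))
    b = -0.5 * w @ (X0.mean(0) + X1.mean(0))  # threshold halfway between projected class means
    return w, b

def ensemble_fit(X, y, n_learners=51, d_sub=10):
    # Each base learner sees a random d_sub-dimensional subspace of the features.
    # (d_sub and n_learners are fixed here; [16] tunes both via the out-of-bag error.)
    models = []
    for _ in range(n_learners):
        idx = rng.choice(X.shape[1], size=d_sub, replace=False)
        w, b = fld_fit(X[:, idx], y)
        models.append((idx, w, b))
    return models

def ensemble_predict(models, X):
    # Majority vote over the binary decisions of all base learners.
    votes = sum((X[:, idx] @ w + b > 0).astype(int) for idx, w, b in models)
    return (votes > len(models) / 2).astype(int)

# Toy cover/stego feature vectors (hypothetical data): the stego class is
# shifted by a small constant in every feature dimension.
n, d = 400, 100
X = np.vstack([rng.normal(0.0, 1.0, (n, d)), rng.normal(0.3, 1.0, (n, d))])
y = np.r_[np.zeros(n), np.ones(n)]
models = ensemble_fit(X, y)
acc = (ensemble_predict(models, y=None, X=X) if False else (ensemble_predict(models, X) == y).mean())
```

Raising the vote threshold above a simple majority trades missed detections for fewer false alarms, which is how the operating point on the ROC curve is moved.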
2. GARCH model

Samples from autoregressive conditional heteroskedasticity (ARCH) processes are zero-mean and serially uncorrelated, with non-constant variances conditioned on the past samples, as introduced by Engle [17]. Bollerslev developed the GARCH model, which is widely used for modeling financial time series [18]. Let $\epsilon_t$ represent a 1D stochastic process with zero mean. We define GARCH(p, q) for $\epsilon_t$ as below
$$\epsilon_t = \sqrt{h_t}\,\eta_t; \quad \eta_t \sim N(0,1); \quad t = 1, 2, \ldots, t_{\mathrm{end}}$$
$$h_t = k + \sum_{i=1}^{p} \alpha_i h_{t-i} + \sum_{j=1}^{q} \beta_j \epsilon_{t-j}^2$$
$$\epsilon_0^2 = h_0 = \operatorname{var}(\epsilon_t) = \frac{1}{N} \sum_{t=1}^{t_{\mathrm{end}}} \epsilon_t^2 \tag{1}$$
where $\eta_t$ is a normal process independent of $h_t$ with zero mean and unit variance, $h_t$ denotes the conditional variance of $\epsilon_t$, and $k$, the $\alpha_i$'s, and the $\beta_j$'s are the GARCH parameters to be estimated. Now let $\psi_{t-1}$ denote all information up to time $t-1$
$$\psi_{t-1} = \{\epsilon_0, \epsilon_1, \ldots, \epsilon_{t-1}, h_0, h_1, \ldots, h_{t-1}\} \;\Rightarrow\; \epsilon_t | \psi_{t-1} \sim N(0, h_t) \tag{2}$$
It can be shown that the GARCH process is wide-sense stationary if and only if [18]

$$\sum_{i=1}^{p} \alpha_i + \sum_{j=1}^{q} \beta_j < 1 \tag{3}$$
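As a quick illustration of the model of (1) and the stationarity condition (3), the following sketch draws samples from a GARCH(1,1) process. It follows the paper's convention that $\alpha$ weights the lagged conditional variance and $\beta$ the lagged squared sample; the parameter values are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(1)

def garch11_sample(k, alpha, beta, t_end):
    # eps_t = sqrt(h_t) * eta_t with h_t = k + alpha*h_{t-1} + beta*eps_{t-1}^2,
    # i.e. Eq. (1) with p = q = 1.
    assert alpha + beta < 1, "wide-sense stationarity condition of Eq. (3)"
    h = k / (1.0 - alpha - beta)  # start from the unconditional variance
    eps_prev = 0.0
    out = np.empty(t_end)
    for t in range(t_end):
        h = k + alpha * h + beta * eps_prev ** 2
        eps_prev = np.sqrt(h) * rng.normal()
        out[t] = eps_prev
    return out

x = garch11_sample(k=0.1, alpha=0.5, beta=0.2, t_end=50_000)
# Under (3), the sample variance approaches the unconditional
# variance k / (1 - alpha - beta).
uncond_var = 0.1 / (1.0 - 0.5 - 0.2)
```

The samples are serially uncorrelated but not independent: large deviations cluster in time through $h_t$, which is exactly the heavy-tailed, non-stationary behavior attributed to the detail wavelet coefficients.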
In order to estimate the GARCH parameters, we use maximum likelihood (ML) estimation. Suppose that we want to apply the GARCH model to a process $y_t$. To this end, we define the zero-mean process $\epsilon_t$ as below
$$\epsilon_t = y_t - r_t \cdot b; \qquad \epsilon_t = \sqrt{h_t}\,\eta_t$$
$$r_t = [y_{t-1}, y_{t-2}, \ldots, y_{t-M}]; \qquad b = [b_1, b_2, \ldots, b_M]$$
$$\psi_{t-1} = \{y_0, y_1, \ldots, y_{t-1}, h_0, h_1, \ldots, h_{t-1}\}; \qquad t = \{1, 2, \ldots, M, M+1, \ldots, t_{\mathrm{end}}\} \tag{4}$$
where rt and b are the vectors of explanatory variables and unknown parameters respectively. Therefore, we need to
estimate the vector b as well. We can write
$$f(y_t | \psi_{t-1}) = \frac{1}{\sqrt{2\pi h_t}} \exp\!\left( -\frac{(y_t - r_t \cdot b)^2}{2 h_t} \right) \tag{5}$$

$$h_t = k + \sum_{i=1}^{p} \alpha_i h_{t-i} + \sum_{j=1}^{q} \beta_j (y_{t-j} - r_{t-j} \cdot b)^2 \tag{6}$$
Then, the likelihood function is formed as below

$$LF(\gamma) = \prod_{t=1}^{N} f(y_t | \psi_{t-1}); \qquad \gamma = \{k, \alpha_1, \ldots, \alpha_p, \beta_1, \ldots, \beta_q, b\} \tag{7}$$
We find the parameter set $\gamma$ that maximizes the likelihood function. Once this optimal parameter set is found, the GARCH model is completely established.
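The ML estimation of (7) can be sketched as follows for the simplified case of an already zero-mean series (regression term $b$ omitted, so $\epsilon_t = y_t$). A coarse grid search stands in for the numerical optimizer a real implementation would use; the test series is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)

def neg_log_likelihood(params, y):
    # Negative logarithm of the likelihood (7) for GARCH(1,1) on a
    # zero-mean series: each factor is the Gaussian density (5) with b = 0.
    k, alpha, beta = params
    h = np.var(y)          # h_0 = var, matching the initialization in Eq. (1)
    eps2_prev = h
    nll = 0.0
    for yt in y:
        h = k + alpha * h + beta * eps2_prev
        nll += 0.5 * (np.log(2.0 * np.pi * h) + yt ** 2 / h)
        eps2_prev = yt ** 2
    return nll

def fit_garch11(y):
    # Coarse grid search over (k, alpha, beta); a real implementation would
    # hand neg_log_likelihood to a numerical optimizer instead.
    best, best_nll = None, np.inf
    for k in (0.01, 0.1, 0.5, 1.0, 2.0):
        for alpha in (0.1, 0.3, 0.5, 0.7, 0.9):
            for beta in (0.05, 0.1, 0.2, 0.3):
                if alpha + beta >= 1:  # enforce the stationarity condition (3)
                    continue
                nll = neg_log_likelihood((k, alpha, beta), y)
                if nll < best_nll:
                    best, best_nll = (k, alpha, beta), nll
    return best, best_nll

# Hypothetical zero-mean series whose variance changes over time.
y = rng.normal(size=2000) * np.repeat([0.5, 2.0, 1.0, 3.0], 500)
(k_hat, alpha_hat, beta_hat), nll = fit_garch11(y)
```

For each subband, the three fitted numbers $(k, \alpha_1, \beta_1)$ of a GARCH(1,1) model are exactly what Section 3.2 later uses as features.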
3. Feature extraction based on the GARCH model

3.1. GARCH model variances

The variances of the GARCH model are collected from some of the wavelet subbands as our GARCH-based feature set. In this section, we justify this choice in more detail using hypothesis testing and ML estimation. Consider the hypotheses H0 and H1, which correspond to deciding that the received image is a cover or a stego image, respectively. All steganographic algorithms can be modeled as below
$$H_0: y_{ij} = s_{ij}; \qquad H_1: y_{ij} = s_{ij} + n_{ij} \tag{8}$$
where $n_{ij}$ is the additive noise added, due to the embedding, to the pixel value $s_{ij}$ located in row $i$ and column $j$, and $y_{ij}$ are the pixels of the received image. Without loss of generality, we change the 2D indexing to a 1D one for the sake of simplicity
$$H_0: y_k = s_k; \qquad H_1: y_k = s_k + n_k; \qquad k = 1, 2, \ldots, n \tag{9}$$
where $n$ is the total number of image pixels. The other assumption here is to model the additive noise as a zero-mean Gaussian random variable. In the steganalysis literature, the embedded message is assumed to be independent of the cover image. The message modifies the cover image as an additive noise to generate the stego image. When the embedding is applied in a transform domain such as the discrete cosine transform (DCT), the effect of this additive noise back in the spatial domain is well modeled as a zero-mean Gaussian distribution of variance $\sigma_0^2$, i.e., $n_k \sim N(0, \sigma_0^2)$. In order to set up a method that exploits the above-mentioned analysis, the received image is transformed to the wavelet domain using Haar or other decomposition filters. In this domain, we can assume an approximate distribution for the image ($S$) itself, which enables us to make a decision based on detection and estimation techniques. However, since the variance of the
Fig. 1. Proper wavelet subbands for the extraction of the GARCH variances.
additive noise is not very large, we are interested in the subbands with the least power, because in these subbands the variance difference between the H0 and H1 hypotheses is large enough to let us separate them efficiently. Note that additive Gaussian noise remains Gaussian after passing through linear time-invariant (LTI) filters such as the wavelet transform. Since the wavelet filters are normalized in energy, the variance of the noise also remains the same. According to the above discussion, the approximation subband of the wavelet transform is not useful for our application. However, the other subbands are possible candidates, since they include the detailed information of the image. Working on the horizontal, vertical, and diagonal subbands using the procedure explained in Fig. 1, we obtain a total of 117 subbands in the second, third, and fourth levels of decomposition. Now we rewrite the hypotheses in each new subband
$$H_0: y_k = s_k, \quad \sigma_{y_k}^2 = \sigma_{s_k}^2$$
$$H_1: y_k = s_k + n_k, \quad \sigma_{y_k}^2 = \sigma_{s_k}^2 + \sigma_0^2; \qquad k = 1, 2, \ldots, n \tag{10}$$
where each $s_k$ is assumed to be a Gaussian random variable with variance $\sigma_{s_k}^2$. Although we do not need $\sigma_{s_k}^2$ here, it can be obtained from $\sigma_{y_k}^2$, known according to the GARCH model, and $\sigma_0^2$ [23]. Now we apply the GARCH model to the $y_k$'s to find the conditional variances and use ML estimation to make a decision
$$H_0: f(\mathbf{y}|H_0) = f(y_1|H_0)\, f(y_2|y_1, H_0) \cdots f(y_n|y_1, y_2, \ldots, H_0)$$
$$H_1: f(\mathbf{y}|H_1) = f(y_1|H_1)\, f(y_2|y_1, H_1) \cdots f(y_n|y_1, y_2, \ldots, H_1) \tag{11}$$

Then the decision is made based on (12)

$$f(\mathbf{y}|H_0) \underset{\text{cover}}{\overset{\text{stego}}{\lessgtr}} f(\mathbf{y}|H_1) \tag{12}$$

According to the GARCH model, we know that the distributions of the above functions are all Gaussian

$$f(y_1|H_i) \sim N(0, \sigma_{y_1}^2), \quad f(y_2|y_1, H_i) \sim N(0, \sigma_{y_2}^2), \;\ldots,\; f(y_n|y_1, y_2, \ldots, H_i) \sim N(0, \sigma_{y_n}^2) \tag{13}$$

We see from (13) that the GARCH variances are critical to the decision-making process. This confirms our earlier idea of choosing them as an appropriate feature set. In order to show the mathematical efficiency of our analysis, we apply a random Gaussian noise with variance $\sigma_0^2$ to the images as the secret message. Our simulation results show that transform-domain methods such as JSteg, at embedding rates from 5% up to full rate in terms of bits per AC coefficient (bpac) in the DCT domain, result in additive noise with $\sigma_0^2$ from 4 to 20 back in the spatial domain. Therefore, in Section 5.2 we set $\sigma_0^2$ in the same range. The results are then derived by extracting the GARCH variances in only one of the subbands of Fig. 1. This verifies the validity of the above discussion.

3.2. GARCH model parameters

In the previous section, we found a solution for the H0 (cover image) and H1 (stego image) hypotheses using ML estimation and arrived at this equation

$$f(\mathbf{y}|H_0) \underset{\text{cover}}{\overset{\text{stego}}{\lessgtr}} f(\mathbf{y}|H_1) \tag{14}$$

However, the large number of variances (which is linearly proportional to the size of the subbands) might impose considerable complexity on our steganalysis method. Therefore, we have to find a way to compress the variance information further. We know that the GARCH model parameters, i.e., the $\alpha_i$'s, the $\beta_j$'s, and $k$, are derived by maximizing the distribution function. These parameters are far fewer in number than the variances, yet they convey the same amount of information. Therefore, it is reasonable to replace the variances of each subband with its GARCH parameters as the feature set. GARCH(1, 1) is a prevalent and efficient model, to which we stick here. As a result, three features are extracted from each subband, which results in a total of 117 × 3 = 351 GARCH features obtained from all 117 subbands in Fig. 1.

4. Feature extraction based on the second-order statistics

4.1. Basics
The histogram of the pixel values is the first-order statistic of the image, representing, in a sense, the probability density function (PDF) of the image pixel values. LSB-based steganographic algorithms such as LSB matching (LSBM) and LSB matching revisited (LSBMR) [24] try to keep the histogram as little modified as possible, while other techniques such as F5 [25] and model-based (MB) steganography [26] do the same for the histogram of the quantized DCT coefficients. This fact has led to the design of steganalyzers that move toward the application of higher-order statistics. However, increasing the order of the applied moments imposes more complexity on the method, without necessarily guaranteeing more information. Therefore, second-order statistics are usually considered the best tools for steganalysis purposes [15]. In the rest of this section, we discuss the second-order statistics in more detail. The second-order statistic reflects the density function of the co-occurrence of two different image pixel values. Every pair of image pixel values can be considered to form a second-order statistic. However, for images, we are more interested in pairs of pixels that are somehow related to each other, such as adjacent pixels in the spatial domain [27]. One important challenge of using higher-order statistics is their large number. As an example, for 8-bit gray-scale images, the spatial-domain co-occurrence matrix has 256 × 256 = 65,536 entries. This number of statistics is almost impossible to deal with. Thus, we need to compress this information. One idea is to consider the differences between consecutive pixels rather than the pixels themselves, as mentioned in [11]. In this case, we can restrict the values to the range $[-T, T]$ to limit the number of second-order statistics.
In the following equations, we denote the original matrix by $F$, which can be the image itself or its transformed version in the quantized DCT or wavelet subbands. We also define the difference matrix $D$, whose entries are differences of consecutive entries of the original matrix. We define and use four types of difference matrices in our work. The horizontal, vertical, diagonal, and minor-diagonal difference matrices are defined in (15)
$$D_{i,j}^{h} = F_{i,j} - F_{i-1,j}; \quad D_{i,j}^{v} = F_{i,j} - F_{i,j-1}; \quad D_{i,j}^{d} = F_{i,j} - F_{i-1,j-1}; \quad D_{i,j}^{md} = F_{i,j} - F_{i-1,j+1} \tag{15}$$
where $F_{i-m, j-n}$ denotes $F_{i,j}$ shifted by $m$ rows and $n$ columns. Pixels at the matrix margins are ignored while calculating the difference matrices. Now we have $D^h$, $D^v$, $D^d$, and $D^{md}$, with entries limited to the range $[-T, T]$, from which we want to extract the second-order statistics. While the joint second-order density function is considered here as the feature set, one could choose the conditional distribution as well. Assuming that $M$ and $N$ represent the number of rows and columns of the original image, the joint distribution function for $D^h$ is calculated as below; similar processes are performed for the other three matrices
$$NJ_{u,v}^{h} = \Pr\!\left(D_{i,j+1}^{h} = u,\; D_{i,j}^{h} = v\right) = \frac{\sum_i \sum_j \delta\!\left(D_{i,j+1}^{h} = u,\; D_{i,j}^{h} = v\right)}{M \times (N-1)}; \qquad u, v \in [-T, T] \tag{16}$$
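The difference matrices of (15) and the neighboring joint density of (16) can be sketched as follows, here with $T = 4$ (the NJ(2, 4) setting used later) and a synthetic 8-bit matrix standing in for the image:

```python
import numpy as np

rng = np.random.default_rng(3)
F = rng.integers(0, 256, size=(64, 64)).astype(np.int64)  # stand-in 8-bit matrix
T = 4                                                     # clipping threshold

# The four difference matrices of Eq. (15), clipped to [-T, T]:
Dh = np.clip(F[1:, :] - F[:-1, :], -T, T)      # D^h_{i,j}  = F_{i,j} - F_{i-1,j}
Dv = np.clip(F[:, 1:] - F[:, :-1], -T, T)      # D^v_{i,j}  = F_{i,j} - F_{i,j-1}
Dd = np.clip(F[1:, 1:] - F[:-1, :-1], -T, T)   # D^d_{i,j}  = F_{i,j} - F_{i-1,j-1}
Dmd = np.clip(F[1:, :-1] - F[:-1, 1:], -T, T)  # D^md_{i,j} = F_{i,j} - F_{i-1,j+1}

def neighboring_joint(D, T):
    # Empirical NJ_{u,v} = Pr(D_{i,j+1} = u, D_{i,j} = v) of Eq. (16),
    # estimated as the fraction of horizontally adjacent pairs in each (u, v) bin.
    NJ = np.zeros((2 * T + 1, 2 * T + 1))
    left, right = D[:, :-1], D[:, 1:]
    for u in range(-T, T + 1):
        for v in range(-T, T + 1):
            NJ[u + T, v + T] = np.mean((right == u) & (left == v))
    return NJ

NJh = neighboring_joint(Dh, T)
```

Because clipping forces every difference into one of the $(2T+1)^2$ bins, each $NJ$ matrix is a proper probability table that sums to one, which is what makes averaging them across directions and blocks meaningful.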
$NJ$ represents the neighboring joint density function and has $(2T+1)^2$ entries. In order to further reduce the number of features, $NJ^1$ and $NJ^2$ are defined and used in our work

$$NJ_{u,v}^{1} = \frac{NJ_{u,v}^{h} + NJ_{u,v}^{v}}{2}; \qquad NJ_{u,v}^{2} = \frac{NJ_{u,v}^{d} + NJ_{u,v}^{md}}{2} \tag{17}$$

$NJ^1$ and $NJ^2$ also have $(2T+1)^2$ entries each. Since the process explained above will be referred to later, we summarize it in the block diagram of Fig. 2. The input to this block is the original matrix $F$, and the outputs are $NJ^1$ and $NJ^2$. The order equals two, and $T$ is the clipping threshold. In the next sections, we discuss feature extraction based on the $NJ$'s in the DCT domain.

4.2. DCT domain feature extraction

Considering the works presented in [11,12], three types of second-order features are extracted. These three sets are explained separately in the following items.

1. Features extracted from the whole DCT coefficient matrix: In this case, we consider the matrix of quantized DCT coefficients as the $F$ matrix defined in Section 4.1 and extract the $D$ and $NJ$ matrices in the four directions, similarly. For feature extraction, we consider $NJ(2, 6)$, which leads to a set of $2 \times (2 \times 6 + 1)^2 = 338$ features.

2. Features extracted from interblock correlation: These features are extracted by considering the DCT 8 × 8 blocks as the $F$ matrix. For each block, the $D$ and $NJ$ matrices are calculated separately. Apparently, this process results in a very large number of matrices. In order to produce manageable results, we average over all matrices. After that, we have $\overline{NJ}^1$, which is the outcome of averaging over all $NJ_1^1, NJ_2^1, \ldots, NJ_L^1$ matrices, where $L$ stands for the number of 8 × 8 blocks. The $\overline{NJ}^2$ matrix is calculated likewise. Since dependencies are considered only inside blocks, this method is called interblock correlation, which is summarized in Fig. 3. Here we choose $NJ(2, 4)$, resulting in $2 \times (2 \times 4 + 1)^2 = 162$ features

$$\overline{NJ}^{m} = \frac{1}{L} \sum_{i=1}^{L} NJ_i^m, \qquad m \in \{1, 2\} \tag{18}$$

3. Features extracted from intrablock correlation: In this case, we collect identical entries (frequencies) of all 8 × 8 blocks, excluding the first entry (DC coefficient), to form a set of 63 new $F$ matrices. The $D$ and $NJ$ matrices are calculated as in the former sections. The number of matrices is reduced by averaging over all of them, as in the previous item. For example, this time we have $\overline{NJ}^1$ as the average over $NJ_1^1, NJ_2^1, \ldots, NJ_{63}^1$. Since similar frequencies in different blocks are considered in this method, it is called intrablock correlation, which is presented in Fig. 4. Here we again choose $NJ(2, 4)$, resulting in 162 features

$$\overline{NJ}^{m} = \frac{1}{63} \sum_{i=1}^{63} NJ_i^m, \qquad m \in \{1, 2\} \tag{19}$$

All the extracted features discussed above are summarized in Table 1.

5. Simulation results
5.1. Experimental setup

We set up our steganalysis method based on the features derived in Sections 3 and 4 and the ensemble classifier to analyze a variety of steganographic methods in the spatial and transform domains. We use the ensemble classifier with the default settings available from http://dde.binghamton.edu/download/ensemble. As described in the original publication [16], the ensemble by default minimizes the total classification error probability under equal priors, PE. The random subspace dimensionality and the number of base learners are found by minimizing the out-of-bag (OOB) estimate of the testing error, EOOB, on bootstrap samples of the training set. To form our database, we mixed 10,000 images from each of the BOSSbase v1.01 [28] and BOWS [29] databases for the training and test stages. All images are cropped, if necessary, to the size of 256 × 256 with the central pixel remaining the same. One half of the images are chosen randomly for the training phase, while the results are derived by testing the classifier on the other half. The comparison criterion is the minimum error probability (PE) defined in (20)
$$P_E = \min_{P_{FA}} \frac{1}{2} \left( P_{FA} + P_{MD}(P_{FA}) \right) \tag{20}$$
where $P_{FA}$ and $P_{MD}$ stand for the probabilities of false alarm and missed detection, respectively. In the following, we first
Fig. 2. Co-occurrence matrix extraction.

Fig. 5. ROC performance comparison for additive noise with different powers (noise variance = 4, 9, and 16).
Fig. 3. Feature extraction in the DCT domain using interblock correlation.
investigate the performance of the variances of the GARCH model. Later on, we investigate the performance of the proposed method for important spatial- and transform-domain steganographic algorithms. Finally, we compare the performance of the proposed steganalysis method with state-of-the-art schemes.

5.2. Variances of the GARCH model

As explained, the stego images in this section are generated by adding random Gaussian noise to the cover images. The GARCH model variances are collected, as features, from the nine subbands at the second level and the 27 subbands at the third level of the wavelet decomposition. Given the 256 × 256 size of the images, a third-level subband yields a set of 32 × 32 = 1024 features. The ROC curve is plotted in Fig. 5 for noise variances of 4, 9, and 16, using only the 1024 features from one of the third-level subbands. We can observe that, for a false alarm probability of 0.1, the probability of detection is greater than 90% as long as the noise variance is greater than nine. The simulation results demonstrate that the variances of the GARCH model are reliable for those subbands in which the signal-to-noise ratio is low enough.

5.3. Experimental results for the proposed steganalysis method
Fig. 4. Feature extraction in the DCT domain using intrablock correlation.
Table 1
Details of the designed feature set.

Family                  | Extraction domain  | Feature type            | Number
Second-order statistics | DCT                | NJ(2,6)                 | 338
                        | DCT (interblock)   | NJ(2,4)                 | 162
                        | DCT (intrablock)   | NJ(2,4)                 | 162
GARCH model             | Wavelet            | GARCH model parameters  | 351
                        |                    | Total                   | 1013
The performance of the proposed steganalyzer is examined for several steganography techniques. The embedding rates are presented in bits per pixel (bpp) and bits per coefficient (bpc) for the spatial- and frequency-domain methods, respectively.

1. Spatial domain methods: In this subsection, our proposed steganalysis method is used to detect the stego images generated by LSB-based methods such as LSB replacement, LSBM, LSBMR [24], and LSB one-third [30]. As the most advanced among recent steganographic schemes, the spatial-domain steganography systems HUGO [31], WOW [32], and S-UNIWARD [33] were also implemented for our experiments.
Table 2
Minimum error probability performance of the proposed method for spatial domain techniques.

Embedding method | 25%   | 50%   | 75%   | 100%
LSB              | 0.135 | 0.120 | 0.090 | 0.015
LSBM             | 0.150 | 0.128 | 0.110 | 0.092
LSBMR            | 0.160 | 0.149 | 0.130 | 0.103
LSB one-third    | 0.201 | 0.118 | 0.113 | 0.107
HUGO             | 0.220 | 0.170 | 0.151 | 0.110
WOW              | 0.280 | 0.199 | 0.168 | 0.131
S-UNIWARD        | 0.310 | 0.202 | 0.197 | 0.141
The minimum error probability performance is summarized in Table 2.

2. Transform domain methods: Our proposed steganalysis method was also examined for transform-domain steganographic schemes. The PQ [34], F5 [25], nsF5 [35], model-based (MB1 and MB2) [26], J-UNIWARD [33], and YASS [36] embedding algorithms were implemented in order to investigate the performance of the proposed method. The block size is set to 9 for the YASS algorithm. The performance is summarized in Table 3.

Table 3
Minimum error probability performance of the proposed method for transform domain techniques.

Embedding method | 5%    | 10%   | 15%   | 20%   | 25%
PQ               | 0.005 | 0.004 | 0.002 | 0.001 | 0.000
F5               | 0.137 | 0.104 | 0.051 | 0.002 | 0.001
nsF5             | 0.175 | 0.111 | 0.060 | 0.003 | 0.002
MB1              | 0.070 | 0.040 | 0.011 | 0.005 | 0.004
MB2              | 0.167 | 0.100 | 0.051 | 0.007 | 0.006
YASS             | 0.288 | 0.214 | 0.201 | 0.165 | 0.101
J-UNIWARD        | 0.491 | 0.450 | 0.431 | 0.401 | 0.201

Table 4
Minimum error probability performance of the proposed method compared to existing schemes for spatial domain steganographic methods.

Embedding method | Rate (%) | SPAM 686D | PSRM 12870D | Proposed 1013D
LSB              | 25       | 0.122     | 0.141       | 0.135
                 | 50       | 0.129     | 0.130       | 0.120
                 | 100      | 0.071     | 0.090       | 0.015
LSBM             | 25       | 0.143     | 0.152       | 0.150
                 | 50       | 0.132     | 0.140       | 0.128
                 | 100      | 0.097     | 0.104       | 0.092
LSBMR            | 25       | 0.155     | 0.161       | 0.160
                 | 50       | 0.141     | 0.150       | 0.149
                 | 100      | 0.099     | 0.134       | 0.103
LSB one-third    | 25       | 0.212     | 0.240       | 0.201
                 | 50       | 0.131     | 0.185       | 0.118
                 | 100      | 0.185     | 0.120       | 0.107
HUGO             | 25       | 0.316     | 0.239       | 0.220
                 | 50       | 0.289     | 0.177       | 0.170
                 | 100      | 0.201     | 0.101       | 0.110
WOW              | 25       | 0.399     | 0.300       | 0.280
                 | 50       | 0.352     | 0.190       | 0.199
                 | 100      | 0.321     | 0.122       | 0.131
S-UNIWARD        | 25       | 0.401     | 0.312       | 0.310
                 | 50       | 0.387     | 0.201       | 0.202
                 | 100      | 0.371     | 0.127       | 0.141

5.4. Performance comparison

This section is dedicated to comparing the performance of the proposed steganalysis method with six state-of-the-art steganalysis schemes. The first scheme is the steganalyzer proposed by Chen in 2008, which works based on 486 features extracted from the quantized-DCT interblock and intrablock dependencies and is mainly applicable to transform-domain embedding algorithms [12]. The second method for comparison is the CC-PEV steganalyzer, introduced in 2007 [13] with an improved version in 2009 [37], which exploits a set of 548 features for classification. The second-order Markov model and DCT-based features, along with a calibration strategy, comprise this steganalyzer. The third steganalyzer is the SPAM feature set, presented in 2010, with all of its 686 features extracted from third-order Markov dependencies [15]. Introduced in 2011, the CCC steganalysis is another method in this comparison, consisting of 48,600 features based on the rich model [14]. The next steganalyzer is DCTR, which uses a novel feature set for the steganalysis of JPEG images. For this method, 8000 features are extracted as first-order statistics of quantized noise residuals obtained from the decompressed JPEG image using 64 kernels of the discrete cosine
transform [38]. The last approach is PSRM, with 12,870 features, which projects neighboring residual samples onto a set of random vectors and takes the first-order statistic (histogram) of the projections as the feature [39]. The CHEN, CC-PEV, CCC, and DCTR steganalyzers are useful only for transform-domain embedding, whereas the SPAM and PSRM steganalyzers are suitable for both spatial- and transform-domain embedding methods. The performance of these steganalyzers is compared in terms of minimum error probability in Tables 4 and 5 for several spatial- and transform-domain embedding techniques. The ROC comparison is also provided in Fig. 6 for MB2 embedding at the rate of 5%, which shows the superiority of the proposed method. The results show that the proposed method provides high performance for almost all cases in the spatial- and transform-domain techniques.
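The comparison criterion of (20) can be computed empirically by sweeping the decision threshold over the classifier's output scores; the scores below are hypothetical, drawn so that the stego class is shifted upward.

```python
import numpy as np

rng = np.random.default_rng(4)

def min_error_probability(scores_cover, scores_stego):
    # Empirical version of Eq. (20): sweep the decision threshold and keep
    # the smallest (P_FA + P_MD)/2, deciding "stego" when the score exceeds it.
    thresholds = np.concatenate(
        [[-np.inf], np.sort(np.r_[scores_cover, scores_stego]), [np.inf]])
    best = 1.0
    for th in thresholds:
        p_fa = np.mean(scores_cover > th)    # cover images wrongly flagged as stego
        p_md = np.mean(scores_stego <= th)   # stego images that are missed
        best = min(best, 0.5 * (p_fa + p_md))
    return best

# Hypothetical classifier scores: stego scores shifted upward by 1.5.
cover_scores = rng.normal(0.0, 1.0, 1000)
stego_scores = rng.normal(1.5, 1.0, 1000)
pe = min_error_probability(cover_scores, stego_scores)
```

Note that $P_E \le 0.5$ always holds, since rejecting every image (or accepting every image) already achieves 0.5; values near 0.5, as for J-UNIWARD in Table 5, therefore mean the detector is barely better than guessing.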
6. Conclusion

A new steganalysis method using a moderate number of features has been proposed in this work. The feature set combines GARCH model features with higher-order statistics in the transform domain. The GARCH model has been used to properly model the heavy-tailed distribution of the non-approximate wavelet coefficients. In this way, a set of GARCH-model-based features is extracted to efficiently help detect transform-domain embedding schemes. Besides the GARCH features, we extract a set of features based on the second-order image statistics. Higher-order statistics help the steganalyzer detect those embedding methods that aim to keep the first-order statistics (histogram, average, etc.) as little modified as possible. These features are extracted from the transform domain to comply with the requirements of a blind steganalysis method. The
Table 5
Minimum error probability performance of the proposed method compared to existing schemes for transform domain steganographic methods.

Embedding   Rate  CHEN   CC-PEV  SPAM   CCC     DCTR   PSRM    Proposed
method      (%)   486D   548D    686D   48600D  8000D  12870D  1013D

PQ          5     0.027  0.006   0.006  0.007   0.006  0.010   0.005
            10    0.025  0.004   0.005  0.006   0.005  0.009   0.004
            20    0.022  0.002   0.003  0.005   0.002  0.007   0.001
F5          5     0.240  0.100   0.245  0.115   0.103  0.112   0.137
            10    0.141  0.030   0.146  0.015   0.090  0.097   0.104
            20    0.003  0.003   0.004  0.003   0.003  0.004   0.002
nsF5        5     0.157  0.195   0.365  0.217   0.147  0.221   0.175
            10    0.136  0.186   0.350  0.114   0.110  0.191   0.111
            20    0.004  0.004   0.085  0.005   0.005  0.171   0.003
MB1         5     0.122  0.057   0.285  0.085   0.063  0.219   0.070
            10    0.101  0.035   0.201  0.033   0.039  0.160   0.040
            20    0.008  0.008   0.040  0.005   0.007  0.080   0.005
MB2         5     0.247  0.230   0.282  0.302   0.201  0.307   0.167
            10    0.144  0.131   0.185  0.201   0.139  0.220   0.100
            20    0.020  0.009   0.050  0.060   0.006  0.010   0.007
YASS        5     0.360  0.362   0.300  0.367   0.302  0.320   0.288
            10    0.340  0.341   0.280  0.336   0.280  0.300   0.214
            20    0.300  0.245   0.210  0.305   0.200  0.276   0.165
J-UNIWARD   5     0.493  0.495   0.494  0.493   0.490  0.492   0.491
            10    0.491  0.492   0.490  0.481   0.481  0.490   0.450
            20    0.480  0.471   0.474  0.480   0.460  0.486   0.401
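The minimum error probability reported in Table 5 is the usual steganalysis benchmark: the average of the false-alarm rate P_FA and the missed-detection rate P_MD, minimized over the decision threshold under equal priors. A minimal sketch of computing it from detector scores follows; the score lists and function name are illustrative, not the paper's data or code.

```python
def min_error_probability(cover_scores, stego_scores):
    """Minimum average decision error under equal priors:
        P_E = min over thresholds T of 0.5 * (P_FA(T) + P_MD(T)),
    where a score above T is declared 'stego'."""
    best = 0.5  # random guessing is always achievable
    for t in sorted(set(cover_scores) | set(stego_scores)):
        p_fa = sum(s > t for s in cover_scores) / len(cover_scores)   # false alarms
        p_md = sum(s <= t for s in stego_scores) / len(stego_scores)  # missed detections
        best = min(best, 0.5 * (p_fa + p_md))
    return best
```

Perfectly separated cover and stego scores yield P_E = 0, while indistinguishable score distributions give P_E = 0.5, matching the near-0.5 J-UNIWARD column in Table 5.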
Fig. 6. ROC performance comparison (true positive rate vs. false positive rate) for the model-based embedding method MB2 at a 5% embedding rate; curves are shown for CCC, CC-PEV, CHEN, SPAM, PSRM, and DCTR together with the proposed method and the y = x chance line.
The experimental results illustrate that the proposed feature-based steganalysis method outperforms state-of-the-art schemes while using a feature set of the same order of magnitude in size.
References

[1] J. Fridrich, M. Goljan, R. Du, Detecting LSB steganography in color and gray-scale images, IEEE Multimed. 8 (4) (2001) 22–28.
[2] A. Westfeld, A. Pfitzmann, Attacks on steganographic systems, in: Proceedings of the 3rd International Workshop on Information Hiding, vol. 1768, 1999, pp. 61–76.
[3] J.J. Harmsen, W.A. Pearlman, Steganalysis of additive-noise modelable information hiding, in: Society of Photo-Optical Instrumentation Engineers (SPIE) Conference, vol. 5020, pp. 131–142.
[4] A.D. Ker, Steganalysis of LSB matching in grayscale images, IEEE Signal Process. Lett. 12 (6) (2005) 441–444.
[5] L. Zhi, S.A. Fen, Y.Y. Xian, A LSB steganography detection algorithm, in: 14th IEEE Proceedings on Personal, Indoor and Mobile Radio Communications (PIMRC 2003), vol. 3, 2003, pp. 2780–2783.
[6] T. Zhang, X. Ping, Reliable detection of LSB steganography based on the difference image histogram, in: 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '03), vol. 3, 2003, pp. III-545–III-548.
[7] S. Dumitrescu, X. Wu, Z. Wang, Detection of LSB steganography via sample pair analysis, IEEE Trans. Signal Process. 51 (7) (2003) 1995–2007.
[8] I. Avcibas, N. Memon, B. Sankur, Steganalysis using image quality metrics, IEEE Trans. Image Process. 12 (2) (2003) 221–229.
[9] H. Farid, Detecting hidden messages using higher-order statistical models, in: 2002 International Conference on Image Processing, vol. 2, 2002, pp. 905–908.
[10] W.-N. Lie, T.-I. Lin, A feature-based classification technique for blind image steganalysis, IEEE Trans. Multimed. 7 (6) (2005) 1007–1020.
[11] Y.Q. Shi, C. Chen, W. Chen, A Markov process based approach to effective attacking JPEG steganography, in: Information Hiding, Springer, Berlin, 2007, pp. 249–264.
[12] C. Chen, Y. Shi, JPEG image steganalysis utilizing both intrablock and interblock correlations, in: IEEE International Symposium on Circuits and Systems (ISCAS 2008), 2008, pp. 3029–3032.
[13] T. Pevný, J. Fridrich, Merging Markov and DCT features for multi-class JPEG steganalysis, in: Proceedings SPIE, Electronic Imaging, Security, Steganography, and Watermarking of Multimedia Contents IX, vol. 6505, 2007, p. 3.
[14] J. Kodovský, J. Fridrich, Steganalysis in high dimensions: fusing classifiers built on random subspaces, in: IS&T/SPIE Electronic Imaging, International Society for Optics and Photonics, San Francisco, 2011, pp. 1–13.
[15] T. Pevný, P. Bas, J. Fridrich, Steganalysis by subtractive pixel adjacency matrix, IEEE Trans. Inf. Forensics Secur. 5 (2) (2010) 215–224.
[16] J. Kodovský, J. Fridrich, V. Holub, Ensemble classifiers for steganalysis of digital media, IEEE Trans. Inf. Forensics Secur. 7 (2) (2012) 432–444.
[17] R.F. Engle, Autoregressive conditional heteroscedasticity with estimates of the variance of United Kingdom inflation, Econometrica 50 (4) (1982) 987–1007.
[18] T. Bollerslev, Generalized autoregressive conditional heteroskedasticity, J. Econom. 31 (3) (1986) 307–327.
[19] M. Amirmazlaghani, H. Amindavar, A. Moghaddamjoo, Speckle suppression in SAR images using the 2-D GARCH model, IEEE Trans. Image Process. 18 (2) (2009) 250–259.
[20] Q. Liu, A.H. Sung, M. Qiao, Z. Chen, B. Ribeiro, An improved approach to steganalysis of JPEG images, Inf. Sci. 180 (9) (2010) 1643–1655.
[21] C.C. Chang, C.J. Lin, LIBSVM: A Library for Support Vector Machines, 2001, software available at 〈http://www.csie.ntu.edu.tw/cjlin/libsvm〉.
[22] Y. Shi, G. Xuan, D. Zou, J. Gao, C. Yang, Z. Zhang, P. Chai, W. Chen, C. Chen, Image steganalysis based on moments of characteristic functions using wavelet decomposition, prediction-error image, and neural network, in: IEEE International Conference on Multimedia and Expo (ICME 2005), 2005.
[23] M. Amirmazlaghani, H. Amindavar, A novel statistical approach for speckle filtering of SAR images, in: IEEE 13th Digital Signal Processing Workshop and 5th IEEE Signal Processing Education Workshop (DSP/SPE 2009), 2009, pp. 457–462.
[24] J. Mielikainen, LSB matching revisited, IEEE Signal Process. Lett. 13 (5) (2006) 285–287.
[25] A. Westfeld, F5: a steganographic algorithm, in: Lecture Notes in Computer Science, vol. 2137, Springer, Berlin, 2001, pp. 289–302.
[26] P. Sallee, Model-based steganography, in: Digital Watermarking, Springer, Berlin, 2004, pp. 154–167.
[27] Q. Liu, A.H. Sung, M. Qiao, Neighboring joint density-based JPEG steganalysis, ACM Trans. Intell. Syst. Technol. 2 (2) (2011) 16:1–16:16.
[28] T. Filler, T. Pevný, P. Bas, BOSS (Break Our Steganography System), July 2010, 〈http://boss.gipsa-lab.grenoble-inp.fr〉.
[29] P. Bas, T. Furon, BOWS-2, July 2007, 〈http://bows2.gipsa-lab.inpg.fr〉.
[30] S. Sarreshtedari, M. Akhaee, One-third probability embedding: a new ±1 histogram compensating image least significant bit steganography scheme, IET Image Process. 8 (2) (2014) 78–89.
[31] T. Pevný, T. Filler, P. Bas, Using high-dimensional image models to perform highly undetectable steganography, in: Information Hiding, Springer, Berlin, 2010, pp. 161–177.
[32] V. Holub, J. Fridrich, Designing steganographic distortion using directional filters, in: 2012 IEEE International Workshop on Information Forensics and Security (WIFS), December 2012, pp. 234–239.
[33] V. Holub, J. Fridrich, T. Denemark, Universal distortion function for steganography in an arbitrary domain, in: Information Security, Springer, Berlin, 2014.
[34] J. Fridrich, M. Goljan, D. Soukal, Perturbed quantization steganography, Multimed. Syst. 11 (2) (2005) 98–107.
[35] J. Fridrich, T. Pevný, J. Kodovský, Statistically undetectable JPEG steganography: dead ends, challenges, and opportunities, in: Proceedings of the 9th Workshop on Multimedia & Security (MM&Sec '07), ACM, New York, NY, USA, 2007, pp. 3–14.
[36] K. Solanki, A. Sarkar, B. Manjunath, YASS: yet another steganographic scheme that resists blind steganalysis, in: Information Hiding, Springer, Berlin, 2007, pp. 16–31.
[37] J. Kodovský, J. Fridrich, Calibration revisited, in: Proceedings of the 11th ACM Workshop on Multimedia and Security (MM&Sec '09), ACM, New York, NY, USA, 2009, pp. 63–74.
[38] V. Holub, J. Fridrich, Low-complexity features for JPEG steganalysis using undecimated DCT, IEEE Trans. Inf. Forensics Secur. 10 (2) (2015) 219–228.
[39] V. Holub, J. Fridrich, Random projections of residuals for digital image steganalysis, IEEE Trans. Inf. Forensics Secur. 8 (12) (2013) 1996–2006.