Highlights

• The effects of spatial steganography on image statistical distribution are analyzed.
• Feature separability analysis is conducted for both spatial and frequency features.
• Two kinds of typical spatial features are improved based on the feature separability analysis.
• Features of different types are merged to improve the steganalysis performance.
Towards Feature Representation for Steganalysis of Spatial Steganography

Ping Wang, Fenlin Liu*, Chunfang Yang
Zhengzhou Science and Technology Institute, Zhengzhou, 450001, China
*Corresponding author. Email address: [email protected] (Fenlin Liu)

Abstract

Feature separability analysis supports feature selection and reduction and also guides feature construction. This paper addresses feature representation for the steganalysis of spatial steganography. It analyzes the effects of spatial steganography on image residual histograms and derives the separability of histogram features represented in both the spatial and frequency domains, which explains the effectiveness of typical spatial features such as SRM and TLBP. Based on the separability analysis, the TLBP feature is improved by replacing the 'riu2' mapping with a new texture pattern mapping in the feature construction. In addition, frequency-domain features are constructed by applying the DFT to the submodels of SRM. Experimental results show that both schemes improve steganalysis performance. Furthermore, spatial and frequency features are merged, which yields better results than the corresponding single-type features for the steganalysis of S-UNIWARD, HILL, and MiPOD. For example, when HILL is used for embedding with payload 0.4 bpp on the BOSSbase dataset, the detection accuracy of the best merged feature is 3.64% and 1.13% higher than that of the corresponding single-type features.

Keywords: Feature representation, Feature separability, Frequency features, Merged features, Spatial features, Steganalysis

1. Introduction

Feature representation has a great impact on the performance of steganalysis: the same data represented in different ways may lead to different classification results [1]. For decades, feature construction and representation for steganalysis have been of great interest to researchers in the field [2]. Spatial steganography is a special image operation that embeds secret messages into cover images by altering pixel values for covert communication. Mainstream steganography is based on random ±K operations applied to pixels, which change the image statistical distribution and frequency characteristics. Since an image operation can be represented in both the spatial and frequency domains with different properties [3], the trace of steganography may be revealed by analyzing the changes of statistical features in both domains. Moreover, features represented in different domains may have different discriminative abilities. Therefore, it is important to construct features represented in an appropriate way to improve steganalysis performance.

In the modern steganalysis framework based on feature construction and machine learning, the ensemble classifier [4] built on Fisher linear discrimination (FLD) [5] is often used as the classifier due to its capacity for dealing with high-dimensional features, and the research on steganalysis mainly focuses on feature construction and selection. To detect spatial steganography, researchers have proposed many effective steganalytic techniques based on high-dimensional features, such as the Subtractive Pixel Adjacency Matrix (SPAM) [6], the Spatial Rich Model (SRM) series [7, 8, 9, 10], and local texture features [11, 12, 13, 14, 15]. The features are then fed to classifiers such as the Support Vector Machine (SVM) [16] and the ensemble classifier [4]. In the feature construction, the changes in the relationships among neighboring pixels are taken into consideration, and the high-dimensional features carry a great deal of information about the steganography. Currently, remarkable steganalysis performance is achieved by combining the ensemble classifier with high-dimensional features based on the inherent coherence and relationships among neighboring pixels. However, the feature construction is often motivated by the empirical knowledge of researchers, and theoretical work for guidance is lacking. Besides, there is a large amount of redundancy in the high-dimensional features, which leads to a small quantity of useful information per feature. When the ensemble classifier is used for steganalysis, if the redundancies are removed to increase the quantity of useful information per feature, the features in a subspace of fixed dimension will carry more useful information for the base learner, and accordingly the steganalysis performance will be improved. Based on this idea, feature reduction techniques [17, 18, 19, 20] have been proposed to reduce the dimension of steganalysis features. Usually, the features before and after dimension reduction by these methods are represented in the same domain. Besides, some methods directly discard the least significant features. Although these methods may increase the quantity of useful information for the base learner, the overall information about the steganography is reduced.

In recent years, deep learning networks, which use powerful computing resources to automatically learn features, have aroused great interest among researchers in the field of computer vision. Deep learning networks can effectively learn the structural information of images by repeating convolution, activation, and pooling operations, and have succeeded in various tasks in computer vision [1]. For steganalysis, researchers have also proposed excellent steganalytic techniques based on deep learning networks [21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31]. Compared with the frameworks based on hand-crafted features, the works based on deep learning can learn features effectively and currently show better performance for steganalysis. However, the deep learning networks take the pixel matrix as input, and the convolution, activation, and pooling are spatial operations; namely, the features learned in the networks are all represented in the spatial domain. Besides, the hyper-parameters used for training have a great impact on steganalysis performance. At present, one of the most laborious parts of research on deep learning is searching for appropriate hyper-parameters to ensure the effectiveness of the designed network, and theoretical research on the layers inside the networks is still lacking.

This paper addresses feature representation for steganalysis. Feature separability analysis is applied to the residual histogram features in both the spatial and frequency domains, based on which new spatial features and frequency features are constructed for steganalysis. The feature separability analysis may also benefit the design of deep learning networks if knowledge of the steganography can be merged into the networks [23, 28]. Experiments on detecting S-UNIWARD [32], HILL [33], and MiPOD [34] are conducted, where the ensemble classifier is used for classification. The main works of the paper are summarized as follows.
(1) The effects of spatial steganography on the image statistical distribution and frequency characteristics are analyzed, and the measurement of histogram feature separability is derived in both the spatial and frequency domains based on the Fisher criterion, which explains the effectiveness of typical features such as SRM [7] and the Threshold Local Binary Pattern (TLBP) [15].

(2) Based on the spatial feature separability, TLBP is improved by replacing the 'riu2' mapping with a new texture pattern mapping in the feature construction, and new features named TLBP-RM are constructed from the three-dimensional histogram of the texture patterns. The experimental results validate the superiority of TLBP-RM over TLBP.

(3) Based on the frequency feature separability, frequency features named FSRM are constructed by transforming the submodels of SRM into the frequency domain to remove feature redundancies. The number of FSRM features is around half of that of SRM, while FSRM performs better than SRM on the BOSSbase dataset and is comparable to SRM on the BOWS2 dataset.

(4) Features of different types in both the spatial and frequency domains are merged for steganalysis, and the merged features obtain better performance than the corresponding single-type features.

The rest of the paper is organized as follows. Section 2 analyzes the effects of spatial steganography on the image distribution. Section 3 presents the measurement of spatial feature separability and the effectiveness of SRM and TLBP, based on which new spatial features are constructed by improving TLBP, while Section 4 gives the frequency feature separability and the feature redundancies in the frequency domain, based on which SRM is converted into frequency features. Section 5 conducts the experiments, including the contrast experiments between TLBP and the new spatial features, between SRM and the frequency features, and between the single-type features and the merged features; feature selection is also applied to the merged features. Finally, Section 6 concludes the paper and discusses future work.

2. Effects of Spatial Steganography

This paper focuses on the steganalysis of spatial steganography, which embeds messages into images by changing the image pixel values. The message embedding operations of many steganographic methods are based on ±K operations [35, 36, 37, 32, 33, 34, 38, 39]. When these methods are applied to an image, the one-dimensional (1-D) histogram of the stego is

    h_s(x) = \sum_j \alpha_{x-j,j}\, h_c(j),    (1)

where h_c(j) is the 1-D histogram of the corresponding cover, \alpha_{i,j} \ge 0 is the transfer ratio, and \sum_i \alpha_{i,j} = 1. It shows that a histogram bin may transfer part of its value to other histogram bins while also receiving parts from other bins. This leads to a strong connection among the histogram bins and makes the envelope of the image histogram smoother. Therefore, the high-frequency coefficients of the image histogram spectrum are attenuated by information embedding.

To facilitate the analysis, we assume that the steganography randomly increases or decreases a pixel value by 1, namely ±1 embedding, which applies to many steganographic methods. With respect to the embedding position, it is independent of the image content for LSB matching, while for adaptive steganography the embedding position is controlled by the distortion function, which is related to the image content. Nevertheless, we assume that the embedding position is independent of the pixel gray level, namely, the probabilities of changing each pixel gray level are independent of each other. Let f_\rho(x) represent the noise introduced to the 1-D image histogram by the steganography with payload \rho, and omit the bins at the two ends of the histogram; then

    f_\rho(i) = \begin{cases} 1 - \rho/2, & i = 0; \\ \rho/4, & i = \pm 1; \\ 0, & \text{else}, \end{cases}    (2)

and, for all j, f_\rho(i) = \alpha_{i,j}; in particular, f_\rho(\pm 1) = \alpha_{\pm 1,j}. Representing (1) as a convolution,

    h_s = h_c * f_\rho,    (3)

where * denotes the convolution operation. Applying the discrete Fourier transform (DFT) to both sides of (3), we have

    H_s(k) = H_c(k)\, F_\rho(k),    (4)

where H_s(k), H_c(k), and F_\rho(k) are the DFTs of h_s(x), h_c(x), and f_\rho(x) respectively, and F_\rho(k) = 1 - \rho(1 - \cos^2(\pi k / N)).

Let h_c(x) and h_s(x) be the n-D image histograms before and after the steganography with payload \rho, and let their DFTs be H_c(k) and H_s(k), where x = [x_1, x_2, \dots, x_n]^T \in \mathbb{R}^n and k = [k_1, k_2, \dots, k_n]^T \in \mathbb{R}^n. Then we have

    H_s(k) = H_c(k)\, F_\rho(k),    (5)

where F_\rho(k) is the DFT of the steganographic noise f_\rho(x) on the n-D image histogram. According to the assumption that the probabilities of changing each pixel gray level are independent of each other, we get F_\rho(k) = \prod_{i=1}^{n} F_\rho(k_i). Then

    H_s(k) = H_c(k) \prod_{i=1}^{n} F_\rho(k_i).    (6)

The image residual is the difference between a pixel and its predictor. For example, in the first-order residual r^{(1)}_{ij} = x_{ij} - x_{i,j-1}, the central pixel x_{ij} is predicted as \hat{x}_{ij} = x_{i,j-1}, and in the second-order residual r^{(2)}_{ij} = x_{i,j-1} + x_{i,j+1} - 2x_{ij}, the central pixel x_{ij} is predicted as \hat{x}_{ij} = 0.5(x_{i,j-1} + x_{i,j+1}). Let d^{(m)}(x) be the 1-D histogram of the m-th order residual. Then

    d^{(m)}_s = d^{(m)}_c * f^{(m)}_\rho,    (7)

where f^{(m)}_\rho = *^{(2^m)}(f_\rho) denotes the convolution of 2^m copies of f_\rho. The DFT of d^{(m)}_s(x) is then

    D^{(m)}_s(k) = D^{(m)}_c(k)\, F^{(m)}_\rho(k),    (8)

where

    F^{(m)}_\rho(k) = \big(F_\rho(k)\big)^{2^m} = \big(1 - \rho(1 - \cos^2(\pi k / N))\big)^{2^m}.    (9)

Let d^{(m)}(x) denote the n-D histogram of the m-th order residual. Then

    d^{(m)}_s(x) = \sum_j \big(\alpha_{x-j,j}\, d^{(m)}_c(j)\big),    (10)

where \alpha_{ij} \ge 0 and \sum_i \alpha_{ij} = 1, namely, the stego noise f^{(m)}_\rho(x) is a symmetric real matrix. As \alpha_{ij} is irrelevant to j, it is denoted as \alpha_i for short. The DFT of d^{(m)}_s(x) is

    D^{(m)}_s(k) = D^{(m)}_c(k)\, F^{(m)}_\rho(k),    (11)

where

    F^{(m)}_\rho(k) = \prod_{i=1}^{n} F^{(m)}_\rho(k_i) = \prod_{i=1}^{n} \big(1 - \rho(1 - \cos^2(\pi k_i / N))\big)^{2^m}.    (12)

Equations (11) and (12) imply the effects of the residual order m and the histogram dimension n on the ability of the histogram features to capture the trace of the steganography. As F_\rho(k_i) \le 1, a higher residual order and a higher histogram dimension make the ratio of the stego residual histogram DFT coefficient to the cover residual histogram DFT coefficient smaller, so that the difference between the residual histograms of the stego and the cover is relatively larger. For instance, the ratio of D^{(m)}_s(k) to D^{(m)}_c(k) reaches its minimum at k = 0.5[N, N, \dots, N]^T, where F^{(m)}_\rho(k) = (1 - \rho)^{n \cdot 2^m}, which is monotonically decreasing in m and n.

However, with respect to the residual order m, higher is not always better. For explanation, we divide the image residual into three individual parts, namely the image content, the natural noise, and the stego noise. From this point of view, the residual histogram d^{(m)} also consists of the corresponding three components d^{(m)}_{ic}, d^{(m)}_{nn}, and d^{(m)}_{sn}, that is, d^{(m)} = d^{(m)}_{ic} + d^{(m)}_{nn} + d^{(m)}_{sn}. The image residual can be regarded as the output of image high-pass filtering, in which the image content is suppressed and the noises are prominent. Therefore, when the residual order is relatively low, the proportion of the stego noise increases with the order m. However, a high-order residual involves a large neighborhood, and since the mean of the stego noise (the ±1 noise) is approximately zero, taking the linear sum of too many samples would suppress the stego noise. As steganalysis aims to capture the stego noise in the constructed features, the residual order m should not be too high. As to the histogram dimension n, according to (12), the higher the better. However, a higher histogram dimension means a larger number of steganalysis features, in which case the performance of steganalysis is constrained by the ability of the classifier to deal with high-dimensional features. Therefore, the choice of the histogram dimension depends on the selected classifier. Besides, extracting a large number of features requires considerable computing resources.

3. Histogram Spatial Features

As spatial steganography changes the statistical distribution of image pixels, high-dimensional features based on the image residual histograms are expected to be capable of capturing the trace of the steganography, and the ensemble classifier [4] is one of the best choices for steganalysis due to its capability of dealing with high-dimensional features. Currently, the most popular base learner for the ensemble classifier is FLD. Therefore, the Fisher score of the high-dimensional features is an appropriate measurement of feature separability. Fisher linear discrimination (FLD) [5] was proposed by R. A. Fisher in 1936, where the Fisher score is designed to measure the intra-class and inter-class distances. The Fisher score of a single feature v is defined as

    FScore(v) = \frac{(\bar{v}_c - \bar{v}_s)^2}{\sigma^2(v_c) + \sigma^2(v_s)},    (13)

where \bar{v}_c and \bar{v}_s are the means of the feature v among covers and stegos respectively, and \sigma^2(v_c) and \sigma^2(v_s) are the corresponding variances. If the inter-class distance |\bar{v}_c - \bar{v}_s| is large and the intra-class variances \sigma^2(v_c) and \sigma^2(v_s) are small, then the Fisher score of the feature v is high and the feature is highly separable.
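As a concrete illustration (not part of the original paper), the Fisher score in (13) can be computed per feature column with a few lines of NumPy. The function name and the toy cover/stego matrices below are hypothetical stand-ins for real steganalysis feature sets.

```python
import numpy as np

def fisher_scores(F_cover, F_stego):
    """Per-feature Fisher score as in (13); rows are images, columns are features."""
    mc, ms = F_cover.mean(axis=0), F_stego.mean(axis=0)
    denom = F_cover.var(axis=0) + F_stego.var(axis=0)
    # guard against constant features (zero within-class variance)
    return np.where(denom > 0, (mc - ms) ** 2 / np.where(denom > 0, denom, 1.0), 0.0)

# toy usage: 200 cover and 200 stego feature vectors of dimension 1000;
# a small mean shift mimics the effect of embedding on separable features
rng = np.random.default_rng(0)
F_c = rng.normal(0.00, 1.0, (200, 1000))
F_s = rng.normal(0.05, 1.0, (200, 1000))
print(fisher_scores(F_c, F_s)[:5])
```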
3.1. Separability Analysis of Spatial Features

The Fisher score of the single spatial feature d(x) is

    FScore(d(x)) = \frac{(\mu_c(x) - \mu_s(x))^2}{\sigma^2(d_c(x)) + \sigma^2(d_s(x))},    (14)

where x \in \mathbb{R}^n is the residual vector and \mu(x) is the mean of d(x). Image residuals are the results of high-pass filtering, in which image noises are prominent. Assume the residual vector x follows a multivariate generalized Gaussian distribution (MGGD) [40]:

    \mu(x \mid M, \beta) = \frac{\beta\, \Gamma\!\left(\frac{n}{2}\right)}{\pi^{\frac{n}{2}}\, |M|^{\frac{1}{2}}\, 2^{\frac{n}{2\beta}}\, \Gamma\!\left(\frac{n}{2\beta}\right)} \exp\!\left[-\frac{1}{2}\big(x^T M^{-1} x\big)^{\beta}\right],    (15)

where \beta is the shape parameter and M is the covariance matrix of the components of x, namely, M is an n × n symmetric real scatter matrix. When \beta = 1, (15) reduces to the multivariate Gaussian distribution (MGD). For most images, \beta = 0.8 is suggested as a good choice [41]. According to (10), \mu_s(x) = \sum_i \big(\alpha_i \cdot \mu_c(x - i)\big).

As the values of \mu_c(x) far away from the origin are very small, the special cases for \mu_c(x) at the border are omitted in (15), and then we have

    \mu_c(x) - \mu_s(x) = \frac{1}{2} \sum_i \alpha_i \Big( \big(\mu_c(x - i) - \mu_c(x)\big) - \big(\mu_c(x) - \mu_c(x + i)\big) \Big).    (16)

According to (14) and (16), the Fisher score of the feature d(x) is related to the second-order difference of \mu_c(x). Let \Psi = \int_i \alpha_i\, g_i g_i^T\, di and \kappa = \int_i \alpha_i\, g_i^T M^{-1} g_i\, di, where g_i is the unit vector in the direction of i. It is worth noting that \Psi and \kappa are related only to the payload and are unrelated to x. Now we replace the second-order difference in (16) by the second-order directional derivative of \mu(x) and turn the sum into an integral. Then we get

    \tilde{\mu}_c(x) - \tilde{\mu}_s(x) = \frac{\mu_c(x)}{2} \beta^2 \big(x^T M^{-1} x\big)^{2\beta-2}\, x^T M^{-1} \Psi M^{-1} x - \mu_c(x)\, \beta(\beta - 1) \big(x^T M^{-1} x\big)^{\beta-2}\, x^T M^{-1} \Psi M^{-1} x - \frac{\mu_c(x)}{2} \beta \big(x^T M^{-1} x\big)^{\beta-1} \kappa.    (17)

The detailed derivation of (17) is presented in Appendix A. It indicates that the feature separability is related to the parameter \beta, the covariance matrix M, and the residual vector x. However, it is hard to model the image distribution in practice. Even though the model parameters can be estimated from a large number of training samples, the obtained model may mismatch either a single image in the training set or unknown images due to the variety among images, and a model well fitted on the training set may suffer from cover source mismatch in practice. Nevertheless, the feature construction can be motivated by (17). Consider the MGD with \beta = 1, and assume that the covariance of any two components is q and the variance of each component is p, namely M = (p - q)E + qU, where E is the identity matrix, U = uu^T, and u = [1, 1, \dots, 1]^T. Then we have

    \tilde{\mu}_c(x) - \tilde{\mu}_s(x) = \frac{\zeta}{2}\, \mu_c(x)\, \big( (\chi - \gamma)\, x^T x + \gamma\, x^T u u^T x - n\tau \big),    (18)

where \zeta is a factor related only to the payload, \tau is the value of the elements on the main diagonal of M^{-1}, and M^{-2} = (\chi - \gamma)E + \gamma U with \chi > 0, \gamma < 0. The detailed derivation of (18) is presented in Appendix B. Define

    \varphi(x) \triangleq (\chi - \gamma)\, x^T x + \gamma\, x^T u u^T x - n\tau = (\chi - \gamma) \sum_i x_i^2 + \gamma \Big(\sum_i x_i\Big)^2 - n\tau.    (19)

According to (14) and (18), if x is far away from the origin, then the value of \mu(x) is small and the Fisher score FScore(d^{(m)}(x)) is low. Therefore, the separable features with high Fisher scores are located near the origin.

Figure 1: Fisher values of 2-D histogram spatial features of second-order residuals. (a) Theoretical values under the MGD assumption with p = 3.8569, q = 1.4106, and ζ = 2. (b) Experimental values when S-UNIWARD with payload ρ = 0.4 is applied to 5,000 images.

We randomly select 5,000 images from the BOSSbase dataset (http://agents.fel.cvut.cz/stegodata/) and obtain the variance and covariance of the second-order residuals as p = 3.8569 and q = 1.4106 to calculate the covariance matrix M. Under the assumption that both the residual vector and the feature variance \sigma^2(d(x)) follow the MGD, the theoretical values of the 2-D histogram spatial features of the second-order residuals are calculated according to (13), as shown in Figure 1(a), where ζ = 2 since it is related only to the payload. For comparison, S-UNIWARD with payload ρ = 0.4 is applied to the selected images, and 2-D histogram spatial features of the second-order residuals are extracted from both covers and stegos to calculate the Fisher values, as shown in Figure 1(b). Although the MGD model may not match the image distribution exactly, the theoretical results coincide with the experiments; for example, the features with high scores are near the origin in both cases. Besides, there are also some ineffective features near the origin due to the factor φ(x): when φ(x) ≈ 0, the corresponding feature score is low. In theory, with knowledge of the covariance matrix M, the positions of locally minimal Fisher scores could be obtained by calculating the values of ζ, χ, γ, and τ when the payload ρ is known. In this case, we can construct effective features while discarding features with low Fisher scores.
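A Figure 1(b)-style measurement can be sketched as follows. This is only an assumed, minimal reconstruction: the second-order horizontal residual and its 2-D co-occurrence follow the definitions above, while the random ±1 flipping is a crude stand-in for S-UNIWARD embedding and the synthetic arrays stand in for BOSSbase images.

```python
import numpy as np

def second_order_residual(img):
    """Horizontal second-order residual r[i,j] = x[i,j-1] + x[i,j+1] - 2*x[i,j]."""
    x = img.astype(np.int64)
    return x[:, :-2] + x[:, 2:] - 2 * x[:, 1:-1]

def cooc2d(res, T=10):
    """2-D histogram of horizontally adjacent residual pairs, truncated to [-T, T]."""
    a = np.clip(res[:, :-1], -T, T).ravel() + T
    b = np.clip(res[:, 1:], -T, T).ravel() + T
    h = np.zeros((2 * T + 1, 2 * T + 1))
    np.add.at(h, (a, b), 1)
    return h / h.sum()                      # normalise so images are comparable

def fisher_map(covers, stegos, T=10):
    """Empirical per-bin Fisher values, in the spirit of the Figure 1(b) experiment."""
    Hc = np.stack([cooc2d(second_order_residual(x), T) for x in covers])
    Hs = np.stack([cooc2d(second_order_residual(x), T) for x in stegos])
    num = (Hc.mean(0) - Hs.mean(0)) ** 2
    den = Hc.var(0) + Hs.var(0)
    return np.where(den > 0, num / np.where(den > 0, den, 1), 0)

# toy usage with synthetic 'images'; real use would load cover/stego image pairs
rng = np.random.default_rng(1)
covers = [rng.integers(0, 256, (64, 64)) for _ in range(50)]
stegos = [np.clip(c + rng.choice([-1, 0, 1], c.shape, p=[0.1, 0.8, 0.1]), 0, 255)
          for c in covers]
print(fisher_map(covers, stegos).max())
```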
However, the image statistical distribution is hard to model in practice. Nevertheless, the above analysis may help us construct effective features: it can be seen from Figure 1 that, under the MGD assumption, we can obtain approximate values of the Fisher scores of histogram spatial features. Next, the analysis of two typical histogram spatial features, namely SRM [7] and TLBP [15], is presented, followed by the construction of the improved TLBP features.

3.2. SRM Features

The submodels of SRM are residual histograms (co-occurrence matrices),

    d(x) = |\{ r \mid r = x \}|,    (20)

where x = [x_1, x_2, x_3, x_4]^T and r = [r_1, r_2, r_3, r_4]^T. The features in the submodels are constructed through quantization, truncation, histogram calculation, desymmetrization, and feature mergence operations. To increase the feature variety, features of the 'minmax' type are added into the feature set; the 'minmax' submodels take the minimum and maximum of at least two linearly filtered residuals to generate the residuals of the 'min' and 'max' types respectively.

In essence, the submodels of SRM are 4-D histograms of residuals of various orders. In the feature construction, the quantization narrows the range of the residuals and consequently reduces the variance and covariance of the residual x. Thus, the effective features are located near the origin. Besides, the truncation merges the features far away from the origin, which not only reduces the feature dimension but also makes the features more separable. For example, according to (18), \mu_c(x) - \mu_s(x) > 0 for x far away from the origin, and thus \sum_{x_i > T, 1 \le i \le 4} \big(\mu_c(x) - \mu_s(x)\big) > 0, which means that the difference of the feature value between cover and stego at the border of the truncated range is enlarged. Therefore, the corresponding feature separability is improved. The truncation threshold is set to T = 2 in the construction of SRM to avoid an excessive feature dimension. However, according to (19), when x \in [-T, T]^4, the value of φ(x) may also be very small, and the corresponding feature is ineffective. Furthermore, after the quantization operation

    [x]_q = \mathrm{round}\!\left(\frac{x}{q}\right),    (21)

the feature is

    d([x]_q) = \sum_{q([x_i]_q - 0.5) \le x_i < q([x_i]_q + 0.5)} d(x).    (22)

The addends on the right side may change in different directions: for d(x) close to the origin, \mu_c(x) - \mu_s(x) < 0, while for those farther away, \mu_c(x) - \mu_s(x) > 0. In this case, the difference of the sum between the cover and the stego may approach zero, \mu_c([x]_q) - \mu_s([x]_q) \approx 0, which results in ineffective features.

In summary, although some skills are applied in the feature construction of SRM to reduce the feature dimension and improve the feature separability, there are still some ineffective features. To further increase the feature separability, knowledge of the selection channel of the steganography is merged into the feature construction [8, 9, 10]. However, this scheme only applies when knowledge of the steganography is available to the steganalyzer, which is too strict for steganalysis in most real-world cases, where the specific steganographic technique and payload are unknown. Compared with traditional blind steganalysis features, the application of selection-channel-aware features is limited. Therefore, it is significant to improve the performance of traditional blind steganalysis features.

3.3. TLBP Features

In the TLBP scheme, ten linear residuals, six 'max' residuals, and six 'min' residuals are used for feature construction. Each type of residual is quantized, and the threshold local binary pattern (TLBP) operator is applied to the unquantized and quantized residual images,

    b = \ell(|y| \ge T) = \begin{cases} 0, & |y| < T; \\ 1, & |y| \ge T, \end{cases}    (23)

where T is a predefined threshold and y is the difference between the central residual x and its neighbor; it is worth mentioning that y is also a residual. For each x, eight neighbors are taken to calculate its TLBP, and six kinds of neighborhoods are utilized. The patterns are then reduced by the 'riu2' mapping [42]. Two 2-D histograms are calculated over the two direction pairs of each TLBP image, and a non-linear mapping is finally applied to the features after the desymmetrization operation.

In fact, TLBP features are histogram spatial features. The 8-bit binary pattern b = [b_1, b_2, \dots, b_8] is a residual vector consisting of eight components. The 2-D histogram of the TLBP image is

    c(k, l) = |\{ [b_1, b_2] \mid \varphi(b_1) = k, \varphi(b_2) = l \}| = |\{ [y_{1,1}, \dots, y_{1,8}, y_{2,1}, \dots, y_{2,8}] \mid \ell(|y_{i,j}| \ge T) = b_{i,j},\ \varphi(b_1) = k,\ \varphi(b_2) = l \}|,    (24)

where \varphi(b_1) is the 'riu2' mapping. In this form, it is clear that TLBP features are 16-D histogram features of the image residual. The binarization and the texture pattern mapping in the feature construction contribute to feature aggregation, which reduces the feature dimension and improves the feature separability. For the binarization, if the threshold is well chosen such that \mu_c(T) - \mu_s(T) = 0 for the 1-D histogram features, then

    \begin{cases} \mu_c(y) - \mu_s(y) < 0, & |y| < T; \\ \mu_c(y) - \mu_s(y) > 0, & |y| > T. \end{cases}    (25)

Therefore, the separability of a single feature can reach its maximum by merging the features with |y| < T and those with |y| ≥ T respectively. However, for multi-dimensional histogram features, merging the features of y ≤ −T and y ≥ T may result in ineffective features. For example, in the feature construction of TLBP, the patterns [\dots, y_i, y_j, \dots] and [\dots, y_i, -y_j, \dots] are counted into the same histogram bin. If |y_i| \ge T and |y_j| \ge T, then according to (18), the Fisher score of the corresponding feature may be small. Besides, the 'riu2' mapping maps all patterns in which the transition between 0 and 1 bits occurs at least three times to a single pattern. As different patterns before the 'riu2' mapping may correspond to different signs in (18), the separability of the feature corresponding to the merged pattern may be reduced.

3.4. Improving Spatial Feature Construction

As mentioned above, although the existing histogram spatial features show good performance in steganalysis with the ensemble classifier, the feature separability can be further improved. Based on the feature separability analysis, we propose new histogram spatial features by improving the construction of the existing ones. Equation (19) indicates that the feature separability is related to both the sum and the squared sum of the components of the residual vector x, namely \sum_i x_i and \sum_i x_i^2. As \sum_i x_i^2 widens the range while \sum_i x_i suppresses \sum_i x_i^2, we take the sum of the absolute values of the components, \sum_i |x_i|, into account as a compromise. With this consideration, we replace the 'riu2' mapping in the TLBP feature construction with the sum of the eight bits in b. Moreover, to separate the features with positive values in (18) from those with negative values and to merge the features with the same signs, we further reduce the texture patterns; for example, in Figure 1, the features with high Fisher scores near the origin and those far from the origin are to be merged respectively and separated by the transition region. In this way, the new texture pattern mapping is

    p = \begin{cases} 0, & \sum_i b_i = 0; \\ 1, & \sum_i b_i \in \{1, 2\}; \\ 2, & \sum_i b_i \in \{3, 4, 5\}; \\ 3, & \sum_i b_i \in \{6, 7\}; \\ 4, & \sum_i b_i = 8, \end{cases}    (26)

where p = 2 corresponds to the transition region. Note that in the TLBP feature construction \sum_i b_i, \sum_i |b_i|, and \sum_i b_i^2 are equivalent since b_i \in \{0, 1\}. To capture the effects of spatial steganography on the residual histograms while restricting the number of features, 3-D histograms of the patterns are formed along the horizontal, vertical, diagonal, and anti-diagonal directions and then merged as

    f\!\left(15v + \frac{u(u+1)}{2} + m + 1\right) = \begin{cases} c_{uvm}, & u = m; \\ c_{uvm} + c_{mvu}, & u > m, \end{cases}    (27)

where u, v, m \in \{0, 1, 2, 3, 4\},

    c_{uvm} = \frac{h(u, v, m)}{\sum_{i,j,k} h(i, j, k)},    (28)

and

    h(u, v, m) = \sum_{i,j} \delta\big(p_{ij} - u,\ p_{i,j+1} - v,\ p_{i,j+2} - m\big) + \sum_{i,j} \delta\big(p_{ij} - u,\ p_{i+1,j} - v,\ p_{i+2,j} - m\big) + \sum_{i,j} \delta\big(p_{ij} - u,\ p_{i+1,j+1} - v,\ p_{i+2,j+2} - m\big) + \sum_{i,j} \delta\big(p_{ij} - u,\ p_{i-1,j+1} - v,\ p_{i-2,j+2} - m\big).    (29)

Finally, the non-linear mapping is applied to the constructed features. Thus, the feature number is 22 × 2 × 6 × (5 × 5 × (5 + 1)/2) = 19,800 in total. For brevity, the new feature set is named TLBP-RM, where 'RM' is the abbreviation of 'ReMapping', meaning that the 'riu2' mapping is replaced by a new texture pattern mapping in the TLBP feature construction. Figure 2 shows the frameworks of the feature construction of SRM, TLBP, and TLBP-RM for comparison.

4. Histogram Frequency Features

In machine learning, due to the limited capability of the classifier, the features should be represented in an appropriate form for the best performance [1]. For illustration, consider two classes of data whose features are represented in Cartesian coordinates, as shown in Figure 3(a): in this representation it is hard to separate the two classes with a linear classifier, whereas after representing the features in polar coordinates, as shown in Figure 3(b), the classification task can easily be completed with a line. Currently, the most popular classifier for dealing with high-dimensional steganalysis features is the ensemble classifier based on FLD, namely, the base learner is a linear classifier. According to the effects of steganography on residual histograms, the stego noise enters the histogram through a convolution in the spatial domain, while in the frequency domain it acts by multiplication, where the changes of the frequency coefficients are independent of each other. In early studies, researchers proposed frequency features based on the center of mass of the histogram characteristic function for the steganalysis of LSB matching [43, 44]. However, using a few features to represent the whole histogram spectrum reduces the information about the steganography. Next, the separability analysis of histogram frequency features is presented, and a scheme is proposed to represent SRM in the frequency domain.

4.1. Separability Analysis of Frequency Features

Since the steganographic noise added to the image histogram in the spatial domain is a real symmetric matrix, the corresponding frequency coefficients also form a real symmetric matrix. According to (12), F^{(m)}_\rho(k) is always a nonnegative real number, namely, the steganography has no effect on the histogram phases. Thus, (11) can be represented as

    \big|D^{(m)}_s(k)\big|\, e^{i\theta(k)} = F^{(m)}_\rho(k)\, \big|D^{(m)}_c(k)\big|\, e^{i\theta(k)}.    (30)

Then
    \big|D^{(m)}_s(k)\big| = F^{(m)}_\rho(k)\, \big|D^{(m)}_c(k)\big|.    (31)

It can be seen that the histogram spectrum carries the same information about the steganography as the histogram frequency coefficients. Therefore, frequency features can be constructed from the histogram spectrum to avoid complex operations. Besides, as F^{(m)}_\rho(k) is axisymmetric, namely F^{(m)}_\rho(k) = F^{(m)}_\rho(\tilde{k}_j) where k = [k_1, \dots, k_j, \dots, k_n]^T and \tilde{k}_j = [k_1, \dots, -k_j, \dots, k_n]^T, the features corresponding to F^{(m)}_\rho(k) on the positive axes contain the same information about the steganography as the whole spectrum. According to (13), the Fisher score of a single frequency feature D^{(m)}(k) is

    FScore\big(D^{(m)}(k)\big) = \frac{\big( |D^{(m)}_c(k)| - |D^{(m)}_s(k)| \big)^2}{\sigma^2\big(|D^{(m)}_c(k)|\big) + \sigma^2\big(|D^{(m)}_s(k)|\big)} = \frac{\big(1 - F^{(m)}_\rho(k)\big)^2}{1 + \big(F^{(m)}_\rho(k)\big)^2} \cdot \frac{\big|D^{(m)}_c(k)\big|^2}{\sigma^2\big(|D^{(m)}_c(k)|\big)} = \tau_f(k) \cdot \tau_m^2(k) \cdot \tau_\sigma^{-1}(k),    (32)

where k \ne 0 since the direct component is ineffective, \tau_f(k) = \frac{(1 - F^{(m)}_\rho(k))^2}{1 + (F^{(m)}_\rho(k))^2}, \tau_m(k) = |D^{(m)}_c(k)|, and \tau_\sigma(k) = \sigma^2\big(|D^{(m)}_c(k)|\big).

It shows that the Fisher value of the feature D^{(m)}(k) is related to three factors: the stego noise factor \tau_f(k), the mean of the cover feature \tau_m(k), and its variance \tau_\sigma(k), where \tau_f(k) is monotonically decreasing in F^{(m)}_\rho(k) on [0, 1]. According to (12), if k = 0 then F^{(m)}_\rho(k) = 1, namely \big|D^{(m)}_c(k)\big| = \big|D^{(m)}_s(k)\big|; thus we define FScore(D^{(m)}(0)) = 0. When k approaches 0.5[N, N, \dots, N]^T from the origin, the value of \tau_f(k) increases. As to \tau_m(k) and \tau_\sigma(k), when k is near the origin their values depend on the image content d^{(m)}_{ic}, while when k is far away from the origin they depend on the natural noise d^{(m)}_{nn}. For a large residual order, the proportion of d^{(m)}_{nn} is high and it greatly affects the feature separability; in other words, for a large residual order, if k is far away from the origin, then \tau_m(k) is small and \tau_\sigma(k) is relatively large, resulting in ineffective features.

We use the 10,000 images from the BOSSbase dataset to investigate how the Fisher scores change with the residual values. LSB matching with payload ρ = 1 is applied for information embedding. Figure 4 shows the Fisher scores of the 1-D histogram features of first-order residuals along the horizontal direction in the spatial and frequency domains, where the residual values are limited to [−29, 29] and the feature number is 59. It shows that as k departs from the origin, the Fisher score of the frequency feature first increases and then decreases. In the beginning, the stego-noise factor \tau_f(k) plays the dominant role; when k is larger than 19, the increase of \tau_f(k) slows down while \tau_m(k) is still decreasing, so FScore(D^{(1)}(k)) starts to decrease. In summary, the frequency features with high Fisher scores correspond to the middle-frequency coefficients. As the stego noise matrix of steganography based on random ±K operations is axisymmetric in the spatial domain, the conclusions obtained for K = 1 also apply to steganography with any value of K, namely, the stego noise matrix in the frequency domain is also axisymmetric.

Figure 2: Frameworks of the feature construction of SRM (34,671-D), TLBP (29,040-D), and TLBP-RM (19,800-D).

Figure 3: Feature representation of two classes of data. (a) In Cartesian coordinates. (b) In polar coordinates.

4.2. Representing SRM in the Frequency Domain

Equation (31) indicates that the frequency features can be obtained by applying the DFT to the histogram spatial features. Besides, as the stego noise in the frequency domain is axisymmetric, the feature number can be reduced without loss of information about the steganography, and the quantity of useful information per feature is consequently improved. For the ensemble classifier, the base learner can then get more useful information within a subspace of fixed dimension, thus improving the steganalysis performance.
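A small numerical check (ours, not from the paper) of the relations above: for a symmetric ±1-type noise kernel, only the modulus of the histogram spectrum is scaled, the scaling matches the closed-form factor given below (4), and coefficients k and N−k carry the same modulus, so half of the spectrum suffices. The toy histogram is an assumption for illustration only.

```python
import numpy as np

N, rho = 64, 0.4
k = np.arange(N)
d_c = np.exp(-np.abs(np.arange(N) - N // 2) / 3.0)    # toy cover residual histogram
f = np.zeros(N)
f[0], f[1], f[-1] = 1 - rho / 2, rho / 4, rho / 4     # +-1 stego-noise kernel f_rho

# circular convolution stands in for (7) so that the DFT identity holds exactly
d_s = np.real(np.fft.ifft(np.fft.fft(d_c) * np.fft.fft(f)))

D_c, D_s = np.fft.fft(d_c), np.fft.fft(d_s)
F_rho = 1 - rho * (1 - np.cos(np.pi * k / N) ** 2)    # closed-form factor below (4)

# only the modulus is scaled, and the scaling matches the closed form, cf. (31)
print(np.allclose(np.abs(D_s), F_rho * np.abs(D_c)))                        # True
# coefficients k and N-k carry the same modulus, so half the spectrum suffices
print(np.allclose(np.abs(D_s)[1:N // 2], np.abs(D_s)[N // 2 + 1:][::-1]))   # True
```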
However, as the separable frequency features are located in the middle-frequency region, constructing features directly in the frequency domain leads to a large number of temporary features. Although the final constructed features could be reduced by some skills, computing the whole spectrum would consume huge computing resources. To validate the effectiveness of frequency features, we instead transform existing histogram spatial features into frequency features. Both SRM and TLBP are effective features, and both use many skills for feature aggregation, such as the quantization, truncation, and desymmetrization operations, which may change the spectrum characteristics. Moreover, residual binarization and texture pattern mapping are applied in TLBP, which have a great impact on the spectrum characteristics. Therefore, only SRM is represented in the frequency domain. Figure 5 shows the framework of the frequency feature construction obtained by applying the DFT to the submodels of SRM. The constructed frequency features are named FSRM for short, where 'F' refers to the frequency domain.

Figure 4: Fisher scores of 1-D histogram features of the first-order residuals in the spatial and frequency domains (top) and the three factors of the Fisher scores of the histogram frequency features (bottom).

In essence, the submodels of SRM are 4-D residual histograms of various orders. In theory, effective frequency features could be constructed by applying the 4-D DFT to the 4-D histogram spatial features. However, as SRM truncates the residual values, the histogram bins are confined to the range [−T, T]^4; SRM takes T = 2, which means that there are only five points on each axis. Thus, the image noise and the limited number of sampling points make it hard to exploit the spectrum characteristics of the 4-D histogram. Besides, due to the feature aggregation operation in SRM, the structure of the 4-D histograms is destroyed. Therefore, applying the 4-D DFT to SRM may not improve steganalysis performance. Note that the DFT of a multivariate signal can be implemented by applying the DFT to each variate sequentially. Heuristically, the effect of spatial steganography on the image histogram can be revealed along each axis. Furthermore, by reshaping the 4-D image histogram into a 1-D signal in axis-wise order, the relationship introduced by the sharing of histogram bin values with neighboring bins along a specific direction emerges periodically in the reshaped signal, which can be captured by the 1-D DFT. Thus, the 1-D DFT instead of the 4-D DFT is applied to the spatial features: the submodels of SRM are represented as 1-D signals, and the spectrum coefficients corresponding to the positive axis are taken as the frequency features for steganalysis. In this way, the number of constructed FSRM frequency features is 17,291, around half of the number of SRM features. Given an investigated image, the detailed steps of the FSRM construction are as follows.

(1) Extract the SRM features.
(2) Reshape each submodel c_i of SRM into the 1-D signal c_i(:).
(3) Apply the 1-D DFT to c_i(:) and take the modulus of the frequency coefficients,

    C_i = |\mathcal{F}(c_i(:))|.    (33)

(4) Take the spectrum coefficients of C_i on the positive axis as the feature subset \bar{C}_i.
(5) Merge all the feature subsets into the frequency features FSRM.

It is worth noting that the desymmetrization increases the stability of the histogram distribution and simultaneously reduces the feature number, which is different from the effect of the DFT. The desymmetrization operation is based on the symmetry of the residual distribution, while the redundancy reduction in the frequency domain is based on the symmetry of the stego noise and the property of the DFT of a real symmetric matrix. Therefore, both the desymmetrization and the DFT can be applied in feature construction for feature reduction. Besides, the desymmetrization can be applied after the DFT due to the linearity of the DFT, that is, \mathcal{F}(x + y) = \mathcal{F}(x) + \mathcal{F}(y). This conclusion may contribute to feature construction in the frequency domain in future work.

Figure 5: Feature construction of the frequency features FSRM (17,291-D).
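A minimal sketch of the five steps above, under the assumption that the SRM submodels are already available as NumPy co-occurrence arrays from an existing extractor (not shown); exactly which "positive-axis" coefficients are kept is an assumption here, approximated by the first ⌊L/2⌋+1 DFT moduli of each reshaped submodel.

```python
import numpy as np

def fsrm_from_submodels(submodels):
    """FSRM-style frequency features: reshape each submodel to 1-D, apply the 1-D DFT,
    take the modulus, and keep only the non-redundant half of the spectrum."""
    parts = []
    for c in submodels:                                    # one co-occurrence array per submodel
        v = np.asarray(c, dtype=float).ravel(order='F')    # column-wise, like c_i(:) in (33)
        C = np.abs(np.fft.fft(v))                          # modulus of the spectrum
        parts.append(C[: len(v) // 2 + 1])                 # coefficients k and L-k coincide
    return np.concatenate(parts)

# toy usage: three fake 'submodels' standing in for the real SRM co-occurrences
rng = np.random.default_rng(2)
fake_submodels = [rng.random((5, 5, 5, 5)) for _ in range(3)]
features = fsrm_from_submodels(fake_submodels)
print(features.shape)
```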
5. Experimental Results

5.1. Experimental Setup

Unless otherwise specified, the images from the BOSSbase dataset are used to test the feature performance for steganalysis. The BOSSbase dataset contains 10,000 grayscale images of size 512 × 512, obtained by rescaling and cropping natural images of various sizes. Three adaptive steganographic techniques, S-UNIWARD [32], HILL [33], and MiPOD [34], are applied to generate stegos with payload ρ ∈ {0.1, 0.2, 0.3, 0.4, 0.5}, so there are 160,000 images in total. Features are extracted from each image, and for each class of images the features are randomly divided into two equal parts, used as the training set and the testing set respectively. The ensemble classifier [4] is adopted for steganalysis. We evaluate the feature performance using the detection error on the testing set, P_E = (P_{FA} + P_{MD})/2, where P_{FA} and P_{MD} are the probabilities of false alarm and missed detection. For each class of images, we repeat the test ten times and take the average as the final result.
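A sketch of the evaluation protocol just described; `train_and_predict` is a hypothetical placeholder for the ensemble classifier, and the split logic only mirrors the ten random 50/50 divisions used in the experiments.

```python
import numpy as np

def detection_error(y_true, y_pred):
    """P_E = (P_FA + P_MD) / 2; label 0 = cover, 1 = stego."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    p_fa = np.mean(y_pred[y_true == 0] == 1)     # false alarm: cover called stego
    p_md = np.mean(y_pred[y_true == 1] == 0)     # missed detection: stego called cover
    return (p_fa + p_md) / 2

def average_pe(F_cover, F_stego, train_and_predict, runs=10, seed=0):
    """Average P_E over `runs` random 50/50 splits of the cover/stego feature matrices."""
    rng = np.random.default_rng(seed)
    n, errors = len(F_cover), []
    for _ in range(runs):
        perm = rng.permutation(n)
        tr, te = perm[: n // 2], perm[n // 2:]
        y_pred = train_and_predict(F_cover[tr], F_stego[tr],
                                   np.vstack([F_cover[te], F_stego[te]]))
        y_true = np.r_[np.zeros(len(te)), np.ones(len(te))]
        errors.append(detection_error(y_true, y_pred))
    return np.mean(errors)
```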
5.2. Performances of Histogram Spatial Features

Histogram spatial features reflect the effects of the steganography on the residual statistical distribution. To increase the feature diversity and reduce the feature number, various skills are used for feature aggregation; different types of features use different groups of operations and therefore lead to different steganalysis performances. In this subsection, the performance of TLBP-RM is tested and compared with TLBP. Figure 6 gives the detection errors of the two types of features. The experimental results show that, for the steganalysis of the three steganographic techniques with five payloads, TLBP-RM performs better than TLBP. TLBP-RM uses a new texture pattern mapping based on (26), which merges the features corresponding to the same signs in (19) and reduces the number of texture patterns and features. Besides, TLBP-RM features are constructed from 3-D histograms, which better capture the effects of steganography on the residual joint statistical distribution. Moreover, TLBP-RM merges the features of the 3-D histograms formed along the horizontal, vertical, diagonal, and anti-diagonal directions. Compared with TLBP, TLBP-RM has fewer features but better performance. This supports merging features of different types, which will be presented later.

Figure 6: Detection errors of TLBP and TLBP-RM.

5.3. Performances of Histogram Frequency Features

The steganalysis performance of FSRM is tested in this subsection. Figure 7 shows the detection errors, with the results of SRM included for comparison. It can be seen that transforming SRM into FSRM improves the feature performance. For example, when ρ = 0.4, the detection errors of FSRM for the steganalysis of S-UNIWARD, HILL, and MiPOD are reduced by 1.48%, 1.42%, and 1.03% respectively. The results validate the conclusion that representing the features in an appropriate form helps improve the steganalysis performance. As applying the DFT to SRM cannot increase the information about the steganography in the feature set, the performance improvement may arise from the increase of the quantity of useful information per feature and from the ability of the ensemble classifier to deal with the frequency features. FSRM reduces the feature redundancy by removing nearly half of the features; thus the performance of the base learner with a fixed subspace is improved, and consequently the overall performance is improved.

Figure 7: Detection errors of SRM and FSRM.

However, this performance improvement is not obtained when applying the DFT to the TLBP features. The effectiveness of the frequency features comes from the cyclical influence of the steganography on the histograms in the spatial domain, whereas in the construction of TLBP, strong operations such as the binarization and the texture pattern mapping are used for feature aggregation, which seriously destroy this cyclicity. Therefore, transforming TLBP into frequency features may lose much information about the steganography. For a similar reason, the scheme cannot be applied directly to the selection-channel-aware versions of SRM to improve their steganalysis performance. For example, when detecting S-UNIWARD with 0.4 bpp on the BOSSbase dataset, applying the DFT to the submodels of maxSRM increases the detection error of the transformed maxSRM features by 0.76%. Since the embedding probability is integrated into the histogram calculation in the maxSRM construction, the symmetry of the stego noise on the weighted histogram changes and the DFT coefficients of the stego noise are no longer symmetric. Thus, taking the modulus of the DFTs of maxSRM loses information about the steganography, resulting in a decrease of the steganalysis performance. Nevertheless, the superiority of FSRM over SRM indicates that it is feasible to construct effective features in the frequency domain.

5.4. Performances of Merged Features

Increasing the feature diversity can improve the performance of the ensemble classifier based on the voting strategy [7]. The idea of merging features of different types for steganalysis was first proposed in [45], where Markov and DCT features are merged for multi-class JPEG steganalysis. In [46], the DCT features CC-JRM and the spatial features SRMQ1 are merged into J+SRM for JPEG steganalysis, where the merged features obtain higher detection accuracies than the single-type features CC-JRM for various JPEG steganographic techniques. For the steganalysis of spatial steganography, the most effective features are constructed in the spatial domain. In [15], two types of spatial features are merged for steganalysis, which improves the performance. Merging different types of features in the spatial and frequency domains may further improve the steganalysis performance.

In this subsection, we merge two types of features at a time. Table 1 shows the six merging schemes; in the first three schemes the frequency features are combined with spatial features, while in the remaining three schemes two types of spatial features are merged. Table 2 gives the detection errors of the merged features; for comparison, the results of the four single-type features are also presented. For each steganographic technique and each payload, the lowest detection error is achieved by one of the merged features.

Table 1: Merging schemes and the number of the merged features.

  Merging scheme    Feature number
  FSRM+SRM          51,962
  FSRM+TLBP         46,331
  FSRM+TLBP-RM      37,091
  SRM+TLBP          63,711
  SRM+TLBP-RM       54,471
  TLBP+TLBP-RM      48,840

Overall, the performances of the merged features are better than those of the single-type features. As the ensemble classifier is used for steganalysis, the increase of the feature diversity in the merged features contributes to the performance improvement when the base learners are more dissimilar to each other. Among the single-type features, TLBP-RM always obtains the best performance. As to TLBP, it is superior to SRM and FSRM except when S-UNIWARD with payload ρ ∈ {0.1, 0.2} is adopted for information embedding, where TLBP performs worse than SRM and FSRM; using the new texture pattern mapping in the feature construction improves the performance beyond SRM and FSRM.

When the frequency features and spatial features are merged, the FSRM+TLBP-RM scheme is superior to the other two schemes. Meanwhile, the feature number of FSRM+TLBP-RM is the smallest among all the merged features, only a little larger than the feature number of SRM. As to FSRM+SRM, its performance is better than both FSRM and SRM. However, FSRM comes from SRM, and the quantity of useful information in FSRM is not increased compared to SRM, which means that the useful information in the merged features FSRM+SRM is the same as that in SRM. Therefore, the performance improvement indicates that merging FSRM and SRM increases the feature diversity. From this point of view, the frequency features and the spatial features are complementary, even though they refer to the same data.

When two types of spatial features are merged, SRM+TLBP-RM is the best scheme; it also obtains the best overall performance among all the tested features. This is impressive because SRM has the worst overall performance and the largest feature number among the single-type features, while merging SRM and TLBP-RM yields the biggest improvement. For example, for the steganalysis of HILL with payload ρ = 0.4, the detection errors of SRM and TLBP-RM are 24.88% and 22.37% respectively, and the detection error of SRM+TLBP-RM is 21.24%, which is 3.64% lower than SRM and 1.13% lower than TLBP-RM. The results validate that increasing the feature diversity improves the performance of the ensemble classifier. Though both SRM and TLBP-RM come from the spatial domain, they are quite different, as different groups of operations are used in the construction of the two types of features. Note that SRM+TLBP also has good performance due to its feature diversity. Moreover, the comparison between SRM+TLBP and SRM+TLBP-RM further validates the superiority of TLBP-RM over TLBP. With respect to TLBP+TLBP-RM, its performance is comparable to TLBP-RM: as the feature construction of TLBP-RM is similar to that of TLBP, merging TLBP and TLBP-RM does not increase the feature diversity although the merged feature number is increased, and thus the performance is not improved.

It is worth mentioning that although FSRM obtains better performance than SRM, FSRM+TLBP-RM is slightly inferior to SRM+TLBP-RM, especially for the steganalysis of MiPOD. This may be caused by the diversity of the feature subspaces: the number of FSRM+TLBP-RM features is 17,380 fewer than that of SRM+TLBP-RM. When the FSRM+TLBP-RM features are fed to the ensemble classifier, the feature subspace of the trained base learner is smaller than when the SRM+TLBP-RM features are used. For example, when MiPOD with payload ρ = 0.4 is adopted for information embedding, the averages of the best subspace dimension and base learner number over ten runs of training are 1,640 and 99.8 respectively when FSRM+TLBP-RM is used, while they are 1,820 and 100.2 when SRM+TLBP-RM is used. As the base learner numbers in the two cases are close to each other, the extra 17,380 features of SRM+TLBP-RM can be regarded as being apportioned equally among the base learners. On one hand, enlarging the feature subspace improves the performance of the base learner; on the other hand, increasing the feature number while retaining the diversity among the base learners makes the voting strategy work well. Thus, the performance of the ensemble classifier improves.

To evaluate the transferability of the proposed method across different datasets, we also conducted experiments on the BOWS2 dataset (http://bows2.ec-lille.fr/). The dataset contains 10,000 grayscale images of size 512 × 512, which were never compressed. The three steganographic techniques S-UNIWARD, HILL, and MiPOD, with payloads 0.4 bpp and 0.2 bpp, are used for information embedding. Table 3 presents the detection errors of the four single-type features and the six kinds of merged features. Similar conclusions can be drawn from the results on BOWS2 as on BOSSbase: SRM+TLBP-RM obtains the best overall performance among all the features, and TLBP-RM is superior to the other single-type features. However, the results on BOWS2 also differ in some respects from those on BOSSbase. First, the performances of FSRM and SRM are comparable to each other: for the detection of S-UNIWARD, FSRM is slightly inferior to SRM, while for the detection of HILL and MiPOD, FSRM is slightly superior to SRM. As the FSRM feature number is nearly half of the SRM feature number, FSRM seems to retain an advantage over SRM. However, when merged with other types of features, such as TLBP and TLBP-RM, the merged features containing SRM obtain better results due to the feature diversity, as discussed previously. Moreover, SRM and FSRM are more effective than TLBP for the detection of S-UNIWARD on BOWS2, while by replacing the pattern mapping, TLBP-RM obtains better results than SRM and FSRM. The experimental results show that the steganalysis performance is related not only to the effectiveness of the features but also to the classifier structure. Besides, for steganalysis based on hand-crafted features and a classifier, it is significant to construct effective features and to represent them in an appropriate form for the classifier.
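For orientation, a rough stand-in for the FLD-based ensemble classifier discussed above can be sketched as follows. This is not the implementation of [4]: the real ensemble additionally searches for the optimal subspace dimension and number of base learners via out-of-bag error estimates, which is omitted here.

```python
import numpy as np

def fld_fit(Xc, Xs, reg=1e-6):
    """Fisher linear discriminant: w = (S_c + S_s)^-1 (mu_s - mu_c), threshold between means."""
    mu_c, mu_s = Xc.mean(0), Xs.mean(0)
    Sw = np.cov(Xc, rowvar=False) + np.cov(Xs, rowvar=False) + reg * np.eye(Xc.shape[1])
    w = np.linalg.solve(Sw, mu_s - mu_c)
    b = -0.5 * w @ (mu_c + mu_s)
    return w, b

def ensemble_fit(Xc, Xs, d_sub, n_learners, seed=0):
    """Train FLD base learners on random feature subspaces of dimension d_sub."""
    rng, D, learners = np.random.default_rng(seed), Xc.shape[1], []
    for _ in range(n_learners):
        idx = rng.choice(D, size=d_sub, replace=False)
        w, b = fld_fit(Xc[:, idx], Xs[:, idx])
        learners.append((idx, w, b))
    return learners

def ensemble_predict(learners, X):
    """Majority vote over the base learners; 1 = stego, 0 = cover."""
    votes = np.zeros(len(X))
    for idx, w, b in learners:
        votes += (X[:, idx] @ w + b > 0)
    return (votes > len(learners) / 2).astype(int)

# toy usage with random 'cover' and 'stego' feature matrices
rng = np.random.default_rng(4)
Xc, Xs = rng.normal(0, 1, (300, 200)), rng.normal(0.1, 1, (300, 200))
learners = ensemble_fit(Xc[:150], Xs[:150], d_sub=40, n_learners=51)
pred = ensemble_predict(learners, np.vstack([Xc[150:], Xs[150:]]))
print(pred.mean())    # fraction classified as stego
```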
features after feature selection as all merged features are under the same condition. Besides, for a fair comparison between the merged features and the single type features, the feature selection is also applied to the four single type features. 0.23 SRM: 0.20477 FSRM: 0.19263 TLBP: 0.18846 TLBP-RM: 0.1793 FSRM+SRM: 0.18569 FSRM+TLBP: 0.17674 FSRM+TLBP-RM: 0.17087 SRM+TLBP: 0.17645 SRM+TLBP-RM: 0.17155 TLBP+TLBP-RM: 0.17826
0.22
0.21
0.2
0.19
0.18
0.17 0
1
2
3
4
5
6 104
Feature number
(a)
SRM: 0.24721 FSRM: 0.23582 TLBP: 0.22967 TLBP-RM: 0.2214 FSRM+SRM: 0.22583 FSRM+TLBP: 0.21449 FSRM+TLBP-RM: 0.21117 SRM+TLBP: 0.2154 SRM+TLBP-RM: 0.21191 TLBP+TLBP-RM: 0.21939
0.28 0.27 0.26 0.25 0.24 0.23 0.22 0.21 0
1
2
3
4
5
6 104
Feature number
5.5. Forward Feature Selection Feature selection removes the feature redundancies to increase the classification performance, especially when the computing resource is limited. Besides, feature separability analysis may be improved by getting the feedback from the results of the feature selection. Feature redundancies may exist in the merged features due to the large numbers of features. Here, we are going to investigate the effects of the feature redundancies on the performances of the merged features. As the ensemble classifier based on FLD is used for classification, the half of features from cover images and stego images are randomly selected to calculate the Fisher values for each kind of merged feature, and then the features are sorted in descending order according to the Fisher values. Then the features with the highest Fisher values are selected for steganalysis. The number of selected features is n·2000, where n is the positive integer. It is worth noting that the features used to calculate Fisher values may be included in the testing set as the whole feature set is randomly divided into two parts ten times. Although the testing data should be unknown to the classifier before finishing the training, however, using the features from the testing set to calculate the Fisher values have little effect on the performance comparison between the merged
[Figure 8 about here. The plots show the detection error (y-axis) versus the number of selected features (x-axis, up to 6×10^4) for the four single type features and the six kinds of merged features. The minimum detection errors listed in the legends are: (a) S-UNIWARD, 0.4 bpp: SRM 0.20477, FSRM 0.19263, TLBP 0.18846, TLBP-RM 0.1793, FSRM+SRM 0.18569, FSRM+TLBP 0.17674, FSRM+TLBP-RM 0.17087, SRM+TLBP 0.17645, SRM+TLBP-RM 0.17155, TLBP+TLBP-RM 0.17826; (b) HILL, 0.4 bpp: SRM 0.24721, FSRM 0.23582, TLBP 0.22967, TLBP-RM 0.2214, FSRM+SRM 0.22583, FSRM+TLBP 0.21449, FSRM+TLBP-RM 0.21117, SRM+TLBP 0.2154, SRM+TLBP-RM 0.21191, TLBP+TLBP-RM 0.21939; (c) MiPOD, 0.4 bpp: SRM 0.23705, FSRM 0.22908, TLBP 0.22767, TLBP-RM 0.22155, FSRM+SRM 0.21934, FSRM+TLBP 0.21116, FSRM+TLBP-RM 0.20893, SRM+TLBP 0.21145, SRM+TLBP-RM 0.20866, TLBP+TLBP-RM 0.21812.]
Figure 8: Detection errors after feature selection based on Fisher values. (a) S-UNIWARD, 0.4 bpp. (b) HILL, 0.4 bpp. (c) MiPOD, 0.4 bpp.
Figure 8 shows the detection errors of the four single type features and the six kinds of merged features after feature selection for the steganalysis of the three steganographic techniques, when different numbers of features with the highest Fisher values are used. The numbers following the feature names in the legends are the corresponding minimum detection errors.
Table 2: Detection errors (%) of the single type features and merged features on BOSSbase.

S-UNIWARD
Steganalytic Feature   0.1 bpp   0.2 bpp   0.3 bpp   0.4 bpp   0.5 bpp
SRM                    40.33     31.92     25.54     20.67     16.25
FSRM                   40.26     31.62     24.75     19.25     14.89
TLBP                   40.83     32.36     24.94     18.98     14.41
TLBP-RM                39.89     31.30     23.46     17.97     13.47
FSRM+SRM               39.61     30.68     23.82     18.77     14.57
FSRM+TLBP              39.99     31.18     23.42     17.78     13.32
FSRM+TLBP-RM           39.40     30.15     22.47     17.16     12.77
SRM+TLBP               39.23     30.25     23.21     17.73     13.25
SRM+TLBP-RM            38.87     29.64     22.58     17.30     12.75
TLBP+TLBP-RM           39.96     31.28     23.77     17.81     13.51

HILL
Steganalytic Feature   0.1 bpp   0.2 bpp   0.3 bpp   0.4 bpp   0.5 bpp
SRM                    43.26     35.92     29.67     24.88     20.33
FSRM                   42.04     34.91     28.61     23.46     19.21
TLBP                   41.18     33.78     27.96     23.11     18.69
TLBP-RM                40.44     33.15     27.27     22.37     18.00
FSRM+SRM               41.88     33.94     27.91     22.90     18.60
FSRM+TLBP              40.75     33.10     26.83     21.84     17.65
FSRM+TLBP-RM           40.12     32.61     26.14     21.28     17.12
SRM+TLBP               40.95     33.05     26.68     21.70     17.35
SRM+TLBP-RM            40.22     32.86     26.30     21.24     17.00
TLBP+TLBP-RM           40.42     33.26     27.16     22.19     18.07

MiPOD
Steganalytic Feature   0.1 bpp   0.2 bpp   0.3 bpp   0.4 bpp   0.5 bpp
SRM                    41.33     34.29     28.72     23.98     19.67
FSRM                   41.11     33.83     28.05     22.95     18.92
TLBP                   40.82     33.59     27.95     22.65     18.82
TLBP-RM                40.30     33.29     27.16     22.16     18.27
FSRM+SRM               40.53     32.97     27.18     22.16     17.93
FSRM+TLBP              40.38     32.77     26.64     21.48     17.57
FSRM+TLBP-RM           40.01     32.24     26.03     21.01     17.13
SRM+TLBP               39.96     32.16     26.04     21.20     17.26
SRM+TLBP-RM            39.62     32.03     25.73     20.83     16.99
TLBP+TLBP-RM           40.37     33.04     26.95     22.03     18.15
Table 3: Detection errors (%) of the single type features and merged features on BOWS2.

                       S-UNIWARD            HILL                 MiPOD
Steganalytic Feature   0.2 bpp   0.4 bpp    0.2 bpp   0.4 bpp    0.2 bpp   0.4 bpp
SRM                    32.11     17.76      35.36     21.22      34.74     21.55
FSRM                   32.53     17.93      34.41     21.10      34.15     21.20
TLBP                   33.17     18.01      32.64     20.12      33.49     20.58
TLBP-RM                31.82     16.96      31.51     19.10      32.69     20.01
FSRM+SRM               31.15     16.82      32.92     19.75      33.05     19.85
FSRM+TLBP              31.72     16.73      31.82     19.00      32.52     19.57
FSRM+TLBP-RM           30.95     16.10      31.14     18.43      32.16     19.34
SRM+TLBP               30.59     15.78      31.21     18.25      32.07     18.94
SRM+TLBP-RM            30.14     15.44      31.00     17.99      31.90     18.45
TLBP+TLBP-RM           31.79     16.81      31.54     19.16      32.58     20.09
The results show that the detection error of FSRM+TLBP-RM decreases faster than those of the other merged features as the feature number increases, and that FSRM+TLBP-RM performs better than the other merged features at a fixed feature number; this means that when the computing resources are limited, FSRM+TLBP-RM is superior to the other merged features. Besides, for the steganalysis of S-UNIWARD and HILL, the minimum detection error of FSRM+TLBP-RM is lower than those of the other merged features, while for the steganalysis of MiPOD, FSRM+TLBP-RM is slightly inferior to SRM+TLBP-RM, which again validates the effectiveness of FSRM and TLBP-RM. Furthermore, the best performance of each kind of merged feature is obtained before the feature number reaches its maximum, which means that increasing the feature number may not improve the steganalysis performance. For example, for the steganalysis of MiPOD, FSRM+TLBP-RM obtains its lowest detection error when the feature number is 28,000, and the performance decreases if more features are used. This indicates that there are around 9,000 redundant features in FSRM+TLBP-RM for the steganalysis of MiPOD. Note that the redundant features may differ from one steganographic technique to another. Although feature reduction is applied in the construction of FSRM and TLBP-RM to reduce the feature redundancy, a large number of redundant features still exist. Nevertheless, the experimental results suggest that there is potential to further improve the steganalysis performance by constructing more effective features, which may benefit from the feature separability analysis. The results also indicate that when feature selection is applied to reduce the feature number, the merged features are superior to the single type features, except in one case: for the detection of S-UNIWARD with 0.4 bpp, TLBP-RM may outperform the other kinds of features at particular feature numbers. For example, when the feature number is 16,000, TLBP-RM obtains the lowest detection error among all the kinds of features. For the detection of HILL and MiPOD with 0.4 bpp, the merged feature FSRM+TLBP-RM is always the best choice among the tested features when the feature number is restricted to fewer than 30,000. In summary, when feature selection is applied for feature reduction, the merged features are still superior to the single type features.
6. Conclusion

Feature construction is the focus of steganalysis research. For the steganalysis of spatial steganography, the popular steganalytic features are constructed from residual histograms in the spatial domain and fed to the ensemble classifier. This paper addresses the feature representation for the steganalysis of spatial steganography. The effects of spatial steganography on residual histograms are analyzed in both the spatial and frequency domains, and the separability of the spatial and frequency histogram features is measured based on the Fisher discriminant criterion. The spatial feature separability explains the effectiveness of SRM and TLBP and also points out the shortcomings of their feature construction. According to the separability analysis of spatial features, TLBP-RM is constructed by replacing the 'riu2' mapping with a new texture pattern mapping in the feature construction of TLBP. The experimental results show that, for the steganalysis of S-UNIWARD, HILL, and MiPOD with five payloads, TLBP-RM is superior to TLBP. Besides, according to the separability analysis of frequency features, the frequency feature set FSRM is constructed by applying the 1-D DFT to the submodels of SRM. The experimental results show that FSRM performs better than SRM on BOSSbase and is comparable to SRM on BOWS2, even though the FSRM feature number is nearly half of the SRM feature number. Finally, the two types of features are merged for steganalysis. The experimental results indicate that merging features of different types further improves the steganalysis performance. Meanwhile, feature selection is applied to the merged features to reduce the feature redundancy. The steganalysis performance of the selected features indicates that the feature construction could be further improved to produce more effective features.

In the steganalysis experiments, the ensemble classifier is used for feature classification, so the performance improvement reported in this paper is also related to the properties of that classifier. For TLBP-RM, the new texture pattern mapping and the 3-D histograms make the features more separable than the TLBP features, which improves the steganalysis performance. For FSRM, removing the feature redundancy increases the amount of useful information per feature, which improves the performance of each base learner operating on a fixed-size subspace and consequently improves the performance of the ensemble classifier. Merging features of different types increases the feature diversity, which makes the voting strategy work better and further contributes to the performance improvement.

The experimental results also indicate that, for steganalysis based on hand-crafted features and a classifier, representing the features in a form appropriate for the classifier yields the best steganalysis performance. Accordingly, combining the feature construction with the design of the classifier could further improve the steganalysis performance. To this end, the steganalysis frameworks based on deep learning achieve exactly this combination of feature construction and classification, automatically learning the weights used for feature construction. The state-of-the-art methods based on deep learning obtain better performance than the hand-crafted features. However, the structures of these networks are simple; namely, the features in the hidden layers are learned in the spatial domain by repeating convolution, activation, and pooling operations. Besides, due to limited computing resources, only a small number of feature maps are learned. Moreover, the interpretability of deep learning networks is poor: although researchers divide a network into several phases to simulate the feature construction steps and the classification, theoretical work to guide the design of such networks is still lacking. Therefore, it remains important to study feature separability for feature construction, which may further contribute to the study of deep steganalysis networks.
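For concreteness, the role of the ensemble classifier [4] referred to above can be illustrated with the following sketch of a random-subspace FLD ensemble with majority voting. This is an illustrative sketch, not the implementation used in the experiments; the subspace size, learner count, and regularization constant are arbitrary choices.

import numpy as np

class FLDBase:
    def fit(self, Xc, Xs, reg=1e-3):
        # Fisher linear discriminant between cover (Xc) and stego (Xs) samples.
        mc, ms = Xc.mean(axis=0), Xs.mean(axis=0)
        Sw = np.cov(Xc, rowvar=False) + np.cov(Xs, rowvar=False)
        Sw += reg * np.eye(Sw.shape[0])              # regularize the within-class scatter
        self.w = np.linalg.solve(Sw, ms - mc)        # Fisher projection direction
        self.b = -0.5 * self.w @ (mc + ms)           # threshold halfway between the class means
        return self

    def predict(self, X):
        return (X @ self.w + self.b > 0).astype(int)  # 1 = stego, 0 = cover

class SubspaceEnsemble:
    def __init__(self, n_learners=51, subspace=1000, seed=0):
        self.n_learners, self.subspace = n_learners, subspace
        self.rng = np.random.default_rng(seed)

    def fit(self, Xc, Xs):
        d = Xc.shape[1]
        self.members = []
        for _ in range(self.n_learners):
            idx = self.rng.choice(d, min(self.subspace, d), replace=False)
            self.members.append((idx, FLDBase().fit(Xc[:, idx], Xs[:, idx])))
        return self

    def predict(self, X):
        votes = np.stack([m.predict(X[:, idx]) for idx, m in self.members])
        return (votes.mean(axis=0) > 0.5).astype(int)  # majority vote across base learners

Each base learner sees only a fixed-size random subspace, which is why increasing the useful information per feature (as FSRM does) or the diversity across features (as merging does) can translate into better individual votes and, through the majority vote, into a lower overall detection error.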
Based on the feature separability analysis, this paper only improves existing features rather than constructing a novel type of feature for steganalysis. Nevertheless, this work validates that it is feasible to construct effective features guided by feature separability. Compared with feature selection and reduction, constructing highly effective features directly avoids the calculation of redundant features, which improves the utilization of computing resources. Furthermore, few works construct features in the frequency domain; this paper encourages feature construction in that domain. As spatial features and frequency features may be complementary, research on feature separability may further improve the steganalysis performance. Besides, feature separability analysis may also benefit other fields of data analysis, such as image cryptanalysis [47].
Author Contribution Statement

Ping WANG: Methodology, Software, Formal analysis, Writing - Original Draft, Visualization.
Fenlin LIU: Conceptualization, Resources, Writing - Review & Editing, Supervision, Funding acquisition.
Chunfang YANG: Validation, Investigation, Resources, Writing - Review & Editing, Project administration, Funding acquisition.

Declaration of Interests

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgements
This work was supported in part by the National Natural Science Foundation of China (No. 61772549, 61872448, U1736214, 61602508, and 61601517), and the National Key R&D Program of China (No. 2016YFB0801303 and 2016QY01W0105).
Appendix A. The derivation of (17)

Let
\[ a = \frac{\beta\,\Gamma\!\left(\frac{n}{2}\right)}{\pi^{\frac{n}{2}}\,|M|^{\frac{1}{2}}\,2^{\frac{n}{2\beta}}\,\Gamma\!\left(\frac{n}{2\beta}\right)} \]
and \( y(x) = \left(x^T M^{-1} x\right)^{\beta} \); then (15) can be represented as
\[ \mu(x) = a \exp\!\left[-\tfrac{1}{2}\,y(x)\right]. \qquad (A.1) \]
The Hessian matrix of \(\mu(x)\) is
\[ H_{\mu}(x) = \tfrac{1}{4}\,\mu(x)\left[\nabla y(x)\,\nabla y(x)^T - 2H_{y}(x)\right], \qquad (A.2) \]
where \(\nabla\) is the gradient operator, namely,
\[ \nabla y(x) = 2\beta\left(x^T M^{-1} x\right)^{\beta-1} M^{-1} x, \qquad (A.3) \]
and \(H_{y}(x)\) is the Hessian matrix of \(y(x)\):
\[ H_{y}(x) = 4\beta(\beta-1)\left(x^T M^{-1} x\right)^{\beta-2} M^{-1} x x^T M^{-1} + 2\beta\left(x^T M^{-1} x\right)^{\beta-1} M^{-1}. \qquad (A.4) \]

The second order directional derivative of \(\mu(x)\) in the direction \(i\) is \(g_i^T H_{\mu}(x)\, g_i\), where \(g_i\) is the unit vector in the direction \(i\). Let \(\Psi = \int_i \alpha_i\, g_i g_i^T\, \mathrm{d}i\) and \(\kappa = \int_i \alpha_i\, g_i^T M^{-1} g_i\, \mathrm{d}i\). It is worth noting that \(\Psi\) and \(\kappa\) are only related to the payload but unrelated to \(x\). Now we replace the second order difference in (16) by the second order directional derivative of \(\mu(x)\), and turn the sum into an integral. As both \(\mu(x)\) and \(f_{\rho}(x)\) are axisymmetric, we get
\[
\begin{aligned}
\tilde{\mu}_c(x) - \tilde{\mu}_s(x)
&= \frac{1}{2}\int_i \alpha_i\, g_i^T H_{\mu}(x)\, g_i\, \mathrm{d}i \\
&= \frac{1}{8}\int_i \alpha_i\, g_i^T\, \mu(x)\left[\nabla y(x)\nabla y(x)^T - 2H_{y}(x)\right] g_i\, \mathrm{d}i \\
&= \frac{\mu_c(x)}{8}\int_i \alpha_i\, g_i^T \nabla y(x)\, \nabla y(x)^T g_i\, \mathrm{d}i
  - \frac{\mu_c(x)}{8}\, 2\int_i \alpha_i\, g_i^T H_{y}(x)\, g_i\, \mathrm{d}i \\
&= \frac{\mu_c(x)}{2}\int_i \alpha_i\, g_i^T\, \beta^2\left(x^T M^{-1} x\right)^{2\beta-2} M^{-1} x x^T M^{-1}\, g_i\, \mathrm{d}i \\
&\quad - \mu_c(x)\int_i \alpha_i\, g_i^T\, \beta(\beta-1)\left(x^T M^{-1} x\right)^{\beta-2} M^{-1} x x^T M^{-1}\, g_i\, \mathrm{d}i \\
&\quad - \frac{\mu_c(x)}{2}\int_i \alpha_i\, g_i^T\, \beta\left(x^T M^{-1} x\right)^{\beta-1} M^{-1}\, g_i\, \mathrm{d}i \\
&= \frac{\mu_c(x)}{2}\,\beta^2\left(x^T M^{-1} x\right)^{2\beta-2} x^T M^{-1}\Psi M^{-1} x
  - \mu_c(x)\,\beta(\beta-1)\left(x^T M^{-1} x\right)^{\beta-2} x^T M^{-1}\Psi M^{-1} x \\
&\quad - \frac{\mu_c(x)}{2}\,\beta\left(x^T M^{-1} x\right)^{\beta-1}\kappa. \qquad (A.5)
\end{aligned}
\]

Appendix B. The derivation of (18)

When \(\beta = 1\), according to (17), we have
\[ \tilde{\mu}_c(x) - \tilde{\mu}_s(x) = \frac{\mu_c(x)}{2}\left[x^T M^{-1}\Psi M^{-1} x - \kappa\right]. \qquad (B.1) \]
As \(\alpha_i\) is axisymmetric, let \(\zeta = \int_i \alpha_i\, g_{i(j)}^2\, \mathrm{d}i\), \(1 \le j \le n\), where \(g_{i(j)}\) is the \(j\)-th component of \(g_i\); and if \(i \ne j\), then
\[ \int_i \alpha_i\, g_{i(i)} g_{i(j)}\, \mathrm{d}i = \left(\int_i \alpha_i\, g_{i(i)}\, \mathrm{d}i\right)\left(\int_j g_{i(j)}\, \mathrm{d}j\right) = 0. \qquad (B.2) \]
Thus,
\[
\Psi = \int_i \alpha_i\, g_i g_i^T\, \mathrm{d}i =
\begin{pmatrix}
\int_i \alpha_i g_{i(1)}^2\, \mathrm{d}i & \int_i \alpha_i g_{i(1)} g_{i(2)}\, \mathrm{d}i & \cdots & \int_i \alpha_i g_{i(1)} g_{i(n)}\, \mathrm{d}i \\
\int_i \alpha_i g_{i(1)} g_{i(2)}\, \mathrm{d}i & \int_i \alpha_i g_{i(2)}^2\, \mathrm{d}i & \cdots & \int_i \alpha_i g_{i(2)} g_{i(n)}\, \mathrm{d}i \\
\vdots & \vdots & \ddots & \vdots \\
\int_i \alpha_i g_{i(1)} g_{i(n)}\, \mathrm{d}i & \int_i \alpha_i g_{i(2)} g_{i(n)}\, \mathrm{d}i & \cdots & \int_i \alpha_i g_{i(n)}^2\, \mathrm{d}i
\end{pmatrix}
= \zeta E, \qquad (B.3)
\]
and
\[ \kappa = \int_i \alpha_i\, g_i^T M^{-1} g_i\, \mathrm{d}i = \tau\int_i \alpha_i\, g_i^T g_i\, \mathrm{d}i = n\tau\int_i \alpha_i\, g_{i(j)}^2\, \mathrm{d}i = n\tau\zeta, \qquad (B.4) \]
where \(\tau\) is the value of the elements on the main diagonal of \(M^{-1}\), namely, \(M^{-1} = (\tau-\omega)E + \omega U\). Therefore,
\[ \tilde{\mu}_c(x) - \tilde{\mu}_s(x) = \frac{\zeta\,\mu_c(x)}{2}\left[x^T M^{-2} x - n\tau\right], \qquad (B.5) \]
where \(M^{-2} = (\chi-\gamma)E + \gamma U\) \((\chi > 0,\ \gamma < 0)\), and then
\[ \tilde{\mu}_c(x) - \tilde{\mu}_s(x) = \frac{\zeta\,\mu_c(x)}{2}\left[(\chi-\gamma)\, x^T x + \gamma\, x^T u u^T x - n\tau\right]. \qquad (B.6) \]
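As a quick numerical sanity check of the closed forms (A.2)-(A.4) (not part of the paper's derivation; all test values below are arbitrary), the analytic Hessian of µ(x) can be compared against a central finite-difference Hessian in a few lines of Python:

import numpy as np

def mu(x, Minv, beta, a=1.0):
    return a * np.exp(-0.5 * (x @ Minv @ x) ** beta)

def grad_y(x, Minv, beta):
    q = x @ Minv @ x
    return 2 * beta * q ** (beta - 1) * (Minv @ x)                      # (A.3)

def hess_y(x, Minv, beta):
    q = x @ Minv @ x
    v = Minv @ x
    return (4 * beta * (beta - 1) * q ** (beta - 2) * np.outer(v, v)
            + 2 * beta * q ** (beta - 1) * Minv)                        # (A.4)

def hess_mu_analytic(x, Minv, beta, a=1.0):
    g = grad_y(x, Minv, beta)
    return 0.25 * mu(x, Minv, beta, a) * (np.outer(g, g) - 2 * hess_y(x, Minv, beta))  # (A.2)

def hess_mu_numeric(x, Minv, beta, a=1.0, h=1e-4):
    # Central finite-difference approximation of the Hessian of mu.
    n = len(x)
    H = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            ei, ej = np.eye(n)[i] * h, np.eye(n)[j] * h
            H[i, j] = (mu(x + ei + ej, Minv, beta, a) - mu(x + ei - ej, Minv, beta, a)
                       - mu(x - ei + ej, Minv, beta, a) + mu(x - ei - ej, Minv, beta, a)) / (4 * h * h)
    return H

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, beta = 4, 0.8
    A = rng.normal(size=(n, n))
    M = A @ A.T + n * np.eye(n)          # an arbitrary positive-definite covariance-like matrix
    Minv = np.linalg.inv(M)
    x = rng.normal(size=n)
    err = np.abs(hess_mu_analytic(x, Minv, beta) - hess_mu_numeric(x, Minv, beta)).max()
    print(f"max |analytic - finite difference| = {err:.2e}")   # should be tiny, dominated by the step size

The discrepancy reflects only the finite-difference error, which supports the closed forms used in (A.5).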
References

[1] I. Goodfellow, Y. Bengio, A. Courville, Deep Learning, MIT Press, Cambridge, Massachusetts, USA, 2016. URL: http://www.deeplearningbook.org
[2] A. A. D. Ker, P. Bas, R. Böhme, R. Cogranne, S. Craver, T. Filler, J. J. Fridrich, T. Pevný, Moving steganography and steganalysis from the laboratory into the real world, in: W. Puech, M. Chaumont, J. Dittmann, P. Campisi (Eds.), ACM Information Hiding and Multimedia Security Workshop, IH&MMSec '13, ACM, Montpellier, France, 2013, pp. 45–58. doi:10.1145/2482513.2482965.
[3] R. C. Gonzalez, R. E. Woods, Digital Image Processing, Prentice Hall, Upper Saddle River, NJ, USA, 2007.
[4] J. Kodovský, J. Fridrich, V. Holub, Ensemble classifiers for steganalysis of digital media, IEEE Transactions on Information Forensics and Security 7 (2) (2012) 432–444.
[5] R. A. Fisher, The Use of Multiple Measurements in Taxonomic Problems, Annals of Eugenics 7 (1936) 179–188.
[6] T. Pevný, P. Bas, J. Fridrich, Steganalysis by Subtractive Pixel Adjacency Matrix, IEEE Transactions on Information Forensics and Security 5 (2) (2010) 215–224. doi:10.1109/TIFS.2010.2045842.
[7] J. Fridrich, J. Kodovský, Rich models for steganalysis of digital images, IEEE Transactions on Information Forensics and Security 7 (3) (2012) 868–882. doi:10.1109/TIFS.2012.2190402.
[8] W. Tang, H. Li, W. Luo, J. Huang, Adaptive steganalysis against WOW embedding algorithm, in: A. Unterweger, A. Uhl, S. Katzenbeisser, R. Kwitt, A. Piva (Eds.), ACM Information Hiding and Multimedia Security Workshop, IH&MMSec '14, ACM, Salzburg, Austria, 2014, pp. 91–96. doi:10.1145/2600918.2600935.
[9] T. Denemark, V. Sedighi, V. Holub, R. Cogranne, J. Fridrich, Selection-channel-aware rich model for steganalysis of digital images, in: 2014 IEEE International Workshop on Information Forensics and Security, WIFS 2014, IEEE, Atlanta, GA, USA, 2014, pp. 48–53. doi:10.1109/WIFS.2014.7084302.
[10] T. Denemark, J. J. Fridrich, P. C. Alfaro, Improving Selection-Channel-Aware Steganalysis Features, in: A. M. Alattar, N. D. Memon (Eds.), Proc. IS&T, Electronic Imaging, Media Watermarking, Security, and Forensics 2016, Ingenta, San Francisco, California, USA, 2016, pp. 1–8. doi:10.2352/ISSN.2470-1173.2016.8.MWSF-080.
[11] B. Li, J. Huang, Y. Q. Shi, Textural features based universal steganalysis, in: Security, Forensics, Steganography, and Watermarking of Multimedia Contents X, San Jose, CA, USA, January 27, 2008, 2008, p. 681912. doi:10.1117/12.765817.
[12] Y. Q. Shi, P. Sutthiwan, L. Chen, Textural Features for Steganalysis, in: Information Hiding - 14th International Conference, IH 2012, Berkeley, CA, USA, May 15-18, 2012, Revised Selected Papers, 2012, pp. 63–77. doi:10.1007/978-3-642-36373-3_5.
[13] G. Xiong, X. Ping, T. Zhang, X. Hou, Image textural features for steganalysis of spatial domain steganography, Journal of Electronic Imaging 21 (3) (2012) 33015. doi:10.1117/1.JEI.21.3.033015.
[14] F. Li, X. Zhang, H. Cheng, J. Yu, Digital image steganalysis based on local textural features and double dimensionality reduction, Security and Communication Networks 9 (8) (2016) 729–736. doi:10.1002/sec.1094.
[15] B. Li, Z. Li, S. Zhou, S. Tan, X. Zhang, New Steganalytic Features for Spatial Image Steganography Based on Derivative Filters and Threshold LBP Operator, IEEE Transactions on Information Forensics and Security 13 (5) (2018) 1242–1257. doi:10.1109/TIFS.2017.2780805.
[16] C.-C. Chang, C.-J. Lin, LIBSVM: a Library for Support Vector Machines, ACM Transactions on Intelligent Systems and Technology (TIST) 2 (3) (2011) 27:1–27:27. doi:10.1145/1961189.1961199.
[17] I. Avcibas, N. Memon, B. Sankur, Steganalysis using image quality metrics, IEEE Transactions on Image Processing 12 (2) (2003) 221–229. doi:10.1109/TIP.2002.807363.
[18] X. Luo, F. Liu, S. Lian, C. Yang, S. Gritzalis, On the Typical Statistic Features for Image Blind Steganalysis, IEEE Journal on Selected Areas in Communications 29 (7) (2011) 1404–1422. doi:10.1109/JSAC.2011.110807.
[19] J. C. Lu, F. L. Liu, X. Y. Luo, Selection of image features for steganalysis based on the Fisher criterion, Digital Investigation 11 (1) (2014) 57–66. doi:10.1016/j.diin.2013.12.001.
[20] Y. Ma, X. Luo, X. Li, Z. Bao, Y. Zhang, Selection of Rich Model Steganalysis Features Based on Decision Rough Set α-Positive Region Reduction, IEEE Transactions on Circuits and Systems for Video Technology 29 (2) (2019) 336–350. doi:10.1109/TCSVT.2018.2799243.
[21] G. Xu, H. Wu, Y. Shi, Structural Design of Convolutional Neural Networks for Steganalysis, IEEE Signal Processing Letters 23 (5) (2016) 708–712. doi:10.1109/LSP.2016.2548421.
[22] G. Xu, H.-Z. Wu, Y. Q. Shi, Ensemble of CNNs for Steganalysis: An Empirical Study, in: Proceedings of the 4th ACM Workshop on Information Hiding and Multimedia Security, IH&MMSec '16, ACM, New York, NY, USA, 2016, pp. 103–107. doi:10.1145/2909827.2930798.
[23] J. Ye, J. Ni, Y. Yi, Deep Learning Hierarchical Representations for Image Steganalysis, IEEE Transactions on Information Forensics and Security 12 (11) (2017) 2545–2557. doi:10.1109/TIFS.2017.2710946.
[24] M. Chen, V. Sedighi, M. Boroumand, J. J. Fridrich, JPEG-Phase-Aware Convolutional Neural Network for Steganalysis of JPEG Images, in: Proceedings of the 5th ACM Workshop on Information Hiding and Multimedia Security, IH&MMSec 2017, Philadelphia, PA, USA, June 20-22, 2017, 2017, pp. 75–84. doi:10.1145/3082031.3083248.
[25] J. Zeng, S. Tan, B. Li, J. Huang, Large-Scale JPEG Image Steganalysis Using Hybrid Deep-Learning Framework, IEEE Transactions on Information Forensics and Security 13 (5) (2018) 1200–1214. doi:10.1109/TIFS.2017.2779446.
[26] C. F. Tsang, J. Fridrich, Steganalyzing Images of Arbitrary Size with CNNs, Electronic Imaging 2018 (7) (2018) 121-1–121-8.
[27] B. Li, W. Wei, A. Ferreira, S. Tan, ReST-Net: Diverse Activation Modules and Parallel Subnets-Based CNN for Spatial Image Steganalysis, IEEE Signal Processing Letters 25 (5) (2018) 650–654. doi:10.1109/LSP.2018.2816569.
[28] M. Boroumand, M. Chen, J. Fridrich, Deep Residual Network for Steganalysis of Digital Images, IEEE Transactions on Information Forensics and Security 14 (5) (2019) 1181–1193. doi:10.1109/TIFS.2018.2871749.
[29] M. Chen, M. Boroumand, J. Fridrich, Reference Channels for Steganalysis of Images with Convolutional Neural Networks, in: ACM Information Hiding and Multimedia Security Workshop, IH&MMSec '19, 2019, pp. 188–197. doi:10.1145/3335203.3335733.
[30] J. Zeng, S. Tan, G. Liu, B. Li, J. Huang, WISERNet: Wider Separate-Then-Reunion Network for Steganalysis of Color Images, IEEE Transactions on Information Forensics and Security 14 (10) (2019) 2735–2748. doi:10.1109/TIFS.2019.2904413.
[31] R. Zhang, F. Zhu, J. Liu, G. Liu, Depth-wise separable convolutions and multi-level pooling for an efficient spatial CNN-based steganalysis, IEEE Transactions on Information Forensics and Security (2019). doi:10.1109/TIFS.2019.2936913.
[32] V. V. Holub, J. J. Fridrich, Digital image steganography using universal distortion, in: W. Puech, M. Chaumont, J. Dittmann, P. Campisi (Eds.), ACM Information Hiding and Multimedia Security Workshop, IH&MMSec '13, ACM, Montpellier, France, 2013, pp. 59–68. doi:10.1145/2482513.2482514.
[33] B. Li, M. Wang, J. Huang, X. Li, A new cost function for spatial image steganography, in: 2014 IEEE International Conference on Image Processing, ICIP 2014, IEEE, Paris, France, 2014, pp. 4206–4210. doi:10.1109/ICIP.2014.7025854.
[34] V. Sedighi, R. Cogranne, J. Fridrich, Content-Adaptive Steganography by Minimizing Statistical Detectability, IEEE Transactions on Information Forensics and Security 11 (2) (2016) 221–234. doi:10.1109/TIFS.2015.2486744.
[35] W. Luo, F. Huang, J. Huang, Edge adaptive image steganography based on LSB matching revisited, IEEE Transactions on Information Forensics and Security 5 (2) (2010) 201–214. doi:10.1109/TIFS.2010.2041812.
[36] T. Pevný, T. Filler, P. Bas, Using High-Dimensional Image Models to Perform Highly Undetectable Steganography, in: Information Hiding - 12th International Conference, IH 2010, Springer, Calgary, AB, Canada, 2010, pp. 161–177. doi:10.1007/978-3-642-16435-4_13.
[37] V. Holub, J. Fridrich, Designing steganographic distortion using directional filters, in: WIFS 2012 - Proceedings of the 2012 IEEE International Workshop on Information Forensics and Security, 2012, pp. 234–239. doi:10.1109/WIFS.2012.6412655.
[38] X. Liao, Z. Qin, L. Ding, Data embedding in digital images using critical functions, Signal Processing: Image Communication 58 (2017) 146–156. doi:10.1016/j.image.2017.07.006.
[39] X. Liao, Y. Yu, B. Li, Z. Li, Z. Qin, A new payload partition strategy in color image steganography, IEEE Transactions on Circuits and Systems for Video Technology (2019). doi:10.1109/TCSVT.2019.2896270.
[40] F. Pascal, L. Bombrun, J. Tourneret, Y. Berthoumieu, Parameter Estimation For Multivariate Generalized Gaussian Distributions, IEEE Transactions on Signal Processing 61 (23) (2013) 5960–5971. doi:10.1109/TSP.2013.2282909.
[41] J. R. Hernandez, M. Amado, F. Perez-Gonzalez, DCT-domain watermarking techniques for still images: detector performance analysis and a new structure, IEEE Transactions on Image Processing 9 (1) (2000) 55–68. doi:10.1109/83.817598.
[42] T. Ojala, M. Pietikäinen, T. Mäenpää, Multiresolution Gray Scale and Rotation Invariant Texture Classification with Local Binary Patterns, IEEE Transactions on Pattern Analysis and Machine Intelligence 24 (7) (2002) 971–987. doi:10.1109/ICIP.2008.4712139.
[43] A. D. Ker, Steganalysis of LSB matching in grayscale images, IEEE Signal Processing Letters 12 (6) (2005) 441–444.
[44] Z. Xia, X. Wang, X. Sun, Q. Liu, N. Xiong, Steganalysis of LSB matching using differences between nonadjacent pixels, Multimedia Tools and Applications 75 (4) (2016) 1947–1962. doi:10.1007/s11042-014-2381-8.
[45] T. Pevný, J. Fridrich, Merging Markov and DCT features for multi-class JPEG steganalysis, Security, Steganography, and Watermarking of Multimedia Contents IX 6505 (2007) 28–40. doi:10.1117/12.696774.
[46] J. Kodovský, J. Fridrich, Steganalysis of JPEG images using rich models, Media Watermarking, Security, and Forensics 2012 8303 (2012) 81–93. doi:10.1117/12.907495.
[47] C. Li, Y. Zhang, E. Y. Xie, When an attacker meets a cipher-image in 2018: A year in review, Journal of Information Security and Applications 48 (2019) 102361. doi:10.1016/j.jisa.2019.102361.