Audio steganalysis with Hausdorff distance higher order statistics using a rule based decision tree paradigm


Expert Systems with Applications 37 (2010) 7469–7482

Contents lists available at ScienceDirect

Expert Systems with Applications journal homepage: www.elsevier.com/locate/eswa

S. Geetha a,*, N. Ishwarya a, N. Kamaraj b

a Network Security Labs, Department of Information Technology, Thiagarajar College of Engineering, Madurai, India
b Department of Electrical and Electronics Engineering, Thiagarajar College of Engineering, Madurai, India

Article info

Keywords: Information forensics; Audio steganalysis; Hausdorff distance statistics; Rule based decision tree approach

Abstract

The aim of this paper is to construct a practical forensic steganalysis tool for audio signals that can properly analyze the statistics disturbed by stego embedding and classify them according to selected current steganographic methods. The objective is to show that the choice of effective stego-sensitive features and a proficient machine learning paradigm enhances the detection accuracy of the steganalyser. A rule based approach with a family of six decision tree classifiers, viz. Alternating Decision Tree, Decision Stump, J48, Logical Model Tree, Naïve Bayes Tree and Fast Decision Tree learner, is introduced to detect audio subliminal channels. In particular, the higher order statistics extracted from the Hausdorff distance are investigated as audio steganalytic features for improving detection performance. The enhanced feature space and the decision tree paradigm are evaluated on a database containing 4800 clean and stego audio files, for classical steganographic as well as watermarking algorithms. With this strategy it is shown how a general forensic approach can detect information hiding techniques in the field of covert communication as well as in DRM applications. For the latter case, detecting the presence of a potential watermark in a specific feature space can lead to new attacks or to a better design of the watermarking pattern.

© 2010 Elsevier Ltd. All rights reserved.

1. Motivation and the application scenario of audio steganalysis

Steganography is the art of invisibly burying sensitive data in decoy information (the cover) such that the very existence of the data is hidden from everyone except the communicating parties. Secret messages can be embedded in text, graphics (images), video or sound (audio files), basically anything that can be represented as a bit stream. The logic behind the process is that these digital media may be slightly altered by injecting the secret data and yet still look exactly the same to the human eye, or sound identical to the human ear. In this competitive information era, the threat of steganography has become a major issue in corporate life. Many organizations protect their internal network resources and information with sophisticated security measures, such as firewalls. Many firewalls can block e-mail attachments such as executables, spreadsheets, and documents, and do so by looking for file extensions. Some security measures, or content filters, can determine whether a particular file or attachment is of a type to be blocked, a spreadsheet for instance, by analyzing the contents of the file. But innocent-looking steganographic files pass through these security checks, so confidential information can be leaked out of organizations easily and undetectably by disgruntled employees.

* Corresponding author. Tel.: +91 98425 50862. E-mail address: [email protected] (S. Geetha).
0957-4174/$ - see front matter © 2010 Elsevier Ltd. All rights reserved. doi:10.1016/j.eswa.2010.04.012

In addition, steganography has become a tool for terrorists to exchange information globally. What makes this covert communication possible is the enormous computing power on desktops today, the massive volume of electronic communications, and the number of easily available tools that allow even a routine user to employ steganographic techniques. For example, commercial spies or traitors may steal confidential trading or technical messages and deliver them to competitors for great benefit by using hiding techniques. Terrorists may also use related techniques to coordinate international attacks (like the 9/11 event in the U.S.) and to avoid being traced. Others may even consider conveying a computer virus or Trojan horse program via data hiding techniques. By far the biggest threat is the potential for concealing steganographic writing within digital images and audio. The best way to guard against all the mentioned hazards of modern steganography is "steganalysis". Steganalysis is a countermeasure to steganography and aims at detecting the presence of hidden information in seemingly innocuous signals. Steganalysis, being the quality evaluation criterion of steganography, can be used not only to advance the security of steganography but also to prevent its abuse by lawbreakers. When steganography is used to hide messages, a certain level of distortion and degradation may occur in the carrier; however, it might not be easily detected by the human visual/auditory system.


Comparing the distortion, degradation and space considerations against a "normal" cover medium could make it possible to detect whether steganographic content exists. Audio steganalysis has so far been explored less than image steganalysis by the information hiding community. Many efficient universal steganalysis systems exist for breaking image stego systems (Fridrich, 2004; Lyu & Farid, 2006; Miche et al., 2006; Nung & Shiang, 2005; Wang & Moulin, 2007; Özer, Sankur, Memon, & Avcıbas, 2003; Özer, Sankur, Memon, & Avcıbas, 2006), but only a few approaches exist for the audio domain. This may be for the following reasons. The existing audio steganography schemes are advanced, i.e., they offer excellent audio imperceptibility, like the ones demonstrated in Kaliappan, Stanley, Scott, and Darren (2003) and Gopalan (2004), for example. Additionally, the very nature of the audio signal as a high-capacity data stream makes statistical analysis challenging. In particular, inter-window analyses (which consider the evolution of the signal over time) are possible on this continuous medium and distinguish audio signals from the image domain. All these factors motivated us to devise an efficient audio forensics tool. Steganalysis is an intrinsically difficult problem. By the weakest definition, a steganographic algorithm is considered to be defeated if there exists a method that can determine whether or not a medium contains hidden information with a success rate better than random guessing. While the number of freeware packages available for steganography is increasing each year, the detection of most of these methods is neither satisfactory nor fully automated. Extracting the hidden messages from suspicious objects by comparing them to the original version is an uncomplicated task. In practice, however, the original cover signals are rarely portable or accessible.
Hence, detecting the existence of hidden information becomes quite difficult and complex without knowing exactly which embedding algorithm, hiding domain, tunable system parameters and steganographic keys were used. Typically an intuition on the characteristics of cover media is used to determine a decision statistic that captures the effect of data hiding and allows discrimination between natural clean files and those containing hidden data. The question of the optimality of the statistic used is generally left unanswered. Additionally, the question of how to calibrate these statistics is also left open. Besides, by careful selection of the cover media, whose statistics are not sufficiently disturbed by the stego embedding process, to be detected by a steganalytic tool, a steganographer can show his ingenuity. Thus an arms race situation is seen prevailing between steganography and steganalysis: a steganographic method is detected by a steganalysis tool; a new steganographic method is invented to prevent detection, which in turn is found to be susceptible to an improved steganalysis. It is not known then what the limits of steganalysis are, an important question for both the steganographer and steganalyst. It is hoped by careful analysis that some measure of optimal detection can be obtained. Even though nothing might be guessed about the contents of the secret message in passive steganalysis, when the existence of hidden message is known, revealing its content is not always necessary. Just disabling and rendering it useless will defeat the very purpose of steganography. This implies that the warder should be capable of discriminating suspicious objects from a large number of innocuous ones (i.e., the so-called passive steganalysis). On the outset, deciding whether the cover media contains any secret message embedded in it or not is essential to steganalysis. 
Hence this work is aimed to determine whether a multimedia object was modified by information hiding techniques. Once classified, the suspicious objects can then be inspected in detail by any particular data embedding/retrieving algorithms. This pre-process would particularly contribute to save time in active steganalysis. We perceive the steganalysis task as analogous to pattern recognition process, i.e., to decide which class a test file belongs to – clean

or stego. The key issue for steganalysis is the choice of features. The features should be highly sensitive to the data hiding process. In other words, the features should be rather different for the clean, benign signal and for the adulterated stego-signal. The larger the difference, the better the features are. The features should be as general as possible, i.e., they are effective to all different types of audio and different data hiding schemes. Often in practice it is very hard to achieve a high recognition rate with a single feature when the classification such as steganalysis is complicated in nature. Therefore, multidimensional feature vectors need to be used under those circumstances. However, the typical number of possible features is so large as to prohibit any systematic exploration of all but a few possible interaction types (e.g., pair-wise interactions). Large feature sets with noisy numerical data also provide considerable difficulty for traditional symbolic inductive learning systems. The running time of the learning system and the accuracy and complexity of the output rapidly fall below an acceptable level. This work aims at extracting low-dimensional, informative statistical features that are significantly sensitive to the data hiding process. Considering from all these view-points, we propose to employ the higher order statistics derived from the audio distortion measure calculated from Hausdorff distance (Huttenlocher, Klanderman, & Rucklidge, 1993) as features. This dissimilarity metric reported many successful applications in template matching in images (Huttenlocher et al., 1993; Veltkamp and Hagedoorn, 1999). It possesses strong stego discriminatory power which is very helpful in the distortion measurement process. 
Unlike previous work in audio steganalysis that used traditional audio quality metrics, such as Signal-to-Noise Ratio (SNR) and Perceptual Audio Quality Measure (PAQM) (Özer et al., 2003; Özer et al., 2006), the proposed distortion measure is designed specifically to detect modifications to pure audio content. The classifier is trained with the selected features extracted from the clean and stego audio samples. The learning classifier iteratively updates its classification rule based on its prediction and the ground truth. The final stego classifier is obtained upon convergence. When we train the classifier for a specific embedding algorithm, a reasonably accurate detection can be achieved. The classifier learns a model by averaging over multiple examples. Simulation results with the chosen decision tree paradigm, Hausdorff distance statistics and popular watermarking and steganographic strategies show the effectiveness of the proposed approach.

2. Literature review

Classical approaches to data hiding within images can be categorized into spatial or transform (e.g., DCT, DWT, Ridgelet etc.) domains (Katzenbeisser & Petitcolas, 2000; Marvel, Boncelet, & Retter, 1999). Least Significant Bit (LSB) addition (Bender, Gruhl, Morimoto, & Lu, 1996; Nikolaidis & Pitas, 1998) or substitution (Chen, Chang, & Hwang, 1998; Lee & Chen, 2000) is the most popular hiding technique. These techniques operate on the principle of tuning the parameters (e.g., the payload or disturbance) so that the difference between the cover signal and the stego signal is small and imperceptible to the human eye. Yet statistical analysis by computer remains a promising way to detect differences that human beings find difficult to perceive. Some tools, such as StegoDos, S-Tools, and EzStego, provide spatial-domain-based steganographic techniques (Katzenbeisser & Petitcolas, 2000; Petitcolas, Anderson, & Kuhn, 1999).
Zeng, Ai, Hu, and Goa (2008) analyzed a steganalysis method based on statistical moments of peak frequency. Combined with the power cepstrum, it statistically analyzes the peak frequency using short-window extraction, and then calculates the second-order and third-order central moments of the peak frequency as the feature vector. A Bayes classifier is used for classification.


The authors have used various embedding parameters combinations such as hiding segment length, attenuation coefficient, and echo delay for hiding for the proposed steganalysis method. Fridrich, Goljan, and Hogea (2003) presented general methodology for developing attacks on steganographic systems for the JPEG image format. The detection first starts by decompressing the JPEG stego image, geometrically distorting it and recompressing. The paper investigates the use of macroscopic statistics that also predictably changes with the embedded message length. The details of the detection methodology are explained on the F5 algorithm and Outguess. Xu, Zhang, Wang, and Liu (2007) have viewed active steganalysis as blind sources separation (BSS) problem and can be solved with Independent Component Analysis (ICA) algorithm under the assumption that embedded secret message is an independent, identically distributed (i.i.d) random sequence and independent to cover image. Passive, in contrast to active steganalysis detects only the presence or absence of a hidden message. Fridrich (2004) described an improved version of passive steganalysis in which the features for the blind classifier are calculated in the wavelet domain as higher-order absolute moments of the noise residual. The features are calculated from the noise residual because it increases the features’ sensitivity to embedding, which leads to improved detection results. Geetha, Sivatha Sindhu, and Kamaraj (2008) presented a passive approach for audio steganalysis using genetic classifier and audio quality metrics. Specific steganalysis is dedicated to only a given embedding algorithm. It may be very accurate for detecting images embedded with the given steganographic algorithm but it fails to detect those embedded with another algorithm. Techniques developed in Fu, Qi, and Yuan (2007), Fridrich et al. 
(2002), and Zhi, Fen, and Xian (2003) are specific: they target wavelet-, Outguess-, and LSB-based stego systems respectively. Universal steganalysis detects stego images whatever steganographic system is used. Because it can detect a larger class of stego images, it is generally less accurate for any one given steganographic algorithm. The methods presented in Lyu and Farid (2006) and Dai, Liu, and Lin (2008) are universal. Several multi-class steganalyzers have been proposed in recent years by various authors. Savoldi and Gubian (2007) presented an effective multi-class steganalysis system, based on high-order wavelet statistics, capable of attributing stego images to four popular steganographic algorithms. Penvy and Fridrich (2008) constructed a practical forensic steganalysis tool for JPEG images that can properly analyze both single- and double-compressed stego images and classify them according to selected current steganographic methods. In binary steganalysis there are only two classes: the input tests either positive (stego) or negative (pure). Targeted attacks use knowledge of the embedding algorithm (Ru, Zhang, & Huang, 2005), while blind approaches are not tailored to any specific embedding paradigm. Blind approaches can be thought of as practical embodiments of Cachin's (Cachin, 1998) definition of steganographic security. It is assumed that 'natural images' can be characterized using a small set of numerical features. The distribution of features for natural cover images is then mapped out by computing the features for a large database of images. Using methods of artificial intelligence or pattern recognition, a classifier is then built to distinguish in the feature space between natural images and stego images. Avcibas, Memon, and Sankur (2003) were the first to propose using a trained classifier to detect and classify robust data hiding algorithms.
They proposed an image steganalytic system using image quality metrics as features. Avcibas, Kharrazi, Memon, and Sankur (2005) also proposed a different set of features based on binary similarity measures between the lowest bit-planes. The authors used audio quality metrics (Özer et al., 2006) as features and tested their method on watermarking and steganographic systems. Later, in "Audio steganalysis with content independent measures" (2006), they extended the same work, identified six content-independent audio quality metrics, and showed that detection accuracy increased significantly once content dependency was removed from the features. Many later works report the extraction of features suited to revealing the existence of a secret message. Wang and Moulin (2007) investigated the selection of low-dimensional informative features from three angles and proposed a three-level learning of the classifier. The learning process includes selecting a sub-band image representation with better discrimination ability, analyzing the empirical moments of PDFs and of their characteristic functions and comparing their merits, and reducing the dimensionality of the features. The authors studied the statistical effects of additive embedding and explained why empirical characteristic function moments of wavelet coefficients are better choices than empirical PDF moments in image steganalysis. Fu et al. (2007) proposed a wavelet domain audio steganalysis method based on principal component analysis. The audio signal is first decomposed by a 4-level discrete wavelet transform; the 36 statistical moments of the histogram and the frequency domain histogram for both the audio signal and its wavelet subbands are then calculated as features; PCA is applied as preprocessing on the statistical features; and a radial basis function (RBF) network is used as the classifier. Compared to image steganalysis, audio steganalysis is relatively unexplored. Hence we are motivated to develop a steganalytic scheme specifically for audio. In this paper we aim to discover a feature that is highly sensitive to the distortions incurred in the signal by the data hiding operation. Higher order statistics extracted from the Hausdorff distance metric prove to be a candidate feature set.
This work is a first attempt at employing these features for steganalysis and at designing a rule based system for audio steganalysis. An effective machine learning paradigm, a family of decision tree classifiers, is identified to realize the proposed system. The experimental results show that the proposed system (Hausdorff distance and decision tree classifier) achieves a better classification rate than other systems and hence support our claim that the proposed combination is a promising one in the audio steganalysis domain.

3. Proposed steganalyser

For a group of given data samples (e.g., coefficients in any subband of the audio multi-level representation), the first important step of machine learning based audio steganalysis is to choose representative features from the two classes of training audio samples: clean, untouched audio samples and stego-audio with hidden information. Then, a decision function is built based on these feature vectors. The performance of the decision rule depends on the discrimination capabilities of the features. Also, if the feature vector has low dimension, the computational complexity of learning and implementing the decision function will decrease. In summary, we need to find informative, low-dimensional features.

3.1. Preprocessing of the audio signal

The core of the feature extraction process assumes audio files as input media. Therefore audio signals in other representations (e.g. the audio stream of an application) have to be captured into files. This is done by applying specific hardware or software based capturing modules on the host or in the network. Additional pre-processing of the audio data handles the input and provides basic functions for data filtering (bit-plane filtering, silence detection), windowing and media specific operations like channel-interleaving/de-merging. A denoising operation is also performed as a pre-processing step on the audio samples.
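The pre-processing steps named above (channel de-merging, windowing, denoising) can be illustrated with a minimal sketch. It is not the authors' implementation: the function names are hypothetical, and the one-level Haar wavelet with a fixed soft threshold is an assumption, since the text says only that wavelet thresholding is used for denoising.

```python
import math


def deinterleave(samples, channels):
    """Split interleaved samples (L R L R ...) into one list per channel."""
    return [samples[c::channels] for c in range(channels)]


def split_segments(signal, seg_len):
    """Cut a signal into consecutive windows of seg_len samples.

    The last window may be shorter, so ceil(N / seg_len) windows result.
    """
    return [signal[i:i + seg_len] for i in range(0, len(signal), seg_len)]


def soft_threshold(value, threshold):
    """Shrink a coefficient toward zero by `threshold` (soft thresholding)."""
    return math.copysign(max(abs(value) - threshold, 0.0), value)


def haar_denoise(signal, threshold):
    """One-level Haar wavelet denoising of an even-length signal.

    Assumption: Haar wavelet, one level, fixed threshold. Detail
    coefficients are soft-thresholded; approximations are kept as they are.
    """
    root2 = math.sqrt(2.0)
    out = []
    for i in range(0, len(signal) - 1, 2):
        approx = (signal[i] + signal[i + 1]) / root2
        detail = soft_threshold((signal[i] - signal[i + 1]) / root2, threshold)
        # Inverse Haar step rebuilds the two samples from (approx, detail).
        out.extend([(approx + detail) / root2, (approx - detail) / root2])
    return out


if __name__ == "__main__":
    stereo = [1, 10, 2, 20, 3, 30, 4, 40]
    left, right = deinterleave(stereo, 2)
    print(left, right)
    print(split_segments(left, 3))
    print(haar_denoise([1.0, 1.2, 5.0, 4.8], threshold=0.5))
```

With a threshold of zero `haar_denoise` returns its input unchanged (up to rounding), which is a quick sanity check that the forward and inverse Haar steps match.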


Due to the natural noise in the media transmission process, e.g., quantization, sensor, and channel, a number of steganography hiding schemes try to disguise the hidden message as a naturally present noise. As such, a generalized additive noise scheme has been developed in Joachim, Bauml, and Girod (2002) that is capable of embedding data with any given distribution. Moreover, the work in Zhi et al. (2003) shows that most of the steganography algorithms, e.g., Least Significant Bit (LSB) steganography, spread spectrum image steganography, or even more robust and stealthy steganography schemes such as Discrete Cosine Transform (DCT) steganography, can be approximated as an additive noise scheme. The schematic diagram describing the embedding process in the steganography system is shown in Fig. 1. Let C(n) denote a cover audio object and S(n) be its stego-version. Let GN(n) be an independent and identically distributed (i.i.d) Gaussian noise; then the stego-audio can be expressed as S(n) = C(n) + GN(n) with the additive noise model. A good feature should enlarge the distance between C(n) and S(n). Since the distortion metric should be sensitive to the presence of a hidden message and its reaction should be proportional to the embedding strength, Hausdorff distance (Huttenlocher et al., 1993) can be used as a dissimilarity measure. Among dissimilarity measures over digital data, the Hausdorff distance has often been used in the content-based retrieval domain and is known to have successful applications in object matching (Veltkamp & Hagedoorn, 1999). A small distortion can result in a significant distance between two objects. However, in steganalysis, the main issue under consideration is not the content of an audio but the minor distortions introduced during the data-hiding process. As a result, this characteristic of Hausdorff distance makes it very effective for steganalysis. 
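To make the additive form S(n) = C(n) + noise concrete, the following minimal sketch embeds one message bit per sample by LSB substitution and recovers the implied noise term. The function names and sample values are hypothetical. The sketch shows only that LSB embedding can be written as an additive perturbation bounded by one quantization step; it does not test the Gaussian assumption on GN(n).

```python
def lsb_embed(cover, bits):
    """Replace the least significant bit of each integer sample with a message bit.

    Assumption: one bit per sample, len(bits) == len(cover).
    """
    return [(c & ~1) | b for c, b in zip(cover, bits)]


def lsb_extract(stego):
    """Read the message back from the least significant bits."""
    return [s & 1 for s in stego]


cover = [1200, -345, 17, 0, -2, 32767]   # hypothetical 16-bit PCM samples
message = [1, 0, 0, 1, 1, 0]

stego = lsb_embed(cover, message)
# The implied additive noise term: S(n) - C(n).
noise = [s - c for s, c in zip(stego, cover)]

print(stego)
print(noise)                 # every entry is -1, 0 or +1
print(lsb_extract(stego))    # equals the embedded message
```

Because each sample changes by at most one quantization step, the perturbation is inaudible, yet it is exactly the kind of small distortion that the distance measure is meant to expose.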
The Hausdorff distance, which is basically a max-min distance, is very useful in steganalysis because it is sensitive to the distortions caused by the data hiding process. Let the length of each segment of the audio file be $M$. After denoising and wavelet decomposition at level $p$, for the $m$th segment, the wavelet coefficients of the audio file and its denoised version are

\[ x_m^p = \{x_m^1, x_m^2, \ldots, x_m^q\} \tag{1} \]

and

\[ \tilde{x}_m^p = \{\tilde{x}_m^1, \tilde{x}_m^2, \ldots, \tilde{x}_m^q\}, \quad \text{where } q = M/2^p. \tag{2} \]

Then, its distortion measure with Hausdorff distance is:

\[ H_m^p = \max\{h(x_m^p, \tilde{x}_m^p),\; h(\tilde{x}_m^p, x_m^p)\} \tag{3} \]

where $h(x_m^p, \tilde{x}_m^p) = \max_{i=1,2,\ldots,q}\{\min_{j=1,2,\ldots,q} \| x_m^i - \tilde{x}_m^j \|\}$ is the directed Hausdorff distance from $x_m^p$ to $\tilde{x}_m^p$, and $\| x_m^i - \tilde{x}_m^j \|$ is some underlying norm on the points $x_m^i$ and $\tilde{x}_m^j$. Here, we use the absolute difference. However, it is important to note that, in a real communication environment, a "reference audio file" needs to be used since we cannot get specific information about the original cover-audio object. The denoised version of an audio file has already been shown to be a good estimate of the cover signal (Özer et al., 2006). It is to be noted that the denoised version of the stego-audio is still the denoised cover-audio with some i.i.d. Gaussian noise. The goal of the denoising process is to recover the characteristics of the original cover-audio object while removing as much noise as possible. Considering the non-stationary characteristics of audio signals (Geetha et al., 2008; Johnson, Lyu, & Farid, 2005), a smoothing filter may not be very suitable for estimating the cover-object. Among many other techniques, Wiener filtering is a powerful tool for additive noise reduction. In its basic form, Wiener theory assumes that signals are stationary processes. However, this assumption is not realistic for audio signals, whose characteristics change in time and which are therefore considered non-stationary. As a result, we adopt the wavelet denoising technique. Using the thresholding technique (Chen et al., 1998), wavelet approximation allows an adaptive representation of signal discontinuities. Wavelets also provide an unconditional basis for a variety of function spaces and thus better approximation power than the Fourier basis, which helps recover the characteristics of the cover-audio signal more effectively (Chen et al., 1998).

3.2. Feature extraction

Once we get the denoised version of an audio signal, a distortion measurement may be performed to assess the distortion or degradation of the original audio signal.
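The distortion measurement can be sketched as follows, under stated assumptions: the symmetric Hausdorff distance of Eq. (3) with the absolute difference as norm, evaluated per segment, followed by the mean, variance, skewness and kurtosis that this section defines for the resulting distance vector. The function names are hypothetical, the standard deviation is taken as the square root of the (N - 1)-normalized variance, and the fifth feature (the frequency moment) is omitted.

```python
import math


def directed_hausdorff(a, b):
    """h(a, b): largest distance from a point of a to its nearest point of b."""
    return max(min(abs(x - y) for y in b) for x in a)


def hausdorff(a, b):
    """Symmetric Hausdorff distance, absolute difference as the norm."""
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))


def distortion_vector(coeffs, denoised, seg_len):
    """Per-segment Hausdorff distance between coefficients and their
    denoised version (both sequences must have the same length)."""
    return [hausdorff(coeffs[i:i + seg_len], denoised[i:i + seg_len])
            for i in range(0, len(coeffs), seg_len)]


def higher_order_stats(values):
    """Mean, variance (N - 1 normalization), skewness and kurtosis.

    Needs at least two values; a constant vector yields zero skewness
    and kurtosis by convention here (an assumption of this sketch).
    """
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / (n - 1)
    sigma = math.sqrt(variance)
    if sigma == 0.0:
        return mean, variance, 0.0, 0.0
    skewness = sum(((v - mean) / sigma) ** 3 for v in values) / n
    kurtosis = sum(((v - mean) / sigma) ** 4 for v in values) / n
    return mean, variance, skewness, kurtosis


if __name__ == "__main__":
    coeffs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
    denoised = [0.0, 1.0, 2.0, 3.5, 4.0, 4.0]
    d = distortion_vector(coeffs, denoised, seg_len=3)
    print(d)
    print(higher_order_stats(d))
```

Repeating this for each decomposition level and concatenating the per-level statistics would give the kind of multi-level feature vector the section goes on to describe.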
Such a measurement should respond to the presence of a hidden message in an accurate, consistent, and monotonic (with respect to the size of the hidden message) way. It should be noted that, instead of gathering information directly from audio files, signatures of the audio files are generated based on their wavelet coefficients at different levels of resolution and will be used to test the distance of the audio file and its denoised version. The wavelet transform is chosen for its

Fig. 1. Schematic descriptions of (a) additive noise audio-steganography model, (b) denoising a cover-audio object, and (c) denoising a stego-audio object.


well-known capability of multi-level decomposition (Burrus, Gopinath, & Guo, 1998), which can help to enlarge the influence of the additive noise present as a result of embedding. Since the hidden information may modify only a small portion of the cover-objects, the distortion is calculated on predefined small segments separately. At this point, the time–frequency localization (Burrus et al., 1998) characteristic of the wavelet transform may also provide some information about the discontinuities that occur. The Hausdorff distance is calculated between an audio file (clean or stego) and its respective denoised version. Subsequently the higher order statistics of this distance vector, including mean, variance, skewness, kurtosis and moment, are calculated. The feature vector is then formed from these statistical values. Suppose the entire audio has $N$ samples; then the total number of segments is $\lceil N/M \rceil$, where $M$ is the length of each segment. For wavelet decomposition at level $p$, where $p = 0, 1, 2, \ldots, P$, the overall distortion measured using Hausdorff distance is:

\[ D^p = \left\langle H_1^p, H_2^p, \ldots, H_{\lceil N/M \rceil}^p \right\rangle \tag{4} \]

and the feature vector

\[ \left\langle f_1^p, f_2^p, \ldots, f_5^p,\; f_1^{p+1}, f_2^{p+1}, \ldots, f_5^{p+1},\; \ldots,\; f_1^{p+4}, f_2^{p+4}, \ldots, f_5^{p+4} \right\rangle \tag{5} \]

comprising the statistics like mean, variance, skewness, kurtosis and moment, at each level of wavelet decomposition, is constructed. For a group of given co-efficient values, the higher order statistics are:

\[ \text{Mean}: \quad \bar{x} = \frac{1}{N} \sum_{j=1}^{N} x_j \tag{6} \]

\[ \text{Variance}: \quad \mathrm{Var}(x_1, x_2, \ldots, x_N) = \frac{1}{N-1} \sum_{j=1}^{N} (x_j - \bar{x})^2 \tag{7} \]

\[ \text{Skewness}: \quad \mathrm{Skew}(x_1, x_2, \ldots, x_N) = \frac{1}{N} \sum_{j=1}^{N} \left( \frac{x_j - \bar{x}}{\sigma} \right)^3 \tag{8} \]

where $\sigma$ is the standard deviation value,

\[ \text{Kurtosis}: \quad \mathrm{Kurt}(x_1, x_2, \ldots, x_N) = \frac{1}{N} \sum_{j=1}^{N} \left( \frac{x_j - \bar{x}}{\sigma} \right)^4 \tag{9} \]

\[ \text{Moment}: \quad \mathrm{Moment}_i = \frac{\sum_{j=1}^{N} (f_j^p)^i \, d_j^p}{\sum_{j=1}^{N} d_j^p}, \qquad i = 1, 2, \ldots, K \tag{10} \]

where $d_j^p$ is the amplitude of jth frequency component $f_j^p$ to the distortion distances $D^p$ and $K$ is the total number of moments. We used the second order moments because from the experimental observations, they were discovered to possess a superior discriminatory power than the first order moments. In this way, from each level of decomposition, five features are extracted. Totally, we have 25 features calculated from a 4-level wavelet decomposition (0th, i.e. the signal itself, 1st, 2nd, 3rd, 4th levels), and constructed a 25-dimension feature vector representing an audio signal. Listing I summarizes the procedure for the feature extraction process, from an audio signal.

Listing I. Algorithm for Feature Extraction

3.3. Post-processing of the resulting feature vectors

In the proposed steganalyser the post-processing of the resulting feature vectors prepares the subsequent analysis by providing normalization as well as format conversions on the feature vectors. This step is introduced to make the approach more flexible and to allow for different analysis or classification approaches. Before proceeding to evaluate the performance of the classifier, the discrimination capability of the proposed features is to be analyzed. The experiment involves breaking different steganographic or watermarking strategies, which may adopt different embedding techniques ranging from LSB substitution to embedding inside the wavelet coefficients. Hence the feature set has to be normalized before it is fed into the classifier for training, to provide uniform semantics for the feature values. A set of normalized feature vectors, as per the data smoothing function (Minn, 2007),

\[ \hat{f}_i = \frac{f_i - f_i^{\min}}{f_i^{\max} - f_i^{\min}} \tag{11} \]


is calculated for each seed audio to explore relative feature variations before and after it is modified. $\hat{f}_i$, $f_i^{\min}$ and $f_i^{\max}$ represent the normalized $i$th feature and its minimum and maximum values, respectively.

3.4. Machine learning paradigm

The subsequent analysis, the final step in the steganalysis process, is implemented using a family of decision trees that classify the signals as clean or stego bearing. Decision trees (Chiş, 2008; Garcia-Almanza & Tsang, 2008; Kanal, 1979; Landeweerd, Timmers, Gelsema, Bins, & Halic, 1983; Li & Dubes, 1986; Stein et al., 2005; Vafaie & DeJong, 1994) represent a supervised approach to classification. A decision tree is a simple structure where non-terminal nodes represent tests on one or more attributes and terminal nodes reflect decision outcomes. The steganalysis task is a multi-stage decision-making process in which we classify a file as stego-adulterated or not by considering the feature values and the inter-correlation among the features. The basic idea of any multistage approach is to break up a complex decision into a union of several simpler decisions, in the hope that the final solution obtained this way resembles the intended solution. Some motivations (Safavian & Landgrebe, 1991) for using decision tree classifiers on the steganalysis task are:

(1) The global decision regions in steganalysis are complex, especially in a 25-dimensional space, and they can be approximated by the union of simpler local decision regions at various levels of the tree.
(2) In conventional single-stage classifiers each data sample is tested against all classes, which reduces efficiency; in a tree classifier a sample is tested against only certain subsets of classes, thus eliminating unnecessary computations.
(3) In single stage classifiers, only one subset of features is used for discriminating among all classes.
This feature subset is usually selected by a globally optimal criterion, such as maximum average interclass separability (Vafaie & DeJong, 1994). In decision tree classifiers, on the other hand, one has the flexibility of choosing different subsets of features at different internal nodes of the tree such that the feature subset chosen optimally discriminates among the classes in that node. This flexibility may actually provide performance improvement over a single-stage classifier (Mui & Fu, 1980).

(4) In multivariate analysis, with large numbers of features and classes, one usually needs to estimate either high-dimensional distributions (possibly multimodal) or certain parameters of class distributions, such as a priori probabilities, from a given small sized training data set. In so doing, one usually faces the problem of ‘‘high-dimensionality.” This problem may be avoided in a DTC by using a smaller number of features at each internal node without excessive degradation in the performance. In this research we aimed to analyze and study the feasibility of applying decision tree classifiers on steganalysis problem. Hence we implemented the system, with a family of decision tree classifiers (Aha, Kibler, & Albert, 1991) including Alternating Decision Tree (ADT) (Freund and Mason, 1999), Decision Stump (Oliver & Hand, 1994), J48 (Quinlan, 1993), Logical Model Tree (LMT) (Quinlan, 1993), Naïve Baye’s Tree (NBT) (Safavian & Landgrebe, 1991) and Fast Decision Tree (FDT) (Landeweerd et al., 1983) Learner. The functioning model of the proposed audio steganalyzer is depicted in Fig. 2. Although we will not present the specific decision tree algorithm here, we can summarize the general approach, as below: 1. Choose an attribute that best differentiates the output attribute values. 2. Create a separate tree branch for each value of the chosen attribute. 3. Divide the instances into subgroups so as to reflect the attribute values of the chosen node. 4. For each subgroup, terminate the attribute selection process if: a. All members of a subgroup have the same value for the output attribute, terminate the attribute selection process for the current path and label the branch on the current path with the specified value. b. The subgroup contains a single node or no further distinguishing attributes can be determined. As in (a), label the branch with the output value seen by the majority of remaining instances. 5. 
For each subgroup created in (3) that has not been labeled as terminal, repeat the above process. Listing II shows an abstracted description of the algorithm execution.
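The five steps of the general approach can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' Listing II and not any of the six classifiers named above: it assumes categorical attributes, uses information gain as the criterion for "best differentiates" (the text does not name one), and the attribute names and toy instances are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def best_attribute(rows, labels, attributes):
    """Step 1: pick the attribute with the largest information gain (assumed criterion)."""
    def gain(attr):
        remainder = 0.0
        for value in set(row[attr] for row in rows):
            subset = [lab for row, lab in zip(rows, labels) if row[attr] == value]
            remainder += len(subset) / len(labels) * entropy(subset)
        return entropy(labels) - remainder
    return max(attributes, key=gain)


def build_tree(rows, labels, attributes):
    """Return a class label (leaf) or a pair (attribute, {value: subtree})."""
    # Step 4a: every instance in the subgroup has the same output value.
    if len(set(labels)) == 1:
        return labels[0]
    # Step 4b: nothing left to split on, so label with the majority class.
    if not attributes:
        return Counter(labels).most_common(1)[0][0]
    attr = best_attribute(rows, labels, attributes)
    remaining = [a for a in attributes if a != attr]
    branches = {}
    # Steps 2, 3 and 5: one branch per attribute value, then recurse on each subgroup.
    for value in sorted(set(row[attr] for row in rows)):
        keep = [i for i, row in enumerate(rows) if row[attr] == value]
        branches[value] = build_tree([rows[i] for i in keep],
                                     [labels[i] for i in keep], remaining)
    return (attr, branches)


def classify(tree, row):
    """Follow branches until a leaf label is reached."""
    while isinstance(tree, tuple):
        attr, branches = tree
        tree = branches[row[attr]]
    return tree


# Hypothetical, discretized toy instances (not data from the paper).
rows = [
    {"variance_level5": "low", "mean_level1": "low"},
    {"variance_level5": "low", "mean_level1": "high"},
    {"variance_level5": "high", "mean_level1": "low"},
    {"variance_level5": "high", "mean_level1": "high"},
    {"variance_level5": "high", "mean_level1": "low"},
]
labels = ["STEGO", "PURE", "PURE", "PURE", "PURE"]
tree = build_tree(rows, labels, ["variance_level5", "mean_level1"])
print(classify(tree, {"variance_level5": "low", "mean_level1": "low"}))
```

Real decision tree learners such as J48 additionally handle numeric thresholds and pruning, which this sketch omits.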

Listing II. Pseudo code for Decision Tree Classifier


Fig. 2. Functional model of the proposed audio steganalyzer.
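The normalization of Eq. (11) in Section 3.3 can be illustrated with a short Python sketch. It assumes that the minimum and maximum of each feature are taken over the set of feature vectors passed in, and that a constant feature is mapped to 0; neither detail is stated in the text, and the sample values are made up.

```python
def min_max_normalize(vectors):
    """Scale every feature column to [0, 1] following Eq. (11)."""
    columns = list(zip(*vectors))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    normalized = []
    for vec in vectors:
        row = []
        for value, lo, hi in zip(vec, lows, highs):
            # A constant feature has hi == lo; mapping it to 0.0 is an
            # assumption made here to avoid division by zero.
            row.append(0.0 if hi == lo else (value - lo) / (hi - lo))
        normalized.append(row)
    return normalized


# Three made-up feature vectors with two features each.
features = [[2.0, 10.0], [4.0, 30.0], [6.0, 20.0]]
scaled = min_max_normalize(features)
print(scaled)  # [[0.0, 0.0], [0.5, 1.0], [1.0, 0.5]]
```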

4. Test scenario

Two test goals are defined for this work. The primary goal is to prove that the higher order statistics of the Hausdorff distance reliably detect the presence of a given audio hidden channel. The secondary goal is to show that the performance of the steganalyser is enhanced by the use of an efficient machine learning technique, the rule based decision tree approach. The test scenario, test data sets, set-up and procedure adopted for the evaluation of these goals are described in the following sections.

4.1. Test sets and test set-up

This section describes the set of algorithms A, the sets of test files TestFiles and the classification method used in the evaluations.

4.1.1. Information hiding algorithms used

For the performance evaluation of the proposed system, a set of information hiding algorithms A is used. The set A is partitioned into two subsets, AS (audio steganography algorithms) and AW (audio watermarking algorithms), with $A = A_S \cup A_W$.

AS chosen: the following AS are used for testing:

• AS1 – Steghide (version 0.4.3): for detailed descriptions see the Steghide website (Hetzl, 2003) and Kraetzer, Dittmann, and Lang (2006); parameter set: default.

• AS2 – S-Tools (version 4.0): for detailed descriptions see the S-Tools website and Andy Brown (Brown, 2008); parameter set: default, Encryption algorithm = IDEA.

• AS3 – DWT Fusion (Tolba, Ghomey, Taha, & Khalifa, 2004): a cover-escrow algorithm based on wavelet based fusion, embedding IDEA secured messages into PCM coded audio files. The embedding is done by taking the wavelet decomposition of both the cover audio and the secret audio and merging them into a single fused result using an embedding strength factor. Detection is done by subtracting the wavelet coefficients of the cover audio signal from the fused result and then dividing by the embedding strength factor; parameter set: embedding_strength_factor = 0.006.

AW chosen: the following schemes are considered for evaluating digital audio watermarking algorithms:

• AW1 – Spread Spectrum (Cox, Kilian, Leighton, & Shamoon, 1997): a watermarking technique that spreads the watermarked signal over the entire audible frequency spectrum such that it approximates white noise, at a power level low enough to be inaudible. A pseudorandom sequence (chip) is used to modulate a carrier wave, which creates the spread signal watermark code. This code is attenuated to a level roughly equal to 0.5% of the dynamic range of the original audio file before being mixed with the original; parameter set: ECC = on, l = 2000, attenuation_level = 0.5% of the audio signal range.

• AW2 – Echo Data Hiding (Bender et al., 1996): a data hiding technique that copies the original audio signal into two kernels, one which leads the original signal in time and one which lags. Each kernel represents either a zero or a one bit for watermark data transmission. The bit stream of watermark data is used to mix the two kernels together. The signals are mixed with gradually sloped transitions to reduce distortion; parameter set: decay_rate = first_echo_amplitude = 0.5, one_offset = 0.6 ms, zero_offset = 0.9 ms.

• AW3 – Amplitude Modulation (Kaliappan et al., 2003): the embedded information is encoded in the form of the relative phase differences between the amplitude modulations applied to several groups of sub-band signals. Extraction of the amplitude modulations from the watermarked signal is performed by calculating the logarithmic ratio of the amplitude envelopes extracted from the neighboring sub-band signals; parameter set: embedding_region < 11,025 Hz, subband_pairs = 32, subband_groups = 3, frame_period = 5 s (3 Hz), 2 s (12 Hz).
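The embedding and detection arithmetic described for the DWT Fusion scheme (AS3) can be sketched as follows. This is a hedged illustration only: it assumes an additive fusion rule (fused = cover + factor × secret), which is what the described detection step implies, it operates on already-computed coefficient lists instead of performing a wavelet transform, and the coefficient values are invented.

```python
ALPHA = 0.006  # embedding strength factor quoted for AS3


def fuse(cover_coeffs, secret_coeffs, alpha=ALPHA):
    """Merge secret coefficients into cover coefficients (assumed additive rule)."""
    return [c + alpha * s for c, s in zip(cover_coeffs, secret_coeffs)]


def extract(fused_coeffs, cover_coeffs, alpha=ALPHA):
    """Subtract the cover coefficients, then divide by the strength factor."""
    return [(f - c) / alpha for f, c in zip(fused_coeffs, cover_coeffs)]


# Invented coefficient values, used only to show the round trip.
cover = [0.5, -1.25, 2.0, 0.0]
secret = [1.0, -2.0, 0.5, 3.0]
fused = fuse(cover, secret)
recovered = extract(fused, cover)
print(recovered)  # close to `secret`, up to floating-point rounding
```

Because extraction needs the original cover coefficients, the receiver must hold the cover, which is what makes the scheme a cover-escrow one.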


4.1.2. Test files

For the evaluation of the proposed method, our cover audio dataset consists of 200 audio samples from Audio wave files (2006). It forms an audio database with diverse characteristics such as music, male speech, female speech, sporadic files, audio files with silence, instrumental audio samples, etc. These audio samples are sampled at 44,100 Hz, stereo, with 16 bit quantization, and stored as uncompressed, PCM coded WAV files. Their durations range from 10 s to 60 s. Where needed, the files are transformed from their compressed wave format into standard PCM wave format using the Nero Wave Edit software.

It is expected that the difference between a cover audio and its stego version is more easily detected when more secret data are embedded. Hence the payload capacity of a steganography scheme should be taken into account in evaluating the detection capability of a steganalytic classifier. To depict this, the embedding rate (ER) characterizing a scheme, defined as the ratio between the number of embedded bits and the number of samples in an audio file, is also analyzed. Hence, this database is augmented with the stego versions of these audio files using the above mentioned six schemes, at various ERs. A separate data set was also generated by applying signal processing techniques such as compression (at several quality factors), low-pass filtering, equalizing, re-sampling, etc. Our generation procedure aims at making even contributions to the audio database from different embedding schemes, from original or stego, and from processed or non-processed versions, so that the evaluation results can be more reliable and fair. Three different ERs are attempted for each scheme in generating the database: (#1) 5%, (#2) 10% and (#3) 20% of the maximum payload capacity prescribed by the techniques. The entire database contains 200 × 4 × 6 = 4800 audio files, i.e. (number of audio files) × (number of ER settings: the three ERs of 5%, 10% and 20%, plus one for the clean set) × (number of schemes evaluated). Three-fourths (3600) of the files of each type are used for training and the remaining files (1200) are used for testing.

All the records are denoised to get the respective reference signal using the wavelet shrinkage method (Donoho & Johnstone, 1995). Nonlinear soft thresholding was employed, with a data dependent threshold value. Wavelet shrinkage denoising is performed with Version 4.6c1 of the WAVBOX Software Library (Taswell, 1993) using orthogonal discrete wavelet transforms and wavelet filters from the systematized collection of Daubechies wavelets.

4.1.3. Machine learning paradigm used

For the classification in the steganalyser evaluation, the model incorporates a family of six decision tree classifiers, namely Logical Model Tree (LMT) (Quinlan, 1993), Decision Stump (Oliver & Hand, 1994), Naïve Bayes Tree (NBT) (Safavian & Landgrebe, 1991), J48 (Quinlan, 1993), Fast Decision Tree (FDT) learner (Landeweerd et al., 1983) and Alternating Decision Tree (ADT) (Freund and Mason, 1999), implemented in Java, and the algorithm described in Listing I is implemented as per the proposed framework. The stepwise approach is as follows. The input to the system is given as an attribute-relation file format (ARFF) file. A table is created in Oracle using the name specified in "@relation". The attributes specified under "@attribute" and the instances specified under "@data" are retrieved from the ARFF file and then added to the created table. This procedure is followed for providing the training set as well as the test set.
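To make the ARFF input concrete, the following Python sketch reads the "@relation", "@attribute" and "@data" parts of a small ARFF document into memory. It is not the authors' Java/Oracle loader: the relation name, attribute names and data values below are hypothetical, only two of the 25 features are shown, and the parser deliberately ignores ARFF features such as quoted names and sparse data.

```python
def parse_arff(text):
    """Parse a small ARFF document into (relation, attribute names, data rows)."""
    relation, attributes, rows = None, [], []
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):  # skip blank lines and comments
            continue
        lowered = line.lower()
        if in_data:
            rows.append([field.strip() for field in line.split(",")])
        elif lowered.startswith("@relation"):
            relation = line.split(None, 1)[1]
        elif lowered.startswith("@attribute"):
            attributes.append(line.split(None, 2)[1])
        elif lowered.startswith("@data"):
            in_data = True
    return relation, attributes, rows


# Hypothetical file contents; names and values are invented for illustration.
SAMPLE = """\
% two illustrative features plus the class label
@relation audio_steganalysis
@attribute mean_level1 numeric
@attribute variance_level1 numeric
@attribute class {PURE,STEGO}
@data
41.2,0.0009,STEGO
160.5,0.0031,PURE
"""

relation, attributes, rows = parse_arff(SAMPLE)
print(relation, attributes, len(rows))
```

In the described system the parsed relation name, attributes and instances would then populate a database table; here they simply stay in Python lists.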

4.2. Test procedure

As an initial step, all required test files are generated as described in Section 4.1.2. After this step, the four steps of the proposed steganalyser described in Section 3 are performed to generate the statistical data and classifications required for the evaluation of the test goals.

4.2.1. Pre-processing of the audio data

Wavelet de-noising is applied to the given audio signal $x(n)$ to get its de-noised version $\tilde{x}(n)$. The signals $x(n)$ and $\tilde{x}(n)$ are divided into segments with a pre-defined window size of 1024, without the overlapping option. The wavelet coefficients $C_m^p$ and $\tilde{C}_m^p$ at different levels p, p + 1, p + 2, p + 3, p + 4 for the segment m are calculated.

4.2.2. Feature extraction from the signal

In this step, the feature vectors are computed from the audio sample. Applying the proposed methodology as per the algorithm in Listing 1, the feature vector F was constructed with the 25 higher-order statistics extracted from the Hausdorff distance metric.

4.2.3. Post-processing of the resulting feature vectors

Before the performance of the classifier is evaluated, the discrimination capability of the proposed features has to be analyzed. The experiment involves breaking different steganographic or watermarking strategies, which may adopt widely different embedding techniques ranging from LSB substitution to embedding inside the wavelet coefficients. After the feature vectors are computed, each is identified as belonging to an original or a marked file, and the complete vector field is normalized as per Eq. (11) before being fed into the classifier for training, to give uniform semantics to the feature values.

4.2.4. Analysis

For assessing the performance of the proposed steganalyser, the following measures are calculated. Let us denote the clean and the stego audio files as two disjoint sets:

$$\text{actual stego}^{+} = \{\text{all the stego files}\}, \qquad \text{actual stego}^{-} = \{\text{all the clean files}\} \tag{12}$$

The classification produced by the decision trees under evaluation is again put into two disjoint sets, denoted as

$$\text{predicted stego}^{+} = \{\text{all the files that the system rates as stego}\}, \qquad \text{predicted stego}^{-} = \{\text{all the files that the system rates as clean}\} \tag{13}$$

Taken together, the partitions in Eqs. (12) and (13) produce a four-cell partition of the original collection of N items, where N denotes the total number of audio samples; in this case N = 3600 in training and N = 1200 in testing. Based on these values, the measures of performance evaluation used are given as

$$\text{TP Rate} = \frac{\text{predicted stego}^{+}}{\text{predicted stego}^{+} + \text{actual stego}^{-}} \tag{14}$$

$$\text{FP Rate} = \frac{\text{predicted stego}^{-}}{\text{predicted stego}^{-} + \text{actual stego}^{-}} \tag{15}$$

$$\text{Precision} = \frac{\text{predicted stego}^{+}}{\text{predicted stego}^{+} + \text{predicted stego}^{-}} \tag{16}$$

$$\text{Recall} = \text{TP Rate} = \frac{\text{predicted stego}^{+}}{\text{predicted stego}^{+} + \text{actual stego}^{-}} \tag{17}$$

$$\text{F-measure} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \tag{18}$$

4.3. Results and discussion

Some of the sample rules that evolved in the system during the training stage of the DTC are shown.

Sample rule for identifying spread spectrum watermarking technique.

If variance at level5 > 0.007772 and mean at level4 <= 35.781 and skew at level1 <= 0.000277 and mean at level5 <= 8.1233
    If skew at level5 <= 0.00061 Then STEGO
    Else If skew at level5 > 0.00061 and moment at level1 <= 7.6072
        If skew at level4 <= 0.000447 Then STEGO
        Else If skew at level4 > 0.000447
            If mean at level1 <= 154.04 Then PURE
            Else If mean at level1 > 154.04 Then STEGO
    Else If skew at level5 > 0.00061 and moment at level1 > 7.6072 Then PURE
If variance at level5 > 0.007772 and mean at level4 <= 35.781 and skew at level1 <= 0.000277 and mean at level5 > 8.1233 Then PURE

Sample rule for identifying S-Tools steganography technique.

If mean at level5 <= 7.0776 and moment at level4 <= 8.547
    If variance at level1 <= 0.002388 Then STEGO
    Else If variance at level1 > 0.002388
        If mean at level1 <= 46.711
            If mean at level1 <= 43.041 Then PURE Else STEGO
        Else PURE
Else If mean at level5 > 7.0776 and moment at level4 > 8.547 Then STEGO

Sample rule for identifying DWT Fusion steganography technique.

If skew at level1 <= 0.000277
    If variance at level5 <= 0.007205
        If mean at level5 <= 7.8343 Then STEGO
        Else If kurtosis at level5 <= 20.557 Then PURE Else STEGO
    Else If skew at level4 <= 0.000028 Then STEGO
Else If skew at level1 > 0.000277 Then PURE

Sample rule for identifying Steghide steganography technique.

If skew at level1 <= 0.000277
    If variance at level5 <= 0.007205 and mean at level5 <= 7.8343 Then STEGO Else PURE
Else If skew at level2 <= 0.047442
    If variance at level3 <= 0.026054
        If variance at level2 <= 0.016622 Then PURE Else STEGO
    Else PURE
Else If kurtosis at level5 <= 2.1592 Then PURE Else STEGO

Sample rule for identifying amplitude modulation watermarking technique.

If mean at level1 <= 213.25
    If moment at level3 <= 4.4781 Then PURE
    Else If kurtosis at level4 <= 1.161 Then STEGO
    Else If moment at level1 <= 12.295
        If kurtosis at level5 <= 2.0874
            If skew at level4 <= 0.000654 Then STEGO Else PURE
        Else PURE
    Else STEGO
Else STEGO

Sample rule for identifying Echo hiding watermarking technique.

If moment at level4 > 3.375 and mean at level4 <= 20.493 and kurtosis at level1 <= 2.195
    If moment at level1 <= 6.8982 Then STEGO
    Else PURE
If kurtosis at level1 > 2.195 and mean at level2 <= 32.098
    If mean at level1 <= 54.486 Then STEGO Else PURE
If kurtosis at level1 > 2.195 and mean at level2 > 32.098 Then STEGO
Else If moment at level4 > 3.375 and mean at level4 > 20.493
    If kurtosis at level1 <= 1.283 Then STEGO Else PURE

Some of the simple rules that evolved.

(1) If variance at level5 <= 0.007772 and variance at level1 <= 0.00114 and mean at level1 <= 97.158 Then STEGO
(2) If variance at level5 <= 0.007772 and variance at level1 <= 0.00114 and mean at level1 > 97.158 Then PURE
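As an illustration of how such rules act on a feature vector, the two simple rules (1) and (2) can be written as a small Python function. The dictionary keys are hypothetical names for the corresponding statistics, the test vectors are invented, and returning None for vectors that neither rule covers is a choice made for this sketch.

```python
def apply_simple_rules(features):
    """Return 'STEGO', 'PURE', or None when neither simple rule applies."""
    if (features["variance_level5"] <= 0.007772
            and features["variance_level1"] <= 0.00114):
        if features["mean_level1"] <= 97.158:
            return "STEGO"  # simple rule (1)
        return "PURE"       # simple rule (2)
    return None             # outside the region covered by rules (1) and (2)


# Invented feature vectors, chosen only to exercise each branch.
print(apply_simple_rules(
    {"variance_level5": 0.005, "variance_level1": 0.001, "mean_level1": 80.0}))
print(apply_simple_rules(
    {"variance_level5": 0.005, "variance_level1": 0.001, "mean_level1": 120.0}))
print(apply_simple_rules(
    {"variance_level5": 0.010, "variance_level1": 0.001, "mean_level1": 80.0}))
```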


The classification and error rates obtained for the different schemes and classifiers are listed in Tables 1–7. From the results we observe that S-Tools (scheme #2), with an average PD rate of 92.12%, is highly vulnerable and easily broken by the proposed system. Spread spectrum based data hiding (scheme #4) yields the highest false alarm rate, 37.67%, which shows it to be a challenging system to break. Nevertheless, an average PD rate of 90.97% is a promising measure for a universal steganalyser. We also find that the J48 DTC, with an average PD rate of 91.4%, collectively gives better results across the different watermarking and steganography strategies. Decision stump

Table 1
Performance evaluation of the proposed system for the six embedding techniques using the J48 decision tree classifier (proposed Hausdorff distance based steganalysis).

| Embedding technique | Accuracy (%) | TP rate | FP rate | Precision | Recall | F-measure |
|---|---|---|---|---|---|---|
| Steghide | 88 | 0.8933 | 0.2759 | 0.8933 | 0.9437 | 0.9178 |
| S-Tools | 92.12 | 0.9767 | 0.0848 | 0.9767 | 0.9228 | 0.9490 |
| DWT fusion | 85 | 0.8450 | 0.3496 | 0.8450 | 0.9494 | 0.8942 |
| Spread spectrum | 84 | 0.8167 | 0.3767 | 0.8167 | 0.9646 | 0.8845 |
| Echo data hiding | 90 | 0.9050 | 0.2436 | 0.9050 | 0.9594 | 0.9314 |
| Amplitude modulation | 87.25 | 0.8817 | 0.2958 | 0.8817 | 0.9446 | 0.9121 |
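The relation between the columns of Table 1 can be checked with a short Python sketch. The F-measure follows Eq. (18) directly; the remaining measures are computed here from confusion-matrix counts under the standard definitions (true/false positives and negatives), which is an assumed reading of the sets in Eqs. (12) and (13). The counts in the example are invented.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall, as in Eq. (18)."""
    return 2 * precision * recall / (precision + recall)


def rates(tp, fp, tn, fn):
    """Standard confusion-matrix measures (assumed interpretation)."""
    recall = tp / (tp + fn)      # TP rate: stego files correctly flagged
    fp_rate = fp / (fp + tn)     # clean files wrongly flagged as stego
    precision = tp / (tp + fp)   # flagged files that really are stego
    return {
        "tp_rate": recall,
        "fp_rate": fp_rate,
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure(precision, recall),
    }


# Precision and recall of the Steghide row of Table 1 give its F-measure.
print(round(f_measure(0.8933, 0.9437), 4))  # 0.9178

# Invented counts: 90 of 100 stego files flagged, 20 of 100 clean files flagged.
example = rates(tp=90, fp=20, tn=80, fn=10)
print(example)
```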

Table 2
Performance evaluation of the proposed system for the Steghide scheme using the six decision tree classifiers (spatial domain steganography method: Steghide).

| Classifier | Detection percentage | TP rate | FP rate | Precision | Recall | F-measure |
|---|---|---|---|---|---|---|
| Alternating decision tree (AD tree) | 74.8750 | 0.8859 | 0.5018 | 0.7633 | 0.8859 | 0.8201 |
| Decision stump | 67.8750 | 0.8507 | 0.5916 | 0.6933 | 0.8507 | 0.7640 |
| J48 | 88.0000 | 0.9437 | 0.2759 | 0.8933 | 0.9437 | 0.9178 |
| Logical model tree (LMT) | 81.8750 | 0.9221 | 0.3946 | 0.8283 | 0.9221 | 0.8727 |
| Naïve Bayes | 83.0000 | 0.9312 | 0.3779 | 0.8350 | 0.9312 | 0.8805 |
| Fast decision tree learner (REP tree) | 70.2500 | 0.8635 | 0.5629 | 0.7167 | 0.8635 | 0.7832 |

Table 3
Performance evaluation of the proposed system for the S-Tools scheme using the six decision tree classifiers (spatial domain steganography method: S-Tools).

| Classifier | Detection percentage | TP rate | FP rate | Precision | Recall | F-measure |
|---|---|---|---|---|---|---|
| Alternating decision tree (AD tree) | 81.2500 | 0.9294 | 0.4094 | 0.8117 | 0.9294 | 0.8665 |
| Decision stump | 70.8750 | 0.8847 | 0.5511 | 0.7033 | 0.8847 | 0.7837 |
| J48 | 92.1250 | 0.9228 | 0.0848 | 0.9767 | 0.9228 | 0.9490 |
| Logical model tree (LMT) | 91.7500 | 0.9525 | 0.1810 | 0.9367 | 0.9525 | 0.9445 |
| Naïve Bayes | 91.6250 | 0.9449 | 0.1692 | 0.9433 | 0.9449 | 0.9441 |
| Fast decision tree learner (REP tree) | 90.1250 | 0.9643 | 0.2469 | 0.9017 | 0.9643 | 0.9320 |

Table 4
Performance evaluation of the proposed system for the DWT fusion scheme using the six decision tree classifiers (transform domain steganography method: DWT fusion).

| Classifier | Detection percentage | TP rate | FP rate | Precision | Recall | F-measure |
|---|---|---|---|---|---|---|
| Alternating decision tree (AD tree) | 77.8750 | 0.9238 | 0.4618 | 0.7683 | 0.9238 | 0.8389 |
| Decision stump | 64.1250 | 0.8410 | 0.6276 | 0.6433 | 0.8410 | 0.7290 |
| J48 | 85.0000 | 0.9494 | 0.3496 | 0.8450 | 0.9494 | 0.8942 |
| Logical model tree (LMT) | 73.3750 | 0.8925 | 0.5212 | 0.7333 | 0.8925 | 0.8051 |
| Naïve Bayes | 77.3750 | 0.9116 | 0.4674 | 0.7733 | 0.9116 | 0.8368 |
| Fast decision tree learner (REP tree) | 82.7500 | 0.9342 | 0.3843 | 0.8283 | 0.9342 | 0.8781 |

Table 5
Performance evaluation of the proposed system for the spread spectrum scheme using the six decision tree classifiers (transform domain watermarking method: spread spectrum).

| Classifier | Detection percentage | TP rate | FP rate | Precision | Recall | F-measure |
|---|---|---|---|---|---|---|
| Alternating decision tree (AD tree) | 75.3750 | 0.9022 | 0.4950 | 0.7533 | 0.9022 | 0.8211 |
| Decision stump | 76.6250 | 0.9073 | 0.4778 | 0.7667 | 0.9073 | 0.8311 |
| J48 | 84.0000 | 0.9646 | 0.3767 | 0.8167 | 0.9646 | 0.8845 |
| Logical model tree (LMT) | 79.8750 | 0.9229 | 0.4306 | 0.7983 | 0.9229 | 0.8561 |
| Naïve Bayes | 80.3750 | 0.9251 | 0.4229 | 0.8033 | 0.9251 | 0.8599 |
| Fast decision tree learner (REP tree) | 80.1250 | 0.9232 | 0.4265 | 0.8017 | 0.9232 | 0.8582 |

Table 6
Performance evaluation of the proposed system for the echo data hiding scheme using the six decision tree classifiers (spatial domain watermarking method: echo data hiding).

| Classifier | Detection percentage | TP rate | FP rate | Precision | Recall | F-measure |
|---|---|---|---|---|---|---|
| Alternating decision tree (AD tree) | 79.6250 | 0.9210 | 0.4342 | 0.7967 | 0.9210 | 0.8543 |
| Decision stump | 68.6250 | 0.8674 | 0.5785 | 0.6867 | 0.8674 | 0.7665 |
| J48 | 85.0292 | 0.9680 | 0.3767 | 0.8321 | 0.9680 | 0.8949 |
| Logical model tree (LMT) | 77.2500 | 0.8129 | 0.4318 | 0.9050 | 0.8129 | 0.8565 |
| Naïve Bayes | 81.1250 | 0.9293 | 0.4116 | 0.8100 | 0.9293 | 0.8655 |
| Fast decision tree learner (REP tree) | 83.3750 | 0.9397 | 0.3755 | 0.8317 | 0.9397 | 0.8824 |

Table 7
Performance evaluation of the proposed system for the amplitude modulation scheme using the six decision tree classifiers (spatial domain watermarking method: amplitude modulation).

| Classifier | Detection percentage | TP rate | FP rate | Precision | Recall | F-measure |
|---|---|---|---|---|---|---|
| Alternating decision tree (AD tree) | 77.8750 | 0.9139 | 0.4602 | 0.7783 | 0.9139 | 0.8407 |
| Decision stump | 68.3750 | 0.8668 | 0.5810 | 0.6833 | 0.8668 | 0.7642 |
| J48 | 87.2500 | 0.9446 | 0.2958 | 0.8817 | 0.9446 | 0.9121 |
| Logical model tree (LMT) | 86.2468 | 0.9124 | 0.2754 | 0.9014 | 0.9124 | 0.9069 |
| Naïve Bayes | 84.8750 | 0.9331 | 0.3401 | 0.8600 | 0.9331 | 0.8951 |
| Fast decision tree learner (REP tree) | 82.3750 | 0.9338 | 0.3911 | 0.8233 | 0.9338 | 0.8751 |

Table 8
Classification and error rates for test sets at various embedding rates by the J48 classifier.

| Scheme | Classification rate, 5% of maximum payload (%) | Classification rate, 10% of maximum payload (%) | Classification rate, 20% of maximum payload (%) | Error rate, 5% of maximum payload (%) | Error rate, 10% of maximum payload (%) | Error rate, 20% of maximum payload (%) |
|---|---|---|---|---|---|---|
| #1 | 82.3 | 83.2 | 89.4 | 17.70 | 16.80 | 10.60 |
| #2 | 89.6 | 92.3 | 93.6 | 10.40 | 7.70 | 6.40 |
| #3 | 75.5 | 77.3 | 80.6 | 24.50 | 22.70 | 19.40 |
| #4 | 74.2 | 78.6 | 81.2 | 25.80 | 21.40 | 18.80 |
| #5 | 85.4 | 88.4 | 92.8 | 14.60 | 11.60 | 7.20 |
| #6 | 84 | 85.7 | 89.5 | 16.00 | 14.30 | 10.50 |
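The per-embedding-rate averages quoted in the discussion of the embedding rate (81.83%, 84.25% and 87.85%) can be reproduced from the classification rates of Table 8. The Python sketch below does this arithmetic; it also assumes, as the table values suggest, that each error rate is 100 minus the corresponding classification rate.

```python
# Classification rates (%) from Table 8, schemes #1 to #6, keyed by embedding rate (%).
classification_rate = {
    5: [82.3, 89.6, 75.5, 74.2, 85.4, 84.0],
    10: [83.2, 92.3, 77.3, 78.6, 88.4, 85.7],
    20: [89.4, 93.6, 80.6, 81.2, 92.8, 89.5],
}

# Mean classification rate over the six schemes at each embedding rate.
averages = {er: sum(vals) / len(vals) for er, vals in classification_rate.items()}

# Error rates derived as the complement of the classification rates.
error_rate = {er: [100.0 - v for v in vals] for er, vals in classification_rate.items()}

for er in sorted(averages):
    print(f"ER {er}%: average classification rate {averages[er]:.2f}%")
```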

model, with an average PD rate of 71.7%, performs relatively poorly in this environment. This is due to the learning procedure employed by this method, which exploits only one attribute for classification.

We are interested in analyzing the detectability of the proposed features and classifier against embedding schemes of different applications or principles. Table 9 lists classification and error rates to show the differentiation in performance between: (1) the six targeted embedding schemes; (2) steganographic or watermarking applications; (3) spatial or transform operation domain; and (4) types of processed non-stego audio files. We also analyzed the ND rates for the original, re-sampled, equalized, and compressed non-stego audio files. It is found that our system performs better in recognizing the plainness of compressed audio files. The higher ND rate for compressed files is beneficial to real applications, since most audio files will be compressed in the MP3 form.

4.4. Influence of embedding rate

In this experiment, audio files at various payload capacities were selected to see the influence on detectability. The ERs for the six embedding schemes were set at 5%, 10% and 20% of the maximum hiding capacities in their proposed versions. The experimental results are listed in Table 8, which shows that the average PD rate remains at 87.85% for 20%, 84.25% for 10% and 81.83% for 5% of the maximum payload capacity. The results for steganographic schemes (an average PD rate of 91.4%) are more promising than for the watermarking schemes (an average PD rate of 90.56%), as the steganographic schemes carry more hidden data than the watermarking schemes, which makes the measured features more distinguishable for detection. The results reveal clearly that our proposed features and decision tree classifiers still yield reasonable results for stego audio files at lower ERs. The detection rate comparison of the six stego schemes by the six decision tree classifiers is shown in Fig. 3. The comparison based on the application and operational domain of the information hiding algorithms is shown in Fig. 4.

5. Summary and conclusions

In this paper, we presented a new class of decision tree detectors based on the higher order statistics of the Hausdorff distance measure. These features have the ability to explicitly model the local audio discontinuities introduced by the information embedding process. The derived detectors were demonstrated to show notable improvement over their counterparts derived from the state-of-the-art audio models. Also, the J48 detector is demonstrated to achieve better performance than the other decision tree classifiers. Table 10 summarizes and compares characteristics of our proposed method with those of several other previous significant works in the literature. It can be observed that the proposed system is superior to the other systems in important aspects such as average ND rate and lowest detectable payload capacity. This justifies our claim that the proposed features and machine learning paradigm are a promising strategy for steganalysis. The following observations and conclusions are drawn from the results reported:


Table 9
Average PD/ND rates for performance differentiation between different target schemes, different applications, different operation domains, and different types of non-stego audio files.

| Differentiation categories | | Detection rate (%) |
|---|---|---|
| Schemes | #1 | 89.95 |
| | #2 | 93.31 |
| | #3 | 90.88 |
| | #4 | 89.29 |
| | #5 | 90.64 |
| | #6 | 91.75 |
| Applications | Steganography | 91.38 |
| | Watermarking | 90.56 |
| Operation domain | Spatial | 91.3 |
| | Transform | 91.0 |

| Differentiation categories | | ND rate (%) |
|---|---|---|
| Signal processing techniques | Compression | 80.2 |
| | Low pass filtering | 76.5 |
| | Equalizing | 52.6 |
| | Resampling | 62.4 |

(1) The classification performance appears to depend on the proposed set of features extracted from the Hausdorff distance metric, and heavily on the ER and the number of embedding schemes under test.

(2) Similar to Farid (2002) and Özer et al. (2006), our system operates blindly and is not restricted to the detection of a particular steganographic scheme (such as LSB, spread spectrum or echo hiding).

(3) A nonlinear classifier that is easy to adapt to non-separable classes is adopted both in Zeng et al. (2008) and in our system. However, the system introduced in Zeng et al. (2008) was dedicated only to the detection of echo hiding, and few test files were used.

(4) Our training and test database collects a larger number of stego and non-stego audio samples, generated by using different steganographic schemes (six kinds), different embedding rates (5–20% of maximum payload), and different signal processing operations (low-pass filtering, resampling, compression, etc.). This diversity brings our system closer to real applications.

(5) The average classification rate (91.4%) of our proposed system is superior to that of many existing systems in blind steganalysis research.

(6) Decision tree classifiers are a promising candidate machine learning paradigm for the steganalysis domain.

(7) J48 classifiers outperform the other decision trees for the proposed steganalyser model.

Fig. 3. Detection rate comparison of the six stego schemes.

[Plot: "Detection Rate under various strategies"; y-axis: detection rate (0–100); series: TP rate and FP rate; x-axis groups: applications (steganography, watermarking) and operational domain (spatial, transform).]

Fig. 4. Detection rate comparisons for the steganographic and watermarking systems.

Table 10
Summarization of previous works and our proposed system.

| Steganalytic systems | Fu et al. (2007) | Ru et al. (2005) | Özer et al. (2006) | Zeng et al. (2008) | Avcibas (2006) | Proposed system |
|---|---|---|---|---|---|---|
| Number of features | 36 | 20 | 19 | 2 | 6 | 25 |
| Domains of feature extraction | Transform | Spatial | Perceptual, non-perceptual | Transform | Perceptual, non-perceptual | Transform |
| Training/classifier | Yes/radial basis function and PCA | Yes/SVM | Yes/linear | Yes/Naïve Bayes | Yes/linear | Yes/decision tree classifier |
| Targeted embedding scheme | Arbitrary | Steghide | Arbitrary | Echo data hiding | Arbitrary | Arbitrary |
| Number of test schemes | 3 | 1 | 6 | 1 | 6 | 6 |
| Payload of stego images | 0.016 bpp | >0.01 bpp | >0.05 bpp | 0.65 bpp | >0.05 bpp | >0.01 bpp |
| Size of training database | 250 | 500 | 50 | 600 | 50 | 3600 |
| Number of test images | 150 | 2500 | 50 | 600 | 50 | 1200 |
| Average PD rate | 90% | 97% | 88.6% | 80% | 91% | 91.52% |
| Average ND rate | 81.67% | 89% | 79.3% | 72.6% | 82.5% | 91.3% |
| Side information constraint for classifier | No | No | No | Embedding parameters such as attenuation coefficient and echo delay | No | No |

(8) Spatial domain based stego schemes are relatively more vulnerable than transform domain based techniques, just as steganographic schemes are more vulnerable than watermarking schemes.

To make our system more practical, future work could include the following.

(a) Identifying the type of steganographic algorithm used to generate the stego audio and locating the audio portions exploited to hide secret messages (active steganalysis). After this, we may be able to locate, retrieve, and analyze the embedded messages to infer the conveyed information.

(b) Improving the performance as well as the scalability of the blind steganalyzer using appropriate fusion techniques.

(c) Identifying still more promising features that are sensitive to the data hiding operations.

Acknowledgements

The authors would like to thank Prof. S.V. Raghavan, IIT Madras, for his highly valuable comments and suggestions on this work. Many thanks also to the Management, the Principal and the Head of the Department of Information Technology, Thiagarajar College of Engineering, Madurai, India, for the computing facilities and for the moral support they provided in carrying out this work.

References Aha, D. W., Kibler, D., & Albert, M. K. (1991). Instance based learning algorithms. Machine Learning, 6(1), 37–66. Audio wave files (2006). Wave files from http://www.wavsurfer.com. Avcibas, I. (2006). Audio steganalysis with content independent measures. IEEE Signal Processing Letters, 13(2), 92–95. Avcibas, I., Kharrazi, M., Memon, N., & Sankur, B. (2005). Image steganalysis with binary similarity measures. EURASIP Journal on Applied Signal Processing, Hindawi, 1, 2749–2757. Avcibas, I., Memon, N., & Sankur, B. (2003). Steganalysis using image quality metrics. IEEE Transactions on Image Processing, 12(2), 221–229. Bender, W., Gruhl, D., Morimoto, N., & Lu, A. (1996). Techniques for data hiding. IBM Systems Journal, 35(3–4), 313–336. Brown, A. (2008). S-tools version 4.0. Available from http://members.tripod.com/ steganography/stego/s-tools4.html. Burrus, C. S., Gopinath, R. A., & Guo, H. (1998). Introduction to wavelets and wavelets transforms: a primer. Upper Saddle River, NJ, USA: Prentice Hall. Cachin, C. (1998). An information theoretic model for steganography. In David Aucsmith (Ed.), Information hiding, second international workshop, lecture notes in computer science (Vol. 1525, pp. 306–318.11). Springer: Portland, Oregon, USA.

Chen, T. S., Chang, C. C., & Hwang, M. S. (1998). A virtual image cryptosystem based upon vector quantization. IEEE Transactions on Image Processing, 7(10), 1485–1488. Chisß, M. (2008). Evolutionary decision trees and software metrics for module defects identification. International Journal of Computer and Information Science, 2, 273–277. Cox, I., Kilian, J., Leighton, T., & Shamoon, T. (1997). Secure spread spectrum watermarking for multimedia. IEEE Transactions on Image Processing, 6(12), 1673–1687. Dai, M., Liu, Y., & Lin, J. (2008). Steganalysis based on feature reducts of rough set by using genetic algorithm. In Proceedings of the seventh world congress of intelligent control and automation, WCICA ’08, IEEE (vol. 23, pp. 6764–6768). Chongqing, China, 2008, . Donoho, D. L., & Johnstone, I. M. (1995). Adapting to unknown smoothness via wavelet shrinkage. Journal of the American Statistical Association, 90(432), 1200–1224. Farid H. (2002). Detecting hidden messages using higher-order statistical models. In Proceedings of the international conference on image processing, ICIP 2002, IEEE (pp. 905–908). Rochester, New York, USA. Freund, Y., & Mason, L. (1999). The alternating decision tree algorithm. In Ivan Bratko & Saso Dzeroski (Eds.). Proceedings of the sixteenth international conference on machine learning proceedings machine learning, ICML 1999 (pp. 124–133). Bled, Slovenia: Morgan Kaufmann. Fridrich, J. (2004). Feature based steganalysis for JPEG images and its implications for future design of steganographic schemes. In Jessica J. Fridrich (Ed.). Information hiding, 6th international workshop, IH 2004, lecture notes in computer science (Vol. 3200, pp. 67–81). Toronto, Canada: Springer. Fridrich, J., Goljan, M., & Hogea, D. (2002). Attacking the outguess. In Agnes Hui Chan & Virgil D. Gligor (Eds.). Fifth international conference on information security, ICIS 2002, lecture notes in computer science (Vol. 2433, pp. 3–6). Springer. 
Fridrich, J., Goljan, M., & Hogea, D. (2003). New methodology for breaking steganographic techniques for JPEGs. In E. Delp et al. (Eds.). Security and watermarking of multimedia contents V, SPIE 2003, electronic imaging science and technology (Vol. 5020, pp. 143–155). California, USA: Society of Photo-optical Instrumentation Engineers, Santa Clara.

Fu, J., Qi, Y., & Yuan, J. (2007). Wavelet domain audio steganalysis based on statistical moments and PCA. In Proceedings of the international conference on wavelet analysis and pattern recognition, ICWAPR ’07, IEEE (pp. 1619–1623). Beijing, China.

Garcia-Almanza, A. L., & Tsang, E. P. K. (2008). Evolving decision rules to predict investment opportunities. International Journal of Automation and Computing, 5(1), 22–31.

Geetha, S., Sivatha Sindhu, S. S., & Kamaraj, N. (2008). AQM-based audio steganalysis and its realization through rule induction with a genetic learner. International Journal of Signal and Imaging Systems Engineering, 1(2), 99–107.

Gopalan, K. (2004). Cepstral domain modification of audio signals for data embedding: preliminary results. In Edward J. Delp, & Ping Wah Wong (Eds.), Security, steganography, and watermarking of multimedia contents VI, SPIE 2004, electronic imaging science and technology (Vol. 5306, pp. 151–161). Society of Photo-optical Instrumentation Engineers: San Jose, California, USA.

Hetzl, S. (2003). Steghide. Available from http://steghide.sourceforge.net.

Huttenlocher, D. P., Klanderman, G. A., & Rucklidge, W. J. (1993). Comparing images using the Hausdorff distance. IEEE Transactions on Pattern Analysis and Machine Intelligence, 15(9), 850–863.

Joachim, J. E., Bauml, R., & Girod, B. (2002). A communications approach to image steganography. In SPIE proc. of security and watermarking of multimedia contents (pp. 26–37).

Johnson, M. K., Lyu, S., & Farid, H. (2005). Steganalysis of recorded speech. In J. Fridrich et al. (Eds.). Security and watermarking of multimedia contents VII, SPIE 2005, electronic imaging science and technology (Vol. 86, pp. 664–672). CA, USA: Society of Photo-Optical Instrumentation Engineers, San Jose.

Lee, Y. K., & Chen, L. H. (2000). High capacity image steganographic model. In Proceedings of the international conference on image and signal processing, ICISP 2000, IEE (Vol. 147, no. 3, pp. 288–294). Agadir, Morocco.

Kaliappan, G., Stanley, W. J., Scott, A. F., & Darren, H. M. (2003). Audio steganography by amplitude or phase modification. In Ping Wah Wong & Edward J. Delp (Eds.). Security and watermarking of multimedia contents V, SPIE 2003, electronic imaging science and technology (Vol. 5020, pp. 67–76). California, USA: Society of Photo-optical Instrumentation Engineers, Santa Clara.

Kanal, L. N. (1979). Problem-solving methods and search strategies for pattern recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1, 193–201.

Katzenbeisser, S., & Petitcolas, F. A. P. (2000). Information hiding techniques for steganography and digital watermarking. Artech House: Norwood, MA.

Kraetzer, C., Dittmann, J., & Lang, A. (2006). Transparency benchmarking on audio watermarks and steganography. In Edward J. Delp, & Ping Wah Wong (Eds.), Security, steganography, and watermarking of multimedia contents VIII, SPIE 2006, electronic imaging science and technology (Vol. 6072, pp. 24–36). Society of Photo-optical Instrumentation Engineers, San Jose, California, USA.

Landeweerd, G., Timmers, T., Gelsema, E., Bins, M., & Halic, M. (1983). Binary tree versus single level tree classification of white blood cells. Pattern Recognition, 16, 571–577.

Li, X., & Dubes, R. C. (1986). Tree classifier design with a permutation statistic. Pattern Recognition, 19, 229–235.

Lyu, S., & Farid, H. (2006). Steganalysis using higher-order image statistics. IEEE Transactions on Information Forensics and Security, 1(1), 111–119.

Marvel, L. M., Boncelet, C. G., Jr., & Retter, C. T. (1999). Spread spectrum image steganography. IEEE Transactions on Image Processing, 8(8), 1075–1083.

Miche, Y., Roue, B., Lendasse, A., & Bas, P. (2006). A feature selection methodology for steganalysis. In Proceedings of the international workshop on multimedia content representation, classification and security (pp. 49–56). Berlin, Heidelberg, Istanbul, Turkey: Springer.

Minn, F. (2007). A novel intrusion detection method based on combining ensemble learning with induction-enhanced particle swarm algorithm. In Proceedings of the international conference on natural computation, ICNC ’07, IEEE (pp. 520–524). Haikou, Hainan, China.

Mui, J., & Fu, K. S. (1980). Automated classification of nucleated blood cells using a binary tree classifier. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2, 429–443.

Nikolaidis, N., & Pitas, I. (1998). Robust image watermarking in the spatial domain. Signal Processing, 66, 385–403.

Nung, L. W., & Shiang, L. G. (2005). A feature based classification technique for blind image steganalysis. IEEE Transactions on Multimedia, 7(6), 1007–1020.

Oliver, J. J., & Hand, D. (1994). Averaging over decision stumps. In Francesco Bergadano & Luc De Raedt (Eds.). Proceedings of the European conference on machine learning, ECML-’94, lecture notes in computer science (Vol. 784, pp. 231–241). Springer.

Özer, H., Sankur, B., Memon, N., & Avcıbas, I. (2003). Steganalysis of audio based on audio quality metrics. In Ping Wah Wong & Edward J. Delp (Eds.). Security and watermarking of multimedia contents V, SPIE 2003, electronic imaging science and technology (Vol. 5020, pp. 55–66). California, USA: Society of Photo-optical Instrumentation Engineers, Santa Clara.

Özer, H., Sankur, B., Memon, N., & Avcıbas, I. (2006). Detection of audio covert channels using statistical footprints of hidden messages. Elsevier Digital Signal Processing, 16, 389–401.

Pevný, T., & Fridrich, J. (2008). Multi-class detector of current steganographic methods for JPEG format. IEEE Transactions on Information Forensics and Security, 3(4), 635–650.

Petitcolas, F. A. P., Anderson, R. J., & Kuhn, M. G. (1999). Information hiding – A survey. IEEE Proceedings – Protection of Multimedia Content, 87(7), 1062–1078.

Quinlan, J. R. (1993). C4.5: Programs for machine learning. San Mateo, CA: Morgan Kaufmann Publishers.

Rasoul Safavian, S., & Landgrebe, David (1991). A survey of decision tree classifier methodology. IEEE Transactions on Systems, Man, and Cybernetics, 21(3), 660–674.

Ru, X., Zhang, H., & Huang, X. (2005). Steganalysis of audio: attacking the StegHide. In S. Daniel Yeung, Zhi-Qiang Liu, Xizhao Wang, & Hong Yan (Eds.), Proceedings of the international conference on machine learning and cybernetics, ICMLC ’05, IEEE (pp. 3937–3942). Guangzhou, China.

Savoldi, A., & Gubian, P. (2007). Blind multi-class steganalysis system using wavelet statistics. In Proceedings of the third international conference on intelligent information hiding and multimedia signal processing, IIHMSP ’07, IEEE computer society (pp. 27–29). Kaohsiung, Taiwan.

Stein, G., Chen, B., Wu, A. S., & Hua, K. A. (2005). Decision tree classifier for network intrusion detection with GA-based feature selection. In Mário Guimarães (Ed.). Proceedings of the 43rd annual southeast regional conference (Vol. 2, pp. 136–141). Kennesaw, Georgia, Alabama, USA: ACM.

Taswell, C. (1993–2001). WAVBOX software library and reference manual. Computational Toolsmiths. Available from www.wavbox.com.

Tolba, M. F., Ghonemy, M. A.-S., Taha, I. A.-H., & Khalifa, A. S. (2004). High capacity image steganography using wavelet-based fusion. In Proceedings of the ninth IEEE symposium on computers and communications, ISCC ’04, IEEE computer society (pp. 430–435). Alexandria, Egypt.

Vafaie, H., & DeJong, K. (1994). Improving a rule induction system using genetic algorithms. In Machine learning: A multistrategy approach (Vol. 4, pp. 453–469). San Mateo, CA: Morgan Kaufmann.

Veltkamp, R. C., & Hagedoorn, M. (1999). State-of-the-art in shape matching. Technical report UU-CS-1999-27, Utrecht University, Netherlands.

Wang, Y., & Moulin, P. (2007). Optimized feature extraction for learning based image steganalysis. IEEE Transactions on Information Forensics and Security, 2(1), 31–45.

Xu, B., Zhang, Z., Wang, J., & Liu, X. (2007). Improved BSS based schemes for active steganalysis. In Feng Wenying, Feng Gao (Eds.), Proceedings of the ACIS conference on software engineering, artificial intelligence, networking and parallel distributed computing, SNPD 2007, IEEE computer society (pp. 815–818). Qingdao, China.

Zeng, W., Ai, H., Hu, R., & Gao, S. (2008). An algorithm of echo steganalysis based on Bayes classifier. In Proceedings of the third international conference on information and automation, ICIA ’08, IEEE computer society (pp. 1667–1670). Zhangjiajie, Hunan, China.

Zhi, L., Fen, S. A., & Xian, Y. Y. (2003). A LSB steganography detection algorithm. In Proceedings of the fourteenth international conference on personal, indoor and mobile radio communication, PIMRC 2003, IEEE signal processing (pp. 2780–2783). Beijing, China.