Steganography for MP3 audio by exploiting the rule of window switching


Computers & Security 31 (2012) 704–716

Available online at www.sciencedirect.com

journal homepage: www.elsevier.com/locate/cose

Diqun Yan*, Rangding Wang, Xianmin Yu, Jie Zhu

College of Information Science and Engineering, Ningbo University, Ningbo, 315211 Zhejiang, PR China

Article info

Article history: Received 26 April 2011; received in revised form 27 March 2012; accepted 23 April 2012

Keywords: Steganography; Window switching; MP3; Encoding parameters; Undetectability

Abstract

MP3 audio is a promising carrier format for covert communication because of its popularity. In this paper, we propose an MP3 steganographic method that exploits the rule of window switching during encoding. The method carries out embedding by establishing a mapping relationship between the secret bit and an encoding parameter, namely the window type. The proposed algorithm is fully compliant with the MP3 compression standard, and the distortion caused by steganography can be controlled automatically by the distortion adjustment mechanism of the encoder. Experimental results demonstrate that the proposed method introduces insignificant perceptual distortion and is statistically undetectable under the attack of block size analysis.

© 2012 Elsevier Ltd. All rights reserved.

1. Introduction

The purpose of steganography (Provos and Honeyman, 2003) is to send secret messages by hiding data in innocuous cover objects. Digital media, such as image, audio, video and text, are often used as steganographic carriers. MP3 (MPEG, 1992), as a standard for transmission and storage of compressed audio, is a promising carrier format for steganography. First, MP3 is the most popular and widely used audio file format. When audio in MP3 format is taken as the cover signal, the stego-audio is less likely to be noticed by steganalyzers than audio in other formats. Second, because MP3 is a lossy compression algorithm, it is a challenge for steganalyzers to distinguish whether a distortion is caused by the stego operation or by MP3 encoding.

Some data hiding methods for MP3 audio have been proposed (Megias et al., 2003; Kim et al., 2004; Koukopoulos and Stamatiou, 2006; Kwon et al., 2011). Most of them were originally designed for copyright protection, and their hiding capacity is low. There are a few stego-tools for MP3 audio, such as MP3Stego (Petitcolas, 2002), UnderMP3Cover (Platt, 2004) and MP3Stegz (Achmad, 2008). Among them, MP3Stego, proposed by Petitcolas, is a typical tool for embedding a secret message into MP3 audio. The embedding takes place in the inner loop during quantization, and the secret message is embedded by changing the end condition of the inner loop. MP3Stego preserves the audio quality well and can be applied to both covert communication and copyright protection. However, there is still room to improve its undetectability, which is an important property of a steganographic system. Westfeld (2003) pointed out that, by analyzing the variance of the block size (the number of bits for one granule), it is possible to detect whether an MP3 audio under test has been processed with MP3Stego. Similarly, Dittmann and Hesse (2004) observed that MP3Stego produces a wider variety of block size values. Qiao et al. (2009) found that the embedding of MP3Stego affects the continuity of the coefficient distribution in adjacent frames. Another drawback of MP3Stego is that, at a low compression ratio, an endless loop occurs during embedding when the terminating conditions are not suitable. To solve this problem, an improved steganographic method was proposed in our earlier work (Yan et al., 2009). Its main idea is that the parity of the quantization step, rather than the parity of the block size as in MP3Stego, is used to embed the secret message.

In this paper, we propose a novel steganographic algorithm for MP3 audio based on the window switching strategy, a technique in the MP3 compression standard for controlling pre-echo distortion. The secret message is embedded into MP3 audio by establishing a relationship between the type of window and the parity of the secret bit. The proposed algorithm is fully compliant with the MP3 compression standard, which means that the stego-audio can be decoded correctly by ordinary MP3 players. The hidden message can be extracted by parsing the side information alone, without full decoding. The experimental results show that the proposed algorithm improves not only the undetectability of the stego-message but also the perceptual quality of the stego-audio.

The rest of the paper is organized as follows. Section 2 gives a brief review of the MP3 compression standard and discusses the potential positions for embedding a secret message. The embedding and extracting procedures of the proposed steganographic algorithm are given in Section 3. In Section 4, experimental results on capacity, imperceptibility and undetectability are presented. Finally, conclusions are drawn in Section 5.

* Corresponding author. Tel.: +86 574 87600352. E-mail addresses: [email protected], [email protected] (D. Yan), [email protected] (R. Wang), [email protected] (X. Yu), [email protected] (J. Zhu).
0167-4048/$ – see front matter © 2012 Elsevier Ltd. All rights reserved. doi:10.1016/j.cose.2012.04.006

2. Considerations on embedding position

2.1. Review of MP3 compression standard

The MP3 standard (MPEG, 1992) comprises a flexible hybrid coding technique that incorporates several methods, including sub-band decomposition, filter bank analysis, transform coding, entropy coding, dynamic bit allocation and psychoacoustic analysis. Fig. 1 shows the MP3 encoder block diagram. The encoder operates on consecutive frames of audio data. Each frame consists of 1152 samples and is further split into two granules of 576 samples each. A hybrid filter bank is applied to increase frequency resolution and thereby better approximate critical band behavior. Sophisticated bit allocation and quantization strategies that rely upon non-uniform quantization, analysis-by-synthesis and entropy coding are introduced to allow reduced bit rates and improved perceptual quality.

The embedding of a secret message can take place during different stages of MP3 encoding, such as the transform domain (Megias et al., 2003; Kwon et al., 2011), quantization (Petitcolas, 2002; Yan et al., 2009) and entropy coding (Kim et al., 2004; Yan and Wang, 2011). Since quantization in the MP3 standard is a lossy procedure, the integrity of the hidden message may be compromised if the embedding is carried out before quantization (e.g., in the time or transform domain). A steganographic system cannot tolerate a high bit error rate during extraction. Hence, most MP3 steganographic methods conduct the embedding after quantization. However, it should be noted that "lossy" and "lossless" here refer to the audio data being encoded rather than to the encoder parameters. In fact, once an encoding parameter is generated, it is no longer changed by the subsequent encoding operations. So the hidden message will not be affected by the encoding if the embedding can be realized by exploiting the encoder parameters.

Imperceptibility is one of the most important properties of a steganographic system. It refers to the degree of distortion introduced by embedding the secret message and its effect on human perception. In the MP3 standard, the distortion adjustment mechanism nested in quantization always keeps the quantization distortion under the masking threshold determined by a psychoacoustic perceptual model. More details about the adjustment mechanism can be found in the ISO/IEC 11172-3 specification (MPEG, 1992). If the embedding is finished before quantization, the distortion introduced by the embedding becomes part of the quantization distortion. Since this distortion is automatically adjusted to an acceptable level by the mechanism, the imperceptibility of the hidden message can also be well maintained.

Based on the above considerations, window types, which are encoding parameters of the hybrid filter bank generated prior to quantization, are adopted to embed the secret message in this paper.

Fig. 1 – Block diagram of MP3 encoder.

2.2. Window switching

To adapt the time and frequency resolution of the hybrid filter bank and avoid pre-echo distortion, a window switching strategy is used in the MP3 standard. The strategy comes into play whenever there is a transient in the input audio signal. During stationary segments, the encoder uses long windows (NORMAL) to keep a high frequency resolution. When a transient arrives, a start window (START) is used to switch from long windows to short windows (SHORT), which remain in use until the transient has passed. The encoder then uses a stop window (STOP) to return to long windows. Fig. 2 shows the window types used in the MP3 standard. The START window has a left side that overlaps with a NORMAL window and a right side that overlaps with a SHORT window, while the STOP window is the reverse.

Fig. 2 – Window types in MP3 compression standard: (a) NORMAL, (b) START, (c) SHORT, (d) STOP.

The criterion for selecting the window type is derived from the value of the perceptual entropy (PE) calculated by the psychoacoustic model. When the value exceeds the default threshold (1800), the window type is switched to SHORT. Fig. 3 shows the state machine of the window switching strategy, and the detailed description is as follows:

(1) If PE < 1800 and the window type of the previous granule is NORMAL or STOP, NORMAL is selected for the current granule.
(2) If PE < 1800 and the window type of the previous granule is SHORT, STOP is selected.
(3) If PE ≥ 1800, SHORT is selected for the current granule. In this case, the window type of the previous granule needs to be updated: NORMAL is replaced by START, and STOP is replaced by SHORT.

Fig. 3 – State machine for window switching strategy in MP3 standard.

3. Proposed method

The proposed steganographic method is based on the window switching strategy described in Section 2. The main idea is to embed the secret message into MP3 audio by establishing a relationship between the type of window and the parity of the secret bit. In this section, the embedding and extracting procedures are given.

3.1. Embedding procedure

As described in Section 2, the window type for the current granule is determined by the value of the perceptual entropy and the window type of the previous granule. In the embedding procedure, when the hiding bit is 1, the SHORT type is selected for the current granule regardless of the value of PE, and the window type of the previous granule is updated according to the switching rule. Similarly, when the hiding bit is 0, a type is selected from NORMAL, START and STOP according to the window type of the previous granule.

According to the switching rule, when the type for the current granule is SHORT, the type for the previous granule should be updated. When the type for the previous granule is NORMAL, it is updated to START. Note that the previous granule may already carry a secret bit. This causes no problem during extraction because both NORMAL and START represent the secret bit 0. However, when the type for the previous granule is STOP, the updated type should be SHORT. In this case, an error will occur during extraction because the SHORT type represents the secret bit 1 while the STOP type represents the secret bit 0. Fig. 4 illustrates this ambiguous case. In this example, for the fourth secret bit '0', the STOP type is selected according to the embedding rule and the switching rule because the type for the previous granule is SHORT. For the next secret bit '1', SHORT is selected according to the embedding rule. Meanwhile, the type for the previous granule is updated to SHORT according to the switching rule. Thus, the extractor will wrongly read this SHORT window as the secret bit 1.

In order to extract the secret message correctly, the embedding operation is modified as follows. First, a flag is set to 1 when the current type is STOP after embedding. Then, for the next granule, if the flag is set, the last secret bit is embedded once again, until the window type is no longer STOP. Fig. 5 shows how this resolves the example. The embedding procedure is integrated with the MP3 encoding process, and the pseudo-code for embedding is shown in Fig. 6.

Fig. 4 – An example of an ambiguous case during embedding procedure.

Fig. 5 – An example illustration of solving the problem in Fig. 4.

Fig. 6 – Pseudo-code for embedding procedure.

Table 1 – Comparison of MOS values.

        96 kbps                       192 kbps
ER (%)  MP3Stego  Yan       Proposed  MP3Stego  Yan       Proposed
0       4.07      4.46      4.29      4.85      5.00      4.96
0.01    3.83      4.12      4.17      4.69      4.92      4.93
0.05    3.59      4.03      4.01      4.29      4.75      4.61
0.1     3.42      3.84      3.81      4.04      4.58      4.33

The detailed embedding operations are as follows.

Step 1: Compress the secret message $O$ to eliminate its redundancy before embedding and then encrypt it with a secret key $k$. That is,

$$M = \mathrm{Encrypt}(\mathrm{Compress}(O), k) \tag{1}$$

where Encrypt and Compress denote the encryption and compression algorithms, respectively.

Step 2: Let $L_M$ denote the length of $M$. Concatenate $L_M$ and $M$ to form the total embedding payload $P$. That is,

$$P = L_M \,\|\, M \tag{2}$$

where $\|$ denotes concatenation. The length field $L_M$ is encoded with 4 bytes.

Step 3: Use the above-mentioned key to generate a seed for a pseudo-random bit generator. The bits from the generator determine the embeddable granules.

Step 4: For each embeddable granule, embed the secret bit $b$ by switching the window type according to the embedding algorithm. When a STOP type occurs, skip it and embed the last secret bit once again.

Step 5: Repeat Step 4 until all granules are processed. Finally, an MP3 stego-audio $S$ is obtained.
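The mapping between secret bits and window types, including the re-embedding after a STOP window, can be sketched on a plain list of window types. This is a minimal illustration rather than the authors' encoder integration: it ignores the perceptual entropy, the pseudo-random selection of embeddable granules and the actual MP3 encoding, and the names `Win`, `embed_bits` and `extract_bits` are hypothetical.

```python
from enum import Enum


class Win(Enum):
    """Window types, named as in the paper."""
    NORMAL = "normal"
    START = "start"
    SHORT = "short"
    STOP = "stop"


def embed_bits(bits):
    """Map secret bits to a sequence of window types (one per granule)."""
    types = []
    for bit in bits:
        if bit == 1:
            # Switching rule: a NORMAL granule before a SHORT one becomes START.
            if types and types[-1] is Win.NORMAL:
                types[-1] = Win.START
            types.append(Win.SHORT)
        elif types and types[-1] is Win.SHORT:
            # Leaving SHORT requires a STOP window. It carries no data,
            # so the same bit 0 is embedded again in the next granule.
            types.append(Win.STOP)
            types.append(Win.NORMAL)
        else:
            types.append(Win.NORMAL)
    return types


def extract_bits(types):
    """Recover the bits: SHORT -> 1, NORMAL/START -> 0, STOP is skipped."""
    return [1 if t is Win.SHORT else 0 for t in types if t is not Win.STOP]


secret = [0, 0, 1, 0, 1, 0, 0]
windows = embed_bits(secret)
print([w.value for w in windows])
print(extract_bits(windows) == secret)
```

For the secret bits 0 0 1 0 1 0 0, this sketch inserts two STOP windows that carry no data, so nine granules are used for seven bits. Because every STOP is followed by a re-embedded 0, a STOP window is never overwritten by SHORT, which is the ambiguity the re-embedding is meant to avoid.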

3.2. Extracting procedure

The extraction process of the proposed method is also integrated with MP3 decoding. The main steps are as follows.

Step 1: Receive an MP3 stego-audio $\tilde{S}$ and the secret key $k$.

Step 2: With the key, the pseudo-random generator identifies the granules that carry secret bits, just as the sender did when selecting the embedding granules.

Step 3: Parse the side information and obtain the window type of each stego granule.

Step 4: Extract the hidden bits according to the window types. Note that, because of the updating operation at the end of window switching, the type variable of the current granule actually holds the window type of the previous granule. Therefore, the extraction should start from the second embeddable granule.

Step 5: Get the length $L_M$ of the message from the first 4 bytes of the extracted payload $P$. Obtain the secret message $M$ by discarding the first 4 bytes of $P$.

Step 6: Decrypt and decompress the extracted secret message $M$. Finally, the original message $O$ is restored. That is,

$$O = \mathrm{Decompress}(\mathrm{Decrypt}(M, k)) \tag{3}$$

where Decompress and Decrypt denote the decompression and decryption algorithms, respectively.
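The payload framing of Eqs. (1)–(3) can be illustrated with Python's standard `zlib` and `struct` modules. In this sketch the encryption step is a caller-supplied function that defaults to the identity, because no 3DES implementation is available in the standard library. The big-endian byte order of the 4-byte length field and the function names `build_payload` and `recover_message` are assumptions, not details taken from the paper.

```python
import struct
import zlib


def build_payload(message, encrypt=lambda data: data):
    """P = L_M || M with M = Encrypt(Compress(O)); L_M takes 4 bytes."""
    m = encrypt(zlib.compress(message))
    # ">I" is an unsigned 32-bit big-endian integer (assumed byte order).
    return struct.pack(">I", len(m)) + m


def recover_message(payload, decrypt=lambda data: data):
    """Invert build_payload: read L_M, cut out M, decrypt and decompress."""
    (length,) = struct.unpack(">I", payload[:4])
    m = payload[4:4 + length]
    return zlib.decompress(decrypt(m))


original = b"meet at noon"
payload = build_payload(original)
print(recover_message(payload) == original)
```

Storing the length first lets the extractor stop after exactly $L_M$ bytes, so bits read from granules beyond the end of the message are simply ignored.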

4. Experimental results

In our experiments, six mono audio clips of different genres, namely blues, classical, country, folk, pop and jazz, are adopted as cover-audios. Each cover audio is about 180 s long and sampled at 44.1 kHz with 16-bit resolution. Before embedding, the secret message is always preprocessed in the same way as in MP3Stego for the convenience of comparison. The message is first compressed by the Zlib algorithm (Gailly and Adler, 1995) to eliminate its redundancy. Then the 3DES algorithm (Young, 1995), which applies the Data Encryption Standard (DES) algorithm three times to each data block, is used to encrypt the compressed message. The algorithm is implemented on the basis of the 8Hz-mp3 encoder (8Hz Productions, 1998), and four typical compression ratios (96 kbps, 128 kbps, 192 kbps and 256 kbps) are tested in the experiments.

Table 2 – Comparison of SNR values (in dB).

                   96 kbps                       128 kbps
Audio      ER (%)  MP3Stego  Yan       Proposed  MP3Stego  Yan       Proposed
Blues      0       48.08     56.02     53.83     57.85     61.84     58.79
Blues      0.01    39.96     45.74     41.77     42.74     48.64     45.62
Blues      0.05    34.28     39.87     37.80     37.27     42.59     40.28
Blues      0.1     33.74     39.73     37.17     37.23     42.18     40.10
Classical  0       72.36     76.47     72.37     82.45     96.17     82.34
Classical  0.01    64.64     69.64     65.30     69.78     76.82     72.24
Classical  0.05    58.45     63.74     59.72     63.59     70.11     62.40
Classical  0.1     57.91     63.09     58.79     63.26     69.60     62.25
Country    0       70.06     74.63     69.74     79.17     85.19     78.21
Country    0.01    53.64     56.14     54.17     62.88     64.12     61.21
Country    0.05    48.44     51.56     48.82     55.81     57.61     55.02
Country    0.1     47.85     50.97     48.61     55.74     57.49     54.73
Folk       0       65.62     70.16     65.75     69.83     70.21     70.18
Folk       0.01    43.70     47.88     44.82     50.02     53.39     50.06
Folk       0.05    38.71     43.45     40.59     44.24     48.43     45.32
Folk       0.1     38.34     43.16     40.13     44.07     48.22     45.23
Pop        0       50.06     56.19     54.79     60.23     62.11     59.70
Pop        0.01    44.61     48.59     46.44     50.20     54.12     51.78
Pop        0.05    38.70     43.68     40.40     43.35     47.23     43.56
Pop        0.1     38.26     43.30     39.82     43.77     47.13     43.54
Jazz       0       68.03     70.39     66.50     73.49     76.23     71.76
Jazz       0.01    55.91     60.82     56.71     60.16     64.18     60.57
Jazz       0.05    52.59     56.88     54.09     58.44     61.31     58.33
Jazz       0.1     52.40     56.88     54.18     58.57     61.53     58.38

Table 3 – Comparison of SNR values (in dB).

                   192 kbps                      256 kbps
Audio      ER (%)  MP3Stego  Yan       Proposed  MP3Stego  Yan       Proposed
Blues      0       66.69     70.59     67.99     77.30     77.66     74.71
Blues      0.01    50.31     56.53     53.86     59.96     63.41     60.77
Blues      0.05    50.97     53.85     51.31     58.41     60.82     58.31
Blues      0.1     46.91     50.50     48.30     56.38     57.38     55.21
Classical  0       89.20     101.96    91.62     107.34    99.46     98.09
Classical  0.01    79.79     82.26     77.64     86.73     87.53     84.15
Classical  0.05    76.49     78.39     74.49     83.67     84.41     80.58
Classical  0.1     70.34     72.16     69.53     77.57     78.75     75.63
Country    0       89.24     95.36     88.16     96.39     97.75     95.26
Country    0.01    73.49     74.92     69.55     80.63     81.93     76.21
Country    0.05    70.55     71.26     66.47     76.78     78.32     72.89
Country    0.1     66.54     68.00     62.94     73.46     74.67     69.65
Folk       0       83.98     86.61     80.58     92.78     93.26     88.54
Folk       0.01    61.89     63.86     59.34     69.57     70.72     65.73
Folk       0.05    60.11     61.92     56.94     67.34     68.50     63.70
Folk       0.1     57.24     58.67     53.95     64.19     65.25     60.48
Pop        0       70.36     72.02     70.69     77.42     82.40     78.54
Pop        0.01    61.66     63.44     58.55     67.23     70.55     65.14
Pop        0.05    57.23     59.30     54.68     64.19     66.09     61.81
Pop        0.1     51.80     53.97     50.11     59.57     61.06     56.85
Jazz       0       79.67     83.57     79.06     87.35     89.34     85.65
Jazz       0.01    65.71     70.81     66.88     75.86     77.05     73.31
Jazz       0.05    65.09     69.23     66.34     74.62     75.68     72.71
Jazz       0.1     63.56     66.70     62.53     72.09     73.06     68.80

Three primary requirements, capacity, imperceptibility and undetectability, may be used to evaluate a steganographic scheme.

(1) Capacity. It is the quantity of secret message that can be embedded, which may be given as an absolute measurement (such as the size of the secret message) or as a relative value (called the embedding rate, such as bits per sample or the ratio of the secret message to the cover).

(2) Imperceptibility. It is the level of concealment, which prevents the warden from being able to distinguish between an original cover and a stego one. There are two ways to rate the imperceptibility: subjective evaluation and objective evaluation. Subjective evaluation relies directly on human listening, but it is time-consuming and may be influenced by subjective factors such as the observer's professional background and psychology. For audio steganography, the mean opinion score (MOS) (Lie and Chang, 2006) provides a numerical indication of the perceived audio quality, which is calculated by averaging the results of a set of subjective tests. The MOS value ranges from 1 to 5, where 1 stands for the lowest quality and 5 for the highest quality. The signal-to-noise ratio (SNR) has been widely used as an important objective evaluation metric for audio steganography. The SNR is defined as

$$\mathrm{SNR} = 10 \log_{10} \frac{\sum_{n=0}^{N-1} s^2(n)}{\sum_{n=0}^{N-1} \left[ s(n) - \tilde{s}(n) \right]^2} \tag{4}$$

where $s(n)$ represents the clean audio (unmarked and decompressed in this work), and $\tilde{s}(n)$ represents the stego-audio (marked and decompressed). In general, the higher the SNR, the more imperceptible the secret message. The SNR, however, is not always applicable because it averages the distortion over the entire audio signal and disregards important mechanisms of the human auditory system. PEAQ (Thiede et al., 2000) compares the excitation patterns along the basilar membrane in response to the two audio signals and integrates the comparison results over time into the objective difference grade (ODG). The ODG describes the perceptual difference on a scale from imperceptible (ODG = 0) to very annoying (ODG = −4). An ODG of −1 or higher is generally considered acceptable.

(3) Undetectability. A steganographic system is considered undetectable if no statistical test can distinguish between the cover and stego objects. According to the information-theoretical definition of steganographic undetectability given by Cachin (2004), the distance between the probability distributions of the cover and stego objects can be taken as a measure. However, it is difficult to obtain the distribution of the cover object because the dimensionality of the cover space is too large (potentially infinite). In practice, an alternative way to measure the undetectability is to calculate the distance between critical features of the cover and the stego object.
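The SNR of Eq. (4) can be computed directly from the decoded samples of the clean and stego audio. The following is a minimal sketch in plain Python. The function name `snr_db` and the choice to return infinity for identical signals are illustrative assumptions.

```python
import math


def snr_db(clean, stego):
    """SNR in dB between a clean signal s(n) and a stego signal s~(n)."""
    if len(clean) != len(stego):
        raise ValueError("signals must have the same length")
    signal_energy = sum(x * x for x in clean)
    noise_energy = sum((x - y) ** 2 for x, y in zip(clean, stego))
    if noise_energy == 0:
        return math.inf  # identical signals: no embedding distortion at all
    return 10.0 * math.log10(signal_energy / noise_energy)


# Toy signals: energy 200 against distortion energy 2, a ratio of 100.
print(snr_db([10.0, 10.0], [9.0, 11.0]))
```

Because the sums run over the whole signal, a short burst of strong distortion and a weak distortion spread over the entire clip can yield the same SNR, which is why the text complements it with ODG.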

4.1. Maximum embedding ratio

Although the maximum embedding ratio of a steganographic technique is of little importance, it does give a sense of how well the embedding algorithm utilizes redundancies in the cover audio. Ideally, the maximum embedding rate of the proposed method can reach 0.5 bit/granule, which is the same as that of MP3Stego and the method of Yan et al. (2009). The maximum embedding ratio can be expressed as

$$\mathrm{MER} = 0.5 \times \frac{f_s \times ch}{576} \tag{5}$$

where $f_s$ and $ch$ denote the sample rate and the number of channels, respectively, the constant 576 is the number of samples in each granule, and the factor 0.5 is the ideal embedding rate in bits per granule. According to Eq. (5), the maximum embedding ratios for mono audio at 32 kHz, 44.1 kHz and 48 kHz sample rates are 27.78, 38.28 and 41.67 bit/s, respectively, so the ratio rises with the sample rate.

Fig. 7 – Performance comparison on SNR (dB) versus embedding rate (%) for blues audio: (a) 96 kbps, (b) 128 kbps, (c) 192 kbps, (d) 256 kbps.

However, the actual embedding rate will be lower than the value of MER because granules with the STOP type are skipped during embedding. The more often the STOP type occurs, the lower the embedding ratio. During embedding, a STOP type is produced whenever the secret bit sequence '10' appears. Hence, the content of the secret message determines the number of STOP occurrences. Because the secret message is uncertain and varies from case to case, the real maximum embedding ratio of the proposed method cannot be stated in advance.
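The stated rates can be checked with a few lines of arithmetic, assuming the ideal 0.5 bit per granule mentioned above. The second helper counts how many embeddable granules a given bit sequence consumes when every '10' pattern costs one skipped STOP granule. Both function names are hypothetical.

```python
GRANULE_SAMPLES = 576  # samples per granule in MP3


def max_embedding_rate(sample_rate_hz, channels=1, bits_per_granule=0.5):
    """Ideal embedding rate in bit/s for the given sample rate."""
    granules_per_second = sample_rate_hz * channels / GRANULE_SAMPLES
    return bits_per_granule * granules_per_second


def granules_needed(bits):
    """One granule per bit plus one skipped STOP granule per '10' pattern."""
    stops = sum(1 for a, b in zip(bits, bits[1:]) if a == 1 and b == 0)
    return len(bits) + stops


for fs in (32000, 44100, 48000):
    print(fs, round(max_embedding_rate(fs), 2))

print(granules_needed([0, 0, 1, 0, 1, 0, 0]))
```

A message that alternates between 1 and 0 is the worst case for this count, while long runs of identical bits cost almost no extra granules.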

4.2. Imperceptibility

The imperceptibility refers to the perceptual transparency of the hidden message. Obviously, the imperceptibility is affected by the embedding rate: the higher the embedding rate, the worse the imperceptibility.

Table 4 – Comparison of ODG values.

                   96 kbps                       128 kbps
Audio      ER (%)  MP3Stego  Yan       Proposed  MP3Stego  Yan       Proposed
Blues      0       0.0478    0.1283    0.0841    0.1368    0.1717    0.1341
Blues      0.01    0.1531    0.0293    0.1031    0.0571    0.0924    0.0161
Blues      0.05    0.3755    0.0982    0.2442    0.2386    0.0047    0.1065
Blues      0.1     0.3729    0.1062    0.2597    0.2185    0.0003    0.1103
Classical  0       0.2565    0.0248    0.0245    0.0326    0.1623    0.0337
Classical  0.01    0.2322    0.0284    0.1558    0.2026    0.0550    0.1342
Classical  0.05    0.3609    0.1276    0.3303    0.3664    0.0130    0.2661
Classical  0.1     0.3768    0.1519    0.3506    0.3463    0.0400    0.2906
Country    0       0.0844    0.0524    0.0420    0.0918    0.1686    0.0058
Country    0.01    0.1570    0.0137    0.0823    0.0194    0.1321    0.0098
Country    0.05    0.2983    0.0638    0.2513    0.1123    0.0490    0.1104
Country    0.1     0.3289    0.0998    0.2666    0.1093    0.0424    0.0850
Folk       0       0.0725    0.1069    0.0680    0.1372    0.1854    0.1278
Folk       0.01    0.1186    0.0414    0.0536    0.0681    0.1341    0.0926
Folk       0.05    0.3288    0.0655    0.2205    0.1659    0.0561    0.0224
Folk       0.1     0.3499    0.0917    0.2416    0.1506    0.0619    0.0293
Pop        0       0.0033    0.0923    0.0161    0.1209    0.0575    0.1131
Pop        0.01    0.1498    0.0029    0.0981    0.0227    0.0028    0.0148
Pop        0.05    0.2638    0.0461    0.1666    0.1184    0.0002    0.0265
Pop        0.1     0.2682    0.0564    0.1839    0.0945    0.0195    0.0247
Jazz       0       0.0671    0.1094    0.0396    0.1373    0.1819    0.1267
Jazz       0.01    0.0670    0.0380    0.0647    0.0030    0.1123    0.0228
Jazz       0.05    0.1555    0.0314    0.1370    0.0927    0.0067    0.0857
Jazz       0.1     0.1621    0.0524    0.1463    0.1119    0.0151    0.0842

Table 5 – Comparison of ODG values.

                   192 kbps                      256 kbps
Audio      ER (%)  MP3Stego  Yan       Proposed  MP3Stego  Yan       Proposed
Blues      0       0.1726    0.1865    0.1991    0.2039    0.2042    0.2010
Blues      0.01    0.0756    0.1977    0.1504    0.1690    0.1890    0.1933
Blues      0.05    0.1095    0.1582    0.1412    0.1706    0.1826    0.1859
Blues      0.1     0.0410    0.1488    0.1042    0.1338    0.1671    0.1724
Classical  0       0.1576    0.1761    0.1799    0.1875    0.1883    0.1840
Classical  0.01    0.0032    0.1150    0.1267    0.1557    0.1602    0.1433
Classical  0.05    0.0560    0.0811    0.0949    0.1391    0.1454    0.1211
Classical  0.1     0.1343    0.0128    0.0211    0.1034    0.1139    0.0734
Country    0       0.1758    0.1928    0.1738    0.1978    0.1980    0.1963
Country    0.01    0.1133    0.1658    0.0968    0.1874    0.1887    0.1794
Country    0.05    0.0882    0.1541    0.0681    0.1828    0.1850    0.1725
Country    0.1     0.0210    0.1238    0.0032    0.1724    0.1753    0.1555
Folk       0       0.1796    0.1934    0.1956    0.1992    0.1994    0.1973
Folk       0.01    0.1330    0.1744    0.1755    0.1907    0.1920    0.1834
Folk       0.05    0.1052    0.1660    0.1666    0.1874    0.1889    0.1777
Folk       0.1     0.0675    0.1699    0.1451    0.1778    0.1809    0.1619
Pop        0       0.1963    0.1913    0.1956    0.1996    0.2013    0.1975
Pop        0.01    0.0821    0.1563    0.1694    0.1903    0.1875    0.1772
Pop        0.05    0.1561    0.1749    0.1815    0.1822    0.1835    0.1689
Pop        0.1     0.1275    0.1594    0.1299    0.1765    0.1719    0.1511
Jazz       0       0.0802    0.1954    0.1892    0.1971    0.1991    0.1937
Jazz       0.01    0.1101    0.1573    0.1554    0.1691    0.1739    0.1510
Jazz       0.05    0.1146    0.1377    0.0989    0.1523    0.1602    0.1290
Jazz       0.1     0.1791    0.0950    0.0938    0.0428    0.1294    0.0846

Fig. 8 – Performance comparison on ODG versus embedding rate (%) for blues audio: (a) 96 kbps, (b) 128 kbps, (c) 192 kbps, (d) 256 kbps.

Table 6 – Histogram similarity measure at 0.05% embedding rate (/1000).

                      96 kbps                       128 kbps
Distance   Audio      MP3Stego  Yan       Proposed  MP3Stego  Yan       Proposed
Manhattan  Blues      166.264   82.562    4.574     234.136   112.532   15.170
Manhattan  Classical  189.032   44.088    10.770    331.896   32.302    8.236
Manhattan  Country    163.798   63.716    6.374     244.450   66.100    12.226
Manhattan  Folk       170.968   70.656    4.938     307.386   87.752    12.216
Manhattan  Pop        161.254   74.806    4.378     277.772   45.168    8.526
Manhattan  Jazz       168.782   72.178    4.566     242.974   61.206    8.368
Euclidean  Blues      91.0458   40.6676   1.4732    125.6164  46.3649   5.4094
Euclidean  Classical  102.4193  23.2624   4.5638    178.3632  10.6373   1.9388
Euclidean  Country    78.7673   30.1750   1.6383    94.0713   24.3405   3.1620
Euclidean  Folk       81.3430   35.4410   1.5880    120.9428  35.1583   4.3699
Euclidean  Pop        86.5562   36.6270   1.0946    114.3513  18.9084   2.3297
Euclidean  Jazz       87.4929   30.6350   1.1207    121.5876  22.3923   2.4834

Table 7 – Histogram similarity measure at 0.05% embedding rate (/1000).

                      192 kbps                      256 kbps
Distance   Audio      MP3Stego  Yan       Proposed  MP3Stego  Yan       Proposed
Manhattan  Blues      215.274   75.116    30.000    290.640   52.740    23.294
Manhattan  Classical  228.002   44.514    15.934    240.716   62.728    21.826
Manhattan  Country    184.314   30.294    15.588    266.224   44.526    23.974
Manhattan  Folk       280.186   45.876    21.170    364.472   50.794    22.116
Manhattan  Pop        213.444   36.658    13.938    283.710   51.066    23.064
Manhattan  Jazz       272.074   39.818    16.056    263.605   56.642    23.042
Euclidean  Blues      112.5779  20.4300   6.9619    70.9267   11.3319   3.1699
Euclidean  Classical  48.9949   20.7854   4.0911    86.7266   21.5275   4.2775
Euclidean  Country    78.7673   6.1104    2.1627    55.0303   7.6225    2.4035
Euclidean  Folk       77.9019   10.4218   3.8527    66.1984   8.5759    2.5431
Euclidean  Pop        66.5291   9.3844    2.4527    57.7869   9.9239    2.7223
Euclidean  Jazz       69.6050   7.1577    2.3525    55.8754   8.9640    2.9675

The embedding rate in the experiments is defined as

$$\mathrm{ER} = \frac{L_S}{L_{MP3}} \times 100\% \tag{6}$$

where $L_S$ and $L_{MP3}$ denote the length of the secret message and of the MP3 file, respectively. Because of the compression operation before embedding, 24 bytes are produced even for a secret file of 0 bytes. In addition, 4 extra bytes are used to store the length of the secret message. Therefore, an embedding rate of 0% does not mean that the MP3 file remains unchanged.
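As a rough worked example of Eq. (6), the sketch below estimates the size of a constant-bit-rate MP3 file and the resulting embedding rate. It assumes that both lengths are measured in bytes, that 1 kbps means 1000 bit/s, and that tags and headers outside the audio stream are ignored. The function names are hypothetical.

```python
def cbr_size_bytes(bitrate_kbps, duration_s):
    """Approximate size of a constant-bit-rate MP3 stream, ignoring tags."""
    return bitrate_kbps * 1000 * duration_s // 8


def embedding_rate_percent(secret_len_bytes, mp3_len_bytes):
    """ER = L_S / L_MP3 * 100%."""
    return 100.0 * secret_len_bytes / mp3_len_bytes


mp3_len = cbr_size_bytes(128, 180)  # a 180 s clip at 128 kbps
print(mp3_len)
print(embedding_rate_percent(288, mp3_len))
```

Under these assumptions a 288-byte message in a 180 s clip at 128 kbps corresponds to an embedding rate of 0.01%.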

Twenty listeners joined the subjective quality evaluation. The participants listened to the original and the stegoed audios and were asked to report dissimilarities between the two audios. Table 1 shows the average results of the MOS values with various embedding rates. These results show that the distortion of the proposed method is not perceived by the listeners when the embedding rate is less than 0.05%. Meanwhile, it shows that the compression ratio 192 kbps may be an appropriate selection during embedding. Tables 2 and 3 show the comparative results of various cover-audios by using different schemes in items of SNR. Fig. 7 shows the SNR values on blues audio under various compression ratios. It can be

Table 8. Comparison of variance (/100) of block length at 96 and 128 kbps. The Clean value is independent of ER.

                  96 kbps                                 128 kbps
Audio      ER(%)  Clean     MP3Stego  Yan       Proposed  Clean     MP3Stego  Yan       Proposed
Blues      0      11.06     23.94     11.59     11.07     57.00     71.78     58.87     56.70
Blues      0.01   11.06     88.78     13.32     11.03     57.00     216.36    76.71     57.59
Blues      0.05   11.06     195.39    16.34     11.36     57.00     380.09    90.63     58.20
Blues      0.1    11.06     202.87    16.37     11.43     57.00     410.05    92.39     59.14
Classical  0      10.91     24.18     11.56     10.96     17.92     32.71     18.54     17.97
Classical  0.01   10.91     75.91     15.00     12.46     17.92     145.78    23.03     17.68
Classical  0.05   10.91     156.01    19.67     15.40     17.92     340.12    29.11     17.59
Classical  0.1    10.91     169.55    20.61     15.49     17.92     348.38    30.18     17.57
Country    0      11.68     21.16     12.28     11.84     90.84     105.51    91.54     90.87
Country    0.01   11.68     68.91     14.66     11.83     90.84     223.72    110.17    91.25
Country    0.05   11.68     148.90    17.62     12.27     90.84     414.70    142.34    93.77
Country    0.1    11.68     160.15    18.02     12.32     90.84     428.90    144.32    92.51
Folk       0      18.77     28.88     19.32     18.75     43.04     60.24     47.77     42.46
Folk       0.01   18.77     69.45     21.06     19.04     43.04     113.57    53.72     42.98
Folk       0.05   18.77     143.32    23.86     19.43     43.04     202.83    64.30     43.58
Folk       0.1    18.77     151.39    24.10     19.20     43.04     214.32    64.51     44.37
Pop        0      10.85     26.99     11.30     11.16     37.11     51.75     39.21     36.99
Pop        0.01   10.85     88.50     13.09     11.24     37.11     146.66    53.37     37.30
Pop        0.05   10.85     172.49    15.46     11.60     37.11     266.91    67.69     38.95
Pop        0.1    10.85     178.72    16.29     11.51     37.11     260.22    67.57     37.42
Jazz       0      12.25     22.78     13.01     12.27     19.16     37.88     19.99     19.49
Jazz       0.01   12.25     76.54     15.89     12.90     19.16     149.01    25.42     20.04
Jazz       0.05   12.25     167.08    21.21     13.31     19.16     312.81    33.32     19.81
Jazz       0.1    12.25     168.06    22.29     13.61     19.16     324.86    34.66     20.36


Table 9. Comparison of variance (/100) of block length at 192 and 256 kbps. The Clean value is independent of ER.

                  192 kbps                                256 kbps
Audio      ER(%)  Clean     MP3Stego  Yan       Proposed  Clean     MP3Stego  Yan       Proposed
Blues      0      282.81    297.81    285.08    282.18    185.78    195.58    188.34    185.17
Blues      0.01   282.81    457.99    334.68    286.37    185.78    280.89    201.68    186.94
Blues      0.05   282.81    540.44    370.41    284.06    185.78    317.59    211.83    187.68
Blues      0.1    282.81    870.72    475.91    290.31    185.78    458.95    236.52    194.18
Classical  0      71.75     89.41     73.94     71.21     61.27     79.18     63.52     61.10
Classical  0.01   71.75     89.41     73.94     71.21     61.27     183.46    84.32     61.17
Classical  0.05   71.75     284.73    105.80    70.37     61.27     254.39    95.13     60.86
Classical  0.1    71.75     525.12    134.87    69.83     61.27     451.38    124.18    59.76
Country    0      340.34    346.87    341.59    340.56    109.45    118.85    112.03    109.72
Country    0.01   340.34    408.19    353.59    342.11    109.45    196.28    126.55    111.49
Country    0.05   340.34    447.05    363.97    341.22    109.45    228.67    135.78    113.22
Country    0.1    340.34    548.51    382.62    336.79    109.45    352.50    158.50    120.96
Folk       0      342.97    349.57    343.72    343.21    155.08    165.86    157.55    154.26
Folk       0.01   342.97    427.21    373.02    348.64    155.08    243.93    172.94    155.91
Folk       0.05   342.97    482.61    395.54    352.67    155.08    284.79    180.23    156.93
Folk       0.1    342.97    594.96    449.61    365.30    155.08    413.32    203.86    162.73
Pop        0      308.46    330.88    316.49    306.26    111.89    125.05    113.21    110.96
Pop        0.01   308.46    456.60    381.45    312.94    111.89    208.30    129.25    111.63
Pop        0.05   308.46    514.75    397.73    312.48    111.89    250.79    135.33    113.06
Pop        0.1    308.46    626.28    424.82    315.83    111.89    367.50    161.35    115.29
Jazz       0      110.53    126.46    112.74    110.30    62.31     76.60     64.54     62.27
Jazz       0.01   110.53    273.78    131.72    109.38    62.31     185.27    81.96     61.83
Jazz       0.05   110.53    328.99    143.78    107.45    62.31     236.01    92.59     60.67
Jazz       0.1    110.53    560.61    175.81    105.75    62.31     402.94    123.11    61.25

seen that, even at a low bitrate (96 kbps) and a high embedding rate (ER = 0.1%), the SNR of the proposed method is far higher than the 20 dB requirement of the IFPI (Katzenbeisser and Petitcolas, 2000). Tables 4 and 5 show the comparative ODG results. Some ODG scores are greater than 0 because the test audio has better perceptual quality than the reference audio. Fig. 8 shows the ODG values on blues audio. It can be seen that the proposed method performs better than MP3Stego and slightly worse than Yan's method (Yan et al., 2009). In addition, the ODG values of the proposed method are always around 0, which means that the perceptual quality is well maintained.
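For readers who want to check a figure against the 20 dB threshold, the following is a minimal sketch of the conventional SNR computation between two equal-length sample sequences. It uses the standard textbook definition, which may differ in detail from the one used in the experiments; the function name and the choice of which signals to compare are assumptions.

```python
import math


def snr_db(reference, test):
    """Conventional SNR in dB: 10*log10(signal power / noise power).

    `reference` and `test` are equal-length sequences of samples; the
    noise is taken to be their sample-wise difference (an assumption).
    """
    if len(reference) != len(test):
        raise ValueError("signals must have the same length")
    signal_power = sum(r * r for r in reference)
    noise_power = sum((r - t) ** 2 for r, t in zip(reference, test))
    if signal_power == 0:
        raise ValueError("reference signal is all zeros")
    if noise_power == 0:
        return math.inf  # identical signals
    return 10.0 * math.log10(signal_power / noise_power)
```

With this definition, a test signal whose error amplitude is one tenth of the reference amplitude sits at 20 dB.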

4.3. Undetectability

To evaluate the statistical undetectability of the proposed scheme, the Manhattan (L1) and Euclidean (L2) distances (Cha, 2007) are adopted to measure the similarity between the original histogram and the stego histogram of the MDCT coefficients, which are the frequency-domain representation of MP3 audio. The distances are defined as follows:

$$d_{\mathrm{Manh}} = \sum_{i=1}^{d} \left| P_i - Q_i \right| \tag{7}$$

$$d_{\mathrm{Euc}} = \sqrt{\sum_{i=1}^{d} \left| P_i - Q_i \right|^{2}} \tag{8}$$

where $d$ is the number of bins in the histogram, set to 50 in the experiment, and $P$ and $Q$ are the probability density functions of the two histograms. Tables 6 and 7 show the experimental results at an embedding rate of 0.05%. The smaller the distance, the greater the similarity of the histograms. It can be seen that the proposed method provides better statistical undetectability than the other MP3 steganographic methods at various compression ratios. Another feature used to assess undetectability in the experiment is the variance of the block size, that is, the number of bits used for the main data of each granule. It was adopted by Westfeld (2003) to evaluate the undetectability of MP3Stego. The variance can be expressed as follows:

$$\sigma^{2} = \frac{\sum_{i=1}^{N} \left( x_i - \bar{x} \right)^{2}}{N} \tag{9}$$

where $x_i$ denotes the block length of the $i$-th granule and $\bar{x}$ denotes the mean over all $N$ granules. Tables 8 and 9 present the variance (/100) of the block sizes under various compression ratios and embedding rates. Fig. 9 shows the experimental data at the same embedding rate (0.05%). The results show that the fluctuation in block length introduced by the proposed method is the smallest among the compared methods under various compression ratios. Additionally, the curves of the proposed method in Fig. 9 nearly coincide with those of the non-stego case. From this analysis, it can be concluded that the proposed method maintains the statistical characteristics of the original block sizes and is secure against detection based on block size analysis.
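The two undetectability features used in this section can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' code: the L1 and L2 distances follow their standard definitions, the variance is the population variance, and the helper names, the equal-width binning, and the histogram range are all hypothetical.

```python
import math


def histogram_pdf(values, bins, lo, hi):
    """Normalised histogram with `bins` equal-width bins over [lo, hi].

    The binning scheme and range are assumptions made for illustration.
    """
    counts = [0] * bins
    width = (hi - lo) / bins
    for v in values:
        if lo <= v <= hi:
            idx = min(int((v - lo) / width), bins - 1)
            counts[idx] += 1
    total = sum(counts)
    if total == 0:
        raise ValueError("no values fall inside the histogram range")
    return [c / total for c in counts]


def manhattan_distance(p, q):
    """L1 distance between two histograms with the same number of bins."""
    return sum(abs(a - b) for a, b in zip(p, q))


def euclidean_distance(p, q):
    """L2 distance between two histograms with the same number of bins."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def block_length_variance(block_lengths):
    """Population variance of the per-granule block lengths."""
    n = len(block_lengths)
    if n == 0:
        raise ValueError("at least one granule is required")
    mean = sum(block_lengths) / n
    return sum((x - mean) ** 2 for x in block_lengths) / n
```

In use, the histograms of the cover and stego MDCT coefficients would be built with the same `bins`, `lo`, and `hi` (50 bins in the experiment) before the distances are compared, and the variance would be divided by 100 to match the scale of Tables 8 and 9.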


Fig. 9. Comparison of the variance (/100) of block size (ER = 0.05%): (a) blues, (b) classical, (c) country, (d) folk, (e) pop, (f) jazz. Each panel plots Variance/100 against Compression Ratio (kbps) at 96, 128, 192 and 256 kbps for the Clean, MP3Stego, Yan and Proposed cases.

5. Conclusions

This paper proposes an MP3 steganographic scheme that is realized by manipulating the encoder parameters rather than modifying the signal data being encoded. The window type, a parameter of the hybrid filter bank, is used to embed the secret message. Owing to the distortion adjustment mechanism of the encoder, little steganographic distortion is introduced. Compared with the two existing MP3 steganographic methods, the proposed method provides much better undetectability. Our future work is to find more encoder parameters that are suitable for steganography and to investigate whether the proposed algorithm can also be applied to other audio compression formats, such as MPEG-2/4 AAC.

Acknowledgments

The authors are grateful to the reviewers for their fruitful comments, which greatly helped to improve the quality of the original manuscript. This work is supported by the National Natural Science Foundation of China (61170137), the Doctoral Foundation of Ministry of Education of China (20103305110002), the Scientific Research Fund of Zhejiang Provincial Education Department (Y201119434), the Zhejiang Scientific and Technical Key Innovation Team of New Generation Mobile Internet Client Software (2010R50009), the Outstanding (Postgraduate) Dissertation Growth Foundation of Ningbo University (10Y20100002), the Ningbo University Foundation (XYL10002, XK1087) and the K.C. Wong Magna Fund in Ningbo University.

References

8Hz Productions. 8hz-mp3, http://www.8hz.com/mp3/; 1998.
Achmad Z. MP3Stegz, http://sourceforge.net/projects/mp3stegz/; 2008.
Cachin C. An information-theoretic model for steganography. Information and Computation 2004;192(1):41–56.
Cha SH. Comprehensive survey on distance/similarity measures between probability density functions. International Journal of Mathematical Models and Methods in Applied Sciences 2007;1(4):300–7.
Dittmann J, Hesse D. Network based intrusion detection to detect steganographic communication channels: on the example of audio data. In: IEEE workshop on MMSP; 2004. p. 343–6.
Gailly J, Adler M. Zlib, http://zlib.net; 1995.
Katzenbeisser S, Petitcolas FAP. Information hiding techniques for steganography and digital watermarking. London: Artech House; 2000.
Kim D, Yang S, Chung J. Additive data insertion into MP3 bitstream using linbits characteristics. In: Proceedings of ICASSP; 2004. p. 181–4.
Koukopoulos D, Stamatiou Y. A watermarking scheme for MP3 audio files. International Journal of Signal Processing 2006;2(3):206–13.
Kwon G, Wang C, Lian S, Hwang S. Advanced partial encryption using watermarking and scrambling in MP3. Multimedia Tools and Applications; 2011. doi:10.1007/s11042-011-0771-8.
Lie WN, Chang LC. Robust and high-quality time-domain audio watermarking based on low-frequency amplitude modification. IEEE Transactions on Multimedia 2006;8(1):46–59.
Megias D, Herrera J, Minguillon J. A robust audio watermarking scheme based on MPEG 1 layer 3 compression. In: Communications and Multimedia Security. LNCS, vol. 2828; 2003. p. 226–38.
MPEG. Coding of moving pictures and associated audio for digital storage media at up to about 1.5 Mbit/s, part 3: audio. International Standard ISO/IEC 11172-3; 1992.
Petitcolas FAP. MP3Stego, http://www.cl.cam.ac.uk/~fapp2/steganography/mp3stego/index.html; 2002.
Platt C. UnderMP3Cover, http://sourceforge.net/projects/ump3c; 2004.
Provos N, Honeyman P. Hide and seek: an introduction to steganography. IEEE Security & Privacy 2003;1(3):32–44.
Qiao M, Sung A, Liu Q. Steganalysis of MP3Stego. In: Proceedings of IJCNN; 2009. p. 2566–71.
Thiede T, Treurniet WC, Bitto R, Schmidmer C, Sporer T, Beerends JG, et al. PEAQ - the ITU standard for objective measurement of perceived audio quality. Journal of the Audio Engineering Society 2000;48(1&2):3–29.
Westfeld A. Detecting low embedding rates. In: Proceedings of information hiding workshop. LNCS, vol. 2578; 2003. p. 324–39.
Yan D, Wang D. Huffman table swapping-based steganography for MP3 audio. Multimedia Tools and Applications 2011;52(2&3):291–305.
Yan D, Wang R, Zhang L. Quantization step parity-based steganography for MP3 audio. Fundamenta Informaticae 2009;97(1&2):1–14.
Young E. OpenSSL crypto library manual, https://www.openssl.org/docs/crypto/des.html; 1995.

Diqun Yan received B.S. and M.S. degrees in Circuit and System from Ningbo University, China, in 2002 and 2008, respectively. He is currently a Ph.D. student at the College of Information Science and Engineering, Ningbo University. His research interests include multimedia security and digital audio processing.

Rangding Wang was born in 1962. He received his M.S. degree from the Department of Computer Science and Engineering, Northwest Polytechnic University, Xian, in 1987, and his Ph.D. degree from the School of Electronic and Information Engineering, Tongji University, Shanghai, in 2004. Since 2004, he has been a professor at the College of Information Science and Engineering, Ningbo University. His current research focuses on speech coding, digital watermarking and multimedia signal processing.

Xianmin Yu received his B.S. degree from Wuhan Polytechnic University, China, in 2010. He is currently an M.E. student at the College of Information Science and Engineering, Ningbo University. His current research interests include digital audio coding and steganalysis.

Jie Zhu received his B.S. degree from Ningbo University in 2009. He is currently an M.E. student at the College of Information Science and Engineering, Ningbo University. His current research interests include digital audio coding and steganography.