Scrambling–embedding for JPEG compressed image


Signal Processing

Contents lists available at ScienceDirect

journal homepage: www.elsevier.com/locate/sigpro


SimYing Ong a,*, KokSheik Wong a, Kiyoshi Tanaka b

a Faculty of Computer Science and Information Technology, University of Malaya, Malaysia
b Faculty of Engineering, Shinshu University, Japan

Article info

Article history: Received 27 December 2013. Received in revised form 3 October 2014. Accepted 23 October 2014.

Abstract

This paper proposes a novel reversible unified information hiding method for JPEG compressed images that achieves scrambling and external data insertion simultaneously. The properties of the DC coefficients, the energy of the AC coefficient blocks, and the runs of zero AC coefficients are exploited. Two techniques are proposed to degrade the perceptual quality while manipulating the DCT coefficients for data embedding. Most existing unified information hiding methods are designed to operate in the spatial domain, and applying them directly to a compressed domain such as JPEG leads to a large bitstream size increment. Thus, the proposed techniques aim to minimize the bitstream size increment while offering performance comparable to the current state-of-the-art methods. Experiments are conducted on the UCID (Uncompressed Color Image Database) and standard test images to measure the performance of the proposed unified method. Results suggest that, for the UCID database, the effective data payload ranges from 32 to 10238 bits and the SSIM value ranges from 0.0548 to 0.9432. The proposed method is also compared with the conventional methods in terms of effective data payload, image quality degradation, bitstream size increment and robustness against multiple sketch attacks. © 2014 Published by Elsevier B.V.


Keywords: Scrambling–embedding; Scalable data payload; Progressive quality degradation; DCT; Minimizing bitstream size

1. Introduction

Information hiding in multimedia content is generally divided into two major disciplines, namely, encryption and external data insertion. Encryption conceals the perceptual meaning of the multimedia content by making it unintelligible (i.e., resembling noise) [1,2]. As an alternative to encryption, scrambling is proposed to overcome certain application and implementation constraints such as computational power, memory, and format compliance, at the cost of reduced robustness [3]. Scrambling is also commonly referred to as lightweight encryption, perceptual encryption, or transparent encryption in the literature.

On the contrary, external data insertion embeds data into multimedia content to serve a specific purpose. For instance, metadata (e.g., hyperlink, content label) is inserted into multimedia content for enrichment [4] or digital archiving purposes [5]. For the purpose of error concealment, recovery information is inserted into the multimedia content for patching the lost or corrupted part(s) [6,7]. In digital rights management, a fingerprint is embedded into the content to trace its illegal distributor [8], and so forth. Although external data insertion and encryption are traditionally pursued separately, in recent years researchers have been exploring the unification of both disciplines to achieve multiple purposes within a single application [8–11]. This unification is crucial in the digital era because, on one hand, current technology enables and simplifies various operations that can be performed on digital content, such as recording, copying, modifying, and transmitting. On the other hand, technology also makes digital content vulnerable to various forms of threat. Thus, unified information hiding can be considered to cope with the different needs of today's applications.

One possible application of unified information hiding is document management in a typical office setting. The manager stores confidential documents in encrypted form to prevent unauthorized viewing by irrelevant personnel. However, it is challenging for the secretary to manage and categorize these encrypted confidential documents without access to the corresponding plaintext. Therefore, metadata can be added to the encrypted documents by means of data embedding to allow the secretary to better manage them. Thus, a unified information hiding method can provide controlled access to the user [12].

In addition, when the perceptual quality requirement of the traditional (i.e., single-purpose) data insertion method is relaxed, scalability in both data payload (i.e., number of embeddable bits in the content) and quality degradation can be achieved. That is, the amount of embeddable data and the permissible distortion of the output image can be controlled [11,13]. Scalability is handy for image warehouse and video-on-demand services. From the perspective of a seller, images/videos with different degrees of distortion can be generated to attract potential buyers. For the buyer, only one download is needed for both preview and actual ownership. Furthermore, users can access different versions of the image/video depending on their subscriptions (e.g., regular, premium). In addition, metadata such as copyright information and user information can also be embedded into the processed image/video.

Another possible application of unified information hiding is the use of scrambling for camouflaging the embedded data. In our proposed method, the output can be made to resemble noise. When an intruder encounters a scrambled (i.e., totally distorted) document, the first thing that comes to mind is the plaintext, i.e., the decrypted image. Thus, the intruder will focus on descrambling the image and is likely to neglect the message that was embedded through the scrambling process. Unified information hiding has the potential to meet the needs of the aforementioned applications, but a more detailed framework needs to be devised for actual implementations.

As of this writing, the literature on unified information hiding is limited, and the most relevant methods are reviewed here. Kundur et al. proposed a framework for a joint fingerprinting and encryption method for digital rights management [8]. In particular, the signs of the AC coefficients are modified during transmission to prevent unauthorized viewing. At the receiver side, selected AC coefficients are decrypted while the undecrypted AC coefficients are considered as the fingerprint. Although this method does not cause bitstream size increment, it is irreversible, i.e., the original compressed content cannot be perfectly reconstructed. Zhang et al. proposed a separable data hiding method that embeds data into the LSB (least significant bit) of the selected encrypted pixels [9]. In the decoding process, the correlation among pixels is considered to reconstruct the original image. This method offers the separable property, where decryption and data extraction can be carried out in any order. However, [9] does not guarantee reversibility and its data payload is low. Ma et al. proposed the RRBE (Reserving Room Before Encryption) method, where the room for data embedding is reserved before encryption takes place [10]. In particular, the LSBs of the selected pixels are embedded (by using a reversible data insertion method) into the reserved pixels, and the vacated space is utilized to embed the external data. This method is reversible and its data payload is higher than that of [9], but the data payload depends on the deployed reversible data insertion method. Recently, Fujiyoshi proposed a separable unified method that encrypts and vacates room for data embedding using histogram permutation [14]. Although this method is reversible, the data payload is limited and it is designed to operate in the spatial domain. Ong et al. proposed a data insertion method that purposely distorts image quality through reversible data embedding [12]. An image is partitioned into non-overlapping blocks and the histogram of each image block is processed based on a key-dependent function. However, this method is non-separable and not suitable for compressed images because it expands the bitstream.

In this work, a reversible unified information hiding method is proposed for JPEG compressed images. To the best of our knowledge, this is the first method that attempts to embed data through a scrambling process in the DCT domain. The proposed methods are designed to achieve scalability in data payload and perceptual quality degradation while minimizing bitstream size increment. The rest of this paper is structured as follows: Section 2 puts forward the encoding and decoding steps of the proposed method. Enhancements of the proposed method are discussed in Section 3. Experimental results are presented in Section 4, and Section 5 concludes this paper.

* Corresponding author. E-mail addresses: [email protected] (S. Ong), [email protected] (K. Wong), [email protected] (K. Tanaka).

http://dx.doi.org/10.1016/j.sigpro.2014.10.028
0165-1684/© 2014 Published by Elsevier B.V.

Please cite this article as: S. Ong, et al., Scrambling–embedding for JPEG compressed image, Signal Processing (2014), http://dx.doi.org/10.1016/j.sigpro.2014.10.028

S. Ong et al. / Signal Processing


2. Proposed methods

In the JPEG encoder, the input image $A$ with $M \times N$ pixels is divided into non-overlapping $8 \times 8$ blocks $B(m,n)$, where $m \in \{1, 2, \ldots, M/8\}$ and $n \in \{1, 2, \ldots, N/8\}$. Each block is transformed into its frequency representation using the DCT (Discrete Cosine Transform) and the output is then quantized. Each quantized coefficient in $B(m,n)$ is denoted by $B(m,n)_{u,v}$, where $u, v \in \{1, 2, \ldots, 8\}$. The upper-left element, i.e., $B(m,n)_{1,1}$, is called the DC coefficient while the rest are called the AC coefficients. The DC coefficients are predicted and the prediction errors are coded, i.e., DPCM (Differential Pulse-Code Modulation). The AC coefficients in a block, on the other hand, are scanned in zigzag order and run-length coded, where the number of zero coefficients preceding each non-zero coefficient (or the end of the block) is counted [15,16]. The coefficients are encoded as pairs, i.e., (zero run-length, non-zero AC coefficient), by using Huffman codewords. For the rest of the discussion, such a pair is referred to as a ZRV (zero-run value) pair.

We exploit the properties of both the DC and AC coefficients to realize scrambling–embedding. Two techniques are proposed, and they are applied in three non-overlapping areas within a JPEG image. Fig. 1 shows the flowchart of the proposed method. First, we propose a scrambling–embedding technique that manipulates the ZRV pairs (i.e., Proposed Method I) in Section 2.1. Next, the energy of the AC coefficients (i.e., sum of magnitudes) and the DC coefficient of each block are collected into two $(M/8) \times (N/8)$ arrays. Another scrambling–embedding technique is proposed and applied to the DC coefficient array (i.e., Proposed Method II) and the AC block energy array (i.e., Proposed Method III). The encoding and decoding processes for Proposed Method II and III are detailed in Sections 2.2 and 2.3, respectively.

Fig. 1. Encoding process for Proposed Method I, II, and III.
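To make the quantities used in the rest of this section concrete, the following minimal sketch derives the ZRV pairs and the AC block energy from the 63 AC coefficients of one quantized block, assumed to be already arranged in zigzag order. The function names `zrv_pairs` and `ac_energy` are illustrative only, and the sketch ignores details of the actual JPEG entropy coder (for example, the special symbol used for runs of 16 zeros).

```python
def zrv_pairs(ac_zigzag):
    """Return (zero_run, value) pairs for AC coefficients given in zigzag order."""
    pairs = []
    run = 0
    for coeff in ac_zigzag:
        if coeff == 0:
            run += 1                    # extend the current run of zeros
        else:
            pairs.append((run, coeff))  # a non-zero value closes the run
            run = 0
    # Trailing zeros correspond to the end of the block and form no pair.
    return pairs


def ac_energy(ac_zigzag):
    """Sum of magnitudes of the AC coefficients of one block."""
    return sum(abs(c) for c in ac_zigzag)


# Hypothetical block: four non-zero AC coefficients followed by zeros.
block_ac = [12, -7, 0, 0, 3, 0, 1] + [0] * 56
print(zrv_pairs(block_ac))  # [(0, 12), (0, -7), (2, 3), (1, 1)]
print(ac_energy(block_ac))  # 23
```

In this toy block the front pairs have short runs and large magnitudes while the later pairs have longer runs and smaller magnitudes, which is the behaviour that Section 2.1 relies on.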

2.1. Proposed method I: ZRV pairs

The recommended quantization table in the JPEG compression standard takes the HVS (Human Visual System) into account to achieve a balanced rate–distortion trade-off [16]. In particular, the visual information held by the high frequency sub-bands is perceptually less significant than that held by the low frequency sub-bands, because the human visual system is more sensitive to overall (i.e., major) changes than to fine/subtle variations. Hence, the recommended quantization table utilizes small divisors for the low frequency sub-bands to retain the perceptually significant information and large divisors for the high frequency sub-bands to remove the insignificant information. As a result, there are fewer zeros and larger (in terms of magnitude) AC coefficients in the low frequency sub-bands, while there are more zeros and smaller AC coefficients in the high frequency sub-bands. Since the AC coefficients are handled in zigzag order and run-length coded, ZRV pairs at the front end often have large magnitudes and short zero-runs while ZRV pairs at the back end have small magnitudes and long zero-runs.

Fig. 2. Statistics of ZRV properties [%] for six standard test images. (1) Zero-run of the first pair is smaller than zero-run of the last pair; (2) Magnitude of the first pair is larger than magnitude of the last pair; (3) Sum of zero-run for the first two pairs is smaller than the sum of zero-run for the last two pairs; (4) Sum of magnitude for the first two pairs is larger than the sum of magnitude for the last two pairs.

Fig. 2 records the statistics of zero-run and value (i.e., magnitude) for both the first and last ZRV pairs based on six standard test images [17] (i.e., Airplane, Baboon, Boat, Lake, Lenna and Peppers) compressed at a quality factor of 80. The zero-run of the first ZRV pair is shorter (i.e., smaller) than that of the last ZRV pair 73.9% of the time. In addition, the coefficient in the first ZRV pair has a larger magnitude than that of the last ZRV pair 82.1% of the time. These trends are more pronounced when the sum over the two front ZRV pairs is compared with the sum over the two last ZRV pairs. For the same set of images, the percentages increase to 88.5% and 84.9% for the sum of zero-runs and the sum of magnitudes, respectively. Therefore, it is observed in JPEG that:

Property 1. The ZRV pairs in a quantized coefficient block have large magnitudes and short zero-runs at the front, and small magnitudes and long zero-runs at the end.

This property is exploited in this section to realize scrambling–embedding. Suppose there are $X$ ZRV pairs in a block, where each ZRV pair $Z_x$ contains two elements, namely the zero-run $r_x$ and the value $v_x$, for $x \in \{1, 2, \ldots, X\}$. Due to the use of a binary representation and the limited number of ZRV pairs (i.e., $X \le 63$), only $2^{\beta}$ ZRV pairs are considered in the encoding process for $\beta \in \{2, 3, 4, 5\}$. Note that $\beta$ depends on $X$, with $\beta \le \lfloor \log_2 X \rfloor$. In the proposed method, only the $d = 2^{\beta-1}$ ZRV pairs at the front and the $2^{\beta-1}$ ZRV pairs at the back are utilized, while the middle pairs are left unmodified. For instance, if $X = 10$, then $\beta = \lfloor \log_2 10 \rfloor = 3$. Here, the first four ZRV pairs and the last four ZRV pairs are utilized, and the remaining two middle pairs are left unmodified, as shown in Fig. 3. To ease the discussion, we omit the middle pairs in the illustrations and show only the chosen $2^{\beta}$ ZRV pairs.

Every two ZRV pairs are collected into a group, and each group is denoted by $G_c$ for $c \in \{1, 2, \ldots, d\}$. Using the same example given in Fig. 3, the grouping process is illustrated in Fig. 4, where $Z_1(r_1, v_1)$ and $Z_2(r_2, v_2)$ are grouped as $G_1$, $Z_3(r_3, v_3)$ and $Z_4(r_4, v_4)$ are grouped as $G_2$, and so forth. The groups in a block are shifted and flipped to create a total of $2d$ unique states for data embedding purposes. Here, each state is denoted by $S_\gamma$ for $\gamma \in \{1, 2, \ldots, 2d\}$. First, the groups in a block are shifted as follows:

$$S_\gamma = S_{\gamma-1} \ll 1 \tag{1}$$

for $\gamma \in \{2, 3, \ldots, d\}$, where $\ll$ denotes the process of shifting by one group to the left. Note that $S_1 = \{G_1, G_2, \ldots, G_d\}$ is the initial state, i.e., the original position, as in $S_1$ of Fig. 4. An example of Proposed Method I for $d = 4$ is shown in Fig. 4(a).

Fig. 3. Utilized and unutilized ZRV pairs for $X = 10$.
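The selection and grouping steps above can be sketched as follows. This is only an illustration under stated assumptions: the helper names `select_and_group` and `shift_left` are hypothetical, and the sketch always takes the largest admissible $\beta$, i.e., $\beta = \min(5, \lfloor \log_2 X \rfloor)$, which is one choice permitted by the constraint $\beta \le \lfloor \log_2 X \rfloor$.

```python
def select_and_group(pairs, max_beta=5):
    """Pick 2**beta ZRV pairs (front and back halves) and group them two by two."""
    X = len(pairs)
    if X < 4:
        return None  # fewer than 2**2 pairs: not usable in this sketch
    beta = min(max_beta, X.bit_length() - 1)  # floor(log2(X)), capped at max_beta
    d = 2 ** (beta - 1)
    front, middle, back = pairs[:d], pairs[d:X - d], pairs[X - d:]
    chosen = front + back                     # the middle pairs stay untouched
    groups = [chosen[i:i + 2] for i in range(0, 2 * d, 2)]
    return beta, groups, middle


def shift_left(groups, k=1):
    """Shift the list of groups k positions to the left, as in Eq. (1)."""
    k %= len(groups)
    return groups[k:] + groups[:k]


# Ten hypothetical ZRV pairs, as in the X = 10 example.
pairs = [(i, 10 - i) for i in range(10)]
beta, groups, middle = select_and_group(pairs)
print(beta, len(groups), len(middle))          # 3 4 2
print(shift_left(["G1", "G2", "G3", "G4"]))    # ['G2', 'G3', 'G4', 'G1']
```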

Next, the groups in a block are flipped to produce the state $S_{d+1} = \{G_d, G_{d-1}, \ldots, G_1\}$, i.e., the indices are in decreasing order. From this particular state, the unique states $S_\gamma$ for $\gamma \in \{d+2, d+3, \ldots, 2d\}$ are produced by invoking Eq. (1). Using the same example, the flipping and shifting processes are illustrated in Fig. 4(b), and the generated unique states are recorded in Table 1. Each unique state is considered to represent $\lfloor \log_2 2d \rfloor = \beta$ bits of the external data, viz., the ZRV pairs are arranged to form the state determined by the external data to be embedded. Based on the last column of Table 1, $S_1$ encodes '000', $S_2$ encodes '001', …, and $S_8$ encodes '111'. That is, the ZRV pairs are left unmodified (i.e., $S_1 = [G_1, G_2, G_3, G_4]$) if the external data is '000'. On the other hand, if the external data is '010', the ZRV pairs are modified to $S_3 = [G_3, G_4, G_1, G_2]$, and so forth. To increase the robustness of Proposed Method I against unauthorized decoding, the association between the states and bit sequences can be randomized.

To extract the embedded data and recover the original image, the original state of every manipulated block $B'(m,n)$ must be identified as the reference state. The AC coefficients are grouped and the unique states $S'_\gamma$ are generated using Eq. (1). For this purpose, Property 1 is exploited to locate the original state of each block by computing the product of run and magnitude $\Gamma(S'_\gamma)$ for each $S'_\gamma$ as follows:

$$\Gamma(S'_\gamma) = \left[(r_1 + r_2 + 2)\,(|v_{d-1}| + |v_d|)\right] - \left[(|v_1| + |v_2|)\,(r_{d-1} + r_d + 2)\right]. \tag{2}$$
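The state construction, embedding and recovery described so far can be illustrated with the short sketch below. It is not the authors' implementation: the function names are hypothetical, the block is assumed to satisfy Property 1 and to contain distinct groups, and the `score` function is an assumed variant of Eq. (2) that compares the first and last groups of a candidate state, with its sign chosen so that the arrangement agreeing best with Property 1 scores highest.

```python
def make_states(groups):
    """States S1..S2d: d left shifts of the groups, then d left shifts of the flipped list."""
    d = len(groups)
    shifted = [groups[k:] + groups[:k] for k in range(d)]
    flipped = groups[::-1]
    return shifted + [flipped[k:] + flipped[:k] for k in range(d)]


def score(state):
    """Assumed contrast between first and last group (sign convention is an assumption)."""
    first, last = state[0], state[-1]
    run_first = sum(r for r, _ in first)
    mag_first = sum(abs(v) for _, v in first)
    run_last = sum(r for r, _ in last)
    mag_last = sum(abs(v) for _, v in last)
    return mag_first * (run_last + 2) - (run_first + 2) * mag_last


def embed(groups, bits):
    """Rearrange the groups into the state whose index equals the bit string."""
    return make_states(groups)[int(bits, 2)]


def recover(received):
    """Return (original groups, embedded bits) for a block that satisfies Property 1."""
    original = max(make_states(received), key=score)
    index = make_states(original).index(received)
    beta = (2 * len(received)).bit_length() - 1  # log2(2d)
    return original, format(index, "0{}b".format(beta))


# Hypothetical block with d = 4 groups of two (run, value) pairs each.
groups = [[(0, 20), (0, 12)], [(1, 8), (1, 5)], [(2, 3), (3, 2)], [(5, 1), (7, 1)]]
marked = embed(groups, "010")          # S3 = [G3, G4, G1, G2], as in Table 1
restored, bits = recover(marked)
print(bits)                            # 010
print(restored == groups)              # True
```

Because the shifted and flipped arrangements of any state form the same set of $2d$ arrangements, the decoder can regenerate every candidate from the received block alone and then read the embedded bits from the position of the received arrangement among the states of the recovered original.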

Since $r_1 + r_2 = 0$ or $r_{d-1} + r_d = 0$ is possible, $+2$ is introduced in the equation. The result for each state is collected. The original state always yields the largest $\Gamma$ because, in the original state, (a) the magnitudes of the first group and the runs of the last group are large, and (b) the runs of the first group and the magnitudes of the last group are small. Once the original state is identified, the manipulated block $B'(m,n)$ can be recovered to its original form $B(m,n)$ by reordering the ZRV pairs. The table associating states with bit sequences is then consulted to extract the embedded data. Decoding is performed on all blocks and, finally, the recovered image $A'$ is obtained.

Fig. 4. Illustration of the ZRV groups shifting process for $d = 4$. $S_1$ in (a) is flipped to produce $S_5$ in (b). (a) Shifting from $S_1$ to $S_4$, (b) Shifting from $S_5$ to $S_8$.

Table 1. List of states for the example given in Fig. 4.

| $S_\gamma$ | States | Encoded bits |
|---|---|---|
| $S_1$ | {G1, G2, G3, G4} | 000 |
| $S_2$ | {G2, G3, G4, G1} | 001 |
| $S_3$ | {G3, G4, G1, G2} | 010 |
| $S_4$ | {G4, G1, G2, G3} | 011 |
| $S_5$ | {G4, G3, G2, G1} | 100 |
| $S_6$ | {G3, G2, G1, G4} | 101 |
| $S_7$ | {G2, G1, G4, G3} | 110 |
| $S_8$ | {G1, G4, G3, G2} | 111 |

Nonetheless, not all blocks satisfy Property 1. For example, in Lenna, approximately 6% of the blocks fail the assumed property. To handle these failure blocks, two solutions are considered: (a) preprocess these blocks to fit the assumed property; and (b) utilize side information to indicate the failure blocks. Specifically, preprocessing removes ZRV pair(s) from the back (with respect to the zigzag order) until the assumed property is satisfied, which makes the proposed method irreversible but rewritable (footnote 1). If reversibility is important for the application in question, the second option can be pursued instead. In particular, the locations of the failure blocks are recorded as side information and their ZRV pairs are simply shuffled. In that case the data payload is affected because (a) failure blocks are not utilized and hence do not contribute to the data payload; and (b) part of the data payload is reserved to store the side information. Hereinafter, the former option is referred to as the rewritable mode and the latter as the reversible mode.

Footnote 1. Here, rewritable implies that $A'$ can be re-utilized in the proposed method without repeating the preprocessing. In other words, image quality does not further degrade in subsequent utilization.

Last but not least, the parameters agLim and abLim are introduced to control the quality and data payload of the proposed method. The first parameter, agLim, limits the number of groups (i.e., sets the maximum value) that can be utilized in every block during the encoding process. For example, suppose image $A$ consists of $5 \times 5$ coefficient blocks and the number of groups that can be formed in each block is indicated in Fig. 5(a), where the symbol ⊠ denotes a failure block. Fig. 5(b) shades the blocks considered by Proposed Method I when agLim = 4 and indicates the actual number of groups formed in each considered block. Note that more ZRV pairs may be available for utilization (e.g., the block located at the 1st row and 3rd column in Fig. 5(a) has 16 groups), but at most agLim = 4 groups are considered. The second parameter, abLim, limits the number of blocks to consider. Here, only the blocks with at most abLim groups of ZRV pairs are selected for encoding. For instance, when abLim = 8, only the blocks containing at most 8 groups of ZRV pairs (viz., $4 \le X \le 31$) are utilized for scrambling–embedding. The result of imposing abLim = 8 on $A$ is shown in Fig. 5(c), where only the shaded blocks are considered for encoding.

Fig. 5. Example of limiting the utilization of groups and blocks using abLim and agLim. (a) Maximum number of groups (i.e., $2^{\beta}$) in each block, (b) Maximum number of groups utilized in each block when agLim = 4, (c) Blocks utilized when abLim = 8.

2.2. Proposed method II: DC coefficients

By construction, the DC coefficient is the average intensity value of its corresponding $8 \times 8$ pixel block. Therefore, the DC coefficients themselves are sufficient to sketch the original image. An example is shown in Fig. 6(a), which suggests the leakage of image information [18]. It is therefore crucial to scramble the DC coefficients to prevent unauthorized viewing. In this section, a permutation-based technique is proposed to scramble the DC coefficients while embedding data. First, the DC coefficients are gathered into a matrix $D$, where $D(m,n)$ denotes the DC coefficient originating from the $(m,n)$-th block for $m \in \{1, 2, \ldots, M/8\}$ and $n \in \{1, 2, \ldots, N/8\}$. Fig. 6(b) shows the distribution of $D(m,n) - D(m,n+1)$, i.e., the horizontal differences, which follows the Laplacian distribution. The same trend is also observed for the vertical differences. Hence, Property 2 holds true:

Property 2. DC coefficients are highly correlated in both the horizontal and vertical directions, i.e., $D(m,n) \approx D(m \pm \delta_m, n \pm \delta_n)$ for small $\delta_m$ and $\delta_n$.

Fig. 6. Sketch of Lenna image using the DC coefficients and distribution of the difference between neighboring DC coefficients. (a) DC image, (b) Distribution of difference in DC values.

This property is exploited in both directions in Proposed Method II to achieve scrambling–embedding using the DC coefficients. First, the 1st row of $D$ is left unmodified and the encoding proceeds from the 2nd row onwards. Each row of $D$ is divided into non-overlapping groups (i.e., $\mu$ groups in total), and each group is denoted by $H_f$ for $f \in \{1, 2, \ldots, \mu\}$. Next, these groups are permuted to generate unique states $\Upsilon_\zeta$, where $\zeta \in \{1, 2, \ldots, \mu!\}$. Finally, each state is utilized to encode $\lfloor \log_2 \mu! \rfloor$ bits of the external data. In other words, each row of $D$ is rearranged according to the external data. The same operations are applied in the vertical direction. An illustration for $\mu = 4$ is shown in Fig. 7(a). Table 2 records all possible states when $\mu = 4$ and their encoded bit sequences. For example, the row is modified from $\Upsilon_1 = [H_1, H_2, H_3, H_4]$ to $\Upsilon_{11} = [H_1, H_4, H_2, H_3]$ to embed '1010'. Note that the association between the states and the bit sequences can be shuffled to increase the robustness of the proposed method.

Fig. 7. Arrangement of DC coefficient groups. (a) Division of DC coefficients into groups $H_f$, (b) Arranging DC groups in circular representation.
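As an illustration of the row-wise embedding just described, the sketch below splits one row of $D$ into $\mu$ groups and rearranges them according to the bits to be embedded. It is a simplified example rather than the paper's encoder: the function names are hypothetical, the row length is assumed to be a multiple of $\mu$, and the states are indexed in the lexicographic order produced by `itertools.permutations`, which differs from the ordering listed in Table 2.

```python
from itertools import permutations
from math import factorial


def split_groups(row, mu):
    """Divide a row into mu non-overlapping groups of equal size."""
    size = len(row) // mu  # assumes len(row) is a multiple of mu
    return [row[k * size:(k + 1) * size] for k in range(mu)]


def bits_per_row(mu):
    """floor(log2(mu!)): number of bits one permuted row can carry."""
    return factorial(mu).bit_length() - 1


def embed_row(row, mu, bits):
    """Rearrange the groups of a row into the state selected by the bit string."""
    if len(bits) != bits_per_row(mu):
        raise ValueError("expected exactly {} bits".format(bits_per_row(mu)))
    order = list(permutations(range(mu)))[int(bits, 2)]
    groups = split_groups(row, mu)
    return [value for g in order for value in groups[g]]


# Hypothetical row of eight DC coefficients, mu = 4 groups of two.
row = [50, 52, 60, 61, 70, 73, 80, 82]
print(bits_per_row(4))              # 4
print(embed_row(row, 4, "0000"))    # [50, 52, 60, 61, 70, 73, 80, 82]
print(embed_row(row, 4, "0001"))    # [50, 52, 60, 61, 80, 82, 70, 73]
```

Only the first $2^{\lfloor \log_2 \mu! \rfloor}$ states can be addressed by a bit string, which matches the 16 states with assigned bit sequences when $\mu = 4$.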

Next, distortion is controlled by restricting (a) the scope of permutation and (b) the number of states to consider. Note that a new (unique) state is generated by permuting the groups $H_f$. Within each generated state, the distance (in terms of modulo arithmetic) between every two consecutive groups (in terms of their original indices) must not exceed the parameter $\Psi$, where $1 \le \Psi \le \mu - 1$. Otherwise, the state in question is ignored. For instance, Fig. 7(b) shows a valid state for $\Psi = 1$ because the distance between every two consecutive groups, namely $H_1$ to $H_2$, $H_2$ to $H_3$ and $H_3$ to $H_4$, is $\max\{1, 1, 1\} = 1$. Note that the permutation reduces to the conventional cyclic shift process when $\Psi = 1$. Therefore, distortion can be controlled by tuning the parameter $\Psi$. Table 2 records the distance of each consecutive pair (see the 4th column) in every state for the case of $\mu = 4$. The output image quality can be further controlled by using the parameter $\varphi$ when $\Psi = 1$, where $\varphi$ limits the number of states involved in the encoding process. If $\varphi = 2$, only $\Upsilon_1$ and $\Upsilon_2$ (or any two states achievable by $\Psi = 1$) are considered. Nonetheless, the utilization of $\Psi$ and $\varphi$ also reduces the data payload because fewer states are considered.

Table 2. List of all states and their corresponding assigned bit sequences.

| $\Upsilon_\zeta$ | States | Encoded bits | Distances | $\Psi$ |
|---|---|---|---|---|
| $\Upsilon_1$ | {H1, H2, H3, H4} | 0000 | 1, 1, 1 | 1 |
| $\Upsilon_2$ | {H4, H1, H2, H3} | 0001 | 1, 1, 1 | 1 |
| $\Upsilon_3$ | {H3, H4, H1, H2} | 0010 | 1, 1, 1 | 1 |
| $\Upsilon_4$ | {H2, H3, H4, H1} | 0011 | 1, 1, 1 | 1 |
| $\Upsilon_5$ | {H1, H3, H4, H2} | 0100 | 2, 1, 1 | 2 |
| $\Upsilon_6$ | {H4, H2, H3, H1} | 0101 | 2, 1, 2 | 2 |
| $\Upsilon_7$ | {H3, H1, H2, H4} | 0110 | 2, 1, 2 | 2 |
| $\Upsilon_8$ | {H2, H4, H1, H3} | 0111 | 2, 1, 2 | 2 |
| $\Upsilon_9$ | {H1, H2, H4, H3} | 1000 | 1, 2, 3 | 3 |
| $\Upsilon_{10}$ | {H1, H3, H2, H4} | 1001 | 2, 3, 2 | 3 |
| $\Upsilon_{11}$ | {H1, H4, H2, H3} | 1010 | 3, 2, 1 | 3 |
| $\Upsilon_{12}$ | {H1, H4, H3, H2} | 1011 | 3, 3, 3 | 3 |
| $\Upsilon_{13}$ | {H2, H1, H3, H4} | 1100 | 3, 2, 1 | 3 |
| $\Upsilon_{14}$ | {H2, H1, H4, H3} | 1101 | 3, 3, 3 | 3 |
| $\Upsilon_{15}$ | {H2, H3, H1, H4} | 1110 | 1, 2, 3 | 3 |
| $\Upsilon_{16}$ | {H2, H4, H3, H1} | 1111 | 2, 3, 2 | 3 |
| $\Upsilon_{17}$ | {H3, H1, H4, H2} | – | 2, 3, 2 | 3 |
| $\Upsilon_{18}$ | {H3, H2, H1, H4} | – | 3, 3, 3 | 3 |
| $\Upsilon_{19}$ | {H3, H2, H4, H1} | – | 3, 2, 1 | 3 |
| $\Upsilon_{20}$ | {H3, H4, H2, H1} | – | 1, 2, 3 | 3 |
| $\Upsilon_{21}$ | {H4, H1, H3, H2} | – | 1, 2, 3 | 3 |
| $\Upsilon_{22}$ | {H4, H2, H1, H3} | – | 2, 3, 2 | 3 |
| $\Upsilon_{23}$ | {H4, H3, H1, H2} | – | 3, 2, 1 | 3 |
| $\Upsilon_{24}$ | {H4, H3, H2, H1} | – | 1, 1, 3 | 3 |

During the decoding process, each scrambled–embedded row is divided into $\mu$ groups and permuted exhaustively to generate all possible states $\Upsilon'_\zeta$. Here, the original state of every row is identified to enable image recovery and correct data extraction. For that purpose, the sum of differences $\varepsilon(\Upsilon'_\zeta)$ is calculated using the DC coefficients in the current and previous rows for every possible state:

$$\varepsilon(\Upsilon'_\zeta) = \sum_{j=1}^{N/8} \left| D_\zeta(i,j) - D(i-1,j) \right|, \tag{3}$$

where $D_\zeta(i,j)$ depends on the grouping of the DC coefficients and the state $\zeta$. The state with the smallest value of $\varepsilon(\Upsilon'_\zeta)$ is declared as the original state (Property 2). Since the DC coefficients are correlated vertically, the sum of differences between the original state and the previous row should be the smallest. Hence, the DC coefficients can be restored to their original positions, and the process is repeated for all rows. Since the original state can be identified, the embedded data can also be extracted correctly. Nonetheless, some rows in $D$ (e.g., those from areas of similar texture) fail to satisfy Property 2 and hence cannot be restored perfectly. These unfit rows are identified using Eq. (3) in the preprocessing step and their indices are recorded as side information. Here, they are merely permuted without embedding data. Similar steps are performed for all columns.

2.3. Proposed method III: AC block energy

The sum of magnitudes of the AC coefficients $\Lambda(m,n)$ for the $(m,n)$-th $8 \times 8$ DCT block is calculated as follows:

$$\Lambda(m,n) = \left( \sum_{u=1}^{8} \sum_{v=1}^{8} \left| B(m,n)_{u,v} \right| \right) - \left| B(m,n)_{1,1} \right|. \tag{4}$$

Due to the nature of the DCT, $\Lambda(m,n)$ yields a large value for a complex or edge block, and vice versa. Leakage of this information may allow an intruder to derive additional image information from the scrambled image [19]. Thus, it is crucial to permute the AC coefficient blocks to prevent unauthorized viewing. Fig. 8 shows the distribution of $\Lambda(m,n) - \Lambda(m,n-1)$. The distribution indicates that neighboring blocks have similar energies, and hence they are highly correlated. A similar trend is also observed for the vertical direction, i.e., $\Lambda(m,n) - \Lambda(m-1,n)$. In particular, the following property in JPEG is considered for the rest of this section:

Property 3. The sums of magnitudes of the quantized AC coefficient blocks are highly correlated in both the horizontal and vertical directions, i.e., $\Lambda(m,n) \approx \Lambda(m \pm \delta_m, n \pm \delta_n)$ for small $\delta_m$ and $\delta_n$.

Here, the permutation function detailed in Section 2.2 is applied to $\Lambda$ to realize scrambling–embedding. Again, preprocessing is first performed to identify the unfit rows (of AC energy blocks) by calculating the sum of differences between the current and previous rows. Then, the locations of the unfit rows are recorded as side information. Each of the rows that fit the assumed property is divided into groups to generate all possible states. These possible states are then utilized to encode data, where each row in $\Lambda$ is modified based on the external data to be embedded, similar to the case of Proposed Method II (DC coefficients). For decoding, the sum of differences for every possible state is calculated. The state that yields the smallest sum of differences is declared as the original state. The blocks are then recovered to their original positions and the embedded data is obtained. The encoding and decoding processes here are similar to those of Proposed Method II, and the details are omitted.

Fig. 8. Distribution for difference in AC Block Energies in Lenna.

2.4. Combination of proposed method I, II and III


Proposed Methods I, II and III operate independently on non-overlapping areas in a JPEG compressed image. Therefore, they can be combined to widen the range of image quality degradation while increasing the data payload. The performance of the combined method in terms of achievable data payload, output image quality and bitstream size increment is presented in Section 4.


Please cite this article as: S. Ong, et al., Scrambling–embedding for JPEG compressed image, Signal Processing (2014), http://dx.doi.org/10.1016/j.sigpro.2014.10.028i


3. Discussion

3.1. Improvement on data payload and robustness

The data payload for Proposed Methods II and III can be further enhanced by increasing the number of possible permutations. Fig. 9 shows the increment of the highest possible data payload when the number of groups (i.e., μ) in a row is increased. In addition, the highest possible data payload also increases if two or more rows are combined. For instance, the highest possible data payload for a single row (i.e., ν = 1) is 256 bits for a 512 × 512 image and μ = 4. After increasing ν to 5, the achievable data payload is 776 bits, which is ≈3 times that of ν = 1. Here, the data payload for every ν and μ is calculated by using the following equation:

$$\left\lfloor \frac{M/8}{\nu} \right\rfloor \left\lfloor \log_2 (\nu\mu)! \right\rfloor + \left\lfloor \log_2 \big[ ((M/8) \bmod \nu)\,\mu \big]! \right\rfloor, \qquad (5)$$

where $\lfloor (M/8)/\nu \rfloor$ is the number of groups consisting of ν rows in the image, $\lfloor \log_2 (\nu\mu)! \rfloor$ is the number of bits representable by each group, and $\lfloor \log_2 [((M/8) \bmod \nu)\mu]! \rfloor$ is the data payload of the remaining rows (i.e., fewer than ν). For example, with M = 512, μ = 4 and ν = 5, Eq. (5) gives 12 × ⌊log2 20!⌋ + ⌊log2 16!⌋ = 12 × 61 + 44 = 776 bits. Therefore, the data payload increases when μ or ν increases.

Fig. 9. Data payload for combined rows for an image of 512 × 512 pixels. The dotted line indicates the predicted possible data payload because the number of permutations is too large to be calculated.

3.2. Robustness against well-suited attacks

This subsection discusses the robustness of the proposed method if Properties 1–3 are known and exploited by the attacker. First, for Proposed Method I, the attacker can perform the reverse operation using Eq. (2). However, the AC coefficient blocks are also scrambled using Proposed Method III. In other words, the attacker can only recover the original order of the ZRV pairs within individual blocks but does not know the exact location of each block. Hence, the original image or its sketch remains concealed. Even if the attacker knows Properties 2 and 3, the parameter values μ and ν for Proposed Methods II and III still need to be guessed correctly in order to successfully perform the reverse operations.

Secondly, the tables utilized to encode the external data are kept private between the sender and receiver. Therefore, even in the event where the attacker successfully recovers the image (fully or partially), the embedded external data still cannot be extracted.

Next, substitution can be performed on ZRV pairs and DC coefficients after performing the scrambling–embedding processes to further improve robustness. Specifically, the values of the original ZRV pairs and DC coefficients are substituted by another value in the same category. Note that these operations will not cause any bitstream size increment because the substitution is done within the same category. For the case of ZRV pairs, the Huffman codewords of exactly the same length can be bijectively mapped. For example, since both Run/Size of 0/3 (i.e., codeword = 100XYZ) and 2/1 (i.e., codeword = 11100W) have the same complete Huffman codeword length of 6 bits, they can be bijectively mapped without any impact on the bitstream size. In fact, more codewords satisfying the length condition can be considered. Without knowing the substitution or mapping rules, the attacker is not able to reverse the operations.

Table 3. Performance of Proposed Method I (ZRV pairs) using various parameters in rewritable mode and reversible mode (the right-most column).

| Mode | Rewritable | Rewritable | Rewritable | Rewritable | Rewritable | Rewritable | Rewritable | Reversible |
|---|---|---|---|---|---|---|---|---|
| Parameter | agLim = 2 | agLim = 4 | agLim = 8 | abLim = 2 | abLim = 4 | abLim = 8 | agLim = abLim = 16 | agLim = abLim = 16 |
| Image quality degradation: A′ (SSIM) | 0.5580 | 0.4365 | 0.3904 | 0.9432 | 0.8040 | 0.4736 | 0.3848 | 0.4117 |
| Image quality degradation: A′ (PSNR) [dB] | 15.97 | 14.69 | 14.05 | 34.19 | 25.11 | 15.61 | 13.97 | 14.15 |
| Bitstream size changes: A′ [%] | −0.65 | −0.66 | −0.66 | −0.66 | −0.66 | −0.66 | −0.66 | 0.00 |
| Effective data payload: A′ [Bits] | 4991.84 | 7149.68 | 8578.64 | 676.15 | 2862.82 | 7342.12 | 8887.77 | 5137.65 |
| Effective data payload: A′ [BpNAC] | 0.11 | 0.16 | 0.19 | 0.01 | 0.06 | 0.16 | 0.19 | 0.12 |

A and A′ are the original and output images, respectively. For the original image A, SSIM = 0.9640 and PSNR = 37.07 dB. Effective data payload is obtained by deducting the location map of size M/8 × N/8 from the gross data payload.
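The deduction described in the table note can be checked with a small sketch. It assumes one flag bit per 8 × 8 block, which gives the stated M/8 × N/8 bits, and uses the UCID image size of 384 × 512 pixels together with the gross payload of 8209.65 bits reported for the reversible mode in Section 4.1. The function names are hypothetical.

```python
def location_map_bits(height, width, block=8):
    """One flag bit per 8x8 block: (M/8) x (N/8) bits."""
    return (height // block) * (width // block)


def effective_payload(gross_bits, height, width):
    """Gross payload minus the location map stored as side information."""
    return gross_bits - location_map_bits(height, width)


map_bits = location_map_bits(384, 512)              # 48 * 64 = 3072 bits
effective = effective_payload(8209.65, 384, 512)    # about 5137.65 bits
```

The result agrees with the 5137.65 bits listed for the reversible mode, which suggests that the location map accounts for the entire gap between gross and effective payload in that column.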


Table 4. Performance of Proposed Method II (DC coefficients) for various μ and Ψ.

| μ | Ψ | A′R (SSIM) | A′R (PSNR) [dB] | A′R×C (SSIM) | A′R×C (PSNR) [dB] | Size change A′R [%] | Size change A′R×C [%] | Payload A′R [Bits] | A′R [BpDC] | Payload A′C [Bits] | A′C [BpDC] | Payload A′R×C [Bits] | A′R×C [BpDC] |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 4 | Max | 0.5845 | 13.52 | 0.3813 | 10.54 | 0.14 | 3.95 | 153.16 | 0.05 | 172.68 | 0.06 | 325.84 | 0.11 |
| 5 | Max | 0.5912 | 13.72 | 0.3956 | 10.84 | 0.26 | 3.79 | 247.13 | 0.08 | 286.48 | 0.09 | 533.61 | 0.17 |
| 6 | Max | 0.5744 | 13.49 | 0.3740 | 10.55 | 0.30 | 4.15 | 383.65 | 0.12 | 457.52 | 0.15 | 841.16 | 0.27 |
| 7 | Max | 0.5506 | 13.22 | 0.3571 | 10.41 | 0.38 | 4.39 | 493.65 | 0.16 | 625.71 | 0.20 | 1119.35 | 0.36 |
| 8 | 1 | 0.5542 | 13.15 | 0.3596 | 10.38 | 0.04 | 4.41 | 99.00 | 0.03 | 115.16 | 0.04 | 214.16 | 0.07 |
| 8 | 2 | 0.5507 | 13.13 | 0.3584 | 10.37 | 0.18 | 4.42 | 186.32 | 0.06 | 228.02 | 0.08 | 414.34 | 0.14 |
| 8 | 3 | 0.5486 | 13.12 | 0.3569 | 10.37 | 0.27 | 4.43 | 307.30 | 0.10 | 387.41 | 0.13 | 694.72 | 0.23 |
| 8 | 4 | 0.5473 | 13.12 | 0.3560 | 10.37 | 0.31 | 4.44 | 369.29 | 0.12 | 479.50 | 0.16 | 848.79 | 0.29 |
| 8 | 5 | 0.5466 | 13.12 | 0.3552 | 10.37 | 0.34 | 4.44 | 379.20 | 0.13 | 558.04 | 0.19 | 937.24 | 0.32 |
| 8 | 6 | 0.5461 | 13.12 | 0.3551 | 10.37 | 0.36 | 4.45 | 338.65 | 0.11 | 584.05 | 0.20 | 922.70 | 0.31 |
| 8 | Max | 0.5464 | 13.13 | 0.3553 | 10.38 | 0.37 | 4.44 | 302.15 | 0.10 | 644.70 | 0.21 | 946.85 | 0.31 |

A is the original image (SSIM = 0.9640, PSNR = 37.07 dB). A′R is the output image for row operation. A′C is the output image for column operation. A′R×C is the output image for row and column operations. Effective data payload is obtained after deducting the side information, which records the location of unfit row(s) and column(s) (i.e., ⌈log2(M/8)⌉ bits per unfit row and ⌈log2(N/8)⌉ bits per unfit column).
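The side-information cost in the table note, and the BpDC normalisation used in the table, can be sketched as follows. The sketch assumes M and N are the image height and width in pixels, that there is one DC coefficient per 8 × 8 block, and that each unfit index is stored with a fixed-length code. The function names are hypothetical.

```python
from math import ceil, log2


def side_info_bits(height, width, unfit_rows, unfit_cols, block=8):
    """Bits needed to record the indices of unfit rows and columns."""
    rows, cols = height // block, width // block
    return unfit_rows * ceil(log2(rows)) + unfit_cols * ceil(log2(cols))


def bits_per_dc(payload_bits, height, width, block=8):
    """Normalise a payload by the number of DC coefficients (one per block)."""
    return payload_bits / ((height // block) * (width // block))


# Worst case for a 384 x 512 image: all 48 rows and all 64 columns are unfit.
worst_case = side_info_bits(384, 512, unfit_rows=48, unfit_cols=64)  # 672 bits
bpdc = bits_per_dc(1119.35, 384, 512)  # roughly 0.36 bits per DC coefficient
```

For the UCID image size, both ⌈log2 48⌉ and ⌈log2 64⌉ equal 6, so each unfit row or column costs 6 bits under these assumptions.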


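Table 5. Performance of Proposed Method II (DC coefficients) using various φ when μ = 4 and Ψ = 1.

| φ | 2 | 3 | 4 |
|---|---|---|---|
| Image quality degradation: A′R (SSIM) | 0.7048 | 0.6197 | 0.5871 |
| Image quality degradation: A′R (PSNR) [dB] | 15.57 | 13.96 | 13.54 |
| Image quality degradation: A′R×C (SSIM) | 0.5164 | 0.4145 | 0.3833 |
| Image quality degradation: A′R×C (PSNR) [dB] | 12.13 | 10.78 | 10.54 |
| Bitstream size changes: A′R [%] | 0.02 | 0.03 | 0.04 |
| Bitstream size changes: A′R×C [%] | 2.46 | 3.46 | 3.93 |
| Effective data payload: A′R [Bits] | 47.95 | 46.41 | 96.00 |
| Effective data payload: A′R [BpDC] | 0.02 | 0.02 | 0.03 |
| Effective data payload: A′C [Bits] | 57.28 | 57.46 | 115.12 |
| Effective data payload: A′C [BpDC] | 0.02 | 0.02 | 0.04 |
| Effective data payload: A′R×C [Bits] | 105.23 | 103.86 | 211.13 |
| Effective data payload: A′R×C [BpDC] | 0.03 | 0.03 | 0.07 |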

In addition, the number of groups per row μ and the number of rows ν to be grouped can be changed periodically. Also, instead of evenly dividing the DC coefficients into groups, irregular division can be performed. Similar modifications can be made to the AC coefficient blocks. Hence, even if the attacker exploits Properties 1–3, the aforementioned discussions suggest that the reverse operations are not straightforward for an attacker who knows nothing about the parameter values. Therefore, the proposed method is robust against well-suited attacks.
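Since μ and ν act both as secret parameters and as capacity controls, it may help to see how Eq. (5) behaves as they vary. The sketch below evaluates Eq. (5) with exact integer arithmetic. It assumes that the factorial is taken over all νμ groups of the combined rows, an interpretation that reproduces the 256-bit and 776-bit figures quoted in Section 3.1, and that M is the image height in pixels. The function names are hypothetical.

```python
from math import factorial


def floor_log2(n):
    """Exact floor(log2(n)) for a positive integer n."""
    return n.bit_length() - 1


def max_payload_bits(image_height, mu, nu):
    """Evaluate Eq. (5): highest possible payload when nu rows are combined."""
    rows = image_height // 8                      # M/8 rows of 8x8 blocks
    full_groups, leftover = divmod(rows, nu)
    bits = full_groups * floor_log2(factorial(nu * mu))
    bits += floor_log2(factorial(leftover * mu))  # contributes 0 if no rows remain
    return bits


single_row = max_payload_bits(512, mu=4, nu=1)   # 64 * floor(log2 4!) = 256
five_rows = max_payload_bits(512, mu=4, nu=5)    # 12 * 61 + 44 = 776
```

Using `int.bit_length` avoids floating-point rounding, which matters here because the factorials quickly exceed the range in which `math.log2` on a float is reliable.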


4. Experimental results


Experiments were conducted to verify the performance of each proposed method using the UCID (Uncompressed Color Image Database) database [20]. The images in the UCID database are all of size 384 × 512 or 512 × 384 pixels. These images are converted into grayscale and compressed at a quality factor of 80 for the experiments. The change in bitstream size, the effective data payload (i.e., the number of embeddable bits in the image with respect to the proposed methods), and the quality of the output (i.e., scrambled–embedded) image for Proposed Methods I, II and III are examined in the following subsections. In addition, the performance of the integrated method (i.e., Proposed Methods I, II and III combined) is evaluated. Finally, the proposed method is compared with related conventional methods.
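For reference, the PSNR values reported in the following tables can be understood through the standard definition, PSNR = 10 log10(peak² / MSE). The sketch below is a plain-Python version of that definition; it assumes 8-bit grayscale images given as nested lists of pixel values and a peak value of 255. Which reference image is used for each reported figure is not restated here, and the function name is hypothetical.

```python
import math


def psnr(original, distorted, peak=255.0):
    """PSNR in dB between two equally sized grayscale images (lists of rows)."""
    squared_error = 0.0
    count = 0
    for row_a, row_b in zip(original, distorted):
        for a, b in zip(row_a, row_b):
            squared_error += (a - b) ** 2
            count += 1
    mse = squared_error / count
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(peak ** 2 / mse)


reference = [[100, 100], [100, 100]]
off_by_one = [[101, 99], [101, 99]]
value = psnr(reference, off_by_one)  # MSE = 1, so 20 * log10(255) dB
```

Lower PSNR (and lower SSIM) indicates stronger perceptual degradation, which is the desired effect of scrambling in this setting.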


4.1. Image quality degradation, bitstream size, and data payload


Table 3 shows the average performance of Proposed Method I using the UCID database for various parameters in rewritable mode (i.e., preprocessing is applied). Note that, by design, only blocks which contain at least 2 groups of ZRV pairs (viz., at least 4 ZRV pairs) are utilized in the experiment.


Table 6. Performance of Proposed Method III (AC block energy) for various μ and Ψ.

| μ | Ψ | A′R (SSIM) | A′R (PSNR) [dB] | A′R×C (SSIM) | A′R×C (PSNR) [dB] | Payload A′R [Bits] | A′R [BpB] | Payload A′C [Bits] | A′C [BpB] | Payload A′R×C [Bits] | A′R×C [BpB] |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 4 | Max | 0.4775 | 19.58 | 0.3299 | 18.58 | 99.46 | 0.03 | 152.98 | 0.05 | 252.44 | 0.08 |
| 5 | Max | 0.4722 | 19.51 | 0.3315 | 18.59 | 117.55 | 0.04 | 210.75 | 0.07 | 328.30 | 0.11 |
| 6 | Max | 0.4471 | 19.27 | 0.3143 | 18.46 | 135.84 | 0.04 | 293.48 | 0.10 | 429.32 | 0.14 |
| 7 | Max | 0.4137 | 19.02 | 0.3014 | 18.39 | 98.21 | 0.03 | 259.96 | 0.08 | 358.17 | 0.12 |
| 8 | 1 | 0.4075 | 18.93 | 0.3002 | 18.37 | 87.26 | 0.03 | 104.20 | 0.03 | 191.46 | 0.06 |
| 8 | 2 | 0.4057 | 18.92 | 0.2995 | 18.36 | 144.99 | 0.05 | 191.09 | 0.06 | 336.08 | 0.11 |
| 8 | 3 | 0.4058 | 18.92 | 0.2995 | 18.36 | 213.51 | 0.07 | 295.89 | 0.10 | 509.41 | 0.17 |
| 8 | 4 | 0.4055 | 18.92 | 0.2994 | 18.36 | 232.96 | 0.08 | 334.32 | 0.11 | 567.28 | 0.18 |
| 8 | 5 | 0.4057 | 18.93 | 0.2992 | 18.36 | 195.25 | 0.06 | 306.50 | 0.10 | 501.75 | 0.16 |
| 8 | 6 | 0.4054 | 18.92 | 0.2991 | 18.36 | 139.43 | 0.05 | 234.82 | 0.08 | 374.25 | 0.12 |
| 8 | Max | 0.4052 | 18.92 | 0.2991 | 18.36 | 95.70 | 0.03 | 318.19 | 0.10 | 418.89 | 0.13 |

A is the original image (SSIM = 0.964, PSNR = 37.07 dB). A′R is the output image for row operation. A′C is the output image for column operation. A′R×C is the output image for row and column operations. Effective data payload is obtained after deducting the side information, which records the location of unfit row(s) and column(s) (i.e., ⌈log2(M/8)⌉ bits per unfit row and ⌈log2(N/8)⌉ bits per unfit column). No changes in bitstream size up to 2 decimal places for A′R and A′R×C.


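Table 7. Performance of Proposed Method III (AC block energy) using various φ when μ = 4 and Ψ = 1.†

| φ | 2 | 3 | 4 |
|---|---|---|---|
| Image quality degradation: A′R (SSIM) | 0.6353 | 0.5275 | 0.4777 |
| Image quality degradation: A′R (PSNR) [dB] | 21.35 | 20.09 | 19.60 |
| Image quality degradation: A′R×C (SSIM) | 0.4612 | 0.3627 | 0.3302 |
| Image quality degradation: A′R×C (PSNR) [dB] | 19.56 | 18.82 | 18.59 |
| Effective data payload: A′R [Bits] | 37.95 | 31.84 | 75.43 |
| Effective data payload: A′R [BpB] | 0.01 | 0.01 | 0.02 |
| Effective data payload: A′C [Bits] | 54.75 | 55.29 | 112.93 |
| Effective data payload: A′C [BpB] | 0.02 | 0.02 | 0.04 |
| Effective data payload: A′R×C [Bits] | 92.69 | 87.13 | 188.36 |
| Effective data payload: A′R×C [BpB] | 0.03 | 0.03 | 0.06 |

† No changes in bitstream size up to 2 decimal places for A′R and A′R×C.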


The achieved image quality (a measure of distortion) in terms of SSIM ranges from 0.3848 to 0.9432, and PSNR ranges from 13.97 to 34.19 dB. In terms of effective data payload, Proposed Method I offers up to 8887.77 bits (equivalent to 0.19 bits per non-zero AC coefficient [BpNAC]). Note that no side information is needed when operating in rewritable mode because all the blocks are utilized for scrambling–embedding. In addition, the bitstream size changes by −0.66%, on average, for all considered parameters; that is, some gain in compression is achieved due to preprocessing that removes ZRV pairs from the block. In general, it is also observed that the image quality degrades progressively and the effective data payload increases progressively when either agLim or abLim increases.

For completeness, the performance of Proposed Method I in reversible mode is also examined and the results are recorded in the last column of Table 3. On average, 7.91% of the blocks are only scrambled (i.e., without data embedding) because they fail to satisfy Property 1. Hence, a location map is needed to differentiate the scrambled–embedded blocks from the scrambled-only blocks. Although the gross data payload for Proposed Method I in reversible mode is 8209.65 bits, its effective data payload becomes 5137.65 bits after storing the location map as side information. Note that the location map is of size M/8 × N/8 bits regardless of the number of unfit blocks. Nevertheless, the bitstream size remains unchanged (up to byte alignment level) and there is no loss of information in the reconstructed image.

Similarly, Table 4 records the average results of the output image quality, changes in bitstream size, and effective data payload achieved by Proposed Method II using the UCID database. By using different parameters and types of operation (i.e., row and/or column operation), the achieved image quality in terms of SSIM ranges from 0.3551 to 0.5912, and PSNR ranges from 10.37 to 13.72 dB. The average effective data payload achieved by Proposed Method II ranges from 99.00 bits (viz., 0.03 BpDC) to 1119.35 bits (viz., 0.36 BpDC), where BpDC refers to bits per DC coefficient. In this method, the indices of unfit rows and columns are recorded as side information. The effective data payload in Table 4 is obtained after considering the side information. Here, the index of an unfit row (column) consumes ⌈log2(M/8)⌉ (⌈log2(N/8)⌉) bits. In the worst-case scenario (i.e., all rows and columns are unfit), (M/8) × ⌈log2(M/8)⌉ and (N/8) × ⌈log2(N/8)⌉ bits will be spent for side information. Overall, it is observed that the effective data payload increases when either μ or Ψ increases for both the row and column operations. In addition, the increment of bitstream size in the output image after the row operation ranges from 0.04% to 0.38%, while the increment after both row and column operations ranges from 3.79% to 4.45%. The bitstream size increment due to the row operation is comparatively low because these operations destroy the vertical correlation of the DC coefficients, which is irrelevant to DPCM in JPEG. However, column operations destroy the horizontal correlation of the DC coefficients, making DPCM ineffective and hence leading to a greater bitstream size increment.

To further control the degradation in image quality, parameter φ is utilized and the results for various φ for μ = 4 and Ψ = 1 are shown in Table 5. When φ increases, the output image quality degrades progressively due to the row and column operations, where SSIM ranges from 0.3833 to 0.7048 and PSNR ranges from 10.54 to 15.57 dB. Similarly, the effective data payload increases progressively when φ increases, with a bitstream size increment of less than 4%.

Next, Table 6 shows the average results for Proposed Method III obtained by using the AC block energy Λ. Again, the performance is measured in terms of output image quality, change in bitstream size, and effective data payload for different parameters using the UCID database. Similar to Proposed Method II, Proposed Method III is also able to achieve different levels of distortion and effective data payload. The output image quality ranges from 0.2991 to 0.4775 in terms of SSIM, and from 18.36 to 19.58 dB in terms of PSNR. The effective data payload ranges from 87.26 bits (viz., 0.03 BpB) to 567.28 bits (viz., 0.18 BpB), where BpB refers to bits per AC block. Proposed Method III also records the indices of the unfit rows and columns for image recovery and data extraction purposes. Thus, the effective data payload in Table 6 is obtained by subtracting the side information (which consumes ⌈log2(M/8)⌉ bits per unfit row and ⌈log2(N/8)⌉ bits per unfit column) from the gross data payload. Again, in the worst-case scenario (i.e., all rows and columns are unfit), (M/8) × ⌈log2(M/8)⌉ and (N/8) × ⌈log2(N/8)⌉ bits will be spent for side information. Similar to the results achieved by Proposed Method II, the image quality degrades progressively and the effective data payload increases progressively when μ and Ψ increase. Nevertheless, Proposed Method III does not induce a noticeable change in the size of the encoded bitstream for various test images (≈0%). Unlike Proposed Method II, which processes the DC coefficients, destroying the horizontal and vertical correlations at the AC block energy level does not affect the bitstream size. This is because the AC coefficients are coded independently within an 8 × 8 coefficient block in JPEG, and hence permuting the AC blocks does not affect the coding efficiency.

Similarly, the perceptual quality is controllable in Proposed Method III by using φ, and the results are recorded in Table 7. Here, the output image quality degrades progressively when φ increases. In particular, the achievable image quality ranges from 0.3302 to 0.6353 in terms of SSIM, and from 18.59 to 21.35 dB in terms of PSNR. In addition, the achievable effective data payload ranges from 31.84 to 188.36 bits when various φ are considered. Here, the effective data payload increases when φ increases, with an insignificant change in bitstream size.

The output images for Proposed Methods I, II and III, as well as the combined method (i.e., Proposed Methods I, II, and III), are shown in Fig. 10. The average results for the combined method are recorded in Table 8. As a system of combined methods, the lowest image quality of SSIM = 0.0548 or PSNR = 8.63 dB is achieved by invoking all of Proposed Methods I, II and III. On the other hand, the highest image quality of SSIM = 0.9432 or PSNR = 34.19 dB is achieved by using Proposed Method I only (see Table 3). From the results, it is found that the bitstream size increment is dominated by Proposed Method II (see Table 4), and thus the combined method achieves a bitstream size increment similar to that of Proposed Method II, which is ≈4.48%. As a system, the combined method offers, on average, 10,238.01 bits (viz., 0.05 bits per pixel) as the highest effective data payload when using the UCID database.

Fig. 10. Output images from Proposed Methods I, II, III and the proposed combined method. (a) Original image, (b) Proposed Method I (agLim = 16), (c) Proposed Method II, A′R (μ = 8 and Ψ = Max), (d) Proposed Method II, A′R×C (μ = 8 and Ψ = Max), (e) Proposed Method III, A′R (μ = 8 and Ψ = Max), (f) Proposed Method III, A′R×C (μ = 8 and Ψ = Max), (g) Proposed combined method.

Table 8. Performance of the proposed combined method (Proposed Methods I(a), II(b) and III(b)).

| Metric | Value |
|---|---|
| Image quality degradation (SSIM) | 0.0548 |
| Image quality degradation (PSNR) [dB] | 8.63 |
| Bitstream size changes [%] | 4.48 |
| Effective data payload [Bits] | 10,238.01 |
| Effective data payload [Bits per pixel] | 0.05 |

(a) uses agLim = abLim = 16. (b) uses μ = 8 and Ψ = Max.

4.2. Comparison with related works

In this section, the proposed combined method (i.e., system) is compared with the conventional and relevant methods using the standard test images (i.e., Airplane, Baboon, Boat, Lake, Lenna, and Peppers [17]), each compressed at a quality factor of 80. Zhang et al.'s [9] and Ma et al.'s [10] methods are re-implemented to handle JPEG compressed images. To the best of our knowledge, there is no scrambling–embedding method in the compressed domain other than that proposed by Kundur et al. [8]. Therefore, in this work, Kundur et al.'s method is utilized as the benchmark. Nonetheless, for completeness, Takayama et al.'s scrambling method [21] and Xuan et al.'s reversible data embedding method [22] in the compressed domain are integrated for comparison purposes. Specifically, Takayama et al. shuffle the DC coefficients, randomize the order of ZRV pairs in each block, and permute the AC coefficient blocks [21]. On the other hand, Xuan et al. modify the histogram of the coefficients to embed data [22]. This combination is referred to as the integrated method.

Table 9. Comparison with related conventional methods.

| Method | Image quality (SSIM) | Image quality (PSNR) [dB] | Data payload [Bits] | Bitstream size changes [%] | Reversibility | Scalability |
|---|---|---|---|---|---|---|
| Zhang et al.'s method [9] | 0.0107 | 6.45 | 4300.80 | 212.50 | Irreversible | No |
| Ma et al.'s method [10] | 0.0108 | 6.45 | 32,448.83 | 212.63 | Reversible | No |
| Kundur et al.'s method [8] | 0.4165 | 18.87 | 63,698.17 | −0.01 | Irreversible | No |
| Integrated method* | 0.0626 | 10.46 | 13,202.67 | 7.71 | Reversible | No |
| Proposed method | 0.0638–0.9152 | 10.05–36.47 | 669.83–14,861.67 | −0.48–3.19 | Reversible / rewritable | Yes |

* Combination of the scrambling method by Takayama et al. [21] and the reversible data embedding method by Xuan et al. [22].

Table 9 shows the results for each considered method and the proposed method. Zhang et al.'s method and Ma et al.'s method are able to generate a significantly distorted output image, but the bitstream size increment is very large (≈212%). On the other hand, the bitstream size increment for the proposed combined method is only ≈3.19%. It is worth mentioning that the distortion achieved by the proposed method can be further intensified by randomizing the sign of each non-zero AC coefficient, which does not affect the bitstream size. Although Kundur et al.'s method achieves a very high data payload and maintains the bitstream size, it is irreversible. Next, the parameters of the integrated method are fine-tuned to achieve a similar (but always smaller) data payload than that of the proposed combined method. Although the integrated method is reversible, it causes a larger bitstream size increment than the proposed combined method even when embedding a smaller amount of external data. In addition, none of the conventional methods considered can degrade the image quality or increase the data payload progressively, whereas the proposed method is able to achieve a wide range of distortion and data payload by adjusting the parameters agLim, abLim, μ, Ψ and φ. Therefore, the proposed combined method: (a) provides more flexibility to suit the needs of various applications, (b) offers comparable performance when compared to the related conventional methods, and (c) achieves an insignificant bitstream size increment.

For visual comparison, Fig. 11 shows the output image for each method considered. It is observed that the output images for Zhang et al.'s and Ma et al.'s methods are fairly distorted, and thus the perceptual meaning of the original image is somewhat concealed. The output image generated by the integrated method is also distorted. However, Kundur et al.'s method only partially distorts the image because merely the sign information is randomized. On the other hand, the distortion level of the output image can be controlled by the proposed combined method, and the most distorted output image resembles noise (i.e., block-based-like noise).

Fig. 11. Output image for the proposed combined method and related conventional methods. (a) Proposed combined method, (b) Zhang et al.'s method [9], (c) Ma et al.'s method [10], (d) Kundur et al.'s method [8], (e) Integrated method [21,22].

Last but not least, four sketch attacks are considered to examine the robustness of the proposed and conventional methods. The first technique is known as the DCM (DC category mapping) attack, which assigns a representative value to each DC category [23]. This technique is effective in attacking methods that substitute the DC coefficients by another value in the same category (e.g., Niu et al.'s method [24]), and it is explained in greater detail in [23]. In addition, three techniques proposed in [19], namely, NCC (non-zero AC coefficient count), EAC (energy of AC coefficients block), and PLZ (position of last non-zero AC coefficient), are considered to sketch the plaintext directly by using the AC components of the scrambled image. Detailed information about these attacks can be found in [19].

Zhang et al.'s method, Ma et al.'s method, Kundur et al.'s method, Niu et al.'s method [24], the integrated method, and the proposed combined method are compared in terms of robustness against the aforementioned sketch attacks. Table 10 summarizes the outcome of the sketch attacks, which is based on visual inspection to determine whether the outline of the image is revealed. If the outline of the image is visible under a particular sketch attack (see Fig. 12 for successful attacks), the method is labeled as Fail in the table. Otherwise, it is labeled as Pass (see Fig. 13 for failed attacks) because the image information remains concealed.

Table 10. Comparative results for all sketch attacks.

| Method | DCM | NCC | EAC | PLZ |
|---|---|---|---|---|
| Zhang et al.'s method [9] | Pass | Fail | Pass | Fail |
| Ma et al.'s method [10] | Pass | Fail | Pass | Fail |
| Kundur et al.'s method [8] | N/A | Fail | Fail | Fail |
| Niu et al.'s method [24] | Fail | Pass | Pass | Pass |
| Integrated method* | Pass | Pass | Pass | Pass |
| Proposed method | Pass | Pass | Pass | Pass |

* Combination of the scrambling method by Takayama et al. [21] and the reversible data embedding method by Xuan et al. [22].

Zhang et al.'s method and Ma et al.'s method pass the DCM and EAC attacks because the DC and AC coefficient values are modified by using XOR operations. However, the attacker is able to sketch the outline of the plaintext image by using either the NCC or the PLZ attack because the number of non-zero AC coefficients and the position of the last non-zero AC coefficient remain intact in these methods. On the other hand, Kundur et al.'s method is vulnerable to all applicable sketch attacks because only the sign information is randomized. Niu et al.'s method survives the NCC, PLZ and EAC attacks because it permutes the AC coefficient blocks. However, it fails the DCM attack. The sketches generated for these methods are shown in Fig. 12. From these sketches, the outline of the Lenna image can be recognized. It should be noted that a sketch of higher quality is achievable when the image resolution increases. On the other hand, the proposed combined method and the integrated method survive all four sketch attacks considered. The corresponding sketches for the proposed combined method are shown in Fig. 13. Although the integrated method is able to survive all sketch attacks, its degree of distortion is not scalable and its bitstream size increment is more than 2 times larger than that of the proposed combined method while embedding a smaller amount of data.

Fig. 12. Sketches produced by the simple sketch attacks when applied to the related conventional methods. Here, only selected successful cases are shown. Other outcomes are summarized in Table 10. (a) NCC on [9], (b) PLZ on [10], (c) EAC on [8], (d) DCM on [24].

Fig. 13. Output of all four sketch attacks applied to the proposed combined method (Proposed Methods I, II and III). (a) NCC, (b) PLZ, (c) EAC, (d) DCM.
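To make the three AC-based sketch attacks more concrete, the following sketch computes the per-block quantities that their names refer to: the non-zero AC coefficient count (NCC), the energy of the AC coefficient block (EAC, taken here as the sum of magnitudes, as in Eq. (4)), and the position of the last non-zero AC coefficient in zigzag order (PLZ). The exact formulations in [19] may differ, and the function name and the 1-based PLZ convention are assumptions for illustration. An attacker would map these values to gray levels, block by block, to form a sketch.

```python
# Zigzag scan order of an 8x8 block as (row, column) pairs.
ZIGZAG = sorted(
    ((r, c) for r in range(8) for c in range(8)),
    key=lambda rc: (rc[0] + rc[1], rc[0] if (rc[0] + rc[1]) % 2 else rc[1]),
)


def sketch_features(block):
    """Return (NCC, EAC, PLZ) for one 8x8 block of quantized coefficients."""
    ac = [block[r][c] for r, c in ZIGZAG[1:]]   # skip the DC coefficient
    ncc = sum(1 for v in ac if v != 0)          # non-zero AC coefficient count
    eac = sum(abs(v) for v in ac)               # sum of AC magnitudes
    # 1-based zigzag position of the last non-zero AC coefficient, 0 if none
    plz = max((i + 1 for i, v in enumerate(ac) if v != 0), default=0)
    return ncc, eac, plz


block = [[0] * 8 for _ in range(8)]
block[0][0] = 50   # DC coefficient, ignored by all three features
block[0][1] = 3
block[1][0] = -2
block[1][1] = 1
features = sketch_features(block)  # (3, 6, 4)
```

Because all three features are computed from a block's own AC coefficients, a method that leaves blocks in place and only alters coefficient signs or values leaves at least one of them informative, which is consistent with the Fail entries in Table 10.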

5. Conclusion

A reversible unified information hiding method was proposed to realize scrambling–embedding in JPEG compressed images. Two techniques were proposed and applied to three non-overlapping components of a JPEG compressed image, namely, the DC coefficients, the AC block energy and the zero-run value pairs. The proposed combined method achieves scalability in data payload, which ranges from 32 to 10,238 bits, and scalability in perceptual quality degradation, with SSIM ranging from 0.0548 to 0.9432 and PSNR ranging from 8.63 to 34.19 dB. In addition, the maximum observed bitstream size increment is 4.48%. Experimental results verified that the proposed combined method achieves comparable performance when compared to the related conventional methods. The proposed combined method also survives all four sketch attacks considered, while each of the related conventional methods (other than the integrated method) fails at least one of them. However, the fact that each coefficient block is processed individually in the proposed scrambling–embedding method may be a limitation. We shall study and further improve on this aspect in our future work. In addition, we want to implement the proposed combined method in other domains, such as video and audio. We also want to design more flexible techniques for controlling the data payload and the perceptual quality degradation.

References

[1] D. Van De Ville, W. Philips, R. Van De Walle, I. Lemahieu, Image scrambling without bandwidth expansion, IEEE Trans. Circuits Syst. Video Technol. 14 (6) (2004) 892–897.
[2] M.S.A. Karim, K. Wong, Universal data embedding in encrypted domain, Signal Process. 94 (0) (2014) 174–182.
[3] D. Engel, E. Pschernig, A. Uhl, An analysis of lightweight encryption schemes for fingerprint images, IEEE Trans. Inf. Forensics Secur. 3 (2) (2008) 173–182.
[4] B. Mobasseri, R. Berger, M. Marcinak, Y. NaikRaikar, Data embedding in JPEG bitstream by code mapping, IEEE Trans. Image Process. 19 (4) (2010) 958–966.
[5] C.-W. Lee, W.-H. Tsai, A data hiding method based on information sharing via PNG images for applications of color image authentication and metadata embedding, Signal Process. 93 (7) (2013) 2010–2025.
[6] C. Adsumilli, S. Mitra, T. Oh, Y. Kim, Error concealment in video communications by informed watermarking, in: L.-W. Chang, W.-N. Lie (Eds.), Advances in Image and Video Technology, Lecture Notes in Computer Science, vol. 4319, Springer, Berlin, Heidelberg, 2006, pp. 1094–1102.
[7] K.-L. Chung, Y.-H. Huang, P.-C. Chang, H.-Y.M. Liao, Reversible data hiding-based approach for intra-frame error concealment in H.264/AVC, IEEE Trans. Circuits Syst. Video Technol. 20 (11) (2010) 1643–1647.
[8] D. Kundur, K. Karthik, Video fingerprinting and encryption principles for digital rights management, Proc. IEEE 92 (6) (2004) 918–932.
[9] X. Zhang, Separable reversible data hiding in encrypted image, IEEE Trans. Inf. Forensics Secur. 7 (2) (2012) 826–832.
[10] K. Ma, W. Zhang, X. Zhao, N. Yu, F. Li, Reversible data hiding in encrypted images by reserving room before encryption, IEEE Trans. Inf. Forensics Secur. 8 (3) (2013) 553–562.
[11] S. Ong, K. Wong, K. Tanaka, Reversible data embedding using reflective blocks with scalable visual quality degradation, in: International Conference on Intelligent Information Hiding and Multimedia Signal Processing, 2012, pp. 363–366.
[12] S. Ong, K. Wong, K. Tanaka, A scalable reversible data embedding method with progressive quality degradation functionality, Signal Process. Image Commun. (0).
[13] J. Tian, R.O. Wells Jr., Reversible data-embedding with a hierarchical structure, in: International Conference on Image Processing, vol. 5, 2004, pp. 3419–3422.
[14] M. Fujiyoshi, K. Kuroiwa, H. Kiya, A scrambling method for motion JPEG videos enabling moving objects detection from scrambled videos, in: International Conference on Image Processing, 2008, pp. 773–776.
[15] W.B. Pennebaker, J.L. Mitchell, JPEG Still Image Data Compression Standard, 1st ed., Kluwer Academic Publishers, Norwell, MA, USA, 1992.
[16] T.81: Information technology - digital compression and coding of continuous-tone still images - requirements and guidelines [online]. Available: 〈http://www.itu.int/rec/T-REC-T.81/en〉.
[17] The USC-SIPI image database [online]. Available: 〈http://sipi.usc.edu/database/〉.
[18] W. Li, Y. Yuan, A leak and its remedy in JPEG image encryption, Int. J. Comput. Math. Comput. Vis. Pattern Recognit. 84 (9) (2007) 1367–1378.
[19] K. Minemura, Z. Moayed, K. Wong, X. Qi, K. Tanaka, JPEG image scrambling without expansion in bitstream size, in: IEEE International Conference on Image Processing, 2012, pp. 261–264.
[20] G. Schaefer, M. Stich, UCID: an uncompressed colour image database, in: Storage and Retrieval Methods and Applications for Multimedia 2004, Proceedings of SPIE, vol. 5307, 2004, pp. 472–480.
[21] M. Takayama, K. Tanaka, A. Yoneyama, Y. Nakajima, A video scrambling scheme applicable to local region without data expansion, in: IEEE International Conference on Multimedia and Expo, July 2006, pp. 1349–1352.
[22] G. Xuan, Y. Shi, P. Chai, X. Cui, Z. Ni, X. Tong, Optimum histogram pair based image lossless data embedding, in: Y. Shi, H.-J. Kim, S. Katzenbeisser (Eds.), Digital Watermarking, Lecture Notes in Computer Science, vol. 5041, 2008, pp. 264–278.
[23] S. Ong, K. Minemura, K. Wong, Progressive quality degradation in JPEG compressed image using DC block orientation with rewritable data embedding functionality, in: International Conference on Image Processing, 2013.
[24] X. Niu, C. Zhou, J. Ding, B. Yang, JPEG encryption with file size preservation, in: International Conference on Intelligent Information Hiding and Multimedia Signal Processing, August 2008, pp. 308–311.

Please cite this article as: S. Ong, et al., Scrambling–embedding for JPEG compressed image, Signal Processing (2014), http://dx.doi.org/10.1016/j.sigpro.2014.10.028
