Signal Processing 88 (2008) 174–188
Human visual system based adaptive digital image watermarking

Huiyan Qi, Dong Zheng, Jiying Zhao
Multimedia Communications Research Laboratory, School of Information Technology and Engineering, University of Ottawa, 800 King Edward Avenue, Ottawa, Ont., Canada K1N 6N5
E-mail: [email protected] (H. Qi), [email protected] (D. Zheng), [email protected] (J. Zhao)

Received 29 October 2006; received in revised form 16 July 2007; accepted 17 July 2007. Available online 28 July 2007.
doi:10.1016/j.sigpro.2007.07.020
Abstract

It is known that there is a trade-off between the imperceptibility and the robustness of a digital watermarking system. To address this problem, a new adaptive digital image watermarking method is proposed. A new spatial masking is built from image features such as brightness, edges, and region activity. With the same watermark embedding energy, the quality of the watermarked image produced with the proposed adaptive masking is much better than that produced without it. A weighted PSNR (wPSNR) is used to evaluate the quality difference between the original and watermarked images, while the new spatial masking function controls the watermark embedding adaptively. The watermark is detected by a key-dependent method without any knowledge of the original image. In addition, we extend the proposed spatial masking to the DCT domain by searching for the extreme value of a quadratic function subject to bounds on its variables. Experiments show that the scheme performs well. © 2007 Elsevier B.V. All rights reserved.

Keywords: HVS masking; wPSNR; Watermarking energy; DCT domain masking
1. Introduction

The growth of digital multimedia technology and the rapid development of the Internet allow people to process, deliver, and store information more easily. This convenience also raises the issue of how to protect copyright ownership. Digital watermarking is an effective method for copyright protection, and much of the watermarking literature concentrates on robustness against attacks. Generally, a digital image watermarking system resembles a communication system
composed of three main elements: a watermark embedder, a communication channel, and a watermark detector. Usually, the embedder and the detector attract most of the research attention, although the properties of the communication channel are also taken into account when designing a digital watermarking system. The watermark embedding can be represented by the following equation:

$$y = x + \alpha w, \quad (1)$$

where $y$ is the watermarked image, $x$ is the original image, $w$ is the watermark, and $\alpha$ controls the embedding strength. The watermark can be detected by computing the correlation between the watermark and the received watermarked image.
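To make the embedding and correlation detection concrete, here is a minimal numpy sketch of Eq. (1) together with a linear-correlation detector; the random image stand-in, the seed-as-key convention, and all names are ours, not the authors'.

```python
import numpy as np

rng = np.random.default_rng(seed=42)                  # the seed plays the role of the secret key
x = rng.integers(0, 256, size=(256, 256)).astype(np.float64)  # stand-in for an original image
w = rng.choice([-1.0, 1.0], size=x.shape)             # bipolar pseudo-random watermark

alpha = 3.0                                           # embedding strength
y = x + alpha * w                                     # Eq. (1): y = x + alpha * w

# Blind detection: the mean of y*w is about alpha when the watermark is
# present (x and w are nearly uncorrelated) and about 0 when it is absent.
print(np.mean(y * w))                                 # approx. alpha -> watermarked
print(np.mean(x * w))                                 # approx. 0     -> unwatermarked
```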
In order to resist normal signal processing and other attacks, we would like the embedding strength to be as high as possible. However, because the watermark directly modifies the original image, it is obvious that the higher the embedding strength, the lower the quality of the watermarked image. In other words, the robustness and the imperceptibility of the watermark contradict each other. The best approach is to take the human visual system (HVS) into account when embedding the watermark, so that the watermark strength adapts to the features of the original image and guarantees the maximum possible imperceptibility of the watermark.

2. Literature review

Usually, the peak signal-to-noise ratio (PSNR) is used to measure the quality of an image. For an 8-bit gray scale image,

$$\mathrm{PSNR} = 10 \log_{10} \frac{255^2}{\mathrm{MSE}} \quad (2)$$

with

$$\mathrm{MSE} = \frac{1}{MN} \sum_{i=1}^{M} \sum_{j=1}^{N} \big(y(i,j) - x(i,j)\big)^2, \quad (3)$$

where $y(i,j)$ and $x(i,j)$ are the pixel values of the watermarked image and the original image, respectively, and $M \times N$ is the image size. According to Eq. (1),

$$\mathrm{MSE} = \frac{1}{MN} \sum_{i=1}^{M} \sum_{j=1}^{N} \big(\alpha w(i,j)\big)^2. \quad (4)$$
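As a quick check of Eqs. (2)-(4), a small helper (our own naming, not from the paper) computes the PSNR; note that for a bipolar watermark $w \in \{-1, +1\}$, Eq. (4) reduces to $\mathrm{MSE} = \alpha^2$, so the PSNR depends only on the embedding strength.

```python
import numpy as np

def psnr(x, y, peak=255.0):
    """PSNR of Eqs. (2)-(3) for two equal-size 8-bit grayscale images."""
    mse = np.mean((y.astype(np.float64) - x.astype(np.float64)) ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)

# For y = x + alpha*w with w in {-1, +1}, Eq. (4) gives MSE = alpha**2,
# so psnr(x, y) == 20*log10(255/alpha) regardless of the image content.
```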
It is well known that the PSNR cannot precisely reflect the human perception of quality change or degradation, so the HVS is introduced to give a more accurate evaluation of subjective quality. The HVS is one of the most complex biological systems; it includes three stages: encoding, representation, and interpretation [1]. Many researchers have found that numerous factors limit the sensitivity of human vision. For example, the surface of the cornea causes refraction; the circular entrance of the pupil causes diffraction; the optical lens has chromatic aberration effects; and the mosaic of photo-receptors performs only a spatial sampling of the scene [1]. These limitations and defects imply that human vision does not respond to small stimuli very well, and it cannot tell the difference
among signals beyond a certain precision either. However, this perceptual redundancy gives us a good opportunity to embed a watermark in the image. Barni [2] summarized the relevant observations and experiments in the following three rules, which are the foundation of the spatial masking in this paper:

(1) Disturbances are much less visible in highly textured regions than in uniform areas.
(2) Contours are more sensitive to noise addition than highly textured regions, but less sensitive than flat areas.
(3) Disturbances are less visible over dark and bright regions.

Many different human visual system models have been built for watermarking systems. Delaigle [3] proposed an HVS masking built in the fast Fourier transform (FFT) domain by setting a threshold on a Gabor filter. He concentrated on local-area energy and only cast the watermark on the horizontal direction of the spatial frequency; as a result, the watermarking algorithm introduced a clearly visible ringing effect around the edges. Kutter [4] used an isotropic contrast function, with thresholds set by subjective experiments, to create a masking for the watermark. He calculated the masking in the wavelet domain and transformed it to the spatial domain to embed the watermark; the result shows that the masking concentrates around the edges and is almost absent in flat areas. Trying to avoid the ringing effect, Hannigan [5] combined edge detection and contrast masking, but could not avoid salt-and-pepper noise in areas with narrow, parallel lines, for example, human hair. Bartolini and Barni [6] followed the three rules to develop their masking. First, the image was filtered by the band-pass filter in which the watermark was to be embedded. The second step used edge detection to locate the edge areas in the original image. The third and fourth steps applied median filters with different thresholds to determine the bright and dark areas. The final step summed the results of the previous four steps into a binary image, to which an offset value was added that sets the minimum amount of watermark energy inserted. The advantage of this method is that the masking indicates where the watermark should be strong and
where the watermark should be weak, but it also has some weaknesses. The first problem is that it is quite difficult to determine the offset value for different images. The second problem is that it gives only the locations for the watermark, not the masking values of the image. Based on maximum a posteriori probability (MAP) theory and Markov random fields (MRF), Voloshynovskiy [7] used a noise visibility function (NVF) to adjust the watermark embedding strength; however, it could not deal with flat areas. The Watson model [8] assesses the visual quality of an image by estimating multiples of the just noticeable differences (JNDs) between the original work and the distorted work. Following the mechanism of the human visual system, three factors are considered in the Watson model in order to approximate the perceptual quality of an image more closely: a frequency sensitivity function, a luminance masking function, and a contrast masking function. A pooling process then combines all the estimated local perceptual distances into a global perceptual distance. The Watson model is designed in the discrete cosine transform (DCT) domain. As for watermark detection, Podilchuk and Zeng [9] embedded the watermark with the Watson model masking and used non-blind detection to extract it; this method is limited by the availability of the original image.

3. Proposed HVS based adaptive digital watermarking method

After reviewing different masking models, we introduce a new approach that takes the three rules summarized by Barni [2] into account simultaneously. We determine the luminance masking, texture masking, and edge masking from the image features, and then combine these three maskings into a comprehensive final masking for the watermark in the spatial domain, as shown in Fig. 1.

Fig. 1. HVS masking: luminance, texture, and edge maskings are computed from the original image and combined into the final masking.

3.1. Luminance masking

We use luminance because the darkness and brightness of the image are reflected by it. The visibility threshold of luminance in the spatial domain depends on two factors: the average background luminance and the non-uniformity of the background luminance. A very good luminance masking proposed by Chou [10] is built on these two factors; it gives the just noticeable difference for each gray level. All the calculations are based on image blocks of size 5 x 5, so in the implementation we use a sliding window of size 5 x 5. This luminance masking is also used in a video distortion measurement system [11]. In Chou's algorithm, all the parameters are determined by the experimental environment and statistical results. The masking is calculated as follows [10]:

$$M_L(x,y) = \max\{ f_1(bg(x,y),\ mg(x,y)),\ f_2(bg(x,y)) \}, \quad (5)$$

$$f_1(bg(x,y),\ mg(x,y)) = mg(x,y)\,\alpha(bg(x,y)) + \beta(bg(x,y)), \quad (6)$$

$$f_2(bg(x,y)) = \begin{cases} T_0 \left( 1 - \left( \dfrac{bg(x,y)}{127} \right)^{0.5} \right) + 3, & bg(x,y) \le 127, \\[1ex] \gamma\,(bg(x,y) - 127) + 3, & bg(x,y) > 127, \end{cases} \quad (7)$$

$$\alpha(bg(x,y)) = 0.0001\,bg(x,y) + 0.115, \quad 0 \le x < H,\ 0 \le y < W, \quad (8)$$

$$\beta(bg(x,y)) = \lambda - 0.01\,bg(x,y), \quad 0 \le x < H,\ 0 \le y < W, \quad (9)$$
where $M_L$ is the luminance masking and $f_1$ is the spatial masking function. $bg(x,y)$ is the background luminance and $mg(x,y)$ is the maximum weighted average of the luminance differences around the pixel at $(x,y)$. $H$ and $W$ are the image height and width. $f_2$ is the visibility function based on the background luminance, and $\alpha$ and $\beta$ are functions of the background luminance. The remaining parameters are $T_0 = 17$, $\gamma = 3/128$, and $\lambda = 1/2$. The function $mg(x,y)$ is calculated as

$$mg(x,y) = \max_{k=1,2,3,4} \{ |grad_k(x,y)| \}, \quad (10)$$

$$grad_k(x,y) = \frac{1}{16} \sum_{i=1}^{5} \sum_{j=1}^{5} p(x-3+i,\ y-3+j)\, G_k(i,j), \quad 0 \le x < H,\ 0 \le y < W, \quad (11)$$

where $p(x,y)$ is the pixel value at position $(x,y)$. The gradient operators $G_1$, $G_2$, $G_3$, and $G_4$ are shown in Tables 1-4.

Table 1. Parameter G1 in the luminance masking calculation
 0   1   0  -1   0
 0   3   0  -3   0
 0   8   0  -8   0
 0   3   0  -3   0
 0   1   0  -1   0

Table 2. Parameter G2 in the luminance masking calculation
 0   0   1   0   0
 0   8   3   0   0
 1   3   0  -3  -1
 0   0  -3  -8   0
 0   0  -1   0   0

Table 3. Parameter G3 in the luminance masking calculation
 0   0   1   0   0
 0   0   3   8   0
-1  -3   0   3   1
 0  -8  -3   0   0
 0   0  -1   0   0

Table 4. Parameter G4 in the luminance masking calculation
 0   0   0   0   0
 1   3   8   3   1
 0   0   0   0   0
-1  -3  -8  -3  -1
 0   0   0   0   0

Finally, the background luminance $bg(x,y)$ is calculated as

$$bg(x,y) = \frac{1}{32} \sum_{i=1}^{5} \sum_{j=1}^{5} p(x-3+i,\ y-3+j)\, B(i,j), \quad 0 \le x < H,\ 0 \le y < W, \quad (12)$$

where the weighting matrix $B$ is defined in Table 5.

Table 5. Parameter B in the luminance masking calculation
 1   1   1   1   1
 1   2   2   2   1
 1   2   0   2   1
 1   2   2   2   1
 1   1   1   1   1
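Putting Eqs. (5)-(12) together, the luminance masking can be sketched with scipy's neighbourhood filtering. This is our reading of Chou's algorithm, not the authors' code; in particular, the border handling (mode='nearest') is an assumption.

```python
import numpy as np
from scipy.ndimage import correlate

# Gradient operators G1..G4 (Tables 1-4) and weighting matrix B (Table 5).
G = [np.array(g, dtype=np.float64) for g in (
    [[0,1,0,-1,0],[0,3,0,-3,0],[0,8,0,-8,0],[0,3,0,-3,0],[0,1,0,-1,0]],
    [[0,0,1,0,0],[0,8,3,0,0],[1,3,0,-3,-1],[0,0,-3,-8,0],[0,0,-1,0,0]],
    [[0,0,1,0,0],[0,0,3,8,0],[-1,-3,0,3,1],[0,-8,-3,0,0],[0,0,-1,0,0]],
    [[0,0,0,0,0],[1,3,8,3,1],[0,0,0,0,0],[-1,-3,-8,-3,-1],[0,0,0,0,0]])]
B = np.array([[1,1,1,1,1],[1,2,2,2,1],[1,2,0,2,1],
              [1,2,2,2,1],[1,1,1,1,1]], dtype=np.float64)

def luminance_masking(img, T0=17.0, gamma=3.0/128.0, lam=0.5):
    p = img.astype(np.float64)
    bg = correlate(p, B, mode='nearest') / 32.0                      # Eq. (12)
    mg = np.max([np.abs(correlate(p, Gk, mode='nearest')) / 16.0
                 for Gk in G], axis=0)                               # Eqs. (10)-(11)
    f1 = mg * (0.0001 * bg + 0.115) + (lam - 0.01 * bg)              # Eqs. (6), (8), (9)
    f2 = np.where(bg <= 127.0,
                  T0 * (1.0 - np.sqrt(bg / 127.0)) + 3.0,
                  gamma * (bg - 127.0) + 3.0)                        # Eq. (7)
    return np.maximum(f1, f2)                                        # Eq. (5)
```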
3.2. Texture masking

For texture masking, the simplest method is to use the normalized variance: first use a 3 x 3 sliding window to calculate the local variance, then find the maximum local variance, and finally normalize the variance over the whole image by that maximum. The weakness of this method is that it locates the image texture but does not determine an exact masking value. Voloshynovskiy [7] used the local variance to determine the noise visibility function; again, it could not determine the masking value directly and only adjusted the embedding strength. The local variance captures the relationship between an individual pixel value and the mean of the pixels inside the sliding window, which is the key element in deciding whether a region is highly active or uniform. The texture masking we propose here uses the absolute distance between each pixel and the local average of the pixels within the sliding window, as shown in
Eq. (13). The size of the sliding window is still 3 x 3:

$$M_T = |x(i,j) - \bar{x}(i,j)|, \quad (13)$$

$$\bar{x}(i,j) = \frac{1}{(2L+1)^2} \sum_{k=-L}^{L} \sum_{l=-L}^{L} x(i+k,\ j+l), \quad (14)$$

where $x(i,j)$ is the pixel at position $(i,j)$, $M_T$ is the texture masking, and $(2L+1)^2$ is the number of pixels in the image block. We deliberately do not use the variance or the standard deviation as the masking. In our experiments, using either of them as the masking in areas such as edges produced clearly visible salt-and-pepper noise in the watermarked image, whereas using the difference between the pixel value and the local mean gave better fidelity with respect to the original image. So, empirically, the difference between the pixel value and the local mean is used as the masking.
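The texture masking of Eqs. (13)-(14) is essentially a deviation from a local mean; a minimal sketch (names are ours), using a 3 x 3 window ($L = 1$) as in the text:

```python
import numpy as np
from scipy.ndimage import uniform_filter

def texture_masking(img, L=1):
    x = img.astype(np.float64)
    local_mean = uniform_filter(x, size=2 * L + 1, mode='nearest')   # Eq. (14)
    return np.abs(x - local_mean)                                    # Eq. (13)
```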
Fig. 2. Original image "Lena".
3.3. Edge masking

Having introduced the luminance and texture maskings, we also consider an edge masking. Many proposed methods do not handle edges well, because edge detection can only locate an edge in a binary way, that is, edge or not; it cannot give an exact masking value, such as how much a pixel on the edge can be changed. Of course, the texture masking already includes the edge parts. However, if the texture masking were used directly, the edges could easily be corrupted, resulting in severe distortions to the image. Therefore, we have to correct the edge regions. Here we use the improved un-sharp masking method described by Keith Wiley [12] to generate the edge masking. The boundary of an object in a gray scale image is located where the gray level changes sharply. When human vision senses an object, it recognizes the outline by enhancing the edge beyond its true natural appearance: the enhancement mechanism in the visual system decreases the dark side of the edge and increases the bright side at the same time, making the edges of the object more obvious. There are two ways to create the edge masking. One is un-sharp masking, which blurs the original image first and then takes the difference
Fig. 3. Resulting edge masking by applying un-sharp filter to image "Lena".
between the original image and the blurred image. For the original image "Lena" in Fig. 2, the result of un-sharp masking using the Matlab un-sharp filter is shown in Fig. 3. From the result, we find that although the un-sharp filter captures the edges of the person, it does not cover all the edges in the background.
The other way to create the edge masking is to use the Laplacian filter, which is also referred to as improved un-sharp masking [12]:

$$M_E = L(I), \quad (15)$$

where $M_E$ is the edge masking, $I$ is the original image, and $L(\cdot)$ is the Laplacian filtering operation. We choose the Laplacian filter for the edge masking because it can capture very small gray level changes: it detects every pixel change in the image and produces a gray-scale filtered image, as shown in Fig. 4. Because this edge masking responds to every pixel's gray scale change, we use edge detection during the implementation to help locate the true edges. We use Canny's algorithm for the edge detection, because it is good at detecting weak edges; the result is shown in Fig. 5. The Canny result is a binary image that shows the locations of the edges as white curves. To avoid the salt-and-pepper noise that appears in very thin curve areas such as human hair (the unsolved problem in Hannigan's method [5]), dilation processing is added after the edge detection.
Fig. 4. Resulting edge masking by applying Laplacian filter to image "Lena".
Fig. 5. Edge detection result by applying Canny’s algorithm to the image in Fig. 4.
Dilation and erosion are a pair of morphological algorithms that are very useful in boundary extraction, image segmentation, and pattern recognition. Morphology is a family of image processing operations based on shapes. The dilation operation first defines a structuring element, a matrix consisting only of 0's and 1's that can have any shape and size; the elements with value 1 define the neighborhood. A matrix much smaller than the image is usually appropriate for local processing. The rule [13] of the dilation operation is that the value of the output pixel is the maximum value of all the pixels in the input pixel's neighborhood defined by the structuring element; in a binary image, if any of those pixels is 1, the output pixel is set to 1. Pixels outside the border covered by the structuring element are assigned the minimum value afforded by the data type, which for binary images is 0. The shape of the structuring element is not fixed; it can be a circle, diamond, line, square, and so on. In our implementation, since the dilation operation is used to reduce some texture masking values, and since the sliding window producing the texture masking is 3 x 3, we chose a square structuring element of size 3 x 3. The result of the dilated edge detection is a binary image, as shown in Fig. 6.
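The whole edge masking pipeline (Laplacian response, Canny edge detection, 3 x 3 dilation, then filtering) might look as follows; the choice of scikit-image/scipy, the default Canny parameters, and taking the magnitude of the signed Laplacian are our assumptions, not prescribed by the paper.

```python
import numpy as np
from scipy.ndimage import laplace, binary_dilation
from skimage.feature import canny

def edge_masking(img):
    x = img.astype(np.float64)
    m_e = np.abs(laplace(x, mode='nearest'))   # Eq. (15); magnitude of the Laplacian response
    edges = canny(x / 255.0)                   # binary edge map; Canny keeps weak edges
    edges = binary_dilation(edges, structure=np.ones((3, 3), bool))  # thicken thin curves
    return m_e * edges                         # keep the Laplacian masking on edges only
```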
Fig. 6. Resulting image by applying dilation to the image in Fig. 5.

The dilated edge detection can then be used as a filter that wipes off the non-edge regions in the edge masking derived from the Laplacian filter, keeping only the edge areas.

3.4. Final masking

After creating the luminance masking, texture masking, and edge masking, the next step is to combine them. We combine the texture masking and the edge masking first: in the edge areas, we select the minimum of the texture masking and the edge masking, because an edge has less ability to hide the watermark than a highly active area. We then select the maximum of the luminance masking and the masking from the previous step. The reason for picking the maximum is that in flat areas the masking value is decided by the luminance masking, while in highly active regions it is determined by the texture masking. The final masking thus accounts not only for the luminance, but also for the texture and edges of the image:

$$M_F = \max\big(M_L,\ \min(M_E,\ M_T)\big), \quad (16)$$

where $M_F$ is the final masking, $M_L$ is the luminance masking, $M_E$ is the edge masking, and $M_T$ is the texture masking.
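A direct transcription of Eq. (16), with the edge-area restriction made explicit through the dilated binary edge map (variable names are ours):

```python
import numpy as np

def final_masking(m_l, m_t, m_e, edge_map):
    # On edge pixels take min(edge, texture); elsewhere keep the texture masking.
    modified_texture = np.where(edge_map, np.minimum(m_e, m_t), m_t)
    return np.maximum(m_l, modified_texture)   # Eq. (16)
```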
3.5. Watermark embedding

Having produced the HVS masking, we can use it to control the embedding strength. The basic embedding was introduced in Eq. (1); with the masking, the equation becomes

$$y = x + m w, \quad (17)$$

where $y$ is the watermarked image, $x$ is the original image, $w$ is the watermark, and $m$ is the masking value. The watermark is a direct spread spectrum sequence, generated as a pseudo-random sequence of values 1 and -1. From the equation above, once the watermark is multiplied by the masking value, it falls into an imperceptible range; the watermark is therefore adaptive to the image, and the quality of the watermarked image is improved.

3.6. Evaluation of the watermarked image

Considering all the elements of the sensitivity of human vision, we suggest using the proposed HVS masking as a human vision filter. First, we calculate the absolute difference between the original image and the watermarked image, which is in fact the watermark. Second, we apply the proposed HVS masking as a filter to gauge this difference, comparing the difference value and the masking value pixel by pixel. If the difference is smaller than or equal to the masking value, its impact on the quality of the image is ignored; otherwise, the excess of the difference over the masking value enters the mean square error calculation:

$$d(i,j) = \begin{cases} 0 & \text{if } |w(i,j)| \le m(i,j), \\ |w(i,j)| - m(i,j) & \text{if } |w(i,j)| > m(i,j), \end{cases} \quad (18)$$

$$\mathrm{wMSE} = \frac{1}{N^2} \sum_{i=1}^{N} \sum_{j=1}^{N} d(i,j)^2, \quad (19)$$

$$\mathrm{wPSNR} = 10 \log_{10} \frac{255^2}{\mathrm{wMSE}}, \quad (20)$$

where $d(i,j)$ is the difference between the watermark and the masking value, $w(i,j)$ is the watermark value, and $m(i,j)$ is the masking value.
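Eqs. (18)-(20) translate directly into a masking-aware quality measure; a minimal sketch under our own naming:

```python
import numpy as np

def wpsnr(original, watermarked, masking, peak=255.0):
    w = np.abs(watermarked.astype(np.float64) - original.astype(np.float64))
    d = np.maximum(w - masking, 0.0)              # Eq. (18)
    wmse = np.mean(d ** 2)                        # Eq. (19)
    if wmse == 0.0:
        return np.inf                             # distortion entirely below the threshold
    return 10.0 * np.log10(peak ** 2 / wmse)      # Eq. (20)
```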
Because the watermark is shaped by the masking, we use noise shaping technology in the watermark detection. Herre [14] proved that the linear prediction coefficients of a signal in the DCT domain capture the envelope of the same signal in the time domain, and we apply this method to watermark detection. At the embedding side, we use linear prediction coefficients to record the envelope of the masking, and transfer those coefficients and the energy values as keys to the detection site. At the detection side, a linear prediction synthesis filter un-shapes the received watermark in the received image, and the correlation value decides whether the received image carries the watermark.

3.7. Data hiding in spatial domain

With the HVS masking proposed in the previous sections, the method proposed by Swanson [15] can easily be extended to the spatial domain. Here we mainly focus on how the HVS masking is integrated into other watermarking algorithms. First, we divide the image into small blocks of size 4 x 4. Each block represents one message bit, and a spread spectrum sequence, i.e., a pseudo-random sequence of values 1 and -1, represents the watermark. After the watermark is generated, the embedding first makes the pixel values of the original image integral multiples of the HVS masking value; if the message bit is 1, it adds one quarter of the masking value, otherwise it subtracts one quarter of the masking value. This dithering operation implements the watermark embedding. The detailed process is as follows (a minimal sketch of the embed and extract steps is given after this list):

(1) Extract the HVS masking from the original image.
(2) Divide the original image into 4 x 4 blocks.
(3) Divide the HVS masking into 4 x 4 blocks.
(4) Find the minimum HVS masking value of each 4 x 4 block.
(5) Generate a pseudo-random sequence of size 4 x 4 as the watermark.
(6) If the message bit is 1, use

$$A = \left( \mathrm{round}\left( \frac{I}{T} \right) + 0.25\,N \right) T. \quad (21)$$

If the message bit is -1, use

$$A = \left( \mathrm{round}\left( \frac{I}{T} \right) - 0.25\,N \right) T, \quad (22)$$

where $A$ is the watermarked block, $I$ is the original image block, $T$ is the masking value obtained in step (4), and $N$ is the watermark with values 1 and -1.

(7) After all the image blocks are processed, the final watermarked image and the masking values are transferred to the watermark detection site.

The detection method is relatively simple. It also divides the image into blocks of the same size as at the embedder site, and then extracts the watermark from each block using

$$A' = \mathrm{sign}\left( \frac{R}{T} - \mathrm{round}\left( \frac{R}{T} \right) \right), \quad (23)$$

where $R$ is the received image block, $T$ is the HVS masking value, and $A'$ is the calculated watermark. If the sign of the final matrix is the same as the watermark, the message bit is 1; if it is opposite to the watermark, the message bit is -1. From this analysis, it is easy to see that the original image is not required in the watermark detection.
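Here is a minimal per-block sketch of Eqs. (21)-(23); the seed-as-key convention and the toy value of T are our assumptions for illustration:

```python
import numpy as np

def embed_block(I, T, N, bit):
    """Eqs. (21)-(22): quantize the block to multiples of T, then dither by +/- T/4."""
    return (np.round(I / T) + 0.25 * N * (1 if bit == 1 else -1)) * T

def extract_block(R, T, N):
    """Eq. (23): sign of the residual R/T - round(R/T), correlated with N."""
    A = np.sign(R / T - np.round(R / T))
    return 1 if np.sum(A * N) > 0 else -1

rng = np.random.default_rng(1)                       # the seed acts as the key
I = rng.integers(0, 256, size=(4, 4)).astype(float)  # one 4x4 image block
N = rng.choice([-1.0, 1.0], size=(4, 4))             # per-block spreading sequence
T = 6.0                                              # minimum masking value of the block
print(extract_block(embed_block(I, T, N, 1), T, N))  # -> 1
```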
4. Extending the proposed spatial domain masking to the DCT domain

In this section, we introduce a method to extend the proposed spatial domain HVS masking to the DCT domain. The discrete cosine transform and the inverse discrete cosine transform are the bridges that connect the spatial domain and the DCT domain. We assume that the image is divided into multiple blocks of size 8 x 8. The equations for the two-dimensional DCT and inverse DCT (IDCT) are as follows [16,17]:

$$F(m,n) = \frac{2}{\sqrt{MN}}\, C(m)\, C(n) \sum_{x=0}^{M-1} \sum_{y=0}^{N-1} f(x,y) \cos\frac{(2x+1)m\pi}{2M} \cos\frac{(2y+1)n\pi}{2N}, \quad (24)$$

$$f(x,y) = \frac{2}{\sqrt{MN}} \sum_{m=0}^{M-1} \sum_{n=0}^{N-1} C(m)\, C(n)\, F(m,n) \cos\frac{(2x+1)m\pi}{2M} \cos\frac{(2y+1)n\pi}{2N}, \quad (25)$$
where $C(m), C(n) = 1/\sqrt{2}$ when $m, n = 0$, and $C(m), C(n) = 1$ otherwise. $f(x,y)$ is the pixel value in the spatial domain, $F(m,n)$ is the coefficient in the DCT domain, and $M$ and $N$ represent the block size. Let $[f_{ij}]$ and $[F_{ij}]$ ($i, j = 0, 1, \ldots, 7$) represent one block in the spatial domain and in the DCT domain, respectively. As a simple example, we embed the watermark only in two middle frequencies, $F_{05}$ and $F_{50}$, so all the other frequency locations are zero. The masking matrix for this example becomes

$$\begin{bmatrix} 0 & 0 & 0 & 0 & 0 & F_{05} & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ F_{50} & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \end{bmatrix}. \quad (26)$$

Although there are only two frequency components in the matrix, they affect the whole spatial domain block after the IDCT. This masking should therefore be restricted by the spatial masking at the same block position. To do so, we transform the masking back to the spatial domain using Eq. (25). Suppose the proposed spatial domain masking for one block is represented as $[SM_{ij}]$ ($i, j = 0, 1, \ldots, 7$). Because there are many zeros in the DCT domain masking matrix, the IDCT yields a group of 64 expressions:

$$0.25 \left( F_{50} \cos\frac{5(2i+1)\pi}{16} + F_{05} \cos\frac{5(2j+1)\pi}{16} \right) \le SM_{ij}, \quad 0 \le i \le 7,\ 0 \le j \le 7. \quad (27)$$

These expressions state that after computing the IDCT of the desired DCT masking, the values are restricted by the spatial domain masking. At the same time, we still want the energy of the watermark in the DCT domain to be as large as possible. Because the watermark is still a bipolar value of 1 or -1, the energy of the watermark is

$$E_1 = F_{50}^2 + F_{05}^2. \quad (28)$$

The whole problem therefore becomes a purely mathematical one: optimizing a quadratic function subject to bounds on the variables. The goal is to find values of $F_{50}$ and $F_{05}$ that maximize the energy while satisfying the 64 expressions above. Coleman [18] proved an algorithm for this problem, and the function has been implemented in Matlab. However, that function can only find variables that minimize the objective, so we change the energy function to

$$E_2 = -F_{50}^2 - F_{05}^2, \quad (29)$$

because the variables that maximize $E_1$ are exactly those that minimize $E_2$. The following program can then be used to obtain the variables:

$$\min_x\ \tfrac{1}{2}\, x^\mathsf{T} H x + f^\mathsf{T} x \quad \text{such that} \quad A x \le b,$$

where $x$ is the DCT domain masking we want to find, $A$ is the matrix of cosine coefficients from the 64 expressions above, $b$ is the vector of spatial domain masking values that restricts the DCT domain masking, and $f = 0$. Because we have two variables $F_{50}$ and $F_{05}$, to express $E_2$ we write

$$H = \begin{bmatrix} -2 & 0 \\ 0 & -2 \end{bmatrix}.$$

After the optimization, we obtain the values of the two variables $F_{50}$ and $F_{05}$, which are the masking values at those frequencies.
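A small numerical sketch of this optimization, using SciPy's general-purpose SLSQP solver rather than the Matlab routine based on Coleman's reflective Newton method [18]; the constant spatial masking block used for the demonstration, and the nonzero starting point, are arbitrary choices of ours:

```python
import numpy as np
from scipy.optimize import minimize

def dct_masking_strength(SM):
    """Maximize E1 = F50^2 + F05^2 subject to the 64 constraints of Eq. (27).
    SM is the 8x8 spatial-domain masking block."""
    i = np.arange(8)
    c = np.cos(5.0 * (2.0 * i + 1.0) * np.pi / 16.0)
    # Row (i, j) of A multiplies [F50, F05]; Eq. (27): 0.25*(F50*c[i] + F05*c[j]) <= SM[i, j].
    A = 0.25 * np.stack([np.repeat(c, 8), np.tile(c, 8)], axis=1)   # 64 x 2
    b = SM.ravel()
    cons = {'type': 'ineq', 'fun': lambda v: b - A @ v}             # b - A v >= 0
    res = minimize(lambda v: -(v[0] ** 2 + v[1] ** 2),              # minimize E2 = -E1
                   x0=np.array([0.1, 0.1]), constraints=[cons])
    return res.x                                                    # masking for F50, F05

print(dct_masking_strength(np.full((8, 8), 3.0)))
```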
5. Implementation results and evaluation

5.1. HVS masking

The implementation of the proposed HVS masking includes three elements: luminance masking, texture masking, and edge masking. In the implementation, we choose the image "Lena" shown in Fig. 2 as an example. In the following sections, we present and analyze the results step by step.

5.1.1. Luminance masking

The first masking we show is the luminance masking, which is based on Chou's algorithm [10]. According to the introduction
Fig. 7. Luminance masking: visibility threshold versus grey level.
in Section 3.1, the implementation result of Chou's algorithm is shown in Fig. 7. The advantage of Chou's algorithm is that it uses the local luminance and gives the redundant pixel value of the masking directly. For example, if the grey level is 0, the masking value can be 20: changing the original pixel value from 0 to 20 is not noticeable to human vision. Similarly, if an original pixel value of 127 is changed to anywhere between 124 and 130, the change is indistinguishable, because the masking value there is 3. The luminance masking curve also shows that when the original pixel value is below or above a certain level, the masking value increases; this follows the saturation of the eye in darkness and brightness. When the grey level is 0, i.e., totally dark, the masking value is 20, and when the grey level reaches 255, i.e., fully bright, the masking value is 6. This means that the darkness is more tolerant to the watermark than the brightness. In the middle range of grey levels, however, the masking value is not very high; the lowest masking value is around 3. As mentioned previously, Voloshynovskiy [7] chose the value 3 as the embedding strength for the watermark in the flat regions of the image; Chou's algorithm confirms that this value is quite feasible. Fig. 8 shows our implementation result of the normalized luminance masking. When the image has large areas of darkness or brightness, this luminance masking shows its advantages. We normalized the masking because the masking values are very small, and the whole image would otherwise appear black. After the normalization, it is clear that the white parts of the masking have high watermark embedding strength, while the black parts have low embedding strength.
Fig. 8. Normalized luminance masking of image "Lena".
Fig. 9. Texture masking of image "Lena".
5.1.2. Texture masking

Using the method described in Section 3.2, we present the implementation result of the texture masking in Fig. 9. From the result, it is clear that the white areas have a high margin to cover the watermark, while the black areas have a low margin.
5.1.3. Edge masking

The result of the edge masking is shown in Fig. 6, as discussed in Section 3.3.

5.1.4. Final masking

The final masking is built from the separately produced luminance masking, texture masking, and edge masking. Based on rule (2), an edge is more sensitive to change than a highly active area, so only in the edge areas do we choose the minimum of the texture masking and the edge masking. After testing 25 images with this modified texture masking for watermark embedding, we found that the edges in some images still showed distortions. As a result, we apply a weight to adjust the masking in the edge areas; based on the experiments, the final weight value is between 0.4 and 0.5. After combining the edge masking and the texture masking into the modified texture masking, we combine it with the luminance masking by choosing the maximum value. The reason is that in flat areas the masking is determined by the luminance masking, while in highly active areas it is mostly determined by the edge and texture maskings. Based on the implementation results, the final masking equation becomes:
$$M_F = \max\big(M_L,\ p \cdot \min(M_T,\ F(M_E,\ E_{DI}(E_{DE}(I))))\big), \quad (30)$$

where $M_F$ is the final masking, $p$ is the weight value ranging from 0.4 to 0.5, $M_L$ is the luminance masking, $M_T$ is the texture masking, $I$ is the original image, $E_{DE}(\cdot)$ is the edge detection operation, $E_{DI}(\cdot)$ is the dilation operation, and $F(\cdot)$ is the filter operation. Fig. 10 shows the final masking of the image. Unlike other HVS maskings that concentrate only on the edges of the image, this masking has high values in the highly active areas and the dark regions, and lower values elsewhere.

Fig. 10. Final HVS masking for image "Lena".

5.2. Watermarking algorithm

In the implementation, we embed the watermark into the original image (see Fig. 11) with the same watermark energy. Fig. 12 is the watermarked image without the proposed masking; Fig. 13 is the watermarked image with the proposed masking. The embedding strength given by the masking is 11. The PSNR of the watermarked images in Figs. 12 and 13 is 27.9 dB. However, it is clear that the quality of the watermarked image produced with the proposed masking is much better than that produced without masking. Measured with the proposed wPSNR, the watermarked image without masking scores 33.1 dB, while the one with the masking scores 57.2 dB. It is not entirely fair to use our own wPSNR to evaluate our own watermarking algorithm; however, we could not find a better measure for the moment, and since the proposed masking model takes three spatial masking functions into account and approximates the HVS more accurately, we use it to evaluate our test results.

Fig. 11. Original image.

Fig. 12. Watermarked image without using masking.

Fig. 13. Watermarked image with using masking.

Fig. 14 shows the watermark detection results. We embed the same watermark into 25 different images and use linear correlation to detect the watermark from the watermarked and unwatermarked images. From Fig. 14, it is clear that when the received image contains the watermark, the linear correlation value is above 0.85; when the received image has no watermark, the linear correlation is below 0.2.

Fig. 14. Correlation values of the watermark detection (with and without watermark).
We also use the wPSNR to measure 25 watermarked images generated without masking, with NVF masking, and with the proposed masking. For each image the watermarking energy is the same; the only difference is the masking. The results are charted in Fig. 15. It is clear that NVF masking helps improve the image quality, but not as much as the proposed masking. For images with large dark or bright areas, the NVF masking is not suitable; our proposed scheme improves the quality of the watermarked images over the NVF masking scheme.

Fig. 15. wPSNR values of watermarked images with different masking (no masking, NVF masking, proposed masking).
When the watermarked images are compressed with different quality factors, the correlation value is affected. The quality factor determines how much of the data is lost and thus directly impacts the size of the compressed image; it is the values in the quantization table that control the image quality. A quality factor of 1 specifies the lowest quality and maximum compression, while a quality factor of 100 specifies the highest quality and minimum compression. Fig. 16 shows the correlation values for quality factors of 90%, 85%, 80%, and 75% applied to the watermarked images. From the results, we can see that the higher the quality factor, the higher the linear correlation value. The compression acts not only on the original image but also on the watermark it carries: when the quality factor is high, the impact of the compression on the image is small, so the distortion to the watermark is small as well, and the final linear correlation value is high. The shape of the curves with compression always follows the shape of the curve without compression; when the value on the non-compressed curve is high, the value with compression is also high, and vice versa. Even when the quality factor is 75%, most of the correlation values are above 0.5. This experiment only tests the ability of the embedding strength to improve robustness to compression; if much higher correlation values are desired, several other methods deal with the compression impact on the watermarked image very well [19].

Fig. 16. Detection values for watermarked image after compression.
Another test is the additive noise attack. In the simulation, we add different types of noise to the watermarked image: Gaussian noise, salt-and-pepper noise, and speckle noise. The linear correlation results are shown in Fig. 17. From these results, we can see that the watermark is quite robust to the different added noises, mostly because the correlation between the watermark and the noise is close to zero.

Fig. 17. Detection values of watermarked image after noise attack.

After obtaining the HVS masking in the spatial domain, we follow the proposed method to extend the masking into the DCT domain and embed the watermark there. The results are shown in Figs. 18 and 19. The watermark is embedded in the two middle frequencies $F_{05}$ and $F_{50}$; the PSNR of the watermarked image in Fig. 18 is 35.3 dB. We show only the example of two frequencies; in fact, the watermark can also be embedded in any other frequencies except the DC component. It has been demonstrated that, thanks to the increased total embedding energy, the watermark is more robust against signal processing such as JPEG compression and different noise attacks under the same perceptual quality.

Fig. 18. Watermarked image with DCT masking.

Fig. 19. Watermark embedded in Fig. 18.
6. Conclusions

In this paper, we have proposed a new adaptive image watermarking algorithm. With the same watermarking energy, the quality of the watermarked image using the proposed masking is improved. The proposed wPSNR helped us evaluate the image quality. The watermark can be detected without knowing the original image. We have also extended our masking to the DCT domain successfully. The experimental results show that the proposed algorithm performs well.
References
[1] J.F. Delaigle, C. Devleeschouwer, B. Macq, I. Langendijk, Human visual system features enabling watermarking, in: Proceedings of the IEEE ICME 2002, vol. 2, 2002, pp. 489–492.
[2] M. Barni, F. Bartolini, Watermarking Systems Engineering: Enabling Digital Assets Security and Other Applications, CRC Press, Boca Raton, 2004, ISBN 0-8247-4806-9.
[3] J.F. Delaigle, C. De Vleeschouwer, B. Macq, Watermarking algorithm based on a human visual model, Signal Process. 66 (3) (1998) 319–335.
[4] M. Kutter, S. Winkler, A vision-based masking model for spread-spectrum image watermarking, IEEE Trans. Image Process. 11 (1) (January 2002) 16–25.
[5] B.T. Hannigan, A. Reed, B. Bradley, Digital watermarking using improved human visual system model, in: Proceedings of SPIE, Security and Watermarking of Multimedia Contents, vol. 4314, San Jose, CA, January 2001, pp. 468–474.
[6] F. Bartolini, M. Barni, V. Cappellini, A. Piva, Mask building for perceptually hiding frequency embedded watermarks, in: Proceedings of the IEEE ICIP 98, vol. 1, October 1998, pp. 450–454.
[7] S. Voloshynovskiy, A. Herrigel, N. Baumgaertner, T. Pun, A stochastic approach to content adaptive digital image watermarking, in: Proceedings of the International Workshop on Information Hiding, Lecture Notes in Computer Science, vol. 1768, 1999, pp. 212–236.
[8] I. Cox, M. Miller, J. Bloom, Digital Watermarking, Morgan Kaufmann, Los Altos, CA, 2002, ISBN 1-55860-714-5.
[9] C.I. Podilchuk, W. Zeng, Image-adaptive watermarking using visual models, IEEE J. Sel. Areas Commun. 16 (4) (May 1998) 525–539.
[10] C. Chou, Y. Li, A perceptually tuned subband image coder based on the measure of just-noticeable-distortion profile, IEEE Trans. Circuits Syst. Video Technol. 5 (6) (December 1995) 467–476.
[11] Z. Wang, A.C. Bovik, A human visual system based objective video distortion measurement system, in: Proceedings of the International Conference on Multimedia Processing and Systems, August 2001.
[12] http://www.unm.edu/keithw/astrophotography/imagesharpening.html
[13] http://www.mathworks.com/access/helpdesk/help/toolbox/images/morph.html
[14] J. Herre, J.D. Johnston, Enhancing the performance of perceptual audio coding by using temporal noise shaping (TNS), in: Proceedings of the 101st AES Convention, Los Angeles, CA, November 1996.
[15] M.D. Swanson, B. Zhu, A. Tewfik, Robust data hiding for images, in: Proceedings of the Seventh IEEE Digital Signal
Processing Workshop, Loen, Norway, September 1996, pp. 37–40.
[16] http://www.mathworks.com/access/helpdesk/help/toolbox/vipblks/2ddct.html
[17] http://www.mathworks.com/access/helpdesk/help/toolbox/vipblks/2didct.html
[18] T.F. Coleman, Y. Li, A reflective Newton method for minimizing a quadratic function subject to bounds on some of the variables, SIAM J. Optim. 6 (4) (1996) 1040–1058.
[19] C. Lu, S. Huang, C. Sze, H.M. Liao, Cocktail watermarking for digital image protection, IEEE Trans. Multimedia 2 (4) (December 2000) 209–223.