Signal Processing 93 (2013) 2061–2069
Fast communication
Robust image hashing using ring-based entropies

Zhenjun Tang*, Xianquan Zhang, Liyan Huang, Yumin Dai
Department of Computer Science, Guangxi Normal University, Guilin 541004, PR China
Article history: Received 24 May 2012; received in revised form 27 October 2012; accepted 10 January 2013; available online 20 January 2013.

Abstract
Image hashing is an emerging technology in multimedia security for applications such as image authentication, digital watermarking, and image copy detection. In this paper, we propose a robust image hashing based on two observations: the image pixels of each ring are almost unchanged after rotation, and ring-based image entropies are approximately linearly changed by content-preserving operations. The hash is constructed by converting the input image into a normalized image, dividing the normalized image into different rings, and extracting the ring-based entropies. Hash similarity is measured by the correlation coefficient. Experiments show that our hashing is robust against content-preserving manipulations such as JPEG compression, watermark embedding, scaling, rotation, brightness and contrast adjustment, gamma correction, and Gaussian low-pass filtering. Receiver operating characteristics (ROC) curve comparisons with notable algorithms indicate that our hashing has better classification performance than the compared algorithms. © 2013 Elsevier B.V. All rights reserved.
Keywords: image hashing; robust hashing; image entropy; digital watermarking; image copy detection
1. Introduction

Image hashing is an emerging technology in multimedia security. It uses a short string called an image hash to represent an input image, and finds applications in image authentication, digital watermarking [1], image copy detection, tamper detection, image indexing, and content-based image retrieval (CBIR). Conventional cryptographic hash functions such as SHA-1 and MD5 can convert an input message into a fixed-size string, but they are sensitive to the input data and therefore unsuitable for image hashing. This is because digital images often undergo normal processing such as JPEG compression, geometric transforms, and format conversion. After these operations, the digital representations of the images change, but their visual appearances are preserved, so their image hashes are expected to be the same or very similar. In general, an image hash function should have
* Corresponding author. Tel.: +86 1529 599 0968. E-mail addresses: [email protected], [email protected] (Z. Tang).
0165-1684/$ - see front matter © 2013 Elsevier B.V. All rights reserved. http://dx.doi.org/10.1016/j.sigpro.2013.01.008
two basic properties. (1) Perceptual robustness: the hash function should be robust against content-preserving operations, such as JPEG compression and image enhancement. In other words, the hashes of an image and of its attacked versions generated by content-preserving operations should be the same or very similar. (2) Discriminative capability: different images should have different hashes, meaning that the hash distance between different images should be large enough.
Many researchers have devoted themselves to developing image hashing in the past years. For example, Venkatesan et al. [2] exploit wavelet coefficient statistics to construct hashes. This method is sensitive to contrast adjustment and gamma correction. Fridrich and Goljan [3] propose a hashing algorithm based on the observation that if a low-frequency discrete cosine transform (DCT) coefficient is small in absolute value, it cannot be made large without causing visual changes to the image, and vice versa. Lefebvre et al. [4] use the Radon transform (RT) to construct robust hashes. Later, Roover et al. [5] propose the RASH method: they divide an image into radial projections, extract a RAdial Variance (RAV) vector from these
projections, and compress the RAV vector by the DCT. The RASH method is resilient to image rotation and re-scaling, but its discriminative capability needs to be strengthened. In [6], Swaminathan et al. use Fourier coefficients to generate hashes. This hash function is resilient to several content-preserving modifications, such as moderate geometric transforms and filtering. In another study [7], Kozat et al. view images and attacks as a sequence of linear operators and propose a perceptual hashing as follows. They randomly select overlapping rectangles from the input image, apply singular value decomposition (SVD) to these rectangles, and extract a feature vector from each SVD result to form a secondary image. Next, they choose small overlapping rectangles from the secondary image, apply SVD to the rectangles again, and extract feature vectors from the SVD results to generate hashes. The SVD-SVD hashing is robust against rotation, but its discriminative capability is not desirable. Monga and Mihcak [8] first use non-negative matrix factorization (NMF) to derive image hashing. This hashing is robust against rotation, but fragile to watermark embedding. In [9], Tang et al. find an invariant relation in the NMF coefficient matrix and use it to construct robust hashes. Recently, Ou et al. [10] apply the RT to the input image, randomly select 40 projections to perform the 1-D DCT, and take the first AC coefficient of each projection to produce the hash. The RT-DCT hashing is resistant to small-angle rotation, but its discrimination is also not desirable. Ahmed et al. [11] exploit the wavelet transform and SHA-1 to produce hashes. This method can be used for image authentication, but it is fragile to contrast enhancement. In [12], Tang et al. propose a similarity metric and use it to construct image hashing. In another study [13], Tang et al. convert an RGB color image into the YCbCr and HSI color spaces, extract invariant moments from each color component, and use them to form image hashes. This scheme is resilient to rotation.
Most of the above algorithms [2,3,9,11,12] share a common weakness: they are fragile to image rotation. Some algorithms [4-8,10,13] can resist rotation, but their discriminative capabilities are not desirable. In this work, we propose an image hashing algorithm based on ring division and image entropy. This algorithm can not only resist image rotation by an arbitrary angle, but also has a desirable discriminative capability.
The rest of the paper is organized as follows. Section 2 describes the proposed image hashing. Section 3 presents experimental results. Conclusions are given in Section 4.
2. Proposed image hashing

As shown in Fig. 1, our algorithm is composed of three steps. The first step produces a normalized image for robust feature extraction. In the second step, we divide the normalized image into different rings, whose contents
are invariant to image rotation. Finally, we extract the image entropies of these rings and use them to form the image hash. The following subsections give a detailed description of each step.

2.1. Preprocessing

We convert the input image into a normalized image by the following steps. First, we convert the input image to a square image sized m × m by bilinear interpolation. This operation ensures that images with different resolutions have the same or very similar hashes. To alleviate the influence of minor modifications, such as noise contamination or filtering, a Gaussian low-pass filter is then applied to the square image. This operation is achieved by a convolution mask. Let T_Gaussian(i, j) be the element in the i-th row and the j-th column of the convolution mask. It can be obtained by

$$T_{\mathrm{Gaussian}}(i,j) = \frac{T^{(1)}(i,j)}{\sum_i \sum_j T^{(1)}(i,j)} \quad (1)$$

where T^{(1)}(i, j) is defined as

$$T^{(1)}(i,j) = e^{-(i^2+j^2)/(2\sigma^2)} \quad (2)$$
in which σ is the standard deviation of all elements in the convolution mask. If the input is an RGB color image, we convert the filtered image into the YCbCr color space and take the luminance component for representation. Conversion from the RGB color space to the YCbCr color space can be done by

$$\begin{bmatrix} Y \\ C_b \\ C_r \end{bmatrix} = \begin{bmatrix} 65.481 & 128.553 & 24.966 \\ -37.797 & -74.203 & 112 \\ 112 & -93.786 & -18.214 \end{bmatrix} \begin{bmatrix} R \\ G \\ B \end{bmatrix} + \begin{bmatrix} 16 \\ 128 \\ 128 \end{bmatrix} \quad (3)$$

where Y, Cb and Cr are the luminance, blue-difference chroma and red-difference chroma of a pixel, and R, G, B are the red, green and blue components of the pixel, respectively.
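To make the preprocessing concrete, the following is a minimal Python/NumPy sketch of Eqs. (1)-(3). The paper's experiments were run in MATLAB; the function names, the 3 × 3 mask size, and σ = 0.5 below are our illustrative assumptions, and R, G, B are assumed to be scaled to [0, 1]. Bilinear resizing to m × m is omitted, as any image library provides it.

```python
import numpy as np

def gaussian_mask(size=3, sigma=0.5):
    """Normalized Gaussian convolution mask, Eqs. (1)-(2)."""
    half = size // 2
    i, j = np.mgrid[-half:half + 1, -half:half + 1]
    t1 = np.exp(-(i ** 2 + j ** 2) / (2.0 * sigma ** 2))  # Eq. (2)
    return t1 / t1.sum()                                   # Eq. (1)

def filter_channel(channel, mask):
    """Plain 2-D convolution with replicated borders (illustrative only)."""
    half = mask.shape[0] // 2
    padded = np.pad(channel, half, mode='edge')
    rows, cols = channel.shape
    out = np.zeros((rows, cols))
    for di in range(mask.shape[0]):
        for dj in range(mask.shape[1]):
            out += mask[di, dj] * padded[di:di + rows, dj:dj + cols]
    return out

def luminance(rgb):
    """Y component of the YCbCr conversion in Eq. (3), with R, G, B in [0, 1]."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    return 16.0 + 65.481 * r + 128.553 * g + 24.966 * b
```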
Fig. 1. Block diagram of our hashing: input image → preprocessing → ring division → entropy extraction → image hash.

2.2. Ring division

To make our hash resilient to image rotation, we divide the normalized image into different rings. As shown in Fig. 2, (a) is the central part of Airplane and (b) is obtained by cropping the rotated Airplane. Obviously, the image contents of the rings in Fig. 2(a) are the same as those of the corresponding rings in Fig. 2(b). In other words, the image content of each ring is unchanged after image rotation. This provides an opportunity to extract image features resilient to rotation. Here, we only consider the image content in the inscribed circle of the normalized image and divide it into rings of equal area, since each ring feature is expected to have the same importance as the others.
Fig. 2. Ring division of an image and its rotated version. (a) Central part of Airplane and (b) rotated by 30°.
The division is done by calculating the circle radii and the distance from each pixel to the image center. Let n be the ring number and R_k be the set of pixel values of the k-th ring (k = 1, 2, …, n). From Fig. 2, it is observed that the pixels of each ring can be determined by two neighboring radii, except those of the innermost ring. Suppose that r_k is the k-th radius (k = 1, 2, …, n), labeled from small to large. Thus, r_1 and r_n are the radii of the innermost and outermost circles, respectively. Clearly, r_n = ⌊m/2⌋ for the m × m normalized image, where ⌊·⌋ means downward rounding. To determine the other radii, we calculate the area A of the inscribed circle and the average area m_A of each ring:

$$A = \pi r_n^2 \quad (4)$$

$$m_A = \lfloor A/n \rfloor \quad (5)$$

So r_1 can be calculated by

$$r_1 = \sqrt{m_A/\pi} \quad (6)$$

Thus, the other radii r_k (k = 2, 3, …, n−1) can be obtained by

$$r_k = \sqrt{\frac{m_A + \pi r_{k-1}^2}{\pi}} \quad (7)$$
Let q_{i,j}(x_i, y_j) be the value of the pixel in the j-th row and the i-th column of the image, where (x_i, y_j) are its coordinates. Suppose that (x_c, y_c) are the coordinates of the image center. Thus, x_c = m/2 + 0.5 and y_c = m/2 + 0.5 if m is an even number; otherwise, x_c = (m+1)/2 and y_c = (m+1)/2. The distance from q_{i,j}(x_i, y_j) to the image center (x_c, y_c) is measured by the Euclidean distance:

$$d_{i,j} = \sqrt{(x_i - x_c)^2 + (y_j - y_c)^2} \quad (8)$$

Having obtained the circle radii and pixel distances, we can classify the image pixels into n sets as follows:

$$R_1 = \{q_{i,j} \mid d_{i,j} \le r_1\} \quad (9)$$

$$R_k = \{q_{i,j} \mid r_{k-1} < d_{i,j} \le r_k\} \quad (k = 2, 3, \ldots, n) \quad (10)$$

where R_1 corresponds to the innermost ring and R_k corresponds to the k-th ring.
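A minimal sketch of the ring division under the definitions above, covering Eqs. (4)-(10); the function names are ours, and the input is assumed to be the m × m normalized luminance array:

```python
import numpy as np

def ring_radii(m, n):
    """Radii r_1, ..., r_n of the n equal-area rings, Eqs. (4)-(7)."""
    r_n = m // 2                                  # r_n = floor(m/2)
    mean_area = np.floor(np.pi * r_n ** 2 / n)    # Eqs. (4)-(5)
    radii = [np.sqrt(mean_area / np.pi)]          # Eq. (6)
    for _ in range(n - 2):                        # Eq. (7) for k = 2, ..., n-1
        radii.append(np.sqrt((mean_area + np.pi * radii[-1] ** 2) / np.pi))
    radii.append(float(r_n))
    return np.array(radii)

def divide_into_rings(img, radii):
    """Classify pixels into the ring sets R_1, ..., R_n, Eqs. (8)-(10)."""
    m = img.shape[0]
    c = m / 2 + 0.5 if m % 2 == 0 else (m + 1) / 2   # image center
    y, x = np.mgrid[1:m + 1, 1:m + 1]
    d = np.sqrt((x - c) ** 2 + (y - c) ** 2)          # Eq. (8)
    rings = [img[d <= radii[0]]]                      # Eq. (9), innermost ring
    for k in range(1, len(radii)):
        rings.append(img[(d > radii[k - 1]) & (d <= radii[k])])  # Eq. (10)
    return rings
```

For the paper's settings, ring_radii(512, 64) would give the 64 radii of a 512 × 512 normalized image.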
In fact, the observation that the image pixels of each ring are unchanged after rotation can be proved as follows. First, we observe from Eqs. (9) and (10) that the pixels of each ring depend only on their distances to the center and on the circle radii. Since the circle radii are invariant for normalized images of the same size, we just need to prove that the distance from each pixel to the image center is unchanged after rotation. Second, as the rotation operation takes the image center as the origin of the coordinates, we conduct a coordinate conversion by the translation transform [14]:

$$x_i' = x_i + t_x, \qquad y_j' = y_j + t_y \quad (11)$$

where (x_i', y_j') are the coordinates of q_{i,j} in the new coordinate system, and t_x and t_y are the translations along the x-axis and y-axis, respectively. Here, t_x = −x_c and t_y = −y_c. Let (x_c', y_c') be the coordinates of the image center in the new coordinate system. Substituting the original coordinates into Eq. (11) gives

$$x_c' = x_c + t_x = x_c - x_c = 0, \qquad y_c' = y_c + t_y = y_c - y_c = 0 \quad (12)$$

Let d_{i,j}^{(0)} be the Euclidean distance from (x_i', y_j') to (x_c', y_c'). Therefore, d_{i,j}^{(0)} can be calculated by

$$d_{i,j}^{(0)} = \sqrt{(x_i' - x_c')^2 + (y_j' - y_c')^2} = \sqrt{(x_i')^2 + (y_j')^2} = \sqrt{(x_i + t_x)^2 + (y_j + t_y)^2} = \sqrt{(x_i - x_c)^2 + (y_j - y_c)^2} \quad (13)$$

Suppose that (x_u', y_v') are the rotated results of (x_i', y_j'). Thus, they satisfy the rotation transform [14]:

$$x_u' = x_i' \cos\theta - y_j' \sin\theta, \qquad y_v' = x_i' \sin\theta + y_j' \cos\theta \quad (14)$$
where θ is the rotation angle. Let d_{i,j}^{(θ)} be the Euclidean distance from (x_u', y_v') to (x_c', y_c'). It can be computed by
$$d_{i,j}^{(\theta)} = \sqrt{(x_u' - x_c')^2 + (y_v' - y_c')^2} = \sqrt{(x_u')^2 + (y_v')^2} = \sqrt{(x_i'\cos\theta - y_j'\sin\theta)^2 + (x_i'\sin\theta + y_j'\cos\theta)^2} = \sqrt{(x_i')^2 + (y_j')^2} = \sqrt{(x_i - x_c)^2 + (y_j - y_c)^2} \quad (15)$$

since the cross terms ±2 x_i' y_j' sinθ cosθ cancel and cos²θ + sin²θ = 1. From the above deduction, we conclude that d_{i,j}^{(θ)} is equal to d_{i,j}^{(0)}. Therefore, for continuous images, the image pixels of each ring are unchanged after rotation. As to digital images, pixel coordinates are denoted with integers. This means that if the results of Eq. (14) are not integers, they must be quantized by the rounding operation, which introduces a slight difference between d_{i,j}^{(θ)} and d_{i,j}^{(0)}. In this case, a few pixels near a ring boundary may fall into the adjacent ring. Therefore, for digital images, the image pixels of each ring are almost unchanged after image rotation.
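As a quick numerical illustration of this deduction (our own check, not from the paper), rotating centered coordinates by any angle leaves the distances to the center unchanged up to floating-point error:

```python
import numpy as np

rng = np.random.default_rng(0)
pts = rng.uniform(-180.0, 180.0, size=(1000, 2))  # centered coordinates (x', y')
theta = np.deg2rad(30.0)                          # rotation angle of Eq. (14)
rot = np.array([[np.cos(theta), -np.sin(theta)],
                [np.sin(theta),  np.cos(theta)]])
d0 = np.hypot(pts[:, 0], pts[:, 1])               # d^(0) of Eq. (13)
rotated = pts @ rot.T                             # Eq. (14) applied to all points
d_theta = np.hypot(rotated[:, 0], rotated[:, 1])  # d^(theta) of Eq. (15)
assert np.allclose(d0, d_theta)                   # distances are preserved
```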
2.3. Entropy extraction

Entropy is a basic concept of information theory [15]. It is a measure of the average information content. Let e_i be an event, p(e_i) be its probability of occurrence, and E be the set formed by the e_i (i = 1, 2, …, N). The entropy of E is then defined as

$$H(E) = -\sum_{i=1}^{N} p(e_i) \log_2 p(e_i) \quad (16)$$
where p(e_1) + p(e_2) + ⋯ + p(e_N) = 1. For digital images, entropy can be used to characterize image texture. In this case, E is the input image, e_i is a pixel value, and p(e_i) is the probability of e_i occurring in the input image. For a gray image, e_i ∈ [0, 255] and N = 256. In this work, we take the entropy of each ring as a feature and use it to form the image hash. This is based on the observation that ring-based entropies are approximately linearly changed by content-preserving operations; the experiments in Section 3.1 validate this observation. Let h_l be the l-th element of the hash h. It is calculated by

$$h_l = H(R_l) \quad (l = 1, 2, \ldots, n) \quad (17)$$
Clearly, the hash length is equal to the ring number: the smaller the ring number, the shorter the hash length. However, few rings mean few features, which inevitably hurts the discriminative capability. Therefore, we need to balance the hash length and the discriminative capability when choosing the ring number. In experiments, we find that a moderate ring number, e.g., 60 or 64, is an acceptable choice for 512 × 512 images.
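A minimal sketch of Eqs. (16)-(17) for 8-bit gray-level rings (the helper names are ours):

```python
import numpy as np

def ring_entropy(ring_values):
    """Shannon entropy of one ring's pixel values, Eq. (16)."""
    counts = np.bincount(np.asarray(ring_values, dtype=np.uint8).ravel(),
                         minlength=256)
    p = counts / counts.sum()
    p = p[p > 0]                      # terms with p(e_i) = 0 contribute nothing
    return -np.sum(p * np.log2(p))

def entropy_hash(rings):
    """Hash h = [H(R_1), ..., H(R_n)], Eq. (17)."""
    return np.array([ring_entropy(r) for r in rings])
```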
2.4. Similarity measurement

Since hash values are approximately linearly changed by content-preserving operations, we exploit the correlation coefficient to measure hash similarity. Let h^{(1)} = [h_1^{(1)}, h_2^{(1)}, …, h_n^{(1)}] and h^{(2)} = [h_1^{(2)}, h_2^{(2)}, …, h_n^{(2)}] be two hashes. The correlation coefficient is defined as

$$S = \frac{\delta_{h^{(1)},h^{(2)}}}{\delta_{h^{(1)}}\,\delta_{h^{(2)}} + \varepsilon} \quad (18)$$
where ε is a small constant to avoid singularity when δ_{h^{(1)}} δ_{h^{(2)}} = 0, δ_{h^{(1)}} and δ_{h^{(2)}} are the standard deviations of h^{(1)} and h^{(2)}, and δ_{h^{(1)},h^{(2)}} is their covariance, calculated by

$$\delta_{h^{(1)},h^{(2)}} = \frac{1}{n-1} \sum_{i=1}^{n} [h_i^{(1)} - \mu_1][h_i^{(2)} - \mu_2] \quad (19)$$

where μ_1 and μ_2 are the means of h^{(1)} and h^{(2)}, respectively. The range of S is [−1, 1]. The more similar the input images, the bigger the S value. If S is bigger than a predefined threshold T_threshold, the input images are considered visually identical; otherwise, they are viewed as different images.
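Eqs. (18)-(19) amount to the Pearson correlation coefficient with a small stabilizer; a sketch (the ε value is our choice):

```python
import numpy as np

def hash_similarity(h1, h2, eps=1e-10):
    """Correlation coefficient S between two hashes, Eqs. (18)-(19)."""
    h1, h2 = np.asarray(h1, dtype=float), np.asarray(h2, dtype=float)
    n = len(h1)
    cov = np.sum((h1 - h1.mean()) * (h2 - h2.mean())) / (n - 1)   # Eq. (19)
    return cov / (h1.std(ddof=1) * h2.std(ddof=1) + eps)          # Eq. (18)
```

Two images are then judged visually identical when S exceeds the chosen threshold, e.g. hash_similarity(entropy_hash(rings1), entropy_hash(rings2)) > 0.95 with the threshold suggested in Section 3.1.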
3. Experimental results

In the experiments, the parameters of our hashing are m = 512 and n = 64. To validate the performance of our algorithm, we conduct experiments on robustness and discriminative capability in Sections 3.1 and 3.2, respectively. Comparisons with some notable hashing algorithms are presented in Section 3.3.

3.1. Perceptual robustness

To validate perceptual robustness, we produce visually identical versions of the standard 512 × 512 test images shown in Fig. 3 using StirMark 4.0 [16], Photoshop, and MATLAB. The detailed parameters of each manipulation are presented in Table 1. Since the sizes of the rotated images are significantly expanded and some regions are padded with black or white pixels, we only take the central 361 × 361 parts of the original and the rotated images for hash generation; Fig. 2 is an example of such central parts. After the above manipulations, each image has 60 attacked versions.
Fig. 3. Standard test images. (a) Airplane, (b) Baboon, (c) House, (d) Peppers, and (e) Lena.
Table 1. Manipulations with different parameter values.

Tool      | Manipulation                     | Description                  | Parameter values
StirMark  | JPEG compression                 | Quality factor               | 30, 40, 50, 60, 70, 80, 90, 100
StirMark  | Watermark embedding              | Strength                     | 10, 20, 30, 40, 50, 60, 70, 80, 90, 100
StirMark  | Scaling                          | Ratio                        | 0.5, 0.75, 0.9, 1.1, 1.5, 2.0
StirMark  | Rotation and cropping            | Rotation angle in degrees    | ±1, ±2, ±5, ±10, ±15, ±30, ±45, ±90
Photoshop | Brightness adjustment            | Photoshop's brightness scale | ±10, ±20
Photoshop | Contrast adjustment              | Photoshop's contrast scale   | ±10, ±20
MATLAB    | Gamma correction                 | γ                            | 0.75, 0.9, 1.1, 1.25
MATLAB    | 3×3 Gaussian low-pass filtering  | Standard deviation           | 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0
We extract the image hashes of the original and the attacked images, and calculate their similarities. The results are given in Fig. 4. We find that all S values are bigger than 0.95 except for several cases in Fig. 4(d) and (h). To find generalized thresholds for applications, we apply our hashing to a large image database. To this end, we take all images in the first and second volumes, i.e., 'Textures' and 'Aerials', of the USC-SIPI Image Database [17] as test images, where the 'Textures' volume has 64 gray images and the 'Aerials' volume contains 1 gray image and 37 color images. Thus, there are 102 test images in total. Fig. 5 presents typical images of the 'Textures' and 'Aerials' volumes. It is clear that the images in 'Textures' are texture images and those in 'Aerials' are natural images. We exploit all manipulations listed in Table 1 to attack these 102 images and obtain 102 × 60 = 6120 pairs of similar images. Next, we calculate the hash similarity S between each pair of visually identical images and then compute the mean and standard deviation of S under different manipulations. The results are listed in Table 2. We observe that all means of the 'Aerials' volume are bigger than 0.99, those of the 'Textures' volume are all bigger than 0.97 except one case, and the standard deviations of both volumes are very small. These big similarity values empirically verify that ring-based entropies are approximately linearly changed by content-preserving operations. We also observe that the minimum mean of the 'Textures' volume is 0.9506, produced by the rotation-and-cropping manipulation. This value is smaller than that of the 'Aerials' volume because the interpolation errors caused by rotation are much bigger in texture images than in natural images. Since natural images far outnumber texture images in real-world situations, we can choose 0.95 as the threshold to resist most of the above manipulations. In this case, 99.43% of visually identical natural
images are considered similar images, and 85% of similar texture images are correctly classified. Table 3 lists the percentages of visually identical images identified as similar under different thresholds; proper thresholds can be selected to suit specific applications.
To view the performance of our hashing in a real-world situation, we use it to generate the hashes of Fig. 6(a) and (b). In this example, Fig. 6(a) is a photograph with imperfect leveling, and Fig. 6(b) is a similar version of Fig. 6(a) which has undergone a sequence of manipulations: rotation by 4°, JPEG compression with quality factor 50, brightness adjustment with Photoshop's scale 20, and contamination by Gaussian white noise of mean 0 and variance 0.01. The image sizes of Fig. 6(a) and (b) are both 760 × 560. We calculate the similarity between the image hashes of Fig. 6(a) and (b) and find that S = 0.9737, which is bigger than 0.95 and thus indicates that Fig. 6(a) and (b) are a pair of visually identical images.

3.2. Discriminative capability

To test the discriminative capability, we collect 200 different color images to form a database, where 67 images are downloaded from the Internet, 33 images are captured by digital cameras, and 100 images are taken from the Ground Truth Database [18]. The image sizes range from 256 × 256 to 2048 × 1536. We apply our hashing to the image database, extract 200 hashes, and calculate the similarity between each pair of hashes, obtaining 19,900 results. The minimum and maximum values of the results are −0.9625 and 0.9828, respectively. The mean and standard deviation of the similarity values are 0.2091 and 0.4584, respectively. If T_threshold = 0.95, only 0.18% of different images are falsely considered similar images.
Fig. 4. Robustness validation: hash similarity S for the five test images (Airplane, Baboon, House, Lena, Peppers) under (a) JPEG compression, (b) watermark embedding, (c) scaling, (d) rotation and cropping, (e) Gaussian low-pass filtering, (f) gamma correction, (g) brightness adjustment, and (h) contrast adjustment.
3.3. Performance comparisons

We compare our hashing with some notable algorithms: the SVD-SVD hashing [7], the NMF-NMF-SQ hashing [8], the RT-DCT hashing [10], and our invariant-moments-based hashing [13]. To make fair comparisons, all color images used in Sections 3.1 and 3.2 are exploited to test the performances of the assessed algorithms. The parameters of [7] are: the first number of overlapping rectangles is 100 with rectangle size 64 × 64, and the second number of overlapping rectangles is 20 with rectangle size 40 × 40. The parameters of [8] are: the sub-image number is 80, the height and width of sub-images are 64, and the ranks of the first and the second NMF are 2 and 1, respectively. The similarity metrics used in [7,8,10,13] are also adopted here, i.e., the L2 norm for [7,8,13] and the normalized Hamming distance for [10].
Fig. 5. Typical images of the USC-SIPI image database. (a) The 'Aerials' volume and (b) the 'Textures' volume.
Table 2. Mean and standard deviation of S under different manipulations.

Manipulation                     | 'Aerials' mean | 'Aerials' std | 'Textures' mean | 'Textures' std
JPEG compression                 | 0.9959 | 0.0108 | 0.9803 | 0.0298
Watermark embedding              | 0.9996 | 0.0004 | 0.9743 | 0.0737
Scaling                          | 0.9978 | 0.0043 | 0.9787 | 0.0257
Rotation and cropping            | 0.9915 | 0.0110 | 0.9506 | 0.0598
Brightness adjustment            | 0.9973 | 0.0058 | 0.9992 | 0.0073
Contrast adjustment              | 0.9988 | 0.0017 | 0.9997 | 0.0027
Gamma correction                 | 0.9964 | 0.0082 | 0.9796 | 0.0446
3×3 Gaussian low-pass filtering  | 0.9988 | 0.0026 | 0.9903 | 0.0145
Table 3. Percentages of visually identical images identified as similar images under different thresholds.

Threshold | Natural images (%) | Texture images (%)
0.96      | 98.99 | 81.15
0.95      | 99.43 | 85.00
0.94      | 99.74 | 88.18
0.93      | 99.82 | 90.34
0.92      | 99.87 | 91.90
0.91      | 99.91 | 93.15
We use receiver operating characteristics (ROC) graphs [19] to visualize the classification performances of the assessed algorithms, where the true positive rate (TPR) P_TPR and the false positive rate (FPR) P_FPR are indicators of the robustness and the discriminative capability, respectively. They are defined as follows:

$$P_{\mathrm{TPR}} = \frac{\text{number of pairs of visually identical images considered as similar images}}{\text{total pairs of visually identical images}} \quad (20)$$

$$P_{\mathrm{FPR}} = \frac{\text{number of pairs of different images considered as similar images}}{\text{total pairs of different images}} \quad (21)$$
For algorithms with the same FPR, the one with the bigger TPR is better; similarly, for algorithms with the same TPR, the one with the smaller FPR is better. For each algorithm, we choose 10 thresholds, calculate the corresponding TPRs and FPRs, and then obtain the ROC curve. The thresholds used for the assessed algorithms are presented in Table 4. Fig. 7 shows the ROC curve comparisons between our hashing and the other algorithms. It is observed that the ROC curve of our hashing lies above those of the other algorithms.
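Given similarity scores for the identical and the different pairs, one (FPR, TPR) point per threshold follows directly from Eqs. (20)-(21); a sketch (ours):

```python
import numpy as np

def roc_points(s_identical, s_different, thresholds):
    """One (FPR, TPR) point per threshold, Eqs. (20)-(21)."""
    s_identical = np.asarray(s_identical)
    s_different = np.asarray(s_different)
    points = []
    for t in thresholds:
        tpr = np.mean(s_identical > t)   # Eq. (20)
        fpr = np.mean(s_different > t)   # Eq. (21)
        points.append((fpr, tpr))
    return points
```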
Fig. 6. A pair of visually identical images. (a) A photograph with imperfect leveling and (b) A similar version of (a) attacked by different operations.
Table 4. The thresholds used for the assessed algorithms for ROC curves.

Algorithm | Thresholds
[7]       | 0.10, 0.20, 0.30, 0.40, 0.45, 0.50, 0.60, 0.70, 0.80, 0.90
[8]       | 1500, 3000, 4500, 6000, 7500, 9000, 10500, 12000, 13500, 15000
[10]      | 0.01, 0.05, 0.10, 0.20, 0.25, 0.30, 0.35, 0.40, 0.45, 0.60
[13]      | 1, 4, 8, 12, 15, 18, 22, 27, 30, 34
Our       | 0.20, 0.40, 0.60, 0.70, 0.85, 0.92, 0.95, 0.97, 0.98, 0.99
Fig. 7. ROC curve comparisons between our hashing and other well-known methods (true positive rate versus false positive rate; curves for Our, [13], [10], [8], [7], with an inset magnifying the region of small false positive rates).
This means that the classification performances of our hashing are better than those of the algorithms [7,8,10,13]. For example, when the FPR is about 0, the optimal TPR of our hashing is 0.9762, while those of the algorithms [7,8,10,13] are 0.7325, 0.9099, 0.5183, and 0.9452, respectively. If the TPR is approximately 1.0, the optimal FPR of our hashing reaches 0.0083, while those of the methods [7,8,10,13] are 0.8233, 0.1423, 1.0, and 0.0311, respectively.
Moreover, we compare the run time of the assessed algorithms. To do so, we record the time consumed in generating the 200 image hashes of the discrimination test, and calculate the average time of producing an image hash.
Table 5. Time and hash length comparisons among different algorithms.

Algorithm | Average time (s) | Hash length
[7]       | 0.650 | 1600 decimal digits
[8]       | 1.153 | 64 decimal digits
[10]      | 6.510 | 240 bits
[13]      | 1.429 | 42 decimal digits
Our       | 0.437 | 64 decimal digits
All algorithms are implemented in MATLAB 7.1 and run on a personal computer with a 3.10 GHz Intel i3-2100 CPU and 3.24 GB RAM. The average times of the SVD-SVD hashing, the NMF-NMF-SQ hashing, the RT-DCT hashing, the invariant-moments-based hashing, and our hashing are 0.650, 1.153, 6.510, 1.429, and 0.437 s, respectively. Our hashing is fast due to the low complexity of entropy calculation. As to the space for hash storage, the hash lengths of the SVD-SVD hashing and the NMF-NMF-SQ hashing are 1600 and 64 decimal digits, that of the RT-DCT hashing is 240 bits, and those of the invariant-moments-based hashing and our hashing are 42 and 64 decimal digits, respectively. The time and hash length comparisons among the different algorithms are summarized in Table 5.

4. Conclusions

In this work, we have proposed a robust image hashing based on ring division and image entropies. A key component of our hashing is the use of ring-based image entropies. As the pixels in each ring are almost
unchanged after image rotation, the ring-based entropies are resilient to the rotation operation, and therefore the proposed hashing can tolerate image rotation. Experimental results have shown that our hashing is robust against normal content-preserving manipulations. Comparisons with some well-known hashing algorithms have indicated that our hashing outperforms the compared algorithms in both running time and in the classification trade-off between perceptual robustness and discriminative capability.
Acknowledgments

This work was partially supported by the Natural Science Foundation of China (61165009, 60963008), the Guangxi Natural Science Foundation (2012GXNSFBA053166, 2012GXNSFGA060004, 2011GXNSFD018026, 0832104), the 'Bagui Scholar' Project Special Funds, the Project of the Education Administration of Guangxi (200911MS55), the Scientific Research and Technological Development Program of Guangxi (10123005-8), the Scientific and Technological Research Projects of Chongqing's Education Commission (KJ121310), the Scientific and Technological Program of Fuling District of Chongqing (FLKJ,2012ABA1056), the Scientific Research Foundation of Guangxi Normal University for Doctor Programs, and the Innovation Project of Guangxi Postgraduate Education for Master Students (YCSZ2012058). The authors would like to thank the anonymous referees for their valuable comments and suggestions.

References

[1] C. Qin, C. Chang, P. Chen, Self-embedding fragile watermarking with restoration capability based on adaptive bit allocation mechanism, Signal Processing 92 (2012) 1137–1150.
[2] R. Venkatesan, S.-M. Koon, M.H. Jakubowski, P. Moulin, Robust image hashing, in: Proceedings of the IEEE International Conference on Image Processing, 2000, pp. 664–666.
[3] J. Fridrich, M. Goljan, Robust hash functions for digital watermarking, in: Proceedings of the IEEE International Conference on Information Technology: Coding and Computing, 2000, pp. 178–183.
[4] F. Lefebvre, B. Macq, J.-D. Legat, RASH: radon soft hash algorithm, in: Proceedings of the European Signal Processing Conference, 2002, pp. 299–302.
[5] C.D. Roover, C.D. Vleeschouwer, F. Lefebvre, B. Macq, Robust video hashing based on radial projections of key frames, IEEE Transactions on Signal Processing 53 (2005) 4020–4036.
[6] A. Swaminathan, Y. Mao, M. Wu, Robust and secure image hashing, IEEE Transactions on Information Forensics and Security 1 (2006) 215–230.
[7] S.S. Kozat, R. Venkatesan, M.K. Mihcak, Robust perceptual image hashing via matrix invariants, in: Proceedings of the IEEE International Conference on Image Processing, 2004, pp. 3443–3446.
[8] V. Monga, M.K. Mihcak, Robust and secure image hashing via non-negative matrix factorizations, IEEE Transactions on Information Forensics and Security 2 (2007) 376–390.
[9] Z. Tang, S. Wang, X. Zhang, W. Wei, S. Su, Robust image hashing for tamper detection using non-negative matrix factorization, Journal of Ubiquitous Convergence and Technology 2 (2008) 18–26.
[10] Y. Ou, K.H. Rhee, A key-dependent secure image hashing scheme by using Radon transform, in: Proceedings of the IEEE International Symposium on Intelligent Signal Processing and Communication Systems, 2009, pp. 595–598.
[11] F. Ahmed, M.Y. Siyal, V.U. Abbas, A secure and robust hash-based scheme for image authentication, Signal Processing 90 (2010) 1456–1470.
[12] Z. Tang, S. Wang, X. Zhang, W. Wei, Structural feature-based image hashing and similarity metric for tampering detection, Fundamenta Informaticae 106 (2011) 75–91.
[13] Z. Tang, Y. Dai, X. Zhang, Perceptual hashing for color images using invariant moments, Applied Mathematics & Information Sciences 6 (2012) 643S–650S.
[14] X. Wang, X. Zhang, Point pattern matching algorithm for planar point sets under Euclidean transform, Journal of Applied Mathematics (2012), Article ID 139014, http://dx.doi.org/10.1155/2012/139014.
[15] C.E. Shannon, A mathematical theory of communication, Bell System Technical Journal 27 (1948) 379–423, 623–656.
[16] F.A.P. Petitcolas, Watermarking schemes evaluation, IEEE Signal Processing Magazine 17 (2000) 58–64.
[17] USC-SIPI Image Database [online]. Available: http://sipi.usc.edu/database/.
[18] Ground Truth Database [online]. Available: http://www.cs.washington.edu/research/imagedatabase/groundtruth/.
[19] T. Fawcett, An introduction to ROC analysis, Pattern Recognition Letters 27 (2006) 861–874.