Nonnegative sparse coding induced hashing for image copy detection




Neurocomputing 105 (2013) 81–89


Fuhao Zou, Hui Feng, Hefei Ling*, Cong Liu, Lingyu Yan, Ping Li, Dan Li
School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, China
* Corresponding author. E-mail address: [email protected] (H. Ling).


Available online 8 October 2012

Abstract

Among the existing hashing methods, Self-taught hashing (STH) is regarded as the state-of-the-art. However, it still suffers from semantic loss, which mainly comes from the fact that the original optimization objective for in-sample data is NP-hard and is therefore relaxed into the combination of Laplacian Eigenmaps (LE) and binarization. The shape of the LE embedding is quite dissimilar to that of binary codes, so binarizing the LE embedding readily leads to significant semantic loss. To overcome this drawback, we combine constrained nonnegative sparse coding and the Support Vector Machine (SVM) to propose a new hashing method, called nonnegative sparse coding induced hashing (NSCIH). Nonnegative sparse coding is exploited to seek a better intermediate representation, which ensures that the binarization can be conducted smoothly. In addition, we build an image copy detection scheme based on the proposed hashing method. Extensive experiments show that NSCIH is superior to the state-of-the-art hashing methods, and that the copy detection scheme can be used for performing copy detection over very large image databases.

Keywords: Content-based copy detection; Semantic hashing; Nonnegative sparse coding; Support Vector Machine

1. Introduction

With the rapid development of the Internet and digital image processing technologies, editing and distributing digital images has become very convenient. In practice this brings many benefits, but at the same time it leads to numerous copyright-infringing events, since illegal usage comes at almost no cost. It is therefore urgent to develop an automatic way to trace such events and report them to the owners of the infringed images. Currently, there are two approaches that can accomplish this task: watermarking and content-based copy detection (CBCD). In contrast with watermarking, CBCD is more flexible since it does not need to embed any information into the medium (image, video, audio, etc.) prior to distribution; in addition, it is more robust against various perceptual distortions. Consequently, CBCD has attracted considerable attention in the communities of multimedia security [14] and multimedia retrieval [25,26].

CBCD is usually regarded as a branch of content-based image retrieval, since the two techniques comprise similar parts: feature representation and retrieval based on the extracted features.


However, there are fundamental differences between the two techniques. CBCD aims to find variations of the query image, i.e., copies of the query image that may or may not have undergone certain manipulations, while the objective of image retrieval is to find images similar to the query image, a much looser requirement. Moreover, it is worth noting that similar images are not necessarily copies, and copies may be dissimilar. Thus, in the CBCD community much attention has been paid to designing new features that capture visual information well, especially spatial distribution information, so as to effectively identify copies while remaining robust against intentional or unintentional distortions; efficient retrieval techniques, however, are rarely taken into account. As the number of images grows exponentially, it is essential to develop an efficient retrieval method for fast nearest neighbor search (NNS) over databases with millions or even billions of feature vectors. At present, there are mainly two kinds of methods, exact and approximate NNS, that can be applied to the NNS task. Generally speaking, although the former, which usually adopts tree-like structures such as the K-D tree [2] and R-tree [9], can return all expected nearest neighbors of the query vector, it suffers from the curse of dimensionality and is not applicable to retrieval over large-scale databases. The latter offers good scalability by sacrificing a small amount of query quality. Among approximate NNS methods, hashing-based methods are considered effective and efficient for NNS over large-scale databases. Therefore, considering the scalability required by image copy detection, we concentrate here on developing a fast and scalable hashing algorithm for image copy detection.


In principle, a hashing-based method first maps the test (i.e., query) vector and the training vectors from the real-valued space into binary codes in Hamming space, and then finds the nearest neighbors by evaluating the Hamming distance between the binary code of the test vector and that of each training vector. Generally, the performance of a hashing-based method depends significantly on the extent of locality (semantic) preservation, which means that after mapping, the Hamming codes of similar vectors should be close to each other and those of dissimilar vectors should be far apart. According to the principle used to generate hash codes, existing hashing schemes can be classified into two categories, data-unaware and data-aware hashing. Data-aware hashing outperforms data-unaware hashing because of its explicit optimization objective that seeks minimum semantic loss. In spite of this, the state-of-the-art data-aware hashing algorithms still suffer from semantic loss, which directly affects the accuracy and efficiency of copy detection. For example, Spectral hashing [24] and Self-taught hashing [28], as representatives of data-aware hashing algorithms, encounter semantic loss because restrictions are relaxed to make the problem solvable. To tackle this limitation of the state-of-the-art data-aware hashing, this paper proposes a new hashing algorithm, called nonnegative sparse coding induced hashing, to produce hash codes for in-sample data and out-of-sample data (e.g., query vectors). Based on the proposed hashing method, we then develop an image copy detection scheme applicable to large-scale databases. The main characteristics of this paper are highlighted as follows:

- To reduce semantic loss and avoid the NP-hard problem when training on in-sample data, we do not seek to map the data directly from the high-dimensional real-valued space to Hamming space. Instead, the process of generating hash codes is explicitly divided into finding an appropriate intermediate representation and then binarizing it.
- Nonnegative sparse coding (NNSC) and nonnegative LE are introduced to find the intermediate representation for in-sample data.
- Based on the proposed hashing method, we develop an image copy detection scheme that can be applied to large-scale databases with better accuracy and efficiency.

The remainder of this paper is organized as follows. In Section 2, we review the state of the art in NNS. In Section 3, we present the NSCIH algorithm and construct an image copy detection framework based on it. In Section 4, extensive experiments are conducted to demonstrate the effectiveness and efficiency of the proposed algorithm. Finally, we draw conclusions in Section 5.

2. Related works

As pointed out in the previous section, the focus of this paper is mainly on NNS, since feature extraction has already been intensively investigated in the field of CBCD. In the following, we therefore skip the survey of feature extraction and mainly review methods related to NNS, particularly hashing-based methods.

In the early period of NNS research, a variety of NNS methods based on tree-like structures, such as the R-tree [9], K-D tree [2], SR-tree [13], navigating nets [15] and cover tree [3], emerged; they return exact results and are referred to as exact NNS. This class of methods can perform very well by

making use of pre-constructed structures such as space-partition or data-partition indices when the feature space is low-dimensional. However, it has been shown that once the dimensionality exceeds about 10, their performance degrades dramatically and can even be worse than naive brute-force (linear-scan) search [23]; that is, this class of methods suffers from the curse of dimensionality. Actually, for a large number of applications, including CBCD, it is unnecessary to return all exact results, which makes it possible to sidestep the curse of dimensionality at the price of a small loss in query quality.

Locality sensitive hashing (LSH), proposed by Indyk [6], is the well-known pioneering work for efficient approximate NNS over high-dimensional feature spaces. LSH first applies random projections followed by thresholding to convert high-dimensional vectors into compact binary codes, ensuring that close vectors are mapped to similar codes with high probability while distant vectors are mapped to similar codes with low probability. By computing the Hamming distance between binary codes, it can efficiently find the nearest neighbors of the query object in Hamming space. However, since the construction of LSH hash functions is data-independent (i.e., data-unaware), LSH may produce quite inefficient codes in practice. To obtain high query accuracy, LSH usually relies on multiple hash tables; experimental studies show that it often needs over a hundred or even several hundred hash tables to achieve good accuracy for high-dimensional vectors (see [29]). Since the volume of each hash table is proportional to the number of vectors, LSH is space-consuming, which prevents it from being applied to large-scale databases. To reduce the memory consumption of LSH [6], several variations such as Entropy-LSH [19] and Multi-probe LSH [17] have been proposed, which decrease the number of hash tables used for indexing and guarantee accuracy with novel query strategies. Although these two methods markedly reduce the space cost compared with LSH, space consumption remains a bottleneck, since they are query-directed methods and the quality of the hash codes themselves is not improved.

To overcome the space-consumption issue of LSH and its variations, several researchers have designed data-aware hashing [22] by introducing machine learning techniques. Different from data-unaware hashing, data-aware hashing has an explicit optimization objective over the known data that minimizes semantic loss, and it can therefore produce hash codes with superior semantic preservation. For instance, in [20] the authors introduced the stacked Restricted Boltzmann Machine (RBM) to produce hash codes and demonstrated that it is indeed applicable to generating compact binary codes that speed up NNS. Meanwhile, boosting approaches have been exploited to produce hash codes, such as Similarity Sensitive Coding (SSC) [21] and Forgiving Hashing (FgH) [4]. In [21], it was demonstrated that both stacked-RBM and boosting-SSC are superior to LSH when applied to a database with tens of millions of images. Recently, two outstanding hashing methods, Spectral hashing (SpH) [24] and Self-taught hashing (STH) [28], have been proposed, which formulate the hashing problem as spectral graph partitioning. These two methods are considered the state-of-the-art among existing hashing methods.
However, these two methods still suffer from semantic loss: some constraints are dropped to make the original objective function (an NP-hard problem) solvable, which leads to large semantic loss. On the whole, although hashing is considered an effective approach to approximate NNS in applications such as CBCD, semantic loss remains the major challenge in this area. Motivated by this, we introduce NNSC and the SVM to design a novel hashing algorithm with superior semantic preservation.
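To make the data-unaware baseline concrete, here is a minimal sketch of the random-projection-plus-thresholding idea described above. It is an illustration only (a simplified cosine/SimHash-style variant); the function and parameter names are ours, not from [6].

import numpy as np

def lsh_codes(X, n_bits=32, seed=0):
    """Data-unaware hashing: random hyperplane projections followed by thresholding."""
    rng = np.random.default_rng(seed)
    R = rng.standard_normal((X.shape[1], n_bits))  # one random hyperplane per bit
    return (X @ R >= 0).astype(np.uint8)           # 1 on the positive side, 0 otherwise

# Nearest neighbors are then ranked by Hamming distance between the query code and
# the database codes, e.g. np.count_nonzero(codes != query_code, axis=1).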


3. Image copy detection based on NSCIH

3.1. The framework of image CBCD

The proposed image copy detection framework is illustrated in Fig. 1 and contains an offline and an online stage. In the offline stage, the training images, usually collected from the Internet by crawling, are first converted into high-dimensional features X = [x_1, x_2, ..., x_n] ∈ R^{m×n} (n is the number of training images) using a typical feature extraction method (e.g., ordinal measures of AC coefficients [14]). Next, the NNSC-Learning procedure computes the nonnegative sparse codes Y = [y_1, y_2, ..., y_n] ∈ R^{l×n} of X, followed by a binarization that maps Y from nonnegative real values to binary codes ℓ = [ℓ_1, ℓ_2, ..., ℓ_n] ∈ {−1, 1}^{l×n} (i.e., hash codes). After that, based on the mapping relationship between the matrices X and ℓ, the SVM-Learning procedure learns l Support Vector Machines f(·) = {f_1(·), f_2(·), ..., f_l(·)}, also called the hash function, which is used to produce hash codes for out-of-sample data. In the online stage, we extract the feature of the test image with the same method and compute its hash code using the hash function f(·). Finally, we perform copy detection: the Hamming distances between the binary code of the test image and those of the training images are evaluated with efficient XOR operations, and the images whose distances are smaller than a predefined threshold are returned as query results.

3.2. The scheme of NSCIH

3.2.1. Computing hash codes for in-sample data

As shown in Fig. 1, the proposed NSCIH adopts NNSC to obtain the intermediate representation of X instead of the Laplacian Eigenmaps (LE) [1] used in STH [28], with the aim of reducing semantic loss. The reason why the proposed method can effectively reduce semantic loss is as follows. Consider the objective function for in-sample data in STH:

    min_Y   trace(Y^T L Y)
    s.t.    Y_{i,j} ∈ {−1, 1},   Y^T 1 = 0,   Y^T Y = I        (1)

where L is the Laplacian matrix of the training matrix X. This objective is evidently NP-hard. To make it solvable, the constraint Y_{i,j} ∈ {−1, 1} is removed; that is, training the hash codes for in-sample data becomes equivalent to the combination of LE and binarization. Because the shape of the LE embedding is quite dissimilar to that of binary codes, the subsequent binarization inevitably leads to large semantic loss, and this loss is compounded when generating hash codes for out-of-sample data.

To overcome this issue, we develop a new intermediate representation that is not only close to binary values but also preserves the semantic structure of the original data well. Inspired by the work of Cai et al. [5], we develop a novel nonnegative sparse coding, realized as follows. First, by exploiting the nonnegative-LE embedding of the original data X, we learn a basis set B that captures the local structure of X. Then, on top of it, we obtain a nonnegative sparse representation Y of X that approximately takes on a discrete form and also preserves the local structure. Although the proposed method is similar to [5], there is a significant difference: we impose nonnegativity and a specified sparse-rate constraint to adapt to the hashing requirements. In the following, we elaborate on how to achieve such an intermediate representation by means of sparse coding.

Sparse coding [16] is widely used for signal reconstruction and denoising. Its optimization objective is generally expressed as

    min   ‖X − BY‖_F^2 + λ‖Y‖_1        (2)

where X = [x_1, x_2, ..., x_n] ∈ R^{m×n} is the matrix in the original space, B = [b_1, b_2, ..., b_l] ∈ R^{m×l} is a basis matrix, Y = [y_1, y_2, ..., y_n] ∈ R^{l×n} is a sparse coefficient matrix, ‖·‖_F is the Frobenius norm, and λ > 0 is the regularization parameter used to control the sparsity of Y. Obviously, the coefficient matrix Y of plain sparse coding is still far from the expected intermediate representation. To that end, the basis matrix B and the sparse representation Y are obtained as follows.

Fig. 1. The proposed framework for CBCD.

(A) Computing the basis matrix B. To preserve the semantic information when subsequently learning the sparse representation Y, and to make the calculation of B convenient, we start by estimating a coarse solution of Y using nonnegative-LE and then use it to derive the basis matrix B. Let Y_c be the coarse solution of Y and W be the affinity matrix that captures the neighborhood structure (also known as semantic information) between the columns of X. To keep W sparse, for any column of X only its k-nearest neighbors are considered when computing similarity, and the heat kernel is selected as the similarity measure. The affinity matrix is then

    W(i, j) = exp(−‖x_i − x_j‖^2 / t)   if x_i ∈ N_k(x_j) or x_j ∈ N_k(x_i)
    W(i, j) = 0                         otherwise

where N_k(x) denotes the set of k-nearest neighbors of column x.
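As an illustration of this graph construction, the following sketch builds W with a k-NN search and heat-kernel weights. It is not part of the original method description: the use of scikit-learn, the default kernel width t = 1.0 and the variable names are our assumptions (the paper sets k = 20 in Section 4 but does not report t).

import numpy as np
from sklearn.neighbors import NearestNeighbors

def heat_kernel_affinity(X, k=20, t=1.0):
    """Sparse k-NN affinity matrix W of Section 3.2.1; columns of X (m x n) are samples."""
    Z = X.T                                          # scikit-learn expects samples as rows
    dist, idx = NearestNeighbors(n_neighbors=k + 1).fit(Z).kneighbors(Z)
    n = Z.shape[0]
    W = np.zeros((n, n))
    for i in range(n):
        for d, j in zip(dist[i, 1:], idx[i, 1:]):    # skip the point itself (position 0)
            W[i, j] = W[j, i] = np.exp(-d ** 2 / t)  # heat kernel; symmetric "or" rule
    return W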

Y_c can be found by minimizing the following objective function:

    min_{Y_c}   trace(Y_c L Y_c^T)
    s.t.        Y_c D Y_c^T = I,   Y_c ⪰ 0        (3)

where L = D − W and D is a diagonal matrix whose entries are the column (or row, since W is symmetric) sums of W, D_{ii} = Σ_j W_{ij}. That is, Y_c is the nonnegative-LE embedding of X. Let Ỹ = Y_c D^{1/2}. Eq. (3) can then be rewritten as

    min_{Ỹ}   trace(Ỹ D^{−1/2} L D^{−1/2} Ỹ^T)
    s.t.      Ỹ Ỹ^T = I,   Ỹ ⪰ 0        (4)

By referring to [7,27], an update rule for Ỹ that converges to a local minimum can be written as

    Ỹ ← Ỹ ∘ sqrt( (Ỹ M^− + Ỹ M^+ Ỹ^T Ỹ) / (Ỹ M^+ + Ỹ M^− Ỹ^T Ỹ) )        (5)

where ∘ denotes the Hadamard (element-wise) product, the division and square root are element-wise, M = D^{−1/2} L D^{−1/2}, and M^+, M^− are defined as

    M^+_{ij} = M_{ij}    if M_{ij} > 0, and 0 otherwise
    M^−_{ij} = |M_{ij}|  if M_{ij} ≤ 0, and 0 otherwise

Once Ỹ is obtained, Y_c can be recovered as Y_c = Ỹ D^{−1/2}. In principle, the NNSC can be viewed as a special case of nonnegative matrix factorization (NMF), namely X ≈ BY with B^T B = B B^T = I, so Y can be approximately represented as Y ≈ B^T X. Based on the assumption that Y_c is the coarse solution of Y, we have Y_c^T ≈ X^T B. To make the subsequently obtained sparse representation preserve the semantic information well, we learn the basis B that best fits Y_c by solving the following optimization problem:

    min_B   ‖Y_c^T − X^T B‖^2 + α‖B‖^2        (6)

where α is the regularization parameter and the second term is a regularizer that prevents over-fitting. Setting the derivative of Eq. (6) with respect to B to zero gives the optimal solution

    B* = (X X^T + αI)^{−1} X Y_c^T        (7)

where I is the identity matrix. If the dimensionality of the image features is very high, computing the inverse of X X^T + αI is quite expensive. To lower the computational cost, as in [5], iterative algorithms such as LSQR [18] can be used to solve the problem in Eq. (7) directly.
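As an illustration of step (A), the following dense-matrix sketch mirrors Eqs. (3)-(7) as written above. The random initialization, the fixed iteration count, and the dense solve in place of LSQR are our simplifications, not part of the original method.

import numpy as np

def coarse_embedding(W, l, n_iter=200, eps=1e-9, seed=0):
    """Nonnegative-LE embedding Y_c (l x n) via the multiplicative update of Eq. (5)."""
    d = W.sum(axis=1)
    L = np.diag(d) - W
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    M = D_inv_sqrt @ L @ D_inv_sqrt                 # M = D^{-1/2} L D^{-1/2}
    Mp, Mm = np.maximum(M, 0), np.maximum(-M, 0)    # split M = M+ - M-
    rng = np.random.default_rng(seed)
    Yt = rng.random((l, W.shape[0])) + 0.1          # nonnegative initialization of Y~
    for _ in range(n_iter):
        num = Yt @ Mm + Yt @ Mp @ Yt.T @ Yt
        den = Yt @ Mp + Yt @ Mm @ Yt.T @ Yt + eps
        Yt *= np.sqrt(num / den)                    # multiplicative update, Eq. (5)
    return Yt @ D_inv_sqrt                          # Y_c = Y~ D^{-1/2}

def basis_matrix(X, Yc, alpha=1e-3):
    """Closed-form ridge solution of Eq. (7): B* = (X X^T + alpha I)^{-1} X Y_c^T."""
    m = X.shape[0]
    return np.linalg.solve(X @ X.T + alpha * np.eye(m), X @ Yc.T)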

(B) Computing the sparse representation Y. We now turn to calculating a nonnegative Y with the desired sparse rate. To achieve this goal, as in [10], we impose constraints on Eq. (2) and obtain the new optimization objective

    min_Y   ‖X − BY‖_F^2 + λ‖Y‖_1
    s.t.    Y ⪰ 0,   sr ≈ 50%        (8)

Here, sr ≈ 50% requires the sparse rate to be approximately 50%, and Y ⪰ 0 ensures that every element of Y is nonnegative. Since the basis B is now known, the above problem becomes an L1-regularized least squares problem. For convenience of description and without loss of generality, the optimization objective is transformed to the vector level. Thus, for any vector y_i ∈ Y, its objective function can be reformulated as

    min_{y_i}   ‖x_i − B y_i‖^2 + λ‖y_i‖_1        (9)

Also, as in [16], by introducing the sign vector θ_i ∈ {−1, 0, 1}^l (where θ_i = sign(y_i)), Eq. (9) can be represented as

    min_{y_i}   ‖x_i − B y_i‖^2 + λ θ_i^T y_i        (10)

Setting the derivative of Eq. (10) with respect to y_i to zero gives the analytic solution

    ŷ_i := (B^T B)^{−1} (B^T x_i − λ θ_i / 2)        (11)

To ensure the nonnegativity and sparsity of y_i as well as efficiency, we propose an improved feature-sign search algorithm, built around Eq. (11), based on the work of [16], which is considered the leading algorithm for efficient sparse coding but is not directly applicable to Eq. (10). The details are described in Algorithm 1 (see Appendix A). Compared with the original feature-sign search, the improved version adopts two modifications to achieve nonnegativity: first, negative elements are never added to the active set; second, any negative element is forced to zero after a new y_i has been produced in each iteration. After obtaining the intermediate representation y_i, for the sake of efficient search, y_i is binarized into the binary code ℓ_i under the following criterion: for any entry y_{ij} of y_i, ℓ_{ij} = 1 if y_{ij} > 0 and ℓ_{ij} = −1 otherwise.

3.2.2. Construction of the hash function for out-of-sample data

Suppose we have the original matrix X = [x_1, x_2, ..., x_n] and its associated binary code matrix ℓ = [ℓ_1, ℓ_2, ..., ℓ_n]. By exploiting the mapping relationship between X and ℓ, we learn l binary classification functions f(·) = {f_1(·), f_2(·), ..., f_l(·)} (i.e., the hash function) to predict hash codes for out-of-sample data. We adopt the linear Support Vector Machine (linear SVM) proposed in [12] as the classification function because its training time is linear and it is therefore well suited to large datasets. Given the vectors x_1, x_2, ..., x_n with their binary labels of the p-th bit ℓ_1^{(p)}, ℓ_2^{(p)}, ..., ℓ_n^{(p)}, the p-th function f_p(x) = sign(w_p^T x) is obtained by solving the following optimization problem:

    min_{w_p, ξ_{pi} ≥ 0}   (1/2) w_p^T w_p + (C_p / n) Σ_{i=1}^{n} ξ_{pi}
    s.t.                    ℓ_i^{(p)} w_p^T x_i ≥ 1 − ξ_{pi},   ∀ i = 1, ..., n        (12)

where w_p is the projection vector, ξ_{pi} is a slack variable, and C_p is the scale factor. Thus, f(x) = sign(W^T x), where W = [w_1, w_2, ..., w_l].

3.2.3. Time complexity analysis

Assume that s denotes the average number of non-zero elements per vector and that l ≤ m. In the first stage, constructing the k-nearest-neighbors graph takes O(n^2 s + n^2 k) time, computing the smallest l eigenvectors of the eigenproblem in Eq. (4) takes O(lnk) time, calculating B by solving Eq. (7) with LSQR takes O(lns) time [18], computing the entire solution path of the problem in Eq. (10) using LARS [8] takes O(l^3 + m l^2) time, and the binarization takes O(ln) time. In the second stage, each of the l linear SVM classifiers can be trained in O(sn) time or less, so all training can be done in O(lns) time [11]. Considering l ≪ n, the time complexity of our NSCIH algorithm is O(n^2 s + n^2 k + l^3 + m l^2).
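As an illustration of the out-of-sample stage of Section 3.2.2 and the Hamming search of Section 3.1, the following sketch trains one linear SVM per bit and retrieves candidate copies by Hamming distance. It uses scikit-learn's LinearSVC as a stand-in for the LIBLINEAR solver mentioned in Section 4; function and variable names are ours.

import numpy as np
from sklearn.svm import LinearSVC

def train_hash_functions(X, codes, C=1.0):
    """One linear SVM per bit; X is n x m (rows are samples), codes is n x l in {-1, +1}."""
    return [LinearSVC(C=C).fit(X, codes[:, p]) for p in range(codes.shape[1])]

def hash_out_of_sample(svms, x):
    """Out-of-sample code f(x) = sign(W^T x): one predicted bit per trained SVM."""
    return np.array([int(svm.predict(x.reshape(1, -1))[0]) for svm in svms])

def detect_copies(query_code, train_codes, threshold):
    """Return indices of database items whose Hamming distance to the query is within the threshold."""
    d = np.count_nonzero(train_codes != query_code, axis=1)
    return np.nonzero(d <= threshold)[0]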

4. Experiments

We conduct a series of experiments to verify the accuracy and efficiency of the proposed approach and its sensitivity to parameters such as the code length and the detection threshold. To exhibit the performance of the proposed approach, two classical methods, SpH and STH, are selected for comparison.

4.1. Dataset description

The experiments are conducted on four publicly available image databases: CIFAR-10, MNIST, TINY and Caltech-101. A short description of the four databases follows.

CIFAR-10: CIFAR-10 is a labeled subset of the 80 million tiny images dataset, collected by Alex Krizhevsky, Vinod Nair, and Geoffrey Hinton (http://www.cs.toronto.edu/~kriz/cifar.html). It consists of 60,000 32 × 32 color images in 10 classes, with 6000 images per class.

MNIST: The MNIST database of handwritten digits (http://yann.lecun.com/exdb/mnist/) has a training set of 60,000 examples and a test set of 10,000 examples. It is a subset of a larger set available from NIST. The digits have been size-normalized and centered in a fixed-size image.

TINY: The Tiny Images dataset (http://horatio.cs.nyu.edu/mit/tiny/data/index.html) consists of 79,302,017 images, each a 32 × 32 color image. The data is stored in large binary files that can be accessed through a Matlab toolbox written by Rob Fergus et al.

Caltech-101: Caltech-101 (http://www.vision.caltech.edu/archive.html) was collected by Fei-Fei Li et al. and contains objects from 101 categories, with about 40-800 images per category. Most categories have about 50 images, forming a medium-scale dataset (roughly 9K images). The size of each image is roughly 300 × 200 pixels.

4.2. Evaluation

The copy detection is processed as follows. First, except for the TINY dataset (from which only 1 million images are randomly selected as training images), the other three image datasets are directly used as training image datasets. Second, we randomly select 50 images from each training image dataset as test images and attack them using the StirMark attack tool (http://www.petitcolas.net/fabien/watermarking/stirmark) to produce copies of the test images (the 18 selected attacks are listed in Table 1). The obtained copies are then inserted back into the corresponding image dataset, so the size of each training image dataset increases by 900. Third, we extract the features of the test and training images using the feature extraction method proposed by Kim [14]; note that we adopt a 16 × 16 block partition rather than the 8 × 8 partition used in [14] and select the first 128 coefficients to calculate the ordinal measure, so the dimensionality of the obtained features is 128. Next, for each hashing method, we learn hash codes for the training features and a hash function for the test features over the four training feature datasets. Finally, we use the hash codes of the test images as queries to search for nearest neighbors, and compute the standard query performance measures precision, recall, and F1-measure, defined as

    Precision = (number of copies detected with d ≤ t) / (number of detections with d ≤ t)        (13)

    Recall = (number of copies detected with d ≤ t) / (number of total copies)        (14)


Table 1
Attacks towards the test images.

Attack no.   Attack type
1            Scaling 0.50
2            Scaling 1.50
3            Sharpening 3 × 3
4            Color reduction
5            2 × 2 median filtering
6            4 × 4 median filtering
7            Gaussian filtering 3 × 3
8            JPEG compression 10
9            JPEG compression 30
10           JPEG compression 40
11           JPEG compression 60
12           JPEG compression 70
13           JPEG compression 80
14           Change aspect ratio – scale.x 0.8, scale.y 1.0
15           Rotation −2.00 with cropping and scaling
16           Rotation 1.00 with cropping and scaling
17           Shearing – x-direction 5.0%, y-direction 0.0%
18           Shearing – x-direction 0.0%, y-direction 1.0%

where d is the Hamming distance between the binary code of the test image and that of a training image, and t is the predefined threshold. The F1-measure, also called the F1-score, is the harmonic mean of precision and recall:

    F1-measure = 2 × (Precision × Recall) / (Precision + Recall)        (15)
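For illustration, the three measures of Eqs. (13)-(15) can be computed for a single query as follows. This is a minimal sketch; the array names are ours, and it assumes at least one true copy exists for the query (which holds in the setup above, where each test image has 18 copies).

import numpy as np

def detection_scores(distances, is_copy, t):
    """Precision, Recall and F1 of Eqs. (13)-(15) for one query.
    distances: Hamming distance from the query to every database item;
    is_copy:   boolean mask marking the true copies of the query."""
    detected = distances <= t
    n_detected = np.count_nonzero(detected)
    tp = np.count_nonzero(detected & is_copy)
    precision = tp / n_detected if n_detected else 0.0
    recall = tp / np.count_nonzero(is_copy)
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1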

The settings of the related parameters are as follows. For all three hashing methods, the parameter k is set to 20 when constructing the k-nearest-neighbors graph for LE (used in SpH and STH) and nonnegative-LE (used in NSCIH). The SVM implementation is from LIBLINEAR with the default parameter values [12]. The parameters α in Eq. (7) and λ in Eq. (8) were empirically set to 10^{-3} and 0.1, respectively. The settings of the code length l and the detection threshold t are introduced in the related sections.

4.3. Results

Fig. 2 presents the F1-measure of NSCIH for detecting copies over the four training datasets. To examine the impact of the code length and the detection threshold, we vary the code length from 4 to 96 bits and the detection threshold (i.e., the maximum query radius between the hash code of a retrieved image and that of the query image) from 0 to 3. Two phenomena can be observed in Fig. 2: first, as the detection threshold increases, the peak value of the F1-measure increases; second, before reaching the peak, the F1-measure keeps rising as the code length increases. However, as stated in [28], longer binary codes demand more memory and a larger query radius requires more computation. The optimal code length and detection threshold are therefore obtained by balancing effectiveness and efficiency.

Fig. 3 shows the comparison between NSCIH and the other two hashing methods, SpH and STH, in terms of precision-recall curves over the training datasets. The results are generated by varying the code length from 4 to 96 bits while fixing the query radius (i.e., detection threshold) to 1. It can be observed that NSCIH is consistently superior to SpH and STH under the same experimental conditions. This improvement mainly comes from the fact that binarizing the intermediate representation generated by the proposed NNSC leads to less semantic loss than binarizing the representation produced by LE.



Fig. 2. The F1-measure of NSCIH for detecting copies of query image. (a) CIFAR-10, (b) MNIST, (c) TINY and (d) Caltech-101.


Fig. 3. The precision–recall curve for detecting copies of the query image. (a) CIFAR-10, (b) MNIST, (c) TINY and (d) Caltech-101.

In addition, by setting the parameters t = 1 and l = 50, we measure the time cost of the three hashing methods for training and online query, as listed in Table 2. SpH and STH are more efficient than NSCIH when generating hash codes for the training images; in terms of online query, however, their response times are very close. Meanwhile, we vary the size of the training set over the TINY dataset, generating training subsets of 5000, 10,000, 15,000, 20,000 and 25,000 images, respectively. Keeping the same parameter settings, we calculate the training time over these datasets. Fig. 4 presents the training time of the proposed method. It can be observed that the training time rises quickly as the data size increases. Since the efficiency of a hashing method is mainly determined by its response time, the larger computational cost of NSCIH in the training stage does not affect its scalability to large databases, because training can be done offline.

Finally, we verify the scalability of NSCIH by examining the trend of the response time versus data size. We construct four datasets based on the TINY training dataset by fixing the data size at 1000, 10,000, 100,000 and 10^6, respectively. Setting t = 1 and l = 50, we compute the response time over the four constructed datasets. Fig. 5 shows the query time of the proposed method. The results show that the response time increases very slowly even though the data size increases dramatically, which implies that NSCIH can scale to very large databases.

Table 2
The time cost of the proposed approach compared with SpH and STH.

Database      NSCIH                       SpH                         STH
              Training (s)   Query (ms)   Training (s)   Query (ms)   Training (s)   Query (ms)
CIFAR-10      114.43         10           37.9864        8            107.61         11
MNIST         661.97         14           74.618         15           519.46         13
TINY          11562.99       143          233.70         137          12501.47       140
Caltech-101   93.95          6            9.031          5            89.54          7

Fig. 4. The time cost for training under different database size (number of training images) settings.

Fig. 5. The average query time per query image versus database size (number of images).

5. Conclusion and future work

In this paper, we propose a new hashing method based on nonnegative sparse coding and the SVM. By analyzing the principle of state-of-the-art work such as STH, we find that its way of generating hash codes for in-sample data is equivalent to the combination of LE and binarization; that is, LE is adopted to seek an intermediate representation. Since the shape of the LE embedding is quite dissimilar to that of binary codes, binarizing the LE embedding readily leads to significant semantic loss. To overcome this drawback, we propose an alternative to LE by integrating nonnegative LE and nonnegative sparse coding. Based on this alternative, we construct a hashing scheme for in-sample data. Then, according to the mapping relationship between the in-sample data and their binary codes, we use a linear SVM to generate a hash function for out-of-sample data. Finally, we build an image copy detection framework based on the proposed hashing method. Extensive experiments show that NSCIH is superior to the state-of-the-art hashing methods, and that the copy detection scheme can be used for conducting copy detection over very large image databases.

Acknowledgments

This work is supported by the NSF of China under Grant Nos. 61272409, 60873226 and 60803112, the Fundamental Research Funds for the Central Universities under Grant No. 2011QN050, and the Wuhan Youth Science and Technology Chenguang Program.

Appendix A. The improved feature-sign search algorithm

Algorithm 1. The improved feature-sign search algorithm.

1: Initialize y_i := 0, θ_i := 0, and active set := {}, where θ_ij ∈ {−1, 0, 1} denotes sign(y_ij);
2: From the zero coefficients of y_i, select j = argmax_j |∂‖x_i − B y_i‖^2 / ∂y_ij|;
   Activate y_ij (add j to the active set) only if it locally improves the objective and y_ij > 0, namely:
   if ∂‖x_i − B y_i‖^2 / ∂y_ij < −λ, then set θ_ij := 1, active set := {j} ∪ active set;
3: Feature-sign step:
   Let B̂ be the submatrix of B that contains only the columns corresponding to the active set;
   Let ŷ_i and θ̂_i be the subvectors of y_i and θ_i corresponding to the active set;
   Compute the analytical solution of the resulting unconstrained QP (min_{ŷ_i} ‖x_i − B̂ ŷ_i‖^2 + λ θ̂_i^T ŷ_i):
       ŷ_i^{new} := (B̂^T B̂)^{−1} (B̂^T x_i − λ θ̂_i / 2);
   Set the negative entries of ŷ_i^{new} to zero;
   Perform a discrete line search on the closed line segment from ŷ_i to ŷ_i^{new}:
       (1) check the objective value at ŷ_i^{new} and at all points where any coefficient changes sign;
       (2) update ŷ_i (and the corresponding entries of y_i) to the point with the lowest objective value;
   Remove the zero coefficients of ŷ_i from the active set and update θ_i := sign(y_i);
4: Check the optimality and termination conditions:
   (a) Termination condition: sr = SparseRate(y_i), where SparseRate(·) computes the zero rate of y_i;
       if round(sr × 100) = 50, then return y_i as the solution;
   (b) Optimality condition for nonzero coefficients: ∂‖x_i − B y_i‖^2 / ∂y_ij + λ sign(y_ij) = 0, ∀ y_ij ≠ 0;
       if condition (b) is not satisfied, go to Step 3 (without any new activation); otherwise check condition (c);
   (c) Optimality condition for zero coefficients: |∂‖x_i − B y_i‖^2 / ∂y_ij| ≤ λ, ∀ y_ij = 0;
       if condition (c) is not satisfied, go to Step 2; otherwise return y_i as the solution.
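Algorithm 1 above is the solver actually used in the paper. As an illustration only, the following much simpler stand-in (projected ISTA) solves the same nonnegative L1-regularized problem of Eq. (8), without the 50% sparse-rate control of Algorithm 1; the solver choice and the names are ours.

import numpy as np

def nonnegative_sparse_codes(X, B, lam=0.1, n_iter=300):
    """Nonnegative L1-regularized coding of Eq. (8), min ||X - BY||_F^2 + lam ||Y||_1, Y >= 0,
    solved with projected ISTA as a simple stand-in for Algorithm 1 (no sparse-rate control)."""
    step = 1.0 / (2.0 * np.linalg.norm(B, 2) ** 2)           # 1 / Lipschitz constant of the gradient
    Y = np.zeros((B.shape[1], X.shape[1]))
    for _ in range(n_iter):
        grad = 2.0 * B.T @ (B @ Y - X)                        # gradient of the quadratic term
        Y = np.maximum(0.0, Y - step * grad - step * lam)     # proximal step and projection onto Y >= 0
    return Y

def binarize(Y):
    """Hash codes of Section 3.2.1: +1 where y_ij > 0, -1 otherwise."""
    return np.where(Y > 0, 1, -1)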

References [1] M. Belkin, P. Niyogi, Laplacian eigenmaps for dimensionality reduction and data representation, Neural Comput. 15 (2003) 1373–1396. [2] J. Bentley, K–d trees for semidynamic point sets, in: Proceedings of the Sixth Annual Symposium on Computational Geometry, VLDB’07, ACM, 1990, pp. 187–197. [3] A. Beygelzimer, S. Kakade, J. Langford, Cover trees for nearest neighbor, in: Proceedings of the 23rd International Conference on Machine Learning, 2006, Pittsburgh, PA, USA, pp. 97–104. [4] D.M. Blei, A.Y. Ng, M.I. Jordan, Latent Dirichlet allocation, J. Mach. Learn. Res. 3 (2003) 993–1022. [5] D. Cai, H. Bao, X. He, Sparse concept coding for visual analysis, in: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2011, Colorado Springs, CO, USA, pp. 2905–2910. [6] M. Datar, P. Indyk, N. Immorlica, V.S. Mirrokni, Locality-sensitive hashing scheme based on p-stable distributions, in: Proceedings of the Annual Symposium on Computational Geometry, 2004, Brooklyn, NY, USA, pp. 253–262. [7] C. Ding, T. Li, W. Peng, H. Park, Orthogonal nonnegative matrix t-factorizations for clustering, in: Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2006), vol. 2, New York, NY, USA, pp. 126–135. [8] B. Efron, T. Hastie, I. Johnstone, R. Tibshirani, Least angle regression, Ann. Stat. 32 (2004) 407–499. [9] A. Guttman, R-trees: a dynamic index structure for spatial searching, in: Proceedings of the ACM Conference on Management of Data, 1984, vol. 1, Atlanta, GA, USA, pp. 47–57. [10] P. Hoyer, Non-negative sparse coding, in: Proceedings of the 2002 IEEE workshop on Neural Networks for Signal Processing, 2002, Piscataway, NJ, USA, pp. 557–565. [11] T. Joachims, Learning to Classify Text Using Support Vector Machines, Methods, Theory and Algorithms, The Springer International Series in Engineering and Computer Science, Springer, vol. 668, 2002. [12] T. Joachims, Training linear SVMs in linear time, in: Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD2006), Philadelphia, PA, USA, pp. 217–226. [13] N. Katayama, S. Satoh, The SR-tree: an index structure for high-dimensional nearest neighbor queries, in: Proceedings of the ACM Conference on Management of Data, 1997, vol. 26, USA, pp. 369–380. [14] C. Kim, Content-based image copy detection, Signal Process.: Image Commun. 18 (2003) 169–184. [15] R. Krauthgamer, J.R. Lee, Navigating nets: simple algorithms for proximity search, in: Proceedings of the Annual ACM-SIAM Symposium on Discrete Algorithms, 2004, vol. 15, New Orleans, LA, USA, pp. 791–800. [16] H. Lee, A. Battle, R. Raina, A.Y. Ng, Efficient sparse coding algorithms, in: Advances in Neural Information Processing Systems 2006, vol. 2, Nice, France, pp. 801–808. [17] Q. Lv, W. Josephson, Z. Wang, M. Charikar, K. Li, Multi-probe LSH: efficient indexing for high-dimensional similarity search, in: Proceedings of the 33rd International Conference on Very large data bases, VLDB’07, VLDB Endowment, 2007, pp. 950–961.

[18] C.C. Paige, M.A. Saunders, LSQR: an algorithm for sparse linear equations and sparse least squares, ACM Trans. Math. Software 8 (1982) 43–71. [19] R. Panigrahy, Entropy based nearest neighbor search in high dimensions, in: Proceedings of the Annual ACM-SIAM Symposium on Discrete Algorithms, Miami, 2006, FL, USA, pp. 1186–1195. [20] R. Salakhutdinov, G. Hinton, Semantic hashing, Int. J. Approx. Reason. 50 (2009) 969–978. [21] G. Shakhnarovich, P. Viola, T. Darrell, Fast pose estimation with parametersensitive hashing, in: Proceedings of the IEEE International Conference on Computer Vision, 2003, vol. 2, Nice, France, pp. 750–757. [22] J. Song, Y. Yang, Z. Huang, H.T. Shen, R. Hong, Multiple feature hashing for real-time large scale near-duplicate video retrieval, in: Proceedings of the 19th ACM International Conference on Multimedia (ACM MM 2011), 2011, Scotts-dale, Arizona, USA, pp. 423–432. [23] R. Weber, H.J. Schek, S. Blott, A quantitative analysis and performance study for similarity-search methods in high-dimensional spaces, in: Proceedings of the 24th International Conference on Very-Large Databases, 1998, San Francisco, CA, USA, pp. 194–205. [24] Y. Weiss, A. Torralba, R. Fergus, Spectral hashing, in: Proceedings of the 22nd Annual Conference on Neural Information Processing Systems, 2008, pp. 1753–1760. [25] Y. Yang, F. Nie, D. Xu, J. Luo, Y. Zhuang, Y. Pan, A multimedia retrieval framework based on semi-supervised ranking and relevance feedback, IEEE Trans. Pattern Anal. Mach. Intell. 34 (2012) 723–742. [26] Y. Yang, Y.T. Zhuang, F. Wu, Y.H. Pan, Harmonizing hierarchical manifolds for multimedia document semantics understanding and cross-media retrieval, IEEE Trans. Multimedia 10 (2008) 437–446. [27] S. Zafeiriou, N. Laskaris, Nonnegative embeddings and projections for dimensionality reduction and information visualization, in: Proceedings of the International Conference on Pattern Recognition, 2010, Istanbul, Turkey, pp. 726–729. [28] D. Zhang, J. Wang, D. Cai, J. Lu, Self-taught hashing for fast similarity search, in: Proceedings of the 33rd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, 2010, Geneva, Switzerland, pp. 18–25. [29] W. Zhang, K. Gao, Y.D. Zhang, J.T. Li, Data-oriented locality sensitive hashing, in: Proceedings of the International Conference ACM Multimedia 2010, Firenze, Italy, pp. 1131–1134.

Fuhao Zou received B.E. degree in computer science from Huazhong Normal University, Wuhan, Hubei, China, in 1998. And received M.S. and Ph.D. in computer science and technology from Huazhong University of Science and Technology (HUST), Wuhan, Hubei, China, in 2003 and 2006. Currently, he is an Associate Professor with the College of Computer Science and Technology, HUST. His research interests include digital watermarking, digital right management and copy detection.

Hui Feng received the B.E. degree in computer science and in energy and power engineering from Wuhan University of Technology, Wuhan, Hubei, China, in 2004, and is currently pursuing the Ph.D. degree in computer science at Huazhong University of Science and Technology (HUST), Wuhan, Hubei, China. He was a software engineer at Fawcar Company from 2004 to 2007. His research interests include digital fingerprinting and digital rights management. Feng received a scholarship from HUST from 2008 to 2010.

Hefei Ling received the B.E. and M.S. degrees in energy and power engineering from Huazhong University of Science and Technology (HUST), Wuhan, China, in 1999 and 2002 respectively, and the Ph.D. degree in computer science from HUST in 2005. Since 2006, he has been an Associate Professor with the College of Computer Science and Technology, HUST. From 2008 to 2009, he joined in the Department of Computer Science, University College London (UCL) as a Visiting Scholar. His research interests include digital watermarking and fingerprinting, copy detection, content security and protection. Ling has co-authored over 50 publications including journal and conference papers.

He received the Excellent Ph.D. Dissertation award of HUST in 2006, the Foundation Research Contribution Award of HUST in 2005, and the best graduate student award from HUST, Wuhan, China, in 1999.

Cong Liu received the B.E. degree from the College of Computer Science and Technology, Wuhan University of Science and Technology (WUST), Wuhan, China, in 2009. He is now a Ph.D. candidate in the College of Computer Science and Technology. His current research interests include approximate similarity search, manifold learning and sparse representation.

Lingyu Yan received her B.S. degree in mechanical engineering and M.S. degree in software engineering from Huazhong University of Science and Technology (HUST), Wuhan, China, in 2008 and 2010 respectively, and currently pursuing the Ph.D degree in computer science, Huazhong University of Science and Technology (HUST). Her research interests include copy detection, video representation, etc.

Ping Li received the B.E. and M.S. degrees in computer engineering from Huazhong University of Science and Technology (HUST), Wuhan, China, in 1996 and 1999 respectively, and the Ph.D. degree in computer science from HUST in 2010. Since 2000, he has been a Lecturer with the College of Computer Science and Technology, HUST. His research interests include digital rights management (DRM) and copy detection.

Dan Li received the B.E. and M.S. degrees in mechanical engineering from Huazhong University of Science and Technology (HUST), Wuhan, China, in 1998 and 2002 respectively, and the Ph.D. degree in computer science from HUST in 2008. Since 2003, she has been a Lecturer with the College of Computer Science and Technology, HUST. Her research interests include computer animation and geometric modeling.