J. Vis. Commun. Image R. 40 (2016) 847–851
Unsupervised discriminative hashing

Kun Zhan a, Junpeng Guan a, Yi Yang a, Qun Wu b
a School of Information Science and Engineering, Lanzhou University, Lanzhou, China
b School of Art and Design, Zhejiang Sci-Tech University, Hangzhou, China
Article history: Received 15 March 2016; Revised 10 July 2016; Accepted 20 August 2016; Available online 26 August 2016.
Keywords: Unsupervised discriminative hashing; Out-of-sample extrapolation; Manifold learning.
Abstract

Hashing is a popular solution for approximate nearest neighbor search because of its low storage cost and fast retrieval speed, and many machine learning algorithms have been adapted to learn effective hash functions. Since hash codes within the same cluster should be similar to each other while hash codes in different clusters should be dissimilar, we propose an unsupervised discriminative hashing (UDH) learning method to improve the discrimination among hash codes of different clusters. UDH shares a similar objective function with the spectral hashing algorithm and uses a modified graph Laplacian matrix to exploit local discriminant information. In addition, UDH is designed to enable efficient out-of-sample extension. Experiments on real-world image datasets demonstrate the effectiveness of the proposed approach for image retrieval.

© 2016 Elsevier Inc. All rights reserved.
1. Introduction

Hashing-based approximate nearest neighbor search has become popular due to its promising performance in terms of efficiency and accuracy [1,2]. The performance of nearest-neighbor-based algorithms can be significantly improved by exploiting a similarity measure, and learning the similarity measure is closely related to the problem of feature learning [3–5]. A feasible way is to embed high-dimensional features into a low-dimensional Hamming space where similar items can be searched efficiently [6]; this is usually done by multiplying the feature by a projection matrix, subtracting a threshold, and retaining the sign of the result. Locality sensitive hashing (LSH) algorithms have been proposed for approximate nearest neighbor search [7–9], but LSH is not stable and can yield poor results because of its randomized, data-independent nature. Data-dependent hash functions based on machine learning techniques perform better than data-independent ones. Spectral hashing (SH) is a coding-consistency hashing algorithm that requires few bits [10]. However, its assumption of uniformly distributed data does not hold in most cases, which deteriorates the performance of SH. He et al. extended SH by defining the hash function with kernels [11], and Zhuang et al. extended SH from ordinary graphs to hypergraphs [12,13]. Sparse spectral hashing integrates sparse principal component analysis [14] and boosting similarity sensitive hashing into SH [15].
LSH relies on random projections and SH assumes uniformly distributed features, which are problematic limitations. To avoid these limitations, many methods use kernel functions to improve performance [16–18]. Besides using manifold information as SH does, we further incorporate discriminative information into hash learning. Manifold learning is a suitable strategy to learn the embedding matrix from the manifold. Most manifold learning algorithms directly use a Gaussian function to compute the Laplacian matrix, which makes them sensitive to the bandwidth parameter. The discriminant information is not sufficiently exploited in the aforementioned methods, so for each data point we construct a local clique comprising the point and its neighboring points on the nonlinear manifold, following local discriminant models and global integration (LDMGI) [19]. LDMGI exploits the manifold structure and the local discriminant information simultaneously [19–23]. In our unsupervised discriminative hashing (UDH) algorithm, we use the LDMGI Laplacian matrix to learn hash codewords from both manifold information and discriminant information, and the out-of-sample problem is addressed by a projection matrix computed during the hash learning process.

In summary, the main contribution of this paper is twofold: 1. An unsupervised discriminative hashing algorithm is proposed. 2. We use an additional regularization term to learn a model for out-of-sample data extrapolation.

The rest of this paper is organized as follows. In Section 2, we derive our approach, give its solution, and present the algorithm that solves the regression framework. The experimental setting and the analysis of results are shown in Section 3. The conclusion and a discussion of future work are given in Section 4.
2. Unsupervised discriminative hashing
2.1. Preliminaries

Suppose that there are $n$ training data points $X = [x_1, x_2, \ldots, x_n] \in \mathbb{R}^{d \times n}$. $H = [h_1, h_2, \ldots, h_n]^\top \in \mathbb{B}^{n \times m}$ denotes the binary hash codes of length $m$. $A \in \mathbb{R}^{n \times n}$ is the affinity matrix defined by $a_{ij} = \exp(-\|x_i - x_j\|^2 / \sigma^2)$, where $\sigma$ is the standard deviation (bandwidth).

Spectral hashing (SH) seeks compact binary codes that preserve the similarity of the data [10]. The objective function of SH is

$\min \; \sum_{ij} A_{ij} \|h_i - h_j\|^2 \quad \text{s.t.} \;\; h_i \in \{-1, 1\}^m, \;\; \sum_i h_i = 0, \;\; \tfrac{1}{n} \sum_i h_i h_i^\top = I$    (1)

where the constraint $\tfrac{1}{n} \sum_i h_i h_i^\top = I$ requires the bits to be uncorrelated. By spectral relaxation, (1) is rewritten as

$\min \; \mathrm{Tr}(H^\top (D - A) H) \quad \text{s.t.} \;\; H_{ij} \in \{-1, 1\}, \;\; H^\top \mathbf{1} = 0, \;\; H^\top H = I$    (2)

where $\mathrm{Tr}(\cdot)$ is the trace operator and $D$ is the diagonal degree matrix whose elements are the column sums of $A$, $d_{ii} = \sum_j a_{ij}$. The codewords are given by the $m$ eigenvectors of $D - A$ with the smallest eigenvalues.
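To make the relaxed problem (2) concrete, the following NumPy sketch (an illustration, not the authors' code) computes SH-style codes for a training set by eigendecomposing $D - A$ and binarizing the smallest non-trivial eigenvectors; the bandwidth sigma and the convention of skipping the constant eigenvector are assumptions of this sketch.

```python
import numpy as np

def sh_relaxed_codes(X, m, sigma=1.0):
    """Spectral-relaxation codes in the spirit of Eqs. (1)-(2); illustrative sketch.

    X : (d, n) data matrix with one point per column; m : code length in bits.
    """
    sq = np.sum(X ** 2, axis=0)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X.T @ X)   # pairwise squared distances
    A = np.exp(-np.maximum(d2, 0.0) / sigma ** 2)      # affinity a_ij
    D = np.diag(A.sum(axis=0))                         # degree matrix (column sums of A)
    vals, vecs = np.linalg.eigh(D - A)                 # eigenvalues in ascending order
    Y = vecs[:, 1:m + 1]                               # m smallest non-trivial eigenvectors
    return np.sign(Y)                                  # binary codes in {-1, +1}
```

Such codes are only defined for the training set; extending them to unseen points is exactly the out-of-sample problem addressed by the regression term introduced below.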
2.2. Objective function

Many objective functions of manifold learning algorithms can be uniformly formulated as [20,24]

$\min_Y \; \mathrm{Tr}(Y^\top L Y) \quad \text{s.t.} \;\; Y^\top Y = I$    (3)

where $Y = [y_1, y_2, \ldots, y_n]^\top \in \mathbb{R}^{n \times m}$ denotes the low-dimensional embedding of $X$ and $L = D - A$ is the graph Laplacian matrix [25]. The Laplacian matrix plays a very important role in manifold learning algorithms. Unlike existing manifold learning algorithms, in which the affinity matrix is usually pre-computed over nearby data pairs by a fixed function, e.g., the RBF kernel, we construct the Laplacian matrix by taking both the discriminant information and the manifold structure of the data into account [19]. To globally integrate the local discriminant models from all the cliques, the Laplacian matrix is constructed as

$L = \sum_{i=1}^{n} S_i L_i S_i^\top = [S_1, S_2, \ldots, S_n] \, \mathrm{blkdiag}(L_1, L_2, \ldots, L_n) \, [S_1, S_2, \ldots, S_n]^\top$    (4)

where $L_i = H_k (H_k^\top X_i^\top X_i H_k + \lambda I)^{-1} H_k$ is a positive semi-definite matrix, $X_i = [x_{i_0}, x_{i_1}, \ldots, x_{i_{k-1}}]$ is the local data matrix made up of $x_i$ and its $k-1$ nearest neighbors, i.e., all the data points in $N_k(x_i)$, and $S_i \in \mathbb{B}^{n \times k}$ is the selection matrix with $(S_i)_{pq} = 1$ if $p = F_i\{q\}$ (the $q$-th element of $F_i$) and $(S_i)_{pq} = 0$ otherwise. $F_i = \{i_0, i_1, \ldots, i_{k-1}\}$ denotes the index set of the samples in $N_k(x_i)$. $H_k$ is the centering matrix

$H_k = I - \tfrac{1}{k} \mathbf{1}\mathbf{1}^\top$    (5)

where $I \in \mathbb{R}^{k \times k}$ is the identity matrix and $\mathbf{1}$ is the column vector of $k$ ones.
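As an illustration of Eqs. (4) and (5), the sketch below assembles the clique-based Laplacian with a brute-force k-nearest-neighbor search; the regularizer lam (the $\lambda$ in $L_i$) and the value of k are illustrative parameters, and the code is a reading of the construction rather than the authors' implementation.

```python
import numpy as np

def ldmgi_laplacian(X, k=5, lam=1.0):
    """Clique-based Laplacian of Eq. (4); X is (d, n). Illustrative sketch."""
    d, n = X.shape
    Hk = np.eye(k) - np.ones((k, k)) / k                 # centering matrix, Eq. (5)
    sq = np.sum(X ** 2, axis=0)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X.T @ X)     # pairwise squared distances
    L = np.zeros((n, n))
    for i in range(n):
        F_i = np.argsort(d2[i])[:k]                      # index set of N_k(x_i), x_i included
        X_i = X[:, F_i]                                  # local data matrix, (d, k)
        # local discriminant model: L_i = H_k (H_k X_i^T X_i H_k + lam I)^{-1} H_k
        L_i = Hk @ np.linalg.inv(Hk @ X_i.T @ X_i @ Hk + lam * np.eye(k)) @ Hk
        # S_i L_i S_i^T scatters L_i back onto the rows and columns indexed by F_i
        L[np.ix_(F_i, F_i)] += L_i
    return L
```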
In order to enable out-of-sample extension, we assume

$Y = X^\top W.$    (6)

Given (6), we can predict the output $y_{\mathrm{new}} = x_{\mathrm{new}}^\top W$ for any new test point $x_{\mathrm{new}}$. Since (6) defines $Y$ by a linear regression model, we use the following regression function to learn $W$:

$\min_W \; \|X^\top W - Y\|_F^2 + \beta \|W\|_F^2.$    (7)

We incorporate (7) as an additional term of (3) and obtain

$\min_{Y, W} \; \mathrm{Tr}(Y^\top L Y) + \alpha \|X^\top W - Y\|_F^2 + \beta \|W\|_F^2 \quad \text{s.t.} \;\; Y^\top Y = I$    (8)

where $\alpha$ and $\beta$ are two regularization parameters. When $\alpha$ tends to zero, (8) reduces to (3), which learns a nonlinear $Y$. When $\alpha$ tends to infinity, $\|X^\top W - Y\|_F^2$ is forced to zero by the minimization, so $X^\top W = Y$ and (8) reduces to (7). After learning the embedding matrix $Y$, the hash codes $H$ are obtained by

$H = \mathrm{sign}(Y),$    (9)

where $\mathrm{sign}(\cdot)$ is the element-wise sign function, which binarizes the values.
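Out-of-sample hashing then reduces to the linear projection of Eq. (6) followed by the sign function of Eq. (9); a minimal sketch:

```python
import numpy as np

def hash_new_points(X_new, W):
    """Hash unseen points: X_new is (d, n_new), W is the learned (d, m) projection."""
    Y_new = X_new.T @ W          # Eq. (6): y_new = x_new^T W
    return np.sign(Y_new)        # Eq. (9): binarize to {-1, +1}
```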
2.3. Algorithm derivation

The Lagrangian function of (8) is

$\mathcal{L}(W, Y, \nu) = \mathrm{Tr}(Y^\top L Y) + \alpha \|X^\top W - Y\|_F^2 + \beta \|W\|_F^2 + \nu \, \mathrm{Tr}(I - Y^\top Y)$    (10)

where $\nu$ is the Lagrangian multiplier. The optimal $Y$ and $W$ are obtained by taking the first-order derivatives of (10) with respect to $W$ and $Y$, respectively, and setting them to zero:

$\dfrac{\partial \mathcal{L}}{\partial W} = 2\alpha X X^\top W - 2\alpha X Y + 2\beta W = 0,$    (11)

$\dfrac{\partial \mathcal{L}}{\partial Y} = 2 L Y - 2\alpha X^\top W + 2\alpha Y - 2\nu Y = 0.$    (12)

From (11), we obtain

$W = \left( X X^\top + \dfrac{\beta}{\alpha} I \right)^{-1} X Y = M Y,$    (13)

where $M$ is denoted by

$M = \left( X X^\top + \dfrac{\beta}{\alpha} I \right)^{-1} X.$    (14)

From (12), we obtain

$L Y - \alpha X^\top W + \alpha Y = \nu Y.$    (15)

Substituting (13) into (15), we obtain

$(L - \alpha X^\top M + \alpha I) Y = \nu Y.$    (16)

The optimal solution $Y$ of (8) is formed by the $m$ eigenvectors of $L - \alpha X^\top M + \alpha I$ corresponding to the $m$ smallest eigenvalues.
The proposed unsupervised discriminative hashing (UDH) algorithm for solving (8) is summarized in Algorithm 1.

Algorithm 1. UDH algorithm

Input: Training data $X \in \mathbb{R}^{d \times n}$, parameters $\alpha$, $\beta$, and the length of binary codewords $m$.
1: Construct the local clique $N_k(x_i)$ for each $x_i$.
2: Construct the Laplacian matrix $L$ by (4).
3: Compute $Y$ according to (16), where $Y$ is formed by the $m$ eigenvectors of $L - \alpha X^\top M + \alpha I$ corresponding to the $m$ smallest eigenvalues.
4: Compute the projection matrix $W$ by (13).
5: Generate the hash codewords $H$ by (9).
Output: Compact binary codewords $H \in \mathbb{B}^{n \times m}$ and the projection matrix of the hash function $W \in \mathbb{R}^{d \times m}$.
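The steps of Algorithm 1 map directly onto a few lines of NumPy. The sketch below reuses the illustrative ldmgi_laplacian helper from Section 2.2 and is a reading of the algorithm, not the authors' released code; k and lam are assumed clique parameters.

```python
import numpy as np

def udh_train(X, m, k=5, lam=1.0, alpha=1e3, beta=1e3):
    """Sketch of Algorithm 1. X is (d, n); returns codes H (n, m) and projection W (d, m)."""
    d, n = X.shape
    L = ldmgi_laplacian(X, k=k, lam=lam)                         # steps 1-2: Laplacian of Eq. (4)
    # Eq. (14): M = (X X^T + (beta/alpha) I)^{-1} X
    M = np.linalg.solve(X @ X.T + (beta / alpha) * np.eye(d), X)
    C = L - alpha * (X.T @ M) + alpha * np.eye(n)                # matrix of Eq. (16)
    vals, vecs = np.linalg.eigh((C + C.T) / 2.0)                 # symmetrize for numerical safety
    Y = vecs[:, :m]                                              # step 3: m smallest eigenvectors
    W = M @ Y                                                    # step 4: Eq. (13)
    H = np.sign(Y)                                               # step 5: Eq. (9)
    return H, W
```

A query point x_new is then hashed as np.sign(x_new @ W), following the out-of-sample rule of Section 2.2.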
3. Experiment

In this section, we describe the datasets, the compared methods, and the analysis of the results. We compare our UDH approach with seven state-of-the-art approaches on four real-world datasets and show its satisfactory performance in a series of comparative experiments.

3.1. Datasets

C101: Caltech101 dataset [26]. Pictures of objects belonging to 101 categories, with about 40–800 images per category; most categories have about 50 images. The size of each image is roughly 300 × 200 pixels. Each Caltech101 image is represented by a 512-dimensional GIST descriptor extracted from the original image. We select 911 test images and 8233 training images.

F17: Flower 17-category dataset [27]. The flower dataset was created by gathering images from various websites and consists of 17 categories of flowers common in the UK. Each category has 80 images, and the 1360 images in total are represented by 512-dimensional GIST descriptors. We randomly select 360 images for testing and use the remaining 1000 images for training.

Paris: Paris consists of 6412 images from Flickr [28], of which 20 images are damaged. From the remaining 6392 images, we randomly select 1000 images for testing and 5392 images for training. We also use GIST features for this dataset.

Scene: The outdoor scene dataset contains 2688 color images belonging to 8 outdoor scene categories: coast, mountain, forest, open country, street, inside city, tall buildings, and highways [29]. The feature is the 512-dimensional GIST feature. We use 538 images for training and 2150 images for testing.

3.2. Experimental setup

We compare the performance of the proposed UDH with the following state-of-the-art methods:
1. LSH: locality-sensitive hashing [8].
2. KLSH: kernelized locality-sensitive hashing [17].
3. SKLSH: locality-sensitive binary codes from shift-invariant kernels [18].
4. SH: spectral hashing [10].
5. RRH: a random orthogonal projection method proposed in [30].
6. ITQ: iterative quantization [30].
7. SGH: scalable graph hashing with feature transformation [31].

The precision $p$ is used to measure the performance of our approach. It is computed as

$p = \dfrac{\text{number of relevant images among the retrieved images}}{\text{number of retrieved images}}$    (17)

The parameters used in our algorithm are $\alpha = 1000$ and $\beta = 1000$.
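The precision of Eq. (17) is a simple ratio; a minimal sketch, assuming relevance is defined by same-category labels (our reading of the protocol, not stated explicitly above):

```python
import numpy as np

def precision(retrieved_labels, query_label):
    """Eq. (17): fraction of retrieved images that are relevant to the query."""
    retrieved_labels = np.asarray(retrieved_labels)
    if retrieved_labels.size == 0:
        return 0.0                      # nothing retrieved: report zero precision
    return float(np.mean(retrieved_labels == query_label))
```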
3.3. Performance of UDH on different datasets

In Table 1, we compare the performance of UDH with the seven state-of-the-art methods on four datasets, fixing the code length to 32 bits and the Hamming radius to 4. UDH outperforms all compared methods with the largest precision. These results demonstrate the effectiveness of the proposed approach for image retrieval.

3.4. Performance of UDH with different Hamming radius

Two images are regarded as similar when the Hamming distance between their codes is no larger than the Hamming radius. In Table 2, UDH always achieves the largest precision as the Hamming radius varies from 1 to 5, but a larger Hamming radius yields lower precision. The reason is that dissimilar images may have a small Hamming distance, so a larger Hamming radius means that more wrong images are recognized as similar images.
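Retrieval with a Hamming radius r keeps the database items whose codes differ from the query code in at most r bits; a small sketch under the {-1, +1} code convention used above:

```python
import numpy as np

def retrieve_within_radius(H_db, h_query, r):
    """Indices of database codes within Hamming radius r of the query.

    H_db : (n, m) codes in {-1, +1};  h_query : (m,) query code.
    """
    dist = np.sum(H_db != h_query, axis=1)   # Hamming distance = number of disagreeing bits
    return np.where(dist <= r)[0]
```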
3.5. Performance of UDH with different parameters

There are two parameters in the UDH method. In Table 3, we vary $\alpha$ from $10^0$ to $10^5$ and $\beta$ from $10^0$ to $10^5$. UDH is insensitive to the parameters: the precision is 0.766 for most settings under 32-bit binary codes and $r = 5$. In our experiments, we choose $\alpha = 10^3$ and $\beta = 10^3$.

3.6. Performance of UDH with different code length

In Table 4, longer codes give larger precision. UDH achieves the largest precision (1.000) with a small code length (32 bits), whereas SH, LSH and RRH require 128-bit codes, KLSH and SGH require 64-bit codes, and SKLSH and ITQ do not reach this precision even with 128-bit codes. This demonstrates that UDH outperforms the compared methods.
Table 1
Precision of our method UDH on four datasets compared with seven state-of-the-art methods under the 32-bit code length with Hamming radius r = 4.

Method   C101    Flower   Paris    Scene
LSH      0.711   0.236    0.259    0.580
KLSH     0.593   0.950    0.269    0.333
SKLSH    0.080   0.074    0.115    0.236
SH       0.481   0.163    0.211    0.567
RRH      0.620   0.150    0.238    0.605
ITQ      0.582   0.110    0.222    0.597
SGH      0.530   0.311    0.335    0.763
UDH      0.928   1.000    0.500    0.880

Note: The best results are highlighted in bold.
Table 2
Precision of our method UDH on the Flower dataset compared with seven state-of-the-art methods under the 32-bit code length with different Hamming radius r.

Method   r = 1   r = 2   r = 3   r = 4   r = 5
LSH      1.000   0.741   0.420   0.236   0.182
KLSH     1.000   1.000   1.000   0.950   0.724
SKLSH    0.115   0.093   0.082   0.074   0.068
SH       0.900   0.490   0.253   0.163   0.136
RRH      0.774   0.325   0.196   0.150   0.137
ITQ      0.173   0.127   0.119   0.110   0.104
SGH      0.833   0.548   0.421   0.311   0.251
UDH      1.000   1.000   1.000   1.000   0.767

Note: The best results are highlighted in bold.
Table 3
Precision of different parameters used in our method UDH on the Flower dataset under the 32-bit code length with Hamming radius r = 5. Both $\alpha$ and $\beta$ are varied from $10^0$ to $10^5$; the precision is 0.766 for most parameter settings.
Table 4
Precision of our method UDH on the Flower dataset compared with seven state-of-the-art methods with Hamming radius r = 4 while the code length varies from 4-bit to 128-bit.

Method   4-bit   8-bit   16-bit   32-bit   64-bit   128-bit
LSH      0.059   0.071   0.098    0.236    0.610    1.000
KLSH     0.058   0.060   0.084    0.950    1.000    1.000
SKLSH    0.058   0.059   0.065    0.074    0.107    0.108
SH       0.058   0.064   0.086    0.163    0.952    1.000
RRH      0.060   0.075   0.117    0.150    0.786    1.000
ITQ      0.060   0.073   0.093    0.110    0.165    0.372
SGH      0.059   0.078   0.149    0.311    1.000    1.000
UDH      0.059   0.077   0.141    1.000    1.000    1.000
3.7. Retrieved images for query

We choose the moulinrouge query of the Paris dataset. The top retrieved images are shown in Fig. 1. UDH has only 1 false positive, SGH has 5 false positives, ITQ, RRH and SH have 6 false positives, and LSH, KLSH and SKLSH have 12, 14 and 20 false positives, respectively. UDH thus performs well on image retrieval.

4. Conclusion

In this paper, we propose a novel hash learning approach for image search. Our approach builds on manifold learning theory and can solve the out-of-sample problem. The algorithm embeds the high-dimensional features into a lower-dimensional space and then generates compact binary hash codewords. The novel Laplacian matrix is constructed from local cliques, and the combination of linear and nonlinear embedding matrix learning preserves the similarity of the original data for retrieving relevant images.
Fig. 1. Sample top retrieved images for the moulinrouge query of the Paris dataset using 32-bit binary codes. Red circles denote false positives. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
The projection matrix, learned from the manifold subspace, can predict high-quality hash codewords for new input data, which solves the out-of-sample problem. Compared with state-of-the-art hashing approaches, UDH obtains better retrieval accuracy.
Acknowledgments

This work has been supported by the National Science Foundation of China under Grant Nos. 61201422 and 61300230, the Specialized Research Fund for the Doctoral Program of Higher Education under Grant No. 20120211120013, and the Fundamental Research Funds for the Central Universities under Grant Nos. lzujbky-2016-239 and lzujbky-2016-139.
References

[1] A. Gionis, P. Indyk, R. Motwani, et al., Similarity search in high dimensions via hashing, in: Proc. of VLDB, International Conference on Very Large Data Bases, vol. 99, 1999, pp. 518–529.
[2] J. Wang, W. Liu, S. Kumar, S.-F. Chang, Learning to hash for indexing big data—a survey, Proc. IEEE 104 (1) (2016) 34–57.
[3] A. Globerson, S.T. Roweis, Metric learning by collapsing classes, in: Proc. of NIPS, vol. 18, 2006, pp. 451–458.
[4] R. Salakhutdinov, G.E. Hinton, Learning a nonlinear embedding by preserving class neighbourhood structure, in: Proc. of AI and Statistics, 2007, pp. 412–419.
[5] J. Song, L. Gao, Y. Yan, D. Zhang, N. Sebe, Supervised hashing with pseudo labels for scalable multimedia retrieval, in: Proc. of ACM MM, International Conference on Multimedia, vol. 23, ACM, 2015, pp. 827–830.
[6] R. Salakhutdinov, G. Hinton, Semantic hashing, Int. J. Approx. Reason. 50 (7) (2009) 969–978.
[7] M.S. Charikar, Similarity estimation techniques from rounding algorithms, in: Proc. of ACM Symposium on Theory of Computing, vol. 34, ACM, 2002, pp. 380–388.
[8] M. Datar, N. Immorlica, P. Indyk, V.S. Mirrokni, Locality-sensitive hashing scheme based on p-stable distributions, in: Proc. of Symposium on Computational Geometry, vol. 29, ACM, 2004, pp. 253–262.
[9] Q. Lv, W. Josephson, Z. Wang, M. Charikar, K. Li, Multi-probe LSH: efficient indexing for high-dimensional similarity search, in: Proc. of VLDB, International Conference on Very Large Data Bases, vol. 33, 2007, pp. 950–961.
[10] Y. Weiss, A. Torralba, R. Fergus, Spectral hashing, in: Proc. of NIPS, 2009, pp. 1753–1760.
[11] J. He, W. Liu, S.-F. Chang, Scalable similarity search with optimized kernel hashing, in: Proc. of ACM SIGKDD, International Conference on Knowledge Discovery and Data Mining, vol. 16, ACM, 2010, pp. 1129–1138.
[12] Y. Zhuang, Y. Liu, F. Wu, Y. Zhang, J. Shao, Hypergraph spectral hashing for similarity search of social image, in: Proc. of ACM MM, International Conference on Multimedia, vol. 19, ACM, 2011, pp. 1457–1460.
[13] Y. Liu, J. Shao, J. Xiao, F. Wu, Y. Zhuang, Hypergraph spectral hashing for image retrieval with heterogeneous social contexts, Neurocomputing 119 (2013) 49–58.
[14] H. Zou, T. Hastie, R. Tibshirani, Sparse principal component analysis, J. Comput. Graph. Stat. 15 (2) (2006) 265–286.
[15] J. Shao, F. Wu, C. Ouyang, X. Zhang, Sparse spectral hashing, Pattern Recognit. Lett. 33 (3) (2012) 271–277.
[16] M. Raginsky, S. Lazebnik, Locality-sensitive binary codes from shift-invariant kernels, in: Proc. of NIPS, vol. 22, Curran Associates, Inc., 2009, pp. 1509–1517.
[17] B. Kulis, K. Grauman, Kernelized locality-sensitive hashing for scalable image search, in: Proc. of ICCV, vol. 12, IEEE, 2009, pp. 2130–2137.
[18] M. Raginsky, S. Lazebnik, Locality-sensitive binary codes from shift-invariant kernels, in: Proc. of NIPS, 2009, pp. 1509–1517.
[19] Y. Yang, D. Xu, F. Nie, S. Yan, Y. Zhuang, Image clustering using local discriminant models and global integration, IEEE Trans. Image Process. 19 (10) (2010) 2761–2773.
[20] Y. Yang, F. Nie, S. Xiang, Y. Zhuang, W. Wang, Local and global regressive mapping for manifold learning with out-of-sample extrapolation, in: Proc. of AAAI, vol. 24, 2010.
[21] Y. Yang, D. Xu, F. Nie, J. Luo, Y. Zhuang, Ranking with local regression and global alignment for cross media retrieval, in: Proc. of ACM MM, International Conference on Multimedia, vol. 17, ACM, 2009, pp. 175–184.
[22] Y. Yang, F. Nie, D. Xu, J. Luo, Y. Zhuang, Y. Pan, A multimedia retrieval framework based on semi-supervised ranking and relevance feedback, IEEE Trans. Pattern Anal. Mach. Intell. 34 (4) (2012) 723–742.
[23] X. Du, Y. Yan, P. Pan, G. Long, L. Zhao, Multiple graph unsupervised feature selection, Signal Process. 120 (2016) 754–760.
[24] W. Wang, Y. Yan, S. Winkler, N. Sebe, Category specific dictionary learning for attribute specific feature selection, IEEE Trans. Image Process. 25 (3) (2016) 1465–1478.
[25] F.R. Chung, Spectral Graph Theory, American Mathematical Soc., 1997.
[26] L. Fei-Fei, R. Fergus, P. Perona, Learning generative visual models from few training examples: an incremental Bayesian approach tested on 101 object categories, Comput. Vision Image Understand. 106 (1) (2007) 59–70.
[27] M.-E. Nilsback, A. Zisserman, A visual vocabulary for flower classification, in: Proc. of CVPR, vol. 2, 2006, pp. 1447–1454.
[28] J. Philbin, O. Chum, M. Isard, J. Sivic, A. Zisserman, Lost in quantization: improving particular object retrieval in large scale image databases, in: Proc. of CVPR, 2008, pp. 1–8.
[29] A. Monadjemi, B. Thomas, M. Mirmehdi, Experiments on High Resolution Images Towards Outdoor Scene Classification, Tech. Rep., University of Bristol, Department of Computer Science, 2002.
[30] Y. Gong, S. Lazebnik, Iterative quantization: a procrustean approach to learning binary codes, in: Proc. of CVPR, IEEE, 2011, pp. 817–824.
[31] Q.-Y. Jiang, W.-J. Li, Scalable graph hashing with feature transformation, in: Proc. of IJCAI, 2015.