Learning binary codes with local and inner data structure


Shiyuan He a,1,∗, Guo Ye a,1,∗, Mengqiu Hu a, Yang Yang a,∗, Fumin Shen a, Heng Tao Shen a, Xuelong Li b

a Center for Future Media and School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu, PR China
b Center for OPTical IMagery Analysis and Learning (OPTIMAL), State Key Laboratory of Transient Optics and Photonics, Xi’an Institute of Optics and Precision Mechanics, Chinese Academy of Sciences, Xi’an, Shaanxi 710119, PR China

∗ Corresponding authors.
1 These authors contributed equally to this work.

Article history: Received 23 August 2017; Revised 23 November 2017; Accepted 2 December 2017; Available online xxx. Communicated by Xiaofeng Zhu.

Keywords: Supervised hashing; Anchor graph; Nearest neighbor search

Abstract

Recent years have witnessed the promising capacity of hashing techniques in tackling nearest neighbor search, owing to their high efficiency in storage and retrieval. Data-independent approaches (e.g., Locality Sensitive Hashing) normally construct hash functions using random projections, which neglect intrinsic data properties. To compensate for this drawback, learning-based approaches explore local data structure and/or supervised information to boost hashing performance. However, due to the construction of the Laplacian matrix, existing methods usually suffer from unaffordable training costs. In this paper, we propose a novel supervised hashing scheme, which has the merits of (1) exploring the inherent neighborhoods of samples; (2) significantly reducing the training cost on massive training data by employing an approximate anchor graph; and (3) preserving semantic similarity by leveraging pair-wise supervised knowledge. Besides, we integrate a discrete constraint to eliminate the accumulated errors in learning reliable hash codes and hash functions, and we devise an alternating algorithm to efficiently solve the resulting optimization problem. Extensive experiments on various image datasets demonstrate that our proposed method is superior to the state-of-the-art. © 2017 Published by Elsevier B.V.

1. Introduction

During the last decades, visual search over large-scale datasets has drawn much attention in many content-based multimedia applications [1–3]. Among large-scale vision problems, binary coding has attracted growing attention in image retrieval [4–6], image classification [7,8], etc. Especially in recent years, the demand for retrieving relevant content among massive images is stronger than ever in the big-data era. In most cases, users input a keyword and intend to obtain semantically similar images precisely and efficiently. For servers, it is almost impossible to linearly scan the objects, especially when they confront a vast number of images (e.g., the photo sharing website Flickr possesses more than five billion images, and people still upload photos at a rate of over 3000 images per minute) [9]. Compared with the cost of storage, the search task consumes even more computational resources due to the massive volume of search requests.




Therefore, in recent years much effort has been devoted to this problem. Among the proposed methods, hashing shows clear superiority over the alternatives. In simple terms, hashing methods map high-dimensional images into short binary codes while preserving the similarity of the original data. In this way, searching for similar images is converted into finding neighboring hash codes in Hamming space, which is simple and practical. Consequently, the technique brings significant efficiency to multimedia, computer vision, information retrieval, machine learning and pattern matching [10–19]. For many applications, approximate nearest neighbor (ANN) search is sufficient [20–24]: given a query instance, the algorithm aims to find similar instances instead of returning the exact nearest neighbor. Therefore, an efficient data structure is required to store the data for fast search. Against this background, tree-based indexing approaches were proposed for ANN search, typically with sub-linear complexity of O(log N); KD-trees [25–27], R-trees [28] and metric trees [29] are the most representative. However, as imaging technology develops, the descriptors of an image usually reach hundreds of dimensions, and as the dimensionality increases, the performance of tree-based methods degrades dramatically while their storage overhead grows considerably.


In consideration of the inefficiency of tree-based indexing, hashing approaches have been proposed to map the entire dataset into discrete codes while preserving the similarity of the data. The similarity between data points can then be measured by the Hamming distance, which costs little time to compute. Existing hashing methods can be roughly divided into two groups [30]: data-independent and data-dependent. One of the most classic data-independent methods is Locality-Sensitive Hashing (LSH) [31], which has been widely used to handle massive data due to its simplicity. LSH uses hash functions that randomly project or permute nearby data points into the same binary codes. However, LSH needs long binary codes to achieve promising retrieval performance, which increases storage space and computation cost. Moreover, LSH ignores the underlying distribution and manifold structure of the data on account of its random projections. Realizing this deficiency, Weiss et al. proposed Spectral Hashing (SH) [32], which thresholds a subset of eigenvectors of the graph Laplacian obtained by relaxing the original problem, improving retrieval accuracy to some extent; yet it takes considerable time to build the neighborhood graph. Liu et al. improved upon SH and proposed Anchor Graph Hashing (AGH) [33], which uses anchor graphs to obtain low-rank adjacency matrices; the formulation of AGH costs constant time by extrapolating graph Laplacian eigenvectors to eigenfunctions. Note that SH and AGH are data-dependent, since they exploit the feature information of the data and preserve its metric structure. Methods of this kind are called unsupervised methods, such as principal component analysis based hashing (PCAH) [34], iterative quantization (ITQ) [35], isotropic hashing (Iso-Hash) [36] and affinity-preserving k-means hashing (KMH) [37]. However, the hashing methods mentioned above cannot achieve high retrieval performance with a simple approximate affinity matrix [38]. Due to the semantic gap, where visual similarity often differs from semantic similarity, returning the nearest neighbors in metric space cannot guarantee search quality [34,39]. To solve this problem, images that are manually labeled as similar or dissimilar are used by supervised hashing methods [5,40–47]; the following are the most popular among them. Kernel-Based Supervised Hashing (KSH) [48] maps data into a Hamming space where similar items have minimal Hamming distances and dissimilar items simultaneously have maximal distances. Binary Reconstructive Embedding (BRE) [49] learns hash functions that minimize the reconstruction error between the metric space and the Hamming space. Canonical Correlation Analysis with Iterative Quantization (CCA-ITQ) [35] and Supervised Discrete Hashing (SDH) [50] are also designed to capture semantic similarity. By leveraging pairwise label information, the performance of supervised methods has been remarkably improved. Moreover, some hashing approaches based on deep neural networks have recently been proposed to perform simultaneous feature learning and hash coding. Whether a method is supervised or unsupervised, the objective function with discrete constraints involves a mixed binary-integer problem [50], which is NP-hard. To tackle this, most hashing methods relax the discrete constraints: they first compute a continuous solution and then threshold it to obtain binary codes, overlooking the importance of discrete optimization. This strategy leads to significant information loss during the learning process [51]. It has been shown that the quality of the codes degrades quickly, especially as the code length increases, if the discrete constraints are ignored. Some methods try to improve the quality by replacing the sign function with a smoother sigmoid function [52]; however, this does not solve the problem explicitly. Recently, only a few methods directly generate hash codes in the discrete space. In addition, deep neural networks have been applied to retrieval [53–55]. Deep-learning-based hashing achieves high accuracy by learning the image representation and the hash codes in a tightly coupled way [56,57].

Nonetheless, because the resulting computational expense and storage cost are huge, we do not compare with such methods extensively.

In this paper, we aim to design a supervised hashing method that can efficiently generate high-quality compact codes. We utilize an anchor graph, built from pairwise similarities, to exploit the inner structure of the original data; in the process of learning the hash functions, we also use the supervision information to preserve pairwise semantic similarity and improve retrieval accuracy. To avoid the accumulated errors caused by continuous relaxation, we choose to optimize the binary codes directly. With the discrete constraints added to the objective function, we propose a novel hashing framework, termed Local and Inner Data Structure Supervised Hashing (LISH), which is able to efficiently generate codes and satisfy semantic similarity at the same time. Compared with our earlier work [58], we conduct more experiments to make the results more detailed and accurate, and we simplify some of the mathematical derivations. Our main contributions are summarized as follows:

• Our method uses the graph Laplacian to capture local neighborhoods and enhance the quality of the hash codes, and the semantic gap is addressed by utilizing labeled information. In this way, both metric and semantic similarity are preserved, which contributes significantly to the performance.
• Most existing hashing methods solve the problem by relaxing the discrete constraints. In contrast, we optimize the discrete problem directly, and each bit can be learned sequentially, so our method works in an alternating and efficient manner.
• We evaluate our method on three popular large-scale image datasets and obtain superior accuracy to the state-of-the-art.

The remainder of this paper is organized as follows. A brief review of related work is given in Section 2. We present the detailed formulation of the proposed LISH method in Section 3. Experimental results are given in Section 4. We conclude our work in Section 5.

2. Related work

Suppose we have n samples x_i ∈ R^d, i = 1, ..., n, stored in the matrix X = [x_1, x_2, ..., x_n]^T ∈ R^{n×d}, where d is the dimensionality of the feature space. In this section, we briefly review two representative methods: Spectral Hashing (SH) and Kernel-Based Supervised Hashing (KSH).

2.1. Spectral hashing

Spectral Hashing is one of the most popular data-dependent hashing methods. It generates bits by spectral graph partitioning [59]; specifically, it thresholds the eigenvectors of the graph Laplacian. It maps similar data points from the input space into the Hamming space while preserving their similarity. The objective function of Spectral Hashing is formulated as follows:

\min_{B} \; \sum_{ij} a_{ij} \, \| b_i - b_j \|^2 \quad \text{s.t.} \;\; b_i \in \{-1, 1\}^k, \;\; \sum_i b_i = 0, \;\; \frac{1}{n} \sum_i b_i b_i^T = I,   (1)

where a_{ij} = exp(−‖x_i − x_j‖²/ε²) represents the similarity between x_i and x_j, b_i is the binary code of x_i, and the radius parameter ε defines the distance that corresponds to similar items. Obviously, the more similar x_i and x_j are, the larger a_{ij} becomes, which leads to a smaller Hamming distance ‖b_i − b_j‖². The constraint Σ_i b_i = 0 requires each hash bit to be balanced in order to maximize the information it carries.


Fig. 1. Block diagram of our hash methods.

The constraint \frac{1}{n}\sum_i b_i b_i^T = I enforces the hash codes to be orthogonal so that redundancy among the bits is minimized. The above objective can be rewritten in the following matrix form:

\min \; \mathrm{tr}\!\left( B^T (D - A) B \right) \quad \text{s.t.} \;\; B \in \{-1, 1\}^{n \times k}, \;\; B^T \mathbf{1} = 0, \;\; B^T B = I,   (2)

where D is the diagonal n × n matrix whose ith diagonal entry equals the sum of the ith row of A. The solution is non-trivial because this is a balanced graph partitioning problem, which is NP-hard. By relaxing the discrete constraint B(i, j) ∈ {−1, 1}, spectral graph analysis yields the solution as the k eigenvectors of D − A with the smallest eigenvalues. However, Spectral Hashing requires the input data to be uniformly distributed, which is a strict assumption for many real-world scenarios [9], and the radius parameter ε is sensitive, so an appropriate value is often difficult to determine. Moreover, SH suffers from low-quality hash codes due to the constraint B^T B = I, which requires the generated bits to be uncorrelated.
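To make the relaxation concrete, the short Python sketch below builds a dense affinity matrix, forms D − A, and thresholds its bottom eigenvectors. It is only an illustration of the relaxed problem (2) for small n (the parameters eps and k are placeholders), not the reference SH implementation, which instead fits analytical eigenfunctions under a uniform-distribution assumption.

```python
import numpy as np

def spectral_hashing_relaxed(X, k, eps=1.0):
    """Toy relaxation of problem (2): threshold the k bottom eigenvectors of D - A."""
    sq = np.sum(X ** 2, axis=1)
    dist2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T   # squared pairwise distances
    A = np.exp(-dist2 / eps ** 2)                       # affinity a_ij
    L = np.diag(A.sum(axis=1)) - A                      # D - A
    vals, vecs = np.linalg.eigh(L)                      # eigenvalues in ascending order
    Y = vecs[:, 1:k + 1]                                # skip the trivial constant eigenvector
    return np.where(Y >= 0, 1, -1)                      # thresholded relaxed codes
```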

2.2. Kernel-based supervised hashing

Kernel-Based Supervised Hashing (KSH) was proposed by Liu et al. in 2012. It leverages kernel methods to project X into a kernel space, and seeks hash codes whose Hamming distances are minimized on similar pairs and maximized on dissimilar pairs [60]. The similar pairs (defined by metric distance or labels) are collected in a set M and the dissimilar pairs in a set C. With m data points selected from X, KSH uses the kernel trick to map the input data into a kernel space in which they can be separated. The hash function is defined as:

h_k(x) = \mathrm{sgn}\!\left( \sum_{j=1}^{m} \kappa(x_j, x)\, g_{jk} - b_k \right) = \mathrm{sgn}\!\left( g_k^T \bar{\kappa}(x) \right),   (3)

where g_k = [g_{1k}, ..., g_{mk}]^T contains the m kernel weights of the kth hash function, and b_k = \frac{1}{n}\sum_{i=1}^{n}\sum_{j=1}^{m} \kappa(x_j, x_i)\, g_{jk} is the median of \{\sum_{j=1}^{m} \kappa(x_j, x_i)\, g_{jk}\}_{i=1}^{n} required by the balancing criterion. Analogously, the r-bit hash code of a sample x is code_r(x) = [h_1(x), ..., h_r(x)] ∈ {1, −1}^{1×r}. KSH learns the projection matrix P ∈ R^{r×m} to ensure that h_k(x_i) = h_k(x_j) when x_i and x_j are similar ((x_i, x_j) ∈ M) and h_k(x_i) ≠ h_k(x_j) when they are dissimilar ((x_i, x_j) ∈ C). Under these conditions, code_r(x_i) ∘ code_r(x_j) = r if (x_i, x_j) ∈ M, and code_r(x_i) ∘ code_r(x_j) = −r otherwise, where ∘ denotes the inner product.

In order to derive the optimization model, KSH introduces the label matrix S ∈ R^{n×n}, defined as:

s_{ij} = \begin{cases} 1 & (x_i, x_j) \in M \\ -1 & (x_i, x_j) \in C \\ 0 & \text{otherwise.} \end{cases}   (4)

Since code_r(x_i) ∘ code_r(x_j) ∈ [−r, r] and s_{ij} ∈ [−1, 1], KSH learns the hash codes of the labeled input data X with the following objective function:

\min_{H \in \{-1, 1\}^{r \times n}} \; \| H^T H - r S \|_F^2 \quad \text{s.t.} \;\; H \mathbf{1}_n = 0.   (5)

Noting that H = sgn(P\bar{K}) = [code_r(x_1), ..., code_r(x_n)] ∈ R^{r×n} denotes the code matrix of the input data produced by the hash functions, the objective in (5) can be rewritten as:

\min_{P} \; \| \mathrm{sgn}(P\bar{K})^T \mathrm{sgn}(P\bar{K}) - r S \|_F^2,   (6)

where \bar{K} is the kernel matrix and ‖·‖_F denotes the Frobenius norm.
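As a small illustration of objective (5), the snippet below evaluates the KSH loss for a given code matrix; the inputs codes (r × n) and S (n × n pairwise labels) are assumed to be already computed, and this sketches only the loss, not the KSH optimizer.

```python
import numpy as np

def ksh_loss(codes, S):
    """||H^T H - r S||_F^2 for an r x n code matrix H with entries in {-1, +1}."""
    r = codes.shape[0]
    gram = codes.T @ codes          # n x n code inner products, each in [-r, r]
    residual = gram - r * S
    return float(np.sum(residual ** 2))   # squared Frobenius norm
```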

3. Local and inner data structure supervised hashing

In this section, we introduce the algorithm of our method in detail. We propose an alternating optimization model and sequentially learn each bit; the hash functions are learned simultaneously during the optimization process. X is defined as above, and we denote the label matrix Y = [y_1, y_2, ..., y_n]^T ∈ {0, 1}^{n×c}, where c is the number of label classes; y_{ij} = 1 if x_i belongs to class j and y_{ij} = 0 otherwise. By finding the mapping between the original feature space and the Hamming space, hashing generates binary codes to represent the features (Fig. 1).

3.1. Anchor graphs

Since conventional graph construction is inefficient at large scale, we use anchor graphs [33] to build the graph affinity matrix A. As in [33], we define a subset U = {u_j ∈ R^d}_{j=1}^{m}, where each u_j is an anchor, to approximate the neighborhood structure of the training set X. The regression matrix Z ∈ R^{n×m}, which measures the underlying relationship between X and U, is computed as:

Z_{ij} = \begin{cases} \dfrac{\exp\!\left(-D^2(x_i, u_j)/t\right)}{\sum_{j' \in \langle i \rangle} \exp\!\left(-D^2(x_i, u_{j'})/t\right)} & \forall j \in \langle i \rangle, \\[2mm] 0 & \text{otherwise.} \end{cases}   (7)


Table 1. Notations.

Notation | Description
n, d, m | the number of images, feature dimensions and anchors
r, c | the length of the hash codes and the number of label categories
X | the n × d feature matrix of the dataset
φ(X) | the RBF kernel mapping of X
Y | the n × c label matrix of the dataset
U | a subset of anchors for the dataset
B | the n × r matrix of hash codes of the dataset
x_i, b_i or b_j | a data point and its binary hash code
a_i | an anchor for the dataset
A [33] | the n × n graph affinity matrix
V | an approximate matrix with continuous values relaxing B
W | the r × c classification matrix
P | the m × r projection matrix for φ(X)
b_l, w_l | a column of B and of W
ω, λ, ν, η, ρ | the regularization parameters

The distance function D(·) used here is the ℓ2 distance, t is the bandwidth parameter, ⟨i⟩ denotes the indices of the nearest anchors of x_i, and i = 1, ..., n. The anchor graph gives the diagonal matrix Λ = diag(Z^T 1) ∈ R^{m×m}. Since the approximate affinity matrix A = ZΛ^{-1}Z^T is positive semidefinite (PSD) and has unit row and column sums, the resulting graph Laplacian is L = I_n − A. Furthermore, A can be computed with high efficiency.
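A minimal sketch of the anchor-graph construction of Eq. (7) is given below; keeping s nearest anchors per point and the bandwidth t are illustrative assumptions here, and in practice the anchors would typically come from k-means as in [33].

```python
import numpy as np

def anchor_graph(X, U, s=3, t=1.0):
    """Build Z (Eq. (7)) and the approximate affinity A = Z diag(Z^T 1)^{-1} Z^T.

    X: n x d data, U: m x d anchors, s: anchors kept per point, t: bandwidth.
    """
    n, m = X.shape[0], U.shape[0]
    d2 = np.sum(X ** 2, 1)[:, None] + np.sum(U ** 2, 1)[None, :] - 2.0 * X @ U.T
    Z = np.zeros((n, m))
    idx = np.argsort(d2, axis=1)[:, :s]          # <i>: the s closest anchors of x_i
    for i in range(n):
        w = np.exp(-d2[i, idx[i]] / t)
        Z[i, idx[i]] = w / w.sum()               # each row of Z sums to one
    lam_inv = 1.0 / (Z.sum(axis=0) + 1e-12)      # diag(Z^T 1)^{-1}
    A = (Z * lam_inv) @ Z.T                      # n x n approximate affinity (PSD)
    return Z, A
```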

Each data point is mapped to an r-bit binary code b_i, and the code matrix is B = [b_1, b_2, ..., b_n]^T ∈ R^{n×r}. In order to ensure that similar inputs have the minimal Hamming distance between their hash codes, the following objective is proposed (see Table 1 for the notation):

\min_{B} \; \frac{1}{2} \sum_{i,j=1}^{n} \| b_i - b_j \|^2 A_{ij} = \mathrm{tr}\!\left( B^T L B \right) \quad \text{s.t.} \;\; B \in \{\pm 1\}^{n \times r}, \;\; \mathbf{1}^T B = 0, \;\; B^T B = n I_r.   (8)

The constraint 1^T B = 0 ensures that each bit is balanced, and B^T B = nI_r enforces the hash codes to be uncorrelated so as to minimize the redundancy among the bits. By using the anchor graph Laplacian obtained above, Eq. (8) can be simplified as:

\max_{B} \; \mathrm{tr}\!\left( B^T A B \right) \quad \text{s.t.} \;\; B \in \{\pm 1\}^{n \times r}, \;\; \mathbf{1}^T B = 0, \;\; B^T B = n I_r.   (9)

However, problem (9) is still NP-hard. As in [33], by defining the set Ω = {V ∈ R^{n×r} | 1^T V = 0, V^T V = nI_r}, problem (9) can be rewritten with softened constraints as:

\max_{B} \; \mathrm{tr}\!\left( B^T A B \right) - \frac{\rho}{2}\, \mathrm{dist}^2(B, \Omega) \quad \text{s.t.} \;\; B \in \{\pm 1\}^{n \times r},   (10)

where ρ is a tuning parameter and dist(B, Ω) = min_{V ∈ Ω} ‖B − V‖_F. Since tr(B^T B) = tr(V^T V) = nr, problem (10) is equivalent to the following problem:

\max_{B, V} \; \mathrm{tr}\!\left( B^T A B \right) + \rho\, \mathrm{tr}\!\left( B^T V \right) \quad \text{s.t.} \;\; B \in \{\pm 1\}^{n \times r}, \; V \in R^{n \times r}, \;\; \mathbf{1}^T V = 0, \;\; V^T V = n I_r.   (11)

We construct the anchor graph to preserve the intrinsic geometric structure of the input data because of its simplicity and efficiency.
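Note that the graph term tr(BᵀAB) in problems (8)–(11) never requires forming the n × n matrix A explicitly, since A = ZΛ⁻¹Zᵀ is low-rank; the following sketch (function name ours) evaluates it through the factor Z only.

```python
import numpy as np

def graph_term(B, Z):
    """tr(B^T A B) with A = Z diag(Z^T 1)^{-1} Z^T, using only the n x m factor Z."""
    lam = Z.sum(axis=0) + 1e-12       # diagonal of Z^T 1
    ZtB = Z.T @ B                     # m x r
    return float(np.sum((ZtB ** 2) / lam[:, None]))
```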

3.2. Label information

In order to bring in label information, we adopt the ℓ2 loss in the supervised setting, which is the simplest to compute. The optimization model can be written as:

\min_{B, W} \; \| Y - B W \|^2 + \lambda \| W \|^2.   (12)

Here the matrix W is the classification matrix, which is learned jointly with the binary codes. This optimization model has recently been shown to be effective [50].

3.3. Proposed model

Most data-independent hashing methods such as LSH use linear random projections, which lack good discrimination over the data. We therefore embed the input data nonlinearly by leveraging the RBF kernel mapping [48,49]. We choose this nonlinear embedding since it has been widely used in various popular hashing methods [48,50]; it is formulated as:

F(x) = \phi(x) P,   (13)

where φ(x) = [exp(−‖x − a_1‖²/σ), ..., exp(−‖x − a_m‖²/σ)] and {a_j}_{j=1}^{m} are randomly chosen anchors. To learn a discrete matrix, combining the relaxed empirical fitness term from problem (11) and the relaxed regularization term, we propose the following optimization model:

\min_{B, W, P, V} \; \mathrm{tr}\!\left( B^T L B \right) + \omega \| Y - B W \|^2 + \lambda \| W \|^2 + \nu \| B - F(X) \|^2 + \eta \| P \|^2 - \rho\, \mathrm{tr}\!\left( B^T V \right) \quad \text{s.t.} \;\; B \in \{\pm 1\}^{n \times r}, \; V \in R^{n \times r}, \;\; \mathbf{1}^T V = 0, \;\; V^T V = n I_r.   (14)

The first term approximates the underlying structure of the data, and the second term is the loss function measuring the approximation error between the predictions and the labels. There are several candidate loss functions, such as the logistic loss, the least-squares loss and the hinge loss; the least-squares loss is popular in quantization and classification problems, and [61] also demonstrates its good performance. The norms here refer to the matrix norms induced by vector norms. Therefore, we choose the least-squares loss to evaluate the variance. By minimizing the objective function (14), we obtain the discriminative hash matrix B, each row of which is the binary code of the corresponding sample.
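The nonlinear embedding of Eq. (13) is a plain RBF feature map followed by a linear projection; a minimal sketch is shown below, where the anchor set and the kernel width sigma are placeholders.

```python
import numpy as np

def rbf_embedding(X, anchors, sigma):
    """phi(x) = [exp(-||x - a_1||^2 / sigma), ..., exp(-||x - a_m||^2 / sigma)]."""
    d2 = (np.sum(X ** 2, 1)[:, None] + np.sum(anchors ** 2, 1)[None, :]
          - 2.0 * X @ anchors.T)
    return np.exp(-d2 / sigma)

def hash_function(X, anchors, sigma, P):
    """F(x) = phi(x) P, followed by the sign function to obtain binary codes."""
    return np.where(rbf_embedding(X, anchors, sigma) @ P >= 0, 1, -1)
```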

3.4. Alternating optimization

Our hashing problem is a nonlinear mixed-integer program involving a discrete variable B, a continuous variable V and two regular variables W and P. Problem (14) is NP-hard, and even an approximate solution is difficult to find: when ω, λ, ν, η, ρ = 0 and r = 1, problem (14) reduces to a Max-Cut problem [38], for which no polynomial-time algorithm can achieve the global optimum, or approximate it arbitrarily closely, unless P = NP [62]. For this reason, an effective way to attack problem (14) is an alternating optimization algorithm. In this way, our hashing problem is decomposed into four subproblems: the B-subproblem, the V-subproblem, the W-subproblem and the P-subproblem.

3.4.1. B-subproblem

Notice that the terms λ‖W‖² and η‖P‖² are constant for the B-subproblem, so we simply drop them. Using the identity ‖·‖² = tr((·)^T(·)), the remaining regularization terms can be rewritten in trace form for the subsequent manipulation, and we use the anchor graph Laplacian L = I_n − A. With these three steps, the optimization model can be simplified as:


\max_{B} \; \mathrm{tr}\!\left( B^T A B \right) - \omega\, \mathrm{tr}\!\left( (Y - BW)(Y^T - W^T B^T) \right) - \nu\, \mathrm{tr}\!\left( (B - F(X))(B^T - F^T(X)) \right) + \rho\, \mathrm{tr}\!\left( B^T V \right) \quad \text{s.t.} \;\; B \in \{\pm 1\}^{n \times r}, \; V \in R^{n \times r}, \;\; \mathbf{1}^T V = 0, \;\; V^T V = n I_r.   (15)

In order to optimize problem (15) directly, we can further simplify it into

\max_{B \in \{\pm 1\}^{n \times r}} \; \mathrm{tr}\!\left( B B^T A - \omega B W W^T B^T + B C_1 \right),   (16)

where C_1 = 2\omega W Y^T + 2 F^T(X) + \rho V^T is a constant term that can be obtained easily. However, it is still NP-hard to obtain B. To solve this problem, we use the DCC algorithm [50] to learn B bit by bit. In each iteration, we draw out the lth (l = 1, ..., r) column of B as b_l, and B' denotes the matrix B excluding b_l. In the same way, w_l^T and c_l^T denote the lth rows of W and C_1, and W' and C_1' denote W and C_1 with those rows excluded. Then we reformulate problem (16) as:

\max_{b_l \in \{\pm 1\}^{n \times 1}} \; \mathrm{tr}\!\left( b_l b_l^T A - 2\omega\, b_l w_l^T W'^T B'^T + b_l c_l^T \right),   (17)

which is equivalent to

\max_{b_l \in \{\pm 1\}^{n \times 1}} \; f(b_l) = \mathrm{tr}\!\left( b_l^T A b_l \right) + C_2 b_l,   (18)

where C_2 = c_l^T - 2\omega\, w_l^T W'^T B'^T is constant.

In the kth iteration, a proxy function \hat{f}_k(b_l) is defined to linearize f(b_l) at the point b_l^k. This majorization method was first introduced by [63]. Since A is positive semidefinite, f is a convex function, so \hat{f}_k(b_l) \le f(b_l); the facts f(b_l^{k+1}) \ge \hat{f}_k(b_l^{k+1}) \ge \hat{f}_k(b_l^{k}) \equiv f(b_l^{k}) guarantee that f(b_l^{k}) and b_l^{k} converge [38]. The next discrete point b_l^{k+1} is

b_l^{k+1} \in \arg\max_{b_l \in \{\pm 1\}^{n \times 1}} \; \hat{f}_k(b_l) = f(b_l^{k}) + \left\langle \nabla f(b_l^{k}),\; b_l - b_l^{k} \right\rangle,   (19)

where \nabla f(b_l^{k}) = 2 A b_l^{k} + C_2^T. Thus \hat{f}_k(b_l) = f(b_l^{k}) - (2 A b_l^{k} + C_2^T)^T b_l^{k} + (2 A b_l^{k} + C_2^T)^T b_l, in which the first and second terms are constant. To maximize problem (19), the following update is used:

b_l^{k+1} = \mathrm{sgn}\!\left( 2 A b_l^{k} + C_2^T \right).   (20)

Fig. 2. Convergence curves of Eq. (19) in 16 and 64-bit.

Fig. 2 shows that our algorithm converges rapidly to the optimal solution. The procedure for solving the B-subproblem is described in Algorithm 1.

Algorithm 1: B-subproblem of LISH.
Input: B^0 ∈ {−1, 1}^{n×r}, V ∈ R^{n×r}, P ∈ R^{m×r}, W ∈ R^{r×c} and Y ∈ R^{n×c};
Output: B = [b_1, b_2, ..., b_l, ..., b_r];
1: l := 1;
2: repeat
3:   k := 1;
4:   repeat
5:     C_1 = 2ωWY^T + 2F^T(X) + ρV^T;
6:     C_2 = c_l^T − 2ω w_l^T W'^T B'^T;
7:     ∇f(b_l^k) = 2A b_l^k + C_2^T;
8:     b_l^{k+1} = sgn(2A b_l^k + C_2^T);
9:     k := k + 1;
10:   until b_l converges;
11:   l := l + 1;
12: until l = r;
13: return B
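For illustration, the bit-wise update of Algorithm 1 can be written as the following Python sketch; it is our re-implementation of Eqs. (16)–(20) under the stated notation (a dense affinity A is assumed for clarity), not the released training code.

```python
import numpy as np

def update_B(B, W, Y, FX, V, A, omega, rho, max_inner=10):
    """DCC-style B-subproblem: learn the r bits of B one column at a time (Eq. (20)).

    B: n x r (+-1), W: r x c, Y: n x c, FX = phi(X) P: n x r, V: n x r, A: n x n affinity.
    """
    n, r = B.shape
    C1 = 2.0 * omega * (W @ Y.T) + 2.0 * FX.T + rho * V.T    # r x n, as in Eq. (16)
    for l in range(r):
        rest = [j for j in range(r) if j != l]
        Bp, Wp = B[:, rest], W[rest, :]                      # B', W' (bit l excluded)
        c_l = C1[l, :]                                       # lth row of C1
        C2 = c_l - 2.0 * omega * (W[l, :] @ Wp.T) @ Bp.T     # 1 x n constant term, Eq. (18)
        for _ in range(max_inner):
            b_new = np.where(2.0 * A @ B[:, l] + C2 >= 0, 1, -1)   # Eq. (20)
            if np.array_equal(b_new, B[:, l]):
                break
            B[:, l] = b_new
    return B
```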

3.4.2. V-subproblem

When B, P and W are fixed in problem (14), the objective function of the V-subproblem is

\max_{V \in R^{n \times r}} \; \mathrm{tr}\!\left( B^T V \right) \quad \text{s.t.} \;\; \mathbf{1}^T V = 0, \;\; V^T V = n I_r.   (21)

We solve this problem with the aid of the singular value decomposition (SVD). Let B^* denote the column-centered matrix B^* = (I_n − \frac{1}{n}\mathbf{1}\mathbf{1}^T) B, and write its SVD as B^* = J \Sigma K^T, where J ∈ R^{n×r'} and K ∈ R^{r×r'} contain the left and right singular vectors corresponding to the r' positive singular values. We then apply an eigendecomposition to the small r × r matrix B^{*T} B^* = [K \; \bar{K}] \begin{bmatrix} \Sigma^2 & 0 \\ 0 & 0 \end{bmatrix} [K \; \bar{K}]^T, after introducing a matrix \bar{K} ∈ R^{r×(r−r')} obtained by applying Gram–Schmidt orthogonalization to the zero-eigenvalue directions, so that \bar{K}^T \bar{K} = I_{r−r'} and K^T \bar{K} = 0. Under the restrictions \bar{J}^T \bar{J} = I_{r−r'} and [J \; \mathbf{1}]^T \bar{J} = 0, we additionally introduce a matrix \bar{J} ∈ R^{n×(r−r')} so that the constraint 1^T V = 0 is satisfied, and we have J = B^* K \Sigma^{-1}. By adopting the corresponding theorem in DGH [38], we simply obtain

V^* = \sqrt{n}\, [J \; \bar{J}] [K \; \bar{K}]^T,   (22)

which gives a high-quality solution to the V-subproblem.
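A sketch of the closed-form V-update of Eq. (22) follows; completing J and K when B* is rank-deficient is handled here with SVD/QR helpers, which is one concrete way to realize the Gram–Schmidt step described above, not necessarily the exact procedure used in the original implementation.

```python
import numpy as np

def update_V(B):
    """Closed-form V-subproblem (Eq. (22)): V* = sqrt(n) [J Jbar][K Kbar]^T."""
    n, r = B.shape
    Bc = B - B.mean(axis=0, keepdims=True)              # column-centered B*
    J, sing, Kt = np.linalg.svd(Bc, full_matrices=False)
    K = Kt.T
    pos = sing > 1e-10                                   # r' positive singular values
    J, K = J[:, pos], K[:, pos]
    r_extra = r - int(pos.sum())
    if r_extra > 0:
        # Complete K with an orthonormal basis of its orthogonal complement,
        # and J with a basis orthogonal to [J, 1] (the Gram-Schmidt step).
        Kbar = np.linalg.svd(np.eye(r) - K @ K.T)[0][:, :r_extra]
        M = np.hstack([J, np.ones((n, 1))])
        proj = np.eye(n) - M @ np.linalg.pinv(M)
        Jbar, _ = np.linalg.qr(proj @ np.random.randn(n, r_extra))
        J, K = np.hstack([J, Jbar]), np.hstack([K, Kbar])
    return np.sqrt(n) * J @ K.T
```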


Fig. 3. Convergence curves of Eq. (14) in 16 and 64-bit.

3.4.3. P-subproblem

When B, V and W are fixed in problem (14), the objective function of the P-subproblem is

\min_{P} \; \nu \| B - F(X) \|^2 + \eta \| P \|^2.   (23)

Taking advantage of Eq. (13), problem (23) becomes

\min_{P} \; \| B - \phi(X) P \|^2 + \frac{\eta}{\nu} \| P \|^2.   (24)

As a regularized least-squares problem, Eq. (24) is solved in closed form by

P = \left( \phi(X)^T \phi(X) + \frac{\eta}{\nu} I \right)^{-1} \phi(X)^T B.   (25)

3.4.4. W-subproblem

When B, V and P are fixed in problem (14), the objective function of the W-subproblem is

\min_{W} \; \| Y - B W \|^2 + \frac{\lambda}{\omega} \| W \|^2.   (26)

Similar to Eq. (24), the solution of Eq. (26) is obtained from the regularized least-squares problem, giving the closed-form solution:

W = \left( B^T B + \frac{\lambda}{\omega} I_r \right)^{-1} B^T Y.   (27)
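Both Eq. (25) and Eq. (27) are ridge-regression solves; an illustrative implementation (using a linear solve instead of an explicit inverse for stability) is:

```python
import numpy as np

def update_P(phiX, B, eta_over_nu):
    """Eq. (25): P = (phi(X)^T phi(X) + (eta/nu) I)^{-1} phi(X)^T B."""
    m = phiX.shape[1]
    return np.linalg.solve(phiX.T @ phiX + eta_over_nu * np.eye(m), phiX.T @ B)

def update_W(B, Y, lam_over_omega):
    """Eq. (27): W = (B^T B + (lambda/omega) I_r)^{-1} B^T Y."""
    r = B.shape[1]
    return np.linalg.solve(B.T @ B + lam_over_omega * np.eye(r), B.T @ Y)
```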

Fig. 3 plots the objective function value against the number of iterations, where we can see the rapid convergence of our algorithm. The overall procedure is summarized in Algorithm 2.

Algorithm 2: Local and Inner Data Structure Supervised Hashing (LISH).
Input: X ∈ R^{n×d}: training data; r: hash code length; m: number of anchor points; ω, λ, ν, η and ρ: initial parameters;
Output: B ∈ {−1, 1}^{n×r}: binary codes; P ∈ R^{m×r}: projection matrix;
1: Initialization: randomly initialize B, P, W and V;
2: repeat
3:   B-subproblem: update B according to Algorithm 1;
4:   P-subproblem: update P according to Eq. (25);
5:   W-subproblem: update W according to Eq. (27);
6:   V-subproblem: update V according to Eq. (22);
7: until there is no change to B, P, W, V;
8: return B, P, W and V
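Putting the pieces together, the alternating loop of Algorithm 2 reads roughly as follows; it reuses the helper sketches from the previous subsections, and the initialization and stopping test shown here are illustrative choices rather than the exact setup used in our experiments.

```python
import numpy as np

def lish_train(X, Y, anchors, r, sigma, omega, lam, nu, eta, rho, n_iter=20):
    """Sketch of Algorithm 2: alternate the B-, P-, W- and V-subproblems."""
    n = X.shape[0]
    phiX = rbf_embedding(X, anchors, sigma)          # Eq. (13)
    Z, A = anchor_graph(X, anchors)                  # Eq. (7)
    rng = np.random.default_rng(0)
    B = np.where(rng.standard_normal((n, r)) >= 0, 1, -1)
    P = rng.standard_normal((phiX.shape[1], r))
    W = rng.standard_normal((r, Y.shape[1]))
    V = update_V(B)
    for _ in range(n_iter):
        B_old = B.copy()
        B = update_B(B, W, Y, phiX @ P, V, A, omega, rho)   # Algorithm 1
        P = update_P(phiX, B, eta / nu)                     # Eq. (25)
        W = update_W(B, Y, lam / omega)                     # Eq. (27)
        V = update_V(B)                                     # Eq. (22)
        if np.array_equal(B, B_old):                        # no bit changed: stop
            break
    return B, P, W, V
```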

4. Experiments

We conduct extensive experiments to evaluate our proposed method on three publicly available large-scale image datasets: CIFAR-10, SUN397 and YouTube Faces. The CIFAR-10 dataset is a labeled subset of the 80-million tiny images collection [64]; it contains 60K 32 × 32 color images in ten categories, with 6000 samples each. The entire dataset is partitioned into a training subset of 59,000 images and a test subset of 1000 images, and each image is represented by a 512-dimensional GIST feature vector [65]. The YouTube Faces dataset contains 1595 different people; we choose 38 people, each with 2000 images, to form a subset of 100K face images in total, and each image is represented by a 1770-dimensional LBP feature vector. SUN397 consists of 108K images from 397 scene categories, each represented by a 1600-dimensional feature vector; its test set includes 42 categories, each containing more than 500 images. The ground truths of the three datasets are defined by whether two samples share common class labels.

The proposed method is compared against several state-of-the-art supervised hashing methods, including Kernel-Based Supervised Hashing (KSH) [48], Supervised Discrete Hashing (SDH) [50], Binary Reconstructive Embedding (BRE) [49] and Latent Factor Hashing (LFH) [66], and against an unsupervised method, Discrete Graph Hashing (DGH) [38]. We use the public code and suggested parameters released by the authors of these methods. For a fair comparison, we choose 1000 and 2000 samples for learning. For KSH, DGH and SDH, we randomly select 300 anchors when using 1000 samples and 500 anchors when using 2000 samples. The number of bits is varied from 8 to 32 for all three datasets. We evaluate the above hashing methods in terms of two standard criteria: mean average precision (MAP) and precision-recall (PR) curves. The performance curves are shown in Figs. 4–6, and the MAP results together with training times are presented in Tables 2–4.
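For reference, the Hamming-ranking MAP criterion can be computed as in the sketch below; this illustrates the standard protocol for single-label ground truth and is not the exact evaluation script behind Tables 2–4.

```python
import numpy as np

def mean_average_precision(query_codes, db_codes, query_labels, db_labels):
    """MAP under Hamming ranking for +-1 codes with single-label ground truth."""
    aps = []
    for q, ql in zip(query_codes, query_labels):
        # Hamming distance for +-1 codes is (r - q.b) / 2, so ranking by -q.b suffices.
        order = np.argsort(-db_codes @ q)
        relevant = (db_labels[order] == ql).astype(float)
        if relevant.sum() == 0:
            continue
        cum = np.cumsum(relevant)
        precision_at_hit = cum[relevant == 1] / (np.flatnonzero(relevant) + 1.0)
        aps.append(precision_at_hit.mean())
    return float(np.mean(aps))
```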


Fig. 4. Different hashing methods’ precision-recall curves using 8, 16 and 32-bit codes on the CIFAR-10 dataset with 2000 training samples.

Fig. 5. Different hashing methods’ precision-recall curves using 16, 32 and 64-bit codes on the SUN397 dataset with 2000 training samples.

Fig. 6. Different hashing methods’ precision-recall curves using 16, 32 and 64-bit codes on the YouTube Faces dataset with 2000 training samples.

Table 2. MAP and training time (s) of the compared algorithms on the CIFAR-10 dataset.

Method | MAP 8-bit (n=1000) | MAP 16-bit (n=1000) | MAP 32-bit (n=1000) | Time 32-bit (n=1000) | MAP 8-bit (n=2000) | MAP 16-bit (n=2000) | MAP 32-bit (n=2000) | Time 32-bit (n=2000)
BRE | 0.1403 | 0.1439 | 0.1618 | 79.01 | 0.1339 | 0.1507 | 0.1612 | 402.39
LFH | 0.1146 | 0.1796 | 0.1979 | 0.53 | 0.1402 | 0.1539 | 0.2000 | 0.80
KSH | 0.2182 | 0.2446 | 0.2618 | 117.48 | 0.2369 | 0.2696 | 0.2975 | 412.02
SDH | 0.1512 | 0.2491 | 0.2751 | 0.22 | 0.1913 | 0.2744 | 0.3170 | 0.90
LISH | 0.2122 | 0.2524 | 0.2834 | 6.93 | 0.2420 | 0.2862 | 0.3173 | 29.68
DGH | 0.1163 | 0.1142 | 0.1135 | 0.76 | 0.1315 | 0.1346 | 0.1332 | 0.97


Table 3. MAP and training time (s) of the compared algorithms on the SUN397 dataset.

Method | MAP 16-bit (n=1000) | MAP 32-bit (n=1000) | MAP 64-bit (n=1000) | Time 64-bit (n=1000) | MAP 16-bit (n=2000) | MAP 32-bit (n=2000) | MAP 64-bit (n=2000) | Time 64-bit (n=2000)
BRE | 0.1659 | 0.1997 | 0.2251 | 2053.44 | 0.1854 | 0.2169 | 0.2473 | 3409.66
LFH | 0.1252 | 0.1943 | 0.3057 | 2.56 | 0.1688 | 0.2736 | 0.3512 | 2.88
KSH | 0.2960 | 0.3733 | 0.4155 | 312.89 | 0.3623 | 0.4429 | 0.4831 | 847.78
SDH | 0.2703 | 0.3245 | 0.4230 | 8.95 | 0.3621 | 0.3655 | 0.5136 | 10.93
LISH | 0.2922 | 0.3794 | 0.4368 | 31.56 | 0.3735 | 0.4653 | 0.5268 | 88.12
DGH | 0.1127 | 0.1230 | 0.1217 | 0.38 | 0.0748 | 0.0773 | 0.0721 | 1.65

Table 4. MAP and training time (s) of the compared algorithms on the YouTube Faces dataset.

Method | MAP 16-bit (n=1000) | MAP 32-bit (n=1000) | MAP 64-bit (n=1000) | Time 64-bit (n=1000) | MAP 16-bit (n=2000) | MAP 32-bit (n=2000) | MAP 64-bit (n=2000) | Time 64-bit (n=2000)
BRE | 0.4982 | 0.5909 | 0.6437 | 605.09 | 0.5384 | 0.5932 | 0.6479 | 2170.45
LFH | 0.3731 | 0.7011 | 0.8442 | 3.42 | 0.4230 | 0.7242 | 0.8532 | 4.30
KSH | 0.8454 | 0.9003 | 0.9250 | 214.67 | 0.8969 | 0.9414 | 0.9532 | 870.09
SDH | 0.8367 | 0.7539 | 0.9278 | 28.60 | 0.9030 | 0.8191 | 0.9597 | 70.02
LISH | 0.8408 | 0.9071 | 0.9326 | 19.91 | 0.9049 | 0.9770 | 0.9630 | 60.05
DGH | 0.3088 | 0.4437 | 0.5286 | 0.55 | 0.1876 | 0.2920 | 0.2952 | 2.37

4.1. CIFAR-10

The comparative MAP results are shown in Table 2, and the precision-recall (PR) curves are given in Fig. 4. We again randomly select 300 anchors when using 1000 training samples and 500 anchors when using 2000 training samples, and we generate 8-, 16- and 32-bit binary codes with LFH, SDH, BRE, KSH, DGH and our proposed LISH. As shown in Table 2, it is not surprising that the performance becomes higher as the number of training samples increases, and that the precision of KSH, SDH and LISH increases significantly with longer codes, while BRE and LFH are less impressive in comparison. It can be seen from Fig. 4 that the precision decreases as the recall increases; the reason is that precision is sensitive to the true positive rate while recall is sensitive to the false positive rate. Note that LFH has a lower accuracy at 8 bits than even the unsupervised method DGH when n = 1000, and that KSH performs slightly better than our method at 8 bits; however, KSH costs much more training time than ours, and its precision at the other code lengths is lower. LISH clearly achieves the highest retrieval accuracy (precision and MAP) at almost all code lengths and significantly outperforms the compared hashing methods at 16 bits. It is clear that LISH is very effective with different code lengths on CIFAR-10.

4.2. SUN397

We use the SUN397 dataset to evaluate the computational efficiency of the compared methods. Again, we randomly select 1000 and 2000 images from the training subset. Evaluation results in terms of MAP and precision are presented in Table 3 and Fig. 5, respectively. LFH improves quickly with the code length, possibly because it performs well with long codes on SUN397. As shown in Table 3, BRE is significantly inefficient, costing several times more training time than the other methods. We can also see that the growth trends of LFH and BRE are inferior to those of the other three supervised methods. As Fig. 5 shows, KSH performs better at high recall and SDH improves rapidly with the code length; however, our method achieves superior results to both of them while costing relatively less time.

4.3. YouTube Faces

The YouTube Faces dataset contains 98,000 images of 1595 individuals; 38 people are chosen, each with 2000 images, to form a subset of 100K faces in total, and another 3.8K images constitute the test data. Evaluation results in terms of precision and MAP are shown in Table 4 and Fig. 6. Since the YouTube Faces dataset is made up of faces whose characteristics are easy to recognize, the results of BRE, KSH, SDH and our method all increase rapidly. Except for LFH, the other four methods are shown to be less efficient than ours.

Since we additionally exploit the underlying structure of the dataset, LISH obtains the best performance among all the hashing methods. Specifically, the largest gain of LISH is achieved at 32 bits when n = 1000, where its MAP is 16.89% higher than that of SDH.

5. Conclusion

In this paper, we exploited the underlying manifold structure of samples via the graph Laplacian, using an approximate anchor graph to save training cost. To capture and preserve the semantic label information in the Hamming space, we explicitly formulated a tractable optimization function integrated with the ℓ2 loss and decomposed it into several subproblems that can be solved iteratively by our algorithm. We proposed a discrete supervised paradigm that directly generates hash codes without continuous relaxation; by working in the discrete code space, the retrieval accuracy of short binary codes can be boosted. Empirical evaluations of retrieving semantically similar neighbors on three benchmark databases showed that our method has superior performance over the state-of-the-art.

References

[1] M. Wang, X. Hua, J. Tang, R. Hong, Beyond distance measurement: constructing neighborhood similarity for video annotation, IEEE Trans. Multimed. 11 (3) (2009) 465–476.
[2] B. Kulis, P. Jain, K. Grauman, Fast similarity search for learned metrics, IEEE Trans. Pattern Anal. Mach. Intell. 31 (12) (2009) 2143–2157.
[3] M. Hu, Y. Yang, F. Shen, L. Zhang, H.T. Shen, X. Li, Robust web image annotation via exploring multi-facet and structural knowledge, IEEE Trans. Image Process. 26 (10) (2017) 4871–4884.
[4] Y. Gong, S. Lazebnik, A. Gordo, F. Perronnin, Iterative quantization: a procrustean approach to learning binary codes for large-scale image retrieval, IEEE Trans. Pattern Anal. Mach. Intell. 35 (12) (2013) 2916–2929.
[5] X. Liu, J. He, D. Liu, B. Lang, Compact kernel hashing with multiple features, in: Proceedings of the ACM International Conference on Multimedia, 2012, pp. 881–884.
[6] M. Hu, Y. Yang, F. Shen, N. Xie, H.T. Shen, Hashing with angular reconstructive embeddings, IEEE Trans. Image Process. 27 (2) (2018) 545–555.
[7] Y. Mu, G. Hua, W. Fan, S. Chang, Hash-SVM: scalable kernel machines for large-scale visual classification, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, CVPR, 2014, pp. 979–986.
[8] X. Zhu, X. Li, S. Zhang, Block-row sparse multiview multilabel learning for image classification, IEEE Trans. Cybern. 46 (2) (2016) 450–461.


S. He et al. / Neurocomputing 000 (2017) 1–10 [9] J. Wang, W. Liu, S. Kumar, S. Chang, Learning to hash for indexing big data – a survey, Proc. IEEE 104 (1) (2016) 34–57. [10] C. Strecha, A.M. Bronstein, M.M. Bronstein, P. Fua, LDAHash: improved matching with smaller descriptors, IEEE Trans. Pattern Anal. Mach. Intell. 34 (1) (2012) 66–78. [11] X. Li, G. Lin, C. Shen, A. van den Hengel, A.R. Dick, Learning hash functions using column generation, in: Proceedings of the International Conference Machine Learning, 2013, pp. 142–150. [12] X. He, L. Liao, H. Zhang, L. Nie, X. Hu, T. Chua, Neural collaborative filtering, in: Proceedings of the Twenty Sixth International Conference on World Wide Web, WWW, 2017, pp. 173–182. [13] J. Song, Y. Yang, Y. Yang, Z. Huang, H.T. Shen, Inter-media hashing for large-scale retrieval from heterogeneous data sources, in: Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data, SIGMOD, 2013, pp. 785–796. [14] X. Zhu, X. Li, S. Zhang, C. Ju, X. Wu, Robust joint graph sparse coding for unsupervised spectral feature selection, IEEE Trans. Neural Netw. Learn. Syst. 28 (6) (2017) 1263–1275. [15] R. Hu, X. Zhu, D. Cheng, W. He, Y. Yan, J. Song, S. Zhang, Graph self-representation method for unsupervised feature selection, Neurocomputing 220 (2017) 130–137. [16] X. Liu, Y. Mu, B. Lang, S. Chang, Mixed image-keyword query adaptive hashing over multilabel images, ACM Trans. Multimed. Comput. Commun. Appl. 10 (2) (2014) 22:1–22:21. [17] J. Wang, X. Xu, S. Guo, L. Cui, X. Wang, Linear unsupervised hashing for ANN search in euclidean space, Neurocomputing 171 (2016) 283–292. [18] Y. Zhang, W. Lu, Y. Liu, F. Wu, Kernelized sparse hashing for scalable image retrieval, Neurocomputing 172 (2016) 207–214. [19] X. Zhu, L. Zhang, Z. Huang, A sparse embedding and least variance encoding approach to hashing, IEEE Trans. Image Process. 23 (9) (2014) 3737–3750. [20] A. Gionis, P. Indyk, R. Motwani, Similarity search in high dimensions via hashing, in: Proceedings of the Twenty Fifth International Conference on Very Large Data Bases, VLDB, 1999, pp. 518–529. [21] Y. Yang, Z. Ma, Y. Yang, F. Nie, H.T. Shen, Multitask spectral clustering by exploring intertask correlation, IEEE Trans. Cybern. 45 (5) (2015) 1069–1080. [22] X. He, H. Zhang, M. Kan, T. Chua, Fast matrix factorization for online recommendation with implicit feedback, in: Proceedings of the Thirty Ninth International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR, 2016, pp. 549–558. [23] Y. Yang, F. Shen, Z. Huang, H.T. Shen, X. Li, Discrete nonnegative spectral clustering, IEEE Trans. Knowl. Data Eng. 29 (9) (2017) 1834–1845. [24] A.W.M. Smeulders, M. Worring, S. Santini, A. Gupta, R.C. Jain, Content-based image retrieval at the end of the early years, IEEE Trans. Pattern Anal. Mach. Intell. 22 (12) (20 0 0) 1349–1380. [25] J.L. Bentley, Multidimensional binary search trees used for associative searching, Commun. ACM 18 (9) (1975) 509–517. [26] J.H. Friedman, J.L. Bentley, R.A. Finkel, An algorithm for finding best matches in logarithmic expected time, ACM Trans. Math. Softw. 3 (3) (1977) 209–226. [27] C. Silpa-Anan, R.I. Hartley, Optimised KD-trees for fast image descriptor matching, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, CVPR, 2008. [28] V. Gaede, O. Günther, Multidimensional access methods, ACM Comput. Surv. (CSUR) 30 (2) (1998) 170–231. [29] J.K. 
Uhlmann, Satisfying general proximity/similarity queries with metric trees, Inf. Process. Lett. 40 (4) (1991) 175–179. [30] S. Wang, Z. Huang, X. Xu, A multi-label least-squares hashing for scalable image search, in: Proceedings of the SIAM International Conference on Data Mining, SIAM, 2015, pp. 954–962. [31] A. Gionis, P. Indyk, R. Motwani, et al., Similarity search in high dimensions via hashing, in: Proceedings of the Twenty Fifth International Conference on Very Large Data Bases, VLDB, vol. 99, 1999, pp. 518–529. [32] Y. Weiss, A. Torralba, R. Fergus, Spectral hashing, in: Proceedings of the Advances in Neural Information Processing Systems, NIPS, 2008, pp. 1753–1760. [33] W. Liu, J. Wang, S. Kumar, S. Chang, Hashing with graphs, in: Proceedings of the International Conference on Machine Learning, 2011, pp. 1–8. [34] W. Jun, Y. Lee, B. Jun, Duplicate video detection for large-scale multimedia, Multimed. Tools Appl. 75 (23) (2016) 15665–15678. [35] Y. Gong, S. Lazebnik, Iterative quantization: a procrustean approach to learning binary codes, in: Proceedings of the Computer Vision and Pattern Recognition, CVPR, 2011, pp. 817–824. [36] W. Kong, W. Li, Isotropic hashing, in: Proceedings of the Advances in Neural Information Processing Systems, NIPS, 2012, pp. 1655–1663. [37] K. He, F. Wen, J. Sun, K-means hashing: an affinity-preserving quantization method for learning binary compact codes, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, CVPR, 2013, pp. 2938–2945. [38] W. Liu, C. Mu, S. Kumar, S. Chang, Discrete graph hashing, in: Proceedings of the Advances in Neural Information Processing Systems, NIPS, 2014, pp. 3419–3427. [39] J. Wang, S. Kumar, S. Chang, Semi-supervised hashing for large-scale search, IEEE Trans. Patterns Anal. Mach. Intell. 34 (12) (2012) 2393–2406. [40] Y. Luo, Y. Yang, F. Shen, Z. Huang, P. Zhou, H.T. Shen, Robust discrete code modeling for supervised hashing, Pattern Recognit. 75 (2018) 128–135.


[41] X. Liu, J. He, B. Lang, Reciprocal hash tables for nearest neighbor search, in: Proceedings of the AAAI Conference on Artificial Intelligence, 2013. [42] X. Liu, L. Huang, C. Deng, J. Lu, B. Lang, Multi-view complementary hash tables for nearest neighbor search, in: Proceedings of the IEEE International Conference on Computer Vision, ICCV, 2015, pp. 1107–1115. [43] Y. Mu, X. Chen, X. Liu, T. Chua, S. Yan, Multimedia semantics-aware query-adaptive hashing with bits reconfigurability, Int. J. Multimed. Inf. Retr. 1 (1) (2012) 59–70. [44] C. Deng, X. Liu, Y. Mu, J. Li, Large-scale multi-task image labeling with adaptive relevance discovery and feature hashing, Signal Process. 112 (2015) 137–145. [45] X. Liu, X. Fan, C. Deng, Z. Li, H. Su, D. Tao, Multilinear hyperplane hashing, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, CVPR, 2016, pp. 5119–5127. [46] Z. Li, X. Liu, J. Wu, H. Su, Adaptive binary quantization for fast nearest neighbor search, in: Proceedings of the European Association for Artificial Intelligence, 2016, pp. 64–72. [47] G. Shakhnarovich, P.A. Viola, T. Darrell, Fast pose estimation with parameter-sensitive hashing, in: Proceedings of the Ninth IEEE International Conference on Computer Vision, ICCV, 2003, pp. 750–759. [48] W. Liu, J. Wang, R. Ji, Y. Jiang, S. Chang, Supervised hashing with kernels, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, CVPR, 2012, pp. 2074–2081. [49] B. Kulis, T. Darrell, Learning to hash with binary reconstructive embeddings, in: Proceedings of the Advances in Neural Information Processing Systems, NIPS, 2009, pp. 1042–1050. [50] F. Shen, C. Shen, W. Liu, H.T. Shen, Supervised discrete hashing, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, CVPR, 2015, pp. 37–45. [51] Y. Yang, F. Shen, H.T. Shen, H. Li, X. Li, Robust discrete spectral hashing for large-scale image semantic indexing, IEEE Trans. Big Data 1 (4) (2015) 162–171. [52] L. Liu, Z. Lin, L. Shao, F. Shen, G. Ding, J. Han, Sequential discrete hashing for scalable cross-modality similarity retrieval, IEEE Trans. Image Process. 26 (1) (2017) 107–118. [53] B. Wang, Y. Yang, X. Xu, A. Hanjalic, H.T. Shen, Adversarial cross-modal retrieval, in: Proceedings of the IEEE International Conference on ACM Multimedia, 2017, pp. 154–162. [54] R. Xia, Y. Pan, H. Lai, C. Liu, S. Yan, Supervised hashing for image retrieval via image representation learning, in: Proceedings of the Twenty Eighth Conference on Artificial Intelligence, AAAI, 2014, pp. 2156–2162. [55] H. Lai, Y. Pan, Y. Liu, S. Yan, Simultaneous feature learning and hash coding with deep neural networks, in: Proceedings of the IEEE International Conference on Computer Vision and Pattern Recognition, CVPR, 2015, pp. 3270–3278. [56] H. Liu, R. Wang, S. Shan, X. Chen, Deep supervised hashing for fast image retrieval, in: Proceedings of the IEEE International Conference on Computer Vision and Pattern Recognition, CVPR, 2016, pp. 2064–2072. [57] V.E. Liong, J. Lu, G. Wang, P. Moulin, J. Zhou, Deep hashing for compact binary codes learning, in: Proceedings of the IEEE International Conference on Computer Vision and Pattern Recognition, CVPR, 2015, pp. 2475–2483. [58] S. He, G. Ye, M. Hu, Y. Yang, F. Shen, H.T. Shen, X. Li, Efficient supervised hashing via exploring local and inner data structure, in: Proceedings of the Australasian Database Conference, ADC, 2017. [59] J. Lurie, Review of Spectral Graph Theory: by Fan R. K. 
Chung, SIGACT News 30 (2) (1999) 14–16. [60] X. Shi, F. Xing, J. Cai, Z. Zhang, Y. Xie, L. Yang, Kernel-based supervised discrete hashing for image retrieval, in: Proceedings of the European Conference on Computer Vision, 2016, pp. 419–433. [61] G. Fung, O.L. Mangasarian, Multicategory proximal support vector machine classifiers, Mach. Learn. 59 (1–2) (2005) 77–97. [62] J. Håstad, Some optimal inapproximability results, J. ACM 48 (4) (2001) 798–859. [63] J. De Leeuw, Applications of convex analysis to multidimensional scaling, Department of Statistics, UCLA, 2005. [64] A. Torralba, R. Fergus, W.T. Freeman, 80 million tiny images: a large data set for nonparametric object and scene recognition, IEEE Trans. Pattern Anal. Mach. Intell. 30 (11) (2008) 1958–1970. [65] A. Oliva, A. Torralba, Modeling the shape of the scene: a holistic representation of the spatial envelope, Int. J. Comput. Vis. 42 (3) (2001) 145–175. [66] P. Zhang, W. Zhang, W. Li, M. Guo, Supervised hashing with latent factor models, in: Proceedings of the Thirty Seventh International ACM SIGIR Conference on Research Development in Information Retrieval, SIGIR, 2014, pp. 173–182. Shiyuan He is currently an undergraduate student in Yingcai Honors College, University of Electronic Science and Technology of China. His major research interests include Image Retrieval, Hash and machine learning.


Guo Ye is an undergraduate student in Yingcai Honors College, University of Electronic Science and Technology of China. His interests include image retrieval, hashing and machine learning.

Fumin Shen received the B.S. from Shandong University in 2007 and the Ph.D. degree from the Nanjing University of Science and Technology, China, in 2014. He is currently an Associate Professor with the School of Computer Science and Engineering, University of Electronic Science and Technology of China, China. His major research interests include computer vision and machine learning, including face recognition, image analysis, hashing methods, and robust statistics with its applications in computer vision.

Mengqiu Hu received the bachelor's degree in 2015 from the University of Electronic Science and Technology of China, where he is currently a master's student. His major research interests include computer vision and machine learning.

Heng Tao Shen received the B.Sc. degree (Hons.) and the Ph.D. degree from the Department of Computer Science, National University of Singapore, in 2000 and 2004, respectively. He joined The University of Queensland as a Lecturer, a Senior Lecturer, and a Reader, where he became a Professor in 2011. He is currently a Professor of Computer Science and an ARC Future Fellow with the School of Information Technology and Electrical Engineering, The University of Queensland. He is also a Visiting Professor with Nagoya University and the National University of Singapore. His research interests mainly include multimedia/mobile/Web search, and big data management on spatial, temporal, multimedia, and social media databases. He has extensively published and served on program committees in the most prestigious international publication venues of interest. He received the Chris Wallace Award for Outstanding Research Contribution in 2010, conferred by the Computing Research and Education Association of Australasia. He is also an Associate Editor of the IEEE Transactions on Knowledge and Data Engineering. He served as a PC Co-Chair of ACM Multimedia 2015.

Yang Yang received the bachelor's degree from Jilin University in 2006, the master's degree from Peking University in 2009, and the Ph.D. degree from The University of Queensland, Australia, in 2012, under the supervision of Prof. H. T. Shen and Prof. X. Zhou. He was a Research Fellow with the National University of Singapore from 2012 to 2014. He is currently with the University of Electronic Science and Technology of China.
