Knowledge-Based Systems xxx (xxxx) xxx
Knowledge-Based Systems journal homepage: www.elsevier.com/locate/knosys
A scalable sub-graph regularization for efficient content based image retrieval with long-term relevance feedback enhancement ∗
Mingbo Zhao a, Jiao Liu b,∗, Zhao Zhang c, Jicong Fan d,e
a Donghua University, China
b Shanghai University of Engineering Science, China
c Hefei University of Technology, China
d The Chinese University of Hong Kong, Shenzhen, Hong Kong
e Shenzhen Research Institute of Big Data, Shenzhen, China
Article info
Article history: Received 4 May 2020; Received in revised form 30 September 2020; Accepted 6 October 2020; Available online xxxx
Keywords: Manifold ranking; Content based image retrieval; Relevance feedback
Abstract
The goal of content-based image retrieval (CBIR) is to search for relevant images by analyzing image content. Manifold Ranking (MR) and Efficient Manifold Ranking (EMR) have been successfully applied to CBIR because, given the query data, they can discover the underlying geometrical structure of the dataset. However, when the image database grows, the graphs in MR and EMR cannot be extended or updated, as their size is fixed. To solve this problem, in this paper we formulate a sub-graph based on fixed anchors instead of constructing the graph on the whole dataset. The anchors are selected by the conventional k-means method, and the sub-graph weight matrix is defined by the similarity between pairs of anchors. Since the number of anchors is much smaller than the size of the original dataset, updating the sub-graph is much easier than updating a graph over the whole dataset. Motivated by this sub-graph construction, we then develop an efficient graph regularization framework to predict the ranking scores of the whole dataset along the sub-graph: the ranking score is first propagated from the query image to a subset of the anchors, then from these anchors to all anchors via the sub-graph, and finally to the whole dataset. The framework can also use user relevance feedback to update the sub-graph, so that discriminative information is incorporated to enhance the retrieval performance in the long term. Extensive simulations verify the effectiveness of the proposed method.
© 2020 Elsevier B.V. All rights reserved.
∗ Corresponding author. E-mail address: [email protected] (J. Liu).
1. Introduction
In the real world, an ever-increasing amount of image data is generated from the Internet and daily social communication. To handle such large-scale data, Content-Based Image Retrieval (CBIR) is an important technique that has attracted great interest in recent years [1,2]. The goal of CBIR is to find the images most relevant to a query image by using low-level features automatically extracted from the images, including global features (e.g., color moment, edge histogram, LBP [3]) and local features (e.g., SIFT [4]). However, low-level visual features may not accurately characterize the high-level semantic concepts embedded in the original data, which creates a "semantic gap" between them. To handle this problem, deep learning methods proposed during the past few years extract mid- or high-level features of data by employing deep architectures composed of multiple non-linear transformations, through which the underlying semantic structure of images can be well captured, naturally narrowing the "semantic gap". Another type of method for narrowing the gap is so-called multi-modal image retrieval. These methods merge image and textual content to better characterize the features of images, so that retrieval performance can be further enhanced. Many works on multi-modal image retrieval have been proposed during the past decades, including [5–8]. In addition, relevance feedback [9] is a useful tool for narrowing the "semantic gap" in CBIR, as the user's high-level perception is captured by weights that are dynamically updated based on the user's feedback.
In general, to measure the similarity between the query image and database images, the most frequently used CBIR systems adopt distance-based ranking, where the Euclidean distance is usually computed either in the original feature space or in a lower-dimensional space of image features. Recently, due to the growth of visual content, rapid search in a large database has become an emerging need. A practical strategy is to use Approximate Nearest Neighbor (ANN) search [10–12] or hashing-based methods [13,14] for speedup. These methods
https://doi.org/10.1016/j.knosys.2020.106505 0950-7051/© 2020 Elsevier B.V. All rights reserved.
project the high-dimensional features to a lower-dimensional space and then generate compact binary codes. Benefiting from these binary codes, fast image search can be carried out via binary pattern matching or Hamming distance measurement, which dramatically reduces the computational cost and improves search efficiency. However, the above distance-based ranking methods and hashing methods focus only on the pairwise similarity among data and do not consider the distribution of the whole dataset. In our opinion, a good CBIR system should consider image features as well as the intrinsic structure of the image database. To this end, Manifold Ranking (MR) [15–20], another type of ranking method, ranks data samples according to the intrinsic geometrical structure embedded in a large amount of data. By taking the underlying structure into account [21], manifold ranking assigns each data sample a relative ranking score instead of an absolute pairwise similarity. The score is treated as a similarity metric defined on the manifold, which is more meaningful for capturing semantic relevance. However, manifold ranking has its own drawbacks in handling large-scale databases, as it incurs expensive computational cost in both the graph construction and ranking computation stages. To handle this problem, Efficient Manifold Ranking (EMR) extends conventional MR by adopting an anchor graph on the database instead of the traditional k-nearest neighbor graph [22,23]: each data point first finds its k nearest anchors, and the graph is then constructed from the inner products of the coefficients between the data and the anchors. Finally, EMR designs a new form of adjacency matrix to speed up the ranking computation. As a result, EMR can handle a database with large-scale images and perform online retrieval in a short time.
It should be noted that both MR and EMR construct the graph on the whole dataset. However, in many real-world applications the image database keeps growing as more and more data points are added. In such cases, the graph in MR and EMR cannot be extended or updated, as the graph size is fixed. Therefore, how to construct a graph that characterizes the geometrical structure of a growing data manifold is a main challenge in graph-based ranking methods. Another factor is that the user's relevance feedback can be used to narrow the "semantic gap", since the feedback images can be viewed as query data. How to incorporate such discriminative information to update the graph is also an important challenge. In this paper, we aim to develop a scalable and adaptive manifold ranking method for handling the above problems. In detail, we formulate a sub-graph based on fixed anchors instead of constructing the graph on the whole dataset, where the anchors are selected by the conventional k-means method and the sub-graph weight matrix is defined by the similarity between pairs of anchors. Since the number of anchors is much smaller than the size of the original dataset, updating the sub-graph is much easier than updating a graph over the whole dataset, and the scalability problem is naturally solved. In addition, if the set of anchors is over-complete, the anchors are representative, so that the sub-graph constructed from them can approximate the original manifold structure of the dataset. Motivated by this sub-graph construction, we then develop a graph regularization framework that preserves the smoothness of the predicted ranking scores of the anchors along the sub-graph; the final ranking scores of all data are then propagated from the anchors via the adjacency matrix.
Specifically, the ranking scores are first propagated from the query image to a subset of the anchors according to the similarity between the query image and the anchors, then propagated from these anchors to all anchors via the sub-graph, and finally propagated to the whole dataset according to the adjacency matrix between the anchors and the original data. Since the sub-graph is much smaller than a graph constructed on all the data, this greatly speeds up the ranking computation. The main contributions of this paper are as follows: (1) We develop a sub-graph construction, which is derived from a bipartite graph formed by the multi-layer composition process of the data matrix. Since the anchors are representations of the original data, the anchor graph can approximate the original manifold structure embedded in the original data matrix. In addition, the multi-layer anchor graph is much smaller than the graph constructed from the whole dataset, so it is more scalable; (2) We develop a faster ranking approach for fast CBIR, where the first step propagates the ranking score from the query data to the anchors, the second step propagates the scores along the sub-graph, and the final step propagates them to the whole dataset. Since the sub-graph is very small, the ranking computation is efficient; (3) The proposed ranking method can also be extended to handle user relevance feedback by short-term query extension. In addition, we use the feedback to update the weights in the graph so that the performance of the ranking system can be enhanced in the long term. Extensive simulations are conducted to verify the effectiveness of the proposed methods.
The rest of this paper is organized as follows: in Section 2, we briefly review related work on CBIR and graph-based ranking methods; in Section 3, we propose our deep sub-graph based model for CBIR; extensive simulations are conducted in Section 4, and final conclusions are drawn in Section 5.

2. Notations and review of related work
In this section, we first give some notations and then briefly review some related work on manifold ranking.

2.1. Notations
We first introduce the notation used for manifold ranking, a graph-based ranking model for CBIR. Specifically, let $X = [x_1, x_2, \ldots, x_n]^T \in \mathbb{R}^{n \times d}$ be the data matrix, where $d$ is the number of data features, and let $Y = [y_1, y_2, \ldots, y_n]^T \in \mathbb{R}^{n \times 1}$ be the initial ranking of all samples, satisfying $y_j = 1$ if $x_j$ is the query image and $y_j = 0$ otherwise. We also let $F = [f_1, f_2, \ldots, f_n]^T \in \mathbb{R}^{n \times 1}$ be the predicted ranking scores, where $f_i$ is the ranking value for $x_i$ satisfying $0 \le f_i \le 1$. The samples with the top scores are selected by the system as the most relevant images to the user query. In addition, in manifold ranking a similarity matrix must be defined to evaluate the similarities between pairs of samples. Let $G = (V, E)$ denote this graph, where $V$ is the vertex set of $G$ representing the training samples and $E$ is the edge set of $G$ associated with a weight matrix $W$. A typical way to define the graph weight is to use the Gaussian function [21]: $W_{ij} = \exp\left(-\|x_i - x_j\|^2 / 2\sigma^2\right)$ if $x_i$ is within the neighborhood of $x_j$ or $x_j$ is within the neighborhood of $x_i$, and $W_{ij} = 0$ otherwise. We also denote by $L = D - W$ the graph Laplacian matrix, which provides a discrete approximation to the local geometry of the data manifold, and by $\tilde{L} = D^{-1/2} L D^{-1/2}$ the normalized graph Laplacian matrix, where $D$ is a diagonal matrix with diagonal elements $D_{jj} = \sum_{i=1}^{n} W_{ij}$.

2.2. Review of Manifold Ranking (MR)
After the graph is constructed, the goal of manifold ranking for CBIR is to propagate the ranking information from the query image to the database images along the graph, through which the predicted ranking scores of the database images can be obtained. Specifically, manifold ranking minimizes the following objective function:
$$g(F) = \sum_{i,j=1}^{n} W_{ij} \left\| \frac{f_i}{\sqrt{D_{ii}}} - \frac{f_j}{\sqrt{D_{jj}}} \right\|_F^2 + \alpha \sum_{i=1}^{n} \left\| f_i - y_i \right\|_F^2, \tag{1}$$
where $\alpha > 0$ is a regularization parameter balancing the two terms. By setting the derivative w.r.t. $F$ to zero, the optimal ranking scores are given by
$$F = (I - \gamma M)^{-1} Y, \tag{2}$$
where $I \in \mathbb{R}^{n \times n}$ is an identity matrix, $\gamma = 1/(1 + \alpha)$ and $M = D^{-1/2} W D^{-1/2}$. The ranking score $F$ can be viewed as a metric of manifold distance between the query image and the database images, which is more meaningful for measuring semantic relevance. Finally, the database images with the top ranking scores are returned to the user and await the user's feedback for further re-ranking.

2.3. Efficient Manifold Ranking (EMR)
It should be noted that the key step of a graph model is to design an adjacency matrix that approximates the geometrical structure of the data manifold. However, MR has drawbacks in handling large-scale databases, as it incurs expensive computational cost in both the graph construction and ranking computation stages. To handle this problem, EMR adopts an anchor graph construction for efficient ranking. In detail, let $B = \{b_1, b_2, \ldots, b_q\} \in \mathbb{R}^{q \times d}$ represent the set of anchors, and let $S \in \mathbb{R}^{q \times n}$ be the local weight matrix with each element $S_{ij}$ measuring the similarity between $x_j$ and $b_i$, satisfying $S_{ij} \ge 0$ and $\sum_{i=1}^{q} S_{ij} = 1$. Then, the anchor graph is formed as
$$W^s = S^T \Delta^{-1} S = H^T H, \tag{3}$$
where $\Delta \in \mathbb{R}^{q \times q}$ is a diagonal matrix with each element $\Delta_{jj}$ being the sum of the $j$th row of $S$, i.e., $\Delta_{jj} = \sum_{i=1}^{n} S_{ji}$, and $H = \Delta^{-1/2} S$. It can be easily verified that $W^s$ is symmetric and that the sum of each row or column of $W^s$ is equal to 1; hence, $W^s$ is also normalized. Then, by replacing $M$ in Eq. (2) with $W^s$, the optimal ranking scores are given by $F^* = (I - \gamma W^s)^{-1} Y = (I - \gamma H^T H)^{-1} Y$. With the form $W^s = H^T H$, Eq. (2) can be rewritten as
$$F^* = \left( I - H^T \left( H H^T - \frac{1}{\gamma} I_q \right)^{-1} H \right) Y, \tag{4}$$
where $I_q \in \mathbb{R}^{q \times q}$ is an identity matrix and $q$ is the number of anchors. Following Eq. (4), the computational complexity of calculating the inverse part of $F^*$ changes from $O(n^3)$ to $O(q^3)$. In addition, note that the initial ranking vector $Y$ satisfies $y_j = 1$ if $x_j$ is the query image and $y_j = 0$ otherwise. Then Eq. (4) can be further reduced to
$$F^* = -H^T \left( H H^T - \frac{1}{\gamma} I_q \right)^{-1} h_q = -C h_q, \tag{5}$$
where $C = H^T \left( H H^T - (1/\gamma) I_q \right)^{-1} \in \mathbb{R}^{n \times q}$ is a projection matrix that can be calculated offline, and $h_q = \Delta^{-1/2} s_q$ is the column of $H$ corresponding to the query. Following Eq. (5), it can be observed that no matter whether the query image is in the database or not, we can obtain the ranking scores $F^*$ by calculating $h_q$ for the query image and then performing an $n \times q$ matrix-vector multiplication as in Eq. (5). In addition, given $q \ll n$, the complexity of calculating $F^*$ is $O(qn)$, which is linear in the database size $n$ and suitable for handling large-scale data. As a result, the strategy in Eq. (5) is simple and efficient.

3. The proposed method
It should be noted that both MR and EMR construct the graph on the whole dataset, which is not scalable. In other words, given newly arriving data points, the graph cannot be extended or updated, as the graph size is fixed and the graph cannot measure the similarities of the new data points. This situation is common and inevitable in many real-world applications, as more and more data points are added to the database. Therefore, how to construct a graph that characterizes the geometrical structure of a growing data manifold is a main challenge in graph-based ranking methods. The main dilemma is that a fixed-size graph is not applicable to a growing set of data points. One solution is to update the original graph, but this will certainly increase the computational cost. Here, we consider another solution. In detail, instead of constructing the graph on the whole dataset, we formulate a sub-graph based on a fixed set of representative data points, or anchors. There are many ways to select anchors. Two commonly used methods are random selection and k-means generation [24,25]. The random selection method randomly samples m of the n original data points as anchors. This method has strong randomness and cannot guarantee that the selected m anchors are suitable or that the result is robust. The k-means selection method clusters the n original samples into m clusters and selects the m cluster centers as anchors. This method can generate anchors but has high computational complexity. In our work, we also use k-means for anchor selection, but to speed up the k-means procedure we stop the iterations early and perform down-sampling as in [25] to guarantee the quality of the generated anchors. This has three advantages: (1) the number of anchors can be fixed and is much smaller than the size of the original dataset. Therefore, updating the sub-graph is much easier than updating a graph over the whole dataset, and the problem of handling newly arriving data is naturally solved; (2) if the anchors are over-complete or approximately over-complete, they are representative, so that the sub-graph constructed from them can approximate the original manifold structure of the dataset; (3) there are many ways to construct a robust sub-graph on the small set of anchors with low computational complexity, so that both scalability and robustness can be satisfied.
For the online retrieval procedure, we first calculate the similarity weights between the query image and the anchors, so that the ranking scores can be propagated from the query image to a subset of the anchors via the similarity vector. Then, based on the constructed sub-graph, the ranking scores are propagated along the sub-graph itself and finally propagated back to the whole dataset according to the similarity matrix between the anchors and the data. Since the sub-graph is much smaller than a graph constructed on all the data, this greatly speeds up the ranking computation. An illustration of the sub-graph based ranking method is shown in Fig. 1.

3.1. Sub-graph construction
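Before the sub-graph can be built, the anchors themselves have to be chosen. The anchor-selection strategy described above (k-means, accelerated by early stopping and down-sampling) can be sketched roughly as follows. This is a minimal NumPy illustration under our own assumptions, not the exact procedure of [25]: the function name `select_anchors` and the parameters `sample_size`, `max_iter` and `seed` are hypothetical, and plain Lloyd iterations with random initialisation are used.

```python
import numpy as np

def select_anchors(X, q, sample_size=2000, max_iter=5, seed=0):
    """Pick q anchors with a few k-means iterations on a random subsample of X.

    Hypothetical helper for illustration only; assumes q <= min(sample_size, n).
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    # Down-sampling: cluster only a random subset of the rows of X.
    idx = rng.choice(n, size=min(sample_size, n), replace=False)
    P = X[idx].astype(float)
    # Initialise the centres with q distinct sampled points.
    B = P[rng.choice(P.shape[0], size=q, replace=False)].copy()
    for _ in range(max_iter):  # early stopping: small, fixed iteration budget
        # Squared Euclidean distance from every sampled point to every centre.
        d2 = ((P[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
        assign = d2.argmin(axis=1)
        for c in range(q):
            members = P[assign == c]
            if len(members) > 0:  # keep the old centre if a cluster is empty
                B[c] = members.mean(axis=0)
    return B
```

Because the number of anchors q and the iteration budget are fixed, the cost of this step depends on the subsample size rather than on the full database size n.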
In this paper, motivated by the above, we construct a sub-graph to approximate the original manifold structure of the dataset. In detail, we first need to obtain the anchors and then define an adjacency matrix measuring the similarity between each data point and the anchors, so that the ranking scores can be propagated from the query data to the anchors. Following the work in [22,25], a typical way is to use kernel regression [26]:
$$S_{ij} = \frac{K_\delta(b_i, x_j)}{\sum_{s \in \langle j \rangle} K_\delta(b_s, x_j)}, \quad \forall i \in \langle j \rangle, \tag{6}$$
where $\delta$ is the bandwidth of the Gaussian function and $\langle j \rangle$ denotes the indices of the $k$ nearest anchors of $x_j$. By this definition, the data matrix $X$ can be roughly reconstructed from the anchors $B$ with $S$, i.e. $X \approx S^T B$, where the anchors $B$ can be obtained by the conventional k-means clustering method. Obviously, we have $S^T \mathbf{1}_q = \mathbf{1}$, where $\mathbf{1} \in \mathbb{R}^{n \times 1}$ and $\mathbf{1}_q \in \mathbb{R}^{q \times 1}$ are column vectors of $n$ and $q$ ones, respectively, so that the sum of each column of $S$ is equal to 1. Here, to develop the proposed sub-graph manifold ranking method, we first need to construct a graph on the set of anchors and define an adjacency matrix that measures the manifold consistency between any two anchors. There are many approaches to constructing such a graph from the anchors, such as the conventional k-NN graph [27–30]. Intuitively, however, we design the adjacency matrix $W^d \in \mathbb{R}^{q \times q}$ by using $S$ as follows:
$$W^d = \Delta^{-1/2} S S^T \Delta^{-1/2}, \tag{7}$$
where $\Delta$ is a diagonal matrix with each element $\Delta_{ii}$ being the sum of the $i$th row of $S$. The reasons for choosing such a graph are as follows: (1) the anchors are a representation of the original data, so the anchor graph can approximate the original manifold structure embedded in the original data matrix; (2) the multi-layer anchor graph is much smaller than the graph constructed from the whole dataset, so it is more scalable; (3) if we further assume that such affinities in the original high-dimensional data space are preserved in the low-dimensional ranking scores, then we have $Y \approx S^T Z$, where $Z = [z_1, z_2, \ldots, z_q]^T \in \mathbb{R}^{q \times 1}$ represents the ranking scores of the anchors $B$. This indicates that the ranking scores of the dataset can be easily obtained by $Y = S^T Z$ once the ranking scores of the anchors have been inferred.

Fig. 1. The sub-graph regularized framework for manifold ranking: the first row shows the sub-graph construction procedure, where the anchors are first learned from the training data and the adjacency matrix S is then used to construct the sub-graph; the second row (from the left to the right subfigures) shows the ranking procedure, where the ranking scores are first propagated from the query data to the anchors, then along the sub-graph, and finally to the whole dataset by the adjacency matrix S.

3.2. Nonparametric and robust graph construction
Note that the sub-graph constructed in Eq. (7) is closely related to the adjacency matrix $S$. However, $S$ is sensitive to its parameters and is affected by the measurement between the data points and the anchors. In this paper, we aim to learn a new sub-graph that is independent of the adjacency matrix $S$. We want the graph to be as close as possible to the given graph construction in Eq. (7). We also want it to be homogeneous, so that it is robust and nonparametric for the anchors lying in the intrinsic manifold. Therefore, we impose a degree normalization constraint on $W$ such that all vertices in the graph have the same degree $D_{ii} = 1$, i.e. $W\mathbf{1} = \mathbf{1}$. Since the adjacency matrix is always symmetric and non-negative, setting $W\mathbf{1} = \mathbf{1}$ makes it a doubly-stochastic matrix [21]. There are two merits of the doubly-stochastic adjacency matrix: (1) it is highly robust to noise, since the normalized-degree constraint makes the weight $W_{ij}$ of a noisy data point $x_i$ small compared to the weights between $x_i$ and its closer neighbors; (2) it can handle imbalanced data, since it strengthens the weights in low-density regions and weakens the weights in high-density regions, which is advantageous when the density of the dataset varies dramatically. Motivated by this, we give the objective function of the proposed weight matrix $W$ under the degree normalization and non-negativity constraints as follows:
$$W = \arg\min_{W} \|W - W^0\|_F^2 \quad \text{s.t.} \quad W = W^T, \; W\mathbf{1} = \mathbf{1}, \; W_{ii} = 0 \;\; \forall i \in [1, q], \; W \ge 0, \tag{8}$$
where $W^0$ is the initial graph given by Eq. (7). Here, we further add the constraint $W_{ii} = 0$ to remove self-loops from the graph. Eq. (8) is an instance of quadratic programming (QP). For efficient computation, we divide the QP problem into two convex sub-problems:
$$W = \arg\min_{W} \|W - W^0\|_F^2 \quad \text{s.t.} \quad W = W^T, \; W\mathbf{1} = \mathbf{1}, \; W_{ii} = 0 \;\; \forall i \in [1, q] \tag{9}$$
and
$$W = \arg\min_{W} \|W - W^0\|_F^2 \quad \text{s.t.} \quad W \ge 0. \tag{10}$$
We tackle the original QP problem in Eq. (8) by successively alternating between the two sub-problems in Eqs. (9) and (10). This alternating projection process converges due to Von Neumann's successive projection lemma [31,32]. More importantly, Von Neumann's lemma ensures that alternately solving the sub-problems of Eqs. (9) and (10), with the current solution as input, is theoretically guaranteed to converge to the global optimal solution of the target problem in Eq. (8).
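The construction in Sections 3.1 and 3.2 can be illustrated with a short NumPy sketch. It computes the kernel-regression weights of Eq. (6), the initial sub-graph of Eq. (7), and then applies plain alternating projections in the spirit of Eqs. (9) and (10). Several points are our own assumptions rather than the authors' implementation: all function names and default parameters are hypothetical, the zero-diagonal constraint is grouped with the non-negativity step instead of the affine step (a simplification of the split in Eqs. (9) and (10)), and a fixed iteration count is used. As written, the loop only seeks a feasible graph close to the initial one and is not claimed to return the exact minimizer of Eq. (8).

```python
import numpy as np

def anchor_weights(X, B, k=3, delta=1.0):
    """Eq. (6): Gaussian kernel-regression weights over the k nearest anchors.

    X is (n, d), B is (q, d); the returned S is (q, n) with unit column sums.
    """
    d2 = ((B[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)   # (q, n) squared distances
    q, n = d2.shape
    nearest = np.argsort(d2, axis=0)[:k]                      # (k, n) anchor indices
    cols = np.arange(n)
    d2k = d2[nearest, cols]                                   # (k, n)
    # Shifting by the column minimum leaves the ratios unchanged but avoids underflow.
    K = np.exp(-(d2k - d2k.min(axis=0)) / (2.0 * delta ** 2))
    S = np.zeros((q, n))
    S[nearest, cols] = K / K.sum(axis=0)
    return S

def subgraph(S):
    """Eq. (7): W^d = Delta^{-1/2} S S^T Delta^{-1/2}, Delta = diag(row sums of S)."""
    row_sum = np.maximum(S.sum(axis=1), 1e-12)  # guard anchors with no assigned data
    inv_sqrt = 1.0 / np.sqrt(row_sum)
    return (S @ S.T) * inv_sqrt[:, None] * inv_sqrt[None, :]

def project_affine(W):
    """Frobenius-nearest symmetric matrix whose rows sum to one."""
    W = 0.5 * (W + W.T)
    q = W.shape[0]
    r = W.sum(axis=1)
    s = (q - r.sum()) / (2.0 * q)
    mu = (1.0 - r - s) / q
    return W + mu[:, None] + mu[None, :]

def doubly_stochastic(W0, n_iter=200):
    """Alternate between the affine constraints and the element-wise constraints."""
    W = W0.copy()
    for _ in range(n_iter):
        W = project_affine(W)        # symmetry and unit row sums
        W = np.maximum(W, 0.0)       # non-negativity, cf. Eq. (10)
        np.fill_diagonal(W, 0.0)     # no self-loops
    return W
```

Since the sub-graph has only q × q entries, each pass of the loop costs O(q²) regardless of the database size.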
3.3. Efficient ranking procedure
With the above graph construction, we then develop our sub-graph model for efficient semi-supervised learning. Since the number of anchors (the dictionary size) is much smaller than the size of the dataset, our goal is first to estimate the class labels (here, the ranking scores) of the anchors $Z$ from the partially labeled samples via the sub-graph model, and then to calculate those of the unlabeled samples by the low-rank and sparse coding. An illustration of our goal can be seen in Fig. 1. Here, we first give the objective function of the proposed sub-graph regularized framework for calculating the ranking scores of the anchors:
$$J(Z, Y) = \min \sum_{j=1}^{n} U_{jj} \left\| s_j^T Z - y_j \right\|_F^2 + \eta_I \sum_{i,j=1}^{q} W^d_{ij} \left\| z_i - z_j \right\|_F^2 + \eta_A \|Z\|_F^2 \quad \text{s.t.} \quad 0 \le z_j \le 1, \; y_q = 1, \; 0 \le y_i \le 1, \; \forall i \in [1, n], \tag{11}$$
where $U$ is a diagonal matrix with $U_{ii} = 1$ if $x_i$ is the query data and $U_{ii} = 0$ otherwise. The first term in Eq. (11) is a fitting term measuring the inconsistency between the predicted ranking scores $S^T Z$ and the initial query ranking scores $Y$; the second term is a regularization term measuring the smoothness of the predicted ranking scores on the graph; the third term is a Tikhonov regularization term that avoids singularity of the solution; $\eta_I$ and $\eta_A$ are two parameters balancing the trade-off among the three terms. By setting the derivatives of $J(Z, Y)$ w.r.t. $Z$ and $Y$ to zero, we can calculate the optimal $Z$ and $Y$ as follows:
$$\begin{cases} Z^* = \left( S U S^T + \eta_I L^d + \eta_A I_q \right)^{-1} S U Y \\ Y^* = S^T Z^* = S^T \left( S U S^T + \eta_I L^d + \eta_A I_q \right)^{-1} S U Y \end{cases}, \tag{12}$$
where $L^d$ is the normalized graph Laplacian matrix of $W^d$, i.e. $L^d = I_q - W^d$. Note that, with some derivation, the optimal solution of Eq. (12) can be rewritten as
$$Y^* = S^T \left( s_q s_q^T + \eta_I L^d + \eta_A I_q \right)^{-1} s_q y_q = \left( 1 + s_q^T \left( \eta_I L^d + \eta_A I_q \right)^{-1} s_q \right)^{-1} S^T \left( \eta_I L^d + \eta_A I_q \right)^{-1} s_q, \tag{13}$$
where $s_q$ is the column vector measuring the similarity between $x_q$ and the anchors $B$, $y_q = 1$ given that $x_q$ is the query data, and the second equality holds according to the Woodbury identity [33], i.e. $\left( A + C B C^T \right)^{-1} = A^{-1} - A^{-1} C \left( B^{-1} + C^T A^{-1} C \right)^{-1} C^T A^{-1}$, where $A$, $B$ and $C$ are matrices of conformable sizes. Here, since $s_q \in \mathbb{R}^{q \times 1}$ is only a vector, $1 + s_q^T \left( \eta_I L^d + \eta_A I_q \right)^{-1} s_q$ is a constant that can be neglected. Therefore, we obtain the final ranking scores of the dataset as follows:
$$Y^* = S^T \left( \eta_I L^d + \eta_A I_q \right)^{-1} s_q = C s_q, \tag{14}$$
where $C = S^T \left( \eta_I L^d + \eta_A I_q \right)^{-1} \in \mathbb{R}^{n \times q}$ is the projection matrix and can be calculated offline. From Eq. (14), we can see that once the projection matrix $C$ is obtained, given a query $x_q$ we can calculate the ranking scores by $C s_q$ with low computational complexity $O(qn)$, no matter whether $x_q$ is in the database or not; therefore, out-of-sample retrieval is naturally solved. The same rule can be used for re-ranking when the query data is extended by user relevance feedback. In addition, the computational complexity $O(qn)$ is linear in the number of data points and hence suitable for large-scale ranking problems. A simple example of data ranking with the proposed SGR of Eq. (14) is given in Fig. 2, where we generate a one-Swiss-roll dataset for evaluation. In addition, data ranking has been widely used in Content-Based Image Retrieval (CBIR), as we illustrate in the following section. The basic steps of SGR are given in Algorithm 1.

Fig. 2. A comparison of data ranking on the one-Swiss-roll dataset: (a) ranking by Euclidean distances; (b) ranking by the proposed SGR in Eq. (14). In this dataset, we choose only one data point in the core of the one-Swiss-roll as the query data and the remaining ones as unlabeled data. The marker size of each unlabeled data point is proportional to its predicted ranking value. We can observe that the distance-based method fails to preserve the one-Swiss-roll structure, while the proposed SGR can rank data points well even if they are far from the annotated point.

Algorithm 1: The proposed SGR
Input: Data matrix $X \in \mathbb{R}^{n \times d}$, original ranking scores $Y \in \mathbb{R}^{n \times 1}$, the number of anchors (dictionary size) $q$ and other related parameters.
1. Calculate $S$ as in Eq. (6).
2. Form the weight matrix of the sub-graph as $W^d = \Delta^{-1/2} S S^T \Delta^{-1/2}$ in Eq. (7).
3. Fine-tune the sub-graph according to the objective function of Eq. (8).
4. Calculate the ranking scores $Y^* = S^T \left( \eta_I L^d + \eta_A I_q \right)^{-1} s_q = C s_q$ as in Eq. (14).
Output: Estimated ranking scores $Y^* \in \mathbb{R}^{n \times 1}$.

3.4. Application of CBIR with user relevance feedbacks
We next design an automatic feedback strategy to model the retrieval process with the proposed method. For each query image submitted by the user, the system retrieves and ranks the images in the database. Here, the rank of each image in the database is based on the ranking score estimated by the proposed method or by other state-of-the-art manifold methods. The top images (e.g., 10, 20, or more) with the highest ranking scores are then selected as the feedback images, and their feedback information can be used for re-ranking. For the sake of feedback, users should mark each of them as either relevant or irrelevant. If a feedback image is judged as relevant, it is added to the query set, which enlarges the query set. The newly formed query set is then used as input for a new round of ranking. The process is performed iteratively until the user's requirements are satisfied. Since the relevance feedback involves more discriminative information, as the labeled set grows the returned images are expected to become more and more relevant to the query image, which enhances retrieval performance. There are two approaches to using user relevance feedback, which can be categorized as short-term use and long-term use. In detail, for the short-term use of relevance feedback, the query extension strategy [34] is adopted for re-ranking, where the newly added relevance feedback data are combined with the original query data to form new query data. The new query data are then input to the ranking system for a new round of ranking. There are many approaches to query extension. Let us define
$S^r = \{s_0^r, s_1^r, \ldots, s_m^r\} \in \mathbb{R}^{q \times m}$ as the query set, where $s_0^r = s_q$ is the original query data and $s_k^r$ is the $k$th relevance feedback data; we also denote $s_q^m$ as the new query data. To make $s_q^m$ representative, a simple way of query extension is to calculate the mean of $S^r$, where $s_q^m = \sum_{i=1}^{m} s_i^r$. Since Eq. (14) is linear in the query vector, rescaling $s_q^m$ by a constant does not change the resulting ranking. In this way, more discriminative information is merged into the new query data, so that the ranking performance in the new round can be enhanced. However, the short-term use of relevance feedback can only enhance the ranking performance in the current round. It cannot enhance the performance in the long term, as the parameters of the system are not updated accordingly. Therefore, long-term use of relevance feedback is necessary in order to update the system and incorporate the discriminative information embedded in the relevance feedback data. Returning to our model in Eq. (11), our goal is to update $S$ by fixing $Z$. In detail, let $s_q^m$ denote the extended query data after the feedback images have been marked as relevant. By putting $s_q^m$ back into Eq. (14), we obtain the optimal ranking scores of the anchors $Z^r = \left( \eta_I L^d + \eta_A I_q \right)^{-1} s_q^m$. Then, by fixing $Z = Z^r$, the objective function in Eq. (11) reduces to
$$J(S) = \sum_{i,j=1}^{q} W^d_{ij} \left\| z_i - z_j \right\|_F^2 = \mathrm{Tr}\left( S^T M S \right) \quad \text{s.t.} \quad S^T \mathbf{1}_q = \mathbf{1}, \; S \ge 0. \tag{15}$$

Fig. 3. Sub-graph construction without and with long-term relevance feedback on the one-Swiss-roll dataset. In this dataset, we choose only one data point in the core of the one-Swiss-roll as the query data and the remaining ones as unlabeled data. The marker size of each unlabeled data point is proportional to its predicted ranking value. We can observe that the sub-graph updated by the long-term use of relevance feedback better characterizes the geometrical structure of the data manifold.

4.1. Simulation settings
Here, we choose three datasets, i.e. Fashion-MNIST [37], COREL [38] and CIFAR [39] datasets for evaluation. Fashion-MNIST is a dataset of Zalando’s article images, which serves as a direct drop-in replacement for the original MNIST dataset for benchmarking machine learning applications. In general, it consists of a training set of 60,000 examples and a test set of 10,000 examples. Each example is a 28 × 28 grayscale image with a label from 10 classes, so that it shares the same image size and structure of training and testing splits to the original MNIST dataset. We use the 60,000 images as database images and the rest 10,000 images as queries for testing the out-of-sample retrieval performance; COREL dataset is large-scale image dataset which is typically used for CBIR task [38]. In our work, we utilize a subset of COREL dataset for evaluation, which contains 10,000 samples from diverse contents such as sunset, beach, flower, building, car, horse, mountains, fish, food, door, etc. Each category contains 100 samples with the size of 192 × 128 or 128 × 192. We use 90 samples per classes as training set and the remaining 10 samples per classes as query set; CIFAR-100 dataset is a labeled subset of 80 million tiny images dataset, which consists of 60000 samples in 100 classes such as airplane, automobile, bird, cat, dear, ship, truck, etc., with 600 images per class under 32 × 32 size. There are 500 training images and 100 testing images per class. Similarly, we also use 500 training images per class as training set and the remaining 100 images per classes as query set. In a real image retrieval system, the query image may be not in the image database. To simulate this environment, we divide the dataset into two non-overlapping subsets, i.e. 90% images are for training the system, and the rest 10% images used as query images. Another problem is how to extract the informative features of images to characterize the semantic concepts. 
Recent advances based on convolutional neural networks (CNN) has shown the powerful ability to learn the rich midlevel image representations. In our study, we apply our method on the global representation of midlevel layers of CNN [40], which is formed by the spatial max-pooling over the last convolutional layer of VGG. Fig. 4 first gives the visualization results of CBIR by the proposed SGR based on Fashion-MNIST [37], COREL [38] and CIFAR100 [39] datasets under 25 scope, where the images in the first column are the query images while those in the rest columns are the images with the highest ranking values. From the experiment results in Fig. 4, we can see the ranking results are satisfied,
where M ∈ R q×qis a symmetrical matrix with each element 2 satisfying Mij = zi − zj F , and the second equation holds as
2 ) ( ) ∑n ( ∑q d d = Tr W d M = i,j=1 M ⊙ W (i,j=T1 Wij) zi − zj F = Tr S M S , ⊙ is the pair-wise product. To update S, we then introduce multiplicative updating rules and give the Lagrange function to the problem of Eq. (15) by considering non-negative constraint and normalized constraint as follows: minS Tr S T M S − Tr φ S1q − 1
(
)
( (
))
+ Tr (ϕS ) ,
(16)
where φ ∈ R 1×n and ϕ ∈ R q×n are the Lagrange multiplier for the constraints S1q = 1 and Sij ≥ 0. By setting the derivative w.r.t. Sij to zero and using the Karush–Kuhn–Tucker (KKT) condition [35] ϕij Sij = 0, the updating rules of Sij , can be given as follows:
{
} (M S )ij Sij − φj Sij + ϕij Sij = 0 ϕij Sij = 0 , φ ⇒ Sij = (M S )j +α Sij
(17)
ij
where the jth element φj can be determined by the constraint S1q = 1, α → 0 is a very small value. Obviously, we can see that if
2 the ranking results zi and zj are close, then Mij = zi − zj F → 0.
2
Therefore, minimizing S i S Tj zi − zj F will penalize S i S Tj to be a large value, where S i is the ith row of S. This means closer ranking scores will make S i and S j closer. Finally, we can reformulate and fine-tune the sub-graph according to Eqs. (7) and (8), respectively. Thereby, the ranking system is updated in a long term, which is good to enhance the ranking results in future (see Fig. 3). 4. Simulations In this subsection, we apply the proposed SGR on contentbased image retrieval, and compare with those of the above manifold ranking based methods. Our goal is to treat the query image as input and perform manifold ranking method to estimate the ranking score of unlabeled set. Then, the data (maybe 10, 20, or more images) with highest ranking scores are selected as feedback. In addition, the users can further annotate such feedback images as relevant or irrelevant to the query image for query extension. In other words, we merge them with query data and computer their mean as new query data. The procedure will be iteratively conducted several times until the user’s requirements are satisfied. In this study, we follow the work in [36] for realizing CBIR. 6
M. Zhao, J. Liu, Z. Zhang et al.
Knowledge-Based Systems xxx (xxxx) xxx
Local Regression and Global Alignment (LRGA) [20]: another robust manifold ranking method that is insensitive to parameters; Anchor Graph Regularization (AGR) [22]: a popular scalable graph based method for handling large-scale semi-supervised learning. We extend it for handling image ranking task on CBIR such as in [41]; Efficient Manifold Ranking (EMR) [18]: a scalable extension of MR by utilizing anchor graph construction, which can handle large-scale image ranking in CBIR; Fast Spectral Ranking (FSR) [42]: an efficient ranking method to speeding up online search by approximating Fourier basis of constructed graph. Fig. 6 shows the qualitative results with different ranking methods based on Fashion-MNIST dataset, where we randomly select six query images for ranking. Here, we select the 25 top feedback images of different methods after fourth iterations of user relevant feedbacks. From simulation results in Fig. 6, we can see that the performance of LNP and LRGA are better than that of MR showing that the robust graph construction are more superior to Gaussian Kernel based graph, since they are sensitive to the parameters and data. In addition, by carefully selecting the number of anchors, the EMR, FSR and the proposed SGR can achieve better performance than conventional k-NN graph. The proposed SGR can achieve the best performance showing that the constructed sub-graph can represent clear data manifold embedded in the original data space.
Fig. 4. The ranking results of the proposed SGR with different datasets: The images in the first column for each figure are the query images while the remaining ones in the following column are the top images have the latest ranking scores.
4.3. Quantitative analysis of image ranking on CBIR where the ranking images are almost 100% related to the query image given several cases we choose. Fig. 9 shows the ranking results via varied iterations by user relevant feedback on FashionMNIST dataset (four random selected query images), where we in each iteration adopt query extension to merge relevant feedback images with the original query image and computer their mean as new query data. We then input such updated query image into the system for another-round ranking procedure. From simulation results in Fig. 9, we can see that by iteratively adding the user feedbacks to form new query image, the ranking results will graduate better in a way that the ranking images are more and more related to the original query image, showing that query extension strategy on the proposed SGR is actually useful to provide the discriminative information for CBIR (see Fig. 5).
For the evaluations of image retrieval results, we use an Mean Average Precision (MAP)-scope curve [18] to verify the performance of different ranking methods. The scope is the number of top-ranked images presented to the user, and the MAP is the mean of the ratios between the numbers of relevant images to the given scope. The MAP-scope curve describes the accuracies with varied scopes and, thus, gives an overall performance evaluation of the methods. We then fix the scope by 10, 20 and 50, and evaluate the performance of the different methods with varied iterations. Fig. 7 shows the average MAP-scope curves of different methods with fixed user feedback iterations of 1, 2 and 4. Fig. 8 further shows MAP with varied iterations under the fixed scope of 10, 20, and 50. From Figs. 7 and 8, we can see that by iteratively adding the user feedback, the retrieval accuracies for all the methods will graduate become better, which indicates that user feedback is actually useful to provide the discriminative information for CBIR. A case in point in Fig. 8 is that the proposed SGR on Fashion-MNIST can achieve 15% improvement of precision in the fourth feedback iteration compared with those in the first iteration when 10 scope is fixed. For Scope 20 and 50, such improvements can reach approximately 10% and 5%, respectively. for other datasets, the enhancements of precision can reach 10% from the fourth iteration to the first iteration. In addition, by carefully adjusting the number of anchors, the proposed SGR can achieve competitive retrieval results compared to other methods over the entire scope and iterations.
4.2. Qualitative analysis of image ranking on CBIR We in this subsection compared the proposed SGR methods with several state-of-the-art manifold ranking methods for CBIR. The baseline methods include: Manifold Ranking (MR) [15]: the original manifold ranking methods, which is the most important comparison one; Local Neighborhood Propagation (LNP) [19]: a classical manifold ranking method by constructing the graph with local reconstruction strategy. The graph is robust to the conventional Gaussian kernel graph;
Fig. 5. The ranking results via varied iterations by user relevant feedback based on Fashion-MNIST dataset: from the top row to the bottom row are the ranking results with the zero, first, second and fourth iteration of user relevance feedback.
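The two uses of relevance feedback in the paper, short-term query extension by averaging and the long-term multiplicative update of $S$ in Eq. (17), can be sketched as follows. This is only an illustrative reading of the text, under several assumptions: each anchor carries a scalar ranking score `z`, the multiplier $\phi_j$ is fixed by renormalizing each column of `S` to sum to one, and the default `alpha` and `n_iter` are arbitrary. The function names are hypothetical.

```python
import numpy as np

def extend_query(s_query, s_feedback):
    """Short-term feedback: mean of the original query and the m relevant items.

    s_query has shape (q,); s_feedback has shape (q, m), one column per relevant item.
    """
    stacked = np.column_stack([s_query, s_feedback])   # (q, m + 1)
    return stacked.mean(axis=1)

def update_subgraph_weights(S, z, alpha=1e-6, n_iter=10):
    """Long-term feedback: multiplicative update of S (q, n) with anchor scores z (q,) fixed."""
    M = (z[:, None] - z[None, :]) ** 2                 # M_ij = (z_i - z_j)^2
    S = S.copy()
    for _ in range(n_iter):
        # (M S)_ij is large when anchor i scores unlike the anchors item j is attached to,
        # so dividing by it shrinks exactly those weights.
        S = S / (M @ S + alpha)
        S = S / S.sum(axis=0, keepdims=True)           # plays the role of phi_j (assumed column-wise)
    return S

rng = np.random.default_rng(0)
S0 = rng.random((4, 6))
S0 /= S0.sum(axis=0, keepdims=True)                    # toy anchors-by-items weights
z = np.array([0.9, 0.8, 0.1, 0.0])                     # toy anchor ranking scores
S1 = update_subgraph_weights(S0, z)
s_new = extend_query(np.array([1.0, 0.0]), np.array([[0.0], [1.0]]))
```

The short-term step changes only the query vector, whereas the long-term step changes `S` itself, which is why its effect persists for later queries.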
Fig. 6. Qualitative results with different ranking methods based on Fashion-MNIST dataset: from the top row to bottom row in each subfigure are the methods: MR [15], LNP [19], LRGA [20], AGR [22], FSR [42], EMR [18] and the proposed SGR.
Fig. 7. MAP-scope curves on the query set of different methods for the different feedback iterations: from the top row to the bottom row are Fashion-MNIST, COREL and CIFAR-100 datasets.
Fig. 8. MAP of the query set of different methods at scope 10, 20 and 50: from the top row to the bottom row are Fashion-MNIST, COREL and CIFAR-100 datasets.
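The MAP-scope measure used in the quantitative analysis, i.e. the fraction of relevant images among the top-ranked ones averaged over all queries, can be computed as in the following sketch. It assumes that an image counts as relevant when its class label equals the query's label; the function names and the toy labels are hypothetical.

```python
import numpy as np

def precision_at_scope(ranked_labels, query_label, scope):
    """Fraction of the top-`scope` ranked items that share the query's label."""
    top = np.asarray(ranked_labels[:scope])
    return float(np.mean(top == query_label))

def map_scope_curve(rankings, query_labels, scopes):
    """Mean precision over all queries, for each scope."""
    return {
        k: float(np.mean([precision_at_scope(r, y, k)
                          for r, y in zip(rankings, query_labels)]))
        for k in scopes
    }

# Two toy queries: labels of the returned items, best-ranked first.
rankings = [[1, 1, 0, 1], [0, 2, 2, 2]]
query_labels = [1, 2]
curve = map_scope_curve(rankings, query_labels, scopes=[2, 4])
```

Evaluating the same function after each feedback round, at a fixed scope, gives the kind of MAP-versus-iteration curve reported in Fig. 8.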
4.4. Comparison with short-term and long-term user relevance feedback

In this subsection, we compare the performance of short-term and long-term use of user relevance feedback for enhancing the ranking results. Here, the short-term use of relevance feedback adopts the query extension strategy [40], where the newly added relevance feedback data are combined with the original query data to form new query data. The long-term use of relevance feedback instead utilizes the proposed strategy in Section 3.3 to update the weight matrix of the graph, so as to enhance the ranking performance in the long term. Fig. 9 compares the ranking results of the two strategies after four iterations of user relevance feedback on the Fashion-MNIST dataset. From the simulation results, we can see that the long-term use of relevance feedback in Section 3.4 achieves slightly better performance than the short-term use of relevance feedback by query extension. This is reasonable, as the long-term use of relevance feedback in Section 3.C actually updates the graph weights of SGR, so that the adjacency between two data points sharing similar ranking scores becomes closer, which benefits future ranking results.

Fig. 9. Ranking results of short-term and long-term use of relevance feedback for enhancing the performance of image ranking on the Fashion-MNIST dataset: (a) short-term use of relevance feedback; (b) long-term use of relevance feedback.

5. Conclusion

In this paper, we propose an effective and robust sub-graph-based image ranking method for scalable CBIR. Following the theoretical analysis and extensive simulations in our work, we can draw the following conclusions about the proposed SGR: (1) we develop a robust sub-graph with a fixed small size that is insensitive to the parameters; it is also homogeneous, so that it is robust and nonparametric for representing the anchors lying on the intrinsic manifold; (2) the proposed SGR is able to handle large-scale image ranking tasks, because updating the sub-graph is much easier than updating the original graph of the whole dataset, and the problem of handling newly arriving data is naturally solved; (3) we develop a strategy that utilizes long-term user relevance feedback to update the weights of the graph, so that the image ranking performance can be enhanced. Extensive simulations verify the effectiveness of the proposed method.

While our work performs well on some general image retrieval tasks, our future work lies in both methodology and specific applications: (1) the proposed work mainly focuses on the single image retrieval task and aims to develop a dynamic graph for characterizing more discriminative information. However, in many real-world applications, the dataset tends to be collected from different measurement views or represented with diverse features. Therefore, how to fuse the different retrieval sets produced by multiple retrieval methods, so that the image retrieval accuracy can be enhanced, is another challenge in image retrieval [43]. An existing work has first handled the multiple retrieval task and has developed a graph-based query specific fusion approach where multiple retrieval sets are merged and reranked by conducting a
link analysis on a fused graph. Our future work can follow the work in [a] to extend the current single retrieval task to multiple ones; (2) there are also many specific applications related to image retrieval. For example, person re-identification (Re-ID) [44–47] is a very popular topic and a specific application of visual content-based retrieval, which has generally been utilized for multi-objective tracking (MOT), video surveillance, etc. The proposed work can in general be applied to unsupervised person Re-ID, where the person image database can be utilized to formulate a sub-graph that characterizes the geometrical structure of the data manifold, and the relevant person images can be searched via it given a certain query image. However, the person Re-ID task may confront difficulties such as multiple cameras. Our future work can extend the current work to handle the person Re-ID task while addressing the above challenges.

CRediT authorship contribution statement

Mingbo Zhao: Conceptualization, Methodology, Writing - original draft, Funding acquisition. Jiao Liu: Software, Writing - review & editing. Zhao Zhang: Validation, Investigation. Jicong Fan: Formal analysis, Visualization.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This work is partially supported by National Key Research and Development Program of China (2019YFC1521300), supported by National Natural Science Foundation of China (61971121, 61672365, 61072151), supported by the Fundamental Research Funds for the Central Universities of China and DHU Distinguished Young Professor Program, supported by the Fundamental Research Funds for the Central Universities of China (JZ2019HGPA0102), and also supported by Anhui Provincial Natural Science Fund for Distinguished Young Scholars (2008085J30).

Appendix A. Theoretical derivation of graph construction

The graph construction in (7) can be theoretically derived by probabilistic means, which further capture the relationship between data points and anchors based on a bipartite graph, where the node set is formed by each data point $x_j$ and the anchors $b_k$, and the weight matrix $W_B \in R^{(n+q) \times (n+q)}$ measuring the similarity between $x_j$ and $b_k$ is given as:

$$W_B = \begin{pmatrix} O_n & S^T \\ S & O_q \end{pmatrix}, \tag{A.1}$$

where $O_n \in R^{n \times n}$ and $O_q \in R^{q \times q}$ are two zero matrices. Accordingly, the graph Laplacian matrix is $L_B = D_B - W_B$, where $D_B$ is a diagonal matrix with each element being $(D_B)_{jj} = \sum_{i=1}^{l+u+q} (W_B)_{ij}$. Obviously, $D_B$ can be rewritten as:

$$D_B = \begin{bmatrix} I & O_d^T \\ O_d & \Delta \end{bmatrix}, \tag{A.2}$$

where $O_d \in R^{q \times n}$ is a zero matrix, since $S 1 = \Delta 1_q$ and $S^T 1_q = 1$. We then normalize $W_B$ as $P_B = D_B^{-1/2} W_B D_B^{-1/2}$. By doing so, we can obtain the transition probability matrix in one step as:

$$\begin{aligned}
P_B^{(1)}(b_k | x_j) &\approx \frac{S_{kj}}{\sum_{k'=1}^{q} S_{k'j}} = \Delta_{kk}^{-1/2} S_{kj}, \\
P_B^{(1)}(x_j | b_k) &\approx \frac{S_{kj}}{\sum_{j'=1}^{l+u} S_{kj'}} = S_{kj}^T \Delta_{kk}^{-1/2}, \\
P_B^{(1)}(x_i | x_j) &= 0, \quad P_B^{(1)}(b_h | b_k) = 0, \quad \forall i, j \in [1, n],\ \forall h, k \in [1, q].
\end{aligned} \tag{A.3}$$

For convenience, Eq. (A.3) can be rewritten in matrix form as:

$$\begin{aligned}
P_B^{(1)}(B|X) &= \Delta^{-1/2} S, & P_B^{(1)}(X|B) &= S^T \Delta^{-1/2}, \\
P_B^{(1)}(X|X) &= O_n, & P_B^{(1)}(B|B) &= O_q.
\end{aligned} \tag{A.4}$$

Following Eq. (A.4), we can then give the transition probability matrix in two time steps, $P_B^{(2)} = P_B^{(1)} P_B^{(1)}$, as:

$$\begin{aligned}
P_B^{(2)}(B|X) &= O_d, & P_B^{(2)}(X|B) &= O_d^T, \\
P_B^{(2)}(X|X) &= S^T \Delta^{-1} S, & P_B^{(2)}(B|B) &= \Delta^{-1/2} S S^T \Delta^{-1/2}.
\end{aligned} \tag{A.5}$$

Based on Eqs. (A.3) and (A.5), we can also give the transition probability matrix in three time steps, $P_B^{(3)} = P_B^{(2)} P_B^{(1)}$, as

$$\begin{aligned}
P_B^{(3)}(B|X) &= \Delta^{-1/2} S S^T \Delta^{-1} S, & P_B^{(3)}(X|B) &= S^T \Delta^{-1} S S^T \Delta^{-1/2}, \\
P_B^{(3)}(X|X) &= O_n, & P_B^{(3)}(B|B) &= O_q,
\end{aligned} \tag{A.6}$$

and in four time steps, $P_B^{(4)} = P_B^{(3)} P_B^{(1)}$, as:

$$\begin{aligned}
P_B^{(4)}(B|X) &= O_d, \quad P_B^{(4)}(X|B) = O_d^T, \\
P_B^{(4)}(X|X) &= S^T \Delta^{-1} S S^T \Delta^{-1} S, \\
P_B^{(4)}(B|B) &= \Delta^{-1/2} S S^T \Delta^{-1/2} \Delta^{-1/2} S S^T \Delta^{-1/2}.
\end{aligned} \tag{A.7}$$

As a result, we have:

$$\begin{aligned}
P_B^{(2p-1)}(B|X) &= \left(\Delta^{-1/2} S S^T \Delta^{-1/2}\right)^{p-1} \Delta^{-1/2} S, \\
P_B^{(2p-1)}(X|B) &= \left(S^T \Delta^{-1} S\right)^{p-1} S^T \Delta^{-1/2}, \\
P_B^{(2p-1)}(X|X) &= O_n, \quad P_B^{(2p-1)}(B|B) = O_q,
\end{aligned} \tag{A.8}$$

and

$$\begin{aligned}
P_B^{(2p)}(B|X) &= O_d, \quad P_B^{(2p)}(X|B) = O_d^T, \\
P_B^{(2p)}(X|X) &= \left(S^T \Delta^{-1} S\right)^{p}, \\
P_B^{(2p)}(B|B) &= \left(\Delta^{-1/2} S S^T \Delta^{-1/2}\right)^{p},
\end{aligned} \tag{A.9}$$

given any $p \in [1, 2, \ldots, \infty]$. Following Eq. (A.9), the $(h, k)$th element of $P_B^{(2p)}(B|B) = \left(\Delta^{-1/2} S S^T \Delta^{-1/2}\right)^{p}$ is the probability of the $h$th anchor reaching the $k$th anchor at step $2p$ before the random walk stops. Therefore, we can observe that the adjacency matrix $W^d = \Delta^{-1/2} S S^T \Delta^{-1/2}$ defined in Eq. (7) is closely related to the probabilistic measure $P_B^{(2p)}(B|B)$, which supports the validity of the graph construction.
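The block structure derived in Appendix A can be checked numerically on a small random example. The sketch below builds the bipartite weight matrix of Eq. (A.1), normalizes it as $P_B = D_B^{-1/2} W_B D_B^{-1/2}$, and compares powers of $P_B$ with the closed forms in Eqs. (A.5) and (A.9). The sizes `q` and `n`, the random `S`, and the helper name `bipartite_transition` are assumptions made only for illustration.

```python
import numpy as np

def bipartite_transition(S):
    """One-step matrix P_B = D_B^{-1/2} W_B D_B^{-1/2} for anchors-by-data weights S (q, n)."""
    q, n = S.shape
    W_B = np.block([[np.zeros((n, n)), S.T],
                    [S, np.zeros((q, q))]])            # Eq. (A.1): data nodes first, then anchors
    d_inv_sqrt = 1.0 / np.sqrt(W_B.sum(axis=1))        # diagonal of D_B^{-1/2}
    return d_inv_sqrt[:, None] * W_B * d_inv_sqrt[None, :]

rng = np.random.default_rng(0)
q, n = 4, 12
S = rng.random((q, n))
S /= S.sum(axis=0, keepdims=True)                      # each data point's anchor weights sum to one
delta = S.sum(axis=1)                                  # diagonal of Delta
W_d = (S @ S.T) / np.sqrt(np.outer(delta, delta))      # Delta^{-1/2} S S^T Delta^{-1/2}

P1 = bipartite_transition(S)
P2 = P1 @ P1                                           # two-step matrix
P6 = np.linalg.matrix_power(P1, 6)                     # 2p steps with p = 3

anchor_block_matches = np.allclose(P2[n:, n:], W_d)
data_block_matches = np.allclose(P2[:n, :n], S.T @ np.diag(1.0 / delta) @ S)
cross_block_is_zero = np.allclose(P2[:n, n:], 0.0)
power_matches = np.allclose(P6[n:, n:], np.linalg.matrix_power(W_d, 3))
```

The anchor-to-anchor block of the two-step matrix is exactly the sub-graph weight matrix $W^d$, and even powers reduce to powers of that block, which is the relationship the appendix relies on.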
Appendix B. Iterative solution of nonparametric and robust graph construction

In this Appendix, we give a detailed approach for iteratively calculating the optimal solution of (9) and (10). It can be noted that the sub-problem of Eq. (10) can be easily solved by forming $W$ from the corresponding elements of $W_0$ that are larger than zero. In order to solve the sub-problem of Eq. (9), we need to take the Lagrangian as follows:

$$J(W) = \|W - W_0\|_F^2 - t\,\mathrm{Tr}(W) - \mu\,(W 1 - 1) - \mu\,(W^T 1 - 1), \tag{B.1}$$

where $t$ and $\mu \in R^{1 \times n}$ are the Lagrangian parameters. By setting the derivative of $J(W)$ w.r.t. $W$ to zero, we have

$$W = W_0 + t I + \mu^T 1 + 1^T \mu. \tag{B.2}$$

To fulfill $W_{ii} = 0$, $\forall i \in [1, n]$, i.e. $\mathrm{Tr}(W) = 0$, we have:

$$\begin{aligned}
& \mathrm{Tr}\left(W_0 + t I + \mu^T 1 + 1^T \mu\right) = 0 \\
\Rightarrow\ & \mathrm{Tr}(W_0) + n t + 2 \mu 1^T = 0 \\
\Rightarrow\ & t = \left(-\mathrm{Tr}(W_0) - 2 \mu 1^T\right)/n.
\end{aligned} \tag{B.3}$$

By replacing $t$ in Eq. (B.2) with Eq. (B.3) and considering the normalization constraint $W 1 = 1$, we have:

$$\begin{aligned}
& \left\{ \begin{array}{l}
1^T W 1 = n = 1^T W_0 1 + n t + n \mu 1 + n 1 \mu^T \\
W 1 = 1 = W_0 1 + t 1 + \mu^T 1^T 1 + 1 \mu 1
\end{array} \right. \\
\Rightarrow\ & \left\{ \begin{array}{l}
n = 1^T W_0 1 - \mathrm{Tr}(W_0) + 2 (n - 1) \mu 1 \\
1 = W_0 1 - \mathrm{Tr}(W_0) 1/n - 2 (\mu 1) 1/n + \left(n \mu^T + 1 \mu^T 1\right)
\end{array} \right. \\
\Rightarrow\ & \left\{ \begin{array}{l}
\mu 1 = 1 \mu^T = \dfrac{n - 1^T W_0 1 + \mathrm{Tr}(W_0)}{2 (n - 1)} \\
\mu = \left(1^T - 1^T W_0 + 1^T \dfrac{\mathrm{Tr}(W_0)}{n}\right) \left(\dfrac{I}{n} - \dfrac{n - 2}{2 n^2 (n - 1)} 1 1^T\right)
\end{array} \right.
\end{aligned} \tag{B.4}$$

The final equation holds as

$$\begin{aligned}
\mu &= \frac{1}{n}\left(1^T - 1^T W_0 + 1^T \frac{\mathrm{Tr}(W_0)}{n}\right) - \frac{(n - 2)\left(n - 1^T W_0 1 + \mathrm{Tr}(W_0)\right)}{2 n^2 (n - 1)} 1^T \\
&= \frac{1}{n}\left(1^T - 1^T W_0 + 1^T \frac{\mathrm{Tr}(W_0)}{n}\right) - \frac{n - 2}{2 n^2 (n - 1)} \left(1^T 1 1^T - 1^T W_0 1 1^T + \frac{\mathrm{Tr}(W_0)}{n} 1^T 1 1^T\right) \\
&= \left(1^T - 1^T W_0 + 1^T \frac{\mathrm{Tr}(W_0)}{n}\right) \left(\frac{I}{n} - \frac{n - 2}{2 n^2 (n - 1)} 1 1^T\right).
\end{aligned} \tag{B.5}$$

Finally, by replacing $\mu$ in (B.3) and (B.4) with (B.5), we can calculate the optimal $W$ for the update.

References

[1] Y. Liu, D. Zhang, G. Lu, W.-Y. Ma, A survey of content-based image retrieval with high-level semantics, Pattern Recognit. 40 (1) (2007) 262–282.
[2] R. Datta, D. Joshi, J. Li, J.Z. Wang, Image retrieval: Ideas, influences, and trends of the new age, ACM Comput. Surv. (CSUR) 40 (2) (2008) 5.
[3] T. Ojala, M. Pietikäinen, D. Harwood, A comparative study of texture measures with classification based on featured distributions, Pattern Recognit. 29 (1) (1996) 51–59.
[4] D.G. Lowe, Object recognition from local scale-invariant features, in: Computer Vision, 1999. the Proceedings of the Seventh IEEE International Conference on, Vol. 2, IEEE, 1999, pp. 1150–1157.
[5] H. Qiang, Y. Wan, Z. Liu, L. Xiang, X. Meng, Discriminative deep asymmetric supervised hashing for cross-modal retrieval, Knowl.-Based Syst. (2020) 106188.
[6] P. Hu, D. Peng, X. Wang, Y. Xiang, Multimodal adversarial network for cross-modal retrieval, Knowl.-Based Syst. 180 (2019) 38–50.
[7] S. Unar, X. Wang, C. Wang, Y. Wang, A decisive content based image retrieval approach for feature fusion in visual and textual images, Knowl.-Based Syst. 179 (2019) 8–20.
[8] H. Xu, C. Huang, D. Wang, Enhancing semantic image retrieval with limited labeled examples via deep learning, Knowl.-Based Syst. 163 (2019) 252–266.
[9] M.K. Kundu, M. Chowdhury, S.R. Bulò, A graph-based relevance feedback mechanism in content-based image retrieval, Knowl.-Based Syst. 73 (2015) 254–264.
[10] K.Q. Weinberger, L.K. Saul, Distance metric learning for large margin nearest neighbor classification, J. Mach. Learn. Res. 10 (Feb) (2009) 207–244.
[11] J.-E. Lee, R. Jin, A.K. Jain, Rank-based distance metric learning: An application to image retrieval, in: Computer Vision and Pattern Recognition, 2008. CVPR 2008. IEEE Conference on, IEEE, 2008, pp. 1–8.
[12] X. He, D. Cai, J. Han, Learning a maximum margin subspace for image retrieval, IEEE Trans. Knowl. Data Eng. 20 (2) (2008) 189–201.
[13] W. Liu, J. Wang, R. Ji, Y.-G. Jiang, S.-F. Chang, Supervised hashing with kernels, in: Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on, IEEE, 2012, pp. 2074–2081.
[14] R. Xia, Y. Pan, H. Lai, C. Liu, S. Yan, Supervised hashing for image retrieval via image representation learning, in: AAAI, Vol. 1, 2014, p. 2.
[15] D. Zhou, J. Weston, A. Gretton, O. Bousquet, B. Schölkopf, Ranking on data manifolds, in: NIPS, Vol. 3, 2003.
[16] J. He, M. Li, H.-J. Zhang, H. Tong, C. Zhang, Manifold-ranking based image retrieval, in: Proceedings of the 12th Annual ACM International Conference on Multimedia, ACM, 2004, pp. 9–16.
[17] B. Xu, J. Bu, C. Chen, D. Cai, X. He, W. Liu, J. Luo, Efficient manifold ranking for image retrieval, in: Proceedings of the 34th International ACM SIGIR Conference on Research and Development in Information Retrieval, ACM, 2011, pp. 525–534.
[18] B. Xu, J. Bu, C. Chen, C. Wang, D. Cai, X. He, Emr: A scalable graph-based ranking model for content-based image retrieval, IEEE Trans. Knowl. Data Eng. 27 (1) (2015) 102–114.
[19] F. Wang, C. Zhang, H.C. Shen, J. Wang, Semi-supervised classification using linear neighborhood propagation, in: Computer Vision and Pattern Recognition, 2006 IEEE Computer Society Conference on, Vol. 1, IEEE, 2006, pp. 160–167.
[20] Y. Yang, F. Nie, D. Xu, J. Luo, Y. Zhuang, Y. Pan, A multimedia retrieval framework based on semi-supervised ranking and relevance feedback, IEEE Trans. Pattern Anal. Mach. Intell. 34 (4) (2012) 723–742.
[21] D. Zhou, O. Bousquet, T.N. Lal, J. Weston, B. Schölkopf, Learning with local and global consistency, in: NIPS, Vol. 16, 2003, pp. 321–328.
[22] W. Liu, J. He, S.-F. Chang, Large graph construction for scalable semisupervised learning, in: Proceedings of the 27th International Conference on Machine Learning, ICML-10, 2010, pp. 679–686.
[23] M. Wang, W. Fu, S. Hao, D. Tao, X. Wu, Scalable semi-supervised learning by efficient anchor graph regularization, IEEE Trans. Knowl. Data Eng. 28 (7) (2016) 1864–1877.
[24] F. Nie, W. Zhu, X. Li, Unsupervised large graph embedding, in: Thirty-first AAAI Conference on Artificial Intelligence, 2017.
[25] D. Cai, X. Chen, Large scale spectral clustering via landmark-based sparse representation, IEEE Trans. Cybern. 45 (8) (2014) 1669–1680.
[26] J. Friedman, T. Hastie, R. Tibshirani, The Elements of Statistical Learning, in: Springer Series in Statistics, vol. 1, New York, NY, USA, 2001.
[27] X. Zhu, Z. Ghahramani, J.D. Lafferty, Semi-supervised learning using gaussian fields and harmonic functions, in: ICML, 2003.
[28] F. Wang, C. Zhang, Label propagation through linear neighborhoods, IEEE Trans. Knowl. Data Eng. 20 (1) (2008) 55–67.
[29] Y. Yang, F. Nie, D. Xu, J. Luo, Y. Zhuang, Y. Pan, A multimedia retrieval framework based on semi-supervised ranking and relevance feedback, IEEE Trans. Pattern Anal. Mach. Intell. 34 (4) (2012) 723–742.
[30] S. Xiang, F. Nie, C. Zhang, Semi-supervised classification via local spline regression, IEEE Trans. Pattern Anal. Mach. Intell. 32 (11) (2010) 2039–2053.
[31] J. Von Neumann, Functional Operators: Measures and Integrals, Vol. 1, Princeton University Press, 1950.
[32] W. Liu, S.-F. Chang, Robust multi-class transductive learning with graphs.
[33] K.B. Petersen, M.S. Pedersen, The Matrix Cookbook, 2012, version 20121115, URL http://www2.imm.dtu.dk/pubdb/p.php?3274, Nov.
[34] G. Tolias, R. Sicre, H. Jégou, Particular object retrieval with integral max-pooling of cnn activations, arXiv preprint arXiv:1511.05879.
[35] P. Sprechmann, A.M. Bronstein, G. Sapiro, Learning efficient sparse and low rank models, IEEE Trans. Pattern Anal. Mach. Intell. 37 (9) (2015) 1821–1833.
[36] M. Zhao, T.W. Chow, Z. Zhang, B. Li, Automatic image annotation via compact graph based semi-supervised learning, Knowl.-Based Syst. 76 (2015) 148–165.
[37] H. Xiao, K. Rasul, R. Vollgraf, Fashion-mnist: a novel image dataset for benchmarking machine learning algorithms, 2017, arXiv:cs.LG/1708.07747.
[38] J.Z. Wang, Simplicity: Semantics-sensitive integrated matching for picture libraries approach, IEEE Trans. Pattern Anal. Mach. Intell. 23 (9) (2007) 947–963.
[39] A. Krizhevsky, Learning Multiple Layers of Features from Tiny Images, Tech. Rep., 2009.
[40] H. Azizpour, A.S. Razavian, J. Sullivan, A. Maki, S. Carlsson, From generic to specific deep representations for visual recognition, in: IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2015, pp. 36–45.
[41] W. Liu, J. Wang, S.-F. Chang, Robust and scalable graph-based semisupervised learning, Proc. IEEE 100 (9) (2012) 2624–2638.
[42] A. Iscen, Y. Avrithis, G. Tolias, T. Furon, O. Chum, Fast spectral ranking for similarity search, in: IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 7632–7641.
[43] S. Zhang, M. Yang, T. Cour, K. Yu, D.N. Metaxas, Query specific fusion for image retrieval, in: European Conference on Computer Vision, Springer, 2012, pp. 660–673.
[44] H. Fan, L. Zheng, C. Yan, Y. Yang, Unsupervised person re-identification: Clustering and fine-tuning, ACM Trans. Multimedia Comput. Commun. Appl. (TOMM) 14 (4) (2018) 1–18.
[45] H. Fan, Y. Yang, Person tube retrieval via language description, in: AAAI, 2020, pp. 10754–10761.
[46] Y. Ding, H. Fan, M. Xu, Y. Yang, Adaptive exploration for unsupervised person re-identification, ACM Trans. Multimedia Comput. Commun. Appl. (TOMM) 16 (1) (2020) 1–19.
[47] R. Zhou, X. Chang, L. Shi, Y.-D. Shen, Y. Yang, F. Nie, Person reidentification via multi-feature fusion with adaptive graph learning, IEEE Trans. Neural Netw. Learn. Syst. 31 (5) (2019) 1592–1601.