Signal Processing 121 (2016) 139–152
Relevance and irrelevance graph based marginal Fisher analysis for image search reranking

Zhong Ji (a), Yanwei Pang (a, corresponding author), Yuan Yuan (b), Jing Pan (c)

(a) School of Electronic Information Engineering, Tianjin University, Tianjin 300072, PR China
(b) Center for Optical Imagery Analysis and Learning (OPTIMAL), State Key Laboratory of Transient Optics and Photonics, Xi'an Institute of Optics and Precision Mechanics, Chinese Academy of Sciences, Xi'an 710119, PR China
(c) School of Electronic Engineering, Tianjin University of Technology and Education, Tianjin 300222, PR China
Article history: Received 1 July 2015; received in revised form 18 October 2015; accepted 13 November 2015; available online 30 November 2015.

Abstract
Learning-to-rank techniques have recently shown promising results in the domain of image ranking, where dimensionality reduction is a critical step in overcoming the "curse of dimensionality". However, conventional dimensionality reduction approaches cannot guarantee satisfactory performance because important ranking information is ignored. This paper presents a novel "Ranking Dimensionality Reduction" scheme specifically designed for learning-to-rank based image ranking, which aims not only at discovering the intrinsic structure of the data but also at keeping the ordinal information. Within this scheme, a new dimensionality reduction algorithm called Relevance Marginal Fisher Analysis (RMFA) is proposed. RMFA models the proposed pairwise constraints of relevance-link and irrelevance-link into a relevance graph and an irrelevance graph, and applies the graphs to build the objective function with the idea of Marginal Fisher Analysis (MFA). Further, a semi-supervised RMFA algorithm called Semi-RMFA is developed to offer a more general solution for real-world applications. Extensive experiments are carried out on two popular, real-world image search reranking datasets. The promising results demonstrate the robustness and effectiveness of the proposed scheme and methods. © 2015 Elsevier B.V. All rights reserved.
Keywords: Multimedia information system; Image ranking; Dimensionality reduction; Image search reranking; Learning-to-rank; Marginal Fisher Analysis
1. Introduction

In the multimedia retrieval community, ranking has become an active research field due to the rapid growth of image/video repositories and the (mobile) Internet. Applications include Content-Based Image Retrieval (CBIR) [1–5], image search reranking [6–8], image annotation [9] and tag ranking [10]. The construction of a ranking model is one of the key issues in these applications. Thus, many learning algorithms have been proposed to tackle this problem, such as manifold-ranking based methods [11,12], regression based methods [13], direct-optimization based
methods [6] and learning-to-rank (LTR) based methods [9,14]. Specifically, LTR refers to machine learning techniques for building ranking models by combining features extracted from query-document pairs through discriminative training [15].

Recent years have witnessed significant research and development of LTR techniques for multimedia retrieval and ranking [7,14,16–18,41,42]. Yang et al. [7] were among the first: they utilized two popular LTR approaches for image search reranking by learning the co-occurrence patterns between target semantics and features extracted from the initial search list. Based on LTR, Geng et al. [14] presented a ranking model with large margin structured output learning, in which both textual and visual information are
simultaneously leveraged in the ranking learning process. In [16–18], the authors investigated approaches for applying LTR techniques to CBIR. For example, Hu et al. [17] developed three schemes for multiple-instance ranking based on Ranking SVM [19] (a popular LTR algorithm). Li et al. [18] designed several new hand-crafted visual features for LTR, and also discussed the adaptability of three kinds of LTR methods to CBIR. In addition, LTR techniques have also been used to identify the best search result list from several candidates [20] and for automatic image annotation [9]. All these efforts have achieved pleasing performance and shown the effectiveness of applying LTR techniques to multimedia ranking.

However, although LTR has been demonstrated to be a powerful tool for multimedia ranking, it is often confronted with the problem of high-dimensional visual features. Features play a critical role in LTR and multimedia ranking [13,15,18,21]. Unfortunately, they are usually high dimensional in multimedia ranking, which not only imposes heavy burdens on computation and memory storage, but also causes the well-known "curse of dimensionality" in machine learning: the generalization capability decreases as the dimensionality increases when training samples are limited.

Dimensionality reduction is an effective way to handle the problems brought by high dimensionality [5,22,23]. It aims at finding a mapping function, linear or nonlinear, explicit or implicit, that transforms the original high-dimensional features into an intrinsic low-dimensional representation. However, conventional methods are generally designed for classification, not for LTR. Existing image ranking studies with LTR usually employ conventional dimensionality reduction methods directly, ignoring the substantial differences between ranking and classification from the viewpoint of feature dimensionality reduction. As reported in [11], ranking is intrinsically different from classification. For example, ranking requires not only recognizing whether data samples belong to the same set, but also providing their ordinal information. Specifically, there are only two opposite states ("0"/"1" or "−1"/"+1") in classification, whereas there are more than two states in ranking. For example,
images in LTR are usually labeled as "very relevant", "relevant" and "irrelevant" with "2", "1" and "0", as shown in Fig. 1. More importantly, the relationships between different relevance degrees are also more complicated than those between different class labels. Therefore, direct application of existing dimensionality reduction techniques to the LTR scheme cannot achieve satisfactory performance.

In light of the above considerations, this paper presents a novel dimensionality reduction scheme for LTR based image ranking, named "Ranking Dimensionality Reduction". The basic idea of the scheme is shown in Fig. 2. Furthermore, inspired by the success of the Marginal Fisher Analysis (MFA) method [5], two novel Ranking Dimensionality Reduction methods, Relevance Marginal Fisher Analysis (RMFA) and Semi-supervised Relevance Marginal Fisher Analysis (Semi-RMFA), are developed on the basis of this scheme. It is worthwhile to highlight several aspects of the proposed methods:
Fig. 2. The proposed "Ranking Dimensionality Reduction" scheme in the LTR application. The novelty of the proposed scheme is highlighted by the red arrow; without the step illustrated with the red arrow, it reduces to the conventional dimensionality reduction scheme used in LTR-based image ranking. (For interpretation of the references to color in this figure, the reader is referred to the web version of this article.)
Fig. 1. Example images in the LTR application with different relevance degrees ("very relevant", "relevant", "irrelevant") to the queries Dolphin, Car and Kid, respectively. It can be observed that the content of the "very relevant" samples matches their labels well, the "relevant" samples are somewhat redundant in content, while the "irrelevant" samples do not match their labels.
1) A novel "Ranking Dimensionality Reduction" scheme (illustrated in Fig. 2) is presented, which aims at discovering the intrinsic structure held by the dataset as well as keeping the ordinal information. Within this general framework, new dimensionality reduction algorithms for ranking can be developed.
2) A novel Ranking Dimensionality Reduction algorithm called Relevance Marginal Fisher Analysis (RMFA) is proposed, which models the proposed pairwise constraints of relevance-link and irrelevance-link into the relevance graph and the irrelevance graph respectively, and applies the graphs to build the objective function with the idea of MFA.
3) To offer a more general solution for real-world applications, a new Semi-supervised Relevance Marginal Fisher Analysis (Semi-RMFA) method is developed on the basis of RMFA, which employs both labeled and unlabeled data.
4) Extensive experiments and comprehensive comparisons on large real-world image search reranking datasets show that the proposed methods are very competitive with state-of-the-art dimensionality reduction and image search reranking methods.

The rest of the paper is organized as follows. Previous efforts on dimensionality reduction and feature learning in LTR are discussed in the following section. The Marginal Fisher Analysis (MFA) algorithm is briefly reviewed in Section 3. Section 4 describes the proposed RMFA and Semi-RMFA methods in detail, followed by the experimental setup and analysis in Section 5. Section 6 concludes the paper.
2. Related work

There has been rich research on dimensionality reduction and LTR in recent years. However, as far as we know, there are few dimensionality reduction efforts specifically designed for LTR. Thus, this section gives brief reviews of dimensionality reduction and of feature learning in LTR, respectively.
2.1. Dimensionality reduction

Dimensionality reduction plays an important role in overcoming the crucial "curse of dimensionality" problem and in reducing the heavy burden of storage and computation. It has been confirmed that discovering the potential intrinsic low-dimensional structures of high-dimensional data is an essential preprocessing step for many further data analysis tasks such as pattern recognition, computer vision and multimedia retrieval [4,5,24–27].

Many dimensionality reduction methods have been proposed in recent decades. The two most popular algorithms are Fisher Discriminant Analysis (FDA) [40] and Principal Component Analysis (PCA) [24]. Since the year 2000, many manifold learning algorithms have been developed [5,25,26]. The goal of manifold learning algorithms is to discover the intrinsic manifold structure of the data by means of preserving certain topological relations, such as geodesic distances and neighborhood relations. For example, Yan et al. [26] proposed a graph embedding framework that unifies a number of existing dimensionality reduction methods (e.g., PCA, LDA, LLE, LE, ISOMAP and LPP), in which the statistical and geometrical properties of the data are encoded as graph relationships. Recently, Lawrence [25] presented a new perspective on spectral dimensionality reduction algorithms based on maximum entropy, and provided a unified framework for manifold-based methods and generative-modeling-based methods (e.g., probabilistic PCA).

In image ranking applications, labeled data are often very time consuming and expensive to obtain; however, it is easy to get plenty of unlabeled data. Therefore, semi-supervised methods [4,27–29] are more suitable for this problem. Moreover, the significance of "relevant" and "irrelevant" samples is unequal, and their data structures are also different: the irrelevant images scatter over the whole space while the relevant ones do not. Specially designed dimensionality reduction methods are therefore required. For example, He et al. [22] presented a Maximum Margin Projection (MMP) algorithm for CBIR within the relevance feedback framework, which discovers the local manifold structure by maximizing the margin between positive and negative samples in each local neighborhood. Bian et al. [4] proposed a Biased Discriminative Euclidean Embedding (BDEE) method for CBIR, which parameterizes samples in the original high-dimensional space to discover the intrinsic coordinates of the low-level visual features. Transfer learning based dimensionality reduction methods have also been employed. For example, Tian et al. [29] proposed a Local-Global Discriminative (LGD) dimensionality reduction algorithm for image search reranking, in which a submanifold is learned by transferring the local geometry and the discriminative information from the labeled images to the whole (global) image database. These approaches have been successfully applied to many standard datasets and generate satisfying results.

2.2. Feature learning in learning-to-rank
Learning-to-rank is a relatively new research area that emerged in the last decade. It is a type of supervised or semi-supervised machine learning technique whose purpose is to automatically create a ranking model from training data. Many powerful methods have been developed and successfully applied to real applications such as web search [15]. However, only recently has the feature learning problem in LTR emerged as a crucial issue. One representative work is [30], in which a greedy algorithm was proposed to select a subset of features with maximum total importance scores and minimum total similarity scores; the authors also pointed out that directly applying classification-oriented feature selection techniques to ranking is not a good choice. A more recent effort [31] proposed a feature selection method for LTR based on sparse SVMs. It solves a joint convex
optimization problem which minimizes the ranking errors while simultaneously conducting feature selection. Both feature selection and dimensionality reduction reduce the dimensionality of the original features. Feature selection methods select a smaller subset from the original large set of features, while dimensionality reduction methods transform the original features from the high-dimensional space to a lower-dimensional space. Although there have been some efforts on feature selection in LTR, there have been few efforts on dimensionality reduction in LTR.
3. A review of MFA

The proposed Ranking Dimensionality Reduction algorithms are inspired by Marginal Fisher Analysis (MFA) [5]. Therefore, we briefly describe the main idea of MFA in this section.

Given an undirected weighted graph $G = (V, E, S)$, each sample $x_i \in \mathbb{R}^D$ $(1 \le i \le N)$ represents a node $v \in V$, where $N$ is the number of samples and $D$ is the feature dimensionality. Edges $e \in E \subseteq V \times V$, and $S \in \mathbb{R}^{N \times N}$ is a similarity matrix assigning a value to each edge. The diagonal matrix $D$ and the Laplacian matrix $L$ of a graph $G$ are defined as $L = D - S$, $D_{ii} = \sum_{j \ne i} S_{ij}, \forall i$.

The graph embedding algorithm [26] aims at determining a low-dimensional representation $Y = [y_1, y_2, \ldots, y_N]$ of the sample set $X = [x_1, x_2, \ldots, x_N]$ while maintaining similarities among node pairs, where the column vector $y_i$ is the embedding of the vertex $x_i$. According to the graph preserving criterion [26], a linearized formulation of graph embedding can be written as:

$$W^{*} = \arg\min_{\mathrm{tr}(W^{T} X B X^{T} W) = a} \mathrm{tr}(W^{T} X L X^{T} W) = \arg\min_{W} \frac{\mathrm{tr}(W^{T} X L X^{T} W)}{\mathrm{tr}(W^{T} X B X^{T} W)}, \quad (1)$$

where $B$ is the constraint matrix, which may simply be a diagonal matrix used for scale normalization or may express more general constraints among vertices in a penalty graph, $a$ is a constant, and $\mathrm{tr}(\cdot)$ denotes the trace of a matrix. $Y = W^{T} X$ and $W = [w_1, w_2, \ldots, w_d]$ is the projection matrix, which can be obtained by solving the generalized eigenvalue decomposition problem:

$$X L X^{T} w = \lambda X B X^{T} w. \quad (2)$$

The two matrices $S$ and $B$ play a crucial role in the graph embedding approach: $S$ is used to construct an intrinsic graph, and $B$ is used to construct a penalty graph. Different definitions of them lead to different algorithms; therefore, many popular dimensionality reduction algorithms can be interpreted in this framework [26].

Based on the graph embedding framework, Marginal Fisher Analysis (MFA) enlarges the distances between margin samples of different classes in order to separate the classes [5]. Specifically, an intrinsic graph is designed to characterize the within-class compactness, and a penalty graph is designed to characterize the between-class separability. The within-class compactness is represented as the sum of distances between each sample and its neighbors within the same class:

$$S_w = \sum_{i} \sum_{i \in N_{k_1}(j)\ \text{or}\ j \in N_{k_1}(i)} \| W^{T} x_i - W^{T} x_j \|^2 = 2\,\mathrm{tr}\big(W^{T} X (D - S) X^{T} W\big), \qquad S_{ij} = \begin{cases} 1, & \text{if } i \in N_{k_1}(j)\ \text{or}\ j \in N_{k_1}(i) \\ 0, & \text{else}, \end{cases} \quad (3)$$

where $N_{k_1}(i)$ denotes the index set of the $k_1$ nearest neighbors of sample $x_i$ in the same class. On the other hand, the between-class separability is characterized as the sum of distances between margin samples from different classes:

$$S_b = \sum_{i} \sum_{(i,j) \in \Psi_{k_2}(c_i)\ \text{or}\ (i,j) \in \Psi_{k_2}(c_j)} \| W^{T} x_i - W^{T} x_j \|^2 = 2\,\mathrm{tr}\big(W^{T} X (D^{b} - S^{b}) X^{T} W\big), \qquad S^{b}_{ij} = \begin{cases} 1, & \text{if } (i,j) \in \Psi_{k_2}(c_i)\ \text{or}\ (i,j) \in \Psi_{k_2}(c_j) \\ 0, & \text{else}, \end{cases} \quad (4)$$

where $\Psi_{k_2}(c)$ is the set of the $k_2$ nearest data pairs among the set $\{(i,j) \mid l(x_i) = c,\ l(x_j) \ne c\}$, and $l(x_i) = c$ means the label of $x_i$ is $c$. With $S_w$ and $S_b$, the projection matrix $W$ is obtained by:

$$W^{*} = \arg\min_{W} \frac{S_w}{S_b} = \arg\min_{W} \frac{\mathrm{tr}\big(W^{T} X (D - S) X^{T} W\big)}{\mathrm{tr}\big(W^{T} X (D^{b} - S^{b}) X^{T} W\big)}. \quad (5)$$

This ratio formulation is generally solved with generalized eigenvalue decomposition by transforming the objective function into the tractable ratio trace form. MFA is a special linearization of the graph embedding framework and achieves much better performance in face recognition and CBIR applications.
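To make the ratio-trace formulation concrete, the following is a minimal sketch (our illustration, not code from [5] or [26]) of how Eq. (5) is typically solved with a generalized eigendecomposition, assuming the similarity matrices $S$ and $S^b$ of Eqs. (3) and (4) have already been built:

```python
import numpy as np
from scipy.linalg import eigh

def mfa_projection(X, S, Sb, d):
    """Solve the ratio-trace relaxation of Eq. (5).

    X  : D x N data matrix (columns are samples).
    S  : N x N intrinsic-graph similarities, Eq. (3).
    Sb : N x N penalty-graph similarities, Eq. (4).
    d  : target dimensionality; returns the D x d projection W.
    """
    L = np.diag(S.sum(axis=1)) - S        # Laplacian of the intrinsic graph
    Lb = np.diag(Sb.sum(axis=1)) - Sb     # Laplacian of the penalty graph
    A = X @ L @ X.T                       # within-class compactness term
    B = X @ Lb @ X.T                      # between-class separability term
    # Generalized eigenproblem A w = lambda B w; a tiny ridge keeps B
    # positive definite, as required by the solver.
    evals, evecs = eigh(A, B + 1e-6 * np.eye(B.shape[0]))
    return evecs[:, :d]                   # eigenvectors of the d smallest ratios
```

The smallest generalized eigenvalues minimize the trace ratio of Eq. (5); the same solver, with the roles of the two matrices exchanged, is reused for the RMFA objective introduced in Section 4.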
4. The proposed RMFA and Semi-RMFA algorithms

This section first introduces some notation, and then presents a new dimensionality reduction algorithm, Relevance Marginal Fisher Analysis (RMFA), which not only finds a low-dimensional embedding of the data samples but also keeps the ordinal information. Further, to exploit unlabeled data, a semi-supervised RMFA algorithm, named Semi-RMFA, is also developed.

For each query, the top $N$ returned images are collected. Let $X_L = [x_1, \ldots, x_l] \in \mathbb{R}^{D \times l}$ be a set of $l$ labeled samples, and let $z_i \in \{0, \ldots, r-1\}$ denote the corresponding relevance degree label, i.e., the extent to which a sample is relevant to the query. In addition to the labeled samples, let $X_U = [x_{l+1}, \ldots, x_N] \in \mathbb{R}^{D \times (N-l)}$ be a set of $(N-l)$ unlabeled samples. The aim of the dimensionality reduction algorithms is to find a transformation matrix $W = [w_1, \ldots, w_d] \in \mathbb{R}^{D \times d}$ that maps $X = [x_1, \ldots, x_N] \in \mathbb{R}^{D \times N}$ to the low-dimensional vectors $Y = [y_1, \ldots, y_N] \in \mathbb{R}^{d \times N}$ $(d \ll D)$. The transformation is implemented by $Y = W^T X$, whose one-dimensional case is $y_i = w^T x_i$. Without loss of generality, we consider only the case $r = 3$. Thus, the relevance labels "2", "1" and "0" stand for "very relevant", "relevant" and "irrelevant", and their corresponding groups are labeled as $O$, $P$ and $Q$, respectively. Moreover, the number of labeled samples in each group is $s$, and the corresponding labeled groups are denoted as $L_O$, $L_P$ and $L_Q$, respectively. It follows that $l = 3s$.

4.1. The proposed pairwise constraints of relevance-link and irrelevance-link

The idea of MFA can be regarded as a utilization of the domain knowledge of pairwise constraints, which has been widely used in machine learning [27]. Generally, the pairwise constraints include the must-link (sample pairs belonging to the same class) and the cannot-link (sample pairs belonging to different classes). The must-link constraint leads to the construction of the intrinsic graph in MFA, and the cannot-link constraint leads to the construction of the penalty graph. However, as mentioned before, ranking is different from classification. In ranking, samples with different relevance degrees may still possess similar characteristics because they are related to the same query. Therefore, must-link and cannot-link constraints cannot be employed in ranking directly. To adapt these concepts to the ranking application, the new concepts of the relevance-link constraint and the irrelevance-link constraint are proposed in the following.

The relevance-link constraint is a pairwise constraint whose samples are relevant to each other, that is, they have similar visual content in image ranking applications. On the contrary, the irrelevance-link constraint is one whose samples are irrelevant to each other. Fig. 3 shows the pairwise constraints of relevance-link and irrelevance-link, where $C_{IJ}$ denotes the pairwise constraint between groups $I$ and $J$. For example, $C_{OP}$ means that the two samples are from groups $O$ and $P$ respectively, and $C_{QQ}$ represents the constraint that both samples belong to group $Q$. As can be seen, there are six pairwise relationships. Through extensive experimental observations, we find that in most cases there are few visual similarities among samples in group $Q$, and also few between any two samples from groups $O$ and $Q$. On the contrary, there are visual similarities in the other four pairwise relationships. Therefore, the pairwise relationships $C_{QQ}$ and $C_{OQ}$ are denoted as the irrelevance-link constraints, and $C_{OO}$, $C_{OP}$, $C_{PP}$ and $C_{PQ}$ are named the relevance-link constraints.

Fig. 3. Pairwise constraints of relevance-link and irrelevance-link, where $C_{OO}$, $C_{OP}$, $C_{PP}$ and $C_{PQ}$ are relevance-link constraints, and $C_{QQ}$ and $C_{OQ}$ are irrelevance-link constraints.

4.2. The objective function of RMFA

Inspired by the ideas of the intrinsic graph and the penalty graph in MFA, this paper develops the concepts of the relevance graph and the irrelevance graph based on the domain knowledge of relevance-link and irrelevance-link pairwise constraints. The relevance graph characterizes the compactness of the relevance-link constraints, while the irrelevance graph characterizes the separability of the irrelevance-link constraints; both are computed for each query, as illustrated in Fig. 4. Since the relevance degrees differ, the link strengths also differ across relevance-link constraints. For instance, the sample pairs in $C_{OO}$ have higher visual similarities than those in $C_{OP}$; thus the link strengths in $C_{OO}$ are higher than those in $C_{OP}$.

Following the graph embedding framework [26], the objective function of the relevance graph is defined as minimizing the following expression:

$$J_R(W) = \frac{1}{2} \sum_{(x_i, x_j) \in N_R} \| W^T x_i - W^T x_j \|^2 S^R_{ij} = \mathrm{tr}\big(W^T X_L (D^R - S^R) X_L^T W\big) = \mathrm{tr}\big(W^T X_L L_R X_L^T W\big), \quad (6)$$

where $N_R$ indicates the index set of relevance-link constraints, which includes $C_{OO}$, $C_{OP}$, $C_{PP}$ and $C_{PQ}$; $S^R$ is a similarity matrix constructed to model the adjacency relationships of the data pairs under relevance-link constraints; $S^R_{ij}$ measures the similarity of $x_i$ and $x_j$; $D^R$ is a diagonal matrix with elements $D^R_{ii} = \sum_j S^R_{ij}$; and the Laplacian matrix is $L_R = D^R - S^R$.

The construction of the similarity matrix is an important step in graph embedding algorithms. Based on the observation that different relevance-link constraints have different link strengths, $S^R_{ij}$ is defined as:

$$S^R_{ij} = \begin{cases} 1, & \text{if } (x_i, x_j) \in C_{OO} \\ t, & \text{if } (x_i, x_j) \in C_{OP} \text{ or } C_{PQ} \text{ or } C_{PP} \\ 0, & \text{otherwise,} \end{cases} \quad (7)$$

where $(x_i, x_j)$ means that $x_i$ and $x_j$ form a data pair, $t = \exp\left(-\frac{\|x_i - x_j\|^2}{2\sigma^2}\right)$, $\|\cdot\|$ denotes the $L_2$-norm, and $\sigma$ is the variance. Clearly $0 < t < 1$. The data similarity in group $O$ is set to "1" because its samples are all "very relevant" to the query and thus highly similar to each other. The samples in $C_{OP}$, $C_{PQ}$ and $C_{PP}$ have a certain degree of similarity, so their values are represented by $t$. The data in $C_{OQ}$ and $C_{QQ}$ are irrelevant to each other, so their values are set to "0".
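As an illustration of Eq. (7), the following sketch (our reading of the definition, not the authors' code) builds $S^R$ for a set of labeled samples, encoding groups $O$, $P$, $Q$ by their relevance scores 2, 1, 0:

```python
import numpy as np

def relevance_similarity(X_L, z, sigma=1.0):
    """Build the relevance-graph similarity matrix S^R of Eq. (7).

    X_L   : D x l matrix of labeled samples (columns).
    z     : length-l relevance labels, 2 = O ("very relevant"),
            1 = P ("relevant"), 0 = Q ("irrelevant").
    sigma : bandwidth of the heat kernel defining t.
    """
    l = X_L.shape[1]
    SR = np.zeros((l, l))
    for i in range(l):
        for j in range(l):
            if i == j:
                continue
            pair = {z[i], z[j]}
            if pair == {2}:                      # C_OO: both "very relevant"
                SR[i, j] = 1.0
            elif pair in ({2, 1}, {1}, {1, 0}):  # C_OP, C_PP, C_PQ
                d2 = np.sum((X_L[:, i] - X_L[:, j]) ** 2)
                SR[i, j] = np.exp(-d2 / (2 * sigma ** 2))  # t in Eq. (7)
            # C_OQ and C_QQ pairs stay 0: they enter S^I of Eq. (9) instead
    return SR
```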
Fig. 4. Relevance graph and irrelevance graph, where the link strengths of $C_{OO}$, $C_{QQ}$ and $C_{OQ}$ are set to 1 and those of $C_{OP}$, $C_{PP}$ and $C_{PQ}$ are set to $t$ ($0 < t < 1$). Note that only one edge is drawn for each constraint for clarity.
Meanwhile, the objective function of the irrelevance graph is defined as maximizing the following expression:

$$J_I(W) = \frac{1}{2} \sum_{(x_i, x_j) \in N_I} \| W^T x_i - W^T x_j \|^2 S^I_{ij} = \mathrm{tr}\big(W^T X_L (D^I - S^I) X_L^T W\big) = \mathrm{tr}\big(W^T X_L L_I X_L^T W\big), \quad (8)$$

where $N_I$ indicates the index set of irrelevance-link constraints, which includes $C_{OQ}$ and $C_{QQ}$; $S^I$ is a similarity matrix constructed to model the adjacency relationships of the data pairs under irrelevance-link constraints; $D^I$ is a diagonal matrix with elements $D^I_{ii} = \sum_j S^I_{ij}$; and the Laplacian matrix is $L_I = D^I - S^I$. The elements of $S^I$ are defined as:

$$S^I_{ij} = \begin{cases} 1, & \text{if } (x_i, x_j) \in C_{OQ} \text{ or } C_{QQ} \\ 0, & \text{otherwise.} \end{cases} \quad (9)$$

Finally, the objective function of the RMFA algorithm is expressed as:

$$W^{*} = \arg\max_{W} \frac{J_I(W)}{J_R(W)} = \arg\max_{W} \frac{\mathrm{tr}\big(W^T X_L L_I X_L^T W\big)}{\mathrm{tr}\big(W^T X_L L_R X_L^T W\big)}. \quad (10)$$

4.3. The algorithm procedure of RMFA

The procedure of the proposed RMFA algorithm is stated below; a minimal sketch of the whole procedure follows step 4.

1) PCA projection: Similar to [26], to avoid the singularity problem and reduce noise, Principal Component Analysis (PCA) is first adopted to project $X$ into a subspace, throwing away the smallest principal components so as to retain 99% of the energy. For convenience, we still use $x_i$ to denote the data samples in the PCA subspace in the following steps.

2) Adjacency graph construction: Two graphs with $l$ nodes are constructed. Their edges are determined by the groups ($L_O$, $L_P$, $L_Q$) to which the connected node pairs belong. In the relevance graph, the edges are assigned the values "0", "1" and "t" according to Eq. (7). In the irrelevance graph, the edges are assigned the values "0" and "1" according to Eq. (9).

3) Eigen-problem: Compute the eigenvectors with respect to the non-zero eigenvalues of the generalized eigenvector problem:

$$X_L L_I X_L^T w_i = \lambda_i X_L L_R X_L^T w_i, \quad (11)$$

where $w_i$ is a generalized eigenvector and $\lambda_i$ is the corresponding eigenvalue. To guarantee the nonsingularity of the matrix $X_L L_R X_L^T$, we apply the idea of regularization [32] by adding constant values to its diagonal elements, i.e., $X_L L_R X_L^T + \alpha E$, where $E$ is an identity matrix and $\alpha > 0$.

4) Graph embedding: Let the column vectors $w_1, \ldots, w_d$, ordered according to the $d$ largest eigenvalues, be the solutions of Eq. (11). The embedding can then be expressed as:

$$X \to Y = W^T X, \qquad W = [w_1, \ldots, w_d], \quad (12)$$

where $y_i$ is a $d$-dimensional vector and $W$ is a $D \times d$ transformation matrix.
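The sketch below strings steps 1)–4) together, reusing the `relevance_similarity` helper from Section 4.2. It is a minimal illustration under our own conventions (labels $z \in \{0, 1, 2\}$, samples as columns), not the authors' implementation:

```python
import numpy as np
from scipy.linalg import eigh
from sklearn.decomposition import PCA

def rmfa(X_L, z, d=10, alpha=1e-3, sigma=1.0):
    """Sketch of RMFA (Section 4.3). X_L: D x l labeled samples."""
    z = np.asarray(z)
    # 1) PCA projection retaining 99% of the energy (sklearn expects rows).
    pca = PCA(n_components=0.99)
    Xp = pca.fit_transform(X_L.T).T

    # 2) Adjacency graphs: relevance graph via Eq. (7), irrelevance via Eq. (9).
    SR = relevance_similarity(Xp, z, sigma)
    O, Q = (z == 2), (z == 0)
    SI = (np.outer(Q, Q) | np.outer(O, Q) | np.outer(Q, O)).astype(float)  # C_QQ, C_OQ
    np.fill_diagonal(SI, 0.0)
    LR = np.diag(SR.sum(axis=1)) - SR
    LI = np.diag(SI.sum(axis=1)) - SI

    # 3) Regularized generalized eigen-problem of Eq. (11); Eq. (10) is
    #    maximized, so keep the eigenvectors of the d largest eigenvalues.
    A = Xp @ LI @ Xp.T
    B = Xp @ LR @ Xp.T + alpha * np.eye(Xp.shape[0])
    evals, evecs = eigh(A, B)
    W = evecs[:, ::-1][:, :d]

    # 4) Graph embedding, Eq. (12).
    Y = W.T @ Xp
    return W, Y, pca
```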
From the above solution process, it can be seen that the computational cost mainly consists of two parts: computing the transformation matrix $W$ and computing the embedding matrix $Y$. Computing $W$ requires a Singular Value Decomposition (SVD) step, whose complexity is $O(D^3)$ from Eq. (11). Generating $Y$ has complexity $O(dDN)$ from Eq. (12). Therefore, the computational complexity of RMFA scales as $O(D^3 + dDN)$. The storage complexity is also governed by $W$ and $Y$: they require $32Dd$ bits and $32dN$ bits respectively, for a total of $32d(D + N)$ bits.

4.4. Semi-supervised Relevance Marginal Fisher Analysis (Semi-RMFA)

In image ranking and many other practical applications, labeled training samples are fairly expensive to obtain. Consequently, the phenomenon of high dimensionality $D$ versus a low number of labeled samples $l$ arises, which cannot guarantee the generalization capability of machine learning algorithms; thus overfitting may
occur. Fortunately, unlabeled samples are readily available. Therefore, in order to use unlabeled data to achieve more satisfactory results, a semi-supervised version of RMFA is proposed, which incorporates both labeled and unlabeled samples into the learning procedure.

The key to most semi-supervised learning algorithms is the consistency assumption: nearby samples tend to have the same label or similar embeddings. In most semi-supervised dimensionality reduction algorithms, labeled samples are used to provide discriminant information, while unlabeled samples together with labeled ones are used to preserve the intrinsic geometric structure. A typical way to incorporate the information of unlabeled samples is to impose a regularizer [33]. Thus, the semi-supervised version of Eq. (10) is written as:

$$\arg\max_{W} \frac{\mathrm{tr}\big(W^T X_L L_I X_L^T W\big)}{\mathrm{tr}\big(W^T X_L L_R X_L^T W + \alpha J(W)\big)}, \quad (13)$$

where the regularizer $J(W)$ controls the learning complexity of the hypothesis family and the coefficient $\alpha$ controls the balance between the model complexity and the empirical loss. To preserve the intrinsic manifold structure of the whole dataset, the objective function of the global graph in the LPP algorithm [23] is used as the regularizer $J(W)$:

$$J(W) = \frac{1}{2} \sum_{i,j} \| W^T x_i - W^T x_j \|^2 S_{ij} = \mathrm{tr}\big(W^T X (D - S) X^T W\big) = \mathrm{tr}\big(W^T X L X^T W\big), \quad (14)$$

$$S_{ij} = \begin{cases} e^{-\frac{\|x_i - x_j\|^2}{2\sigma}}, & \text{if } x_i \in N_k(x_j) \text{ or } x_j \in N_k(x_i) \\ 0, & \text{otherwise,} \end{cases} \quad (15)$$

where $S$ is a similarity matrix modeling the adjacency relationships over the whole dataset, $N_k(x_j)$ denotes the set of $k$ nearest neighbors of $x_j$, $D$ is a diagonal matrix with elements $D_{ii} = \sum_j S_{ij}$, and the Laplacian matrix is $L = D - S$. Finally, with this data-dependent regularizer, the objective function of the Semi-RMFA algorithm is expressed as:

$$W^{*} = \arg\max_{W} \frac{J_I(W)}{J_R(W) + \alpha J(W)} = \arg\max_{W} \frac{\mathrm{tr}\big(W^T X_L L_I X_L^T W\big)}{\mathrm{tr}\big(W^T X_L L_R X_L^T W + \alpha W^T X L X^T W\big)}. \quad (16)$$

The algorithm procedure of Semi-RMFA is similar to that of RMFA; its flowchart is shown in Fig. 5.

Fig. 5. Flowchart of the Semi-RMFA algorithm: the relevance graph $J_R(W)$ and the irrelevance graph $J_I(W)$ are built from the labeled data, and the global graph $J(W)$ from both labeled and unlabeled data; the objective of Eq. (16) is solved by eigenvector computation, yielding the graph embedding $Y = W^T X$.

The computational and storage complexity of Semi-RMFA is similar to that of RMFA, apart from the additional $k$ nearest neighbor search required by the LPP regularizer of Eq. (14). Computing the distances between all pairs of samples costs $O(DN^2)$, and finding the $k$ nearest neighbors of all samples costs $O(kN^2)$. Thus, the computational complexities of the $k$ nearest neighbor search and of Semi-RMFA are $O((D+k)N^2)$ and $O((D+k)N^2 + D^3 + dDN)$, respectively. Since $k \ll D$ and $d \ll D$, the computational complexity of Semi-RMFA is approximately $O(D(N^2 + D^2))$.
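A sketch of the data-dependent regularizer graph of Eqs. (14) and (15) is given below (again an illustration under our conventions, not the authors' code); the resulting Laplacian is plugged into the denominator of Eq. (16) exactly as $L_R$ is used in Eq. (11):

```python
import numpy as np

def lpp_regularizer_laplacian(X, k=5, sigma=1.0):
    """Global-graph Laplacian L = D - S of Eqs. (14)-(15), computed over
    both labeled and unlabeled samples. X: D x N matrix (columns)."""
    N = X.shape[1]
    d2 = np.sum((X[:, :, None] - X[:, None, :]) ** 2, axis=0)  # pairwise squared distances
    nn = np.argsort(d2, axis=1)[:, 1:k + 1]    # k nearest neighbors, excluding self
    S = np.zeros((N, N))
    for i in range(N):
        S[i, nn[i]] = np.exp(-d2[i, nn[i]] / (2 * sigma))
    S = np.maximum(S, S.T)   # symmetrize: x_i in N_k(x_j) OR x_j in N_k(x_i)
    return np.diag(S.sum(axis=1)) - S
```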
5. Experimental results

This section demonstrates the effectiveness of the proposed RMFA and Semi-RMFA algorithms on a typical image ranking application, i.e., image search reranking [6–8,14,34,35]. Image search reranking aims at refining search performance by employing image or video visual information to reorder the initial text-based search results. It is a new paradigm following CBIR [5,22] and multimedia annotation [9] in the domain of multimedia content analysis and retrieval. A comprehensive survey of the literature can be found in [36].

Following the flowchart of Fig. 2, new image search reranking approaches with RMFA and Semi-RMFA are proposed, which proceed according to the following steps. First, given a query, a commercial text-based image search engine (e.g., Microsoft Bing Image Search) returns its search results. Second, the original high-dimensional image features of these initial search results are extracted to represent their visual contents. Then, some images are chosen to be labeled with relevance degrees. Next, the labeled data or all the data are exploited in the proposed RMFA or Semi-RMFA algorithm to map the visual features into the intrinsically low-dimensional space. It should be noted that the graphs in both RMFA and Semi-RMFA are constructed for each query. Finally, an LTR algorithm (e.g., Ranking SVM [19]) is employed, using the labeled data as training data, to build a
Table 1
Examples and number of queries of each category in the MSRA-MM 2.0 image dataset.

Category      | Number of queries | Examples
Animal        | 100 | Bee, Deer, Eagle
Cartoon       | 92  | Avatar, Diddl, Snoopy
Event         | 78  | Cycling, Diving, Fishing
NamedPerson   | 40  | Picasso, Albert Einstein
Object        | 295 | Bed, Jeep, Pluto
PeopleRelated | 68  | Girl, Kid, Fairy, Queen
Scene         | 48  | City, Sea, Tornados
TIME08        | 88  | Barack Obama, Paul Allen
Misc          | 288 | Amazon, Happy, Oops
Table 2
Parameter details of each dimensionality reduction method.

Method    | Reduced dimensionality | Weight (α) | PCA as a preprocessing step? | Number of neighbors in graph (k)
PCA       | 150 | N/A | N/A | N/A
LPP       | 150 | N/A | Yes | 5
FDA       | 2   | N/A | Yes | N/A
MFA       | 20  | N/A | Yes | 5
SELF      | 5   | 0.5 | No  | 7
MMP       | 150 | 0.5 | Yes | 5
RMFA      | 10  | N/A | Yes | 5
Semi-RMFA | 20  | 1   | Yes | 5
ranking function, which reorders all samples using the reduced low-dimensional visual features. Thus, a new reordered list is obtained. In the following, the datasets and methodologies are first introduced, and then the effectiveness of the proposed algorithms is demonstrated through extensive experiments and comprehensive comparisons.
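For the last step of the pipeline, Ranking SVM [19] is trained on preference pairs. The following sketch approximates it with the well-known pairwise transform and a linear SVM (scikit-learn's LinearSVC is our stand-in here, not the SVMrank implementation used in the experiments):

```python
import numpy as np
from itertools import combinations
from sklearn.svm import LinearSVC

def ranking_svm_fit(Y, z, C=1.0):
    """Pairwise-transform approximation of Ranking SVM.

    Y : d x n reduced features (columns), z : relevance degrees in {0, 1, 2}.
    Returns a weight vector w; images are reranked by descending w @ y.
    """
    diffs, signs = [], []
    for i, j in combinations(range(Y.shape[1]), 2):
        if z[i] == z[j]:
            continue                          # ties carry no ordering information
        s = 1 if z[i] > z[j] else -1
        diffs.extend([Y[:, i] - Y[:, j], Y[:, j] - Y[:, i]])  # both directions
        signs.extend([s, -s])                 # keeps the two classes balanced
    clf = LinearSVC(C=C).fit(np.array(diffs), np.array(signs))
    return clf.coef_.ravel()

# Reranking: order = np.argsort(-(w @ Y))
```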
5.1. Experimental settings

The proposed RMFA and Semi-RMFA algorithms are validated on the popular and publicly available MSRA-MM 1.0 [37] and MSRA-MM 2.0 [38] image datasets. Both datasets were built by exploring the query log of Microsoft Bing Image Search and selecting a set of representative queries; for each query, about the top 1000 images are collected. Specifically, the MSRA-MM 1.0 dataset consists of 68 popular queries covering a wide variety of categories, including objects, people, events, entertainment and locations, with 65,443 images in total. The MSRA-MM 2.0 image dataset is an extended version of MSRA-MM 1.0, to which 1097 frequently used queries were added. These queries are manually classified into 9 categories, i.e., "Animal", "Cartoon", "Event", "NamedPerson", "Object", "PeopleRelated", "Scene", "Misc" and "TIME08". The total number of images is around one million, which makes it one of the largest datasets in the image ranking domain. Table 1 shows the number of queries in each category, and some example images are shown in Fig. 1.

In both datasets, each image was manually assigned a relevance degree with respect to its query: "irrelevant", "relevant" or "very relevant", indicated by the scores 0, 1 and 2, respectively. More importantly, the datasets provide the original ranking information of the text-based search engine, against which the proposed methods can be evaluated. Moreover, the provided features are adopted to make the results reproducible and comparable. They are seven global features: (1) block-wise color moment, (2) HSV color histogram, (3) RGB color histogram, (4) color correlogram, (5) edge distribution histogram, (6) wavelet texture and (7) face features. The overall dimensionality of these features is 899.

NDCG (Normalized Discounted Cumulative Gain) is a commonly adopted metric for evaluating a search engine's performance, especially when there are more than two relevance degrees [39]. Therefore, it is adopted to evaluate the ranking performance. Given a query $q$, the NDCG score at depth $p$ in the ranked documents is defined by:

$$\mathrm{NDCG}@p = Z_p \sum_{j=1}^{p} \frac{2^{r_j} - 1}{\log(1 + j)}, \quad (17)$$

where $r_j$ is the rating of the $j$th document and $Z_p$ is a normalization constant chosen so that a perfect ranking's NDCG@$p$ value is 1. For the MSRA-MM 1.0 image dataset, the final performance is obtained by averaging NDCG over the 68 queries; for the MSRA-MM 2.0 image dataset, it is obtained by averaging NDCG within each category. It should be noted that the training annotated images are included in the reported NDCG.
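As a concrete reading of Eq. (17), a minimal NDCG@p implementation might look as follows (the logarithm base is left unspecified in Eq. (17); it cancels between the DCG and the ideal DCG used for $Z_p$):

```python
import numpy as np

def ndcg_at_p(ratings, p):
    """NDCG@p of Eq. (17). `ratings` are the relevance degrees (0, 1, 2)
    of the returned images, listed in ranked order."""
    gains = 2.0 ** np.asarray(ratings, dtype=float) - 1.0
    discounts = 1.0 / np.log(1.0 + np.arange(1, len(gains) + 1))   # log(1 + j)
    dcg = np.sum((gains * discounts)[:p])
    ideal = np.sum((np.sort(gains)[::-1] * discounts)[:p])         # perfect ranking
    return dcg / ideal if ideal > 0 else 0.0                       # Z_p = 1 / ideal DCG
```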
Table 3
Performance comparison with different dimensionality reduction approaches. Results are mean ± standard deviation over the repeated runs; for "Text" only the mean is available. The best performance at each depth (bold in the original) is achieved by Semi-RMFA.

NDCG@ | Text  | Baseline    | PCA         | FDA         | MFA         | SELF        | MMP         | LPP         | RMFA        | Semi-RMFA
10    | 0.583 | 0.670±0.026 | 0.647±0.028 | 0.667±0.025 | 0.668±0.025 | 0.696±0.015 | 0.806±0.008 | 0.804±0.007 | 0.703±0.014 | 0.874±0.010
20    | 0.574 | 0.665±0.011 | 0.653±0.014 | 0.663±0.013 | 0.666±0.013 | 0.677±0.008 | 0.739±0.005 | 0.740±0.005 | 0.693±0.005 | 0.778±0.007
30    | 0.564 | 0.659±0.012 | 0.650±0.014 | 0.655±0.013 | 0.657±0.011 | 0.665±0.009 | 0.708±0.008 | 0.707±0.008 | 0.676±0.008 | 0.739±0.006
40    | 0.556 | 0.649±0.013 | 0.642±0.014 | 0.646±0.014 | 0.648±0.013 | 0.652±0.010 | 0.684±0.008 | 0.686±0.008 | 0.662±0.008 | 0.713±0.006
50    | 0.553 | 0.641±0.012 | 0.635±0.013 | 0.639±0.013 | 0.640±0.012 | 0.645±0.012 | 0.669±0.008 | 0.673±0.008 | 0.654±0.008 | 0.697±0.005
60    | 0.549 | 0.636±0.012 | 0.631±0.011 | 0.634±0.013 | 0.636±0.011 | 0.641±0.010 | 0.661±0.006 | 0.664±0.006 | 0.648±0.007 | 0.685±0.005
70    | 0.544 | 0.634±0.010 | 0.629±0.010 | 0.632±0.011 | 0.633±0.011 | 0.640±0.007 | 0.656±0.006 | 0.657±0.006 | 0.644±0.007 | 0.679±0.004
80    | 0.541 | 0.634±0.009 | 0.630±0.010 | 0.632±0.010 | 0.633±0.009 | 0.640±0.007 | 0.654±0.005 | 0.656±0.004 | 0.642±0.008 | 0.675±0.004
90    | 0.539 | 0.633±0.008 | 0.630±0.010 | 0.632±0.009 | 0.633±0.008 | 0.640±0.006 | 0.653±0.004 | 0.654±0.003 | 0.641±0.007 | 0.672±0.004
100   | 0.537 | 0.636±0.008 | 0.633±0.009 | 0.634±0.009 | 0.635±0.008 | 0.643±0.006 | 0.654±0.004 | 0.655±0.004 | 0.645±0.006 | 0.674±0.004
Table 4
Performance gains (%) against "Baseline" at the depths of 10, 50 and 100, respectively.

Method    | NDCG@10 | NDCG@50 | NDCG@100
PCA       | −3.43   | −0.94   | −0.47
LPP       | 20.00   | 5.00    | 2.99
FDA       | −0.45   | −0.31   | −0.31
MFA       | −0.30   | −0.16   | −0.16
SELF      | 3.88    | 0.62    | 1.10
MMP       | 20.30   | 4.37    | 2.83
RMFA      | 4.93    | 2.03    | 1.42
Semi-RMFA | 30.45   | 8.74    | 5.97
Several popular feature dimensionality reduction methods are used for comparison: (1) unsupervised algorithms: PCA [24] and LPP [23]; (2) supervised algorithms: FDA [40] and MFA [5]; and (3) semi-supervised algorithms: SELF [28] and MMP [22]. Since no ranking information is used in conventional dimensionality reduction algorithms, both the supervised and semi-supervised algorithms regard the three relevance groups as three categories. Table 2 shows the parameter details of each algorithm. In addition, "Text" refers to the performance of the original text-based search, and "Baseline" refers to the performance obtained with the original 899-D features.

In both RMFA and Semi-RMFA, we set s = 5. Because irrelevant images are much easier to obtain than relevant ones for a given query, it is reasonable to pick them up automatically by randomly sampling images not associated with the textual query. Thus only the "very relevant" and "relevant" images need to be labeled; for convenience, we use the same labeled number s for both. In addition, both the coefficient α in Semi-RMFA and the penalization parameter in the Ranking SVM model are set to 1. The reduced dimensionality of each dimensionality reduction approach is tuned over {5, 10, 15, ..., 200}, and the optimal value is selected according to the average NDCG@10 over the 68 queries of the MSRA-MM 1.0 dataset. As for FDA, its reduced dimensionality is determined by τ − 1, where τ is the number of categories [40]; since FDA views the three relevance degrees as three classes, its dimensionality is 2. These parameters are then fixed for all queries in MSRA-MM 2.0. We adopt Ranking SVM [19] for ranking in all the experiments, repeat each experiment six times, and report the mean and standard deviation. The impact of the important parameters is discussed in the following subsections.

5.2. Experiments on MSRA-MM 1.0 dataset

This subsection demonstrates the performance of the proposed RMFA and Semi-RMFA algorithms on the MSRA-MM 1.0 dataset. Table 3 reports the NDCG results at depths {10, 20, 30, 40, 50, 60, 70, 80, 90, 100} for the different dimensionality reduction approaches. In addition, the performance gains against "Baseline" at depths 10, 50 and 100 are shown in Table 4. From Tables 3 and 4, we can observe the following:
Table 5
t-test results at the 5% significance level for Semi-RMFA versus the other methods; "1" indicates a significant improvement, "0" indicates no significant difference.

Semi-RMFA versus: | NDCG@10 | NDCG@50 | NDCG@100
Text              | 1 | 1 | 1
Baseline          | 1 | 1 | 1
PCA               | 1 | 1 | 1
LPP               | 1 | 1 | 0
FDA               | 1 | 1 | 1
MFA               | 1 | 1 | 1
SELF              | 1 | 1 | 1
MMP               | 1 | 1 | 0
RMFA              | 1 | 1 | 1
Fig. 6. Average reranking time (CPU seconds) of each method for a query.
(1) All dimensionality reduction algorithms together with "Baseline" outperform "Text" significantly, which validates the usefulness of visual features in image search reranking. (2) The performances of PCA, FDA and MFA are inferior to "Baseline", while those of SELF, MMP, LPP and the proposed RMFA and Semi-RMFA are superior to it. The algorithms superior to "Baseline" are all based on the assumption of a local manifold structure, so it can be concluded that manifold-based dimensionality reduction algorithms are more helpful than others for image ranking; this is because relevant images lie on a manifold in the intrinsic visual feature space. However, not all manifold-based dimensionality reduction algorithms are effective (e.g., MFA), so elaborate design is necessary. (3) RMFA achieves 20.58% and 4.93% performance gains against "Text" and "Baseline" at NDCG@10, which indicates the effectiveness of the relevance graph and the irrelevance graph. Moreover, the performance of RMFA is similar to SELF but inferior to LPP, MMP and Semi-RMFA, which indicates that unlabeled data are needed to discover the intrinsic embedding structure. (4) Semi-RMFA consistently outperforms the other methods at all depths. Taking NDCG@10 as an example, Semi-RMFA is better than PCA, FDA, MFA, SELF, RMFA, MMP and LPP by 35.05%, 31.10%, 30.76%, 24.56%, 25.20%, 8.46% and 8.65%, respectively. (5) The small standard deviations shown in Table 3 demonstrate the robustness of the proposed methods.

Moreover, a t-test at the 5% significance level is employed to test the statistical significance of Semi-RMFA versus the
other algorithms; the results are shown in Table 5. It can be seen that Semi-RMFA is significantly superior to the other algorithms, especially at depths 10 and 50.

Fig. 6 shows the average search reranking time of each method for a query, measured on a desktop computer with a 2.53 GHz CPU and 4 GB of RAM. It can be observed that LPP and MMP cost the most time, while MFA, FDA and RMFA cost the least. This is because LPP and MMP use all the images and have higher reduced dimensionalities, whereas MFA, FDA and RMFA only use the labeled images and have lower reduced dimensionalities. The time of Semi-RMFA lies in between, at about 1.5 s, which is an acceptable latency for a reranking application.

Fig. 7 illustrates the impact of the labeled number s over {2, 5, 8, 11}. It can be observed that more labels generally bring higher performance. Experiments are also carried out under different reduced dimensionalities to further disclose the relationship between the dimensionality and the ranking performance, as shown in Fig. 8. From the figure, it can be seen that the best dimensionalities for RMFA and Semi-RMFA are 10 and 20 respectively, and that a higher dimensionality only brings slight performance changes.

5.3. Experiments on MSRA-MM 2.0 dataset

This subsection shows the performance of the proposed methods on the MSRA-MM 2.0 dataset. Fig. 9 illustrates the NDCG results for each category at depths 10, 50 and 100. It can be observed that Semi-RMFA consistently outperforms the others for each category at all depths, which effectively shows its superiority. Moreover, as on the MSRA-MM 1.0 dataset, the performance of RMFA is similar to SELF and inferior to LPP, MMP and Semi-RMFA, which further indicates that unlabeled data help discover the intrinsic embedding structure. Fig. 10 shows the average performance over all 9 categories for each dimensionality reduction method, including "Text" and "Baseline", at depths 10, 50 and 100. It can be seen that the performance of every method degrades as the depth increases, a quite reasonable phenomenon in the information retrieval domain. It can also be observed that Semi-RMFA performs the best, followed by MMP, LPP, RMFA and SELF, while MFA, PCA and FDA give worse outcomes. The
Fig. 7. Performance comparison with different labeled numbers s. The dashed lines denote the performance of RMFA, and the solid lines denote that of Semi-RMFA.
Fig. 8. Performance comparison under different reduced dimensionalities d. The dimensionality is 5, 10, 15 and 20 for RMFA, and 10, 20, 30 and 40 for Semi-RMFA.
stable performance on such a large dataset demonstrates the robustness of the proposed methods.

5.4. Comparison with state-of-the-art image search reranking methods

To further prove the effectiveness of the proposed scheme of Fig. 2 and the proposed dimensionality reduction algorithms, the following state-of-the-art image search reranking methods are used for comparison:

1) Bayesian reranking [35]: an image search reranking method based on the Bayesian framework. It maximizes the ranking score consistency among visually similar samples while minimizing the ranking distance, which
represents the disagreement between the objective ranking list and the initial text-based one.

2) Context reranking [34]: a typical graph-based image search reranking method which formulates reranking as a random walk over a context graph, where images are nodes and the edges between them are weighted by multimodal similarities.

3) Multimodal graph-based reranking [6]: a recently proposed graph-based image search reranking method exploiting multiple modalities. Seven graphs are built from the seven global features provided by the datasets (899-D in total), and the results are fused to obtain the final reranking result. The approach simultaneously learns the relevance degrees, the weights of the modalities, and the distance metric with its scaling for each modality.
Fig. 9. Performance comparison for each category at (a) NDCG@10, (b) NDCG@50 and (c) NDCG@100, respectively.

Fig. 10. Average performance on the MSRA-MM 2.0 image dataset for all methods at NDCG@10, NDCG@50 and NDCG@100.
The methods above are denoted as "Bayesian", "Context" and "Multimodal", respectively. Again, "Text" refers to the performance of the initial text-based search results, and "Baseline" represents the performance of directly using the original 899-D features. Table 6 reports the average NDCG@100 obtained by the different methods for each category of queries, where "MM-1.0" represents the MSRA-MM 1.0 image dataset. It can be seen that all the reranking algorithms improve the
original search results, i.e., "Text". Specifically, the Multimodal method performs the best in the categories "Animal", "Cartoon", "Object", "Scene" and "TIME08", while the proposed Semi-RMFA method performs the best in the other five, i.e., "MM-1.0", "Event", "NamedPerson", "PeopleRelated" and "Misc". Similar observations hold for the NDCG@10 and NDCG@50 measurements. Moreover, the last row of Table 6 shows the t-test significance results for Semi-RMFA versus the others: Semi-RMFA is significantly superior to all the other methods except the Multimodal method. The reason for this superiority lies in the facts that LTR is better at mining ranking information and that the proposed RMFA and Semi-RMFA methods are better at discovering the intrinsic structure of the visual features while keeping the ordinal information. Since the ranking model and the features are two of the most important factors in image ranking applications, it is not hard to see why the proposed Ranking Dimensionality Reduction scheme and the RMFA/Semi-RMFA methods perform well.

Although the performance of Semi-RMFA is only slightly better than that of the Multimodal method, its computational and storage complexity is much smaller. The computational complexity of the Multimodal method is mainly dominated by two parts: the $k$ nearest neighbor search and the multi-graph fusion. The complexity of the $k$ nearest neighbor search is $O((D + k_1 k_2) N^2)$, since a graph is constructed for each modality, where $k_1$ is the size of the neighborhood and $k_2$ is the number of modalities. The complexity of the multi-graph fusion is $O\big(T_1 T_2 N k_1 \sum_{i=1}^{k_2} D_{k_2}^2 + T_3 k_2\big)$, where $T_1$, $T_2$, $T_3$ are iteration
counts. According to the parameters set in [6], the overall computational complexity of the Multimodal method is one order of magnitude larger than that of Semi-RMFA. Its storage complexity is $32 k_2 N^2$ bits, which is about $N$ times larger than the $32d(D+N)$ bits of Semi-RMFA.
6. Conclusion and future work

This paper has presented a novel "Ranking Dimensionality Reduction" scheme by incorporating ranking
Table 6
Performance comparison for each category of different algorithms at NDCG@100. The best result in each row is shown in bold in the original. The last row is the result of a t-test at the 5% significance level for Semi-RMFA versus the others, where "1" indicates a significant improvement and "0" indicates no significant difference.

Category      | Text  | Baseline | Bayesian | Context | Multimodal | RMFA  | Semi-RMFA
MM-1.0        | 0.537 | 0.636    | 0.542    | 0.541   | 0.568      | 0.643 | 0.674
Animal        | 0.734 | 0.737    | 0.775    | 0.759   | 0.791      | 0.734 | 0.754
Cartoon       | 0.807 | 0.855    | 0.859    | 0.828   | 0.865      | 0.850 | 0.858
Event         | 0.788 | 0.797    | 0.779    | 0.788   | 0.811      | 0.798 | 0.812
NamedPerson   | 0.908 | 0.919    | 0.916    | 0.905   | 0.940      | 0.934 | 0.954
Object        | 0.703 | 0.729    | 0.723    | 0.717   | 0.745      | 0.728 | 0.737
PeopleRelated | 0.714 | 0.722    | 0.703    | 0.710   | 0.742      | 0.735 | 0.755
Scene         | 0.702 | 0.737    | 0.766    | 0.752   | 0.792      | 0.728 | 0.759
TIME08        | 0.830 | 0.771    | 0.844    | 0.854   | 0.870      | 0.778 | 0.823
Misc          | 0.736 | 0.785    | 0.760    | 0.753   | 0.790      | 0.783 | 0.804
Mean          | 0.747 | 0.769    | 0.767    | 0.760   | 0.791      | 0.771 | 0.793
t-test result | 1     | 1        | 1        | 1       | 0          | 1     | N/A
information into conventional dimensionality reduction methods. The scheme is specially designed for the LTR-based image ranking application, and aims at discovering the intrinsic structure of the data while keeping the ordinal information. Based on this scheme and MFA, the Relevance Marginal Fisher Analysis (RMFA) algorithm and its semi-supervised version (Semi-RMFA) have been developed by constructing the relevance graph and the irrelevance graph from the pairwise constraint information of relevance-links and irrelevance-links. A comprehensive set of image search reranking experiments has been performed on two large, real-world datasets: the MSRA-MM 1.0 and MSRA-MM 2.0 image datasets. These comparative studies clearly demonstrate that the proposed algorithms not only outperform conventional classification-oriented dimensionality reduction algorithms, but also achieve superior performance against state-of-the-art image search reranking methods.

The proposed Ranking Dimensionality Reduction scheme is general enough to be applied to other multimedia ranking domains such as personalized recommendation and CBIR. Moreover, we plan to introduce ranking information into other popular dimensionality reduction algorithms.
Acknowledgments

This work was supported in part by the National Basic Research Program of China (973 Program) (Grant no. 2014CB340400), the National Natural Science Foundation of China (Grant nos. 61271325, 61472273, 61172121, 61271412 and 61222109), the Elite Scholar Program of Tianjin University (no. 2015XRG-0014), and the Excellent Young Scholar Program of the Tianjin University of Technology and Education (Grant no. RC14-46).
References

[1] Q. Jia, X. Tian, Query difficulty estimation via relevance prediction for image retrieval, Signal Process. 110 (2015) 232–243.
[2] C. Jin, S. Jin, Automatic image annotation using feature selection based on improving quantum particle swarm optimization, Signal Process. 109 (2015) 172–181.
[3] M. Jian, K. Lam, Face-image retrieval based on singular values and potential-field representation, Signal Process. 100 (2014) 9–15.
[4] W. Bian, D. Tao, Biased discriminant Euclidean embedding for content-based image retrieval, IEEE Trans. Image Process. 19 (2) (2010) 545–554.
[5] D. Xu, S. Yan, D. Tao, S. Lin, H. Zhang, Marginal Fisher Analysis and its variants for human gait recognition and content based image retrieval, IEEE Trans. Image Process. 16 (11) (2007) 2811–2821.
[6] M. Wang, H. Li, D. Tao, K. Lu, X. Wu, Multimodal graph-based reranking for web image search, IEEE Trans. Image Process. 21 (11) (2012) 4649–4661.
[7] Y. Yang, W. Hsu, H. Chen, Online reranking via ordinal informative concepts for context fusion in concept detection and video search, IEEE Trans. Circuits Syst. Video Technol. 19 (12) (2009) 1880–1890.
[8] J. Yu, Y. Rui, B. Chen, Exploiting click constraints and multi-view features for image re-ranking, IEEE Trans. Multimedia 16 (1) (2014) 159–168.
[9] J. Weston, S. Bengio, N. Usunier, Large scale image annotation: learning to rank with joint word-image embeddings, Mach. Learn. 81 (1) (2010) 21–35.
[10] D. Liu, X. Hua, L. Yang, M. Wang, H. Zhang, Tag ranking, in: Proceedings of the WWW, 2009, pp. 351–360.
[11] Z. Pan, X. You, H. Chen, D. Tao, B. Pang, Generalization performance of magnitude-preserving semi-supervised ranking with graph-based regularization, Inf. Sci. 221 (2013) 284–296.
[12] B. Xu, J. Bu, C. Chen, D. Cai, X. He, W. Liu, J. Luo, Efficient manifold ranking for image retrieval, in: Proceedings of the ACM SIGIR, 2011, pp. 525–534.
[13] Y. Liu, Y. Liu, S. Zhong, K. Chan, Semi-supervised manifold ordinal regression for image ranking, in: Proceedings of the ACM MM, 2011, pp. 1393–1396.
[14] B. Geng, L. Yang, C. Xu, X. Hua, Content-aware ranking for visual search, in: Proceedings of the IEEE CVPR, 2010, pp. 3400–3407.
[15] T. Liu, Learning to Rank for Information Retrieval, Springer, Berlin, 2011.
[16] F. Faria, A. Veloso, H. Almeida, E. Valle, R. Torres, M. Goncalves, Learning to rank for content-based image retrieval, in: Proceedings of the ACM MIR, 2010, pp. 285–294.
[17] Y. Hu, M. Li, N. Yu, Multiple-instance ranking: learning to rank images for image retrieval, in: Proceedings of the IEEE CVPR, 2008, pp. 1–8.
[18] Y. Li, C. Zhou, B. Geng, C. Xu, H. Liu, A comprehensive study on learning to rank for content-based image retrieval, Signal Process. 93 (2013) 1426–1434.
[19] T. Joachims, Optimizing search engines using clickthrough data, in: Proceedings of the ACM SIGKDD, 2002, pp. 133–142.
[20] X. Tian, Y. Lu, L. Yang, Q. Tian, Learning to judge image search results, in: Proceedings of the ACM MM, 2011, pp. 363–372.
[21] C. Li, Q. Liu, J. Liu, H. Lu, Ordinal regularized manifold feature extraction for image ranking, Signal Process. 93 (2013) 1651–1661.
[22] X. He, D. Cai, J. Han, Learning a maximum margin subspace for image retrieval, IEEE Trans. Knowl. Data Eng. 20 (2) (2008) 189–201.
[23] X. He, S. Yan, Y. Hu, P. Niyogi, H. Zhang, Face recognition using Laplacianfaces, IEEE Trans. Pattern Anal. Mach. Intell. 27 (3) (2005) 328–340.
[24] I. Jolliffe, Principal Component Analysis, Springer-Verlag, New York, 1986.
[25] N. Lawrence, Spectral dimensionality reduction via maximum entropy, in: Proceedings of the AISTATS, 2011, pp. 51–59.
[26] S. Yan, D. Xu, B. Zhang, H. Zhang, Q. Yang, S. Lin, Graph embedding and extensions: a general framework for dimensionality reduction, IEEE Trans. Pattern Anal. Mach. Intell. 29 (1) (2007) 40–51.
[27] D. Zhang, Z. Zhou, S. Chen, Semi-supervised dimensionality reduction, in: Proceedings of the SIAM ICDM, 2007, pp. 629–634.
[28] M. Sugiyama, T. Ide, S. Nakajima, J. Sese, Semi-supervised local Fisher Discriminant Analysis for dimensionality reduction, Mach. Learn. 78 (1–2) (2010) 35–61.
[29] X. Tian, D. Tao, X. Hua, X. Wu, Active reranking for web image search, IEEE Trans. Image Process. 19 (3) (2010) 805–820.
[30] X. Geng, T. Liu, T. Qin, H. Li, Feature selection for ranking, in: Proceedings of the 30th Annual International ACM SIGIR, 2007, pp. 407–414.
[31] H. Lai, Y. Pan, Y. Tang, R. Yu, FSMRank: feature selection algorithm for learning to rank, IEEE Trans. Neural Netw. Learn. Syst. 24 (6) (2013) 940–952.
[32] J. Friedman, Regularized discriminant analysis, J. Am. Stat. Assoc. 84 (405) (1989) 165–175.
[33] D. Cai, X. He, J. Han, Semi-supervised discriminant analysis, in: Proceedings of the IEEE ICCV, 2007, pp. 1–7.
[34] W. Hsu, L. Kennedy, S. Chang, Video search reranking through random walk over document-level context graph, in: Proceedings of the ACM MM, 2007, pp. 971–980.
[35] X. Tian, L. Yang, J. Wang, Y. Yang, X. Wu, X. Hua, Bayesian video search reranking, in: Proceedings of the ACM MM, 2008, pp. 131–140.
[36] T. Mei, Y. Rui, S. Li, Q. Tian, Multimedia search reranking: a literature survey, ACM Comput. Surv. 46 (3) (2014) 1–38.
[37] M. Wang, L. Yang, X. Hua, MSRA-MM: bridging research and industrial societies for multimedia information retrieval, Microsoft Technical Report MSR-TR-2009-2030, Beijing, 2009.
[38] H. Li, M. Wang, X. Hua, MSRA-MM 2.0: a large-scale web multimedia dataset, in: Proceedings of the IEEE ICDM Workshops, 2009, pp. 164–169.
[39] K. Järvelin, J. Kekäläinen, IR evaluation methods for retrieving highly relevant documents, in: Proceedings of the ACM SIGIR, 2000, pp. 41–48.
[40] R. Fisher, The use of multiple measurements in taxonomic problems, Ann. Eugen. 7 (2) (1936) 179–188.
[41] Y. Pang, Z. Song, X. Li, J. Pan, Truncation error analysis on reconstruction of signal from unsymmetrical local average sampling, IEEE Trans. Cybern. 45 (10) (2015) 2100–2104.
[42] Z. Ji, Y. Pang, Y. He, H. Zhang, Semi-supervised LPP algorithms for learning-to-rank-based visual search reranking, Inf. Sci. 302 (2015) 83–93.