Signal Processing 121 (2016) 139–152
Relevance and irrelevance graph based marginal Fisher analysis for image search reranking

Zhong Ji (a), Yanwei Pang (a, corresponding author), Yuan Yuan (b), Jing Pan (c)

(a) School of Electronic Information Engineering, Tianjin University, Tianjin 300072, PR China
(b) Center for Optical Imagery Analysis and Learning (OPTIMAL), State Key Laboratory of Transient Optics and Photonics, Xi'an Institute of Optics and Precision Mechanics, Chinese Academy of Sciences, Xi'an 710119, PR China
(c) School of Electronic Engineering, Tianjin University of Technology and Education, Tianjin 300222, PR China
Article history: Received 1 July 2015; received in revised form 18 October 2015; accepted 13 November 2015; available online 30 November 2015.

Abstract
Learning-to-rank techniques have recently shown promising results in the domain of image ranking, where dimensionality reduction is a critical step in overcoming the "curse of dimensionality". However, conventional dimensionality reduction approaches cannot guarantee satisfactory performance because important ranking information is ignored. This paper presents a novel "Ranking Dimensionality Reduction" scheme specifically designed for learning-to-rank based image ranking, which aims not only at discovering the intrinsic structure of the data but also at keeping the ordinal information. Within this scheme, a new dimensionality reduction algorithm called Relevance Marginal Fisher Analysis (RMFA) is proposed. RMFA models the proposed pairwise constraints of relevance-link and irrelevance-link into a relevance graph and an irrelevance graph, and applies the graphs to build the objective function with the idea of Marginal Fisher Analysis (MFA). Further, a semi-supervised RMFA algorithm called Semi-RMFA is developed to offer a more general solution for real-world applications. Extensive experiments are carried out on two popular, real-world image search reranking datasets. The promising results demonstrate the robustness and effectiveness of the proposed scheme and methods. © 2015 Elsevier B.V. All rights reserved.
Keywords: Multimedia information system; Image ranking; Dimensionality reduction; Image search reranking; Learning-to-rank; Marginal Fisher Analysis
1. Introduction

In the multimedia retrieval community, ranking has become an active research field due to the rapid growth of image/video repositories and the (mobile) Internet. Applications include Content-Based Image Retrieval (CBIR) [1–5], image search reranking [6–8], image annotation [9] and tag ranking [10]. The construction of a ranking model is one of the key issues in these applications. Thus, many learning algorithms have been proposed to tackle this problem, such as manifold-ranking based methods [11,12], regression based methods [13], direct-optimization based
methods [6] and learning-to-rank (LTR) based methods [9,14]. Specifically, LTR refers to machine learning techniques for building ranking models by combining features extracted from query-document pairs through discriminative training [15].

Recent years have witnessed significant research and development of LTR techniques for multimedia retrieval and ranking [7,14,16–18,41,42]. Yang et al. [7] were among the first: they utilized two popular LTR approaches for image search reranking by learning the co-occurrence patterns between target semantics and features extracted from the initial search list. Based on LTR, Geng et al. [14] presented a ranking model with large margin structured output learning, in which both textual and visual information are
simultaneously leveraged in the ranking learning process. In [16–18], the authors investigated approaches for applying LTR techniques to CBIR. For example, Hu et al. [17] developed three schemes for multiple-instance ranking based on Ranking SVM [19] (a popular LTR algorithm). Li et al. [18] designed several new hand-crafted visual features for LTR, and also discussed the adaptability of three kinds of LTR methods to CBIR. In addition, LTR techniques have also been used to identify the best search result list from several candidates [20] and for automatic image annotation [9]. All these efforts have achieved pleasing performance and shown the effectiveness of applying LTR techniques to multimedia ranking.

However, although LTR has been demonstrated to be a powerful tool for multimedia ranking, it is often confronted with the problem of high-dimensional visual features. Features play a critical role in LTR and multimedia ranking [13,15,18,21]. Unfortunately, they are usually high dimensional in multimedia ranking, which not only imposes heavy burdens on computation and memory storage, but also causes the well-known "curse of dimensionality" in machine learning: the generalization capability decreases as the dimensionality increases when training samples are limited.

Dimensionality reduction is an effective way to handle the problems brought by high dimensionality [5,22,23]. It aims at finding a mapping function, linear or nonlinear, explicit or implicit, that transforms the original high-dimensional features into an intrinsic low-dimensional representation. However, conventional methods are generally designed for classification, not for LTR. Existing image ranking studies with LTR usually employ conventional dimensionality reduction methods directly, ignoring the substantial differences between ranking and classification from the viewpoint of feature dimensionality reduction. As reported in [11], ranking is intrinsically different from classification. For example, ranking requires not only recognizing whether data samples belong to the same set, but also providing their ordinal information. Specifically, there are only two opposite states ("0"/"1" or "−1"/"+1") in classification, whereas there are more than two states in ranking. For example,
images in LTR are usually labeled as "very relevant", "relevant" and "irrelevant" with "2", "1" and "0", as shown in Fig. 1. More importantly, the relationships between different relevance degrees are also more complicated than those between different class labels. Therefore, direct application of existing dimensionality reduction techniques to the LTR scheme cannot achieve satisfactory performance.

In light of the above considerations, this paper presents a novel dimensionality reduction scheme for LTR based image ranking, named "Ranking Dimensionality Reduction". The basic idea of the scheme is shown in Fig. 2. Furthermore, inspired by the success of the Marginal Fisher Analysis (MFA) method [5], two novel Ranking Dimensionality Reduction methods, Relevance Marginal Fisher Analysis (RMFA) and Semi-supervised Relevance Marginal Fisher Analysis (Semi-RMFA), are developed on the basis of this scheme. It is worthwhile to highlight several aspects of the proposed methods:
Fig. 2. The proposed "Ranking Dimensionality Reduction" scheme in the LTR application. The novelty of the proposed scheme is highlighted by the red arrow; without the step illustrated with the red arrow, it reduces to the conventional dimensionality reduction scheme used in LTR-based image ranking. (For interpretation of the references to color in this figure, the reader is referred to the web version of this article.)
Fig. 1. Example images in the LTR application with different relevance degrees ("very relevant", "relevant", "irrelevant") to the queries Dolphin, Car and Kid, respectively. It can be observed that the content of the "very relevant" samples matches their labels well, the "relevant" samples are somewhat redundant in content, while the "irrelevant" samples do not match their labels.
1) A novel "Ranking Dimensionality Reduction" scheme (illustrated in Fig. 2) is presented, which aims at discovering the intrinsic structure held by the dataset as well as keeping the ordinal information. Within this general framework, new dimensionality reduction algorithms for ranking can be developed.
2) A novel Ranking Dimensionality Reduction algorithm called Relevance Marginal Fisher Analysis (RMFA) is proposed, which models the proposed pairwise constraints of relevance-link and irrelevance-link into the relevance graph and the irrelevance graph respectively, and applies the graphs to build the objective function with the idea of MFA.
3) To offer a more general solution for real-world applications, a new Semi-supervised Relevance Marginal Fisher Analysis (Semi-RMFA) method is developed on the basis of RMFA, which employs both labeled and unlabeled data.
4) Extensive experiments and comprehensive comparisons on large real-world image search reranking datasets show that the proposed methods are very competitive with state-of-the-art dimensionality reduction and image search reranking methods.

The rest of the paper is organized as follows. Previous efforts on dimensionality reduction and feature learning in LTR are discussed in the following section. The Marginal Fisher Analysis (MFA) algorithm is briefly reviewed in Section 3. Section 4 describes the proposed RMFA and Semi-RMFA methods in detail, followed by the experimental setup and analysis in Section 5. Section 6 concludes the paper.
2. Related work

There has been rich research on dimensionality reduction and LTR in recent years. However, as far as we know, there are few dimensionality reduction efforts specifically designed for LTR. Thus, this section gives brief reviews of dimensionality reduction and of feature learning in LTR, respectively.
2.1. Dimensionality reduction

Dimensionality reduction plays an important role in overcoming the crucial "curse of dimensionality" problem and in reducing the heavy burden of storage and computation. It has been confirmed that discovering the potential intrinsic low-dimensional structures of high-dimensional data is an essential preprocessing step for many further data analysis tasks such as pattern recognition, computer vision and multimedia retrieval [4,5,24–27].

Many dimensionality reduction methods have been proposed in recent decades. The two most popular algorithms are Fisher Discriminant Analysis (FDA) [40] and Principal Component Analysis (PCA) [24]. Since the year 2000, many manifold learning algorithms have been developed [5,25,26]. The goal of manifold learning algorithms is to discover the intrinsic manifold structure of the data by means of preserving certain topological relations, such as geodesic distances and neighborhood relations. For example, Yan et al. [26] proposed a graph embedding framework that unifies a number of existing dimensionality reduction methods (e.g., PCA, LDA, LLE, LE, ISOMAP and LPP), in which the statistical and geometrical properties of the data are encoded as graph relationships. Recently, Lawrence [25] presented a new perspective on spectral dimensionality reduction algorithms based on maximum entropy, and provided a unified framework for manifold-based methods and generative-modeling-based methods (e.g., probabilistic PCA).

In image ranking applications, labeled data are often very time consuming and expensive to obtain; however, it is easy to get plenty of unlabeled data. Therefore, semi-supervised methods [4,27–29] are more suitable for this problem. Moreover, the significance of "relevant" and "irrelevant" samples is unequal, and their data structures are also different: the irrelevant images scatter over the whole space while the relevant ones do not. Specially designed dimensionality reduction methods are therefore required. For example, He et al. [22] presented a Maximum Margin Projection (MMP) algorithm for CBIR within the relevance feedback framework, which discovers the local manifold structure by maximizing the margin between positive and negative samples in each local neighborhood. Bian et al. [4] proposed a Biased Discriminative Euclidean Embedding (BDEE) method for CBIR, which parameterizes samples in the original high-dimensional space to discover the intrinsic coordinates of the low-level visual features. Transfer learning based dimensionality reduction methods have also been employed. For example, Tian et al. [29] proposed a Local-Global Discriminative (LGD) dimensionality reduction algorithm for image search reranking, in which a submanifold is learned by transferring the local geometry and the discriminative information from the labeled images to the whole (global) image database. These approaches have been successfully applied to many standard datasets and generate satisfying results.

2.2. Feature learning in learning-to-rank
Learning-to-rank is a relatively new research area that emerged in the last decade. It is a type of supervised or semi-supervised machine learning technique whose purpose is to automatically create a ranking model from training data. Many powerful methods have been developed and successfully applied to real applications such as web search [15]. However, only recently has the feature learning problem in LTR emerged as a crucial issue. One representative work is [30], in which a greedy algorithm was proposed to select a subset of features with maximum total importance scores and minimum total similarity scores; the authors also pointed out that directly applying classification-oriented feature selection techniques to ranking is not a good choice. A more recent effort [31] proposed a feature selection method for LTR based on sparse SVMs. It solves a joint convex
optimization problem which minimizes the ranking errors while simultaneously conducting feature selection. Both feature selection and dimensionality reduction reduce the dimensionality of the original features. Feature selection methods select a smaller subset from the original large set of features, while dimensionality reduction methods transform the original features from the high-dimensional space to a lower-dimensional space. Although there have been some efforts on feature selection in LTR, there have been few efforts on dimensionality reduction in LTR.
3. A review of MFA

The proposed Ranking Dimensionality Reduction algorithms are inspired by Marginal Fisher Analysis (MFA) [5]. Therefore, we briefly describe the main idea of MFA in this section.

Given an undirected weighted graph $G = (V, E, S)$, each sample $x_i \in \mathbb{R}^D$ $(1 \le i \le N)$ represents a node $v \in V$, where $N$ is the number of samples and $D$ is the feature dimensionality. Edges $e \in E \subseteq V \times V$, and $S \in \mathbb{R}^{N \times N}$ is a similarity matrix assigning a value to each edge. The diagonal matrix $D$ and the Laplacian matrix $L$ of a graph $G$ are defined as $L = D - S$, $D_{ii} = \sum_{j \ne i} S_{ij}, \forall i$.

The graph embedding algorithm [26] aims at determining a low-dimensional representation $Y = [y_1, y_2, \ldots, y_N]$ of the sample set $X = [x_1, x_2, \ldots, x_N]$ while maintaining similarities among node pairs, where the column vector $y_i$ is the embedding of the vertex $x_i$. According to the graph preserving criterion [26], a linearized formulation of graph embedding can be written as:

$$W^{*} = \arg\min_{\mathrm{tr}(W^{T} X B X^{T} W) = a} \mathrm{tr}(W^{T} X L X^{T} W) = \arg\min_{W} \frac{\mathrm{tr}(W^{T} X L X^{T} W)}{\mathrm{tr}(W^{T} X B X^{T} W)}, \quad (1)$$

where $B$ is the constraint matrix, which may simply be a diagonal matrix used for scale normalization or may express more general constraints among vertices in a penalty graph, $a$ is a constant, and $\mathrm{tr}(\cdot)$ denotes the trace of a matrix. $Y = W^{T} X$ and $W = [w_1, w_2, \ldots, w_d]$ is the projection matrix, which can be obtained by solving the generalized eigenvalue decomposition problem:

$$X L X^{T} w = \lambda X B X^{T} w. \quad (2)$$

The two matrices $S$ and $B$ play a crucial role in the graph embedding approach: $S$ is used to construct an intrinsic graph, and $B$ is used to construct a penalty graph. Different definitions of them lead to different algorithms; therefore, many popular dimensionality reduction algorithms can be interpreted in this framework [26].

Based on the graph embedding framework, Marginal Fisher Analysis (MFA) enlarges the distances between margin samples of different classes in order to separate the classes [5]. Specifically, an intrinsic graph is designed to characterize the within-class compactness, and a penalty graph is designed to characterize the between-class separability. The within-class compactness is represented as the sum of distances between each sample and its neighbors within the same class:

$$S_w = \sum_{i} \sum_{i \in N_{k_1}(j)\ \text{or}\ j \in N_{k_1}(i)} \| W^{T} x_i - W^{T} x_j \|^2 = 2\,\mathrm{tr}\big(W^{T} X (D - S) X^{T} W\big), \qquad S_{ij} = \begin{cases} 1, & \text{if } i \in N_{k_1}(j)\ \text{or}\ j \in N_{k_1}(i) \\ 0, & \text{else}, \end{cases} \quad (3)$$

where $N_{k_1}(i)$ denotes the index set of the $k_1$ nearest neighbors of sample $x_i$ in the same class. On the other hand, the between-class separability is characterized as the sum of distances between margin samples from different classes:

$$S_b = \sum_{i} \sum_{(i,j) \in \Psi_{k_2}(c_i)\ \text{or}\ (i,j) \in \Psi_{k_2}(c_j)} \| W^{T} x_i - W^{T} x_j \|^2 = 2\,\mathrm{tr}\big(W^{T} X (D^{b} - S^{b}) X^{T} W\big), \qquad S^{b}_{ij} = \begin{cases} 1, & \text{if } (i,j) \in \Psi_{k_2}(c_i)\ \text{or}\ (i,j) \in \Psi_{k_2}(c_j) \\ 0, & \text{else}, \end{cases} \quad (4)$$

where $\Psi_{k_2}(c)$ is the set of the $k_2$ nearest data pairs among the set $\{(i,j) \mid l(x_i) = c,\ l(x_j) \ne c\}$, and $l(x_i) = c$ means the label of $x_i$ is $c$. With $S_w$ and $S_b$, the projection matrix $W$ is obtained by:

$$W^{*} = \arg\min_{W} \frac{S_w}{S_b} = \arg\min_{W} \frac{\mathrm{tr}\big(W^{T} X (D - S) X^{T} W\big)}{\mathrm{tr}\big(W^{T} X (D^{b} - S^{b}) X^{T} W\big)}. \quad (5)$$

This ratio formulation is generally solved with generalized eigenvalue decomposition by transforming the objective function into the tractable ratio trace form. MFA is a special linearization of the graph embedding framework and achieves much better performance in face recognition and CBIR applications.
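To make the ratio-trace formulation concrete, the following is a minimal sketch (our illustration, not code from [5] or [26]) of how Eq. (5) is typically solved with a generalized eigendecomposition, assuming the similarity matrices $S$ and $S^b$ of Eqs. (3) and (4) have already been built:

```python
import numpy as np
from scipy.linalg import eigh

def mfa_projection(X, S, Sb, d):
    """Solve the ratio-trace relaxation of Eq. (5).

    X  : D x N data matrix (columns are samples).
    S  : N x N intrinsic-graph similarities, Eq. (3).
    Sb : N x N penalty-graph similarities, Eq. (4).
    d  : target dimensionality; returns the D x d projection W.
    """
    L = np.diag(S.sum(axis=1)) - S        # Laplacian of the intrinsic graph
    Lb = np.diag(Sb.sum(axis=1)) - Sb     # Laplacian of the penalty graph
    A = X @ L @ X.T                       # within-class compactness term
    B = X @ Lb @ X.T                      # between-class separability term
    # Generalized eigenproblem A w = lambda B w; a tiny ridge keeps B
    # positive definite, as required by the solver.
    evals, evecs = eigh(A, B + 1e-6 * np.eye(B.shape[0]))
    return evecs[:, :d]                   # eigenvectors of the d smallest ratios
```

The smallest generalized eigenvalues minimize the trace ratio of Eq. (5); the same solver, with the roles of the two matrices exchanged, is reused for the RMFA objective introduced in Section 4.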
4. The proposed RMFA and Semi-RMFA algorithms

This section first introduces some notation, and then presents a new dimensionality reduction algorithm, Relevance Marginal Fisher Analysis (RMFA), which not only finds a low-dimensional embedding of the data samples but also keeps the ordinal information. Further, to exploit unlabeled data, a semi-supervised RMFA algorithm, named Semi-RMFA, is also developed.

For each query, the top $N$ returned images are collected. Let $X_L = [x_1, \ldots, x_l] \in \mathbb{R}^{D \times l}$ be a set of $l$ labeled samples, and let $z_i \in \{0, \ldots, r-1\}$ denote the corresponding relevance degree label, i.e., the extent to which a sample is relevant to the query. In addition to the labeled samples, let $X_U = [x_{l+1}, \ldots, x_N] \in \mathbb{R}^{D \times (N-l)}$ be a set of $(N-l)$ unlabeled samples. The aim of the dimensionality reduction algorithms is to find a transformation matrix $W = [w_1, \ldots, w_d] \in \mathbb{R}^{D \times d}$ that maps $X = [x_1, \ldots, x_N] \in \mathbb{R}^{D \times N}$ to the low-dimensional vectors $Y = [y_1, \ldots, y_N] \in \mathbb{R}^{d \times N}$ $(d \ll D)$. The transformation is implemented by $Y = W^T X$, whose one-dimensional case is $y_i = w^T x_i$. Without loss of generality, we consider only the case $r = 3$. Thus, the relevance labels "2", "1" and "0" stand for "very relevant", "relevant" and "irrelevant", and their corresponding groups are labeled as $O$, $P$ and $Q$, respectively. Moreover, the number of labeled samples in each group is $s$, and the corresponding labeled groups are denoted as $L_O$, $L_P$ and $L_Q$, respectively. It follows that $l = 3s$.

4.1. The proposed pairwise constraints of relevance-link and irrelevance-link

The idea of MFA can be regarded as a utilization of the domain knowledge of pairwise constraints, which has been widely used in machine learning [27]. Generally, the pairwise constraints include the must-link (sample pairs belonging to the same class) and the cannot-link (sample pairs belonging to different classes). The must-link constraint leads to the construction of the intrinsic graph in MFA, and the cannot-link constraint leads to the construction of the penalty graph. However, as mentioned before, ranking is different from classification. In ranking, samples with different relevance degrees may still possess similar characteristics because they are related to the same query. Therefore, must-link and cannot-link constraints cannot be employed in ranking directly. To adapt these concepts to the ranking application, the new concepts of the relevance-link constraint and the irrelevance-link constraint are proposed in the following.

The relevance-link constraint is a pairwise constraint whose samples are relevant to each other, that is, they have similar visual content in image ranking applications. On the contrary, the irrelevance-link constraint is one whose samples are irrelevant to each other. Fig. 3 shows the pairwise constraints of relevance-link and irrelevance-link, where $C_{IJ}$ denotes the pairwise constraint between groups $I$ and $J$. For example, $C_{OP}$ means that the two samples are from groups $O$ and $P$ respectively, and $C_{QQ}$ represents the constraint that both samples belong to group $Q$. As can be seen, there are six pairwise relationships. Through extensive experimental observations, we find that in most cases there are few visual similarities among samples in group $Q$, and also few between any two samples from groups $O$ and $Q$. On the contrary, there are visual similarities in the other four pairwise relationships. Therefore, the pairwise relationships $C_{QQ}$ and $C_{OQ}$ are denoted as the irrelevance-link constraints, and $C_{OO}$, $C_{OP}$, $C_{PP}$ and $C_{PQ}$ are named the relevance-link constraints.

Fig. 3. Pairwise constraints of relevance-link and irrelevance-link, where $C_{OO}$, $C_{OP}$, $C_{PP}$ and $C_{PQ}$ are relevance-link constraints, and $C_{QQ}$ and $C_{OQ}$ are irrelevance-link constraints.

4.2. The objective function of RMFA

Inspired by the ideas of the intrinsic graph and the penalty graph in MFA, this paper develops the concepts of the relevance graph and the irrelevance graph based on the domain knowledge of relevance-link and irrelevance-link pairwise constraints. The relevance graph characterizes the compactness of the relevance-link constraints, while the irrelevance graph characterizes the separability of the irrelevance-link constraints; both are computed for each query, as illustrated in Fig. 4. Since the relevance degrees differ, the link strengths also differ across relevance-link constraints. For instance, the sample pairs in $C_{OO}$ have higher visual similarities than those in $C_{OP}$; thus the link strengths in $C_{OO}$ are higher than those in $C_{OP}$.

Following the graph embedding framework [26], the objective function of the relevance graph is defined as minimizing the following expression:

$$J_R(W) = \frac{1}{2} \sum_{(x_i, x_j) \in N_R} \| W^T x_i - W^T x_j \|^2 S^R_{ij} = \mathrm{tr}\big(W^T X_L (D^R - S^R) X_L^T W\big) = \mathrm{tr}\big(W^T X_L L_R X_L^T W\big), \quad (6)$$

where $N_R$ indicates the index set of relevance-link constraints, which includes $C_{OO}$, $C_{OP}$, $C_{PP}$ and $C_{PQ}$; $S^R$ is a similarity matrix constructed to model the adjacency relationships of the data pairs under relevance-link constraints; $S^R_{ij}$ measures the similarity of $x_i$ and $x_j$; $D^R$ is a diagonal matrix with elements $D^R_{ii} = \sum_j S^R_{ij}$; and the Laplacian matrix is $L_R = D^R - S^R$.

The construction of the similarity matrix is an important step in graph embedding algorithms. Based on the observation that different relevance-link constraints have different link strengths, $S^R_{ij}$ is defined as:

$$S^R_{ij} = \begin{cases} 1, & \text{if } (x_i, x_j) \in C_{OO} \\ t, & \text{if } (x_i, x_j) \in C_{OP} \text{ or } C_{PQ} \text{ or } C_{PP} \\ 0, & \text{otherwise,} \end{cases} \quad (7)$$

where $(x_i, x_j)$ means that $x_i$ and $x_j$ form a data pair, $t = \exp\left(-\frac{\|x_i - x_j\|^2}{2\sigma^2}\right)$, $\|\cdot\|$ denotes the $L_2$-norm, and $\sigma$ is the variance. Clearly $0 < t < 1$. The data similarity in group $O$ is set to "1" because its samples are all "very relevant" to the query and thus highly similar to each other. The samples in $C_{OP}$, $C_{PQ}$ and $C_{PP}$ have a certain degree of similarity, so their values are represented by $t$. The data in $C_{OQ}$ and $C_{QQ}$ are irrelevant to each other, so their values are set to "0".
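As an illustration of Eq. (7), the following sketch (our reading of the definition, not the authors' code) builds $S^R$ for a set of labeled samples, encoding groups $O$, $P$, $Q$ by their relevance scores 2, 1, 0:

```python
import numpy as np

def relevance_similarity(X_L, z, sigma=1.0):
    """Build the relevance-graph similarity matrix S^R of Eq. (7).

    X_L   : D x l matrix of labeled samples (columns).
    z     : length-l relevance labels, 2 = O ("very relevant"),
            1 = P ("relevant"), 0 = Q ("irrelevant").
    sigma : bandwidth of the heat kernel defining t.
    """
    l = X_L.shape[1]
    SR = np.zeros((l, l))
    for i in range(l):
        for j in range(l):
            if i == j:
                continue
            pair = {z[i], z[j]}
            if pair == {2}:                      # C_OO: both "very relevant"
                SR[i, j] = 1.0
            elif pair in ({2, 1}, {1}, {1, 0}):  # C_OP, C_PP, C_PQ
                d2 = np.sum((X_L[:, i] - X_L[:, j]) ** 2)
                SR[i, j] = np.exp(-d2 / (2 * sigma ** 2))  # t in Eq. (7)
            # C_OQ and C_QQ pairs stay 0: they enter S^I of Eq. (9) instead
    return SR
```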
Fig. 4. Relevance graph and irrelevance graph, where the link strengths of $C_{OO}$, $C_{QQ}$ and $C_{OQ}$ are set to 1 and those of $C_{OP}$, $C_{PP}$ and $C_{PQ}$ are set to $t$ ($0 < t < 1$). Note that only one edge is drawn for each constraint for clarity.
Meanwhile, the objective function of the irrelevance graph is defined as maximizing the following expression:

$$J_I(W) = \frac{1}{2} \sum_{(x_i, x_j) \in N_I} \| W^T x_i - W^T x_j \|^2 S^I_{ij} = \mathrm{tr}\big(W^T X_L (D^I - S^I) X_L^T W\big) = \mathrm{tr}\big(W^T X_L L_I X_L^T W\big), \quad (8)$$

where $N_I$ indicates the index set of irrelevance-link constraints, which includes $C_{OQ}$ and $C_{QQ}$; $S^I$ is a similarity matrix constructed to model the adjacency relationships of the data pairs under irrelevance-link constraints; $D^I$ is a diagonal matrix with elements $D^I_{ii} = \sum_j S^I_{ij}$; and the Laplacian matrix is $L_I = D^I - S^I$. The elements of $S^I$ are defined as:

$$S^I_{ij} = \begin{cases} 1, & \text{if } (x_i, x_j) \in C_{OQ} \text{ or } C_{QQ} \\ 0, & \text{otherwise.} \end{cases} \quad (9)$$

Finally, the objective function of the RMFA algorithm is expressed as:

$$W^{*} = \arg\max_{W} \frac{J_I(W)}{J_R(W)} = \arg\max_{W} \frac{\mathrm{tr}\big(W^T X_L L_I X_L^T W\big)}{\mathrm{tr}\big(W^T X_L L_R X_L^T W\big)}. \quad (10)$$

4.3. The algorithm procedure of RMFA

The procedure of the proposed RMFA algorithm is stated below; a minimal sketch of the whole procedure follows step 4.

1) PCA projection: Similar to [26], to avoid the singularity problem and reduce noise, Principal Component Analysis (PCA) is first adopted to project $X$ into a subspace, throwing away the smallest principal components so as to retain 99% of the energy. For convenience, we still use $x_i$ to denote the data samples in the PCA subspace in the following steps.

2) Adjacency graph construction: Two graphs with $l$ nodes are constructed. Their edges are determined by the groups ($L_O$, $L_P$, $L_Q$) to which the connected node pairs belong. In the relevance graph, the edges are assigned the values "0", "1" and "t" according to Eq. (7). In the irrelevance graph, the edges are assigned the values "0" and "1" according to Eq. (9).

3) Eigen-problem: Compute the eigenvectors with respect to the non-zero eigenvalues of the generalized eigenvector problem:

$$X_L L_I X_L^T w_i = \lambda_i X_L L_R X_L^T w_i, \quad (11)$$

where $w_i$ is a generalized eigenvector and $\lambda_i$ is the corresponding eigenvalue. To guarantee the nonsingularity of the matrix $X_L L_R X_L^T$, we apply the idea of regularization [32] by adding constant values to its diagonal elements, i.e., $X_L L_R X_L^T + \alpha E$, where $E$ is an identity matrix and $\alpha > 0$.

4) Graph embedding: Let the column vectors $w_1, \ldots, w_d$, ordered according to the $d$ largest eigenvalues, be the solutions of Eq. (11). The embedding can then be expressed as:

$$X \to Y = W^T X, \qquad W = [w_1, \ldots, w_d], \quad (12)$$

where $y_i$ is a $d$-dimensional vector and $W$ is a $D \times d$ transformation matrix.
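The sketch below strings steps 1)–4) together, reusing the `relevance_similarity` helper from Section 4.2. It is a minimal illustration under our own conventions (labels $z \in \{0, 1, 2\}$, samples as columns), not the authors' implementation:

```python
import numpy as np
from scipy.linalg import eigh
from sklearn.decomposition import PCA

def rmfa(X_L, z, d=10, alpha=1e-3, sigma=1.0):
    """Sketch of RMFA (Section 4.3). X_L: D x l labeled samples."""
    z = np.asarray(z)
    # 1) PCA projection retaining 99% of the energy (sklearn expects rows).
    pca = PCA(n_components=0.99)
    Xp = pca.fit_transform(X_L.T).T

    # 2) Adjacency graphs: relevance graph via Eq. (7), irrelevance via Eq. (9).
    SR = relevance_similarity(Xp, z, sigma)
    O, Q = (z == 2), (z == 0)
    SI = (np.outer(Q, Q) | np.outer(O, Q) | np.outer(Q, O)).astype(float)  # C_QQ, C_OQ
    np.fill_diagonal(SI, 0.0)
    LR = np.diag(SR.sum(axis=1)) - SR
    LI = np.diag(SI.sum(axis=1)) - SI

    # 3) Regularized generalized eigen-problem of Eq. (11); Eq. (10) is
    #    maximized, so keep the eigenvectors of the d largest eigenvalues.
    A = Xp @ LI @ Xp.T
    B = Xp @ LR @ Xp.T + alpha * np.eye(Xp.shape[0])
    evals, evecs = eigh(A, B)
    W = evecs[:, ::-1][:, :d]

    # 4) Graph embedding, Eq. (12).
    Y = W.T @ Xp
    return W, Y, pca
```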
From the above solution process, it can be seen that the computational cost mainly consists of two parts: computing the transformation matrix $W$ and computing the embedding matrix $Y$. Computing $W$ requires a Singular Value Decomposition (SVD) step, whose complexity is $O(D^3)$ from Eq. (11). Generating $Y$ has complexity $O(dDN)$ from Eq. (12). Therefore, the computational complexity of RMFA scales as $O(D^3 + dDN)$. The storage complexity is also governed by $W$ and $Y$: they require $32Dd$ bits and $32dN$ bits respectively, for a total of $32d(D + N)$ bits.

4.4. Semi-supervised Relevance Marginal Fisher Analysis (Semi-RMFA)

In image ranking and many other practical applications, labeled training samples are fairly expensive to obtain. Consequently, the phenomenon of high dimensionality $D$ versus a low number of labeled samples $l$ arises, which cannot guarantee the generalization capability of machine learning algorithms; thus overfitting may
occur. Fortunately, unlabeled samples are readily available. Therefore, in order to use unlabeled data to achieve more satisfactory results, a semi-supervised version of RMFA is proposed, which incorporates both labeled and unlabeled samples into the learning procedure.

The key to most semi-supervised learning algorithms is the consistency assumption: nearby samples tend to have the same label or similar embeddings. In most semi-supervised dimensionality reduction algorithms, labeled samples are used to provide discriminant information, while unlabeled samples together with labeled ones are used to preserve the intrinsic geometric structure. A typical way to incorporate the information of unlabeled samples is to impose a regularizer [33]. Thus, the semi-supervised version of Eq. (10) is written as:

$$\arg\max_{W} \frac{\mathrm{tr}\big(W^T X_L L_I X_L^T W\big)}{\mathrm{tr}\big(W^T X_L L_R X_L^T W + \alpha J(W)\big)}, \quad (13)$$

where the regularizer $J(W)$ controls the learning complexity of the hypothesis family and the coefficient $\alpha$ controls the balance between the model complexity and the empirical loss. To preserve the intrinsic manifold structure of the whole dataset, the objective function of the global graph in the LPP algorithm [23] is used as the regularizer $J(W)$:

$$J(W) = \frac{1}{2} \sum_{i,j} \| W^T x_i - W^T x_j \|^2 S_{ij} = \mathrm{tr}\big(W^T X (D - S) X^T W\big) = \mathrm{tr}\big(W^T X L X^T W\big), \quad (14)$$

$$S_{ij} = \begin{cases} e^{-\frac{\|x_i - x_j\|^2}{2\sigma}}, & \text{if } x_i \in N_k(x_j) \text{ or } x_j \in N_k(x_i) \\ 0, & \text{otherwise,} \end{cases} \quad (15)$$

where $S$ is a similarity matrix modeling the adjacency relationships over the whole dataset, $N_k(x_j)$ denotes the set of $k$ nearest neighbors of $x_j$, $D$ is a diagonal matrix with elements $D_{ii} = \sum_j S_{ij}$, and the Laplacian matrix is $L = D - S$. Finally, with this data-dependent regularizer, the objective function of the Semi-RMFA algorithm is expressed as:

$$W^{*} = \arg\max_{W} \frac{J_I(W)}{J_R(W) + \alpha J(W)} = \arg\max_{W} \frac{\mathrm{tr}\big(W^T X_L L_I X_L^T W\big)}{\mathrm{tr}\big(W^T X_L L_R X_L^T W + \alpha W^T X L X^T W\big)}. \quad (16)$$

The algorithm procedure of Semi-RMFA is similar to that of RMFA; its flowchart is shown in Fig. 5.

Fig. 5. Flowchart of the Semi-RMFA algorithm: the relevance graph $J_R(W)$ and the irrelevance graph $J_I(W)$ are built from the labeled data, and the global graph $J(W)$ from both labeled and unlabeled data; the objective of Eq. (16) is solved by eigenvector computation, yielding the graph embedding $Y = W^T X$.

The computational and storage complexity of Semi-RMFA is similar to that of RMFA, apart from the additional $k$ nearest neighbor search required by the LPP regularizer of Eq. (14). Computing the distances between all pairs of samples costs $O(DN^2)$, and finding the $k$ nearest neighbors of all samples costs $O(kN^2)$. Thus, the computational complexities of the $k$ nearest neighbor search and of Semi-RMFA are $O((D+k)N^2)$ and $O((D+k)N^2 + D^3 + dDN)$, respectively. Since $k \ll D$ and $d \ll D$, the computational complexity of Semi-RMFA is approximately $O(D(N^2 + D^2))$.
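A sketch of the data-dependent regularizer graph of Eqs. (14) and (15) is given below (again an illustration under our conventions, not the authors' code); the resulting Laplacian is plugged into the denominator of Eq. (16) exactly as $L_R$ is used in Eq. (11):

```python
import numpy as np

def lpp_regularizer_laplacian(X, k=5, sigma=1.0):
    """Global-graph Laplacian L = D - S of Eqs. (14)-(15), computed over
    both labeled and unlabeled samples. X: D x N matrix (columns)."""
    N = X.shape[1]
    d2 = np.sum((X[:, :, None] - X[:, None, :]) ** 2, axis=0)  # pairwise squared distances
    nn = np.argsort(d2, axis=1)[:, 1:k + 1]    # k nearest neighbors, excluding self
    S = np.zeros((N, N))
    for i in range(N):
        S[i, nn[i]] = np.exp(-d2[i, nn[i]] / (2 * sigma))
    S = np.maximum(S, S.T)   # symmetrize: x_i in N_k(x_j) OR x_j in N_k(x_i)
    return np.diag(S.sum(axis=1)) - S
```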
5. Experimental results

This section demonstrates the effectiveness of the proposed RMFA and Semi-RMFA algorithms on a typical image ranking application, i.e., image search reranking [6–8,14,34,35]. Image search reranking aims at refining search performance by employing image or video visual information to reorder the initial text-based search results. It is a new paradigm following CBIR [5,22] and multimedia annotation [9] in the domain of multimedia content analysis and retrieval. A comprehensive survey of the literature can be found in [36].

Following the flowchart of Fig. 2, new image search reranking approaches with RMFA and Semi-RMFA are proposed, which proceed according to the following steps. First, given a query, a commercial text-based image search engine (e.g., Microsoft Bing Image Search) returns its search results. Second, the original high-dimensional image features of these initial search results are extracted to represent their visual contents. Then, some images are chosen to be labeled with relevance degrees. Next, the labeled data or all the data are exploited in the proposed RMFA or Semi-RMFA algorithm to map the visual features into the intrinsically low-dimensional space. It should be noted that the graphs in both RMFA and Semi-RMFA are constructed for each query. Finally, an LTR algorithm (e.g., Ranking SVM [19]) is employed, using the labeled data as training data, to build a
Table 1
Examples and number of queries of each category in the MSRA-MM 2.0 image dataset.

Category      | Number of queries | Examples
Animal        | 100 | Bee, Deer, Eagle
Cartoon       | 92  | Avatar, Diddl, Snoopy
Event         | 78  | Cycling, Diving, Fishing
NamedPerson   | 40  | Picasso, Albert Einstein
Object        | 295 | Bed, Jeep, Pluto
PeopleRelated | 68  | Girl, Kid, Fairy, Queen
Scene         | 48  | City, Sea, Tornados
TIME08        | 88  | Barack Obama, Paul Allen
Misc          | 288 | Amazon, Happy, Oops
Table 2
Parameter details of each dimensionality reduction method.

Method    | Reduced dimensionality | Weight (α) | PCA as a preprocessing step? | Number of neighbors in graph (k)
PCA       | 150 | N/A | N/A | N/A
LPP       | 150 | N/A | Yes | 5
FDA       | 2   | N/A | Yes | N/A
MFA       | 20  | N/A | Yes | 5
SELF      | 5   | 0.5 | No  | 7
MMP       | 150 | 0.5 | Yes | 5
RMFA      | 10  | N/A | Yes | 5
Semi-RMFA | 20  | 1   | Yes | 5
ranking function, which reorders all samples using the reduced low-dimensional visual features. Thus, a new reordered list is obtained. In the following, the datasets and methodologies are first introduced, and then the effectiveness of the proposed algorithms is demonstrated through extensive experiments and comprehensive comparisons.
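For the last step of the pipeline, Ranking SVM [19] is trained on preference pairs. The following sketch approximates it with the well-known pairwise transform and a linear SVM (scikit-learn's LinearSVC is our stand-in here, not the SVMrank implementation used in the experiments):

```python
import numpy as np
from itertools import combinations
from sklearn.svm import LinearSVC

def ranking_svm_fit(Y, z, C=1.0):
    """Pairwise-transform approximation of Ranking SVM.

    Y : d x n reduced features (columns), z : relevance degrees in {0, 1, 2}.
    Returns a weight vector w; images are reranked by descending w @ y.
    """
    diffs, signs = [], []
    for i, j in combinations(range(Y.shape[1]), 2):
        if z[i] == z[j]:
            continue                          # ties carry no ordering information
        s = 1 if z[i] > z[j] else -1
        diffs.extend([Y[:, i] - Y[:, j], Y[:, j] - Y[:, i]])  # both directions
        signs.extend([s, -s])                 # keeps the two classes balanced
    clf = LinearSVC(C=C).fit(np.array(diffs), np.array(signs))
    return clf.coef_.ravel()

# Reranking: order = np.argsort(-(w @ Y))
```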
5.1. Experimental settings

The proposed RMFA and Semi-RMFA algorithms are validated on the popular and publicly available MSRA-MM 1.0 [37] and MSRA-MM 2.0 [38] image datasets. Both datasets were built by exploring the query log of Microsoft Bing Image Search and selecting a set of representative queries; for each query, about the top 1000 images are collected. Specifically, the MSRA-MM 1.0 dataset consists of 68 popular queries covering a wide variety of categories, including objects, people, events, entertainment and locations, with 65,443 images in total. The MSRA-MM 2.0 image dataset is an extended version of MSRA-MM 1.0, to which 1097 frequently used queries were added. These queries are manually classified into 9 categories, i.e., "Animal", "Cartoon", "Event", "NamedPerson", "Object", "PeopleRelated", "Scene", "Misc" and "TIME08". The total number of images is around one million, which makes it one of the largest datasets in the image ranking domain. Table 1 shows the number of queries in each category, and some example images are shown in Fig. 1.

In both datasets, each image was manually assigned a relevance degree with respect to its query: "irrelevant", "relevant" or "very relevant", indicated by the scores 0, 1 and 2, respectively. More importantly, the datasets provide the original ranking information of the text-based search engine, against which the proposed methods can be evaluated. Moreover, the provided features are adopted to make the results reproducible and comparable. They are seven global features: (1) block-wise color moment, (2) HSV color histogram, (3) RGB color histogram, (4) color correlogram, (5) edge distribution histogram, (6) wavelet texture and (7) face features. The overall dimensionality of these features is 899.

NDCG (Normalized Discounted Cumulative Gain) is a commonly adopted metric for evaluating a search engine's performance, especially when there are more than two relevance degrees [39]. Therefore, it is adopted to evaluate the ranking performance. Given a query $q$, the NDCG score at depth $p$ in the ranked documents is defined by:

$$\mathrm{NDCG}@p = Z_p \sum_{j=1}^{p} \frac{2^{r_j} - 1}{\log(1 + j)}, \quad (17)$$

where $r_j$ is the rating of the $j$th document and $Z_p$ is a normalization constant chosen so that a perfect ranking's NDCG@$p$ value is 1. For the MSRA-MM 1.0 image dataset, the final performance is obtained by averaging NDCG over the 68 queries; for the MSRA-MM 2.0 image dataset, it is obtained by averaging NDCG within each category. It should be noted that the training annotated images are included in the reported NDCG.
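As a concrete reading of Eq. (17), a minimal NDCG@p implementation might look as follows (the logarithm base is left unspecified in Eq. (17); it cancels between the DCG and the ideal DCG used for $Z_p$):

```python
import numpy as np

def ndcg_at_p(ratings, p):
    """NDCG@p of Eq. (17). `ratings` are the relevance degrees (0, 1, 2)
    of the returned images, listed in ranked order."""
    gains = 2.0 ** np.asarray(ratings, dtype=float) - 1.0
    discounts = 1.0 / np.log(1.0 + np.arange(1, len(gains) + 1))   # log(1 + j)
    dcg = np.sum((gains * discounts)[:p])
    ideal = np.sum((np.sort(gains)[::-1] * discounts)[:p])         # perfect ranking
    return dcg / ideal if ideal > 0 else 0.0                       # Z_p = 1 / ideal DCG
```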
Table 3
Performance comparison with different dimensionality reduction approaches. Results are mean ± standard deviation over the repeated runs; for "Text" only the mean is available. The best performance at each depth (bold in the original) is achieved by Semi-RMFA.

NDCG@ | Text  | Baseline    | PCA         | FDA         | MFA         | SELF        | MMP         | LPP         | RMFA        | Semi-RMFA
10    | 0.583 | 0.670±0.026 | 0.647±0.028 | 0.667±0.025 | 0.668±0.025 | 0.696±0.015 | 0.806±0.008 | 0.804±0.007 | 0.703±0.014 | 0.874±0.010
20    | 0.574 | 0.665±0.011 | 0.653±0.014 | 0.663±0.013 | 0.666±0.013 | 0.677±0.008 | 0.739±0.005 | 0.740±0.005 | 0.693±0.005 | 0.778±0.007
30    | 0.564 | 0.659±0.012 | 0.650±0.014 | 0.655±0.013 | 0.657±0.011 | 0.665±0.009 | 0.708±0.008 | 0.707±0.008 | 0.676±0.008 | 0.739±0.006
40    | 0.556 | 0.649±0.013 | 0.642±0.014 | 0.646±0.014 | 0.648±0.013 | 0.652±0.010 | 0.684±0.008 | 0.686±0.008 | 0.662±0.008 | 0.713±0.006
50    | 0.553 | 0.641±0.012 | 0.635±0.013 | 0.639±0.013 | 0.640±0.012 | 0.645±0.012 | 0.669±0.008 | 0.673±0.008 | 0.654±0.008 | 0.697±0.005
60    | 0.549 | 0.636±0.012 | 0.631±0.011 | 0.634±0.013 | 0.636±0.011 | 0.641±0.010 | 0.661±0.006 | 0.664±0.006 | 0.648±0.007 | 0.685±0.005
70    | 0.544 | 0.634±0.010 | 0.629±0.010 | 0.632±0.011 | 0.633±0.011 | 0.640±0.007 | 0.656±0.006 | 0.657±0.006 | 0.644±0.007 | 0.679±0.004
80    | 0.541 | 0.634±0.009 | 0.630±0.010 | 0.632±0.010 | 0.633±0.009 | 0.640±0.007 | 0.654±0.005 | 0.656±0.004 | 0.642±0.008 | 0.675±0.004
90    | 0.539 | 0.633±0.008 | 0.630±0.010 | 0.632±0.009 | 0.633±0.008 | 0.640±0.006 | 0.653±0.004 | 0.654±0.003 | 0.641±0.007 | 0.672±0.004
100   | 0.537 | 0.636±0.008 | 0.633±0.009 | 0.634±0.009 | 0.635±0.008 | 0.643±0.006 | 0.654±0.004 | 0.655±0.004 | 0.645±0.006 | 0.674±0.004
Table 4
Performance gains (%) against "Baseline" at the depths of 10, 50 and 100, respectively.

Method    | NDCG@10 | NDCG@50 | NDCG@100
PCA       | −3.43   | −0.94   | −0.47
LPP       | 20.00   | 5.00    | 2.99
FDA       | −0.45   | −0.31   | −0.31
MFA       | −0.30   | −0.16   | −0.16
SELF      | 3.88    | 0.62    | 1.10
MMP       | 20.30   | 4.37    | 2.83
RMFA      | 4.93    | 2.03    | 1.42
Semi-RMFA | 30.45   | 8.74    | 5.97
Several popular feature dimensionality reduction methods are used for comparison: (1) unsupervised algorithms: PCA [24] and LPP [23]; (2) supervised algorithms: FDA [40] and MFA [5]; and (3) semi-supervised algorithms: SELF [28] and MMP [22]. Since no ranking information is used in conventional dimensionality reduction algorithms, both the supervised and semi-supervised algorithms regard the three relevance groups as three categories. Table 2 shows the parameter details of each algorithm. In addition, "Text" refers to the performance of the original text-based search, and "Baseline" refers to the performance obtained with the original 899-D features.

In both RMFA and Semi-RMFA, we set s = 5. Because irrelevant images are much easier to obtain than relevant ones for a given query, it is reasonable to pick them up automatically by randomly sampling images not associated with the textual query. Thus only the "very relevant" and "relevant" images need to be labeled; for convenience, we use the same labeled number s for both. In addition, both the coefficient α in Semi-RMFA and the penalization parameter in the Ranking SVM model are set to 1. The reduced dimensionality of each dimensionality reduction approach is tuned over {5, 10, 15, ..., 200}, and the optimal value is selected according to the average NDCG@10 over the 68 queries of the MSRA-MM 1.0 dataset. As for FDA, its reduced dimensionality is determined by τ − 1, where τ is the number of categories [40]; since FDA views the three relevance degrees as three classes, its dimensionality is 2. These parameters are then fixed for all queries in MSRA-MM 2.0. We adopt Ranking SVM [19] for ranking in all the experiments, repeat each experiment six times, and report the mean and standard deviation. The impact of the important parameters is discussed in the following subsections.

5.2. Experiments on MSRA-MM 1.0 dataset

This subsection demonstrates the performance of the proposed RMFA and Semi-RMFA algorithms on the MSRA-MM 1.0 dataset. Table 3 reports the NDCG results at depths {10, 20, 30, 40, 50, 60, 70, 80, 90, 100} for the different dimensionality reduction approaches. In addition, the performance gains against "Baseline" at depths 10, 50 and 100 are shown in Table 4. From Tables 3 and 4, we can observe the following:
Table 5
t-test results at the 5% significance level for Semi-RMFA versus the other methods; "1" indicates a significant improvement, "0" indicates no significant difference.

Semi-RMFA versus: | NDCG@10 | NDCG@50 | NDCG@100
Text              | 1 | 1 | 1
Baseline          | 1 | 1 | 1
PCA               | 1 | 1 | 1
LPP               | 1 | 1 | 0
FDA               | 1 | 1 | 1
MFA               | 1 | 1 | 1
SELF              | 1 | 1 | 1
MMP               | 1 | 1 | 0
RMFA              | 1 | 1 | 1
Fig. 6. Average reranking time (CPU seconds) of each method for a query.
(1) All dimensionality reduction algorithms together with "Baseline" outperform "Text" significantly, which validates the usefulness of visual features in image search reranking. (2) The performances of PCA, FDA and MFA are inferior to "Baseline", while those of SELF, MMP, LPP and the proposed RMFA and Semi-RMFA are superior to it. The algorithms superior to "Baseline" are all based on the assumption of a local manifold structure, so it can be concluded that manifold-based dimensionality reduction algorithms are more helpful than others for image ranking; this is because relevant images lie on a manifold in the intrinsic visual feature space. However, not all manifold-based dimensionality reduction algorithms are effective (e.g., MFA), so elaborate design is necessary. (3) RMFA achieves 20.58% and 4.93% performance gains against "Text" and "Baseline" at NDCG@10, which indicates the effectiveness of the relevance graph and the irrelevance graph. Moreover, the performance of RMFA is similar to SELF but inferior to LPP, MMP and Semi-RMFA, which indicates that unlabeled data are needed to discover the intrinsic embedding structure. (4) Semi-RMFA consistently outperforms the other methods at all depths. Taking NDCG@10 as an example, Semi-RMFA is better than PCA, FDA, MFA, SELF, RMFA, MMP and LPP by 35.05%, 31.10%, 30.76%, 24.56%, 25.20%, 8.46% and 8.65%, respectively. (5) The small standard deviations shown in Table 3 demonstrate the robustness of the proposed methods.

Moreover, a t-test at the 5% significance level is employed to test the statistical significance of Semi-RMFA versus the
other algorithms; the results are shown in Table 5. It can be seen that Semi-RMFA is significantly superior to the other algorithms, especially at depths 10 and 50.

Fig. 6 shows the average search reranking time of each method for a query, measured on a desktop computer with a 2.53 GHz CPU and 4 GB of RAM. It can be observed that LPP and MMP cost the most time, while MFA, FDA and RMFA cost the least. This is because LPP and MMP use all the images and have higher reduced dimensionalities, whereas MFA, FDA and RMFA only use the labeled images and have lower reduced dimensionalities. The time of Semi-RMFA lies in between, at about 1.5 s, which is an acceptable latency for a reranking application.

Fig. 7 illustrates the impact of the labeled number s over {2, 5, 8, 11}. It can be observed that more labels generally bring higher performance. Experiments are also carried out under different reduced dimensionalities to further disclose the relationship between the dimensionality and the ranking performance, as shown in Fig. 8. From the figure, it can be seen that the best dimensionalities for RMFA and Semi-RMFA are 10 and 20 respectively, and that a higher dimensionality only brings slight performance changes.

5.3. Experiments on MSRA-MM 2.0 dataset

This subsection shows the performance of the proposed methods on the MSRA-MM 2.0 dataset. Fig. 9 illustrates the NDCG results for each category at depths 10, 50 and 100. It can be observed that Semi-RMFA consistently outperforms the others for each category at all depths, which effectively shows its superiority. Moreover, as on the MSRA-MM 1.0 dataset, the performance of RMFA is similar to SELF and inferior to LPP, MMP and Semi-RMFA, which further indicates that unlabeled data help discover the intrinsic embedding structure. Fig. 10 shows the average performance over all 9 categories for each dimensionality reduction method, including "Text" and "Baseline", at depths 10, 50 and 100. It can be seen that the performance of every method degrades as the depth increases, a quite reasonable phenomenon in the information retrieval domain. It can also be observed that Semi-RMFA performs the best, followed by MMP, LPP, RMFA and SELF, while MFA, PCA and FDA give worse outcomes. The
Fig. 7. Performance comparison with different labeled numbers s. The dashed lines denote the performance of RMFA, and the solid lines denote that of Semi-RMFA.
Fig. 8. Performance comparison under different reduced dimensionalities d. The dimensionality is 5, 10, 15 and 20 for RMFA, and 10, 20, 30 and 40 for Semi-RMFA.
stable performance on such a large dataset demonstrates the robustness of the proposed methods.

5.4. Comparison with state-of-the-art image search reranking methods

To further prove the effectiveness of the proposed scheme of Fig. 2 and the proposed dimensionality reduction algorithms, the following state-of-the-art image search reranking methods are used for comparison:

1) Bayesian reranking [35]: an image search reranking method based on the Bayesian framework. It maximizes the ranking score consistency among visually similar samples while minimizing the ranking distance, which
represents the disagreement between the objective ranking list and the initial text-based one.

2) Context reranking [34]: a typical graph-based image search reranking method which formulates reranking as a random walk over a context graph, where images are nodes and the edges between them are weighted by multimodal similarities.

3) Multimodal graph-based reranking [6]: a recently proposed graph-based image search reranking method exploiting multiple modalities. Seven graphs are built from the seven global features provided by the datasets (899-D in total), and the results are fused to obtain the final reranking result. The approach simultaneously learns the relevance degrees, the weights of the modalities, and the distance metric with its scaling for each modality.
Fig. 9. Performance comparison for each category at (a) NDCG@10, (b) NDCG@50 and (c) NDCG@100, respectively.

Fig. 10. Average performance on the MSRA-MM 2.0 image dataset for all methods at NDCG@10, NDCG@50 and NDCG@100.
The methods above are denoted as "Bayesian", "Context" and "Multimodal", respectively. Again, "Text" refers to the performance of the initial text-based search results, and "Baseline" represents the performance of directly using the original 899-D features. Table 6 reports the average NDCG@100 obtained by the different methods for each category of queries, where "MM-1.0" represents the MSRA-MM 1.0 image dataset. It can be seen that all the reranking algorithms improve the
original search results, i.e., "Text". Specifically, the Multimodal method performs the best in the categories "Animal", "Cartoon", "Object", "Scene" and "TIME08", while the proposed Semi-RMFA method performs the best in the other five, i.e., "MM-1.0", "Event", "NamedPerson", "PeopleRelated" and "Misc". Similar observations hold for the NDCG@10 and NDCG@50 measurements. Moreover, the last row of Table 6 shows the t-test significance results for Semi-RMFA versus the others: Semi-RMFA is significantly superior to all the other methods except the Multimodal method. The reason for this superiority lies in the facts that LTR is better at mining ranking information and that the proposed RMFA and Semi-RMFA methods are better at discovering the intrinsic structure of the visual features while keeping the ordinal information. Since the ranking model and the features are two of the most important factors in image ranking applications, it is not hard to see why the proposed Ranking Dimensionality Reduction scheme and the RMFA/Semi-RMFA methods perform well.

Although the performance of Semi-RMFA is only slightly better than that of the Multimodal method, its computational and storage complexity is much smaller. The computational complexity of the Multimodal method is mainly dominated by two parts: the $k$ nearest neighbor search and the multi-graph fusion. The complexity of the $k$ nearest neighbor search is $O((D + k_1 k_2) N^2)$, since a graph is constructed for each modality, where $k_1$ is the size of the neighborhood and $k_2$ is the number of modalities. The complexity of the multi-graph fusion is $O\big(T_1 T_2 N k_1 \sum_{i=1}^{k_2} D_{k_2}^2 + T_3 k_2\big)$, where $T_1$, $T_2$, $T_3$ are iteration
counts. According to the parameters set in [6], the overall computational complexity of the Multimodal method is one order of magnitude larger than that of Semi-RMFA. Its storage complexity is $32 k_2 N^2$ bits, which is about $N$ times larger than the $32d(D+N)$ bits of Semi-RMFA.
6. Conclusion and future work

This paper has presented a novel "Ranking Dimensionality Reduction" scheme by incorporating ranking
Table 6
Performance comparison for each category of different algorithms at NDCG@100. The best result in each row is shown in bold in the original. The last row is the result of a t-test at the 5% significance level for Semi-RMFA versus the others, where "1" indicates a significant improvement and "0" indicates no significant difference.

Category      | Text  | Baseline | Bayesian | Context | Multimodal | RMFA  | Semi-RMFA
MM-1.0        | 0.537 | 0.636    | 0.542    | 0.541   | 0.568      | 0.643 | 0.674
Animal        | 0.734 | 0.737    | 0.775    | 0.759   | 0.791      | 0.734 | 0.754
Cartoon       | 0.807 | 0.855    | 0.859    | 0.828   | 0.865      | 0.850 | 0.858
Event         | 0.788 | 0.797    | 0.779    | 0.788   | 0.811      | 0.798 | 0.812
NamedPerson   | 0.908 | 0.919    | 0.916    | 0.905   | 0.940      | 0.934 | 0.954
Object        | 0.703 | 0.729    | 0.723    | 0.717   | 0.745      | 0.728 | 0.737
PeopleRelated | 0.714 | 0.722    | 0.703    | 0.710   | 0.742      | 0.735 | 0.755
Scene         | 0.702 | 0.737    | 0.766    | 0.752   | 0.792      | 0.728 | 0.759
TIME08        | 0.830 | 0.771    | 0.844    | 0.854   | 0.870      | 0.778 | 0.823
Misc          | 0.736 | 0.785    | 0.760    | 0.753   | 0.790      | 0.783 | 0.804
Mean          | 0.747 | 0.769    | 0.767    | 0.760   | 0.791      | 0.771 | 0.793
t-test result | 1     | 1        | 1        | 1       | 0          | 1     | N/A
information into conventional dimensionality reduction methods. The scheme is specially designed for the LTR-based image ranking application, and aims at discovering the intrinsic structure of the data while keeping the ordinal information. Based on this scheme and MFA, the Relevance Marginal Fisher Analysis (RMFA) algorithm and its semi-supervised version (Semi-RMFA) have been developed by constructing the relevance graph and the irrelevance graph from the pairwise constraint information of relevance-links and irrelevance-links. A comprehensive set of image search reranking experiments has been performed on two large, real-world datasets: the MSRA-MM 1.0 and MSRA-MM 2.0 image datasets. These comparative studies clearly demonstrate that the proposed algorithms not only outperform conventional classification-oriented dimensionality reduction algorithms, but also achieve superior performance against state-of-the-art image search reranking methods.

The proposed Ranking Dimensionality Reduction scheme is general enough to be applied to other multimedia ranking domains such as personalized recommendation and CBIR. Moreover, we plan to introduce ranking information into other popular dimensionality reduction algorithms.
Acknowledgments

This work was supported in part by the National Basic Research Program of China (973 Program) (Grant no. 2014CB340400), the National Natural Science Foundation of China (Grant nos. 61271325, 61472273, 61172121, 61271412 and 61222109), the Elite Scholar Program of Tianjin University (no. 2015XRG-0014), and the Excellent Young Scholar Program of the Tianjin University of Technology and Education (Grant no. RC14-46).
References

[1] Q. Jia, X. Tian, Query difficulty estimation via relevance prediction for image retrieval, Signal Process. 110 (2015) 232–243.
[2] C. Jin, S. Jin, Automatic image annotation using feature selection based on improving quantum particle swarm optimization, Signal Process. 109 (2015) 172–181.
[3] M. Jian, K. Lam, Face-image retrieval based on singular values and potential-field representation, Signal Process. 100 (2014) 9–15.
[4] W. Bian, D. Tao, Biased discriminant Euclidean embedding for content-based image retrieval, IEEE Trans. Image Process. 19 (2) (2010) 545–554.
[5] D. Xu, S. Yan, D. Tao, S. Lin, H. Zhang, Marginal Fisher Analysis and its variants for human gait recognition and content based image retrieval, IEEE Trans. Image Process. 16 (11) (2007) 2811–2821.
[6] M. Wang, H. Li, D. Tao, K. Lu, X. Wu, Multimodal graph-based reranking for web image search, IEEE Trans. Image Process. 21 (11) (2012) 4649–4661.
[7] Y. Yang, W. Hsu, H. Chen, Online reranking via ordinal informative concepts for context fusion in concept detection and video search, IEEE Trans. Circuits Syst. Video Technol. 19 (12) (2009) 1880–1890.
[8] J. Yu, Y. Rui, B. Chen, Exploiting click constraints and multi-view features for image re-ranking, IEEE Trans. Multimedia 16 (1) (2014) 159–168.
[9] J. Weston, S. Bengio, N. Usunier, Large scale image annotation: learning to rank with joint word-image embeddings, Mach. Learn. 81 (1) (2010) 21–35.
[10] D. Liu, X. Hua, L. Yang, M. Wang, H. Zhang, Tag ranking, in: Proceedings of the WWW, 2009, pp. 351–360.
[11] Z. Pan, X. You, H. Chen, D. Tao, B. Pang, Generalization performance of magnitude-preserving semi-supervised ranking with graph-based regularization, Inf. Sci. 221 (2013) 284–296.
[12] B. Xu, J. Bu, C. Chen, D. Cai, X. He, W. Liu, J. Luo, Efficient manifold ranking for image retrieval, in: Proceedings of the ACM SIGIR, 2011, pp. 525–534.
[13] Y. Liu, Y. Liu, S. Zhong, K. Chan, Semi-supervised manifold ordinal regression for image ranking, in: Proceedings of the ACM MM, 2011, pp. 1393–1396.
[14] B. Geng, L. Yang, C. Xu, X. Hua, Content-aware ranking for visual search, in: Proceedings of the IEEE CVPR, 2010, pp. 3400–3407.
[15] T. Liu, Learning to Rank for Information Retrieval, Springer, Berlin, 2011.
[16] F. Faria, A. Veloso, H. Almeida, E. Valle, R. Torres, M. Goncalves, Learning to rank for content-based image retrieval, in: Proceedings of the ACM MIR, 2010, pp. 285–294.
[17] Y. Hu, M. Li, N. Yu, Multiple-instance ranking: learning to rank images for image retrieval, in: Proceedings of the IEEE CVPR, 2008, pp. 1–8.
[18] Y. Li, C. Zhou, B. Geng, C. Xu, H. Liu, A comprehensive study on learning to rank for content-based image retrieval, Signal Process. 93 (2013) 1426–1434.
[19] T. Joachims, Optimizing search engines using clickthrough data, in: Proceedings of the ACM SIGKDD, 2002, pp. 133–142.
[20] X. Tian, Y. Lu, L. Yang, Q. Tian, Learning to judge image search results, in: Proceedings of the ACM MM, 2011, pp. 363–372.
[21] C. Li, Q. Liu, J. Liu, H. Lu, Ordinal regularized manifold feature extraction for image ranking, Signal Process. 93 (2013) 1651–1661.
[22] X. He, D. Cai, J. Han, Learning a maximum margin subspace for image retrieval, IEEE Trans. Knowl. Data Eng. 20 (2) (2008) 189–201.
[23] X. He, S. Yan, Y. Hu, P. Niyogi, H. Zhang, Face recognition using Laplacianfaces, IEEE Trans. Pattern Anal. Mach. Intell. 27 (3) (2005) 328–340.
[24] I. Jolliffe, Principal Component Analysis, Springer-Verlag, New York, 1986.
[25] N. Lawrence, Spectral dimensionality reduction via maximum entropy, in: Proceedings of the AISTATS, 2011, pp. 51–59.
[26] S. Yan, D. Xu, B. Zhang, H. Zhang, Q. Yang, S. Lin, Graph embedding and extensions: a general framework for dimensionality reduction, IEEE Trans. Pattern Anal. Mach. Intell. 29 (1) (2007) 40–51.
[27] D. Zhang, Z. Zhou, S. Chen, Semi-supervised dimensionality reduction, in: Proceedings of the SIAM ICDM, 2007, pp. 629–634.
[28] M. Sugiyama, T. Ide, S. Nakajima, J. Sese, Semi-supervised local Fisher Discriminant Analysis for dimensionality reduction, Mach. Learn. 78 (1–2) (2010) 35–61.
[29] X. Tian, D. Tao, X. Hua, X. Wu, Active reranking for web image search, IEEE Trans. Image Process. 19 (3) (2010) 805–820.
[30] X. Geng, T. Liu, T. Qin, H. Li, Feature selection for ranking, in: Proceedings of the 30th Annual International ACM SIGIR, 2007, pp. 407–414.
[31] H. Lai, Y. Pan, Y. Tang, R. Yu, FSMRank: feature selection algorithm for learning to rank, IEEE Trans. Neural Netw. Learn. Syst. 24 (6) (2013) 940–952.
[32] J. Friedman, Regularized discriminant analysis, J. Am. Stat. Assoc. 84 (405) (1989) 165–175.
[33] D. Cai, X. He, J. Han, Semi-supervised discriminant analysis, in: Proceedings of the IEEE ICCV, 2007, pp. 1–7.
[34] W. Hsu, L. Kennedy, S. Chang, Video search reranking through random walk over document-level context graph, in: Proceedings of the ACM MM, 2007, pp. 971–980.
[35] X. Tian, L. Yang, J. Wang, Y. Yang, X. Wu, X. Hua, Bayesian video search reranking, in: Proceedings of the ACM MM, 2008, pp. 131–140.
[36] T. Mei, Y. Rui, S. Li, Q. Tian, Multimedia search reranking: a literature survey, ACM Comput. Surv. 46 (3) (2014) 1–38.
[37] M. Wang, L. Yang, X. Hua, MSRA-MM: bridging research and industrial societies for multimedia information retrieval, Microsoft Technical Report MSR-TR-2009-2030, Beijing, 2009.
[38] H. Li, M. Wang, X. Hua, MSRA-MM 2.0: a large-scale web multimedia dataset, in: Proceedings of the IEEE ICDM Workshops, 2009, pp. 164–169.
[39] K. Järvelin, J. Kekäläinen, IR evaluation methods for retrieving highly relevant documents, in: Proceedings of the ACM SIGIR, 2000, pp. 41–48.
[40] R. Fisher, The use of multiple measurements in taxonomic problems, Ann. Eugen. 7 (2) (1936) 179–188.
[41] Y. Pang, Z. Song, X. Li, J. Pan, Truncation error analysis on reconstruction of signal from unsymmetrical local average sampling, IEEE Trans. Cybern. 45 (10) (2015) 2100–2104.
[42] Z. Ji, Y. Pang, Y. He, H. Zhang, Semi-supervised LPP algorithms for learning-to-rank-based visual search reranking, Inf. Sci. 302 (2015) 83–93.