Rank canonical correlation analysis and its application in visual search reranking

Signal Processing 93 (2013) 2352–2360 Contents lists available at SciVerse ScienceDirect Signal Processing journal homepage: www.elsevier.com/locate...



Signal Processing journal homepage: www.elsevier.com/locate/sigpro

Zhong Ji*, Peiguang Jing, Yuting Su, Yanwei Pang

School of Electronic Information Engineering, Tianjin University, Tianjin 300072, PR China

Article info

Article history: Received 5 December 2011; Received in revised form 31 March 2012; Accepted 8 May 2012; Available online 28 May 2012

Keywords: Canonical correlation analysis; Feature dimensionality reduction; Visual search reranking; Information retrieval

Abstract

Ranking relevance degree information is widely utilized in the ranking models of information retrieval applications, such as text and multimedia retrieval, question answering, and visual search reranking. However, existing feature dimensionality reduction methods neglect this kind of valuable potential supervised information. In this paper, we extend the pairwise constraints from the traditional class labels to ranking relevance degrees, and propose a novel dimensionality reduction method called Rank-CCA. Rank-CCA effectively incorporates ranking relevance constraints into standard canonical correlation analysis (CCA) algorithm, and is able to employ the knowledge of both unlabeled and labeled data. In the application of visual search reranking, our proposed method is verified through extensive experimental studies. Experimental results show that Rank-CCA is superior to standard CCA and semi-supervised CCA (Semi-CCA) algorithm, and achieves comparable performance with several state-of-the-art reranking methods while preserving the superiority of low dimensional features. © 2012 Elsevier B.V. All rights reserved.

* Corresponding author. Tel.: +86 13132245212. E-mail address: [email protected] (Z. Ji).

0165-1684/$ - see front matter © 2012 Elsevier B.V. All rights reserved. http://dx.doi.org/10.1016/j.sigpro.2012.05.006

1. Introduction

With the rapid increase of high dimensional data, dimensionality reduction techniques are widely used in many real-world applications, such as data mining, pattern recognition and information retrieval. They aim to transform high dimensional data into a meaningful representation of reduced dimensionality, which is an effective way to mitigate the “curse of dimensionality” and the high computational burden and storage cost. Over the past decades, a large number of techniques have been proposed [1–3], from linear algorithms such as Principal Components Analysis (PCA) and Principal Curves to nonlinear algorithms such as kernel methods and manifold methods, from global algorithms such as Isometric Feature Mapping (ISOMAP) to local algorithms such as Locality Preserving Projections (LPP) and Neighborhood Preserving Embedding (NPE), from unsupervised algorithms such as PCA and LPP to supervised algorithms such as Linear Discriminant Analysis (LDA) and Maximum Margin Criterion (MMC), and from single-modal algorithms to multi-view algorithms such as Canonical Correlation Analysis (CCA).

In recent years, many new dimensionality reduction algorithms have been designed for particular applications whose data have special characteristics. For example, semi-supervised dimensionality reduction methods [4–7] were proposed to handle the situation where unlabeled examples are readily available but labeled ones are fairly expensive to obtain, which is the common case in tasks such as multimedia retrieval, image annotation, and medical data analysis. Multiple-view dimensionality reduction methods [8,9] were developed to deal with applications where an instance may have multiple representations from different feature spaces or graph spaces.

In information retrieval tasks, relevance degree information is ubiquitous in ranking techniques, which play a key role and have received great attention from

Z. Ji et al. / Signal Processing 93 (2013) 2352–2360

industry and academic communities. A ranking model can directly take the queries and documents as inputs and compute a matching score using heuristic formulas, or it can extract features for each query–document pair and combine them to produce the matching score [10]. Popular information retrieval applications include visual search reranking [11–13], question answering [14], video annotation [15–17], multimedia retrieval and summarization [18–20], collaborative filtering [21], and so on. In most cases, each document is labeled manually or automatically with a relevance degree to the query, in the form of a binary judgment or multiple ordered levels (Perfect, Excellent, Good, Fair, or Bad). This kind of label differs from traditional class labels, and has led to the emergence of a new research area named learning to rank [10]. However, learning to rank focuses only on leveraging machine learning methods in the ranking process to build effective ranking models, and does not take the feature dimensionality reduction problem into account. As far as we know, there has been little previous work employing relevance degree information in dimensionality reduction techniques.

Generally, multimedia data have multiple modalities, and each modality is usually represented with high dimensional features and has particular statistical properties. CCA copes with the mutual relationships between two modalities to extract a representation of the semantics. It is one of the valuable multi-data processing methods, and has been successfully utilized in many research areas such as multimedia content analysis and retrieval [22], action classification [23], facial expression recognition [24], fMRI data analysis [5], and so on.
Inspired by the pairwise constraints method [4] and the semi-supervised canonical correlation analysis (Semi-CCA) method [25], we propose a method to incorporate relevance constraints into the dimensionality reduction technique. In [4], the authors proposed a semi-supervised dimensionality reduction method, which exploited the abundant unlabeled instances together with pairwise constraints to preserve the intrinsic structure of the training data in the projected low-dimensional space. Generally, pairwise constraints are relatively easy to obtain; they include must-link constraints (instance pairs belong to the same class) and cannot-link constraints (instance pairs belong to different classes). Peng and Zhang [25] further applied the idea of pairwise constraints to CCA, and presented a semi-supervised method called Semi-CCA. They demonstrated its effectiveness via extensive experiments on the UCI handwritten digit dataset and the Yale and AR facial datasets. Based on the framework of CCA, we extend the pairwise constraints from class labels to relevance degrees, and develop a novel dimensionality reduction algorithm named Rank-CCA for information ranking tasks.

Compared to previous work, this paper makes the following contributions:

(1) To the best of our knowledge, our approach is the first to consider relevance degree information in dimensionality reduction techniques.


(2) A novel Rank-CCA algorithm is proposed, which incorporates the relevance constraints into the standard CCA method and can employ the knowledge of both unlabeled and labeled data.

(3) A visual search reranking method is investigated with the proposed Rank-CCA algorithm, which is confirmed to be superior to the standard CCA and Semi-CCA algorithms, and is comparable to some state-of-the-art methods while preserving the superiority of low dimensional features.

The remainder of this paper is organized as follows. In Section 2, we briefly review the literature on extensions of CCA and on visual search reranking methods. Section 3 gives a brief review of CCA. Section 4 formulates and defines Rank-CCA in detail. Experimental results in image search reranking are given in Section 5. Finally, we conclude this paper in Section 6.

2. Related works

2.1. Extensions of canonical correlation analysis

Canonical Correlation Analysis (CCA) is a technique for finding basis vectors such that the correlations between the projections of the paired variables onto the corresponding vectors are mutually maximized [26]. It is a fundamental technique in statistics and dimensionality reduction, and is typically used for multi-view data samples. In recent years, many research studies have been carried out to improve CCA for different situations in different ways. For example, kernel CCA [24] and locality preserving CCA (LPCCA) [27] were developed to reveal the nonlinear relationships hidden behind the original data. Kernel CCA first projects the data into a higher-dimensional feature space through the well-known “kernel trick”, and then performs CCA in the new feature space. LPCCA decomposes the global nonlinear problem into many local linear ones, which can be processed by linear CCA. The final global problem can then be solved by optimizing the combination of these local sub-problems.
Recently, Hardoon and Shawe-Taylor proposed a sparse CCA [28] algorithm for the scenario in which one is interested in a primal representation for the first view while having a dual representation for the second view. Sparse CCA minimizes the number of features used in both the primal and dual projections while maximizing the correlation between the two views. Some research works [29,30] aimed at analyzing the correlations between more than two datasets, and the corresponding methods are called multi-set CCA (MCCA). For example, in [30], the authors proposed an MCCA algorithm for color images, which can extract three color components and provide the analytical solution. It means the method can directly acquire the typically correlative features of three input datasets. In the meantime, a surge of efforts has been made in algorithms for semi-supervised extensions of CCA (Semi-CCA) [5,25,31], which are designed for the commonly encountered situation in which the number of labeled samples is limited, while


abundant unlabeled samples are available. For example, to utilize both labeled and unlabeled data, Peng and Zhang [25] took advantage of pairwise constraints on CCA, and Kimura et al. [31] introduced principal component analysis (PCA) into CCA by bridging CCA with paired samples and PCA with paired and unpaired samples via a trade-off parameter.

2.2. Visual search reranking

Visual search reranking is a new paradigm that follows content-based image/video retrieval and concept detection in the multimedia content analysis and retrieval domain. It is defined as reordering visual documents (images or video clips) based on the initial text search results by incorporating visual information. The surge of this paradigm is due to the fact that most popular image search engines still perform text-based search using the metadata associated with media contents, such as the title, comments, and so on. However, this kind of search approach often returns some noisy results at the top of the ranking list, since the text cannot entirely reflect the visual content of the image. Visual search reranking keeps the simple search mechanism preferred by typical users, and exploits the visual information and multimedia analysis methods in another way. Therefore, it combines real-time response with accuracy, and is of great importance for building practical image search systems.

Over the past decade, visual search reranking has been receiving great attention, and the current approaches can be roughly divided into two categories: unsupervised reranking [12,13,32–34] and supervised reranking [11,35–39]. The former is generally based on the smoothness assumption, which considers that the relevance scores of visually similar documents should be close. Therefore, this kind of method aims at discovering and mining the visual patterns that have high visual similarity. One effective way is to use a clustering algorithm, since it can detect the recurrent patterns.
For example, Hsu et al. [32] and Wei et al. [33] used the information bottleneck and NCuts clustering algorithms, respectively, to refine the initial performance. Recently, graph-based methods have shown promising results, in which a graph is constructed with the visual documents as the nodes and the edges between them weighted by their visual similarity. In [12,13,34], reranking was formulated as a random walk over the graph and the ranking scores were propagated through the edges. Reranking results were finally obtained through the stationary probability distribution.

Supervised reranking methods first train a classifier using training data taken directly from the initial search results, and then reorder all the documents by the relevance scores predicted by the classifier. The key technical difficulty lies in the selection of training data. Currently, there are mainly three solutions: (1) choosing some training data manually [35] or with an active learning method [36]; (2) adopting the idea of pseudo-relevance feedback, in which a significant fraction of top-ranked documents are assumed to be relevant and are then used to build a model for reranking the search result set [11,37]; (3) utilizing other information resources [38,39], such as concept detectors and Internet-related resources.

Furthermore, there are also some new research works beyond the two categories mentioned above. For instance, Zhang et al. [40] presented a novel method by investigating visual search reranking from the view of global optimization. They introduced the particle swarm optimization (PSO) mechanism and viewed reranking as a mapping process from the initial text search list to the objective ranked list. Result diversity is also taken into account in visual search reranking. Wang et al. [41] and Ji et al. [42] studied the joint optimization of search relevance and diversity. Specifically, Wang et al. [41] proved that relevance reranking can be regarded as the process of optimizing the mathematical expectation of the conventional Average Precision (AP) measure, while diversity reranking can be viewed as an optimization process for a new Average Diversity Precision (ADP) measure.

3. Canonical correlation analysis

Consider a set of paired samples $S_i = \{(x_i, y_i)\} \in \mathbb{R}^p \times \mathbb{R}^q$, $i = 1, \ldots, n$, where $\{x_i\}_{i=1}^{n}$ and $\{y_i\}_{i=1}^{n}$ are obtained from different information channels. Without loss of generality, we assume that $X = [x_1, \ldots, x_n] \in \mathbb{R}^{p \times n}$ and $Y = [y_1, \ldots, y_n] \in \mathbb{R}^{q \times n}$ are both centered, which can always be achieved by subtracting the sample mean from each sample. The aim of CCA is to seek a pair of linear transforms, $w_x \in \mathbb{R}^p$ and $w_y \in \mathbb{R}^q$ for X and Y, such that the correlation between $X' = w_x^T X$ and $Y' = w_y^T Y$ is maximized. The objective function is formulated as follows:

$$\rho = \max_{w_x, w_y} \frac{w_x^T C_{xy} w_y}{\sqrt{w_x^T C_{xx} w_x \cdot w_y^T C_{yy} w_y}} \tag{1}$$

where $C_{xx} = XX^T \in \mathbb{R}^{p \times p}$ and $C_{yy} = YY^T \in \mathbb{R}^{q \times q}$ are the within-set covariance matrices, $C_{xy} = XY^T \in \mathbb{R}^{p \times q}$ is the between-sets covariance matrix, and $C_{xy} = C_{yx}^T$. More details about the derivation and solution of CCA can be found in [22].

4. Rank canonical correlation analysis

Relevance degree information is different from the class label in nature. The former measures the degree of closeness of a document to the query in information retrieval applications, while the latter refers to a common attribute of one category in machine learning and pattern recognition tasks. For example, in conventional pattern classification tasks, instances from the same category always share similar characteristics, and instances from different categories generally have different characteristics, so semi-supervised dimensionality reduction methods can be adopted with must-link and cannot-link constraints [4,25]. However, in ranking applications, examples at different relevance degrees may still possess similar characteristics since they are related to the same query. Therefore, the must-link and cannot-link pairwise constraints cannot be employed directly.

Without loss of generality, we consider only three-level relevance degree labels, namely “very relevant”, “relevant” and “irrelevant”, whose data sets are denoted by A, B and C, respectively. Fig. 1 illustrates some example images at different relevance degrees for the same query. We can see that images from set A and set B have some kind of similarity for the same query, since they reflect the same topic. We can also see that images in set C are dissimilar even for the same query. Therefore, based on the framework of CCA, we obtain the following relevance constraints.

Fig. 1. Example images at different relevance degrees for the queries of (a) “horses” and (b) “car”.

(1) Except for set C, different modality features between different instances from the same group have the maximum correlation. We regard these as relevant constraints, i.e. A–A and B–B.

(2) Since images in set A and set B have visual similarity to some extent, we also regard A–B as a relevant constraint. However, we add a coefficient $\alpha \in [0,1]$ to control its effect.

(3) The constraints A–C, B–C and C–C are set to be irrelevant constraints, which means that different modality features between different instances from them have the minimum correlation. This is because set C is actually a clutter group: its images have no visual similarity with the others, either within the set or across sets.

Accordingly, in order to improve the performance of CCA in ranking applications, we incorporate the constraints of relevance degree information into CCA, which is referred to as Rank-CCA. The standard CCA optimization problem is modified so that the cross-covariance matrix $C_{xy}$ in Eq. (1) is replaced by a term $\hat{C}_{xy}$ that takes relevance constraints into account. Note that both x and y have been centered. The objective function is formulated as

$$\rho = \max_{w_x, w_y} \frac{w_x^T \hat{C}_{xy} w_y}{\sqrt{w_x^T C_{xx} w_x \cdot w_y^T C_{yy} w_y}} \tag{2}$$

where

$$\begin{aligned}
\hat{C}_{xy} &= XY^T + \gamma \Big[ \sum_{(x_i, y_j) \in (A,A)} (x_i y_j^T + x_j y_i^T) + \sum_{(x_i, y_j) \in (B,B)} (x_i y_j^T + x_j y_i^T) + \alpha \sum_{(x_i, y_j) \in (A,B)} (x_i y_j^T + x_j y_i^T) \Big] \\
&\quad - (1-\gamma) \Big[ \sum_{(x_i, y_j) \in (C,C)} (x_i y_j^T + x_j y_i^T) + \sum_{(x_i, y_j) \in (A,C)} (x_i y_j^T + x_j y_i^T) + \sum_{(x_i, y_j) \in (B,C)} (x_i y_j^T + x_j y_i^T) \Big] \\
&= XEY^T + \gamma (X C_{AA} Y^T + X C_{BB} Y^T + \alpha X C_{AB} Y^T) - (1-\gamma)(X C_{CC} Y^T + X C_{AC} Y^T + X C_{BC} Y^T) \\
&= X \big( E + \gamma C_{AA} + \gamma C_{BB} + \gamma\alpha C_{AB} - (1-\gamma) C_{CC} - (1-\gamma) C_{AC} - (1-\gamma) C_{BC} \big) Y^T \\
&= X S Y^T
\end{aligned} \tag{3}$$

In Eq. (3), E is an identity matrix, $C_{AA}$, $C_{BB}$, $C_{CC}$, $C_{AB}$, $C_{AC}$ and $C_{BC}$ are the constraint matrices, and $\alpha \in [0,1]$, $\gamma \in [0,1]$ are the weighting parameters. Similar to CCA, the solution of Rank-CCA can be obtained by solving a generalized eigenvalue decomposition problem, and it is equivalent to the following optimization problem:

$$\max_{w_x, w_y} \; w_x^T \hat{C}_{xy} w_y \quad \text{s.t.} \quad w_x^T C_{xx} w_x = 1, \;\; w_y^T C_{yy} w_y = 1 \tag{4}$$

The corresponding Lagrangian is

$$L(\lambda_1, \lambda_2, w_x, w_y) = w_x^T \hat{C}_{xy} w_y - \frac{\lambda_1}{2} \left[ w_x^T C_{xx} w_x - 1 \right] - \frac{\lambda_2}{2} \left[ w_y^T C_{yy} w_y - 1 \right] \tag{5}$$

Taking derivatives with respect to $w_x$ and $w_y$, we obtain

$$\frac{\partial L}{\partial w_x} = \hat{C}_{xy} w_y - \frac{\lambda_1}{2} \left[ C_{xx} w_x + C_{xx}^T w_x \right] = \hat{C}_{xy} w_y - \lambda_1 C_{xx} w_x = 0 \tag{6}$$

$$\frac{\partial L}{\partial w_y} = (w_x^T \hat{C}_{xy})^T - \frac{\lambda_2}{2} \left[ C_{yy} w_y + C_{yy}^T w_y \right] = \hat{C}_{yx} w_x - \lambda_2 C_{yy} w_y = 0 \tag{7}$$

Multiplying Eq. (6) on the left by $w_x^T$ and Eq. (7) on the left by $w_y^T$ gives

$$w_x^T \hat{C}_{xy} w_y - \lambda_1 w_x^T C_{xx} w_x = 0 \tag{8}$$

$$w_y^T \hat{C}_{yx} w_x - \lambda_2 w_y^T C_{yy} w_y = 0 \tag{9}$$

Subtracting Eq. (8) from Eq. (9), and using $w_y^T \hat{C}_{yx} w_x = w_x^T \hat{C}_{xy} w_y$ together with the constraints in Eq. (4), we have

$$\begin{aligned}
0 &= w_y^T \hat{C}_{yx} w_x - \lambda_2 w_y^T C_{yy} w_y - w_x^T \hat{C}_{xy} w_y + \lambda_1 w_x^T C_{xx} w_x \\
&= -\lambda_2 w_y^T C_{yy} w_y + \lambda_1 w_x^T C_{xx} w_x = -\lambda_2 + \lambda_1
\end{aligned} \tag{10}$$

Let $\lambda_1 = \lambda_2 = \lambda$. Assuming $C_{xx}$ is invertible, Eq. (6) gives

$$w_x = \frac{1}{\lambda} C_{xx}^{-1} \hat{C}_{xy} w_y \tag{11}$$

Substituting Eq. (11) into Eq. (7) gives

$$\hat{C}_{yx} C_{xx}^{-1} \hat{C}_{xy} w_y = \lambda^2 C_{yy} w_y \tag{12}$$

Similarly, we have

$$\hat{C}_{xy} C_{yy}^{-1} \hat{C}_{yx} w_x = \lambda^2 C_{xx} w_x \tag{13}$$
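As a numerical companion to Eqs. (3), (12) and (13), the following Python/NumPy sketch builds the constraint matrix S from three-level labels, forms $\hat{C}_{xy} = X S Y^T$, and solves the problem through the whitened matrix $Z = C_{xx}^{-1/2} \hat{C}_{xy} C_{yy}^{-1/2}$ and its SVD, which is the route summarized in Table 1. It is an illustrative sketch, not the authors' implementation. The function names, the label coding (2 = A, 1 = B, 0 = C, -1 = unlabeled), the 0/1 pair indicators with an empty diagonal, and the small ridge term `reg` added before inverting $C_{xx}$ and $C_{yy}$ are all assumptions made for this example.

```python
import numpy as np


def constraint_matrix(labels, alpha=0.4, gamma=0.8):
    """Build the n x n matrix S of Eq. (3).

    Assumed label coding: 2 = set A, 1 = set B, 0 = set C, -1 = unlabeled.
    """
    labels = np.asarray(labels)
    in_a = (labels == 2).astype(float)
    in_b = (labels == 1).astype(float)
    in_c = (labels == 0).astype(float)

    def pairs(u, v):
        # Symmetric 0/1 indicator of pairs (i, j), i != j, with one sample
        # in the set marked by u and the other in the set marked by v.
        m = np.outer(u, v)
        m = np.maximum(m, m.T)
        np.fill_diagonal(m, 0.0)
        return m

    relevant = pairs(in_a, in_a) + pairs(in_b, in_b) + alpha * pairs(in_a, in_b)
    irrelevant = pairs(in_c, in_c) + pairs(in_a, in_c) + pairs(in_b, in_c)
    return np.eye(labels.size) + gamma * relevant - (1.0 - gamma) * irrelevant


def inv_sqrt(c, reg):
    # Inverse square root of the symmetric matrix c + reg * I
    # (reg is an assumed ridge term, not part of the paper).
    w, v = np.linalg.eigh(c + reg * np.eye(c.shape[0]))
    return (v / np.sqrt(w)) @ v.T


def rank_cca(x, y, labels, d, alpha=0.4, gamma=0.8, reg=1e-6):
    """x is p x n and y is q x n, one sample per column.

    Returns W_x (p x d), W_y (q x d) and the d leading singular values of Z.
    """
    x = x - x.mean(axis=1, keepdims=True)         # centering
    y = y - y.mean(axis=1, keepdims=True)
    s = constraint_matrix(labels, alpha, gamma)   # S of Eq. (3)
    cxx_is = inv_sqrt(x @ x.T, reg)               # C_xx^{-1/2}
    cyy_is = inv_sqrt(y @ y.T, reg)               # C_yy^{-1/2}
    z = cxx_is @ (x @ s @ y.T) @ cyy_is           # whitened cross-covariance
    u, sing, vt = np.linalg.svd(z, full_matrices=False)
    return cxx_is @ u[:, :d], cyy_is @ vt.T[:, :d], sing[:d]
```

With every label set to -1, S reduces to the identity matrix and the sketch falls back to standard CCA, which gives a simple sanity check on the implementation.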

Then we employ singular value decomposition (SVD) to solve the Rank-CCA equations, following Ref. [43]. Let $Z = C_{xx}^{-1/2} \hat{C}_{xy} C_{yy}^{-1/2}$, $u = C_{xx}^{1/2} w_x$ and $v = C_{yy}^{1/2} w_y$; Eqs. (12) and (13) can then be rewritten as

$$\begin{cases} Z Z^T u = \lambda^2 u \\ Z^T Z v = \lambda^2 v \end{cases} \tag{14}$$

Let $Z = U D V^T = \sum_{i=1}^{d} \lambda_i u_i v_i^T$ be the SVD of matrix Z, where the ith diagonal element of the diagonal matrix D is $\lambda_i$, and $u_i$ and $v_i$ are the ith columns of matrices U and V, respectively, corresponding to the singular value $\lambda_i$. We have

$$\begin{cases} w_{xi} = C_{xx}^{-1/2} u_i \\ w_{yi} = C_{yy}^{-1/2} v_i \end{cases} \tag{15}$$

From Eq. (15), we obtain the ith ($i = 1, \ldots, d$, $d \le \min(p, q)$) pair of basis vectors of Rank-CCA. After obtaining eigenvectors $W_x = [w_{x1}, \ldots, w_{xp}] \in \mathbb{R}^{p \times p}$ and $W_y = [w_{y1}, \ldots, w_{yq}] \in \mathbb{R}^{q \times q}$, we get the new transforms $W_{x'} = [w_{x1}, \ldots, w_{xd}] \in \mathbb{R}^{p \times d}$ and $W_{y'} = [w_{y1}, \ldots, w_{yd}] \in \mathbb{R}^{q \times d}$ by taking the first d ($d \le \min(p, q)$) eigenvectors from $W_x$ and $W_y$. Then for any sample (x, y), we can extract features as $[W_{x'}^T x, W_{y'}^T y]$. The pseudo-code of the Rank-CCA algorithm is summarized in Table 1.

5. Experiments on visual search reranking with Rank-CCA

In this section, we evaluate the performance of the proposed Rank-CCA algorithm in the image visual search reranking task. We first introduce the dataset and methodologies, and then demonstrate the effectiveness of the Rank-CCA algorithm from two aspects. On one hand, we manually label the relevance degree of some images. With these true labeled data and unlabeled data, we demonstrate that our proposed Rank-CCA algorithm is superior to the standard CCA [26] and Semi-CCA [25] algorithms in ranking applications. On the other hand, we adopt a pseudo-relevance feedback method to label the images automatically, which aims to show that the Rank-CCA based visual search reranking algorithm is comparable to some state-of-the-art methods with pseudo-true labeled data and unlabeled data. Note that all the statistical experiments are repeated three times and the average results are reported.

5.1. Dataset and methodologies

We conduct experiments on the publicly available MSRA-MM image dataset [44], which consists of 68 popular queries collected from the image search engines of Microsoft Bing Search. These queries cover a wide variety of categories, including objects, people, events, entertainment, and locations. For each query, about the top 1000 images along with the surrounding texts are collected. As a result, the dataset contains 65,443 images in total. The rank orders of these images are taken as the initial ranked lists. In the dataset, each image was manually labeled with respect to the corresponding query at three levels: (0) “irrelevant,” (1) “relevant,” and (2) “very relevant.”

To evaluate the ranking performance, Normalized Discounted Cumulative Gain (NDCG) is adopted, which is widely used in information retrieval tasks, especially when there are more than two relevance levels [45]. Given a query q, the NDCG score at depth d in the ranked documents is defined by

$$\mathrm{NDCG@}d = Z_d \sum_{j=1}^{d} \frac{2^{r_j} - 1}{\log(1 + j)} \tag{16}$$

where $r_j$ is the rating of the jth document, and $Z_d$ is a normalization constant chosen so that a perfect ranking's NDCG@d value is 1.

To make our results reproducible and to enable comparisons with other approaches, we adopt the features provided with the dataset. Specifically, we select the 144D color correlogram and the 128D wavelet texture as the image features, since they come from different modalities. They are x and y in Eq. (3). In addition, we use the top 500 images in the initial search results for reranking in our experiments, since there are typically very few relevant images after the top 500 search results [12].

The framework of the proposed Rank-CCA based image visual search reranking method is given in Fig. 2. Take the

query term “boy” as an example. When “boy” is submitted to the web image search engine, an initial text-based search result is returned to the user (only the top ten images are given for illustration). The result is unsatisfactory because some woman and cartoon images are retrieved as top results. To rerank these images, multimodal features are first extracted to represent their visual contents. Then, since there is generally no explicit training data, a manual labeling or pseudo-relevance feedback mechanism is adopted to label some data with relevance degrees. Both labeled and unlabeled data are employed in the proposed Rank-CCA algorithm to reduce the dimensionality of the image features. Next, the Ranking SVM algorithm [46] uses the labeled data as training data to build a ranking function. Finally, all the images are reranked with the reduced low dimensional features and the ranking function.

Fig. 2. Framework for Rank-CCA based image visual search reranking illustrated with the query “boy”.

Table 1. The algorithm of Rank-CCA.

Input: Training data $S_i$ ($i \in [1, n]$), of which $S_A$, $S_B$, $S_C$ are the labeled subsets for sets A, B and C, respectively. Parameters: $\alpha$ and $\gamma$.
Output: Projection vectors $W_{x'} = [w_{x1}, \ldots, w_{xd}] \in \mathbb{R}^{p \times d}$ and $W_{y'} = [w_{y1}, \ldots, w_{yd}] \in \mathbb{R}^{q \times d}$.
Step 1: Center the data: $x'_i = x_i - \bar{x}$, $y'_i = y_i - \bar{y}$ ($i = 1, \ldots, n$), $X' = [x'_1, \ldots, x'_n]$, $Y' = [y'_1, \ldots, y'_n]$.
Step 2: Initialize the constraint matrices $C_{AA}$, $C_{BB}$, $C_{CC}$, $C_{AB}$, $C_{AC}$ and $C_{BC}$ as $n \times n$ zero matrices.
Step 3: Compute the constraint matrices. Set the corresponding entries of the constraint matrices to 1 according to the original positions i and j of $x'_i$ and $y'_j$. Do $C_{AA} = C_{AA}^T$, $C_{BB} = C_{BB}^T$, $C_{CC} = C_{CC}^T$, $C_{AB} = C_{AB}^T$, $C_{AC} = C_{AC}^T$, $C_{BC} = C_{BC}^T$.
Step 4: Compute the covariance matrices $C_{xx} = X' X'^T$, $C_{yy} = Y' Y'^T$, $\hat{C}_{xy} = X' S Y'^T$, with $S = E + \gamma C_{AA} + \gamma C_{BB} + \gamma\alpha C_{AB} - (1-\gamma) C_{CC} - (1-\gamma) C_{AC} - (1-\gamma) C_{BC}$.
Step 5: Compute the matrix $Z = C_{xx}^{-1/2} \hat{C}_{xy} C_{yy}^{-1/2}$.
Step 6: Perform the SVD $Z = U D V^T$.
Step 7: Choose $[u_1, \ldots, u_d]$ and $[v_1, \ldots, v_d]$, $d < n$.
Step 8: Obtain $W_{x'} = C_{xx}^{-1/2} [u_1, \ldots, u_d]$, $W_{y'} = C_{yy}^{-1/2} [v_1, \ldots, v_d]$.

In the Rank-CCA, Semi-CCA and standard CCA algorithms, we set d = 30, i.e. we reduce the original feature dimension to 30D. We randomly select k = 10 labeled images from each relevance degree group. The trade-off parameter C is set to 0.1 in the Ranking SVM model. In addition, $\alpha$ and $\gamma$ are set to 0.4 and 0.8, respectively, in Rank-CCA.

5.2. Visual search reranking with manually labeled data

In this section, we demonstrate the performance of Rank-CCA with manually labeled data in the image visual search reranking task. Fig. 3 presents the NDCG results at different depths for the CCA, Semi-CCA and Rank-CCA algorithms, which use the same training data. The baseline refers to the performance of the initial text-based search result. From the figure we can observe the following.

(1) The proposed Rank-CCA algorithm outperforms the others.

(2) Both Rank-CCA and Semi-CCA show significant performance improvements over the baseline and standard CCA, which demonstrates that pairwise constraints are effective in enhancing the discriminative power of the original features.

(3) The superiority of Rank-CCA over Semi-CCA indicates that the relevance constraints are more suitable than must-link and cannot-link constraints in ranking tasks.

(4) The performance differences between Rank-CCA and the others decline as the depth increases. However, they become stable when the depth is greater than 40. This phenomenon demonstrates that Rank-CCA is a robust algorithm and can achieve a steady performance gain.

Fig. 3. Performance comparisons of different dimensionality reduction approaches with manually labeled data. [Plot: NDCG@10 to NDCG@100 for Baseline, CCA, Semi-CCA and Rank-CCA.]

We then evaluate the impact of the weighting parameters $\alpha$ and $\gamma$ in Eq. (3). Fig. 4 depicts the performance of the Rank-CCA based reranking method with different $\alpha$ and $\gamma$ ranging from 0.0 to 1.0 in terms of NDCG@50. From the figure, we find that the best weight for $\gamma$ is 0.8, which means that the relevant constraints are more important than the irrelevant constraints. Even if we only use the relevant constraints ($\gamma = 1.0$), the performance is better than in most of the other cases. Our conjecture is that relevant constraints are more helpful for capturing the “visual pattern” [12,32], which is a basic assumption in reranking applications. It can also be found that 0.4 is the best choice for $\alpha$. This confirms that the relevant constraint A–B has a smaller impact than A–A and B–B.

Further, the influence of the number of labeled images k is investigated in Fig. 5. We can see that the performance rises steadily as k increases. It demonstrates that a larger number of constraints brings a larger improvement in performance. Meanwhile, it also confirms that Rank-CCA is superior to Semi-CCA.
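The NDCG@d scores used throughout this section follow Eq. (16). A minimal Python sketch of that measure is given below for illustration. The function names are hypothetical, the normalization constant $Z_d$ is taken as the reciprocal of the DCG of the ideal (descending) ordering of the same ratings, and a query with no relevant documents is assigned a score of 0; both of these are assumptions rather than details stated in the paper.

```python
import math


def dcg_at(ratings, d):
    # Sum over the top-d positions of (2^{r_j} - 1) / log(1 + j), j starting at 1.
    # The logarithm base cancels out once the score is normalized.
    return sum((2 ** r - 1) / math.log(1 + j)
               for j, r in enumerate(ratings[:d], start=1))


def ndcg_at(ratings, d):
    # ratings: relevance levels (0, 1 or 2) listed in ranked order.
    ideal = dcg_at(sorted(ratings, reverse=True), d)
    return dcg_at(ratings, d) / ideal if ideal > 0 else 0.0
```

For example, `ndcg_at([2, 1, 0], 3)` is 1.0 because the list is already in ideal order, while `ndcg_at([0, 2], 2)` equals log 2 / log 3, roughly 0.631, because the only relevant image sits in second place.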

pseudo-relevance feedback (PRF). PRF is initially introduced in [47], and has been shown to be effective in improving initial text search results in both text and video retrieval [11,37]. We modify the typicality-based idea developed by Liu et al. [37] to carry out PRF. More speciﬁcally, the initial search results are ﬁrst clustered into a set of clusters with afﬁnity propagation [48]. And then, pseudo-relevance scores of each sample are obtained by combining the cluster typicality and local typicality measures [37], which are decided by visual similarities and initial order information. Finally, the samples are ordered by descending pseudo-relevance scores, from which we select the labeled data from the top, middle and bottom. To demonstrate the effectiveness of the proposed Rank-CCA method in reranking task, we implement the performance comparison with the following three stateof-the-art reranking algorithms, as illustrated in Fig. 6. In the three approaches, the parameters are selected to achieve the best performance, and the results are obtained from the bar histogram presented in [12]

5.3. Visual search reranking with pseudo-relevance feedback In this section, we demonstrate the performance of Rank-CCA with training data automatically acquired with α

0.71

γ

0.7 0.69 0.68 0.67 0.66

(a) Harvesting reranking (Harvesting-TV) [11]: A typical supervised reranking method used both the textual and visual features, and also utilized PRF approach to get training data. (b) Co-reranking (Co-reranking-TV) [12]: A typical graphbased reranking method which reinforced the textual and visual information mutually via coupling two random walk graphs. (c) Context reranking (Context-V) [13]: A classic graphbased reranking method which made use of random walk with visual features to perform reranking.

0.65 0.64 1

Fig. 4. Performance comparisons of different weighting parameters a and g.

It should be mentioned that the visual features employed in the three methods were mainly based on bag-of-visual-words (BOW). From Fig. 6, we observe that the proposed Rank-CCA method achieves performance comparable to that of the other methods. Co-reranking outperforms our method because it employs a novel mutual reinforcement mechanism with both text and visual features. Considering visual features alone, the BOW feature it uses has up to 2000 dimensions, whereas ours has only 60 dimensions, reduced from 272-dimensional low-level features. Moreover, the performance of Rank-CCA is similar to that of the Harvesting reranking and Context reranking methods, which indicates the effectiveness of the relevance constraints in feature dimensionality reduction. Fig. 7 shows the top ten images of some queries: "Hawaii", "Baby" and "Cat". We can observe that the proposed Rank-CCA method produces satisfactory reranking results.

Fig. 5. Performance comparisons of different labeled numbers k.

Fig. 6. Performance comparisons of different algorithms (legend: Baseline, Context-V, Harvesting-TV, Rank-CCA, Co-reranking-TV).

Fig. 7. Top ten results of the queries (a) "Hawaii", (b) "Baby", (c) "Cake" before and after reranking; the first row shows the original search results, and the second row shows the results after reranking.
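The graph-based baselines in this comparison (Context-V and Co-reranking-TV) rerank by random walks over similarity graphs. The sketch below shows a generic random walk with restart on a visual similarity graph, in the PageRank style. It is not the exact formulation of [13] or [12]: the function name, the `damping` and `iterations` parameters, and the use of normalised initial scores as the restart distribution are all assumptions made for illustration.

```python
from typing import List


def random_walk_rerank(similarity: List[List[float]],
                       initial_scores: List[float],
                       damping: float = 0.85,
                       iterations: int = 100) -> List[int]:
    """Return sample indices reranked by a random walk with restart.

    similarity     -- non-negative pairwise visual similarities (n x n)
    initial_scores -- non-negative scores from the initial search (length n)
    """
    n = len(initial_scores)
    prior_total = sum(initial_scores)
    if n == 0 or prior_total <= 0:
        raise ValueError("initial_scores must be non-empty with a positive sum")

    # Row-normalise similarities into transition probabilities;
    # a sample with no neighbours jumps uniformly to any sample.
    transition = []
    for row in similarity:
        total = sum(row)
        transition.append([v / total if total > 0 else 1.0 / n for v in row])

    # The restart distribution comes from the initial search scores.
    prior = [s / prior_total for s in initial_scores]

    state = prior[:]
    for _ in range(iterations):
        state = [
            damping * sum(state[i] * transition[i][j] for i in range(n))
            + (1.0 - damping) * prior[j]
            for j in range(n)
        ]

    # Higher stationary probability means a higher position after reranking.
    return sorted(range(n), key=lambda j: -state[j])
```

With this kind of scheme, a sample that scores well in the initial search but is visually isolated from the other results tends to move down, while samples that are mutually similar reinforce each other.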

6. Conclusions

This paper has proposed an effective and practical algorithm for dimensionality reduction in ranking tasks. The proposed Rank-CCA algorithm introduces relevance constraints into the CCA method, and achieves better performance than the standard CCA and Semi-CCA algorithms. In addition, employing only low-level and low-dimensional visual features, Rank-CCA performs similarly to several state-of-the-art methods in the visual search reranking application. In future work, we will incorporate the relevance constraints into extensions of CCA, such as kernel CCA and sparse CCA, to handle more complicated circumstances. Multiple-view dimensionality reduction methods, such as Multiview Spectral Embedding (MSE) proposed in [9], are also very effective in dealing with the multiple modalities in multimedia data, and will therefore be one of our study directions. Moreover, we will employ these dimensionality reduction algorithms to perform other ranking tasks, such as question answering and multimedia summarization.

Acknowledgments

The work was supported by the National Natural Science Foundation of China (Grant nos. 60975001, 61172121, 61170239), the Tianjin Research Program of Application Foundation and Advanced Technology (Grant nos. 10JCYBJC07700, 09JCYBJC00900), the Specialized Research Fund for the Doctoral Program of Higher Education (no. 20090032110028), the Innovation Foundation of Tianjin University (no. 60302019), and the Program for New Century Excellent Talents in University (no. NCET10–0620). We would like to thank Linjun Yang at MSRA for providing the public MSRA-MM Dataset.

References

[1] L.J.P. van der Maaten, E.O. Postma, H.J. Herik, Dimensionality Reduction: A Comparative Review, Tilburg University Technical Report, TiCC-TR 2009-005, 2009.
[2] S.C. Yan, D. Xu, B.Y. Zhang, Graph embedding and extensions: a general framework for dimensionality reduction, IEEE Transactions on Pattern Analysis and Machine Intelligence 29 (1) (2007) 40–51.
[3] D. Zhou, Z.M. Tang, A modification of kernel discriminant analysis for high-dimensional data—with application to face recognition, Signal Processing 90 (8) (2010) 2423–2430.
[4] D.Q. Zhang, Z.H. Zhou, S.C. Chen, Semi-supervised dimensionality reduction, in: Proceedings of the SIAM International Conference on Data Mining, 2007, pp. 629–634.
[5] M.B. Blaschko, J.A. Shelton, A. Bartels, et al., Semi-supervised kernel canonical correlation analysis with application to human fMRI, Pattern Recognition Letters 32 (11) (2011) 1572–1583.
[6] J. Yu, D.C. Tao, M. Wang, Adaptive hypergraph learning and its application in image classification, IEEE Transactions on Image Processing, http://dx.doi.org/10.1109/TIP.2012.2190083.
[7] J. Yu, D.Q. Liu, D.C. Tao, H.S. Seah, Complex object correspondence construction in 2D animation, IEEE Transactions on Image Processing 20 (11) (2011) 3257–3269.
[8] C.P. Hou, C.S. Zhang, Y. Wu, F.P. Nie, Multiple view semi-supervised dimensionality reduction, Pattern Recognition 43 (2010) 720–730.
[9] T. Xia, D.C. Tao, T. Mei, Y.D. Zhang, Multiview spectral embedding, IEEE Transactions on Systems, Man, and Cybernetics, Part B 40 (6) (2010) 1438–1446.
[10] T.Y. Liu, Learning to Rank for Information Retrieval, Springer Press, Berlin, 2011.
[11] F. Schroff, A. Criminisi, A. Zisserman, Harvesting image databases from the web, in: Proceedings of the IEEE International Conference on Computer Vision, 2007, pp. 1–8.
[12] T. Yao, T. Mei, C.W. Ngo, Co-reranking by mutual reinforcement for image search, in: Proceedings of the ACM International Conference on Image and Video Retrieval, 2010, pp. 34–41.
[13] W. Hsu, L. Kennedy, S.F. Chang, Video search reranking through random walk over document-level context graph, in: Proceedings of the ACM International Conference on Multimedia, 2007, pp. 971–980.
[14] Z.J. Zha, L.J. Yang, T. Mei, M. Wang, Z.F. Wang, Visual query suggestion, in: Proceedings of the ACM Multimedia Conference, 2009, pp. 15–24.
[15] M. Wang, X.S. Hua, J.H. Tang, R.C. Hong, Beyond distance measurement: constructing neighborhood similarity for video annotation, IEEE Transactions on Multimedia 11 (3) (2009) 465–476.
[16] M. Wang, X.S. Hua, R.C. Hong, et al., Unified video annotation via multi-graph learning, IEEE Transactions on Circuits and Systems for Video Technology 19 (5) (2009) 733–746.
[17] M. Wang, X.S. Hua, T. Mei, et al., Semi-supervised kernel density estimation for video annotation, Computer Vision and Image Understanding 113 (3) (2009) 384–396.
[18] X.F. He, D. Cai, J.W. Han, Learning a maximum margin subspace for image retrieval, IEEE Transactions on Knowledge and Data Engineering 20 (2) (2008) 189–201.
[19] Y.W. Pang, Q. Hao, Y. Yuan, T. Hu, R. Cai, L. Zhang, Summarizing tourist destinations by mining user-generated travelogues and photos, Computer Vision and Image Understanding 115 (3) (2011) 352–363.
[20] R.C. Hong, J.H. Tang, H.K. Tan, et al., Beyond search: event driven summarization for web videos, ACM Transactions on Multimedia Computing, Communications and Applications 2 (3) (2010) 1–21.
[21] D. Liu, S.C. Yan, X.S. Hua, H.J. Zhang, Image retagging using collaborative tag propagation, IEEE Transactions on Multimedia 13 (4) (2011) 702–712.
[22] D.R. Hardoon, S. Szedmak, J.R. Shawe-Taylor, Canonical correlation analysis: an overview with application to learning methods, Neural Computation 16 (12) (2004) 2639–2664.
[23] T.K. Kim, S.F. Wong, R. Cipolla, Tensor canonical correlation analysis for action classification, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2007, pp. 1–8.
[24] W.Z. Zheng, X.Y. Zhou, C.R. Zou, L. Zhao, Facial expression recognition using kernel canonical correlation analysis, IEEE Transactions on Neural Networks 17 (1) (2006) 233–238.
[25] Y. Peng, D.Q. Zhang, Semi-supervised kernel canonical correlation analysis, Journal of Software 19 (11) (2008) 2822–2832.
[26] H. Hotelling, Relations between two sets of variates, Biometrika 28 (1936) 312–377.
[27] H.X. Wang, Local two-dimensional canonical correlation analysis, IEEE Signal Processing Letters 17 (11) (2010) 921–924.
[28] D.R. Hardoon, J. Shawe-Taylor, Sparse canonical correlation analysis, Machine Learning Journal 83 (3) (2011) 331–353.
[29] Y. Yuan, Q.S. Sun, Q. Zhou, D.S. Xia, A novel multiset integrated canonical correlation analysis framework and its application in feature fusion, Pattern Recognition 44 (5) (2011) 1031–1040.
[30] X.Y. Jing, S. Li, C. Lan, et al., Color image canonical correlation analysis for face feature extraction and recognition, Signal Processing 91 (8) (2011) 2132–2140.
[31] A. Kimura, H. Kameoka, M. Sugiyama, et al., SemiCCA: efficient semi-supervised learning of canonical correlations, in: Proceedings of the International Conference on Pattern Recognition, 2010, pp. 2933–2936.
[32] W. Hsu, L. Kennedy, S.F. Chang, Video search reranking via information bottleneck principle, in: Proceedings of the ACM International Conference on Multimedia, 2006, pp. 35–44.
[33] S.K. Wei, Y. Zhao, Z.F. Zhu, et al., Multimodal fusion for video search reranking, IEEE Transactions on Knowledge and Data Engineering (2010) 1191–1199.
[34] Y.S. Jing, S. Baluja, VisualRank: applying pagerank to large-scale image search, IEEE Transactions on Pattern Analysis and Machine Intelligence 30 (11) (2008) 1877–1890.
[35] X.G. Wang, K. Liu, X.O. Tang, Query-specific visual semantic spaces for web image re-ranking, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2011, pp. 857–864.
[36] X.M. Tian, D.C. Tao, X.S. Hua, X.Q. Wu, Active reranking for web image search, IEEE Transactions on Image Processing 19 (3) (2010) 805–820.
[37] Y. Liu, T. Mei, X.-S. Hua, J.H. Tang, X.Q. Wu, S.P. Li, Learning to video search rerank via pseudo preference feedback, in: Proceedings of the IEEE International Conference on Multimedia and Expo, 2008, pp. 207–210.
[38] Y. Liu, T. Mei, X.S. Hua, Crowd Reranking: exploring multiple search engines for visual search reranking, in: Proceedings of the ACM Conference on Research and Development in Information Retrieval, 2009, pp. 500–507.
[39] L. Kennedy, S.F. Chang, A reranking approach for context-based concept fusion in video indexing and retrieval, in: Proceedings of the International Conference on Image and Video Retrieval, 2007, pp. 333–340.
[40] L. Zhang, T. Mei, Y. Liu, D.C. Tao, H.Q. Zhou, Visual search reranking via adaptive particle swarm optimization, Pattern Recognition 44 (2011) 1811–1820.
[41] M. Wang, K.Y. Yang, X.S. Hua, et al., Towards a relevant and diverse search of social images, IEEE Transactions on Multimedia 12 (8) (2010) 829–842.
[42] Z. Ji, Y.T. Su, Y.W. Pang, X.J. Qu, Diversifying the image relevance reranking with absorbing random walks, in: Proceedings of the International Conference on Image and Graphics, 2011, pp. 981–986.
[43] T. Sun, S. Chen, J.Y. Yang, P. Shi, A novel method of combined feature extraction for recognition, in: Proceedings of the IEEE Conferences on Data Mining, 2008, pp. 1043–1048.
[44] M. Wang, L.J. Yang, X.S. Hua, MSRA-MM: bridging research and industrial societies for multimedia information retrieval, Microsoft Technical Report (MSR-TR-2009-30), 2009, pp. 1–14.
[45] K. Jarvelin, J. Kekalainen, IR evaluation methods for retrieving highly relevant documents, ACM Special Interest Group on Information Retrieval (2000) 41–48.
[46] R. Herbrich, K. Obermayer, T. Graepel, Large margin rank boundaries for ordinal regression, Advances in Large Margin Classifiers (2000) 115–132.
[47] J.G. Carbonell, Y.M. Yang, R.E. Frederking, et al., Translingual information retrieval: a comparative evaluation, in: Proceedings of the International Joint Conference on Artificial Intelligence, 1997, pp. 1–7.
[48] B.J. Frey, D. Dueck, Clustering by passing messages between data points, Science 315 (5814) (2007) 972–976.
