Computer Vision and Image Understanding 117 (2013) 670–679
Robust image retrieval with hidden classes

Jun Zhang (a,*), Lei Ye (b), Yang Xiang (a), Wanlei Zhou (a)

a School of Information Technology, Deakin University, Melbourne, Australia
b School of Computer Science and Software Engineering, University of Wollongong, Wollongong, Australia
Article info

Article history: Received 6 April 2011; Accepted 24 February 2013; Available online 7 March 2013.

Keywords: Content-based image retrieval; Hidden classes; Robust image retrieval; Image classification; Novel query detection
Abstract

For the purpose of content-based image retrieval (CBIR), image classification is important to help improve the retrieval accuracy and the speed of the retrieval process. However, CBIR systems that employ image classification suffer from the problem of hidden classes: queries associated with hidden classes cannot be accurately answered by a traditional CBIR system. To address this problem, a robust CBIR scheme is proposed that incorporates a novel query detection technique and a self-adaptive retrieval strategy. A number of experiments carried out on two popular image datasets demonstrate the effectiveness of the proposed scheme.

© 2013 Elsevier Inc. All rights reserved.
1. Introduction

Content-based image retrieval (CBIR) is an active research area. The aim of a CBIR system is to search for images by analyzing their content. Images are normally described by low-level features such as color, texture and shape [1,2]. In the literature, a significant amount of research has been conducted on CBIR [3,4]. However, the robustness of CBIR systems has not been sufficiently investigated, even though the topic of robustness has been explored extensively in traditional information retrieval [5]. We have previously identified and addressed unclean queries as a robustness problem [6]; in this paper, we study the hidden class problem of CBIR systems that employ image classification as preprocessing. The application of image classification techniques to a CBIR system results in a user's queries being answered with images from predefined classes, thus helping to improve retrieval accuracy and speed. However, in a large-scale image collection, some image classes may be unseen [4]. We call these hidden classes, as opposed to predefined classes. The existence of hidden classes severely affects the retrieval accuracy of image-classification-based CBIR systems.

There are two approaches that can address this robustness problem. One approach is to detect hidden classes at the preprocessing stage in order to avoid the problem of hidden classes when answering a query. The second approach is to take hidden classes into account when answering a query, because different retrieval
This paper has been recommended for acceptance by Chung-Sheng Li.
* Corresponding author. E-mail address: [email protected] (J. Zhang).
http://dx.doi.org/10.1016/j.cviu.2013.02.008
strategies can be adopted for different queries. We decided upon the second approach because it is too difficult to detect hidden classes during preprocessing without extra information.

Under the query-by-example (QBE) paradigm, three problems arise due to hidden classes. When considering hidden classes, a user's queries can be divided into two categories: common queries and novel queries. Fig. 1 illustrates a hidden class, a common query and a novel query. A common query can be answered using a predefined image class because relevant images for the query have been gathered in this class. A novel query is associated with a hidden class and cannot be answered using any predefined image class. The first problem is how to identify whether a query is a common or a novel query; this determination will influence the retrieval strategy. The second problem is how to predict a relevant predefined image class for a common query. The third problem is how to perform image retrieval for a novel query, given that it is not associated with any predefined image class. The solutions to these problems result in a new retrieval scheme that can manage the problem of hidden classes.

In this paper, we aim to address the critical problem of hidden classes in CBIR systems. Our major contributions are summarized as follows. We propose a robust CBIR scheme that incorporates multi-image queries and a support vector machine (SVM) to effectively deal with hidden classes. We develop a novel query detection technique to determine whether a user's query is a common or a novel query, thereby making it feasible to consider hidden classes in the retrieval process.
Fig. 1. Illustration of the problem of hidden classes.
We develop a self-adaptive retrieval strategy. For a common query, a relevant predefined image class is predicted and the images within it are ranked. For a novel query, a new method is proposed to filter out the irrelevant images before image ranking. Finally, a number of experiments carried out on a Corel image dataset and the NUS-WIDE-LITE dataset [23] demonstrate the effectiveness of the proposed scheme. In particular, the improvement in precision depends on the number of hidden classes, with improvements of over 10% achieved.

The remainder of this paper is organized as follows: Section 2 reviews related work; Section 3 presents the novel CBIR scheme; a discussion is provided in Section 4; the experimental evaluation and results are reported in Section 5; and the conclusion is presented in Section 6.
2. Related work

Image classification improves the accuracy and speed of a content-based image retrieval (CBIR) system [4]. Images in a collection can be categorized by supervised image classification using predefined image classes. For a given query, the retrieval results of such a CBIR system are generated by first locating the most relevant image class and then ranking the images within that class [7,4]. It should be noted that image classification is not necessary for all CBIR systems; a CBIR system can be based entirely on similarity retrieval without any classes. This paper focuses on CBIR systems which perform classification first.

A significant amount of research has been undertaken with the aim of improving the performance of image classification [8]. One approach has been to develop new image matching methods and incorporate them into the training process of a multiclass classifier. For instance, spatial pyramid matching was proposed and incorporated into an SVM for natural image classification [9]. Considering that the trade-off between discriminative power and invariance differs from task to task, a kernel
learning method was proposed in order to achieve different levels of trade-off for image classification [10]. To perform object localization, an efficient subwindow search method was proposed [11] that can be combined with a spatial pyramid kernel to improve the multiclass classifier. Another approach has been to directly enhance a multiclass classifier by considering the characteristics of real applications. For instance, a hybrid method was proposed to combine the nearest neighbor classifier and the support vector machine [12], thus helping to overcome several problems of the two individual methods. A self-taught learning method was proposed that uses unlabeled images randomly downloaded from the Internet to improve the performance on a specific image classification task [13]. In defense of nearest-neighbor (NN) based image classification, a naive-Bayes nearest neighbor classifier was proposed to demonstrate the effectiveness of non-parametric NN methods [14]. Certain scholarship has also addressed the similar problem of unknown concepts in the semantic space. A scheme combining query by multi-example and semantic retrieval was proposed to alleviate the influence of unknown concepts on semantic-based image retrieval [15]. To bridge the gap between a limited number of learned concept detectors and the full vocabulary a user has, an automatic video retrieval method [16] was proposed that builds a set of machine-learned concept detectors enriched with semantic descriptions and semantic structure obtained from WordNet. Other works have attempted to address unknown-concept-related problems using image classification. For example, a novel sparse-graph-based semi-supervised learning approach was proposed [17] for harnessing labeled and unlabeled data simultaneously in order to infer the images' semantic concepts more accurately.
To improve image annotation performance, a correlative linear neighborhood propagation method was proposed, incorporating hidden semantic correlations into graph-based semi-supervised learning [18]. To handle ambiguous or unknown concepts in the query, IntentSearch [19] was proposed as a simplified version of active reranking to capture the user's intention more accurately. The user's intention is defined by only one query image
in IntentSearch, so it does not work as well when the user's intention is too complex to be represented by one image. An enhanced active reranking scheme [20] was proposed that employs a structural-information-based sample selection strategy to reduce the user's labeling effort, and localizes the user's intention in the visual feature space using a local-global discriminative dimension reduction algorithm.

Our work is different, as our research focuses on the problem of hidden classes in CBIR systems that employ image classification. Firstly, the unknown concepts in the semantic space are not equivalent to the hidden classes in a categorized image collection, because they arise from different perspectives: unknown concepts are derived from semantic meaning, while hidden classes are based on visual similarity. The existing solutions for unknown concepts cannot be applied to deal with hidden classes. In fact, how to identify and answer the queries associated with hidden classes, which can dramatically affect the retrieval performance, remains unsolved. In addition, our work aims to propose a new CBIR scheme that can handle hidden classes in a single interaction, without the process of relevance feedback or active reranking.
3. A robust CBIR scheme

This section describes the proposed CBIR scheme as illustrated in Fig. 2. Due to hidden classes, a common query and a novel query are two types of queries requiring different retrieval strategies. In this work, novel query detection is proposed to determine whether a query is a common query or a novel query. Following this, the different types of queries can be answered using different image ranking methods. To support the different ranking strategies, a new preprocessing stage is developed.

Let us consider an image collection X containing N images, X = {x_1, x_2, ..., x_N}. The content of each image is represented by a low-level feature vector. Assume there are M predefined image classes, {ω_1, ..., ω_M}, and a set of training samples, S_i, is available for each predefined image class ω_i. A user's query consists of multiple example images, Q = {x̂_1, ..., x̂_L}.

3.1. Preprocessing

Preprocessing is designed to train a set of classifiers for the predefined image classes. In contrast to conventional schemes, these image classifiers are specialized to address the problems arising from hidden image classes. At the stage of novel query detection, the classifiers are modified and combined to construct a novel query detector. At the stage of image ranking for a common query, the classifiers are combined to predict the relevant class. At the stage of image ranking for a novel query, the classifiers are modified and combined to filter out strongly irrelevant images.

Taking hidden classes into account, a new two-step strategy is proposed for classifier training. In the first step, we train a set of weak classifiers for the predefined image classes. For the ith image class, the training samples, S_i, are regarded as the positive samples, and the training samples of the other image classes, {S_j}, j ≠ i, are regarded as the negative samples. All positive and negative samples are merged to train a two-class SVM [24,25], whose decision function is denoted f̃_i(x). We can use f̃_i(x) to compute a decision value for an image, and this decision value can be converted to a probability [26] or a binary value [25]. The weak classifiers are then applied to filter the image collection to obtain a set of images, W̃, which do not belong to any predefined image class with a high probability. The filtering rule is
max_i f̃_i(x) < 0,  i ∈ [1, M].   (1)
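The first training step and the filtering rule of Eq. (1) can be sketched as a short program. scikit-learn, the function names and the data layout below are illustrative assumptions, not part of the original paper.

```python
# Sketch of the weak one-vs-rest classifiers and the filter of Eq. (1).
import numpy as np
from sklearn.svm import SVC

def train_weak_classifiers(class_samples):
    """Train one two-class SVM per predefined class.

    class_samples: list of (n_i, d) arrays, one per predefined class.
    Returns the list of decision functions f~_i.
    """
    classifiers = []
    for i, pos in enumerate(class_samples):
        # positives: samples of class i; negatives: samples of all other classes
        neg = np.vstack([s for j, s in enumerate(class_samples) if j != i])
        X = np.vstack([pos, neg])
        y = np.r_[np.ones(len(pos)), -np.ones(len(neg))]
        classifiers.append(SVC(kernel="linear").fit(X, y))
    return classifiers

def filter_collection(classifiers, X):
    """Eq. (1): keep images whose maximum decision value over all
    predefined classes is negative, i.e. candidates for hidden classes."""
    scores = np.column_stack([clf.decision_function(X) for clf in classifiers])
    return X[scores.max(axis=1) < 0]
```

Images retained by `filter_collection` form the set W̃ used as extra negatives in the second training step.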
In the second step, we re-train a set of strong classifiers for the predefined image classes. The difference from the first step is that we use some randomly selected images from W̃ as the negative samples, with
Fig. 2. Retrieval with hidden classes.
new information included in order to improve the classifiers. For the ith image class, the training samples, S_i, were used as the positive samples. The training samples of the other image classes, {S_j}, j ≠ i, and some randomly selected images from W̃, were used as the negative samples. All positive and negative samples were combined to train an SVM classifier, f_i(x). Consequently, a set of SVMs, {f_1(x), ..., f_M(x)}, can be prepared for the predefined image classes.

The two-step training strategy is necessary to address the hidden class problem. In the first step, all training samples come from predefined image classes, meaning that the training set will not include the information of hidden classes. In this sense, the trained classifiers are weak. In the second step, some new samples are randomly selected from W̃. Since the extended training set contains the information of hidden classes, the trained classifiers become stronger.

3.2. Novel query detection

Novel query detection is proposed to identify a query as a common query or a novel query, as different retrieval strategies will be applied to answer different queries. In the proposed scheme, novel query detection is achieved by extending traditional novelty detection techniques [27]. Traditional novelty detection is formulated as a two-class classification problem using random rejects as negative training samples [27-29]; however, these methods do not support multi-image queries. In the CBIR field, a multi-image query [21,22,15] is preferable because it can express the user's search intent better than a single image query. We incorporate multi-image queries and the combination rules to achieve accurate novel query detection.

Firstly, we construct an SVM-based novel query detector for a single query image, namely NQD-Single. If f_i(x̂) < T, x̂ is a novel query to the ith class, where T is the predefined detection threshold. In the multiclass case, the problem of novel query detection can be separated into a set of basic novel query detections. The query image x̂ is determined to be a novel query only if it is a novel query for all classes. Therefore, x̂ is a novel query if max_i f_i(x̂) < T, i ∈ [1, M].

Now, we construct a novel query detector for multi-image queries by incorporating the NQD-Single method and the combination rules. In this work, two different combination rules, the majority vote rule (MVR) and the Bayes sum rule (BSR) [30,31], are applied to implement multi-image query detection.

The first method applies MVR to extend the NQD-Single method, namely NQD-MVR. For a query image x̂, the vote associated with the ith predefined image class can be calculated as

V_i(x̂) = sign(f_i(x̂) - T).   (2)

Given a multi-image query Q = {x̂_1, ..., x̂_L} and the detection threshold T, the novel query detection rule for the ith predefined image class is that Q is a novel query to the ith class

if Σ_{j=1}^{L} V_i(x̂_j) < L/2.   (3)

The final novel query detection rule for the multiclass case is that Q is a novel query

if max_i Σ_{j=1}^{L} V_i(x̂_j) < L/2,  i ∈ [1, M].   (4)

The second method applies BSR to extend the NQD-Single method, namely NQD-BSR. First, the decision values produced by an SVM classifier are converted into conditional probabilities using the sigmoid function [26],

P(ω_i|x) = 1/(1 + exp(-f_i(x))),  if f_i(x) ≥ 0;
P(ω_i|x) = 1 - 1/(1 + exp(f_i(x))),  if f_i(x) < 0.   (5)

Given a multi-image query Q = {x̂_1, ..., x̂_L} and the detection threshold T, for the ith predefined image class, the conditional probability P(ω_i|Q) can be calculated as

P(ω_i|Q) = (1/L) Σ_{j=1}^{L} P(ω_i|x̂_j).   (6)

The novel query detection rule is that Q is a novel query to the ith class

if P(ω_i|Q) < T.   (7)

The final novel query detection rule for the multiclass case is that Q is a novel query

if max_i P(ω_i|Q) < T,  i ∈ [1, M].   (8)

The two multi-image query detection methods are evaluated in our experiments. Note that if a query is not a novel query, it is a common query.

3.3. Image retrieval for a common query

A common query can be answered using the images in a predefined image class. The retrieval results are obtained by predicting a relevant image class and then ranking the images in this class.

3.3.1. Relevant class prediction

Relevant class prediction can be regarded as an extended multiclass classification problem. For a single query image, this is a classic multiclass classification problem and the conventional solution can be expressed as

i* = arg max_i P(ω_i|x̂_j),   (9)

i.e., the relevant class is the predefined image class with the maximum posterior probability for x̂_j.

We then extend the conventional method to manage a multi-image query by taking the applied detection strategy into account. With NQD-BSR, the prediction rule can be expressed as

i* = arg max_i Σ_{j=1}^{L} P(ω_i|x̂_j).   (10)

With NQD-MVR, the prediction rule can be expressed as

i* = arg max_i Σ_{j=1}^{L} V_i(x̂_j).   (11)

If there is more than one image class with the maximum number of votes, Eq. (10) can be applied to break the tie.
3.3.2. Ranking for a common query

Ranking the images in the relevant class can further improve the retrieval results. First, the images in the relevant class are extracted from the image collection. The images with f_i(x) > T are considered to belong to the relevant class and are then ranked according to their relevance to the query. It should be highlighted that this step can be conducted off-line, i.e., the images of the predefined image classes can be selected in preprocessing and saved for later use. The next step is to compute the relevance between the images in ω_i and the multi-image query, Q = {x̂_1, ..., x̂_L}. In this paper, we cast this as a one-class classification problem [32], i.e., only positive
samples are available. We use the multiple query images {x̂_1, ..., x̂_L} to train a one-class SVM [33], f_q(x). Then, the images in ω_i are ranked according to the decision values produced by f_q(x). Finally, the top k images in the ranking list are returned as the retrieval results. This process also helps to improve the user experience, because ordinary users are normally most interested in the top-ranked images in practical applications.
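This ranking step can be sketched in a few lines. scikit-learn's `OneClassSVM`, the parameter values and the function name below are illustrative assumptions rather than the paper's exact configuration.

```python
# Sketch of Section 3.3.2: rank images of the predicted relevant class
# with a one-class SVM f_q trained on the query images only.
import numpy as np
from sklearn.svm import OneClassSVM

def rank_common_query(query_images, class_images, k=10):
    """query_images: (L, d) example images of the multi-image query;
    class_images: (n, d) images of the predicted relevant class.
    Returns the indices of the top-k class images by decision value of f_q."""
    fq = OneClassSVM(kernel="rbf", gamma="scale", nu=0.5).fit(query_images)
    scores = fq.decision_function(class_images)
    return np.argsort(-scores)[:k]          # descending relevance
```

Because only the small set of query images is used for training, fitting f_q on-line is very fast, which matches the computation-time discussion in Section 4.2.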
3.4. Image retrieval for a novel query

A novel query is not associated with any predefined image class. A straightforward approach is to rank the images in the entire collection according to their similarity to the query. However, this approach does not exploit the knowledge of the predefined image classes. We observe that if images are strongly relevant to a predefined image class, they are not relevant to the novel query. Therefore, the proposed approach can improve the retrieval results by filtering out irrelevant images before image ranking.

3.4.1. Image filtering

Since a novel query cannot be answered by any predefined image class, it is likely that the images in the predefined image classes are not relevant to the novel query and can be filtered out. The prepared image classifiers, {f_1(x), ..., f_M(x)}, can be utilized to produce a set of images, W, from the whole image collection. The images in W are not relevant to the predefined image classes, and they will be used to answer the novel query. The filtering rule is that

if max_i f_i(x) < 0,  i ∈ [1, M],   (12)

put x in W. This image filtering process can be conducted off-line, i.e., W can be created at the stage of preprocessing.

3.4.2. Ranking for a novel query

In this paper, we cast image ranking for a novel query as a binary classification problem. The query images are used as the positive samples and the training samples of the predefined image classes are used as the negative samples. All positive and negative samples are combined to train a classifier, such as an SVM, for image ranking. However, the positive and negative samples are unbalanced, which influences the accuracy of an SVM. We therefore apply the asymmetric bagging strategy [34] to construct an ensemble of SVMs and combine their outputs for image ranking. The detailed algorithm is described in Table 1. In this algorithm, the decision value f_qi(x_j) of an image x_j produced by the ith bagging classifier is converted into a conditional probability P(r|f_qi, x_j) using the sigmoid function in Eq. (5). P(r|f_qi, x_j) is the probability of the image being relevant to the query as predicted by the ith SVM classifier. All decision values of x_j produced by the ensemble of SVM classifiers are then merged to make the final decision using the Bayesian sum rule.

Table 1
Ensemble classifier based image ranking.

Input: query image set Q = {x̂_1, ..., x̂_L}, SVM trainer V, integer B (the number of bagging classifiers), training samples for the predefined image classes S = ∪_{i=1}^{M} S_i, and the image set W for ranking
1.  For i = 1 to B {
2.    N_i = random sample from S, with |N_i| = |Q|
3.    f_qi(x) = V(Q, N_i)
4.    For j = 1 to |W| {
5.      Compute f_qi(x_j)
6.      Convert f_qi(x_j) to P(r|f_qi, x_j) as in Eq. (5)
7.    }
8.  }
9.  For j = 1 to |W| {
10.   P(r|x_j) = (1/B) Σ_{i=1}^{B} P(r|f_qi, x_j)
11. }
12. Rank {x_j}, x_j ∈ W, according to P(r|x_j)
Output: ranked list {x_j}, x_j ∈ W

4. Discussions of implementation

This section discusses the implementation of the proposed scheme, including the utilization of a multi-image query, the on-line computation time, and the setting of the threshold.
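As an implementation sketch, the asymmetric-bagging ranking of Table 1 might look as follows. scikit-learn, the seed handling and the simplified sigmoid of Eq. (5) are illustrative assumptions, not the paper's exact implementation.

```python
# Sketch of Table 1: asymmetric bagging for ranking a novel query.
# Query images are the positives; each of the B classifiers sees an
# equally sized random negative sample; sigmoid outputs are averaged
# (Bayes sum rule).
import numpy as np
from sklearn.svm import SVC

def rank_novel_query(Q, S, W, B=10, seed=0):
    """Q: (L, d) query images; S: (n, d) training samples of the predefined
    classes; W: (m, d) retained images. Returns indices of W ranked by the
    averaged relevance probability P(r|x_j)."""
    rng = np.random.default_rng(seed)
    P = np.zeros(len(W))
    for _ in range(B):
        Ni = S[rng.choice(len(S), size=len(Q), replace=False)]   # |N_i| = |Q|
        X = np.vstack([Q, Ni])
        y = np.r_[np.ones(len(Q)), -np.ones(len(Ni))]
        fqi = SVC(kernel="linear").fit(X, y)                     # f_qi = V(Q, N_i)
        P += 1.0 / (1.0 + np.exp(-fqi.decision_function(W)))     # sigmoid, Eq. (5)
    return np.argsort(-(P / B))                                  # Bayes sum rule
```

Each bagging classifier is trained on only 2L samples, which keeps the on-line cost low, as discussed in Section 4.2.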
4.1. Utilization of a multi-image query

In the proposed scheme, the utilization of a multi-image query is based on combination rules. Specifically, this approach is applied to novel query detection and relevant class prediction. In this section, we formulate the classification of multi-image queries in a theoretical framework [30] and show that the extension to multi-image queries is a classifier combination problem [35].

Consider a pattern recognition problem where a pattern (query) Q is to be assigned to one of the M possible classes {ω_1, ..., ω_M}. Let us assume we have a classifier, but the given pattern is represented by L distinct measurement vectors from different observations (query images), {x̂_1, ..., x̂_L}. This is a typical classifier combination architecture using repeated measurements [35]. In the measurement space, each class ω_k is modeled by the probability density function p(x̂|ω_k), and its prior probability of occurrence is denoted P(ω_k). According to Bayesian theory, given the measurements x̂_i, i ∈ [1, L], the pattern Q should be assigned to class ω_j provided the a posteriori probability of that interpretation is the maximum, i.e.,

assign Q → ω_j  if  P(ω_j|x̂_1, ..., x̂_L) = max_k P(ω_k|x̂_1, ..., x̂_L).   (13)
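A small worked example shows how the two combination rules behave. The probability matrix and threshold below are made-up numbers; the votes are thresholded probabilities rather than raw SVM decision values, which is an illustrative simplification of Eq. (2).

```python
# Toy comparison of the Bayes sum rule (average of per-image posteriors,
# as in Eqs. (6) and (10)) and the majority vote rule (per-image vote
# counts, as in Eqs. (2)-(4) and (11)).
import numpy as np

P = np.array([            # P(omega_i | x_hat_j): rows = query images, cols = classes
    [0.7, 0.2, 0.1],
    [0.6, 0.3, 0.1],
    [0.4, 0.5, 0.1],
])
T = 0.35                  # detection threshold

# Bayes sum rule (NQD-BSR): average the posteriors over the L query images
p_q = P.mean(axis=0)                    # P(omega_i | Q), Eq. (6)
is_novel_bsr = bool(p_q.max() < T)      # Eq. (8): False here -> common query
relevant_class = int(np.argmax(p_q))    # Eq. (10): class 0

# Majority vote rule (NQD-MVR): each image votes for the classes it supports
votes = (P >= T).sum(axis=0)            # per-class vote counts
is_novel_mvr = bool(votes.max() < len(P) / 2)   # cf. Eq. (4)
```

In this example both rules agree that the query is common, but BSR additionally retains the graded class probabilities, which is one reason it is less sensitive to the threshold than MVR in the experiments of Section 5.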
Then, the combination rules, such as the Bayes sum rule and the majority vote rule, can be derived from Eq. (13) under certain specific conditions [30]. The classifier combination methods can thus be employed to extend conventional methods for single-image queries to multi-image queries.

4.2. On-line computation time

For practical applications, the on-line computation time is an important factor. We briefly discuss the on-line computation time of the proposed scheme. At the stage of novel query detection, the detection methods depend on the set of SVM classifiers for the predefined image classes. This set of SVM classifiers can be trained off-line in preprocessing, so it will not affect the on-line computation time. Furthermore, the detection methods have a short computation time as they only need to handle a small number of query images. At the stage of image retrieval for a common query, the set of SVM classifiers can likewise be trained off-line without affecting the on-line computation time. The extended method for relevant class prediction has a short computation time thanks to the small number of query images. Following this, the training of the one-class SVM using the query images is extremely fast. The image ranking is completed quickly because only the images in the predicted relevant class are ranked. At the stage of image retrieval for a novel query, image filtering can be conducted off-line. The on-line computation time can be dramatically reduced since only the retained images are used for ranking. Furthermore, the image ranking method has a short
computation time because each bagging classifier has only a small number of training samples available and the number of bagging classifiers is small.
4.3. Setting of the threshold

The setting of the detection threshold is important for the proposed retrieval scheme. If the threshold is too high, many common queries will be considered novel queries. A low threshold, on the other hand, will lead to many novel queries being treated as common queries. Inaccurate detection of queries will degrade the retrieval performance. To study the impact of the threshold on novel query detection, a number of experiments were performed with various thresholds; the experimental results are reported in Section 5.1. Moreover, we applied a practical method [36] in the proposed retrieval scheme to select a proper threshold: we chose a threshold by comparing results on a test set after preprocessing. The test set was specially collected for the purpose of threshold setting and its ground truth was known.

5. Experimental evaluation
A number of experiments were carried out on two image datasets, Corel [37] and NUS-WIDE [23], to evaluate the proposed scheme. To simulate the problem of hidden classes, we assumed that certain image classes were predefined classes and the other image classes were hidden classes. For each predefined class, 30% of the images in the class were randomly selected and used as training samples. For each hidden class, no training samples were available, meaning that the hidden classes were unknown to the CBIR system. The queries for performance evaluation were randomly created from all image classes, including both predefined and hidden classes.

5.1. Evaluation of novel query detection

Fig. 3. True rate of novel query detection (one, two and three hidden classes).
Three sets of experiments were carried out on the Corel image dataset to evaluate the methods of novel query detection. The Corel image dataset consists of ten classes, with each class containing 100 real-world images. This image dataset is well organized and well suited to the evaluation of the proposed scheme. Two standardized MPEG-7 visual descriptors [38], CSD and EHD, were selected to describe the image content. In the first experiment, nine image classes were set as predefined classes and one image class was set as a hidden class. In the second experiment, eight image classes were set as predefined classes and two image classes were set as hidden classes. In the third experiment, seven image classes were set as predefined classes and three image classes were set as hidden classes. The performance in terms of the true novel query rate and the false novel query rate on a large number of random queries is reported. The true novel query rate is defined as the fraction of correctly detected novel queries over the total number of novel queries. The false novel query rate is defined as the fraction of common queries incorrectly detected as novel queries over the total number of common queries. Figs. 3 and 4 show the performance of the three methods, NQD-Single, NQD-MVR and NQD-BSR, with various thresholds. To facilitate the comparison, the threshold is expressed as a probability value. In NQD-Single, each query includes a single example image. In NQD-MVR and NQD-BSR, each query includes five example images. The experimental results show that both NQD-BSR and NQD-MVR were superior to NQD-Single. Firstly, NQD-BSR and NQD-MVR achieved higher true novel query rates than NQD-Single. Secondly, the false novel query rate of NQD-BSR was much lower than that of NQD-Single, and the false novel query rate of NQD-MVR was also slightly lower than that of NQD-Single. A basic reason for this is that multiple examples can
Fig. 4. False rate of novel query detection (one, two and three hidden classes).

Fig. 5. Retrieval performance on Corel dataset (one, two and three hidden classes).
describe user query intent better than a single example. The experimental results also show that NQD-BSR outperformed NQD-MVR thanks to its lower false novel query rate. For example, when the threshold was set at 0.33, the false novel query rate
of NQD-BSR was over 15% lower than that of NQD-MVR. Moreover, it was easier to choose a proper threshold for NQD-BSR in order to guarantee a high true novel query rate and a low
false novel query rate. However, this was difficult for NQD-MVR, and therefore, the NQD-BSR method was preferable for novel query detection.
5.2. Evaluation of image retrieval

A number of image retrieval experiments were carried out to evaluate the CBIR schemes. The proposed scheme was compared with a conventional CBIR scheme employing image classification. In the conventional scheme, given a single query image, a relevant image class is predicted using a conventional classification approach, and the images in that class are ranked by their Euclidean distance to the query image. In contrast, the proposed scheme takes hidden classes into account and supports multi-image queries. The two novel query detection methods were applied in the proposed scheme, resulting in two versions: the proposed scheme with MVR and the proposed scheme with BSR.

Fig. 5 demonstrates the retrieval performance of the competing schemes on the Corel dataset. These results show that the proposed scheme, with either MVR or BSR, outperforms the conventional scheme in the presence of hidden classes. Fig. 6 shows a selection of real retrieval results for a novel query. In this case, the conventional scheme predicted an irrelevant class that contained few images relevant to the query. In contrast, the proposed scheme with BSR accurately detected this novel query and produced an excellent result. The proposed scheme with MVR failed to detect the query as novel, but the class it predicted nevertheless contained many images relevant to the query. An important reason for this difference is that the proposed scheme can handle novel queries while the conventional scheme cannot: for a novel query, the conventional scheme still predicts a predefined image class as the relevant class, and its retrieval performance degrades dramatically. Furthermore, the proposed scheme with BSR is better than the proposed scheme with MVR because NQD-BSR has a much lower false novel query rate than NQD-MVR; a high false novel query rate degrades the retrieval performance of the proposed scheme with MVR.
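The conventional baseline described above can be sketched as follows. The sketch assumes a generic trained classifier and a per-class feature database; the names (`conventional_retrieve`, `predict_class`, `class_images`) are illustrative, not from the paper:

```python
import numpy as np

def conventional_retrieve(query_feat, class_images, predict_class):
    """query_feat: 1-D feature vector of the single query image;
    class_images: dict mapping class id -> (n, d) array of image features;
    predict_class: trained classifier, feature vector -> class id."""
    cls = predict_class(query_feat)                      # predicted relevant class
    feats = class_images[cls]
    dists = np.linalg.norm(feats - query_feat, axis=1)   # Euclidean distances
    order = np.argsort(dists)                            # nearest images first
    return cls, order

# toy usage with a dummy "classifier" that always returns class 0
rng = np.random.default_rng(0)
db = {0: rng.random((5, 3)), 1: rng.random((4, 3))}
cls, order = conventional_retrieve(np.zeros(3), db, lambda q: 0)
```

If the query actually belongs to a hidden class, `predict_class` is still forced to return one of the predefined class ids, which is exactly the failure mode the proposed scheme guards against.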
Furthermore, we performed a number of image retrieval experiments on the NUS-WIDE-LITE dataset, a subset of 55,615 images randomly selected from NUS-WIDE [23] that has been frequently used for retrieval performance evaluation. A 64-D color histogram and a 73-D edge direction histogram were used to describe the image content. Three sets of experiments were carried out with different proportions of hidden classes: in the first, 10% of the image classes were set as hidden classes and the remaining 90% as predefined classes; in the second, 20% were hidden and 80% predefined; in the third, 30% were hidden and 70% predefined.

Fig. 7 demonstrates the retrieval performance of the competing schemes on the NUS-WIDE-LITE dataset. These results show that both versions of the proposed scheme outperform the conventional scheme, and that the proposed scheme with BSR is better than the proposed scheme with MVR. These experimental results further validate the effectiveness of the proposed scheme.

Although the proposed scheme achieves different retrieval performances on different testing datasets, it consistently outperformed the conventional retrieval method when hidden classes were present in the categorized image collection. We argue that the data characteristics of various CBIR benchmarks, such as MPEG-7 (http://mpeg.chiariglione.org/standards/mpeg-7/mpeg-7.htm) and the NIST TRECVID benchmark (http://trecvid.nist.gov/), cause these different retrieval performances. First, the number of hidden classes can affect the retrieval performance, which is probably directly related to the number of predefined classes in the classified image collection: the retrieval performance decreases as the number of hidden classes increases, because the benefit of image classification during preprocessing is gradually lost. Second, the definition of the image classes may affect the accuracy of novel query detection, since the different class construction strategies used in various benchmarks influence the reliability of novel query detection. Nevertheless, because it explicitly accounts for hidden classes, our proposed scheme achieves superior retrieval performance compared to the conventional CBIR scheme across various benchmarks.
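The experimental protocol above can be sketched as a random split of the class set into hidden and predefined classes. The function below is an assumption about the setup, not code from the paper:

```python
import random

def split_classes(class_ids, hidden_fraction, seed=0):
    """Randomly designate hidden_fraction of the classes as hidden;
    the remainder serve as the predefined classes used to train the classifier."""
    ids = sorted(class_ids)
    rng = random.Random(seed)       # fixed seed for a reproducible split
    rng.shuffle(ids)
    n_hidden = round(hidden_fraction * len(ids))
    hidden = set(ids[:n_hidden])
    predefined = set(ids[n_hidden:])
    return predefined, hidden

# e.g. the first experiment: 10% hidden / 90% predefined classes
predefined, hidden = split_classes(range(50), 0.10)
```

Only images from the predefined classes would be indexed with class labels; queries drawn from the hidden classes are the novel queries the detector must catch.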
Fig. 6. Retrieval results for a novel query on the Corel dataset.
Fig. 7. Retrieval performance on the NUS-WIDE-LITE dataset (precision-recall curves of the conventional scheme, the proposed scheme with MVR, and the proposed scheme with BSR, for 10%, 20%, and 30% hidden classes).

6. Conclusions

In this paper, we identified and addressed a new robustness problem, hidden classes, which severely affects the performance of content-based image retrieval (CBIR) systems employing image classification. We observed that, because of hidden classes, queries can be separated into two categories: common queries and novel queries. In the proposed scheme, novel query detection was developed to determine whether a query is a novel query or a common query, and a self-adaptive strategy was proposed to conduct image retrieval for the different types of queries. The problem of hidden classes can therefore be addressed from the perspective of query answering. A number of experiments carried out on two real-world image datasets validated the effectiveness of our proposed scheme. Compared to the conventional scheme, the proposed scheme achieves over a 10% improvement in retrieval performance, helping to significantly improve the user's experience with the top-ranked images.

Acknowledgments

The authors thank Dr. Jinhui Tang for providing the NUS-WIDE dataset [23]. The authors would also like to thank the anonymous reviewers for their thoughtful and insightful comments that helped to improve the quality of this paper.

References
[1] T. Sikora, The MPEG-7 visual standard for content description - an overview, IEEE Trans. Circ. Syst. Video Technol. 11 (6) (2001) 696-702.
[2] H. Qi, K. Li, Y. Shen, W. Qu, An effective solution for trademark image retrieval by combining shape description and feature matching, Pattern Recogn. 43 (6) (2010) 2017-2027.
[3] A.W.M. Smeulders, M. Worring, S. Santini, A. Gupta, R. Jain, Content-based image retrieval at the end of the early years, IEEE Trans. Pattern Anal. Mach. Intell. 22 (12) (2000) 1349-1380.
[4] R. Datta, D. Joshi, J. Li, J.Z. Wang, Image retrieval: ideas, influences, and trends of the new age, ACM Comput. Surv. 40 (2) (2008) 5:1-5:60.
[5] C.D. Manning, P. Raghavan, H. Schütze, Introduction to Information Retrieval, Cambridge University Press, 2008.
[6] J. Zhang, L. Ye, Content based image retrieval using unclean positive examples, IEEE Trans. Image Process. 18 (10) (2009) 2370-2375.
[7] A. Vailaya, M.A.T. Figueiredo, A.K. Jain, H.-J. Zhang, Image classification for content-based indexing, IEEE Trans. Image Process. 10 (1) (2001) 117-130.
[8] A. Bosch, X. Munoz, R. Marti, Which is the best way to organize/classify images by content?, Image Vision Comput. 25 (6) (2007) 778-791.
[9] S. Lazebnik, C. Schmid, J. Ponce, Beyond bags of features: spatial pyramid matching for recognizing natural scene categories, IEEE Conf. Comput. Vision Pattern Recogn. 2 (2006) 2169-2178.
[10] M. Varma, D. Ray, Learning the discriminative power-invariance trade-off, IEEE Int. Conf. Comput. Vision (2007) 1-8.
[11] C.H. Lampert, M.B. Blaschko, T. Hofmann, Beyond sliding windows: object localization by efficient subwindow search, IEEE Conf. Comput. Vision Pattern Recogn. (2008) 1-8.
[12] H. Zhang, A.C. Berg, M. Maire, J. Malik, SVM-KNN: discriminative nearest neighbor classification for visual category recognition, IEEE Conf. Comput. Vision Pattern Recogn. 2 (2006) 2126-2136.
[13] R. Raina, A. Battle, H. Lee, B. Packer, A.Y. Ng, Self-taught learning: transfer learning from unlabeled data, Int. Conf. Mach. Learn. (2007) 759-766.
[14] O. Boiman, E. Shechtman, M. Irani, In defense of nearest-neighbor based image classification, IEEE Conf. Comput. Vision Pattern Recogn. (2008) 1-8.
[15] N. Rasiwasia, P.J. Moreno, N. Vasconcelos, Bridging the gap: query by semantic example, IEEE Trans. Multimedia 9 (5) (2007) 923-938.
[16] C.G.M. Snoek, B. Huurnink, L. Hollink, M. de Rijke, G. Schreiber, M. Worring, Adding semantics to detectors for video retrieval, IEEE Trans. Multimedia 9 (5) (2007) 975-986.
[17] J. Tang, S. Yan, R. Hong, G.-J. Qi, T.-S. Chua, Inferring semantic concepts from community-contributed images and noisy tags, in: Proceedings of the 17th ACM International Conference on Multimedia, Beijing, China, October 2009, pp. 223-232.
[18] J. Tang, X.-S. Hua, M. Wang, Z. Gu, G.-J. Qi, X. Wu, Correlative linear neighborhood propagation for video annotation, IEEE Trans. Syst., Man, Cybern. B 39 (2) (2009) 409-416.
[19] J. Cui, F. Wen, X. Tang, Real time Google and Live image search re-ranking, in: Proceedings of the 16th ACM International Conference on Multimedia, New York, NY, USA, 2008, pp. 729-732.
[20] X. Tian, D. Tao, X.-S. Hua, X. Wu, Active reranking for web image search, IEEE Trans. Image Process. 19 (3) (2010) 805-820.
[21] Q. Iqbal, J.K. Aggarwal, Feature integration, multi-image queries and relevance feedback in image retrieval, in: The 6th International Conference on Visual Information Systems (VISUAL 2003), Miami, Florida, September 2003, pp. 467-474.
[22] T.E. Bjoerge, E.Y. Chang, Why one example is not enough for an image query, in: IEEE International Conference on Multimedia and Expo, vol. 1, 27-30 June 2004, pp. 253-256.
[23] T.-S. Chua, J. Tang, R. Hong, H. Li, Z. Luo, Y. Zheng, NUS-WIDE: a real-world web image database from National University of Singapore, in: ACM Int. Conf. on Image and Video Retrieval, Greece, July 2009.
[24] V. Vapnik, The Nature of Statistical Learning Theory, Springer-Verlag, 1995.
[25] C.J.C. Burges, A tutorial on support vector machines for pattern recognition, Data Min. Knowl. Disc. 2 (2) (1998) 121-167.
[26] J. Platt, Probabilistic outputs for support vector machines and comparison to regularized likelihood methods, in: Proc. Advances in Large Margin Classifiers, 2000, pp. 61-74.
[27] M. Markou, S. Singh, Novelty detection: a review - part 2: neural network based approaches, Signal Process. 83 (2003) 2499-2521.
[28] G. Vasconcelos, M. Fairhurst, D. Bisset, Recognizing novelty in classification tasks, in: Proc. NIPS Workshop on Novelty Detection and Adaptive Systems Monitoring, 1994.
[29] S. Singh, M. Markou, An approach to novelty detection applied to the classification of image regions, IEEE Trans. Knowl. Data Eng. 16 (4) (2004) 396-407.
[30] J. Kittler, M. Hatef, R. Duin, J. Matas, On combining classifiers, IEEE Trans. Pattern Anal. Mach. Intell. 20 (3) (1998) 226-239.
[31] L.I. Kuncheva, Combining Pattern Classifiers: Methods and Algorithms, John Wiley & Sons, 2004.
[32] Y. Chen, X.S. Zhou, T. Huang, One-class SVM for learning in image retrieval, in: IEEE Int. Conf. on Image Processing, vol. 1, 2001, pp. 34-37.
[33] B. Schölkopf, J. Platt, J. Shawe-Taylor, A. Smola, R. Williamson, Estimating the support of a high-dimensional distribution, Microsoft Research, Tech. Rep. MSR-TR-99-87, 1999.
[34] D. Tao, X. Tang, X. Li, X. Wu, Asymmetric bagging and random subspace for support vector machines-based relevance feedback in image retrieval, IEEE Trans. Pattern Anal. Mach. Intell. 28 (7) (2006) 1088-1099.
[35] A. Webb, Statistical Pattern Recognition, John Wiley & Sons, 2002.
[36] L.M. Manevitz, M. Yousef, One-class SVMs for document classification, J. Mach. Learn. Res. 2 (2001) 139-154.
[37] J. Wang, J. Li, G. Wiederhold, SIMPLIcity: semantics-sensitive integrated matching for picture libraries, IEEE Trans. Pattern Anal. Mach. Intell. 23 (9) (2001) 947-963.
[38] B.S. Manjunath, J.R. Ohm, V.V. Vasudevan, A. Yamada, Color and texture descriptors, IEEE Trans. Circ. Syst. Video Technol. 11 (6) (2001) 703-715.