Computer Vision and Image Understanding 117 (2013) 670–679
Robust image retrieval with hidden classes

Jun Zhang (a,*), Lei Ye (b), Yang Xiang (a), Wanlei Zhou (a)

a School of Information Technology, Deakin University, Melbourne, Australia
b School of Computer Science and Software Engineering, University of Wollongong, Wollongong, Australia
Article info

Article history: Received 6 April 2011; Accepted 24 February 2013; Available online 7 March 2013.

Keywords: Content-based image retrieval; Hidden classes; Robust image retrieval; Image classification; Novel query detection
Abstract

For the purpose of content-based image retrieval (CBIR), image classification is important to help improve the retrieval accuracy and the speed of the retrieval process. However, CBIR systems that employ image classification suffer from the problem of hidden classes: queries associated with hidden classes cannot be accurately answered by a traditional CBIR system. To address this problem, a robust CBIR scheme is proposed that incorporates a novel query detection technique and a self-adaptive retrieval strategy. A number of experiments carried out on two popular image datasets demonstrate the effectiveness of the proposed scheme.

© 2013 Elsevier Inc. All rights reserved.
1. Introduction

Content-based image retrieval (CBIR) is an active research area. The aim of a CBIR system is to search for images by analyzing their content. Images are normally described by low-level features such as color, texture and shape [1,2]. In the literature, a significant amount of research has been conducted on CBIR [3,4]. However, the robustness of CBIR systems has not been sufficiently investigated, even though the topic of robustness has been explored extensively in traditional information retrieval [5]. We have previously identified and addressed unclean queries as a robustness problem [6]; in this paper, we study the hidden class problem of CBIR systems that employ image classification as preprocessing. The application of image classification techniques to a CBIR system results in a user's queries being answered with images from predefined classes, thus helping to improve retrieval accuracy and speed. However, in a large-scale image collection, some image classes may be unseen [4]. We call these hidden classes, as opposed to predefined classes. The existence of hidden classes severely affects the retrieval accuracy of image-classification-based CBIR systems.

There are two approaches that can address this robustness problem. One approach is to detect hidden classes at the preprocessing stage in order to avoid the problem of hidden classes when answering a query. The second approach is to take hidden classes into account when answering a query, because different retrieval
This paper has been recommended for acceptance by Chung-Sheng Li.
* Corresponding author. E-mail address: [email protected] (J. Zhang).
http://dx.doi.org/10.1016/j.cviu.2013.02.008
strategies can be adopted for different queries. We decided upon the second approach because it is too difficult to detect hidden classes during preprocessing without extra information.

Under the query-by-example (QBE) paradigm, three problems arise due to hidden classes. When considering hidden classes, a user's queries can be divided into two categories: common queries and novel queries. Fig. 1 illustrates a hidden class, a common query and a novel query. A common query can be answered using a predefined image class because relevant images for the query have been gathered in this class. A novel query is associated with a hidden class and cannot be answered using any predefined image class. The first problem is how to identify whether a query is a common or a novel query; this determination will influence the retrieval strategy. The second problem is how to predict a relevant predefined image class for a common query. The third problem is how to perform image retrieval for a novel query, given that it is not associated with any predefined image class. The solutions to these problems result in a new retrieval scheme that can manage the problem of hidden classes.

In this paper, we aim to address the critical problem of hidden classes in CBIR systems. Our major contributions are summarized as follows. We propose a robust CBIR scheme that incorporates multi-image queries and a support vector machine (SVM) to effectively deal with hidden classes. We develop a novel query detection technique to determine whether a user's query is a common or a novel query, thereby making it feasible to consider hidden classes in the retrieval process.
Fig. 1. Illustration of the problem of hidden classes.
We develop a self-adaptive retrieval strategy. For a common query, a relevant predefined image class is predicted and the images within it are ranked. For a novel query, a new method is proposed to filter out the irrelevant images before image ranking. Finally, a number of experiments carried out on a Corel image dataset and the NUS-WIDE-LITE dataset [23] demonstrate the effectiveness of the proposed scheme. In particular, the improvement in precision depends on the number of hidden classes, with improvements of over 10% achieved.

The remainder of this paper is organized as follows: Section 2 reviews related work; Section 3 presents the novel CBIR scheme; a discussion is provided in Section 4; the experimental evaluation and results are reported in Section 5; and the conclusion is presented in Section 6.
2. Related work

Image classification improves the accuracy and speed of a content-based image retrieval (CBIR) system [4]. Images in a collection can be categorized by supervised image classification using predefined image classes. For a given query, the retrieval results of such a CBIR system are generated by first locating the most relevant image class and then ranking the images within that class [7,4]. It should be noted that image classification is not necessary for all CBIR systems; a CBIR system can be based entirely on similarity retrieval without any classes. This paper focuses on CBIR systems which perform classification first.

A significant amount of research has been undertaken with the aim of improving the performance of image classification [8]. One approach has been to develop new image matching methods and incorporate them into the training process of a multiclass classifier. For instance, spatial pyramid matching was proposed and incorporated into an SVM for natural image classification [9]. Considering that the trade-off between discriminative power and invariance differs from task to task, a kernel
learning method was proposed in order to achieve different levels of trade-off for image classification [10]. To perform object localization, an efficient subwindow search method was proposed [11] that can be combined with a spatial pyramid kernel to improve the multiclass classifier. Another approach has been to directly enhance a multiclass classifier by considering the characteristics of real applications. For instance, a hybrid method was proposed to combine the nearest neighbor classifier and the support vector machine [12], thus helping to overcome several problems of the two individual methods. A self-taught learning method was proposed that uses unlabeled images randomly downloaded from the Internet to improve the performance on a specific image classification task [13]. In defense of nearest-neighbor (NN) based image classification, a naive-Bayes nearest neighbor classifier was proposed to demonstrate the effectiveness of non-parametric NN methods [14]. Certain scholarship has also addressed the similar problem of unknown concepts in the semantic space. A scheme combining query by multi-example and semantic retrieval was proposed to alleviate the influence of unknown concepts on semantic-based image retrieval [15]. To bridge the gap between a limited number of learned concept detectors and the full vocabulary a user has, an automatic video retrieval method [16] was proposed that builds a set of machine-learned concept detectors enriched with semantic descriptions and semantic structure obtained from WordNet. Other works have attempted to address unknown-concept-related problems using image classification. For example, a novel sparse-graph-based semi-supervised learning approach was proposed [17] for harnessing labeled and unlabeled data simultaneously in order to infer the images' semantic concepts more accurately.
To improve image annotation performance, a correlative linear neighborhood propagation method was proposed, incorporating hidden semantic correlations into graph-based semi-supervised learning [18]. To handle ambiguous or unknown concepts in the query, IntentSearch [19] was proposed as a simplified version of active reranking to capture the user's intention more accurately. The user's intention is defined by only one query image
in IntentSearch, so it does not work as well when the user's intention is too complex to be represented by one image. An enhanced active reranking scheme [20] was proposed that employs a structural-information-based sample selection strategy to reduce the user's labeling effort, and localizes the user's intention in the visual feature space using a local-global discriminative dimension reduction algorithm.

Our work is different, as our research focuses on the problem of hidden classes in CBIR systems that employ image classification. Firstly, the unknown concepts in the semantic space are not equivalent to the hidden classes in a categorized image collection, because they arise from different perspectives: unknown concepts are derived from semantic meaning, while hidden classes are based on visual similarity. The existing solutions for unknown concepts cannot be applied to deal with hidden classes. In fact, how to identify and answer the queries associated with hidden classes, which can dramatically affect the retrieval performance, remains unsolved. In addition, our work aims to propose a new CBIR scheme that can handle hidden classes in a single interaction, without the process of relevance feedback or active reranking.
3. A robust CBIR scheme

This section describes the proposed CBIR scheme as illustrated in Fig. 2. Due to hidden classes, a common query and a novel query are two types of queries requiring different retrieval strategies. In this work, novel query detection is proposed to determine whether a query is a common query or a novel query. Following this, the different types of queries can be answered using different image ranking methods. To support the different ranking strategies, a new preprocessing stage is developed.

Let us consider an image collection X containing N images, X = {x_1, x_2, ..., x_N}. The content of each image is represented by a low-level feature vector. Assume there are M predefined image classes, {ω_1, ..., ω_M}, and a set of training samples, S_i, is available for each predefined image class ω_i. A user's query consists of multiple example images, Q = {x̂_1, ..., x̂_L}.

3.1. Preprocessing

Preprocessing is designed to train a set of classifiers for the predefined image classes. In contrast to conventional schemes, these image classifiers are specialized to address the problems arising from hidden image classes. At the stage of novel query detection, the classifiers are modified and combined to construct a novel query detector. At the stage of image ranking for a common query, the classifiers are combined to predict the relevant class. At the stage of image ranking for a novel query, the classifiers are modified and combined to filter out strongly irrelevant images.

Taking hidden classes into account, a new two-step strategy is proposed for classifier training. In the first step, we train a set of weak classifiers for the predefined image classes. For the ith image class, the training samples, S_i, are regarded as the positive samples, and the training samples of the other image classes, {S_j}, j ≠ i, are regarded as the negative samples. All positive and negative samples are merged to train a two-class SVM [24,25], whose decision function is denoted f̃_i(x). We can use f̃_i(x) to compute a decision value for an image, and this decision value can be converted to a probability [26] or a binary value [25]. The weak classifiers are then applied to filter the image collection to obtain a set of images, W̃, which do not belong to any predefined image class with a high probability. The filtering rule is
max_i f̃_i(x) < 0,  i ∈ [1, M].   (1)
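The first training step and the filtering rule of Eq. (1) can be sketched as a short program. scikit-learn, the function names and the data layout below are illustrative assumptions, not part of the original paper.

```python
# Sketch of the weak one-vs-rest classifiers and the filter of Eq. (1).
import numpy as np
from sklearn.svm import SVC

def train_weak_classifiers(class_samples):
    """Train one two-class SVM per predefined class.

    class_samples: list of (n_i, d) arrays, one per predefined class.
    Returns the list of decision functions f~_i.
    """
    classifiers = []
    for i, pos in enumerate(class_samples):
        # positives: samples of class i; negatives: samples of all other classes
        neg = np.vstack([s for j, s in enumerate(class_samples) if j != i])
        X = np.vstack([pos, neg])
        y = np.r_[np.ones(len(pos)), -np.ones(len(neg))]
        classifiers.append(SVC(kernel="linear").fit(X, y))
    return classifiers

def filter_collection(classifiers, X):
    """Eq. (1): keep images whose maximum decision value over all
    predefined classes is negative, i.e. candidates for hidden classes."""
    scores = np.column_stack([clf.decision_function(X) for clf in classifiers])
    return X[scores.max(axis=1) < 0]
```

Images retained by `filter_collection` form the set W̃ used as extra negatives in the second training step.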
In the second step, we re-train a set of strong classifiers for the predefined image classes. The difference from the first step is that we use some randomly selected images from W̃ as the negative samples, with
Fig. 2. Retrieval with hidden classes.
new information included in order to improve the classifiers. For the ith image class, the training samples, S_i, were used as the positive samples. The training samples of the other image classes, {S_j}, j ≠ i, and some randomly selected images from W̃, were used as the negative samples. All positive and negative samples were combined to train an SVM classifier, f_i(x). Consequently, a set of SVMs, {f_1(x), ..., f_M(x)}, can be prepared for the predefined image classes.

The two-step training strategy is necessary to address the hidden class problem. In the first step, all training samples come from predefined image classes, meaning that the training set will not include the information of hidden classes. In this sense, the trained classifiers are weak. In the second step, some new samples are randomly selected from W̃. Since the extended training set contains the information of hidden classes, the trained classifiers become stronger.

3.2. Novel query detection

Novel query detection is proposed to identify a query as a common query or a novel query, as different retrieval strategies will be applied to answer different queries. In the proposed scheme, novel query detection is achieved by extending traditional novelty detection techniques [27]. Traditional novelty detection is formulated as a two-class classification problem using random rejects as negative training samples [27-29]; however, these methods do not support multi-image queries. In the CBIR field, a multi-image query [21,22,15] is preferable because it can express the user's search intent better than a single image query. We incorporate multi-image queries and the combination rules to achieve accurate novel query detection.

Firstly, we construct an SVM-based novel query detector for a single query image, namely NQD-Single. If f_i(x̂) < T, x̂ is a novel query to the ith class, where T is the predefined detection threshold. In the multiclass case, the problem of novel query detection can be separated into a set of basic novel query detections. The query image x̂ is determined to be a novel query only if it is a novel query for all classes. Therefore, x̂ is a novel query if max_i f_i(x̂) < T, i ∈ [1, M].

Now, we construct a novel query detector for multi-image queries by incorporating the NQD-Single method and the combination rules. In this work, two different combination rules, the majority vote rule (MVR) and the Bayes sum rule (BSR) [30,31], are applied to implement multi-image query detection.

The first method applies MVR to extend the NQD-Single method, namely NQD-MVR. For a query image x̂, the vote associated with the ith predefined image class can be calculated as

V_i(x̂) = sign(f_i(x̂) - T).   (2)

Given a multi-image query Q = {x̂_1, ..., x̂_L} and the detection threshold T, the novel query detection rule for the ith predefined image class is that Q is a novel query to the ith class

if Σ_{j=1}^{L} V_i(x̂_j) < L/2.   (3)

The final novel query detection rule for the multiclass case is that Q is a novel query

if max_i Σ_{j=1}^{L} V_i(x̂_j) < L/2,  i ∈ [1, M].   (4)

The second method applies BSR to extend the NQD-Single method, namely NQD-BSR. First, the decision values produced by an SVM classifier are converted into conditional probabilities using the sigmoid function [26],

P(ω_i|x) = 1/(1 + exp(-f_i(x))),  if f_i(x) ≥ 0;
P(ω_i|x) = 1 - 1/(1 + exp(f_i(x))),  if f_i(x) < 0.   (5)

Given a multi-image query Q = {x̂_1, ..., x̂_L} and the detection threshold T, for the ith predefined image class, the conditional probability P(ω_i|Q) can be calculated as

P(ω_i|Q) = (1/L) Σ_{j=1}^{L} P(ω_i|x̂_j).   (6)

The novel query detection rule is that Q is a novel query to the ith class

if P(ω_i|Q) < T.   (7)

The final novel query detection rule for the multiclass case is that Q is a novel query

if max_i P(ω_i|Q) < T,  i ∈ [1, M].   (8)

The two multi-image query detection methods are evaluated in our experiments. Note that if a query is not a novel query, it is a common query.

3.3. Image retrieval for a common query

A common query can be answered using the images in a predefined image class. The retrieval results are obtained by predicting a relevant image class and then ranking the images in this class.

3.3.1. Relevant class prediction

Relevant class prediction can be regarded as an extended multiclass classification problem. For a single query image, this is a classic multiclass classification problem and the conventional solution can be expressed as

i* = arg max_i P(ω_i|x̂_j),   (9)

i.e., the relevant class is the predefined image class with the maximum posterior probability for x̂_j.

We then extend the conventional method to manage a multi-image query by taking the applied detection strategy into account. With NQD-BSR, the prediction rule can be expressed as

i* = arg max_i Σ_{j=1}^{L} P(ω_i|x̂_j).   (10)

With NQD-MVR, the prediction rule can be expressed as

i* = arg max_i Σ_{j=1}^{L} V_i(x̂_j).   (11)

If there is more than one image class with the maximum number of votes, Eq. (10) can be applied to break the tie.
3.3.2. Ranking for a common query

Ranking the images in the relevant class can further improve the retrieval results. First, the images in the relevant class are extracted from the image collection. The images with f_i(x) > T are considered to belong to the relevant class and are then ranked according to their relevance to the query. It should be highlighted that this step can be conducted off-line, i.e., the images of the predefined image classes can be selected in preprocessing and saved for later use. The next step is to compute the relevance between the images in ω_i and the multi-image query, Q = {x̂_1, ..., x̂_L}. In this paper, we cast this as a one-class classification problem [32], i.e., only positive
samples are available. We use the multiple query images {x̂_1, ..., x̂_L} to train a one-class SVM [33], f_q(x). Then, the images in ω_i are ranked according to the decision values produced by f_q(x). Finally, the top k images in the ranking list are returned as the retrieval results. This process also helps to improve the user experience, because ordinary users are normally most interested in the top-ranked images in practical applications.
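This ranking step can be sketched in a few lines. scikit-learn's `OneClassSVM`, the parameter values and the function name below are illustrative assumptions rather than the paper's exact configuration.

```python
# Sketch of Section 3.3.2: rank images of the predicted relevant class
# with a one-class SVM f_q trained on the query images only.
import numpy as np
from sklearn.svm import OneClassSVM

def rank_common_query(query_images, class_images, k=10):
    """query_images: (L, d) example images of the multi-image query;
    class_images: (n, d) images of the predicted relevant class.
    Returns the indices of the top-k class images by decision value of f_q."""
    fq = OneClassSVM(kernel="rbf", gamma="scale", nu=0.5).fit(query_images)
    scores = fq.decision_function(class_images)
    return np.argsort(-scores)[:k]          # descending relevance
```

Because only the small set of query images is used for training, fitting f_q on-line is very fast, which matches the computation-time discussion in Section 4.2.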
3.4. Image retrieval for a novel query

A novel query is not associated with any predefined image class. A straightforward approach is to rank the images in the entire collection according to their similarity to the query. However, this approach does not exploit the knowledge of the predefined image classes. We observe that if images are strongly relevant to a predefined image class, they are not relevant to the novel query. Therefore, the proposed approach can improve the retrieval results by filtering out irrelevant images before image ranking.

3.4.1. Image filtering

Since a novel query cannot be answered by any predefined image class, it is likely that the images in the predefined image classes are not relevant to the novel query and can be filtered out. The prepared image classifiers, {f_1(x), ..., f_M(x)}, can be utilized to produce a set of images, W, from the whole image collection. The images in W are not relevant to the predefined image classes, and they will be used to answer the novel query. The filtering rule is that

if max_i f_i(x) < 0,  i ∈ [1, M],   (12)

put x in W. This image filtering process can be conducted off-line, i.e., W can be created at the stage of preprocessing.

3.4.2. Ranking for a novel query

In this paper, we cast image ranking for a novel query as a binary classification problem. The query images are used as the positive samples and the training samples of the predefined image classes are used as the negative samples. All positive and negative samples are combined to train a classifier, such as an SVM, for image ranking. However, the positive and negative samples are unbalanced, which influences the accuracy of an SVM. We therefore apply the asymmetric bagging strategy [34] to construct an ensemble of SVMs and combine their outputs for image ranking. The detailed algorithm is described in Table 1. In this algorithm, the decision value f_qi(x_j) of an image x_j produced by the ith bagging classifier is converted into a conditional probability P(r|f_qi, x_j) using the sigmoid function in Eq. (5). P(r|f_qi, x_j) is the probability of the image being relevant to the query as predicted by the ith SVM classifier. All decision values of x_j produced by the ensemble of SVM classifiers are then merged to make the final decision using the Bayesian sum rule.

Table 1
Ensemble classifier based image ranking.

Input: query image set Q = {x̂_1, ..., x̂_L}, SVM trainer V, integer B (the number of bagging classifiers), training samples for the predefined image classes S = ∪_{i=1}^{M} S_i, and the image set W for ranking
1.  For i = 1 to B {
2.    N_i = random sample from S, with |N_i| = |Q|
3.    f_qi(x) = V(Q, N_i)
4.    For j = 1 to |W| {
5.      Compute f_qi(x_j)
6.      Convert f_qi(x_j) to P(r|f_qi, x_j) as in Eq. (5)
7.    }
8.  }
9.  For j = 1 to |W| {
10.   P(r|x_j) = (1/B) Σ_{i=1}^{B} P(r|f_qi, x_j)
11. }
12. Rank {x_j}, x_j ∈ W, according to P(r|x_j)
Output: ranked list {x_j}, x_j ∈ W

4. Discussions of implementation

This section discusses the implementation of the proposed scheme, including the utilization of a multi-image query, the on-line computation time, and the setting of the threshold.
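As an implementation sketch, the asymmetric-bagging ranking of Table 1 might look as follows. scikit-learn, the seed handling and the simplified sigmoid of Eq. (5) are illustrative assumptions, not the paper's exact implementation.

```python
# Sketch of Table 1: asymmetric bagging for ranking a novel query.
# Query images are the positives; each of the B classifiers sees an
# equally sized random negative sample; sigmoid outputs are averaged
# (Bayes sum rule).
import numpy as np
from sklearn.svm import SVC

def rank_novel_query(Q, S, W, B=10, seed=0):
    """Q: (L, d) query images; S: (n, d) training samples of the predefined
    classes; W: (m, d) retained images. Returns indices of W ranked by the
    averaged relevance probability P(r|x_j)."""
    rng = np.random.default_rng(seed)
    P = np.zeros(len(W))
    for _ in range(B):
        Ni = S[rng.choice(len(S), size=len(Q), replace=False)]   # |N_i| = |Q|
        X = np.vstack([Q, Ni])
        y = np.r_[np.ones(len(Q)), -np.ones(len(Ni))]
        fqi = SVC(kernel="linear").fit(X, y)                     # f_qi = V(Q, N_i)
        P += 1.0 / (1.0 + np.exp(-fqi.decision_function(W)))     # sigmoid, Eq. (5)
    return np.argsort(-(P / B))                                  # Bayes sum rule
```

Each bagging classifier is trained on only 2L samples, which keeps the on-line cost low, as discussed in Section 4.2.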
4.1. Utilization of a multi-image query

In the proposed scheme, the utilization of a multi-image query is based on combination rules. Specifically, this approach is applied to novel query detection and relevant class prediction. In this section, we formulate the classification of multi-image queries in a theoretical framework [30] and show that the extension to multi-image queries is a classifier combination problem [35].

Consider a pattern recognition problem where a pattern (query) Q is to be assigned to one of the M possible classes {ω_1, ..., ω_M}. Let us assume we have a classifier, but the given pattern is represented by L distinct measurement vectors from different observations (query images), {x̂_1, ..., x̂_L}. This is a typical classifier combination architecture using repeated measurements [35]. In the measurement space, each class ω_k is modeled by the probability density function p(x̂|ω_k), and its prior probability of occurrence is denoted P(ω_k). According to Bayesian theory, given the measurements x̂_i, i ∈ [1, L], the pattern Q should be assigned to class ω_j provided the a posteriori probability of that interpretation is the maximum, i.e.,

assign Q → ω_j  if  P(ω_j|x̂_1, ..., x̂_L) = max_k P(ω_k|x̂_1, ..., x̂_L).   (13)
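A small worked example shows how the two combination rules behave. The probability matrix and threshold below are made-up numbers; the votes are thresholded probabilities rather than raw SVM decision values, which is an illustrative simplification of Eq. (2).

```python
# Toy comparison of the Bayes sum rule (average of per-image posteriors,
# as in Eqs. (6) and (10)) and the majority vote rule (per-image vote
# counts, as in Eqs. (2)-(4) and (11)).
import numpy as np

P = np.array([            # P(omega_i | x_hat_j): rows = query images, cols = classes
    [0.7, 0.2, 0.1],
    [0.6, 0.3, 0.1],
    [0.4, 0.5, 0.1],
])
T = 0.35                  # detection threshold

# Bayes sum rule (NQD-BSR): average the posteriors over the L query images
p_q = P.mean(axis=0)                    # P(omega_i | Q), Eq. (6)
is_novel_bsr = bool(p_q.max() < T)      # Eq. (8): False here -> common query
relevant_class = int(np.argmax(p_q))    # Eq. (10): class 0

# Majority vote rule (NQD-MVR): each image votes for the classes it supports
votes = (P >= T).sum(axis=0)            # per-class vote counts
is_novel_mvr = bool(votes.max() < len(P) / 2)   # cf. Eq. (4)
```

In this example both rules agree that the query is common, but BSR additionally retains the graded class probabilities, which is one reason it is less sensitive to the threshold than MVR in the experiments of Section 5.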
Then, the combination rules, such as the Bayes sum rule and the majority vote rule, can be derived from Eq. (13) under certain specific conditions [30]. The classifier combination methods can thus be employed to extend conventional methods for single-image queries to multi-image queries.

4.2. On-line computation time

For practical applications, the on-line computation time is an important factor. We briefly discuss the on-line computation time of the proposed scheme. At the stage of novel query detection, the detection methods depend on the set of SVM classifiers for the predefined image classes. This set of SVM classifiers can be trained off-line in preprocessing, so it will not affect the on-line computation time. Furthermore, the detection methods have a short computation time as they only need to handle a small number of query images. At the stage of image retrieval for a common query, the set of SVM classifiers can likewise be trained off-line without affecting the on-line computation time. The extended method for relevant class prediction has a short computation time thanks to the small number of query images. Following this, the training of the one-class SVM using the query images is extremely fast. The image ranking is completed quickly because only the images in the predicted relevant class are ranked. At the stage of image retrieval for a novel query, image filtering can be conducted off-line. The on-line computation time can be dramatically reduced since only the retained images are used for ranking. Furthermore, the image ranking method has a short
computation time because each bagging classifier has only a small number of training samples available and the number of bagging classifiers is small.
4.3. Setting of the threshold

The setting of the detection threshold is important for the proposed retrieval scheme. If the threshold is too high, many common queries will be considered novel queries. A low threshold, on the other hand, will lead to many novel queries being treated as common queries. Inaccurate detection of queries will degrade the retrieval performance. To study the impact of the threshold on novel query detection, a number of experiments were performed with various thresholds; the experimental results are reported in Section 5.1. Moreover, we applied a practical method [36] in the proposed retrieval scheme to select a proper threshold: we chose a threshold by comparing results on a test set after preprocessing. The test set was specially collected for the purpose of threshold setting and its ground truth was known.

5. Experimental evaluation
A number of experiments were carried out on two image datasets, Corel [37] and NUS-WIDE [23], to evaluate the proposed scheme. To simulate the problem of hidden classes, we assumed that certain image classes were predefined classes and the other image classes were hidden classes. For each predefined class, 30% of the images in the class were randomly selected and used as training samples. For each hidden class, no training samples were available, meaning that the hidden classes were unknown to the CBIR system. The queries for performance evaluation were randomly created from all image classes, including both predefined and hidden classes.

5.1. Evaluation of novel query detection

Fig. 3. True rate of novel query detection (one, two and three hidden classes).
Three sets of experiments were carried out on the Corel image dataset to evaluate the methods of novel query detection. The Corel image dataset consists of ten classes, with each class containing 100 real-world images. This image dataset is well organized and well suited to the evaluation of the proposed scheme. Two standardized MPEG-7 visual descriptors [38], CSD and EHD, were selected to describe the image content. In the first experiment, nine image classes were set as predefined classes and one image class was set as a hidden class. In the second experiment, eight image classes were set as predefined classes and two image classes were set as hidden classes. In the third experiment, seven image classes were set as predefined classes and three image classes were set as hidden classes. The performance in terms of the true novel query rate and the false novel query rate on a large number of random queries is reported. The true novel query rate is defined as the fraction of correctly detected novel queries over the total number of novel queries. The false novel query rate is defined as the fraction of common queries incorrectly detected as novel queries over the total number of common queries. Figs. 3 and 4 show the performance of the three methods, NQD-Single, NQD-MVR and NQD-BSR, with various thresholds. To facilitate the comparison, the threshold is expressed as a probability value. In NQD-Single, each query includes a single example image. In NQD-MVR and NQD-BSR, each query includes five example images. The experimental results show that both NQD-BSR and NQD-MVR were superior to NQD-Single. Firstly, NQD-BSR and NQD-MVR achieved higher true novel query rates than NQD-Single. Secondly, the false novel query rate of NQD-BSR was much lower than that of NQD-Single, and the false novel query rate of NQD-MVR was also slightly lower than that of NQD-Single. A basic reason for this is that multiple examples can
Fig. 4. False rate of novel query detection (one, two and three hidden classes).

Fig. 5. Retrieval performance on Corel dataset (one, two and three hidden classes).
describe user query intent better than a single example. The experimental results also show that NQD-BSR outperformed NQD-MVR thanks to its lower false novel query rate. For example, when the threshold was set at 0.33, the false novel query rate
of NQD-BSR was over 15% lower than that of NQD-MVR. Moreover, it was easier to choose a proper threshold for NQD-BSR in order to guarantee a high true novel query rate and a low
false novel query rate. However, this was difficult for NQD-MVR, and therefore, the NQD-BSR method was preferable for novel query detection.
5.2. Evaluation of image retrieval

A number of image retrieval experiments were carried out to evaluate the CBIR schemes. The proposed scheme was compared with a conventional CBIR scheme employing image classification. In the conventional scheme, given a single query image, a relevant image class is predicted using a conventional classification approach, and the images in that class are ranked by their Euclidean distance to the query image. In contrast, the proposed scheme takes hidden classes into account and supports multi-image queries. The two novel query detection methods were applied in the proposed scheme, resulting in two versions: the proposed scheme with MVR and the proposed scheme with BSR.

Fig. 5 demonstrates the retrieval performance of the competing schemes on the Corel dataset. These results show that the proposed scheme, with either MVR or BSR, outperforms the conventional scheme in the presence of hidden classes. Fig. 6 shows a selection of real retrieval results for a novel query. In this case, the conventional scheme predicted an irrelevant class that contained few images relevant to the query. In contrast, the proposed scheme with BSR accurately detected this novel query and produced an excellent result. The proposed scheme with MVR failed to detect the query as novel, but the class it predicted nevertheless contained many images relevant to the query. An important reason for this difference is that the proposed scheme can handle novel queries while the conventional scheme cannot: for a novel query, the conventional scheme still predicts a predefined image class as the relevant class, and its retrieval performance degrades dramatically. Furthermore, the proposed scheme with BSR is better than the proposed scheme with MVR because NQD-BSR has a much lower false novel query rate than NQD-MVR; a high false novel query rate degrades the retrieval performance of the proposed scheme with MVR.
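The conventional baseline described above can be sketched as follows. The sketch assumes a generic trained classifier and a per-class feature database; the names (`conventional_retrieve`, `predict_class`, `class_images`) are illustrative, not from the paper:

```python
import numpy as np

def conventional_retrieve(query_feat, class_images, predict_class):
    """query_feat: 1-D feature vector of the single query image;
    class_images: dict mapping class id -> (n, d) array of image features;
    predict_class: trained classifier, feature vector -> class id."""
    cls = predict_class(query_feat)                      # predicted relevant class
    feats = class_images[cls]
    dists = np.linalg.norm(feats - query_feat, axis=1)   # Euclidean distances
    order = np.argsort(dists)                            # nearest images first
    return cls, order

# toy usage with a dummy "classifier" that always returns class 0
rng = np.random.default_rng(0)
db = {0: rng.random((5, 3)), 1: rng.random((4, 3))}
cls, order = conventional_retrieve(np.zeros(3), db, lambda q: 0)
```

If the query actually belongs to a hidden class, `predict_class` is still forced to return one of the predefined class ids, which is exactly the failure mode the proposed scheme guards against.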
Furthermore, we performed a number of image retrieval experiments on the NUS-WIDE-LITE dataset, a subset of 55,615 images randomly selected from NUS-WIDE [23] that has been frequently used for retrieval performance evaluation. A 64-D color histogram and a 73-D edge direction histogram were used to describe the image content. Three sets of experiments were carried out with different proportions of hidden classes: in the first, 10% of the image classes were set as hidden classes and the remaining 90% as predefined classes; in the second, 20% were hidden and 80% predefined; in the third, 30% were hidden and 70% predefined.

Fig. 7 demonstrates the retrieval performance of the competing schemes on the NUS-WIDE-LITE dataset. These results show that both versions of the proposed scheme outperform the conventional scheme, and that the proposed scheme with BSR is better than the proposed scheme with MVR. These experimental results further validate the effectiveness of the proposed scheme.

Although the proposed scheme achieves different retrieval performances on different testing datasets, it consistently outperformed the conventional retrieval method when hidden classes were present in the categorized image collection. We argue that the data characteristics of various CBIR benchmarks, such as MPEG-7 (http://mpeg.chiariglione.org/standards/mpeg-7/mpeg-7.htm) and the NIST TRECVID benchmark (http://trecvid.nist.gov/), cause these different retrieval performances. First, the number of hidden classes can affect the retrieval performance, which is probably directly related to the number of predefined classes in the classified image collection: the retrieval performance decreases as the number of hidden classes increases, because the benefit of image classification during preprocessing is gradually lost. Second, the definition of the image classes may affect the accuracy of novel query detection, since the different class construction strategies used in various benchmarks influence the reliability of novel query detection. Nevertheless, because it explicitly accounts for hidden classes, our proposed scheme achieves superior retrieval performance compared to the conventional CBIR scheme across various benchmarks.
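The experimental protocol above can be sketched as a random split of the class set into hidden and predefined classes. The function below is an assumption about the setup, not code from the paper:

```python
import random

def split_classes(class_ids, hidden_fraction, seed=0):
    """Randomly designate hidden_fraction of the classes as hidden;
    the remainder serve as the predefined classes used to train the classifier."""
    ids = sorted(class_ids)
    rng = random.Random(seed)       # fixed seed for a reproducible split
    rng.shuffle(ids)
    n_hidden = round(hidden_fraction * len(ids))
    hidden = set(ids[:n_hidden])
    predefined = set(ids[n_hidden:])
    return predefined, hidden

# e.g. the first experiment: 10% hidden / 90% predefined classes
predefined, hidden = split_classes(range(50), 0.10)
```

Only images from the predefined classes would be indexed with class labels; queries drawn from the hidden classes are the novel queries the detector must catch.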
Fig. 6. Retrieval results for a novel query on the Corel dataset.
Fig. 7. Retrieval performance on the NUS-WIDE-LITE dataset (precision-recall curves of the conventional scheme, the proposed scheme with MVR, and the proposed scheme with BSR, for 10%, 20%, and 30% hidden classes).

6. Conclusions

In this paper, we identified and addressed a new robustness problem, hidden classes, which severely affects the performance of content-based image retrieval (CBIR) systems employing image classification. We observed that, because of hidden classes, queries can be separated into two categories: common queries and novel queries. In the proposed scheme, novel query detection was developed to determine whether a query is a novel query or a common query, and a self-adaptive strategy was proposed to conduct image retrieval for the different types of queries. The problem of hidden classes can therefore be addressed from the perspective of query answering. A number of experiments carried out on two real-world image datasets validated the effectiveness of our proposed scheme. Compared to the conventional scheme, the proposed scheme achieves over a 10% improvement in retrieval performance, helping to significantly improve the user's experience with the top-ranked images.

Acknowledgments

The authors thank Dr. Jinhui Tang for providing the NUS-WIDE dataset [23]. The authors would also like to thank the anonymous reviewers for their thoughtful and insightful comments that helped to improve the quality of this paper.

References
[1] T. Sikora, The MPEG-7 visual standard for content description - an overview, IEEE Trans. Circ. Syst. Video Technol. 11 (6) (2001) 696-702.
[2] H. Qi, K. Li, Y. Shen, W. Qu, An effective solution for trademark image retrieval by combining shape description and feature matching, Pattern Recogn. 43 (6) (2010) 2017-2027.
[3] A.W.M. Smeulders, M. Worring, S. Santini, A. Gupta, R. Jain, Content-based image retrieval at the end of the early years, IEEE Trans. Pattern Anal. Mach. Intell. 22 (12) (2000) 1349-1380.
[4] R. Datta, D. Joshi, J. Li, J.Z. Wang, Image retrieval: ideas, influences, and trends of the new age, ACM Comput. Surv. 40 (2) (2008) 5:1-5:60.
[5] C.D. Manning, P. Raghavan, H. Schütze, Introduction to Information Retrieval, Cambridge University Press, 2008.
[6] J. Zhang, L. Ye, Content based image retrieval using unclean positive examples, IEEE Trans. Image Process. 18 (10) (2009) 2370-2375.
[7] A. Vailaya, M.A.T. Figueiredo, A.K. Jain, H.-J. Zhang, Image classification for content-based indexing, IEEE Trans. Image Process. 10 (1) (2001) 117-130.
[8] A. Bosch, X. Munoz, R. Marti, Which is the best way to organize/classify images by content?, Image Vision Comput. 25 (6) (2007) 778-791.
[9] S. Lazebnik, C. Schmid, J. Ponce, Beyond bags of features: spatial pyramid matching for recognizing natural scene categories, IEEE Conf. Comput. Vision Pattern Recogn. 2 (2006) 2169-2178.
[10] M. Varma, D. Ray, Learning the discriminative power-invariance trade-off, IEEE Int. Conf. Comput. Vision (2007) 1-8.
[11] C.H. Lampert, M.B. Blaschko, T. Hofmann, Beyond sliding windows: object localization by efficient subwindow search, IEEE Conf. Comput. Vision Pattern Recogn. (2008) 1-8.
[12] H. Zhang, A.C. Berg, M. Maire, J. Malik, SVM-KNN: discriminative nearest neighbor classification for visual category recognition, IEEE Conf. Comput. Vision Pattern Recogn. 2 (2006) 2126-2136.
[13] R. Raina, A. Battle, H. Lee, B. Packer, A.Y. Ng, Self-taught learning: transfer learning from unlabeled data, Int. Conf. Mach. Learn. (2007) 759-766.
[14] O. Boiman, E. Shechtman, M. Irani, In defense of nearest-neighbor based image classification, IEEE Conf. Comput. Vision Pattern Recogn. (2008) 1-8.
[15] N. Rasiwasia, P.J. Moreno, N. Vasconcelos, Bridging the gap: query by semantic example, IEEE Trans. Multimedia 9 (5) (2007) 923-938.
[16] C.G.M. Snoek, B. Huurnink, L. Hollink, M. de Rijke, G. Schreiber, M. Worring, Adding semantics to detectors for video retrieval, IEEE Trans. Multimedia 9 (5) (2007) 975-986.
[17] J. Tang, S. Yan, R. Hong, G.-J. Qi, T.-S. Chua, Inferring semantic concepts from community-contributed images and noisy tags, in: Proceedings of the 17th ACM International Conference on Multimedia, Beijing, China, October 2009, pp. 223-232.
[18] J. Tang, X.-S. Hua, M. Wang, Z. Gu, G.-J. Qi, X. Wu, Correlative linear neighborhood propagation for video annotation, IEEE Trans. Syst., Man, Cybern. B 39 (2) (2009) 409-416.
[19] J. Cui, F. Wen, X. Tang, Real time Google and Live image search re-ranking, in: Proceedings of the 16th ACM International Conference on Multimedia, New York, NY, USA, 2008, pp. 729-732.
[20] X. Tian, D. Tao, X.-S. Hua, X. Wu, Active reranking for web image search, IEEE Trans. Image Process. 19 (3) (2010) 805-820.
[21] Q. Iqbal, J.K. Aggarwal, Feature integration, multi-image queries and relevance feedback in image retrieval, in: The 6th International Conference on Visual Information Systems (VISUAL 2003), Miami, Florida, September 2003, pp. 467-474.
[22] T.E. Bjoerge, E.Y. Chang, Why one example is not enough for an image query, in: IEEE International Conference on Multimedia and Expo, vol. 1, 27-30 June 2004, pp. 253-256.
[23] T.-S. Chua, J. Tang, R. Hong, H. Li, Z. Luo, Y. Zheng, NUS-WIDE: a real-world web image database from National University of Singapore, in: ACM Int. Conf. on Image and Video Retrieval, Greece, July 2009.
[24] V. Vapnik, The Nature of Statistical Learning Theory, Springer-Verlag, 1995.
[25] C.J.C. Burges, A tutorial on support vector machines for pattern recognition, Data Min. Knowl. Disc. 2 (2) (1998) 121-167.
[26] J. Platt, Probabilistic outputs for support vector machines and comparison to regularized likelihood methods, in: Proc. Advances in Large Margin Classifiers, 2000, pp. 61-74.
[27] M. Markou, S. Singh, Novelty detection: a review - part 2: neural network based approaches, Signal Process. 83 (2003) 2499-2521.
[28] G. Vasconcelos, M. Fairhurst, D. Bisset, Recognizing novelty in classification tasks, in: Proc. NIPS Workshop on Novelty Detection and Adaptive Systems Monitoring, 1994.
[29] S. Singh, M. Markou, An approach to novelty detection applied to the classification of image regions, IEEE Trans. Knowl. Data Eng. 16 (4) (2004) 396-407.
[30] J. Kittler, M. Hatef, R. Duin, J. Matas, On combining classifiers, IEEE Trans. Pattern Anal. Mach. Intell. 20 (3) (1998) 226-239.
[31] L.I. Kuncheva, Combining Pattern Classifiers: Methods and Algorithms, John Wiley & Sons, 2004.
[32] Y. Chen, X.S. Zhou, T. Huang, One-class SVM for learning in image retrieval, in: IEEE Int. Conf. on Image Processing, vol. 1, 2001, pp. 34-37.
[33] B. Schölkopf, J. Platt, J. Shawe-Taylor, A. Smola, R. Williamson, Estimating the support of a high-dimensional distribution, Microsoft Research, Tech. Rep. MSR-TR-99-87, 1999.
[34] D. Tao, X. Tang, X. Li, X. Wu, Asymmetric bagging and random subspace for support vector machines-based relevance feedback in image retrieval, IEEE Trans. Pattern Anal. Mach. Intell. 28 (7) (2006) 1088-1099.
[35] A. Webb, Statistical Pattern Recognition, John Wiley & Sons, 2002.
[36] L.M. Manevitz, M. Yousef, One-class SVMs for document classification, J. Mach. Learn. Res. 2 (2001) 139-154.
[37] J. Wang, J. Li, G. Wiederhold, SIMPLIcity: semantics-sensitive integrated matching for picture libraries, IEEE Trans. Pattern Anal. Mach. Intell. 23 (9) (2001) 947-963.
[38] B.S. Manjunath, J.R. Ohm, V.V. Vasudevan, A. Yamada, Color and texture descriptors, IEEE Trans. Circ. Syst. Video Technol. 11 (6) (2001) 703-715.