An image retrieval scheme with relevance feedback using feature reconstruction and SVM reclassification


Neurocomputing 127 (2014) 214–230

Contents lists available at ScienceDirect

Neurocomputing journal homepage: www.elsevier.com/locate/neucom

Xiang-Yang Wang a,b,*, Yong-Wei Li a, Hong-Ying Yang a, Jing-Wei Chen a

a School of Computer and Information Technology, Liaoning Normal University, Dalian 116029, China
b Jiangsu Key Laboratory of Image and Video Understanding for Social Safety, Nanjing University of Science and Technology, Nanjing 210094, China

Article info

Article history: Received 5 June 2011; Received in revised form 7 August 2013; Accepted 12 August 2013; Communicated by M. Wang; Available online 28 October 2013

Keywords: Content-based image retrieval; Relevance feedback; Support vector machine; Orthogonal complement component analysis; Feature reconstruction

Abstract

In content-based image retrieval (CBIR), the gap between low-level visual features and high-level semantic meanings usually leads to poor performance, and relevance feedback (RF) is an effective method to bridge this gap and improve the performance of CBIR systems. In recent years, support vector machine (SVM) based relevance feedback methods have been popular because they can outperform many other classifiers when the training set is small, but they are often very complex and frequently return results of unsatisfactory relevance. To overcome these limitations, we propose an SVM relevance feedback CBIR algorithm based on feature reconstruction, in which the covariance matrix based kernel empirical orthogonal complement component analysis is utilized. Firstly, the original input image space is projected nonlinearly onto a high-dimensional feature space by using nonlinear analysis approaches. Secondly, the covariance matrix of the positive feedback images is calculated, and the kernel empirical orthogonal complement components of the covariance matrix are also calculated. Thirdly, the new features of the positive feedback images, the negative feedback images, and all the remaining images are reconstructed by utilizing the kernel empirical orthogonal complement components of the positive feedback images. Finally, an SVM classifier is constructed and all the images are re-sorted based on the new reconstructed image features. Experiments on large databases show that the proposed algorithms are significantly more effective than the state-of-the-art approaches. © 2013 Elsevier B.V. All rights reserved.

* Corresponding author at: School of Computer and Information Technology, Liaoning Normal University, Dalian 116029, China. Tel.: +86 411 85992415; fax: +86 411 85992005. E-mail addresses: [email protected] (X.-Y. Wang), [email protected] (H.-Y. Yang).

0925-2312/$ - see front matter © 2013 Elsevier B.V. All rights reserved. http://dx.doi.org/10.1016/j.neucom.2013.08.007

1. Introduction

With advances in information technology, image databases have grown explosively, which demands effective and efficient tools that allow users to search through such large collections. Traditionally, the most straightforward way to implement image database-management systems is to use conventional database-management systems such as relational databases or object-oriented databases. Systems of this kind are usually called keyword-based, because the images are annotated with keywords. However, as the databases grow larger, the traditional keyword-based method of retrieving a particular image becomes inefficient and suffers from the following limitations: (1) It is difficult to express visual content such as color, texture, shape, and objects within the image precisely. (2) For a large dataset, it requires more skilled labor and needs very large, sophisticated keyword systems. (3) Further, the keywords raise a linguistic barrier to sharing image data globally. To overcome several of these limitations, many content-based image retrieval (CBIR) systems have been proposed in recent decades, including QBIC, Photobook, MARS, NeTra, PicHunter, Blobworld, VisualSEEK, SIMPLIcity, and others [1,2].

In a typical CBIR system, low-level visual image features (e.g., color, texture, and shape) are automatically extracted for image description and indexing purposes. To search for desirable images, a user presents an image as an example of similarity, and the system returns a set of similar images based on the extracted features. However, the paramount challenge in CBIR is the so-called semantic gap between the low-level visual features and the high-level semantic concepts. To bridge the semantic gap, relevance feedback (RF) methods were proposed to learn the user's intentions [3]. Relevance feedback, originally developed for text retrieval systems, has found widespread acceptance in CBIR systems [4,5]. The main idea of relevance feedback is to let the user guide the system. The conventional process of relevance feedback is as follows: (1) from the retrieved images, the user labels a number of relevant samples as positive feedbacks and a number of irrelevant samples as negative feedbacks; and (2) the CBIR system then refines its retrieval procedure based on these labeled feedback samples to improve retrieval performance. These two steps can be carried out iteratively. As a result, the performance of the system can be enhanced by gradually learning the user's preferences.

Relevance feedback is one of the mechanisms for increasing retrieval accuracy. Most research on relevance feedback discusses the algorithms and manipulations of the proposed schemes and compares the retrieval results of different relevance feedback approaches. In this paper, we propose a novel relevance feedback CBIR approach that takes a different route, in which feature reconstruction and SVM reclassification are utilized. Firstly, the original input image space is projected nonlinearly onto a high-dimensional feature space by using nonlinear analysis approaches. Secondly, the covariance matrix of the positive feedback images is calculated, and the kernel empirical orthogonal complement components of the covariance matrix are also calculated. Thirdly, the new features of the positive feedback images, the negative feedback images, and all the remaining images are reconstructed by utilizing the kernel empirical orthogonal complement components of the positive feedback images. Finally, an SVM classifier is constructed and all the images are re-sorted based on the new reconstructed image features.

The rest of this paper is organized as follows. Section 2 discusses related work on relevance feedback. Section 3 describes the covariance matrix based kernel empirical orthogonal complement component analysis. Section 4 describes our SVM-based active feedback system. Simulation results in Section 5 show the performance of our scheme. Finally, Section 6 concludes the paper.

2. Related works

In recent years, the relevance feedback CBIR field has developed at an unprecedented pace, and many relevance feedback methods have been introduced. These techniques can be roughly divided into subspace learning, random sampling, feature selection, and support vector machine (SVM) based methods [3–6].

2.1. Subspace learning

In essence, subspace learning based RF methods either find a low-dimensional subspace of the feature space such that the positive and negative samples are well separated after projection onto it, or define a (1+x)-class problem (biased discriminant analysis, or BDA for short) and find a subspace within which to separate the one positive class from the unknown number of negative classes. Lu et al. [7] proposed a semi-supervised subspace learning algorithm for image retrieval, which is fundamentally based on locality-preserving projection (LPP) and can incorporate the user's relevance feedbacks. As the user's feedbacks accumulate, a semantic subspace can ultimately be obtained in which different semantic classes are best separated and the retrieval performance is enhanced. By intelligently utilizing the similarity and dissimilarity information in the semantic and geometric (image) domains, Yu et al. [8] proposed an optimal semantic subspace projection (SSP) that captures the most important properties of the subspaces with respect to classification. Si et al. [9] presented a family of subspace learning algorithms based on a new form of regularization, which transfers the knowledge gained from training samples to testing samples. In particular, the new regularization minimizes the Bregman divergence between the distribution of training samples and that of testing samples in the selected subspace, so it boosts the performance when training and testing samples are not independent and identically distributed. Mehdizadeh et al. [10] proposed a subspace learning method based on semi-supervised neighborhood preserving discriminant learning, called semi-supervised neighborhood preserving discriminant embedding (SNPDE). The method preserves the local neighborhood structure of the face manifold and maximizes the separability of different classes using linear discriminant analysis (LDA).

Subspace learning can reduce the dimensionality of data without losing intrinsic information. However, for a classification task with C classes, if the dimension of the projected subspace is strictly lower than C−1, the projection tends to merge classes that are close together in the original feature space. In addition, subspace learning RF tends to give undesired results in the reduced subspace if the samples in a class are multimodal.

2.2. Random sampling

Random sampling is an essential component of bagging, which is a variance reduction technique. Random sampling based RF approaches apply statistical sampling techniques to mitigate particular problems in relevance feedback. Examples include the classifier instability caused by an insufficient number of labeled feedback samples, the biased classification hyperplane caused by the imbalanced distributions of the positive and negative classes, and the over-fitting caused by the very high-dimensional features used for image representation [6]. Zhang et al. [11] presented a random sampling SVM based query expansion for relevance feedback learning. Firstly, they adopted a random sampling method to construct multiple asymmetric bagging SVM classifiers (each a hard, or binary, SVM) and aggregated them into a compound SVM classifier by classifier committee voting. Subsequently, the voting results were combined with query expansion to sort the final feedback ranking results. Sirikunya et al. [12] proposed a new relevance feedback approach based on a lazy processing framework, in which random sampling, data clustering, and ensembles of classifiers are combined. Zhou et al. [13] presented a method for image retrieval that employs a maximum likelihood method to organize the dataset and quasi-random stratified sampling for the query operation. The method is quite general in nature and allows the use of a variety of metrics between image pairs and descriptors. For random sampling based RF approaches, how to combine multiple classifiers into a single classification decision remains a difficult issue.

2.3. Feature selection

Generally, feature selection based RF methods adjust the weights associated with the various dimensions of the feature space, enhancing the importance of the dimensions that help in retrieving the relevant images and reducing the importance of those that hinder retrieval performance. Kherfi et al. [14] presented a new RF framework based on a feature selection algorithm that combines the advantages of a probabilistic formulation with those of using both the positive example (PE) and the negative example (NE). Ziou et al. [15] proposed a probabilistic framework for efficient retrieval and indexing of image collections. This framework uncovers the hierarchical structure underlying the collection from image features, based on a hybrid model that combines generative and discriminative learning. Piras et al. [16] proposed a weighted similarity measure based on the nearest-neighbor relevance feedback technique. Each image is ranked according to a relevance score that depends on nearest-neighbor distances from relevant and non-relevant images. Distances are computed by a weighted measure, the weights being related to the capability of the feature spaces to represent relevant images as nearest neighbors. Chen et al. [17] proposed an automatic visual feature weighting method to enhance content-based image retrieval, which can capture the user's search intention by identifying the important visual features located in the region of interest. Given a query image, the importance of each visual feature is automatically weighted by a random walk algorithm on a feature association graph, whose association strength is estimated by a localized visual word co-occurrence count among a set of pseudo relevance feedbacks. However, after the features are weighted, the new feature distribution may differ from the original one.

2.4. Support vector machine based method

Support vector machine based RF methods either estimate the density of positive instances or regard RF as a classification problem with the positive and negative samples as training sets. SVM active learning [18,19], which plays an important role in CBIR RF research, selects the samples near the SVM boundary and queries the user for labels. After training, the points near the SVM boundary are regarded as the most informative images, while the most-positive images are those farthest from the boundary on the positive side. Steven et al. [20] proposed a novel semi-supervised SVM batch mode active learning scheme for relevance feedback in content-based image retrieval. The scheme handles the small training size problem via a semi-supervised learning technique, and the batch sampling problem in active learning via a min–max framework. Liu et al. [21] proposed an SVM-based active feedback scheme for image retrieval using clustering and unlabeled data, in which a new active selection criterion is designed to select images for the user's feedback, and unlabeled images are incorporated within the co-training framework. Huang et al. [22] proposed a novel paired feature AdaBoost learning system for relevance feedback-based image retrieval. To facilitate density estimation in feature learning, the authors proposed an ID3-like balance tree quantization method to preserve the most discriminative information. By using paired feature combination, they mapped all training samples obtained in the relevance feedback process onto paired feature spaces and employed the AdaBoost algorithm to select a few feature pairs with the best discrimination capabilities in the corresponding paired feature spaces. Min et al. [23] presented a fuzzy support vector machine (FSVM) that is more robust to problems such as small sample size, biased hyperplane, over-fitting, and real-time requirements. However, it is difficult to select an effective fuzzy membership function, because different people select the fuzzy membership function in different ways.

SVM based RF methods have been popular because they can outperform many other classifiers when the training set is small, but they are often very complex and frequently return results of unsatisfactory relevance. In this paper, the main idea is to reconstruct all image features with the eigenfeature space of the positive feedback images, and then to reclassify the reconstructed images with an SVM.

3. The covariance matrix based kernel empirical orthogonal complement component analysis

Relevance feedback is an effective method to bridge the gap between the low-level visual features and the high-level semantic meanings, and can improve the performance of CBIR systems. The support vector machine is a popular small-sample learning method that has obtained top-level performance in different applications in recent years. SVM based relevance feedback has shown promising results owing to its good generalization ability without restrictive assumptions regarding the data, its fast learning and evaluation for relevance feedback, its flexibility, etc. However, most SVM-based relevance feedback methods treat positive and negative feedbacks equivalently. This assumption is not appropriate, since the two groups of training feedbacks have very different properties: all positive feedbacks share a homogeneous concept, while negative feedbacks do not.

To address this issue, Tao et al. [24] derived the linear space orthogonal complement component analysis, based on the fact that a single Gaussian distribution often accurately describes the distribution of samples in the input feature space when the positive feedbacks are similar objects under the same conditions. Like principal component analysis, orthogonal complement component analysis is a mathematical procedure that uses an orthogonal transformation to convert a set of observations of possibly correlated variables into a set of values of uncorrelated variables, called orthogonal complement components. The orthogonal complement component analysis can capture the invariant subspace of all positive feedbacks, that is, the homogeneous concept shared by all positive feedbacks. The labeled positive feedbacks are mapped to their center. This mapping is realized in a feature subspace, into which the other samples are also mapped. Experiments show that orthogonal complement component analysis performs better than conventional SVM-based relevance feedback when the positive feedbacks are similar objects under the same conditions. However, assuming that all positive feedbacks form a single Gaussian is not reasonable, because in CBIR there are always many different objects under different conditions (e.g., different view angles, different illuminations, etc.). Meanwhile, the dimension of the orthogonal complement components decreases as the number of positive feedbacks increases. Consequently, the performance of the system will be degraded by noise.
The covariance matrix is a symmetric matrix whose diagonal entries represent the variance of each feature and whose off-diagonal entries represent their respective correlations. To improve image retrieval performance with SVM-based relevance feedback, we propose a covariance matrix based kernel empirical orthogonal complement component analysis to analyze the positive and negative feedbacks, which can be regarded as an enhanced orthogonal complement component analysis. In the proposed analysis, the original input space is first nonlinearly mapped to an arbitrarily high-dimensional feature space according to the kernel approach, in which the distribution of samples is linearized. Then, the orthogonal complement component analysis is used to obtain a classifier in the kernel feature space. Different from orthogonal complement component analysis, we use not only the positive feedbacks but also the negative feedbacks to construct the bases, that is

$$\Phi^{\perp} \in \operatorname{span}\{\varphi(x_i^{+})|_{i=1}^{P},\ \varphi(x_j^{-})|_{j=1}^{N}\} \tag{1}$$
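Because Eq. (1) places $\Phi^{\perp}$ in the span of the mapped feedback samples, projections onto $\Phi^{\perp}$ can in principle be evaluated through kernel values alone. The short derivation below makes this explicit. The stacked matrix $\Psi$, the coefficient matrix $A$, and the kernel vector $k(x)$ are notation introduced here purely for illustration and are assumptions, not symbols used by the authors; the positive center in kernel space is taken to be the mean of the mapped positive samples.

```latex
% Assumed notation: z_1, ..., z_{P+N} are the labeled feedback samples
% (positives first), and K(u, v) = \varphi(u)^T \varphi(v) is the kernel.
\begin{align*}
\Psi &= [\varphi(z_1), \ldots, \varphi(z_{P+N})], \qquad
\Phi^{\perp} = \Psi A
  && \text{(Eq. (1) written with a coefficient matrix } A\text{)} \\
k(x) &= \Psi^{T}\varphi(x) = [K(z_1, x), \ldots, K(z_{P+N}, x)]^{T} \\
(\Phi^{\perp})^{T}\Bigl(\varphi(x) - \frac{1}{P}\sum_{i=1}^{P}\varphi(x_i^{+})\Bigr)
  &= A^{T}\Bigl(k(x) - \frac{1}{P}\sum_{i=1}^{P} k(x_i^{+})\Bigr)
\end{align*}
```

Under these assumptions the mapping $\varphi$ never has to be computed explicitly: only the kernel values between an image and the labeled feedback samples are needed.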

In fact, the most suitable way to construct the bases is to incorporate all images in the database, because many more kernel features can be generated by this approach. However, this is practically intractable for CBIR relevance feedback. Therefore, we use only the labeled feedback images to construct the bases.

Given a set of images $\{x_i, 1 \le i \le N\}$, where $N$ is the number of images, we first map a sample $x$ to $\varphi(x)$ in a higher dimensional kernel space, as in SVM and other kernel machines, and then calculate the covariance matrix of the positive feedbacks according to

$$E = \sum_{i=1}^{P} \bigl(\varphi(x_i^{+}) - \varphi(\bar{x}^{+})\bigr)\bigl(\varphi(x_i^{+}) - \varphi(\bar{x}^{+})\bigr)^{T} \tag{2}$$

$$\bar{x}^{+} = \frac{1}{P}\sum_{i=1}^{P} x_i^{+} \tag{3}$$

$$\varphi(\bar{x}^{+}) = \frac{1}{P}\sum_{i=1}^{P} \varphi(x_i^{+}) \tag{4}$$

where $x_i^{+}$ is the $i$th positive feedback image, $\bar{x}^{+}$ is the center of the positive feedback images, and $\varphi(\bar{x}^{+})$ is the center of the positive feedbacks in the higher dimensional kernel space.

The principal components describe the variance of the distribution of the feedback images, while the orthogonal complement components describe the invariance. That is, the orthogonal complement components correspond to the directions with minimal variance. So, we can obtain the orthogonal complement component by solving the eigenvalue problem using the Karhunen–Loève transformation (KLT) [25]

$$\begin{bmatrix} \Lambda & 0 \\ 0 & 0 \end{bmatrix} = [\,\Phi \;\; \Phi^{\perp}\,]^{T} E \,[\,\Phi \;\; \Phi^{\perp}\,] \tag{5}$$

where $E$ is the covariance matrix of the positive feedbacks, $\Phi$ is the principal subspace of $E$, $\Phi^{\perp}$ is the orthogonal complement subspace of $\Phi$ in $E$, and $\Lambda$ is the corresponding diagonal matrix of the eigenvalues of $\Phi$. The unitary matrix $\Phi^{\perp}$ defines a coordinate transform, which de-correlates the data, makes explicit the invariant subspace of the matrix operator $E$, and ensures that all the positive feedbacks are mapped to their center.

After having obtained the orthogonal complement component $\Phi^{\perp}$, we can reconstruct the positive feedbacks, the negative feedbacks, and the images in the database in three steps.

Step 1: project all the positive feedback images $x_i^{+}$ onto the empirical kernel orthogonal complement subspace according to

$$y_i^{+} = (\Phi^{\perp})^{T}\bigl(\varphi(x_i^{+}) - \varphi(\bar{x}^{+})\bigr) \tag{6}$$

Step 2: project all the negative feedback images $x_i^{-}$ onto the empirical kernel orthogonal complement subspace according to

$$y_i^{-} = (\Phi^{\perp})^{T}\bigl(\varphi(x_i^{-}) - \varphi(\bar{x}^{+})\bigr) \tag{7}$$

Step 3: project the remaining images $x$ in the database onto the same subspace according to

$$y = (\Phi^{\perp})^{T}\bigl(\varphi(x) - \varphi(\bar{x}^{+})\bigr) \tag{8}$$

After the positive feedbacks, the negative feedbacks, and the images in the database have been projected onto the empirical kernel orthogonal complement subspace, we set the feedback images as the training samples $z = [\,y_i^{+}|_{i=1}^{P},\ y_j^{-}|_{j=1}^{N}\,]$ and train the standard SVM classifier, where $N$ is the number of negative feedback images. Finally, we can measure the dissimilarity through the output of the SVM according to

$$\sum_{i=1}^{N_s} \alpha_i y_i K(z_i, y) + b \tag{9}$$

where $N_s$ is the number of support vectors. The covariance matrix based kernel empirical orthogonal complement component analysis SVM is described in Algorithm 1.

Algorithm 1. The covariance matrix based kernel empirical orthogonal complement component analysis SVM
1. Map the image features to the higher dimensional kernel space, that is, $x \to \varphi(x)$.
2. Calculate the covariance matrix $E$ of the positive feedback images.
3. Calculate the eigenvalues and eigenvectors of the covariance matrix $E$.
4. Calculate the kernel empirical orthogonal complement components $\Phi^{\perp}$ of the covariance matrix $E$ by solving the eigenvalue problem $0 = (\Phi^{\perp})^{T} E\, \Phi^{\perp}$.
5. Reconstruct the new image features:
5.1. Section 3, Step 1;
5.2. Section 3, Step 2;
5.3. Section 3, Step 3.
6. Train a standard SVM classifier on $z = [\,y_i^{+}|_{i=1}^{P},\ y_j^{-}|_{j=1}^{N}\,]$.
7. Re-sort the projected remaining images $y$ using the output of the SVM, $\sum_{i=1}^{N_s} \alpha_i y_i K(z_i, y) + b$.

4. Image retrieval system

For CBIR, the search engine is required to feed back the most semantically relevant images after each relevance feedback iteration. The user will not label many images in each iteration and will usually perform only a few iterations. Fig. 1 describes our image retrieval system framework with relevance feedback. From Fig. 1, we can see that our SVM feedback system has four main components: query, retrieval, labeling, and learning. Compared with traditional CBIR without feedback, there are two additional components, labeling and learning, and they are the key contributions of the relevance feedback system.

Fig. 1. Image retrieval system flow chart. (Block labels recoverable from the figure: Query Unit with query image and visual features; Retrieval Unit with image database, similarity measure, and retrieval results; a user decision, Y or N, leading to the final results or to relevance feedback; Labeling Unit with user labeling of positive and negative images; Learning Unit with kernel space projection, compute orthogonal complement component, samples reconstruction, and construct the SVM classifiers.)
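The feature reconstruction performed by the learning component (Steps 1 to 3 of Section 3) can be illustrated in a simplified setting. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation: it works in the original linear feature space rather than in a kernel space, and it uses Gram-Schmidt orthogonalization in place of the eigen-decomposition of Eq. (5). All function names are hypothetical.

```python
import math


def mean_vector(rows):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def principal_basis(positives, tol=1e-10):
    """Orthonormal basis (Gram-Schmidt) of the span of the centered positives.

    These directions carry all the variance of the positive samples, so they
    span the non-zero-eigenvalue subspace of the covariance matrix E.
    Assumption: linear feature space, no kernel mapping.
    """
    center = mean_vector(positives)
    basis = []
    for p in positives:
        v = [a - c for a, c in zip(p, center)]
        for q in basis:
            dot = sum(a * b for a, b in zip(v, q))
            v = [a - dot * b for a, b in zip(v, q)]
        norm = math.sqrt(sum(a * a for a in v))
        if norm > tol:  # skip directions already covered by the basis
            basis.append([a / norm for a in v])
    return center, basis


def complement_residual(x, center, basis):
    """Part of (x - center) lying in the orthogonal complement of the basis."""
    v = [a - c for a, c in zip(x, center)]
    for q in basis:
        dot = sum(a * b for a, b in zip(v, q))
        v = [a - dot * b for a, b in zip(v, q)]
    return v


# Toy data: the positives vary only inside the x-y plane.
positives = [[1.0, 0.0, 0.0], [3.0, 0.0, 0.0], [2.0, 1.0, 0.0]]
negative = [2.0, 0.0, 5.0]
center, basis = principal_basis(positives)
pos_residuals = [complement_residual(p, center, basis) for p in positives]
neg_residual = complement_residual(negative, center, basis)
```

In this toy case every positive sample collapses to (numerically) the zero vector, that is, to the positive center, while the negative sample keeps its component along the direction in which the positives do not vary. The residual differs from $(\Phi^{\perp})^{T}(\cdot)$ only by an orthonormal change of basis, so distances and inner products between reconstructed samples are unchanged.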

Fig. 2. User interface of the image retrieval system.

Fig. 3. The retrieval results by using the proposed algorithm without the relevance feedback: (a) the query image “Panda” (from COREL) and (b) the query image “Monkey” (from Caltech).

Fig. 4. The retrieval results after the first feedback iteration for query image “Panda” (from COREL): (a) Scheme [18], (b) Scheme [21], (c) Scheme [22], and (d) our scheme.

Fig. 5. The retrieval results after the first feedback iteration for query image “Monkey” (from Caltech): (a) Scheme [18], (b) Scheme [21], (c) Scheme [22], and (d) our scheme.

Fig. 6. The retrieval results after the second feedback iteration for query image “Panda” (from COREL): (a) Scheme [18], (b) Scheme [21], (c) Scheme [22], and (d) our scheme.

Fig. 7. The retrieval results after the second feedback iteration for query image “Monkey” (from Caltech): (a) Scheme [18], (b) Scheme [21], (c) Scheme [22], and (d) our scheme.

Fig. 8. The retrieval results after the third feedback iteration for query image “Panda” (from COREL): (a) Scheme [18], (b) Scheme [21], (c) Scheme [22], and (d) our scheme.

Fig. 9. The retrieval results after the third feedback iteration for query image “Monkey” (from Caltech): (a) Scheme [18], (b) Scheme [21], (c) Scheme [22], and (d) our scheme.

Fig. 10. The retrieval results after the fourth feedback iteration for query image “Panda” (from COREL): (a) Scheme [18], (b) Scheme [21], (c) Scheme [22], and (d) our scheme.

Fig. 11. The retrieval results after the fourth feedback iteration for query image “Monkey” (from Caltech): (a) Scheme [18], (b) Scheme [21], (c) Scheme [22], and (d) our scheme.

• Query Unit: Extract three features (color, texture, and shape) from every image in the database, and store these features in the feature database. The details of the three features are as follows:
(1) Color feature. Global color histogram (GCH) features are used as the color features. First, we convert the color space from RGB into HSV. Then, we quantize the HSV color space into 64 bins: hue, saturation, and value are quantized into 8 bins, 4 bins, and 2 bins, respectively.
(2) Texture feature. Pyramidal wavelet transform (PWT) features are employed as the texture features. An image is converted into the YCrCb color space, and the Haar wavelet transform is applied to the Y component. The mean and standard deviation are calculated for the subbands at each decomposition level. The vector dimension is 18.
(3) Shape feature. Edge direction histogram (EDH) features are used as the shape features. The edges of the images are extracted using the Sobel edge detection algorithm. An image is converted into the YCrCb color space, and the statistics of horizontal, 45°, vertical, 135°, and isotropic edges are calculated on the Y component. The vector dimension of EDH is 5.
• Retrieval Unit: Select an image at random from the image database as the query image, then compute the similarity between the query image and the images in the database by the Euclidean distance, and return the 10 most similar images as the retrieval result.
• Labeling Unit: Label the top 10 returned images as positive or negative feedback samples at the first feedback. Moreover, select another 290 of the most informative samples from the unlabeled image database as negative feedback samples, to improve the performance of the system with a large number of samples. In the following feedback iterations, the labeled samples are selected from the feedback pool (N = 15) and the unlabeled image database; the total number of labeled samples is 300.
• Learning Unit: Train a relevance feedback model based on the SVM machine learning algorithm. In this unit, we first map the image features into the kernel space, then perform orthogonal complement component analysis on the positive feedback images, and finally reconstruct the new features of the positive, negative, and remaining images in the database and use them to train the standard SVM classifier proposed in Section 3. The SVM classifier re-sorts the remaining images and returns the feedback results. If the user is satisfied with the current retrieval result, the feedback process stops; otherwise the user labels new samples and proceeds to the next feedback iteration. The specific algorithm is shown in Algorithm 2.

Algorithm 2. The proposed SVM relevance feedback content-based image retrieval
Input: $DB = L \cup U$: image database ($L$ and $U$ are the labeled and the unlabeled samples, respectively); $s$: the sampling scale parameter.
1. Compute the visual features $x_i\ (i = 1, 2, 3)$ of the images in the image database.
2. For $x_i, x_j \in DB$, do $S_{i,j} \leftarrow \exp(-|x_i - x_j|^2)$ % Compute the similarity between the query image and every image in DB.
In each round of relevance feedback:
3. $P \leftarrow P_{new}$, $N \leftarrow N \cup N_{new}$, $U \leftarrow U - (P \cup N)$ % P and N are the positive and negative labeled samples.
4. $N_{new} \leftarrow \mathrm{Sampling}(U, s)$, $|N_{new}| = \mathrm{Sort_{Asc}}(S_{i,j})$ % Select the most informative samples from U as the negative samples.
5. $H \leftarrow$ the covariance matrix based kernel empirical orthogonal complement component analysis SVM train$(P, N)$ % Shown in Algorithm 1.
6. $H \leftarrow \mathrm{Normalize}(H)$ % Normalize H to (−1, 1).
7. For $x \in U$, do $Pool \leftarrow \mathrm{Sort_{Asc}}(\mathrm{Abs}(H(x)))$ % Select new label samples from the Pool.
8. For $x \in DB$, do $Result \leftarrow \mathrm{Sort_{Dsc}}(H(x))$ % The result with relevance feedback.
Output: Pool, Result.
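Steps 6 to 8 of Algorithm 2 can be sketched as follows. This is a minimal illustration under stated assumptions: the classifier outputs H(x) are taken as a precomputed dictionary from image id to score, max-absolute-value scaling is only one possible choice for the normalization (the paper does not specify it), and all function names are hypothetical. The pool size of 15 and the 10 returned images follow the numbers given in the text.

```python
def normalize_scores(scores):
    """Step 6 (assumed form): scale by the largest magnitude into [-1, 1]."""
    peak = max(abs(s) for s in scores.values())
    if peak == 0:
        return dict(scores)
    return {k: s / peak for k, s in scores.items()}


def select_pool(scores, unlabeled_ids, pool_size=15):
    """Step 7: unlabeled images closest to the decision boundary (smallest |H|)."""
    ranked = sorted(unlabeled_ids, key=lambda i: abs(scores[i]))
    return ranked[:pool_size]


def rank_results(scores, top_k=10):
    """Step 8: images sorted by decreasing H, most relevant first."""
    ranked = sorted(scores, key=lambda i: scores[i], reverse=True)
    return ranked[:top_k]


# Toy example with made-up classifier outputs for five images.
raw = {"a": 2.0, "b": -0.5, "c": 0.1, "d": -4.0, "e": 1.0}
h = normalize_scores(raw)
pool = select_pool(h, ["b", "c", "d", "e"], pool_size=2)   # ['c', 'b']
result = rank_results(h, top_k=3)                          # ['a', 'e', 'c']
```

Sorting by |H(x)| in ascending order picks the samples the classifier is least certain about, which is why they are the ones offered to the user for labeling, while sorting by H(x) in descending order yields the images judged most relevant.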

As shown in Algorithm 2, the proposed algorithm in this paper includes initial retrieval without feedback and relevance feedback iterations. In the initial retrieval, we compute the visual features of images in the database and store them into the feature database, then, the user selects randomly an image from the image database as the query image and computes the similarity between the query image and every image in the image database, and finally, sorts the similarity and returns the top 10 images as the retrieval

[Figure 12: four panels plotting average AP (y-axis) against the number of feedback iterations, 0 to 4 (x-axis), for Scheme [18], Scheme [21], Scheme [22], and Our Scheme.]

Fig. 12. Relationship between average AP and number of feedback iterations: (a) the top 20 returned images, (b) the top 40 returned images, (c) the top 60 returned images, and (d) the top 80 returned images.


results. On entering the relevance feedback iterations, the user labels the retrieval results as positive or negative samples in the first feedback iteration, and more negative samples are then selected from the unlabeled images, as shown in Algorithm 2. After that, we map the image features into the kernel space, perform orthogonal complement component analysis on the positive feedback images, and finally reconstruct the new features of the positive images, the negative images, and the remaining images in the database, and use them to train a standard SVM classifier. The SVM classifier returns 10 images as the feedback results and 15 images in the pool, which are to be labeled in the next feedback iteration. After the first feedback iteration, the user labels the training samples from the pool; the other steps are the same as above. When the user is satisfied with the feedback result, the feedback iterations stop. It can be seen from this flow that no additional effort is required in the feedback process, such as asking the user to label more examples; all the user has to do is label the returned images as positive or negative samples. The algorithm therefore places a minimal burden on the user, which is what a relevance feedback algorithm is required to do.
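The selection steps at the end of a feedback round (normalizing the decision values, filling the pool, and ranking the results) can be sketched as follows. This is only a sketch under stated assumptions: the decision values H(x) are taken as given (in the paper they come from the SVM trained on the reconstructed features), normalization is done by dividing by the largest magnitude, which yields values in the closed interval [-1, 1], and all function names and toy values are hypothetical.

```python
def normalize(scores):
    """Scale decision values into [-1, 1] by the largest magnitude (assumed scheme)."""
    peak = max(abs(v) for v in scores.values())
    if peak == 0:
        return dict(scores)
    return {k: v / peak for k, v in scores.items()}


def select_pool(scores, unlabeled, pool_size=15):
    """Unlabeled images with the smallest |H(x)|, i.e. closest to the boundary."""
    return sorted(unlabeled, key=lambda k: abs(scores[k]))[:pool_size]


def rank_results(scores, top_k=10):
    """Images with the largest decision values, most relevant first."""
    return sorted(scores, key=lambda k: scores[k], reverse=True)[:top_k]


# Toy decision values for five images (illustrative only).
raw = {"a": 2.0, "b": -4.0, "c": 0.4, "d": -0.8, "e": 3.0}
h = normalize(raw)
pool = select_pool(h, unlabeled=["b", "c", "d", "e"], pool_size=2)
results = rank_results(h, top_k=3)
```

The pool favors images whose decision value is near zero because the classifier is least certain about them, so their labels are the most informative for the next iteration; the result list instead favors the images the classifier is most confident are relevant.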

5. Experimental results

To evaluate the performance of the proposed algorithm, we conduct an extensive set of CBIR experiments comparing the proposed algorithm with several state-of-the-art feedback methods [21,22,18] that have been used in image retrieval.

5.1. Image database

We perform experiments over 5000 images from 150 categories of the COREL photo gallery, in which each category contains 100 images. Every database image is of size 256 × 384 or 384 × 256. To evaluate the retrieval performance, we need a ground truth to assess the relevance of the test query images. We follow previous work and construct the ground truth by merging semantically similar categories. The reorganized database consists of 25 “semantic categories”. In our experiments, 1000 images are randomly selected from the whole database as test queries. We define images within the same semantic category as relevant.

We also perform experiments over 5000 images from the 256 object categories of the Caltech image database [26]. The Caltech image database comprises 30,607 images, and each category has a minimum of 80 images. Caltech images are harvested from other popular online image databases, and they represent a diverse set of lighting conditions, poses, backgrounds, image sizes, and camera systematics. The categories were hand-picked by the authors to represent a wide variety of natural and artificial objects in various settings. The organization is simple and the images are ready to use, without the need for cropping or other processing.

5.2. Feature extraction

Generally, in a CBIR relevance feedback system, images are represented by three main features: color, texture, and shape [1]. Color information is the most informative feature because of its

[Figure 13: four panels plotting average AP (y-axis) against the number of returned images, 20 to 80 (x-axis), for Scheme [18], Scheme [21], Scheme [22], and Our Scheme.]

Fig. 13. Relationship between average AP and number of returned images: (a) the first feedback iteration, (b) the second feedback iteration, (c) the third feedback iteration, and (d) the fourth feedback iteration.


robustness with respect to scaling, rotation, perspective, and occlusion. Texture information is another important feature, and previous studies have shown that texture structure and orientation fit the model of human perception well; the same holds for shape information. In this paper, the global color histogram (GCH) is selected as the color feature, pyramidal wavelet transform (PWT) features are employed as the texture features, and the edge direction histogram (EDH) is used as the shape feature. For SIFT, the feature vector of each keypoint has 4 × 4 × 8 = 128 dimensions.
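To make the color feature concrete, a global color histogram can be sketched as below. The paper does not state the color space or the number of bins, so the choices here (RGB, 4 quantization levels per channel, 64 bins, normalization to unit sum) are assumptions for illustration only, and the function name is hypothetical.

```python
def global_color_histogram(pixels, levels=4):
    """Normalized histogram over quantized RGB colors.

    `pixels` is an iterable of (r, g, b) tuples with 8-bit channels. Each
    channel is quantized to `levels` values, giving levels**3 bins in total.
    """
    hist = [0.0] * (levels ** 3)
    count = 0
    for r, g, b in pixels:
        # Map each 0..255 channel value to a level index in 0..levels-1.
        qr = r * levels // 256
        qg = g * levels // 256
        qb = b * levels // 256
        hist[(qr * levels + qg) * levels + qb] += 1
        count += 1
    if count:
        # Normalize so that images of different sizes are comparable.
        hist = [h / count for h in hist]
    return hist


# Toy "image" of four pixels (illustrative values only).
toy_pixels = [(0, 0, 0), (255, 255, 255), (255, 255, 255), (10, 20, 30)]
gch = global_color_histogram(toy_pixels)
```

A histogram of this kind discards pixel positions entirely, which is why it is insensitive to rotation and moderately tolerant of scaling and occlusion.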


5.3. Comparative performance evaluation

We report experimental results that show the feasibility and utility of the proposed algorithm and compare its performance with three state-of-the-art feedback methods [21,22,18]. To simulate the practical situation of online users, the sequence of query images used in all the experiments is generated at random. At the beginning of image retrieval, the images in the database are ranked according to their Euclidean distances to the query, and the top 15 images are labeled as the initial set of labeled data for the learning system. In our system, according to Fig. 11, only the 15 images judged to be the most informative are put into a pool in each round of relevance feedback, which are shown in Fig. 2. In particular, only the positive images need to be marked by the user; all the other images are automatically marked as negative by the system.

Fig. 3 shows the retrieval results of the proposed algorithm without relevance feedback. Here, the image in the top left-hand corner is the query image, and the other images are the retrieval results. “R” denotes a relevant image, and “N” denotes a non-relevant image. Figs. 4–11 show the retrieval results using different relevance feedback methods after the first feedback iteration. Compared with the retrieval results without relevance feedback (see Fig. 3), the performance of our SVM relevance feedback retrieval system is improved, although the improvement is not pronounced. However, as the number of feedback iterations increases, the performance of our SVM relevance feedback retrieval system keeps improving, and it is more effective than the three state-of-the-art feedback methods.

We use the Average Precision (AP) measure defined by NIST TREC video (TRECVID) as our retrieval performance metric [27]. In each query session, our system refines the retrievals by executing the proposed SVM relevance feedback framework for several iterations. The AP value obtained at each iteration is defined as the average of the precision values obtained after each relevant picture is retrieved. The precision value is the ratio

[Figure 15: time in seconds (y-axis, 0 to 160) against the number of feedback iterations, 1 to 4 (x-axis), for Scheme [18], Scheme [21], Scheme [22], and Our Scheme.]

Fig. 15. Relationship between time and number of feedback iterations.

between the retrieved relevant pictures and the number of pictures currently retrieved. Let P be the AP obtained at the current feedback iteration; it is computed by $P = \sum_{D_i \in R} P_i / |R|$, where $P_i$ denotes the precision value obtained after the system retrieves the i top-ranked pictures, $D_i$ is one of the relevant pictures, R is the set of all relevant pictures that belong to the same class as the query, and $|R|$ denotes the cardinality of R. Calculating the AP over all relevant pictures avoids the precision fluctuation that is usually encountered with the traditional precision measure.

Fig. 12 shows the variation of the average AP as the number of feedback iterations increases. Note that the average AP at the zeroth feedback iteration is the average AP calculated on the first retrieval result of each query session, before the relevance feedback process is activated. Fig. 13 shows the variation of the average AP as the number of returned images increases. Fig. 14 shows the relationship between the average AP and the number of training samples. Fig. 15 shows the variation of the time as the number of feedback iterations increases. According to Figs. 4–15, the image retrieval performance of the proposed method is competitive with that of the other tested methods. The effectiveness of the proposed SVM relevance feedback CBIR algorithm results from the feature reconstruction, in which the covariance matrix based kernel empirical OCCA is utilized.
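The AP definition above can be illustrated with a short Python sketch. It assumes that the ranked list holds image identifiers and that relevant pictures which are never retrieved contribute zero to the sum while still counting in |R|; the function name and the toy identifiers are hypothetical.

```python
def average_precision(ranked, relevant):
    """AP = sum of precision values at each retrieved relevant picture, over |R|.

    `ranked` is the list of retrieved image ids, best first.
    `relevant` is the set R of all pictures in the same class as the query.
    """
    relevant = set(relevant)
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for rank, image_id in enumerate(ranked, start=1):
        if image_id in relevant:
            hits += 1
            # Precision after retrieving the `rank` top-ranked pictures.
            total += hits / rank
    return total / len(relevant)


# Relevant pictures retrieved at ranks 1 and 3: AP = (1/1 + 2/3) / 2.
ap = average_precision(["a", "x", "b", "y"], {"a", "b"})
```

Averaging this value over all query sessions gives the "average AP" plotted against the number of feedback iterations, returned images, and training samples.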

6. Conclusion

Recently, SVM has been widely applied in relevance feedback, which plays an essential role in improving the performance of CBIR. The main advantages of SVM are its good generalization ability without restrictive assumptions regarding the data, its fast learning and evaluation for relevance feedback, and its flexibility. However, conventional SVM-based relevance feedback is often very complex, and unsatisfactory retrieval results occur frequently. To overcome these limitations, we propose an SVM relevance feedback CBIR algorithm based on feature reconstruction, in which the covariance matrix based kernel empirical orthogonal complement component analysis is utilized. Through experiments on the Corel Photo Gallery and the Caltech image collection, we show that our new method consistently improves on conventional SVM active learning based relevance feedback.

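[Figure 14: average AP (y-axis, 0 to 1) against the number of training samples, 100 to 300 (x-axis), for Scheme [18], Scheme [21], Scheme [22], and Our Scheme.]

Fig. 14. Relationship between average AP and number of training samples.

Acknowledgment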

This work was supported by the National Natural Science Foundation of China under Grant nos. 61272416, 60873222, and 60773031,


the Open Project Program of Jiangsu Key Laboratory of Image and Video Understanding for Social Safety (Nanjing University of Science and Technology) under Grant no. 30920130122006, and Liaoning Research Project for Institutions of Higher Education of China under Grant no. L2013407.

References

[1] M. Wang, H. Li, D. Tao, K. Lu, X. Wu, Multimodal graph-based reranking for web image search, IEEE Trans. Image Process. 21 (11) (2012) 4649–4661.
[2] W. Hu, N. Xie, L. Li, X. Zeng, A survey on visual content-based video indexing and retrieval, IEEE Trans. Syst., Man, Cybern., Part C: Appl. Rev. 41 (6) (2011) 797–819.
[3] Y. Rui, T.S. Huang, M. Ortega, S. Mehrotra, Relevance feedback: a power tool for interactive content-based image retrieval, IEEE Trans. Circuits Syst. Video Technol. 8 (5) (1998) 644–655.
[4] P. Auer, A.P. Leung, Relevance feedback models for content-based image retrieval, in: Multimedia Analysis, Processing and Communications, Studies in Computational Intelligence, vol. 346, 2011, pp. 59–79.
[5] M. Wang, B. Ni, X.S. Hua, Assistive tagging: a survey of multimedia tagging with human–computer joint exploration, ACM Comput. Surv. (CSUR) 44 (4) (2012) 25.
[6] W. Bian, D.C. Tao, Biased discriminant Euclidean embedding for content-based image retrieval, IEEE Trans. Image Process. 12 (2) (2010) 545–554.
[7] K. Lu, X.F. He, Image retrieval based on incremental subspace learning, Pattern Recognit. 38 (11) (2005) 2047–2054.
[8] J. Yu, Q. Tian, Semantic subspace projection and its applications in image retrieval, IEEE Trans. Circuits Syst. Video Technol. 18 (4) (2008) 544–548.
[9] S. Si, D.C. Tao, B. Geng, Bregman divergence-based regularization for transfer subspace learning, IEEE Trans. Knowl. Data Eng. 22 (7) (2010) 929–942.
[10] M. Mehdizadeh, C. MacNish, R.N. Khan, M. Bennamoun, Semi-supervised neighborhood preserving discriminant embedding: a semi-supervised subspace learning algorithm, Lecture Notes in Computer Science, vol. 6494, 2011, pp. 199–212.
[11] Z. Zhang, R. Ji, H. Yao, P. Xu, J. Wang, Random sampling SVM based soft query expansion for image retrieval, in: Proceedings of the Fourth International Conference on Image and Graphics (ICIG 2007), 2007, pp. 805–809.
[12] N. Sirikunya, A.H. Kien, P. Antoniya, H.H. Yao, A lazy processing approach to user relevance feedback for content-based image retrieval, in: Proceedings of the 2010 IEEE International Symposium on Multimedia, 2010, pp. 342–346.
[13] J. Zhou, A. Antonio Robles-Kelly, A quasi-random sampling approach to image retrieval, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR'08), Anchorage, AK, 2008, pp. 1–8.
[14] M.L. Kherfi, D. Ziou, Relevance feedback for CBIR: a new approach based on probabilistic feature weighting with positive and negative examples, IEEE Trans. Image Process. 15 (4) (2006) 1017–1030.
[15] D. Ziou, T. Hamri, S. Boutemedjet, A hybrid probabilistic framework for content-based image retrieval with feature weighting, Pattern Recognit. 42 (7) (2009) 1511–1519.
[16] L. Piras, G. Giacinto, Neighborhood-based feature weighting for relevance feedback in content-based retrieval, in: Proceedings of the 10th Workshop on Image Analysis for Multimedia Interactive Services (WIAMIS '09), London, 6–8 May 2009, pp. 238–241.
[17] J. Chen, R. Ma, Z. Su, Weighting visual features with pseudo relevance feedback for CBIR, in: Proceedings of the ACM International Conference on Image and Video Retrieval, 2010, pp. 220–227.
[18] S. Tong, E. Chang, Support vector machine active learning for image retrieval, in: Proceedings of the ACM Multimedia, 2001, pp. 107–118.
[19] M. Wang, X.S. Hua, Active learning in multimedia annotation and retrieval: a survey, ACM Trans. Intell. Syst. Technol. 2 (2) (2011) 10–31.
[20] C.H.H. Steven, R. Jin, J.K. Zhu, M.R. Lyu, Semi-supervised SVM batch mode active learning and its applications to image retrieval, ACM Trans. Inf. Syst. 27 (3) (2009) 1–29.
[21] R. Liu, Y. Wang, SVM-based active feedback in image retrieval using clustering and unlabeled data, Pattern Recognit. 41 (8) (2008) 2645–2655.
[22] S. Huang, Q. Wu, S. Lai, Improved AdaBoost-based image retrieval with relevance feedback via paired feature learning, Multimed. Syst. 12 (1) (2006) 14–26.
[23] R. Min, H.D. Cheng, Effective image retrieval using dominant color descriptor and fuzzy support machine, Pattern Recognit. 42 (1) (2009) 147–157.
[24] D.C. Tao, X.L. Li, Which components are important for interactive image searching, IEEE Trans. Circuits Syst. Video Technol. 18 (1) (2008) 3–11.
[25] K. Fukunaga, Introduction to Statistical Pattern Recognition, 2nd ed., Academic Press, Boston, 1990.
[26] G. Griffin, A. Holub, P. Perona, Caltech-256 Object Category Dataset, California Institute of Technology, Technical Report 7694, 2007.
[27] M.J. Huiskes, M.S. Lew, Performance evaluation of relevance feedback method, in: Proceedings of the 2008 International Conference on Content-Based Image and Video Retrieval (CIVR 08), 2008, pp. 239–248.

Xiang-Yang Wang is currently a professor with the Multimedia and Information Security Laboratory, School of Computer and Information Technology, Liaoning Normal University, China. His research interests lie in the areas of information security, image processing, pattern recognition, and computer vision. He is the author of two books. He has published over 80 papers in international journals (including IEEE/ACM Transactions) and 25 papers in international conferences and workshops. Mr. Wang is a reviewer for many leading international and national journals and conferences, including IEEE/ACM Transactions.

Yong-Wei Li received his B.E. degree from the School of Computer and Information Technology, Liaoning Normal University, China, in 2012, where he is currently pursuing his M.S.E. degree. His research interests include image retrieval and signal processing.

Hong-Ying Yang is currently an assistant professor with the School of Computer and Information Technology at the Liaoning Normal University, China. She received her B.E. degree from the Liaoning Normal University, China and her M.S.E. degree from the Dalian Maritime University, China, in 1989 and 2010, respectively. Her research interests include signal processing and communications, and digital multimedia data hiding.

Jing-Wei Chen received his B.E. degree from the School of Computer and Information Technology, Liaoning Normal University, China, in 2008, where he is currently pursuing his M.S.E. degree. His research interests include image retrieval and signal processing.