
Accepted Manuscript

An efficient semantic – related image retrieval method
Quynh Dao Thi Thuy, Quynh Nguyen Huu, Canh Phuong Van, Tao Ngo Quoc

PII: S0957-4174(16)30677-7
DOI: 10.1016/j.eswa.2016.12.004
Reference: ESWA 11013

To appear in:

Expert Systems With Applications

Received date: 20 June 2016
Revised date: 18 October 2016
Accepted date: 2 December 2016

Please cite this article as: Quynh Dao Thi Thuy , Quynh Nguyen Huu , Canh Phuong Van , Tao Ngo Quoc , An efficient semantic – related image retrieval method, Expert Systems With Applications (2016), doi: 10.1016/j.eswa.2016.12.004

This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.


Highlights


- Retrieving semantic images spread out in the entire feature space.
- Determining the semantic weight of each query in the combined distance calculation.
- Identifying the importance of each feature.
- Our method is quite effective, improving retrieval in one feedback iteration.


An efficient semantic – related image retrieval method

Quynh Dao Thi Thuy (1), Quynh Nguyen Huu (2), Canh Phuong Van (2), Tao Ngo Quoc (3)

(1) Thai Nguyen University of Science, Quyet Thang, Thai Nguyen, Vietnam (Email: [email protected])
(2) Information Technology Faculty, Electric Power University, Ha Noi, Vietnam (Email: [email protected])
(3) Institute of Information Technology, Vietnam Academy of Science and Technology, Ha Noi, Vietnam (Email: [email protected])

Abstract:


Many previous techniques were designed to retrieve semantic images in a certain neighborhood of the query image, thus bypassing semantically related images elsewhere in the feature space. Several recent methods retrieve semantically related images in the entire feature space, but with low precision. In this paper, we propose a Semantic-Related Image Retrieval method (SRIR) that can retrieve semantic images spread over the entire feature space with high precision. Our method takes advantage of user feedback to determine the semantic importance of each query and the importance of each feature. In addition, the retrieval time of our method does not increase with the amount of user feedback. We also provide experimental results that demonstrate the effectiveness of our method.

Key words: Content-based image retrieval, semantic-related image, semantic weight, importance of feature.

1. Introduction


Content-based image retrieval (CBIR) has received considerable attention over the past decade due to the demand for efficiently processing enormous amounts of multimedia data and their rapid growth. Many CBIR systems have been developed, including QBIC [Flickner, M. et al., 1995], Photobook [A. Pentland et al., 1996], MARS [M. Ortega-Binderberger et al., 2004], NeTra [W. Y. Ma et al., 1997], PicHunter [I. J. Cox et al., 2000], Blobworld [C. Carson et al., 2002], VisualSEEK [Smith, J.R et al., 1996], SIMPLIcity [J. Z. Wang et al., 2001] and other systems [A. Gupta and R. Jain, 1997; Brunelli, R et al., 2000; Bartolini, I et al., 2001; K. Vu et al., 2003; B. Kang, 2004; H. T. Shen et al., 2005; K. A. Hua et al., 2006; Liu, Y. et al., 2009]. In a typical CBIR system, low-level visual image features (color, texture and shape) are extracted automatically in order to index and describe images. To search for desired images, the user inserts a sample image and the system returns a set of similar images based on the extracted features. Querying semantic-related images in content-based image retrieval systems refers to finding the images that, according to the user, are related to each other in terms of semantics [R. Chang et al., 2011; Heath, D et al., 2014; Alzu'bi et al., 2015; Y. Yan et al., 2016], such as yellow and red


rose images being semantically related to each other. The relevance feedback approach in content-based image retrieval has been an active research area in recent years; several good methods using this approach can be found in [Rui, Y et al., 1997; Rui, Y et al., 1998; Ishikawa, Y et al., 1998; M. Ortega-Binderberger et al., 2004; Andre B et al., 2012; Zhongmiao Xiao et al., 2014]. Most existing CBIR systems represent images by feature vectors built from visual features, in which two vectors are considered close together if their corresponding images are similar. When a CBIR system presents a set of images considered similar to a given query image, the user can mark the most relevant images for that query, and the system adjusts the query using the relevant images that the user has selected [Zhongmiao Xiao et al, 2014]. CBIR techniques based on relevance feedback do not require the user to provide a correct initial query; instead, the user helps construct an optimal query by indicating which images are relevant.


The CBIR approach assumes that, in principle, relevant images are close to the query image in a certain feature space. However, the similarity among images as perceived by humans does not necessarily correlate with the distance between them in the feature space. That is, semantically related images might be spread out over the entire feature space and scattered in several clusters rather than one. In this case, traditional relevance feedback [Rocchio, J.J, 1971; Flickner, M. et al, 1995; Smith, J.R et al, 1996; Rui, Y et al, 1998; Ishikawa, Y et al, 1998; Porkaew, K. et al, 1999; Rui, Y et al, 1999; Bartolini, I et al, 2001; K. Vu et al, 2003; M. Ortega-Binderberger et al, 2004; D. Liu et al, 2009; Mattia Broilo et al, 2010; Andre B et al, 2012; Hieu. V. V., Quynh. N. H., 2014] does not work well when shifting the query center.


Relevance feedback implementation refers to calculating one or more new query points in the feature space and changing the distance function. As shown in Figure 1(a), the initial research using this approach [Smith, J.R et al, 1996; Ishikawa, Y et al, 1998; Porkaew, K. et al, 1999; D. Liu et al, 2009; Andre B et al, 2012; Hieu. V. V., Quynh N. H., 2014] represents a new query by a single point and adjusts the weights of feature components to find an optimal query point and distance function. In this case, the single point is calculated as the weighted average of all relevant images in the feature space, and the bounds represent lines of equal similarity. In a later approach, the authors of [O. Chum et al, 2007; Robert Davis et al, 2012; Norton, D. et al, 2016] represent a new query with multiple points to determine the bounds, as in Figure 1(b). This approach uses a clustering method [Charikar, M. et al, 1997] to calculate new query points from the query results (related images) based on the user's feedback. Under the assumption that the relevant images are mapped to points with similar weight, a wide bound is built to cover all the query points, and the system finds the images similar to these queries. However, if the feature space and distance function are very different from the user's perception, the related images are mapped to regions of arbitrary shape, separated in the feature space; that is, the related images can be ranked like any other retrieved images for a given query. To converge quickly to higher-level semantic information, the system should search for related images around any of the query points, as shown in Figure 1(c). A query that retrieves the images similar to any of the query points is called a disjunctive query. In particular, a complex image query is represented by several separated clusters, because semantically related images can be scattered in several visual clusters rather than one.

Figure 1. Query shape. (a) Adjusted query point. (b) Convex shape (multipoint). (c) Concave shape (multipoint).


The method of Jin & French (2005) follows this separated-query approach. It has the advantage that the results are semantically related images from the entire feature space. However, the method has limitations: the number of queries for the next iteration depends on the user's selection, so two unfavorable situations can arise. In the first, the user chooses too few feedback images (fewer than the number of clusters in the feature space). In this situation the performance of the system is not guaranteed, because according to clustering theory [C.J. van Rijsbergen, 1997] more queries will cover more clusters. In the second, the user selects too many feedback images. This increases the burden of combining multiple result lists (each query has its own result list). In addition, too many queries do not improve system performance much; the experiments in [Jin & French, 2005] show that precision increases rapidly from 1 to 8 queries and only slowly from 8 to 20 queries. For instance, in the rose theme of the Corel database, the images related to a rose query scatter into four clusters (each cluster corresponding to one color of roses). Besides, this method weights all queries equally and cannot determine the importance of each feature of the multi-feature space in the distance calculation, and therefore does not properly reflect the semantic features of related images. These limitations are among the motivations for the method we propose in this article.

In this paper, we propose a semantic-related image retrieval method with the advantage of not requiring users to supply multiple appropriate query images to express their information needs; it determines the semantic importance of each query (associated with its distance) and the importance of each feature in order to improve performance. In addition, the combination of result lists is not burdened by the user selecting many images, while semantic images scattered over the entire feature space are still retrieved. Our experiments show that the precision of our method is higher than that of a traditional content-based image retrieval system, the Jin & French method, and the SSLL method [Zhongmiao Xiao et al, 2014].


2. Related work


The initial approaches [Flickner, M. et al. 1995; R. Arandjelović et al, 2012] to content-based image retrieval are unsuitable for query and retrieval models based on the user's perception of visual similarity. There are two components to relevance feedback learning: a distance function and a new query point. The distance function is changed by learning the weights of feature components, and the new query point is obtained by learning the optimal point for the user's search. Query point shifting was applied in image retrieval systems such as MARS [Rui, Y et al, 1997] and MindReader [Ishikawa, Y et al, 1998]. These systems represent a query as a single point in the feature space and try to shift this point toward the positive points and away from the negative points. This idea originated from the Rocchio algorithm [Rocchio, J.J, 1971], which has been used successfully in document retrieval.
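For concreteness, the classic Rocchio update computes the new query as a weighted combination of the old query, the mean of the relevant examples, and the mean of the non-relevant examples. A minimal sketch; the weights alpha, beta and gamma are conventional illustrative defaults, not values from this paper:

```python
import numpy as np

def rocchio_update(query, relevant, nonrelevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Shift the query vector toward the mean of relevant examples and
    away from the mean of non-relevant ones (Rocchio, 1971)."""
    q = alpha * np.asarray(query, dtype=float)
    if len(relevant) > 0:
        q += beta * np.mean(relevant, axis=0)
    if len(nonrelevant) > 0:
        q -= gamma * np.mean(nonrelevant, axis=0)
    return q
```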


Subsequently, query adjustment methods using multipoint relevance feedback were introduced. The query expansion approach of MARS [Porkaew, K. et al, 1999] constructs local clusters of related points; all of the local clusters are then merged to create a wide bound covering all query points. The query shifting approach [Ishikawa, Y et al, 1998; Porkaew, K. et al, 1999], on the other hand, ignores these clusters and considers all related points to be equivalent. These two approaches may generate a super-ellipsoid or a convex shape in certain feature spaces to cover all the query points of simple queries; however, both fail to identify the optimal region for complex queries. Wu, L. et al (2000) presented FALCON, a holistic distance model that facilitates learning disjoint and concave point sets in vector spaces as well as in arbitrary metric spaces. However, the proposed combined distance function depends on specific empirical knowledge, and the model assumes that all related points are query points. Y. Chen et al (2003) also took the approach of using multiple seed queries, but they use them to expand the bound around the best queries and still essentially search a single region of the feature space.

Jin & French (2005) have shown that in a multi-image CBIR system, semantically related images may vary considerably in a given feature space. Analyzing experimental performance, the authors assert that searching with the center of many queries can have different efficiency depending on whether the queries are located in one or more clusters: if the queries are located in the same cluster, the query center can help improve performance; otherwise, the query center reduces performance. Therefore, the authors propose the separated-query approach (called multipoint [Porkaew, K. et al, 1999]). In their method, instead of finding a query center for the selected positive examples, they perform separate queries and then merge the results corresponding to the queries into a comprehensive list. Their method obtains retrieval results that include semantically related images scattered over the entire feature space rather than in the vicinity of a single query point. They use feedback images directly as subsequent queries instead of adjusting the original query in search of the best query; they search the feature space simultaneously and combine the outputs for the end user. Although the method has the advantage that the results are semantically related images from the entire feature space, their approach only exploits the merging of multiple representations and multiple systems, without utilizing the semantic information of images to improve retrieval precision (see the detailed analysis in section 1).

In a paper by Zhongmiao et al [Zhongmiao Xiao et al, 2014], image retrieval is performed by exploiting the synergy between short-term and long-term learning techniques to improve retrieval performance. Specifically, the method constructs an adaptive semantic repository in long-term learning to store the retrieval patterns of historical query sessions. It then extracts high-level semantic features from the semantic repository and seamlessly integrates low-level visual features and high-level semantic features in short-term learning to effectively represent the query in a single retrieval session. The high-level semantic features are dynamically updated based on the user's query concept and therefore represent the image's semantic concept more accurately. The technique has the advantage of low space overhead on a large-scale image database. However, its performance is far from satisfactory.

3. Proposed method

In this section, we present the proposed image retrieval method. Again, the goals of our retrieval method are to retrieve semantic images spread over the entire feature space, taking advantage of user feedback to determine the semantic importance of each query and the importance of each feature.

Multiple representations

Representations are the processing results of certain transformations of the images [French, J. et al, 2003]. Unlike extracted features, these representations keep the original media format and can be directly processed by the retrieval system. For example, a color histogram of a given image is not a representation, because it is no longer a viewable image and cannot be processed directly by the image retrieval system. A grayscale image, by contrast, is a representation of the original color image. Figure 2 shows four representations of a red rose (814.jpg in the Corel database).
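As an illustration, a minimal sketch of how such representations could be derived from a color image using Pillow; the function name and the use of Pillow are our own, not part of the original system:

```python
from PIL import Image, ImageOps

def four_representations(path):
    """Derive the four representations used for multirepresentation retrieval."""
    c_plus = Image.open(path).convert("RGB")  # C+: original color image
    c_minus = ImageOps.invert(c_plus)         # C-: color negative
    b_plus = ImageOps.grayscale(c_plus)       # B+: multilevel grayscale
    b_minus = ImageOps.invert(b_plus)         # B-: grayscale negative
    return c_plus, c_minus, b_plus, b_minus
```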


Figure 2. Four representations of the same image: the original color image (C+), the grayscale image (B+), the color negative image (C-), and the grayscale negative image (B-). Multirepresentation retrieval does not improve performance directly, but it helps us obtain diverse queries.

Clustering algorithm for the image objects that the user selects

Let S = {s1, s2, ..., sn} be a sample of n observations in an n-dimensional Euclidean space, where si is the i-th feature vector. If k is an integer with 2 ≤ k ≤ n, a partition of S into k subsets C1, ..., Ck must satisfy:

Ci ≠ ∅, 1 ≤ i ≤ k   (1a)

Ci ∩ Cj = ∅, i ≠ j   (1b)

C1 ∪ C2 ∪ ... ∪ Ck = S   (1c)
From this point, the sets {Ci} are called "clusters" in S. Clustering in S refers to recognize a partition k {Ci} of S where subsets contain points having the high similarity within the same group and low similarity among inter groups. Mathematical standard of similarity which is used to determine an "optimal" partition k is called clustering standard. Clustering technique used is K-mean [J. B. MacQueen, 1967], we use this simple clustering to demonstrate that even when using a simple clustering technique, the performance of our method is still high. K-mean uses the centroid ci (the average value of the points representing

7

ACCEPTED MANUSCRIPT

image objects in the feature space) of cluster Ci to represent that cluster. The centroid ci of cluster Ci is calculated according to the formula: ∑

(2)



(

) (3)

AN US



CR IP T

The distance between a point in the feature space representing an image object pCi and the centroid ci is measured by Euclidean distance dist (p, ci). A target function is used to evaluate the quality of the cluster so that within - cluster image objects are similar but different from the images in other clusters. Quality of cluster Ci is measured by the variation within the cluster (total squared error between all the points representing image objects in Ci and the centroid ci) as follows:

Where E is the sum of squared error for all the image objects in the image set.


When the user marks some points as semantically related, a small number of good representative points (the centroids of clusters) are used to build the subsequent queries. Therefore, we cluster the set of semantically related images and select a representative of each cluster (its centroid or one of its images) as the corresponding subsequent query. We emphasize that, in query construction, images marked as related in the previous iteration are also combined into the clusters of the current iteration.
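A minimal sketch of this step, assuming scikit-learn's K-means and the feedback images' feature vectors as rows of an array; the function name is our own:

```python
import numpy as np
from sklearn.cluster import KMeans

def build_subsequent_queries(feedback_vectors, n_clusters):
    """Cluster the feature vectors of the user's relevant images and use
    each cluster centroid as one subsequent query point."""
    X = np.asarray(feedback_vectors, dtype=float)
    km = KMeans(n_clusters=n_clusters, n_init=10).fit(X)
    return km.cluster_centers_, km.labels_
```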

Distance from an image to a multipoint query

The distance from an image IRj to a multipoint query MQ = (Q1, Q2, ..., Qn) is the minimum of the weighted distances from the image to each query Qi, calculated by formula (4):

D(IRj, MQ) = min_{i=1..n} { (1 − wij) · dij },  dij = dist(IRj, Qi)   (4)

In formula (4), dij = dist(IRj, Qi), with i = 1..n and j = 1..k, is the distance from an image to a query Qi computed with the feature weights determined by the IF algorithm in Figure 5, and (1 − wij) is the semantic weight combined with distance dij (the semantic weight wij is calculated by formula (5)). Figure 3 below is the Combination algorithm, which merges the result lists Ri into the aggregate result list R.


Algorithm Combination
Input:
  Query lists: (Q1, Q2, ..., Qn), each Qi with a corresponding cluster Ci
  The number of returned images: k
  Result lists: (R1, R2, ..., Rn) // each Ri contains k ranked images IRij
  Feature weights: Weight_l
Output:
  Result list R
1. for j ← 1 to k do
   1.1 Dj ← ∞;
   1.2 for i ← 1 to n do
       1.2.1 wij ← formula (5);
       1.2.2 Dij ← (1 − wij) · dist(IRij, Qi, Weight_l);
       1.2.3 if Dij < Dj then Dj ← Dij;
2. for i ← 1 to n do
   2.1 sort the images of Ri in ascending order of distance, merging them into R and eliminating duplicates;
3. return R;

Figure 3. The Combination algorithm combines n result lists into one list.

The Combination algorithm in Figure 3 is implemented as follows:


With the input of the query lists Q1, Q2, ..., Qn and the respective result lists R1, R2, ..., Rn, each Ri takes its first k ranked images IRij. The algorithm calculates the distance Dij from image IRij to query Qi by the function dist(IRij, Qi, Weight_l) with the feature weights Weight_l (the output of the IF algorithm in Figure 5). The weight combined with distance Dij is calculated as the number of semantically related images of the i-th cluster divided by the total number of related images of all n clusters. The final distance from an image to the multipoint query is the minimum of the weighted distances to the individual queries Qi. The algorithm then sorts the images in ascending order of distance and eliminates duplicate images to get the combined result R.
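A compact Python sketch of this combination step under our own naming: result_lists holds n lists of (image_id, feature_vector) pairs, cluster_sizes holds the number of related images per cluster, and dists[i] is the feature-weighted distance function of query i. This is an illustration of formulas (4) and (5), not the authors' code:

```python
def combine_result_lists(queries, result_lists, cluster_sizes, dists):
    """Merge the n per-query result lists into one list ranked by the
    minimum semantically weighted distance, dropping duplicates."""
    total = float(sum(cluster_sizes))
    best = {}  # image id -> smallest weighted distance seen so far
    for q, r_i, c_i, dist in zip(queries, result_lists, cluster_sizes, dists):
        w = c_i / total                      # semantic weight of this cluster, formula (5)
        for image_id, vec in r_i:
            d = (1.0 - w) * dist(vec, q)     # weighted distance, formula (4)
            if image_id not in best or d < best[image_id]:
                best[image_id] = d
    return sorted(best.items(), key=lambda kv: kv[1])
```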

Proposition 1 [Complexity of the algorithm Combination]. The complexity of the algorithm Combination is O(k log k), where n is the number of combined lists and k is the number of returned images in each list.

Proof. There are two major operations in the algorithm Combination, step 1 and step 2, so we determine the running times of step 1 and step 2.


The running time of step 1 is the time to execute its loop body k times. To determine the time of this loop body, we need the running times of steps 1.1 and 1.2. Step 1.1 takes O(1); step 1.2 takes O(n), because steps 1.2.1 and 1.2.3 take O(1) and step 1.2.2 calls the dist function, whose running time is O(1). Therefore, by the sum rule, the complexity of steps 1.1 and 1.2 together is O(n), and by the product rule, since the loop body is O(n), step 1 has O(kn) complexity. The running time of step 2 is the time to execute its loop body n times, so we need the running time of step 2.1; this step sorts k objects, which has O(k log k) complexity. By the product rule, the running time of step 2 is O(nk log k). Moreover, step 3 has O(1) complexity.


Since the number of combined lists n is often very small, we may assume it is a constant. Therefore, the complexity of the algorithm Combination is O(k log k), which proves the proposition.

Calculating the semantic weight combined with distance dij


We rely on the observation that a cluster containing many semantically related images is more important than the remaining clusters; therefore, the queries generated from that cluster should have a higher semantic weight than those of the remaining clusters. The semantic weight wij combined with the distance dij from image IRj to query Qi (which belongs to semantic cluster i) is the ratio of the number of semantically related images in cluster i to the total number of related images over all n semantic clusters:

wij = |Ci| / Σ_{t=1..n} |Ct|,  i ∈ {1, ..., n}   (5)

The weights satisfy the condition Σ_{i=1..n} wij = 1. For example, if cluster 1 holds 5 of 10 related images, it receives the weight 0.5, so distances to its query are scaled by (1 − 0.5) and that query ranks images more favorably. With n = 3 we have 3 queries corresponding to the 3 clusters (see Figure 4), and the semantic weights associated with queries 1, 2 and 3 are calculated by formula (5).


Figure 4. Illustration of the calculation of the semantic weights from an image IRij to 3 queries in 3 respective clusters.

Calculation algorithm for the feature importance:


Each image is represented by a point in the multi-feature space. Typically, methods treat these features as having the same importance. For instance, when comparing the features in the feature database that this paper uses (which includes the 7 features ColorHsvHistogram64, ColorLuvMoment123, ColorHsvCoherence64, CoarsnessVector, Directionality, WaveletTwtTexture and MRSAR), the features would be considered to have equal importance. This does not reflect the fact that some features are more important than the rest. Therefore, we are interested in determining the importance of each feature in the image comparison phase.


The main idea of determining the feature importance is based on user feedback. When a user marks some images as semantically related to the query image, we cluster these semantically related images and consider each of the clusters as follows: each image in the cluster is a point in the multi-feature space, and these points lie near each other in that space. A shape covering these points is projected onto the axes corresponding to the features, and we calculate the variance of these points along each axis (a large dispersion of the data along an axis means that the importance of that axis is low). Thus the importance of each feature in the multi-feature space is the inverse of the variance of the points along that axis. We present the IF algorithm, which calculates the feature importance in the feature space FS, below.


IF algorithm
Input:
  Set of n images in a cluster: C
  Feature set: FS
  Number of features: m
Output:
  Importance of feature l: Weight_l
For l ← 1 to m do {
  x̄l ← (1/n) Σ_{i=1..n} x_il
  σl² ← (1/n) Σ_{i=1..n} (x_il − x̄l)²   // variance of the n data points on axis l
  Weight_l ← 1 / σl²                    // feature weight of the l-th feature
}


Figure 5. Calculation algorithm for the feature importance (IF).

The IF algorithm in Figure 5 takes as input n images in a cluster of the feature space FS. Feature l of FS then has n data points x_il, i = 1..n, and the algorithm computes the variance σl² of these n data points along axis l of FS. After calculating the variance σl², the algorithm outputs the importance of feature l in FS: the feature importance along axis l equals 1/σl² and is assigned to Weight_l.
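A minimal numpy sketch of the same idea; the small guard against zero variance is our addition:

```python
import numpy as np

def feature_importance(cluster_vectors):
    """Inverse-variance feature weights for one cluster of feedback images:
    the more a feature axis varies among images the user marked as related,
    the less important that axis is for comparing images."""
    X = np.asarray(cluster_vectors, dtype=float)  # shape (n_images, n_features)
    var = X.var(axis=0)                           # variance along each feature axis
    return 1.0 / np.maximum(var, 1e-12)           # guard against zero variance
```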


Semantic-related image retrieval algorithm

With a query image that the user submits, the method uses the multirepresentation approach [French, J et al, 2003] to obtain a diverse set of result images (including images scattered in different clusters of the feature space). Thanks to the diversity of this set, users can select semantically related images according to their needs. After the user selects diverse semantically related images, these images are divided into clusters and each cluster centroid is chosen as a subsequent query. This keeps the number of subsequent queries small, because (1) too many queries would burden the result-list combination algorithm, and (2) an excessive number of queries does not improve system performance much (the range from 4 to 8 is best [Jin & French, 2005]). Subsequently, we compare the query vectors representing the clusters to the images in the set using the feature importance (each feature has a different importance) obtained by the IF algorithm in Figure 5. The subsequent queries are executed on a traditional CBIR system to obtain the result lists, and combining these result lists generates an aggregated result list. The process can be repeated until the user is satisfied. Figure 6 below describes the semantic-related image retrieval algorithm, named SRIR.


SRIR algorithm (Semantic – Related Image Retrieval)
Input:
  Set of image database: DB
  Query image: Q
  Number of retrieved images after each iteration: k
  Feature space: F
  Number of features: m
Output:
  Set of result images: R
1.  C+ ← Q;
2.  PMQFC+ ← Extraction(C+, F); WMQFC+ ← 1; DMQFC+ ← dist( , , );
3.  s1 ← <1, PMQFC+, WMQFC+, DMQFC+, DB, k>;
4.  C− ← Transform(Q, C−);
5.  PMQFC− ← Extraction(C−, F); WMQFC− ← 1; DMQFC− ← dist( , , );
6.  s2 ← <1, PMQFC−, WMQFC−, DMQFC−, DB, k>;
7.  B+ ← Transform(Q, B+);
8.  PMQFB+ ← Extraction(B+, F); WMQFB+ ← 1; DMQFB+ ← dist( , , );
9.  s3 ← <1, PMQFB+, WMQFB+, DMQFB+, DB, k>;
10. B− ← Transform(Q, B−);
11. PMQFB− ← Extraction(B−, F); WMQFB− ← 1; DMQFB− ← dist( , , );
12. s4 ← <1, PMQFB−, WMQFB−, DMQFB−, DB, k>;
13. S ← Combination(s1, s2, s3, s4);
14. US ← ∅;
15. repeat
    15.1 US ← US ∪ Feedback(S);
    15.2 CL ← Cluster(US);
    15.3 for i ← 1 to n do
         15.3.1 Weight_l ← IF(Ci), Ci ∈ CL;
         15.3.2 ci ← Centroid(Ci); PMQi ← ci;
         15.3.3 WMQi ← formula (5), for j = 1..k;
         15.3.4 DMQi ← dist( , , ); Ri ← <1, PMQi, WMQi, DMQi, DB, k>;
    15.4 R ← Combination(R1, ..., Rn);
    15.5 S ← R;
    until (user stops responding);
16. return R;

Figure 6. Semantic-related image retrieval (SRIR) algorithm.


Semantic-related image retrieval (SRIR) algorithm in Figure 6 is implemented as follows:


Each image in the database is represented as a point in the feature space, and the points in the feature space are divided into feature clusters. When the user submits a query on the query-by-example interface, the algorithm transforms the query image Q into three representations C−, B+ and B− (assuming that we use 4 representations and that C+ is Q itself) through the Transform function. On the representations C+, C−, B+ and B−, the algorithm extracts features through the Extraction function to get the feature vectors of the respective representations: PMQFC+, PMQFC−, PMQFB+ and PMQFB−. On the extracted feature vectors, the algorithm performs single-point image retrieval (nMQ = 1) with the query point being one of PMQFC+, PMQFC−, PMQFB+, PMQFB−, each with weight 1 (the queries have equal importance), and the distance function dist from an image in DB to one of the queries with feature weight Weight_l = 1, generating the result lists s1, s2, s3, s4. The result lists s1, s2, s3, s4 are combined through the Combination procedure (see the Combination algorithm in Figure 3) to get the aggregate list S. On list S, the user selects the semantically related images according to his or her perception, giving the set US. The set US is partitioned into clusters Ci, i = 1..n; the algorithm calculates the feature importance using the IF algorithm in Figure 5 and the centroid of each cluster Ci using the Centroid function, assigning it to ci. The weight WMQi tied to query MQi is calculated using formula (5), and Ri is the result of the single-point query <1, PMQi, WMQi, DMQi, DB, k>. By combining the Ri, i = 1..n, through the Combination procedure, the algorithm generates list R. If satisfied, the user stops responding; otherwise the algorithm lets the user continue responding on the set R, and the feedback in the current iteration is combined with the semantically related images of the previous iteration through the union US ← US ∪ Feedback(S), with US being the set of user feedback on S of the previous iteration and S being replaced by R in the current iteration.
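Tying the sketches above together, one feedback iteration of SRIR could look like the following; retrieve(query, dist, k) stands for the underlying single-point CBIR search and is assumed to return (image_id, feature_vector) pairs, while build_subsequent_queries, feature_importance and combine_result_lists are the illustrative helpers sketched earlier:

```python
import numpy as np

def srir_iteration(feedback_vectors, n_clusters, retrieve, k):
    """One SRIR feedback iteration: cluster the relevant images, issue one
    feature-weighted single-point query per cluster, then merge the lists."""
    X = np.asarray(feedback_vectors, dtype=float)
    centroids, labels = build_subsequent_queries(X, n_clusters)
    result_lists, sizes, dists = [], [], []
    for i, q in enumerate(centroids):
        members = X[labels == i]
        w = feature_importance(members)            # IF algorithm (Figure 5)
        def dist(x, qq, w=w):                      # weighted Euclidean distance
            return float(np.sqrt(np.sum(w * (np.asarray(x) - qq) ** 2)))
        result_lists.append(retrieve(q, dist, k))  # per-cluster single-point query
        sizes.append(len(members))
        dists.append(dist)
    return combine_result_lists(centroids, result_lists, sizes, dists)[:k]
```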


Proposition 2 [Complexity of the algorithm SRIR]. The complexity of the algorithm SRIR is O(N), where N is the size of the database image set.

Proof.

The running time of the algorithm SRIR depends on steps 3, 6, 9, 12, 13 and 15, since steps 1, 2, 4, 5, 7, 8, 10, 11, 14 and 16 take O(1). Step 3 matches the query image against each image in the database, so it takes O(N), where N is the size of the database image set; the running times of steps 6, 9 and 12 are equivalent. Step 13 calls the algorithm Combination, so it takes O(k log k). The number of iterations of the loop in step 15 equals the number of the user's feedback rounds, which is small, so it is assumed to be a constant; hence step 15 has the same asymptotic running time as


that of the repeat…until loop body. Step 15.1 performs the user's feedback on the first k objects of set S, so it has O(k) complexity. Step 15.2 clusters the set US into n clusters, where l is the size of US (n ≤ l) and t is the number of clustering iterations (t ≤ l); this has O(l) complexity [J. B. MacQueen, 1967; Jiawei Han et al., 2011]. Step 15.3 includes steps 15.3.1, 15.3.2 and 15.3.3, which take O(n), and step 15.3.4, which has O(k) complexity. Because the running time of step 15.3 is that of executing its loop body n times, by the product rule it has O(nk) complexity; with n constant, step 15.3 takes O(k). Step 15.4 calls the algorithm Combination, so it has O(k log k) complexity. Step 15.5 takes O(1). Therefore, by the sum rule, step 15 has O(k log k) complexity.


In summary, the running time of steps 1, 2, 4, 5, 7, 8, 10, 11, 14 and 16 is O(1); steps 3, 6, 9 and 12 have O(N) complexity, where N is the size of the database image set; step 13 has O(k log k) complexity with k ≤ N; and the running time of step 15 is O(k log k) with k ≤ N. Therefore, by the sum rule, the algorithm SRIR has O(N) complexity, which proves the proposition.

4. EXPERIMENTAL EVALUATION

4.1. Test bed:


Image database:


The database used for the test is a subset of Corel. This subset includes 34 categories, each with 100 images, specifically: 290, 700, 750, 770, 840, 1040, 1050, 1070, 1080, 1090, 1100, 1120, 1340, 1350, 1680, 2680, 2890, 3260, 3510, 3540, 3910, 4150, 4470, 4580, 4990, 5210, 5350, 5530, 5810, 5910, 6440, 6550, 6610, 6840. All of the images in this set have salient foreground objects, and each image has max(width, height) = 384 and min(width, height) = 256.

Feature vector:


The features are classified into two categories: color features and texture features (see Table 1 below).

Table 1. Categories of features.

Category of feature | Feature         | Name of feature     | Length
Color feature       | Color histogram | ColorHsvHistogram64 | 64
Color feature       | Color moment    | ColorLuvMoment123   | 9
Color feature       | Color coherence | ColorHsvCoherence64 | 128
Texture feature     | Tamura texture  | CoarsnessVector     | 10
Texture feature     | Tamura texture  | Directionality      | 8
Texture feature     | Wavelet texture | WaveletTwtTexture   | 104
Texture feature     | MRSAR texture   | MRSAR               | 15

Image representations:

Four representations are used: color (C+), color negative (C-), multilevel grayscale (B+) and multilevel grayscale negative (B-). Each representation has seven visual features: three color features and four texture features. All of these features are compared respectively to obtain similarities, which are then linearly combined with equal weights (the features are considered to have equal importance). The feature vectors corresponding to each channel form a 2-d table with 3400 rows (each row contains the feature vector of an image in the respective channel) and 3 columns (the first column is the rank of the image, the second the image ID, and the third the similarity).

Ground truth:

The Corel ground truth is widely used in the evaluation of CBIR, so we also use the Corel classification as the ground truth; that is, we consider all the images in the same Corel category to be relevant. This ground truth consists of 4 columns (image query ID, initial query Q0, image ID and relevance) and 340 thousand lines (one line per query-image pair).

4.2. Relevance feedback simulation strategy:


To imitate human behavior, we implemented a relevance feedback simulation in our test. First, an initial query is executed to create the initial query result. We then simulate user interaction by choosing k related images from the initial retrieval result based on the ground truth. The relevant images from the feedback are clustered, the centroids of the clusters form the subsequent queries, implemented as a concave multipoint query, and the query results are then combined to generate an aggregated result list.


Relevance feedback is implemented by the strategy of selecting the first relevant images (based on the ground truth) in the result list. In this strategy, the worst case is that there is no relevant image except the query image, and the best case is that there are k−1 relevant images in addition to the query image; thus the number of relevant images can vary from 1 to k (including the query image). This strategy is used to simulate real users in our experiment.
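A minimal sketch of this simulation, assuming a mapping category_of from image IDs to Corel categories; the names are our own:

```python
def simulate_feedback(result_ids, query_category, category_of, max_marks):
    """Simulate a user marking relevant images: scan the ranked result list
    and mark images of the query's Corel category, up to max_marks images."""
    marked = []
    for image_id in result_ids:
        if category_of[image_id] == query_category:  # ground truth: same category = relevant
            marked.append(image_id)
            if len(marked) == max_marks:
                break
    return marked
```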

4.3. Query implementation and evaluation:

In our experiment, the factors were selected as follows. Each query inserted into the system has 4 representations, and the results corresponding to the 4 representations are combined to get the aggregated result list for the initial query. The user then responds on the result list of the initial query in order to generate the subsequent queries (each cluster becomes a subsequent query). In the distance calculation phase, the weights tied to the distances of every query and the feature weights in the feature space are calculated to make the distances between the queries and the images in the image database more precise. The merged result list of these new queries is the retrieval result. The process stops when the user no longer responds. The system model of this process is shown in Figure 7; the interfaces are in the Appendix.

Figure 7. System model: the query Q is transformed into the four representations (C+, C-, B+, B-), whose queries Q1–Q4 yield result lists S1–S4 that are combined into S; the user selects related images (US); the system clusters them, calculates the semantic importance of each query and the feature importance, issues the subsequent queries Q1–Qn on the search engine, and combines the result lists R1–Rn into R.


All 3400 images were used as queries. The average precision (1) at 150 returned images is used for evaluation; the reason for choosing 150 is that users commonly look through only three screens (each screen has 50 images). Six feedback setups are implemented, comparing 1, 4, 8, 12, 16 and 20 feedback queries under one feedback strategy, so there are 6 configurations. Four methods were compared: Basic C+ (a traditional CBIR system with the single representation C+ on the global features), the Jin & French system [Jin & French, 2005], SSLL [Zhongmiao Xiao et al, 2014], and the proposed SRIR system.


In our experiment, we ran the 3400 queries under the 6 configurations to obtain the average precision; only one feedback iteration is used. The experimental results are shown in Figure 8: the horizontal axis indicates the number of clusters (1, 4, 8, 12, 16 or 20), the vertical axis indicates the precision, and the four methods Basic C+, Jin & French, SSLL and SRIR are indicated by 4 curves.


The results and the average precision of the 3400 queries are shown as data in Table 2 and as a graph in Figure 8 below. Due to space limitations, we present in this paper only the average precision of the 34 query categories (see Table A1 in Appendix A); the details of the precision of all 3400 queries are available at: http://117.6.134.238:368/results.htm

(1) Precision is the ratio of the number of images relevant to the query image in the result set to the total number of returned images.


Table 2. Average precision of the four methods according to the number of queries in one feedback iteration.

Method       | 1 query | 4 queries | 8 queries | 12 queries | 16 queries | 20 queries
Basic C+     | 0.20    | 0.22      | 0.23      | 0.24       | 0.245      | 0.25
Jin & French | 0.24    | 0.29      | 0.31      | 0.33       | 0.34       | 0.35
SSLL         | 0.25    | 0.28      | 0.29      | 0.32       | 0.325      | 0.347
SRIR         | 0.5473  | 0.5968    | 0.6005    | 0.604      | 0.6054     | 0.60258

Table 2 shows the average precision of the four methods Basic C+, Jin & French, SSLL and our method SRIR at 1, 4, 8, 12, 16 and 20 queries. Only in our method is the number of clusters also the number of queries.

Figure 8. Precision comparison: precision (vertical axis) versus the number of feedback queries (horizontal axis: 1, 4, 8, 12, 16, 20) for Basic C+, Jin & French, SSLL and SRIR.


We can draw conclusions from Figure 8. The system precision (vertical axis) increases with the number of clusters (horizontal axis): the more clusters we use to retrieve, the higher the system performance. We also found that the precision of the SRIR method improves most when the number of clusters is between 1 and 8, specifically 54.73% at 1, 59.68% at 4 and 60.05% at 8. In the SRIR method, the performance curve rises rapidly from 1 to 8 clusters and slowly in the range of 12 to 20 clusters, because 8 clusters already cover most of the clusters in the feature space. Although the Jin & French method also improves rapidly in the range of 1 to 8 queries [Jin & French, 2005], our approach has much higher precision without increasing retrieval time. The main reason is that, even with the number of clusters between 1 and 8, our method extracts more semantic information from user feedback.

5. Conclusions

We focused on proposing a method, called SRIR, to solve two main problems: (1) finding semantically related images spread out over the entire feature space with high precision, and (2) keeping retrieval time from increasing with the amount of user feedback. To solve problem (1), we took advantage of user feedback to determine the semantic importance of each query and the importance of each feature. For problem (2), we did not treat each feedback image as a subsequent query, but rather made each subsequent query correspond to a group of feedback images.


Our experimental results on the feature database of 3400 images show that the proposed SRIR method offers significantly higher precision than the Basic C+, Jin & French and SSLL methods.


References

A. Gupta and R. Jain (1997). Visual information retrieval. Communications of the ACM, 40(5):70–79.

A. Pentland, R. W. Picard, and S. Sclaroff (1996). Photobook: content-based manipulation for image databases. International Journal of Computer Vision, 18(3):233–254.

Alzu'bi, A., Amira, A., Ramzan, N. (2015). Semantic content-based image retrieval: A comprehensive study. Journal of Visual Communication and Image Representation, 32:20–54.

Andre, B., Vercauteren, T., Buchner, A.M., Wallace, M.B., Ayache, N. (2012). Learning semantic and visual similarity for endomicroscopy video retrieval. IEEE Transactions on Medical Imaging, 31(6):1276–1288.

Ashwin, T.V., Gupta, R., Ghosal, S. (2002). Adaptable similarity search using non-relevant information. Proceedings of the 28th VLDB Conference, Hong Kong, China.

Bartolini, I., Ciacci, P., Waas, F. (2001). Feedback bypass: A new approach to interactive similarity query processing. Proceedings of the 27th VLDB Conference, Roma, Italy, 201–210.

Brunelli, R., Mich, O. (2000). Image retrieval by examples. IEEE Transactions on Multimedia, 2(3):164–171.

Charikar, M., Chekuri, C., Feder, T., Motwani, R. (1997). Incremental clustering and dynamic information retrieval. Proceedings of the ACM STOC Conference, 626–635.

C.J. van Rijsbergen (1997). Information Retrieval (2nd edition). Butterworths, London.

C. Carson, S. Belongie, H. Greenspan, and J. Malik (2002). Blobworld: image segmentation using expectation-maximization and its application to image querying. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24(8):1026–1038.

D. Liu, K. A. Hua, K. Vu, and N. Yu (2009). Fast query point movement techniques for large CBIR systems. IEEE Transactions on Knowledge and Data Engineering, 21(5):729–743.

French, J. C., Martin, W. N., Watson, J.V.S., Jin, X. (2003). Using Multiple Image Representations to Improve the Quality of Content-Based Image Retrieval. Technical Report CS-2003-10, Dept. of Computer Science, University of Virginia.

Flickner, M., Sawhney, H., Niblack, W., et al. (1995). Query by image and video content: The QBIC system. IEEE Computer Magazine, 28(9):23–32.

Hieu, V. V., Quynh, N. H. (2014). An image retrieval method using homogeneous region and relevance feedback. International Conference on Communication and Signal Processing, 114–118.

H. T. Shen, B. C. Ooi, and X. Zhou (2005). Towards effective indexing for very large video sequence database. In Proceedings of the ACM SIGMOD Conference, 730–741.

Heath, D., Norton, D., and Ventura, D. (2014). Conveying semantics through visual metaphor. ACM Transactions on Intelligent Systems and Technology, 5(2):31:1–31:17.

I. J. Cox, M. L. Miller, T. P. Minka, T. V. Papathomas, and P. N. Yianilos (2000). The Bayesian image retrieval system, PicHunter: theory, implementation, and psychophysical experiments. IEEE Transactions on Image Processing, 9(1):20–37.

Ishikawa, Y., Subramanya, R., Faloutsos, C. (1998). MindReader: Querying databases through multiple examples. Proceedings of the 24th VLDB Conference, New York, USA, 218–227.

J. B. MacQueen (1967). Some methods for classification and analysis of multivariate observations. Proceedings of the 5th Berkeley Symposium on Mathematical Statistics and Probability, Berkeley, University of California Press, 1:281–297.

J. Z. Wang, J. Li, and G. Wiederhold (2001). SIMPLIcity: Semantics-sensitive integrated matching for picture libraries. IEEE Transactions on Pattern Analysis and Machine Intelligence, 23(9):947–963.

Jin, X., & French, J.C. (2005). Improving image retrieval effectiveness via multiple queries. Multimedia Tools and Applications, 26:221–245.

Liu, Y., Chen, X., Zhang, C.C., Sprague, A. (2009). Semantic clustering for region-based image retrieval. Journal of Visual Communication and Image Representation, 20(2):157–166.

K. Vu, K. A. Hua, and W. Tavanapong (2003). Image retrieval based on regions of interest. IEEE Transactions on Knowledge and Data Engineering, 15(4):1045–1049.

K. A. Hua, N. Yu, and D. Liu (2006). Query decomposition: A multiple neighborhood approach to relevance feedback processing in content-based image retrieval. In Proceedings of the IEEE ICDE Conference.

L. Chen, M. T. Özsu, and V. Oria (2004). MINDEX: An efficient index structure for salient-object-based queries in video databases. Multimedia Systems, 10(1):56–71.

Mattia Broilo and Francesco G. B. De Natale (2010). A stochastic approach to image retrieval using relevance feedback and particle swarm optimization. IEEE Transactions on Multimedia.

M. Ortega-Binderberger and S. Mehrotra (2004). Relevance feedback techniques in the MARS image retrieval systems. Multimedia Systems, 9(6):535–547.

Norton, D., Heath, D., and Ventura, D. (2016). Annotating images with emotional adjectives using features that summarize local interest points. IEEE Transactions on Affective Computing.

O. Chum, J. Philbin, J. Sivic, M. Isard, and A. Zisserman (2007). Total recall: Automatic query expansion with a generative feature model for object retrieval. In Proc. ICCV.

Porkaew, K., Chakrabarti, K. (1999). Query refinement for multimedia similarity retrieval in MARS. Proceedings of the 7th ACM Multimedia Conference, Orlando, Florida, 235–238.

Rui, Y., Huang, T., Mehrotra, S. (1997). Content-based image retrieval with relevance feedback in MARS. Proceedings of the IEEE International Conference on Image Processing, Santa Barbara, CA.

Rui, Y., Huang, T., Ortega, M., Mehrotra, S. (1998). Relevance feedback: A power tool for interactive content-based image retrieval. IEEE Transactions on Circuits and Systems for Video Technology, 8(5):644–655.

Rui, Y., Huang, T., Chang, S.F. (1999). Image retrieval: current techniques, promising directions and open issues. Journal of Visual Communication and Image Representation, 10:39–62.

Rocchio, J.J. (1971). Relevance feedback in information retrieval. In: Salton, G. (Ed.), The SMART Retrieval System—Experiments in Automatic Document Processing. Prentice Hall, Englewood Cliffs, NJ, 313–323.

R. Chang and X. Qi (2011). Semantic clusters based manifold ranking for image retrieval. Proc. of the IEEE Int. Conf. on Image Processing, 2473–2476.

Robert Davis, Zhongmiao Xiao, and Xiaojun Qi (2012). Capturing semantic relationship among images in clusters for efficient content-based image retrieval. IEEE Int. Conf. on Image Processing (ICIP'12), 1953–1956.

R. Arandjelović and A. Zisserman (2012). Three things everyone should know to improve object retrieval. In Proc. CVPR.

Smith, J.R., Chang, S.F. (1996). VisualSEEk: A fully automated content-based image query system. Proceedings of the ACM Int'l Multimedia Conference, 87–98.

T. Gevers and A. Smeulders (2004). Content-based image retrieval: An overview. In G. Medioni and S. B. Kang (Eds.), Emerging Topics in Computer Vision. Prentice Hall.

V. Ogle and M. Stonebraker (1995). Chabot: retrieval from a relational database of images. IEEE Computer, 28(9):40–48.

Wu, L., Faloutsos, C., Sycara, K., Payne, T.R. (2000). FALCON: Feedback adaptive loop for content-based retrieval. Proceedings of the 26th VLDB Conference, Cairo, Egypt, 297–306.

Wang, T., Rui, Y., Hu, S.-M. (2001). Optimal adaptive learning for image retrieval. Proceedings of IEEE CVPR 2001, Kauai, Hawaii, 1140–1147.

W. Y. Ma and B. Manjunath (1997). NeTra: a toolbox for navigating large image databases. In Proceedings of the IEEE International Conference on Image Processing, 568–571.

Y. Chen, J.Z. Wang, and R. Krovetz (2003). An unsupervised learning approach to content-based image retrieval. Seventh International Symposium on Signal Processing and its Applications, 197–200.

Y. Yan, M.-L. Shyu, and Q. Zhu (2016). Negative correlation discovery for big multimedia data semantic concept mining and retrieval. In Proceedings of the IEEE International Conference on Semantic Computing, 55–62.

Zhongmiao Xiao, Xiaojun Qi (2014). Complementary relevance feedback-based content-based image retrieval. Multimedia Tools and Applications, 73(3):2157–2177.

Zhou, X.S., Huang, T.S. (2001). Comparing discriminating transformations and SVM for learning during multimedia retrieval. Proceedings of the 9th ACM Multimedia Conference, Orlando, Florida, 137–146.


APPENDIX A

Statistics of average precision for each category in 34 categories (each consisting of 100 feature vectors).

Table A1. Average precision (%) of the 34 query categories with one feedback iteration of the SRIR method.

ID | Category ID | 01 cluster | 04 clusters | 08 clusters | 12 clusters | 16 clusters | 20 clusters
1  | 290  | 32.37 | 37.18 | 36.39 | 36.25 | 36.48 | 36.26
2  | 700  | 24.89 | 29.26 | 30.24 | 30.78 | 31.23 | 31.36
3  | 750  | 44.47 | 45.78 | 46.01 | 46.03 | 45.99 | 45.97
4  | 770  | 27.03 | 30.85 | 31.62 | 32.69 | 33.27 | 33.35
5  | 840  | 49.25 | 54.35 | 54.94 | 55.03 | 55.12 | 55.20
6  | 1040 | 21.00 | 24.93 | 25.32 | 25.75 | 25.73 | 25.63
7  | 1050 | 31.19 | 36.73 | 37.45 | 37.26 | 37.42 | 36.47
8  | 1070 | 36.40 | 41.78 | 42.35 | 43.37 | 43.86 | 43.69
9  | 1080 | 31.57 | 37.73 | 38.45 | 38.79 | 39.16 | 39.15
10 | 1090 | 18.21 | 21.35 | 21.52 | 21.90 | 21.80 | 21.89
11 | 1100 | 23.37 | 27.90 | 28.55 | 28.86 | 29.15 | 29.43
12 | 1120 | 25.85 | 31.52 | 32.15 | 32.39 | 32.59 | 32.61
13 | 1340 | 42.39 | 46.73 | 47.15 | 47.73 | 48.03 | 47.91
14 | 1350 | 28.02 | 32.47 | 33.44 | 34.00 | 34.63 | 34.89
15 | 1680 | 21.65 | 25.06 | 25.27 | 25.55 | 25.63 | 25.80
16 | 2680 | 15.88 | 17.96 | 18.09 | 18.20 | 18.25 | 17.96
17 | 2890 | 19.51 | 22.87 | 23.49 | 23.60 | 23.71 | 23.47
18 | 3260 | 19.57 | 22.65 | 23.07 | 22.86 | 23.11 | 22.58
19 | 3510 | 36.75 | 41.83 | 41.73 | 41.43 | 41.45 | 40.56
20 | 3540 | 49.34 | 55.73 | 55.98 | 55.96 | 55.99 | 54.76
21 | 3910 | 38.70 | 46.31 | 45.49 | 45.77 | 45.55 | 44.78
22 | 4150 | 29.18 | 33.23 | 33.81 | 33.50 | 33.31 | 33.85
23 | 4470 | 52.17 | 33.83 | 33.03 | 33.33 | 33.11 | 33.65
24 | 4580 | 40.90 | 33.63 | 33.51 | 33.43 | 33.65 | 34.46
25 | 4990 | 33.83 | 39.09 | 39.05 | 38.85 | 38.96 | 39.85
26 | 5210 | 29.57 | 34.59 | 35.29 | 36.05 | 35.98 | 37.43
27 | 5350 | 49.11 | 54.62 | 54.47 | 54.98 | 55.15 | 55.41
28 | 5530 | 43.41 | 48.03 | 48.37 | 48.84 | 48.87 | 48.88
29 | 5810 | 61.01 | 63.96 | 63.83 | 63.76 | 63.80 | 63.95
30 | 5910 | 47.59 | 52.28 | 52.61 | 52.55 | 52.73 | 52.58
31 | 6440 | 64.52 | 65.40 | 65.55 | 65.53 | 65.45 | 65.32
32 | 6550 | 57.25 | 61.05 | 60.96 | 61.04 | 60.91 | 61.51
33 | 6610 | 33.50 | 38.90 | 38.78 | 38.64 | 38.66 | 39.23
34 | 6840 | 61.23 | 63.26 | 63.18 | 63.47 | 63.52 | 63.25


APPENDIX B


Interface for implementing retrieval of the query feature vector with ID 655009 belonging to category 6550 in the feature database.


Figure B1. Interface for inserting the initial query (the feature vector with ID 655009, belonging to category 6550 in the feature database) into the system.

Figure B2. Result of the initial query for the four channels of the feature vector with ID 655009 belonging to category 6550 (see the note below).


Figure B3. Result of the initial query for the four channels of the feature vector with ID 655009 belonging to category 6550, after merging the channels.

Figure B4. The top results after one feedback iteration for the query of the feature vector with ID 655009 belonging to category 6550.

Note: red icons in Figures B2 to B4 denote feature vectors in the database of the same category (6550) as the query feature vector (ID 655009).