Journal of Visual Languages and Computing 19 (2008) 24–38
www.elsevier.com/locate/jvlc

Indexing for multipoint interactive similarity retrieval in iconic spatial image databases

Xiao Ming Zhou (a), Chuan Heng Ang (b), Tok Wang Ling (b)
(a) Sybase Asia Development Center, Singapore
(b) School of Computing, National University of Singapore, Singapore
Abstract

Similarity-based retrieval of images is an important task in many image database applications. Interactive similarity retrieval is one way to resolve the fuzzy area involving psychological and physiological factors of individuals during the retrieval process. A good interactive similarity system depends not only on a good similarity measure, but also on the structure of the image database and the related retrieval process. In this paper, we propose to use a dynamic similarity measure on top of an enhanced digraph index structure for interactive iconic image similarity retrieval. Our approach makes use of multiple rounds of feedback from the user to obtain the hidden subjective information of the retrieval, and avoids the high re-computation cost of an interactive retrieval algorithm.

© 2007 Elsevier Ltd. All rights reserved.

Keywords: Iconic image; Interactive; Similarity retrieval
Corresponding author. E-mail address: [email protected] (X.M. Zhou).
doi:10.1016/j.jvlc.2007.08.009

1. Introduction

Multimedia applications have been advancing rapidly in recent years. As one of the key components, image similarity retrieval has generated a great deal of interest. However, most existing image retrieval systems use low-level features such as color or texture for image retrieval. The recent success of image-understanding approaches in various domains suggests the transition to a different level, which is retrieval by identified objects. This level extends the query capabilities of an image retrieval system to support higher-level queries. One way
to describe objects in an image focuses on the spatial relationships among them [1,2]. The spatial data embedded in images should be preserved in the logical image representation so that users can easily retrieve, visualize, and manipulate images in the image database based on the images' spatial content. The degree of similarity between two images is measured by the distance under a pre-defined metric system [3,4]. A traditional iconic image retrieval system is an automatic retrieval process based on the measure defined by the system: the user inputs a query image, and the system outputs some candidate images from the database based on the system measure, without any further involvement from the user. However, such automatic systems face some problems:

(1) The perception of similarity is subjective, and it varies from one user to another. It is not merely a matter of finding a good measure for retrieval; finding an efficient way to understand and reflect users' changing needs is equally important, and an automatic system fails in this second respect.

(2) There is a gap between the abstract representation and the real image. Unlike a textual query, an image is difficult to describe precisely, and there is no metric that considers all the details of images. An automatic system fails to find the information needed to fill this gap in order to match the user's need.

Therefore, researchers have started to look at interactive similarity retrieval [5–8], which delegates some subjective decisions to the user to fill the interpretation gap. In this paper, we propose a dynamic similarity measure to make it easier for users to make their decisions and to navigate an iconic image database more intuitively.

The rest of the paper is organized as follows. In Section 2, we review the background of spatial relationship retrieval. In Section 3, we introduce interactive navigation based on a digraph index.
In Section 4, we propose an enhanced digraph for interactive similarity retrieval using a dynamic similarity measure. We compare the proposed approach with some related works in Section 5. In Section 6, we analyze the proposed approach with some experiments. We conclude and address some outstanding issues in Section 7.

2. Retrieval by spatial similarity

Iconic image databases have been used increasingly widely in recent years, and spatial relationship retrieval is a main topic for iconic retrieval applications. For example, to answer the query "find all pictures having a swimming pool to the left of a house", we need to keep at least the spatial relationship between the swimming pool and the house for all pictures. However, the spatial relationship required here is qualitative information, not quantitative information such as the positions of objects. Therefore, early studies of qualitative spatial relationships focused on how to capture this information for retrieval. Chang et al. [9] proposed a data structure known as the 2D string that opened up a new area for image indexing and retrieval based on spatial relationships. The basic idea of the 2D string is to project the objects of a picture along the x- and y-axes, respectively, so that two strings can be used to represent the relative positions among the objects in the projections. Since a query picture can also be represented as a 2D string, the problem of similarity retrieval becomes a problem of 2D subsequence matching. The 2D string is, however, more useful for picture matching than for similarity retrieval, even though there are many follow-up papers categorizing the spatial similarity in
different levels [10,11]. A more generic approach to similarity is to use a degree-based similarity measure. A typical example is the IO&T approach [3], which includes the topological spatial relationship and the orientation spatial relationship, together with a similarity metric based on the aggregate similarity of the two. Briefly, an IO&T description is a tuple ⟨A, [TR, OR1, OR2], B⟩ representing the spatial relation between two objects A and B, where TR is the topological relation between object A and object B, OR1 represents the orientation relation of A to B based on A's intrinsic orientation, and OR2 represents the orientation relation of B to A based on B's intrinsic orientation. Since the representation has a topological part and an orientation part, the similarity distance DTP(T1, T2) (Distance for TuPles) between two IO&T tuples T1 = ⟨Oi, [TRij, OR1ij, OR2ij], Oj⟩ and T2 = ⟨Ox, [TRxy, OR1xy, OR2xy], Oy⟩ (where Oi and Oj are the same objects as Ox and Oy, respectively, in two different pictures) can be simply illustrated as

DTP(T1, T2) = Wt Dt(TRij, TRxy) + Wo (Do(OR1ij, OR1xy) + Do(OR2ij, OR2xy)),   (1)

where Wt and Wo denote the weights for the topological distance and the orientation distance, respectively, and Dt and Do are the topological and orientation similarity distance functions, respectively. Similar works can be found in [4,12,13] as well. Since a multi-dimensional (i.e. topological and orientation) similarity measure is used in these systems, the weights for the different dimensions are a fuzzy factor in such measures. For instance, in Fig. 1 we have three images Q, P1, and P2. Suppose we consider the spatial relationship only, and we ask: "Which image, P1 or P2, is more similar to Q?" The answers from different readers and different automatic retrieval systems can differ for various reasons.
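As a sketch of how the tuple distance of Eq. (1) is computed, consider the following. The concrete relation sets and the component distance functions are illustrative stand-ins, not the definitions used in [3]:

```python
# Illustrative relation vocabularies; a real system would use the full
# topological/orientation relation sets and their neighborhood distances.
TOPO_ORDER = ["disjoint", "meet", "overlap", "contain"]
ORIENT_ORDER = ["north", "east", "south", "west"]

def d_topo(tr1, tr2):
    # Stand-in for Dt: distance along an ordered chain of topological relations.
    return abs(TOPO_ORDER.index(tr1) - TOPO_ORDER.index(tr2))

def d_orient(o1, o2):
    # Stand-in for Do: circular distance over the four illustrative orientations.
    d = abs(ORIENT_ORDER.index(o1) - ORIENT_ORDER.index(o2))
    return min(d, len(ORIENT_ORDER) - d)

def dtp(t1, t2, wt=0.5, wo=0.25):
    """Eq. (1): t1, t2 are (TR, OR1, OR2) for the same object pair
    in two different pictures; wt, wo are the dimension weights."""
    (tr1, or1a, or2a), (tr2, or1b, or2b) = t1, t2
    return wt * d_topo(tr1, tr2) + wo * (d_orient(or1a, or1b) + d_orient(or2a, or2b))
```

For example, two tuples with identical topology but both orientations rotated one step apart give a distance of 0.5 under these stand-in weights.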
In this instance, a typical reason is that different readers may give different weights to topological relationship changes and orientation relationship changes based on their perception, which is a fuzzy metric. Therefore, to make a similarity retrieval system more usable, the user needs to play a role in deciding what is considered similar. Recent similarity retrieval systems allow users to input subjective information in the course of interaction. The added human factors in the retrieval help to resolve the difficulty in a fully automatic retrieval system; such a system is "feedback-based". However, most of the existing approaches use the user's feedback to change a pre-defined metric formula. The adjustment of the metric causes the similarity between images to be re-calculated completely and makes the retrieval extremely inefficient.

Fig. 1. Example images (Q, P1, P2).

The aim of this paper is to remove this inefficiency when an adjustable similarity measure is used.

3. Digraph for interactive navigation

There are many distance measures for spatial relationship similarity. A typical similarity distance between a query image Q and a database image P can be defined as

DSIM(Q, P) = Σ_{j=1}^{m} Wj Dj(Q, P),   (2)
where Dj is the similarity distance in dimension j, m is the number of dimensions considered, and Wj is the weight with which Dj contributes to the total similarity distance DSIM. One example is the IO&T metric [3], which includes an intrinsic orientation relationship part and a topological relationship part. However, the choice of Wj is rather subjective. A naïve process to adjust Wj has three steps:

1. The user inputs some feedback based on the candidate pictures returned by the system.
2. The similarity system updates Wj.
3. The similarity system recalculates the similarity distances based on the new formula and returns new candidate pictures.

These steps are repeated until the right picture is identified. The calculation in step 3 is intensive. To reduce the computation, a K-regular digraph index structure is used in [14]. For example, if we only display the top three most similar images, the data structure will look as shown in Fig. 2. This digraph index structure sits at the bottom of a typical tree index, such as the M-tree [15], i.e. the array of image IDs is actually the set of leaf nodes of a traditional tree index built for a system measure. The digraph is used for the ease of interactive navigation, to complement the automatic similarity retrieval supported by a tree index. In the following, we will show how to achieve this goal even when the similarity measure is adjusted.

4. Dynamic similarity measure for interactive retrieval

In this section, we discuss a multipoint model for similarity retrieval and propose a dynamic similarity measure making use of an enhanced digraph, based on the discussion in Section 3.

4.1. Enhanced digraph structure for similarity retrieval

Compared with the digraph discussed in the previous section, an enhanced digraph uses a variable number of pointers instead of a fixed number of pointers for each iconic image. The index structure is illustrated in Fig. 3.
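As a sketch of the idea, navigating such a digraph index only follows link lists rather than re-scanning the database after each feedback. The sample links mirror the entries shown in Fig. 2; the helper function is ours, not from [14]:

```python
# Each image keeps pointers to its K most similar images under the system
# measure, together with the similarity distances (sample data from Fig. 2).
digraph = {
    "A": [("B", 4), ("D", 6), ("E", 8)],
    "B": [("A", 4), ("C", 5), ("E", 10)],
    "C": [("B", 5), ("F", 12), ("E", 15)],
    "D": [("A", 6), ("E", 8), ("B", 11)],
}

def next_candidates(selected, visited):
    """Images reachable from the user's selected images in one feedback
    step, excluding already-visited ones, ordered by similarity distance."""
    out = []
    for img in selected:
        for nbr, dist in digraph.get(img, []):
            if nbr not in visited and nbr not in [n for n, _ in out]:
                out.append((nbr, dist))
    return sorted(out, key=lambda pair: pair[1])
```

For instance, after the user selects image A, the next candidates are simply A's link list (B, D, E), with no distance re-computation over the whole database.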
Fig. 2. K-regular digraph structure for image index. Each image ID is followed by its image pointers and the respective similarity distances:

  A: B (4), D (6), E (8)
  B: A (4), C (5), E (10)
  C: B (5), F (12), E (15)
  D: A (6), E (8), B (11)
There are two kinds of pointers, based on a pre-defined threshold on the similarity distance for each image:

Regular similarity pointer (RSP). For each image in the database, there are m RSPs pointing to the m most similar images in the database, where m is variable. All m images have similarity distances less than the pre-defined threshold.

Connectional similarity pointer (CSP). When an image in the database has no RSP, a pointer is still needed to connect it to its nearest image, even if the distance is bigger than the threshold value. This avoids frequent searching for the nearest neighbors of such images during retrieval.
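A minimal sketch of building such an enhanced digraph follows, assuming `dist` is any system similarity measure; the function name and data layout are illustrative:

```python
def build_enhanced_digraph(images, dist, threshold=20):
    """For each image, keep all neighbors within `threshold` as RSPs;
    if none qualify, keep one CSP to the nearest image anyway."""
    index = {}
    for a in images:
        # Rank all other images by similarity distance to `a`.
        ranked = sorted((dist(a, b), b) for b in images if b != a)
        rsps = [(b, d) for d, b in ranked if d < threshold]
        if rsps:
            index[a] = ("RSP", rsps)
        else:
            # No neighbor within the threshold: connect to the nearest
            # image regardless of distance (a CSP).
            d, b = ranked[0]
            index[a] = ("CSP", [(b, d)])
    return index
```

On a toy one-dimensional database {0, 10, 15, 100} with absolute difference as the distance, the outlier 100 gets a single CSP to 15, while the clustered points get RSPs.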
Fig. 3. Enhanced digraph index structure. Each image ID is followed by its RSPs or CSPs and the respective similarity distances (suppose the threshold is 20):

  A: B (4), D (6), E (8)
  B: A (4), C (5), E (10)
  C: B (5), F (12), E (15)
  D: A (6), E (8), B (11)
  L: F (8), G (16), H (17)
  E: one pointer with distance 21 (a CSP, since the distance exceeds the threshold)
With the enhanced digraph, we can have a variable number of images linked from each image. The threshold can be adjusted according to our needs. (Note: the detailed adjustment is not discussed in this paper; the aim of the adjustment is to make it easy to control the number of pointers used.) In the following, we discuss how to use the enhanced digraph in our interactive similarity retrieval.

4.2. Multipoint model for an interactive similarity retrieval system

In an interactive similarity retrieval system, the user poses an example query picture Q that is similar to the target picture in the database. We may regard the database as a space and each image as a point, and we navigate within the space based on the given "starting" query point Q. The user is allowed to modify Q interactively to set ONE new "starting" query point. However, it is more general to allow more than one modified query point.

Definition 1 (Multipoint query). A multipoint query Q = (n, P, WP, D) for a database consists of the following information:

1. The number n is the size of the query Q.
2. A set of n points P = {P1, …, Pn} in the iconic spatial database.
3. A set of n weights WP = {WP1, …, WPn}, the ith weight WPi being associated with the ith point Pi.
4. A distance function D.

The starting query, with n = 1, is a special case. The automatic similarity retrieval based on the starting query may return more than one picture that looks similar to the target picture. The user may use all or some of the similar pictures in a multipoint
query. In some existing approaches, such as Mindreader [6], users are supposed to adjust the weight (the goodness score) of each feedback point according to the ranking/degree of similarity. However, such adjustment is too subjective and fuzzy for an untrained user, since different users have different yardsticks for scoring. Therefore, in our approach, we assign 1/n to every WPi. (Note: we will see later how we can still glean useful information from the multipoint feedback even if we assume an equal weight for all feedback points at this stage.) Using these assignments, a simple similarity distance between a database picture DP and the multipoint query Q can be defined as

D(Q, DP) = Σ_{i=1}^{n} D(Pi, DP)/n.   (3)
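Formula (3) is a plain average over the feedback points. A minimal sketch, with `dist` standing in for the underlying distance D(Pi, DP):

```python
def multipoint_distance(points, dp, dist):
    """Eq. (3): average distance from database picture `dp` to the n
    feedback points, each point carrying the equal weight 1/n."""
    return sum(dist(p, dp) for p in points) / len(points)
```

For example, with one-dimensional points and absolute difference as the distance, the multipoint distance from {0, 10} to the point 4 is (4 + 6)/2 = 5.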
Note that the computation involves multiple pictures and may include multiple spatial features (such as topological, orientation, x-/y-axis, etc.) in the computation of each D(Pi, DP). Multipoint feedback contains richer information than single-point feedback. We want to use this information to adjust the distances without much calculation. In the following, we discuss a dynamic weighting approach to improve the multipoint retrieval process.

4.3. Dynamic weighting based on the multipoint feedback

In Section 4.2, we discussed how to use the multipoint feedback and formula (3) to compute the similarity distance for the multipoint query. Ideally, if the user could input the weight for each feedback point in a multipoint query (i.e. the "goodness" scores for the points), the user would be in full control of the navigation in the database. However, it is not practical to expect an untrained user to provide such information. Using the same weight for each feedback point and combining formulas (2) and (3), we have the complete similarity distance over all dimensions for a multipoint query as follows:

D(Q, DP) = (Σ_{i=1}^{n} Σ_{j=1}^{m} Wj Dj(Pi, DP)) / n,   (4)
where j denotes the similarity dimension (e.g. the topological dimension, the orientation dimension, etc.), i indexes the query points, Wj denotes the weight for dimension j, Dj is the similarity distance function for dimension j, Pi is query point i, and DP is a database picture. Since we fix the weight of each query point to be the same, 1/n is used as the weight for each query point. However, in a multipoint feedback system, the weight for each similarity dimension is adjustable based on the user feedback, and the adjustment results in a new similarity measure. In the following discussion, we introduce a heuristic approach that makes use of the weight adjustment for each similarity dimension.

Definition 2. The similarity distance range on a dimension is the maximum similarity distance between any two feedback points on that dimension.

Min–max heuristic: Based on the given multiple feedback points, the new similarity measure will assign the biggest weight to the dimension in which we find the smallest
similarity distance range; it will assign the smallest weight to the dimension in which we find the biggest similarity distance range. When there are some common/similar parts/features among multiple feedback points, the similarity distance range will be narrow in those dimensions, and we may conjecture that the user is looking for these features in the retrieval. On the other hand, if the disparity of some features is large, it may indicate that the user is not interested in those features. This is the rationale behind the proposed heuristic.

Based on this heuristic, we adjust the Wj in formula (4) while observing that ΣWj = 1. The revised weights apply to the next round of retrieval only. (Note: this is different from other systems, which keep the new weights throughout.) Suppose we have the biggest similarity distance range for dimension x and the smallest distance range for dimension y. We then revise the weights as follows:

  the new Wy = Wmax + (Wy/2),
  the old Wmax = Wy/2,
  the new Wx = Wmin/2,
  the old Wmin = Wx + (Wmin/2),

where Wmax = max(∀Wj) and Wmin = min(∀Wj). This reduces the weight for dimension x and increases the weight for dimension y in the new measure Dnew. This revision of weights makes the similarity metric dynamic and more accurate.

The flow chart of Algorithm 1 describes the automatic retrieval process based on the initial "starting" query using the system measure, and the follow-up interactive process based on the multipoint feedback query using the dynamic measure.

Algorithm 1. Similarity retrieval with dynamic measure.
1. Input a query image Q, and clear the status bit of all candidate images to denote them un-visited.
2. Automatic retrieval: use a tree index structure to perform the automatic similarity retrieval.
3. Display the output for the user's feedback selection.
4. If the user SELECTs only one "similar" image from the output, follow the link list in the enhanced digraph to produce the new outputs.
5. If the user SELECTs more than one "similar" image from the output, adjust the system measure, i.e. adjust the weights based on the heuristic, and use the enhanced digraph to produce the new outputs.
6. Go back to step 3; terminate the search when the target image is identified by the user.
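The min–max weight revision above can be sketched as follows. This assumes, as in the formulas, that the four roles (dimension x, dimension y, and the current holders of the largest and smallest weights) fall on distinct dimensions; the function and dictionary names are illustrative:

```python
def revise_weights(weights, ranges):
    """weights, ranges: dicts keyed by dimension name (weights sum to 1;
    ranges are the similarity distance ranges of Definition 2).
    Returns the revised weights for the next retrieval round."""
    w = dict(weights)
    x = max(ranges, key=ranges.get)   # biggest similarity distance range
    y = min(ranges, key=ranges.get)   # smallest similarity distance range
    dim_max = max(w, key=w.get)       # dimension currently holding Wmax
    dim_min = min(w, key=w.get)       # dimension currently holding Wmin
    wmax, wmin, wx, wy = w[dim_max], w[dim_min], w[x], w[y]
    w[y] = wmax + wy / 2              # the new Wy
    w[dim_max] = wy / 2               # the old Wmax holder
    w[x] = wmin / 2                   # the new Wx
    w[dim_min] = wx + wmin / 2        # the old Wmin holder
    return w
```

Note that the swapped-in terms keep the weights summing to 1, so no renormalization is needed.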
For this algorithm, the cost of re-computing the similarity measure after each feedback is determined by the size of the link lists involved, as compared with O(N) (where N is the size of the database) without the digraph index. Since we can adjust the threshold to control the size of the link lists, it makes sense to assume a constant C as the maximum size of all link lists. Therefore, the complexity of each feedback cycle becomes O(1), which is a big improvement in performance.

5. Related comparison

There are two types of relevance feedback systems. The first type captures a user's perceptual consistency in similarity retrieval over the long term. The system learns the regularities in a user's perception of similarity for pictorial database retrieval in the target application domain. After learning, the database and the measure can be adjusted to favor the user's particular perception for long-term usage [5,16]. This is not our focus in this paper. The other type of relevance feedback system is based on short-term learning, i.e. its objective is to shorten the ad hoc interactive similarity retrieval cycle (or session) by discovering the hidden subjective information in the user's feedback and navigating efficiently in the search space (i.e. the database). This is new to iconic spatial similarity retrieval. However, there is some existing research in the context of interactive text-based retrieval and low-level feature-based image similarity retrieval. These existing approaches can be categorized into some typical types as follows.

5.1. Query point movement approach

The query point movement method [6,17] is one of the typical approaches to making use of relevance feedback. Typically, Rocchio's formula [18] is used to compute the new query point Qnew from the old query point Qold based on user feedback. In detail, the formula is expressed as follows:

Qnew = α Qold + β (Σ_{i∈DR} Di / NR) - γ (Σ_{i∈DN} Di / NN),   (5)
where α, β and γ are suitable constants, DR and DN are the sets of relevant and non-relevant documents, respectively, and NR and NN are the numbers of documents in DR and DN, respectively. This approach differs from the proposed approach. First, the user is supposed to rank the candidates from highly relevant to non-relevant, and it is very difficult for an untrained user to give accurate ranking feedback without any guidance. Second, based on Rocchio's formula, the existing system can always add a new keyword (or feature) by giving it a non-zero weight, or remove one by giving it a zero weight; therefore, the number of keywords (features) can change during a retrieval cycle. Lastly, the query point movement approach does not navigate within a reduced search space during a retrieval cycle, nor does it combine indexing the database (to reduce the search space during the interactive cycle) with adjusting the weights. Typically, the initial outputs are produced by a random scan of the database, and the system then outputs the next query point after some extensive computation involving the weights specified by the users.
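As a sketch, Rocchio's update of Eq. (5) on plain feature vectors looks as follows; the constant values are conventional illustrative choices, not values prescribed by [18]:

```python
def rocchio(q_old, relevant, non_relevant, alpha=1.0, beta=0.75, gamma=0.25):
    """Eq. (5): move the query point towards the centroid of the relevant
    points and away from the centroid of the non-relevant points."""
    dim = len(q_old)

    def centroid(points):
        if not points:
            return [0.0] * dim
        return [sum(p[k] for p in points) / len(points) for k in range(dim)]

    cr = centroid(relevant)       # (1/NR) * sum of relevant points
    cn = centroid(non_relevant)   # (1/NN) * sum of non-relevant points
    return [alpha * q_old[k] + beta * cr[k] - gamma * cn[k] for k in range(dim)]
```

Each feedback round therefore produces one new query point, and the next search still has to rank the whole database against it, which is precisely the lack of search-space reduction criticized above.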
5.2. Multi-point re-weighting approach

Another typical approach using relevance feedback is multi-point re-weighting [8]. This approach uses feedback scores (i.e. highly relevant, relevant, no-opinion, non-relevant, and highly non-relevant) to update multi-point weights dynamically, instead of using Rocchio's formula to generate a new query point. The new search finds the data points that have the smallest similarity distance to the multiple feedback points. In terms of heuristic weight updating, this approach has some attributes in common with the proposed approach. However, it relies on explicit feedback scores, which are subjective and not really reliable for untrained users.

5.3. Clustering approach

A more recent development along the direction of the multi-point approach is the clustered multi-point re-weighting method [19]. This approach proposes a way to use query point clusters. When a user marks several points as relevant, the system clusters the sets of relevant points, chooses the centroids of the clusters as their representatives, and then constructs a multi-point query using a small number of good representative points. In such a case, the weight of each query point, instead of each feature, can be changed based on the size of the represented cluster. The new search uses the respective selected clusters. However, nothing is said about search spaces with a balanced data distribution, in which cluster formation is not obvious.

To summarize the major differences among the query point movement approach, the multi-point re-weighting approach, the clustering approach, and the proposed approach:
- The query point movement approach navigates towards the target points based on the new query point after each feedback iteration, i.e. it searches for the top k points near the new query point. There is no search space reduction.
- The multi-point re-weighting approach uses multiple new query points instead of one new query point to navigate towards the target points, i.e. a k-NN search (finding the points near the k query points). There is also no search space reduction.
- The clustering approach uses multiple new query points as well. However, instead of searching for data points with the smallest total distance to all new query points, it searches for the data points near each individual representative query point of the respective clusters separately. There is some search space reduction, but the subsequent search may be restricted by the early clusters.
- The proposed approach uses the multi-point re-weighting query to navigate the nearby search space of the multiple query points instead of the original search space, i.e. a k-NN search inside the top-k neighborhood of the multiple query points of a multipoint query. There is clearly search space reduction, and at the same time the subsequent search moves freely according to the new query points.
6. Experiments In this section, we will use several experiments to test and analyze the proposed approach. First, a coverage experiment for the digraph is used to demonstrate the effectiveness of the interactive index approach. Second, a simulation experiment is
discussed to show the differences in the search space explored by different navigation approaches.

6.1. Coverage experiment

Similarity retrieval is always fuzzy and subjective, and it is hard to find a widely accepted benchmark for similarity retrieval experiments. In particular, it is extremely subjective to design an interactive experiment. In order to make our experiment more objective, we use an automatic approach to measure the result set coverage rather than asking individuals to test the system. The experiment is set up as follows. There are 647 pictures in our database, and each image contains 10–15 objects. The pictures are divided into five groups according to the number of objects changed (including translation, rotation, etc.) with respect to the query picture. There are 20, 45, 120, 210, and 252 pictures in groups G1, G2, G3, G4, and G5, respectively. A picture is in Gi when i objects change their positions with respect to the query image; pictures with five or more changed objects belong to G5. We use the IO&T metric [3] as the base metric to build the index structure of the database. We have built two indexes, one using a fixed bandwidth of five pointers and the other an enhanced digraph. The retrieval processes based on the two indexes are denoted Pidx5 and PidxD, and the retrieval process without any index is Pidx0. Since Pidx5 is suited to displaying five pictures each time, for a fair comparison we display only five candidate pictures for Pidx0 and PidxD as well. The pictures reachable in each loop for Pidx0 are the top 5, the top 10, the top 15, and so on. For Pidx5, however, we can reach the top 5, then a minimum of 10 up to a maximum of 5 × 5 + 5 possible candidate pictures, then a minimum of 15 up to a maximum of 5 × 5 × 5 + 5 × 5 + 5 possible candidate pictures, and so on. We calculate the distribution of the possible covered candidate pictures among all groups after each possible feedback from the user.
It should be noted that we have not used any clustering or filtering approaches in this experiment, as it is more objective to compare the results with the table scan approach (i.e. comparing all database images one by one) without any index. This is of course for illustration purposes only, as it would not be acceptable in a real application. The results are displayed in Tables 1–3. In Table 1, we have no index for interactive retrieval. Therefore, if we display five pictures each time, the candidate pictures will be the top 5, the top 10, the top 15, and so on, based on the IO&T metric. Initially, the candidates returned are mostly from G1, since there are few changes in G1. This means the metric IO&T is effective as a common similarity
Table 1
The candidate distribution for Pidx0 with five pictures displayed each time

Possible candidates covered          Distribution percentage among groups
after each loop of the return        Group 1   Group 2   Group 3   Group 4   Group 5
 5 (the automatic return)             100.00      0.00      0.00      0.00      0.00
10 (after the first feedback)         100.00      0.00      0.00      0.00      0.00
15 (after the second feedback)         93.33      6.67      0.00      0.00      0.00
20 (after the third feedback)          85.00     15.00      0.00      0.00      0.00
Table 2
The candidate distribution for Pidx5 with five pictures displayed each time

Possible candidates covered          Distribution percentage among groups
after each loop of the return        Group 1   Group 2   Group 3   Group 4   Group 5
 5 (the automatic return)             100.00      0.00      0.00      0.00      0.00
15 (after the first feedback)          86.67     13.33      0.00      0.00      0.00
36 (after the second feedback)         50.00     47.22      2.78      0.00      0.00
58 (after the third feedback)          34.48     51.72     12.07      1.72      0.00
Table 3
The candidate distribution for PidxD with five pictures displayed each time

Possible candidates covered          Distribution percentage among groups
after each loop of the return        Group 1   Group 2   Group 3   Group 4   Group 5
 5 (the automatic return)             100.00      0.00      0.00      0.00      0.00
13 (after the first feedback)          92.31      7.69      0.00      0.00      0.00
33 (after the second feedback)         60.61     33.33      6.06      0.00      0.00
50 (after the third feedback)          40.00     35.20     22.80      2.00      0.00
measure. However, as more and more pictures are returned, some pictures are retrieved from other groups before all 20 pictures in G1 are covered. For example, after the third feedback, the top 20 pictures based on IO&T are returned, but only 17 of them, or 85%, are from G1. The other three pictures of G1 are not covered by the top 20. If the target picture happens to be among those remaining three pictures of G1, the user may need a few more iterations to reach it.

Table 2 is for Pidx5. The first five pictures returned by the automatic retrieval are the same as for Pidx0, since both use the same base metric. However, after one feedback from the user, there are at most 5 × 5 candidate pictures linked from the first five pictures, so at most 5 × 5 + 5 candidates should be reachable. The experimental result shows that only 15 pictures are covered after any one feedback from the user; this is because some pictures are close to many other pictures. In spite of the overlap of pointers, the number of pictures reachable in Pidx5 is still larger than that of Pidx0 after the same number of feedbacks. In fact, after the third feedback, all 20 pictures from G1 are reachable by the user (58 × 34.48% ≈ 20).

In Table 3, we use the enhanced digraph. It can be seen that the number of pictures reached after each feedback is less than for Pidx5, though more than for Pidx0. This is because we no longer use a fixed branch-out for each vertex of the digraph; it is possible that fewer than five pictures are displayed sometimes. We also notice that the distribution percentage is closer to that of Pidx0, i.e. the distributions are higher for the first two groups. This is because we do not visit dissimilar pictures when we follow the pointers, which usually point to pictures in the same group. In general, the
enhanced digraph has the advantage of coverage as wide as Pidx5 and, at the same time, overcomes some drawbacks of the fixed branch-out. We did not consider the situation of rollback due to poor feedback, since this is too subjective to test in the experiments. However, the experimental results show clearly that the user can reach more candidate pictures with fewer feedbacks when an index structure is used. In particular, the pictures with fewer changes can also be fully covered with fewer feedbacks. Therefore, based on the empirical results on retrieval coverage through the index structure, the inclusion of user feedback for similarity retrieval looks promising.

6.2. Experiment for search space explored

Most effectiveness experiments for interactive image retrieval are based on a set of arbitrarily selected targets, with recall and precision calculated from statistics collected through interactions with a small group of selected users. The results obtained may carry a tinge of subjectivity. In our navigation experiment, we generate 2000 simulated data points randomly. This reduces the subjectiveness of data set selection and avoids the effort of generating a really big iconic image database. In the experiment, we compared three navigation approaches, namely the query point movement approach, the multipoint clustering approach, and the proposed digraph approach, referred to as QPM, MCA, and DIG in the following. The simulated process includes the following steps:

Step 1: Mark all 2000 random data points as un-visited.
Step 2: Randomly select three data points as the initial feedback to the similarity retrieval system.
Step 3: Based on the navigation approach selected, the system finds the similar data points using the respective measure and returns the next round of candidates.
Step 4: Randomly choose three un-visited data points from the return of Step 3 as the next round of feedbacks, and mark all navigated data points as visited. Go back to Step 3 and repeat the process a number of times.

We use the initial three randomly generated data points as the base to calculate the candidate data region (CDR), the region of points that are within some similarity distance of the multiple feedback points. If an explored data point falls in CDR, we consider it an effective visit; otherwise the visit is not effective (wasted). We compare QPM, MCA, and DIG in terms of the recall and the precision of each feedback session, defined as follows:

recall = (number of effective visits / number of data points in CDR) × 100,

precision = (number of effective visits / number of all visits, both effective and ineffective) × 100.
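The simulation loop and the two measures above can be sketched as follows. This is a minimal illustration only: the navigation rule (returning the nearest un-visited points to the feedback points) and the CDR radius are simplifying assumptions and do not correspond to the QPM, MCA, or DIG measures themselves.

```python
# Minimal sketch of the simulated navigation experiment (Steps 1-4).
# The navigation rule and the CDR radius below are illustrative
# assumptions, not the QPM/MCA/DIG measures.
import math
import random

random.seed(7)
data = [(random.random(), random.random()) for _ in range(2000)]  # Step 1

def dist(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])

feedbacks = random.sample(data, 3)  # Step 2: initial feedbacks
RADIUS = 0.15                       # assumed similarity distance for CDR
cdr = {p for p in data if min(dist(p, f) for f in feedbacks) <= RADIUS}

visited = set(feedbacks)
effective = total = 0
for session in range(5):
    # Step 3: return candidates similar to the current feedback points
    # (here: the 10 nearest un-visited points).
    candidates = sorted((p for p in data if p not in visited),
                        key=lambda p: min(dist(p, f) for f in feedbacks))[:10]
    for p in candidates:
        visited.add(p)
        total += 1
        effective += p in cdr       # an effective visit falls inside CDR
    recall = 100 * effective / len(cdr)
    precision = 100 * effective / total
    # Step 4: take three points from this round's return as new feedbacks.
    feedbacks = candidates[:3]
```

The per-session recall and precision computed this way can then be plotted against the interaction session, as in Figs. 4 and 5.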
The results are compared in Figs. 4 and 5. It can be seen that QPM has the best precision in the early interaction sessions, but since the query point can move out of CDR, its precision drops sharply in the later sessions; the real problem with QPM, however, is that it offers no way to reduce the search space. MCA cannot improve its recall after the first few iterations, as its search space has been restricted. In comparison, DIG integrates the advantages of both the QPM and MCA approaches in terms of usability and flexibility for interactive similarity navigation.

Fig. 4. The precision comparison (precision vs. interaction session, 1st to 7th, for QPM, MCA, and DIG).

Fig. 5. The recall comparison (recall vs. interaction session, 1st to 7th, for QPM, MCA, and DIG).

7. Summary

In this paper, we proposed a dynamic similarity measure based on an enhanced digraph for interactive spatial similarity retrieval in an iconic image database. Although we used iconic spatial similarity retrieval for illustration, the proposed approach can be applied to any image retrieval algorithm. Many indexing approaches have been introduced by other researchers, but they are mostly suitable for non-interactive (automatic) similarity retrieval only. On the other hand, as discussed in Section 5, there are indeed some approaches that use the feedback from the current retrieval cycle to guide the next retrieval cycle. By contrast, our approach uses an indexing structure on top of the
tree structure, and there is no rebuilding of the index for each retrieval cycle even as the search space is reduced.

There are still many interesting topics in this area. For example, it is possible to provide each individual user with a personalized database that learns and adjusts the index based on that user's feedback. Even though the proposed approach does not accumulate what it learns from the retrieval, it is orthogonal to a learning system, so any learning approach can be applied on top of the current retrieval process when the system is built solely for an individual user. We will discuss this topic in another paper.
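To make the retrieval cycle described in the summary concrete, it can be sketched as follows. This is a minimal illustration under stated assumptions: the toy data, the distance threshold that gives each vertex a variable branch out, and the function names are hypothetical, not the paper's actual index construction.

```python
# Sketch of one interactive retrieval cycle over an enhanced-digraph
# index.  Data, threshold, and names are illustrative assumptions.
import math
import random

random.seed(3)
pts = {i: (random.random(), random.random()) for i in range(50)}

def d(i, j):
    return math.hypot(pts[i][0] - pts[j][0], pts[i][1] - pts[j][1])

# Enhanced digraph: an edge i -> j exists only if j is among i's five
# nearest neighbours AND lies within a similarity threshold, so the
# branch out varies per vertex instead of being fixed at five.
THRESH = 0.2
graph = {
    i: [j for j in sorted((k for k in pts if k != i),
                          key=lambda k: d(i, k))[:5]
        if d(i, j) <= THRESH]
    for i in pts
}

def navigate(feedback_ids):
    """One retrieval cycle: follow the precomputed pointers from the
    pictures the user marked as feedback; the index is never rebuilt."""
    out = {j for i in feedback_ids for j in graph[i]}
    return sorted(out - set(feedback_ids))

candidates = navigate([0, 1, 2])
```

Because the pointers are precomputed, each feedback cycle reduces to a pointer lookup, which is the no-rebuild property the summary emphasizes.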