Iconic pictorial retrieval using multiple attributes and spatial relationships

Knowledge-Based Systems 19 (2006) 687–695 www.elsevier.com/locate/knosys

Sam Y. Sung 1, Tianming Hu *

Department of Computer Science, National University of Singapore, Kent Ridge, Singapore 119260, Singapore

Received 19 January 2004; accepted 4 May 2006; available online 4 August 2006

Abstract

This work concerns the use of multiple attributes, or features, and spatial relationships, through a user interface based on the iconic paradigm, to retrieve images represented by iconic pictures. An icon has texture, color, and text attributes. Texture is represented by three statistical textural properties: coarseness, contrast, and directionality. For text, the vector space model is used. For color, we propose a representation based on a modified color histogram method that is less storage-intensive. The final icon similarity combines the attribute similarity values using a proven adaptive algorithm. 2-D strings and their variants are commonly used to represent spatial relationships and perform spatial reasoning. We extend the method to include similarity ranking, using different similarity functions for different spatial relationships and an efficient embedding algorithm. Furthermore, our method solves the query expressiveness problem from which all methods based on 2-D string representations suffer.

© 2006 Elsevier B.V. All rights reserved.

Keywords: Content-based indexing; Information retrieval; Pattern recognition; Knowledge base system; Image database

1. Introduction

An iconic picture can be viewed as a logical picture, while the image it represents can be viewed as a physical picture. Logical picture representations are high-level abstractions that denote the logical relationships among picture objects, or icons [5]. Each icon in our work is bounded by its MBR (minimum bounding rectangle) and carries additional attributes such as texture, color, and text, which can be computed automatically or specified manually. The use of multiple attributes is important for effective retrieval, because each single attribute retrieves a different subset

* Corresponding author. Address: Department of Computer Science, DongGuan University of Technology, DongGuan, GuangDong 523808, China. E-mail addresses: [email protected] (S.Y. Sung), [email protected] (T. Hu).
1 Department of Computer Science, South Texas College, McAllen, TX 78501, USA.
doi:10.1016/j.knosys.2006.05.013

of relevant images. Furthermore, combining content-based attributes (texture and color) with text offers a richer way of describing an image object represented by an icon, since text alone cannot describe an image fully. The use of logical pictures obviates the need for repeated image understanding. Functions for spatial similarity retrieval based on symbolic images are useful in distributed environments where physical images reside at separate image stores while symbolic images are stored at each local site: only those images found relevant by comparison of their logical pictures are transferred from the image stores to the local site [10]. With this separation, many logical pictures may also represent a single image, which is important because the same image may be used in different ways during different time periods. Visual query systems can be categorized into four main paradigms: tabular, diagrammatic, iconic, and hybrid [2,1,4,17]. Our work is based on the iconic paradigm, and our contributions are as follows:


(1) Visual query systems use the full power of new technologies, such as two- and three-dimensional representations, colors, and multiple windows, thus extending the man-machine communication bandwidth in several directions [2].

(2) Our implementation is web-based, where most users are non-professional. The iconic paradigm is intuitive and nearly effortless to learn, and caters to this group of users.

(3) The iconic paradigm is useful when the user is not informed about the semantic content of the database in advance (very likely in our web-based implementation), since the visible icons paint a rough picture of the database's semantic content.

(4) It can provide a uniform interface for both the image registration and querying subsystems.

The paradigm has some shortcomings, such as the possibility of icon ambiguity and icon overcrowding. Icon classes are arranged in an inheritance hierarchy to alleviate the icon overcrowding problem.

Our system is divided into two main subsystems: image registration (IRS) and query (QS). In IRS, users instantiate icons from appropriate icon classes by marking their MBRs over possible image objects on the current image to be registered with the database. In QS, users instantiate icons from icon classes and specify spatial relationship constraints between any two instantiated icons to pose a query.

2. Related works

Text retrieval techniques can be divided into exact match and partial match [14]. Exact match misses many relevant documents that match partially, and it does not rank the results. We use the vector space model, a partial match technique in which both the query and the documents are represented by vectors of term weights (d1, d2, . . . , dm). A commonly used similarity function between the query and document vectors is the cosine correlation [15].

Texture analysis has been widely studied and many approaches have been developed, including statistical ones such as the Wold transform and Gabor filters. We use the statistical texture properties coarseness, contrast, and directionality, which are obtained by computing the local statistical distribution of image intensity and are defined in [22]. The similarity functions used to compare these properties differ from those suggested in [22] and are given later.

Color distribution and color-spatial methods are the two main approaches to representing color [23]. A color distribution is defined by a color histogram [12]. Two histograms Hq and Ht can be compared using the Manhattan distance or other suitable distance measures. Histograms are fast but store no spatial information: two images with the same color distribution may look different. Further, histograms are sensitive to background noise and lighting conditions [20]. Our method is based on the color distribution approach but is less storage-intensive than normal histograms. Color-spatial

techniques are known to perform better than color histograms; some examples of this approach are given in [20,11].

Spatial relationships are also intensively researched in the image retrieval literature. Gudivada and Raghavan [10] partition spatial queries into two categories: the spatial similarity approach, which ranks results using a similarity ranking function, and the spatial exact match approach, which requires an algorithm that gives a Boolean response. A common exact match technique is the use of 2-D strings and their variants [7,8,6,18]. For this method, different query types with increasing restrictions are defined: Type-0 (existing), Type-1 (category), Type-2 (orthogonal), and Type-3 (coordinate) [18]. Extensive examples of the spatial similarity approach can be found in [10,19]. Neither the method in [10] nor that in [19] can perform spatial reasoning, such as finding an object that "overlaps" another object, while the Type-i approach cannot produce a ranked list of results, since no effective similarity value is computed. To get the best of both worlds, we integrate the two approaches.

Integrating similarity computation into the Type-i method using 2-D string representations may not be effective, since this representation suffers from inadequate resolution, i.e., only a few discrete similarity levels are returned [10]. To solve this problem, we represent each icon's position in the picture by its MBR. Spatial relationships are implicit in this representation, and spatial reasoning can be performed. In [10], only a single spatial similarity function based on orientation is used. This is insufficient because, for certain relationships, the same orientation may not mean a similar spatial relationship according to human perception; it is also not very effective for non-point objects. Fig. 1 illustrates this. We therefore define four different spatial similarity functions, for the disjoint, meet, contain, and overlap relationships.

All methods based on 2-D string representations suffer from a query expressiveness problem: the user cannot specify the relationship type (existing, category, etc.) at the icon level (between any two icons) but only at the query level (applied to the whole query). This scheme is either too relaxed or too restrictive for certain queries, as shown in Fig. 2. Our work solves the query expressiveness problem by allowing the user to specify relationship types between any icons in the query.

Finding a sub-picture in an iconic picture that matches the query to some degree of similarity is almost akin to


Fig. 1. Since graphs 1 and 2 have the same centroid orientations, the method in [10,2] treats them as exactly similar. However, they do not look similar.



Fig. 2. Queries in previous and proposed systems to find pictures containing ‘‘A north of B and C north of D, but C and D may be anywhere on the picture’’. For query 1, the user must decide which Type-i to use. Type-2 (orthogonal) is too restrictive because the pictures that will be retrieved must also contain C east of A and D east of B. Type-1 (category) is too relaxed and results in pictures containing A–D that are disjoint and not necessarily in the same orthogonal positions as the query.

sub-graph isomorphism, which is NP-complete in its pure form. However, by adding constraints (such as requiring only icons from the same class to match, and specifying relationships between the icons), the number of search cases can be reduced considerably, and the time complexity is linear in most cases. An "embedding algorithm" with efficient use of storage and small computing time requirements is used.

3. Color attribute

The color attribute is represented by up to N (N = 10 in our work) dominant and perceptually distinct LUV colors, each with an associated weight. The LUV color model is used because equal Euclidean distances between LUV colors correspond to equal perceptual differences [3]. To extract the distinct, dominant LUV colors, a LUV color histogram that accounts for perceptually similar colors must be constructed with the help of a color similarity matrix, similar to the one described in [13]:

$$S = \begin{pmatrix} 1 & S_{1,2} & S_{1,3} & \cdots & S_{1,m} \\ S_{2,1} & 1 & S_{2,3} & \cdots & S_{2,m} \\ S_{3,1} & S_{3,2} & 1 & \cdots & S_{3,m} \\ \vdots & & & \ddots & \vdots \\ S_{m,1} & S_{m,2} & S_{m,3} & \cdots & 1 \end{pmatrix}$$

where

$$S_{i,j} = \begin{cases} 0 & \text{if } d_{i,j} > rad\_inf \\ 1 - \dfrac{d_{i,j}}{rad\_inf} & \text{otherwise} \end{cases}$$

$$d_{i,j} = \sqrt{(L_i - L_j)^2 + (U_i - U_j)^2 + (V_i - V_j)^2}$$

Here m is the number of colors and $d_{i,j}$ is the Euclidean distance between colors i and j. We can view rad_inf as the radius of a sphere centered at a certain color K in the LUV color space: all LUV colors that fall inside the sphere are considered similar to K, with colors nearer the center having larger similarity to K. We use rad_inf = 30 in our work.
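As an illustration, here is a minimal sketch of how this similarity matrix might be computed. It is NumPy-based; the function name and the (m, 3) input layout are our own assumptions, not part of the original system:

```python
import numpy as np

RAD_INF = 30.0  # influence radius in LUV space; the paper uses rad_inf = 30

def color_similarity_matrix(luv_colors: np.ndarray) -> np.ndarray:
    """Build the m x m similarity matrix S from an (m, 3) array of LUV colors.

    S[i, j] = 1 - d_ij / rad_inf if d_ij <= rad_inf, and 0 otherwise,
    where d_ij is the Euclidean distance between colors i and j.
    """
    diff = luv_colors[:, None, :] - luv_colors[None, :, :]  # pairwise differences
    d = np.sqrt((diff ** 2).sum(axis=-1))                   # pairwise distances
    return np.where(d > RAD_INF, 0.0, 1.0 - d / RAD_INF)
```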


Using the matrix, we can obtain the total number of pixels with colors similar to a certain color. Colors with low $S_{i,j}$ contribute little to the final $N_{percept}$ value:

$$N_{percept}(i) = \sum_{j=1}^{m} S_{i,j} \cdot N(j), \quad 1 \le i \le m,$$

where N(j) is the number of pixels with color j. The N dominant, perceptually distinct colors selected must satisfy the following conditions (in what follows, "dominant" means "dominant and perceptually dissimilar"):

1. Their $N_{percept}$ values are the largest.
2. The $N_{percept}$ value of any of the N dominant colors should ideally contain only contributions from colors perceptually similar ($S_{i,j} > 0$) to it but not similar ($S_{i,j} = 0$) to the rest of the N colors. In other words, a dominant color has a unique set of similar colors, none of which is similar to any of the other N colors.
3. Their average LUV colors are perceptually dissimilar to each other, i.e., $S_{i,j} = 0$.

To explain Condition (2), assume that a dominant Color 1 (satisfying all three conditions) has previously been selected, and that Color 1 is similar to Colors 2 and 3. Consider a Test Color 4 whose similar colors are Colors 3 and 5. Then $N_{percept}(1) = N(1) \cdot 1 + N(2) \cdot S_{1,2} + N(3) \cdot S_{1,3}$ and $N_{percept}(4) = N(4) \cdot 1 + N(3) \cdot S_{4,3} + N(5) \cdot S_{4,5}$. Notice that Color 3's contribution is included in both $N_{percept}(1)$ and $N_{percept}(4)$, which would produce inaccurate results during similarity computation. Condition (2) is required in order to reject Color 4 as a dominant color. But this criterion may be so strict that even if Color 3's contribution is small and affects similarity computation negligibly, Test Color 4 may still not be selected. We therefore relaxed Condition (2) in our implementation: the "intersections" of the contributions of the common perceptually similar colors between the test color and each of the currently selected dominant colors are first computed. The intersection between a test color T and a dominant color D is

$$intersect(T, D) = \sum_{i \in C} INT(i, T, D), \quad C = \{ i \mid S_{i,T} \ne 0,\ S_{i,D} \ne 0 \}$$

$$INT(i, T, D) = \begin{cases} N(i) \cdot (S_{i,T} + S_{i,D} - 1) & \text{if } S_{i,T} + S_{i,D} > 1 \\ \dfrac{N(i)}{2} \cdot \min(S_{i,T}, S_{i,D}) & \text{if } 0.5 < S_{i,T} + S_{i,D} \le 1 \\ 0 & \text{otherwise} \end{cases}$$

If $intersect(T, D) \le threshold$ for all the current dominant colors D, then T satisfies Condition (2). If it also satisfies Conditions (1) and (3), it is accepted as the next dominant color. Each dominant color is represented by a binary tuple containing the avg_luv_color and wt attributes: $dom\_color(j) = \langle avg\_luv\_color(i), wt(i) \rangle$, where $1 \le j \le N$, $i \in D$,


and D is the set of N color indices that satisfy Conditions (1)-(3). avg_luv_color(i) is the color formed by the weighted sum of all the colors similar to color i. This color is representative of the colors within the "sphere" of color i and is used in Condition (3) to decide whether color i is dissimilar to another color. Both attributes are computed as follows:

$$avg\_luv\_color(i) = \frac{1}{\sum_{j=1}^{m} S_{i,j} N(j)} \begin{pmatrix} \sum_{j=1}^{m} S_{i,j} N(j) L(j) \\ \sum_{j=1}^{m} S_{i,j} N(j) U(j) \\ \sum_{j=1}^{m} S_{i,j} N(j) V(j) \end{pmatrix}$$

where L(j), U(j), V(j) are the L, U, V components of color j, and

$$wt(i) = \frac{N_{percept}(i)}{w \cdot h}$$

where w and h denote the image width and height. This representation is less storage-intensive than the normal histogram method, where more than 100 real values may be stored.

For similarity computation, the similarity contribution of each dominant query color i of Q is first calculated. Let $S_i$ be the set of color indices j of D with $S_{i,j} \ne 0$, where D is the database icon color attribute:

$$SIM_{percept}(D, i) = \frac{1}{|S_i|} \sum_{j \in S_i} S_{i,j} \left( 1 - \frac{|wt(i) - wt(j)|}{\max(wt(i), wt(j))} \right)$$

The weighted sum of the contributions of the query colors is taken as the similarity value between the query and the database color attribute:

$$SIM(Q, D) = \frac{\sum_{i \in Q} wt(i) \cdot SIM_{percept}(D, i)}{\sum_{i \in Q} wt(i)}$$

This equation, however, gives a high similarity value for a database icon containing a few colors that are similar to the query icon colors and many others that are not. On the other hand, SIM(D, Q) gives a lower result that takes those dissimilar colors into account. We utilize the two results to arrive at the final color similarity function:

$$SIM_{color}(Q, D) = \frac{SIM(Q, D) + SIM(D, Q)}{2}$$
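A hedged sketch of this computation follows. It assumes each icon's color attribute is stored as a list of (avg_luv_color, wt) pairs and re-evaluates $S_{i,j}$ from the average colors; the paper works with color indices, so this simplification and all function names are ours:

```python
import numpy as np

def sim_luv(c1, c2, rad_inf=30.0):
    """Perceptual similarity S between two LUV colors (Section 3)."""
    d = float(np.linalg.norm(np.asarray(c1, float) - np.asarray(c2, float)))
    return 0.0 if d > rad_inf else 1.0 - d / rad_inf

def sim_percept(q_color, q_wt, db_colors):
    """Contribution of one dominant query color against a database icon.

    db_colors: list of (avg_luv_color, wt) pairs of the database icon.
    """
    matches = [(sim_luv(q_color, c), wt) for c, wt in db_colors]
    matches = [(s, wt) for s, wt in matches if s > 0.0]   # the set S_i
    if not matches:
        return 0.0
    return sum(s * (1.0 - abs(q_wt - wt) / max(q_wt, wt))
               for s, wt in matches) / len(matches)

def sim_directed(q_colors, d_colors):
    """SIM(Q, D): weight-averaged contributions of the query colors."""
    total = sum(wt for _, wt in q_colors)
    return sum(wt * sim_percept(c, wt, d_colors)
               for c, wt in q_colors) / total if total else 0.0

def sim_color(q_colors, d_colors):
    """Final symmetric color similarity (SIM(Q,D) + SIM(D,Q)) / 2."""
    return 0.5 * (sim_directed(q_colors, d_colors) +
                  sim_directed(d_colors, q_colors))
```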

4. Texture attribute

Refer to [22] for the extraction of coarseness, directionality, and contrast from an image region; only the similarity functions are listed here. Contrast similarity is the simplest:

$$SIM_{con}(Q, D) = 1 - \frac{|con_Q - con_D|}{\max(con_Q, con_D)}$$

Since both directionality and coarseness are represented by histograms of values, histogram similarity methods are used. If we assume an equal similarity contribution from each of the n histogram buckets, the function that returns a normalized histogram similarity value is

$$SIM(Q, D) = \frac{1}{n} \sum_{i=0}^{n-1} \left( 1 - \frac{|H_Q(i) - H_D(i)|}{\max\{H_Q(i), H_D(i)\}} \right)$$

Instead of this assumption, different weights can be used for each bucket's contribution. The final similarity function for directionality is then

$$SIM_{dir}(Q, D) = \frac{\sum_{i=0}^{n-1} H_Q(i) \left( 1 - \frac{|H_Q(i) - H_D(i)|}{\max\{H_Q(i), H_D(i)\}} \right)}{\sum_{i=0}^{n-1} H_Q(i)}$$

The coarseness similarity function is analogous. In our implementation, the directionality histogram consists of n = 16 buckets, while coarseness is represented by n = 5 buckets. Finally, the three texture similarity values are combined into a single similarity using a simple weighted sum with empirically derived weights $w_{crs} = 0.4$, $w_{dir} = 0.4$, and $w_{con} = 0.2$:

$$SIM_{texture} = w_{crs} \cdot SIM_{crs} + w_{con} \cdot SIM_{con} + w_{dir} \cdot SIM_{dir}, \quad w_{crs} + w_{con} + w_{dir} = 1$$

5. Text attribute

The icon text attribute is represented by a vector of term weights [24]. Salton [21] suggests a term weighting scheme that assigns high weights to terms that occur frequently in individual documents but rarely in the collection as a whole. Our term weight is

$$w_{ij} = \left( 0.5 + 0.5 \cdot \frac{tf_{ij}}{maxfreq_j} \right) \cdot idf_i$$

where $tf_{ij}$ is the frequency of term i in icon j of icon class C, $maxfreq_j$ is the maximum term frequency in icon j, and $idf_i$ is the inverse document frequency of term i. $idf_i = \log_2(N / n_i) + 1$ gives higher weights to terms that appear in fewer icons of icon class C, where N is the total number of icons of icon class C and $n_i$ is the number of icons of icon class C that contain term i. If N = 0 or $n_i = 0$, then $idf_i = 0$. Once the vectors Q and D are constructed for the query and database icon text attributes, the cosine correlation is used to compute similarity:

$$SIM_{text}(Q, D) = \frac{\sum_{i=1}^{|Q|} w_{iQ} \cdot w_{iD}}{\sqrt{\sum w_{iQ}^2 \cdot \sum w_{iD}^2}}$$
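A small sketch of the term weighting and cosine correlation just defined; the function names and the Counter-based interface are our own assumptions:

```python
import math
from collections import Counter

def term_weights(term_freqs: Counter, num_icons: int, icon_counts: Counter):
    """tf-idf weights for one icon's text, per Section 5.

    term_freqs  : term -> frequency within this icon (tf_ij)
    num_icons   : N, total icons in the icon class
    icon_counts : term -> n_i, number of icons in the class containing it
    """
    maxfreq = max(term_freqs.values())
    weights = {}
    for term, tf in term_freqs.items():
        ni = icon_counts[term]
        # idf_i = log2(N / n_i) + 1, and 0 when N = 0 or n_i = 0.
        idf = math.log2(num_icons / ni) + 1 if num_icons and ni else 0.0
        weights[term] = (0.5 + 0.5 * tf / maxfreq) * idf
    return weights

def sim_text(wq: dict, wd: dict) -> float:
    """Cosine correlation between query and database term-weight vectors."""
    dot = sum(w * wd.get(t, 0.0) for t, w in wq.items())
    norm = math.sqrt(sum(w * w for w in wq.values()) *
                     sum(w * w for w in wd.values()))
    return dot / norm if norm else 0.0
```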

6. Icon similarity

The icon, or local, similarity value measures the similarity of two icons from the same icon class based on their attributes. The attribute similarity values are combined adaptively: it may be difficult to determine a good set of ratios for a simple weighted sum, because similarity calculation and attribute extraction are at best approximations of human perception, and the similarity value for each attribute may vary from image to image with varying degrees of reliability. A simple adaptive method that dynamically calculates weights based on reliability factors is utilized:

$$SIM_{adaptive} = aw_{color} \cdot SIM_{color} + aw_{texture} \cdot SIM_{texture} + aw_{text} \cdot SIM_{text}$$

where $aw_{color} + aw_{texture} + aw_{text} = 1$ and $0 \le aw_{color}, aw_{texture}, aw_{text} \le 1$.

For ease of presentation, we use subscripts i, j, and k to represent any of the three attributes in any order. The adaptive weights are

$$aw_i = a \cdot w_i, \quad aw_j = b \cdot w_j, \quad aw_k = c \cdot w_k$$

where $w_i + w_j + w_k = 1$ and $0 \le w_i, w_j, w_k \le 1$. The values of a, b, and c are derived as follows.

If $SIM_i \ge T_i$, $SIM_j \ge T_j$, and $SIM_k \ge T_k$ (all similarity values are reliable), then $a = b = c = 1$.

If $SIM_i < T_i$, $SIM_j < T_j$, and $SIM_k < T_k$ (all similarity values are unreliable), then $a = b = c = 0$.

If $SIM_i < T_i$, $SIM_j \ge T_j$, and $SIM_k \ge T_k$ (one of the three similarity values is unreliable), reduce the weight of i and increase the weights of j and k:

$$a = 1 - w_i, \quad b = c = 1 + \frac{w_i^2}{w_j + w_k}$$

If $SIM_i < T_i$, $SIM_j < T_j$, and $SIM_k \ge T_k$ (only one of the three similarity values is reliable), reduce the weights of i and j and increase the weight of k:

$$a = 1 - w_i, \quad b = 1 - w_j, \quad c = \frac{1}{w_k}(1 - a \cdot w_i - b \cdot w_j)$$

The default values are $T_{color} = 0.25$, $T_{texture} = 0.2$, $T_{text} = 0.05$, $w_{color} = 0.3$, $w_{texture} = 0.2$, and $w_{text} = 0.5$. The similarity reliability thresholds should be set such that most relevant images have similarity values greater than the threshold for a given attribute. A similarity threshold depends indirectly on the goodness of the attribute representation and similarity function: good representations have low similarity thresholds.

The shape attribute is not implemented in the current work. However, we use a simple procedure to compare the aspect ratio, or "squareness", of the MBRs of two icon instances:

$$width\_factor = \frac{MBR\_width_{database\ icon}}{MBR\_width_{query\ icon}}, \quad height\_factor = \frac{MBR\_height_{database\ icon}}{MBR\_height_{query\ icon}}$$

$$SIM_{squareness} = 1 - \frac{|width\_factor - height\_factor|}{\max(width\_factor, height\_factor)}$$

The final icon similarity is $SIM_{basic\_icon} = 0.3 \cdot SIM_{squareness} + 0.7 \cdot SIM_{adaptive}$.
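A hedged sketch of the adaptive combination, using the paper's default thresholds and base weights (the function name and dictionary interface are our own):

```python
# Default reliability thresholds and base weights from the paper.
THRESH = {'color': 0.25, 'texture': 0.20, 'text': 0.05}
BASE_W = {'color': 0.30, 'texture': 0.20, 'text': 0.50}

def adaptive_icon_similarity(sims):
    """Combine per-attribute similarities with adaptive weights (Section 6).

    sims maps 'color' / 'texture' / 'text' to a similarity in [0, 1].
    """
    reliable = {a for a in sims if sims[a] >= THRESH[a]}
    unreliable = set(sims) - reliable
    if len(unreliable) == 3:              # all unreliable: a = b = c = 0
        return 0.0
    if not unreliable:                    # all reliable: a = b = c = 1
        scale = {a: 1.0 for a in sims}
    elif len(unreliable) == 1:            # demote the one unreliable attribute
        (i,) = unreliable
        boost = 1.0 + BASE_W[i] ** 2 / sum(BASE_W[a] for a in reliable)
        scale = {a: (1.0 - BASE_W[a]) if a == i else boost for a in sims}
    else:                                 # two unreliable, one reliable
        i, j = unreliable
        (k,) = reliable
        a_i, b_j = 1.0 - BASE_W[i], 1.0 - BASE_W[j]
        c_k = (1.0 - a_i * BASE_W[i] - b_j * BASE_W[j]) / BASE_W[k]
        scale = {i: a_i, j: b_j, k: c_k}
    return sum(scale[a] * BASE_W[a] * sims[a] for a in sims)
```

The adaptive weights always sum to one by construction, so the paper's final icon similarity would then be 0.3 * SIM_squareness + 0.7 * the value returned here.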

7. Spatial similarity

Assume two pairs of icons: query icons $(I_i, I_j)$ and database icons $(x_i, x_j)$. The spatial, or global, similarity $g_{ij}(x_i, x_j)$ measures the similarity of the spatial relationship of $(I_i, I_j)$ compared with that of $(x_i, x_j)$. Before $g_{ij}$ can be calculated, $x_i$ must be from the same icon class as $I_i$, $x_j$ from the same icon class as $I_j$, and the spatial relationship type of $(x_i, x_j)$ must be the same as that of $(I_i, I_j)$. Further, the MBRs of $x_i$ and $x_j$ must be normalized, since the icons may belong to iconic pictures of different dimensions. With UL and LR the upper-left and lower-right coordinates of the MBRs of $x_i$ and $x_j$:

$$scale_X = \frac{query\ iconic\ picture\ width}{database\ iconic\ picture\ width}, \quad scale_Y = \frac{query\ iconic\ picture\ height}{database\ iconic\ picture\ height}$$

$$UL_{norm} = (UL_x \cdot scale_X,\ UL_y \cdot scale_Y), \quad LR_{norm} = (LR_x \cdot scale_X,\ LR_y \cdot scale_Y)$$

Type-0 Existing relationship: always return 1.

Type-1 Category disjoint relationship: the distance between two MBRs can be used to measure the degree of disjointness (Fig. 3). The similarity function over the distances between $(I_i, I_j)$ and $(x_i, x_j)$ is

$$SIM_{disjoint\_dist} = 1 - \frac{|d_I - d_x|}{\max(d_I, d_x)}$$

where $d_I$ and $d_x$ are the distances between $I_i$ and $I_j$, and between $x_i$ and $x_j$, respectively. Also, $(I_i, I_j)$ will look more similar to $(x_i, x_j)$ if the scale factor of $I_i, x_i$ and that of $I_j, x_j$ are close:

$$scale\_factor_i = \frac{MBR\_area(I_i)}{MBR\_area(x_i)}, \quad scale\_factor_j = \frac{MBR\_area(I_j)}{MBR\_area(x_j)}$$

$$SIM_{rel\_size} = 1 - \frac{|scale\_factor_i - scale\_factor_j|}{\max(scale\_factor_i, scale\_factor_j)}$$

The orientation of the icons is another important aspect of perceived similarity. The final disjoint similarity function including orientation similarity is $SIM_{disjoint} = 0.25 \cdot SIM_{disjoint\_dist} + 0.25 \cdot SIM_{rel\_size} + 0.5 \cdot SIM_{orientation}$.
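A minimal sketch of the disjoint similarity under stated assumptions: MBRs are ((x1, y1), (x2, y2)) corner pairs, the inter-MBR distances of Fig. 3 are computed elsewhere and passed in, and SIM_orientation (defined below) is supplied by the caller. The helper names are ours:

```python
def mbr_area(mbr):
    (x1, y1), (x2, y2) = mbr  # upper-left and lower-right corners
    return abs(x2 - x1) * abs(y2 - y1)

def ratio_sim(a, b):
    """The recurring form 1 - |a - b| / max(a, b) used throughout Section 7."""
    m = max(a, b)
    return 1.0 - abs(a - b) / m if m > 0 else 1.0

def sim_disjoint(d_query, d_db, mbr_Ii, mbr_Ij, mbr_xi, mbr_xj,
                 sim_orientation):
    """SIM_disjoint = 0.25 distance + 0.25 relative size + 0.5 orientation."""
    sim_dist = ratio_sim(d_query, d_db)
    sf_i = mbr_area(mbr_Ii) / mbr_area(mbr_xi)   # scale factor of Ii vs xi
    sf_j = mbr_area(mbr_Ij) / mbr_area(mbr_xj)   # scale factor of Ij vs xj
    sim_rel_size = ratio_sim(sf_i, sf_j)
    return 0.25 * sim_dist + 0.25 * sim_rel_size + 0.5 * sim_orientation
```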


Fig. 3. Distance between two icons.



Fig. 4. Amount of meet between two icons.

$SIM_{orientation}$ measures orientation similarity and is extended from the formula used in [16]. $\overrightarrow{pq}$ denotes the vector from point p to point q, and $|\overrightarrow{pq}|$ its magnitude; the numerator in the following equation is the dot product of the two centroid vectors:

$$\cos\theta = \frac{\overrightarrow{I_i^{cent} I_j^{cent}} \cdot \overrightarrow{x_i^{cent} x_j^{cent}}}{|\overrightarrow{I_i^{cent} I_j^{cent}}| \cdot |\overrightarrow{x_i^{cent} x_j^{cent}}|}, \quad 0 \le \theta \le \pi$$

where $I_i^{cent}, I_j^{cent}, x_i^{cent}, x_j^{cent}$ are the centroids of $I_i, I_j, x_i, x_j$. With

$$t_I = 0.25 \cdot \min(length(I_i), width(I_i), length(I_j), width(I_j))$$

$$t_x = 0.25 \cdot \min(length(x_i), width(x_i), length(x_j), width(x_j))$$

$$SIM_{orientation} = \begin{cases} 1 & \text{if } |\overrightarrow{I_i^{cent} I_j^{cent}}| \le t_I \text{ and } |\overrightarrow{x_i^{cent} x_j^{cent}}| \le t_x \\ 0 & \text{if only one of } |\overrightarrow{I_i^{cent} I_j^{cent}}| \le t_I,\ |\overrightarrow{x_i^{cent} x_j^{cent}}| \le t_x \text{ holds} \\ 1 - \dfrac{\theta}{\pi} & \text{otherwise} \end{cases}$$

Type-1 Category meet relationship: the proportion of edge intersection is used to measure how much two icons meet (Fig. 4). The similarity of the "meet amount" between $(I_i, I_j)$ and $(x_i, x_j)$ is

$$SIM_{meet\_amt} = 1 - \frac{|p_I - p_x|}{2 \cdot \max(p_I, p_x)}$$

where $p_I$ and $p_x$ are the meet amounts of $I_i, I_j$ and $x_i, x_j$, respectively. The final meet similarity is $SIM_{meet} = 0.25 \cdot SIM_{meet\_amt} + 0.25 \cdot SIM_{rel\_size} + 0.5 \cdot SIM_{orientation}$.

Type-1 Category contain and inside relationships: these two relationships are converses, i.e., if $I_i$ contains $I_j$, then $I_j$ must be inside $I_i$. Assume $I_i$ contains $I_j$; the amount of containment can be measured by how much of $I_i$'s area is covered by $I_j$:

$$coverage_I = \frac{MBR\_area(I_j)}{MBR\_area(I_i)}, \quad coverage_x = \frac{MBR\_area(x_j)}{MBR\_area(x_i)}$$

$$SIM_{contain\_partial} = 1 - \frac{|coverage_I - coverage_x|}{\max(coverage_I, coverage_x)}$$

The final contain similarity including orientation similarity is $SIM_{contain} = 0.5 \cdot SIM_{contain\_partial} + 0.5 \cdot SIM_{orientation}$.

Type-1 Category overlap relationship: the overlap subtype similarity is given below. $SIM_{i\_contain\_O}$ compares the area coverage of $(I_i \cap I_j)$ in $I_i$ with that of $(x_i \cap x_j)$ in $x_i$; $SIM_{j\_contain\_O}$ is analogous for $I_j$ and $x_j$. These two similarities are computed like $SIM_{contain}$. $SIM_{squareness\_of\_O}$ compares the aspect ratios of $(I_i \cap I_j)$ and $(x_i \cap x_j)$, like $SIM_{squareness}$.

$$SIM_{overlap} = \frac{SIM_{i\_contain\_O}}{6} + \frac{SIM_{orientation}}{2} + \frac{SIM_{j\_contain\_O}}{6} + \frac{SIM_{squareness\_of\_O}}{6}$$
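A minimal sketch of the orientation similarity defined above, under the same ((x1, y1), (x2, y2)) MBR assumption as before; helper names are ours:

```python
import math

def _centroid(mbr):
    (x1, y1), (x2, y2) = mbr  # upper-left and lower-right corners
    return ((x1 + x2) / 2.0, (y1 + y2) / 2.0)

def _dims(mbr):
    (x1, y1), (x2, y2) = mbr
    return abs(x2 - x1), abs(y2 - y1)

def sim_orientation(mbr_Ii, mbr_Ij, mbr_xi, mbr_xj):
    """1 - theta/pi between the query and database centroid vectors,
    with the short-vector special cases of Section 7."""
    cIi, cIj = _centroid(mbr_Ii), _centroid(mbr_Ij)
    cxi, cxj = _centroid(mbr_xi), _centroid(mbr_xj)
    vI = (cIj[0] - cIi[0], cIj[1] - cIi[1])
    vx = (cxj[0] - cxi[0], cxj[1] - cxi[1])
    len_I, len_x = math.hypot(*vI), math.hypot(*vx)
    # Thresholds below which a centroid vector is too short to define a direction.
    tI = 0.25 * min(*_dims(mbr_Ii), *_dims(mbr_Ij))
    tx = 0.25 * min(*_dims(mbr_xi), *_dims(mbr_xj))
    short_I, short_x = len_I <= tI, len_x <= tx
    if short_I and short_x:       # both vectors negligible: orientations agree
        return 1.0
    if short_I or short_x:        # exactly one negligible: orientations conflict
        return 0.0
    cos_theta = (vI[0] * vx[0] + vI[1] * vx[1]) / (len_I * len_x)
    theta = math.acos(max(-1.0, min(1.0, cos_theta)))  # clamp for safety
    return 1.0 - theta / math.pi
```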

Type-2 Orthogonal, Type-3 Coordinate, and Type-4 Position relationships: for these three relationships, the similarity functions $SIM_{disjoint}$, $SIM_{meet}$, $SIM_{contain}$, and $SIM_{overlap}$ are used, depending on the type of category relationship satisfied.

8. An efficient embedding algorithm

By analogy, finding the best match between the query and a database iconic picture requires moving and stretching a rubber sheet of query icons over the database iconic picture to get the best possible alignment, with the amount of stretching one of the factors determining the similarity value. This is a hard problem whose storage and computing time requirements can grow exponentially. Fischler and Elschlager [9] proposed the Linear Embedding Algorithm (LEA), which locates a suitable "embedding" or sub-picture with reasonably linear growth of storage and time as a function of the number of icons and relationships in the query. This algorithm has been integrated into our work; its terms are explained in Table 1.

$$n = \frac{p(p-1)}{2}, \quad gwt = \frac{spatial\_wt}{n}, \quad lwt = \frac{1 - spatial\_wt}{p}$$

$$g_1(x_1) = lwt \cdot l_1(x_1), \quad Y_1 = \{x_1\}$$

$$g_2(x_2) = \max_{x_1 \in W_2} \left[ gwt \cdot g_{21}(x_2, x_1) + lwt \cdot l_2(x_2) + g_1(x_1) \right], \quad Y_2 = Y_1 \cup \{x_2\}$$

where $x_1$ is the database icon that maximizes the above equation;

...

$$g_i(x_i) = \max_{x_{i-1} \in W_i} \left[ gwt \cdot \sum_{j=1}^{i-1} g_{ij}(x_i, x_j) + lwt \cdot l_i(x_i) + g_{i-1}(x_{i-1}) \right], \quad Y_i = Y_{i-1} \cup \{x_i\}$$

where $x_{i-1}$ is the database icon that maximizes the above equation;

...

$$g_p(x_p) = \ldots, \quad Y_p = \ldots, \quad G = \max_{x_p} \left[ g_p(x_p) \right], \quad Y = Y_p$$

where $x_p$ is the database icon that maximizes the above equation. The outputs of the algorithm are G, the maximum similarity value ($0 \le G \le 1$), and Y, the set of $x_1$ to $x_p$ that gives G. The effectiveness of the embedding algorithm in finding the best match depends on the goodness of the local and global similarities used.
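As a rough illustration of the recurrence above, here is a hedged Python sketch of an LEA-style search. It keeps, for each candidate mapping of the current query icon, the best partial embedding found so far; the function names, the candidate-set interface, and the tie-breaking are our own simplifications, not the authors' implementation:

```python
def linear_embedding(query_icons, candidates, local_sim, global_sim,
                     spatial_wt=0.5):
    """LEA-style search for the best embedding Y = (x1, ..., xp).

    query_icons : the query icons I1..Ip
    candidates  : candidates[i] lists database icons that may map to Ii
                  (same icon class, admissible spatial relationships)
    local_sim   : local_sim(Ii, xi) -> icon similarity in [0, 1]
    global_sim  : global_sim(Ii, Ij, xi, xj) -> spatial similarity in [0, 1]
    Returns (G, Y): the best similarity and the chosen mapping.
    """
    p = len(query_icons)
    n = p * (p - 1) / 2.0
    gwt = spatial_wt / n if n else 0.0   # weight of each icon pair
    lwt = (1.0 - spatial_wt) / p         # weight of each single icon

    # frontier[xi] = (score, chain): best partial embedding ending with xi.
    frontier = {x1: (lwt * local_sim(query_icons[0], x1), (x1,))
                for x1 in candidates[0]}
    for i in range(1, p):
        Ii, new_frontier = query_icons[i], {}
        for xi in candidates[i]:
            best = None
            for score, chain in frontier.values():
                if xi in chain:          # one database icon per query icon
                    continue
                spatial = sum(global_sim(query_icons[j], Ii, chain[j], xi)
                              for j in range(i))
                total = gwt * spatial + lwt * local_sim(Ii, xi) + score
                if best is None or total > best[0]:
                    best = (total, chain + (xi,))
            if best is not None:
                new_frontier[xi] = best
        frontier = new_frontier
        if not frontier:
            return 0.0, ()               # no admissible embedding found
    return max(frontier.values(), key=lambda t: t[0])  # (G, Y)
```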


Table 1
Explanation of terms used in the linear embedding algorithm

I1, I2, . . . , Ip: the p query icons on the query iconic picture.
xi = map(Ii): a database icon on the database iconic picture that is mapped to the query icon Ii. The mapped icon xi must be from the same icon class as Ii and must not previously have been mapped to another query icon. Several database icons may be mapped to Ii.
li(xi): the icon (local) similarity between a query icon Ii and a possible mapped icon xi; a real number between 0 and 1.
gij(xi, xj): the global (spatial) similarity between a pair of query icons (Ii, Ij) and a possible pair of database icons mapped to (Ii, Ij); a real number between 0 and 1.
Yi: a set containing x1 to xi.
Wi: given xi, the set of possible xi-1 such that the spatial relationship between (Ii, Ii-1) is the same as that between (xi, xi-1).
spatial_wt: the importance of spatial (global) similarity relative to icon (local) similarity.

9. System test results

Color retrieval effectiveness: three image collections (birds, mammals, and plants), containing actual image objects segmented and extracted from real-life photographs, are available for the color retrieval test. The birds collection contains 26 images; mammals, 53; and plants, 28. Recall-precision is used as the measure of effectiveness. For each collection, a number of queries are run and the recall-precision pairs are calculated for each query; the average precision is then calculated from these values and a graph for the collection is plotted. The optimal graph is a horizontal line at precision = 1 for all recall levels. A typical graph for large image collections is shown in Fig. 5. Increasing the number of images is expected to cause a drop in effectiveness in the three collections and cause the

graphs to approach the typical case. The "Mammals" graph looks most similar to the typical case, as the mammals collection contains the most images. Comparing the shapes of the graphs with the optimal, the collections ranked in order of increasing effectiveness are plants, birds, and mammals. The results are good for our image collections in that, for most recall levels, precision is greater than 0.5, i.e., more than half of the retrieved images are relevant. We can deduce that the retrieval is effective at least for small collections whose sizes are in the hundreds. However, we cannot assume the same level of performance for large collections; more testing needs to be carried out for them.

Spatial similarity effectiveness: the similarity functions of the disjoint, meet, contain, and overlap relationships are tested for effectiveness. For each relationship, two sets of rankings (rank 1 = most similar to the query, rank 5 = least similar) are obtained using the test cases for that relationship. The first ranking is produced by the system, the second by a human user. After the two rankings are obtained, Rnorm, which measures how far the system ranking deviates from the user ranking, is computed. Let I be a finite set of images, and let $D_{usr}$ and $D_{sys}$ be the rank orderings of I provided by a user and by the system, respectively. Then Rnorm is defined as

$$R_{norm}(D_{sys}) = \frac{1}{2} \left( 1 + \frac{S^+ - S^-}{S^+_{max}} \right)$$

where $S^+$ is the number of image pairs in which a better image is ranked ahead of a worse one, $S^-$ is the number of pairs in which a worse image is ranked ahead of a better one, and $S^+_{max}$ is the maximum possible value of $S^+$.

Fig. 5. Recall-precision graphs of color retrieval effectiveness.
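A small sketch of one common reading of this definition, assuming each ranking is given as a mapping from image id to rank (the function name and interface are ours):

```python
from itertools import combinations

def r_norm(user_rank, sys_rank):
    """Rnorm of Section 9: agreement of the system ranking with the user's.

    user_rank, sys_rank map each image id to its rank (1 = most similar).
    """
    s_plus = s_minus = s_max = 0
    for a, b in combinations(user_rank, 2):
        if user_rank[a] == user_rank[b]:
            continue                     # user expresses no preference
        s_max += 1                       # one more pair the user distinguishes
        if user_rank[a] > user_rank[b]:  # orient the pair: user prefers a
            a, b = b, a
        if sys_rank[a] < sys_rank[b]:
            s_plus += 1                  # system agrees with the user
        elif sys_rank[a] > sys_rank[b]:
            s_minus += 1                 # system contradicts the user
    return 0.5 * (1.0 + (s_plus - s_minus) / s_max) if s_max else 1.0
```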

Fig. 6. Rnorm of disjoint test.

694

S.Y. Sung, T. Hu / Knowledge-Based Systems 19 (2006) 687–695

Note that the calculation of $S^+$, $S^-$, and $S^+_{max}$ is based on the ranking of image pairs in $D_{sys}$ relative to the ranking of the corresponding image pairs in $D_{usr}$. A high Rnorm is desirable. Twelve queries are used for the disjoint test, 13 for meet, 12 for contain, and 9 for overlap. Only the Rnorm values for the disjoint test are shown in Fig. 6, for lack of space. The average Rnorm values for the meet, disjoint, overlap, and contain relationships in our preliminary tests are 0.8748, 0.7778, 0.7545, and 0.7297, respectively, which is generally satisfactory. To produce more accurate results, it may be desirable to obtain ranking results averaged over several different users. Our test cases may not be optimal; more test cases may be formulated.

Other tests: Tamura et al. [22] tested the effectiveness of each individual texture property (coarseness, contrast, etc.); our work combines them into a single similarity value, and the effectiveness of this combination must be tested. The adaptive algorithm employed in calculating the icon similarity has been tested in an independent study and shown to be more effective than the fixed-ratio weighted sum method. The vector space model used for text is quite effective; however, further experiments may be carried out with different types of term weights.

10. Conclusion

Some possible enhancements and future research directions for a more powerful system are described below.

Addition of the shape attribute: currently an icon is represented by its MBR. The disadvantages of using the MBR, beyond the inability to query by shape features, are that spatial reasoning may not be accurate and that many unwanted background colors may be extracted during automatic color extraction.

Improvement to text attribute extraction: allow the user to enter free text in a natural language. For this to work, text preprocessing such as stop-word removal, word stemming, and the use of a thesaurus must be incorporated. These steps are outlined in [21].

Filtering: this process quickly removes many impossible pictures before the more accurate, time-consuming similarity computation is performed. A common filtering method is to use signature files as "spatial filters" [18], where only database pictures that roughly match the query in terms of spatial relationships are retrieved and tested further. Multidimensional trees such as the R-tree and its variants may also be used for filtering [19].

Multiple paradigm querying: the system should provide multiple modes of query to cater to different types of users.

Pictorial or image retrieval is a hard problem that requires knowledge from many disciplines, ranging from psychology to image processing to machine vision. Many components of an effective and efficient IIS are still under research. Furthermore, image repositories are

exploding at an alarming rate, making them very difficult to manage. However, the race for better attribute representations and faster access methods has not slowed down. This, coupled with increasing computing power and decreasing computing costs, will hopefully lead to more powerful IISs in the near future.

References

[1] G. Aggarwal, T.V. Ashwin, S. Ghosal, An image retrieval system with automatic query modification, IEEE Transactions on Multimedia 4 (2) (2002) 201–214.
[2] C. Batini, T. Catarci, M.F. Costabile, S. Levialdi, Visual query systems: a taxonomy, Visual Database Systems II, Elsevier Science Publishers B.V., North Holland, 1992.
[3] A. Del Bimbo, M. Mugnaini, P. Pala, F. Turco, PICASSO: visual querying by color perceptive regions, in: Proceedings of the Second International Conference on Visual Information Systems, 15–17 December 1997, pp. 125–131.
[4] N.H. Balkir, G. Ozsoyoglu, Z.M. Ozsoyoglu, A graphical query language: VISUAL and its query processing, IEEE Transactions on Knowledge and Data Engineering 14 (5) (2002) 955–978.
[5] Shi-Kuo Chang, Principles of Pictorial Information Systems Design, Prentice-Hall, Englewood Cliffs, New Jersey, 1989.
[6] Jae-Woo Chang, Yeon-Jung Kim, Ki-Jin Chang, A spatial match representation scheme for indexing and querying in iconic image databases, in: Proceedings of the ACM CIKM 1997 Conference, 1997, pp. 169–176.
[7] Shi-Kuo Chang, Cheng-Wen Yan, Iconic indexing by 2-D strings, IEEE Transactions on Pattern Analysis and Machine Intelligence PAMI-9 (3) (1987) 413–428.
[8] S.K. Chang, C.W. Yan, Donald C. Dimitroff, Timothy Arndt, An intelligent image database system, IEEE Transactions on Software Engineering 14 (5) (1988) 681–688.
[9] Martin A. Fischler, Robert A. Elschlager, The representation and matching of pictorial structures, IEEE Transactions on Computers 22 (1) (1973) 67–92.
[10] Venkat N. Gudivada, Vijay V. Raghavan, Design and evaluation of algorithms for image retrieval by spatial similarity, ACM Transactions on Information Systems 13 (2) (1995) 115–144.
[11] Wynne Hsu, T.S. Chua, H.K. Pung, An integrated color-spatial approach to content-based image retrieval, in: Proceedings of the ACM Multimedia 1995 Conference, 1995, pp. 305–313.
[12] Ju Han, Kai-Kuang Ma, Fuzzy color histogram and its use in color image retrieval, IEEE Transactions on Image Processing 11 (8) (2002) 944–952.
[13] Mikihiro Ioka, A method of defining the similarity of images on the basis of color information, Technical Report, Tokyo Research Laboratory, 1993.
[14] Nicholas J. Belkin, W. Bruce Croft, Retrieval techniques, Annual Review of Information Science and Technology (ARIST) 22 (1987) 109–145.
[15] Guojun Lu, Techniques and data structures for efficient multimedia retrieval based on similarity, IEEE Transactions on Multimedia 4 (3) (2002) 372–384.
[16] John Z. Li, M. Tamer Ozsu, STARS: a spatial attributes retrieval system for images and videos, Multimedia Modeling (MMM 97), Modeling Multimedia Information and Systems, World Scientific Publishing Co. Pte. Ltd., 1997, pp. 69–84.
[17] Taekyong Lee, Lei Sheng, T. Bozkaya, Hurkan N. Balkir, Meral Z. Ozsoyoglu, G. Ozsoyoglu, Querying multimedia presentations based on content, IEEE Transactions on Knowledge and Data Engineering 11 (3) (1999) 361–385.
[18] Suh-Yin Lee, Ming-Chwen Yang, Ju-Wei Chen, Signature file as a spatial filter for iconic image database, Journal of Visual Languages and Computing (1992) 373–397.

[19] Euripides G.M. Petrakis, Christos Faloutsos, Similarity searching in medical image databases, IEEE Transactions on Knowledge and Data Engineering 9 (3) (1997) 435–447.
[20] Greg Pass, Ramin Zabih, Justin Miller, Comparing images using color coherence vectors, in: Proceedings of the ACM Multimedia 1996 Conference, 1996, pp. 65–73.
[21] Gerard Salton, Automatic Text Processing: The Transformation, Analysis, and Retrieval of Information by Computer, Addison-Wesley, 1988.


[22] Hideyuki Tamura, Shunji Mori, Takashi Yamawaki, Textural features corresponding to visual perception, IEEE Transactions on Systems, Man and Cybernetics SMC-8 (6) (1978) 460–473.
[23] Jie Wei, Color object indexing and retrieval in digital libraries, IEEE Transactions on Image Processing 11 (8) (2002) 912–922.
[24] Clement Yu, King-Lup Liu, Weiyi Meng, Zonghuan Wu, N. Rishe, A methodology to retrieve text documents from multiple databases, IEEE Transactions on Knowledge and Data Engineering 14 (6) (2002) 1347–1361.