Fuzzy Sets and Systems 121 (2001) 459–470
www.elsevier.com/locate/fss
An image retrieval model based on fuzzy triples Jae Dong Yang ∗ Department of Computer Science, Chonbuk National University, Chonju, 561-756, South Korea Received 2 September 1998; received in revised form 31 January 2000; accepted 14 March 2000
Abstract This paper proposes an image retrieval model for indexing and retrieving images with fuzzy triples. The well-known triple framework based on two-dimensional (2D) strings is a novel index mechanism in that it allows us to simply specify the spatial structure of images, guaranteeing fast retrieval time. However, there are two significant drawbacks in this framework: one is that it cannot accommodate concept-based image retrieval, and the other is that it does not deal with inexact matches among directions. The former is crucial when answers turn out to be relevant only when they are conceptually related to user queries, and the latter is useful especially when the spatial relationship cannot be specified in terms of the eight directions. Our model supports concept-based image retrieval as well as inexact matching through a fuzzy triple matching performed when evaluating queries. To demonstrate the feasibility of the model, we also develop a prototype system directly supporting it. © 2001 Elsevier Science B.V. All rights reserved.
Keywords: Information retrieval; Image processing; Fuzzy databases
∗ Tel.: +82-652-270-3388; fax: +82-652-270-3403. E-mail address: [email protected] (J.D. Yang). Supported by the KOSEF, no. 97-0100-1010-3.
0165-0114/01/$ - see front matter © 2001 Elsevier Science B.V. All rights reserved. PII: S0165-0114(00)00056-7

1. Introduction

Currently, with the advent of digital libraries containing a huge volume of images, the necessity of content-based image retrieval techniques is ever increasing. One such technique is to use two-dimensional strings (2D strings) in representing images [3]. It represents the spatial relationships between objects $o_i$, $i = 1, \ldots, n$, by $r \in \{=, >, :\}$. When the X or Y coordinates of two objects $o_i$ and $o_j$ are the same, it is denoted by $o_i = o_j$. If the X or Y coordinate of $o_i$ is bigger than that of $o_j$, then we represent it by $o_i > o_j$. $o_i : o_j$ denotes that they share the same minimum bounding rectangle (MBR). $(u, v) = (o_1 r_1 o_2 r_2 \cdots r_{n-1} o_n,\; o_{p(1)} r_1 o_{p(2)} \cdots r_{n-1} o_{p(n)})$ is the general form of 2D strings, where $p : \{1, 2, \ldots, n\} \to \{1, 2, \ldots, n\}$ and the two strings $u$ and $v$ represent the spatial relationships of the X and Y projections, respectively, of the objects contained in the image. For example, the 2D string for the iconic image in Fig. 1 is $(u, v) = (a < b : c = d < e,\; b : c < a < d = e)$. The 2D string may be viewed as the symbolic projection of the image along the x- and y-directions. As the incompleteness of the earlier formulation of the 2D string technique was pointed out by several authors, a number of its extensions and variations were proposed, including [1,2,4,7,10–12,17]. Triple indexing [1,2] is a variation of this technique to enhance its semantic expressiveness by translating the 2D strings into semantically equivalent
J.D. Yang / Fuzzy Sets and Systems 121 (2001) 459–470
Fig. 1. An iconic image.
triples. This approach indexes an image in terms of triples specifying the spatial relationships among its constituent objects. For example, the 2D string in Fig. 1 may be translated into a triple set, $\{\langle a, e, \text{northeast} \rangle, \langle b, d, \text{north} \rangle, \langle b, c, \text{same} \rangle, \ldots\}$, in which the third component is the relative direction of the second component with respect to the first component. The triple framework is a novel index mechanism in that it allows us to simply specify the spatial structure of images, guaranteeing fast retrieval time by a well-designed hashing function [5]. However, there are two significant drawbacks in the framework: one is that it cannot accommodate concept-based image retrieval, and the other is that it does not deal with inexact matches among directions. The former is crucial when answers turn out to be relevant only when they are conceptually related to user queries, and the latter is useful especially when the spatial relationship cannot be specified in terms of the eight directions. For example, to users who want images containing audio located at the left side of furniture, the conventional systems based on this framework would no longer be sufficient since they fail to capture the concepts “audio” and “furniture”. On the other hand, if a user is allowed to specify the configuration of target images on a panel window in terms of object icons, the spatial relationships between them may not be precisely converted into the eight directions.

In this paper, we develop an image retrieval model based on fuzzy triples to support concept-based image retrieval as well as inexact matching. Both facilities are provided by a fuzzy matching [14,15] performed when evaluating queries. To demonstrate the feasibility of the model, we also provide a prototype system directly supporting it.

This paper proceeds as follows. Section 2 proposes a new data type called fuzzy triple as a framework
for retrieving images as well as indexing them. In Section 3, we introduce a series of k-weight functions as membership functions to quantify the compatibility between the eight directions and the corresponding angle degrees. The k-weight function is a function mainly used for well approximating statistical inference. Sections 4 and 5 discuss a concept-based fuzzy matching and a way of query evaluation based on this matching, respectively. This matching employs thesauri structured by the integration of membership functions. Section 6 is devoted to the demonstration of a prototype system directly implementing our model, and conclusions follow in Section 7.

2. Image indexing by fuzzy triples

Before introducing fuzzy triples, we need to briefly explain the triple-based indexing technique adopted in [1,2]. It represents an iconic image as a set of ordered triples $\langle o_i, o_j, R_{ij} \rangle$. The triples are used to encode a pair-wise spatial relationship $R_{ij}$ between objects $o_i$ and $o_j$, where $R_{ij}$ is one of eight directions. Since $\langle o_i, o_j, R_{ij} \rangle$ implies $\langle o_j, o_i, R_{ji} \rangle$ for $R_{ji}$, which is the inverse of $R_{ij}$, only $n(n-1)/2$ triples are enough to represent an iconic image having $n$ constituent objects. For example (Fig. 2), let $w$, $r$, $s$ and $c$ be “working table”, “radio”, “speaker” and “clock”, respectively. Then the iconic image $p_1$ is indexed by $\{\langle w, c, \text{northwest} \rangle, \langle r, w, \text{east} \rangle, \langle r, c, \text{north} \rangle, \langle w, s, \text{north} \rangle, \langle r, s, \text{northeast} \rangle, \langle c, s, \text{east} \rangle\}$. The generated triples are stored in an inverted file together with the links to the images indexed by them. For example, $p_1$ would be retrieved as an answer to the query “search for images containing $\langle w, c, \text{northwest} \rangle$” by inverted file lookup. The inverted file used in [1,2] is given in Fig. 2, and its structure was originally proposed by Cook and Oldehoeft [5]. We first extend the structure of the triples to represent a spatial relationship more flexibly.
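The crisp triple index described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: a plain dictionary stands in for the hashed inverted file of [1,2,5], and the names `INVERSE`, `build_inverted_file` and `lookup` are hypothetical, not taken from the paper. The triples are those listed for $p_1$.

```python
from collections import defaultdict

# Inverse of each of the eight directions, so that <oi, oj, R> implies <oj, oi, inverse(R)>.
INVERSE = {
    "east": "west", "west": "east",
    "north": "south", "south": "north",
    "northeast": "southwest", "southwest": "northeast",
    "northwest": "southeast", "southeast": "northwest",
}

# The n(n-1)/2 = 6 triples indexing image p1 (w, r, s, c as in Fig. 2).
P1_TRIPLES = [
    ("w", "c", "northwest"), ("r", "w", "east"), ("r", "c", "north"),
    ("w", "s", "north"), ("r", "s", "northeast"), ("c", "s", "east"),
]

def build_inverted_file(images):
    """Map each stored triple to the set of image ids indexed by it.

    Assumption: a dict replaces the minimal perfect hashing scheme of [5].
    """
    inverted = defaultdict(set)
    for image_id, triples in images.items():
        for triple in triples:
            inverted[triple].add(image_id)
    return inverted

def lookup(inverted, triple):
    """Exact-match lookup that also tries the implied inverse triple."""
    oi, oj, direction = triple
    inverse = (oj, oi, INVERSE[direction])
    return inverted.get(triple, set()) | inverted.get(inverse, set())

inverted_file = build_inverted_file({"p1": P1_TRIPLES})
```

With this sketch, `lookup(inverted_file, ("w", "c", "northwest"))` returns the set containing `"p1"`, while a query with any other direction for the same pair returns the empty set, which is exactly the rigidity that the fuzzy extension is meant to relax.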
The triple structure in Definition 1 below differs from Chang [1] and Chang and Lee [2] in that a spatial relationship is specified by an angle degree in $[0, 359]$ or by one element of an ordered list of direction descriptors, $D$ = {east, northeast, north, northwest, west, southwest, south, southeast}, in which the order of its elements is preserved. Additionally, when the order is important,
Fig. 2. Iconic image $p_1$ and the inverted file constructed by a hashing function.
the $i$th element of $D$ is denoted by $d_i$. It is used when precisely specifying the corresponding membership function of each direction as the angle increases counterclockwise.

Definition 1. Let $\mathit{IDB}$ be the whole set of images in an image retrieval system, and $O_p$ the set of all objects possibly occurring in an image $p \in \mathit{IDB}$. Then a set of triples for $p$, $T_p$, is given by

$$T_p = \{\, t = \langle o_i, o_j, r_{ij} \rangle \text{ for } o_i, o_j \in O_p \mid r_{ij} \in D \cup [0, 359] \text{ and } r_{ij} \text{ is the relative direction of } o_j \text{ with respect to } o_i \,\}.$$

The objects participating in the triple may be viewed as object identifiers. However, since object equality is known to be extremely difficult to implement [8], Definition 1 needs a refinement for mapping the object identifiers to conceptual linguistic terms (or simply terms). The following definition is provided to associate object identifiers with the terms.
Definition 2. A name function $f_{\mathrm{NAME}}$ is defined as follows:

$$f_{\mathrm{NAME}} : O \to N, \quad \text{where } N \text{ is a set of terms.}$$
In addition to such an association, the name function is crucial for making our fuzzy triple structure more extensible: it allows the specification of two distinct objects that share the same name. For example, suppose that another radio $r'$ appears in Fig. 2, replacing working table $w$. Then $r$ and $r'$ need to be clearly distinguished if we want to reduce $r'$ and the speaker $s$ into a composite concept, “audio set”, together with their spatial relationship north; $r$ should not be used instead of $r'$ to capture the concept. Refer to Yang and Yang [17] for further discussion about handling composite concepts with the name functions. If we can treat a term as a fuzzy set characterized by its membership function, we call it a fuzzy linguistic term or, briefly, a fuzzy term. If not, it is referred to as a crisp term. For example, audio is a fuzzy term, since it can be interpreted as a fuzzy set,
audio = {0.7/radio, 0.9/receiver, 0.86/speaker}. Radio, receiver and speaker are crisp terms, provided that their membership functions are not defined. Our fuzzy triple representing the structure of images may now be defined with this name function.
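As a small illustration of the name function, the following Python sketch keeps object identifiers and terms apart. The identifier `"r2"` for the second radio and the helper names `f_name` and `name_triple` are assumptions made only for this example; the paper does not prescribe any particular representation.

```python
# Hypothetical object identifiers: "r2" plays the role of the second radio
# that replaces the working table in the example above.
F_NAME = {"r": "radio", "r2": "radio", "s": "speaker", "c": "clock"}

def f_name(object_id):
    """Name function: map an object identifier to its term."""
    return F_NAME[object_id]

def name_triple(triple):
    """Replace the two object identifiers of a triple by their terms."""
    oi, oj, rij = triple
    return (f_name(oi), f_name(oj), rij)
```

Here `name_triple(("r2", "s", "north"))` and `name_triple(("r", "s", "northeast"))` both name a radio and a speaker, yet the underlying identifiers stay distinct, so only the pair with the relationship north would qualify for a composite concept such as “audio set”.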
Definition 3. Let $T_p$ be a fuzzy triple set for an image $p$. Then it is defined as

$$T_p = \{\, t = \langle f_{\mathrm{NAME}}(o_i), f_{\mathrm{NAME}}(o_j), r_{ij} \rangle \mid o_i, o_j \in O_p \text{ and } r_{ij} \in D \cup [0, 359] \,\}.$$

Note that we call it a fuzzy triple because the object names participating in the triple can be specified by fuzzy terms. As will be seen in Section 4, it is the fuzzy term that makes the concept-based match possible. On the other hand, the ordered list of direction descriptors $D$ is needed to define properly the corresponding k-weight function for each direction in $D$. Along with the order, the corresponding k-weight function interprets an angle as one of the eight directions with some degree, when the angle is given as a spatial relationship. The next section is devoted to the development of this function.

3. k-weight function as a membership function of directions

To support an inexact match between directions, we introduce k-weight functions, which are symmetric but not always convex. They are used as the membership functions of the fuzzy subsets defining the eight directions in terms of the numeric angle degree. A k-weight function [14] is defined as

$$f(x; k) = C\,(a^2 - (x - b)^2)^k \quad \text{for } |x - b| \le a,$$

where $k$ is positive and $C$, $a > 0$ and $b$ are constants. For each direction $d_i$, $i = 0, \ldots, 7$, the following membership function is provided, where east corresponds to $0^\circ$ (i.e., $\mu_{\text{east}}(0) = 1$) and the angle increases counterclockwise; for example, northeast and north correspond to $45^\circ$ and $90^\circ$, respectively.

Definition 4. Let $b_0 = 0$ or $360$ and let $b_i = 45 \times i$, $i = 1, \ldots, 7$. Then the membership functions $\mu_{d_i}$ for $d_i \in D$, $i = 0, \ldots, 7$, are defined as

$$\mu_{d_i}(x) = \begin{cases} C\,(45^2 - (x - b_i)^2)^k & \text{if } 45 \times (i - 1) \le x \le 45 \times (i + 1), \\ 0 & \text{otherwise,} \end{cases} \qquad i = 1, \ldots, 7,$$

$$\mu_{d_0}(x) = \begin{cases} C\,(45^2 - (x - b_0)^2)^k, \text{ where } b_0 = 0 & \text{if } 0 \le x \le 45, \\ C\,(45^2 - (x - b_0)^2)^k, \text{ where } b_0 = 360 & \text{if } 45 \times 7 \le x \le 45 \times 8, \\ 0 & \text{otherwise.} \end{cases}$$
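The membership functions of the eight directions can be sketched as follows. This is a minimal Python illustration, assuming the normalization $C = 1/45^{2k} = 1/2025^k$ used in the text so that each function peaks at 1; the names `DIRECTIONS`, `k_weight` and `membership` are hypothetical, and $k = 2$ is used as the default only because it is the version the paper finally adopts.

```python
# Ordered list D; the index i of a direction gives its centre angle b_i = 45 * i.
DIRECTIONS = ["east", "northeast", "north", "northwest",
              "west", "southwest", "south", "southeast"]

def k_weight(x, b, k, a=45.0):
    """k-weight function C * (a^2 - (x - b)^2)^k on |x - b| <= a, else 0.

    C is fixed to 1 / a^(2k) so that the value at x = b is 1.
    """
    d = x - b
    if abs(d) > a:
        return 0.0
    return ((a * a - d * d) / (a * a)) ** k

def membership(direction, angle, k=2.0):
    """Degree to which an angle in degrees is compatible with a direction."""
    i = DIRECTIONS.index(direction)
    x = angle % 360.0
    if i == 0:
        # East wraps around: centre 0 on [0, 45], centre 360 on [315, 360).
        b = 0.0 if x <= 45.0 else 360.0
    else:
        b = 45.0 * i
    return k_weight(x, b, k)
```

For instance, `membership("northeast", 20)` evaluates $(1400/2025)^2$, which rounds to 0.48, and `membership("east", 340)` equals `membership("east", 20)` because of the wrap-around case for $d_0$.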
For the normalization, $C$ is set as $C = 1/2025^k$ for real number $k$. Since the shape of the functions $\mu_{d_i}(x)$, $i = 0, \ldots, 7$, is subject to the $k$ value and the retrieval effectiveness of our framework may depend on the characteristics of the shapes, we need to analyze the versions of each of the functions generated along with the $k$ value. For example, consider the following versions of $\mu_{\text{northeast}}(x)$ for $k = 1, 2, 2.5$ and $3$. First, all of the three non-convex versions are obviously superior to the convex one with regard to the precision of matching. In other words, they guarantee high precision by making themselves sharper, especially in their boundary angle areas. For example (Fig. 3), at some point $x$ near $45^\circ$, each value of $\mu_{\text{east}}(x)$ and $\mu_{\text{north}}(x)$ is drastically lower than that of $\mu_{\text{northeast}}(x)$, while $\mu_{\text{northeast}}(x)$ is significantly lower than each of them near $0^\circ$ and $90^\circ$, respectively (see Fig. 4). However, they may in turn compromise recall due to their relatively low membership values in comparison with the convex one. For example, given a non-convex version obtained by $k = 2$, an image indexed by $\langle \text{table}, \text{chair}, 20^\circ \rangle$ may not be retrieved as an answer of $\langle \text{table}, \text{chair}, \text{northeast} \rangle$, since $\mu_{\text{northeast}}(20^\circ) = 0.48 < \alpha$ for a threshold value $\alpha = 0.5$. The threshold value is the degree of relevance of results to be accepted as answers. On the other hand, the version of $k = 2.5$ has an interesting property that the sum of $\mu_{\text{northeast}}(x)$ and either of its neighbor functions, i.e., $\mu_{\text{east}}(x)$ and $\mu_{\text{north}}(x)$, at the corresponding angle range is approximately 1. For example, when $\mu_{\text{northeast}}(20^\circ) = 0.39$, $\mu_{\text{east}}(20^\circ) = 0.61$. It may lead to the highest precision since it exactly coincides with our intuition. However, since its recall is even worse than that of the version of $k = 2$, the latter version is our choice as a trade-off in this paper. Nevertheless, it is not clear
Fig. 3. The four versions of $\mu_{\text{northeast}}(x)$ for $k = 1, 2, 2.5, 3$.
Fig. 4. Membership functions for eight directions.
which one is the best choice, since the relative importance of precision and recall may largely depend on the characteristics of application domains. We now obtain the eight direction functions drawn with the version of $k = 2$ (Fig. 4). Though the functions leave room for improvement, we avoid further detailed discussion here, since developing functions to maximize retrieval effectiveness is beyond the scope of this paper. They are introduced as an example of designing a membership function to quantify the compatibility between a direction and the corresponding angle degrees.

4. Exploiting thesauri for a concept-based match

A thesaurus is mainly used for replacing a term with its broader terms (BTs) or narrower terms (NTs),
when the result of a query containing it is unsatisfactory. However, the thesaurus is adopted in our framework for a quite different purpose. Rather than suggesting such BT/NT terms, our thesaurus evaluates the conceptual distance between a term and its BT/NTs, interpreting it like a variable or a template. The evaluation is made by the integrated membership functions of fuzzy terms (or fuzzy sets) in the thesaurus, which has a hierarchical yet nested structure. Fig. 5 shows the structure represented by a fuzzy graph specifying fuzzy membership values. Terms in leaf nodes denote crisp terms in D and the other ones represent fuzzy terms, each taking lower level fuzzy terms as its members with degrees specified on the corresponding edges. Any degree between them is assumed to be 0 if unspecified. For example, “furniture” is a fuzzy term taking “table”, “home appliance”, “audio” and “chair” as its
Fig. 5. Membership hierarchy of fuzzy terms.
members (or instances), and each of them is in turn a fuzzy term. So, “furniture” is a fuzzy set of second order. The instances of “home appliance”, i.e., “TV”, “radio” and “video”, are crisp ones. The corresponding membership functions are therefore given as furniture = {0.9/table, 0.8/home appliance, 0.91/audio, 0.9/chair}, home appliance = {0.97/TV, 0.8/radio, 0.92/video}, and so on. There may arise a question of how the hierarchical thesaurus shown in Fig. 5 can be constructed. We defer implementation issues, including the thesaurus construction, until Section 6.

To use the thesaurus for matching all the triples conceptually related with each other, we need a composed membership function for obtaining membership values between two terms indirectly connected in the hierarchical thesaurus. For example, we may want to know the conceptual degree of closeness between furniture and radio. Definition 5 provides a function for computing such a degree.

Definition 5 (Yang and Lee [16]). Let $F$ be a fuzzy term. Then

$$\mu_F(c) = \max_i\bigl(\min(\mu_{a_i}(c), \mu_F(a_i))\bigr), \quad \forall c \in U,$$

where $U$ is a set of terms in the thesaurus and $a_i$ is a fuzzy term in $F$. For a fuzzy term or a crisp term $F$, $\mu_F(F) = 1$.

Since $a_i$ in Definition 5 is a fuzzy term, this definition can also be applied to fuzzy terms separated by two or more levels. In other words, $\mu_{a_i}(c)$, $\forall c \in U$, can be calculated by the recursive application of Definition 5 even if $c$ is not directly connected with $a_i$.
Note that the degree of conceptual closeness between two terms not connected through any chain of edges cannot be obtained by Definition 5. For example, it does not define an edge of a certain degree between home appliance and audio. Though a similarity relation may be used to quantify such a degree, which would label the additional edge, we do not add such edges, so as not to compromise the simplicity of our thesaurus.

Example 1. The degree of conceptual closeness between furniture and radio is calculated by

$$\mu_{\text{furniture}}(\text{radio}) = \max\bigl(\min(\mu_{\text{home appliance}}(\text{radio}), \mu_{\text{furniture}}(\text{home appliance})),\; \min(\mu_{\text{audio}}(\text{radio}), \mu_{\text{furniture}}(\text{audio}))\bigr) = \max(\min(0.8, 0.8), \min(0.7, 0.91)) = 0.8.$$

Our thesaurus has the limitation that it cannot capture any composite concept formed by the aggregation of terms along with their spatial relationships. For example, a concept “study room” may be an aggregation of chairs, table and bookshelf, which have a specific spatial relationship with each other. The literature [17] shows a way to overcome the limitation by introducing another thesaurus called a triple thesaurus. It consists of rules to detect such a concept by using the combination of more than one triple, i.e., the logical connective of the triples. For example, “study room” may be defined by $\langle \text{chairs}, \text{table}, \text{around} \rangle \wedge \langle \text{chairs}, \text{bookshelf}, \text{above} \rangle$. However, since the approach cannot, in turn, support the fuzzy matching, it would require further research to remedy the limitation by involving the triple thesaurus in the fuzzy matching.
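The recursive max-min composition of Definition 5 and the calculation in Example 1 can be sketched as below. The dictionary encodes only the membership degrees quoted in the text (including the degree 0.95 of “working table” in “table” used in Example 3); the names `THESAURUS` and `mu` are hypothetical, and the sketch assumes the hierarchy is acyclic, as in Fig. 5.

```python
# Fuzzy terms mapped to their direct members and degrees, as quoted in the text.
THESAURUS = {
    "furniture": {"table": 0.9, "home appliance": 0.8, "audio": 0.91, "chair": 0.9},
    "home appliance": {"TV": 0.97, "radio": 0.8, "video": 0.92},
    "audio": {"radio": 0.7, "receiver": 0.9, "speaker": 0.86},
    "table": {"working table": 0.95},
}

def mu(fuzzy_term, c):
    """Composed membership of term c in fuzzy_term (max-min over all paths).

    Crisp terms have no entry in THESAURUS, so they only match themselves.
    Unspecified degrees are taken as 0. Assumes an acyclic hierarchy.
    """
    if fuzzy_term == c:
        return 1.0
    best = 0.0
    for member, degree in THESAURUS.get(fuzzy_term, {}).items():
        best = max(best, min(mu(member, c), degree))
    return best
```

Calling `mu("furniture", "radio")` follows both paths (through “home appliance” and through “audio”) and returns 0.8, in agreement with Example 1, whereas `mu("home appliance", "audio")` returns 0 because no chain of edges connects the two terms.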
We are now in a position to formally define our image retrieval system.

Definition 6. An image information retrieval system $\mathit{IIR}$ is defined as follows:

$$\mathit{IIR} = \langle \mathit{IDB}, Tr, M, In \rangle,$$

where $\mathit{IDB}$ is the set of all images, $Tr$ is the set of all $T_p$ for $p \in \mathit{IDB}$, $M$ is a set of fuzzy membership functions including the fuzzy term thesaurus and k-weight functions, and $In$ is an inverted file.

5. Query evaluation

Evaluation of user queries for retrieving images involves translating them into the equivalent query triples and then matching the triples with those stored in the inverted file. Two kinds of matching are performed in this process: inexact matching and concept-based matching. Inexact matching between the relative direction specified in each of the query triples and those of the stored triples is performed by the k-weight functions, whereas concept-based matching is made between terms. For a successful concept-based matching, fuzzy terms in the query triples need to be well defined. In this paper, we assume that the queries are formulated in terms of controlled fuzzy terms, i.e., fuzzy terms in the thesaurus. The following definitions provide primitives for formulating queries.

Definition 7. A fuzzy triple constitutes a mono query $Qm$.

Definition 8. The disjunction of mono queries $Qm_j$, $j = 1, \ldots, s$, is called a disjunctive query factor $Q$. It is given by

$$Q = \bigvee_{j=1}^{s} Qm_j.$$

Definition 9. Let $Q_i$, $i = 1, \ldots, n$, be disjunctive query factors. Then a conjunctive normal query $Q$ (or simply, query) is defined as follows:

$$Q = \bigwedge_{i=1}^{n} Q_i.$$
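Definition 10. $\mathit{IIR} \vdash_\alpha Q(p)$ denotes that an image $p$ in $\mathit{IIR}$ satisfies $Q$ with a degree $\alpha \in [0, 1]$. Additionally, $\mathit{IIR} \nvdash_\alpha Q(p)$ if $\mathit{IIR} \vdash_\alpha Q(p)$ does not hold.

Definition 11. $Q = \{\langle p, \alpha \rangle \mid \mathit{IIR} \vdash_\alpha Q(p) \text{ and } \alpha \ge \alpha' \text{ for all } \mathit{IIR} \vdash_{\alpha'} Q(p)\}$ is a set of images satisfying $Q$ at least at a level $\alpha$.

Since an image $p$ is indexed by triples in $T_p$, $\mathit{IIR} \vdash_\alpha Q(p)$ involves matching of the triples with those of queries. The following two definitions are needed to specify the degree of the triple matching given in Definition 14.

Definition 12. For two terms $n, n' \in N$,

$$n =_\alpha n' \quad \text{iff} \quad \mu_n(n') = \alpha \text{ or } \mu_{n'}(n) = \alpha, \quad 0 < \alpha \le 1,$$

$$n \ne n' \quad \text{if both } \mu_n(n') = 0 \text{ and } \mu_{n'}(n) = 0.$$

Note that $n =_1 n'$ if $n = n'$, since $\mu_n(n') = 1$ or $\mu_{n'}(n) = 1$.

Definition 13. Let $t = \langle n_1, n_2, r_{12} \rangle$ and $t' = \langle n_1', n_2', r_{12}' \rangle$ be triples satisfying $n_1 =_{\alpha_1} n_1'$, $n_2 =_{\alpha_2} n_2'$, and $r_{12} =_{\alpha_3} r_{12}'$. Then

$$t =_\alpha t' \quad \text{iff} \quad \min(\alpha_1, \alpha_2, \alpha_3) = \alpha.$$

Definition 14. Let $Qm$ be a mono query and $t' \in T_p$. Then

$$\langle p, \alpha \rangle \in Qm \quad \text{iff} \quad Qm =_\alpha t'.$$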
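Example 2. Let a query be given as “search for images where an audio is located between the east and northeast sides of furniture at an angle of $20^\circ$”. Then it is converted into $Qm_1 = \langle \text{furniture}, \text{audio}, 20^\circ \rangle$. Now, for the stored triple $t' = \langle f_{\mathrm{NAME}}(r) = \text{radio}, f_{\mathrm{NAME}}(s) = \text{speaker}, \text{northeast} \rangle$ in Fig. 2, since

$$\mu_{\text{furniture}}(\text{radio}) = 0.8 \quad \text{(see Example 1)},$$

$$\mu_{\text{audio}}(\text{speaker}) = 0.86 \quad \text{(see Fig. 5)}$$

and

$$\mu_{\text{northeast}}(20^\circ) = \frac{1}{2025^2}\,\{45^2 - (20 - 45)^2\}^2 = 0.48,$$

we get $\langle p_1, 0.48 \rangle \in Qm_1$ with $\alpha = \min(0.8, 0.86, 0.48)$.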
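The triple matching of Definitions 12 to 14 and the numbers of Example 2 can be reproduced with the following Python sketch. Several points are assumptions made for illustration only: term memberships are read from a small table holding the two degrees quoted in Example 2, the term degree of Definition 12 is taken as the larger of the two memberships, two different direction descriptors (or two different angles) are given degree 0, and all function names are hypothetical.

```python
# Term memberships quoted in Example 2; pairs not listed default to 0.
MU = {("furniture", "radio"): 0.8, ("audio", "speaker"): 0.86}

# Centre angle of each direction descriptor.
CENTRES = {"east": 0, "northeast": 45, "north": 90, "northwest": 135,
           "west": 180, "southwest": 225, "south": 270, "southeast": 315}

def mu(fuzzy_term, c):
    return 1.0 if fuzzy_term == c else MU.get((fuzzy_term, c), 0.0)

def direction_membership(direction, angle, k=2):
    """k-weight membership with C = 1/2025^k; circular distance handles east."""
    d = abs(angle % 360 - CENTRES[direction])
    d = min(d, 360 - d)
    return 0.0 if d > 45 else ((45 ** 2 - d ** 2) / 45 ** 2) ** k

def term_degree(n, n_prime):
    """Definition 12, taking the larger membership (an assumption)."""
    return max(mu(n, n_prime), mu(n_prime, n))

def direction_degree(r, r_prime):
    """Degree between two third components (descriptor or numeric angle)."""
    if r == r_prime:
        return 1.0
    if isinstance(r, str) and not isinstance(r_prime, str):
        return direction_membership(r, r_prime)
    if isinstance(r_prime, str) and not isinstance(r, str):
        return direction_membership(r_prime, r)
    return 0.0  # assumption: distinct descriptors or distinct angles do not match

def match(query, stored):
    """Definition 13: minimum of the two term degrees and the direction degree."""
    (n1, n2, r), (m1, m2, s) = query, stored
    return min(term_degree(n1, m1), term_degree(n2, m2), direction_degree(r, s))
```

With these assumptions, `match(("furniture", "audio", 20), ("radio", "speaker", "northeast"))` is about 0.478, i.e. 0.48 after rounding, as in Example 2. The sketch treats the angle and the descriptor symmetrically, so swapping which side carries the angle gives the same degree.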
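Unlike Example 2, the third component of the query triple may be specified as an element of $D$ to match a stored triple containing a numeric value as its relative direction. For example, the two triples may be given as $Qm_1' = \langle \text{furniture}, \text{audio}, \text{northeast} \rangle$ and $t'' = \langle \text{radio}, \text{speaker}, 20^\circ \rangle$, respectively. Apparently, the semantic interpretation of inexact matching between $Qm_1'$ and $t''$ is not the same as that between $Qm_1$ and the stored triple $t'$ of Example 2. For example, $t''$ could be a more concrete answer of $Qm_1'$ than $t'$ could be of $Qm_1$. For simplicity, Definition 14 assumes that evaluating this query yields the same result, i.e., $\langle p_1, 0.48 \rangle \in Qm_1'$, in spite of its semantic difference.

We now develop a query evaluation for a conjunctive normal query $Q$. For implementing $\vee/\wedge$ in $Q$, various T-conorm and T-norm operators are available, including max/min, probabilistic sum/product and bounded sum/product [9]. Among them, we adopt the standard max/min operator for conceptual simplicity.

Lemma 1. Let $Q = \bigvee_{i=1}^{s} Qm_i$ be a disjunctive query. Then for all $p \in \mathit{IDB}$,

$$\langle p, \alpha_1 \rangle \in Qm_1 \vee \langle p, \alpha_2 \rangle \in Qm_2 \vee \cdots \vee \langle p, \alpha_s \rangle \in Qm_s \;\Rightarrow\; \langle p, \alpha \rangle \in Q,$$

where $\alpha = \max(\alpha_1, \alpha_2, \ldots, \alpha_s) > 0$.

Proof. Suppose that $\langle p, \alpha_1 \rangle \in Qm_1 \vee \langle p, \alpha_2 \rangle \in Qm_2 \vee \cdots \vee \langle p, \alpha_s \rangle \in Qm_s$. Then $\mathit{IIR} \vdash_{\alpha_1} Qm_1(p) \vee \mathit{IIR} \vdash_{\alpha_2} Qm_2(p) \vee \cdots \vee \mathit{IIR} \vdash_{\alpha_s} Qm_s(p)$, satisfying $\alpha_i \ge \alpha_i'$ for all $\mathit{IIR} \vdash_{\alpha_i'} Qm_i(p)$, $i = 1, \ldots, s$, by Definition 11. Consider the case where exactly one of $\mathit{IIR} \vdash_{\alpha_i} Qm_i(p)$, $i = 1, 2, \ldots, s$, say $\mathit{IIR} \vdash_{\alpha_k} Qm_k(p)$, $1 \le k \le s$, holds. Then, obviously, $\langle p, \alpha_k \rangle \in Q$ with $\alpha_k = \max(\alpha_1, \alpha_2, \ldots, \alpha_k, \ldots, \alpha_s)$, since $\alpha_k > 0$ and $\alpha_i = 0$ for all $1 \le i \ne k \le s$. Consider next the other possible case that more than one of $\mathit{IIR} \vdash_{\alpha_i} Qm_i(p)$, $i = 1, 2, \ldots, s$, hold. This case may be generalized as

$$\mathit{IIR} \vdash_{\alpha_{i_1}} Qm_{i_1}(p) \wedge \mathit{IIR} \vdash_{\alpha_{i_2}} Qm_{i_2}(p) \wedge \cdots \wedge \mathit{IIR} \vdash_{\alpha_{i_k}} Qm_{i_k}(p) \qquad (1)$$

with $\alpha_{i_k} > 0$ and $\mathit{IIR} \vdash_{\alpha_i} Qm_i(p)$, $\alpha_i = 0$ for $i \ne i_k$, $1 \le i_1, i_2, \ldots, i_k \le s$. Note that if all $\alpha_i > 0$ for $1 \le i \le s$, (1) would be reduced to $\mathit{IIR} \vdash_{\alpha_1} Qm_1(p) \wedge \mathit{IIR} \vdash_{\alpha_2} Qm_2(p) \wedge \cdots \wedge \mathit{IIR} \vdash_{\alpha_s} Qm_s(p)$. Now that $Q$ is the disjunction of $Qm_i$, $i = 1, 2, \ldots, s$, (1) can be rephrased as $\mathit{IIR} \vdash_{\alpha_{i_1}} Q(p) \wedge \mathit{IIR} \vdash_{\alpha_{i_2}} Q(p) \wedge \cdots \wedge \mathit{IIR} \vdash_{\alpha_{i_k}} Q(p)$. Hence, by Definition 11, we conclude that $\langle p, \alpha \rangle \in Q$ with $\alpha = \max(\alpha_{i_1}, \alpha_{i_2}, \ldots, \alpha_{i_k})$. Since one of the two cases always applies to $p \in \mathit{IDB}$ and $\langle p, \alpha \rangle \in Q$ with $\alpha = \max(\alpha_1, \alpha_2, \ldots, \alpha_s)$ holds in both cases, the lemma is proved.

Example 3. Let the query $Q_1$ be “search for images where an audio is exactly on the furniture or between the east and northeast sides of furniture at $20^\circ$”. Then $Q_1 = Qm_1 \vee Qm_2$, where $Qm_2 = \langle \text{furniture}, \text{audio}, \text{north} \rangle$. Now, the answer is $\langle p_1, 0.86 \rangle \in Q_1$, since $\langle p_1, 0.48 \rangle \in Qm_1$ (see Example 2) and $\langle p_1, 0.86 \rangle \in Qm_2$ from $\mu_{\text{furniture}}(\text{working table}) = \min\{0.95, 0.9\} = 0.9$ and $\mu_{\text{audio}}(\text{speaker}) = 0.86$.

Lemma 2. Let $Q = \bigwedge_{i=1}^{n} Q_i$ be a conjunctive query. Then for all $p \in \mathit{IDB}$,

$$\langle p, \alpha_1 \rangle \in Q_1 \wedge \langle p, \alpha_2 \rangle \in Q_2 \wedge \cdots \wedge \langle p, \alpha_n \rangle \in Q_n \;\Rightarrow\; \langle p, \alpha \rangle \in Q,$$

where $\alpha = \min(\alpha_1, \alpha_2, \ldots, \alpha_n)$.

Proof. Suppose that $\langle p, \alpha_1 \rangle \in Q_1 \wedge \langle p, \alpha_2 \rangle \in Q_2 \wedge \cdots \wedge \langle p, \alpha_n \rangle \in Q_n$. Then $\mathit{IIR} \vdash_{\alpha_1} Q_1(p) \wedge \mathit{IIR} \vdash_{\alpha_2} Q_2(p) \wedge \cdots \wedge \mathit{IIR} \vdash_{\alpha_n} Q_n(p)$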
satisfying $\alpha_i \ge \alpha_i'$ for all $\mathit{IIR} \vdash_{\alpha_i'} Q_i(p)$, $i = 1, \ldots, n$. So, we get $\mathit{IIR} \vdash_\alpha Q_1(p) \wedge Q_2(p) \wedge \cdots \wedge Q_n(p) \Leftrightarrow \mathit{IIR} \vdash_\alpha Q(p)$ with $\alpha = \min(\alpha_1, \alpha_2, \ldots, \alpha_n)$. According to Definition 11, for proving $\langle p, \alpha \rangle \in Q$, it is sufficient to show that $\mathit{IIR} \vdash_{\alpha'} Q(p)$ for $\alpha' > \alpha$ is not possible. If $\alpha' > \alpha$, there would exist $\alpha_k$ such that $\alpha' > \alpha_k$, $1 \le k \le n$, and $\mathit{IIR} \vdash_{\alpha'} Q_k(p)$, since $\alpha = \min(\alpha_1, \alpha_2, \ldots, \alpha_n)$. But $\mathit{IIR} \nvdash_{\alpha'} Q_k(p)$ by Definition 11. Hence, $\mathit{IIR} \nvdash_{\alpha'} Q(p)$, which proves this lemma.

Theorem 1. Let $Q(p) = \bigwedge_{i=1}^{n} Q_i(p)$ and $Q_i(p) = \bigvee_{j=1}^{s_i} Qm_{ij}(p)$, for $p \in \mathit{IDB}$, be a query. Then

$$\langle p, \alpha \rangle \in Q \quad \text{with} \quad \alpha = \min(\max(\alpha_{11}, \ldots, \alpha_{1s_1}), \ldots, \max(\alpha_{i1}, \ldots, \alpha_{is_i}), \ldots, \max(\alpha_{n1}, \ldots, \alpha_{ns_n}))$$

such that for $i$, $1 \le i \le n$, $\langle p, \alpha_{ij} \rangle \in Qm_{ij}$, $j = 1, \ldots, s_i$.

Proof. This theorem can be easily proved by Lemmas 1 and 2. We omit its proof.

Example 4. Let the query $Q$ be $Q_1 \wedge Q_2$, where $Q_2 = \langle \text{furniture}, \text{clock}, \text{north} \rangle$. Then, since $\langle p_1, 0.86 \rangle \in Q_1$ (see Example 3) and $\langle p_1, 0.8 \rangle \in Q_2$, we get $\langle p_1, 0.8 \rangle \in Q$.

6. Prototype implementation

In this section, we provide a prototype system to demonstrate the feasibility of our model. It was implemented on top of Sun Solaris 2.5.1 by using C++ and X/Motif. It consists of four modules: thesaurus editor, fuzzy term matcher, image indexer and fuzzy triple query processor. The thesaurus editor is used for facilitating thesaurus construction. For example, the fuzzy term hierarchy in Fig. 6 can be directly constructed with this editor. With this term thesaurus, the fuzzy term matcher performs a concept-based match between fuzzy terms in a query and their counterparts in fuzzy triples of
target images. It also calculates the conceptual distance between them according to Definition 5 and the associated k-weight functions in Definition 4. Recall that the queries are assumed to use fuzzy terms in the thesaurus. The same is true of the fuzzy triples, which are generated by the image indexer. Fig. 7 shows the image indexer manually labeling objects in an image. The reason we perform manual labeling is to avoid compromising the recall and precision of our method due to the poor object recognition ratio of state-of-the-art image processing technologies. Unfortunately, it is widely believed that object labeling without human intervention will not become feasible in the near future. In Fig. 7, a dialog box is displayed awaiting a domain knowledge engineer to enter the name of a pointed object in an image whenever he or she draws a minimum bounding rectangle (MBR) surrounding the object. Once the needed names are entered, for example, “television” and “table”, the corresponding fuzzy triples are automatically generated. The triple viewer window shows them. A spatial relationship between a pair of objects is specified by either an angle degree or one of the eight directions. It is calculated by drawing lines between the centers of their MBRs. In labeling objects, a term mismatch problem may arise because our thesaurus lacks relationships other than BT/NT. For example, what if a user tries to retrieve images containing “TV” when the objects were labeled with the term “television”? It is left as further research to cover relationships such as “synonym of” and “part of” without compromising the simplicity of our framework. Finally, Fig. 8 is the interface of the fuzzy triple query processor showing the answer images when the evaluated query is $\langle \text{home appliance}, \text{furniture}, \text{northeast} \rangle \vee \langle \text{audio}, \text{furniture}, 82^\circ \rangle$. It shows images where “TV” is captured as a home appliance, “speaker” as an audio, “table” as a furniture, etc.
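Once the degree of every mono query is known, a query such as the disjunction above is scored by the min-of-max rule of Theorem 1. The following Python sketch shows that step in isolation; the function names, the threshold parameter and the degrees of the second image `p2` are assumptions for illustration, while the degrees for `p1` are those of Examples 2 to 4.

```python
def evaluate_query(factor_degrees):
    """Theorem 1: min over conjunctive factors of the max over their mono queries.

    factor_degrees[i][j] is the degree alpha_ij to which an image satisfies
    mono query Qm_ij of disjunctive factor Q_i.
    """
    if not factor_degrees:
        return 0.0
    return min(max(alphas, default=0.0) for alphas in factor_degrees)

def rank(images, threshold=0.0):
    """Score every image, drop degree 0 and degrees below the threshold, sort."""
    scored = [(image_id, evaluate_query(d)) for image_id, d in images.items()]
    kept = [(i, a) for i, a in scored if a > 0.0 and a >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)

# p1: Q1 = Qm1 v Qm2 with degrees 0.48 and 0.86, and Q2 with degree 0.8.
# p2: made-up degrees, present only to show the threshold at work.
answers = rank({"p1": [[0.48, 0.86], [0.8]],
                "p2": [[0.3, 0.0], [0.9]]}, threshold=0.5)
```

Here `evaluate_query([[0.48, 0.86], [0.8]])` gives 0.8, the degree obtained in Example 4, and `answers` keeps only `p1` because the hypothetical `p2` scores 0.3, below the threshold of 0.5.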
As mentioned earlier, the fuzzy term matcher is repeatedly called by this query processor whenever any fuzzy term matching is needed in the query triples, for example, when “home appliance” needs to match any other term, say, TV, with the evaluation of their conceptual closeness. Practically, such a term matching may imply the reformulation of queries. For each reformulated query, the query processor would search the inverted files to get
Fig. 6. Term thesaurus.
Fig. 7. Image indexer.
answers, assigning appropriate ranking according to Theorem 1.

7. Conclusions

In this paper, we proposed a new data type called fuzzy triple capable of matching images based on
concepts as well as indexing them. The k-weight function was introduced for the precise specification of spatial relationships with angle degrees. We developed a way of query evaluation supporting such a concept-based match and the functions. To demonstrate the feasibility of our model, a prototype system was also provided. The contribution of this paper is twofold.
Fig. 8. Fuzzy triple query processor.
One is to show a systematic way of incorporating a concept-based match facility into the conventional triple framework, making its spatial relationship flexible. The other is to provide a formal specification of a query evaluation capable of greatly enhancing recall in comparison with the other ones based on the triple framework. Since the fuzzy triple is a powerful yet conceptually simple framework in its matching capability, we expect that it could be easily applied as well to other content-based image retrieval systems which do not use triples. For example, it may serve to enhance the functionality of systems such as QBIC [6] or Photobook [13], if it is used to filter out images conceptually not relevant to user queries before retrieving images based on colors, shapes and texture, or vice versa.
Acknowledgements Thanks to Dr. S.S. Chang and anonymous referees for their comments. Especially, the author is deeply grateful to one of the referees for his great contribution to this paper.
References
[1] C.C. Chang, Spatial match retrieval of symbolic pictures, J. Inform. Sci. Eng. 7 (1991) 405–422.
[2] C.C. Chang, S.Y. Lee, Retrieval of similar pictures on pictorial databases, Pattern Recognition 24 (1991) 675–680.
[3] S.K. Chang, Iconic indexing by 2D strings, in: S.K. Chang, E. Jungert, G. Tortora (Eds.), Intelligent Image Database Systems, Series on Software Engineering and Knowledge Engineering, Vol. 5, World Scientific, Singapore, 1996.
[4] S.K. Chang, C.W. Yan, D.C. Dimitrof, T. Arndt, An intelligent image database systems, IEEE Trans. Software Eng. 14 (5) (1988) 681–688.
[5] C.R. Cook, R. Oldehoeft, A letter-oriented minimal perfect hashing function, ACM SIGPLAN Notices 17 (1982) 18–27.
[6] M. Flickner et al., Query by image and video content: the QBIC system, Comput. J. 28 (9) (1995) 23–32.
[7] E. Jungert, S.-K. Chang, An image algebra for pictorial data manipulation, CVGIP: Image Understanding 58 (2) (1993) 147–160.
[8] W. Kim, A model of queries for object-oriented databases, Proceedings of International Conference on Very Large Data Bases, August, 1989.
[9] G.J. Klir, Fuzzy Sets and Fuzzy Logic: Theory and Applications, Prentice-Hall, Englewood Cliffs, NJ, 1995.
[10] S.Y. Lee, F.J. Hsu, Spatial reasoning and similarity retrieval of images using 2D C-string knowledge representation, Pattern Recognition 25 (3) (1992) 305–318.
[11] S.Y. Lee, F.J. Hsu, 2D C-string: a new spatial knowledge representation for image database systems, Pattern Recognition 23 (1990) 1077–1088.
[12] D. Papadias, T. Sellis, The semantics of relations in 2D space using representative points: spatial indexes, European Conference on Spatial Information Theory, 1993.
[13] A. Pentland, R.W. Picard, S. Sclaroff, Photobook: tools for content-based manipulation of image databases, Internat. J. Comput. Vision (1996).
[14] M.P. Wand, M.C. Jones, Kernel Smoothing, Chapman & Hall, London, 1995.
[15] J.D. Yang, FMP: a fuzzy match framework for rule-based programming, Data Knowledge Eng. 24 (1997) 183–203.
[16] J.D. Yang, D.G. Lee, Incorporating concept-based match into fuzzy production rules, Inform. Sci. 104 (1998) 213–239.
[17] J.D. Yang, H.J. Yang, A formal framework for image indexing with triples: toward a concept-based image retrieval, Internat. J. Intell. Systems 14 (6) (1998) (http://jiri.chonbuk.ac.kr/~jdyang) 603–622.