Fuzzy Sets and Systems 117 (2001) 195–201
www.elsevier.com/locate/fss
On extended fuzzy relational database model with proximity relations
Supriya Kumar De, Ranjit Biswas ∗, Akhil Ranjan Roy
Department of Mathematics, Indian Institute of Technology, Kharagpur – 721302, West Bengal, India
Received November 1997; received in revised form October 1998
Abstract
In this paper, Shenoi and Melton’s model of fuzzy relational database is considered. The proximity relation, being an important tool of the model, is characterized theoretically. An optimal $\alpha$-distribution in [0, 1] and, consequently, the domain partition are defined and studied with examples on fuzzy queries. © 2001 Elsevier Science B.V. All rights reserved.
Keywords: Fuzzy relational database; Proximity relation; p-partition; Optimal partition; Node point; Lower tolerance (lt); Upper tolerance (ut)
1. Introduction
The organization of relational databases introduced by Codd [7] is based on set theory and the theory of relations. Zadeh’s fuzzy set theory is a generalization of crisp set theory, whereas the concept of fuzzy relation generalizes the concept of relation. Consequently, the fuzzy relational model of Buckles and Petry [1], which incorporates fuzzy information in the relational database and is called a fuzzy relational database, is a generalization of the classical relational database. A fuzzy relational database is defined as a set of relations where each relation is a set of tuples. If $t_i$ represents the $i$th tuple, it has the form $(d_{i1}, d_{i2}, \ldots, d_{im})$. In an ordinary database, each component $d_{ij}$ of the tuple is an element of the corresponding scalar (or discrete finite) domain $D_j$, i.e., $d_{ij} \in D_j$. The key departure of the fuzzy model is that tuple components are not confined to single elements drawn from the scalar domain; instead, $d_{ij} \subseteq D_j$ ($d_{ij} \neq \emptyset$).
∗ Corresponding author. E-mail address:
[email protected] (R. Biswas).
An example of a finite scalar domain is {poor, average, good, excellent}. The domain values of a particular tuple may be single scalars or numbers (including null) or a sequence of scalars or numbers. For example:

STUDENT   APTITUDE          AGE
{Tom}     {Average, Good}   {19, 20, 21}

The fuzzy relational model of Buckles and Petry [1] is based on the notion of similarity relation [19] for each domain in the fuzzy database. But there is a drawback in using similarity relation in fuzzy relational databases as pointed out in [16] by Shenoi and Melton. Considering all the “nice” properties of Buckles and Petry’s model, Shenoi and Melton generalized the fuzzy relational model of Buckles and Petry replacing similarity relation with proximity relation. In the new model of fuzzy relational database by Shenoi and Melton, partitioning the scalar domain
0165-0114/01/$ - see front matter © 2001 Elsevier Science B.V. All rights reserved. PII: S0165-0114(98)00438-2
into equivalence classes or disjoint clusters is made with the help of the proximity relation. The partitioning of domains is an important job because it helps to define the notion of redundancy, to maintain the consistency of representation of relations and to preserve the well-definedness of the relational algebra. In the present paper, we consider Shenoi and Melton’s model of fuzzy relational database. Since the proximity relation is an important tool in Shenoi and Melton’s fuzzy database, we study the proximity relation in more detail in Section 3. In Sections 4 and 5, we analyse Shenoi and Melton’s model, focusing our main attention on level values (belonging to [0, 1]) while making fuzzy queries on a fuzzy database.

2. Preliminaries
We present here some basic preliminaries needed for the rest of the paper. A fuzzy database relation, $R$, is a subset of the cross product
$$2^{D_1} \times 2^{D_2} \times \cdots \times 2^{D_m},$$
where $2^{D_j}$ here denotes the set of all non-empty subsets of $D_j$, i.e., the power set of $D_j$ with $\emptyset$ removed.
Let $R \subseteq 2^{D_1} \times 2^{D_2} \times \cdots \times 2^{D_m}$ be a fuzzy database relation. A fuzzy tuple $t$ (with respect to $R$) is an element of $R$. Let $t_i = (d_{i1}, d_{i2}, \ldots, d_{im})$ be a fuzzy tuple. An interpretation of $t_i$ is a tuple $(a_1, a_2, \ldots, a_m)$ where $a_j \in d_{ij}$ for each domain $D_j$. Two tuples $t_i$ and $t_j$ are redundant if and only if they possess the identical interpretation.

Lemma 2.1 (Buckles and Petry [1]). Let $R \subseteq 2^{D_1} \times 2^{D_2} \times \cdots \times 2^{D_m}$ be a fuzzy database relation generated by merging tuples according to the “level” constraints on similarity relations. If $T_i$ represents the set of interpretations of a fuzzy tuple $t_i$, then
$$T_i \cap T_j = \emptyset \quad \text{whenever } t_i \neq t_j.$$
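As a minimal sketch (not part of the original model description), the interpretations of a fuzzy tuple can be enumerated as the Cartesian product of its components. The function name `interpretations` and the encoding of a fuzzy tuple as a Python tuple of sets are assumptions made for illustration; the first sample tuple is the STUDENT/APTITUDE/AGE example given earlier, and the second one is hypothetical.

```python
from itertools import product

def interpretations(fuzzy_tuple):
    """Return every interpretation (a_1, ..., a_m) with a_j drawn from d_ij."""
    # Each component d_ij is a non-empty set; an interpretation picks one
    # element from each component, so the result is a Cartesian product.
    return set(product(*fuzzy_tuple))

# The example tuple from the text: ({Tom}, {Average, Good}, {19, 20, 21}).
student = ({"Tom"}, {"Average", "Good"}, {19, 20, 21})
student_interpretations = interpretations(student)

# A second, hypothetical tuple (not from the paper). Its STUDENT component is
# disjoint from the first one, so the two tuples share no interpretation.
other = ({"Ann"}, {"Good"}, {20})
shared = student_interpretations & interpretations(other)
```

The first tuple has 1 × 2 × 3 interpretations, and the two interpretation sets are disjoint, which is the situation described by Lemma 2.1.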
Definition 2.1 (Buckles and Petry [1]). If $s_j$ is a proximity relation on $D_j$, then given an $\alpha \in [0, 1]$, two elements $x, z \in D_j$ are $\alpha$-similar (denoted by $x S_\alpha z$) if and only if $s_j(x, z) \geq \alpha$, and are said to be $\alpha$-proximate (denoted by $x S_\alpha^+ z$) if and only if either $x S_\alpha z$ or there exists a sequence $y_1, y_2, \ldots, y_r \in D_j$ such that
$$x S_\alpha y_1 S_\alpha y_2 S_\alpha \cdots S_\alpha y_r S_\alpha z.$$

Lemma 2.2 (Shenoi and Melton [16]). If $s : D \times D \to [0, 1]$ is a proximity relation, then $S_\alpha^+$ is an equivalence relation, i.e., it partitions $D$, for any $\alpha \in [0, 1]$. Moreover, if $s : D \times D \to [0, 1]$ is a similarity relation, then $S_\alpha$ is an equivalence relation for any $\alpha \in [0, 1]$.

Lemma 2.3 (Shenoi and Melton [16]). Let $s : D \times D \to [0, 1]$ be a similarity relation, and let $\alpha \in [0, 1]$ be fixed. $C \subseteq D$ is an equivalence class in the partition determined by $S_\alpha$ with respect to $s$ if and only if $C$ is a maximal subset obtained by merging elements from $D$ that satisfy the constraint
$$\min_{x, y \in C} [s(x, y)] \geq \alpha.$$
Lemma 2.4 (Shenoi and Melton [16]). If $s : D \times D \to [0, 1]$ is a similarity relation, then for any $\alpha \in [0, 1]$, $S_\alpha$ and $S_\alpha^+$ generate identical equivalence classes, i.e., generate equal partitions of $D$.

Definition 2.2 (Dubois and Prade [8]). The max–min composition of a proximity relation $s$ on a scalar domain $D$ is denoted by $s^2$ and is given by
$$s^2(x, z) = \max_{y \in D} \{\min[s(x, y), s(y, z)]\}.$$

Lemma 2.5 (Shenoi and Melton [16]). If $s : D \times D \to [0, 1]$ is a proximity relation, then the equivalence classes (disjoint partition) obtained on $D$ with $S_\alpha^+$ are identical to the equivalence classes (disjoint partition) obtained on $D$ with $S_\alpha$ on $s^{\infty} : D \times D \to [0, 1]$, where $s^{\infty}$ denotes the max–min transitive closure of $s$.
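To make the difference between $\alpha$-similar and $\alpha$-proximate concrete, the following sketch builds the classes of $S_\alpha^+$ by following chains of $\alpha$-similar elements. The three-element domain and its proximity values are hypothetical, and the helper names (`make_proximity`, `alpha_proximate_classes`) are illustrative rather than taken from [16].

```python
from collections import deque

def make_proximity(pairs):
    """Build s(x, z) from unordered pairs; s(x, x) = 1 by reflexivity."""
    table = {frozenset(pair): value for pair, value in pairs.items()}

    def s(x, z):
        return 1.0 if x == z else table[frozenset((x, z))]

    return s

def alpha_proximate_classes(domain, s, alpha):
    """Group elements linked by a chain of alpha-similar neighbours."""
    unseen = set(domain)
    classes = set()
    while unseen:
        start = unseen.pop()
        current, queue = {start}, deque([start])
        while queue:
            x = queue.popleft()
            # Every still-unseen element that is alpha-similar to x joins the class.
            linked = {z for z in unseen if s(x, z) >= alpha}
            unseen -= linked
            current |= linked
            queue.extend(linked)
        classes.add(frozenset(current))
    return classes

# Hypothetical relation: a~b and b~c are strong, a~c is weak.
domain = ["a", "b", "c"]
s = make_proximity({("a", "b"): 0.8, ("b", "c"): 0.8, ("a", "c"): 0.3})

directly_similar = s("a", "c") >= 0.7          # a and c are not 0.7-similar
classes_07 = alpha_proximate_classes(domain, s, 0.7)
classes_09 = alpha_proximate_classes(domain, s, 0.9)
```

Here `a` and `c` are not 0.7-similar, yet they land in the same class at level 0.7 through `b`, which is why $S_\alpha^+$ rather than $S_\alpha$ is the relation that partitions the domain.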
3. On proximity relations
We see that the proximity relation is an important tool for Shenoi and Melton’s fuzzy database. In this section we make some characterizations of proximity relations. A fuzzy relation on a domain $D$ is a mapping $f : D \times D \to [0, 1]$. Let $f$ and $g$ be two fuzzy relations on a domain $D$. The max–min composition $f \circ g$ is a fuzzy relation on $D$ defined as
$$(f \circ g)(x, z) = \bigvee_{y \in D} \{ f(x, y) \wedge g(y, z) \},$$
where $\vee$ and $\wedge$ denote maximum and minimum, respectively.
This induces a new type of relation between the elements $x$ and $z$ with the help of two given relations.
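A minimal sketch of the max–min composition for relations on a finite domain, stored as square lists of lists. It also iterates the composition to accumulate $f \cup f^2 \cup \cdots$, which is the transitive closure used in this section. The matrix `f` is a hypothetical three-element example and the function names are illustrative.

```python
def compose(f, g):
    """Max-min composition: (f o g)(x, z) = max over y of min(f(x, y), g(y, z))."""
    n = len(f)
    return [[max(min(f[i][k], g[k][j]) for k in range(n)) for j in range(n)]
            for i in range(n)]

def union(f, g):
    """Pointwise maximum, i.e. the union of two fuzzy relations."""
    return [[max(a, b) for a, b in zip(row_f, row_g)]
            for row_f, row_g in zip(f, g)]

def transitive_closure(f):
    """Accumulate f, f^2, f^3, ... until the union stops changing."""
    closure = f
    while True:
        extended = union(closure, compose(closure, f))
        if extended == closure:
            return closure
        closure = extended

# Hypothetical reflexive, symmetric relation on three elements.
f = [[1.0, 0.8, 0.3],
     [0.8, 1.0, 0.6],
     [0.3, 0.6, 1.0]]

f2 = compose(f, f)
closure = transitive_closure(f)
```

In this example the weak direct link 0.3 is raised to 0.6 by the two-step path through the middle element, and composing the closure with itself leaves it unchanged, so it is transitive.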
Definition 3.1. A fuzzy relation $f$ on $D$ is said to be transitive if $f^2 \subseteq f$, where $f^2 = f \circ f$.

Definition 3.2. The transitive closure of the fuzzy relation $f$ is the relation $\hat{f}$ defined by
$$\hat{f} = \bigcup_{i=1}^{\infty} f^i.$$

Definition 3.3. A fuzzy relation $f$ on $D$ is said to be a proximity relation if, for all $x, y \in D$,
(i) $f(x, x) = 1$ (reflexivity),
(ii) $f(x, y) = f(y, x)$ (symmetry).

Definition 3.4. A fuzzy relation $f$ on $D$ is said to be a fuzzy equivalence relation if $f$ is reflexive, symmetric and transitive.

Theorem 3.1. If $p$ is a fuzzy proximity relation on $D$ and $e$ is a fuzzy equivalence relation on $D$ such that $p \subseteq e$, then $\hat{p} \subseteq e$, where $\hat{p}$ is the transitive closure of $p$.
Proof. $\hat{p} = \bigcup_{n=1}^{\infty} p^n \subseteq \bigcup_{n=1}^{\infty} e^n$, or $\hat{p} \subseteq \hat{e} = e$.

Theorem 3.2. If $p_i$, $i = 1, 2, \ldots, n$, are proximity relations on a domain $D$, then
$$\bigcup_{i=1}^{n} p_i, \qquad \bigcap_{i=1}^{n} p_i, \qquad \hat{p}_i \ (\forall i)$$
are also proximity relations on $D$.
Proof. Straightforward.

Theorem 3.3. The transitive closure $\hat{p}$ of a proximity relation $p$ is the minimal fuzzy equivalence relation containing $p$.
Proof. Clearly $\hat{p}$ is a proximity relation. Also, $\hat{p}$ is transitive. Thus $\hat{p}$ is a fuzzy equivalence relation containing $p$. The minimality follows from Theorem 3.1.

4. The optimal p-partition and the corresponding domain partitions
Let $D$ be a finite domain, and $p$ be a proximity relation on $D$. Arrange the relational values $p(x, y)$ in ascending order of magnitude as below:
$$m_1 < m_2 < \cdots < m_{k-1} < m_k = 1.$$
The subintervals $(m_r, m_{r+1}]$, together with $[0, m_1]$, are called p-partition subintervals and are denoted by
$$I_1 = \begin{cases} [0, m_1] & \text{if } m_1 \neq 0, \\ \{0\} & \text{if } m_1 = 0, \end{cases}$$
$$I_2 = (m_1, m_2], \quad I_3 = (m_2, m_3], \quad \ldots, \quad I_k = (m_{k-1}, m_k].$$
The $k$ cluster distributions (partitions of $D$) due to the equivalence relation $p_\alpha^+$, one for each value of $\alpha$, are to be calculated; they are as below:
(1) for $\alpha \in I_1$, the partition of $D$ is $P_1$,
(2) for $\alpha \in I_2$, the partition of $D$ is $P_2$,
...
(k) for $\alpha \in I_k$, the partition of $D$ is $P_k$.
Clearly, for a fixed $r$ ($r = 1, 2, 3, \ldots, k - 1$), the $\alpha$-cuts of $p$ are identical for all $\alpha \in I_r$. If the cardinality of $D$, i.e., $\#(D)$, is $c$, then $k \leq (c^2 - c)/2 + 1$. Now, if $P_i = P_{i+1} \neq P_{i+2}$ (where $P_{i-1} \neq P_i$), merge $I_i$ and $I_{i+1}$; else if $P_i = P_{i+1} = P_{i+2}$, merge $I_i$, $I_{i+1}$ and $I_{i+2}$, and so on. Proceeding in this way, we get an optimal distribution as below:
(1) $\forall \alpha \in J_1$, the partition of $D$ is $Q_1$,
(2) $\forall \alpha \in J_2$, the partition of $D$ is $Q_2$,
...
(m) $\forall \alpha \in J_m$, the partition of $D$ is $Q_m$,
where the $Q_i$’s are all distinct and $J_1, J_2, \ldots, J_m$ form a disjoint partition of [0, 1], which we call the optimal p-partition or simply the optimal partition of [0, 1]. Clearly $m \leq k$. The end points of $J_1, J_2, \ldots, J_m$ are called nodes of the optimal partition. The node points play an important role in decision making when choosing level values. The significance of the node points is studied in more detail in the next section. Before that, we present an example showing the optimal partition and the corresponding domain partition.
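The bound on $k$ stated above follows from a short counting argument, sketched here for a domain with $c$ elements. The figure for $c = 6$ is only an illustration of the arithmetic.

```latex
% Symmetry: p(x, y) = p(y, x), so off-diagonal values come from unordered pairs
% of distinct elements, and there are at most this many of them:
\#\bigl\{\{x, y\} \subseteq D : x \neq y\bigr\} = \binom{c}{2} = \frac{c^2 - c}{2}.
% Reflexivity: every diagonal entry p(x, x) equals 1, which contributes the
% single value m_k = 1. Hence the number of distinct values satisfies
k \le \frac{c^2 - c}{2} + 1.
% Illustration: for c = 6 the bound is 15 + 1 = 16.
```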
Example 4.1. Consider the example studied in Section 4 by Shenoi and Melton [16]. We reproduce below the problem they have studied: Table 1 is a representation of the PHYSICAL CHARACTERISTICS relation; each tuple has information about the name of the individual, his hair colour, and build. The domain values and the associated proximity relations are presented in Table 2.

Table 1
PHYSICAL CHARACTERISTICS relation

NAME      HAIR COLOUR   BUILD
Albert    Black         Large
Bob       Dark Brown    Average
Charles   Auburn        Average
David     Blond         Small
Eugene    Dark Brown    Average
Frank     Red           Very small
Gary      Bleached      Very large
Henry     Blond         Average
Ivan      Auburn        Large
James     Blond         Large

Table 2
Proximity relations for scalar domains

HAIR COLOUR = {Black, Dark Brown, Auburn, Red, Light Brown, Blond, Bleached}.

                   BK    DB    A     R     LB    BD    BC
Black (BK)         1.0   0.8   0.6   0.5   0.4   0.3   0.1
Dark Brown (DB)    0.8   1.0   0.7   0.6   0.6   0.5   0.2
Auburn (A)         0.6   0.7   1.0   0.8   0.7   0.4   0.3
Red (R)            0.5   0.6   0.8   1.0   0.7   0.5   0.4
Light Brown (LB)   0.4   0.6   0.7   0.7   1.0   0.7   0.5
Blond (BD)         0.3   0.5   0.4   0.5   0.7   1.0   0.8
Bleached (BC)      0.1   0.2   0.3   0.4   0.5   0.8   1.0

BUILD = {Very Large, Large, Average, Small, Very Small}.

                   VL    L     A     S     VS
Very Large (VL)    1.0   0.8   0.5   0.3   0.1
Large (L)          0.8   1.0   0.6   0.4   0.2
Average (A)        0.5   0.6   1.0   0.6   0.4
Small (S)          0.3   0.4   0.6   1.0   0.7
Very Small (VS)    0.1   0.2   0.4   0.7   1.0

Let us focus our attention on the domain “HAIR COLOUR”. The p-partition of [0, 1] is given by
$$0.1 < 0.2 < 0.3 < 0.4 < 0.5 < 0.6 < 0.7 < 0.8 < 1.0.$$
We see that $k = 9 < (c^2 - c)/2 + 1$.
The p-partition subintervals are
$I_1 = [0, 0.1]$, $I_2 = (0.1, 0.2]$, $I_3 = (0.2, 0.3]$, $I_4 = (0.3, 0.4]$, $I_5 = (0.4, 0.5]$, $I_6 = (0.5, 0.6]$, $I_7 = (0.6, 0.7]$, $I_8 = (0.7, 0.8]$, $I_9 = (0.8, 1]$.
The corresponding partitions of the domain $D$ due to the equivalence relation $p_\alpha^+$ are, respectively,
P1 = {BK, DB, A, R, BD, BC},
P2 = {BK, DB, A, R, BD, BC},
P3 = {BK, DB, A, R, BD, BC},
P4 = {BK, DB, A, R, BD, BC},
P5 = {BK, DB, A, R, BD, BC},
P6 = {BK, DB, A, R} {BD, BC},
P7 = {BK, DB, A, R} {BD, BC},
P8 = {BK, DB} {A, R} {BD, BC}, and
P9 = {BK} {DB} {A} {R} {BD} {BC}.
We see that $P_1 = P_2 = P_3 = P_4 = P_5$ and $P_6 = P_7$. Thus the optimal partition of [0, 1] is
$J_1 = I_1 \cup I_2 \cup I_3 \cup I_4 \cup I_5 = [0, 0.5]$,
$J_2 = I_6 \cup I_7 = (0.5, 0.7]$,
$J_3 = I_8 = (0.7, 0.8]$,
$J_4 = I_9 = (0.8, 1]$,
and the corresponding domain partitions are
Q1: {BK, DB, A, R, BD, BC},
Q2: {BK, DB, A, R} {BD, BC},
Q3: {BK, DB} {A, R} {BD, BC},
Q4: {BK} {DB} {A} {R} {BD} {BC}.
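Example 4.1 can be reproduced with a short script. The sketch below is an illustration, not the authors’ implementation: it uses a simple union-find pass to form the classes of the $\alpha$-proximity relation and then merges consecutive subintervals whose partitions coincide. It is restricted to the six hair colours that appear in P1 to P9 (Light Brown is left out, as in those partitions), and all function and variable names are assumptions.

```python
HAIR = ["BK", "DB", "A", "R", "BD", "BC"]

# Off-diagonal proximity values from Table 2 (symmetric, so each pair once).
PROX = {
    ("BK", "DB"): 0.8, ("BK", "A"): 0.6, ("BK", "R"): 0.5,
    ("BK", "BD"): 0.3, ("BK", "BC"): 0.1,
    ("DB", "A"): 0.7, ("DB", "R"): 0.6, ("DB", "BD"): 0.5, ("DB", "BC"): 0.2,
    ("A", "R"): 0.8, ("A", "BD"): 0.4, ("A", "BC"): 0.3,
    ("R", "BD"): 0.5, ("R", "BC"): 0.4,
    ("BD", "BC"): 0.8,
}

def classes_at(domain, prox, alpha):
    """Partition induced by alpha-proximity (chains of pairs with value >= alpha)."""
    parent = {x: x for x in domain}

    def find(x):
        while parent[x] != x:
            x = parent[x]
        return x

    for (x, y), value in prox.items():
        if value >= alpha:
            parent[find(x)] = find(y)  # join the two classes

    groups = {}
    for x in domain:
        groups.setdefault(find(x), set()).add(x)
    return frozenset(frozenset(g) for g in groups.values())

def optimal_partition(domain, prox):
    """Return [(right end point of J_i, Q_i), ...] with equal neighbours merged."""
    levels = sorted(set(prox.values()) | {1.0})  # m_1 < m_2 < ... < m_k = 1
    merged = []
    for m in levels:
        # The alpha-cut is constant on each subinterval, so alpha = m_r suffices.
        part = classes_at(domain, prox, m)
        if merged and merged[-1][1] == part:
            merged[-1] = (m, part)   # same partition as before: extend J
        else:
            merged.append((m, part))
    return merged

result = optimal_partition(HAIR, PROX)
nodes = [upper for upper, _ in result]
```

With these inputs, `nodes` holds the right end points of $J_1, \ldots, J_4$, and the four partitions in `result` contain 1, 2, 3 and 6 classes, matching Q1 to Q4.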
5. Utility of optimal partitioning of [0, 1] and significance of node points
If $\alpha$ ($\neq 0$) is a node point, then we define $\alpha^-$ as below:
$$\alpha^- = \begin{cases} (\alpha_p, \alpha] & \text{if } \alpha_p \neq 0, \\ [0, \alpha] & \text{if } \alpha_p = 0, \end{cases}$$
where $\alpha_p$ is the previous node of $\alpha$. If $\alpha$ ($\neq 1$) is a node point, then we define $\alpha^+$ as below:
$$\alpha^+ = \begin{cases} (\alpha, \alpha_n] & \text{if } \alpha \neq 0, \\ [0, \alpha_n] & \text{if } \alpha = 0, \end{cases}$$
where $\alpha_n$ is the next node of $\alpha$. Clearly $\alpha^+ = \alpha_n^-$ and $\alpha^- = \alpha_p^+$. If a level value belongs to $\alpha^-$, then it does not belong to $\alpha^+$.

Definition 5.1. Let $\alpha$ be a node point. The lower tolerance of $\alpha$ is denoted by $\mathrm{lt}(\alpha)$ and is defined by
$$\mathrm{lt}(\alpha) = \frac{\text{length of } \alpha^-}{\text{length of } [0, 1]},$$
which may be viewed to be equal to $(100 \times \text{length of } \alpha^-)\%$ of [0, 1]. The upper tolerance of $\alpha$ is denoted by $\mathrm{ut}(\alpha)$ and is defined by
$$\mathrm{ut}(\alpha) = \frac{\text{length of } \alpha^+}{\text{length of } [0, 1]},$$
which may be viewed to be equal to $(100 \times \text{length of } \alpha^+)\%$ of [0, 1].

We see that the node points play an important role in decision making while choosing level values for queries because of two major factors:
1. The tolerances lt and ut vary with the value of $\alpha$; for some values of $\alpha$, either $\alpha^-$ or $\alpha^+$ or both cover a large part of [0, 1]. In the example studied above, lt(0.5) = 50% and ut(0.5) = 20%.
2. For any $\alpha_1 \in \alpha^-$ and any $\alpha_2 \in \alpha^+$, the corresponding partitions of the domain $D$ are different, however small the value of $\alpha_2 - \alpha_1$ may be.

Definition 5.2. If $\alpha$ is a non-node point, then the lower tolerance of $\alpha$ is
$$\mathrm{lt}(\alpha) = (\alpha - \alpha_p) = [100 \times (\alpha - \alpha_p)]\% \text{ of } [0, 1]$$
and the upper tolerance of $\alpha$ is
$$\mathrm{ut}(\alpha) = (\alpha_n - \alpha) = [100 \times (\alpha_n - \alpha)]\% \text{ of } [0, 1],$$
where $\alpha_p$ and $\alpha_n$ here denote the nearest nodes below and above $\alpha$.

It can be seen that if the chosen level value $\alpha$ is a non-node point, then a decrease of the level value by less than $\mathrm{lt}(\alpha)$, or an increase by at most $\mathrm{ut}(\alpha)$, does not change the answers to the fuzzy queries. However, if the chosen level value is a node point, then an increase of $\alpha$ by any amount (however small) will affect the answers to the fuzzy queries, whereas a decrease of the level value within the range of $\mathrm{lt}(\alpha)$ will not have any effect. Let us observe the example studied by Shenoi and Melton [16], focusing our main interest on the domain “HAIR COLOUR”.

Example 5.1. In the style of Buckles and Petry [1] we can ask
Q(Blond, Large): [Who is] more-or-less blond AND [has a] more-or-less large [build]
The bracketed words are added for readability. This query can be translated into the following form in the fuzzy relational algebra:
(Project (Select (PHYSICAL CHARACTERISTICS)
    where HAIR COLOUR = “BLOND”, BUILD = “LARGE”
    with LEVEL(HAIR COLOUR) = 0.8, LEVEL(BUILD) = 0.8)
  with LEVEL(NAME) = 0.0, LEVEL(HAIR COLOUR) = 0.8, LEVEL(BUILD) = 0.8
  giving LIKELY ARSONISTS).
The query gives rise to the following relation:

LIKELY ARSONISTS
NAME            HAIR COLOUR         BUILD
{Gary, James}   {Blond, Bleached}   {Very Large, Large}

If instead of LEVEL(HAIR COLOUR) = 0.8 $\in 0.8^-$ we take LEVEL(HAIR COLOUR) = 0.801 $\in 0.8^+$, then the answer becomes:

LIKELY ARSONISTS
NAME      HAIR COLOUR   BUILD
{James}   {Blond}       {Large}

which means that we get a different answer for a change of only 0.001 in the level value. This is because 0.8 is a node point. However, if the level value is 0.9 (instead of 0.8), which is a non-node point, an error in it of amount 0.09 (i.e., up to 9% of the length of [0, 1]) will not make any difference to the answers.

6. Conclusion
The fuzzy relational model defined by Buckles and Petry [1], based on the notion of similarity relation, is a generalization of the classical relational database. Shenoi and Melton [16] replaced the similarity relation with a proximity relation and framed a model of extended fuzzy relational database. This paper is a work on Shenoi and Melton’s model. Some characterization of the proximity relation is made in Section 3. If $p$ is a proximity relation on a domain $D$, we define the p-partition of [0, 1] and a sequence of partitions of $D$ corresponding to the optimal p-partition of [0, 1]. The end points of this optimal p-partition are called node points. We see that the node points play an important role in decision making while choosing level values for fuzzy queries. For every level value $\alpha \in [0, 1]$ we have defined a lower tolerance $\mathrm{lt}(\alpha)$ and an upper tolerance $\mathrm{ut}(\alpha)$. The significance of $\mathrm{lt}(\alpha)$ and $\mathrm{ut}(\alpha)$, where $\alpha$ is not a node point, lies in the fact that once the level value is chosen to be $\alpha$, a change of it within the ranges of tolerance will not affect the answers to the fuzzy queries, and thus the accuracy of the answers may be understood to a certain extent. In case the level value $\alpha$ is a node point, a decrement of $\alpha$ within the range of the lower tolerance would not affect the answers to the query, but any small increment will affect the answers. This result reminds the decision maker that if the level value chosen is a node point $\alpha$, it should be his maximum choice for the value of $\alpha$. However, if the chosen level value is a non-node point, a small error in the choice (within the tolerances) will not affect the answers to the fuzzy queries.
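The tolerances and the node-point behaviour discussed above can be checked numerically. The following sketch hard-codes the nodes and the HAIR COLOUR partitions obtained in Example 4.1; exact fractions are used so that 0.8 and 0.801 fall on the intended sides of the node. The function names are illustrative assumptions, not notation from the paper.

```python
from bisect import bisect_left
from fractions import Fraction

# Nodes of the optimal partition from Example 4.1:
# J1 = [0, 0.5], J2 = (0.5, 0.7], J3 = (0.7, 0.8], J4 = (0.8, 1].
NODES = [Fraction(0), Fraction(1, 2), Fraction(7, 10), Fraction(4, 5), Fraction(1)]

# Q1..Q4, the HAIR COLOUR partitions attached to J1..J4.
PARTITIONS = [
    [{"BK", "DB", "A", "R", "BD", "BC"}],
    [{"BK", "DB", "A", "R"}, {"BD", "BC"}],
    [{"BK", "DB"}, {"A", "R"}, {"BD", "BC"}],
    [{"BK"}, {"DB"}, {"A"}, {"R"}, {"BD"}, {"BC"}],
]

def lower_tolerance(alpha):
    """lt(alpha) = alpha - alpha_p, with alpha_p the nearest node below alpha."""
    below = [n for n in NODES if n < alpha]
    return alpha - below[-1] if below else None  # not defined for alpha = 0

def upper_tolerance(alpha):
    """ut(alpha) = alpha_n - alpha, with alpha_n the nearest node above alpha."""
    above = [n for n in NODES if n > alpha]
    return above[0] - alpha if above else None  # not defined for alpha = 1

def hair_class(colour, alpha):
    """Class containing `colour` at level alpha; each J_i is closed on the right."""
    partition = PARTITIONS[bisect_left(NODES[1:], alpha)]
    return next(c for c in partition if colour in c)

# Pass levels as exact fractions (a float 0.8 is not exactly 4/5).
at_node = hair_class("BD", Fraction("0.8"))
just_above = hair_class("BD", Fraction("0.801"))
lt_half, ut_half = lower_tolerance(Fraction("0.5")), upper_tolerance(Fraction("0.5"))
```

At level 0.8 Blond is grouped with Bleached, while at 0.801 it stands alone, which is the change seen in Example 5.1; `lt_half` and `ut_half` correspond to the 50% and 20% quoted for the node 0.5.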
7. For further reading
The following references are also of interest to the reader: [2–6, 9–15, 17, 18].

Acknowledgements
The authors are grateful to the referees for their valuable suggestions which helped in modifying the first version of this paper.
References
[1] B.P. Buckles, F.E. Petry, A fuzzy representation of data for relational databases, Fuzzy Sets and Systems 7 (3) (1982) 213–226.
[2] B.P. Buckles, F.E. Petry, Fuzzy databases and their applications, in: M.M. Gupta, E. Sanchez (Eds.), Fuzzy Information and Decision Processes, North-Holland, Amsterdam, 1982, pp. 361–371.
[3] B.P. Buckles, F.E. Petry, Information-theoretic characterization of fuzzy relational databases, IEEE Trans. Systems Man Cybernet. 13 (1) (1983) 74–77.
[4] B.P. Buckles, F.E. Petry, Extending the fuzzy database with fuzzy numbers, Inform. Sci. 34 (2) (1984) 145–155.
[5] M.K. Chakraborty, M. Das, Studies of fuzzy relations over fuzzy subsets, Fuzzy Sets and Systems 9 (1983) 79–89.
[6] M.K. Chakraborty, M. Das, On fuzzy equivalence I, Fuzzy Sets and Systems 11 (1983) 185–193.
[7] E.F. Codd, A relational model of data for large shared data banks, Comm. ACM 13 (1970) 377–387.
[8] D. Dubois, H. Prade, Fuzzy Sets and Systems: Theory and Applications, Academic Press, New York, 1980.
[9] J.A. Goguen, L-fuzzy sets, J. Math. Anal. Appl. 18 (1967) 145–174.
[10] M.B. Gorzalczany, A method of inference in approximate reasoning based on interval-valued fuzzy sets, Fuzzy Sets and Systems 21 (1987) 1–17.
[11] M.K. Roy, R. Biswas, I–V fuzzy relations and Sanchez’s approach for medical diagnosis, Fuzzy Sets and Systems 47 (1992) 35–38.
[12] E. Sanchez, Resolution of composite fuzzy relation equations, Inform. and Control 30 (1976) 38–48.
[13] E. Sanchez, Solutions in composite fuzzy relation equations, application to medical diagnosis in Brouwerian Logic, in: M.M. Gupta, G.N. Saridis, B.R. Gaines (Eds.), Fuzzy Automata and Decision Process, Elsevier/North-Holland, New York, 1977.
[14] E. Sanchez, Inverses of fuzzy relations: application to possibility distributions and medical diagnosis, Fuzzy Sets and Systems 2 (1979) 75–86.
[15] J.A. Schreder, Tolerance spaces, Cybernetics (January 1983) 153–158.
[16] S. Shenoi, A. Melton, Proximity relations in the fuzzy relational database model, Fuzzy Sets and Systems 5 (1) (1981) 31–46.
[17] S. Tamura, S. Higuchi, K. Tanaka, Pattern classification based on fuzzy relation, IEEE Trans. Systems Man Cybernet. 1 (1) (1971) 61–66.
[18] L.A. Zadeh, Fuzzy sets, Inform. and Control 8 (1965) 338–353.
[19] L.A. Zadeh, Similarity relations and fuzzy orderings, Inform. Sci. 3 (2) (1970) 177–200.