Information Sciences 219 (2013) 1–16
Information Sciences journal homepage: www.elsevier.com/locate/ins
On a fuzzy bipolar relational algebra

Patrick Bosc, Olivier Pivert ⇑
IRISA/ENSSAT, University of Rennes 1, 6, Rue de Kerampont, BP 80518, 22305 Lannion Cedex, France
Article info

Article history:
Received 22 June 2011
Received in revised form 29 May 2012
Accepted 15 July 2012
Available online 24 July 2012

Keywords:
Database fuzzy querying
Bipolarity
Relational algebra
Abstract

This paper presents an extension of relational algebra suitable for handling bipolar concepts. The type of queries considered involves two parts: a first one which expresses a (possibly flexible) constraint, and a second one that corresponds to a (possibly flexible) wish. The framework considered is that of bipolar fuzzy relations, where each tuple is associated with a pair of satisfaction degrees.

© 2012 Elsevier Inc. All rights reserved.
1. Introduction

The idea of introducing preferences into queries is gaining increasing attention in the database community. In this paper, we focus on the fuzzy-set-based approach to preference queries, which relies on fuzzy set membership functions describing the user's preference profile on each attribute domain involved in the query. The satisfaction degrees associated with elementary conditions are then combined using a wide range of fuzzy set connectives that go well beyond conjunction and disjunction.

A complementary concept is that of bipolarity in general, and its application to queries in the context of databases and information systems. Bipolarity refers to the propensity of the human mind to reason and make decisions on the basis of positive and negative affects [22,23]. Positive information states what is possible, satisfactory, permitted, desired, or considered acceptable. Negative statements, on the other hand, express what is impossible, rejected, or forbidden. Negative preferences correspond to constraints, since they specify which values or objects have to be rejected (i.e., those that do not satisfy the constraints), while positive preferences correspond to wishes, as they specify which objects are more desirable than others (i.e., satisfy the user's wishes) without rejecting those that do not meet them.

Three types of bipolarity have been pointed out [23]. The simplest type, called symmetric univariate bipolarity, uses a bipolar scale whose negative and positive parts are mirror images of each other. The second type, termed symmetric bivariate bipolarity, relies on two separate unipolar scales (one for the positive affects, the other for the negative affects) pertaining to the same information, generally with a duality relation putting the scales in symmetric correspondence.
The third type of bipolarity, called asymmetric, arises when two unrelated kinds of information are handled in parallel (see also [3], where this type of bipolarity is described and used). In the rest of this paper, asymmetric (also called heterogeneous) bipolarity is considered, and the positive and negative poles are assumed to refer to potentially different notions (attributes). More precisely, we deal with queries made of two poles, one meant as a constraint, denoted by C, and the other acting as a wish, denoted by W; a pair (C, W) is interpreted as "C and if possible W". In the situation considered later on, the two components of a query, although they can be assessed on the same scale (true/false, the unit interval, or a qualitative scale), are not of the same nature, and it is convenient to specify how any two elements (tuples or objects) are compared depending on their scores with respect to the constraint and the wish. Let us recall, however, that inside a complex constraint (resp. wish), the fuzzy-set-based approach requires the elementary preferences to be commensurable.

⇑ Corresponding author. E-mail addresses: [email protected] (P. Bosc), [email protected] (O. Pivert).
0020-0255/$ - see front matter © 2012 Elsevier Inc. All rights reserved. http://dx.doi.org/10.1016/j.ins.2012.07.018

A commonly made choice (see in particular [23]) for interpreting a bipolar condition consists in discriminating between two objects x and y first on the constraint and then, if needed (i.e., if x and y are not distinguishable on the constraint), on the wish. In what follows, this point of view is adopted and a lexicographic order is used. If (C(x), W(x)) and (C(y), W(y)) denote the scores of x and y with respect to the constraint C and the wish W, one has:
$$x \succ y \iff \big(C(x) > C(y)\big) \text{ or } \big(C(x) = C(y) \text{ and } W(x) > W(y)\big) \tag{1}$$
where x ≻ y means that x is preferred to y. A consequence is that an object which is beaten on the constraint cannot win, even if it is significantly better on the wish.

In this paper, which is a much extended version of [14], our aim is to propose an extension of relational algebra that provides a querying framework capable of handling bipolar fuzzy queries and relations. The set of operators we propose generalizes the "fuzzy relational algebra" previously proposed to handle non-bipolar fuzzy queries and relations (see, e.g., [5]), which itself generalizes classical relational algebra.

The rest of the paper is structured as follows. In Section 2, we recall some basic notions about database fuzzy querying, as well as the concept of bipolarity in this context. Section 3 presents the extended relational algebraic operators in the framework of bipolar fuzzy relations. Section 4 deals with query equivalences, whereas Section 5 is devoted to implementation aspects. In Section 6, some related works are briefly discussed. Finally, Section 7 concludes the paper and outlines some perspectives for future work.

2. Preliminaries

2.1. Fuzzy queries

The operations of relational algebra can be straightforwardly extended to fuzzy relations by considering fuzzy relations as fuzzy sets on the one hand, and by introducing gradual predicates in the appropriate operations on the other hand. The definitions of these extended relational operators can be found in [5]. Other extensions of relational algebra aimed at dealing with flexible queries have been proposed, for instance in [27] for top-k queries, or in [2] for fuzzy queries in a context where attribute domains are equipped with similarity relations. The present work does not deal with these issues but rather extends the fuzzy querying framework of [5] to the bipolar case.
As an illustration, the definition of the fuzzy selection is given hereafter, where r denotes a (fuzzy or crisp) relation and ψ is a fuzzy predicate:

$$\mu_{\sigma_\psi(r)}(t) = \min(\mu_r(t), \mu_\psi(t)).$$

Let us recall that σ_ψ(r) selects those tuples of r which (somewhat) satisfy condition ψ. An extension of SQL named SQLf, founded on this extended relational algebra, is described in [8]. In the following, we consider a version of SQLf where the satisfaction degrees belong to an ordinal symbolic scale L made of k + 1 linguistic labels. For instance, with k = 4, one may use:
$$x_0 = \text{"not at all"} < x_1 = \text{"poorly"} < x_2 = \text{"medium"} < x_3 = \text{"rather"} < x_4 = \text{"totally"}$$

where x_0 (resp. x_k) corresponds to 0 (resp. 1) in the unit interval when a numeric framework is used. The operation 1 − (·), which interprets the negation when the degrees belong to the unit interval, is replaced by the order reversal operation rev(·): rev(x_i) = x_{k−i}.

2.2. About bipolarity

In the following, we denote by μ(u) (resp. η(u)) the degree reflecting the extent to which an element u satisfies a constraint C (resp. a wish W). When C and W concern the same property of a given type of object or, in other terms, the same attribute of a given relation, two consistency conditions may be considered [23]. Strong consistency, as for twofold fuzzy sets [21]:
$$\sup_u \min(\eta(u), \mathrm{rev}(\mu(u))) = x_0 \tag{2}$$
which expresses that the support of the wish must be included in the core of the constraint. The wish can then only discriminate between the tuples which get degree x_k for the constraint. As noted in [23], the pair (W, C) under condition (2) is a twofold fuzzy set, since support(W) ⊆ core(C). Weak consistency (implied by the strong one), in the spirit of intuitionistic fuzzy sets [1]:
$$\forall u,\ \eta(u) \le \mu(u) \tag{3}$$
which expresses that the wish must be included in the constraint (in Zadeh's sense). The wish can then be used to discriminate between the tuples which somewhat satisfy the constraint. As noted in [23], the pair (W, C) under condition (3) is an intuitionistic fuzzy set in the sense of Atanassov [1]. When C and W concern different attributes, the weak consistency of the wish with the constraint can be recovered by replacing (C, W) by (C, C ∧ W), as suggested in [23]. In the remainder of the paper, only weak consistency is enforced.

We define a bipolar relation (in the database sense) as a relation where each tuple t is associated with two degrees μ(t) and η(t) in L expressing the extent to which the tuple satisfies the constraint (resp. the wish) used to produce the relation. Basically, a bipolar fuzzy relation can be seen as a mapping from the universe to the Cartesian product L × L of satisfaction degrees, where the consistency property is enforced. In this framework, a tuple is denoted by (μ, η)/t. In base relations (i.e., classical relations to which no fuzzy criterion has been applied yet), μ(t) = η(t) = x_k, ∀t. It is assumed that tuples such that μ = x_0 do not appear in the relation (they do not belong to it at all). Let us denote by r a bipolar base relation and by (C, W) the constraint/wish pair applied to r in order to build a bipolar fuzzy relation r′. One has:
$$r' = \{(\mu, \eta)/t \mid (x_k, x_k)/t \in r \land \mu = \mu_C(t) \land \eta = \eta_W(t)\} \tag{4}$$
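As a minimal sketch of Formula (4), assuming degrees are stored as integer ranks (x_i is represented by i) and a relation is a Python dict from tuple identifiers to attribute dicts, the bipolar selection of a base relation can be written as follows. The helper name `bipolar_select_base` and the toy `cars` data are hypothetical; the wish is min-combined with the constraint, following the (C, C ∧ W) rewriting used to enforce weak consistency.

```python
K = 7  # top of the scale L = {x0, ..., xK}; a degree x_i is stored as the int i


def bipolar_select_base(rows, mu_c, eta_w):
    """Sketch of Formula (4): apply a bipolar condition (C, W) to a base
    relation, in which every tuple implicitly carries the pair (xK, xK)."""
    result = {}
    for tid, row in rows.items():
        mu = mu_c(row)
        # W is replaced by C AND W (min) so that weak consistency eta <= mu holds.
        eta = min(mu, eta_w(row))
        if mu > 0:  # tuples with mu = x0 do not belong to the result
            result[tid] = (mu, eta)
    return result


# "I prefer red cars, but if there are not any, any car will do": C = true.
cars = {1: {"color": "red"}, 2: {"color": "blue"}}
always_true = lambda row: K
is_red = lambda row: K if row["color"] == "red" else 0

print(bipolar_select_base(cars, always_true, is_red))  # {1: (7, 7), 2: (7, 0)}
# With no wish, W = C: the condition behaves like a regular selection.
print(bipolar_select_base(cars, is_red, is_red))       # {1: (7, 7)}
```

With C = true every car is kept and the wish only acts as a bonus; with W = C the blue car is discarded, as for an ordinary select clause.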
When no constraint is expressed, Dubois and Prade [23] suggest using C = true. This case is not likely to be very useful in practice, since it corresponds to the situation where all the tuples of the queried relation are considered equally acceptable; the wish merely gives a "bonus" to some items, as in "I prefer red cars, but if there are not any, any car will do". Conversely, when no wish is expressed (case of a non-bipolar condition, fuzzy or not), we use W = C. In this case, the condition acts as a regular select clause, i.e., it discards the items which do not satisfy the constraint at all (and ranks the others if the constraint is fuzzy). We introduce the following operators lmin and lmax, which play a major role in defining the conjunction (and intersection) and the disjunction (and union):
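$$\mathrm{lmin}((\mu, \eta), (\mu', \eta')) = \begin{cases} (\mu, \eta) & \text{if } \mu < \mu' \text{ or } (\mu = \mu' \text{ and } \eta < \eta'),\\ (\mu', \eta') & \text{otherwise;} \end{cases}$$

$$\mathrm{lmax}((\mu, \eta), (\mu', \eta')) = \begin{cases} (\mu, \eta) & \text{if } \mu > \mu' \text{ or } (\mu = \mu' \text{ and } \eta > \eta'),\\ (\mu', \eta') & \text{otherwise.} \end{cases}$$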
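Because (μ, η) pairs are compared lexicographically, lmin and lmax coincide with the minimum and maximum under the lexicographic order. A minimal Python sketch, assuming degrees are encoded as integer ranks (x_i as i) and pairs as tuples (Python already compares tuples lexicographically), spot-checks the algebraic properties claimed for these operators on the consistent scale S and ranks objects according to Formula (1). The scores used for the ranking are made up for illustration.

```python
from itertools import product

K = 7  # scale L = {x0, ..., x7}; a degree x_i is encoded as the int i


def lmin(p, q):
    """lmin of two (mu, eta) pairs: compare mu first, then eta.
    Python compares tuples lexicographically, so the built-in min() does this."""
    return min(p, q)


def lmax(p, q):
    """lmax of two (mu, eta) pairs (lexicographic maximum)."""
    return max(p, q)


# The consistent scale S = {(c, w) | c, w in L and w <= c}.
S = [(c, w) for c, w in product(range(K + 1), repeat=2) if w <= c]
top, bottom = (K, K), (0, 0)

# Spot-check the properties claimed for lmin and lmax on S.
assert all(lmin(p, top) == p and lmax(p, bottom) == p for p in S)  # neutral elements
assert all(lmin(p, bottom) == bottom and lmax(p, top) == top for p in S)  # absorbing
assert all(lmin(p, q) == lmin(q, p) and lmax(p, q) == lmax(q, p) for p in S for q in S)

# Ranking by Formula (1): the constraint decides, the wish only breaks ties.
scores = {"a": (6, 0), "b": (5, 5), "c": (6, 3)}
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)  # ['c', 'a', 'b']
```

Object b is ranked last despite having the best wish score, because it is beaten on the constraint.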
These definitions may be reformulated as follows. Consider the scale S = {(c, w) | c ∈ L ∧ w ∈ L ∧ w ≤ c}, equipped with the standard lexicographic order ≤_lex. Then lmin and lmax are the canonical lattice operations of (S, ≤_lex):

$$(c, w) \le_{lex} (c', w') \iff \mathrm{lmin}((c, w), (c', w')) = (c, w) \iff \mathrm{lmax}((c, w), (c', w')) = (c', w').$$

Straightforwardly, lmin (resp. lmax) is commutative, associative, idempotent and monotonic, has (x_k, x_k) (resp. (x_0, x_0)) as a neutral element and (x_0, x_0) (resp. (x_k, x_k)) as an absorbing element. These properties make it legitimate to use lmin and lmax as conjunction and disjunction operators respectively (in the spirit of triangular norms and co-norms). Notice that Formula (4) can also be written:
$$r' = \{(\mu, \eta)/t \mid (\mu, \eta) = \mathrm{lmin}((\mu_C(t), \eta_W(t)), (\mu_r(t), \eta_r(t)))\} \tag{5}$$
since r is a base relation (where μ_r(t) = η_r(t) = x_k, ∀t).

Note that relying on a lexicographic order would make the approach questionable for queries involving a continuous scale. Indeed, if the whole interval [0, 1] is used, then for every ε > 0, (a, a) >_lex (a, 0) >_lex (a − ε, a − ε), however small ε is. This might sound counterintuitive to a user. The problem is much less acute if [0, 1] is replaced by a small set of qualitative levels, which is why we introduced the scale L; see for instance [18] on qualitative vs. quantitative scales.

Remark 1. Even though the pairs of degrees we use resemble those handled in Atanassov's intuitionistic fuzzy sets, this resemblance is both natural and misleading. On the one hand, Atanassov starts with genuine bipolar pairs (p, n) = (positive, negative) but interprets them as partially known membership grades p ≤ μ ≤ 1 − n, which are better captured by interval-valued fuzzy sets [19]. We rather use pairs (l, u) = (wishes, constraints) with l ≤ u, which in the literature are interpreted as uncertainty intervals, but which we interpret as expressing bipolarity. We could instead use pairs (p, n) = (wishes, rejections) = (l, 1 − u). In any case, the query is bipolar because the user expresses what he rejects (via constraints) and what he prefers ("I would like this but certainly not that"). Moreover, the bipolar semantics we consider leads to a calculus very different from the one Atanassov advocates, and more in line with his positive/negative starting point (whereas his calculus is not faithful to his original bipolar semantics, see [24]).
3. Extended algebraic operators

In this section, we review the operators of classical relational algebra and give an extended version of each in the framework of bipolar fuzzy relations. For each operator, the starting point is its usual definition, which we generalize by stating how the pairs of degrees attached to the tuples of a bipolar relation must be taken into account. In the following, we use as an example the case of a user who wants to buy a second-hand car, and consider the relation Car of schema (#id, make, model, mileage, years, price, color).

3.1. Intersection

The straightforward definition of intersection is:
$$r \cap s = \{(\mu'', \eta'')/t \mid (\mu, \eta)/t \in r \land (\mu', \eta')/t \in s \land (\mu'', \eta'') = \mathrm{lmin}((\mu, \eta), (\mu', \eta'))\} \tag{6}$$
Example 1. The query "find the cars that meet both John's and Peter's needs/preferences" may be expressed as CarJ ∩ CarP, where CarJ = σ_(C1, W1)(Car), CarP = σ_(C2, W2)(Car), and (C1, W1) (resp. (C2, W2)) represents John's (resp. Peter's) bipolar requirement. (The selection operator σ on bipolar fuzzy relations is formally defined in Section 3.5.) Let us assume that the scale L used is such that k = 7. With the extensions of CarJ and CarP given in Tables 1 and 2 respectively, one gets the result shown in Table 3.

3.2. Union

The definition of the union of bipolar fuzzy relations is:
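$$\begin{aligned} r \cup s = \{(\mu'', \eta'')/t \mid\ & (\exists (\mu, \eta)/t \in r \land \exists (\mu', \eta')/t \in s \land (\mu'', \eta'') = \mathrm{lmax}((\mu, \eta), (\mu', \eta')))\\ \lor\ & (\exists (\mu, \eta)/t \in r \land \nexists (\mu', \eta')/t \in s \land (\mu'', \eta'') = (\mu, \eta))\\ \lor\ & (\nexists (\mu, \eta)/t \in r \land \exists (\mu', \eta')/t \in s \land (\mu'', \eta'') = (\mu', \eta'))\} \end{aligned} \tag{7}$$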
Example 2. The query aimed at finding the cars that meet John's or Peter's needs/preferences may be expressed as CarJ ∪ CarP. With the extensions of CarJ and CarP given in Tables 1 and 2 respectively, one gets the result shown in Table 4.
Table 1
An instance of relation CarJ.

#id  make    model    mileage  years  price   μ   η
1    Ford    Mondeo   35,000   2008   22,000  x6  x2
2    Seat    Altea    50,000   2007   17,000  x5  x3
3    VW      Golf     39,000   2008   21,000  x7  x6
4    Toyota  Corolla  60,000   2006   18,000  x7  x2
5    VW      Passat   75,000   2004   16,000  x6  x0
6    Opel    Zafira   58,000   2007   15,000  x7  x4
Table 2
An instance of relation CarP.

#id  make    model    mileage  years  price   μ   η
2    Seat    Altea    50,000   2007   17,000  x6  x3
4    Toyota  Corolla  60,000   2006   18,000  x5  x5
5    VW      Passat   75,000   2004   16,000  x6  x4
7    Renault Laguna   18,000   2008   25,000  x2  x1
Table 3
Result of CarJ ∩ CarP.

#id  make    model    mileage  years  price   μ   η
2    Seat    Altea    50,000   2007   17,000  x5  x3
4    Toyota  Corolla  60,000   2006   18,000  x5  x5
5    VW      Passat   75,000   2004   16,000  x6  x0
Table 4
Result of CarJ ∪ CarP.

#id  make    model    mileage  years  price   μ   η
1    Ford    Mondeo   35,000   2008   22,000  x6  x2
2    Seat    Altea    50,000   2007   17,000  x6  x3
3    VW      Golf     39,000   2008   21,000  x7  x6
4    Toyota  Corolla  60,000   2006   18,000  x7  x2
5    VW      Passat   75,000   2004   16,000  x6  x4
6    Opel    Zafira   58,000   2007   15,000  x7  x4
7    Renault Laguna   18,000   2008   25,000  x2  x1
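The intersection and union of Formulas (6) and (7) can be sketched in Python under the simplifying assumption that tuples are identified by their #id (the other attributes being identical in both relations, as in Tables 1 and 2) and that x_i is encoded as the integer i. The function names are illustrative. The sketch recomputes the degrees of Tables 3 and 4.

```python
# (mu, eta) pairs of Tables 1 and 2, keyed by #id; x_i is encoded as i.
car_j = {1: (6, 2), 2: (5, 3), 3: (7, 6), 4: (7, 2), 5: (6, 0), 6: (7, 4)}
car_p = {2: (6, 3), 4: (5, 5), 5: (6, 4), 7: (2, 1)}


def intersection(r, s):
    """Formula (6): tuples present in both relations, combined with lmin.
    Python compares tuples lexicographically, so min() acts as lmin."""
    return {t: min(r[t], s[t]) for t in r.keys() & s.keys()}


def union(r, s):
    """Formula (7): lmax when a tuple is in both relations, otherwise the pair
    of the relation that contains it. The default (0, 0) stands for absence;
    lmax with (x0, x0) returns the other pair unchanged."""
    return {t: max(r.get(t, (0, 0)), s.get(t, (0, 0))) for t in r.keys() | s.keys()}


print(sorted(intersection(car_j, car_p).items()))
# [(2, (5, 3)), (4, (5, 5)), (5, (6, 0))]
print(sorted(union(car_j, car_p).items()))
# [(1, (6, 2)), (2, (6, 3)), (3, (7, 6)), (4, (7, 2)), (5, (6, 4)), (6, (7, 4)), (7, (2, 1))]
```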
3.3. Cartesian product

The straightforward definition of the Cartesian product of bipolar fuzzy relations is:
$$r \times s = \{(\mu'', \eta'')/t \oplus t' \mid (\mu, \eta)/t \in r \land (\mu', \eta')/t' \in s \land (\mu'', \eta'') = \mathrm{lmin}((\mu, \eta), (\mu', \eta'))\} \tag{8}$$
where ⊕ denotes concatenation.

3.4. Difference

Let us first recall the definition of the difference in the context of classical relational algebra:
$$r - s = \{t \mid t \in r \land t \notin s\} \tag{9}$$
The most commonly used definition of the difference between two unipolar fuzzy relations r and s is:
$$\mu_{r-s}(t) = \min(\mu_r(t), \mu_{\bar{s}}(t)) = \min(\mu_r(t), \mathrm{rev}(\mu_s(t))) \tag{10}$$
In the bipolar fuzzy relation case, one may also consider a definition of the difference based on the negation. The definition of the negation proposed by Dubois and Prade [21] in the framework of twofold fuzzy sets is:
$$\neg(C, W) = (\neg W, \neg C) \tag{11}$$
where ¬ is a standard fuzzy negation (interpreted by rev in our framework). When C and W concern different attributes, one replaces (C, W) by (C, W ∧ C), as proposed in [23], to enforce the consistency condition. Adapting Formula (11), we then get:
$$\neg(C, W) = (\neg(C \land W), \neg C). \tag{12}$$
With this definition, the negation of "I want a Volkswagen and if possible a black one" is "a black Volkswagen is out of the question, and if possible I would like a car which is not a Volkswagen". This negation is involutive and preserves the consistency property, but it is not order-reversing. Let us consider for instance the bipolar condition Q = (C, W) and a relation r containing the tuples t1 and t2 such that:
$$\mu_C(t_1) = x_6,\quad \eta_W(t_1) = x_0,\quad \mu_C(t_2) = x_4,\quad \eta_W(t_2) = x_4.$$

Assuming k = 7 and denoting (C′, W′) = ¬(C, W), we get:

$$\mu_{C'}(t_1) = x_7,\quad \eta_{W'}(t_1) = x_1,\quad \mu_{C'}(t_2) = x_3,\quad \eta_{W'}(t_2) = x_3,$$

and both (μ_C(t1), η_W(t1)) >_lex (μ_C(t2), η_W(t2)) and (μ_C′(t1), η_W′(t1)) >_lex (μ_C′(t2), η_W′(t2)) hold, i.e., the order is preserved instead of being reversed.

In the following, we propose an alternative definition of the negation of a bipolar condition, aimed at guaranteeing at least a relaxed form of order reversal, which we consider a crucial requirement when translating the concept of negation. Since a bipolar condition is interpreted through the lexicographic order, this leads straightforwardly to taking the negation of the constraint involved in a bipolar condition B as the constraint part of the negation of B. Let us denote (C′, W′) = ¬(C, C ∧ W). We define:
$$\mu_{C'}(x) = \mathrm{rev}(\mu_C(x)), \qquad \eta_{W'}(x) = \mathrm{rev}'(\eta_{C \land W}(x), \mu_C(x)) \land \mu_{C'}(x) \tag{13}$$

where the function rev′ is defined as follows, letting x_i = μ_C(x) and x_j = η_{C∧W}(x):

$$\mathrm{rev}'(x_j, x_i) = x_{i-j}.$$
This amounts to saying that the negation of (C, C ∧ W) is (¬C, ¬_C(W) ∧ ¬C), where ¬_C is a contextual negation which depends on C and is interpreted by a scale reversal operation on the range [x_0, μ_C(x)], i.e., the range where η_{C∧W}(x) is allowed to take its value due to the consistency condition. Notice that if rev(μ_C(x)) were used instead of rev′(η_{C∧W}(x), μ_C(x)) in Formula (13), one would always have η_W′(x) = μ_C′(x), and the discrimination power attached to the wish would be entirely lost.
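The negation of Formula (13) can be sketched as a function on integer ranks (x_i encoded as i), applied to a consistent pair (μ_C(x), η_{C∧W}(x)). The function names are illustrative only. For comparison, the sketch also encodes the Dubois-Prade negation of Formula (12), replays the tuples t1 and t2 discussed above with k = 7, and applies the negation twice to the pair (x8, x3) with k = 10, the case used to discuss involution.

```python
def rev(i, k):
    """Order reversal on L = {x0, ..., xk}: rev(x_i) = x_{k-i}."""
    return k - i


def rev_ctx(j, i):
    """rev'(x_j, x_i) = x_{i-j}: reversal of x_j inside the range [x0, x_i]."""
    return i - j


def negate(pair, k):
    """Formula (13): negation of a consistent pair (mu_C, eta_{C AND W})."""
    mu, eta = pair
    new_mu = rev(mu, k)
    return (new_mu, min(rev_ctx(eta, mu), new_mu))


def dubois_prade_negate(c, w, k):
    """Formula (12): not(C, W) = (not(C AND W), not C)."""
    return (rev(min(c, w), k), rev(c, k))


# Tuples t1 and t2 of the example (already consistent: eta <= mu), k = 7.
t1, t2 = (6, 0), (4, 4)
print(dubois_prade_negate(*t1, 7), dubois_prade_negate(*t2, 7))  # (7, 1) (3, 3): order kept
print(negate(t1, 7), negate(t2, 7))                              # (1, 1) (3, 0): order reversed

# With k = 10, negation is not perfectly involutive.
print(negate((8, 3), 10), negate(negate((8, 3), 10), 10))        # (2, 2) (8, 0)
```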
However, the negation defined by Formula (13) enforces only a "weak form" of order reversal (due to the enforcement of the consistency requirement):

$$(\mu_C(x_1), \eta_{C \land W}(x_1)) >_{lex} (\mu_C(x_2), \eta_{C \land W}(x_2)) \Rightarrow (\mu_{C'}(x_1), \eta_{W'}(x_1)) \le_{lex} (\mu_{C'}(x_2), \eta_{W'}(x_2)).$$
For the same reason, this negation is not perfectly involutive. For instance, with a scale L such that k = 10, (x8, x3) becomes (x2, x2) through negation, which in turn becomes (x8, x0). This imperfection only concerns the wish part of the bipolar condition, however, and it may be considered acceptable since the involution property is not so crucial in the context considered here, i.e., database querying. The difficulty of defining a negation that is involutive, order-reversing and consistency-preserving in a bipolar framework was already pointed out in [12], and the problem becomes even trickier in the present context, where a qualitative scale is used. The definition of the difference of bipolar fuzzy relations based on Formula (13) is then as follows:
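$$(\mu_{r-s}(x), \eta_{r-s}(x)) = \mathrm{lmin}((\mu_r(x), \eta_r(x)), (\mu_{\bar{s}}(x), \eta_{\bar{s}}(x))) \tag{14}$$

where

$$(\mu_{\bar{s}}(x), \eta_{\bar{s}}(x)) = (\mathrm{rev}(\mu_s(x)), \min(\mathrm{rev}'(\eta_s(x), \mu_s(x)), \mathrm{rev}(\mu_s(x)))).$$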
It is assumed that a tuple x which is not present in relation s has the pair of degrees (μ_s(x), η_s(x)) = (x0, x0). An example of a relational algebraic query involving a difference between two bipolar fuzzy relations is given hereafter.

Example 3. The query "find the cars that meet John's needs/preferences but not Peter's" may be expressed as CarJ − CarP, where CarJ and CarP are the bipolar fuzzy relations resulting from two bipolar selection queries on Car issued by John and Peter respectively, as in Example 1. With the extensions of CarJ and CarP from Tables 1 and 2, and using Formula (14) to interpret the difference, one gets the result shown in Table 5. In this relation, the first car keeps the pair of degrees it has in CarJ, since it is totally absent from CarP. The second car, on the other hand, is present in CarP with the pair of degrees (x6, x3). Then, according to Formula (13), it belongs to the complement of CarP with the pair of degrees (x1, x1). The pair of degrees it gets in CarJ − CarP is also (x1, x1), since this pair is smaller (in the sense of lmin) than the pair (x5, x3) with which it is associated in CarJ.
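A minimal sketch of Formula (14), again with x_i encoded as i and tuples keyed by #id. Following the statement of Example 3 that a car totally absent from CarP keeps its pair from CarJ, the sketch leaves tuples missing from s unchanged; this handling of absent tuples is an assumption made explicit in the code. The sketch replays the two cars discussed in Example 3; the function names are hypothetical.

```python
K = 7
car_j = {1: (6, 2), 2: (5, 3), 3: (7, 6), 4: (7, 2), 5: (6, 0), 6: (7, 4)}
car_p = {2: (6, 3), 4: (5, 5), 5: (6, 4), 7: (2, 1)}


def complement(pair, k=K):
    """Pair of a tuple in the complement of s, per Formula (14):
    (rev(mu_s), min(rev'(eta_s, mu_s), rev(mu_s)))."""
    mu, eta = pair
    new_mu = k - mu
    return (new_mu, min(mu - eta, new_mu))


def difference(r, s, k=K):
    """Formula (14): lmin (lexicographic min) of the pair in r and the
    complemented pair of s. Assumption (from Example 3): a tuple absent from
    s keeps the pair it has in r."""
    return {t: min(p, complement(s[t], k)) if t in s else p for t, p in r.items()}


result = difference(car_j, car_p)
print(complement(car_p[2]))  # (1, 1): car 2 in the complement of CarP
print(result[1], result[2])  # (6, 2) (1, 1)
```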
3.5. Selection

The straightforward definition of the selection is:
$$\sigma_{(C, W)}(r) = \{(\mu', \eta')/t \mid (\mu, \eta)/t \in r \land (\mu', \eta') = \mathrm{lmin}((\mu, \eta), (\mu_C(t), \eta_W(t)))\} \tag{15}$$
where μ_C(t) (resp. η_W(t)) denotes the satisfaction degree of t with respect to C (resp. W).

Example 4. The query "find the cars among those preferred by Paul (in the sense of the bipolar requirement used to build the relation CarPaul shown in Table 6) which cost less than $20,000 and, if possible, have a mileage of less than 60,000" may be expressed as:
$$\sigma_{(\text{price is lt\_20k},\ \text{mileage is lt\_60k})}(\mathit{CarPaul}).$$
Let us assume that a scale L such that k = 7 is used, and that the fuzzy predicates lt_20k and lt_60k are defined as follows:
$$\mu_{lt\_20k}(x) = \{x_7/[0, 20K],\ x_6/]20K, 21K],\ x_5/]21K, 22K],\ x_4/]22K, 23K],\ x_3/]23K, 24K],\ x_2/]24K, 25K],\ x_1/]25K, 26K],\ x_0/]26K, +\infty[\}$$

$$\mu_{lt\_60k}(x) = \{x_7/[0, 60K],\ x_6/]60K, 62K],\ x_5/]62K, 64K],\ x_4/]64K, 66K],\ x_3/]66K, 68K],\ x_2/]68K, 70K],\ x_1/]70K, 72K],\ x_0/]72K, +\infty[\}.$$
The result of the selection is shown in Table 7 (recall that the t-norm minimum is used to interpret the conjunction when W has to be replaced by C ∧ W in order to preserve weak consistency).

Table 5
Result of CarJ − CarP.

#id  make    model    mileage  years  price   μ   η
1    Ford    Mondeo   35,000   2008   22,000  x6  x2
2    Seat    Altea    50,000   2007   17,000  x1  x1
3    VW      Golf     39,000   2008   21,000  x7  x6
4    Toyota  Corolla  60,000   2006   18,000  x2  x2
5    VW      Passat   75,000   2004   16,000  x1  x1
6    Opel    Zafira   58,000   2007   15,000  x7  x4
Table 6
An instance of relation CarPaul.

#id  make    model    mileage  years  price   μ   η
1    Ford    Mondeo   35,000   2008   22,000  x5  x1
2    Seat    Altea    65,000   2007   21,000  x7  x2
3    VW      Passat   66,000   2008   23,000  x6  x6
4    Toyota  Corolla  60,000   2006   30,000  x7  x3
5    VW      Passat   75,000   2004   16,000  x6  x5
6    Seat    Altea    58,000   2007   15,000  x7  x4
7    Toyota  Corolla  18,000   2008   27,000  x2  x1
Table 7
Result of the bipolar selection on relation CarPaul.

#id  make    model    mileage  years  price   μ   η
1    Ford    Mondeo   35,000   2008   22,000  x5  x1
2    Seat    Altea    65,000   2007   21,000  x6  x4
3    VW      Passat   66,000   2008   23,000  x4  x4
5    VW      Passat   75,000   2004   16,000  x6  x5
6    Seat    Altea    58,000   2007   15,000  x7  x4
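The bipolar selection of Example 4 can be replayed with a short Python sketch of Formula (15). The step function below is one possible encoding of the predicates lt_20k and lt_60k defined above (one level lost per started step of 1K, resp. 2K, above the threshold); this encoding, the representation of x_i as i, and the data layout are assumptions. Running it yields the pairs listed in Table 7.

```python
K = 7


def step_predicate(value, threshold, step, k=K):
    """x_k up to `threshold`, then one level lower per started `step` above it."""
    if value <= threshold:
        return k
    steps = -(-(value - threshold) // step)  # integer ceiling division
    return max(k - steps, 0)


def select(rel, mu_c, eta_w):
    """Formula (15): lmin of each tuple's pair with (mu_C(t), eta_{C AND W}(t)).
    The wish is min-combined with the constraint to keep weak consistency."""
    result = {}
    for tid, (row, pair) in rel.items():
        mu = mu_c(row)
        new_pair = min(pair, (mu, min(mu, eta_w(row))))  # tuple min = lmin
        if new_pair[0] > 0:  # mu = x0: the tuple is dropped
            result[tid] = new_pair
    return result


# Table 6 (CarPaul): #id -> ({price, mileage}, (mu, eta))
car_paul = {
    1: ({"price": 22_000, "mileage": 35_000}, (5, 1)),
    2: ({"price": 21_000, "mileage": 65_000}, (7, 2)),
    3: ({"price": 23_000, "mileage": 66_000}, (6, 6)),
    4: ({"price": 30_000, "mileage": 60_000}, (7, 3)),
    5: ({"price": 16_000, "mileage": 75_000}, (6, 5)),
    6: ({"price": 15_000, "mileage": 58_000}, (7, 4)),
    7: ({"price": 27_000, "mileage": 18_000}, (2, 1)),
}


def lt_20k(row):
    return step_predicate(row["price"], 20_000, 1_000)


def lt_60k(row):
    return step_predicate(row["mileage"], 60_000, 2_000)


print(select(car_paul, lt_20k, lt_60k))
# {1: (5, 1), 2: (6, 4), 3: (4, 4), 5: (6, 5), 6: (7, 4)}
```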
In expression (15), C and/or W can be complex conditions expressing an aggregation (e.g., involving conjunctions and/or disjunctions) of atomic predicates. However, it is important to notice that, in general:

$$(C_1 \land C_2, W_1 \land W_2) \neq (C_1, W_1) \land (C_2, W_2), \qquad (C_1 \lor C_2, W_1 \lor W_2) \neq (C_1, W_1) \lor (C_2, W_2).$$

Counter-example. Consider a tuple t such that μ1(t) = x6, η1(t) = x2, μ2(t) = x5 and η2(t) = x3. Interpreting ∧ as min and ∨ as max in the left-hand sides of the previous inequalities, we get:
(min(x6, x5), min(x2, x3)) = (x5, x2), whereas lmin((x6, x2), (x5, x3)) = (x5, x3);
(max(x6, x5), max(x2, x3)) = (x6, x3), whereas lmax((x6, x2), (x5, x3)) = (x6, x2).

Therefore, the user must employ the appropriate formulation when expressing his/her query. One may suppose that a formulation of the type (C1, W1) ∧ ... ∧ (Cn, Wn), where the Ci's and Wi's are atomic conditions, will be chosen when each atomic bipolar condition involved concerns a specific attribute, as in: "find the cars which cost less than $20,000 (C1) and if possible less than $15,000 (W1), and which are less than 5 years old (C2) and if possible less than 3 years old (W2)". On the other hand, in a bipolar condition (C, W) where C and/or W are composite, the sets of attributes concerned by the constraint and by the wish will in general be disjoint. An example is: "find the German cars whose price is less than $20,000, and if possible recent and powerful". It is argued in [3] that constraints should be combined conjunctively, thus acknowledging the fact that they are constraints. Positive preferences, on the other hand, are not compulsory and may be combined disjunctively (see also [22], where this principle was applied to database querying).

3.6. Projection

One starts from the usual definition of projection, i.e.:
$$\pi_X(r) = \{x \mid \exists t \in r \text{ such that } t.X = x\}.$$

In the case of bipolar fuzzy relations, the existential quantifier is naturally interpreted by means of the operator lmax, and we get the definition:
$$\pi_X(r) = \{(\mu, \eta)/x \mid \exists t \in r \text{ such that } t.X = x \land (\mu, \eta) = \mathrm{lmax}_{t \in r \text{ s.t. } t.X = x}\,(\mu_r(t), \eta_r(t))\} \tag{16}$$
Example 5. The query "find the pairs (make, model) corresponding to the cars that meet Paul's needs/preferences" may be expressed as:

$$\pi_{\{make,\, model\}}(\sigma_{(C_1, W_1)}(\mathit{Car}))$$

where (C1, W1) represents Paul's bipolar requirement. Let us assume that CarPaul = σ_(C1, W1)(Car) is the relation shown in Table 6. The result of the projection is shown in Table 8.
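The projection of Formula (16) amounts to grouping tuples on the projected attributes and keeping the lexicographic maximum of their pairs. A minimal Python sketch, assuming x_i is encoded as i and each tuple is stored as an (attribute dict, pair) couple (a hypothetical layout), reproduces Table 8 from Table 6.

```python
# Table 6 (CarPaul) restricted to make, model and the (mu, eta) pair; x_i encoded as i.
car_paul = [
    ({"make": "Ford", "model": "Mondeo"}, (5, 1)),
    ({"make": "Seat", "model": "Altea"}, (7, 2)),
    ({"make": "VW", "model": "Passat"}, (6, 6)),
    ({"make": "Toyota", "model": "Corolla"}, (7, 3)),
    ({"make": "VW", "model": "Passat"}, (6, 5)),
    ({"make": "Seat", "model": "Altea"}, (7, 4)),
    ({"make": "Toyota", "model": "Corolla"}, (2, 1)),
]


def project(rel, attrs):
    """Formula (16): group tuples on the projected attributes and keep the
    lmax (lexicographic max) of the pairs within each group."""
    result = {}
    for row, pair in rel:
        key = tuple(row[a] for a in attrs)
        result[key] = max(result.get(key, (0, 0)), pair)
    return result


print(project(car_paul, ("make", "model")))
# {('Ford', 'Mondeo'): (5, 1), ('Seat', 'Altea'): (7, 4), ('VW', 'Passat'): (6, 6), ('Toyota', 'Corolla'): (7, 3)}
```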
3.7. Join

The straightforward definition of the join operation is:
$$\bowtie(r, s, (C(X, Y), W(Z, T))) = \{(\mu'', \eta'')/t \oplus t' \mid (\mu, \eta)/t \in r \land (\mu', \eta')/t' \in s \land (\mu'', \eta'') = \mathrm{lmin}(\mathrm{lmin}((\mu, \eta), (\mu', \eta')), (\mu_C(t.X, t'.Y), \eta_W(t.Z, t'.T)))\} \tag{17}$$
where X, Y, Z and T denote sets of attributes, and X and Y (resp. Z and T) are compatible. Condition C(X, Y) (resp. W(Z, T)) is of the form X θ1 Y (resp. Z θ2 T), where θ1 and θ2 are (fuzzy or not) comparators (=, <, ≤, >, ≥, ≠, etc.). As usual, in the case of a Boolean natural join, only one occurrence of the join attribute is kept (which implies adding a final projection to the definition above). For instance, if C(X, Y) reads "X = Y", only X is kept, and likewise for W. The above definition straightforwardly preserves the usual equivalence between a join and a Cartesian product followed by a selection:
$$\bowtie(r, s, (C(X, Y), W(Z, T))) = \sigma_{(C(X, Y), W(Z, T))}(r \times s).$$
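A direct transcription of Formula (17) pairs every tuple of r with every tuple of s and combines the input pairs with the satisfaction pair of the join condition through lmin. In the Python sketch below, x_i is encoded as i; the relations dealer1 and dealer2 and their prices are hypothetical data for a query like that of Example 6, and the wish is min-combined with the constraint to keep weak consistency (an assumption, following Section 2.2).

```python
from itertools import product

K = 7


def bipolar_join(r, s, mu_c, eta_w):
    """Formula (17): for every couple (t, t'), the pair is
    lmin(lmin(pair_t, pair_t'), (mu_C(t, t'), eta_W(t, t'))).
    eta_W is min-combined with mu_C to keep weak consistency (assumption)."""
    result = []
    for (t, p), (u, q) in product(r, s):
        mu = mu_c(t, u)
        pair = min(p, q, (mu, min(mu, eta_w(t, u))))  # tuple min = lmin
        if pair[0] > 0:  # couples with mu = x0 are not in the result
            result.append(((t, u), pair))
    return result


# Hypothetical base relations (every pair is (x7, x7)).
dealer1 = [({"make": "VW", "model": "Golf", "price": 21_000}, (K, K)),
           ({"make": "Seat", "model": "Altea", "price": 17_000}, (K, K))]
dealer2 = [({"make": "VW", "model": "Golf", "price": 23_000}, (K, K)),
           ({"make": "Seat", "model": "Altea", "price": 17_500}, (K, K))]


def same_car(t, u):
    return K if (t["make"], t["model"]) == (u["make"], u["model"]) else 0


def close_price(t, u):
    return K if abs(t["price"] - u["price"]) <= 1_500 else 0


for (t, u), pair in bipolar_join(dealer1, dealer2, same_car, close_price):
    print(t["model"], pair)
# Golf (7, 0)
# Altea (7, 7)
```

Both Golf offers satisfy the constraint but, being 2,000 apart in price, miss the wish; the Altea offers satisfy both.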
Example 6. Let us assume the existence of two relations Dealer1 and Dealer2 with the same schema as Car. An example of a bipolar join query is "find the cars (make, model) which are sold by both Dealer1 and Dealer2, preferably in the same price range". This query can be expressed as:
$$\bowtie(\mathit{Dealer}_1, \mathit{Dealer}_2, (make_1 = make_2 \text{ and } model_1 = model_2,\ |price_1 - price_2| \le 1500)).$$

3.8. Division

3.8.1. Refresher on the usual division

We assume that the dividend relation r has the schema (X, A), while the divisor relation s has the schema (B), where A and B are compatible sets of attributes. The division of relation r by relation s is classically defined as:
$$r[A \div B]s = \{x \in \pi_X(r) \mid \forall a,\ a \in s \Rightarrow (x, a) \in r\} \tag{18}$$
where π_X(r) denotes the projection of r over X. In other words, an element x belongs to the result of the division of r by s iff it is associated in r with at least all the values a appearing in s.

Example 7. Let us take a database involving the two relations order (o) and product (p), with respective schemas O(np, store, qty) and P(np, price). Tuples (n, s, q) of o and (n, pr) of p state that product n has been ordered from store s in quantity q and that its price is pr. Retrieving the stores from which all the products priced under $127 have been ordered in a quantity greater than 35 can be expressed by a division as og35[np ÷ np]pu127, where relation og35 contains the pairs (s, n) such that product n has been ordered from store s in a quantity over 35, and relation pu127 gathers the products whose price is under $127. From the extensions of relations og35 and pu127 given hereafter:
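$$og35 = \{\langle s_{32}, p_{15}\rangle, \langle s_{32}, p_{12}\rangle, \langle s_{32}, p_{34}\rangle, \langle s_{32}, p_{26}\rangle, \langle s_{7}, p_{12}\rangle, \langle s_{7}, p_{26}\rangle, \langle s_{19}, p_{15}\rangle, \langle s_{19}, p_{12}\rangle, \langle s_{19}, p_{26}\rangle\}$$

$$pu127 = \{\langle p_{15}\rangle, \langle p_{12}\rangle, \langle p_{26}\rangle\}$$

the division returns {⟨s32⟩, ⟨s19⟩}.

Table 8
Result of π_{make, model}(CarPaul).

make    model    μ   η
Ford    Mondeo   x5  x1
Seat    Altea    x7  x4
VW      Passat   x6  x6
Toyota  Corolla  x7  x3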
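The crisp division of Formula (18) can be checked on the data of Example 7 with a few lines of Python, assuming the dividend is stored as a set of (store, product) couples and the divisor as a set of products. The function name `divide` is illustrative.

```python
def divide(r, s):
    """Formula (18): the values x of r associated in r with every value a of s.
    r is a set of (x, a) couples, s a set of a values."""
    candidates = {x for x, _ in r}  # pi_X(r)
    return {x for x in candidates if all((x, a) in r for a in s)}


og35 = {("s32", "p15"), ("s32", "p12"), ("s32", "p34"), ("s32", "p26"),
        ("s7", "p12"), ("s7", "p26"),
        ("s19", "p15"), ("s19", "p12"), ("s19", "p26")}
pu127 = {"p15", "p12", "p26"}

print(sorted(divide(og35, pu127)))  # ['s19', 's32']
```

Store s7 is excluded because it is not associated with p15 in og35.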
3.8.2. Reminder about the division of fuzzy relations

When the relations involved are fuzzy (i.e., contain graded tuples), Eq. (18) becomes [15]:
$$\mu_{r[A \div B]s}(x) = \inf_{a \in \mathrm{support}(s)} \mu_s(a) \to \mu_r(x, a) \tag{19}$$
where → denotes a fuzzy implication [20]. Of course, in the context considered here, where a qualitative scale is used, only a certain subset of fuzzy implications may be used (Gödel or Kleene-Dienes, for instance, but not Goguen or Reichenbach). In the following example, an integer-valued version of Łukasiewicz implication is used.

Example 8. Let us consider again the relations of Example 7, a scale L such that k = 7, and two fuzzy conditions "quantity is around 30" and "price is around $100" applied to O and P respectively, leading to two fuzzy relations oa30 and pa100 whose extensions are given hereafter:
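$$oa30 = \{x_4/\langle s_{32}, p_{11}\rangle, x_7/\langle s_{32}, p_{17}\rangle, x_5/\langle s_{32}, p_{29}\rangle, x_6/\langle s_{7}, p_{11}\rangle, x_5/\langle s_{7}, p_{29}\rangle, x_7/\langle s_{19}, p_{11}\rangle, x_2/\langle s_{19}, p_{17}\rangle, x_6/\langle s_{19}, p_{42}\rangle\}$$

$$pa100 = \{x_7/\langle p_{11}\rangle, x_6/\langle p_{17}\rangle, x_5/\langle p_{29}\rangle\}.$$

Let us consider the following integer-valued version of Łukasiewicz implication: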
$$p \to_{Lu} q = \begin{cases} x_k & \text{if } q \ge p,\\ x_{k - (\mathrm{rank}(p) - \mathrm{rank}(q))} & \text{otherwise,} \end{cases}$$
where rank(x_i) = i. The division of oa30 by pa100 based on this implication returns {x4/⟨s32⟩, x1/⟨s7⟩, x2/⟨s19⟩}.

3.8.3. Division of bipolar fuzzy relations

When two bipolar fuzzy relations r and s (assumed to be consistent) come into play, a first view of the division consists in starting from Formula (19) and extending it to the case where the arguments of the fuzzy implication are pairs of degrees (μ, η). One can use, for instance, the two "bipolar fuzzy implications" →G and →K, which are the counterparts of the Gödel and Kleene-Dienes fuzzy implications respectively:
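$$(\mu_1, \eta_1) \to_G (\mu_2, \eta_2) = \begin{cases} (x_k, x_k) & \text{if } (\mu_1, \eta_1) \le_{lex} (\mu_2, \eta_2),\\ (\mu_2, \eta_2) & \text{otherwise;} \end{cases} \tag{20}$$

$$(\mu_1, \eta_1) \to_K (\mu_2, \eta_2) = \mathrm{lmax}((\mathrm{rev}(\mu_1), \min(\mathrm{rev}'(\eta_1, \mu_1), \mathrm{rev}(\mu_1))), (\mu_2, \eta_2)) \tag{21}$$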
the latter being based on the definition of the negation given by Formula (13). However, the interpretation of such an extended division operator may be difficult for an end-user to grasp. A second view, which in our opinion is both more intuitive and more flexible, starts from the remark that Formula (19) can be used to compute four degrees:
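$$\begin{aligned}
\mu(x) &= \inf_{a \in \mathrm{support}(s)} \eta_s(a) \to \mu_r(x, a) && (22)\\
\eta(x) &= \inf_{a \in \mathrm{support}(s)} \mu_s(a) \to \eta_r(x, a) && (23)\\
\mu'(x) &= \inf_{a \in \mathrm{support}(s)} \mu_s(a) \to \mu_r(x, a) && (24)\\
\eta'(x) &= \inf_{a \in \mathrm{support}(s)} \eta_s(a) \to \eta_r(x, a) && (25)
\end{aligned}$$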
Due to the monotonicity of fuzzy implications and the consistency condition between the constraint and the wish used to generate each bipolar relation, one has:
$$\forall x,\ \eta(x) \leq \mu'(x) \leq \mu(x) \quad \text{and} \quad \forall x,\ \eta(x) \leq \eta'(x) \leq \mu(x).$$
In order for the result to be a consistent bipolar relation too, the only possible choices are (μ, η), (μ, μ′), (μ, η′), (μ′, η), and (η′, η). It seems natural to choose (μ, η) for two reasons. (i) It captures the two extreme cases: μ(x) expresses the extent to which every ‘‘somewhat ideal’’ element of the divisor is associated in a somewhat necessary fashion with x (lax view, corresponding to the constraint aimed at retrieving the acceptable elements), whereas η(x) is the extent to which every ‘‘somewhat necessary’’ element of the divisor is associated in a somewhat ideal fashion with x (drastic view, corresponding to the wish aimed at retrieving the ideal elements). (ii) It keeps its discrimination power intact in the particular cases where r and/or s is a non-bipolar relation, which is not true of the other pairs: when r is unipolar, one has μ(x) = η′(x) and η(x) = μ′(x), and when s is unipolar, one has μ(x) = μ′(x) and η(x) = η′(x).
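To make the unipolar case concrete before moving to bipolar relations, the division of Example 8 can be replayed in a few lines of Python. This is a minimal sketch under stated assumptions. Formula (19) is taken to have the form μ(x) = inf_{a ∈ support(s)} μ_s(a) → μ_r(x, a), i.e., the shape of Formula (24). Each degree ω_i is represented by its rank i, and a pair (x, a) absent from the dividend is given degree ω0. The helper names luk and divide are hypothetical and do not come from the paper.

```python
from typing import Dict, Tuple

K = 7  # top of the scale: omega_0 .. omega_K are represented by the ranks 0 .. K


def luk(p: int, q: int, k: int = K) -> int:
    """Integer-valued Lukasiewicz implication on ranks."""
    return k if q >= p else k - (p - q)


def divide(r: Dict[Tuple[str, str], int], s: Dict[str, int], k: int = K) -> Dict[str, int]:
    """Unipolar division: x -> min over a in support(s) of mu_s(a) -> mu_r(x, a).

    Pairs (x, a) absent from r get degree 0 (omega_0); only x's whose
    resulting degree is non-zero are kept."""
    support_s = [a for a, d in s.items() if d > 0]
    candidates = sorted({x for (x, _a), d in r.items() if d > 0})
    result = {}
    for x in candidates:
        degree = min((luk(s[a], r.get((x, a), 0), k) for a in support_s), default=k)
        if degree > 0:
            result[x] = degree
    return result


# Extensions of oa30 and pa100 from Example 8 (omega_i written as i)
oa30 = {("s32", "p11"): 4, ("s32", "p17"): 7, ("s32", "p29"): 5,
        ("s7", "p11"): 6, ("s7", "p29"): 5,
        ("s19", "p11"): 7, ("s19", "p17"): 2, ("s19", "p42"): 6}
pa100 = {"p11": 7, "p17": 6, "p29": 5}

print(divide(oa30, pa100))  # {'s19': 2, 's32': 4, 's7': 1}
```

Under these assumptions the sketch yields ω4 for s32, ω1 for s7 and ω2 for s19, which matches the result stated in Example 8. For s7, for instance, the missing pair ⟨s7, p17⟩ gives ω6 → ω0 = ω1, and this value is the minimum.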
Let us denote by birel the operation (which is not part of the query language itself) that builds a bipolar fuzzy relation from two consistent (in the sense of weak consistency) regular fuzzy relations:
$$\mathrm{birel}(r, s) = \{(\mu, \eta)/t \mid t \in \mathrm{support}(r) \wedge \mu_r(t) = \mu \wedge t \in \mathrm{support}(s) \wedge \mu_s(t) = \eta\} \cup \{(\mu, 0)/t \mid t \in \mathrm{support}(r) \wedge \mu_r(t) = \mu \wedge t \notin \mathrm{support}(s)\} \tag{26}$$
where μ_r(t) (resp. μ_s(t)) denotes the membership degree associated with tuple t in the regular fuzzy relation r (resp. s). The definition of the division of bipolar fuzzy relations can be rewritten as:
$$r[A \div B]s = \mathrm{birel}(r^C[A \div B]s^W,\ r^W[A \div B]s^C) \tag{27}$$
where rel^C (resp. rel^W) denotes the fuzzy relation obtained by keeping only the degrees μ (resp. η) attached to the tuples of the bipolar fuzzy relation rel. This rewriting will be used in the following two subsections, which deal with the quotient property of the result and the non-primitiveness of the division operator respectively.
Example 9. Let us consider relations r and s from Table 9. Using the integer-valued Łukasiewicz implication (with k = 7) and denoting by res the bipolar relation resulting from the division, one gets:
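$$\begin{aligned}
\mu_{res}(x_1) &= \min(\omega_3 \rightarrow \omega_1, \omega_5 \rightarrow \omega_5, \omega_0 \rightarrow \omega_7) = \min(\omega_5, \omega_7, \omega_7) = \omega_5\\
\eta_{res}(x_1) &= \min(\omega_5 \rightarrow \omega_0, \omega_7 \rightarrow \omega_2, \omega_2 \rightarrow \omega_5) = \min(\omega_2, \omega_2, \omega_7) = \omega_2\\
\mu_{res}(x_2) &= \min(\omega_3 \rightarrow \omega_0, \omega_5 \rightarrow \omega_7, \omega_0 \rightarrow \omega_1) = \min(\omega_4, \omega_7, \omega_7) = \omega_4\\
\eta_{res}(x_2) &= \min(\omega_5 \rightarrow \omega_0, \omega_7 \rightarrow \omega_7, \omega_2 \rightarrow \omega_0) = \min(\omega_2, \omega_7, \omega_5) = \omega_2
\end{aligned}$$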
The result is the bipolar relation {(ω5, ω2)/⟨x1⟩, (ω4, ω2)/⟨x2⟩}. In practice, it is likely that either the divisor or the dividend will be bipolar, but not both, since the meaning of the query would then become rather complex. Nevertheless, Formulas (22) and (23) make it possible to deal with the general case where both are.
Example 10. Let us consider again the relations O and P from Example 7. Let us assume that a bipolar relation Ob is built from O using the twofold condition:
(C_O: qty is higher than 10, W_O: qty is higher than 15), where higher_than_10 and higher_than_15 are assumed to be fuzzy predicates. An example of a division query involving a bipolar dividend is: ‘‘find the stores from which at least 10 occurrences of each product have been ordered, and if possible at least 15 occurrences of each product’’. Here, the bipolar fuzzy dividend is Ob whereas the unipolar divisor is P. Now assume that a bipolar relation Pb is built from P using the twofold condition:
(C_P: price less than 200, W_P: price less than 150), where less_than_200 and less_than_150 are fuzzy predicates. An example of a division query involving a bipolar divisor is: ‘‘find the stores from which all of the products priced less than $200 have been ordered, and if possible all of those priced less than $150’’. In this case, the unipolar dividend is O whereas the fuzzy bipolar divisor is Pb. Let us mention that an alternative semantics for the bipolar division of unipolar fuzzy relations is studied in [11], where bipolarity is conveyed by a stratified divisor.
3.8.4. About the quotient property
The justification of the term ‘‘division’’ assigned to this operation relies on the fact that a property similar to that of the quotient of integers holds. Indeed, when the arguments of the division operator are classical relations, the relation res obtained with expression (18) has the double characteristic of a quotient:
$$\forall t \in res,\ s \times \{t\} \subseteq r \tag{28}$$
$$\forall t \notin res,\ s \times \{t\} \not\subseteq r \tag{29}$$
where × denotes the Cartesian product of relations. Expressions (28) and (29) state that the relation res resulting from the division is the largest relation whose Cartesian product with the divisor returns a result smaller than or equal to the dividend (according to set inclusion). Given the definition introduced in the previous subsection, processing a division r[A ÷ B]s of two bipolar fuzzy relations r and s comes down to processing two divisions of non-bipolar fuzzy relations, namely r^C[A ÷ B]s^W and r^W[A ÷ B]s^C (cf. Eq. (27)). Since it has been proven that the result of the division of two fuzzy relations is a quotient provided that the conjunction operator used for the Cartesian product is appropriately chosen [15], the result of a division of bipolar fuzzy relations can be characterized as a twofold quotient. However, when the implication used is an S-implication, the quotient property can be guaranteed only if the divisor is normalized. In the context of bipolar relations, this means that both s^C and s^W have to contain a tuple whose associated degree equals 1.
Table 9
Relations r (left) and s (right).
X    A    μ    η        B    μ    η
x1   a1   ω1   ω0       a1   ω5   ω3
x1   a2   ω5   ω2       a2   ω7   ω5
x1   a3   ω7   ω5       a3   ω2   ω0
x2   a2   ω7   ω7
x2   a3   ω1   ω0
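Example 9 can likewise be replayed mechanically from the data of Table 9. The sketch below carries the same assumptions as for unipolar relations. Degrees are stored as ranks, a pair (x, a) missing from r gets (ω0, ω0), and the helper names luk, four_degrees and bipolar_divide are hypothetical. It computes the four degrees of Formulas (22) to (25) for each candidate x and keeps (μ, η) as the result. This amounts to pairing the two unipolar divisions of Formula (27), as birel does. Returning all four degrees also makes it easy to check the orderings η ≤ μ′ ≤ μ and η ≤ η′ ≤ μ.

```python
from typing import Dict, Tuple

K = 7  # omega_0 .. omega_K are represented by the ranks 0 .. K
Pair = Tuple[int, int]  # (mu, eta), assumed consistent: eta <= mu
MU, ETA = 0, 1  # positions inside a pair


def luk(p: int, q: int, k: int = K) -> int:
    """Integer-valued Lukasiewicz implication on ranks."""
    return k if q >= p else k - (p - q)


def four_degrees(r: Dict[Tuple[str, str], Pair], s: Dict[str, Pair],
                 x: str, k: int = K) -> Tuple[int, int, int, int]:
    """Return (mu(x), eta(x), mu'(x), eta'(x)) as in Formulas (22)-(25)."""
    support_s = [a for a, (mu_s, _eta_s) in s.items() if mu_s > 0]

    def inf(side_s: int, side_r: int) -> int:
        # inf over a in support(s) of s(a)[side_s] -> r(x, a)[side_r];
        # a missing (x, a) counts as (0, 0), and an empty inf is the top k
        return min((luk(s[a][side_s], r.get((x, a), (0, 0))[side_r], k)
                    for a in support_s), default=k)

    return inf(ETA, MU), inf(MU, ETA), inf(MU, MU), inf(ETA, ETA)


def bipolar_divide(r: Dict[Tuple[str, str], Pair], s: Dict[str, Pair],
                   k: int = K) -> Dict[str, Pair]:
    """Keep (mu, eta) for every x of the projection of r whose mu is non-zero."""
    candidates = sorted({x for (x, _a), (mu, _eta) in r.items() if mu > 0})
    result = {}
    for x in candidates:
        mu, eta, _mu_prime, _eta_prime = four_degrees(r, s, x, k)
        if mu > 0:
            result[x] = (mu, eta)
    return result


# Table 9 (omega_i written as i)
r = {("x1", "a1"): (1, 0), ("x1", "a2"): (5, 2), ("x1", "a3"): (7, 5),
     ("x2", "a2"): (7, 7), ("x2", "a3"): (1, 0)}
s = {"a1": (5, 3), "a2": (7, 5), "a3": (2, 0)}

print(bipolar_divide(r, s))  # {'x1': (5, 2), 'x2': (4, 2)}
```

With these data, four_degrees gives (ω5, ω2, ω3, ω4) for x1 and (ω4, ω2, ω2, ω4) for x2. Both satisfy the inequalities stated after Formula (25).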
3.8.5. About the non-primitiveness of the operator
When regular (non-fuzzy) relations are handled, the division operator can also be defined in terms of other relational operators (which shows its non-primitiveness), by noticing that the division comes down to discarding from r[X] all x's such that ∃a ∈ s, (x, a) ∉ r. This gives:
$$r[A \div B]s = \pi_X(r) - \pi_X((\pi_X(r) \times \pi_B(s)) - r) \tag{30}$$
In this formula, the expression (π_X(r) × s) − r determines the tuples associating values of attributes X and B that are missing in r. The X-values present in this set must then be discarded from the final result, which is done by the outermost difference. It has been shown in [6] that the division of fuzzy relations is also a non-primitive operator, and that a rewriting similar to Formula (30) exists:
$$r[A \div B]s = \bar{\pi}_X(r) \ominus \pi_X((\bar{\pi}_X(r) \times s) \ominus r) \tag{31}$$
where the operation denoted by π̄_X(r) returns the support of the projection of relation r on the set of attributes X, and ⊖ is a difference based on the triangular norm or the non-commutative conjunction underlying the implication present in Eq. (19). It has been shown in [15] that Formula (31) is valid for any R- or S-implication that may appear in Formula (19). Since, for bipolar relations, r[A ÷ B]s rewrites as the combination of two divisions of regular fuzzy relations (cf. Eq. (27)), one has [10]:
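$$r[A \div B]s = \mathrm{birel}\big(\bar{\pi}_X(r^C) \ominus \pi_X((\bar{\pi}_X(r^C) \times \pi_B(s^W)) \ominus r^C),\ \bar{\pi}_X(r^W) \ominus \pi_X((\bar{\pi}_X(r^W) \times \pi_B(s^C)) \ominus r^W)\big) \tag{32}$$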
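In the crisp case, the rewriting of Formula (30) and the quotient characterization of (28) and (29) can be checked directly on small examples with Python sets. This is a minimal sketch, assuming a dividend r(X, A) stored as a set of (x, a) pairs and a divisor s(B) stored as a set of B-values. The function names are hypothetical. The check of (29) only ranges over the X-values of r. For any other t and a non-empty divisor, s × {t} ⊈ r holds trivially.

```python
from itertools import product
from typing import Set, Tuple

Rel = Set[Tuple[str, str]]  # a crisp dividend r(X, A)


def divide_direct(r: Rel, s: Set[str]) -> Set[str]:
    """x belongs to the result iff (x, a) is in r for every a in s."""
    return {x for (x, _a) in r if all((x, a) in r for a in s)}


def divide_via_30(r: Rel, s: Set[str]) -> Set[str]:
    """Formula (30): pi_X(r) - pi_X((pi_X(r) x s) - r)."""
    px = {x for (x, _a) in r}
    missing = set(product(px, s)) - r  # X-B associations absent from r
    return px - {x for (x, _a) in missing}


def is_quotient(res: Set[str], r: Rel, s: Set[str]) -> bool:
    """Check (28) and (29), taking the candidate t's from pi_X(r)."""
    px = {x for (x, _a) in r}
    kept_ok = all({(t, a) for a in s} <= r for t in res)              # (28)
    dropped_ok = all(not {(t, a) for a in s} <= r for t in px - res)  # (29)
    return kept_ok and dropped_ok


r = {("x1", "a1"), ("x1", "a2"), ("x1", "a3"), ("x2", "a2"), ("x2", "a3")}
s = {"a1", "a2"}
res = divide_via_30(r, s)
print(res, res == divide_direct(r, s), is_quotient(res, r, s))  # {'x1'} True True
```

Here the only missing association is (x2, a1), so the outermost difference discards x2.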
4. About query equivalences
Let us recall that relational algebraic queries can be represented as a tree where the internal nodes are operators, leaves are relations, and subtrees are subexpressions. The primary goal of query optimization is to transform expression trees into equivalent expression trees in which the average size of the relations yielded by subexpressions is smaller than before the optimization. This transformation process uses a set of properties (query equivalences), and the question arises whether these properties remain valid in the fuzzy bipolar model that we consider. The most commonly used query equivalences are:
1. π_X(π_XY(r)) = π_X(r),
2. σ_ψ2(σ_ψ1(r)) = σ_ψ1(σ_ψ2(r)) = σ_{ψ1∧ψ2}(r),
3. σ_{ψ1∨ψ2}(r) = σ_ψ1(r) ∪ σ_ψ2(r),
4. σ_ψ(π_X(r)) = π_X(σ_ψ(r)) if ψ concerns X only,
5. σ_ψ(r1 × r2) = σ_{ψ1∧ψ2∧ψ3}(r1 × r2) = σ_ψ3(σ_ψ1(r1) × σ_ψ2(r2)), where ψ = ψ1 ∧ ψ2 ∧ ψ3, ψ1 concerns only attributes from r1, ψ2 concerns only attributes from r2, and ψ3 is the part of ψ that concerns attributes from both r1 and r2,
6. σ_ψ(r1 ∪ r2) = σ_ψ(r1) ∪ σ_ψ(r2),
7. σ_ψ(r1 ∩ r2) = σ_ψ(r1) ∩ σ_ψ(r2) = σ_ψ(r1) ∩ r2 = r1 ∩ σ_ψ(r2),
8. σ_ψ(r1 − r2) = σ_ψ(r1) − σ_ψ(r2) = σ_ψ(r1) − r2,
9. π_X(r1 ∪ r2) = π_X(r1) ∪ π_X(r2),
10. π_Z(r1 × r2) = π_X(r1) × π_Y(r2) if X (resp. Y) denotes the subset of attributes of Z present in r1 (resp. r2).
It is straightforward to prove that all of these equivalences remain valid in our fuzzy bipolar model (the proofs are given in Appendix A).
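The proofs in Appendix A mostly reduce to the associativity and commutativity of lmin and to the distributivity of lmin over lmax. These laws can be checked exhaustively on a small scale. The sketch below assumes that lmin and lmax are the minimum and maximum with respect to the lexicographic order ≤lex on pairs (μ, η) used in Formula (20), with μ compared first. It also assumes that only consistent pairs (η ≤ μ) occur. The scale size K = 3 is an arbitrary choice that keeps the enumeration small.

```python
from itertools import product

K = 3  # a small scale omega_0 .. omega_3 keeps the exhaustive check fast

# all consistent pairs (mu, eta) with eta <= mu, degrees written as ranks
PAIRS = [(mu, eta) for mu in range(K + 1) for eta in range(mu + 1)]


def lmin(p, q):
    """Lexicographic minimum: compare mu first, then eta."""
    return min(p, q)  # Python compares tuples lexicographically


def lmax(p, q):
    """Lexicographic maximum."""
    return max(p, q)


def check_laws() -> bool:
    """Associativity and commutativity of lmin, distributivity of lmin over lmax."""
    for a, b, c in product(PAIRS, repeat=3):
        if lmin(lmin(a, b), c) != lmin(a, lmin(b, c)):
            return False
        if lmin(a, b) != lmin(b, a):
            return False
        if lmin(a, lmax(b, c)) != lmax(lmin(a, b), lmin(a, c)):
            return False
    return True


print(check_laws())  # True
```

Because ≤lex is a total order, lmin and lmax are the meet and join of a chain. The laws therefore hold for any scale size, not only for the one enumerated here.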
5. Implementation aspects
In terms of data representation, a regular DBMS can be used, since the only modification with respect to the classical case concerns the schemas of the relations, which must include two additional attributes for storing the degrees μ and η. In terms of query processing, the changes implied by the presence of bipolarity are discussed hereafter for the different extended operators.
5.1. Intersection and union
These operations have the same data complexity as usual. A nested scan of the relations involved is necessary, which can be optimized if indexes exist. With respect to an intersection (resp. union) of unipolar fuzzy relations, the only difference consists in using lmin (resp. lmax) instead of min (resp. max) (cf. Formulas (6) and (7)).
5.2. Cartesian product
The situation is the same as for intersection and union: a nested scan of the relations involved is necessary. Again, lmin is used to compute the final pair of degrees attached to a tuple of the result (cf. Formula (8)).
5.3. Difference
For this operator too, the processing algorithm has a structure similar to the classical case. The computation of r − s implies a nested scan of the relations involved. For each ‘‘raw tuple’’ t in r (i.e., a tuple without its degrees), one checks whether it is present in s. If it is, Formula (14) is used to compute the final pair of degrees attached to it in the result; otherwise, it belongs to the result with the pair of degrees (μ_r(t), η_r(t)).
5.4. Selection
In the absence of any index on the attribute(s) concerned by the constraint C, the processing of a selection is similar to the usual case: one sequentially scans the relation concerned and evaluates both conditions C and W.
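The index-free selection just described can be sketched as a single sequential scan. This is a minimal sketch under explicit assumptions. The pair attached to a tuple by the bipolar condition is taken to be (μ_C(t), μ_W(t)), and it is combined with the tuple's own pair through lmin, as in the proof of Property 2. Consistency (μ_W ≤ μ_C) is assumed rather than enforced. The ramp predicate higher_than and the sample rows are hypothetical, loosely modelled on the condition (C_O, W_O) of Example 10.

```python
from typing import Callable, Dict, List, Tuple

K = 7  # omega_0 .. omega_K are represented by the ranks 0 .. K
Pair = Tuple[int, int]  # (mu, eta)
Row = Tuple[Dict[str, int], Pair]  # a tuple of the relation with its pair of degrees


def bipolar_select(relation: List[Row],
                   c: Callable[[Dict[str, int]], int],
                   w: Callable[[Dict[str, int]], int]) -> List[Row]:
    """Sequentially scan the relation; each surviving tuple gets
    lmin((mu_r, eta_r), (mu_C, mu_W)), lmin being the lexicographic minimum."""
    out = []
    for t, degrees in relation:
        mu_c = c(t)
        if mu_c == 0:  # t is outside the support of the constraint C
            continue
        condition = (mu_c, w(t))  # assumes mu_W(t) <= mu_C(t)
        out.append((t, min(degrees, condition)))  # tuples compare lexicographically
    return out


def higher_than(threshold: int) -> Callable[[Dict[str, int]], int]:
    """Hypothetical fuzzy predicate: one rank per unit above threshold, capped at K."""
    return lambda t: max(0, min(K, t["qty"] - threshold))


orders = [({"qty": 20}, (7, 7)), ({"qty": 12}, (7, 7)), ({"qty": 8}, (7, 7))]
result = bipolar_select(orders, higher_than(10), higher_than(15))
print(result)  # [({'qty': 20}, (7, 5)), ({'qty': 12}, (2, 0))]
```

Since higher_than(15) never exceeds higher_than(10), the pairs produced by this example are consistent. The single pass over the relation reflects the linear data complexity mentioned for selections.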
If an index on the attribute(s) involved in C exists, one can take advantage of the derivation method proposed in [9] for efficiently accessing the tuples t which belong to the support of C (i.e., those such that μ_C(t) > 0); for each such tuple, one then evaluates W in order to compute the final pair of degrees. In any case, the data complexity is the same (linear) as for unipolar fuzzy queries.
5.5. Projection
Here too, the same type of algorithm as usual (e.g., based on sorting the relation concerned) is used. The only change concerns duplicate elimination, which computes a pair of degrees by means of lmax (instead of max in the unipolar fuzzy case).
5.6. Join
If there are no indexes on the join attributes, a nested scan of the relations involved may be performed. The only novelty with respect to the unipolar case is that the join condition is twofold, but this does not imply any overhead in terms of data complexity, which remains in O(|r| × |s|) (where r and s denote the relations to be joined). If the constraint part C of the join condition is Boolean, one evaluates it first; the wish part is then computed by means of a (possibly fuzzy) selection. On the other hand, if the constraint part C of the join condition is fuzzy but concerns attributes on which indexes are available, one may take advantage of the aforementioned derivation principle (described in [9]) to evaluate it. Then, the wish part W of the join condition is evaluated by means of a selection on the result of the previous step. In any case, the data complexity is as usual: between |r| + |s| in the best case (when indexes are available) and min(|r| × (|s| + 1), |s| × (|r| + 1)) in the worst case.
5.7. Division
Here, it is necessary to process two divisions of unipolar fuzzy relations (cf. Formula (27)), which means that the practical cost is multiplied by two, but the data complexity class stays the same.
See [7] for more details on the processing of a division of unipolar fuzzy relations. To sum up, one can be reasonably optimistic about the data complexity of bipolar queries based on the extended relational algebra described here, and thus about the performance of a ‘‘bipolar DBMS’’. Indeed, introducing bipolarity into relations
and queries does not modify the complexity class of any algebraic operator (all of them remain in PTIME, and a naive evaluation just leads to a doubling of the cost with respect to an equivalent non-bipolar query).
6. Related work
To the best of our knowledge, this paper is the first to propose a complete algebraic framework for handling bipolar fuzzy relations and conditions. Let us mention, however, that a first step in that direction was made in [28], where the selection, projection and join operations were briefly tackled (but only a non-bipolar version of the join was considered and the set-oriented operators were not formally defined). In [32,33,25], Zadrożny and Kacprzyk consider a specific interpretation of bipolar fuzzy selection queries. Instead of using the lexicographic order to combine constraints and wishes, they propose an operator inspired by an aggregation technique aimed at modeling prioritized constraints, initially introduced by Lacroix and Lavency in their system Preferences [26]. They define a fuzzy version of it, which represents an interpretation of the concept of a bipolar query with an ‘‘and possibly’’ operator. They also show its basic connection with a fuzzy version of the operator winnow, whose crisp version was first defined by Chomicki [16]. This fuzzy ‘‘and possibly’’ operator, whose suitability for representing bipolar preferences is criticized in [23], corresponds to an approach previously proposed by R. Yager [31], where the interpretation of ‘‘and possibly’’ is based on the content of the whole database. In [29,30], Matthé et al. also deal with the interpretation of bipolar fuzzy selection queries. They define satisfaction and dissatisfaction degrees following an approach closely related to Atanassov’s intuitionistic fuzzy sets (the only difference with respect to Atanassov’s approach is that the consistency condition is dropped).
The authors discuss the operators needed for evaluating selection queries involving the standard logical operators conjunction, disjunction and negation, but do not attempt to define any bipolar relational algebra. In [17], this type of approach based on intuitionistic fuzzy sets is applied to uncertain (possibilistic) database querying. To be complete, let us mention that the use of an operator of the type ‘‘and if possible’’ was also proposed in an information retrieval context, notably by Bordogna and Pasi [4]. An alternative definition of this operator in a database querying context can also be found in [13], where unipolar fuzzy relations are considered.
7. Conclusion
In this paper, we have defined an extension of relational algebra suitable for the handling of bipolar fuzzy relations and conditions. This framework makes it possible for a user to express twofold requirements made of a (possibly complex) constraint and a (possibly complex) wish. The satisfaction of the constraint and that of the wish can be used to order the tuples of the result by means of the lexicographic order (with priority given to the constraint). We do not claim that this approach to the handling of bipolarity in a database context is the only possible one, but we have shown that it has the important merit of providing a consistent framework. If one wants to handle two separate degrees (which corresponds to a strict view of bipolarity) and obtain a total ordering of the results, the lexicographic order is a rather straightforward choice, with a clear interpretation. As to perspectives for future work, let us mention the definition of a bipolar fuzzy version of SQL. This implies studying how the SQL constructs which have no counterpart in relational algebra (in particular nesting operators, aggregates and partitioning mechanisms) can be extended so as to capture a bipolar semantics.
Appendix A
Proof of Property 1. π_X(π_XY(r)) = π_X(r). One has:
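$$\begin{aligned}
\mathrm{lmax}_{t \in \pi_{XY}(r) \mid t.X = x} (\mu_{\pi_{XY}(r)}(t), \eta_{\pi_{XY}(r)}(t)) &= \mathrm{lmax}_{t \in \pi_{XY}(r) \mid t.X = x} \big(\mathrm{lmax}_{t' \in r \mid t'.X = x \wedge t'.Y = y} (\mu_r(t'), \eta_r(t'))\big)\\
&= \mathrm{lmax}_{u \in r \mid u.X = x} (\mu_r(u), \eta_r(u))
\end{aligned}$$
due to the associativity of lmax.
Proof of Property 2. σ_ψ2(σ_ψ1(r)) = σ_ψ1(σ_ψ2(r)) = σ_{ψ1∧ψ2}(r). One has:
$$\begin{aligned}
&\mathrm{lmin}\big(\mathrm{lmin}((\mu_r(t), \eta_r(t)), (\mu_{\psi_1}(t), \eta_{\psi_1}(t))), (\mu_{\psi_2}(t), \eta_{\psi_2}(t))\big)\\
&= \mathrm{lmin}\big(\mathrm{lmin}((\mu_r(t), \eta_r(t)), (\mu_{\psi_2}(t), \eta_{\psi_2}(t))), (\mu_{\psi_1}(t), \eta_{\psi_1}(t))\big)\\
&= \mathrm{lmin}\big((\mu_r(t), \eta_r(t)), \mathrm{lmin}((\mu_{\psi_1}(t), \eta_{\psi_1}(t)), (\mu_{\psi_2}(t), \eta_{\psi_2}(t)))\big)
\end{aligned}$$
due to the associativity and commutativity of lmin.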
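Proof of Property 3. σ_{ψ1∨ψ2}(r) = σ_ψ1(r) ∪ σ_ψ2(r). One has:
$$\begin{aligned}
&\mathrm{lmin}\big((\mu_r(t), \eta_r(t)), \mathrm{lmax}((\mu_{\psi_1}(t), \eta_{\psi_1}(t)), (\mu_{\psi_2}(t), \eta_{\psi_2}(t)))\big)\\
&= \mathrm{lmax}\big(\mathrm{lmin}((\mu_r(t), \eta_r(t)), (\mu_{\psi_1}(t), \eta_{\psi_1}(t))), \mathrm{lmin}((\mu_r(t), \eta_r(t)), (\mu_{\psi_2}(t), \eta_{\psi_2}(t)))\big)
\end{aligned}$$
due to the distributivity of lmin over lmax.
Proof of Property 4. σ_ψ(π_X(r)) = π_X(σ_ψ(r)) if ψ concerns X only. One has:
$$\mathrm{lmin}\big((\mu_\psi(x), \eta_\psi(x)), \mathrm{lmax}_{t \in r \mid t.X = x} (\mu_r(t), \eta_r(t))\big) = \mathrm{lmax}_{t \in r \mid t.X = x}\, \mathrm{lmin}((\mu_r(t), \eta_r(t)), (\mu_\psi(t), \eta_\psi(t)))$$
due to the distributivity of lmin over lmax.
Proof of Property 5.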
$$\sigma_\psi(r_1 \times r_2) = \sigma_{\psi_1 \wedge \psi_2 \wedge \psi_3}(r_1 \times r_2) = \sigma_{\psi_3}(\sigma_{\psi_1}(r_1) \times \sigma_{\psi_2}(r_2))$$
where ψ = ψ1 ∧ ψ2 ∧ ψ3, ψ1 concerns only attributes from r1, ψ2 concerns only attributes from r2, and ψ3 is the part of ψ that concerns attributes from both r1 and r2. One has:
$$\begin{aligned}
&\mathrm{lmin}\big((\mu_\psi(tt'), \eta_\psi(tt')), \mathrm{lmin}((\mu_{r_1}(t), \eta_{r_1}(t)), (\mu_{r_2}(t'), \eta_{r_2}(t')))\big)\\
&= \mathrm{lmin}\big(\mathrm{lmin}((\mu_{\psi_1}(tt'), \eta_{\psi_1}(tt')), (\mu_{\psi_2}(tt'), \eta_{\psi_2}(tt')), (\mu_{\psi_3}(tt'), \eta_{\psi_3}(tt'))), \mathrm{lmin}((\mu_{r_1}(t), \eta_{r_1}(t)), (\mu_{r_2}(t'), \eta_{r_2}(t')))\big)\\
&= \mathrm{lmin}\big((\mu_{\psi_3}(tt'), \eta_{\psi_3}(tt')), \mathrm{lmin}(\mathrm{lmin}((\mu_{\psi_1}(t), \eta_{\psi_1}(t)), (\mu_{r_1}(t), \eta_{r_1}(t))), \mathrm{lmin}((\mu_{\psi_2}(t'), \eta_{\psi_2}(t')), (\mu_{r_2}(t'), \eta_{r_2}(t'))))\big)
\end{aligned}$$
due to the associativity and commutativity of lmin and the fact that the attributes which are not concerned by a bipolar condition play no role in the computation of the degrees related to this condition.
Proof of Property 6. σ_ψ(r1 ∪ r2) = σ_ψ(r1) ∪ σ_ψ(r2). One has:
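$$\begin{aligned}
&\mathrm{lmin}\big((\mu_\psi(t), \eta_\psi(t)), \mathrm{lmax}((\mu_{r_1}(t), \eta_{r_1}(t)), (\mu_{r_2}(t), \eta_{r_2}(t)))\big)\\
&= \mathrm{lmax}\big(\mathrm{lmin}((\mu_\psi(t), \eta_\psi(t)), (\mu_{r_1}(t), \eta_{r_1}(t))), \mathrm{lmin}((\mu_\psi(t), \eta_\psi(t)), (\mu_{r_2}(t), \eta_{r_2}(t)))\big)
\end{aligned}$$
due to the distributivity of lmin over lmax.
Proof of Property 7. σ_ψ(r1 ∩ r2) = σ_ψ(r1) ∩ σ_ψ(r2) = σ_ψ(r1) ∩ r2 = r1 ∩ σ_ψ(r2). One has: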
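$$\begin{aligned}
&\mathrm{lmin}\big((\mu_\psi(t), \eta_\psi(t)), \mathrm{lmin}((\mu_{r_1}(t), \eta_{r_1}(t)), (\mu_{r_2}(t), \eta_{r_2}(t)))\big)\\
&= \mathrm{lmin}\big(\mathrm{lmin}((\mu_\psi(t), \eta_\psi(t)), (\mu_{r_1}(t), \eta_{r_1}(t))), \mathrm{lmin}((\mu_\psi(t), \eta_\psi(t)), (\mu_{r_2}(t), \eta_{r_2}(t)))\big)\\
&= \mathrm{lmin}\big(\mathrm{lmin}((\mu_\psi(t), \eta_\psi(t)), (\mu_{r_1}(t), \eta_{r_1}(t))), (\mu_{r_2}(t), \eta_{r_2}(t))\big)\\
&= \mathrm{lmin}\big((\mu_{r_1}(t), \eta_{r_1}(t)), \mathrm{lmin}((\mu_\psi(t), \eta_\psi(t)), (\mu_{r_2}(t), \eta_{r_2}(t)))\big)
\end{aligned}$$
due to the associativity, commutativity and idempotence of lmin.
Proof of Property 8. σ_ψ(r1 − r2) = σ_ψ(r1) − σ_ψ(r2) = σ_ψ(r1) − r2. One has: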
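$$\begin{aligned}
&\mathrm{lmin}\big((\mu_\psi(t), \eta_\psi(t)), \mathrm{lmin}((\mu_{r_1}(t), \eta_{r_1}(t)), (\mu_{r_2}(t), \eta_{r_2}(t)))\big)\\
&= \mathrm{lmin}\big(\mathrm{lmin}((\mu_\psi(t), \eta_\psi(t)), (\mu_{r_1}(t), \eta_{r_1}(t))), \mathrm{lmin}((\mu_\psi(t), \eta_\psi(t)), (\mu_{r_2}(t), \eta_{r_2}(t)))\big)\\
&= \mathrm{lmin}\big(\mathrm{lmin}((\mu_\psi(t), \eta_\psi(t)), (\mu_{r_1}(t), \eta_{r_1}(t))), (\mu_{r_2}(t), \eta_{r_2}(t))\big)
\end{aligned}$$
due to the associativity of lmin.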
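Proof of Property 9. π_X(r1 ∪ r2) = π_X(r1) ∪ π_X(r2). One has:
$$\begin{aligned}
&\mathrm{lmax}_{t \in r_1 \cup r_2 \mid t.X = x}\, \mathrm{lmax}((\mu_{r_1}(t), \eta_{r_1}(t)), (\mu_{r_2}(t), \eta_{r_2}(t)))\\
&= \mathrm{lmax}\big(\mathrm{lmax}_{t \in r_1 \mid t.X = x} (\mu_{r_1}(t), \eta_{r_1}(t)), \mathrm{lmax}_{t \in r_2 \mid t.X = x} (\mu_{r_2}(t), \eta_{r_2}(t))\big)
\end{aligned}$$
due to the fact that:
$$\mathrm{lmax}_{u \in E \cup F} (\mu(u), \eta(u)) = \mathrm{lmax}\big(\mathrm{lmax}_{u \in E} (\mu(u), \eta(u)), \mathrm{lmax}_{u \in F} (\mu(u), \eta(u))\big).$$
Proof of Property 10. π_Z(r1 × r2) = π_X(r1) × π_Y(r2) if X (resp. Y) denotes the subset of attributes of Z present in r1 (resp. r2). One has: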
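$$\begin{aligned}
&\mathrm{lmax}_{tt' \in r_1 \times r_2 \mid (tt').Z = xy}\, \mathrm{lmin}((\mu_{r_1}(t), \eta_{r_1}(t)), (\mu_{r_2}(t'), \eta_{r_2}(t')))\\
&= \mathrm{lmax}_{t \in r_1 \mid t.X = x}\, \mathrm{lmax}_{t' \in r_2 \mid t'.Y = y}\, \mathrm{lmin}((\mu_{r_1}(t), \eta_{r_1}(t)), (\mu_{r_2}(t'), \eta_{r_2}(t')))\\
&= \mathrm{lmin}\big(\mathrm{lmax}_{t \in r_1 \mid t.X = x} (\mu_{r_1}(t), \eta_{r_1}(t)), \mathrm{lmax}_{t' \in r_2 \mid t'.Y = y} (\mu_{r_2}(t'), \eta_{r_2}(t'))\big)
\end{aligned}$$
since
$$\mathrm{lmax}_{u \in E}\, \mathrm{lmax}_{v \in F}\, \mathrm{lmin}(u, v) = \mathrm{lmin}\big(\mathrm{lmax}_{u \in E}\, u, \mathrm{lmax}_{v \in F}\, v\big).$$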
References
[1] K. Atanassov, Intuitionistic fuzzy sets, Fuzzy Sets and Systems 20 (1) (1986) 87–96.
[2] R. Bělohlávek, V. Vychodil, Logical foundations for similarity-based databases, in: L. Chen, C. Liu, Q. Liu, K. Deng (Eds.), DASFAA Workshops, Lecture Notes in Computer Science, vol. 5667, Springer, 2009, pp. 137–151.
[3] S. Benferhat, D. Dubois, S. Kaci, H. Prade, Bipolar possibility theory in preference modeling: representation, fusion and optimal solutions, Information Fusion 7 (1) (2006) 135–150.
[4] G. Bordogna, G. Pasi, A fuzzy query language with a linguistic hierarchical aggregator, in: Proc. of ACM SAC 1994, 1994, pp. 184–187.
[5] P. Bosc, B. Buckles, F. Petry, O. Pivert, Fuzzy databases, in: J. Bezdek, D. Dubois, H. Prade (Eds.), The Handbooks of Fuzzy Sets Series, Fuzzy Sets in Approximate Reasoning and Information Systems, vol. 3, Kluwer Academic Publishers, Dordrecht, The Netherlands, 1999, pp. 403–468.
[6] P. Bosc, D. Dubois, O. Pivert, H. Prade, Flexible queries in relational databases – the example of the division operator, Theoretical Computer Science 171 (1–2) (1997) 281–302.
[7] P. Bosc, C. Legrand, O. Pivert, About fuzzy query processing – the example of the division, in: Proc. of the 8th IEEE International Conference on Fuzzy Systems (FUZZ-IEEE’99), Seoul, Korea, 1999, pp. 592–597.
[8] P. Bosc, O. Pivert, SQLf: a relational database language for fuzzy querying, IEEE Transactions on Fuzzy Systems 3 (1) (1995) 1–17.
[9] P. Bosc, O. Pivert, SQLf query functionality on top of a regular relational database management system, in: O. Pons, M. Vila, J. Kacprzyk (Eds.), Knowledge Management in Fuzzy Databases, Physica-Verlag, Heidelberg, Germany, 2000, pp. 171–190.
[10] P. Bosc, O. Pivert, On the division of bipolar fuzzy relations, in: Proc. of the 29th International Conference of the North American Fuzzy Information Processing Society (NAFIPS’10), Toronto, Canada, 2010.
[11] P. Bosc, O. Pivert, On diverse approaches to bipolar division operators, International Journal of Intelligent Systems 26 (10) (2011) 911–929.
[12] P. Bosc, O. Pivert, On the negation of bipolar fuzzy conditions, in: Proc. of the 30th International Conference of the North American Fuzzy Information Processing Society (NAFIPS’11), El Paso, TX, USA, 2011.
[13] P. Bosc, O. Pivert, On four noncommutative fuzzy connectives and their axiomatization, Fuzzy Sets and Systems 202 (2012) 42–60.
[14] P. Bosc, O. Pivert, L. Liétard, A. Mokhtari, Extending relational algebra to handle bipolarity, in: Proc. of the 25th ACM Symposium on Applied Computing (SAC 2010), Sierre, Switzerland, 2010, pp. 1717–1721.
[15] P. Bosc, O. Pivert, D. Rocacher, About quotient and division of crisp and fuzzy relations, Journal of Intelligent Information Systems 29 (2) (2007) 185–210.
[16] J. Chomicki, Preference formulas in relational queries, ACM Transactions on Database Systems 28 (2003) 1–40.
[17] G. De Tré, S. Zadrożny, A. Bronselaer, Handling bipolarity in elementary queries to possibilistic databases, IEEE Transactions on Fuzzy Systems 18 (3) (2010) 599–612.
[18] D. Dubois, The role of fuzzy sets in decision sciences: old techniques and new directions, Fuzzy Sets and Systems 184 (1) (2011) 3–28.
[19] D. Dubois, S. Gottwald, P. Hájek, J. Kacprzyk, H. Prade, Terminological difficulties in fuzzy set theory – the case of ‘‘intuitionistic fuzzy sets’’, Fuzzy Sets and Systems 156 (3) (2005) 485–491.
[20] D. Dubois, W. Ostasiewicz, H. Prade, Fuzzy sets: history and basic notions, in: D. Dubois, H. Prade (Eds.), The Handbooks of Fuzzy Sets Series, Fundamentals of Fuzzy Sets, vol. 1, Kluwer Academic Publishers, Dordrecht, The Netherlands, 2000, pp. 21–124.
[21] D. Dubois, H. Prade, Twofold fuzzy sets and rough sets – some issues in knowledge representation, Fuzzy Sets and Systems 23 (1) (1987) 3–18.
[22] D. Dubois, H. Prade, Bipolarity in flexible querying, in: T. Andreasen, A. Motro, H. Christiansen, H.L. Larsen (Eds.), FQAS, Lecture Notes in Computer Science, vol. 2522, Springer, 2002, pp. 174–182.
[23] D. Dubois, H. Prade, Handling bipolar queries in fuzzy information processing, in: J. Galindo (Ed.), Handbook of Research on Fuzzy Information Processing in Databases, Information Science Reference, Hershey, PA, USA, 2008, pp. 97–114.
[24] D. Dubois, H. Prade, Gradualness, uncertainty and bipolarity: making sense of fuzzy sets, Fuzzy Sets and Systems 192 (2012) 3–24.
[25] J. Kacprzyk, S. Zadrożny, Bipolar queries, and intention and preference modeling: synergy and cross-fertilization, in: Proc. of the World Conference on Soft Computing (WCSC’11), San Francisco, CA, USA, 2011.
[26] M. Lacroix, P. Lavency, Preferences: putting more knowledge into queries, in: Proc. of the 13th VLDB Conference, 1987, pp. 217–225.
[27] C. Li, K.C.-C. Chang, I.F. Ilyas, S. Song, RankSQL: query algebra and optimization for relational top-k queries, in: F. Özcan (Ed.), SIGMOD Conference, ACM, 2005, pp. 131–142.
[28] L. Liétard, D. Rocacher, P. Bosc, On the extension of SQL to fuzzy bipolar conditions, in: Proc. of the 28th International Conference of the North American Fuzzy Information Processing Society (NAFIPS’09), 2009.
[29] T. Matthé, G. De Tré, Bipolar query satisfaction using satisfaction and dissatisfaction degrees, in: Proc. of ACM SAC 2009, 2009, pp. 1699–1703.
[30] T. Matthé, G. De Tré, S. Zadrożny, J. Kacprzyk, A. Bronselaer, Bipolar database querying using bipolar satisfaction degrees, International Journal of Intelligent Systems 26 (10) (2011) 890–910.
[31] R. Yager, Fuzzy sets and approximate reasoning in decision and control, in: Proc. of FUZZ-IEEE’92, 1992, pp. 415–428.
[32] S. Zadrożny, J. Kacprzyk, Bipolar queries and queries with preferences, in: Proc. of DEXA’06 Workshops, IEEE Computer Society, 2006, pp. 415–419.
[33] S. Zadrożny, J. Kacprzyk, Bipolar queries using various interpretations of logical connectives, in: P. Melin, O. Castillo, L.T. Aguilar, J. Kacprzyk, W. Pedrycz (Eds.), Proc. of IFSA, Lecture Notes in Computer Science, vol. 4529, Springer, 2007, pp. 181–190.