Accepted Manuscript Fuzzy functional dependencies: A comparative survey
L. Ježková, P. Cordero, M. Enciso
PII: S0165-0114(16)30214-7
DOI: http://dx.doi.org/10.1016/j.fss.2016.06.019
Reference: FSS 7069
To appear in: Fuzzy Sets and Systems
Received date: 3 December 2014
Revised date: 22 May 2016
Accepted date: 29 June 2016
Please cite this article in press as: L. Ježková et al., Fuzzy functional dependencies: A comparative survey, Fuzzy Sets Syst. (2016), http://dx.doi.org/10.1016/j.fss.2016.06.019
Fuzzy functional dependencies: a comparative survey
L. Ježková (1), P. Cordero (2) and M. Enciso (2)
(1) DAMOL (Data Analysis and Modeling Laboratory), Dept. Computer Science, Palacky University, Olomouc, 17. listopadu 12, CZ–77146 Olomouc, Czech Republic, [email protected]
(2) Universidad de Málaga, Spain, [email protected], [email protected]
Abstract. Similarity search and related issues are a current topic in databases. According to Web of Science, more than 6500 papers dealing with similarity in databases were published over the last ten years. The rise in the number of articles in recent years shows that research in this direction is still in its early stage. Among the wide range of topics related to similarities in databases, one has already received considerable attention, namely functional dependencies which take similarities into account. Our main concern in this paper is to review and critically examine the existing work on this topic.
Keywords: Functional dependencies, Fuzzy functional dependencies, Relational model, Fuzzy relational model, Fuzzy database, Similarity relations, Similarity-based functional dependencies, Residuated lattice, Fuzzy logic
1 Introduction
More than one hundred papers dealing with functional dependencies (FDs) over domains with similarities can be found in the literature, and many of them contain interesting generalizations of the original notion of functional dependency (FD). In our opinion, this wide range of approaches is worthy of an exhaustive review and a comparison of the most significant ones. The name “fuzzy functional dependencies” is often used for various extensions of FDs, which we think is unfortunate for several reasons. One of them is that the name is too broad and concerns various aspects (e.g. even in the ordinary relational model, i.e. without similarity, some things may be considered fuzzy, i.e. a matter of degree). Furthermore, there is no agreement among researchers on what the terms “fuzzy database” and “fuzzy functional dependencies” really mean. For example, in [97] the term fuzzy database is used for a collection of relations, where each relation is understood as an ordinary subset of the Cartesian product of domains, and the domains are collections of possibility distributions. By contrast, in [80] the fuzzy relational data model is used for a collection of fuzzy relations, i.e. fuzzy subsets of the Cartesian product of domains. Since the name “fuzzy functional dependency” is usually used for functional dependencies defined within a “fuzzy relational model”, it is not clear what “fuzzy functional dependency” really means. Moreover, although many definitions of so-called fuzzy functional dependencies extend the classical one, the dependency usually remains crisp in the sense that a given relation either satisfies the dependency or does not. In this sense the term fuzzy functional dependency is inadequate, even though we understand that it may be seen as a convenient shorthand. We will therefore use the terms generalized functional dependency (GFD) and generalized relational model (GRM) to prevent misunderstanding.
It is not surprising that among other topics related to various generalizations of the relational model, functional dependencies are one of the most studied. Functional dependencies are part of Codd’s relational model from its very beginnings and are an essential notion for database design, where they serve as an important tool for redundancy elimination and normalization. The interpretation of an ordinary functional dependency A ⇒ B in a given data table D is the following: if any two tuples have the same values on all attributes from A, then they have the same values on all attributes from B. We say that A determines B or B is functionally dependent on A. We specifically intend to concentrate on FD over domains with similarities and the directly related issues. What does “FD over domains with similarities” mean? We consider it reasonable
to understand by this those approaches in which the identities on domains, implicitly present in the ordinary model, are replaced with similarity relations, i.e. the ordinary model is generalized in this way. The idea of using similarities appeared in 1982 in [15], which can be considered one of the pioneering works in this area. In this view, it seems natural to consider as good those approaches which generalize the ordinary FD in the following sense: if we apply the notions related to a given generalized approach, such as the concept of FD, the concept of validity of an FD, etc., to the particular situation in which all the similarities are identities, the generalized notions become (up to notational differences) the ordinary notions.
One of the essential features of Codd’s ordinary model is its logical-relational foundation. For instance, there is an equivalence between functional dependency statements and implicational statements of propositional logic [51]. A logical viewpoint is therefore a good guide. From the logical point of view, the generalization of FD to FD over domains with similarities may be seen as replacing two-valued identity relations by many-valued ones which represent similarities, i.e. by fuzzy relations satisfying appropriate properties. This step may be considered as switching from two-valued logic, the formal framework in which the ordinary model is implicitly developed (see footnote 3), to an appropriate fuzzy logic. Since two-valued logic may be considered a particular case of fuzzy logic, the logical viewpoint makes it naturally possible to reflect on whether, and in what sense, a particular approach to FD over similarities is a proper generalization of the ordinary one. The switch to fuzzy logic naturally raises the question of how the concepts of validity, entailment, etc. should be dealt with. Assume the following similarity-based FD: Hotels with similar ranking have rooms for similar prices.
Should the validity of a GFD in a given relation remain bivalent (true or false), or should it be many-valued (e.g. taking values from [0, 1])? We will address this point later in Section 2. These are our “ideological roots”: we are aware that other generalizations are possible besides those that may be addressed from the logical viewpoint, but we insist on the claim that good approaches need to have logical foundations in order to follow the idea of the relational model of data.
We are going to focus on FD over domains with similarities. Even this restriction leaves extensive work to be done. Our plan is to do it in two steps. The first one is this paper, where we want to
1. present a reasonably complete list of various definitions of similarity-based FDs and critically examine them,
2. provide a unifying framework for different approaches using fuzzy logic based on complete residuated lattices,
3. objectively compare them using our criterion (which is based on the notion of fuzzy function).
We decided to leave their detailed comparison (including a comparison of inference systems) to a second paper; see future work at the end of this paper.
Classical FDs are also used in the design of databases by means of normalization methods. Several works have tackled the normalization issue for GFDs. We will not provide a detailed comparison of the new normalization methods and notions, but we will mention several works that have already introduced normal forms for GFDs (usually called fuzzy normal forms).
Several works have addressed and examined the various proposals for GFDs. In [9] the authors focused on the semantics of various extensions of functional dependencies. In [10] the authors concentrated on the connection between GFD and redundancy elimination. Recently, in [99], a list of different kinds of approaches to functional dependencies in a generalized relational model was presented.
In [23] the authors examined the relationship between some GFDs and their approach (see the discussion after Lemma 1). The authors in [23] used fuzzy logic in a narrow sense and stated that it provides conceptual and methodological foundations, clarity and simplicity. The first two works are now almost twenty years old, and thus do not provide up-to-date information. More importantly, none of these works attempts to unify the various approaches or to compare them objectively.
Footnote 3: Codd’s original model was based on two-valued logic, although Codd himself later extended its relational calculus by considering a three-valued logic to manage missing, non-applicable or unknown information via the NULL value [36]. In a further step [37], a four-valued logic was introduced to deal separately with these different types of uncertainty. Nevertheless, these extensions have been subject to criticism (see C. J. Date in [44]). Also, the NULL value is nowadays not considered a good approach to managing uncertainty or missing information [95].
The paper is organized as follows. In Section 2, we give an overview of the approaches we are going to study and explain why we do not include association rules in our comparison. In Section 3, the main results concerning the relational model and fuzzy logic based on residuated lattices are recalled. Section 4 contains an exhaustive presentation of the similarity-based approaches to FD which appear in the literature. Each approach is described with attention to the structure of truth degrees and to the extension of the data table (if such an extension is presented). In Section 5, we introduce a widely accepted definition of fuzzy function as a basis for the development of our comparative criterion, which we then use for comparing the influential generalizations of functional dependencies introduced in Section 4. In Section 6, we summarize the results concerning different generalizations of FD and present some conclusions.
2 Overview of generalizations of the relational model
We begin this section with an illustrative example which motivates the similarity-based generalizations of functional dependency and also establishes the conceptual differences from association rules. Consider Table 1, which stores hotel characteristics provided by users of a web site for trip evaluations. Each row of the table corresponds to an evaluation given by a specific user. The columns store the following characteristics: Hotel Name, City, Year (the year in which the hotel was built or last refurbished), Price, Service, Sleep Quality, Cleanliness, Location, and Type of trip (family, friends, couple, business or solo).
Tuple  User         Hotel Name          City        Year  Price  Service  Sleep Quality  Cleanliness  Location  Type of trip
r1     Allan S. M.  Royal Hotel         New York    2005  150$   ••••◦    ••••◦          •••••        ••••◦     Family
r2     Mr. Smith.   Royal Hotel         New York    2005  180$   ••••◦    ••••◦          ••••◦        ••◦◦◦     Business
r3     Jean Y.      Royal Hotel         New York    2005  150$   ••••◦    •••••          •••••        •••◦◦     Family
r4     Norma B.     Royal Hotel         New York    2005  160$   ••••◦    ••••◦          •••••        ••••◦     Friends
r5     Luke L.      Royal Hotel         New York    2005  160$   •◦◦◦◦    ••••◦          •••••        •••◦◦     Couple
r6     Donald P.    Main Street Inn.    Washington  2005  180$   •••••    •••◦◦          ••◦◦◦        ••••◦     Friends
r7     Peter J.K.   Main Street Inn.    Washington  2005  180$   •••••    ••••◦          •••◦◦        ••••◦     Couple
r8     Luke L.      Fortune Hotel       Washington  2004  185$   ••••◦    ••••◦          •••••        ••••◦     Friends
r9     Louis M.     Fortune Hotel       Washington  2004  185$   ••••◦    ••••◦          •••••        •••••     Couple
r10    Luke L.      Stradivarius Hotel  New York    2013  245$   ••••◦    ••••◦          ••◦◦◦        ••••◦     Friends
r11    Martin K.    Stradivarius Hotel  New York    2013  245$   •••◦◦    ••••◦          •◦◦◦◦        ••••◦     Friends
r12    Jean Y.      Stradivarius Hotel  New York    2013  245$   ••••◦    ••••◦          ••◦◦◦        ••◦◦◦     Solo
r13    Todd P.      Stradivarius Hotel  New York    2013  225$   ••••◦    ••••◦          ••◦◦◦        ••••◦     Family
r14    Stephan Y.   Stradivarius Hotel  New York    2013  235$   ••••◦    ••••◦          ••◦◦◦        ••••◦     Couple
r15    Frizt L.     Stradivarius Hotel  New York    2013  255$   •••••    ••••◦          •◦◦◦◦        ••◦◦◦     Business
r16    Jean Y.      B. M. Parade        New York    2011  235$   ••••◦    ••••◦          •••••        •••••     Solo
Table 1. Hotel ranking
Table 1 satisfies, for example, the following (classical) functional dependency: {Hotel Name} ⇒ {City, Year}. But there are other interesting dependencies which cannot be captured by classical functional dependencies. For example, the FD {Hotel Name} ⇒ {Service} is not true, but it is almost true in the sense that it is satisfied by almost all the tuples; there are only a few exceptions, namely the tuples denoted by r5, r11 and r15. This situation can be described by so-called association rules (see footnote 4), introduced by R. Agrawal in [1]. An association rule A ⇒ B is usually evaluated in terms of support and confidence, using a probability and a conditional probability measure, respectively. Functional dependencies with exceptions are usually called approximate dependencies or partial dependencies. There are two basic ways to define exceptions: either as the minimum number of tuples which must be removed from the relation to obtain a new relation satisfying the FD, see [6, 61], or as pairs of tuples that do not satisfy a given FD, see [61]. The second approach was also used, to some extent, in [102], where the authors defined the degree to which an FD is true as the number of pairs of tuples that satisfy the FD divided by the number of all possible pairs of tuples. In [85], approximate dependencies for relational data were defined as association rules in the corresponding binary table. The same technique was used in [5] for relational data in the presence of a similarity relation.
Footnote 4: Although association rules were introduced for binary tables, their definition can also be extended to relational tables.
We do not include association rules and approximate dependencies in our survey, since they address different issues, complementary to those we focus on. Although association rules (or approximate dependencies) reveal interesting patterns in data, their probabilistic approach makes it impossible to reveal another important semantics of data dependencies. In Table 1, there is a high degree of coincidence between each hotel and its service rank. According to the definition of association rules, we may affirm that {Hotel Name} ⇒ {Service} holds in the table (with an appropriately high confidence degree), because if two tuples have equal values on the attribute Hotel Name, there is a high probability that they will have equal values on the attribute Service as well. As already mentioned, the tuple r5 for Royal Hotel and the tuples r11 and r15 for Stradivarius Hotel are the exceptions to this association rule. The validity of the association rule depends on the number of exceptions (and on their definition, of course) but not on how much the unfitting tuples differ from the correct ones (see footnote 5). For example, if the tuple r5 had a rank for Service equal to 5, it would still violate the association rule, although in this case we could see that guests of the same hotel have a similar opinion of the service. We cannot say the same if the value remains equal to 1. Instead of considering approximate dependencies and the probabilistic approach, many authors extend functional dependencies in a different way: they replace the equalities by similarities on the domains of attribute values.
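The two exception-based measures mentioned above can be illustrated on the dependency {Hotel Name} ⇒ {Service} of Table 1. The following is only a minimal sketch: the function names are hypothetical, the Service ranks are encoded as the number of filled dots, and counting unordered pairs of distinct tuples is our assumption, since the exact pair-counting convention of [102] is not restated here.

```python
from collections import Counter, defaultdict
from fractions import Fraction
from itertools import combinations

# (tuple id, Hotel Name, Service rank) copied from Table 1
ROWS = [
    ("r1", "Royal Hotel", 4), ("r2", "Royal Hotel", 4),
    ("r3", "Royal Hotel", 4), ("r4", "Royal Hotel", 4),
    ("r5", "Royal Hotel", 1),
    ("r6", "Main Street Inn.", 5), ("r7", "Main Street Inn.", 5),
    ("r8", "Fortune Hotel", 4), ("r9", "Fortune Hotel", 4),
    ("r10", "Stradivarius Hotel", 4), ("r11", "Stradivarius Hotel", 3),
    ("r12", "Stradivarius Hotel", 4), ("r13", "Stradivarius Hotel", 4),
    ("r14", "Stradivarius Hotel", 4), ("r15", "Stradivarius Hotel", 5),
    ("r16", "B. M. Parade", 4),
]


def min_tuples_to_remove(rows):
    """Minimum number of tuples whose removal makes lhs => rhs hold.

    Within each group of equal left-hand values, keep the most frequent
    right-hand value and drop everything else.
    """
    groups = defaultdict(Counter)
    for _, lhs, rhs in rows:
        groups[lhs][rhs] += 1
    return sum(sum(c.values()) - max(c.values()) for c in groups.values())


def violating_pairs(rows):
    """Unordered pairs of tuples that agree on lhs but differ on rhs."""
    return [(p[0], q[0]) for p, q in combinations(rows, 2)
            if p[1] == q[1] and p[2] != q[2]]


def pair_degree(rows):
    """Satisfying pairs divided by all unordered pairs of distinct tuples."""
    total = len(rows) * (len(rows) - 1) // 2
    return Fraction(total - len(violating_pairs(rows)), total)


print(min_tuples_to_remove(ROWS))   # tuples to remove
print(len(violating_pairs(ROWS)))   # violating pairs
print(pair_degree(ROWS))            # pair-based degree
```

Both measures count exceptions only; neither reflects how far the Service rank of r5 lies from that of the other Royal Hotel tuples, which is the limitation discussed above.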
From the data in Table 1, we observe that all customers who stayed in hotels built (or renovated) in similar years paid very similar prices. We can also see that guests staying in the same hotel have very similar opinions regarding cleanliness. Association rules are not helpful in either case, because price values vary from guest to guest (usually you get a better price if you book and pay in advance) and the same applies to cleanliness, whose ranking varies slightly from customer to customer. Thus it seems natural to consider some similarity relation on each domain and then use the similarities to define similarity-based functional dependencies. Such dependencies (sometimes called fuzzy dependencies) usually try to capture the following dependency: If tuples have similar values on attributes from A, then they have similar values on attributes from B.
The main goal of this paper is to examine generalizations of FDs involving similarity relations. The extension of domains with similarity relations usually does not stand alone but comes together with ranked data tables and various data extensions. Therefore, in order to keep this overview as complete as possible, we include generalizations of the relational model that deal with at least one of the following three issues: similarity on domains, ranked data tables and data extensions.
1. Similarity-based approaches (from equality to similarity): In most of the approaches we consider, domains are additionally equipped with some kind of similarity relation to denote the degrees of similarity between domain values. Thus the equality relation that is implicitly present in Codd’s original model (domain values are either “equal” or “not equal”) is replaced by a binary fuzzy relation that maps every pair of domain values to [0, 1] and is meant to express the similarity (closeness) of domain values [7, 14, 15, 27, 40, 48, 78, 81, 82, 84, 92, 105] and later in [5, 23, 53, 64, 68, 88, 96, 110].
Similarity (also called resemblance or proximity) usually means a reflexive and symmetric relation. Sometimes an additional property, namely transitivity, is required. On the other hand, there are approaches where similarity is defined as (only) a reflexive relation [56]. Although the degree of similarity usually comes from [0, 1], there are extensions considering more general algebraic structures, e.g. a commutative semiring [56] or a residuated lattice [23, 38].
Footnote 5: Recently, this issue was tackled to some extent in [83] by employing a similarity relation, understood in that paper as a bivalent relation (two values are either completely similar or not similar at all).
2. Rank-based approaches (from relation to fuzzy relation): By rank-based approaches we mean extensions of the relational model in which the data table is seen as a fuzzy set of tuples (in the original model, the data table is simply a set of tuples). Thus the data table has an additional column which contains a rank, also called a (membership) grade, score or weight, expressing the degree to which a tuple belongs to the data table. First attempts at rank-based approaches can be found in [4, 60, 81, 98, 111]. Later works include [23, 52, 72, 74, 82, 93, 96, 104]. There are also extensions in which a rank is assigned to every attribute value, e.g. in [38, 70]. The ranks usually take values from [0, 1], but there are approaches in which the unit interval is replaced by some general algebraic structure, e.g. a commutative semiring [55], a De Morgan frame [56], or a residuated lattice [23]. In one of the pioneering works, by Umano [98], the rank itself is a possibility distribution on [0, 1]. The idea that the rank is not a single value appears also in [74], where the rank is a pair of possibility and necessity measures indicating the possibility and the necessity that a tuple satisfies a certain constraint. The meaning of the rank differs among approaches, and it is seen as:
(a) Compatibility with the relation, e.g. in [4] the rank is a “degree to which a tuple t satisfies the relation or is compatible with the relation”. Later, in [10], the rank is understood as the “degree to which a tuple belongs to the relation, which is then supposed to have a fuzzy (or gradual meaning)”. For example, consider the relation Young employee with attributes Name and Age; a tuple then belongs to the data table to the degree to which it satisfies the concept “Young”.
(b) Global confidence level, see [10] for example: “(The weight is a) global confidence level in information stored in the tuple which is a part of a relation representing all or nothing concept”. Consider the relation Likes with attributes Name and Movie; the degree to which a tuple belongs to the relation is then understood as the confidence in the information stored in the tuple, with the relation Likes itself remaining crisp.
(c) Compatibility with the set of individual constraints specified on the relation, see [72, 74].
(d) Degree to which a tuple matches a query, see [23, 52, 82], or the degree to which it is possible that a tuple matches a query, see [16].
The fact that the interpretation of the rank differs among approaches is puzzling, as already pointed out by Dubois and Prade in [48]. Moreover, there are approaches in which the meaning of the rank is not clearly explained, for example in [60].
3. Data extensions (from precise to imprecise values): The third aspect involved in the various generalizations of the relational model is data extension, i.e. the switch from precise to imprecise values. There are several approaches in which the authors try to incorporate more complex data: an attribute value is considered to be a set of (possible) values in [15, 27, 90, 101, 105], a fuzzy set (including linguistic terms) or a possibility distribution in [7, 11, 27, 42, 48, 63, 65, 68, 74, 79, 81, 98], a vague set in [110], or an interval-valued possibility distribution in [73]. When a (fuzzy) set is employed as an attribute value, it is important to know its interpretation. A (fuzzy) set can be interpreted in two different ways, either as a conjunctive set (also called an ontic set) or as a disjunctive set (also called an epistemic set). The same set can be interpreted in both ways; consider, for example, the crisp set of Jane’s pets: {dog, cat, parrot}. From the ontic point of view, the set represents the fact that Jane has three pets at home. By contrast, from the epistemic point of view, the set represents the fact that Jane has one pet at home: a dog or a cat or a parrot, where the “or” is exclusive. In this case, the set is understood as a possibility distribution (all three values are equally possible) and represents uncertainty in our knowledge. These two kinds of interpretation pertain to fuzzy sets as well. On the one hand, fuzzy sets can represent gradual entities or linguistic variables (conjunctive meaning); on the other hand, fuzzy sets can be used as possibility distributions [108] (disjunctive meaning). This distinction was made by Zadeh in [109] and later discussed, for example, in [49].
When the attribute value is considered to be a set or a fuzzy set, it is crucial to know whether the meaning is conjunctive or disjunctive. Unfortunately, the interpretation is not always clear and does not usually affect the semantics of GFD. Nevertheless, in Codd’s relational model there are no limitations on what can and cannot be an attribute value. C. J. Date, one of the leading experts on relational databases, pointed out [45]: “... the domains over which relations are defined can be of arbitrary complexity. As a consequence, we can have attributes of relations (or columns of tables, if you prefer) that contain geometric points, or polygons, or X rays, or XML documents, or fingerprints, or arrays, or lists, or relations, or any other kinds of values you can think of. But this idea too was always part of the relational model! The idea that the relational model could handle only rather simple kinds of data (like numbers and strings and dates and times) is a huge misconception, and always was. ...”. We therefore argue that approaches considering only more complex data (without any further extensions of the model) should not be seen as genuine extensions of the original relational model, even when they consider fuzzy sets as attribute values.
We shall examine these generalizations of Codd’s model. At this point, we want to note explicitly that we do not consider any other extensions; namely, we are not going to deal with extensions of XML databases, object-oriented databases and nested databases. Also, as mentioned and explained above, we do not include association rules and probabilistic databases. The reason is that probabilistic approaches capture a different kind of uncertainty and are formally incomparable to the approaches studied in this paper (see footnote 6).
3 Methodological issues
In this section, we present the main notions needed in the sequel; namely, we introduce the relational model of data, functional dependencies and some preliminaries from fuzzy logic. Many of the extensions of the relational model involve fuzzy logic in the broad sense and use one specific implication (e.g. Gödel) in the semantics of GFD. We use the theory of fuzzy sets and fuzzy logic based on residuated lattices to unify the variety of definitions of GFD. This will not be possible for all of them, of course, but we will use fuzzy logic based on residuated lattices whenever possible.
Residuated lattices: A complete residuated lattice with a truth-stressing hedge (shortly, a hedge) [50, 58] is an algebra 𝐋 = ⟨L, ∧, ∨, ⊗, →, *, 0, 1⟩ such that ⟨L, ∧, ∨, 0, 1⟩ is a complete lattice with 0 and 1 being the least and the greatest element of L, respectively; ⟨L, ⊗, 1⟩ is a commutative monoid (i.e., ⊗ is commutative, associative, and a ⊗ 1 = 1 ⊗ a = a for each a ∈ L); ⊗ and → satisfy the so-called adjointness property: a ⊗ b ≤ c iff a ≤ b → c for each a, b, c ∈ L, where ≤ is the order induced by the lattice structure of 𝐋 (i.e., a ≤ b iff a = a ∧ b); and the hedge * satisfies
\begin{align}
1^* &= 1, \tag{1}\\
a^* &\le a, \tag{2}\\
(a \to b)^* &\le a^* \to b^*, \tag{3}\\
a^{**} &= a^*, \tag{4}
\end{align}
for each a, b ∈ L. Elements a of L are interpreted as truth degrees. The operations ⊗ and → are (truth functions of) “fuzzy conjunction” and “fuzzy implication” and are called multiplication and residuum, respectively. For every ⊗ there is at most one → satisfying adjointness, and similarly → uniquely determines ⊗. The hedge * is a (truth function of the) logical connective “very true”, see [57, 58]. Properties (1)–(4) have natural interpretations, e.g., (2) can be read: “if a is very true, then a is true”, (3) can be read: “if a → b is very true and if a is very true, then b is very true”, etc. Two boundary cases of (truth-stressing) hedges are (i) identity, i.e., a* = a (a ∈ L); (ii) globalization [94]:
\[
a^* = \begin{cases} 1, & \text{if } a = 1,\\ 0, & \text{otherwise.} \end{cases} \tag{5}
\]
If * is a globalization, then (a → b)* = 1 iff a → b = 1 iff a ≤ b.
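The chain of equivalences for the globalization can be spelled out step by step. The following short derivation is a sketch that uses only the definition (5), the adjointness property and the monoid identity 1 ⊗ a = a stated above:

```latex
\begin{align*}
(a \to b)^* = 1
  &\iff a \to b = 1      && \text{(definition (5) of globalization)}\\
  &\iff 1 \le a \to b    && \text{($1$ is the greatest element of $L$)}\\
  &\iff 1 \otimes a \le b && \text{(adjointness)}\\
  &\iff a \le b          && \text{(since $1 \otimes a = a$).}
\end{align*}
```

In particular, under globalization the degree (a → b)* is always 0 or 1, so a statement evaluated through this hedge behaves bivalently.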
Footnote 6: One basic difference between probability and fuzzy logic is that fuzzy logic is understood as truth functional: the truth value of a compound formula is uniquely determined by the truth values of the formulas appearing in it and by the definition of the truth functions of the connectives.
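Each residuated lattice satisfies:
\begin{align}
a \to 1 &= 1, \tag{6}\\
0 \to a &= 1, \tag{7}\\
1 \to a &= a, \tag{8}\\
a \otimes b &\le b, \tag{9}\\
a &\le b \to a. \tag{10}
\end{align}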
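The following are further properties of residuated lattices that will be used in this paper. For each index set I:
\begin{align}
a \otimes \bigvee_{i\in I} b_i &= \bigvee_{i\in I} (a \otimes b_i), \tag{11}\\
a \to \bigwedge_{i\in I} b_i &= \bigwedge_{i\in I} (a \to b_i), \tag{12}\\
\bigvee_{i\in I} a_i \to b &= \bigwedge_{i\in I} (a_i \to b), \tag{13}\\
\bigwedge_{i\in I} (a_i \to b_i) &\le \bigwedge_{i\in I} a_i \to \bigwedge_{i\in I} b_i, \tag{14}\\
\bigwedge_{i\in I} a_i \otimes \bigwedge_{i\in I} b_i &\le \bigwedge_{i\in I} (a_i \otimes b_i). \tag{15}
\end{align}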
The unit interval: Complete residuated lattices with hedges include structures defined on the real unit interval, i.e., structures 𝐋 where L = [0, 1], ∧ and ∨ are minimum and maximum, respectively, and ⊗ is a left-continuous triangular norm (shortly, a t-norm) with the corresponding →. More precisely, the structure ⟨[0, 1], min, max, ⊗, →, 0, 1⟩ is a complete residuated lattice iff ⊗ is a left-continuous t-norm and a → b = max{c | a ⊗ c ≤ b}, see [57]. All complete residuated lattices on the real unit interval with continuous ⊗ can be constructed by means of ordinal sums [35] from the following three pairs of adjoint operations:
\[
\begin{array}{lll}
\text{Łukasiewicz:} & a \otimes b = \max(a + b - 1, 0), & a \to b = \min(1 - a + b, 1);\\[4pt]
\text{Gödel:} & a \otimes b = \min(a, b), & a \to b = \begin{cases} b, & \text{if } a > b,\\ 1, & \text{otherwise;} \end{cases}\\[4pt]
\text{Goguen:} & a \otimes b = a \cdot b, & a \to b = \begin{cases} b/a, & \text{if } a > b,\\ 1, & \text{otherwise.} \end{cases}
\end{array}
\]
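The three pairs of adjoint operations can be written down directly and spot-checked numerically. The sketch below is illustrative only: the function names and the sample grid 0, 1/10, ..., 1 are our assumptions, and checking finitely many points does not prove adjointness or properties (6)–(10), it merely fails to find a counterexample. Exact rational arithmetic is used to avoid floating-point artifacts.

```python
from fractions import Fraction
from itertools import product

ZERO, ONE = Fraction(0), Fraction(1)


def luk_tnorm(a, b):
    return max(a + b - 1, ZERO)


def luk_impl(a, b):
    return min(1 - a + b, ONE)


def godel_tnorm(a, b):
    return min(a, b)


def godel_impl(a, b):
    return b if a > b else ONE


def goguen_tnorm(a, b):
    return a * b


def goguen_impl(a, b):
    # a > b >= 0 guarantees a != 0, so the division is safe
    return b / a if a > b else ONE


ALGEBRAS = {
    "Lukasiewicz": (luk_tnorm, luk_impl),
    "Godel": (godel_tnorm, godel_impl),
    "Goguen": (goguen_tnorm, goguen_impl),
}

# Sample points of [0, 1]; an arbitrary choice made for this sketch
GRID = [Fraction(k, 10) for k in range(11)]


def adjointness_holds(tnorm, impl, grid=GRID):
    """Check a (x) b <= c  iff  a <= b -> c on all sample triples."""
    return all((tnorm(a, b) <= c) == (a <= impl(b, c))
               for a, b, c in product(grid, repeat=3))


def basic_properties_hold(tnorm, impl, grid=GRID):
    """Check properties (6)-(10) on all sample pairs."""
    return all(
        impl(a, ONE) == ONE          # (6)
        and impl(ZERO, a) == ONE     # (7)
        and impl(ONE, a) == a        # (8)
        and tnorm(a, b) <= b         # (9)
        and a <= impl(b, a)          # (10)
        for a, b in product(grid, repeat=2)
    )


for name, (tnorm, impl) in ALGEBRAS.items():
    print(name, adjointness_holds(tnorm, impl), basic_properties_hold(tnorm, impl))
```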
Complete residuated lattices ⟨[0, 1], min, max, ⊗, →, 0, 1⟩ with universe [0, 1] and with Łukasiewicz, Gödel or Goguen operations will be called the standard Łukasiewicz, Gödel and Goguen algebra, respectively, and will be denoted by [0, 1]_Ł, [0, 1]_G and [0, 1]_Π. Sometimes we will write →_Ł, →_G, →_Π in order to emphasize the Łukasiewicz, Gödel and Goguen implication, respectively.
Discretization of the unit interval: If there is a need to iterate over all degrees in L and perform particular operations for each of them (e.g., in algorithms which rely on this type of iteration), it is sometimes necessary to restrict our considerations to finite scales of degrees in order to obtain results in finitely many elementary steps. In these cases, one can take, e.g., a finite L = {a_0 = 0, a_1, …, a_n = 1} ⊆ [0, 1] (a_0 < ⋯ < a_n) with ⊗ given by a_k ⊗ a_l = a_{max(k+l−n, 0)} and the corresponding → given by a_k → a_l = a_{min(n−k+l, n)}. Such an 𝐋 is called a finite Łukasiewicz chain. Another possibility is a finite Gödel chain, which consists of L and the restrictions of the Gödel operations on [0, 1] to L (or, in general, a finite structure which results as an ordinal sum of finitely many finite Łukasiewicz and Gödel chains, etc.). A special case of a complete residuated lattice with hedge is the two-element Boolean algebra ⟨{0, 1}, ∧, ∨, ⊗, →, *, 0, 1⟩, denoted by 𝟐, which is the structure of truth degrees of classical logic. That is, the operations ∧, ∨, ⊗, → of 𝟐 are the truth functions (interpretations) of the corresponding logical connectives of classical logic, and 0* = 0, 1* = 1. The only t-norm on {0, 1} is the classical conjunction, and the corresponding implication is the classical implication.
Structure of L-sets: Having 𝐋, we use the following notions: an L-set (fuzzy set) A in a universe U is a mapping A : U → L, with A(u) interpreted as “the degree to which u belongs to A”. Let L^U denote the collection of all L-sets in U. We will use the following notation for L-sets: if U = {u_1, …, u_n}, then an L-set A in U can be denoted by A = {a_1/u_1, …, a_n/u_n}, meaning that A(u_i) equals a_i for each i = 1, …, n. For brevity, we introduce the following convention: we write {…, u, …} instead of {…, 1/u, …}, and we also omit elements of U with zero membership degree. For example, we write {u, 0.5/v} instead of {1/u, 0.5/v, 0/w}, etc. An L-set A in U is called empty if A(u) = 0 for all u ∈ U. By a slight abuse of notation, we denote the empty L-set in U by ∅. The operations with L-sets are defined component-wise; for A, B ∈ L^U we have: (A ∪ B)(u) = A(u) ∨ B(u), (A ∩ B)(u) = A(u) ∧ B(u), (A ⊗ B)(u) = A(u) ⊗ B(u), (A → B)(u) = A(u) → B(u), A*(u) = (A(u))*, see [19]. Other set operations, such as the set difference or the complement, will not be used in this paper and are therefore not included here; for their definitions see [19] or [59].
Possibility and necessity: Fuzzy sets can also be considered with a disjunctive semantics. This corresponds to situations of incomplete information, pervaded with imprecision and uncertainty. When a fuzzy set is used to represent what is known about the value of a single-valued variable, the degree of a specific value represents the level of possibility that this value is indeed the value of the variable. This semantics was introduced by Zadeh in [108]. It constitutes an alternative to probability theory: whereas probability theory uses a single measure, the probability, to describe how likely an event is to occur, possibility theory uses two concepts, the possibility and the necessity of the event. The relation between necessity and possibility resembles the one existing in modal logic, i.e. ν(A) = ¬π(¬A), where π : L^U → L^U and ν : L^U → L^U denote the possibility and the necessity, respectively, and ¬ : L → L is a negation in the lattice L.
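The finite Łukasiewicz chain and the component-wise operations on L-sets described above can be sketched as follows. This is only an illustration under stated assumptions: degrees are stored as indices k of a_k, the scale size n = 4 and the equidistant reading a_k = k/n are our choices, and an L-set is represented by a dictionary that omits elements of zero degree, mirroring the notational convention above. All function names are hypothetical.

```python
from fractions import Fraction

N = 4  # assumed scale: a_0 = 0, a_1, ..., a_4 = 1


def chain_tnorm(k, l, n=N):
    """a_k (x) a_l = a_max(k + l - n, 0), computed on indices."""
    return max(k + l - n, 0)


def chain_impl(k, l, n=N):
    """a_k -> a_l = a_min(n - k + l, n), computed on indices."""
    return min(n - k + l, n)


def degree(k, n=N):
    """Equidistant reading a_k = k/n (an assumption of this sketch)."""
    return Fraction(k, n)


def lift(op):
    """Lift a binary operation on degrees to L-sets, component-wise."""
    def lifted(A, B, universe):
        return {u: op(A.get(u, 0), B.get(u, 0)) for u in universe}
    return lifted


union = lift(max)             # (A u B)(u) = A(u) v B(u)
intersection = lift(min)      # (A n B)(u) = A(u) ^ B(u)
multiply = lift(chain_tnorm)  # (A (x) B)(u) = A(u) (x) B(u)
residuum = lift(chain_impl)   # (A -> B)(u) = A(u) -> B(u)

U = ["u", "v", "w"]
A = {"u": 4, "v": 2}  # the L-set {u, 0.5/v}
B = {"v": 3, "w": 1}  # the L-set {0.75/v, 0.25/w}

print(union(A, B, U))
print(multiply(A, B, U))
print(residuum(A, B, U))
```

With the equidistant reading, the index formulas coincide with the Łukasiewicz operations restricted to {0, 1/n, ..., 1}, which is why integer arithmetic on indices suffices.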
L-relations and similarities: Given a finite family of classical sets $X_i$, where $i \in \{1, \dots, n\}$, an n-ary L-relation is an L-set $A \in L^X$ with $X = X_1 \times \dots \times X_n$. More specifically, we are interested in internal binary relations, i.e. elements of $L^{X \times X}$. A given relation $\approx : X \times X \to L$ is called a similarity if it is reflexive and symmetric, that is, for all x, y ∈ X:
[Ref] $(x \approx x) = 1$,
[Sym] $(x \approx y) = (y \approx x)$.
Sometimes other properties are required, e.g., transitivity ([Tra]) and separability ([Sep]):
[Tra] $(x \approx y) \otimes (y \approx z) \le (x \approx z)$ for all x, y, z ∈ X,
[Sep] $(x \approx y) = 1$ iff $x = y$.
A similarity relation which is also ⊗-transitive is called an L-equivalence, or fuzzy equivalence, and will be denoted by ≡. If separability holds, it is called an L-equality. We would like to remark that there is no consensus on the terminology used to name these relations. Thus, some authors use the term “similarity relation” to refer to L-equivalences.

The relational model: We will now present basic notions from the relational model. For further details see [69]. Let Y denote a set of attribute names $y_1, y_2, \dots$ and let $\{D_y \mid y \in Y\}$ be a set of domains such that each domain is an arbitrary non-empty set. A relation scheme is a finite subset R ⊆ Y and, for each relation scheme R, Tupl(R) denotes $\prod_{y \in R} D_y$, i.e. the Cartesian product of the domains $D_y$ (y ∈ R). A data table D on R is any finite subset of Tupl(R). Recall that the Cartesian product is the set of all maps $r : R \to \bigcup_{y \in R} D_y$ such that $r(y) \in D_y$ holds for all y ∈ R. Each r ∈ Tupl(R) is called a tuple over R, and r(y) is called the y-value of r. Moreover, for each A ⊆ R, the restriction of r to the subset A is denoted by r(A), and therefore $r(A) : A \to \bigcup_{y \in A} D_y$, because r satisfies $r(y) \in D_y$ for all y ∈ R. If D is a data table on a relation scheme R, i.e. D ⊆ Tupl(R), and A is a subset of R, then $D_A$ denotes the projection of the data table D to the attribute set A. That is,
\[ D_A = \{ r(A) \mid r \in D \}. \tag{16} \]
Assume A, B are sets of attributes, i.e. A, B ⊆ R. Then we say that A determines B (or B is functionally dependent on A) if whenever two tuples of D agree on the attributes from A, they also agree on the attributes from B. We write A ⇒ B and call such a statement a functional dependency (FD). Formally, the FD is satisfied by relation D iff
\[ \forall r_1, r_2 \in D:\ \text{if } r_1(A) = r_2(A), \text{ then } r_1(B) = r_2(B). \tag{17} \]
We will denote by $\|A \Rightarrow B\|_D$ the degree to which the FD (and, later, the generalized FD) A ⇒ B holds in a relation D; obviously, from (17) we have $\|A \Rightarrow B\|_D \in \{0, 1\}$. Note that D satisfies the functional dependency A ⇒ B if and only if the relation $D_{A \cup B}$ is a (partial) function from $D_A$ to $D_B$.
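As a small illustration of condition (17) and of the projection (16), the following Python sketch checks a classical FD on a toy data table. The table contents, the attribute names and the helper names (`restrict`, `project`, `fd_degree`) are hypothetical and serve only as an example.

```python
# Illustrative sketch; table contents and function names are made up.

def restrict(r, attrs):
    """r(A): restriction of tuple r (a dict) to the attribute set A."""
    return tuple(r[y] for y in sorted(attrs))

def project(D, attrs):
    """D_A: projection of the data table D to the attribute set A."""
    return {restrict(r, attrs) for r in D}

def fd_degree(D, A, B):
    """Return 1 if A => B holds in D in the sense of (17), and 0 otherwise."""
    for r1 in D:
        for r2 in D:
            if restrict(r1, A) == restrict(r2, A) and restrict(r1, B) != restrict(r2, B):
                return 0
    return 1

D = [
    {"name": "Ann", "dept": "R&D", "floor": 2},
    {"name": "Bob", "dept": "R&D", "floor": 2},
    {"name": "Eve", "dept": "HR", "floor": 1},
]

dept_to_floor = fd_degree(D, {"dept"}, {"floor"})   # tuples agreeing on dept agree on floor
floor_to_name = fd_degree(D, {"floor"}, {"name"})   # Ann and Bob share a floor but differ in name
```

The remark about partial functions can be checked in the same setting: A ⇒ B holds exactly when the projections to A and to A ∪ B have the same number of elements.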
4 Overview of similarity-based approaches

In this section we present an exhaustive set of approaches to similarity-based functional dependencies. To ease the reading of this section, we begin with an enumeration of some issues that need to be resolved once equality is replaced by a similarity relation.

4.1 Once we start with similarity
The definition of the classical functional dependency introduced in Equation (17) can be rewritten as follows:
\[ \|A \Rightarrow B\|_D = \min\{ (r_1(A) = r_2(A)) \to (r_1(B) = r_2(B)) \mid r_1, r_2 \in D \}. \tag{18} \]
In the above equation, the equality is understood as bivalent (two values from a domain are either equal or not) and the implication is the classical one (taking values from the set {0, 1}). Again, the underlying logic of Codd’s relational model is classical predicate logic. Although the following is obvious, we want to mention explicitly that in the classical relational model the truth value of an FD, the degree to which an FD follows from a set of FDs, and the degree to which a tuple matches a query all come from the same set, {0, 1}. We will list generalizations of FDs which replace equality by similarity in Equation (18). As a consequence, the set {0, 1} is replaced by some more general poset (usually [0, 1]) for expressing the similarity of two domain values. Once there is a similarity relation on each domain, many questions arise:
– [AtrSim] In the case of more complex data (e.g. the attribute value is a set of possible values, or a fuzzy set): how should the similarity of attribute values be defined? When the attribute value is considered to be a set or a fuzzy set, the point is the consistency between the character of the (fuzzy) set (disjunctive or conjunctive) and the way in which a domain similarity is extended to such (fuzzy) sets.
– [TuplSim] How should the similarity of tuples be defined based on the similarity of the corresponding attribute values?
– [Imp] What implication should be used? Note that now r1(A) ≈ r2(A) and r1(B) ≈ r2(B) are degrees from the previously chosen set.
– [TrGFD] Should the notion of GFD remain crisp, meaning that the GFD is either satisfied or not? Or should the concept become many-valued, meaning that we allow the GFD to be satisfied to a certain degree (between the two borderline cases: not satisfied, satisfied)?
– [Rank] Having similarities on domains, how should similarity-based queries be evaluated? Should some degree to which data match a similarity-based condition appear?
In [52] Fagin said about querying in multimedia database systems that: “. . . it is convenient to introduce “graded” (or “fuzzy”) sets, in which scores are assigned to objects, depending on how well they satisfy atomic queries”. We believe this applies not only to multimedia databases but to all databases where similarities are involved. We would like the reader to keep these questions in mind when going through the survey; the comparison based on the answers to these questions can be found in Section 6.

4.2 Survey of similarity-based functional dependencies
We will now go through various attempts at generalizing FDs that use similarity measures instead of equality, and reformulate them using a residuated lattice as the structure of truth degrees. We will not focus on how the various similarity measures were defined (the [AtrSim] problem), although some important and widely cited approaches are mentioned, but mainly on how the similarity is used in the generalization of FDs. If not otherwise stated, we assume a relation scheme $R = \{y_1, \dots, y_n\}$ and A, B ⊆ R. A similarity or equivalence relation on the domain $D_i$ of attribute $y_i$ will be denoted by $\approx_i$ or $\equiv_i$, respectively. In most of the approaches that use the unit interval as the set of truth degrees, the [TuplSim] issue is solved by taking the minimum of the similarities of the corresponding attribute values. If not otherwise stated, we assume
\[ r_1(A) \approx_A r_2(A) = \min_{y_i \in A} \bigl( r_1(y_i) \approx_i r_2(y_i) \bigr). \tag{19} \]
If there is no confusion, we will write $r_1(A) \approx r_2(A)$ instead of $r_1(A) \approx_A r_2(A)$.

Buckles and Petry (1982): One of the pioneering works was done by Buckles and Petry, see [15]. The authors introduced a model, later referred to as the Buckles-Petry model, where domains are equipped with fuzzy equivalence relations (called “similarity” in the original work) and tuple values are allowed to be (ordinary) non-empty subsets of the domain. That is,
\[ D \subseteq \prod_{y \in R} \bar{2}^{D_y}, \quad \text{where } \bar{2}^{D_y} \text{ denotes } 2^{D_y} \setminus \{\emptyset\}. \tag{20} \]
The authors called such a relation D a fuzzy relation, although D is an ordinary relation, i.e. an ordinary subset (not a fuzzy subset) of some cross product. Each domain is equipped with a fuzzy equivalence, i.e. a reflexive, symmetric and transitive relation which maps every pair of domain values to [0, 1]. The transitivity was given by two different inequalities:
\[ T_1:\quad u \equiv_i w \ \ge\ \max_{v \in D_i} \{ \min\{ u \equiv_i v,\ v \equiv_i w \} \}, \tag{21} \]
\[ T_2:\quad u \equiv_i w \ \ge\ \max_{v \in D_i} \{ (u \equiv_i v) * (v \equiv_i w) \}, \tag{22} \]
where $*$ is arithmetic multiplication. The correspondence with [Tra] from Section 3 is obvious when one considers the Gödel and Goguen t-norms. As far as we know, it was the first time some kind of similarity relation was used in database design. In the earlier work [15], the authors considered domains which may consist of a finite (or infinite) set of scalars or a finite (or infinite) set of numbers with an appropriate similarity relation. Later [16], the authors began to propound the idea that domains may also consist of linguistic values or fuzzy numbers. In the earlier work [15], the interpretation (conjunctive or disjunctive) of the set was not specified. It seems that both interpretations are possible; compare the examples on pages 215 and 223 in [15]. Later, in [17], the authors distinguished between their model (as an example of a uniform data model) and possibilistic data models and, therefore, we conclude that the meaning of a set of values is understood as conjunctive.

Remark 1. In [16] the authors presented part of a relational algebra for their generalized model and introduced the concept of a rank: for every tuple r ∈ D, a query Q can induce a membership degree, denoted in the original work by $\mu_Q(r)$, which represents the “possibility of matching the query specifications”. But this means that after executing a query and obtaining a ranked data table, we actually leave the model, because according to (20) a data table is an ordinary set. Put another way: the result of a query may not be a valid data table.

Generalized functional dependencies (called fuzzy FDs) were defined in [18]: the GFD $A \Rightarrow_\beta B$, $0 < \beta \le 1$, holds in the Buckles-Petry model iff for every pair of tuples $r_i = (d_{i1}, \dots, d_{in})$, $r_j = (d_{j1}, \dots, d_{jn})$
\[ \min_{a_k \in A} \Bigl\{ \min_{u \in d_{ik},\, v \in d_{jk}} u \equiv_k v \Bigr\} \ \le\ \beta * \min_{b_r \in B} \Bigl\{ \min_{u \in d_{ir},\, v \in d_{jr}} u \equiv_r v \Bigr\}, \tag{23} \]
where $d_{ik}$ is the value of attribute $y_k$ for tuple $r_i$ and $*$ is the arithmetic product. In the above definition, β is a parameter which influences the validity of the generalized functional dependency. Observe that if β is close to 0, the GFD will hardly be fulfilled in any table. Moreover, if some relation D satisfies the classical functional dependency A ⇒ B, then it will satisfy the dependency given by Equation (23) if and only if β = 1 and all values are singletons, see Example 1. Later, this definition was modified and reformulated using the so-called conformance [105] and by moving the β parameter from the right-hand side to the left-hand side of (23). Thanks to this, the classical FD can be captured by the GFD even for β ≠ 1, but still under the assumption that all attribute values are singletons. The conformance of attribute $y_k \in R$ for tuples $r_1, r_2 \in D$, denoted by $C(y_k[r_1, r_2])$, is given by the following formula:
\[ C(y_k[r_1, r_2]) = \min_{u, v \in d_{1k} \cup d_{2k}} u \equiv_k v. \tag{24} \]
Note that conformance is not necessarily reflexive, which leads to odd behavior, as demonstrated in Example 1. Moreover, for A ⊆ R: $C(A[r_1, r_2]) = \min_{y_k \in A} C(y_k[r_1, r_2])$. The GFD $A \Rightarrow_\beta B$ is satisfied in the Buckles-Petry model if and only if for every pair of tuples $r_1, r_2$:
\[ \beta * C(A[r_1, r_2]) \le C(B[r_1, r_2]), \tag{25} \]
where β ∈ [0, 1] is called the linguistic strength and is optional. The default value of β is 1.

Example 1. Both (23) and (25) behave unnaturally. Assume $R = \{y_1, y_2, y_3\}$ with $D_1 = D_2 = D_3 = \{a, b, c, d\}$ and the relation D from Table 2.

      y1      y2      y3
r1    {a, b}  {c, d}  {c}
r2    {a, b}  {c, d}  {d}

Table 2. Relation D.
First, note that the table can be seen as a classical relation in Codd’s relational model over the domains $D_1' = 2^{D_1}$, $D_2' = 2^{D_2}$, $D_3' = 2^{D_3}$, and it is in first normal form, since it is a “direct and faithful representation of some relation”, see [46]. We can see that the classical FD $\{y_1\} \Rightarrow \{y_2\}$ is satisfied in D. One would now expect that the GFD $\{y_1\} \Rightarrow \{y_2\}$ holds in D for β = 1 as well. But this is not the case. If $a \equiv_1 b > c \equiv_2 d$, then according to (23) the GFD does not hold for β = 1. If $a \equiv_1 b$ is much greater than $c \equiv_2 d$, then β must be close to 0 in order to make $\{y_1\} \Rightarrow \{y_2\}$ valid. The same remark holds when one takes (25) instead of (23). This problem was solved in [106] by proposing a new definition of conformance which is reflexive:
\[ C(y_k[r_1, r_2]) = \min\Bigl\{ \min_{u \in d_{1k}} \max_{v \in d_{2k}} (u \equiv_k v),\ \min_{u \in d_{2k}} \max_{v \in d_{1k}} (u \equiv_k v) \Bigr\}. \tag{26} \]
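The behavior described in Example 1 can be reproduced with a short Python sketch. The concrete degrees of the fuzzy equivalence (0.8 for a and b, 0.3 for c and d, 0 otherwise), the use of one relation for all three domains, and all function names are assumptions chosen for illustration; only the two conformance formulas (24) and (26) and condition (25) are taken from the text.

```python
from itertools import product

# Hypothetical fuzzy equivalence on {a, b, c, d}; unlisted pairs get degree 0.
DEGREES = {frozenset("ab"): 0.8, frozenset("cd"): 0.3}

def equiv(u, v):
    if u == v:
        return 1.0
    return DEGREES.get(frozenset((u, v)), 0.0)

def conformance_24(d1, d2):
    """Conformance (24): minimum over all pairs from the union of both values."""
    both = d1 | d2
    return min(equiv(u, v) for u, v in product(both, repeat=2))

def conformance_26(d1, d2):
    """Conformance (26): each element must have a close partner on the other side."""
    left = min(max(equiv(u, v) for v in d2) for u in d1)
    right = min(max(equiv(u, v) for v in d1) for u in d2)
    return min(left, right)

def gfd_holds(table, A, B, conf, beta=1.0):
    """Condition (25): beta * C(A[r1, r2]) <= C(B[r1, r2]) for all pairs of tuples."""
    for r1, r2 in product(table, repeat=2):
        c_a = min(conf(r1[y], r2[y]) for y in A)
        c_b = min(conf(r1[y], r2[y]) for y in B)
        if beta * c_a > c_b:
            return False
    return True

# Relation D from Table 2.
table = [
    {"y1": {"a", "b"}, "y2": {"c", "d"}, "y3": {"c"}},
    {"y1": {"a", "b"}, "y2": {"c", "d"}, "y3": {"d"}},
]

with_24 = gfd_holds(table, ["y1"], ["y2"], conformance_24)
with_26 = gfd_holds(table, ["y1"], ["y2"], conformance_26)
```

Under these assumed degrees, the value {a, b} is not fully conformant with itself under (24), which is why the dependency fails for β = 1, whereas (26) assigns degree 1 to identical values.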
Another way to look at the above-mentioned problem with the non-reflexivity of the similarity relation is the following: the authors provided a conjunctive interpretation of the sets. However, the extension of similarity in Equations (23) and (24) does not correspond to such an interpretation (the similarity relation is not reflexive). On the other hand, the above definition (Equation (26)) establishes an extension of similarity that fits the conjunctive interpretation. This last definition of conformance (26) yields a reflexive and symmetric measure, and therefore we can employ our notation for similarity relations. Let us denote $C(A[r_1, r_2])$ by $r_1(A) \approx r_2(A)$. Since the $*$ from (25) is the arithmetic product and the similarity takes values from [0, 1], the authors actually use the standard product algebra $[0,1]_\Pi$ as the structure of truth degrees (i.e. ⊗ and → are the Goguen adjoint operations). Equation (25) can be formulated as follows:
\[ \beta \otimes (r_1(A) \approx r_2(A)) \le (r_1(B) \approx r_2(B)), \tag{27} \]
which is also equivalent to:
\[ \beta \le (r_1(A) \approx r_2(A)) \to (r_1(B) \approx r_2(B)). \tag{28} \]
As a consequence, the definition of GFD can be reformulated as follows:
\[ \|A \Rightarrow B\|_D = \bigwedge_{r_1, r_2 \in D} \Bigl( \bigl(\beta \otimes (r_1(A) \approx r_2(A))\bigr) \to (r_1(B) \approx r_2(B)) \Bigr)^{*}, \tag{29} \]
where ${}^*$ is the globalization.
Remark 2. A different notion of GFD was proposed for the Buckles-Petry model in [3]. Under the notion of conformance given by (26), and using our notation $r_1(A) \approx r_2(A)$ for $C(A[r_1, r_2])$, the GFD A ⇒ B is true if for all pairs of tuples $\min\{\beta, (r_1(A) \approx r_2(A))\} \le (r_1(B) \approx r_2(B))$. Using the Gödel structure of truth degrees, the previous definition is equivalent to $\beta \otimes (r_1(A) \approx r_2(A)) \le (r_1(B) \approx r_2(B))$. Note the similarity with Equations (27) and (41). In [3] the authors also proposed normal forms for their definition of GFD.

Shenoi et al. [89, 90] claimed to extend the Buckles and Petry model by considering a partition on each domain (a classical equivalence relation) instead of a fuzzy equivalence relation (see footnote 7). The notion of data table remains the same as in Equation (20). A tuple $r_i$ is said to be valid with respect to the partition $P_i$ on $D_i$ iff each attribute value $r_i(y_j)$ is a subset of some equivalence class in $P_i$. The partition is linked to the active domain, meaning that there is actually a set of partitions on each domain. Each partition is determined by a level of precision, denoted by $\alpha_i$, which means that elements in the same equivalence class are “in relation to each other to a degree no lower than $\alpha_i$”. To understand the definition of GFD, we need one more concept: redundancy at a level of partition. Two tuples $r_1, r_2$ are called redundant at level $\alpha = (\alpha_y)_{y \in R}$, denoted by $r_1 \sim_\alpha r_2$, iff for every y ∈ R, $r_1(y)$ and $r_2(y)$ are subsets of the same equivalence class in the partition $P_y(\alpha_y)$. A functional dependency A ⇒ B holds in D with respect to partition levels $\alpha = (\alpha_y)_{y \in A}$ and $\beta = (\beta_y)_{y \in B}$ if every pair of tuples $r_1, r_2$ that are redundant at level α for the attributes in A are also redundant at level β for the attributes in B. That is, $r_1(A) \sim_\alpha r_2(A)$ implies $r_1(B) \sim_\beta r_2(B)$. In a subsequent paper [71], Shenoi et al.
admit that there is no distinction between their model and that of Buckles and Petry: “Our model was thought to generalize Buckles and Petry’s work because tuple components in their model are always non-empty subsets of equivalence classes. However,. . . our early work is essentially a reformulation of Buckles and Petry’s work.” In [71] the authors proposed the so-called complete-lattice-equivalence-class model, in which each domain is associated with a complete lattice of (classical) equivalence relations (and thus partitions). The size and structure of each complete lattice of equivalence relations is required to be the same, more precisely to be isomorphic to a previously fixed lattice L. The lattice L becomes part of the relation scheme.

Prade and Testemale (1984): In [79] Prade and Testemale considered the so-called possibilistic fuzzy data model, i.e. fuzzy sets employed as attribute values are considered only as possibility distributions. When comparing tuple values, it makes sense to give a degree of possibility and a degree of necessity. Consider two tuples with the same value of the age attribute (both Jane and John have {1/19, 1/20, 1/21, 1/22, 1/23}); then it is possible (but not necessary) that Jane’s age and John’s age are the same. The need for these two measures (necessity and possibility) is a consequence of a disjunctive interpretation of sets. A model based on the concept of possibility distribution was originally proposed by Umano [98]. Prade and Testemale slightly generalized Umano’s model by introducing an extra element denoted by e, which is used when there is a nonzero possibility that the attribute does not apply. A relation is understood as follows:
\[ D \subseteq \prod_{y \in R} [0,1]^{D_y \cup \{e\}}, \tag{30} \]
where $[0,1]^{D_y \cup \{e\}}$ denotes the set of all possibility distributions on $D_y \cup \{e\}$. Moreover, each domain $D_y \cup \{e\}$ is associated with a similarity relation (called a fuzzy proximity relation) $\sim_y$ which takes values from [0, 1]. The similarity relation is then extended to possibility distributions on $D_y \cup \{e\}$ as follows: for $r_1(y), r_2(y) \in [0,1]^{D_y \cup \{e\}}$,
\[ (r_1(y) \approx_y r_2(y)) = \max_{u, v \in D_y \cup \{e\}} \min\{ u \sim_y v,\ (r_1(y))(u),\ (r_2(y))(v) \} \tag{31} \]
is the possibility that the values $r_1(y)$, $r_2(y)$ are similar in the sense of $\sim_y$. Note that this similarity is consistent with the disjunctive interpretation of sets. If all possibility distributions are normal (see footnote 8), then $\approx_y$ is a similarity relation for all y ∈ R.

Footnote 7. The authors showed how to obtain a partition from a fuzzy equivalence relation (reflexive, symmetric and max-min transitive): for a given α, two values are called α-equivalent (α-similar in the original work) and belong to the same class of the partition if their degree of equivalence is greater than or equal to α.
Footnote 8. A possibility distribution $\pi \in [0,1]^X$ is normal if there is an element x ∈ X such that π(x) = 1.

The GFDs were introduced only for singleton attribute sets. Given a fixed threshold λ ∈ [0, 1], for $y_i, y_j \in R$ the GFD $\{y_i\} \Rightarrow \{y_j\}$ is satisfied in D if and only if for all $r_1, r_2 \in D$
\[ (r_1(y_i) = r_2(y_i)) \to \bigl( (r_1(y_j) \approx_j r_2(y_j)) \ge \lambda \bigr), \tag{32} \]
where → is the classical implication. The FD should capture the following: “If the values of the attribute $y_i$ are equal for $r_1$ and $r_2$, we may want to express that the values of the attribute $y_j$ for $r_1$ and $r_2$ cannot be far from each other”. This definition can be extended to sets of attributes and reformulated as follows: assume L = [0, 1] and ${}^*$ being the globalization. Then for any t-norm and the corresponding residuum:
\[ \|A \Rightarrow B\|_D = \bigwedge_{r_1, r_2 \in D} \Bigl( (r_1(A) = r_2(A)) \to \bigl(\lambda \to (r_1(B) \approx_B r_2(B))\bigr)^{*} \Bigr), \tag{33} \]
where $\approx_B$ is given by (19). Moreover, if all $\approx_y$ are separable, then we can use $(r_1(A) \approx_A r_2(A))^*$ instead of $r_1(A) = r_2(A)$.

Remark 3. The authors introduced part of a relational algebra in [79]. The result of a query consists in general of two fuzzy sets: the set of tuples which possibly satisfy the condition and the set of tuples which necessarily satisfy the condition. This again means that the result of a query does not correspond to (30). Moreover, it was shown in [13] that the model proposed by Prade and Testemale is not a representation system.

Remark 4. Nakata in [74] used the similarity given by Equation (31) to define the compatibility of a relation with a functional dependency. The compatibility itself is a pair of possibility and necessity measures. Given $y_i, y_j \in R$ with $y_i \ne y_j$ and assuming $a \to b = \neg a \vee b$, the possibility that a given relation D is compatible with an FD $\{y_i\} \Rightarrow \{y_j\}$ is defined as
\[ \bigwedge_{r_1, r_2 \in D,\ r_1 \ne r_2} \max\{ \neg(r_1(y_i) \approx_i r_2(y_i)),\ r_1(y_j) \approx_j r_2(y_j) \}, \tag{34} \]
where $\neg(r_1(y_i) \approx_i r_2(y_i)) = \max_{u, v \in D_i} \min\{ 1 - (u \sim_i v),\ (r_1(y_i))(u),\ (r_2(y_i))(v) \}$. The necessity measure is then computed from the possibility measure by replacing $r_1(y_i) \approx_i r_2(y_i)$ by $1 - (\neg(r_1(y_i) \approx_i r_2(y_i)))$.

Raju and Majumdar (1988): Another generalization of FD was proposed by Raju and Majumdar [81]. They considered a similarity relation on each domain and ranks associated with each tuple. The ranks as well as the similarity degrees come from [0, 1]. More precisely, a relation is defined as:
\[ D : \prod_{y \in R} D_y \to [0, 1]. \tag{35} \]
Thus, every tuple is associated with a degree or rank, denoted by D(r), expressing its belonging to D. Depending on the complexity of the domains, the authors classified the model with imprecise values into two categories, namely
– Type-1: each domain may be a classical set or a fuzzy set. In this case, attribute values are singletons, taken from some set U. But the domain itself may be a fuzzy set over U.
– Type-2: each domain may be a set of fuzzy sets or a set of possibility distributions (page 136 in [81]).
The authors considered both interpretations of a fuzzy set (conjunctive as well as disjunctive) and two different interpretations of a rank (page 138 in [81]): “. . . as a possibility measure of association among the data or as a truth value of a fuzzy predicate associated with (relation) r.” In addition, the authors did not restrict their approach to a specific definition of similarity relation, although they proposed several ways to measure the similarity between fuzzy sets. For instance, for two type-2 fuzzy sets X and Y over a finite domain, the similarity is defined as $X \approx Y = \min\{ X(u) \approx_{[0,1]} Y(u) \mid u \in U \}$, where $\approx_{[0,1]}$ is a similarity relation over the unit interval. This similarity gets close to the conjunctive semantics. Another provided example is closer to a disjunctive interpretation: the similarity of two type-1 fuzzy sets, X and Y, is defined to be $X \approx Y = \max\bigl\{ \frac{\mathrm{card}(X \cap Y)}{\mathrm{card}(X)}, \frac{\mathrm{card}(X \cap Y)}{\mathrm{card}(X)} \bigr\}$, where ‘card’ denotes the cardinality as introduced in [107]. Later, we will refer to the model where attribute values are allowed to be fuzzy sets and a data table is understood as in (35) as the Raju-Majumdar model. The generalized functional dependency A ⇒ B is satisfied by relation D iff for all $r_1, r_2 \in D$ (here $r_1, r_2 \in D$ means $r_1, r_2 \in \mathrm{Tupl}(R)$ with $D(r_1) > 0$ and $D(r_2) > 0$)
\[ r_1(A) \approx r_2(A) \ \le\ r_1(B) \approx r_2(B). \tag{36} \]
The inequality can be reformulated using the Rescher-Gaines (RG) implication, where $a \to_{RG} b = 1$ iff $a \le b$, and 0 otherwise:
\[ (r_1(A) \approx r_2(A)) \to_{RG} (r_1(B) \approx r_2(B)). \tag{37} \]
This reformulation often appears in the literature, but the RG implication is not a residuated implication, and therefore we prefer the following formulation: for L = [0, 1], any t-norm and the corresponding residuum, and for the hedge being the globalization, the definition of GFD given by Equation (36) is equivalent to:
\[ \|A \Rightarrow B\|_D = \bigwedge_{r_1, r_2 \in D} \bigl( (r_1(A) \approx r_2(A)) \to (r_1(B) \approx r_2(B)) \bigr)^{*}. \tag{38} \]
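To make the relation between (36) and (38) concrete, the following Python sketch evaluates both on a toy table with crisp numeric values. The data, the two distance-based similarity functions, and all names are hypothetical; the tuple similarity is the minimum from (19), all ranks are assumed to be positive, and the Gödel residuum is used only as one admissible choice of residuated implication.

```python
from itertools import product

def globalization(a):
    """Hedge *: 1 for degree 1, and 0 otherwise."""
    return 1.0 if a == 1.0 else 0.0

def godel_impl(a, b):
    return 1.0 if a <= b else b

def tuple_sim(r1, r2, attrs, sims):
    """Tuple similarity (19): minimum of the attribute similarities."""
    return min(sims[y](r1[y], r2[y]) for y in attrs)

def gfd_holds_36(D, A, B, sims):
    """Condition (36) checked for all pairs of tuples."""
    return all(
        tuple_sim(r1, r2, A, sims) <= tuple_sim(r1, r2, B, sims)
        for r1, r2 in product(D, repeat=2)
    )

def gfd_degree_38(D, A, B, sims, impl=godel_impl):
    """Degree (38): infimum over all pairs of the globalized implication."""
    return min(
        globalization(impl(tuple_sim(r1, r2, A, sims), tuple_sim(r1, r2, B, sims)))
        for r1, r2 in product(D, repeat=2)
    )

# Hypothetical data and similarities (linear decay with distance).
D = [{"age": 30, "pay": 50}, {"age": 32, "pay": 51}, {"age": 40, "pay": 58}]
sims = {
    "age": lambda u, v: max(0.0, 1 - abs(u - v) / 10),
    "pay": lambda u, v: max(0.0, 1 - abs(u - v) / 20),
}

age_to_pay = gfd_degree_38(D, ["age"], ["pay"], sims)
pay_to_age = gfd_degree_38(D, ["pay"], ["age"], sims)
```

Because the globalization maps every degree below 1 to 0, the result does not depend on which residuum is plugged in, as long as a → b = 1 exactly when a ≤ b.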
Remark 5. Some inconveniences arise from this definition:
1. First note that the rank is not involved in the definition of GFD (36). As a consequence, the semantics of the GFD is not affected by the semantics of the rank. Furthermore, tuples with a low degree of membership can significantly influence the validity of the GFD.
2. The semantics of the GFD A ⇒ B was given as follows: if tuples have similar values on attributes A, then they have similar values on attributes B (in the original terminology: If A is equal then B is equal, page 148). This semantics of the GFD is changed by using modifiers (“more or less”, “very much”) on the similarity relations. Note that the interpretation of a fuzzy set (conjunctive or disjunctive) has no impact on the semantics of the GFD.
3. Furthermore, note that if $r_1(A) \approx r_2(A) = 1$, then $r_1(B) \approx r_2(B)$ must also be equal to 1 in order to satisfy the GFD given by (36). This behavior is seen as a weak spot of the definition and was mentioned by several authors. In [84] Saharia and Barron addressed this problem and introduced cluster dependencies to solve this issue.
4. The completeness of the inference axioms is conditioned by the following: for each domain $D_i$ there is at least one pair $u, v \in D_i$ such that $u \approx_i v = 0$. It was later shown by Belohlavek and Vychodil in [23] that this condition is not needed. Furthermore, Belohlavek and Vychodil proved that although the semantics of the Raju-Majumdar model is richer, we cannot infer anything new compared to the original Codd’s model. More precisely: an FD A ⇒ B follows from a set T of other FDs in the sense of Raju-Majumdar iff A ⇒ B follows from T in the sense of the ordinary Codd’s model.

The solution proposed by Saharia and Barron [84] for the problem mentioned in item 3 is based on the notion of a so-called cluster dependency. Contrary to the Raju-Majumdar model, the authors considered crisp data. At the beginning, a set of initial clusters is computed from the data in relation D.
For a set A ⊆ R and an attribute y ∈ R: the relation D satisfies the cluster dependency A ⇒ {y}, if for any tuple r ∈ D which belongs to cluster β, r(y) is similar to xβ (y) at least to degree α, where α is a fixed threshold. Some explanation is needed: In general, one tuple can belong to more than one cluster and the membership is fuzzy. The degree to which a tuple r belongs to a cluster β is computed as r(A) ≈ xβ (A), it means as a similarity of r(A) and a centroid xβ (A). The centroid itself is a tuple (not necessarily from relation D) and its value on each attribute yj is determined as the mean value in the case of numerical data or as a common value for categorical data in the cluster. The similarity r(y) ≈ xβ (y) is computed based on r(A) ≈ xβ (A) and other parameters which are, for example, used for setting the relative importance of nearby and distant clusters. The paper by Raju and Majumdar is probably the most influential, and the definition of GFD given by (36) inspired many authors. In the following, we summarize some of them: For example in [68] the authors used the same definition of the GFD and the same model as Raju-Majumdar. The
only difference lies in the similarity of tuple values itself - the authors choose semantic equivalence based on semantic spaces of fuzzy sets. The same notion of GFD as in Equation (36) appeared also in [110] in the framework of so-called vague relational database, i.e. databases where tuple values are allowed to be vague sets. As in [68], the authors proposed a new definition of similarity between tuple values (vague sets) and used this similarity in definition of GFD (Equation (36)). Also, Wei-Yi Liu in [67] used the same idea for a definition of GFD. However, the author considered intervals as fuzzy attribute values and, instead of similarity, he used “semantic proximity (SP )” of fuzzy attribute values. An SP is a fuzzy relation that maps a pair of intervals (fuzzy attribute values) into [0, 1] and satisfies certain properties (see [67] for a more detailed definition). Semantic proximity is in general symmetric but not necessarily reflexive relation. Generalization of functional dependencies, multivalued dependencies and join dependencies are defined using SP . The author also presented Armstrong’s axioms and claimed them to be sound and complete. Unfortunately, the non-reflexivity of semantic proximity yields a mistake as pointed out in [43], where the authors showed that two of the inference rules given by Liu are not sound. The soundness is guaranteed by reflexivity of SP . In previous works Liu considered fuzzy sets given by the center number description (fuzzy set is a sphere with center, radius and degree of confidence) and he used the concept of semantic distance (SD) to define functional dependencies. In [66] the definition of GFD is given by the condition SD(r1 (A), r2 (A)) ≥ SD(r1 (B), r2 (B)) for all tuples r1 , r2 ∈ D. 
The equivalence of the definition using semantic distance and the definition using semantic proximity in Equation (36) is due to the fact that semantic proximity and semantic distance are introduced as complementary notions, more precisely: $SP(r_1, r_2) = 1 - SD(r_1, r_2)$. In [65], some concrete examples of SD for different descriptions of fuzzy values can be found. In [63], the authors followed Liu’s approach. They extended the approach by defining semantic proximity for fuzzy values based on their α-cuts (and thus based on intervals). The semantic proximity is again not reflexive, though it has to satisfy: $SP(A_\alpha, A_\alpha) \ge SP(B_\alpha, B_\alpha)$ if $|A_\alpha| \ge |B_\alpha|$.

The work done by Raju and Majumdar also inspired Saxena and Tyagi. In [87] the authors used a special fuzzy set φ to represent the null value “does not apply”. The null value “unknown” is represented differently. This approach differs from its precursors due to the specific treatment of the special fuzzy set φ. A GFD A ⇒ B holds in D according to Saxena and Tyagi if for all pairs of tuples $r_1, r_2$ such that $D(r_1) > 0$, $D(r_2) > 0$, $r_1(y) \ne \varphi \ne r_2(y)$ for each y ∈ A, and $r_1(A) \approx r_2(A) > 0$, one of the following conditions holds:
1. $r_1(B) = r_2(B) = \varphi$, or
2. there exists a nonempty set $B' \subseteq B$ such that $r_1(y) \ne \varphi \ne r_2(y)$ for each $y \in B'$, $r_1(B \setminus B') = r_2(B \setminus B') = \varphi$ and $r_1(A) \approx r_2(A) \le r_1(B') \approx r_2(B')$.

Chen (1991): Another significant proposal for a definition of GFD was developed by Chen [34], see also [27, 28]. Chen used the possibilistic fuzzy data model:
\[ D \subseteq \prod_{y \in R} [0,1]^{D_y}, \tag{39} \]
where $[0,1]^{D_y}$ denotes the set of all possibility distributions over the domain $D_y$. Moreover, a similarity relation $\approx_y$ (originally called a closeness relation) is associated with each domain $D_y$. The similarity of two possibility distributions $\pi_1, \pi_2$ over a domain $D_y$ expresses the possibility that there exist values $u \in \pi_1$, $v \in \pi_2$ which are close to each other. The GFD A ⇒ B holds in D to degree θ iff
\[ \min_{r_1, r_2 \in D} \bigl( (r_1(A) \approx r_2(A)) \to_I (r_1(B) \approx r_2(B)) \bigr) \ \ge\ \theta, \tag{40} \]
where $\to_I$ stands for a “fuzzy implication operator”, i.e. $\to_I : [0,1] \times [0,1] \to [0,1]$, which satisfies for every a, b, c ∈ [0, 1]:
$a \to_I b = 1$ iff $a \le b$,
$a \to_I b \le \min(a, c) \to_I \min(b, c)$,
$\min(a \to_I b,\ b \to_I c) \le a \to_I c$.
The FD expresses the fact that: “Close B values correspond to close A values”. Later, Chen et al. [33] proposed a specific form of the previous definition in order to express the fact that
“Close B values correspond to close A values, and identical B values correspond to identical A values”. This interpretation of FD given by Chen et al. is not consistent with the interpretation of the similarity of attribute values (possibility distributions). In our opinion the following semantics would be more accurate: “If it is possible that tuples have similar values on A, then it is possible that they have similar values on B.” The $\to_I$ is the classical implication when $r_1(A)$ and $r_2(A)$ are identical, and the Gödel implication otherwise, i.e. the GFD A ⇒ B holds in D with a degree θ if and only if for all pairs of tuples $r_1, r_2$:
\[ \begin{cases} r_1(B) = r_2(B) & \text{if } r_1(A) = r_2(A), \\ \bigl( (r_1(A) \approx r_2(A)) \to_G (r_1(B) \approx r_2(B)) \bigr) \ge \theta & \text{otherwise.} \end{cases} \tag{41} \]
By using the Gödel implication in the second part of the definition, the meaning of the fact that the GFD A ⇒ B is true to degree θ is as follows: for each pair of tuples, the similarity on attributes B is at least as high as the similarity on attributes A, or it is at least θ. The inequality (41) can be reformulated as follows:
\[ \theta \otimes (r_1(A) \approx r_2(A)) \le (r_1(B) \approx r_2(B)). \tag{42} \]
The correspondence with Equation (27) is obvious, but now ⊗, → are the Gödel operations. For $L = [0,1]_G$ being the standard Gödel algebra, the notion of GFD can be defined as follows: $\|A \Rightarrow B\|_D = \theta$ if
\[ \theta \ \le \bigwedge_{r_1, r_2 \in D} \Bigl( \bigl( (r_1(A) = r_2(A)) \to (r_1(B) = r_2(B)) \bigr) \wedge \bigl( (r_1(A) \approx r_2(A)) \to (r_1(B) \approx r_2(B)) \bigr) \Bigr). \tag{43} \]
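The step from the second case of (41) to (42) is an instance of adjointness in the standard Gödel algebra. A minimal Python sketch, with function names of our own choosing, checks on a finite grid of degrees that $(a \to_G b) \ge \theta$ holds exactly when $\min(\theta, a) \le b$; it covers only the "otherwise" branch of (41), not the crisp first case.

```python
from itertools import product

def godel_impl(a, b):
    return 1.0 if a <= b else b

def holds_41(a, b, theta):
    """Second case of (41): (a ->_G b) >= theta."""
    return godel_impl(a, b) >= theta

def holds_42(a, b, theta):
    """Inequality (42) with the Goedel t-norm: min(theta, a) <= b."""
    return min(theta, a) <= b

# Degrees 0, 0.1, ..., 1 stand in for the similarities a = r1(A)~r2(A), b = r1(B)~r2(B).
grid = [i / 10 for i in range(11)]
agree = all(
    holds_41(a, b, t) == holds_42(a, b, t)
    for a, b, t in product(grid, repeat=3)
)
```

The grid check is of course no proof, but it mirrors the case analysis: for a ≤ b both conditions hold trivially, and for a > b both reduce to θ ≤ b.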
The fact that $\|A \Rightarrow B\|_D = \theta$ does not exclude the existence of another $\theta' > \theta$ for which the inequality (43) holds. The soundness and completeness of Armstrong-like inference rules have been proved in [28]. Chen et al. also proposed the concepts of q-key and q-fuzzy normal forms, whose definitions remain almost the same as the classical ones with the notion of FD replaced by the authors' notion of GFD, see [31], [30]. In [29] the authors introduced an algorithm for testing whether or not a decomposition preserves the GFDs.

Remark 6. In [100] the authors presented an algorithm for mining GFDs based on Chen's definition. The authors first transform quantitative data into fuzzy data and then search for GFDs. In [91] the authors considered (40) and demonstrated on particular examples how different similarity measures and different implications (Łukasiewicz, Gödel, etc.) influence the result. The definition of GFD given by Equation (41) was recently used in the framework of interval-valued possibility distributions, see [73].

Bhuniya and Niyogi (1993): According to Bhuniya and Niyogi [7], the generalized functional dependency $A \Rightarrow B$ holds in a Raju-Majumdar model (35) if and only if for all $r_1, r_2 \in D$ one of the following conditions holds:

$$r_1(A) \approx r_2(A) \le r_1(B) \approx r_2(B), \tag{44}$$

$$(r_1(A) \approx r_2(A)) - (r_1(B) \approx r_2(B)) \le 1 - \beta, \tag{45}$$
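where $r_1(A) \approx r_2(A) \ge \alpha$, $r_1(B) \approx r_2(B) \ge \alpha$, and $\alpha < \beta < 1$. In other words: if $\alpha \le r_1(A) \approx r_2(A)$ and $\alpha \le r_1(B) \approx r_2(B)$, then either (44) or (45) holds. Or equivalently: $A \Rightarrow B$ holds in $D$ iff for all $r_1, r_2 \in D$ at least one of the following conditions holds:

$$r_1(A) \approx r_2(A) < \alpha \quad\text{or}\quad r_1(B) \approx r_2(B) < \alpha \quad\text{or}\quad (44) \quad\text{or}\quad (45). \tag{46}$$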
First note that $r_1(A) \approx r_2(A) < \alpha$ implies either $r_1(B) \approx r_2(B) < \alpha$ or (44). Therefore condition (46) is equivalent to

$$r_1(B) \approx r_2(B) < \alpha \quad\text{or}\quad (44) \quad\text{or}\quad (45). \tag{47}$$
Now, since $\beta < 1$, condition (44) implies condition (45) and therefore the disjunction “(44) or (45)” is equivalent to

$$\beta \le (r_1(A) \approx r_2(A)) \to (r_1(B) \approx r_2(B)) \tag{48}$$

in the standard Łukasiewicz algebra. As a consequence, $A \Rightarrow B$ holds (to degree 1) in $D$ iff $\alpha \le r_1(B) \approx r_2(B)$ implies $\beta \le (r_1(A) \approx r_2(A) \to r_1(B) \approx r_2(B))$, for all $r_1, r_2 \in D$. For the hedge being globalization and for the standard Łukasiewicz algebra as a structure of truth degrees, the following definition of GFD is equivalent to the definition given by Bhuniya and Niyogi:

$$\|A \Rightarrow B\|_D = \bigwedge_{r_1, r_2 \in D} \Big( (\alpha \to r_1(B) \approx r_2(B))^* \to \big(\beta \to (r_1(A) \approx r_2(A) \to r_1(B) \approx r_2(B))\big)^* \Big). \tag{49}$$
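The equivalence between the disjunction “(44) or (45)” and condition (48) can be illustrated numerically. The following sketch, with hypothetical helper names and an arbitrary grid of exact rationals, compares both formulations for every $\beta < 1$ on the grid.

```python
from fractions import Fraction
from itertools import product

def luk_impl(a, b):
    """Łukasiewicz residuum: min(1, 1 - a + b)."""
    return min(Fraction(1), 1 - a + b)

def cond_44_or_45(sim_a, sim_b, beta):
    """Disjunction of (44) and (45) for one pair of tuples."""
    return sim_a <= sim_b or sim_a - sim_b <= 1 - beta

def cond_48(sim_a, sim_b, beta):
    """Condition (48): beta <= (sim_a -> sim_b) in the Łukasiewicz algebra."""
    return beta <= luk_impl(sim_a, sim_b)

grid = [Fraction(i, 20) for i in range(21)]
betas = [g for g in grid if g < 1]  # the definition requires beta < 1
agree_on_grid = all(
    cond_44_or_45(a, b, beta) == cond_48(a, b, beta)
    for a, b in product(grid, repeat=2)
    for beta in betas
)
print(agree_on_grid)
```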
Cubero et al. (1994): Cubero et al. [42] proposed the following definition of an FD for the possibilistic fuzzy data model:

$$D \subseteq \prod_{y \in R} [0,1]^{D_y}, \tag{50}$$

where $[0,1]^{D_y}$ denotes the possibility distributions on $D_y$. Each domain $D_y$ is equipped with a similarity relation $\sim_y$ (called proximity in the original work) and a fixed threshold $c_y$. The GFD $A \Rightarrow B$ is satisfied iff for all $r_1, r_2 \in D$:

$$(r_1(A) \approx r_2(A) \ge \alpha) \to (r_1(B) \approx r_2(B) \ge \beta). \tag{51}$$

This equation can be expressed in words as follows: if $r_1(A)$ and $r_2(A)$ are similar at least to degree $\alpha$, then $r_1(B)$ and $r_2(B)$ must be similar at least to degree $\beta$. Since $\to$ is the classical implication, as long as $(r_1(B) \approx r_2(B)) \ge \beta$, it does not matter which values $r_1(B)$ and $r_2(B)$ actually take. The parameters $\alpha$ and $\beta$ are vectors, $\alpha = (c_y)_{y \in A}$, $\beta = (c_y)_{y \in B}$, where the values $c_y \in [0,1]$, $y \in R$, are fixed and common to all GFDs. Thus $r_1(A) \approx r_2(A) \ge \alpha$ means $r_1(y) \approx_y r_2(y) \ge c_y$ for all $y \in A$. The definition of GFD given by Equation (51) can be reformulated as follows. Let $\mathbf{L}$ be any complete residuated lattice with universe $L = [0,1]$ and with globalization as a hedge:

$$\|A \Rightarrow B\|_D = \bigwedge_{r_1, r_2 \in D} \Big( \bigwedge_{y \in A} (c_y \to r_1(y) \approx r_2(y)) \Big)^* \to \Big( \bigwedge_{y \in B} (c_y \to r_1(y) \approx r_2(y)) \Big)^*. \tag{52}$$
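The threshold condition (51) is bivalent and easy to test once the per-attribute similarities of each pair of tuples are known. The sketch below assumes these similarities are already computed and stored in dictionaries; the attribute names, thresholds and similarity values are invented purely for illustration.

```python
def meets_thresholds(pair_sim, thresholds, attrs):
    """True iff the pair is similar at least to degree c_y on every y in attrs."""
    return all(pair_sim[y] >= thresholds[y] for y in attrs)

def satisfies_51(pair_sims, thresholds, A, B):
    """Condition (51): classical implication between the two threshold tests,
    required for every pair of tuples."""
    return all(
        (not meets_thresholds(s, thresholds, A)) or meets_thresholds(s, thresholds, B)
        for s in pair_sims
    )

# Hypothetical thresholds c_y and per-pair similarities r1(y) ~ r2(y).
thresholds = {"job": 0.8, "salary": 0.7}
pair_sims = [
    {"job": 0.9, "salary": 0.75},  # antecedent and consequent both met
    {"job": 0.5, "salary": 0.1},   # antecedent not met, pair is irrelevant
]
print(satisfies_51(pair_sims, thresholds, ["job"], ["salary"]))

violating = pair_sims + [{"job": 0.85, "salary": 0.6}]
print(satisfies_51(violating, thresholds, ["job"], ["salary"]))
```

Note that the second pair does not affect the result however dissimilar the salaries are, which reflects the use of the classical implication in (51).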
A particular definition of $\approx$ is not provided by the authors, but they mentioned two plausible choices: those proposed by Chen and by Raju and Majumdar.

Remark 7. Later in [40], the authors used two different similarity measures for computing the similarity of tuple values (possibility distributions) in the antecedent and consequent part of a GFD. The authors employed Equation (31) for computing the similarity of tuple values in the antecedent, but for the consequent part they used the following:

$$(r_1(y) \approx_y r_2(y)) = \min_{u, v \in D_y} \max\{u \sim_y v,\ 1 - (r_1(y))(u),\ 1 - (r_2(y))(v)\}. \tag{53}$$
The definition of GFD remains almost the same, more precisely: A GFD A ⇒ B is satisfied in relation D iff (i) every tuple value is normalized, (ii) ri (y) ≈y ri (y) ≥ cy for every y ∈ B and r ∈ D, and (iii) (r1 (A) ≈ r2 (A) ≥ α) → (r1 (B) ≈ r2 (B) ≥ β). This definition of GFD was then used to define so-called rule-based fuzzy functional dependencies [40, 41]. Remark 8. A very similar idea appeared later in [2], where an FD captures the following: For every pair of tuples: If r1 (A) and r2 (A) are close to each other, then r1 (B) and r2 (B) must also be close to each other, more precisely: ∀r1 , r2 ∈ D If ∀yi ∈ A : |r1 (yi ) − r2 (yi )| ≤ , then ∀yj ∈ B : |r1 (yj ) − r2 (yj )| ≤ .
(54)
Ben Yahia et al. (1999): Ben Yahia, Ounalli, and Jaoua presented their definition of so-called dynamic functional dependency in [104]. The word “dynamic” is used to express that the GFD can be satisfied to some degree. The authors considered the Raju and Majumdar model (35) with
uncertain data and ranks from $[0,1]$. The dynamic FD “$A$ determines $B$ to degree $\beta$” ($A \sim\!>_\beta B$), $\beta, \theta \in [0,1]$, holds in $D$ if for all tuples $r_1$ and $r_2$ we have:

$$(r_1(A) \approx r_2(A) \to r_1(B) \approx r_2(B)) \ge \theta, \tag{55}$$

where

$$\beta = \min_{r_1, r_2} (r_1(A) \approx r_2(A) \to r_1(B) \approx r_2(B)), \tag{56}$$

and $\to$ is the Łukasiewicz implication. The threshold $\theta$ is fixed by the database designer. Note that the definition of Raju and Majumdar (36) is a special case for $\theta = 1$. The authors also proposed inference axioms and proved their soundness. The completeness is not proved. The definition can be reformulated as follows: for $\mathbf{L} = [0,1]_Ł$ being the standard Łukasiewicz algebra, if $\bigwedge_{r_1, r_2 \in D} (r_1(A) \approx r_2(A) \to r_1(B) \approx r_2(B)) \ge \theta$, then

$$\|A \Rightarrow B\|_D = \bigwedge_{r_1, r_2 \in D} (r_1(A) \approx r_2(A) \to r_1(B) \approx r_2(B)) \tag{57}$$

and $\|A \Rightarrow B\|_D = 0$ otherwise.

Bosc et al. (1999): Another generalization was made by Bosc, Pivert, and Ughetto, see [14]. As far as we know, they were the first to consider a residuated implication corresponding to some t-norm. The authors worked with crisp data and a similarity relation on every domain, and proposed two generalizations of the classical FD. Firstly, they relaxed the equality in the consequent only and secondly, they replaced the equality by similarity in both parts of the implication:

– Similarity is used only in the consequent part (which is meant to express tolerance) and the GFD is defined as:

$$\forall r_1, r_2 \in D : r_1(A) = r_2(A) \to r_1(B) \approx r_2(B). \tag{58}$$

Note the correspondence with the definition given by Equation (32). However, there is a big conceptual difference: the GFD given by Equation (32) remains bivalent (either it is true or not), whereas the GFD given above can be true to any degree from $[0,1]$.

– A similarity relation is used in both parts:

$$\forall r_1, r_2 \in D : r_1(A) \approx r_2(A) \to r_1(B) \approx r_2(B). \tag{59}$$
Meaning: “The closer the A values, the closer the B values”. For example: “Employees with similar experience and jobs must have similar salaries.” The reformulation using a complete residuated lattice with universe $[0,1]$ is straightforward:

$$\|A \Rightarrow B\|_D = \bigwedge_{r_1, r_2 \in D} (r_1(A) \approx r_2(A) \to r_1(B) \approx r_2(B)). \tag{60}$$
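Since (60) is parametrized by the residuated implication, the same data can yield different degrees under different structures of truth degrees. The following sketch computes (60) for the three standard residua on $[0,1]$ and also the thresholded variant (57); it assumes the pairwise similarities are precomputed, and the sample values and function names are hypothetical.

```python
from fractions import Fraction as F

def luk(a, b):
    """Łukasiewicz residuum."""
    return min(F(1), 1 - a + b)

def godel(a, b):
    """Gödel residuum."""
    return F(1) if a <= b else b

def goguen(a, b):
    """Goguen (product) residuum."""
    return F(1) if a <= b else b / a

def gfd_degree(pair_sims, impl):
    """Degree (60): infimum of (sim_A -> sim_B) over all pairs of tuples.
    pair_sims is a non-empty list of (sim_A, sim_B) values."""
    return min(impl(a, b) for a, b in pair_sims)

def dynamic_fd_degree(pair_sims, theta):
    """Reformulation (57): the Łukasiewicz degree if it reaches theta, else 0."""
    degree = gfd_degree(pair_sims, luk)
    return degree if degree >= theta else F(0)

# Hypothetical similarities (r1(A) ~ r2(A), r1(B) ~ r2(B)) for two pairs.
pairs = [(F(9, 10), F(7, 10)), (F(1, 2), F(4, 5))]
for impl in (luk, godel, goguen):
    print(impl.__name__, gfd_degree(pairs, impl))
print(dynamic_fd_degree(pairs, F(3, 4)), dynamic_fd_degree(pairs, F(9, 10)))
```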
Unfortunately, the authors presented only those definitions and did not go any further by showing properties of such FDs or presenting inference rules.

Tyagi et al. (2005): Later Tyagi et al. [96] introduced another generalization of functional dependencies in the framework of so-called fuzzy functions. The authors developed GFDs for the Raju and Majumdar model (data represented by possibility distributions and ranks from $[0,1]$). Contrary to Raju and Majumdar, a $[0,1]_G$-equality (see Section 3) is employed in the model instead of a similarity relation. The authors considered a relation $\equiv_y$ on each domain which is separable (therefore reflexive), min-transitive and weakly symmetric (i.e. $(u \equiv v) = 1$ iff $(v \equiv u) = 1$ for all $u, v \in D_y$). This approach was inspired by the definition of fuzzy function provided by Demirci in [47], see Definition 1 in the following section. Relation $D$ satisfies the GFD $A \Rightarrow B$ if its projection over $A \cup B$ (denoted as $D_{AB}$) is a partial fuzzy function. That is, if $\forall r_1, r_2 \in \mathrm{Tupl}(A \cup B)$:

$$(D_{AB}(r_1) \wedge D_{AB}(r_2) \wedge r_1(A) \equiv r_2(A)) \le r_1(B) \equiv r_2(B), \tag{61}$$

where $D_{AB}(r) = \sup\{D(r') \mid r' \in \mathrm{Tupl}(R) \text{ such that } r'(A \cup B) = r\}$ and $D(r)$ is the degree to which the tuple $r$ belongs to the relation $D$. Definition (61) is a generalization of (36) in the sense that if the GFD is true according to (36) then it is also true according to (61). This approach has an advantage, which lies in the fact that the rank is involved in the definition of GFD. Assume that there is a pair of tuples violating $r_1(A) \equiv r_2(A) \le r_1(B) \equiv r_2(B)$. In the case of (36), the GFD will be violated regardless of the ranks of these two tuples, but in the case of (61), the GFD may still be satisfied if the ranks are low enough. In general, we can say that the lower the rank, the lower the influence on the validity of the GFD. In our opinion, this is the way a GFD should behave when ranks are present. If tuples have zero ranks (tuples do not belong to the relation), they should not influence the validity of the GFD at all. Even this definition of GFD can be reformulated using complete residuated lattices. For $\mathbf{L}$ being the standard Gödel algebra (i.e. $a \otimes b = a \wedge b = \min(a, b)$) equipped with globalization we have:

$$\|A \Rightarrow B\|_D = \bigwedge_{r_1, r_2 \in \mathrm{Tupl}(A \cup B)} \Big( (D_{AB}(r_1) \otimes D_{AB}(r_2) \otimes (r_1(A) \equiv r_2(A))) \to (r_1(B) \equiv r_2(B)) \Big)^*. \tag{62}$$
Normal forms based on the GFD given by Equation (61) were proposed in [86].

Kiss (1991): The idea that the rank should influence the validity of an FD can already be found in [60]. Therefore we decided to include this approach here, although the similarity relation is not employed in this model. The author considered ranks from $[0,1]$ and presented the following Horn formula of first-order logic:

$$\forall r_1, r_2 : (D(r_1) \wedge D(r_2) \wedge r_1(A) = r_2(A)) \Rightarrow r_1(B) = r_2(B). \tag{63}$$

When giving the semantic meaning to logical connectives, Kiss substituted $\wedge$, $\forall$ with the operator $\inf$; $\vee$, $\exists$ with $\sup$; $\Rightarrow$ with the Łukasiewicz implication; and $\neg a = 1 - a$ for all $a \in [0,1]$. The truth value to which the fuzzy relation $D$ satisfies a given FD was given by

$$\|A \Rightarrow B\|_D = 1 - \sup\{\inf(D(r_1), D(r_2)) \mid r_1(A) = r_2(A) \text{ and } r_1(B) \ne r_2(B)\}. \tag{64}$$

It is clear from (64) that the higher the degrees $D(r_1)$ and $D(r_2)$ when $r_1, r_2$ violate the classical FD, the lower the truth degree of the FD $A \Rightarrow B$. The reformulation of (63) using a residuated lattice is straightforward. For $\mathbf{L}$ being $[0,1]_Ł$:

$$\|A \Rightarrow B\|_D = \bigwedge_{r_1, r_2 \in D} \big( (D(r_1) \wedge D(r_2) \wedge r_1(A) = r_2(A)) \to r_1(B) = r_2(B) \big). \tag{65}$$
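Kiss's truth degree and its residuated reformulation (65) can be compared on a small ranked table. The sketch below is only an illustration under stated assumptions: tuples are Python dictionaries, ranks are exact rationals, the sample table is invented, and the pair $r_1(B) \ne r_2(B)$ is taken as the violating case in (64).

```python
from fractions import Fraction as F
from itertools import product

def luk_impl(a, b):
    """Łukasiewicz residuum."""
    return min(F(1), 1 - a + b)

def agree(r1, r2, attrs):
    """Crisp equality of two tuples on a set of attributes."""
    return all(r1[y] == r2[y] for y in attrs)

def kiss_64(table, A, B):
    """Degree (64): 1 minus the largest min-rank among violating pairs."""
    violating = [
        min(d1, d2)
        for (d1, r1), (d2, r2) in product(table, repeat=2)
        if agree(r1, r2, A) and not agree(r1, r2, B)
    ]
    return 1 - max(violating, default=F(0))

def kiss_65(table, A, B):
    """Degree (65): infimum of (D(r1) ^ D(r2) ^ [A equal]) -> [B equal]."""
    return min(
        luk_impl(min(d1, d2, F(int(agree(r1, r2, A)))), F(int(agree(r1, r2, B))))
        for (d1, r1), (d2, r2) in product(table, repeat=2)
    )

# Hypothetical ranked table: (rank, tuple).
table = [
    (F(1), {"A": 1, "B": "x"}),
    (F(3, 10), {"A": 1, "B": "y"}),  # violates A => B together with the first tuple
    (F(4, 5), {"A": 2, "B": "z"}),
]
print(kiss_64(table, ["A"], ["B"]), kiss_65(table, ["A"], ["B"]))
```

In this example the violating tuple has a low rank, so the dependency still holds to a fairly high degree.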
Remark 9. When going from (63) to (64) Kiss used the following rule: $\neg(a \to b) = a \wedge \neg b$, which does not hold in general when one takes the Łukasiewicz implication and the conjunction given by infimum. But the equality holds when $b$ is 0 or 1, which is also our case, because $b$ represents $r_1(B) = r_2(B)$.

Belohlavek and Vychodil (2006): Another extension is the proposal made by Belohlavek and Vychodil, see [21]. The authors used a complete residuated lattice with hedge as a structure of truth degrees; data remain crisp and $D$ is understood as a fuzzy relation:

$$D : \prod_{y \in R} D_y \to L. \tag{66}$$

The rank $D(r)$ is viewed as a degree to which a tuple $r$ matches a query. Ranks have mainly a comparative meaning: the higher the rank, the better the match. For $A, B \in L^R$ the functional dependency $A \Rightarrow B$ (called similarity-based FD) is defined as:

$$\|A \Rightarrow B\|_D = \bigwedge_{r_1, r_2 \in \mathrm{Tupl}(R)} \big( (r_1(A) \approx r_2(A))^* \to (r_1(B) \approx r_2(B)) \big), \tag{67}$$

where

$$r_1(A) \approx r_2(A) = (D(r_1) \otimes D(r_2)) \to \bigwedge_{y \in R} (A(y) \to r_1(y) \approx_y r_2(y)). \tag{68}$$
The authors built their approach on first-order predicate fuzzy logic [57] and thus (67) is the truth degree of the following formula: “For all pairs of tuples: if r1 and r2 have very similar values on attributes from A then r1 and r2 have similar values on attributes from B.” And (68) is the truth degree of the formula: “if r1, r2 are from D then for each attribute y from A, r1 and r2 have similar values on y.” Notice that if $A$ is a crisp set and all similarities become identities then $r_1(A) \approx r_2(A) = 1$ iff $r_1$ and $r_2$ are equal on all attributes from $A$. Also note that the GFD is true to some degree, which comes from the complete residuated lattice with hedge, and that ranks influence the resulting degree. Compared to other approaches, a big difference is also the fact that $A$, $B$ are fuzzy sets. In [22] the authors proposed two query systems for similarity-based databases, a domain relational calculus (based on first-order predicate fuzzy logic) and a relational algebra, and proved relational completeness. Belohlavek and Vychodil also showed soundness and completeness of Armstrong-like axioms, see [21]. Recently, in [20], the authors proposed a new logic, named Fuzzy Attribute Simplification Logic (FASL), and an automated deduction method dealing with this notion of generalized dependency.

Cordero et al. (2011): The last extension we want to mention in this section was presented by Cordero et al. in [39]. The authors worked with a generalization of Codd's relational model called Fuzzy Attribute Table. The basic idea is that each attribute value of each tuple is assigned a rank coming from a complete residuated lattice. More precisely, a Fuzzy Attribute Table is understood as a map

$$D : \prod_{y \in R} D_y \to L^R. \tag{69}$$
This means that for each tuple $r$, the mapping $D$ gives a tuple of truth values $D(r) \in L^R$. For each tuple $r$, $r(y)$ denotes the value of the attribute $y$ in the tuple $r$ and $D(r)(y)$ is the truthfulness of the value $r(y)$. For instance, Table 3 depicts a Fuzzy Attribute Table with physical features of a set of suspects. Concerning the information about Mary (tuple 3), we are absolutely sure about her age (her age is 26) and we are almost sure that her eyes are blue.

name      hair        skin              age     eyes       factor
John/1    Black/0.9   dark/0.8          34/1    Brown/0.8  10/0.8
Albert/1  Brown/0.7   light/0.7         32/0.9  Blue/0.6   50/0.9
Mary/1    Auburn/0.6  intermediate/0.6  26/1    Blue/0.9   50/0.9
Dave/1    Red/0.4     light/0.9         29/0.6  Blue/0.8   50/0.8
Noa/1     White/0.1   dark/0.7          32/0.4  Green/0.7  30/0.6

Table 3. Fuzzy attribute table
Remark 10. Later in [24] Belohlavek and Vychodil introduced a similar model called Multi-Ranked Data Tables where ranks (degrees) come from similarity-based queries. Furthermore, in [39] the authors focus on the development of a logic to manage GFDs, whereas in [24] a relational algebra for this model is provided.

The definition of the GFD is accompanied by a Pavelka-style logic [75–77] called “Simplification Logic for fuzzy functional dependencies”. The proof of the completeness theorem is done for one particular structure of truth degrees, the unit interval. The authors introduced the following definition: a fuzzy attribute table $D$ is said to satisfy a generalized functional dependency $A \Rightarrow B$ with degree $\theta$ iff

$$\theta \le \bigwedge_{r_1, r_2 \in \mathrm{Tupl}(R)} \big( (r_1(A) \approx_D r_2(A)) \to (r_1(B) \approx_D r_2(B)) \big), \tag{70}$$
where $\to$ is a residuated implication. The similarity relation is called relative similarity because the similarity of two tuples $r_1, r_2$ on the set of attributes $A \subseteq R$ depends on ranks:

$$(r_1(A) \approx_D r_2(A)) = \bigwedge_{a \in A} \big( (D(r_1)(a) \otimes D(r_2)(a)) \to (r_1(a) \approx r_2(a)) \big). \tag{71}$$
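To see how ranks enter the relative similarity (71), it can be evaluated on two rows of Table 3 (Mary and Dave). The sketch below makes two assumptions that are not taken from [39]: the Łukasiewicz operations are used as the residuated pair, and domain values are compared by crisp equality because no domain similarities are given for Table 3.

```python
from fractions import Fraction as F

def t_norm(a, b):
    """Łukasiewicz t-norm (an assumed choice of the residuated structure)."""
    return max(F(0), a + b - 1)

def residuum(a, b):
    """Łukasiewicz residuum."""
    return min(F(1), 1 - a + b)

def crisp_sim(u, v):
    """Assumed domain similarity: plain equality of values."""
    return F(1) if u == v else F(0)

def relative_similarity(r1, r2, attrs, sim=crisp_sim):
    """Relative similarity (71); each attribute maps to a (value, rank) pair."""
    return min(
        residuum(t_norm(r1[a][1], r2[a][1]), sim(r1[a][0], r2[a][0]))
        for a in attrs
    )

# Selected attribute values and ranks of Mary and Dave from Table 3.
mary = {"hair": ("Auburn", F(6, 10)), "skin": ("intermediate", F(6, 10)),
        "eyes": ("Blue", F(9, 10))}
dave = {"hair": ("Red", F(4, 10)), "skin": ("light", F(9, 10)),
        "eyes": ("Blue", F(8, 10))}

for attrs in (["eyes"], ["hair"], ["skin"], ["hair", "skin", "eyes"]):
    print(attrs, relative_similarity(mary, dave, attrs))
```

Under these assumptions the hair values differ, yet the relative similarity on hair is 1 because the two ranks are too low to count as evidence of a difference, whereas the better supported skin values give degree 1/2.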
In [39], the authors considered the supremum of the degrees to which a GFD is true. That is: $\|A \Rightarrow B\|_D = \sup\{\theta \in [0,1] \mid \theta \text{ satisfies (70)}\}$.

The last two approaches are connected with each other in the sense that the validity of the GFD given by (70) may be expressed using the validity of the GFD given by (67) and vice versa. A fuzzy attribute table $D : \prod_{y \in R} D_y \to L^Y$ may be represented using a ranked data table $D_R$ over the following family of domains $(D_R)_y = D_y \times L$ ($D_y$ are the original domains for $D$) and mapping all the tuples to 1. More precisely: the ranks are defined by $D_R(r) = 1$ if there exists $y \in R$ such that $D(r)(y) \ne 0$, and $D_R(r) = 0$ otherwise. Similarities are defined by

$$\langle d_1, a_1 \rangle \approx_y \langle d_2, a_2 \rangle = (a_1 \otimes a_2) \to \rho_y(d_1, d_2),$$

where $\rho_y$ is the original similarity for $D$. On the other side, a ranked data table $D : \prod_{y \in R} D_y \to L$ can be transformed to a fuzzy attribute table $D_F : \prod_{y \in R} D_y \to L^Y$ by considering for each tuple $r \in D$: $D_F(r)(y) = D(r)$ for all $y \in R$.

Lemma 1. For $A, B \subseteq R$, a fuzzy attribute data table $D$ satisfies $A \Rightarrow B$ in degree $\theta \in [0,1]$ according to (70) iff $\|A \Rightarrow B\|_{D_R} \ge \theta$ according to (67). Therefore, $\|A \Rightarrow B\|_D = \|A \Rightarrow B\|_{D_R}$.

We have seen how to reformulate various definitions of GFD using a complete residuated lattice as a structure of truth degrees. It can be easily seen that the approaches given by (67), (70) as well as by (59) (see footnote 9) are the most general ones, leaving other approaches as their particular cases. A discussion of this topic as well as proofs can be found in [23], where the authors showed that the approaches given by (32), (36), (40) and (55) are special cases of (67). To summarize the results of the paper [23] note that:

1) Many approaches consider one particular t-norm (and the corresponding implication), whereas the GFD given by Equation (67) (this result applies to (70) and (59) as well) is developed for any t-norm.

2) Since $A$, $B$ from (67) are in general fuzzy sets, the definition of GFD given by (67) can easily incorporate GFDs which use some additional parameters. For example, for the GFD given by (52) consider fuzzy sets $A$, $B$ to be defined as: $A(y) = c_y$ for $y \in A$, $B(y) = c_y$ for $y \in B$.

4.3 Functional dependencies in terms of fuzzy rules
In most of the previous approaches, FDs are generalized by replacing the equality by similarity and by using a concrete (not residuated in general) implication. There are approaches in which different techniques are employed. Here we present a few works which have a significant relevance in their respective fields. Because of the nature of the techniques they use, we cannot compare them with the rest of the works presented in this paper.

Rasmussen and Yager (1999): The following approach proposed by Rasmussen and Yager [82] utilizes linguistic summaries for defining GFDs. The authors considered a similarity relation defined on each domain and crisp data. A GFD is defined as follows: let $D$ contain $n$ tuples $r_1, \ldots, r_n$. First, the degree to which the GFD $A \Rightarrow B$ is true for each object (tuple) in the database is computed. For one tuple $r_k$ the degree is obtained as follows:

$$\gamma_k = \frac{\sum_{i=1}^{n} (r_i(A) \approx r_k(A)) \otimes (r_i(B) \approx r_k(B))}{\sum_{i=1}^{n} (r_i(A) \approx r_k(A))}, \tag{72}$$

where $\otimes$ is a t-norm. Then the truth degree to which the FD is true in relation $D$ is obtained as the arithmetic mean of the $\gamma_k$:

$$\|A \Rightarrow B\|_D = \frac{\sum_{k=1}^{n} \gamma_k}{n}. \tag{73}$$

Footnote 9: As we have already mentioned, the promising and very general definition given by (59) was not developed any further.
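Equations (72) and (73) translate directly into a short computation over two similarity matrices. The following sketch assumes the pairwise similarities on $A$ and on $B$ are given as square matrices with ones on the diagonal (so the denominator in (72) is never zero) and uses the minimum as the t-norm; the matrices and names are hypothetical.

```python
from fractions import Fraction as F

def gamma(k, sim_a, sim_b, t_norm=min):
    """Degree (72) for tuple r_k; sim_a[i][k] stands for r_i(A) ~ r_k(A)."""
    n = len(sim_a)
    numerator = sum(t_norm(sim_a[i][k], sim_b[i][k]) for i in range(n))
    denominator = sum(sim_a[i][k] for i in range(n))
    return numerator / denominator

def ry_degree(sim_a, sim_b, t_norm=min):
    """Degree (73): arithmetic mean of gamma_k over all tuples."""
    n = len(sim_a)
    return sum(gamma(k, sim_a, sim_b, t_norm) for k in range(n)) / n

# Hypothetical similarity matrices for three tuples.
sim_a = [[F(1), F(1, 2), F(0)],
         [F(1, 2), F(1), F(0)],
         [F(0), F(0), F(1)]]
sim_b = [[F(1), F(1, 4), F(0)],
         [F(1, 4), F(1), F(1)],
         [F(0), F(1), F(1)]]

print([gamma(k, sim_a, sim_b) for k in range(3)])
print(ry_degree(sim_a, sim_b))
```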
When the similarity is replaced by identity, Equation (72) is in fact the confidence of the association rule $(A, r_k(A)) \Rightarrow (B, r_k(B))$ (see footnote 10) and Equation (73) is an average of these confidences. In that sense, it is related to [5,85] and can be viewed as a fuzzy approximate dependency. Nevertheless, we decided to include this approach here for two reasons. First of all, the authors called their generalization fuzzy functional dependency. Second, the authors claimed that their GFDs have the same semantics as seen in the previous section, namely (page 134 in [82]): “If any two objects in the database have similar values for A then they have similar values for B.”

In all previous approaches, the authors were trying to define GFDs in order to capture (more or less) the following: the more similar the values on attribute A are, the more similar the values on attribute B have to be. One can find several approaches in which the semantics of GFD is completely different and thus the techniques employed for describing such dependencies have also changed. Nevertheless, the resulting dependencies are still called fuzzy functional dependencies and therefore we briefly comment on some of them.

(i) Dubois and Prade in [48] and later Bosc, Dubois and Prade in [10] suggested using fuzzy rules for the definition of a new kind of generalized functional dependencies (also called fuzzy functional dependencies in [10]) in the possibilistic fuzzy data model. The authors used certainty rules (the more x is A, the more certain y lies in B), possibility rules (the more x is A, the more possible B is a range for y), and gradual rules (the more x is A, the more y is B) and employed them in the definition of various types of functional dependencies.

(ii) Later in [12], Bosc, Lietard and Pivert defined functional dependencies (called extended functional dependencies) using gradual rules.

(iii) The ideas introduced in [104], see Equations (55) and (56), were used in [103] for defining linguistic summaries (also called fuzzy functional dependencies).

These are the generalizations we found in the literature that have one thing in common, namely that they use a similarity relation instead of equality for comparing domain values. It seems to be very difficult to compare these approaches objectively, and it is impossible in the cases where different semantics are used. Luckily, many of the previously introduced definitions of GFD can be reformulated using fuzzy logic based on complete residuated lattices. This fact enables us to compare such approaches using our criterion, which is based on the notion of fuzzy function.

5 Fuzzy functions and similarity-based functional dependencies
The semantics of the classical FD corresponds to the notion of a mathematical function. More precisely: $\|A \Rightarrow B\|_D = 1$ iff $\{\langle r(A), r(B) \rangle \mid r \in D\}$ is a function (from $D_A$ to $D_B$). In this section we will examine how different approaches correspond to the notion of function. Since similarity and ranks are employed in the various generalizations of Codd's model, the classical definition of function is no longer adequate and we will use the notion of fuzzy function. The definition of a fuzzy function was provided by Gottwald in [54], where the author introduced a notion of fuzzy uniqueness of a fuzzy mapping $F$ (see footnote 11) using a formula of first-order fuzzy logic. Every fuzzy mapping $F$ has a degree to which it satisfies the uniqueness property $U$:

$$U(F) = \bigwedge_{x, y, u, v} \Big( \big( (F(x, u) \otimes F(y, v)) \otimes x \equiv y \big) \to u \equiv v \Big), \tag{74}$$

where $\to$ is the Łukasiewicz implication and $\otimes$ stands for the Łukasiewicz t-norm or minimum, i.e. by this definition four different notions of fuzzy uniqueness were given. The set of truth degrees was considered to be $[0,1]$. A mapping is called a fuzzy function if it is unique to degree 1. Later, the definition (74) was used by Demirci in [47] in terms of L-relations and L-equalities, and a complete residuated lattice (called an integral commutative residuated l-monoid) was used as a structure of truth degrees.

Footnote 10: This expression differs from the classical association rule because it refers to one specific tuple $r_k$. Equation (72) gives us (under the assumption that similarity is replaced by identity) a number which indicates how many tuples that agree with the tuple $r_k$ on attributes $A$ also have the same values as tuple $r_k$ on attributes $B$.

Footnote 11: For the definition of fuzzy mapping as well as for the definition of $\equiv$ see the original paper [54].
Definition 1 (Fuzzy function). Let $\mathbf{L}$ be a residuated lattice, $A$ and $B$ be crisp sets and $\approx_A$ and $\approx_B$ be L-equalities. An L-relation $\rho : A \times B \to L$ ($L$ is the support set of $\mathbf{L}$) is said to be a fuzzy function if for all $a_1, a_2 \in A$ and $b_1, b_2 \in B$ we have

$$\rho(a_1, b_1) \otimes \rho(a_2, b_2) \otimes (a_1 \approx_A a_2) \le (b_1 \approx_B b_2). \tag{75}$$
Remark 11. i) Note that (75) is only a reformulation of Equation (74) since in all residuated lattices we have $a \to b = 1$ iff $a \le b$. ii) Tyagi's definition of GFD [96] was inspired by the condition (75). iii) In [19] the condition (75) corresponds to the notion of compatibility. A relation $\rho$ is called compatible with respect to $\approx_A$ and $\approx_B$ if it satisfies (75), where $\approx_A$ and $\approx_B$ are L-equivalences. iv) Another notion of fuzzy function with respect to a similarity relation was defined in [57]. The term fuzzy function was understood as a syntactic notion, and the term fuzzy mapping was used as the corresponding semantic one. v) A partial fuzzy function [47, 62] is used in [19] in the definition of a degree to which a given relation is a fuzzy function.

We will use the idea from [54] (and later from [19]) to define a degree to which a ranked data table corresponds to the notion of fuzzy function given by (75).

Definition 2. Let $\mathbf{L}$ be a complete residuated lattice and $D : \prod_{y \in R} D_y \to L$ be a ranked data table. Let $\approx_i$ be L-similarities on the corresponding domains. Let $A, B \subseteq R$ and let $\approx_A$, $\approx_B$ be given by Equation (19). The degree to which $D$ is a fuzzy function with respect to the sets of attributes $A$ and $B$ is defined as:

$$\mathrm{Fun}(D, A, B) = \bigwedge_{r_1, r_2} \big( (D(r_1) \otimes D(r_2) \otimes (r_1(A) \approx_A r_2(A))) \to (r_1(B) \approx_B r_2(B)) \big). \tag{76}$$
The verbal description of the previous definition is as follows: the degree to which a relation $D$ corresponds to a fuzzy function from $A$ to $B$ is the degree to which it is true that for all pairs of tuples from $D$: if they belong to $D$ and have similar values on attributes from $A$, then they have similar values on attributes from $B$.

Remark 12. (i) In the above definition we only assume that $\approx_A$ and $\approx_B$ are similarities, although in the original works of Demirci and Belohlavek L-equalities and L-equivalences were used, respectively. (ii) Note that tuples with zero ranks do not influence the resulting degree of (76).

It makes no difference if we use the projection of a ranked data table to $A \cup B$ in Definition 2, as the following lemma shows:

Lemma 2. Let $\mathbf{L}$ be a complete residuated lattice. Given a ranked data table $D : \prod_{y \in R} D_y \to L$ and a set $A \subseteq R$, if the projection of $D$ to $A$ is defined as $D_A(r) = \sup\{D(r') \mid r' \in \mathrm{Tupl}(R) \text{ with } r'(A) = r\}$, then $\mathrm{Fun}(D, A, B) = \mathrm{Fun}(D_{A \cup B}, A, B)$.

Proof. Consequence of the facts that in every complete residuated lattice $x \otimes \bigvee_{i \in I} y_i = \bigvee_{i \in I} (x \otimes y_i)$ and $\big(\bigvee_{i \in I} x_i\big) \to y = \bigwedge_{i \in I} (x_i \to y)$.

We decided not to use the projection in Definition 2 since the definition of projection differs among approaches, although supremum or maximum is usually used. We will now illustrate the above definition with a simple example.

Example 2. Let $\mathbf{L}$ be any complete residuated lattice. Let $R = \{y_1, y_2, y_3\}$ with $D_{y_1} = \{a_1, a_2\}$, $D_{y_2} = D_{y_3} = \{b_1, b_2\}$. Let us assume that the L-similarity relations $\approx_1$, $\approx_2 = \approx_3$ and the ranked data table $D$ are given as follows:

≈1   a1   a2        ≈2   b1   b2        D    y1   y2   y3
a1   1    μ1        b1   1    μ2        λ1   a1   b1   b1
a2   μ1   1         b2   μ2   1         λ2   a2   b2   b1
                                        λ3   a1   b1   b2
The degree to which relation $D$ is a fuzzy function w.r.t. $\{y_1\}$ and $\{y_2\}$ is computed as follows:

$$\begin{aligned} \mathrm{Fun}(D, \{y_1\}, \{y_2\}) &= ((\lambda_1 \otimes \lambda_2 \otimes \mu_1) \to \mu_2) \wedge ((\lambda_1 \otimes \lambda_3 \otimes 1) \to 1) \wedge ((\lambda_2 \otimes \lambda_3 \otimes \mu_1) \to \mu_2) \\ &= ((\lambda_1 \otimes \lambda_2 \otimes \mu_1) \to \mu_2) \wedge 1 \wedge ((\lambda_2 \otimes \lambda_3 \otimes \mu_1) \to \mu_2) \\ &= ((\lambda_1 \otimes \lambda_2 \otimes \mu_1) \to \mu_2) \wedge ((\lambda_2 \otimes \lambda_3 \otimes \mu_1) \to \mu_2) \\ &= ((\lambda_1 \otimes \lambda_2 \otimes \mu_1) \vee (\lambda_2 \otimes \lambda_3 \otimes \mu_1)) \to \mu_2 \\ &= ((\lambda_1 \vee \lambda_3) \otimes \lambda_2 \otimes \mu_1) \to \mu_2 \\ &= \mathrm{Fun}(D_{\{y_1, y_2\}}, \{y_1\}, \{y_2\}). \end{aligned}$$

The last equality holds iff the projection is defined using supremum.

Definition 2 provides a measure for one particular property: the degree to which a relation (data table $D$) captures the notion of fuzzy function from $A$ to $B$. Moreover, every definition of a GFD $A \Rightarrow B$ introduced in Section 4 also provides some kind of measure. The degree to which a relation $D$ satisfies the measure given by the GFD $A \Rightarrow B$ is simply $\|A \Rightarrow B\|_D$, i.e. the degree to which the GFD $A \Rightarrow B$ is true in $D$. We are interested in the relationships between the Fun measure introduced in Definition 2 and the measures provided by each definition of GFD described in the previous section: does the definition of GFD correspond to the notion of a fuzzy function? The following criterion gives us the degree to which: “For all relations D: If the GFD A ⇒ B is satisfied in relation D, then D corresponds to a fuzzy function from A to B.”

$$S(A \Rightarrow B, \mathrm{Fun}) = \bigwedge_{D : \prod_{y \in R} D_y \to L} \big( \|A \Rightarrow B\|_D \to \mathrm{Fun}(D, A, B) \big). \tag{77}$$
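Example 2 can be checked numerically for one concrete choice of the lattice and of the parameters. The sketch below assumes the standard Łukasiewicz algebra and arbitrary sample values for $\lambda_1, \lambda_2, \lambda_3, \mu_1, \mu_2$; it evaluates (76) on the table restricted to $y_1, y_2$, on its projection (ranks merged by maximum), and via the closed form obtained at the end of the example.

```python
from fractions import Fraction as F
from itertools import product

def t_norm(a, b):
    """Łukasiewicz t-norm (one admissible choice of the lattice)."""
    return max(F(0), a + b - 1)

def residuum(a, b):
    """Łukasiewicz residuum."""
    return min(F(1), 1 - a + b)

def fun_degree(table, sim_a, sim_b):
    """Degree (76); table rows are (rank, value on A, value on B)."""
    return min(
        residuum(t_norm(t_norm(d1, d2), sim_a(a1, a2)), sim_b(b1, b2))
        for (d1, a1, b1), (d2, a2, b2) in product(table, repeat=2)
    )

# Sample parameter values (assumptions made for illustration only).
lam1, lam2, lam3 = F(1), F(9, 10), F(1, 2)
mu1, mu2 = F(4, 5), F(3, 10)
sim_1 = lambda u, v: F(1) if u == v else mu1
sim_2 = lambda u, v: F(1) if u == v else mu2

# Columns y1 and y2 of the table D from Example 2.
table = [(lam1, "a1", "b1"), (lam2, "a2", "b2"), (lam3, "a1", "b1")]
# Projection to {y1, y2}: equal rows are merged, ranks combined by supremum.
projected = [(max(lam1, lam3), "a1", "b1"), (lam2, "a2", "b2")]
closed_form = residuum(t_norm(t_norm(max(lam1, lam3), lam2), mu1), mu2)

print(fun_degree(table, sim_1, sim_2),
      fun_degree(projected, sim_1, sim_2),
      closed_form)
```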
Similarly, the next criterion gives us the degree to which: “For all relations D: If D corresponds to a fuzzy function from A to B, then the GFD A ⇒ B is satisfied by D.”

$$S(\mathrm{Fun}, A \Rightarrow B) = \bigwedge_{D : \prod_{y \in R} D_y \to L} \big( \mathrm{Fun}(D, A, B) \to \|A \Rightarrow B\|_D \big). \tag{78}$$
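Finally, combining (77) and (78) we obtain the degree to which a particular definition of GFD corresponds to the fuzzy function:

$$E(\mathrm{Fun}, A \Rightarrow B) = \bigwedge_{D : \prod_{y \in R} D_y \to L} \big( \mathrm{Fun}(D, A, B) \leftrightarrow \|A \Rightarrow B\|_D \big) = S(A \Rightarrow B, \mathrm{Fun}) \wedge S(\mathrm{Fun}, A \Rightarrow B). \tag{79}$$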
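The criteria (77) to (79) quantify over all ranked data tables, so they cannot be computed by enumeration in general. The following sketch only evaluates them over a finite, hand-picked family of tables, each represented by an assumed pair (degree of the GFD, degree of Fun); the results are therefore upper bounds on the true infima. The Łukasiewicz residuum and the sample pairs are illustrative assumptions.

```python
from fractions import Fraction as F

def luk_impl(a, b):
    """Łukasiewicz residuum (assumed structure of truth degrees)."""
    return min(F(1), 1 - a + b)

def criteria(degrees, residuum):
    """degrees: list of (gfd_degree, fun_degree), one pair per sampled table.
    Returns (S(A=>B, Fun), S(Fun, A=>B), E(Fun, A=>B)) over the sample."""
    s_gfd_fun = min(residuum(g, f) for g, f in degrees)
    s_fun_gfd = min(residuum(f, g) for g, f in degrees)
    return s_gfd_fun, s_fun_gfd, min(s_gfd_fun, s_fun_gfd)

# Hypothetical sample of three tables.
sample = [(F(1), F(4, 5)), (F(0), F(1, 2)), (F(1), F(1))]
print(criteria(sample, luk_impl))
```

Adding further tables to the sample can only lower the three values, which is why a single well-chosen family of counterexamples suffices to show that a criterion equals 0.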
The infimum ranges over all ranked data tables on $\mathrm{Tupl}(R)$, where ranks are taken from a complete residuated lattice with support $L$. Note that nonranked data tables are special cases of ranked ones with ranks coming from $\{0, 1\}$. Moreover, we would like to remark that $\to$ is the residuated implication used in the definition of the GFD. As a consequence, the precise form of the criteria may vary and is determined by the complete residuated lattice used in the definition of the GFD.

The criterion is not artificial. Note that the semantics of GFD is usually given as follows: similar values on attributes from $A$ correspond to similar values on attributes from $B$. The semantics of (76) is almost the same. The difference is that ranks have an impact on the resulting degree. The definition of fuzzy function given by Equation (75) is widely accepted among researchers and thus the idea that lower ranks should have lower influence on the validity seems to be appropriate, and we think it is very natural. In contrast, many GFD definitions do not depend on the ranks at all.

In the rest of this section, we will use $E(\mathrm{Fun}, A \Rightarrow B)$ as a criterion to measure the degree to which a given GFD definition preserves the notion of fuzzy function. We will compute the criterion (79) using (77) and (78). We have selected significant approaches presented in Section 4 for comparison. Approaches that are similar to the selected ones are not mentioned explicitly. For example, the result obtained for Raju and Majumdar's definition is applicable to all their followers. The following lemma simplifies the computation of (77) and (78) for the cases when the GFD is either true or false. We will write $D \models A \Rightarrow B$ and $D \not\models A \Rightarrow B$ to denote $\|A \Rightarrow B\|_D = 1$ and $\|A \Rightarrow B\|_D = 0$, respectively. The set of all relations that satisfy a given GFD $A \Rightarrow B$ will be denoted as $\mathrm{Mod}(\{A \Rightarrow B\})$ or simply $\mathrm{Mod}(A, B)$.
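Lemma 3. Let $\mathbf{L}$ be a complete residuated lattice, and let $A, B \subseteq R$ be sets of attributes. If the validity of a GFD $A \Rightarrow B$ is bivalent, then

$$S(A \Rightarrow B, \mathrm{Fun}) = \bigwedge_{D \in \mathrm{Mod}(A,B)} \mathrm{Fun}(D, A, B), \tag{80}$$

$$S(\mathrm{Fun}, A \Rightarrow B) = \bigwedge_{D \notin \mathrm{Mod}(A,B)} \big( \mathrm{Fun}(D, A, B) \to 0 \big). \tag{81}$$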
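Proof. Using (8), (7) and the fact that $1 \wedge a = a$:

$$\begin{aligned} S(A \Rightarrow B, \mathrm{Fun}) &= \bigwedge_{D : \prod_{y \in R} D_y \to L} \big( \|A \Rightarrow B\|_D \to \mathrm{Fun}(D, A, B) \big) \\ &= \bigwedge_{D \in \mathrm{Mod}(A,B)} \big( 1 \to \mathrm{Fun}(D, A, B) \big) \wedge \bigwedge_{D \notin \mathrm{Mod}(A,B)} \big( 0 \to \mathrm{Fun}(D, A, B) \big) \\ &= \bigwedge_{D \in \mathrm{Mod}(A,B)} \big( 1 \to \mathrm{Fun}(D, A, B) \big) = \bigwedge_{D \in \mathrm{Mod}(A,B)} \mathrm{Fun}(D, A, B). \end{aligned}$$

Equation (81) is a consequence of (6) and (13):

$$\begin{aligned} S(\mathrm{Fun}, A \Rightarrow B) &= \bigwedge_{D : \prod_{y \in R} D_y \to L} \big( \mathrm{Fun}(D, A, B) \to \|A \Rightarrow B\|_D \big) \\ &= \bigwedge_{D \in \mathrm{Mod}(A,B)} \big( \mathrm{Fun}(D, A, B) \to 1 \big) \wedge \bigwedge_{D \notin \mathrm{Mod}(A,B)} \big( \mathrm{Fun}(D, A, B) \to 0 \big) \\ &= \bigwedge_{D \notin \mathrm{Mod}(A,B)} \big( \mathrm{Fun}(D, A, B) \to 0 \big). \end{aligned}$$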
We will now apply the criterion given by Equation (79) to all significant approaches.

Theorem 1 (Buckles-Petry's case). Let $\mathbf{L} = [0,1]_\Pi$, let $A, B \subseteq R$ be sets of attributes and let the GFD be defined as in (25). Assuming $\beta \in [0,1]$ is the parameter from (25), then $S(A \Rightarrow B, \mathrm{Fun}) = \beta$ and $S(\mathrm{Fun}, A \Rightarrow B) = \beta \to 0$.

Proof. First, note that the GFD is defined for nonranked data tables and therefore

$$\mathrm{Fun}(D, A, B) = \bigwedge_{r_1, r_2 \in D} (r_1(A) \approx r_2(A) \to r_1(B) \approx r_2(B)). \tag{82}$$

Moreover, since $\|A \Rightarrow B\|_D \in \{0, 1\}$ we can use Lemma 3. To prove the first part, observe that if $D \in \mathrm{Mod}(A, B)$, i.e. $\|A \Rightarrow B\|_D = 1$, then:

$$\begin{aligned} \beta \otimes (r_1(A) \approx r_2(A)) &\le r_1(B) \approx r_2(B) \quad \forall r_1, r_2 \in D, \\ \beta &\le (r_1(A) \approx r_2(A) \to r_1(B) \approx r_2(B)) \quad \forall r_1, r_2 \in D, \\ \beta &\le \bigwedge_{r_1, r_2 \in D} (r_1(A) \approx r_2(A) \to r_1(B) \approx r_2(B)). \end{aligned}$$

The last inequality can be written as $\beta \le \mathrm{Fun}(D, A, B)$ for any $D \in \mathrm{Mod}(A, B)$. As a consequence $\beta \le \bigwedge_{\mathrm{Mod}(A,B)} \mathrm{Fun}(D, A, B)$. Finally, for any $\beta$ there exists $D \in \mathrm{Mod}(A, B)$ such that $\mathrm{Fun}(D, A, B) = \beta$ (take $D$ with only two tuples $r_1, r_2 \in D$ such that $r_1(A) \approx r_2(A) = 1$ and $r_1(B) \approx r_2(B) = \beta$). Therefore $\beta = \bigwedge_{\mathrm{Mod}(A,B)} \mathrm{Fun}(D, A, B) = S(A \Rightarrow B, \mathrm{Fun})$. Using our previous observation and Lemma 3 we have:

$$S(\mathrm{Fun}, A \Rightarrow B) = \bigwedge_{D \notin \mathrm{Mod}(A,B)} \big( \mathrm{Fun}(D, A, B) \to 0 \big) = \beta \to 0.$$

Note that the proof remains valid for any complete residuated lattice.
Theorem 2 (Prade and Testemale's case). Let $\mathbf{L}$ be any complete residuated lattice with universe $L = [0,1]$. Let $A, B \subseteq R$. Let the GFD be defined by (32) and let $\lambda \in [0,1]$ be the parameter from (32). Then $S(A \Rightarrow B, \mathrm{Fun}) = 0$ and $S(\mathrm{Fun}, A \Rightarrow B) = \lambda \to 0$.

Proof. Since $\|A \Rightarrow B\|_D \in \{0, 1\}$, by Lemma 3 it is sufficient to show that $\bigwedge_{\mathrm{Mod}(A,B)} \mathrm{Fun}(D, A, B) = 0$. Let us fix four different elements $a_1, a_2, b_1, b_2$ and consider $M \subseteq \mathrm{Mod}(A, B)$ being the set of models described as follows:

≈A   a1   a2        ≈B   b1   b2        D     A    B
a1   1    α         b1   1    0         1.0   a1   b1
a2   α    1         b2   0    1         1.0   a2   b2

where $\alpha \in [0, 1)$ is an arbitrary parameter. It means that the relations from $M$ differ from each other by the parameter $\alpha$ (by the similarity relation on domain $D_A$). Then

$$\bigwedge_{\mathrm{Mod}(A,B)} \mathrm{Fun}(D, A, B) \le \bigwedge_{M} \mathrm{Fun}(D, A, B) = \bigwedge_{\alpha \in [0,1)} (\alpha \to 0) = \Big( \bigvee_{\alpha \in [0,1)} \alpha \Big) \to 0 = 1 \to 0 = 0.$$

The second equality follows from the fact that GFDs are defined for nonranked data tables and therefore $\mathrm{Fun}(D, A, B)$ is given by (82). Also note that if $D \notin \mathrm{Mod}(A, B)$, then there exist tuples $r_1, r_2$ such that $r_1(A) = r_2(A)$ and $r_1(B) \approx r_2(B) < \lambda$. Therefore,

$$S(\mathrm{Fun}, A \Rightarrow B) = \bigwedge_{D \notin \mathrm{Mod}(A,B)} \big( \mathrm{Fun}(D, A, B) \to 0 \big) = \Big( \bigvee_{D \notin \mathrm{Mod}(A,B)} \bigwedge_{r_1, r_2 \in D} (r_1(A) \approx_D r_2(A) \to r_1(B) \approx_D r_2(B)) \Big) \to 0 = \lambda \to 0.$$
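The family $M$ used in the proof of Theorem 2 can be explored numerically. For each table of the family, (82) reduces to $\alpha \to 0$, and the sketch below evaluates this value for $\alpha$ approaching 1 under two residua. The sampled values of $\alpha$ and the function names are illustrative assumptions; the theorem itself is stated for any complete residuated lattice on $[0,1]$.

```python
from fractions import Fraction as F

def luk(a, b):
    """Łukasiewicz residuum."""
    return min(F(1), 1 - a + b)

def godel(a, b):
    """Gödel residuum."""
    return F(1) if a <= b else b

def fun_for_alpha(alpha, residuum):
    """Fun(D, A, B) via (82) for the two-tuple table of the proof:
    the mixed pairs contribute alpha -> 0, the identical pairs 1 -> 1."""
    return min(residuum(alpha, F(0)), residuum(F(1), F(1)))

# alpha = 0.9, 0.99, ..., 0.99999 (all strictly below 1, as required).
alphas = [1 - F(1, 10 ** k) for k in range(1, 6)]
luk_values = [fun_for_alpha(a, luk) for a in alphas]
godel_values = [fun_for_alpha(a, godel) for a in alphas]
print(luk_values)
print(godel_values)
```

With the Łukasiewicz residuum the degrees only tend to 0 as $\alpha$ tends to 1, so the infimum is not attained by any single table of the family, while with the Gödel residuum every table with $\alpha > 0$ already yields 0.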
Theorem 3 (Raju and Majumdar). Let L be any complete residuated lattice with universe L = [0, 1]. Assume R is a relational scheme and A, B ⊆ R. For the GFD given by Equation (36), S(A ⇒ B, Fun) = 1 and S(Fun, A ⇒ B) = 0.

Proof. First of all, note that the authors' extension involves ranked data tables. As we shall see in this proof, the theorem holds for any residuated lattice built over the unit interval. Observe that if ||A ⇒ B||_D = 1, then r1(A) ≈ r2(A) ≤ r1(B) ≈ r2(B) for all r1, r2 ∈ Tupl(R) with D(r1) > 0 and D(r2) > 0. Together with the fact that D(r1) ⊗ D(r2) ⊗ r1(A) ≈ r2(A) ≤ r1(A) ≈ r2(A) for all r1, r2 and any t-norm, we obtain Fun(D, A, B) = 1 for any D ∈ Mod(A, B). The proof of the first equality is completed by applying Lemma 3. To prove the second equality, it is sufficient to find a ranked data table D such that D ∉ Mod(A, B) and Fun(D, A, B) = 1. Such an RDT is easy to find: consider for example D with only two tuples r1, r2 such that D(r1) = D(r2) = 0.2, r1(A) ≈ r2(A) = 1 and r1(B) ≈ r2(B) = 0.9.

Theorem 4 (Chen et al. case). Let L = [0, 1]_G, A, B ⊆ R be sets of attributes, θ ∈ [0, 1] and let the GFD be defined as in (41). Then S(A ⇒ B, Fun) = 1 and S(Fun, A ⇒ B) = 0.

Proof. Note that if ||A ⇒ B||_D = θ, then Fun(D, A, B) is at least θ. As a consequence, ||A ⇒ B||_D → Fun(D, A, B) = 1 for any D. To prove S(Fun, A ⇒ B) = 0, let us consider the set M = {D over R = {A, B} | D ⊭ A ⇒ B, |D| = 2 and r1(A) = r2(A), r1, r2 ∈ D}. Note that for each D ∈ M we have r1(A) = r2(A) and r1(B) ≠ r2(B). Now observe
\[
\begin{aligned}
S(\mathrm{Fun}, A \Rightarrow B)
&= \bigwedge_{D\colon \prod_{y \in R} D_y \to \{0,1\}} \bigl(\mathrm{Fun}(D,A,B) \to \|A \Rightarrow B\|_D\bigr)
\le \bigwedge_{D \in M} \bigl(\mathrm{Fun}(D,A,B) \to 0\bigr) \\
&= \bigwedge_{D \in M} \Bigl(\bigl(r_1(A) \approx_A r_2(A) \to r_1(B) \approx_B r_2(B)\bigr) \to 0\Bigr)
= \bigwedge_{D \in M} \Bigl(\bigl(1 \to r_1(B) \approx_B r_2(B)\bigr) \to 0\Bigr) \\
&= \bigwedge_{D \in M} \bigl(r_1(B) \approx_B r_2(B) \to 0\bigr)
= 1 \to 0 = 0.
\end{aligned}
\]
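The counterexample used in the proof of Theorem 3 can be checked numerically. The following Python sketch is only an illustration under stated assumptions: it takes Fun(D, A, B) to be the infimum over tuple pairs of (D(r1) ⊗ D(r2) ⊗ r1(A) ≈ r2(A)) → r1(B) ≈ r2(B), which is the form appearing in the proofs of this section, and it reads the bivalent GFD of Equation (36) as "r1(A) ≈ r2(A) ≤ r1(B) ≈ r2(B) for all tuples with positive rank". The names fun_degree and crisp_gfd, and the choice of the three t-norms, are hypothetical and do not come from the surveyed papers.

```python
# Three standard t-norms on [0, 1] together with their residua.
def godel_tnorm(a, b):
    return min(a, b)

def godel_res(a, b):
    return 1.0 if a <= b else b

def luk_tnorm(a, b):
    return max(0.0, a + b - 1.0)

def luk_res(a, b):
    return min(1.0, 1.0 - a + b)

def prod_tnorm(a, b):
    return a * b

def prod_res(a, b):
    return 1.0 if a <= b else b / a

STRUCTURES = {
    "Goedel": (godel_tnorm, godel_res),
    "Lukasiewicz": (luk_tnorm, luk_res),
    "product": (prod_tnorm, prod_res),
}

def fun_degree(ranks, sim_a, sim_b, tnorm, res):
    """Assumed form of Fun(D, A, B): infimum over all tuple pairs of
    (D(r1) (x) D(r2) (x) sim_A(r1, r2)) -> sim_B(r1, r2)."""
    n = len(ranks)
    return min(
        res(tnorm(tnorm(ranks[i], ranks[j]), sim_a[i][j]), sim_b[i][j])
        for i in range(n)
        for j in range(n)
    )

def crisp_gfd(ranks, sim_a, sim_b):
    """Assumed bivalent reading of the GFD: 1 iff sim_A <= sim_B
    for every pair of tuples with positive rank, else 0."""
    n = len(ranks)
    ok = all(
        sim_a[i][j] <= sim_b[i][j]
        for i in range(n)
        for j in range(n)
        if ranks[i] > 0 and ranks[j] > 0
    )
    return 1 if ok else 0

# The table from the proof: two tuples with rank 0.2,
# similarity 1 on A and 0.9 on B.
ranks = [0.2, 0.2]
sim_a = [[1.0, 1.0], [1.0, 1.0]]
sim_b = [[1.0, 0.9], [0.9, 1.0]]

results = {
    name: fun_degree(ranks, sim_a, sim_b, tnorm, res)
    for name, (tnorm, res) in STRUCTURES.items()
}
gfd = crisp_gfd(ranks, sim_a, sim_b)
print(results, gfd)
```

Since D(r1) ⊗ D(r2) ⊗ 1 ≤ 0.2 ≤ 0.9 for every t-norm, the residuum evaluates to 1 in each of the three structures, while the crisp condition fails because 1 > 0.9.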
Theorem 5 (Bhuniya and Niyogi case). Let L = [0, 1]_Ł, A, B ⊆ R. For the GFD given in (48) we have S(A ⇒ B, Fun) = β and S(Fun, A ⇒ B) = 0.

Proof. The first result follows from the same arguments as in the proof of Theorem 1. The second result is a consequence of the fact that ranks are not involved in the definition of the GFD; see the proof of Theorem 3.

Theorem 6 (Cubero et al. case). Let L be any complete residuated lattice with universe L = [0, 1]. For the GFD given by Equation (51) and for fixed thresholds c_y, y ∈ R:
\[
S(A \Rightarrow B, \mathrm{Fun}) = \Bigl(\bigvee_{y \in A} c_y \to 0\Bigr) \wedge \bigwedge_{y \in B} c_y, \tag{83}
\]
\[
S(\mathrm{Fun}, A \Rightarrow B) = \Bigl(\bigwedge_{y \in A} c_y \to \bigvee_{y \in B} c_y\Bigr) \to 0. \tag{84}
\]
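Proof. Since the degree to which the GFD is true remains bivalent, we can again apply Lemma 3 and compute only ⋀_{D ∈ Mod(A,B)} Fun(D, A, B). First of all, notice that D ∈ Mod(A, B) if for every pair of tuples either r1(A) ≈ r2(A) < α or r1(B) ≈ r2(B) ≥ β. Since α = (c_y)_{y∈A} and β = (c_y)_{y∈B} are vectors, r1(A) ≈ r2(A) < α means that there exists y ∈ A such that r1(y) ≈_y r2(y) < c_y, and r1(B) ≈ r2(B) ≥ β means that for all y ∈ B: r1(y) ≈_y r2(y) ≥ c_y. We will now look at these two cases separately.
\[
\begin{aligned}
\bigwedge_{D \in \mathrm{Mod}(A,B)} \; \bigwedge_{\substack{r_1,r_2 \in D \\ r_1(A) \approx r_2(A) < \alpha}} \bigl(r_1(A) \approx r_2(A) \to r_1(B) \approx r_2(B)\bigr)
&= \bigwedge_{D \in \mathrm{Mod}(A,B)} \; \bigwedge_{\substack{r_1,r_2 \in D \\ r_1(A) \approx r_2(A) < \alpha}} \bigl(r_1(A) \approx r_2(A) \to 0\bigr) \\
&= \Bigl(\bigvee_{D \in \mathrm{Mod}(A,B)} \; \bigvee_{\substack{r_1,r_2 \in D \\ r_1(A) \approx r_2(A) < \alpha}} r_1(A) \approx r_2(A)\Bigr) \to 0
= \bigvee_{y \in A} c_y \to 0.
\end{aligned}
\]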
The last equality follows from the fact that r1(A) ≈ r2(A) = ⋀_{y∈A} r1(y) ≈_y r2(y). Now observe that:
\[
\begin{aligned}
\bigwedge_{D \in \mathrm{Mod}(A,B)} \; \bigwedge_{\substack{r_1,r_2 \in D \\ r_1(B) \approx r_2(B) \ge \beta}} \bigl(r_1(A) \approx r_2(A) \to r_1(B) \approx r_2(B)\bigr)
&= \bigwedge_{D \in \mathrm{Mod}(A,B)} \; \bigwedge_{\substack{r_1,r_2 \in D \\ r_1(B) \approx r_2(B) \ge \beta}} \bigl(1 \to r_1(B) \approx r_2(B)\bigr) \\
&= 1 \to \bigwedge_{D \in \mathrm{Mod}(A,B)} \; \bigwedge_{\substack{r_1,r_2 \in D \\ r_1(B) \approx r_2(B) \ge \beta}} r_1(B) \approx r_2(B)
= 1 \to \bigwedge_{y \in B} c_y = \bigwedge_{y \in B} c_y,
\end{aligned}
\]
finishing the proof of (83). Equation (84) follows from Lemma 3, antitony of the residuum in the first argument (i.e. if x ≤ y, then y → z ≤ x → z) and monotony in the second.

Theorem 7 (Tyagi et al. case). Let L = [0, 1]_G, A, B ⊆ R be sets of attributes and let the GFD be defined as in (61). Then S(A ⇒ B, Fun) = 1 and S(Fun, A ⇒ B) = 0.

Proof. The proof of S(A ⇒ B, Fun) = 1 is straightforward because of Lemma 3 and due to the fact that the GFD given by (61) is a direct translation of the fuzzy function definition given by Demirci. Also note that a ⊗ b = a ∧ b for all a, b ∈ [0, 1]_G. The second equality is perhaps surprising, but it is a consequence of the bivalent notion of GFD.

Theorem 8 (Kiss case). Let L = [0, 1]_Ł, A, B ⊆ R be sets of attributes, and let the GFD be defined as in (65). Then S(A ⇒ B, Fun) = 1 and S(Fun, A ⇒ B) = 0.5.
Proof. The first result follows from the fact that in any residuated lattice a ⊗ b ≤ a ∧ b and because of the antitony of → in the first argument. To prove the second equality we will use (14) and x → y ≤ (y → z) → (x → z), which holds in all residuated lattices. Therefore
\[
\begin{aligned}
S(\mathrm{Fun}, A \Rightarrow B)
&= \bigwedge_{D\colon \prod_{y \in R} D_y \to L} \bigl(\mathrm{Fun}(D,A,B) \to \|A \Rightarrow B\|_D\bigr) \\
&\ge \bigwedge_{D\colon \prod_{y \in R} D_y \to L} \; \bigwedge_{r_1,r_2 \in D} \Bigl(\bigl(D(r_1) \wedge D(r_2) \wedge r_1(A) = r_2(A)\bigr) \to \bigl(D(r_1) \otimes D(r_2) \otimes r_1(A) = r_2(A)\bigr)\Bigr) \\
&= \bigwedge_{D\colon \prod_{y \in R} D_y \to L} \; \bigwedge_{r_1,r_2 \in D} \Bigl(\bigl(D(r_1) \wedge D(r_2)\bigr) \to \bigl(D(r_1) \otimes D(r_2)\bigr)\Bigr).
\end{aligned}
\]
Since → and ⊗ are the Łukasiewicz operations, we get 0.5 ≤ S(Fun, A ⇒ B). It is easy to find a relation D such that Fun(D, A, B) → ||A ⇒ B||_D = 0.5.

Theorem 9 (Ben Yahia et al.). Let R be a relational scheme and A, B ⊆ R. For the GFD given by (55) and (59): S(A ⇒ B, Fun) = 1 and S(Fun, A ⇒ B) = 0.

Proof. The first result is a consequence of the antitony of → in the first argument; more precisely, for any D and any r1, r2 ∈ D we have: r1(A) ≈ r2(A) → r1(B) ≈ r2(B) ≤ (D(r1) ⊗ D(r2) ⊗ r1(A) ≈ r2(A)) → r1(B) ≈ r2(B). As a consequence, we obtain ||A ⇒ B||_D ≤ Fun(D, A, B). The second result is again a consequence of the fact that the rank is not involved in the definition of the GFD, and thus it is easy to find a relation D such that ||A ⇒ B||_D = 0 and Fun(D, A, B) = 1.

Theorem 10 (Bosc et al.). Let R be a relational scheme and A, B ⊆ R. For the GFD given by (59): S(A ⇒ B, Fun) = 1 and S(Fun, A ⇒ B) = 1.

Proof. Since the authors proposed the definition of GFD for nonranked data tables, we have for all D: Fun(D, A, B) = ||A ⇒ B||_D.

Theorem 11 (Belohlavek et al. case). Let R be a relational scheme and A, B ⊆ R. For the GFD defined by Equation (67) we have: S(A ⇒ B, Fun) = 1 and S(Fun, A ⇒ B) = 1.

Proof. According to (68), ranks are involved in the definition of similarity itself and not in the definition of the GFD; therefore it seems somewhat inadequate to apply the criterion given by (79). Fortunately, the authors have proved in [26] that for each ranked data table D there exists a nonranked D′ (ranks come from {0, 1}) such that ||A ⇒ B||_D = ||A ⇒ B||_{D′}. Therefore, for the hedge being the identity we obtain: Fun(D′, A, B) = ||A ⇒ B||_{D′} = ||A ⇒ B||_D.

Theorem 12 (Cordero et al. case). Let R be a relational scheme and A, B ⊆ R. If the GFD is defined by Equation (70), then S(A ⇒ B, Fun) = 1 and S(Fun, A ⇒ B) = 1.

Proof. Consequence of Lemma 1 and Theorem 11.
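The bound 0.5 in the proof of Theorem 8 comes from the term (D(r1) ∧ D(r2)) → (D(r1) ⊗ D(r2)) evaluated with the Łukasiewicz operations. As a hedged sanity check (not a proof, and not part of the surveyed work), the following Python sketch evaluates this term on a finite grid of ranks with exact rational arithmetic. The grid resolution and the helper name kiss_term are arbitrary choices made for this illustration.

```python
from fractions import Fraction
from itertools import product

def luk_tnorm(a, b):
    # Lukasiewicz t-norm: max(0, a + b - 1)
    return max(Fraction(0), a + b - 1)

def luk_residuum(a, b):
    # Lukasiewicz residuum: min(1, 1 - a + b)
    return min(Fraction(1), 1 - a + b)

def kiss_term(d1, d2):
    # (D(r1) ∧ D(r2)) → (D(r1) ⊗ D(r2)) for ranks d1, d2
    return luk_residuum(min(d1, d2), luk_tnorm(d1, d2))

# Ranks 0, 0.01, ..., 1 as exact fractions (hypothetical grid).
grid = [Fraction(i, 100) for i in range(101)]

worst = min(kiss_term(d1, d2) for d1, d2 in product(grid, grid))
witnesses = [
    (d1, d2) for d1, d2 in product(grid, grid) if kiss_term(d1, d2) == worst
]
print(worst, witnesses)
```

On this grid the smallest value is 1/2, attained only when both ranks equal 1/2, which agrees with the lower bound stated in the proof. For ranks a ≤ b the term equals b when a + b ≥ 1 and 1 − a otherwise, so it never drops below 0.5.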
We have seen that although the definition of fuzzy function is natural and widely accepted, many approaches to GFD failed to satisfy the criterion given by Equation (79). One reason is that the definition of GFD usually remains crisp. Another reason is the fact that although many of the GFDs are defined for some rank-aware model, the ranks do not (usually) influence the validity of GFD.
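The first reason can be made concrete on a single pair of tuples with full rank. The sketch below is an illustration under assumptions of our own: it reads the "R-G imp." of Table 4 as the Rescher–Gaines implication (value 1 if x ≤ y, and 0 otherwise), uses the Łukasiewicz residuum for the graded degree, and fixes r1(A) ≈ r2(A) = 0.9. The variable names are hypothetical.

```python
from fractions import Fraction as F

def luk_residuum(a, b):
    # Lukasiewicz residuum: min(1, 1 - a + b)
    return min(F(1), 1 - a + b)

def rg_implication(a, b):
    # Crisp (bivalent) implication: 1 if a <= b, else 0
    return F(1) if a <= b else F(0)

sim_a = F(9, 10)  # similarity of the two tuples on A (assumed value)

rows = []
for sim_b in (F(9, 10), F(89, 100), F(1, 2)):
    graded = luk_residuum(sim_a, sim_b)   # graded degree for this pair
    crisp = rg_implication(sim_a, sim_b)  # bivalent GFD for this pair
    back = luk_residuum(graded, crisp)    # degree of "graded implies crisp"
    rows.append((sim_b, graded, crisp, back))

for row in rows:
    print(*row)
```

Lowering r1(B) ≈ r2(B) from 0.9 to 0.89 changes the graded degree only from 1 to 0.99, but the crisp value drops from 1 to 0, so the degree of the implication from the graded to the crisp notion falls to 0.01. This is the mechanism by which bivalent definitions drive S(Fun, A ⇒ B) towards 0.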
6 Conclusion
Since its origin, the relational model has been studied with the aim of incorporating a sound treatment of uncertainty. The pivotal role that functional dependencies play in the original model is probably the reason why there are so many different attempts to generalize this notion. As we have seen, in some cases the definition of GFD stands alone in the sense that it is not connected to the extension of the model per se. For example, in [15] the GFD is defined for nonranked data tables, but ranks may appear after executing a query. Does the GFD remain the same for a ranked data table, or is the definition not applicable? Or, as we have already mentioned, the GFD proposed in [14] was not developed any further; even Armstrong axioms were not introduced. On the other hand, there are works that may be considered a significant step toward the generalization of Codd's relational model and specifically of FDs.

The main aim of this work was to concentrate on generalizations of FDs and of Codd's relational model for the cases where a similarity relation replaces the classical equality. This switch (from equality to similarity) is usually accompanied by the switch from precise to imprecise data (complex data domains) and by the introduction of ranks (the degree to which a tuple belongs to a table). We have provided an exhaustive review of all significant approaches to the notion of similarity-based functional dependencies. We have established a criterion which eases the comparison of these approaches. The criterion given by Equation (79) gives a degree to which a particular GFD corresponds to a fuzzy function. Since the classical functional dependency is in correspondence with the definition of a partial function, and since the definition of fuzzy function given by Equation (75) is widely accepted, we argue that the proposed measure is natural. It also gives us some nontrivial information about the behavior of various generalizations.
Of the almost 100 references included in this paper, 79 are devoted to presenting a definition of GFD (not all of them different). We selected 12 of them which have had a significant impact on other authors and/or which introduced interesting new approaches to GFD: Buckles and Petry (1983), Prade and Testemale (1984), Raju and Majumdar (1988), Chen (1991), Bhuniya and Niyogi (1993), Cubero et al. (1994), Ben Yahia et al. (1999), Bosc et al. (1999), Tyagi et al. (2005), Kiss (1991), Belohlavek and Vychodil (2006) and Cordero et al. (2011). The summary can be found in Table 4.
| Authors/approach              | GFD  | [Imp]              | [TrGFD]                     | [Rank] | E(Fun, A ⇒ B) |
|-------------------------------|------|--------------------|-----------------------------|--------|---------------|
| Buckles and Petry [15]        | (23) | R-G imp.           | {0, 1}                      | No     | β ∧ (β → 0)   |
| Prade and Testemale [79]      | (32) | R-G imp.           | {0, 1}                      | No     | 0             |
| Raju and Majumdar [81]        | (36) | R-G imp.           | {0, 1}                      | Yes    | 0             |
| Chen et al. [33]              | (41) | Classical or Gödel | [0, 1]                      | No     | 0             |
| Bhuniya and Niyogi [7]        | (44) | R-G imp.           | {0, 1}                      | Yes    | 0             |
| Cubero et al. [42]            | (51) | R-G imp.           | {0, 1}                      | No     | (83) ∧ (84)   |
| Tyagi et al. [96]             | (61) | R-G imp.           | {0, 1}                      | Yes    | 0             |
| Kiss [60]                     | (63) | Łukasiewicz        | [0, 1]                      | Yes    | 0.5           |
| Ben Yahia et al. [104]        | (55) | Łukasiewicz        | [0] ∪ [θ, 1]                | Yes    | 0             |
| Bosc, Pivert and Ughetto [14] | (59) | Residuum           | [0, 1]                      | No     | 1             |
| Belohlavek and Vychodil [23]  | (67) | Residuum           | Complete residuated lattice | Yes    | 1             |
| Cordero et al. [39]           | (70) | Residuum           | [0, 1]                      | Yes    | 1             |
Table 4. Compendium of similarity-based functional dependencies. The [Imp] column gives the implication used in the GFD. The choice of implication influences the degree to which the GFD is true; this is shown in column [TrGFD]. The column [Rank] indicates whether the GFD is defined for data tables with ranks. The last column presents the degree to which a GFD corresponds to a fuzzy function.
As shown in the table, many approaches reduce the new (generalized) concept of FD to a bivalent one. This is usually done by introducing some extra parameter and by letting the GFD be satisfied if some criterion exceeds the parameter. Otherwise the GFD is not satisfied. We strongly believe that a proper approach (from the logical point of view) should consider a richer framework. The same applies to evaluating similarity-based queries. One approach is to let the result of a similarity-based query be a classical data table (this usually means that a tuple appears in the resulting data table if some criterion based on tuple values is satisfied). Here, too, some parameter is usually involved. A second approach is to let the GFD be satisfied to some degree and to accept partial matches when evaluating queries. This means that the resulting data table contains ranks to indicate how much a tuple matches a query. This issue, together with the exploration of different relational algebras, is a part of future work.

We would like to mention that the interpretation of the rank is not usually "the degree to which a tuple matches a query" and that the meaning of the rank is usually not clear. We want to emphasize that the rank is not usually involved in the definition of functional dependency. This fact yields odd behavior: tuples with very low ranks may cause the GFD to be satisfied to a low degree (even 0). Note that our criterion (Equation (79)) is able to capture this kind of behavior. Also, the semantics of the rank does not usually influence the semantics of the GFDs. For example, Raju and Majumdar introduced two different interpretations of the rank and only one definition of GFD. These issues are closely connected to the usefulness of GFDs for data mining purposes. If the semantics of the ranks or the semantics of the GFD are not clearly given, one cannot expect that such a GFD will be used in practice. Also, the algorithmic issues of mining GFDs (e.g. algorithms for non-redundant bases of GFDs) in a given (ranked) data table are usually not discussed.
One exception is the work by Belohlavek and Vychodil, see [21, 26], where the authors introduced an algorithm for mining a non-redundant base in a given ranked data table.

We have focused mainly on the semantics of GFDs, and there are many interesting tasks that are left as future work. The first one is the mutual relationship between different approaches, following the results from [23]. The second one is a deeper comparison of the inference systems for functional dependencies: given a set of FDs, can anything new be inferred if generalized inference rules are used, compared to what can be inferred using Armstrong axioms? The third task is related to relational languages: although a relational algebra is usually defined, no completeness with respect to a domain calculus is presented, and if it is, the domain calculus is usually based on classical predicate logic, although [25] is an exception. The fourth issue we want to study more deeply is the usefulness of GFDs in database design and the normalization problem. There are few works that address normalization [3, 31, 86] and lossless-join decomposition [32, 81]. The lossless-join decomposition is presented in the context of GFDs which remain bivalent. Regarding normalization, in [86] the definition of normal forms remains exactly the same as in the classical case; only the GFD given by Equation (61) is used instead of the classical FD in the definitions of normal forms. Note that the GFD remains crisp. A step further was taken by Chen et al. [31], where the normal forms (called q-normal forms) are developed for a many-valued GFD (given by Equation (41)). Chen et al. also proposed in [29] an algorithm for testing whether or not a decomposition preserves GFDs. On the other hand, Bosc et al. are critical of the role that GFDs could play in database design [8–10]. In [10] the authors focused on some of the GFDs studied here: the definitions given by Equations (36), (44), (45), (51), (63), and (61).

The authors reached the following general conclusion: these definitions "do not capture redundancy and are then useless for database design (in the usual sense)." We agree with the authors that GFDs are not useful for classical lossless decomposition and classical database design. In order to use GFDs for database design we need new concepts that can benefit from fuzzy logic in the narrow sense. There are, for example, attempts towards approximate decomposition, see [25], where the authors generalized the lossless decomposition to approximate decomposition and introduced a degree of decomposability of a table. In our opinion, this is a promising approach, because we believe that the new concepts related to the generalized relational model (GFDs, relational algebra, domain calculus, decomposition, etc.) should not be transformed to the original (two-valued) concepts.
Acknowledgements
P. Cordero and M. Enciso are supported by projects reg. no. TIN2011-28084 and TIN2014-59471P of the Science and Innovation Ministry of Spain and the European Social Fund. L. Ježková is supported by project reg. no. CZ.1.07/2.3.00/20.0059 of the European Social Fund in the Czech Republic and by an internal grant from Palacky University no. PrF_2014_034.
References
1. R. Agrawal, T. Imieliński, and A. Swami. Mining association rules between sets of items in large databases. SIGMOD Rec., 22(2):207–216, June 1993.
2. A. Aussem and J. M. Petit. e-functional dependency inference: application to DNA microarray expression data. In BDA, 2002.
3. O. Bahar and A. Yazici. Normalization and lossless join decomposition of similarity-based fuzzy relational databases. Int. J. Intell. Syst., 19:885–917, October 2004.
4. J. F. Baldwin and S. Q. Zhou. A fuzzy relational inference language. Fuzzy Sets and Systems, 14(2):155–174, 1984.
5. F. Berzal, I. Blanco, D. Sánchez, J. M. Serrano, and M. A. Vila. A definition for fuzzy approximate dependencies. Fuzzy Sets Syst., 149(1):105–129, January 2005.
6. F. Berzal-Galiano, J. C. Cubero, F. Cuenca, and J. M. Medina. Relational decomposition through partial functional dependencies. Data & Knowledge Engineering, 43:207–234, 2002.
7. B. Bhuniya and P. Niyogi. Lossless join property in fuzzy relational databases. Data & Knowledge Engineering, 11(2):109–124, 1993.
8. P. Bosc, D. Dubois, O. Pivert, and H. Prade. On the connection between fuzzy functional dependencies and redundancy. In Proc. of Fourth European Congress On Intelligent Techniques and Soft Computing, pages 803–805, 1996.
9. P. Bosc, D. Dubois, and H. Prade. Fuzzy functional dependencies - an overview and a critical discussion. In Proceedings of the Third IEEE International Conference on Fuzzy Systems, pages 325–330, 1994.
10. P. Bosc, D. Dubois, and H. Prade. Fuzzy functional dependencies and redundancy elimination. J. Am. Soc. Inf. Sci., 49:217–235, March 1998.
11. P. Bosc, L. Lietard, and O. Pivert. Functional dependencies revisited under graduality and imprecision. In Fuzzy Information Processing Society, 1997. NAFIPS ’97., 1997 Annual Meeting of the North American, pages 57–62, 1997.
12. P. Bosc, L. Lietard, and O. Pivert. Extended functional dependencies as a basis for linguistic summaries. In Proceedings of the Second European Symposium on Principles of Data Mining and Knowledge Discovery, PKDD ’98, pages 255–263, London, UK, 1998. Springer-Verlag.
13. P. Bosc and O. Pivert. About projection-selection-join queries addressed to possibilistic relational databases. Trans. Fuz Sys., 13(1):124–139, February 2005.
14. P. Bosc, O. Pivert, and L. Ughetto. Database mining for the discovery of extended functional dependencies. In Fuzzy Information Processing Society, 1999. NAFIPS. 18th International Conference of the North American, pages 580–584, July 1999.
15. B. P. Buckles and F. E. Petry. A fuzzy representation of data for relational databases. Fuzzy Sets and Systems, 7(3):213–226, 1982.
16. B. P. Buckles and F. E. Petry. Extending the fuzzy database with fuzzy numbers. Information Sciences, 34(2):145–155, 1984.
17. B. P. Buckles and F. E. Petry. Uncertainty models in information and database systems. Journal of Information Science, 11(2):77–87, 1985.
18. B. P. Buckles, F. E. Petry, and H. S. Sachar. Design of similarity-based relational databases. In Constantin V. Negoita Henri Prade, editor, Fuzzy logic in knowledge engineering, pages 3–17. TUV Rheinland, 1986.
19. R. Bělohlávek. Fuzzy Relational Systems: Foundations and Principles. Kluwer Academic Publishers, 2002.
20. R. Bělohlávek, P. Cordero, M. Enciso, A. Mora, and V. Vychodil. Automated prover for attribute dependencies in data with grades. International Journal of Approximate Reasoning, 70:51–67, 2016.
21. R. Bělohlávek and V. Vychodil. Data tables with similarity relations: Functional dependencies, complete rules and non-redundant bases. In Proceedings of the 11th International Conference on Database Systems for Advanced Applications, DASFAA’06, pages 644–658, Berlin, Heidelberg, 2006. Springer-Verlag.
22. R. Bělohlávek and V. Vychodil. Query systems in similarity-based databases: Logical foundations, expressive power, and completeness. In Proceedings of the 2010 ACM Symposium on Applied Computing, SAC ’10, pages 1648–1655, New York, NY, USA, 2010. ACM.
23. R. Bělohlávek and V. Vychodil. Codd’s relational model from the point of view of fuzzy logic. Journal of Logic and Computation, 21:851–862, 2011.
24. R. Bělohlávek and V. Vychodil. Relational algebra for multi-ranked similarity-based databases. In FOCI, pages 1–8. IEEE, 2013.
25. R. Bělohlávek and V. Vychodil. Relational similarity-based databases, part 1: Foundations and query systems. Submitted, 2014.
26. R. Bělohlávek and V. Vychodil. Relational similarity-based databases, part 2: Dependencies in data. Submitted, 2014.
27. G. Chen. Fuzzy functional dependencies and a series of design issues of fuzzy relational databases. In Fuzziness in Database Management Systems, pages 166–185. Physica Verlag, Heidelberg, 1995.
28. G. Chen, E. E. Kerre, and J. Vandenbulcke. A computational algorithm for the FFD transitive closure and a complete axiomatization of fuzzy functional dependencies. International Journal of Intelligent Systems, 9:421–439, 1994.
29. G. Chen, E. E. Kerre, and J. Vandenbulcke. The dependency-preserving decomposition and a testing algorithm in a fuzzy relational data model. Fuzzy Sets and Systems, 72(1):27–37, 1995.
30. G. Chen, E. E. Kerre, and J. Vandenbulcke. An extended Boyce-Codd normal form in fuzzy relational databases. In Fuzzy Systems, 1996., Proceedings of the Fifth IEEE International Conference on, volume 3, pages 1546–1551, September 1996.
31. G. Chen, E. E. Kerre, and J. Vandenbulcke. Normalization based on fuzzy functional dependency in a fuzzy relational data model. Information Systems, 21:299–310, 1996.
32. G. Q. Chen, E. E. Kerre, and J. Vandenbulcke. On the lossless-join decomposition in a fuzzy relational data model. In Proceedings of International Symposium on Uncertainty Modeling and Analysis (ISUMA’93), pages 440–446. IEEE Press, 1993.
33. G. Q. Chen, J. Vandenbulcke, and E. E. Kerre. Fuzzy functional dependency and its axiomatic system in a fuzzy relational data model. In Proceedings of the International Conference on Information Processing and Management of Uncertainty (IPMU), pages 313–316, 1992.
34. G. Q. Chen. A step towards the theory of fuzzy relational database design. In Proc. of IFSA’91 World Congress, pages 44–47, 1991.
35. R. Cignoli, F. Esteva, L. Godo, and A. Torrens. Basic fuzzy logic is the logic of continuous t-norms and their residua. Soft Computing - A Fusion of Foundations, Methodologies and Applications, 4:106–112, 2000.
36. E. F. Codd. Extending the database relational model to capture more meaning. ACM Trans. Database Syst., 4(4):397–434, December 1979.
37. E. F. Codd. More commentary on missing information in relational databases (applicable and inapplicable information). SIGMOD Rec., 16(1):42–50, March 1987.
38. P. Cordero, M. Enciso, A. Mora, and I. Perez de Guzman. A complete axiomatic system for fuzzy functional dependencies over domains with similarity relations. Lecture Notes Computer Science IWANN 09, 5517:261–269, 2009.
39. P. Cordero, M. Enciso, A. Mora, I. Perez de Guzman, and J. M. Rodriguez-Jimenez. Specification and inference of fuzzy attributes. In Foundations of Computational Intelligence (FOCI), 2011 IEEE Symposium on, pages 107–114, 2011.
40. J. C. Cubero, J. M. Medina, O. Pons, and M. A. Vila. Non-transitive fuzzy dependencies (i). Fuzzy Sets Syst., 106:401–431, September 1999.
41. J. C. Cubero, J. M. Medina, O. Pons, and M. A. Vila. Transitive fuzzy dependencies (ii). Fuzzy Sets Syst., 106:433–448, September 1999.
42. J. C. Cubero and M. A. Vila. A new definition of fuzzy functional dependency in fuzzy relational databases. International Journal of Intelligent Systems, 9(5):441–448, 1994.
43. T. H. Dang and D. K. Tran. Comments on fuzzy data dependencies and implication of fuzzy data dependencies. Fuzzy Sets and Systems, 148(1):153–156, 2004. Web Mining Using Soft Computing.
44. C. J. Date. Relational Database: Selected Writings. Addison Wesley Publishing Company, 1986.
45. C. J. Date. Date on Database: Writings 2000–2006. Apress, 2006.
46. C. J. Date. Database Design and Relational Theory: Normal Forms and All That Jazz. O’Reilly Media, 1st edition, 2012.
47. M. Demirci. Fuzzy functions and their applications. Journal of Mathematical Analysis and Applications, 252(1):495–517, 2000.
48. D. Dubois and H. Prade. Certainty and uncertainty of (vague) knowledge and generalized dependencies in fuzzy databases. In Fuzzy Engineering Toward Human Friendly Systems, pages 239–249. IOS Press, 1992.
49. D. Dubois and H. Prade. Gradualness, uncertainty and bipolarity: Making sense of fuzzy sets. Fuzzy Sets and Systems, 192(0):3–24, 2012.
50. F. Esteva, L. Godo, and C. Noguera. A logical approach to fuzzy truth hedges. Information Sciences, 232:366–385, 2013.
51. R. Fagin. Functional dependencies in a relational database and propositional logic. IBM J. Res. Dev., 21(6):534–544, November 1977.
52. R. Fagin. Fuzzy queries in multimedia database systems. In Proceedings of the Seventeenth ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, PODS ’98, pages 1–10, New York, NY, USA, 1998. ACM.
53. W. Fan, H. Gao, X. Jia, J. Li, and S. Ma. Dynamic constraints for record matching. The VLDB Journal, 20(4):495–520, August 2011.
54. S. Gottwald. Fuzzy uniqueness of fuzzy mappings. Fuzzy Sets and Systems, 3(1):49–74, 1980.
55. T. J. Green, G. Karvounarakis, and V. Tannen. Provenance semirings. In Proceedings of the Twenty-sixth ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, PODS ’07, pages 31–40, New York, NY, USA, 2007. ACM.
56. M. Hajdinjak and G. Bierman. Extending relational algebra with similarities. Mathematical Structures in Comp. Sci., 22(4):686–718, August 2012.
57. P. Hájek. Metamathematics of Fuzzy Logic. Kluwer Academic Publishers, Dordrecht, The Netherlands, 1998.
58. P. Hájek. On very true. Fuzzy Sets and Systems, 124(3):329–333, 2001.
59. T. N. Hung and E. Walker. A first course in fuzzy logic. CRC Press, 1997.
60. A. Kiss. λ decomposition of fuzzy relational databases. Annales Univ. Sci. Budapest, 12:133–142, 1991.
61. J. Kivinen and H. Mannila. Approximate dependency inference from relations. In Proceedings of the 4th International Conference on Database Theory, ICDT ’92, pages 86–98, London, UK, 1992. Springer-Verlag.
62. F. Klawonn. Fuzzy points, fuzzy relations and fuzzy functions. In Vilém Novák and Irina Perfilieva, editors, Discovering the World with Fuzzy Logic, pages 431–453. Physica-Verlag GmbH, Heidelberg, Germany, 2000.
63. W. H. Lee and C. T. Pang. An extension of semantic proximity for fuzzy functional dependencies. In The 28th North American Fuzzy Information Processing Society Annual Conference (NAFIPS2009), 2009.
64. J. Y. C. Liu and C. H. Huang. Handling missing data in extended possibility-based fuzzy relational databases. In Innovations in Bio-Inspired Computing and Applications (IBICA), 2012 Third International Conference on, pages 57–62, September 2012.
65. W. Y. Liu. Extending the relational model to deal with fuzzy values. Fuzzy Sets Syst., 60:207–212, December 1993.
66. W. Y. Liu. Constraints on fuzzy values and fuzzy functional dependencies. Information Sciences, 78(3-4):303–309, 1994.
67. W. Y. Liu. Fuzzy data dependencies and implication of fuzzy data dependencies. Fuzzy Sets Syst., 92:341–348, December 1997.
68. Z. M. Ma, W. J. Zhang, W. Y. Ma, and F. Mili. Data dependencies in extended possibility-based fuzzy relational databases. International Journal of Intelligent Systems, 17(3):321–332, 2002.
69. D. Maier. Theory of Relational Databases. Computer Science Pr, Rockville, MD, USA, 1983.
70. J. M. Medina, M. A. Vila, J. C. Cubero, and O. Pons. Towards the implementation of a generalized fuzzy relational database model. Fuzzy Sets Syst., 75:273–289, November 1995.
71. A. Melton and S. Shenoi. Fuzzy relations and fuzzy relational databases. Computers & Mathematics with Applications, 21(11-12):129–138, 1991.
72. N. Mouaddib and N. Bonanno. New semantics for the membership degree in fuzzy databases. In Uncertainty Modeling and Analysis, 1995, and Annual Conference of the North American Fuzzy Information Processing Society. Proceedings of ISUMA - NAFIPS ’95., Third International Symposium on, pages 655–660, September 1995.
73. K. Myszkorowski. Analysis of fuzzy n-ary relations with the use of interval-valued fuzzy functional dependencies. International Journal of General Systems, 42, 2013.
74. M. Nakata. Dependencies in fuzzy databases: functional dependency. In Proceedings of 1995 IEEE International Conference on Fuzzy Systems, volume 2, pages 757–764, Yokohama, Japan, 1995.
75. J. Pavelka. On fuzzy logic I: Many-valued rules of inference. Mathematical Logic Quarterly, 25(3–6):45–52, 1979.
76. J. Pavelka. On fuzzy logic II: Enriched residuated lattices and semantics of propositional calculi. Mathematical Logic Quarterly, 25(7–12):119–134, 1979.
77. J. Pavelka. On fuzzy logic III: Semantical completeness of some many-valued propositional calculi. Mathematical Logic Quarterly, 25(25–29):447–464, 1979.
78. H. Prade. Lipski’s approach to incomplete information data bases restated and generalized in the setting of Zadeh’s possibility theory. Information Systems, 9(1):27–42, 1984.
79. H. Prade and C. Testemale. Generalizing database relational algebra for the treatment of incomplete or uncertain information and vague queries. Information Sciences, 34:115–143, 1984.
80. K. V. S. V. N. Raju and A. K. Majumdar. The study of joins in fuzzy relational databases. Fuzzy Sets Syst., 21(1):19–34, January 1987.
81. K. V. S. V. N. Raju and A. K. Majumdar. Fuzzy functional dependencies and lossless join decomposition of fuzzy relational database systems. ACM Trans. Database Syst., pages 129–166, 1988.
82. D. Rasmussen and R. R. Yager. Finding fuzzy and gradual functional dependencies with SummarySQL. Fuzzy Sets and Systems, 106(2):131–142, 1999.
83. A. Y. Rodríguez-González, J. F. Martínez-Trinidad, J. A. Carrasco-Ochoa, and J. Ruiz-Shulcloper. Mining frequent patterns and association rules using similarities. Expert Syst. Appl., 40(17):6823–6836, December 2013.
84. A. N. Saharia and T. M. Barron. Approximate dependencies in database systems. Decision Support Systems, 13(3-4):335–347, March 1995.
85. D. Sánchez, J. M. Serrano, I. Blanco, M. J. Martin-Bautista, and M. A. Vila. Using association rules to mine for strong approximate dependencies. Data Mining and Knowledge Discovery, 16(3):313–348, 2008.
86. P. C. Saxena and D. K. Tayal. Normalization in type-2 fuzzy relational data model based on fuzzy functional dependency using fuzzy functions. International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, 20(1):99–138, 2012.
87. P. C. Saxena and B. K. Tyagi. Fuzzy functional dependencies and independencies in extended fuzzy relational database models. Fuzzy Sets and Systems, 69(1):65–89, 1995.
88. A. K. Sharma, A. Goswami, and D. K. Gupta. Fuzzy inclusion dependencies in fuzzy relational databases. In Information Technology: Coding and Computing, 2004. Proceedings. ITCC 2004. International Conference on, volume 1, pages 507–510, April 2004.
89. S. Shenoi and A. Melton. An extended version of the fuzzy relational database model. Information Sciences, 52:35–52, 1990.
90. S. Shenoi, A. Melton, and L. T. Fan. An equivalence classes model of fuzzy relational databases. Fuzzy Sets and Systems, 38(2):153–170, 1990.
91. M. Shirvanian and W. Lippe. Optimization of the normalization of fuzzy relational databases by using alternative methods of calculation for the fuzzy functional dependency. In 2008 IEEE International Conference on Fuzzy Systems, pages 15–20, 2008.
92. M. I. Sozat and A. Yazici. A complete axiomatization for fuzzy functional and multivalued dependencies in fuzzy database relations. Fuzzy Sets and Systems, 117:161–181, 2001.
93. Y. Takahashi. Fuzzy database query languages and their relational completeness theorem. Knowledge and Data Engineering, IEEE Transactions on, 5(1):122–125, February 1993.
94. G. Takeuti and S. Titani. Globalization of intuitionistic set theory. Annals of Pure and Applied Logic, 33:195–211, 1990.
95. G. Tré, R. Caluwe, and H. Prade. Null values in fuzzy databases. J. Intell. Inf. Syst., 30(2):93–114, April 2008.
96. B. K. Tyagi, A. Sharfuddin, R. N. Dutta, and D. K. Tayal. A complete axiomatization of fuzzy functional dependencies using fuzzy function. Fuzzy Sets and Systems, 151:363–379, 2005.
97. M. Umano. Freedom-O: A fuzzy database system. In Gupta Sanchez, editor, Fuzzy Information and Decision Processes, pages 339–347. North-Holland Pub. Comp., 1982.
98. M. Umano. Retrieval from fuzzy databases by fuzzy relational algebra. In G. Sanchez, editor, Fuzzy Information Knowledge Representation and Decision Analysis, pages 1–6. Pergamon Press, Oxford, 1983.
99. M. Vucetic and M. Vujosevic. A literature overview of functional dependencies in fuzzy relational database models. Technics Technologies Education Management-TTEM, 7(4):1593–1604, 2012.
100. S. L. Wang, J. W. Shen, and T. P. Hong. Mining fuzzy functional dependencies from quantitative data. In Systems, Man, and Cybernetics, 2000 IEEE International Conference on, volume 5, pages 3600–3605, 2000.
101. S. L. Wang, J. S. Tsai, and B. C. Chien. Mining approximate dependencies using partitions on similarity-relation-based fuzzy databases. In Systems, Man, and Cybernetics, 1999. IEEE SMC ’99 Conference Proceedings. 1999 IEEE International Conference on, volume 5, pages 871–875, 1999.
102. Q. Wei and G. Chen. Efficient discovery of functional dependencies with degrees of satisfaction. Int. Journal Intelligent Systems, 19:1089–1110, November 2004.
103. S. B. Yahia and A. Jaoua. Mining linguistic summaries of databases using based lukasiewicz implication fuzzy functional dependency. In Fuzzy Systems Conference Proceedings, 1999. FUZZ-IEEE ’99. 1999 IEEE International, volume 3, pages 1246–1250, August 1999.
104. S. B. Yahia, H. Ounalli, and A. Jaoua. An extension of classical functional dependency: dynamic fuzzy functional dependency. Information Sciences, 119(3-4):219–234, 1999.
105. A. Yazici, E. Gocmen, B. P. Buckles, R. George, and F. E. Petry. An integrity constraint for a fuzzy relational database. In Proc. of Second IEEE Int. Conf. on Fuzzy Systems 1, pages 496–499, 1993.
106. A. Yazici and M. I. Sozat. The integrity constraints for similarity-based fuzzy relational databases. International Journal of Intelligent Systems, 13:641–659, 1998.
107. L. A. Zadeh. A computational approach to fuzzy quantifiers in natural languages. Computers & Mathematics With Applications, 9:149–184, 1983.
108. L. A. Zadeh. Fuzzy sets as a basis for a theory of possibility. Fuzzy Sets and Systems, 1(1):3–28, 1978.
109. L. A. Zadeh. Pruf - a meaning representation language for natural languages. International Journal of Man-Machine Studies, 10(4):395–460, 1978.
110. F. Zhao and Z. M. Ma. Functional dependencies in vague relational databases. In 2006 IEEE International Conference on Systems, Man, and Cybernetics, pages 4006–4010, 2006.
111. A. Zvieli. A fuzzy relational calculus. In Expert Database Conf.’86, pages 311–326, 1986.