Rough set analysis of a general type of fuzzy data using transitive aggregations of fuzzy similarity relations




Fuzzy Sets and Systems 139 (2003) 635 – 660 www.elsevier.com/locate/fss

J.M. Fernández Salido*, S. Murakami
Department of Computer Science, Faculty of Engineering, Kyushu Institute of Technology, 1-1 Sensui-cho, Tobata-ku, Kitakyushu-city, Fukuoka 804-8550, Japan
Received 5 September 2002; received in revised form 13 January 2003; accepted 11 March 2003

Abstract
In this paper, we initially present and compile the tools and theories that are necessary for a Rough Set analysis of a general type of fuzzy data. From this study, we have developed two original contributions that can enhance the application of Fuzzy Rough Sets techniques to data that may be affected by several sources of uncertainty in its measurement process. For this type of analysis, we have first considered the construction of homogeneous granules in which reflexivity, symmetry and a certain transitivity property are satisfied. This has encouraged us to study the application of T-similarity relations in this framework and to examine the available theorems that permit us to aggregate this type of relations. As a part of this study, we have also developed a new theorem that allows us to use the dual of the generalized means as an aggregation operator in the Fuzzy Rough Sets context. Another theoretical development that tries to respond to a practical problem is the concept of β-precision aggregation. We use this new notion to export Ziarko's Variable Precision Rough Set model to the analysis of fuzzy data, a generalization that becomes a necessity for the Fuzzy Rough Sets analysis of large databases. The utility of this approach is demonstrated with an application example in the Vehicle database donated by the Turing Institute.
© 2003 Elsevier B.V. All rights reserved.

Keywords: Fuzzy Rough Sets; T-similarity relations; Variable Precision Rough Sets; β-precision aggregation

1. Introductory remarks

In the past several years, numerous papers have been published that deal with the relationships that exist between the theories of fuzzy sets and rough sets, and with the possibility of combining them [9,10,26,33].

Corresponding author. Tel.: +81-938843244; fax: +81-938715835. E-mail addresses: [email protected] (J.M. Fernández Salido), [email protected] (S. Murakami).

0165-0114/03/$ - see front matter © 2003 Elsevier B.V. All rights reserved.
doi:10.1016/S0165-0114(03)00124-6

One of the conclusions of most of this research is that both theories are complementary, as they represent different aspects of uncertainty. Indeed, fuzzy sets theory is an optimal tool in knowledge engineering to deal with the vagueness implicit in human language. Rough sets theory, on the other hand, is mainly concerned with the coarseness with which a set or concept can be defined. As the information sources used to describe a set are always limited, some of its members may have the same description and be indiscernible or indistinguishable. In knowledge and information systems, this can be a reason for inconsistencies, as objects with the same description (indiscernibles) may be classified under different categories. The analysis of these inconsistencies is the basis of Rough Sets theory, which has proven its usefulness in the fields of optimal feature selection and knowledge discovery in databases.

In practice, the extraction of knowledge from a set of measured data will face both problems: the attributes of this data can be presented in linguistic terms (which will usually be subjective and vague), and a rough description of the data elements may be a cause for indiscernibility among some of them. For this reason, several extensions of the original and deterministic rough sets theory have been proposed that can handle data that may have a fuzzy nature. This has resulted in the so-called Fuzzy Rough Sets concept, which can be conceived under several forms and from which an increasing number of applications has been derived [31].

In the Fuzzy Rough Sets literature that we have examined, the main analysis is focused on the simplest type of fuzzy data [3,9,21]. Here, every sample x_i (i ∈ [1, N]) from a dataset of N objects is described by N_F characteristics or features that are fuzzy, that is, they have been measured as partial membership degrees, μ_ij (j = 1 ... N_F), which are graded in the interval [0, 1]:

    x_i = [μ_i1, μ_i2, ..., μ_iN_F].   (1)

For the μ_ij values to be consistent, it is usually required that they form a weak fuzzy partition, as explained in Section 3. As an example, consider the representation that Bodjanova uses in [3]. This is a Fuzzy Rough Sets analysis of credit card applicants, in which each applicant is described by six fuzzy characteristics: C1 = low bank balance, C2 = medium bank balance, C3 = high bank balance, C4 = low monthly expenses, C5 = medium monthly expenses and C6 = high monthly expenses. Applicant No. 1 in this example is described by the values:

    x_1 = [0, 0.8, 0.3, 0.7, 0.2, 0].

This paper has been motivated by our need to make a rough set analysis of a more general type of fuzzy data than the one expressed by Eq. (1). This is the kind of fuzzy data, which appears in many classification problems and pattern matching processes, in which every feature is not expressed by a single degree of membership, but by a set of membership values to several linguistic labels. These labels provide a qualitative description of a feature's state, such as "largely increased", "normal", "very decreased" or any other linguistic attribute. Therefore, each feature is, in fact, described by a fuzzy partition. Formally, in a dataset of N samples, sample x_i (i ∈ [1, N]) is described by N_F features, while the value of each feature j is expressed by a set of N_LL grades of membership, μ_ijk (k = 1 ... N_LL), to

N_LL linguistic labels. Thus, sample x_i can be characterized by the following values:

    x_i = [(μ_i11, ..., μ_i1k, ..., μ_i1N_LL), (μ_i21, ..., μ_i2k, ..., μ_i2N_LL), ...,
           (μ_ij1, ..., μ_ijk, ..., μ_ijN_LL), ..., (μ_iN_F1, ..., μ_iN_Fk, ..., μ_iN_FN_LL)],   (2)

where i = 1 ... N, j = 1 ... N_F, k = 1 ... N_LL and μ_ijk ∈ [0, 1]. (For simplicity of notation, we assume the same number of linguistic labels, N_LL, for all features.) Under this nomenclature, in Bodjanova's example only two features would be considered: C1 = bank balance and C2 = monthly expenses, with values expressed as degrees of membership to the partition "low", "medium", "high". Applicant No. 1 would then be expressed as

    x_1 = [(0, 0.8, 0.3), (0.7, 0.2, 0)].

It may be argued that the information given by Eq. (2) can also be expressed under the form of Eq. (1) (as has been done in the given example) and that the standard Fuzzy Rough Sets methods can then be applied to this data. However, we may also want to consider the situation in which every feature has a different importance, which can be modeled by weights, in order to establish the indiscernibility or indistinguishability of two samples. Each weight should then be applied to a feature, which is in fact represented by a fuzzy partition. The introduction of feature importances will not be possible if the original fuzzy data (Eq. (2)) is transformed to the expression of Eq. (1), as the values of each feature's partition attributes will become, under this notation, independent features themselves.

As far as we know, only Slowinski and Stefanowski [41] have considered the problem of carrying out a Rough Set analysis for this type of data. They propose a method to transform the fuzzy dataset into a new set of crisp objects to which the standard Rough Sets tools can be applied. However, Section 3 explains how their approach may be too computationally demanding when the number of features is high. For this reason, we have considered the problem of how other Fuzzy Rough Set approaches found in the current literature can be applied to the fuzzy data represented by Eq. (2).
From our documentation work, we have sensed that most of these theories lack a sense of connection, and that, although most of these papers take as main reference Dubois, Prade and Fariñas del Cerro's seminal papers [9,10,12], they fail to cite relationships among themselves and to other crisp extensions of Pawlak's original model. Furthermore, we feel that most of these studies are centered on the theoretical aspects of this problem, disregarding some practical problems that may arise while handling real data. First of all, the practical application of these theories to a set of fuzzy data requires the construction of either a fuzzy similarity relation (and the aggregation of these relations) or of a fuzzy partition. In this respect, many interesting results can be found in papers originally unrelated to Rough Sets theory, which can be of use in our context. We have tried to make a comprehensive presentation of all these tools in one paper, always bearing in mind that we are dealing with fuzzy data of the Eq. (2) type. Sections 3 and 4 are concerned with this. In particular, we have given particular importance to the aggregation of T-transitive similarity relations in the Fuzzy Rough Sets framework. For this, we have found conditions under which these relations can be aggregated using standard compensated and weighted operators like the generalized means [11] and its dual. These functions are well known

for their flexibility to model a wide range of aggregators comprised between the minimum and the maximum operators.

Another important theoretical development has been the concept of β-precision aggregation which, it shall be argued, is necessary for the construction of set approximations in problems with high cardinality. A theoretical examination of this concept is developed in another paper [13], although it is also introduced here for the definition of a new Fuzzy Rough Sets model. The usefulness of these proposals is demonstrated with an application example, with which this paper closes. Prior to this, a review of the basic concepts of the original Rough Set theory is necessary for this paper to be self-contained. The next section is concerned with this.

2. Basic concepts of crisp Rough Sets theory

Rough Sets theory was started in the early 80s by Pawlak and his collaborators. Since then, many generalizations of this idea have been explored. The original proposal, of which we shall give an overview, is based on a simple concept of set approximation based on a crisp indiscernibility relation. Let U stand for the universal set of objects, which are described by a set of attributes. An indiscernibility relation R on U is an equivalence relation (reflexive, symmetric and transitive) whose equivalence classes, U/R, are composed of objects with the same measured attributes, which makes them indistinguishable in this context. For every x ∈ U, we shall represent by [x]_R the equivalence class of x in U/R. If A represents an arbitrary subset of U, then the equivalence classes of U/R may be conceived as granules which can be used as building blocks to form an approximation of A from the inside, and another one from the outside:

    R_*(A) = {x | [x]_R ⊆ A},   (3)
    R^*(A) = {x | [x]_R ∩ A ≠ ∅}.   (4)

Here, using Dubois and Prade's notation [9,10], R_*(A) is called the lower approximation of A by R, while R^*(A) is the upper approximation of A by R. These two sets can be used to provide a rough approximation of A based on the indiscernibility relation R. The difference set between both approximations, R^*(A) − R_*(A), is called the boundary set of A. The construction, in this way, of lower and upper approximations for a specific set is the basis of Rough Sets theory. From Eqs. (3), (4), the physical meaning of both approximations is easy to understand: R_*(A) is the set of all elements of U which, according to the information provided by R, belong with certainty to A. On the other hand, R^*(A) is the set of all the elements that may possibly be included in A (always according to R). These approximations can be of great use for the construction of certain and possible classification rules. As Dubois and Prade point out, the philosophy behind this approach is closely related to Dempster–Shafer's theory of evidence [39]. Pawlak also explains [34] how some related measures can be useful to provide an idea of how accurate the information given by R is for a certain classification. For example, if A represents a set and R is an indiscernibility relation, then

    α_R(A) = card R_*(A) / card R^*(A)

is called the accuracy of the approximation of A by R. According to Pawlak, this measure is intended to capture the degree of completeness of our knowledge about the set A. This concept can also be extended to a partition of sets. Let F = {A_1, A_2, ..., A_n} be a family of nonempty sets (that is, a classification). Then, two measures can be used to describe the inexactness of the approximation of the classification. The accuracy of the approximation of F by R is

    α_R(F) = Σ_i card(R_*(A_i)) / Σ_i card(R^*(A_i)).   (5)

And the quality of the approximation of F by R is defined as

    γ_R(F) = Σ_i card(R_*(A_i)) / card(U).   (6)
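The crisp machinery of Eqs. (3)–(6) is compact enough to sketch directly. The following Python fragment is our own illustration (function names such as `lower_approx` are ours, not the paper's): equivalence classes are plain sets, and the approximations and both quality measures fall out of set comprehensions.

```python
# Illustrative sketch of Pawlak's crisp rough set operators (Eqs. (3)-(6)).
# Names and toy data are ours.

def lower_approx(classes, A):
    """R_*(A): union of equivalence classes wholly contained in A (Eq. (3))."""
    return {x for c in classes if c <= A for x in c}

def upper_approx(classes, A):
    """R^*(A): union of equivalence classes that intersect A (Eq. (4))."""
    return {x for c in classes if c & A for x in c}

def accuracy(classes, A):
    """alpha_R(A) = card R_*(A) / card R^*(A)."""
    return len(lower_approx(classes, A)) / len(upper_approx(classes, A))

def accuracy_of_family(classes, family):
    """alpha_R(F), Eq. (5): summed lower cardinalities over summed upper ones."""
    return (sum(len(lower_approx(classes, A)) for A in family)
            / sum(len(upper_approx(classes, A)) for A in family))

def quality_of_family(classes, family, U):
    """gamma_R(F), Eq. (6): summed lower cardinalities over card(U)."""
    return sum(len(lower_approx(classes, A)) for A in family) / len(U)

# Toy universe of six objects partitioned into three indiscernibility granules.
U = {1, 2, 3, 4, 5, 6}
classes = [{1, 2}, {3, 4}, {5, 6}]
A = {1, 2, 3}

print(lower_approx(classes, A))   # {1, 2}
print(upper_approx(classes, A))   # {1, 2, 3, 4}
```

With A = {1, 2, 3}, the granule {3, 4} straddles the boundary, so the accuracy α_R(A) comes out as 2/4.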

The correct application of these concepts can be extremely useful for the selection of attributes that are better suited for a certain classification, or for the formation of classification rules in the context of knowledge discovery.

We shall end this section with the examination of two basic extensions of this theory, which will be needed for the application of these concepts to fuzzy data. Many of the numerous extensions of Pawlak's original definition of the lower and upper approximation of a set deal with a relaxation of the concept of indiscernibility relation, on which the theory is founded. For example, Slowinski and Vanderpooten [42,43] proposed the substitution of the equivalence relation (reflexive, symmetric, transitive) that models indiscernibility with a relation for which only the reflexivity property is required. They call this type of relation a Similarity Relation (crisp case), as they argue that, according to Tversky's seminal paper on the mathematical treatment of similarity [46], in some real-world situations the propagation of similarity does not hold and the transitivity property should not be required. Furthermore, in their opinion, similarity is often considered as similarity from a reference object, with symmetry not being essential. (As an example given by Tversky, it is generally assumed that North Korea is politically similar to China, but it is not so often said that China is similar to North Korea.) Therefore, Slowinski and Vanderpooten propose the following redefinition of Pawlak's approximations for set A (Eqs. (3), (4)) when, instead of crisp equivalence, only the reflexivity property is required from relation R:

    R_*(A) = {x | R^{-1}(x) ⊆ A},   (7)
    R^*(A) = ∪_{x∈A} R(x),   (8)

where R(x) is the set of objects which, according to the reflexive crisp relation R, are similar to x. On the other hand, R^{-1}(x) represents the set of objects to which x is similar according to R. If R is an equivalence relation, then R(x) = R^{-1}(x). In their paper, these authors prove that these definitions are the most appropriate for their similarity concept. They also show how these expressions reduce to Eqs. (3) and (4) when the relation R is also symmetric and transitive.
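A quick way to see how Eqs. (7) and (8) behave when symmetry fails is to run them on a tiny relation. This sketch is our own; in particular, reading an ordered pair (a, b) as "a is similar to b" is an orientation we have assumed for the illustration.

```python
# Sketch of Slowinski-Vanderpooten's approximations (Eqs. (7), (8)) for a
# merely reflexive relation. We read a pair (a, b) as "a is similar to b";
# this orientation is our own assumption.

def R_of(rel, x):
    """R(x): the objects that are similar to x."""
    return {a for (a, b) in rel if b == x}

def R_inv(rel, x):
    """R^{-1}(x): the objects to which x is similar."""
    return {b for (a, b) in rel if a == x}

def lower_approx(rel, U, A):
    """R_*(A) = {x | R^{-1}(x) is a subset of A} (Eq. (7))."""
    return {x for x in U if R_inv(rel, x) <= A}

def upper_approx(rel, A):
    """R^*(A) = union over x in A of R(x) (Eq. (8))."""
    return set().union(*(R_of(rel, x) for x in A)) if A else set()

U = {1, 2, 3}
# Reflexive but asymmetric: object 1 is similar to 2, not vice versa.
rel = {(1, 1), (2, 2), (3, 3), (1, 2)}
A = {2}

print(lower_approx(rel, U, A))  # {2}: R^{-1}(1) = {1, 2} is not inside A
print(upper_approx(rel, A))     # {1, 2}: object 1 is similar to a member of A
```

Because 1 is similar to 2 but not conversely, R(2) and R^{-1}(2) differ, and the two approximations are no longer each other's duals in the Pawlak sense.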

Another important extension with which we shall deal is Ziarko's Variable Precision Rough Set model [18,54]. This model is based on a relaxation of the concept of set inclusion, and can be especially meaningful in classification problems in which some data has been misclassified due to human error or noise. The best way to explain the rationale behind the Variable Precision Rough Set model is through a simple example. Let us take a basic case in which the data with which we are dealing is described by only one attribute, a. Let us suppose as well that a certain set, A, can be perfectly defined through the value of this attribute, which becomes 1 when an element belongs to the set and 0 when it does not. If we apply Pawlak's lower approximation definition to this set, it will coincide with the set, and the quality of classification will be 1. From this, a certain rule can be extrapolated: if attribute a takes a value of 1, then the element belongs to A. However, if by some error or noise this attribute takes a value of 0 in one of the set's elements, then the set's lower approximation will be the null set, and the quality of classification, 0. It should be noted that this effect does not depend on the cardinality of A, and that it will take place whether there is one error in a population of two elements or one error in a million. This serves to illustrate the lack of robustness to noise of Pawlak's original model.

Ziarko's extension approaches this problem with a relaxation of the definition of set inclusion. According to this relaxation, the degree of inclusion of set A in B, I(A, B), is defined as [54]

    I(A, B) = card(A ∩ B)/card(A)  if card(A) > 0,
    I(A, B) = 1                    if card(A) = 0.   (9)

Based on this, it is said that A is included in B with an admissible classification error of β,

    A ⊆_β B iff I(A, B) ≥ 1 − β.

Applying this concept to Pawlak's expressions, they can be redefined as

    R_*^β(A) = {x | [x]_R ⊆_β A},   (10)
    R^{*β}(A) = {x | [x]_R ⊆_{1−β} A}.   (11)
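The robustness argument above is easy to reproduce numerically. The sketch below (our own naming) implements the inclusion degree of Eq. (9) and the β-relaxed approximations: one mislabeled element out of ten empties Pawlak's lower approximation (β = 0), while the β = 0.1 approximation is unaffected.

```python
# Sketch of Ziarko's variable precision operators (Eqs. (9)-(11)).
# Names and toy data are ours.

def inclusion(A, B):
    """I(A, B), Eq. (9): the share of A's elements that also lie in B."""
    return len(A & B) / len(A) if A else 1.0

def vprs_lower(classes, A, beta):
    """{x | I([x]_R, A) >= 1 - beta}: classes included up to error beta."""
    return {x for c in classes for x in c if inclusion(c, A) >= 1 - beta}

def vprs_upper(classes, A, beta):
    """{x | I([x]_R, A) > beta}: classes overlapping A beyond error beta."""
    return {x for c in classes for x in c if inclusion(c, A) > beta}

# One granule of ten objects; nine belong to A, one carries a noisy label.
granule = set(range(10))
A = set(range(9))

strict = vprs_lower([granule], A, beta=0.0)   # Pawlak's case: empty set
relaxed = vprs_lower([granule], A, beta=0.1)  # tolerates a 10% labeling error

print(len(strict), len(relaxed))  # 0 10
```

Scaling the granule up changes nothing: a single outlier always destroys the β = 0 lower approximation, which is exactly the fragility Ziarko's model removes.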

Using these formulas, certain and possible classification rules can be extrapolated that admit a classification error of β. Ziarko's extension becomes a necessity for a Rough Set analysis of large databases, as some misclassification errors are certain to come up. As shall be reviewed in the next section, this proposal has already inspired one of the Rough Sets extensions for fuzzy data. However, in order to apply Ziarko's model to the most general approaches to Fuzzy Rough Sets theory, the concept of β-precision aggregation will have to be introduced.

3. Extension of Rough Sets theory to fuzzy data: Fuzzy Rough Sets

Since the initial conception of Rough Sets theory, the extension of its principles to a fuzzy environment has been a topic of study. This stems from the fact that often the description of many

database items may be linguistic and have a fuzzy nature. Other times, some attributes are measured in a continuous domain, and their values are described according to a partition of this domain, which is discretized in intervals. The results of applying Rough Sets theory to a set of data will depend greatly on the values taken by the limits that define these intervals [40]. Smoothing these limits through fuzzy membership functions is a good way to improve the system's robustness to small changes in the measured data.

As we said, we are interested in the application of Rough Sets theory to the general type of fuzzy data described by Eq. (2). Slowinski and Stefanowski already proposed an approach to handle this data in the framework of Rough Sets [41]. Briefly explained, their proposal consists in substituting every object of a database, whose attributes are represented by membership degrees to a set of linguistic labels (Eq. (2)), with several sub-objects that represent the possible combinations of linguistic labels in which a non-zero degree of membership has been measured. A modified version of crisp rough set theory can then be applied to this new set of data. This can be clarified with a simple example. Let us suppose that a sample x_1 is described by two attributes, each expressed by degrees of membership to two linguistic labels (for example, increased and not increased). If the values x_1 = [(0.6, 0.4), (1, 0)] had been measured, then this object would be substituted by two sub-objects, x_1^1 = [0.6, 1] and x_1^2 = [0.4, 1], which are the possible combinations of non-zero values. In a more complex case, if x_1 = [(0.6, 0.4), (0.7, 0.3)] were the measured values, then x_1 would be substituted by four sub-objects, x_1^1 = [0.6, 0.7], x_1^2 = [0.6, 0.3], x_1^3 = [0.4, 0.7] and x_1^4 = [0.4, 0.3]. The obvious drawback to this approach is that it is only efficient when the number of measured attributes and of possible linguistic labels is small.
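Slowinski and Stefanowski's expansion is a Cartesian product over the non-zero memberships of each feature, so it can be sketched in a few lines (our own illustration):

```python
# Sketch of the Slowinski-Stefanowski sub-object expansion: one sub-object
# per combination of non-zero label memberships across the features.
import itertools

def expand(sample):
    """sample: list of per-feature tuples of label memberships."""
    nonzero = [[m for m in feature if m > 0] for feature in sample]
    return [list(combo) for combo in itertools.product(*nonzero)]

print(expand([(0.6, 0.4), (1, 0)]))       # [[0.6, 1], [0.4, 1]]
print(len(expand([(0.6, 0.4), (0.7, 0.3)])))  # 4

# Worst-case blow-up: 10 features, two non-zero labels each.
print(len(expand([(0.5, 0.5)] * 10)))     # 1024
```

The last call reproduces the combinatorial growth discussed next: the number of sub-objects is the product of the per-feature counts of non-zero memberships.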
If, for instance, we measure 10 attributes, each with two associated linguistic labels, then, in a worst-case scenario, each sample would be substituted by 2^10 = 1024 sub-samples, which would be too demanding computationally. Furthermore, the importance of the different features towards the indistinguishability or similarity of the samples is not taken into account. For these reasons, we have examined how other approaches for Rough Sets analysis of fuzzy data may be applied to the type of data expressed by Eq. (2).

Although several Fuzzy Rough Sets proposals have been introduced in recent papers, many of these concentrate on their topological and algebraic properties [5,22,23]. From a practical point of view, we want to point out the characteristics of three different approaches. All of them are based on the same concept: the substitution, in Pawlak's original set approximation definitions (Eqs. (3) and (4)), of the crisp equivalence relation R by a fuzzy relation S, on which several conditions are imposed. The most general approach, as taken by Greco et al. [16], only requires that S be a reflexive fuzzy relation (S(x, x) = 1). Further restrictions are imposed by Dubois and Prade [9,10], who demand that S be a T-similarity relation, i.e. a fuzzy relation S : U × U → [0, 1] for which the following conditions should hold:
• Reflexivity: ∀x ∈ U, S(x, x) = 1.
• Symmetry: ∀x, y ∈ U, S(x, y) = S(y, x).
• T-transitivity: ∀x, y, z ∈ U, S(x, z) ≥ T(S(x, y), S(y, z)).
Here, T is a T-norm [38], that is, a commutative, monotonic and associative aggregation operator, T(x, y) : [0, 1] × [0, 1] → [0, 1], that satisfies the boundary condition T(a, 1) = a. If T is the minimum operator, then this definition coincides with Zadeh's original expression for similarity relations [50].
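Whether a given relation is T-transitive depends on the chosen T-norm, which a brute-force check makes concrete. In this sketch (ours), the same matrix fails min-transitivity but satisfies product-transitivity:

```python
# Brute-force check of the T-similarity conditions for a relation given as
# a square matrix S with entries in [0, 1]. Illustration only; names are ours.

def is_t_similarity(S, T, eps=1e-12):
    n = len(S)
    reflexive = all(S[i][i] == 1 for i in range(n))
    symmetric = all(S[i][j] == S[j][i] for i in range(n) for j in range(n))
    transitive = all(S[i][k] >= T(S[i][j], S[j][k]) - eps
                     for i in range(n) for j in range(n) for k in range(n))
    return reflexive and symmetric and transitive

S = [[1.0, 0.9, 0.85],
     [0.9, 1.0, 0.9],
     [0.85, 0.9, 1.0]]

print(is_t_similarity(S, min))                 # False: 0.85 < min(0.9, 0.9)
print(is_t_similarity(S, lambda a, b: a * b))  # True: 0.85 >= 0.9 * 0.9
```

Since min is the largest T-norm, min-transitivity is the strictest requirement; relations that fail it may still be T-similarity relations for weaker T-norms such as the product.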

Instead of directly substituting R with a fuzzy relation, its set of equivalence classes (quotient set), U/R, can also be replaced by a family of fuzzy sets F_1, F_2, ..., F_n, for which it is usually required that they form a weak fuzzy partition, that is, the following conditions should be satisfied:
• inf_x max_{i=1,n} μ_{F_i}(x) > 0,
• ∀i ≠ j, sup_x min(μ_{F_i}(x), μ_{F_j}(x)) < 1.
The first requirement ensures that the family covers all of U, while the second one imposes a disjointness condition between the F_i. Using either a fuzzy similarity relation, S, or a weak fuzzy partition, the following three approaches have been proposed:

(1) Approach based on possibility theory (Dubois, Prade and Fariñas del Cerro). This is probably the most cited approach to Fuzzy Rough Sets, which was introduced in the seminal papers [9,10,12]. According to this proposal, if F represents a fuzzy set with membership function μ_F and S is a fuzzy similarity relation (which here is assumed to be reflexive, symmetric and T-transitive) with membership degree μ_S(x, y), then the upper and lower approximations of F with regard to S can be calculated as the degrees of possibility and necessity of F (in the sense of Zadeh [51,8]), taking as referential the equivalence classes of S. This can be expressed as

    μ_{S^*(F)}(x) = sup_y min(μ_F(y), μ_S(x, y)),   (12)
    μ_{S_*(F)}(x) = inf_y max(μ_F(y), 1 − μ_S(x, y)).   (13)
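On a finite universe the sup and inf of Eqs. (12) and (13) become max and min over the samples, so both approximations are one-liners. A sketch with our own toy data:

```python
# Sketch of Dubois and Prade's fuzzy rough approximations (Eqs. (12), (13))
# on a finite universe. F maps objects to memberships; S is a similarity
# matrix indexed by object. Toy data and names are ours.

def upper_membership(F, S, x, U):
    """mu_{S^*(F)}(x) = max_y min(mu_F(y), mu_S(x, y))."""
    return max(min(F[y], S[x][y]) for y in U)

def lower_membership(F, S, x, U):
    """mu_{S_*(F)}(x) = min_y max(mu_F(y), 1 - mu_S(x, y))."""
    return min(max(F[y], 1 - S[x][y]) for y in U)

U = [0, 1]
F = {0: 1.0, 1: 0.2}            # fuzzy set to approximate
S = [[1.0, 0.5], [0.5, 1.0]]    # reflexive, symmetric, min-transitive

print(upper_membership(F, S, 0, U))  # 1.0
print(lower_membership(F, S, 0, U))  # 0.5
```

Object 0 belongs fully to F, yet its lower membership drops to 0.5: its 0.5-similar neighbour has a low membership in F, which is precisely the sensitivity the β-precision construction of Section 3 later tempers.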

If S is crisp, the reflexivity, symmetry and T-transitivity requirements would make it a crisp equivalence relation (note that this does not depend on T). These equations would then reduce to Pawlak's original definitions (Eqs. (3) and (4)).

Greco et al. [16] presented a fuzzy extension of Slowinski and Vanderpooten's proposal [42] in which, as we recall from Section 2, only crisp reflexive relations were used (as opposed to Pawlak's equivalence relations). According to their proposal, if T(x, y) represents a T-norm, C(x, y) its associated T-conorm, N(x) is a negation operator, S(x, y) a fuzzy reflexive relation (which does not have to be symmetric or transitive), and F a fuzzy set with membership function μ_F(x), then the upper and lower approximations of F by S can be defined as

    μ_{S^*(F)}(x) = C_y T(μ_F(y), μ_S(x, y)),   (14)
    μ_{S_*(F)}(x) = T_y C(μ_F(y), N(μ_S(x, y))).   (15)

We can see that these equations become Dubois and Prade's expressions (Eqs. (12) and (13)) when T and C represent the standard intersection and union operators (minimum and maximum) and S is also symmetric and transitive for a certain T-norm, T', which does not necessarily have to coincide with T. Using logic transformations, these equations can also be expressed in terms of T-norms and related implication operators. Based on this, Radzikowska and Kerre have recently carried out an exhaustive formal study on the theoretical properties of these Fuzzy Rough Sets [35].
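Eqs. (14) and (15) only change the connectives, so the same computation can be parametrized by T, C and N. A sketch (ours); plugging in min, max and N(a) = 1 − a recovers Eqs. (12) and (13):

```python
# Sketch of Greco et al.'s generalized approximations (Eqs. (14), (15)):
# the same folds as Dubois-Prade's, but with pluggable T-norm T, T-conorm C
# and negation N. Toy data and names are ours.
from functools import reduce

def upper_membership(F, S, x, U, T, C):
    """mu_{S^*(F)}(x): C-fold over y of T(mu_F(y), mu_S(x, y)) (Eq. (14))."""
    return reduce(C, (T(F[y], S[x][y]) for y in U))

def lower_membership(F, S, x, U, T, C, N):
    """mu_{S_*(F)}(x): T-fold over y of C(mu_F(y), N(mu_S(x, y))) (Eq. (15))."""
    return reduce(T, (C(F[y], N(S[x][y])) for y in U))

U = [0, 1]
F = {0: 1.0, 1: 0.2}
S = [[1.0, 0.5], [0.5, 1.0]]

# With the standard connectives these reduce to Eqs. (12) and (13).
up = upper_membership(F, S, 0, U, T=min, C=max)
low = lower_membership(F, S, 0, U, T=min, C=max, N=lambda a: 1 - a)
print(up, low)  # 1.0 0.5
```

Other choices, e.g. the product T-norm with the probabilistic sum as C, give graded approximations in which every sample contributes, not just the extremal one.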

Ziarko's Variable Precision Rough Set model [54], mentioned in Section 2, can also be introduced into Dubois and Prade's Fuzzy Rough Set framework. To do this, Eqs. (12) and (13) should be rewritten in the following form:

    μ_{S^*(F)}(x) = max(μ_F(x), I_{S^*(F)}(x)),   (16)
    μ_{S_*(F)}(x) = min(μ_F(x), I_{S_*(F)}(x)),   (17)

with

    I_{S^*(F)}(x) = max_{y≠x} min(μ_F(y), μ_S(x, y)),   (18)
    I_{S_*(F)}(x) = min_{y≠x} max(μ_F(y), 1 − μ_S(x, y)).   (19)

I_{S_*(F)}(x) is an index that expresses the degree of inclusion of all objects similar to x in the fuzzy set F. In the same fashion, I_{S^*(F)}(x) expresses the degree of inclusion of at least one object similar to x in F. It may be noted that the mere presence of only one sample that is very similar to x but has a low degree of membership in F will force I_{S_*(F)}(x) to be low, and x will be considered to be excluded from S_*(F). In the same way, only one sample that may be very similar to x but has a high degree of membership in F will cause a high I_{S^*(F)}(x). If the cardinality of the dataset is high, this single sample may be the result of noise or an error in classification. In this case, the values of membership calculated for I_{S^*(F)}(x) and I_{S_*(F)}(x) may not be adequate for decision making.

In order to deal with this limitation, we have developed the concept of β-precision aggregation operators. These aggregators allow for some tolerance to distorting values in the aggregation operands when the cardinality of the aggregated values is high. The properties associated with this concept were characterized in [13]. Its application to T-norms and T-conorms results in what we have called β-precision quasi-T-norms and β-precision quasi-T-conorms, whose formal definition can be found in Appendix A, as well as in [13]. Therefore, we propose to extend Ziarko's Variable Precision Rough Set model to the Dubois and Prade Fuzzy Rough Set framework by extending the maximum and minimum operators used to calculate the inclusion indexes of Eqs. (18) and (19) to their β-precision counterparts, max_β and min_β:

    I_{S^*(F)β}(x) = max_β,{y≠x} min(μ_F(y), μ_S(x, y)),   (20)
    I_{S_*(F)β}(x) = min_β,{y≠x} max(μ_F(y), 1 − μ_S(x, y)).   (21)

Dubois and Prade's lower and upper approximation equations can then be expressed in the β-precision context as

    μ_{S^*(F)β}(x) = max(μ_F(x), I_{S^*(F)β}(x)),   (22)
    μ_{S_*(F)β}(x) = min(μ_F(x), I_{S_*(F)β}(x)).   (23)
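The intuition behind max_β and min_β can be sketched even without the formal machinery of Appendix A. The fragment below is only our own rough reading of the idea, not the paper's definition (which lives in [13]): before aggregating, discard up to a (1 − β) fraction of the most extreme operands, so that a single outlier among a hundred values no longer dictates the result.

```python
# Rough sketch of beta-precision counterparts of max and min. This is our
# simplified reading of the concept (the formal definition is in [13] and
# Appendix A): ignore up to floor((1 - beta) * n) extreme operands.
import math

def beta_max(values, beta):
    k = math.floor((1 - beta) * len(values))  # operands we may discard
    return sorted(values, reverse=True)[k]

def beta_min(values, beta):
    k = math.floor((1 - beta) * len(values))
    return sorted(values)[k]

values = [0.1] * 99 + [0.9]     # one outlier among 100 operands

print(max(values))              # 0.9: the outlier wins
print(beta_max(values, 0.98))   # 0.1: the outlier is tolerated
print(beta_max(values, 1.0))    # 0.9: beta = 1 recovers the ordinary max
```

With β = 0.98 and 100 operands, two distorting values may be ignored, which matches the 1–2% tolerance suggested below; β = 1 always reduces to the ordinary max and min.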

In the practical implementation of these formulas, adequate values for β could be around 0.98 or 0.99, which allow for 1–2% of noisy operands in the aggregation process. However, the optimal value of β will depend on the problem's domain and the accuracy of the description of the attributes. The maximum value to which β can be set (which should determine the generalization capability of this approach) will also depend on these circumstances.

(2) Approach based on fuzzy inclusions (Kuncheva and Bodjanova). Kuncheva [21] and Bodjanova [3] give new definitions of Fuzzy Rough Sets in their papers. Both approaches deal with the approximation of a fuzzy set in terms of a weak fuzzy partition, for which they use different measures of fuzzy set inclusion, many of which have been studied in [7]. As to the generation of the weak fuzzy partition from a set of fuzzy data, Kuncheva does not impose any restrictions on how this is to be done: she implies that this can be resolved through fuzzy clustering or any other technique like, for example, the generation of fuzzy equivalence classes from a fuzzy similarity relation. Bodjanova, on the other hand, generates fuzzy partitions using unions and intersections of the fuzzy features measured in the data. This way of resolving a rough set analysis of fuzzy data through degrees of inclusion of fuzzy sets can also be considered to be an extension of Ziarko's Variable Precision Rough Set model.

(3) Approach based on α-levels of fuzzy sets (Yao and Nakamura). First Nakamura [24,25], and later on Yao [49], proposed a rough set analysis of fuzzy data through the application of crisp rough set theory to the α-levels of a fuzzy similarity relation obtained from this data. The computational complexity of this approach increases with the level of resolution with which the α-levels of the fuzzy similarity relation are formed.
In practice, all three approaches can be applied to our type of data, as long as a fuzzy similarity relation or a fuzzy partition is constructed from it. Another question is how we define our concept of similarity, and whether some kind of transitivity (T-transitivity) should be required or not from the fuzzy similarity relation to be constructed. If this is the case, then Eqs. (14) and (15) are good tools to deal with T-similarity relations. However, as our data is described by many different features (and each one of these features may have a different importance towards determining similarity), we would still have to face the additional problem of how these different T-similarities can be aggregated without losing the T-transitivity of the aggregated similarity relation.

4. Construction of transitive fuzzy similarity relations through the aggregation of basic relations

The last section pointed out that the implementation of any of the Fuzzy Rough Sets proposals requires the calculation, from the fuzzy data under study, of a fuzzy similarity relation or a partition of fuzzy similarity classes. As a fuzzy partition from a set of fuzzy data can be obtained using any clustering technique, we shall center our attention on the problem of constructing fuzzy similarity relations for the most general type of fuzzy data (Eq. (2)). Let us consider two samples of data given by this equation, x_i and x_j, each described by N_F features which have been measured in a fuzzy partition with N_LL components. We shall represent by S_ij^k the similarity that we measure between samples i and j using the information given by feature k. The final similarity of x_i and x_j, S_ij, will be obtained by aggregating all of the S_ij^k:

    S_ij = M(S_ij^1, ..., S_ij^k, ..., S_ij^{N_F}).   (24)

J.M. Fernández Salido, S. Murakami / Fuzzy Sets and Systems 139 (2003) 635–660

Therefore, our problem consists in establishing an appropriate way to calculate the S_ij^k and in choosing an adequate aggregation operator M(x) that is consistent with the theoretical conditions in which we want to define the concept of similarity. The only restrictions to be imposed on M(x) are that it should be a monotonic operator and that the boundary conditions M(0) = 0 and M(1) = 1 should be satisfied. The construction of fuzzy similarity relations and the mathematical treatment of the similarity concept have already been studied from different points of view, with many papers stressing the close relationship between the concepts of similarity and of metric distance [2,17,20,46]. For example, many proposals have been presented in order to measure the similarity between fuzzy values using distance or pseudodistance functions [4,6,32,48]. One common characteristic of these proposals is that they are not concerned with the issue of transitivity [45]. If we follow Slowinski and Vanderpooten's idea that reflexivity should be the only restriction imposed on the modeling of similarity, then the problem of constructing fuzzy similarity relations is not a difficult one, and can be addressed by profiting from any of these approaches or from any other distance-like function that better reflects the concept of similarity related to our application. This seems to be especially meaningful when prototype-based reasoning is being applied and we are interested in generating clusters of objects that are similar to certain prototypical samples. In these cases, the transitivity and symmetry properties need not be a requirement. However, there may be other situations in which we may want to obtain homogeneous similarity classes or granules whose members satisfy symmetry and some kind of transitive property. From a practical point of view, symmetric and transitive classes or granules are easy to distinguish, which facilitates the clustering process.
The knowledge extracted from a sample of these granules can also be applied in the same fashion to all the other granule members, depending on their degree of similarity. Accordingly, symmetry and transitivity in similarity relations can be considered desirable properties for knowledge extraction. The adoption of these requirements would also fit better in the spirit of granular computing [53]. For these reasons, in this paper we shall require that the similarities that can be calculated using only the information given by every feature k, S_ij^k, be in fact T-similarity relations for some T-norm. In order to do this, we must first face the problem of constructing transitive similarity relations from a set of data. Initially, this was done by calculating the transitive closure [50] of non-transitive fuzzy relations. However, when the number of samples is high, this method is extremely demanding computationally. A more feasible approach, the construction of F-indistinguishability operators introduced by Valverde [47], is available. The application of this method in the framework of Fuzzy Rough Set theory is also suggested by Dubois and Prade [9,10]. In Valverde's paper, T-similarity relations are denominated F-indistinguishability operators. He proposes a construction method for T-similarity relations when T is an Archimedean T-norm (a non-idempotent and continuous T-norm). This method is the subject of our following discussion, as we shall use it to construct the partial similarities, S_ij^k, based on every feature k. As is well known in T-norm theory [38], any Archimedean T-norm can be obtained from its additive generator using the following formula:

T(x, y) = g^[−1](g(x) + g(y)),    (25)

where g(x) is any continuous strictly decreasing function from [0,1] into [0,∞] such that g(1) = 0.


Here, g^[−1] represents the pseudo-inverse of additive generator g and is obtained by

g^[−1](x) = g^−1(min(x, g(0))).    (26)

Then, the quasi-inverse of T-norm T(x, y), T^∧(x|y), is defined as

T^∧(x|y) = g^[−1](g(y) − g(x)).    (27)

Based on the quasi-inverse concept, a construction method for a T-similarity relation from a set of fuzzy data is given in Valverde's representation theorem, which we present adapted to the notation used in our type of fuzzy data (Eq. (2)). If two samples x_i and x_j are considered, then feature k measured in both samples will be a vector of membership degrees to a set of NLL linguistic labels, x_ik = (μ_ik1, …, μ_ikl, …, μ_ikNLL) and x_jk = (μ_jk1, …, μ_jkl, …, μ_jkNLL). According to Valverde, the T-similarity, based on feature k, between samples x_i and x_j can be constructed in the following way:

S_ij^k(x_i, x_j) = Inf_{l ∈ [1, …, NLL]} T^∧(Max(μ_ikl, μ_jkl) | Min(μ_ikl, μ_jkl)).    (28)

If this equation is applied to some well-known T-norms, the following results are obtained:

• Minimum operator (T(x, y) = Min(x, y)):
  S_ij^k(x_i, x_j) = Inf_{l ∈ J_ij^k} Min(μ_ikl, μ_jkl) if J_ij^k ≠ ∅; 1 otherwise.

• Product operator (T(x, y) = xy):
  S_ij^k(x_i, x_j) = Inf_{l ∈ J_ij^k} (Min(μ_ikl, μ_jkl)/Max(μ_ikl, μ_jkl)) if J_ij^k ≠ ∅; 1 otherwise.

• Lukasiewicz's operator (T(x, y) = Max(x + y − 1, 0)):
  S_ij^k(x_i, x_j) = Inf_{l ∈ J_ij^k} (1 − |μ_ikl − μ_jkl|).

• Yager's family of operators (T(x, y) = 1 − Min(1, ((1 − x)^p + (1 − y)^p)^{1/p})):
  S_ij^k(x_i, x_j) = Inf_{l ∈ J_ij^k} (1 − |(1 − μ_ikl)^p − (1 − μ_jkl)^p|^{1/p}).

Here, J_ij^k = {l ∈ [1, …, NLL] | μ_ikl ≠ μ_jkl}. Looking at these expressions, we can see how an optimistic T-norm like the minimum produces a very pessimistic T-similarity operator, while a pessimistic conjunction like Lukasiewicz's bounded sum generates a less strict similarity operator. This is made more obvious with the following example. The similarity of values (0.8, 0.2) and (0.8, 0.2) is, of course, 1 (using both similarity operators). However, a slight change in membership in one of the samples, like calculating the similarity of (0.8, 0.2) and (0.81, 0.19), will bring a similarity value of 0.19 using the minimum operator, while Lukasiewicz's operator gives a result of 0.99, which seems to be closer to our concept of similarity.
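The figures in this example can be checked directly. The sketch below (our own minimal Python) implements the similarity operators induced by the minimum and Lukasiewicz T-norms for a single feature and reproduces the 0.19 and 0.99 values:

```python
def sim_min(mu_i, mu_j):
    """Valverde similarity (Eq. (28)) induced by the minimum T-norm."""
    diffs = [min(a, b) for a, b in zip(mu_i, mu_j) if a != b]
    return min(diffs) if diffs else 1.0

def sim_lukasiewicz(mu_i, mu_j):
    """Valverde similarity induced by Lukasiewicz's T-norm."""
    return min(1.0 - abs(a - b) for a, b in zip(mu_i, mu_j))

xi, xj = (0.8, 0.2), (0.81, 0.19)
assert sim_min(xi, xi) == 1.0 and sim_lukasiewicz(xi, xi) == 1.0
assert abs(sim_min(xi, xj) - 0.19) < 1e-9          # very pessimistic
assert abs(sim_lukasiewicz(xi, xj) - 0.99) < 1e-9  # closer to intuition
```
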


This example reveals the special adequacy of Lukasiewicz's conjunction for the construction of T-similarity operators with physical meaning. The importance of TL-similarity relations on physical and mathematical grounds was originally contended by Bezdek and Harris [2]. Sudkamp [44] also understands the special importance of Lukasiewicz's T-norm and goes as far as to define a fuzzy similarity relation as a Lukasiewicz similarity relation. Once we have dealt with the problem of constructing basic T-similarity relations, S_ij^k, we still have to aggregate them through an appropriate operator M(x). This operator does not have to be restricted to the domain of T-norms or T-conorms, as in many cases it makes more physical sense to use a compensated aggregation function located between the minimum and maximum. Furthermore, we may also need to introduce importance weights for the different features. If more complex feature interactions are to be modeled, then a more general approach using fuzzy measure-based aggregation can also be considered. In order to comply with our rationale of homogeneous, transitive similarity, at the time of choosing M(x) the following question must be addressed: what are the requirements that M(x) must meet so that the aggregation of T-similarity relations through Eq. (24) will result in a T-similarity relation as well? Finding such pairs (M, T) (or general conditions that (M, T) should satisfy) has been a neglected problem in Fuzzy Systems theory. As far as we know, only Ovchinnikov [30,14] and Fonck et al. [15] have addressed this problem formally. Ovchinnikov has focused much of his research on the properties of fuzzy similarity relations and the characteristics of transitivity [27–29]. He calls the aggregation procedure given by Eq. (24) a well-defined aggregation procedure if the resulting relation is also T-transitive. Therefore, the problem at hand is that of finding well-defined aggregation procedures.
The results of Ovchinnikov and of Fonck et al. can be complemented with Sudkamp's proposition [44] for the case of Lukasiewicz's T-norm. In what follows, we compile these results in the form of theorems, which may be of use for the design of aggregation operators in the framework of Fuzzy Rough Sets theory. We present them in their most general form of T-transitive relations, although they can be perfectly applied to T-similarity relations. We also introduce a new result, which we feel can be of particular utility in this problem. The proofs of these theorems can be found in the corresponding papers, while that of our new theorem is given in Appendix B.

Theorem 1 (Sudkamp [44]). The weighted average R of n TL-transitive relations, where TL stands for Lukasiewicz's T-norm,

R(x, y) = Σ_{i=1}^{n} w_i R_i(x, y),

with w_i ≥ 0 and Σ_i w_i = 1, is also a TL-transitive relation.

Theorem 2 (Fonck et al. [15]). Let f_1, …, f_n be non-decreasing mappings from [0,1] into [0,1] and R_1, …, R_n min-transitive relations. Then,

R(x, y) = Min_{j=1,…,n} f_j(R_j(x, y))

is also min-transitive.
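A small numerical illustration of Theorem 2 (our own sketch; the relations and the non-decreasing mappings below are illustrative choices, not taken from [15]):

```python
def is_min_transitive(R, tol=1e-12):
    """Check R(x,z) >= min(R(x,y), R(y,z)) for all triples of indices."""
    n = len(R)
    return all(R[x][z] >= min(R[x][y], R[y][z]) - tol
               for x in range(n) for y in range(n) for z in range(n))

# two min-transitive similarity relations
R1 = [[1.0, 0.8, 0.4],
      [0.8, 1.0, 0.4],
      [0.4, 0.4, 1.0]]
R2 = [[1.0, 0.5, 0.5],
      [0.5, 1.0, 0.9],
      [0.5, 0.9, 1.0]]
# non-decreasing mappings from [0,1] into [0,1]
f1 = lambda t: t * t
f2 = lambda t: min(1.0, 1.25 * t)

n = len(R1)
R = [[min(f1(R1[a][b]), f2(R2[a][b])) for b in range(n)] for a in range(n)]
assert is_min_transitive(R1) and is_min_transitive(R2)
assert is_min_transitive(R)   # guaranteed by Theorem 2
```
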


Theorem 3 (Fonck et al. [15]). Let R_1, …, R_n be T-transitive relations and f_1, …, f_n be non-decreasing mappings from [0,1] into [0,1] such that

f_j(T(x, y)) ≥ T(f_j(x), f_j(y))  ∀x, y ∈ [0,1];

then R = T_{j=1,…,n} f_j(R_j) is T-transitive.

Theorem 4 (Ovchinnikov [30]). Let R_1, …, R_n be T_g-transitive relations, with T_g being a T-norm with additive generator g (see Eq. (25)). Then,

R = M_g(R_1, …, R_n),

with M_g being defined as

M_g(R_1, …, R_n)(x, y) = g^−1((1/n) Σ_{i=1}^{n} g(R_i(x, y))),

is also a T_g-transitive relation.

Ovchinnikov also points out that M_g defined in this way is actually a quasi-arithmetic mean [19].

Theorem 5 (Ovchinnikov [30]). Let R_1, …, R_n be T-transitive relations. Then,

R = M(R_1, …, R_n),

with M being the arithmetic mean

M(x_1, …, x_n) = (1/n) Σ_{i=1}^{n} x_i,

is T-transitive if and only if the surface given by z = T(x, y) is convex.

This can be considered a generalization of Theorem 1 (Sudkamp), as Lukasiewicz's T-norm is convex. Another, different generalization of Sudkamp's theorem is given by Theorem 6, which we introduce in this paper.

Theorem 6 (Fernández Salido and Murakami [13]). Let R_1, …, R_n be TL-transitive relations, with TL being Lukasiewicz's T-norm. Then,

R = M(R_1, …, R_n)

is TL-transitive if and only if the De Morgan dual of M(x), defined as N(x) = 1 − M(1 − x_1, …, 1 − x_n), satisfies the following condition:

∀x, y, z ∈ [0,1]^n | z = x + y:  N(z) ≤ N(x) + N(y).    (29)

Proof in Appendix B.


The importance of our new theorem lies in the fact that Eq. (29) is actually one of the conditions that a function must satisfy to be a metric norm. Therefore, as a corollary to our theorem, all AND aggregation operators whose corresponding OR duals are metric norms are TL-transitive. Note that the norm condition only has to be satisfied inside the space [0,1]^n. As we mentioned, the relationships between similarity relations and metric spaces have been widely studied. In the same way, as Beliakov [1] shows, aggregation operations can also be interpreted in terms of distance, and some of them, like Dyckhoff's [11] generalized means operator

M(x) = ((x_1^p + ⋯ + x_n^p)/n)^{1/p},

have been derived directly from some type of distance function (in this case, Minkowski's distance). The utility of these operators and their weighted counterparts for modelling compensatory aggregation situations that fall between the maximum and minimum categories has been proven in fuzzy decision making. Theorem 6 provides us, among other things, with the capability of using the De Morgan dual of the generalized means,

M(x) = 1 − (((1 − x_1)^p + ⋯ + (1 − x_n)^p)/n)^{1/p},

and its weighted version,

M(x, w) = 1 − ((w_1(1 − x_1)^p + ⋯ + w_n(1 − x_n)^p)/(Σ_i w_i))^{1/p}  (p ∈ [1, ∞)),

as a transitive similarity aggregator in the TL framework. This is a flexible aggregator which can behave like a minimum or a mean average operator, depending on how parameter p is tuned. Its importance in Knowledge Engineering was already shown by Sanchez [37], based on the work of Salton [36]. The total similarity relation obtained through these aggregators can then be used in Eqs. (14) and (15) (using TL) to calculate the lower and upper approximations of a fuzzy set.
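The corollary can be checked numerically. The sketch below (our own Python; the random construction and parameter values are illustrative) aggregates TL-transitive similarity relations with the dual of the generalized means and verifies that TL-transitivity is preserved for several values of p:

```python
import itertools
import random

def dual_generalized_mean(values, p):
    """De Morgan dual of the generalized mean: 1 - (mean of (1-x_i)^p)^(1/p)."""
    n = len(values)
    return 1.0 - (sum((1.0 - v) ** p for v in values) / n) ** (1.0 / p)

def is_TL_transitive(R, tol=1e-9):
    """Check R(x,z) >= max(R(x,y) + R(y,z) - 1, 0) for all triples."""
    m = len(R)
    return all(R[x][z] >= max(R[x][y] + R[y][z] - 1.0, 0.0) - tol
               for x, y, z in itertools.product(range(m), repeat=3))

def random_TL_similarity(m, rng):
    """R(x,y) = 1 - |v_x - v_y| is reflexive, symmetric and TL-transitive."""
    v = [rng.random() for _ in range(m)]
    return [[1.0 - abs(v[a] - v[b]) for b in range(m)] for a in range(m)]

rng = random.Random(0)
R1, R2 = random_TL_similarity(5, rng), random_TL_similarity(5, rng)
for p in (1.0, 2.0, 4.0):
    R = [[dual_generalized_mean((R1[a][b], R2[a][b]), p) for b in range(5)]
         for a in range(5)]
    assert is_TL_transitive(R)   # Theorem 6: the dual N is a (scaled) p-norm
```
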
The application results in the next section will show how the introduction of importance weights in similarity aggregation, coupled with the tunability of these operators, can have a definite impact on improving the results of Fuzzy Rough Sets analysis.

5. An application example

As a test application, we have conducted a Rough Set analysis of the vehicle dataset, which can be downloaded as one of the Statlog project databases from the UCI Machine Learning data repository. This dataset was donated by the Turing Institute, Glasgow, Scotland, to whom we acknowledge our appreciation for making it publicly available. There are 846 instances included in the dataset, each with 18 features extracted from 2D images, viewed from different angles, of four types of vehicles: a Chevrolet van (199 examples), a double decker bus (218 examples), a Saab 9000 (217 examples), and an Opel Manta 400 (212 examples). The features measured in the images are a set of geometrical and statistical characteristics (compactness, circularity, kurtosis, etc.), whose description and formulation can be found in the documentation that accompanies the dataset. Before conducting any analysis on them, the value of every measured feature F_i has been represented as a normalized feature value F_i*, where

F_i* = (F_i − μ_i)/σ_i.

Here, μ_i represents the average value of feature F_i measured in the whole feature set. After normalization, we have fuzzified every feature and transformed it into a set of membership degrees to the attributes represented by the fuzzy partition of Fig. 1(a). As can be seen, the parameters that define these membership functions take a value that depends on σ_i, the standard


[Figure: membership functions for the labels Largely Decreased, Decreased, Normal, Increased and Largely Increased over the normalized value of feature j, with breakpoints at ±0.3σ_j, ±0.6σ_j and ±0.9σ_j in panel (a) and at ±0.15σ_j and ±0.75σ_j in panel (b).]

Fig. 1. Fuzzy partition (a) and crisp partition (b) used to express every feature's attributes.

deviation of normalized feature i in the dataset. If the dataset is to be analyzed using standard (crisp) Rough Sets theory, a corresponding crisp partition, as shown in Fig. 1(b), shall be used. Our purpose with this application study is to compare the behavior of different Rough Set analysis techniques on this set of data. With this, we intend to evaluate the theoretical contributions proposed in this paper. In particular, we shall focus our study on the following techniques:

(1) Crisp Technique 1: original Rough Sets proposal by Pawlak (Eqs. (3) and (4)).
(2) Fuzzy Technique 1: Fuzzy Rough Sets proposal by Dubois and Prade (Eqs. (12) and (13)). All the similarity relations for the features under consideration are aggregated using the minimum operator.
(3) Fuzzy Technique 2: Fuzzy Rough Sets proposal by Dubois and Prade (Eqs. (12) and (13)). All the similarity relations for the features under consideration are aggregated using the De Morgan dual of the weighted generalized means operator.
(4) Fuzzy Technique 3: Fuzzy Rough Sets proposal by Dubois and Prade with the application of β-precision maximum and minimum operators (Eqs. (22) and (23)). All the similarity relations for the features under consideration are aggregated using the De Morgan dual of the weighted generalized means operator.

Our objective with this problem will not be so much the formation of a set of rules with the ability to classify our complete dataset but, rather, the comparison of the characteristics of the different granules that are formed by these techniques. To justify this objective, let us remember that Rough


Sets analysis is usually applied in classification problems with the following purposes:

• Extracting a set of certain classification rules by examining the granules that constitute the lower approximation of a partition of sets. Here, every granule can be considered to represent a rule.
• Extracting a set of possible classification rules by examining the granules that constitute the upper approximation of a partition of sets. Again, every granule represents a rule.
• Reducing the number of considered features to an optimum set. An optimum feature set could be considered to be a reduced feature set whose use does not imply that the cardinality of the lower approximation of the classification partition decreases too much or that the cardinality of this partition's upper approximation increases excessively.

In this last item, the concern with the cardinalities of the lower and upper approximations of a set results from the fact that, when fewer features are considered, the granules that constitute a set's approximations will become larger, forming a less refined partition. For example, the classification rules derived from the granules of the lower approximation of a set will adopt a simpler form (there are fewer granules) but, at the same time, as these granules become larger, it will be more difficult to fit them in the lower approximation of a set. The cardinality of this lower approximation would decrease, with fewer samples belonging with certainty to it. An opposite effect would be experienced by the cardinality of the upper approximation of a set. Both these consequences can be measured numerically as a decrease in the quality or accuracy of the classification (Eqs. (5) and (6)). In our comparative analysis, our main interest will be to reduce our original number of 18 features to a maximum of six features.
At the same time, we want to produce granules in the lower approximation of the four class sets to be classified (Double Decker Bus, Chevrolet Van, Saab 9000, Opel Manta 400) that contain a large number of samples. This will mean that the rules derived from these granules will be able to classify with certainty a relatively large number of samples. They can be added to the Knowledge Base of a Classification System without generating contradictions when they are applied to this database. In order to carry out this search, we have profited from the optimization power of genetic algorithms (GA). In the conducted experiments, a population of 101 chromosomes was allowed to evolve for 50 generations using a standard GA implementation, with a mutation rate of 0.0077 and a crossover rate of 0.77. The possible consideration of a feature is codified as a bit in the basic chromosome composed of 18 bits. As evaluation function, the total number of elements in large granules of the lower approximations of the four class sets is taken. A granule is considered to be large if its cardinality exceeds eight members. Furthermore, a maximum number of six features is included in every valid chromosome. This has been implemented by introducing a large penalizing factor in the evaluation of all chromosomes whose number of features exceeds this threshold. Under these circumstances, the following results have been measured:

Crisp Technique 1: In this case, Pawlak's original proposal has been used to calculate the lower approximation of the four classes that compose our dataset. The crisp partition of Fig. 1(b) was employed to determine the attributes taken by each sample's features, through which the indiscernibility relation matrix of the complete sample set is calculated. As explained, a GA search algorithm was applied to locate the optimum features whose application would result in large granules containing more than eight samples.
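As an aside, the feature preprocessing shared by the fuzzy techniques (normalization F* = (F − μ)/σ followed by fuzzification into the partition of Fig. 1(a)) can be sketched as below. Only the breakpoints ±0.3σ, ±0.6σ, ±0.9σ come from Fig. 1(a); the trapezoidal shapes and all names are our own assumptions:

```python
def normalize(F, mu, sigma):
    """Normalized feature value F* = (F - mu) / sigma."""
    return (F - mu) / sigma

def trapezoid(x, a, b, c, d):
    """Trapezoidal membership: 0 outside (a, d), 1 on [b, c]."""
    if x <= a or x >= d:
        return 0.0
    if x < b:
        return (x - a) / (b - a)
    if x <= c:
        return 1.0
    return (d - x) / (d - c)

# Assumed label definitions, in units of sigma; the shapes interpolate
# linearly between the breakpoints of Fig. 1(a).
LABELS = {
    "Largely Decreased": (-9e9, -9e9, -0.9, -0.6),
    "Decreased":         (-0.9, -0.6, -0.6, -0.3),
    "Normal":            (-0.6, -0.3,  0.3,  0.6),
    "Increased":         ( 0.3,  0.6,  0.6,  0.9),
    "Largely Increased": ( 0.6,  0.9,  9e9,  9e9),
}

def fuzzify(x):
    """Membership degrees of a normalized value x to the five labels."""
    return {name: trapezoid(x, *abcd) for name, abcd in LABELS.items()}

m = fuzzify(normalize(7.0, mu=5.0, sigma=4.0))     # x = 0.5
assert abs(m["Normal"] - 1/3) < 1e-9 and abs(m["Increased"] - 2/3) < 1e-9
assert abs(sum(m.values()) - 1.0) < 1e-9           # Ruspini partition of unity
```
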

Table 1. Crisp Technique 1: large granules in the four classes' lower approximation set

                              C1 (Van)  C2 (Bus)  C3 (Saab)  C4 (Opel)  Total
Number of large granules          4         3         0          0         7
Card. of all large granules      55        86         0          0       141
Card. of the largest granule     19        38         0          0        38

Table 2. Fuzzy Technique 1: large granules in the four classes' lower approximation set

                              C1 (Van)  C2 (Bus)  C3 (Saab)  C4 (Opel)  Total
Number of large granules          7         5         0          0        12
Card. of all large granules    106.63     71.20      0          0       177.83
Card. of the largest granule    25.69     20.91      0          0        25.69

Table 1 shows the results of this search. In this case, the number of features was reduced by the GA method to 4. Using these features, only 55 samples were included in large granules of Class 1's lower approximation set, while 38 samples were included in large granules of Class 2's lower approximation set. None of the other classes had any large granules in their lower approximation set. Therefore, the representational ability of the rules extracted from such granules would be 19.09% (38 samples out of 199) for Class 1 (four granules, or four rules), and 25.22% (55 samples out of 218) for Class 2 (three granules, or three rules). With this feature set, no rules could be extracted that could represent large numbers of Classes 3 and 4. Although the representational power of the extracted rules is not great, it must be stressed that these are certain rules that produce no contradictions in the classification of the dataset. Therefore, they can be added to any other learning scheme more apt to handle noisy data without risking a decrease in classification accuracy.

Fuzzy Technique 1: In this case, the measured features in all samples have been fuzzified using the fuzzy partition of Fig. 1(a), and TL-similarity relations in the dataset have been calculated for all 18 features. As always, TL stands for Lukasiewicz's T-norm. The similarity relation for this T-norm has been calculated through Valverde's proposal (Section 4). Finally, these similarity relations have been aggregated through the minimum operator. As in the crisp case, a GA search has been performed to look for a set of reduced features that would result in large granules of the classification partition's lower approximation set. In order to calculate the cardinalities of the formed granules, a clustering process was carried out. For a sample to be considered included in a cluster or granule, a threshold value of 0.2 was imposed on its similarity value to the cluster center, as well as on its degree of membership to the lower approximation set. Once these granules were formed, the cardinality of every fuzzy granule was calculated using Zadeh's Σ-count [52]. Under these conditions, the GA optimization method reduced the number of considered features to 5. Looking at the final results presented in Table 2, the representational ability of the rules


Table 3. Fuzzy Technique 2: large granules in the four classes' lower approximation set

                              C1 (Van)  C2 (Bus)  C3 (Saab)  C4 (Opel)  Total
Number of large granules          4         4         4          3        15
Card. of all large granules     89.83    108.16     68.81      78.45    345.25
Card. of the largest granule    26.31     43.02     21.70      34.48     43.02

extracted from the granules is 53.58% (106.63 samples out of 199) for Class 1 (seven granules, or seven rules), and 32.66% (71.20 samples out of 218) for Class 2 (five granules, or five rules). Again, no large granules could be obtained for Classes 3 and 4. Compared to the crisp case, the representational power of the extracted rules has increased. One reason for this is the fact that fuzzification alleviates to some extent the effect on classification contradictions that the discretization of the features' attributes in a partition (Fig. 1(b)) can have. However, the overlapping membership of some samples to different fuzzy granules must also be accounted for in the increase of these clusters' cardinalities.

Fuzzy Technique 2: This Fuzzy Rough Sets analysis has been carried out in the same conditions as Fuzzy Technique 1, except for the fact that the dual of Pedrycz's generalized means has been used as an AND-like operator to aggregate the similarity relations that were calculated for every feature. For this, the values of the weights and of the p parameter included in this operator have also been codified in every chromosome (five bits for every weight and eight bits for the p parameter). Using a tunable weighted aggregation operator makes the aggregation process of similarity relations much more flexible when searching for the feature set that would produce the largest granules. Moreover, as, according to our proposed Theorem 6 (Section 4), these aggregated relations are TL-transitive, the clustering process necessary to analyze the formed granules is greatly facilitated. Table 3 shows how this additional flexibility in the aggregation of similarity relations has resulted in the formation of a greater number of large granules. Now, using six features, the large granules of Class 1 have a representational ability of 45.14%. For Class 2, it would be 49.61%. Furthermore, rules to classify Classes 3 and 4 have also been produced, with 31.80% and 37% representational ability. The improvement obtained by using this aggregator, compared to the use of the minimum operator in Fuzzy Technique 1, is notable.

Fuzzy Technique 3: Here, the conditions of Fuzzy Technique 2 are relaxed one step further with the introduction of β-precision operators in Dubois and Prade's Fuzzy Rough Sets analysis. Eqs. (22) and (23) have been applied for the calculations of the lower and upper approximations of our classification partition. Setting β's value to 0.98, the precision with which these approximations are computed is relaxed. This results, as can be appreciated in Table 4, in a significant increase of the cardinalities of large lower approximation set granules, in comparison to Fuzzy Technique 2. In particular, using six features, the representational ability for the four classes has risen to 62.23%, 65.9%, 62.69% and 65.76%. This significant improvement comes with the price that these granules

Table 4. Fuzzy Technique 3: large granules in the four classes' lower approximation set

                              C1 (Van)  C2 (Bus)  C3 (Saab)  C4 (Opel)  Total
Number of large granules          3         5         3          4        15
Card. of all large granules    123.85    143.67    136.05     139.42    543.03
Card. of the largest granule    62.71     41.86     77.34      67.98     77.34

do not represent classification rules with 100% certainty. The application of these rules is expected to produce 2% of classification errors. With these experimental results, we hope to have provided support for our conclusion that the selection of adequate aggregators for fuzzy similarity relations that maintain some type of transitivity can greatly improve the results of rough set analysis of fuzzy data. At the same time, the introduction of β-precision aggregations can be an optimal way to relax the strictness with which lower and upper approximation sets are calculated, along the lines of the Variable Precision Rough Sets approach.

6. Conclusions

In this paper, we have presented and compiled the tools and theories that are necessary for a rough set analysis of a general type of fuzzy data. In order to deal with the practical circumstances of our problem, we have focused our novel contributions on two areas. First, we have developed the concept of β-precision aggregation, which we have used to introduce Ziarko's Variable Precision Rough Sets approach in the fuzzy domain. The experimental analysis that we have conducted has strengthened our intuition that variable precision is a necessary consideration for Rough Sets analysis of large databases. We have also realized the importance of constructing homogeneous granules in which reflexivity, symmetry and a certain transitivity property are satisfied. For this, and as suggested in Dubois and Prade's papers [9,10], we have considered the application of T-similarity relations in the Fuzzy Rough Sets framework. We have presented the studies of other authors on the aggregation of these relations and have also developed a new theorem that allows us to use the dual of the generalized means as an aggregation operator in this context. Through the application example on the Turing Institute's vehicle database, we believe to have shown the special suitability of both proposals in this field.
We also hope that this paper will serve to stimulate new research on the problem of aggregating T-transitive relations. We feel that this issue has been particularly ignored, even though it can be of great importance in many applications in which the similarity concept plays an important role.

Appendix A. Definition of β-precision quasi-T-norms and β-precision quasi-T-conorms

In this appendix, the definitions of β-precision quasi-T-norms and quasi-T-conorms are given, which are used in our suggested extension of Dubois and Prade's Fuzzy Rough Sets proposal to Ziarko's


Variable Precision Rough Set Model (Section 3). A more complete examination of the concept of β-precision aggregation can be found in [13]. The terms quasi-T-norm and quasi-T-conorm were adopted to express the idea that, for high values of β and/or low cardinalities, these operators behave like standard T-norms and T-conorms. However, as can be deduced from the following definitions, when the cardinality of the application increases or a lower value for β is set, only monotonicity and the basic boundary conditions (T(0, …, 0) = 0, T(1, …, 1) = 1) are satisfied.

Definition A.1 (β-precision quasi-T-norm). Let T be a T-norm operator, T: I² → I, which, applying the associative property, can be extended to the N-dimensional case, T: I^N → I. Its corresponding β-precision quasi-T-norm T_β (with β ∈ [0,1]) shall be a mapping T_β: I^N → I such that, for every x = (x_1, x_2, …, x_N) ∈ I^N expressed in descending order (∀i, j ∈ [1, …, N], (i > j) ⇒ (x_i ≤ x_j)),

T_β(x) = T(x_1, …, x_{N−n})

with

n = max_k {k ∈ [0, 1, 2, …, N] : k ≤ (Σ_{i=1}^{N} x_i)(1 − β)}.

Definition A.2 (β-precision quasi-T-conorm). Let S be a T-conorm operator, S: I² → I, which, applying the associative property, can be extended to the N-dimensional case, S: I^N → I. Its corresponding β-precision quasi-T-conorm S_β (with β ∈ [0,1]) shall be a mapping S_β: I^N → I such that, for every x = (x_1, x_2, …, x_N) ∈ I^N expressed in ascending order (∀i, j ∈ [1, …, N], (i ≤ j) ⇒ (x_i ≤ x_j)),

S_β(x) = S(x_1, …, x_{N−n})

with

n = max_k {k ∈ [0, 1, 2, …, N] : k ≤ (Σ_{i=1}^{N} (1 − x_i))(1 − β)}.

Although these are the default definitions, in [13] we also showed how other adequate expressions for this concept could be defined in the same fashion. For this reason, we call the operators defined by the formulas above Type I β-precision quasi-T-norms. For instance, another, less conservative definition, Type II quasi-T-norms and quasi-T-conorms, would be obtained if the number of discarded elements were calculated in the simpler form:

n = max_k {k ∈ [0, 1, 2, …, N] : k ≤ N(1 − β)}.
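The Type II variant, whose discarded count n = ⌊N(1 − β)⌋ is straightforward to compute, can be sketched as follows (our own Python; `beta_precision_T` is a hypothetical name):

```python
import math
from functools import reduce

def TL(x, y):
    """Lukasiewicz T-norm."""
    return max(x + y - 1.0, 0.0)

def beta_precision_T(values, beta, T=min):
    """Type II beta-precision quasi-T-norm: order the N operands descending
    and discard the n = floor(N * (1 - beta)) smallest before applying T."""
    xs = sorted(values, reverse=True)
    n = math.floor(len(xs) * (1.0 - beta))
    kept = xs[:len(xs) - n] or xs[:1]   # keep at least one operand
    return reduce(T, kept)

# with beta = 0.98 and 100 operands, the 2 smallest (noisy) values are ignored
data = [1.0] * 98 + [0.0, 0.1]
assert beta_precision_T(data, beta=0.98, T=min) == 1.0   # noise discarded
assert beta_precision_T(data, beta=1.0, T=min) == 0.0    # ordinary minimum
```
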

Appendix B. Proof of Theorem 6

Proof. Let R_1, R_2, …, R_n be n TL-transitive relations, and let M(x) be a monotonic aggregation operator and N(x) its De Morgan dual, N(x) = 1 − M(1 − x_1, …, 1 − x_n).


J.M. Fern)andez Salido, S. Murakami / Fuzzy Sets and Systems 139 (2003) 635 – 660

First, we shall prove that if N(x) satisfies the norm condition

∀x, y, z ∈ [0, 1]^n | z = x + y:   N(z) ≤ N(x) + N(y),

then M(R_1, …, R_n) is a T_L-transitive relation. Let us suppose that N(x) satisfies the norm condition. Then, in order to show the T_L-transitivity of M(R_1, …, R_n), we have to demonstrate the following proposition:

M(R_1(x, z), …, R_n(x, z)) ≥ T_L(M(R_1(x, y), …, R_n(x, y)), M(R_1(y, z), …, R_n(y, z))).

For simplicity, we shall adopt the following symbols:

a_1 = R_1(x, z), …, a_n = R_n(x, z),
b_1 = R_1(x, y), …, b_n = R_n(x, y),
c_1 = R_1(y, z), …, c_n = R_n(y, z).

Using the new notation, the proposition to be proved becomes

M(a_1, …, a_n) ≥ T_L(M(b_1, …, b_n), M(c_1, …, c_n)).

Substituting T_L by its actual formula,

M(a_1, …, a_n) ≥ max(M(b_1, …, b_n) + M(c_1, …, c_n) − 1, 0)

is obtained. M(a_1, …, a_n) is always ≥ 0. Therefore, only

M(a_1, …, a_n) ≥ M(b_1, …, b_n) + M(c_1, …, c_n) − 1

has to be proved. As M(x) = 1 − N(1 − x_1, …, 1 − x_n), this expression becomes

1 − N(1 − a_1, …, 1 − a_n) ≥ 1 − N(1 − b_1, …, 1 − b_n) + 1 − N(1 − c_1, …, 1 − c_n) − 1,

which can be transformed to

N(1 − a_1, …, 1 − a_n) ≤ N(1 − b_1, …, 1 − b_n) + N(1 − c_1, …, 1 − c_n).     (B.1)

As it has been assumed that N(x) satisfies the norm condition, the following statement is always true:

N(2 − b_1 − c_1, …, 2 − b_n − c_n) ≤ N(1 − b_1, …, 1 − b_n) + N(1 − c_1, …, 1 − c_n).


Therefore, Eq. (B.1) will be satisfied if

N(1 − a_1, …, 1 − a_n) ≤ N(2 − b_1 − c_1, …, 2 − b_n − c_n)

is proved. As N(x) is a monotonic operator, this statement is true if the following inequalities can be demonstrated:

1 − a_1 ≤ 2 − b_1 − c_1, …, 1 − a_n ≤ 2 − b_n − c_n,

which can be transformed to

a_1 ≥ b_1 + c_1 − 1, …, a_n ≥ b_n + c_n − 1.

If we substitute back these symbols with their original form, these expressions become

R_1(x, z) ≥ R_1(x, y) + R_1(y, z) − 1, …, R_n(x, z) ≥ R_n(x, y) + R_n(y, z) − 1,

which will always be true, as all the R_i are T_L-transitive. Therefore, our proposition has been demonstrated.

We shall now prove that, conversely, M(R_1, …, R_n) being a T_L-transitive relation implies that ∀x, y, z ∈ [0, 1]^n | z = x + y, N(z) ≤ N(x) + N(y). We shall demonstrate this by contradiction. Let us first suppose that, although the relation M(R_1, …, R_n) is T_L-transitive, there are two particular vectors a_0 = (a_10, …, a_n0) and b_0 = (b_10, …, b_n0) such that a_0 + b_0 ∈ [0, 1]^n and

N(a_0) + N(b_0) < N(a_0 + b_0).

Introducing De Morgan's law, N(x) = 1 − M(1 − x_1, …, 1 − x_n), this expression is transformed to

M(1 − a_10, …, 1 − a_n0) + M(1 − b_10, …, 1 − b_n0) − 1 > M(1 − a_10 − b_10, …, 1 − a_n0 − b_n0),

or, what is equivalent,

T_L(M(1 − a_10, …, 1 − a_n0), M(1 − b_10, …, 1 − b_n0)) > M(1 − a_10 − b_10, …, 1 − a_n0 − b_n0).     (B.2)


Then, as a_0 + b_0 ∈ [0, 1]^n, a vector d_0 = (d_10, …, d_n0) can be defined such that the following conditions are satisfied:

d_10 + a_10 + b_10 ≤ 1, …, d_n0 + a_n0 + b_n0 ≤ 1.

Two other vectors, e_0 = (e_10, …, e_n0) and f_0 = (f_10, …, f_n0), can also be defined in the following way:

e_10 = d_10 + a_10, …, e_n0 = d_n0 + a_n0,
f_10 = d_10 + a_10 + b_10, …, f_n0 = d_n0 + a_n0 + b_n0.

It is easy to see that d_0, e_0, f_0 ∈ [0, 1]^n. Looking back at Valverde's representation theorem for indistinguishability operators, we recall that the T_L-similarity relation for two fuzzy values u, v ∈ [0, 1] can be calculated as

R_L(u, v) = 1 − |u − v|.

Introducing this concept with our predefined d_0, e_0, f_0 in Eq. (B.2), and taking as the aggregated relations R_i(x, y) = R_L(x_i, y_i), each of which is T_L-transitive, the expression

T_L(M(R_1(d_0, e_0), …, R_n(d_0, e_0)), M(R_1(e_0, f_0), …, R_n(e_0, f_0))) > M(R_1(d_0, f_0), …, R_n(d_0, f_0))

is obtained. Therefore, we have found three vectors, d_0, e_0 and f_0, for which the T_L-transitivity condition of M(R_1, …, R_n) is not satisfied, which contradicts our initial hypothesis.

References

[1] G. Beliakov, Definition of general aggregation operators through similarity relations, Fuzzy Sets and Systems 114 (2000) 437–453.
[2] J.C. Bezdek, J.D. Harris, Fuzzy partitions and relations: an axiomatic basis for clustering, Fuzzy Sets and Systems 1 (1978) 111–127.
[3] S. Bodjanova, Approximation of fuzzy concepts in decision making, Fuzzy Sets and Systems 85 (1997) 23–29.
[4] B. Bouchon-Meunier, M. Rifqi, S. Bothorel, Towards general measures of comparison of objects, Fuzzy Sets and Systems 84 (1996) 143–153.
[5] G. Cattaneo, Fuzzy extension of rough set theory, in: L. Polkowski, A. Skowron (Eds.), Proc. 1st Int. Conf. on Rough Sets and Current Trends in Computing, RSCTC'98, Warsaw, Poland, 1998, pp. 275–282.
[6] S.M. Chen, M.-S. Yeh, P.-Y. Hsiao, A comparison of similarity measures of fuzzy values, Fuzzy Sets and Systems 72 (1995) 79–89.
[7] D. Dubois, H. Prade, Fuzzy Sets and Systems: Theory and Applications, Academic Press, New York, 1980.
[8] D. Dubois, H. Prade, Possibility Theory, Plenum Press, New York, 1988.
[9] D. Dubois, H. Prade, Rough fuzzy sets and fuzzy rough sets, Internat. J. Gen. Systems 17 (1990) 191–209.
[10] D. Dubois, H. Prade, Putting rough sets and fuzzy sets together, in: R. Slowinski (Ed.), Intelligent Decision Support: Handbook of Applications and Advances of the Rough Sets Theory, Kluwer, Dordrecht, The Netherlands, 1992, pp. 203–222.
[11] H. Dyckhoff, W. Pedrycz, Generalized means as model of compensative connectives, Fuzzy Sets and Systems 14 (1984) 143–154.


[12] L. Fariñas del Cerro, H. Prade, Rough sets, twofold fuzzy sets and modal logic: fuzziness in indiscernibility and partial information, in: A. di Nola, A.G.S. Ventre (Eds.), The Mathematics of Fuzzy Systems, Verlag TÜV Rheinland, Köln, Germany, 1986, pp. 103–120.
[13] J.M. Fernández Salido, S. Murakami, On β-precision aggregation, Fuzzy Sets and Systems 139 (2003), this issue.
[14] J.C. Fodor, S. Ovchinnikov, On aggregation of T-transitive fuzzy binary relations, Fuzzy Sets and Systems 72 (1995) 135–145.
[15] P. Fonck, J. Fodor, M. Roubens, An application of aggregation procedures to the definition of measures of similarities between fuzzy sets, Fuzzy Sets and Systems 97 (1998) 67–74.
[16] S. Greco, B. Matarazzo, R. Slowinski, Fuzzy similarity relations as a basis for rough approximations, in: L. Polkowski, A. Skowron (Eds.), Proc. 1st Int. Conf. on Rough Sets and Current Trends in Computing, RSCTC'98, Warsaw, Poland, 1998, pp. 283–289.
[17] J. Jacas, L. Valverde, On fuzzy relations, metrics and cluster analysis, in: M. Delgado, J.L. Verdegay (Eds.), Approximate Reasoning Tools for Artificial Intelligence, Verlag TÜV Rheinland, Köln, Germany, 1990, pp. 21–38.
[18] J.D. Katzberg, W. Ziarko, Variable precision extension of rough sets, Fund. Inform. 27 (1996) 155–168.
[19] A.N. Kolmogorov, Sur la notion de la moyenne, Rend. Accad. Lincei 12 (1930) 388–391.
[20] C.L. Krumhansl, Concerning the applicability of geometric models to similarity data: the interrelationship between similarity and spatial density, Psychol. Rev. 85 (5) (1978) 445–463.
[21] L.I. Kuncheva, Fuzzy rough sets: application to feature extraction, Fuzzy Sets and Systems 51 (1992) 147–153.
[22] T.Y. Lin, Topological and fuzzy rough sets, in: R. Slowinski (Ed.), Intelligent Decision Support: Handbook of Applications and Advances of the Rough Sets Theory, Kluwer, Dordrecht, The Netherlands, 1992, pp. 287–304.
[23] N.N. Morsi, M.M. Yakout, Axiomatics for fuzzy rough sets, Fuzzy Sets and Systems 100 (1998) 327–342.
[24] A. Nakamura, Application of fuzzy-rough classifications to logics, in: R. Slowinski (Ed.), Intelligent Decision Support: Handbook of Applications and Advances of the Rough Sets Theory, Kluwer, Dordrecht, The Netherlands, 1992, pp. 233–250.
[25] A. Nakamura, J.M. Gao, A logic for fuzzy data analysis, Fuzzy Sets and Systems 39 (1992) 127–132.
[26] S. Nanda, S. Majumdar, Fuzzy rough sets, Fuzzy Sets and Systems 45 (1992) 157–160.
[27] S. Ovchinnikov, Structure of fuzzy binary relations, Fuzzy Sets and Systems 6 (1981) 169–195.
[28] S. Ovchinnikov, On the transitivity property, Fuzzy Sets and Systems 20 (1986) 241–243.
[29] S. Ovchinnikov, Similarity relations, fuzzy partitions, and fuzzy orderings, Fuzzy Sets and Systems 40 (1991) 107–126.
[30] S. Ovchinnikov, Aggregating transitive fuzzy binary relations, Internat. J. Uncertainty Fuzziness Knowledge-Based Systems 3 (1) (1995) 47–55.
[31] S.K. Pal, A. Skowron (Eds.), Rough Fuzzy Hybridization: A New Trend in Decision-Making, Springer, Singapore, 1999.
[32] C.P. Pappis, N.I. Karacapilidis, A comparative assessment of measures of similarities of fuzzy values, Fuzzy Sets and Systems 56 (1993) 171–174.
[33] Z. Pawlak, Rough sets and fuzzy sets, Fuzzy Sets and Systems 17 (1985) 99–102.
[34] Z. Pawlak, Rough Sets: Theoretical Aspects of Reasoning about Data, Kluwer, Dordrecht, The Netherlands, 1991.
[35] A.M. Radzikowska, E.E. Kerre, A comparative study of fuzzy rough sets, Fuzzy Sets and Systems 126 (2002) 137–155.
[36] G. Salton, E.A. Fox, H. Wu, Extended Boolean information retrieval, Commun. ACM 26 (1983) 1022–1036.
[37] E. Sanchez, Importance in knowledge systems, Inform. Systems 14 (6) (1989) 455–464.
[38] B. Schweizer, A. Sklar, Probabilistic Metric Spaces, Elsevier Science, New York, 1983.
[39] G. Shafer, A Mathematical Theory of Evidence, Princeton University Press, Princeton, 1976.
[40] K. Slowinski, R. Slowinski, Sensitivity analysis of rough classification, Internat. J. Man-Mach. Studies 32 (1990) 693–705.
[41] R. Slowinski, J. Stefanowski, Rough-set reasoning about uncertain data, Fund. Inform. 27 (1996) 229–243.
[42] R. Slowinski, D. Vanderpooten, Similarity relations as a basis for rough approximations, in: P.P. Wang (Ed.), Advances in Machine Intelligence and Soft Computing, Bookwrights, Raleigh, NC, 1997, pp. 17–33.
[43] R. Slowinski, D. Vanderpooten, A generalized definition of rough approximation based on similarity, IEEE Trans. Knowledge Data Eng. 12 (2000) 331–336.


[44] T. Sudkamp, Similarity, interpolation, and fuzzy rule construction, Fuzzy Sets and Systems 58 (1993) 73–86.
[45] Y.A. Tolias, S.M. Panas, L.H. Tsoukalas, Generalized fuzzy indices for similarity matching, Fuzzy Sets and Systems 120 (2001) 255–277.
[46] A. Tversky, Features of similarity, Psychol. Rev. 84 (1977) 327–352.
[47] L. Valverde, On the structure of F-indistinguishability operators, Fuzzy Sets and Systems 17 (1985) 313–328.
[48] W.-J. Wang, New similarity measures on fuzzy sets and on elements, Fuzzy Sets and Systems 85 (1997) 305–309.
[49] Y. Yao, Combination of rough and fuzzy sets based on α-level sets, in: T.Y. Lin, N. Cercone (Eds.), Rough Sets and Data Mining, Kluwer, Dordrecht, The Netherlands, 1997, pp. 301–321.
[50] L.A. Zadeh, Similarity relations and fuzzy orderings, Inform. Sci. 3 (1971) 177–200.
[51] L.A. Zadeh, Fuzzy sets as a basis for a theory of possibility, Fuzzy Sets and Systems 1 (1978) 3–28.
[52] L.A. Zadeh, A computational approach to fuzzy quantifiers in natural languages, Comput. Math. Appl. 9 (1983) 149–184.
[53] L.A. Zadeh, Toward a theory of fuzzy information granulation and its centrality in human reasoning and fuzzy logic, Fuzzy Sets and Systems 90 (2) (1997) 111–127.
[54] W. Ziarko, Variable precision rough set model, J. Comput. System Sci. 46 (1993) 39–59.