Available online at www.sciencedirect.com
European Journal of Operational Research 193 (2009) 520–529 www.elsevier.com/locate/ejor
Computing, Artificial Intelligence and Information Management
Statistically grounded logic operators in fuzzy sets
W. Pedrycz *
Department of Electrical and Computer Engineering, University of Alberta, Edmonton, AB, Canada T6R 2G7
Warsaw School of Information Technology, Newelska 6, Warsaw, Poland
Received 22 September 2006; accepted 5 December 2007
Available online 29 January 2008
Abstract

In this study, we are concerned with a category of logic operators for fuzzy sets that is inherently associated with underlying statistical properties of membership grades they operate on. These constructs are referred to as statistically grounded logic operators, namely statistically grounded OR (SOR, for short) and statistically grounded AND (SAND, for short) operators. In essence, they arise as a solution to the optimization problem of the following form:

$$\arg\min_{m \in [0,1]} \sum_{k=1}^{N} w(x_k)\,|x_k - m|,$$
where $\{x_k\}$, $k = 1, 2, \ldots, N$, are the membership grades to be combined. The weight function $w: [0, 1] \to [0, 1]$ captures the nature of the logic "and" and "or" aggregation, respectively. We show that the weight function could be induced by t-norms and t-conorms. The weight functions could also be implied by some statistical characteristics of data (membership grades). The choice of the t-norm (t-conorm) depends upon the predefined form of the logic operators to be developed. We demonstrate that SANDs and SORs offer an efficient operational framework for constructing fuzzy rough sets, lead to increased sensitivity in computing possibility and necessity measures, bring new insight into fuzzy relational equations, and deliver an interpretation vehicle for fuzzy clustering (provided in the form of dependency analysis).
© 2007 Elsevier B.V. All rights reserved.

Keywords: Statistically grounded logic operators; Triangular norms; Statistical evidence; Median; Aggregation; Possibility and necessity measure; Fuzzy rough sets; Fuzzy clustering; Dependency analysis; Inter cluster relationships
* Present address: Warsaw School of Information Technology, Newelska 6, Warsaw, Poland. E-mail address: [email protected]
0377-2217/$ - see front matter © 2007 Elsevier B.V. All rights reserved. doi:10.1016/j.ejor.2007.12.009

1. Introductory notes

There is a wealth of various formal models of logic connectives in fuzzy sets. Alluding to the semantics of fuzzy sets and their fundamental logic operators, one can refer to such evident accomplishments in the area as t-norms and t-conorms [11–13,24], compensative operators [33], aggregative operators [2,4,7,25,27], ordered weighted operators, OWA [26], uninorms [8,28], and nullnorms [3]. Each of these categories provides additional functionality and in this way offers a highly desirable flexibility to cope with the existing diversity of problems in which fuzzy sets are used. We have come a long way since the introduction of the original lattice (min and max) operators on fuzzy sets. In spite of the progress witnessed in the area, all the pursuits have been predominantly (if not exclusively) motivated by algebraic and logical underpinnings. Surprisingly, far less attention has been paid to the properties of logic operators and their developments pertinent to the handling of numeric experimental data (and membership grades), cf. [10,33]. These issues are crucial given the need for fostering more advanced and effective techniques of fuzzy modeling. Further advancements in the development of fuzzy systems, along with their further applications, have posed significant modeling challenges both at the conceptual and at the optimization end. To address them, there is a definite need for more advanced and computationally plausible logic operators.
W. Pedrycz / European Journal of Operational Research 193 (2009) 520–529
In particular, parameterized versions of logic operators are of interest as they bring to system modeling the highly desirable flexibility that becomes a genuine necessity when dealing with experimental data. In spite of the number of accomplishments concerning the fabric of logic operators, there are still open questions that deserve careful attention. These concern the non-pointwise (localized) nature of fuzzy set connectives, cf. [17], and a carefully organized mechanism for incorporating statistical evidence into logic operators.

The problem we are interested in relates to the following scenario. Let us consider a finite collection of membership grades, denoted here by $x_1, x_2, \ldots, x_N \in [0, 1]$, that are subject to some "and" or "or" aggregation. If the number of arguments (N) involved in either aggregation increases, then the results of the operations $x_1$ and $x_2$ and ... and $x_N$, and $x_1$ or $x_2$ or ... or $x_N$, with the "and" and "or" operators implemented with the use of Archimedean t-norms and t-conorms, quite quickly tend to zero and one, respectively. The only exception is the idempotent minimum and maximum operators. Nevertheless, they come with a certain drawback, namely a complete lack of interactivity: no matter how the membership grades are distributed, we rely exclusively on the minimal or maximal value in the entire set and completely ignore the remaining components. Their distribution, which could be quite crucial, is neglected in the construct of the logic operator.

We encounter a number of constructs in which a significant number of arguments is involved in computing with fuzzy sets. Here we offer just a few highly representative examples. In system modeling, we consider architectures with a significant number of inputs being membership grades associated with the corresponding fuzzy sets.
For instance, in multivariable fuzzy rule-based systems, we encounter rules of the form

– if input_1 is A_1 and input_2 is B_1 and ... and input_n is W_1 then conclusion is Z,

where A_1, B_1, etc. are fuzzy sets defined in the corresponding spaces. The activation level of such a rule is typically taken as an "and" aggregation of the activation levels (membership grades) caused by the given vector of inputs. If the dimensionality of the problem (the number of inputs) increases, then the activation level of any rule tends quite rapidly to zero, no matter how close to 1 the individual activation levels are.

Consider two fuzzy sets $A = [a_1\ a_2\ \ldots\ a_N]$ and $B = [b_1\ b_2\ \ldots\ b_N]$ defined in a certain finite space X where card(X) = N. The generalized possibility measure of A and B, Poss(A, B), is defined as follows:

$$\mathrm{Poss}(A, B) = \mathop{S}_{i=1}^{N} (a_i\, t\, b_i) = \mathop{S}_{i=1}^{N} x_i \tag{1}$$

with $x_i = a_i\, t\, b_i$, where S is a certain t-conorm taken over the successive arguments $x_1, x_2, \ldots, x_N$. The generalization is sought in terms of the use of some t-conorm in the definition of the measure. In particular, one could
consider the maximum operation, in which case we end up with the commonly encountered definition of the possibility measure. Given a large number of elements of X, we could easily end up with a possibility measure approaching values close to 1. Likewise, the generalized necessity measure, Nec(A, B), comes in the form

$$\mathrm{Nec}(A, B) = \mathop{T}_{i=1}^{N} ((1 - a_i)\, s\, b_i) = \mathop{T}_{i=1}^{N} z_i \tag{2}$$

with $z_i = (1 - a_i)\, s\, b_i$ and T being a certain t-norm computed over N arguments. In particular, one could envision here the application of the minimum operation returning the "standard" necessity measure. The aggregation completed over the N values $z_i$ leads to results that converge to zero.

The generalization of rough sets coming in the form of so-called fuzzy rough sets involves fuzzy sets described in terms of a family of sets (forming the indiscernibility relation) and leads to a description in the form of the upper and lower approximations. In essence, these approximations are computed in the form of the possibility measure (upper approximation) and the necessity measure (lower approximation). The convergence of these measures to 1 and 0 shown before fully applies to this case as well. Consequently, we may easily end up with a very "loose" description of the fuzzy rough set.

In fuzzy relational calculus we typically encounter fuzzy relational equations [5,18,19] of the form $A \circ R = B$, where A and B are fuzzy sets defined in some finite spaces X and Y, with R being a certain fuzzy relation defined in the Cartesian product of X and Y. The aggregation operator combining A and R is the s–t composition, and the max–min composition in particular. The dual type of the fuzzy relational equation uses the t–s composition operator in place of the s–t one (with the min–max composition being its very special case). Again, when the number of elements of X increases, the results of these two compositions tend to 1 and 0, respectively.
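The saturation effect described for the measures (1) and (2) can be illustrated with a minimal sketch. It assumes the product t-norm and the probabilistic sum as one Archimedean pair, and uses hypothetical data in which every membership grade equals 0.5; the function names are illustrative and do not come from the paper.

```python
from functools import reduce


def t_product(a, b):
    """Product t-norm (Archimedean)."""
    return a * b


def s_probabilistic_sum(a, b):
    """Probabilistic sum t-conorm (Archimedean)."""
    return a + b - a * b


def possibility(a, b, t, s):
    """Generalized possibility (1): s-aggregation of x_i = a_i t b_i."""
    return reduce(s, (t(ai, bi) for ai, bi in zip(a, b)))


def necessity(a, b, t, s):
    """Generalized necessity (2): t-aggregation of z_i = (1 - a_i) s b_i."""
    return reduce(t, (s(1.0 - ai, bi) for ai, bi in zip(a, b)))


# Hypothetical data: all grades equal 0.5; only the size N of the space grows.
for n in (5, 20, 100):
    a = b = [0.5] * n
    print(
        n,
        round(possibility(a, b, t_product, s_probabilistic_sum), 4),
        round(necessity(a, b, t_product, s_probabilistic_sum), 4),
        possibility(a, b, min, max),  # lattice operators: no saturation
        necessity(a, b, min, max),
    )
```

With the Archimedean pair, the possibility approaches 1 and the necessity approaches 0 as N grows, whereas the min and max operators return the same value for every N because they look only at the extreme grades.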
Being alerted by this phenomenon (and its practical implications), in which we are faced with a finite (yet quite large) population of membership grades, we take the position that the logic operators have to be revisited so that they retain a clear logic facet while their design takes advantage of the underlying experimental evidence. In this sense, we envision that such revisited constructs could benefit from being positioned at the junction of logic and the use of the available statistical evidence (results). Our ultimate objective is to introduce logic operators whose construction seamlessly embraces the logic fabric and augments it by the existing experimental evidence. Given this, we will be referring to them as statistically grounded OR (SOR) and statistically grounded AND (SAND) logic operators. In the context of our investigations, it is worth referring to the study by Greco et al. [9], who offered a view of aggregation mechanisms realized in terms of gradual rules. In this sense, their approach is complementary to the ideas developed here.
The study is organized in the following manner. We start with the introduction of the concept of the statistically grounded logic operators (Section 2), present the underlying functionality of the constructs, and show how SORs and SANDs arise as solutions to a certain optimization problem. Next, in Section 3, we show one of the direct applications of these logic connectives to the development of fuzzy rough sets. We also present how the dependencies between fuzzy clusters are quantified in terms of SANDs and SORs (Section 4). Conclusions are covered in Section 5. Throughout the paper, we adhere to the notation commonly used in fuzzy sets; in particular, "t" and "s" denote t-norms and t-conorms (s-norms), respectively.

2. The concept of statistically oriented logic connectives

The statistical support incorporated into the structure of the logic connective helps address the issues raised in the previous section. We introduce a concept of statistically augmented (directed) logic connectives by constructing a connective that takes into consideration a statistically driven aggregation with some weighting function being reflective of the nature of the underlying logic operation.

2.1. SOR logic connectives

The SOR connective is defined as follows. Denote by w(x) a monotonically non-decreasing weight function from [0, 1] to [0, 1] with the boundary condition w(1) = 1. The result of the aggregation of $x = [x_1, x_2, \ldots, x_N]$, denoted by SOR(x; w), is obtained from the minimization of the following expression (performance index) Q:

$$Q = \sum_{i=1}^{N} w(x_i)\,|x_i - y|, \qquad \min_{y} Q, \tag{3}$$

where the value of "y" minimizing the above expression is taken as the result of the operation, SOR(x, w) = y. Put differently, $\mathrm{SOR}(x, w) = \arg\min_{y \in [0,1]} \sum_{k=1}^{N} w(x_k)\,|x_k - y|$. The weight function "w" is used to model the contribution of different membership grades to the result of the aggregation. Several models of the relationship "w" are of particular interest; all of them are reflective of the "or" type of aggregation:

(a) w(x) assumes the form of a step function

$$w(x) = \begin{cases} 1, & \text{if } x \ge x_{\max}, \\ 0, & \text{otherwise}, \end{cases} \tag{4}$$

where $x_{\max}$ is the maximal value reported in x. This weight function effectively eliminates all the membership grades but the largest one. For this form of the weight function, we end up with the maximum operator, $\mathrm{SOR}(x, w) = \max(x_1, x_2, \ldots, x_N)$.

(b) w(x) is identically equal to 1, w(x) = 1. The result of the minimization of the expression

$$\sum_{i=1}^{N} |x_i - y| \tag{5}$$

is a median of x, median(x). Subsequently, SOR(x, w) = median(x). Interestingly, the result of the aggregation is a robust statistic of the membership grades involved in this operation.

We can consider different forms of weight functions. In particular, one could think of the identity function w(x) = x. There is an interesting and logically justified alternative which links the weight function with the logic operator standing behind the logic operation. In essence, the weight function can be induced by various t-conorms (s-norms) by defining w(x) in the form w(x) = x s x. In particular, for the maximum operator, we obtain the identity weight function w(x) = max(x, x) = x. For the probabilistic sum, we obtain $w(x) = x + x - x \cdot x = x(2 - x)$. For the Lukasiewicz "or" connective, the weight function is a piecewise linear relationship with a saturation region, that is, w(x) = min(1, x + x) = min(1, 2x). The plots of these three weight functions are included in Fig. 1. In general, the weight functions (which are monotonically non-decreasing and satisfy the condition w(1) = 1) occupy the region of the unit square portrayed in Fig. 2. The weight functions induced by t-conorms are subsumed by the weight functions included in Fig. 2. For all weight functions implied by t-conorms, the following inequality holds: median(x) ≤ SOR(x, w) ≤ max(x).

Fig. 1. Plots of selected weight functions induced by selected t-conorms (max, probabilistic sum, Lukasiewicz).

Fig. 2. Examples of weight functions w(u) generating SORs logic operators induced by t-conorms; all of them are localized in the shaded region of the unit square.

2.2. SAND logic connectives

The statistically grounded AND (SAND) logic connective is defined in a way analogous to the development of the SOR. Here w(x) denotes a monotonically non-increasing weight function from [0, 1] to [0, 1] with the boundary condition w(0) = 1. The result of the aggregation of $x = [x_1, x_2, \ldots, x_N]$, denoted by SAND(x; w), is obtained from the minimization of the
same expression (3) as introduced before. Thus we produce the logic operator SAND(x, w) = y with "y" being the solution to the corresponding minimization problem. As before, we can envision several models of the weight function; all of them are reflective of the "and" type of aggregation:

(a) w(x) assumes the form of a step function

$$w(x) = \begin{cases} 1, & \text{if } x \le x_{\min}, \\ 0, & \text{otherwise}, \end{cases} \tag{6}$$

where $x_{\min}$ is the minimal value in x. This weight function eliminates all the membership grades but the smallest one. For this form of the weight function, we end up with the minimum operator, $\mathrm{SAND}(x, w) = \min(x_1, x_2, \ldots, x_N)$.

(b) for w(x) identically equal to 1, w(x) = 1, SAND becomes a median, namely SAND(x, w) = median(x).

(c) more generally, the weight function is defined on the basis of some t-norm as w(x) = 1 − x t x. Depending upon the specific t-norm, we arrive at different forms of the mapping. For the minimum operator, w(x) = 1 − min(x, x) = 1 − x, which is the complement of "x". The use of the product operation leads to the expression $w(x) = 1 - x^2$. In the case of the Lukasiewicz "and" connective, one has w(x) = 1 − max(0, x + x − 1) = 1 − max(0, 2x − 1).

If we confine ourselves to monotonically non-increasing functions on [0, 1] with the boundary condition w(0) = 1, they can be illustrated as shown in Fig. 4. Note that the general inequality min(x) ≤ SAND(x, w) ≤ median(x) holds. Investigating the fundamental properties of the logic connectives, we note that the commutativity and monotonicity properties hold. The boundary condition does not hold when considered with respect to a single membership grade (which is understandable given that the operation is expressed by taking into consideration a collection of membership grades). Assuming the t-norm and t-conorm driven format of the weight function (where we have w(1) = 1 and w(0) = 0 for "or" operators, and w(0) = 1 and w(1) = 0 for "and" operators) we have
Fig. 3. Examples of weight functions used in the construction of the SAND operation (min, product, Lukasiewicz).

Fig. 4. Localization of weight functions w(u) induced by t-norms generating SANDs logic operators.
SOR(1, w) = 1, SAND(0, w) = 0. The property of associativity does not hold. This is fully justified given that the proposed operators are inherently associated with the processing of all membership grades, not just individual membership values.

There is another interesting alternative as to the choice of the weight function, which brings in some probabilistic underpinnings. Denote by F(x) the cumulative distribution function (cdf) of the values of "x". The weight function w(x) = F(x) can be used as one of the alternatives for the weight function in the SOR operator. Likewise, the complement of the cdf, w(x) = 1 − F(x), could be regarded as one of the alternatives in the realization of the SAND operator. Interestingly enough, when dealing with a uniform probability density function (in which case we have F(x) = x), the weight functions become the same as those generated by the maximum and minimum, respectively. These observations form a certain motivating factor behind the statistically inclined naming of the proposed logic operators.

A brief numeric example serves as an illustration of the concepts of the statistically grounded logic operators. The collection of membership grades to be aggregated consists of 13 values located in the unit interval:

{0.4 0.1 0.8 0.6 0.5 0.4 0.35 0.9 1.0 0.55 0.22 0.1 0.7}

The median of these membership grades is 0.50. The optimization of the SOR and SAND operators, refer to Fig. 5, leads to the following aggregation results tabulated below:
Results for SOR

t-Conorm induced weight function    Max     Lukasiewicz    Probabilistic sum
Result of aggregation (m)           0.70    0.55           0.60

Results for SAND

t-Norm induced weight function      Min     Lukasiewicz    Product
Result of aggregation (m)           0.35    0.35           0.40

Fig. 5. Performance index Q versus "m": (a) SOR and (b) SAND.

We note that all SOR values are located above the median while the SAND operators generate aggregations with values below the median. For the given data set, the specific values of the aggregation depend on the character of the weight function. Referring to Figs. 1 and 3, the monotonicity in the weight functions induced by the corresponding t-norms and t-conorms is fully reflected in the order of the aggregation results. In other words, as the weight function implied by the max function is shifted to the right in comparison with the one induced by the probabilistic sum, the result of SOR for the max is higher than the one for the SOR formed with the aid of the probabilistic sum.

There are several interesting properties of the proposed logic operators that are worth including here. First, SAND and SOR are not necessarily monotone; this depends upon the form of the introduced weight function as well as on the data themselves. It is worth stressing that the monotonicity of the logic operators studied so far may produce a highly undesired saturation effect, especially when dealing with a large number of arguments. In essence, AND-like operators tend to produce results converging to zero, while convergence to one is noted for the OR-like operators. In other words, when such a situation is encountered in fuzzy modeling, we may end up with fuzzy models for which the effectiveness of any estimation technique becomes questionable. Second, possible non-uniqueness of the result (which might happen, e.g., in the case of the median) could be resolved by accepting the midpoint of the interval containing all possible results. One has to bear in mind that when running an optimization method, it is very likely that we end up with a single solution. Furthermore, in the case of population-based optimization (such as genetic algorithms, particle swarm optimization, ant colonies or the like), the result will form a global minimum of the problem. With regard to non-uniqueness, the reader might refer to [21,22], which deal with an interesting class of mixture operators, the so-called mixture_OR and mixture_AND, described in the form $\sum_{i=1}^{n} w(x_i)\,x_i \,/\, \sum_{i=1}^{n} w(x_i)$. There is also an interesting property of duality, that is, $\mathrm{SAND}(x_1, x_2, \ldots, x_n, w_{AND}) = 1 - \mathrm{SOR}(1 - x_1, 1 - x_2, \ldots, 1 - x_n, w_{OR})$ when the weights are governed by the relationship $w_{AND}(x) = w_{OR}(1 - x)$, with $w_{AND}$ and $w_{OR}$ being the weights applied to the AND and OR type of the logic operator.

The introduced SAND and SOR operators can also help deal with the lack of discriminatory capabilities of some original concepts used in fuzzy sets. This lack of discriminatory power can be directly attributed to the realization of the logic operators. Alluding to the measures (1) and (2) used so far, no matter whether we use some t-conorm or the maximum operator, the possibility measure cannot express any difference between the case of two identical normal fuzzy sets A, where Poss(A, A) = 1, and the case of normal fuzzy sets A and B that assume the maximal membership value at the same point, where again Poss(A, B) = 1. The realization of the possibility measure proposed here clearly distinguishes between these two situations. Say, if both A and B are described by Gaussian membership functions with the same modal value and different spreads, then the possibility values depend upon the differences between these spreads.

If we consider that the membership grades to be aggregated are governed by some probability density function p(x), then the optimization problem can be expressed in the following format:
$$Q = \int_0^1 w(x)\,|x - m|\,p(x)\,dx,$$

which, for a given value $m = m_0$, reads

$$Q = \int_0^{m_0} w(x)\,(m_0 - x)\,p(x)\,dx + \int_{m_0}^{1} w(x)\,(x - m_0)\,p(x)\,dx. \tag{7}$$
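The discrete operators of this section can be sketched in a few lines. The performance index (3) is convex and piecewise linear in y with breakpoints at the membership grades, so a minimizer can always be found among the grades themselves; the sketch below simply evaluates Q at every grade. This exhaustive search is an illustrative choice and not necessarily the optimization procedure used in the paper, and the names `weighted_l1_argmin`, `W_OR`, `W_AND` and `GRADES` are hypothetical.

```python
def weighted_l1_argmin(grades, w):
    """Grade y minimizing Q(y) = sum_i w(x_i) * |x_i - y|, cf. index (3).

    Q is convex and piecewise linear with breakpoints at the grades, so it is
    enough to compare Q at the grades themselves. Ties go to the smallest
    grade (an assumption of this sketch, not the midpoint rule).
    """
    def q(y):
        return sum(w(x) * abs(x - y) for x in grades)

    return min(sorted(set(grades)), key=q)


# Weight functions w(x) = x s x induced by t-conorms (used for SOR).
W_OR = {
    "max": lambda x: max(x, x),
    "probabilistic sum": lambda x: x + x - x * x,
    "Lukasiewicz": lambda x: min(1.0, x + x),
}

# Weight functions w(x) = 1 - x t x induced by t-norms (used for SAND).
W_AND = {
    "min": lambda x: 1.0 - min(x, x),
    "product": lambda x: 1.0 - x * x,
}

# The 13 membership grades of the numeric example in this section.
GRADES = [0.4, 0.1, 0.8, 0.6, 0.5, 0.4, 0.35, 0.9, 1.0, 0.55, 0.22, 0.1, 0.7]

median = weighted_l1_argmin(GRADES, lambda x: 1.0)  # w(x) = 1 gives a median
sor = {name: weighted_l1_argmin(GRADES, w) for name, w in W_OR.items()}
sand = {name: weighted_l1_argmin(GRADES, w) for name, w in W_AND.items()}
print(median, sor, sand)
```

For these grades the sketch returns 0.5 for the constant weight, 0.7, 0.6 and 0.55 for the max-, probabilistic sum- and Lukasiewicz-induced SORs, and 0.35 and 0.4 for the min- and product-induced SANDs, so each SOR lies at or above the median and each SAND at or below it.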
3. The development of fuzzy rough sets with the aid of SANDs and SORs

As indicated in Section 1, the calculations of the possibility and necessity measures are central to the development of rough and fuzzy rough sets, cf. [6,14–16,23,29,30]. Let us recall that those are constructs forming a direct generalization of the fundamental concept of rough sets [14]. The crux of rough sets is to express a given concept in the language of a finite vocabulary of information granules (forming a certain indiscernibility relation). Typically, this description is realized in the form of so-called lower and upper approximations. Depending upon the character of the information granules involved in the above construct, we may talk about rough sets, fuzzy rough sets, rough fuzzy sets and others. We are interested in the following situation. Given is a finite collection of set-based information granules $\{B_1, B_2, \ldots, B_p\}$. We use them to express a certain fuzzy set A in terms of the individual $B_i$'s. Assume that all information granules are defined in the same finite space X, card(X) = N. The following notation is in use: $A = [a_1\ a_2\ \ldots\ a_N]$, $B_i = [b_{i1}\ b_{i2}\ \ldots\ b_{iN}]$. We follow the well-known formulas for the lower approximation $\underline{A}$ and the upper approximation $\overline{A}$ of A, cf. [13,16], where the ith entries $\underline{a}_i$ and $\overline{a}_i$ of the approximations are governed by the following expressions:

$$\underline{a}_i = \mathrm{Nec}(A, B_i) = \mathop{T}_{j=1}^{N} ((1 - a_j)\, s\, b_{ij}), \tag{8}$$

$$\overline{a}_i = \mathrm{Poss}(A, B_i) = \mathop{S}_{j=1}^{N} (a_j\, t\, b_{ij}), \tag{9}$$
i = 1, 2, ..., p. As a matter of fact, the two expressions given above are the necessity and possibility measures of A. Given the character of the information granules $B_i$, it becomes apparent that they are just the minimum and maximum of the membership function of A taken over the support of $B_i$. More specifically, we have

$$\underline{a}_i = \mathrm{Nec}(A, B_i) = \mathop{T}_{j \in \mathrm{supp}(B_i)} ((1 - a_j)\, s\, b_{ij}) \tag{10}$$

and

$$\overline{a}_i = \mathrm{Poss}(A, B_i) = \mathop{S}_{j \in \mathrm{supp}(B_i)} (a_j\, t\, b_{ij}). \tag{11}$$
Evidently, if A is a normal fuzzy set, the above formulas return very loose bounds as they tend to 0 and 1. For instance, consider that the membership function of A and the characteristic function of $B_i$ are as follows: A = [0.2 0.8 1.0 0.7 0.4 0.2 1.0] and $B_i$ = [0 1 1 1 0 0 0]. Through direct calculations, we come up with the following approximations:

– lower approximation: $\underline{a}_i$ = min(max(0.8, 0), max(0.2, 1), max(0, 1), max(0.3, 1), max(0.6, 0), max(0.8, 0), max(0, 0)) = 0.
– upper approximation: $\overline{a}_i$ = max{min(0, 0.2), min(1, 0.8), min(1, 1), min(1, 0.7), min(0, 0.4), min(0, 0.2), min(0, 1)} = 1.

They produce a very loose characterization of A that does not reflect the semantics of A. Note that the membership function of A could vary quite substantially while still yielding the same approximations. The main drawback resides in the lack of sensitivity and the inability to incorporate the statistical evidence residing within the collection of the membership grades of A. In plain words, the computations of the approximations are driven by the extreme values of the membership grades of A. While the definition of the lower and upper approximations is completely sufficient in the case of set-based information granules, its straightforward acceptance and mechanical usage when dealing with fuzzy sets reveals some evident weaknesses. The use of SANDs and SORs becomes essential to their elimination.

4. Cluster analysis with the possibility and necessity measures

It is known that fuzzy clustering offers an interesting and comprehensive insight into the structure of numeric data, cf. [1,18,20]. Fuzzy clusters form a granular representation of numeric data and therefore constitute their meaningful abstract manifestation [31,32]. Let us consider that the results of fuzzy clustering come in the form of "c" clusters built on the basis of "N" numerical data. Each cluster is described by some fuzzy set $A_i$, i = 1, 2, ..., c. As a matter of fact, we can envision that $A_i$ forms the ith row of some partition matrix being the result of the fuzzy clustering. Fuzzy clusters deliver detailed information about the structure in data. The membership grades of individual data to the individual clusters form a useful indicator of their location in the discussed structure.
If one of the membership grades is visibly dominant, we may regard the corresponding point as highly representative of the cluster. On the other hand, if the membership grades of some data point are distributed almost equally across all clusters, this sends a strong "flagging" signal as to the borderline character of this point, which may eventually trigger further analysis of its properties. In addition to the analysis outlined above, it could be equally advantageous to investigate the relationships between the clusters themselves. In essence, such a task reduces to some description of global relationships between fuzzy sets. The two typical mechanisms worth exploring in this case deal with the determination of the possibility and necessity measures. Let us recall that the possibility, Poss(Ai, Aj), i, j = 1, 2, ..., c, quantifies an extent to which two
clusters (fuzzy sets) Ai and Aj overlap. The higher the overlap is, the higher the value of the resulting possibility measure. The necessity measure, Nec(Ai, Aj), expresses an extent to which Ai is included in Aj. Notably, the inclusion relationship is asymmetric. Moreover, the higher the value of the necessity measure, the more profound the inclusion of Ai in Aj. In this sense, the possibility and necessity can be used to articulate the structural dependencies between the corresponding information granules (fuzzy clusters). Evidently, there are tangible advantages of using SAND and SOR here, given that in practice we usually encounter large numbers of data (N). The analysis of dependencies between the clusters becomes of interest in forming some global inter-relationships existing at the level of the granular findings. The developments presented here could be found useful in the design of granular (fuzzy) models. There are two main frameworks in which the relationship analysis is of interest:

(a) presentation of structural relationships between the clusters generated within a single run of the clustering algorithm. Here, as illustrated in Fig. 6, we encounter "c" clusters for which we establish and quantify pertinent dependencies. After computing the possibility values, Poss(Ai, Aj), we can represent them in the form of a certain undirected graph (possibility dependence graph) formed at the higher conceptual level. Here each node corresponds to a developed cluster while the edges of the graph are quantified by the values of the already computed possibility measures. Depending on some threshold value applied to the graph, we can extract the most essential linkages between the clusters, thus identifying the clusters that are strongly linked, Fig. 6b. Likewise, we can determine the necessity values and by doing that highlight to what extent some pairs of clusters are in the relationship of inclusion. This could be of interest in understanding whether some clusters could be represented by others.

(b) presentation of structural relationships between the clusters generated within two or more runs of the clustering algorithm. This scheme is applicable to cases when the clustering algorithm(s) is run twice or more, thus leading to several corresponding families of fuzzy clusters, Fig. 7. There could be a situation when the same clustering is run for two different numbers of clusters (say, c1 and c2), thus leading to the formation of information granules at different levels of granularity. Here the dependency analysis could be realized in terms of the necessity and possibility measures.

Fig. 6. The development of the possibility graph (a) and its thresholding leading to the visualization of the strongly linked clusters (b).

As an illustration of the concept of dependency analysis, we consider the use of the Fuzzy C-Means (FCM) applied to the Boston housing data (see http://www.ics.uci.edu/~mlearn/MLRepository.html). The values of the parameters of the FCM used in the experiment are standard: the fuzzification coefficient is set to 2 while the distance is taken as a weighted Euclidean function. The FCM was run for 60 iterations, at which point the changes in the successive partition matrices do not exceed $10^{-6}$. The t-norm was the product and the t-conorm was realized as the probabilistic sum. The experiment
Fig. 7. The dependency analysis realized for clusters built during different runs of the clustering algorithm (a), and the formation of the necessity links between information granules coming from the runs of the clustering for different numbers of clusters (b).
was carried out for c = 2, 3, 4, and 5 clusters. We also used (1) with the same t-norm and t-conorm as specified before. In all cases the results are equal to 1, so the possibility measure realized in this "standard" manner does not offer any discriminatory capabilities. In addition to the possibility values, we show the prototypes produced by the FCM.

c = 2:
Poss(A1, A2) = 0.34
prototypes:
v1 = [0.69 22.11 6.51 0.05 0.47 6.50 49.36 5.14 5.07 307.14 17.63 385.84 8.56 26.50]
v2 = [2.43 5.23 12.90 0.11 0.58 6.11 81.18 3.11 7.52 389.64 18.44 372.38 14.29 21.69]

c = 3:
Poss(A1, A2) = 0.39
Poss(A1, A3) = 0.18 Poss(A2, A3) = 0.26 v1 = [0.55 31.27 5.73 0.05 0.46 6.61 43.81 5.59 4.91 307.91 17.28 386.05 7.71 27.72] v2 = [0.84 7.51 8.41 0.06 0.51 6.26 64.63 4.14 5.21 315.70 18.28 384.58 10.87 23.58] v3 = [3.79 3.28 15.84 0.14 0.63 6.01 87.88 2.53 9.66 441.56 18.55 365.85 16.08 20.58] The strongest linkages occur for A1 and A2. On the other hand, A1 and A3 are the pair that is loosely inter-related. The strength of relationships between A2 and A3 is positioned in-between the intensity of dependencies reported for the other information granules. c=4 Poss(A1, A2) = 0.18 Poss(A1, A3) = 0.11
Fig. 8. A series of the most essential dependencies between fuzzy clusters as revealed by a series of threshold levels.
Poss(A1, A4) = 0.23
Poss(A2, A3) = 0.15
Poss(A2, A4) = 0.27
Poss(A3, A4) = 0.13
v1 = [0.38 38.29 5.20 0.04 0.45 6.66 39.98 5.90 4.63 307.87 17.02 386.17 7.17 28.34]
v2 = [0.86 5.47 10.13 0.07 0.53 6.19 74.16 3.68 5.09 332.73 18.41 380.44 12.45 22.37]
v3 = [8.01 1.63 17.48 0.15 0.68 5.85 92.37 1.99 17.34 564.66 19.24 367.45 18.48 18.66]
v4 = [0.60 9.54 7.85 0.06 0.50 6.35 59.00 4.41 4.82 302.85 18.07 384.84 9.94 24.70]
c = 5:
Poss(A1, A2) = 0.09
Poss(A1, A3) = 0.19
Poss(A1, A4) = 0.26
Poss(A1, A5) = 0.17
Poss(A2, A3) = 0.13
Poss(A2, A4) = 0.07
Poss(A2, A5) = 0.08
Poss(A3, A4) = 0.19
Poss(A3, A5) = 0.13
Poss(A4, A2) = 0.09
Poss(A4, A5) = 0.17
The most essential relationships between the 5 clusters are illustrated in a sequence of plots where we
have applied a series of increasing threshold values, Fig. 8.
For c = 3 we look at the necessity values produced for the pairs of the fuzzy sets obtained in this clustering process: Nec(A1, A2) = 0.45, Nec(A1, A3) = 0.54, Nec(A2, A1) = 0.71, Nec(A2, A3) = 0.58, Nec(A3, A1) = 0.82, Nec(A3, A2) = 0.58. The most evident inclusion relationship occurs between A3 and A1, meaning that A3 is included quite visibly in A1. A similar strength of relationship occurs for A2 and A1. The weakest relationship of inclusion occurs for A1 and A2. It is worth mentioning that all necessity values computed with the use of (2) with the product operation are equal to zero. This clearly indicates that the use of t-norms leads to results that lack any discriminatory power.
5. Conclusions
Being fully cognizant of the challenges of fuzzy modeling, by proposing statistically grounded logic operators, we have emphasized the need for more data-driven constructs that dwell on available experimental evidence. We
have developed logic operators that take into consideration collections of numeric membership grades and exploit their statistical characteristics through the use of the weight function. Interestingly, the weight function underlines the logic nature of the operator. The OR class of logic operators, named here SOR, is generated by the weight functions that are monotonically non-decreasing and constructed by involving some t-norm, $w(u) = u \, t \, u$, or more generally $w(u) = g(u \, s \, u)$ with "g" being a certain monotonically non-decreasing mapping. The category of statistically grounded AND operators, SAND, is generated by the weight functions that are monotonically non-increasing over the unit interval. We discussed the weight functions of the form $w(u) = 1 - (u \, t \, u)$; the general form of the relationship could be sought as $w(u) = h(1 - (u \, t \, u))$ with "h" being a monotonically non-decreasing mapping on the unit interval. The choice of the t-norm or t-conorm used in the SOR or SAND could be treated as a part of the design process: given some data that are to be approximated by the logic operator, we can choose a suitable triangular norm (conorm) in the weight function so that the best approximation (viz. with the lowest approximation error) is achieved. Similarly, it is worth stressing the probabilistic nature of the weight functions, hence the motivation behind the naming of these aggregation operators.
We have identified several fundamental constructs of fuzzy sets in which SORs and SANDs can play an important role; in particular, this concerns cases in which a number of membership grades have to be aggregated. For instance, this takes place in the computations of possibility and necessity measures or in fuzzy modeling realized in the presence of many system variables. We linked the SANDs and SORs to the problems of computing fuzzy rough sets by underlining their role in the incorporation of the experimental evidence about the existing collection of membership grades.
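As a purely illustrative sketch (not the author's implementation), the optimization problem stated in the abstract, minimizing the sum of w(x_k)|x_k - m| over m in [0, 1], can be solved by a weighted median, since for non-negative weights a minimizer of a weighted sum of absolute deviations is attained at one of the data points. The code below assumes the product t-norm and the two weight functions recalled in the conclusions, w(u) = u t u for the SOR and w(u) = 1 - (u t u) for the SAND. The function names (`weighted_median`, `sor`, `sand`, `product_tnorm`) are hypothetical and introduced here only for illustration.

```python
from typing import Callable, Sequence

TNorm = Callable[[float, float], float]


def product_tnorm(a: float, b: float) -> float:
    """Product t-norm (an assumed choice; any t-norm could be passed instead)."""
    return a * b


def weighted_median(values: Sequence[float], weights: Sequence[float]) -> float:
    """Return a minimizer of sum_k weights[k] * |values[k] - m| over m.

    The lower weighted median is used: the smallest data point at which the
    cumulative weight reaches half of the total weight.
    """
    total = sum(weights)
    if total <= 0:
        # Every m is a minimizer when all weights vanish; this sketch refuses to pick one.
        raise ValueError("weights must have a positive sum")
    acc = 0.0
    for x, w in sorted(zip(values, weights)):
        acc += w
        if acc >= total / 2.0:
            return x
    return max(values)  # guards against rounding; not expected to be reached


def sor(grades: Sequence[float], tnorm: TNorm = product_tnorm) -> float:
    """Statistically grounded OR with the non-decreasing weight w(u) = u t u."""
    return weighted_median(grades, [tnorm(u, u) for u in grades])


def sand(grades: Sequence[float], tnorm: TNorm = product_tnorm) -> float:
    """Statistically grounded AND with the non-increasing weight w(u) = 1 - (u t u)."""
    return weighted_median(grades, [1.0 - tnorm(u, u) for u in grades])


if __name__ == "__main__":
    grades = [0.1, 0.4, 0.9]  # made-up membership grades, for illustration only
    print("SOR :", sor(grades))
    print("SAND:", sand(grades))
```

With these made-up grades the SOR is pulled toward the high membership value and the SAND toward the lower ones, which reflects the role of the non-decreasing and non-increasing weight functions described above. A different t-norm can be supplied through the `tnorm` argument.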
Acknowledgements
The author fully acknowledges constructive comments of the reviewers which were helpful in the enhancements of the discussion on the properties of the logic operators.