Knowledge-Based Systems 23 (2010) 220–231
A rough set approach for selecting clustering attribute

Tutut Herawan a, Mustafa Mat Deris a,*, Jemal H. Abawajy b

a University of Tun Hussein Onn Malaysia, Faculty of Information Technology and Multimedia, Parit Raja, 86400 Batu Pahat, Johor, Malaysia
b Deakin University, School of Engineering and Information Technology, Geelong, VIC, Australia
* Corresponding author.
Article history: Received 9 February 2009; Received in revised form 6 November 2009; Accepted 18 December 2009; Available online 24 December 2009.

Keywords: Clustering; Rough set theory; Dependency of attributes; Performance
Abstract

A few clustering techniques for categorical data exist to group objects having similar characteristics. Some are able to handle uncertainty in the clustering process, while others have stability issues. However, the performance of these techniques is an issue due to low accuracy and high computational complexity. This paper proposes a new technique called maximum dependency attributes (MDA) for selecting the clustering attribute. The proposed approach is based on rough set theory, taking into account the dependency of attributes in the database. We analyze and compare the performance of the MDA technique with the bi-clustering, total roughness (TR) and min–min roughness (MMR) techniques based on four test cases. The results establish the better performance of the proposed approach.

© 2009 Elsevier B.V. All rights reserved.
1. Introduction

Cluster analysis is a data analysis tool used to group data with similar characteristics. It has been used in data mining tasks such as unsupervised classification and data summarization. Cluster analysis techniques have been used in many areas, such as manufacturing, medicine, nuclear science, radar scanning, and research and development planning. For example, Wu et al. [1] developed a clustering algorithm specifically designed to handle the complexities of gene data that can estimate the correct number of clusters and find them; Wong et al. [2] presented an approach used to segment tissues in a nuclear medical imaging method known as positron emission tomography (PET); and Haimov et al. [3] used cluster analysis to segment radar signals in scanning land and marine objects. These clustering techniques are only applicable to data having numerical attribute values.

Unlike numerical data, categorical data have multi-valued attributes, so similarity can be defined in terms of common objects, common values of attributes, and the associations between the two. A number of algorithms for clustering categorical data have been proposed, including the work of Huang [4], Gibson et al. [5], Guha et al. [6], Ganti et al. [7], and Dempster et al. [8]. While these methods make important contributions to the problem of clustering categorical data, they are not designed to handle uncertainty in the clustering process. This is an important issue in many real-world applications, where there is often no sharp boundary between clusters. Huang [4] and Kim et al.
[9] work in the area of applying fuzzy sets to the clustering of categorical data. However, these algorithms require multiple runs to establish the stability needed to obtain a satisfactory value for the parameter used to control the membership fuzziness [10].

Recently, there has been work on applying rough set theory to handle uncertainty in the process of selecting a clustering attribute. Mazlack et al. [11] propose two techniques for selecting a clustering attribute: the bi-clustering (BC) technique, based on bi-valued attributes, and the total roughness (TR) technique. Mazlack et al. suggested that the BC technique be attempted first in order to achieve low dissonance inside the clusters. With this technique, there are three different approaches to selecting a clustering attribute, i.e. arbitrary, imbalanced and balanced. For the balanced or imbalanced clustering approaches, two kinds of problems may occur. First, there may be several candidate bi-clustering attributes, and a decision then has to be made as to which one should be chosen as the clustering attribute. Second, it may happen that no two-valued attribute can be found to form a balanced clustering; at this point, clustering on multiple-valued attributes has to be considered. Therefore, for selecting a clustering attribute in a data set with multiple-valued attributes, Mazlack et al. proposed a technique using the average of the accuracy of approximation (accuracy of roughness) of rough set theory [12–14], called total roughness (TR). In other words, it is based on the average of the mean roughness of an attribute with respect to the set of all other attributes in an information system: the higher the total roughness, the higher the accuracy of selecting the clustering attribute.

Parmar et al. [10] propose a technique called min–min roughness (MMR) to improve the bi-clustering technique for data sets with multi-valued attributes. In this technique, bi-valued and
multi-valued attributes are treated equally, and the accuracy of approximation is measured using the well-known Marczewski–Steinhaus metric applied to the lower and upper approximations of a subset of the universe in an information system [15–17]. However, MMR is the complement of TR and produces the same accuracy and complexity as the TR technique; it is shown in Section 4.1 that TR and MMR give the same result in selecting the clustering attribute. With this technique, however, complexity is still an issue, because all attributes must be considered in order to obtain the clustering attribute. Therefore, there is a need for a data clustering technique with improved accuracy and computational complexity. One way to select a clustering attribute is to discover the dependency between attributes.

In this paper, a technique called maximum dependency of attributes (MDA) is proposed. It is based on the dependency of attributes, in the rough set sense, in an information system. Four test cases are considered to evaluate and compare the performance of MDA with the BC, TR and MMR techniques: the credit card promotion dataset as in [18], the student's enrollment qualifications dataset, the animal dataset as in Hu [19], and the dataset as in Parmar et al. [10]. We show that the proposed technique provides better performance than the BC, TR and MMR techniques.

The rest of this paper is organized as follows. Section 2 describes Pawlak's rough set theory in information systems. Section 3 describes the MDA algorithm. Section 4 describes the performance comparison of MDA with the TR and MMR techniques. The comparison tests of MDA against the BC, TR and MMR techniques on four test cases are described in Section 5. Finally, the conclusions of this work are presented in Section 6.
2. Pawlak's rough set model

Rough set theory, proposed by Pawlak in the 1980s as the result of a long-term program of fundamental mathematical research on information systems, can be seen as a new mathematical approach to vagueness (of sets) and uncertainty (of elements) [12–14]. Rough set theory is founded on the assumption that with every object of the universe of discourse we associate some information (or knowledge). For example, if objects are patients suffering from a certain disease, the symptoms of the disease form the information about the patients. Objects characterized by the same information are indiscernible (or similar) in view of the available information about them. The indiscernibility relation generated in this way is the mathematical basis of rough set theory. Any set of all indiscernible objects is called an elementary set, and forms a basic granule (or atom) of knowledge about the universe. Any union of some elementary sets is referred to as a crisp (or precise) set; otherwise, the set is rough (or imprecise, or vague) [12–17,20–26].

Consequently, each rough set has boundary-line cases, i.e. objects which cannot with certainty be classified either as members of the set or of its complement; obviously, crisp sets have no boundary-line elements at all. This means that boundary-line cases cannot be properly classified by employing only the available knowledge. Hence, rough set theory expresses vagueness not by means of membership, but by employing the boundary region of a set. The theory is different from, and complementary to, fuzzy set theory [27]. It is a generalization of standard (crisp) set theory and has been successfully applied in many fields, for example, data mining, data analysis, medicine and expert systems [14,21–26]. The original main goal of rough set theory is the induction of approximations of concepts: the idea of a rough set consists of the approximation of a set by a pair of crisp sets, called the lower and upper approximations of the set [12–17,20–26]. The motivation for rough set theory comes from the need to represent subsets of a universe in terms of equivalence classes of a clustering of the universe.
Rough set theory is an approach to aid decision making in the presence of uncertainty [21]. Here, we use the concepts of rough set theory in terms of the data contained in an information system. The notion of an information system provides a convenient tool for the representation of objects in terms of their attribute values. An information system, as in [14], is a 4-tuple (quadruple) $S = (U, A, V, f)$, where $U$ is a non-empty finite set of objects, $A$ is a non-empty finite set of attributes, $V = \bigcup_{a \in A} V_a$, where $V_a$ is the domain (value set) of attribute $a$, and $f: U \times A \to V$ is a total function such that $f(u, a) \in V_a$ for every $(u, a) \in U \times A$, called the information (knowledge) function.

Definition 1 (see [14]). Let $S = (U, A, V, f)$ be an information system and let $B$ be any subset of $A$. Two elements $x, y \in U$ are said to be B-indiscernible (indiscernible by the set of attributes $B \subseteq A$ in $S$) if and only if $f(x, a) = f(y, a)$ for every $a \in B$.

Obviously, every subset of $A$ induces a unique indiscernibility relation. Notice that the indiscernibility relation induced by the set of attributes $B$, denoted $\mathrm{IND}(B)$, is an equivalence relation. It is well known that an equivalence relation induces a unique clustering. The clustering of $U$ induced by $\mathrm{IND}(B)$ in $S = (U, A, V, f)$ is denoted by $U/B$, and the equivalence class in the clustering $U/B$ containing $x \in U$ is denoted by $[x]_B$.
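As an illustration, the following minimal Python sketch (ours, not the authors' code; the toy table and its attribute names are hypothetical) represents an information system as a mapping from objects to attribute values and computes the clustering U/B induced by IND(B):

```python
from collections import defaultdict

def partition(table, B):
    """Return U/B: objects grouped so that each class holds exactly the
    objects agreeing on every attribute in B (the IND(B) classes)."""
    classes = defaultdict(set)
    for obj, row in table.items():
        classes[tuple(row[a] for a in B)].add(obj)
    return list(classes.values())

# A hypothetical information system with U = {1, 2, 3, 4}.
toy = {
    1: {"colour": "red",  "size": "small"},
    2: {"colour": "red",  "size": "large"},
    3: {"colour": "blue", "size": "large"},
    4: {"colour": "red",  "size": "small"},
}
print(partition(toy, ["colour"]))           # [{1, 2, 4}, {3}]
print(partition(toy, ["colour", "size"]))   # [{1, 4}, {2}, {3}]
```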
The notions of the lower and upper approximations of a set can be defined as follows [14].

Definition 2. Let $S = (U, A, V, f)$ be an information system, let $B$ be any subset of $A$ and let $X$ be any subset of $U$. The B-lower approximation of $X$, denoted $\underline{B}(X)$, and the B-upper approximation of $X$, denoted $\overline{B}(X)$, are defined by

$$\underline{B}(X) = \{x \in U \mid [x]_B \subseteq X\} \quad \text{and} \quad \overline{B}(X) = \{x \in U \mid [x]_B \cap X \neq \emptyset\}. \tag{1}$$
It can easily be seen that the B-upper approximation of a subset $X \subseteq U$ can be expressed through the set complement and the lower approximation of $X$ by

$$\overline{B}(X) = U \setminus \underline{B}(\neg X), \tag{2}$$
where $\neg X$ denotes the complement of $X$ relative to $U$. The accuracy of approximation (accuracy of roughness) of any subset $X \subseteq U$ with respect to $B \subseteq A$, denoted $\alpha_B(X)$, is measured by

$$\alpha_B(X) = \frac{|\underline{B}(X)|}{|\overline{B}(X)|}, \tag{3}$$
where $|X|$ denotes the cardinality of $X$. For the empty set $\emptyset$, we define $\alpha_B(\emptyset) = 1$. Obviously, $0 \le \alpha_B(X) \le 1$. If $X$ is a union of some equivalence classes of $U$, then $\alpha_B(X) = 1$ and the set $X$ is crisp (precise) with respect to $B$; if $X$ is not a union of some equivalence classes of $U$, then $\alpha_B(X) < 1$ and the set $X$ is rough (imprecise) with respect to $B$ [14]. This means that the higher the accuracy of approximation of a subset $X \subseteq U$, the more precise (the less imprecise) it is.
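Continuing the sketch above (again ours, as an illustration rather than the paper's implementation), the approximations of Eq. (1) and the accuracy of Eq. (3) can be computed directly from the classes of U/B:

```python
def approximations(blocks, X):
    """B-lower and B-upper approximations of X, given the classes of U/B."""
    lower, upper = set(), set()
    for block in blocks:
        if block <= X:    # [x]_B entirely inside X -> contributes to lower
            lower |= block
        if block & X:     # [x]_B meets X           -> contributes to upper
            upper |= block
    return lower, upper

def accuracy(blocks, X):
    """alpha_B(X) of Eq. (3), with alpha_B(empty set) defined as 1."""
    lower, upper = approximations(blocks, X)
    return 1.0 if not upper else len(lower) / len(upper)

blocks = partition(toy, ["colour"])        # [{1, 2, 4}, {3}]
print(approximations(blocks, {1, 2, 3}))   # ({3}, {1, 2, 3, 4}) -> rough
print(accuracy(blocks, {3}))               # 1.0 -> {3} is crisp w.r.t. colour
```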
The accuracy of roughness in Eq. (3) can also be interpreted using the well-known Marczewski–Steinhaus (MZ) metric [15–17]. Let $S = (U, A, V, f)$ be an information system; given two subsets $X, Y \subseteq U$, the MZ metric measuring the distance between $X$ and $Y$ is defined as

$$D(X, Y) = \frac{|X \,\triangle\, Y|}{|X \cup Y|},$$

where $X \,\triangle\, Y = (X \cup Y) \setminus (X \cap Y)$ denotes the symmetric difference between the two sets $X$ and $Y$. Then we have

$$D(X, Y) = \frac{|(X \cup Y) \setminus (X \cap Y)|}{|X \cup Y|} = 1 - \frac{|X \cap Y|}{|X \cup Y|}. \tag{4}$$
Notice that: (a) if $X$ and $Y$ are totally different, i.e. $X \cap Y = \emptyset$ (in other words, $X$ and $Y$ are disjoint), then the metric reaches its maximum value of 1; and (b) if $X$ and $Y$ are exactly the same, i.e. $X = Y$, then the metric reaches its minimum value of 0. By applying the Marczewski–Steinhaus metric to the lower and upper approximations of a subset $X \subseteq U$ in an information system $S$, we have
$$D(\underline{R}(X), \overline{R}(X)) = 1 - \frac{|\underline{R}(X) \cap \overline{R}(X)|}{|\underline{R}(X) \cup \overline{R}(X)|} = 1 - \frac{|\underline{R}(X)|}{|\overline{R}(X)|} = 1 - \alpha_R(X). \tag{5}$$
The accuracy of roughness may thus be viewed as the inverse of the MZ metric applied to the lower and upper approximations: the distance between the lower and upper approximations determines the accuracy of the rough set approximation. Note that the measurement in Eq. (3) depends not only on the approximation of $X$, but also on the approximation of $\neg X$. From (2) and (3), the accuracy of approximation of any subset $X \subseteq U$ with respect to $B \subseteq A$, $\alpha_B(X)$, can be written as
$$\alpha_B(X) = \frac{|\underline{B}(X)|}{|\overline{B}(X)|} = \frac{|\underline{B}(X)|}{|U| - |\underline{B}(\neg X)|}. \tag{6}$$

The measurement in (6) can be generalized to find the dependency degree between attributes in an information system, as illustrated in Definitions 3 and 4 below.

Definition 3. Let $S = (U, A, V, f)$ be an information system and let $D$ and $C$ be any subsets of $A$. $D$ functionally depends on $C$, denoted $C \Rightarrow D$, if each value of $D$ is associated with exactly one value of $C$.

Definition 4. Let $S = (U, A, V, f)$ be an information system and let $D$ and $C$ be any subsets of $A$. The dependency of attribute $D$ on $C$ in a degree $k$ ($0 \le k \le 1$), denoted $C \Rightarrow_k D$, is defined by

$$k = \frac{\sum_{X \in U/D} |\underline{C}(X)|}{|U|}. \tag{7}$$

$D$ is said to depend fully on $C$ (in a degree of $k$) if $k = 1$; otherwise, $D$ depends partially on $C$. Thus, $D$ fully (partially) depends on $C$ if all (some) elements of the universe $U$ can be uniquely classified to equivalence classes of the clustering $U/D$ employing $C$. Based on Definition 4, we can select the clustering attribute based on the maximum degree $k$.
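The following short sketch (ours, continuing the hypothetical toy system above) computes the dependency degree of Eq. (7) from the two clusterings U/C and U/D:

```python
def dependency(table, C, D):
    """Degree k such that C =>_k D (Eq. (7)): the fraction of U lying in
    C-lower approximations of the classes of U/D."""
    blocks_C = partition(table, C)
    total = sum(
        len(approximations(blocks_C, X)[0])   # |lower_C(X)|
        for X in partition(table, D)
    )
    return total / len(table)

print(dependency(toy, ["colour", "size"], ["colour"]))  # 1.0 -> full dependency
print(dependency(toy, ["size"], ["colour"]))            # 0.5 -> partial dependency
```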
3. Maximum dependency of attributes

In this section, we present the proposed technique, which we refer to as maximum dependency of attributes (MDA). Fig. 1 shows the pseudo-code of the MDA algorithm. The algorithm uses the dependency of attributes, in the rough set sense, in an information system, and consists of four main steps. The first step computes the equivalence classes of each attribute: the equivalence classes of the set of objects $U$ are obtained using the indiscernibility relation of each attribute $a_i \in A$ in the information system $S = (U, A, V, f)$. The second step determines the dependency degree of each attribute using formula (7). The third step selects the maximum dependency degree. In the last step, once all dependency degrees have been computed, the clustering attribute is selected based on the maximum dependency. The justification is that the higher the degree of dependency of an attribute, the more accurate it is for selecting the clustering attribute, as stated in Proposition 2. We first present the relation between the roughness of a subset $X \subseteq U$ and the dependency of two attributes, as stated in Proposition 1.

[Fig. 1. The MDA algorithm.]

Proposition 1. Let $S = (U, A, V, f)$ be an information system and let $D$ and $C$ be any subsets of $A$. If $D$ depends totally on $C$, then

$$\alpha_D(X) \le \alpha_C(X), \quad \text{for every } X \subseteq U.$$

Proof. Let $D$ and $C$ be any subsets of $A$ in the information system $S = (U, A, V, f)$. From the hypothesis, we have $\mathrm{IND}(C) \subseteq \mathrm{IND}(D)$. Furthermore, the clustering $U/C$ is finer than $U/D$; thus, any equivalence class induced by $\mathrm{IND}(D)$ is a union of some equivalence classes induced by $\mathrm{IND}(C)$. Therefore, for every $x \in X \subseteq U$, we have

$$[x]_C \subseteq [x]_D.$$

Hence, for every $X \subseteq U$, we have

$$\underline{D}(X) \subseteq \underline{C}(X) \quad \text{and} \quad \overline{C}(X) \subseteq \overline{D}(X).$$

Consequently,

$$\alpha_D(X) = \frac{|\underline{D}(X)|}{|\overline{D}(X)|} \le \frac{|\underline{C}(X)|}{|\overline{C}(X)|} = \alpha_C(X). \qquad \square$$

The generalization of Proposition 1 is given below.

Proposition 2. Let $S = (U, A, V, f)$ be an information system and let $C_1, C_2, \ldots, C_n$ and $D$ be any subsets of $A$. If $C_1 \Rightarrow_{k_1} D,\ C_2 \Rightarrow_{k_2} D,\ \ldots,\ C_n \Rightarrow_{k_n} D$, where $k_n \le k_{n-1} \le \cdots \le k_2 \le k_1$, then

$$\alpha_D(X) \le \alpha_{C_n}(X) \le \alpha_{C_{n-1}}(X) \le \cdots \le \alpha_{C_2}(X) \le \alpha_{C_1}(X),$$

for every $X \subseteq U$.

Proof. Let $C_1, C_2, \ldots, C_n$ and $D$ be any subsets of $A$ in the information system $S$. From the hypothesis and from Proposition 1, we have
$$\alpha_D(X) \le \alpha_{C_1}(X),\quad \alpha_D(X) \le \alpha_{C_2}(X),\quad \ldots,\quad \alpha_D(X) \le \alpha_{C_n}(X).$$

Since $k_n \le k_{n-1} \le \cdots \le k_2 \le k_1$,

$$[x]_{C_n} \subseteq [x]_{C_{n-1}},\quad [x]_{C_{n-1}} \subseteq [x]_{C_{n-2}},\quad \ldots,\quad [x]_{C_2} \subseteq [x]_{C_1}.$$

Obviously,

$$\alpha_D(X) \le \alpha_{C_n}(X) \le \alpha_{C_{n-1}}(X) \le \cdots \le \alpha_{C_2}(X) \le \alpha_{C_1}(X). \qquad \square$$

The measurement of the accuracy of MDA as compared with that of TR and MMR for selecting the clustering attribute is described in Section 4.2. Example 3 below illustrates how to find the degree of dependency of attributes in an information system based on the formula in Eq. (7).
Example 3. To illustrate finding the degree of dependency of attributes, we consider the information system shown in Table 1, a modification of the information system of Example 3 in [20].

Table 1. A modified information system from [20].

U | A    | B    | C      | D
1 | Low  | Bad  | Loss   | Small
2 | Low  | Good | Loss   | Large
3 | High | Good | Loss   | Medium
4 | High | Good | Loss   | Medium
5 | Low  | Good | Profit | Large

From Table 1, each of the four attributes induces a partition of $U$ via its indiscernibility relation:

$$U/A = \{\{1,2,5\}, \{3,4\}\},\quad U/B = \{\{1\}, \{2,3,4,5\}\},\quad U/C = \{\{1,2,3,4\}, \{5\}\},\quad U/D = \{\{1\}, \{2,5\}, \{3,4\}\}.$$

Based on the formula in Eq. (7), the degree of dependency of attribute B on attribute A, denoted $A \Rightarrow_k B$, is calculated as

$$k = \frac{\sum_{X \in U/B} |\underline{A}(X)|}{|U|} = \frac{|\{3,4\}|}{|\{1,2,3,4,5\}|} = 0.4.$$

In the same way, we obtain

$$B \Rightarrow_k C: \quad k = \frac{\sum_{X \in U/C} |\underline{B}(X)|}{|U|} = \frac{|\{1\}|}{|\{1,2,3,4,5\}|} = 0.2,$$

$$C \Rightarrow_k D: \quad k = \frac{\sum_{X \in U/D} |\underline{C}(X)|}{|U|} = \frac{|\{5\}|}{|\{1,2,3,4,5\}|} = 0.2.$$

The degrees of dependency of all attributes in Table 1 are summarized in Table 2.

Table 2. The degree of dependency of the attributes of Table 1 (entry (P, Q): the degree to which attribute P depends on attribute Q; the value in parentheses in the MDA column is the next highest degree, used for tie-breaking).

Attribute | A   | B   | C   | D   | MDA
A         | –   | 0.2 | 0.2 | 1   | 1 (0.2)
B         | 0.4 | –   | 0.2 | 1   | 1 (0.4)
C         | 0.4 | 0.2 | –   | 0.6 | 0.6
D         | 0.4 | 0.2 | 0.2 | –   | 0.4

With the MDA technique, if the highest dependency value of an attribute is the same as that of other attributes, the next highest value inside the tied attributes is examined, and so on, until the tie is broken. From Table 2, the first maximum degree of dependency, i.e. 1, occurs for attributes A and B. The second maximum degree of dependency of attribute B is 0.4, while that of attribute A is 0.2. Thus, attribute B is selected as the clustering attribute.
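For cross-checking, the sketches above reproduce the numbers of Example 3 and Table 2 (the dictionary below is our encoding of Table 1):

```python
table1 = {
    1: {"A": "Low",  "B": "Bad",  "C": "Loss",   "D": "Small"},
    2: {"A": "Low",  "B": "Good", "C": "Loss",   "D": "Large"},
    3: {"A": "High", "B": "Good", "C": "Loss",   "D": "Medium"},
    4: {"A": "High", "B": "Good", "C": "Loss",   "D": "Medium"},
    5: {"A": "Low",  "B": "Good", "C": "Profit", "D": "Large"},
}
print(dependency(table1, ["A"], ["B"]))  # 0.4  (A =>_k B)
print(dependency(table1, ["B"], ["C"]))  # 0.2  (B =>_k C)
print(dependency(table1, ["D"], ["B"]))  # 1.0  (B depends fully on D, as in Table 2)
```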
4. Performance comparison

We now present and analyze the MDA technique and its algorithm for selecting a clustering attribute. The performance comparison between the proposed technique and the total roughness (TR) and min–min roughness (MMR) techniques is discussed in terms of the accuracy of roughness and the computational complexity.

4.1. The TR and MMR techniques

The definitions below follow the notion of an information system stated in Section 2. Suppose that attribute $a_i \in A$ has $n$ distinct values, say $b_k$, $k = 1, 2, \ldots, n$, and let $X(a_i = b_k)$ be the subset of objects having value $b_k$ on attribute $a_i$. The roughness of the TR technique of the set $X(a_i = b_k)$ with respect to $a_j$, where $i \neq j$, denoted $R_{a_j}(X \mid a_i = b_k)$, is defined as in [11] by

$$R_{a_j}(X \mid a_i = b_k) = \frac{|\underline{X}_{a_j}(a_i = b_k)|}{|\overline{X}_{a_j}(a_i = b_k)|}, \quad k = 1, 2, \ldots, n. \tag{8}$$

In the TR technique, the mean roughness of attribute $a_i \in A$ with respect to attribute $a_j \in A$, where $i \neq j$, denoted $\mathrm{Rough}_{a_j}(a_i)$, is evaluated as

$$\mathrm{Rough}_{a_j}(a_i) = \frac{\sum_{k=1}^{|V(a_i)|} R_{a_j}(X \mid a_i = b_k)}{|V(a_i)|}, \tag{9}$$

where $V(a_i)$ is the set of values of attribute $a_i \in A$. The total roughness of attribute $a_i \in A$ with respect to the other attributes, denoted $TR(a_i)$, is obtained by

$$TR(a_i) = \frac{\sum_{j=1}^{|A|} \mathrm{Rough}_{a_j}(a_i)}{|A| - 1}. \tag{10}$$

As stated in Mazlack et al. [11], the higher the value of TR, the better the selection of the clustering attribute. Meanwhile, the roughness value of the MMR technique is the opposite of that of the TR technique, equivalent to what has been proposed in [13], i.e.

$$MMR_{a_j}(X \mid a_i = b_k) = 1 - R_{a_j}(X \mid a_i = b_k) = 1 - \frac{|\underline{X}_{a_j}(a_i = b_k)|}{|\overline{X}_{a_j}(a_i = b_k)|}. \tag{11}$$

It is clear that the MMR technique uses the MZ metric to measure the roughness of the sets $X(a_i = b_k)$, $k = 1, 2, \ldots, n$, with respect to $a_j$, where $i \neq j$. Thus, the mean roughness of the MMR technique is also the opposite of that of the TR technique, i.e.
$$\mathrm{MMRough}_{a_j}(a_i) = \frac{\sum_{k=1}^{|V(a_i)|} MMR_{a_j}(X \mid a_i = b_k)}{|V(a_i)|} = \frac{\sum_{k=1}^{|V(a_i)|} \bigl(1 - R_{a_j}(X \mid a_i = b_k)\bigr)}{|V(a_i)|} = \frac{|V(a_i)| - \sum_{k=1}^{|V(a_i)|} R_{a_j}(X \mid a_i = b_k)}{|V(a_i)|} = 1 - \mathrm{Rough}_{a_j}(a_i), \quad \text{for } i \neq j. \tag{12}$$
The MMR technique is based on the minimum value of the mean roughness in (12), without calculating the total roughness. According to Parmar et al. [10], the least mean roughness gives the best selection of the clustering attribute.

In an information system $S = (U, A, V, f)$, an attribute $a_i \in A$ may have several different values $b_k$, $k = 1, 2, \ldots, n$. Thus, from Eqs. (6) and (8), we can generalize the roughness of the sets as follows:

$$R_{a_j}(X) = \frac{|\underline{X}_{a_j}(a_i = b_1)|}{|U|} + \frac{|\underline{X}_{a_j}(a_i = b_2)|}{|U|} + \cdots + \frac{|\underline{X}_{a_j}(a_i = b_n)|}{|U|} = \frac{\sum_{X \in U/a_i} |\underline{X}(a_i = b_k)|}{|U|}, \quad k = 1, 2, \ldots, n. \tag{13}$$
The formula in Eq. (13) is the degree of dependency of attribute $a_j$ on attribute $a_i$, where $i \neq j$. Thus, the MDA technique is based on Eq. (13).

4.2. The accuracy

As described in Section 4.1, TR, MMR and MDA use different measures for selecting the clustering attribute: TR uses the total average of the mean roughness, MMR uses the minimum of the mean roughness, and MDA uses the maximum degree of dependency. To measure the accuracy of selecting the clustering attribute, we use the mean roughness of Eq. (9) to represent all techniques: the higher the mean roughness, the higher the accuracy of the selected clustering attribute.

4.3. The computational complexity

Suppose that an information system contains n objects and m attributes, and let l be the maximum number of distinct values per attribute. TR, MMR and MDA all need nm computations to determine the elementary sets of all attributes. For TR and MMR, computing the roughness of all subsets of U having different values of attribute $a_i$, together with the mean roughness of attribute $a_i$ with respect to attribute $a_j$ ($i \neq j$), takes $n^2 l$ computations; calculating TR for all attributes takes a further n computations, and calculating MMR takes 2n. Thus, the computational complexities of TR and MMR are $O(n^2 l + nm + n)$ and $O(n^2 l + nm + 2n)$, respectively. Meanwhile, the MDA technique needs $n(n-1)$ computations to determine the dependency degrees of the attributes. Thus, the computational complexity of the MDA technique is polynomial, $O(n(n-1) + nm)$.

Proposition 3. The computational complexity of MDA is lower than that of TR and MMR.

Proof. Let $S = (U, A, V, f)$ be an information system. Based on the computational complexities above, TR and MMR have the same complexity as MDA when l = 1; otherwise, MDA has lower computational complexity than TR and MMR. Thus, MDA has lower computational complexity as compared with TR and MMR. $\square$
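A complete selection routine in the spirit of Fig. 1 can then be sketched as follows (ours, not the paper's pseudo-code): each attribute gets its vector of dependency degrees sorted in descending order, and the lexicographically largest vector wins, which implements the tie-breaking rule of looking at the next highest degree until the tie is broken. It reuses dependency() and table1 from the sketches above.

```python
def mda_select(table):
    """Pick the clustering attribute with the maximum dependency degree,
    breaking ties by the next highest degree, and so on."""
    attrs = sorted(next(iter(table.values())))
    best, best_vector = None, None
    for ai in attrs:
        degrees = sorted(
            (dependency(table, [aj], [ai]) for aj in attrs if aj != ai),
            reverse=True,
        )
        if best_vector is None or degrees > best_vector:
            best, best_vector = ai, degrees
    return best

print(mda_select(table1))  # 'B': MDA = 1 with second-highest degree 0.4 (Table 2)
```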
5. Comparison tests

Four test cases are considered to compare and evaluate the accuracy and the complexity of each technique: the credit card promotion dataset from Roiger and Geatz [18], the student's enrollment qualifications dataset, the animal dataset from Hu [19], and the dataset from Parmar et al. [10].

5.1. Case 1: The credit card promotion dataset in [18]

Table 3 shows the credit card promotion dataset as in [18]. There are five categorical attributes (m = 5): magazine promotion (MP), watch promotion (WP), life insurance promotion (LIP), credit card insurance (CCI) and sex (S). All attributes have two distinct values (l = 2), i.e. yes and no, and ten objects (n = 10) are considered. Notice that with the BC technique, the attribute with the fewest distinct balanced values is selected as the clustering attribute, without consideration of the maximum total roughness of each attribute; thus, attribute LIP would be chosen as the clustering attribute. We next present the procedure for finding the TR and MMR values.

To obtain the values of TR, MMR and MDA, we first obtain the equivalence classes induced by the indiscernibility relations of the singleton attributes. From Table 3:

(a) X(MP = yes) = {1,2,4,5,7,9,10}, X(MP = no) = {3,6,8}, so U/MP = {{1,2,4,5,7,9,10}, {3,6,8}};
(b) X(WP = yes) = {2,4,8,10}, X(WP = no) = {1,3,5,6,7,9}, so U/WP = {{2,4,8,10}, {1,3,5,6,7,9}};
(c) X(LIP = yes) = {2,4,5,7,10}, X(LIP = no) = {1,3,6,8,9}, so U/LIP = {{2,4,5,7,10}, {1,3,6,8,9}};
(d) X(CCI = yes) = {4,7}, X(CCI = no) = {1,2,3,5,6,8,9,10}, so U/CCI = {{4,7}, {1,2,3,5,6,8,9,10}};
(e) X(S = male) = {1,3,4,7,8,9}, X(S = female) = {2,5,6,10}, so U/S = {{1,3,4,7,8,9}, {2,5,6,10}}.

Secondly, we obtain the lower and upper approximations of the subsets of U based on attribute LIP with respect to attributes MP, WP, CCI and S, using the formula in Eq. (1):

(a) LIP with respect to MP:

$\underline{X}(\text{LIP} = \text{yes}) = \emptyset$, $\overline{X}(\text{LIP} = \text{yes}) = \{1,2,4,5,7,9,10\}$;
$\underline{X}(\text{LIP} = \text{no}) = \{3,6,8\}$, $\overline{X}(\text{LIP} = \text{no}) = \{1,2,3,4,5,6,7,8,9,10\}$.
Table 3. A subset of the credit card promotion dataset from the Acme Credit Card Company database [18].

Person | Magazine promotion | Watch promotion | Life insurance promotion | Credit card insurance | Sex
1  | Yes | No  | No  | No  | Male
2  | Yes | Yes | Yes | No  | Female
3  | No  | No  | No  | No  | Male
4  | Yes | Yes | Yes | Yes | Male
5  | Yes | No  | Yes | No  | Female
6  | No  | No  | No  | No  | Female
7  | Yes | No  | Yes | Yes | Male
8  | No  | Yes | No  | No  | Male
9  | Yes | No  | No  | No  | Male
10 | Yes | Yes | Yes | No  | Female
(b) LIP with respect to WP:

$\underline{X}(\text{LIP} = \text{yes}) = \emptyset$, $\overline{X}(\text{LIP} = \text{yes}) = \{1,2,3,4,5,6,7,8,9,10\}$;
$\underline{X}(\text{LIP} = \text{no}) = \emptyset$, $\overline{X}(\text{LIP} = \text{no}) = \{1,2,3,4,5,6,7,8,9,10\}$.

(c) LIP with respect to CCI:

$\underline{X}(\text{LIP} = \text{yes}) = \{4,7\}$, $\overline{X}(\text{LIP} = \text{yes}) = \{1,2,3,4,5,6,7,8,9,10\}$;
$\underline{X}(\text{LIP} = \text{no}) = \emptyset$, $\overline{X}(\text{LIP} = \text{no}) = \{1,2,3,5,6,8,9,10\}$.

(d) LIP with respect to S:

$\underline{X}(\text{LIP} = \text{yes}) = \emptyset$, $\overline{X}(\text{LIP} = \text{yes}) = \{1,2,3,4,5,6,7,8,9,10\}$;
$\underline{X}(\text{LIP} = \text{no}) = \emptyset$, $\overline{X}(\text{LIP} = \text{no}) = \{1,2,3,4,5,6,7,8,9,10\}$.

Thirdly, we obtain the roughness of the subsets of U having different values of attribute LIP with respect to MP, WP, CCI and S, using the formula in Eq. (8):

(a) $R_{\text{MP}}(X \mid \text{LIP} = \text{yes}) = 0$, $R_{\text{MP}}(X \mid \text{LIP} = \text{no}) = 0.3$;
(b) $R_{\text{WP}}(X \mid \text{LIP} = \text{yes}) = 0$, $R_{\text{WP}}(X \mid \text{LIP} = \text{no}) = 0$;
(c) $R_{\text{CCI}}(X \mid \text{LIP} = \text{yes}) = 0.2$, $R_{\text{CCI}}(X \mid \text{LIP} = \text{no}) = 0$;
(d) $R_{\text{S}}(X \mid \text{LIP} = \text{yes}) = 0$, $R_{\text{S}}(X \mid \text{LIP} = \text{no}) = 0$.

Fourthly, we obtain the mean roughness of attribute LIP with respect to MP, WP, CCI and S, using the formula in Eq. (9):

$\mathrm{Rough}_{\text{MP}}(\text{LIP}) = 0.15$, $\mathrm{Rough}_{\text{WP}}(\text{LIP}) = 0$, $\mathrm{Rough}_{\text{CCI}}(\text{LIP}) = 0.1$, $\mathrm{Rough}_{\text{S}}(\text{LIP}) = 0$.

Lastly, we obtain the total roughness of attribute LIP with respect to the set of all other attributes {MP, WP, CCI, S}, using the formula in Eq. (10):

$$TR(\text{LIP}) = \frac{0.15 + 0 + 0.1 + 0}{4} = 0.0625.$$

The total roughness of all attributes in Table 3 is summarized in Table 4.

Table 4. The total roughness of all attributes in Table 3 using the TR technique.

Attribute | MP   | WP | LIP  | CCI | S   | TR
MP        | –    | 0  | 0.25 | 0.1 | 0   | 0.0875
WP        | 0    | –  | 0    | 0   | 0   | 0
LIP       | 0.15 | 0  | –    | 0.1 | 0   | 0.0625
CCI       | 0.15 | 0  | 0.25 | –   | 0.2 | 0.15
S         | 0    | 0  | 0    | 0.1 | –   | 0.025

From Table 4, the TR value of LIP, i.e. 0.0625, is lower than that of MP, i.e. 0.0875, and of CCI, i.e. 0.15. Thus, the decision to select LIP as the clustering attribute is not appropriate, because the total roughness of attribute LIP is lower than that of attribute CCI.

With the MMR technique, all attributes are treated equally. The procedures for obtaining the lower and upper approximations of the subsets of U having different values of a singleton attribute, and for obtaining their roughness, are the same as in the TR technique; the mean roughness of the MMR technique, however, uses the formula in Eq. (12). As an example, the mean roughness of attribute LIP with respect to attributes MP, WP, CCI and S is obtained as follows:

(a) $\mathrm{MMRough}_{\text{MP}}(\text{LIP}) = 1 - \mathrm{Rough}_{\text{MP}}(\text{LIP}) = 1 - 0.15 = 0.85$;
(b) $\mathrm{MMRough}_{\text{WP}}(\text{LIP}) = 1 - 0 = 1$;
(c) $\mathrm{MMRough}_{\text{CCI}}(\text{LIP}) = 1 - 0.1 = 0.9$;
(d) $\mathrm{MMRough}_{\text{S}}(\text{LIP}) = 1 - 0 = 1$.

The MMR values of all attributes in Table 3 are summarized in Table 5. From Table 5, the mean roughness of MP and that of CCI have the same minimum value, i.e. 0.75; one then looks at the next lowest value, and so on, until the tie is broken. In this case, CCI has the lower second value, i.e. 0.8, as compared to MP, i.e. 0.9. Thus, attribute CCI is selected as the clustering attribute.

With the MDA technique, all attributes are likewise evaluated. Unlike TR and MMR, MDA only needs the equivalence classes of the clusterings induced by the indiscernibility relations of the singleton attributes, as stated in the first step of the algorithm in Section 3. Using the formula in Eq. (13), the MDA values of all attributes in Table 3 are summarized in Table 6.
Table 5. The min–min roughness of all attributes in Table 3 using the MMR technique (the value in parentheses is the next lowest mean roughness, used for tie-breaking).

Attribute | MP   | WP | LIP  | CCI | S   | MMR
MP        | –    | 1  | 0.75 | 0.9 | 1   | 0.75 (0.9)
WP        | 1    | –  | 1    | 1   | 1   | 1
LIP       | 0.85 | 1  | –    | 0.9 | 1   | 0.85
CCI       | 0.85 | 1  | 0.75 | –   | 0.8 | 0.75 (0.8)
S         | 1    | 1  | 1    | 0.9 | –   | 0.9
Table 6. The degree of dependency of all attributes in Table 3 using the MDA technique (the value in parentheses is the next highest degree, used for tie-breaking).

Attribute | MP  | WP | LIP | CCI | S   | MDA
MP        | –   | 0  | 0.5 | 0.2 | 0   | 0.5 (0.2)
WP        | 0   | –  | 0   | 0   | 0   | 0
LIP       | 0.3 | 0  | –   | 0.2 | 0   | 0.3
CCI       | 0.3 | 0  | 0.5 | –   | 0.4 | 0.5 (0.4)
S         | 0   | 0  | 0   | 0.2 | –   | 0.2
From Table 6, attributes MP and CCI have the same maximum degree of dependency, i.e. 0.5. Based on the MDA algorithm, the next highest degrees are then considered until the tie is broken. In this case, the second degree of attribute CCI, i.e. 0.4, is higher than that of MP, i.e. 0.2. Therefore, attribute CCI is selected as the clustering attribute.

From Figs. 2 and 3, the accuracy of selecting the clustering attribute using the TR, MMR and MDA techniques is the same, i.e. 0.25, while the accuracy of the BC technique, i.e. 0.15, is lower. Nevertheless, the computational complexity of BC is better than that of the other techniques, because BC selects the clustering attribute based only on the least distinct balanced-valued attribute, so no iteration is needed. Among the remaining techniques, the MDA technique has better computational complexity than TR and MMR.
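Running the earlier sketch on the data of Table 3 (our encoding, with the attribute abbreviations of the text) reproduces this choice; MP and CCI tie at degree 0.5, and the second-highest degrees, 0.2 versus 0.4, break the tie:

```python
table3 = {
    1:  {"MP": "Yes", "WP": "No",  "LIP": "No",  "CCI": "No",  "S": "Male"},
    2:  {"MP": "Yes", "WP": "Yes", "LIP": "Yes", "CCI": "No",  "S": "Female"},
    3:  {"MP": "No",  "WP": "No",  "LIP": "No",  "CCI": "No",  "S": "Male"},
    4:  {"MP": "Yes", "WP": "Yes", "LIP": "Yes", "CCI": "Yes", "S": "Male"},
    5:  {"MP": "Yes", "WP": "No",  "LIP": "Yes", "CCI": "No",  "S": "Female"},
    6:  {"MP": "No",  "WP": "No",  "LIP": "No",  "CCI": "No",  "S": "Female"},
    7:  {"MP": "Yes", "WP": "No",  "LIP": "Yes", "CCI": "Yes", "S": "Male"},
    8:  {"MP": "No",  "WP": "Yes", "LIP": "No",  "CCI": "No",  "S": "Male"},
    9:  {"MP": "Yes", "WP": "No",  "LIP": "No",  "CCI": "No",  "S": "Male"},
    10: {"MP": "Yes", "WP": "Yes", "LIP": "Yes", "CCI": "No",  "S": "Female"},
}
print(mda_select(table3))  # 'CCI', agreeing with Table 6
```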
[Fig. 2. The accuracy of BC, TR, MMR and MDA techniques for Case 1.]
[Fig. 3. The computational complexity of BC, TR, MMR and MDA techniques for Case 1.]
5.2. Case 2: The student's enrollment qualifications

Table 7 shows an information system of students' enrollment qualifications. There are eight objects (n = 8) with seven categorical attributes (m = 7): Degree, English, Experience, IT, Mathematics, Programming and Statistics. The attributes Degree and English have three values; the other attributes have two values. Note that attributes Mathematics and Programming in Table 7 have the same value for every object (student); thus, clustering the objects using either attribute Mathematics or attribute Programming gives the same result. Based on the BC technique, four attributes can be considered as clustering attributes, i.e. Experience, IT, Mathematics and Statistics. As the attribute with the fewest distinct balanced values is selected as the clustering attribute, the candidate attributes are IT, Mathematics and Statistics. The process of evaluating TR, MMR and MDA is the same as in Case 1; the resulting values are summarized in Tables 8–10, respectively.

Table 7. An information system of students' enrollment qualifications.

Student | Degree | English | Experience | IT     | Mathematics | Programming | Statistics
1 | Ph.D | Good   | Medium | Good   | Good   | Good   | Good
2 | Ph.D | Medium | Medium | Good   | Good   | Good   | Good
3 | M.Sc | Medium | Medium | Medium | Good   | Good   | Good
4 | M.Sc | Medium | Medium | Medium | Good   | Good   | Medium
5 | M.Sc | Medium | Medium | Medium | Medium | Medium | Medium
6 | M.Sc | Medium | Medium | Medium | Medium | Medium | Medium
7 | B.Sc | Medium | Good   | Good   | Medium | Medium | Medium
8 | B.Sc | Bad    | Good   | Good   | Medium | Medium | Good
Table 8. The total roughness of all attributes of Table 7 using the TR technique.

Attribute   | Degree | English | Experience | IT    | Mathematics | Programming | Statistics | TR
Degree      | –      | 0.005   | 0.333      | 0.333 | 0           | 0           | 0          | 0.112
English     | 0.167  | –       | 0          | 0.167 | 0           | 0           | 0.167      | 0.083
Experience  | 1      | 0.5     | –          | 0.25  | 0.25        | 0.25        | 0          | 0.375
IT          | 1      | 0.125   | 0.125      | –     | 0           | 0           | 0          | 0.208
Mathematics | 0.333  | 0.143   | 0.125      | 0     | –           | 1           | 0          | 0.267
Programming | 0.333  | 0.143   | 0.125      | 0     | 1           | –           | 0          | 0.267
Statistics  | 0.125  | 0.125   | 0          | 0     | 0           | 0           | –          | 0.042
With the TR technique, since there are three candidates for the clustering attribute, we must compare their total roughness values. From Table 8, attribute Mathematics is considered as the clustering attribute because it has the highest total roughness, i.e. 0.267, as compared to IT, i.e. 0.208, and Statistics, i.e. 0.042. However, the total roughness of attribute Mathematics, i.e. 0.267, is lower than that of attribute Experience, i.e. 0.375. Thus, the decision to select Mathematics as the clustering attribute is not appropriate, because the accuracy of Mathematics is lower than that of Experience.
Table 9. The min–min roughness of all attributes in Table 7 using the MMR technique (the value in parentheses is the next lowest mean roughness, used for tie-breaking).

Attribute   | Degree | English | Experience | IT    | Mathematics | Programming | Statistics | MMR
Degree      | –      | 0.905   | 0.667      | 0.667 | 1           | 1           | 1          | 0.667
English     | 0.833  | –       | 1          | 0.833 | 1           | 1           | 0.833      | 0.833
Experience  | 0      | 0.500   | –          | 0.750 | 0.750       | 0.750       | 1          | 0 (0.500)
IT          | 0      | 0.875   | 0.875      | –     | 1           | 1           | 1          | 0 (0.875)
Mathematics | 0.667  | 0.857   | 0.875      | 1     | –           | 0           | 1          | 0 (0.667)
Programming | 0.667  | 0.857   | 0.875      | 1     | 0           | –           | 1          | 0 (0.667)
Statistics  | 0.875  | 0.875   | 1          | 1     | 1           | 1           | –          | 0.875
Table 10. The degree of dependency of all attributes in Table 7 using the MDA technique (values in parentheses are the next highest degrees, used for tie-breaking).

Attribute   | Degree | English | Experience | IT  | Mathematics | Programming | Statistics | MDA
Degree      | –      | 0.25    | 0.25       | 0.5 | 0           | 0           | 0          | 0.5
English     | 0.5    | –       | 0          | 0.5 | 0           | 0           | 0.5        | 0.5
Experience  | 1      | 0.25    | –          | 0.5 | 0.5         | 0.5         | 0          | 1 (0.5, 0.5)
IT          | 1      | 0.25    | 0.25       | –   | 0           | 0           | 0          | 1 (0.25, 0.25)
Mathematics | 0.5    | 0.25    | 0.25       | 0   | –           | 1           | 0          | 1 (0.5, 0.25)
Programming | 0.5    | 0.25    | 0.25       | 0   | 1           | –           | 0          | 1 (0.5, 0.25)
Statistics  | 0.25   | 0.25    | 0          | 0   | 0           | 0           | –          | 0.25
[Fig. 4. The accuracy of BC, TR, MMR and MDA techniques for Case 2.]
[Fig. 5. The computational complexity of BC, TR, MMR and MDA techniques for Case 2.]
On the other hand, with the MMR technique, all attributes (bi-valued and multi-valued) are evaluated. The procedure for obtaining the min–min roughness is the same as in Case 1; Table 9 shows the MMR values of all attributes of Table 7. From Table 9, attribute Experience is selected as the clustering attribute, because its second mean roughness, i.e. 0.500, is less than that of attribute IT, i.e. 0.875.

Meanwhile, with the MDA technique, the first maximum degree of dependency, i.e. 1, occurs for attributes Experience, IT, Mathematics and Programming, as shown in Table 10. The second maximum degree of dependency, i.e. 0.5, occurs for attributes Experience, Mathematics and Programming, and the third maximum, i.e. 0.5, occurs only for attribute Experience. Thus, from Table 10, attribute Experience is selected as the clustering attribute.

Figs. 4 and 5 illustrate the accuracy of selecting the clustering attribute and the computational complexity for Case 2, respectively. The accuracy of selecting the clustering attribute using the TR, MMR and MDA techniques is the same, i.e. 0.5, which is higher than that of the BC technique, i.e. 0.33. Meanwhile, the MDA technique has lower computational complexity, due to fewer iterations being required, as compared to the BC, TR and MMR techniques.
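The same sketch reproduces the Case 2 selection on Table 7 (our encoding):

```python
table7 = {
    1: {"Degree": "PhD", "English": "Good",   "Experience": "Medium", "IT": "Good",
        "Mathematics": "Good",   "Programming": "Good",   "Statistics": "Good"},
    2: {"Degree": "PhD", "English": "Medium", "Experience": "Medium", "IT": "Good",
        "Mathematics": "Good",   "Programming": "Good",   "Statistics": "Good"},
    3: {"Degree": "MSc", "English": "Medium", "Experience": "Medium", "IT": "Medium",
        "Mathematics": "Good",   "Programming": "Good",   "Statistics": "Good"},
    4: {"Degree": "MSc", "English": "Medium", "Experience": "Medium", "IT": "Medium",
        "Mathematics": "Good",   "Programming": "Good",   "Statistics": "Medium"},
    5: {"Degree": "MSc", "English": "Medium", "Experience": "Medium", "IT": "Medium",
        "Mathematics": "Medium", "Programming": "Medium", "Statistics": "Medium"},
    6: {"Degree": "MSc", "English": "Medium", "Experience": "Medium", "IT": "Medium",
        "Mathematics": "Medium", "Programming": "Medium", "Statistics": "Medium"},
    7: {"Degree": "BSc", "English": "Medium", "Experience": "Good",   "IT": "Good",
        "Mathematics": "Medium", "Programming": "Medium", "Statistics": "Medium"},
    8: {"Degree": "BSc", "English": "Bad",    "Experience": "Good",   "IT": "Good",
        "Mathematics": "Medium", "Programming": "Medium", "Statistics": "Good"},
}
print(mda_select(table7))  # 'Experience', agreeing with Table 10
```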
5.3. Case 3: The animal dataset in Hu [19]

Table 11 shows the animal world dataset from [19]. There are nine animals (n = 9) with nine categorical attributes (m = 9): Hair, Teeth, Eye, Feather, Feet, Eat, Milk, Fly and Swim. The attributes Hair, Eye, Feather, Milk, Fly and Swim have two values; attribute Teeth has three values, and the other attributes have four values. Since the number of animals is 9 (odd), a balanced clustering cannot be formed. With BC, we therefore select among the clustering attributes with the least distinct values: Hair, Milk, Feather, Fly and Swim (Milk is identical to Hair, leaving four distinct candidates). We use the same procedure as in Case 2 to select the clustering attribute from these candidates; the process of evaluating TR, MMR and MDA is the same as in Case 1. The values of TR, MMR and MDA for all attributes of Table 11 are summarized in Tables 12–14, respectively.

Table 11. The animal world dataset from [19].

Animal    | Hair | Teeth   | Eye     | Feather | Feet | Eat   | Milk | Fly | Swim
Tiger     | Y    | Pointed | Forward | N       | Claw | Meat  | Y    | N   | Y
Cheetah   | Y    | Pointed | Forward | N       | Claw | Meat  | Y    | N   | Y
Giraffe   | Y    | Blunt   | Side    | N       | Hoof | Grass | Y    | N   | N
Zebra     | Y    | Blunt   | Side    | N       | Hoof | Grass | Y    | N   | N
Ostrich   | N    | N       | Side    | Y       | Claw | Grain | N    | N   | N
Penguin   | N    | N       | Side    | Y       | Web  | Fish  | N    | N   | Y
Albatross | N    | N       | Side    | Y       | Claw | Grain | N    | Y   | Y
Eagle     | N    | N       | Forward | Y       | Claw | Meat  | N    | Y   | N
Viper     | N    | Pointed | Forward | N       | N    | Meat  | N    | N   | N
Table 12. The total roughness of all attributes in Table 11 using the TR technique.

Attribute | Hair  | Teeth | Eye | Feather | Feet  | Eat   | Milk  | Fly   | Swim | TR
Hair      | –     | 0.485 | 0   | 0.222   | 0.286 | 0.380 | 1     | 0.111 | 0    | 0.310
Teeth     | 0     | –     | 0   | 0.333   | 0.472 | 0.476 | 0     | 0.074 | 0    | 0.169
Eye       | 0     | 0.380 | –   | 0       | 0.271 | 1     | 0     | 0     | 0    | 0.206
Feather   | 0.222 | 1     | 0.222 | –     | 0.271 | 0.380 | 0.222 | 0.111 | 0    | 0.303
Feet      | 0     | 0     | 0   | 0       | –     | 0.5   | 0     | 0     | 0    | 0.0625
Eat       | 0     | 0.357 | 0   | 0.5     | 0     | –     | 0     | 0     | 0    | 0.138
Milk      | 1     | 0.485 | 0   | 0.222   | 0.286 | 0.380 | –     | 0.111 | 0    | 0.310
Fly       | 0.222 | 0.277 | 0   | 0.277   | 0.222 | 0.166 | 0.222 | –     | 0    | 0.173
Swim      | 0     | 0.111 | 0   | 0       | 0.271 | 0.196 | 0     | 0     | –    | 0.072
Since there are four candidates for the clustering attribute, we must compare their total roughness values. From Table 12, attribute Hair is considered as the clustering attribute because it has the highest total roughness, i.e. 0.310, as compared to Feather, i.e. 0.303, Fly, i.e. 0.173, and Swim, i.e. 0.072. Since the total roughness of attribute Hair, i.e. 0.310, is also the highest overall, the decision to select Hair as the clustering attribute is the best choice.

From Table 13, MMR selects attribute Hair as the clustering attribute, because the second mean roughness of attribute Hair, i.e. 0.515, is lower than that of the other attributes. Meanwhile, with the MDA technique, the first maximum degree of dependency, i.e. 1, occurs for attributes Hair (and identically Milk), Eye and Feather, as Table 14 shows. The second maximum degree of dependency, i.e. 0.666, occurs for attribute Hair. Thus, from Table 14, attribute Hair is selected as the clustering attribute.
Table 13. The min–min roughness of all attributes in Table 11 using the MMR technique (the value in parentheses is the next lowest mean roughness, used for tie-breaking).

Attribute | Hair  | Teeth | Eye | Feather | Feet  | Eat   | Milk  | Fly   | Swim | MMR
Hair      | –     | 0.515 | 1   | 0.778   | 0.714 | 0.620 | 0     | 0.889 | 1    | 0 (0.515)
Teeth     | 1     | –     | 1   | 0.667   | 0.528 | 0.524 | 1     | 0.926 | 1    | 0.524
Eye       | 1     | 0.620 | –   | 1       | 0.729 | 0     | 1     | 1     | 1    | 0 (0.620)
Feather   | 0.778 | 0     | 0.778 | –     | 0.729 | 0.620 | 0.778 | 0.889 | 1    | 0 (0.620)
Feet      | 1     | 1     | 1   | 1       | –     | 0.5   | 1     | 1     | 1    | 0.5
Eat       | 1     | 0.643 | 1   | 0.5     | 1     | –     | 1     | 1     | 1    | 0.5
Milk      | 0     | 0.515 | 1   | 0.778   | 0.714 | 0.620 | –     | 0.889 | 1    | 0 (0.515)
Fly       | 0.778 | 0.723 | 1   | 0.723   | 0.778 | 0.834 | 0.778 | –     | 1    | 0.723
Swim      | 1     | 0.889 | 1   | 1       | 0.729 | 0.804 | 1     | 1     | –    | 0.729
Table 14. The degree of dependency of all attributes in Table 11 using the MDA technique (the value in parentheses is the next highest degree, used for tie-breaking).

Attribute | Hair  | Teeth | Eye | Feather | Feet  | Eat   | Milk  | Fly   | Swim | MDA
Hair      | –     | 0.666 | 0   | 0.444   | 0.444 | 0.555 | 1     | 0.222 | 0    | 1 (0.666)
Teeth     | 0     | –     | 0   | 0.444   | 0.444 | 0.555 | 0     | 0.222 | 0    | 0.555
Eye       | 0     | 0.555 | –   | 0       | 0.444 | 1     | 0     | 0     | 0    | 1 (0.555)
Feather   | 0.444 | 1     | 0   | –       | 0.444 | 0.555 | 0.444 | 0.222 | 0    | 1 (0.555)
Feet      | 0     | 0.222 | 0   | 0       | –     | 0.333 | 0     | 0     | 0    | 0.333
Eat       | 0     | 0.555 | 0.444 | 0     | 0.333 | –     | 0     | 0     | 0    | 0.555
Milk      | 1     | 0.666 | 0   | 0.444   | 0.444 | 0.555 | –     | 0.222 | 0    | 1 (0.666)
Fly       | 0.444 | 0.555 | 0   | 0.555   | 0.444 | 0.333 | 0.444 | –     | 0    | 0.555
Swim      | 0     | 0.222 | 0   | 0       | 0.444 | 0.333 | 0     | 0     | –    | 0.444
[Fig. 6. The accuracy of BC, TR, MMR and MDA techniques for Case 3.]
[Fig. 7. The computational complexity of BC, TR, MMR and MDA techniques for Case 3.]

Figs. 6 and 7 illustrate the accuracy of selecting the clustering attribute and the computational complexity for Case 3, respectively. The accuracy of selecting the clustering attribute is the same for the BC, TR, MMR and MDA techniques, i.e. 0.515. Nevertheless, the MDA technique has lower computational complexity, due to fewer iterations being required, as compared to the BC, TR and MMR techniques.

5.4. Case 4: The dataset in Parmar et al. [10]

In Table 15, there are ten objects (n = 10) with six categorical attributes (m = 6): a1, a2, a3, a4, a5 and a6. Each attribute has more than two values (l > 2), so there are no bi-valued attributes and the BC technique is not applicable in this case. We therefore employ the TR, MMR and MDA techniques on all attributes. The process for evaluating the accuracy of each technique is the same as in the previous cases; the values of TR, MMR and MDA are summarized in Tables 16–18, respectively.

Table 15. An information system from Parmar et al. [10].

#  | a1     | a2     | a3       | a4         | a5      | a6
1  | Big    | Blue   | Hard     | Indefinite | Plastic | Negative
2  | Medium | Red    | Moderate | Smooth     | Wood    | Neutral
3  | Small  | Yellow | Soft     | Fuzzy      | Plush   | Positive
4  | Medium | Blue   | Moderate | Fuzzy      | Plastic | Negative
5  | Small  | Yellow | Soft     | Indefinite | Plastic | Neutral
6  | Big    | Green  | Hard     | Smooth     | Wood    | Positive
7  | Small  | Yellow | Hard     | Indefinite | Metal   | Positive
8  | Small  | Yellow | Soft     | Indefinite | Plastic | Positive
9  | Big    | Green  | Hard     | Smooth     | Wood    | Neutral
10 | Medium | Green  | Moderate | Smooth     | Plastic | Neutral

Table 16. The total roughness of all attributes in Table 15 using the TR technique.

Attribute | a1    | a2    | a3    | a4 | a5    | a6    | TR
a1        | –     | 0.389 | 0.477 | 0  | 0.096 | 0     | 0.1920
a2        | 0.250 | –     | 0.108 | 0  | 0.072 | 0.250 | 0.1357
a3        | 0.477 | 0.156 | –     | 0  | 0.193 | 0     | 0.1231
a4        | 0     | 0.334 | 0     | –  | 0.237 | 0     | 0.1139
a5        | 0     | 0.118 | 0     | 0  | –     | 0.150 | 0.0336
a6        | 0     | 0.375 | 0     | 0  | 0.167 | –     | 0.1083
Table 17 The minimum–minimum roughness of all attributes in Table 15 using MMR technique.
800 700
Attribute (w.r.t)
MMR mean roughness
a1
a2 0.611
a3 0.523
a4 1
a5 0.904
a6 1
0.523 0.611
400
a2
a1 0.750
a3 0.892
a4 1
a5 0.928
a6 0.750
0.750
300
a3
a1 0.523
a2 0.944
a4 1
a5 0.907
a6 1
0.523 0.907
a4
a1 1
a2 0.666
a3 1
a5 0.763
a6 1
0.666
a5
a1 1
a2 0.882
a3 1
a4 1
a6 0.950
0.882
a6
a1 1
a2 0.625
a3 1
a4 1
a5 0.933
0.625
600
Iteration
500
200 100
Computational complexity
0 BC
TR
MMR
MDA
Fig. 7. The computational complexity of BC, TR, MMR and MDA techniques for Case 3.
Table 18. The degree of dependency of all attributes in Table 15 using the MDA technique (the value in parentheses is the next highest degree, used for tie-breaking).

Attribute | a1  | a2  | a3  | a4 | a5  | a6  | MDA
a1        | –   | 0.5 | 0.6 | 0  | 0.2 | 0   | 0.6 (0.5)
a2        | 0   | –   | 0.3 | 0  | 0.2 | 0.2 | 0.3
a3        | 0.6 | 0.1 | –   | 0  | 0.2 | 0   | 0.6 (0.2)
a4        | 0   | 0.4 | 0   | –  | 0.5 | 0   | 0.5
a5        | 0   | 0.3 | 0   | 0  | –   | 0.2 | 0.3
a6        | 0   | 0.3 | 0   | 0  | 0   | –   | 0.3
From Table 16, attribute a1 has the highest total roughness as compared to the attributes a_i, i = 2, 3, 4, 5, 6; thus, attribute a1 is considered as the clustering attribute. From Table 17, two attributes have the same minimum MMR (a1 and a3, i.e. 0.523), but the second value of attribute a1, i.e. 0.611, is lower than that of a3, i.e. 0.907; therefore, attribute a1 is selected as the clustering attribute. Table 18 shows the degrees of dependency of all attributes using the MDA technique: two attributes have the same maximum degree (a1 and a3, i.e. 0.6), but the second value of attribute a1, i.e. 0.5, is higher than that of a3, i.e. 0.2; therefore, attribute a1 is selected as the clustering attribute.

Figs. 8 and 9 illustrate the accuracy of selecting the clustering attribute and the computational complexity for Case 4, respectively. In this case, the accuracy of selecting the clustering attribute with the TR, MMR and MDA techniques is the same, i.e. 0.476. However, the MDA technique has lower computational complexity, due to fewer iterations being required, as compared to the other techniques.
[Fig. 8. The accuracy of TR, MMR and MDA techniques for Case 4.]
[Fig. 9. The computational complexity of TR, MMR and MDA techniques for Case 4.]

6. Conclusions

A few techniques have been designed to cluster categorical data while handling uncertainty in the data set; the most popular are bi-clustering, total roughness and min–min roughness. However, the performance of these techniques is an issue, due to low accuracy and high computational complexity. Thus, there is a need for a clustering technique that can handle uncertainty in the clustering process with high accuracy and lower computational complexity. In this paper, we proposed a new technique for selecting the clustering attribute, called maximum dependency of attributes (MDA). The proposed technique is based on rough set theory, using the dependency of attributes in information systems. The MDA technique was analyzed in terms of accuracy and computational complexity on four selected test cases. The results show that the MDA technique provides a convenient approach to high accuracy with low computational complexity, as compared with the three existing techniques. The proposed approach could also be applied to clustering data in large databases.

References
[1] S. Wu, A. Liew, H. Yan, M. Yang, Cluster analysis of gene expression data based on self splitting and merging competitive learning, IEEE Transactions on Information Technology in Biomedicine 8 (1) (2004) 5–15.
[2] K. Wong, D. Feng, S. Meikle, M. Fulham, Segmentation of dynamic PET images using cluster analysis, IEEE Transactions on Nuclear Science 49 (1) (2002) 200–207.
[3] S. Haimov, M. Michalev, A. Savchenko, O. Yordanov, Clustering of radar signatures by autoregressive model fitting and cluster analysis, IEEE Transactions on Geoscience and Remote Sensing 8 (1) (1989) 606–610.
[4] Z. Huang, Extensions to the k-means algorithm for clustering large data sets with categorical values, Data Mining and Knowledge Discovery 2 (3) (1998) 283–304.
[5] D. Gibson, J. Kleinberg, P. Raghavan, Clustering categorical data: an approach based on dynamical systems, The Very Large Data Bases Journal 8 (3–4) (2000) 222–236.
[6] S. Guha, R. Rastogi, K. Shim, ROCK: a robust clustering algorithm for categorical attributes, Information Systems 25 (5) (2000) 345–366.
[7] V. Ganti, J. Gehrke, R. Ramakrishnan, CACTUS – clustering categorical data using summaries, in: Fifth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 1999, pp. 73–83.
[8] A. Dempster, N. Laird, D. Rubin, Maximum likelihood from incomplete data via the EM algorithm, Journal of the Royal Statistical Society 39 (1) (1977) 1–38.
[9] D. Kim, K. Lee, D. Lee, Fuzzy clustering of categorical data using fuzzy centroids, Pattern Recognition Letters 25 (11) (2004) 1263–1271.
[10] D. Parmar, T. Wu, J. Blackhurst, MMR: an algorithm for clustering categorical data using rough set theory, Data and Knowledge Engineering 63 (2007) 879–893.
[11] L.J. Mazlack, A. He, Y. Zhu, S. Coppock, A rough set approach in choosing clustering attributes, in: Proceedings of the ISCA 13th International Conference (CAINE-2000), 2000, pp. 1–6.
[12] Z. Pawlak, Rough sets, International Journal of Computer and Information Science 11 (1982) 341–356.
[13] Z. Pawlak, Rough Sets: A Theoretical Aspect of Reasoning about Data, Kluwer Academic Publishers, 1991.
[14] Z. Pawlak, A. Skowron, Rudiments of rough sets, Information Sciences: An International Journal 177 (1) (2007) 3–27.
[15] Y.Y. Yao, Two views of the theory of rough sets in finite universes, Approximate Reasoning: An International Journal 15 (4) (1996) 191–317.
[16] Y.Y. Yao, Constructive and algebraic methods of the theory of rough sets, Information Sciences: An International Journal 109 (1–4) (1998) 21–47.
[17] Y.Y. Yao, Information granulation and rough set approximation, International Journal of Intelligent Systems 16 (1) (2001) 87–104.
[18] R.J. Roiger, M.W. Geatz, Data Mining: A Tutorial-Based Primer, Addison Wesley, 2003.
[19] X. Hu, Knowledge discovery in databases: an attribute-oriented rough set approach, Ph.D. Thesis, University of Regina, 1995.
[20] Z. Pawlak, Rough classification, International Journal of Human Computer Studies 51 (1983) 369–383.
[21] Z. Pawlak, Rough set approach to knowledge-based decision support, European Journal of Operational Research 99 (1997) 48–57.
[22] J. Komorowski, L. Polkowski, A. Skowron, Rough sets: a tutorial, in: S.K. Pal, A. Skowron (Eds.), Rough-Fuzzy Hybridization, Springer-Verlag, Berlin, Heidelberg, 1999, pp. 3–98.
[23] I. Düntsch, G. Gediga, Rough approximation quality revisited, Artificial Intelligence 132 (2001) 219–234.
[24] Z. Pawlak, Rough sets, decision algorithms and Bayes' theorem, European Journal of Operational Research 136 (2002) 181–189.
[25] Z. Pawlak, Rough sets and intelligent data analysis, Information Sciences: An International Journal 147 (2002) 1–12.
[26] J.F. Peters, A. Skowron, Zdzislaw Pawlak: life and work (1926–2006), in: J.F. Peters, A. Skowron (Eds.), Transactions on Rough Sets V, LNCS 4100, Springer-Verlag, Berlin, Heidelberg, 2006, pp. 1–24.
[27] L.A. Zadeh, Fuzzy sets, Information and Control 8 (1965) 338–353.