A rough set approach for selecting clustering attribute

Knowledge-Based Systems 23 (2010) 220–231

Tutut Herawan (a), Mustafa Mat Deris (a), Jemal H. Abawajy (b)

(a) University of Tun Hussein Onn Malaysia, Faculty of Information Technology and Multimedia, Parit Raja, 86400 Batu Pahat, Johor, Malaysia
(b) Deakin University, School of Engineering and Information Technology, Geelong, VIC, Australia

Article history: Received 9 February 2009; received in revised form 6 November 2009; accepted 18 December 2009; available online 24 December 2009.

Keywords: Clustering; Rough set theory; Dependency of attributes; Performance

Abstract

A few clustering techniques for categorical data exist to group objects having similar characteristics. Some can handle uncertainty in the clustering process, while others have stability issues. However, the performance of these techniques is an issue due to low accuracy and high computational complexity. This paper proposes a new technique called maximum dependency attributes (MDA) for selecting the clustering attribute. The proposed approach is based on rough set theory, taking into account the dependency of attributes in the database. We analyze and compare the performance of the MDA technique with the bi-clustering (BC), total roughness (TR) and min–min roughness (MMR) techniques on four test cases. The results establish the better performance of the proposed approach.

© 2009 Elsevier B.V. All rights reserved.

1. Introduction

Cluster analysis is a data analysis tool used to group data with similar characteristics. It has been used in data mining tasks such as unsupervised classification and data summarization. Cluster analysis techniques have been applied in many areas such as manufacturing, medicine, nuclear science, radar scanning, and research and development planning. For example, Wu et al. [1] develop a clustering algorithm specifically designed to handle the complexities of gene data that can estimate the correct number of clusters and find them. Wong et al. [2] present an approach used to segment tissues in a nuclear medical imaging method known as positron emission tomography (PET), and Haimov et al. [3] use cluster analysis to segment radar signals in scanning land and marine objects. These clustering techniques are only applicable to data having numerical values for attributes. Unlike numerical data, categorical data have multi-valued attributes, so similarity can be defined as common objects, common values for the attributes, and the association between the two. For such cases, a number of algorithms for clustering categorical data have been proposed, including work by Huang [4], Gibson et al. [5], Guha et al. [6], Ganti et al. [7], and Dempster et al. [8]. While these methods make important contributions to the issue of clustering categorical data, they are not designed to handle uncertainty in the clustering process. This is an important issue in many real world applications where there is often no sharp boundary between clusters. Huang [4] and Kim et al. [9] work in the area of applying fuzzy sets to clustering categorical data. However, these algorithms require multiple runs to establish the stability needed to obtain a satisfactory value for the parameter used to control the membership fuzziness [10].

Recently, there has been work on applying rough set theory to handle uncertainty in the process of selecting a clustering attribute. Mazlack et al. [11] propose two techniques to select a clustering attribute: the bi-clustering (BC) technique, based on bi-valued attributes, and the total roughness (TR) technique. Mazlack et al. suggest that the BC technique be attempted first in order to achieve low dissonance inside the clusters. With this technique, there are three different approaches to selecting the clustering attribute: arbitrary, imbalanced, and balanced. For the balanced or imbalanced clustering approaches, two kinds of problems may occur. First, there may be several candidate bi-clustering attributes, and a decision has to be made as to which one should be chosen as the clustering attribute. Second, no two-valued attribute may be found to form a balanced clustering; at this point, clustering on multiple-valued attributes must be considered. Therefore, for selecting a clustering attribute in a data set with multiple-valued attributes, Mazlack et al. proposed a technique using the average of the accuracy of approximation (accuracy of roughness) in rough set theory [12–14], called total roughness (TR). In other words, it is based on the average mean roughness of an attribute with respect to the set of all other attributes in an information system, where the higher the total roughness, the higher the accuracy of selecting the clustering attribute. Parmar et al. [10] propose a technique called min–min roughness (MMR) to improve the bi-clustering technique for data sets with multi-valued attributes. In this technique, bi-valued and


multi-valued attributes are equally treated, and the accuracy of approximation is measured using the well-known Marczewski–Steinhaus metric applied to the lower and upper approximations of a subset of the universe in an information system [15–17]. However, MMR is complementary to TR: it produces the same accuracy and has the same complexity as the TR technique. It is shown in Section 4.1 that TR and MMR give the same result in selecting the clustering attribute. With this technique, complexity is still an issue, because all attributes must be considered to obtain the clustering attribute. Therefore, there is a need for a data clustering technique with improved accuracy and computational complexity. One way to select a clustering attribute is to discover the dependency between attributes. In this paper, a technique called maximum dependency of attributes (MDA) is proposed. It is based on the dependency of attributes in an information system, using rough set theory. Four test cases are considered to evaluate and compare the performance of MDA with the BC, TR and MMR techniques: the credit card promotion dataset as in [18], the student's enrollment qualifications dataset, the animal dataset as in Hu [19], and the dataset as in Parmar et al. [10]. We show that the proposed technique provides better performance than the BC, TR and MMR techniques.

The rest of this paper is organized as follows. Section 2 describes Pawlak's rough set theory in information systems. Section 3 describes the MDA algorithm. Section 4 compares the performance of MDA with the TR and MMR techniques. Comparison tests of MDA against the BC, TR and MMR techniques on four test cases are described in Section 5. Finally, conclusions are given in Section 6.

2. Pawlak's rough set model

Rough set theory, proposed by Pawlak in the 1980s as the result of a long-term program of fundamental mathematical research on information systems, can be seen as a new mathematical approach to vagueness (of sets) and uncertainty (of elements) [12–14]. Rough set theory is founded on the assumption that with every object of the universe of discourse we associate some information (or knowledge). For example, if objects are patients suffering from a certain disease, symptoms of the disease form information about the patients. Objects characterized by the same information are indiscernible (similar) in view of the available information about them. The indiscernibility relation generated in this way is the mathematical basis of rough set theory. Any set of all indiscernible objects is called an elementary set and forms a basic granule (atom) of knowledge about the universe. Any union of some elementary sets is referred to as a crisp (precise) set; otherwise the set is rough (imprecise, vague) [12–17,20–26]. Consequently, each rough set has boundary-line cases, i.e. objects which cannot with certainty be classified either as members of the set or of its complement. Obviously, crisp sets have no boundary-line elements at all; boundary-line cases cannot be properly classified by employing the available knowledge. Hence, rough set theory expresses vagueness not by means of membership, but by employing a boundary region of a set. The theory is different from, and complementary to, fuzzy set theory [27]. It is a generalization of standard (crisp) set theory and has been successfully applied in many fields, for example data mining, data analysis, medicine, and expert systems [14,21–26]. The original main goal of rough set theory is the induction of approximations of concepts. The idea of a rough set consists of the approximation of a set by a pair of crisp sets, called the lower and upper approximations of the set [12–17,20–26]. The motivation for rough set theory comes from the need to represent subsets of a universe in terms of equivalence classes of a clustering of the universe.


Rough set theory is an approach to aid decision making in the presence of uncertainty [21]. Here, we use the concepts of rough set theory in terms of the data contained in an information system. The notion of an information system provides a convenient tool for the representation of objects in terms of their attribute values. An information system as in [14] is a 4-tuple (quadruple) $S = (U, A, V, f)$, where $U$ is a non-empty finite set of objects, $A$ is a non-empty finite set of attributes, $V = \bigcup_{a \in A} V_a$, where $V_a$ is the domain (value set) of attribute $a$, and $f: U \times A \to V$ is a total function such that $f(u, a) \in V_a$ for every $(u, a) \in U \times A$, called the information (knowledge) function.

Definition 1 (see [14]). Let $S = (U, A, V, f)$ be an information system and let $B$ be any subset of $A$. Two elements $x, y \in U$ are said to be $B$-indiscernible (indiscernible by the set of attributes $B \subseteq A$ in $S$) if and only if $f(x, a) = f(y, a)$ for every $a \in B$.

Obviously, every subset of $A$ induces a unique indiscernibility relation. Notice that the indiscernibility relation induced by the set of attributes $B$, denoted by $\mathrm{IND}(B)$, is an equivalence relation, and it is well known that an equivalence relation induces a unique clustering. The clustering of $U$ induced by $\mathrm{IND}(B)$ in $S = (U, A, V, f)$ is denoted by $U/B$, and the equivalence class in the clustering $U/B$ containing $x \in U$ is denoted by $[x]_B$. The notions of lower and upper approximations of a set can be defined as follows [14].

Definition 2. Let $S = (U, A, V, f)$ be an information system, let $B$ be any subset of $A$ and let $X$ be any subset of $U$. The $B$-lower approximation of $X$, denoted by $\underline{B}(X)$, and the $B$-upper approximation of $X$, denoted by $\overline{B}(X)$, are defined by

$$\underline{B}(X) = \{x \in U \mid [x]_B \subseteq X\} \quad \text{and} \quad \overline{B}(X) = \{x \in U \mid [x]_B \cap X \neq \emptyset\}. \tag{1}$$
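For concreteness, these two definitions can be sketched in a few lines of Python. This is an illustrative sketch under an assumed data layout (an information system stored as a dictionary mapping each object to a dictionary of attribute values); the helper names are ours, not from the paper:

```python
from collections import defaultdict

def partition(table, attrs):
    """Equivalence classes U/B induced by IND(B): objects with equal
    values on every attribute in attrs fall into the same class."""
    classes = defaultdict(set)
    for obj, row in table.items():
        classes[tuple(row[a] for a in attrs)].add(obj)
    return list(classes.values())

def lower_upper(table, attrs, X):
    """B-lower and B-upper approximations of X per Eq. (1)."""
    lower, upper = set(), set()
    for cls in partition(table, attrs):
        if cls <= X:   # [x]_B wholly contained in X
            lower |= cls
        if cls & X:    # [x]_B meets X
            upper |= cls
    return lower, upper
```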

It can easily be seen that the $B$-upper approximation of a subset $X \subseteq U$ can be expressed using the set complement and the lower approximation of $X$ by

$$\overline{B}(X) = U - \underline{B}(\neg X), \tag{2}$$

where $\neg X$ denotes the complement of $X$ relative to $U$. The accuracy of approximation (accuracy of roughness) of any subset $X \subseteq U$ with respect to $B \subseteq A$, denoted $\alpha_B(X)$, is measured by

$$\alpha_B(X) = \frac{|\underline{B}(X)|}{|\overline{B}(X)|}, \tag{3}$$

where $|X|$ denotes the cardinality of $X$. For the empty set $\emptyset$, we define $\alpha_B(\emptyset) = 1$. Obviously, $0 \leq \alpha_B(X) \leq 1$. If $X$ is a union of some equivalence classes of $U$, then $\alpha_B(X) = 1$ and the set $X$ is crisp (precise) with respect to $B$. If $X$ is not a union of some equivalence classes of $U$, then $\alpha_B(X) < 1$ and the set $X$ is rough (imprecise) with respect to $B$ [14]. This means that the higher the accuracy of approximation of a subset $X \subseteq U$, the more precise (the less imprecise) it is.

The accuracy of roughness in Eq. (3) can also be interpreted using the well-known Marczewski–Steinhaus (MZ) metric [15–17]. Let $S = (U, A, V, f)$ be an information system and let $X, Y \subseteq U$ be two subsets. The MZ metric measuring the distance between $X$ and $Y$ is defined as

$$D(X, Y) = \frac{|X \Delta Y|}{|X \cup Y|},$$

where $X \Delta Y = (X \cup Y) - (X \cap Y)$ denotes the symmetric difference between the two sets $X$ and $Y$. Then we have

$$D(X, Y) = \frac{|(X \cup Y) - (X \cap Y)|}{|X \cup Y|} = 1 - \frac{|X \cap Y|}{|X \cup Y|}. \tag{4}$$


Notice that:

(a) If $X$ and $Y$ are totally different, i.e. $X \cap Y = \emptyset$ (in other words, $X$ and $Y$ are disjoint), then the metric reaches its maximum value of 1.
(b) If $X$ and $Y$ are exactly the same, i.e. $X = Y$, then the metric reaches its minimum value of 0.

By applying the Marczewski–Steinhaus metric to the lower and upper approximations of a subset $X \subseteq U$ in an information system $S$, we have

$$D(\underline{R}(X), \overline{R}(X)) = 1 - \frac{|\underline{R}(X) \cap \overline{R}(X)|}{|\underline{R}(X) \cup \overline{R}(X)|} = 1 - \frac{|\underline{R}(X)|}{|\overline{R}(X)|} = 1 - \alpha_R(X). \tag{5}$$
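A quick numerical check of Eq. (5) is possible by reusing the hypothetical `lower_upper` helper sketched above; this is an illustration under our assumed layout, not the authors' code:

```python
def accuracy(table, attrs, X):
    """Accuracy of roughness per Eq. (3); alpha = 1 for empty X by convention."""
    lower, upper = lower_upper(table, attrs, X)
    return 1.0 if not X else len(lower) / len(upper)

def mz_distance(X, Y):
    """Marczewski-Steinhaus distance per Eq. (4)."""
    union = X | Y
    return len(X ^ Y) / len(union) if union else 0.0

# Since lower <= upper, mz_distance(lower, upper) equals
# 1 - accuracy(table, attrs, X), which is exactly Eq. (5).
```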

The accuracy of roughness may thus be viewed as the inverse of the MZ metric applied to the lower and upper approximations: the distance between the lower and upper approximations determines the accuracy of the rough set approximation. Note that the measurement in Eq. (3) depends not only on the approximation of $X$, but also on the approximation of $\neg X$. From (2) and (3), the accuracy of approximation of any subset $X \subseteq U$ with respect to $B \subseteq A$, $\alpha_B(X)$, can be written as

$$\alpha_B(X) = \frac{|\underline{B}(X)|}{|\overline{B}(X)|} = \frac{|\underline{B}(X)|}{|U| - |\underline{B}(\neg X)|} = \frac{|\underline{B}(X)| + |\underline{B}(\neg X)|}{|U|}. \tag{6}$$

The measurement in (6) can be generalized to find the dependency degree of attributes in information systems, as illustrated in Definitions 3 and 4 below.

Definition 3. Let $S = (U, A, V, f)$ be an information system and let $D$ and $C$ be any subsets of $A$. $D$ depends functionally on $C$, denoted $C \Rightarrow D$, if each value of $D$ is associated with exactly one value of $C$.

Definition 4. Let $S = (U, A, V, f)$ be an information system and let $D$ and $C$ be any subsets of $A$. The dependency of $D$ on $C$ in a degree $k$ $(0 \leq k \leq 1)$ is denoted by $C \Rightarrow_k D$. The degree $k$ is defined by

$$k = \frac{\sum_{X \in U/D} |\underline{C}(X)|}{|U|}. \tag{7}$$

$D$ is said to depend fully (in a degree $k$) on $C$ if $k = 1$; otherwise, $D$ depends partially on $C$. Thus, $D$ depends fully (partially) on $C$ if all (some) elements of the universe $U$ can be uniquely classified into equivalence classes of the clustering $U/D$ employing $C$. Based on Definition 4, we can select the clustering attribute by the maximum degree $k$.

3. Maximum dependency of attributes

In this section we present the proposed technique, which we refer to as maximum dependency of attributes (MDA). Fig. 1 shows the pseudo-code of the MDA algorithm, which uses the dependency of attributes of rough set theory in an information system. The algorithm consists of several main steps. The first step computes the equivalence classes of each attribute: the equivalence classes of the set of objects $U$ are obtained using the indiscernibility relation of each attribute $a_i \in A$ in the information system $S = (U, A, V, f)$. The second step determines the degree of dependency between attributes, using the formula in Eq. (7). The third step selects the maximum dependency degree of each attribute. In the last step, once all dependency degrees have been computed, the clustering attribute is selected based on them: the attribute with the maximum degree of dependency gives the most accurate (highest accuracy of approximation) selection of the clustering attribute. The justification is that the higher the degree of dependency of attributes, the more accurate the selection of the clustering attribute, as stated in Proposition 2. A minimal executable sketch of the algorithm follows Fig. 1.

Fig. 1. The MDA algorithm.
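Fig. 1 itself did not survive extraction; as a stand-in, here is a minimal Python sketch of the four steps just described, reusing the `partition` and `lower_upper` helpers assumed earlier (the function names and data layout are ours, not the authors'):

```python
def dependency_degree(table, C, D):
    """Degree k with which D depends on C, per Eq. (7): the fraction
    of objects lying in some C-lower approximation of a block of U/D."""
    total = sum(len(lower_upper(table, C, X)[0]) for X in partition(table, D))
    return total / len(table)

def mda_select(table):
    """MDA selection following the four steps above: rank every
    attribute by its descending vector of dependency degrees and
    return the lexicographic maximum (ties broken by the next
    highest degree, and so on)."""
    attrs = list(next(iter(table.values())))
    vector = {d: sorted((dependency_degree(table, [c], [d])
                         for c in attrs if c != d), reverse=True)
              for d in attrs}
    return max(attrs, key=lambda d: vector[d])
```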

We first present the relation between the roughness of a subset $X \subseteq U$ and the dependency of two attributes, as stated in Proposition 1.

Proposition 1. Let $S = (U, A, V, f)$ be an information system and let $D$ and $C$ be any subsets of $A$. If $D$ depends totally on $C$, then

$$\alpha_D(X) \leq \alpha_C(X),$$

for every $X \subseteq U$.

Proof. Let $D$ and $C$ be any subsets of $A$ in the information system $S = (U, A, V, f)$. From the hypothesis, we have $\mathrm{IND}(C) \subseteq \mathrm{IND}(D)$. Furthermore, the clustering $U/C$ is finer than $U/D$, so any equivalence class induced by $\mathrm{IND}(D)$ is a union of some equivalence classes induced by $\mathrm{IND}(C)$. Therefore, for every $x \in X \subseteq U$, we have

$$[x]_C \subseteq [x]_D.$$

Hence, for every $X \subseteq U$, we have

$$\underline{D}(X) \subseteq \underline{C}(X) \subseteq X \subseteq \overline{C}(X) \subseteq \overline{D}(X).$$

Consequently,

$$\alpha_D(X) = \frac{|\underline{D}(X)|}{|\overline{D}(X)|} \leq \frac{|\underline{C}(X)|}{|\overline{C}(X)|} = \alpha_C(X). \qquad \square$$

The generalization of Proposition 1 is given below.

Proposition 2. Let $S = (U, A, V, f)$ be an information system and let $C_1, C_2, \ldots, C_n$ and $D$ be any subsets of $A$. If $C_1 \Rightarrow_{k_1} D$, $C_2 \Rightarrow_{k_2} D$, $\ldots$, $C_n \Rightarrow_{k_n} D$, where $k_n \leq k_{n-1} \leq \cdots \leq k_2 \leq k_1$, then

$$\alpha_D(X) \leq \alpha_{C_n}(X) \leq \alpha_{C_{n-1}}(X) \leq \cdots \leq \alpha_{C_2}(X) \leq \alpha_{C_1}(X),$$

for every $X \subseteq U$.

Proof. Let $C_1, C_2, \ldots, C_n$ and $D$ be any subsets of $A$ in the information system $S$. From the hypothesis, and following Proposition 1, we have

$$\alpha_D(X) \leq \alpha_{C_1}(X), \quad \alpha_D(X) \leq \alpha_{C_2}(X), \quad \ldots, \quad \alpha_D(X) \leq \alpha_{C_n}(X).$$

Since $k_n \leq k_{n-1} \leq \cdots \leq k_2 \leq k_1$, then

$$[x]_{C_n} \subseteq [x]_{C_{n-1}}, \quad [x]_{C_{n-1}} \subseteq [x]_{C_{n-2}}, \quad \ldots, \quad [x]_{C_2} \subseteq [x]_{C_1}.$$

Obviously,

$$\alpha_D(X) \leq \alpha_{C_n}(X) \leq \alpha_{C_{n-1}}(X) \leq \cdots \leq \alpha_{C_2}(X) \leq \alpha_{C_1}(X). \qquad \square$$

The accuracy of MDA compared with that of TR and MMR for selecting the clustering attribute is described in Section 4.2. Finding the degree of dependency of attributes in an information system based on the formula in Eq. (7) is illustrated in Example 3.

Example 3. To illustrate finding the degree of dependency of attributes, we consider the information system shown in Table 1, a modified information system from Example 3 in [20].

Table 1. A modified information system from [20].

Object | A | B | C | D
1 | Low | Bad | Loss | Small
2 | Low | Good | Loss | Large
3 | High | Good | Loss | Medium
4 | High | Good | Loss | Medium
5 | Low | Good | Profit | Large

From Table 1, each attribute induces a partition of $U$ via its indiscernibility relation:

$$U/A = \{\{1,2,5\}, \{3,4\}\}, \quad U/B = \{\{1\}, \{2,3,4,5\}\},$$
$$U/C = \{\{1,2,3,4\}, \{5\}\}, \quad U/D = \{\{1\}, \{2,5\}, \{3,4\}\}.$$

Based on the formula in Eq. (7), the degree of dependency of attribute $B$ on attribute $A$, denoted $A \Rightarrow_k B$, is calculated as

$$A \Rightarrow_k B: \quad k = \frac{\sum_{X \in U/B} |\underline{A}(X)|}{|U|} = \frac{|\{3,4\}|}{|\{1,2,3,4,5\}|} = 0.4.$$

Similarly,

$$B \Rightarrow_k C: \quad k = \frac{\sum_{X \in U/C} |\underline{B}(X)|}{|U|} = \frac{|\{1\}|}{|\{1,2,3,4,5\}|} = 0.2,$$

$$C \Rightarrow_k D: \quad k = \frac{\sum_{X \in U/D} |\underline{C}(X)|}{|U|} = \frac{|\{5\}|}{|\{1,2,3,4,5\}|} = 0.2.$$

In the same way we obtain the remaining degrees; the degrees of dependency of all attributes in Table 1 are summarized in Table 2.

Table 2. The degree of dependency of attributes of Table 1.

Attribute (depends on) | Degree of dependency | MDA
A | B: 0.2, C: 0.2, D: 1 | 1, 0.2
B | A: 0.4, C: 0.2, D: 1 | 1, 0.4
C | A: 0.4, B: 0.2, D: 0.6 | 0.6
D | A: 0.4, B: 0.2, C: 0.2 | 0.4

With the MDA technique, if the highest dependency degree of an attribute ties with that of other attributes, we look at the next highest degree inside the tied attributes, and so on, until the tie is broken. From Table 2, the first maximum degree of dependency, i.e. 1, occurs in attributes A and B. The second maximum degree in attribute B is 0.4, while in attribute A it is 0.2. Thus, attribute B is selected as the clustering attribute.
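As a check, the sketch from Section 3 reproduces the Example 3 numbers when fed Table 1 (an illustrative run under our assumed layout, not from the paper):

```python
table1 = {
    1: {"A": "Low",  "B": "Bad",  "C": "Loss",   "D": "Small"},
    2: {"A": "Low",  "B": "Good", "C": "Loss",   "D": "Large"},
    3: {"A": "High", "B": "Good", "C": "Loss",   "D": "Medium"},
    4: {"A": "High", "B": "Good", "C": "Loss",   "D": "Medium"},
    5: {"A": "Low",  "B": "Good", "C": "Profit", "D": "Large"},
}
print(dependency_degree(table1, ["A"], ["B"]))  # 0.4, the A => B degree above
print(dependency_degree(table1, ["D"], ["B"]))  # 1.0, row B / column D of Table 2
print(mda_select(table1))                       # 'B', as selected above
```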

4. Performance comparison

We now present and analyze the MDA technique and its algorithm for selecting the clustering attribute. The performance of the proposed technique is compared with the total roughness (TR) and min–min roughness (MMR) techniques in terms of accuracy of roughness and computational complexity.

4.1. The TR and MMR techniques

The definition of an information system is as stated in Section 2. Suppose that attribute $a_i \in A$ has $k$ different values, say $b_k$, $k = 1, 2, \ldots, n$, and let $X(a_i = b_k)$ be the subset of objects having value $b_k$ of attribute $a_i$. In the TR technique, the roughness of the set $X(a_i = b_k)$ with respect to $a_j$, where $i \neq j$, denoted $R_{a_j}(X \mid a_i = b_k)$, is defined as in [11] by

$$R_{a_j}(X \mid a_i = b_k) = \frac{|\underline{X_{a_j}}(a_i = b_k)|}{|\overline{X_{a_j}}(a_i = b_k)|}, \quad k = 1, 2, \ldots, n. \tag{8}$$

In the TR technique, the mean roughness of attribute $a_i \in A$ with respect to attribute $a_j \in A$, where $i \neq j$, denoted $\mathrm{Rough}_{a_j}(a_i)$, is evaluated as

$$\mathrm{Rough}_{a_j}(a_i) = \frac{\sum_{k=1}^{|V(a_i)|} R_{a_j}(X \mid a_i = b_k)}{|V(a_i)|}, \tag{9}$$

where $V(a_i)$ is the set of values of attribute $a_i \in A$. The total roughness of attribute $a_i \in A$, denoted $TR(a_i)$, is obtained by

$$TR(a_i) = \frac{\sum_{j=1}^{|A|} \mathrm{Rough}_{a_j}(a_i)}{|A| - 1}. \tag{10}$$

As stated in Mazlack et al. [11], the higher the value of TR, the better the selection of the clustering attribute. Meanwhile, the roughness value of the MMR technique is the opposite of that of the TR technique, which is equivalent to what has been proposed in [13], i.e.

$$MMR_{a_j}(X \mid a_i = b_k) = 1 - R_{a_j}(X \mid a_i = b_k) = 1 - \frac{|\underline{X_{a_j}}(a_i = b_k)|}{|\overline{X_{a_j}}(a_i = b_k)|}. \tag{11}$$

It is clear that the MMR technique uses the MZ metric to measure the roughness of the set $X(a_i = b_k)$, $k = 1, 2, \ldots, n$, with respect to $a_j$, where $i \neq j$. Thus, the mean roughness of the MMR technique is also the opposite of that of the TR technique, i.e.


$$\mathrm{MMRough}_{a_j}(a_i) = \frac{\sum_{k=1}^{|V(a_i)|} MMR_{a_j}(X \mid a_i = b_k)}{|V(a_i)|} = \frac{\sum_{k=1}^{|V(a_i)|} \left(1 - R_{a_j}(X \mid a_i = b_k)\right)}{|V(a_i)|} = \frac{|V(a_i)| - \sum_{k=1}^{|V(a_i)|} R_{a_j}(X \mid a_i = b_k)}{|V(a_i)|} = 1 - \mathrm{Rough}_{a_j}(a_i), \quad i \neq j. \tag{12}$$

The MMR technique is based on the minimum value of the mean roughness in (12), without calculating the total roughness. According to Parmar et al. [10], the lowest mean roughness gives the best selection of the clustering attribute.

In an information system $S = (U, A, V, f)$, an attribute $a_i \in A$ may have several different values, say $b_k$, $k = 1, 2, \ldots, n$. Thus, from Eqs. (6) and (8), we can generalize the roughness of the sets as follows:

$$R_{a_j}(X) = \frac{|\underline{X_{a_j}}(a_i = b_1)| + |\underline{X_{a_j}}(a_i = b_2)| + \cdots + |\underline{X_{a_j}}(a_i = b_n)|}{|U|} = \frac{\sum_{X(a_i = b_k) \in U/a_i} |\underline{X_{a_j}}(a_i = b_k)|}{|U|}, \quad k = 1, 2, \ldots, n. \tag{13}$$

The formula in Eq. (13) is the degree of dependency of attribute $a_j$ on attribute $a_i$, where $i \neq j$; the MDA technique is based on Eq. (13).

4.2. The accuracy

As described in Section 4.1, TR, MMR and MDA use different criteria for selecting the clustering attribute: TR uses the total average of the mean roughness, MMR uses the minimum of the mean roughness, and MDA uses the maximum degree of dependency. To measure the accuracy of selecting the clustering attribute, we use the mean roughness formula in Eq. (9) to represent all techniques: the higher the mean roughness, the higher the accuracy of the selected clustering attribute.

4.3. The computational complexity

Suppose that in an information system there are $m$ objects and $n$ attributes, and let $l$ be the maximum number of distinct values of any attribute. TR, MMR and MDA all need $nm$ computations to determine the elementary sets of all attributes. For TR and MMR, computing the roughness of all subsets of $U$ having different values of attribute $a_i$, and the mean roughness of attribute $a_i$ with respect to attribute $a_j$ ($i \neq j$), takes $n^2 l$ computations. Computing TR for all attributes takes a further $n$ computations, while computing MMR takes $2n$. Thus, the computational complexities of TR and MMR are $O(n^2 l + nm + n)$ and $O(n^2 l + nm + 2n)$, respectively. Meanwhile, the MDA technique needs $n(n-1)$ computations to determine the dependency degrees of the attributes, so its computational complexity is polynomial, $O(n(n-1) + nm)$.

Proposition 7. The computational complexity of MDA is lower than that of TR and MMR.

Proof. Let $S = (U, A, V, f)$ be an information system. Based on the computational complexities above, TR and MMR have the same complexity as MDA when $l = 1$; otherwise, MDA has lower computational complexity than TR and MMR. Thus, MDA has lower computational complexity compared with TR and MMR. $\square$
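For comparison with the MDA sketch, Eqs. (8)–(10) and (12) can be sketched in the same style; again, the helper names and data layout are assumptions of ours:

```python
def mean_roughness(table, ai, aj):
    """Rough_{aj}(ai) per Eqs. (8)-(9): average accuracy of the
    value-subsets of ai approximated by aj."""
    values = {row[ai] for row in table.values()}
    total = 0.0
    for b in values:
        X = {obj for obj, row in table.items() if row[ai] == b}
        lower, upper = lower_upper(table, [aj], X)
        total += len(lower) / len(upper)           # Eq. (8)
    return total / len(values)                     # Eq. (9)

def total_roughness(table, ai):
    """TR(ai) per Eq. (10): mean roughness averaged over all other attributes."""
    others = [a for a in next(iter(table.values())) if a != ai]
    return sum(mean_roughness(table, ai, aj) for aj in others) / len(others)

def mmr(table, ai):
    """Minimum mean roughness of ai per Eq. (12): min over aj of 1 - Rough."""
    others = [a for a in next(iter(table.values())) if a != ai]
    return min(1 - mean_roughness(table, ai, aj) for aj in others)
```

Each call to `mean_roughness` walks every value of $a_i$ against another attribute $a_j$, which is the source of the $n^2 l$ term above, whereas MDA needs only one dependency-degree computation per ordered attribute pair, the $n(n-1)$ term.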

Next, we present comparison tests between the BC, TR, MMR and MDA techniques.

5. Comparison tests

Four test cases are considered to compare and evaluate the accuracy and the complexity of each technique: the credit card promotion dataset from Roiger and Geatz [18], the student's enrollment qualifications dataset, the animal data set from Hu [19], and the dataset from Parmar et al. [10].

5.1. Case 1: The credit card promotion dataset in [18]

Table 3 shows the credit card promotion dataset as in [18]. There are five categorical attributes ($n = 5$): magazine promotion (MP), watch promotion (WP), life insurance promotion (LIP), credit card insurance (CCI) and sex (S). All attributes have two distinct values ($l = 2$), i.e. yes and no (male and female for the attribute Sex), and ten objects ($m = 10$) are considered.

Table 3. A subset of the credit card promotion dataset from the Acme Credit Card Company database [18].

Person | Magazine promotion | Watch promotion | Life insurance promotion | Credit card insurance | Sex
1 | Yes | No | No | No | Male
2 | Yes | Yes | Yes | No | Female
3 | No | No | No | No | Male
4 | Yes | Yes | Yes | Yes | Male
5 | Yes | No | Yes | No | Female
6 | No | No | No | No | Female
7 | Yes | No | Yes | Yes | Male
8 | No | Yes | No | No | Male
9 | Yes | No | No | No | Male
10 | Yes | Yes | Yes | No | Female

Notice that with the BC technique, the attribute with the least distinct balanced values is selected as the clustering attribute, without considering the maximum total roughness of each attribute. Thus, attribute LIP is chosen as the clustering attribute. In the next step, we present the procedure to find the TR and MMR values.

To obtain the values of TR, MMR and MDA, we first obtain the equivalence classes induced by the indiscernibility relation of each singleton attribute. From Table 3:

(a) $X(\mathrm{MP} = \mathrm{yes}) = \{1,2,4,5,7,9,10\}$, $X(\mathrm{MP} = \mathrm{no}) = \{3,6,8\}$, so $U/\mathrm{MP} = \{\{1,2,4,5,7,9,10\}, \{3,6,8\}\}$.
(b) $X(\mathrm{WP} = \mathrm{yes}) = \{2,4,8,10\}$, $X(\mathrm{WP} = \mathrm{no}) = \{1,3,5,6,7,9\}$, so $U/\mathrm{WP} = \{\{2,4,8,10\}, \{1,3,5,6,7,9\}\}$.
(c) $X(\mathrm{LIP} = \mathrm{yes}) = \{2,4,5,7,10\}$, $X(\mathrm{LIP} = \mathrm{no}) = \{1,3,6,8,9\}$, so $U/\mathrm{LIP} = \{\{2,4,5,7,10\}, \{1,3,6,8,9\}\}$.
(d) $X(\mathrm{CCI} = \mathrm{yes}) = \{4,7\}$, $X(\mathrm{CCI} = \mathrm{no}) = \{1,2,3,5,6,8,9,10\}$, so $U/\mathrm{CCI} = \{\{4,7\}, \{1,2,3,5,6,8,9,10\}\}$.
(e) $X(\mathrm{S} = \mathrm{male}) = \{1,3,4,7,8,9\}$, $X(\mathrm{S} = \mathrm{female}) = \{2,5,6,10\}$, so $U/\mathrm{S} = \{\{1,3,4,7,8,9\}, \{2,5,6,10\}\}$.

Secondly, we obtain the lower and upper approximations of the subsets of $U$ having different values of attribute LIP with respect to attributes MP, WP, CCI and S, using the formula in Eq. (1):

(a) LIP with respect to MP:
$\underline{X}(\mathrm{LIP} = \mathrm{yes}) = \emptyset$, $\overline{X}(\mathrm{LIP} = \mathrm{yes}) = \{1,2,4,5,7,9,10\}$,
$\underline{X}(\mathrm{LIP} = \mathrm{no}) = \{3,6,8\}$, $\overline{X}(\mathrm{LIP} = \mathrm{no}) = \{1,2,3,4,5,6,7,8,9,10\}$.


(b) LIP with respect to WP:
$\underline{X}(\mathrm{LIP} = \mathrm{yes}) = \emptyset$, $\overline{X}(\mathrm{LIP} = \mathrm{yes}) = \{1,2,3,4,5,6,7,8,9,10\}$,
$\underline{X}(\mathrm{LIP} = \mathrm{no}) = \emptyset$, $\overline{X}(\mathrm{LIP} = \mathrm{no}) = \{1,2,3,4,5,6,7,8,9,10\}$.

(c) LIP with respect to CCI:
$\underline{X}(\mathrm{LIP} = \mathrm{yes}) = \{4,7\}$, $\overline{X}(\mathrm{LIP} = \mathrm{yes}) = \{1,2,3,4,5,6,7,8,9,10\}$,
$\underline{X}(\mathrm{LIP} = \mathrm{no}) = \emptyset$, $\overline{X}(\mathrm{LIP} = \mathrm{no}) = \{1,2,3,5,6,8,9,10\}$.

(d) LIP with respect to S:
$\underline{X}(\mathrm{LIP} = \mathrm{yes}) = \emptyset$, $\overline{X}(\mathrm{LIP} = \mathrm{yes}) = \{1,2,3,4,5,6,7,8,9,10\}$,
$\underline{X}(\mathrm{LIP} = \mathrm{no}) = \emptyset$, $\overline{X}(\mathrm{LIP} = \mathrm{no}) = \{1,2,3,4,5,6,7,8,9,10\}$.

Thirdly, we obtain the roughness values using the formula in Eq. (8). The roughness of the subsets of $U$ having different values of attribute LIP with respect to attributes MP, WP, CCI and S is:

(a) LIP with respect to MP: $R_{\mathrm{MP}}(X \mid \mathrm{LIP} = \mathrm{yes}) = 0$, $R_{\mathrm{MP}}(X \mid \mathrm{LIP} = \mathrm{no}) = 0.3000$.
(b) LIP with respect to WP: $R_{\mathrm{WP}}(X \mid \mathrm{LIP} = \mathrm{yes}) = 0$, $R_{\mathrm{WP}}(X \mid \mathrm{LIP} = \mathrm{no}) = 0$.
(c) LIP with respect to CCI: $R_{\mathrm{CCI}}(X \mid \mathrm{LIP} = \mathrm{yes}) = 0.2000$, $R_{\mathrm{CCI}}(X \mid \mathrm{LIP} = \mathrm{no}) = 0$.
(d) LIP with respect to S: $R_{\mathrm{S}}(X \mid \mathrm{LIP} = \mathrm{yes}) = 0$, $R_{\mathrm{S}}(X \mid \mathrm{LIP} = \mathrm{no}) = 0$.

Fourthly, we obtain the mean roughness values using the formula in Eq. (9). The mean roughness of attribute LIP with respect to attributes MP, WP, CCI and S is:

(a) LIP with respect to MP: $\mathrm{Rough}_{\mathrm{MP}}(\mathrm{LIP}) = 0.1500$.
(b) LIP with respect to WP: $\mathrm{Rough}_{\mathrm{WP}}(\mathrm{LIP}) = 0$.
(c) LIP with respect to CCI: $\mathrm{Rough}_{\mathrm{CCI}}(\mathrm{LIP}) = 0.1000$.
(d) LIP with respect to S: $\mathrm{Rough}_{\mathrm{S}}(\mathrm{LIP}) = 0$.

Lastly, we obtain the total roughness of attribute LIP with respect to the set of all other attributes {MP, WP, CCI, S} using the formula in Eq. (10), i.e.

$$TR(\mathrm{LIP}) = \frac{0.15 + 0 + 0.1 + 0}{4} = 0.0625.$$

The total roughness of all attributes in Table 3 is summarized in Table 4.

Table 4. The total roughness of all attributes in Table 3 using the TR technique.

Attribute (with respect to) | TR mean roughness | TR
MP | WP: 0, LIP: 0.25, CCI: 0.1, S: 0 | 0.0875
WP | MP: 0, LIP: 0, CCI: 0, S: 0 | 0
LIP | MP: 0.15, WP: 0, CCI: 0.1, S: 0 | 0.0625
CCI | MP: 0.15, WP: 0, LIP: 0.25, S: 0.2 | 0.15
S | MP: 0, WP: 0, LIP: 0, CCI: 0.1 | 0.025

From Table 4, the TR value of LIP, i.e. 0.0625, is lower than that of MP, i.e. 0.0875, and of CCI, i.e. 0.15. Thus, the decision to select LIP as the clustering attribute is not appropriate, because the total roughness of attribute LIP is lower than that of attribute CCI.

With the MMR technique, all attributes are treated equally. The procedures for obtaining the lower and upper approximations of the subsets of $U$ having different values of a singleton attribute, and for obtaining the roughness of those subsets, are the same as in the TR technique. The mean roughness of the MMR technique uses the formula in Eq. (12), which differs from that of the TR technique. As an example, we obtain the MMR of attribute LIP with respect to attributes MP, WP, CCI and S:

(a) LIP with respect to MP: $\mathrm{MMRough}_{\mathrm{MP}}(\mathrm{LIP}) = 1 - \mathrm{Rough}_{\mathrm{MP}}(\mathrm{LIP}) = 1 - 0.15 = 0.85$.
(b) LIP with respect to WP: $\mathrm{MMRough}_{\mathrm{WP}}(\mathrm{LIP}) = 1 - \mathrm{Rough}_{\mathrm{WP}}(\mathrm{LIP}) = 1 - 0 = 1$.
(c) LIP with respect to CCI: $\mathrm{MMRough}_{\mathrm{CCI}}(\mathrm{LIP}) = 1 - \mathrm{Rough}_{\mathrm{CCI}}(\mathrm{LIP}) = 1 - 0.1 = 0.9$.
(d) LIP with respect to S: $\mathrm{MMRough}_{\mathrm{S}}(\mathrm{LIP}) = 1 - \mathrm{Rough}_{\mathrm{S}}(\mathrm{LIP}) = 1 - 0 = 1$.

The MMR values of all attributes in Table 3 are summarized in Table 5. From Table 5, the mean roughness of MP and CCI has the same minimum value, i.e. 0.75, so we must look at the next lowest value, and so on, until a difference is obtained. In this case, CCI has the lower second value, i.e. 0.8, compared to MP, i.e. 0.9. Thus, attribute CCI is selected as the clustering attribute.

With the MDA technique, all attributes are likewise evaluated. Unlike TR and MMR, MDA only needs the equivalence classes of the clusterings induced by the indiscernibility relations of the singleton attributes, as stated in step 1 in Section 3. Using the formula in Eq. (13), the MDA values of all attributes in Table 3 are summarized in Table 6.

Table 5. The minimum–minimum roughness of all attributes in Table 3 using the MMR technique.

Attribute (with respect to) | MMR mean roughness | MMR
MP | WP: 1, LIP: 0.75, CCI: 0.9, S: 1 | 0.75, 0.9
WP | MP: 1, LIP: 1, CCI: 1, S: 1 | 1
LIP | MP: 0.85, WP: 1, CCI: 0.9, S: 1 | 0.85
CCI | MP: 0.85, WP: 1, LIP: 0.75, S: 0.8 | 0.75, 0.8
S | MP: 1, WP: 1, LIP: 1, CCI: 0.9 | 0.9


Table 6. The degree of dependency of all attributes in Table 3 using the MDA technique.

Attribute (depends on) | Degree of dependency | MDA
MP | WP: 0, LIP: 0.5, CCI: 0.2, S: 0 | 0.5, 0.2
WP | MP: 0, LIP: 0, CCI: 0, S: 0 | 0
LIP | MP: 0.3, WP: 0, CCI: 0.2, S: 0 | 0.3
CCI | MP: 0.3, WP: 0, LIP: 0.5, S: 0.4 | 0.5, 0.4
S | MP: 0, WP: 0, LIP: 0, CCI: 0.2 | 0.2
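The Table 6 entries can be reproduced with the hypothetical helpers from Section 3; for instance, row CCI (an illustrative run, not the authors' code):

```python
table3 = {
    1: {"MP": "Y", "WP": "N", "LIP": "N", "CCI": "N", "S": "M"},
    2: {"MP": "Y", "WP": "Y", "LIP": "Y", "CCI": "N", "S": "F"},
    3: {"MP": "N", "WP": "N", "LIP": "N", "CCI": "N", "S": "M"},
    4: {"MP": "Y", "WP": "Y", "LIP": "Y", "CCI": "Y", "S": "M"},
    5: {"MP": "Y", "WP": "N", "LIP": "Y", "CCI": "N", "S": "F"},
    6: {"MP": "N", "WP": "N", "LIP": "N", "CCI": "N", "S": "F"},
    7: {"MP": "Y", "WP": "N", "LIP": "Y", "CCI": "Y", "S": "M"},
    8: {"MP": "N", "WP": "Y", "LIP": "N", "CCI": "N", "S": "M"},
    9: {"MP": "Y", "WP": "N", "LIP": "N", "CCI": "N", "S": "M"},
    10: {"MP": "Y", "WP": "Y", "LIP": "Y", "CCI": "N", "S": "F"},
}
# Row CCI of Table 6: degree to which CCI depends on each other attribute.
for a in ["MP", "WP", "LIP", "S"]:
    print(a, dependency_degree(table3, [a], ["CCI"]))  # 0.3, 0.0, 0.5, 0.4
print(mda_select(table3))                              # 'CCI'
```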

From Table 6, attributes MP and CCI have the same maximum degree of dependency, i.e. 0.5. Based on the MDA algorithm, the next highest degrees are considered until the tie is broken. In this case, the second degree of attribute CCI, i.e. 0.4, is higher than that of MP, i.e. 0.2. Therefore, attribute CCI is selected as the clustering attribute.

From Figs. 2 and 3, the accuracy of selecting the clustering attribute using the TR, MMR and MDA techniques is the same, i.e. 0.25, while the accuracy of the BC technique, i.e. 0.15, is lower. Nevertheless, the computational complexity of BC is the best of the four, because the BC technique selects the clustering attribute based only on the least distinct balanced-valued attribute, with no iteration needed. However, the MDA technique has better computational complexity than the TR and MMR techniques.

Fig. 2. The accuracy of BC, TR, MMR and MDA techniques for Case 1.

Fig. 3. The computational complexity of BC, TR, MMR and MDA techniques for Case 1.

Table 7. An information system of student's enrollment qualifications.

Student | Degree | English | Experience | IT | Maths | Programming | Statistics
1 | Ph.D | Good | Medium | Good | Good | Good | Good
2 | Ph.D | Medium | Medium | Good | Good | Good | Good
3 | M.Sc | Medium | Medium | Medium | Good | Good | Good
4 | M.Sc | Medium | Medium | Medium | Good | Good | Medium
5 | M.Sc | Medium | Medium | Medium | Medium | Medium | Medium
6 | M.Sc | Medium | Medium | Medium | Medium | Medium | Medium
7 | B.Sc | Medium | Good | Good | Medium | Medium | Medium
8 | B.Sc | Bad | Good | Good | Medium | Medium | Good

Table 8. The total roughness of all attributes of Table 7 using the TR technique.

Attribute (w.r.t.) | TR mean roughness | TR
Degree | English: 0.005, Experience: 0.333, IT: 0.333, Mathematics: 0, Programming: 0, Statistics: 0 | 0.112
English | Degree: 0.167, Experience: 0, IT: 0.167, Mathematics: 0, Programming: 0, Statistics: 0.167 | 0.083
Experience | Degree: 1, English: 0.5, IT: 0.25, Mathematics: 0.25, Programming: 0.25, Statistics: 0 | 0.375
IT | Degree: 1, English: 0.125, Experience: 0.125, Mathematics: 0, Programming: 0, Statistics: 0 | 0.208
Mathematics | Degree: 0.333, English: 0.143, Experience: 0.125, IT: 0, Programming: 1, Statistics: 0 | 0.267
Programming | Degree: 0.333, English: 0.143, Experience: 0.125, IT: 0, Mathematics: 1, Statistics: 0 | 0.267
Statistics | Degree: 0.125, English: 0.125, Experience: 0, IT: 0, Mathematics: 0, Programming: 0 | 0.042


5.2. Case 2: The student's enrollment qualifications

Table 7 shows an information system of student's enrollment qualifications. There are eight objects ($m = 8$) with seven categorical attributes ($n = 7$): Degree, English, Experience, IT, Mathematics, Programming and Statistics. The attributes Degree and English have three values each, and the other attributes have two values. Note that attributes Mathematics and Programming in Table 7 have the same value for each object (student); thus, clustering the objects using either attribute Mathematics or attribute Programming gives the same result.

Based on the BC technique, four attributes can be considered as clustering attributes, i.e. Experience, IT, Mathematics and Statistics. The attribute with the least distinct balanced values is selected as the clustering attribute; thus, the candidate attributes are IT, Mathematics and Statistics. The process to evaluate TR, MMR and MDA is the same as in Case 1; the results are summarized in Tables 8–10, respectively.

With the TR technique, since there are three candidates for the clustering attribute, we must compare their total roughness values. From Table 8, attribute Mathematics is considered as the clustering attribute because it has the highest total roughness, i.e. 0.267, compared to IT, i.e. 0.208, and Statistics, i.e. 0.042. However, the total roughness of attribute Mathematics, i.e. 0.267, is lower than that of attribute Experience, i.e. 0.375. Thus, the decision to select Mathematics as the clustering attribute is not appropriate, because the accuracy of Mathematics is lower than that of Experience.

Table 9. The minimum–minimum roughness of all attributes in Table 7 using the MMR technique.

Attribute (w.r.t.) | MMR mean roughness | MMR
Degree | English: 0.905, Experience: 0.667, IT: 0.667, Mathematics: 1, Programming: 1, Statistics: 1 | 0.667
English | Degree: 0.833, Experience: 1, IT: 0.833, Mathematics: 1, Programming: 1, Statistics: 0.833 | 0.833
Experience | Degree: 0, English: 0.500, IT: 0.750, Mathematics: 0.750, Programming: 0.750, Statistics: 1 | 0, 0.500
IT | Degree: 0, English: 0.875, Experience: 0.875, Mathematics: 1, Programming: 1, Statistics: 1 | 0, 0.875
Mathematics | Degree: 0.667, English: 0.857, Experience: 0.875, IT: 1, Programming: 0, Statistics: 1 | 0.667
Programming | Degree: 0.667, English: 0.857, Experience: 0.875, IT: 1, Mathematics: 0, Statistics: 1 | 0.667
Statistics | Degree: 0.875, English: 0.875, Experience: 1, IT: 1, Mathematics: 1, Programming: 1 | 0.875

Table 10. The degree of dependency of all attributes in Table 7 using the MDA technique.

Attribute (depends on) | Degree of dependency | MDA
Degree | English: 0.25, Experience: 0.25, IT: 0.5, Mathematics: 0, Programming: 0, Statistics: 0 | 0.5
English | Degree: 0.5, Experience: 0, IT: 0.5, Mathematics: 0, Programming: 0, Statistics: 0.5 | 0.5
Experience | Degree: 1, English: 0.25, IT: 0.5, Mathematics: 0.5, Programming: 0.5, Statistics: 0 | 1, 0.5, 0.5
IT | Degree: 1, English: 0.25, Experience: 0.25, Mathematics: 0, Programming: 0, Statistics: 0 | 1, 0.25, 0.25
Mathematics | Degree: 0.5, English: 0.25, Experience: 0.25, IT: 0, Programming: 1, Statistics: 0 | 1, 0.5, 0.25
Programming | Degree: 0.5, English: 0.25, Experience: 0.25, IT: 0, Mathematics: 1, Statistics: 0 | 1, 0.5, 0.25
Statistics | Degree: 0.25, English: 0.25, Experience: 0, IT: 0, Mathematics: 0, Programming: 0 | 0.25

228

T. Herawan et al. / Knowledge-Based Systems 23 (2010) 220–231 0.6

250

200

0.4

Iteration

Mean of Roughness

0.5

0.3

150

100

0.2 50

0.1 Accuracy

Computational complexity 0

0 BC

TR

MMR

BC

MDA

TR

MMR

MDA

Fig. 5. The computational complexity of BC, TR, MMR and MDA techniques for Case 2.

Fig. 4. The accuracy of BC, TR, MMR and MDA techniques for Case 2.

On the other hand, with the MMR technique all attributes (bi-valued and multi-valued) are evaluated. The procedure for obtaining the minimum–minimum roughness is the same as in Case 1; Table 9 shows the MMR of all attributes of Table 7. From Table 9, attribute Experience is selected as the clustering attribute, because the second mean roughness of attribute Experience, i.e. 0.500, is less than that of attribute IT, i.e. 0.875.

Meanwhile, with the MDA technique, the first maximum degree of dependency, i.e. 1, occurs in attributes Experience, IT, Mathematics and Programming, as shown in Table 10. The second maximum degree, i.e. 0.5, occurs in attributes Experience, Mathematics and Programming, and the third maximum degree, i.e. 0.5, occurs only in attribute Experience. Thus, from Table 10, attribute Experience is selected as the clustering attribute.

Figs. 4 and 5 illustrate the accuracy of selecting the clustering attribute and the computational complexity for Case 2, respectively. The accuracy of selecting the clustering attribute using the TR, MMR and MDA techniques is the same, i.e. 0.5, which is higher than that of the BC technique, i.e. 0.33. Meanwhile, the MDA technique has lower computational complexity, due to fewer iterations required, compared to the BC, TR and MMR techniques.
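The multi-level tie-breaking used here is simply a lexicographic comparison of each attribute's dependency degrees sorted in descending order; a tiny illustration with the vectors transcribed from Table 10:

```python
# Sorted dependency-degree vectors of the tied candidates (descending):
candidates = {
    "Experience":  (1, 0.5, 0.5, 0.5, 0.25, 0),
    "IT":          (1, 0.25, 0.25, 0, 0, 0),
    "Mathematics": (1, 0.5, 0.25, 0.25, 0, 0),
    "Programming": (1, 0.5, 0.25, 0.25, 0, 0),
}
# Tuple comparison is lexicographic, so the tie at 1 is broken by the
# following entries; Experience wins at the third position.
print(max(candidates, key=candidates.get))  # Experience
```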

Table 11. Animal world data set from [19].

Animal | Hair | Teeth | Eye | Feather | Feet | Eat | Milk | Fly | Swim
Tiger | Y | Pointed | Forward | N | Claw | Meat | Y | N | Y
Cheetah | Y | Pointed | Forward | N | Claw | Meat | Y | N | Y
Giraffe | Y | Blunt | Side | N | Hoof | Grass | Y | N | N
Zebra | Y | Blunt | Side | N | Hoof | Grass | Y | N | N
Ostrich | N | N | Side | Y | Claw | Grain | N | N | N
Penguin | N | N | Side | Y | Web | Fish | N | N | Y
Albatross | N | N | Side | Y | Claw | Grain | N | Y | Y
Eagle | N | N | Forward | Y | Claw | Meat | N | Y | N
Viper | N | Pointed | Forward | N | N | Meat | N | N | N

Table 12. The total roughness of all attributes in Table 11 using the TR technique.

Attribute (w.r.t.) | TR mean roughness | TR
Hair | Teeth: 0.485, Eye: 0, Feather: 0.222, Feet: 0.286, Eat: 0.380, Milk: 1, Fly: 0.111, Swim: 0 | 0.310
Teeth | Hair: 0, Eye: 0, Feather: 0.333, Feet: 0.472, Eat: 0.476, Milk: 0, Fly: 0.074, Swim: 0 | 0.169
Eye | Hair: 0, Teeth: 0.380, Feather: 0, Feet: 0.271, Eat: 1, Milk: 0, Fly: 0, Swim: 0 | 0.206
Feather | Hair: 0.222, Teeth: 1, Eye: 0.222, Feet: 0.271, Eat: 0.380, Milk: 0.222, Fly: 0.111, Swim: 0 | 0.303
Feet | Hair: 0, Teeth: 0, Eye: 0, Feather: 0, Eat: 0.5, Milk: 0, Fly: 0, Swim: 0 | 0.0625
Eat | Hair: 0, Teeth: 0.357, Eye: 0, Feather: 0.5, Feet: 0, Milk: 0, Fly: 0, Swim: 0 | 0.138
Milk | Hair: 1, Teeth: 0.485, Eye: 0, Feather: 0.222, Feet: 0.286, Eat: 0.380, Fly: 0.111, Swim: 0 | 0.310
Fly | Hair: 0.222, Teeth: 0.277, Eye: 0, Feather: 0.277, Feet: 0.222, Eat: 0.166, Milk: 0.222, Swim: 0 | 0.173
Swim | Hair: 0, Teeth: 0.111, Eye: 0, Feather: 0, Feet: 0.271, Eat: 0.196, Milk: 0, Fly: 0 | 0.072


5.3. Case 3: The animal data set in Hu [19]

Table 11 shows an animal world data set as in [19]. There are nine animals ($m = 9$) with nine categorical attributes ($n = 9$): Hair, Teeth, Eye, Feather, Feet, Eat, Milk, Fly and Swim. The attributes Hair, Eye, Feather, Milk, Fly and Swim have two values, attribute Teeth has three values, and the other attributes have four values. Since the number of animals is 9 (odd), a balanced clustering cannot be made. With BC, we therefore select among the clustering attributes with the least distinct values: Hair, Milk, Feather, Fly and Swim. We use the same procedure as in Case 2 to select the clustering attribute among the candidates. The process to evaluate TR, MMR and MDA is the same as in Case 1; the values of TR, MMR and MDA for all attributes of Table 11 are summarized in Tables 12, 13 and 14, respectively.

Since there are four candidates for the clustering attribute, we must compare their total roughness values. From Table 12, attribute Hair is considered as the clustering attribute because it has the highest total roughness, i.e. 0.310, compared to Feather, i.e. 0.303, Fly, i.e. 0.173, and Swim, i.e. 0.072. Since the total roughness of attribute Hair, i.e. 0.310, is the highest compared to the other attributes, the decision to select Hair as the clustering attribute is the best choice. From Table 13, MMR also selects attribute Hair as the clustering attribute, because the second mean roughness of attribute Hair, i.e. 0.515, is lower than that of the other attributes. Meanwhile, with the MDA technique, the first maximum degree of dependency, i.e. 1, occurs in attributes Hair (Milk), Eye and Feather, as Table 14 shows, and the second maximum degree, i.e. 0.666, occurs in attribute Hair. Thus, from Table 14, attribute Hair is selected as the clustering attribute.

MMR mean roughness

MMR

Hair

Teeth 0.515

Eye 1

Feather 0.778

Feet 0.714

Eat 0.620

Milk 0

Fly 0.889

Swim 1

0 0.515

Teeth

Hair 1

Eye 1

Feather 0.667

Feet 0.528

Eat 0.524

Milk 1

Fly 0.926

Swim 1

0.524

Eye

Hair 1

Teeth 0.620

Feather 1

Feet 0.729

Eat 0

Milk 1

Fly 1

Swim 1

0 0.620

Feather

Hair 0.778

Teeth 0

Eye 0.778

Feet 0.729

Eat 0.620

Milk 0.778

Fly 0.889

Swim 1

0 0.620

Feet

Hair 1

Teeth 1

Eye 1

Feather 1

Eat 0.5

Milk 1

Fly 1

Swim 1

0.5

Eat

Hair 1

Teeth 0.643

Eye 1

Feather 0.5

Feet 1

Milk 1

Fly 1

Swim 1

0.5

Milk

Hair 0

Teeth 0.515

Eye 1

Feather 0.778

Feet 0.714

Eat 0.620

Fly 0.889

Swim 1

0 0.515

Fly

Hair 0.778

Teeth 0.723

Eye 1

Feather 0.723

Feet 0.778

Eat 0.834

Milk 0.778

Swim 1

0.723

Swim

Hair 1

Teeth 0.889

Eye 1

Feather 1

Feet 0.729

Eat 0.804

Milk 1

Fly 1

0.729

Table 14. The degree of dependency of all attributes in Table 11 using the MDA technique.

Attribute (depends on) | Degree of dependency | MDA
Hair | Teeth: 0.666, Eye: 0, Feather: 0.444, Feet: 0.444, Eat: 0.555, Milk: 1, Fly: 0.222, Swim: 0 | 1, 0.666
Teeth | Hair: 0, Eye: 0, Feather: 0.444, Feet: 0.444, Eat: 0.555, Milk: 0, Fly: 0.222, Swim: 0 | 0.555
Eye | Hair: 0, Teeth: 0.555, Feather: 0, Feet: 0.444, Eat: 1, Milk: 0, Fly: 0, Swim: 0 | 1, 0.555
Feather | Hair: 0.444, Teeth: 1, Eye: 0, Feet: 0.444, Eat: 0.555, Milk: 0.444, Fly: 0.222, Swim: 0 | 1, 0.555
Feet | Hair: 0, Teeth: 0.222, Eye: 0, Feather: 0, Eat: 0.333, Milk: 0, Fly: 0, Swim: 0 | 0.333
Eat | Hair: 0, Teeth: 0.555, Eye: 0.444, Feather: 0, Feet: 0.333, Milk: 0, Fly: 0, Swim: 0 | 0.555
Milk | Hair: 1, Teeth: 0.666, Eye: 0, Feather: 0.444, Feet: 0.444, Eat: 0.555, Fly: 0.222, Swim: 0 | 1, 0.666
Fly | Hair: 0.444, Teeth: 0.555, Eye: 0, Feather: 0.555, Feet: 0.444, Eat: 0.333, Milk: 0.444, Swim: 0 | 0.555
Swim | Hair: 0, Teeth: 0.222, Eye: 0, Feather: 0, Feet: 0.444, Eat: 0.333, Milk: 0, Fly: 0 | 0.444
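Figs. 3, 5, 7 and 9 report iteration counts for the four cases. Under the operation-count model of Section 4.3, such counts can be tabulated directly. The sketch below is an approximation of ours: the per-case values of $m$, $n$ and $l$ are read off the case descriptions (with $l = 4$ for Case 4 taken from Table 15), and the counting model is our reading of Section 4.3, not the authors' exact instrumentation:

```python
def iteration_counts(m, n, l):
    """Operation counts per Section 4.3: m objects, n attributes,
    l = maximum number of distinct values of any attribute."""
    return {
        "TR":  n * n * l + n * m + n,
        "MMR": n * n * l + n * m + 2 * n,
        "MDA": n * (n - 1) + n * m,
    }

cases = {"Case 1": (10, 5, 2), "Case 2": (8, 7, 3),
         "Case 3": (9, 9, 4), "Case 4": (10, 6, 4)}
for case, (m, n, l) in cases.items():
    print(case, iteration_counts(m, n, l))
```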


Fig. 6. The accuracy of BC, TR, MMR and MDA techniques for Case 3.

Table 15. An information system from Parmar et al. [10].

# | a1 | a2 | a3 | a4 | a5 | a6
1 | Big | Blue | Hard | Indefinite | Plastic | Negative
2 | Medium | Red | Moderate | Smooth | Wood | Neutral
3 | Small | Yellow | Soft | Fuzzy | Plush | Positive
4 | Medium | Blue | Moderate | Fuzzy | Plastic | Negative
5 | Small | Yellow | Soft | Indefinite | Plastic | Neutral
6 | Big | Green | Hard | Smooth | Wood | Positive
7 | Small | Yellow | Hard | Indefinite | Metal | Positive
8 | Small | Yellow | Soft | Indefinite | Plastic | Positive
9 | Big | Green | Hard | Smooth | Wood | Neutral
10 | Medium | Green | Moderate | Smooth | Plastic | Neutral

Table 16. The total roughness of all attributes in Table 15 using the TR technique.

Attribute (w.r.t.) | TR mean roughness | TR
a1 | a2: 0.389, a3: 0.477, a4: 0, a5: 0.096, a6: 0 | 0.1920
a2 | a1: 0.250, a3: 0.108, a4: 0, a5: 0.072, a6: 0.250 | 0.1357
a3 | a1: 0.477, a2: 0.156, a4: 0, a5: 0.193, a6: 0 | 0.1231
a4 | a1: 0, a2: 0.334, a3: 0, a5: 0.237, a6: 0 | 0.1139
a5 | a1: 0, a2: 0.118, a3: 0, a4: 0, a6: 0.150 | 0.0336
a6 | a1: 0, a2: 0.375, a3: 0, a4: 0, a5: 0.167 | 0.1083

Table 17. The minimum–minimum roughness of all attributes in Table 15 using the MMR technique.

Attribute (w.r.t.) | MMR mean roughness | MMR
a1 | a2: 0.611, a3: 0.523, a4: 1, a5: 0.904, a6: 1 | 0.523, 0.611
a2 | a1: 0.750, a3: 0.892, a4: 1, a5: 0.928, a6: 0.750 | 0.750
a3 | a1: 0.523, a2: 0.944, a4: 1, a5: 0.907, a6: 1 | 0.523, 0.907
a4 | a1: 1, a2: 0.666, a3: 1, a5: 0.763, a6: 1 | 0.666
a5 | a1: 1, a2: 0.882, a3: 1, a4: 1, a6: 0.950 | 0.882
a6 | a1: 1, a2: 0.625, a3: 1, a4: 1, a5: 0.933 | 0.625

Fig. 7. The computational complexity of BC, TR, MMR and MDA techniques for Case 3.

Table 18. The degree of dependency of all attributes in Table 15 using the MDA technique.

Attribute (depends on) | Degree of dependency | MDA
a1 | a2: 0.5, a3: 0.6, a4: 0, a5: 0.2, a6: 0 | 0.6, 0.5
a2 | a1: 0, a3: 0.3, a4: 0, a5: 0.2, a6: 0.2 | 0.3
a3 | a1: 0.6, a2: 0.1, a4: 0, a5: 0.2, a6: 0 | 0.6, 0.2
a4 | a1: 0, a2: 0.4, a3: 0, a5: 0.5, a6: 0 | 0.5
a5 | a1: 0, a2: 0.3, a3: 0, a4: 0, a6: 0.2 | 0.3
a6 | a1: 0, a2: 0.3, a3: 0, a4: 0, a5: 0 | 0.3

Figs. 6 and 7 illustrate the accuracy of selecting the clustering attribute and the computational complexity for Case 3, respectively. The accuracy of selecting the clustering attribute is the same for the BC, TR, MMR and MDA techniques, i.e. 0.515. Nevertheless, the MDA technique has lower computational complexity, due to fewer iterations required, compared to the BC, TR and MMR techniques.

5.4. Case 4: The dataset in Parmar et al. [10]

In Table 15 there are ten objects ($m = 10$) with six categorical attributes ($n = 6$): a1, a2, a3, a4, a5 and a6. Each attribute has more than two values ($l > 2$), so since there is no bi-valued attribute, the BC technique is not applicable for selecting the clustering attribute. Thus, we apply the TR technique to all attributes. The process to evaluate the accuracy of each technique is the same as in the previous cases; the values of TR, MMR and MDA are summarized in Tables 16–18, respectively.

The total roughness of all attributes is summarized in Table 16. From Table 16, attribute a1 has higher accuracy than $a_i$, $i = 2, 3, 4, 5, 6$; thus, attribute a1 is considered as the clustering attribute. From Table 17, two attributes have equal MMR (a1 and a3, i.e. 0.523), but the second value of attribute a1, i.e. 0.611, is lower than that of a3, i.e. 0.907. Therefore, attribute a1 is selected as the clustering attribute.

Fig. 8. The accuracy of TR, MMR and MDA techniques for Case 4.

Fig. 9. The computational complexity of TR, MMR and MDA techniques for Case 4.

Table 18 shows the degree of dependency of all attributes using the MDA technique. From Table 18, two attributes have equal maximum degree (a1 and a3, i.e. 0.6), but the second value of attribute a1, i.e. 0.5, is higher than that of a3, i.e. 0.2. Therefore, attribute a1 is selected as the clustering attribute.

Figs. 8 and 9 illustrate the accuracy of selecting the clustering attribute and the computational complexity for Case 4, respectively. In this case, the accuracy of selecting the clustering attribute with the TR, MMR and MDA techniques is the same, i.e. 0.476. However, the MDA technique has lower computational complexity, due to fewer iterations required, compared to the other techniques.

6. Conclusions

A few techniques have been designed to cluster categorical data and handle uncertainty in the data set; the most popular are bi-clustering, total roughness and min–min roughness. However, the performance of these techniques is an issue due to low accuracy and high computational complexity. Thus, there is a need for a clustering technique that can handle the uncertainty in the clustering process with high accuracy and lower computational complexity. In this paper, we proposed a new technique for selecting the clustering attribute called maximum dependency of attributes (MDA). The proposed technique is based on rough set theory, using the dependency of attributes in information systems. The MDA technique was analyzed in terms of accuracy and computational complexity on four test cases. The results show that the MDA technique provides a convenient approach achieving high accuracy with low computational complexity compared to the three existing techniques. The proposed approach could also be applied to clustering data in large databases.

References

[1] S. Wu, A. Liew, H. Yan, M. Yang, Cluster analysis of gene expression data based on self splitting and merging competitive learning, IEEE Transactions on Information Technology in Biomedicine 8 (1) (2004) 5–15.
[2] K. Wong, D. Feng, S. Meikle, M. Fulham, Segmentation of dynamic PET images using cluster analysis, IEEE Transactions on Nuclear Science 49 (1) (2002) 200–207.
[3] S. Haimov, M. Michalev, A. Savchenko, O. Yordanov, Classification of radar signatures by autoregressive model fitting and cluster analysis, IEEE Transactions on Geoscience and Remote Sensing 8 (1) (1989) 606–610.
[4] Z. Huang, Extensions to the k-means algorithm for clustering large data sets with categorical values, Data Mining and Knowledge Discovery 2 (3) (1998) 283–304.
[5] D. Gibson, J. Kleinberg, P. Raghavan, Clustering categorical data: an approach based on dynamical systems, The Very Large Data Bases Journal 8 (3–4) (2000) 222–236.
[6] S. Guha, R. Rastogi, K. Shim, ROCK: a robust clustering algorithm for categorical attributes, Information Systems 25 (5) (2000) 345–366.
[7] V. Ganti, J. Gehrke, R. Ramakrishnan, CACTUS: clustering categorical data using summaries, in: Fifth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 1999, pp. 73–83.
[8] A. Dempster, N. Laird, D. Rubin, Maximum likelihood from incomplete data via the EM algorithm, Journal of the Royal Statistical Society 39 (1) (1977) 1–38.
[9] D. Kim, K. Lee, D. Lee, Fuzzy clustering of categorical data using fuzzy centroids, Pattern Recognition Letters 25 (11) (2004) 1263–1271.
[10] D. Parmar, T. Wu, J. Blackhurst, MMR: an algorithm for clustering categorical data using rough set theory, Data and Knowledge Engineering 63 (2007) 879–893.
[11] L.J. Mazlack, A. He, Y. Zhu, S. Coppock, A rough set approach in choosing clustering attributes, in: Proceedings of the ISCA 13th International Conference (CAINE-2000), 2000, pp. 1–6.
[12] Z. Pawlak, Rough sets, International Journal of Computer and Information Sciences 11 (1982) 341–356.
[13] Z. Pawlak, Rough Sets: Theoretical Aspects of Reasoning about Data, Kluwer Academic Publishers, 1991.
[14] Z. Pawlak, A. Skowron, Rudiments of rough sets, Information Sciences 177 (1) (2007) 3–27.
[15] Y.Y. Yao, Two views of the theory of rough sets in finite universes, International Journal of Approximate Reasoning 15 (4) (1996) 291–317.
[16] Y.Y. Yao, Constructive and algebraic methods of the theory of rough sets, Information Sciences 109 (1–4) (1998) 21–47.
[17] Y.Y. Yao, Information granulation and rough set approximation, International Journal of Intelligent Systems 16 (1) (2001) 87–104.
[18] R.J. Roiger, M.W. Geatz, Data Mining: A Tutorial-Based Primer, Addison Wesley, 2003.
[19] X. Hu, Knowledge discovery in databases: an attribute oriented rough set approach, Ph.D. Thesis, University of Regina, 1995.
[20] Z. Pawlak, Rough classification, International Journal of Human-Computer Studies 51 (1983) 369–383.
[21] Z. Pawlak, Rough set approach to knowledge-based decision support, European Journal of Operational Research 99 (1997) 48–57.
[22] J. Komorowski, L. Polkowski, A. Skowron, Rough sets: a tutorial, in: S.K. Pal, A. Skowron (Eds.), Rough-Fuzzy Hybridization, Springer-Verlag, Berlin, Heidelberg, 1999, pp. 3–98.
[23] I. Düntsch, G. Gediga, Rough approximation quality revisited, Artificial Intelligence 132 (2001) 219–234.
[24] Z. Pawlak, Rough sets, decision algorithms and Bayes' theorem, European Journal of Operational Research 136 (2002) 181–189.
[25] Z. Pawlak, Rough sets and intelligent data analysis, Information Sciences 147 (2002) 1–12.
[26] J.F. Peters, A. Skowron, Zdzisław Pawlak: life and work (1926–2006), in: J.F. Peters, A. Skowron (Eds.), Transactions on Rough Sets V, LNCS 4100, Springer-Verlag, Berlin, Heidelberg, 2006, pp. 1–24.
[27] L.A. Zadeh, Fuzzy sets, Information and Control 8 (1965) 338–353.