Mining changes in association rules: a fuzzy approach


Fuzzy Sets and Systems 149 (2005) 87 – 104 www.elsevier.com/locate/fss

Wai-Ho Au∗, Keith C.C. Chan
Department of Computing, The Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kong
Received 3 April 2003; received in revised form 31 December 2003; accepted 31 December 2003
Available online 18 August 2004

Abstract

Association rule mining is concerned with the discovery of interesting association relationships hidden in databases. Existing algorithms typically assume that data characteristics are stable over time. Their main focus is therefore to mine association rules in an efficient manner. However, the world constantly changes. As a result, the characteristics of the real-life entities represented by the data, and hence the associations hidden in the data, change over time. Detecting and adapting to the changes are usually critical to the success of many business organizations. This paper presents the problem of mining changes in association rules. Given a set of database partitions, each of which contains a set of transactions collected in a specific time period, a set of association rules is discovered in each database partition. We propose to perform data mining in the discovered association rules so as to reveal the regularities governing how the rules change in different time periods. Since the nature of many real-life entities is rather fuzzy, we propose to use linguistic variables and linguistic terms to represent the changes in the discovered association rules. In particular, fuzzy decision trees are built to discover the changes in the discovered association rules. The fuzzy decision trees are then converted to a set of fuzzy rules, called fuzzy meta-rules because they are rules about rules. By doing so, the changes hidden in the data can be revealed and presented to human users in a comprehensible form. Furthermore, the discovered changes can also be used to predict any change in the future. To evaluate the performance of our approach, we make use of a set of synthetic datasets, which are database partitions collected in different time periods. A set of association rules is discovered in each dataset. Fuzzy decision trees are constructed from the discovered association rules in order to reveal the changes in these rules. The experimental results show that our approach is very promising.

© 2004 Elsevier B.V. All rights reserved.

Keywords: Data mining; Change mining; Association rules; Fuzzy decision trees; Evolving data; Trends; Approximate reasoning

∗ Corresponding author. Tel.: +85-227-66-4081; fax: +85-2277-40842.

E-mail address: [email protected] (W.-H. Au). 0165-0114/$ - see front matter © 2004 Elsevier B.V. All rights reserved. doi:10.1016/j.fss.2004.07.018


1. Introduction

The ability to detect and adapt to changes is critical to the success of many individuals and business organizations. It allows decision makers to take the changes into consideration, and even take advantage of them, when they make decisions. Knowing the changes in advance not only allows a business organization to provide new products and services to satisfy the changing needs of its customers but also allows it to design corrective actions to stop or delay undesirable changes.

Existing data-mining techniques (e.g., [1,3,4,6,8–11,15,17,19,20,22,24,25,27]) aim at producing accurate models of the real world in an efficient manner. These models are very useful for human users to better understand the problem domains and for prediction. However, regardless of how accurately a model can predict, it can only predict based on the historical data, and it cannot accommodate any change because it would no longer be valid once the underlying patterns change.

In this paper, we study the problem of mining changes in the context of association rules. The problem of mining association rules is defined as follows [1]. Let $I = \{i_1, \ldots, i_m\}$ be a set of literals, called items. Let $D$ be a set of transactions where any $d \in D$ is a set of items such that $d \subseteq I$. A transaction, $d \in D$, contains a set of items or an itemset, $X \subseteq I$, if $X \subseteq d$. An association rule is an implication of the form $X \Rightarrow Y$, where $X \subseteq I$, $Y \subseteq I$, and $X \cap Y = \emptyset$. The rule $X \Rightarrow Y$ holds in $D$ with support, $\mathit{support}(X \Rightarrow Y) = \Pr(X \cup Y)$, and confidence, $\mathit{confidence}(X \Rightarrow Y) = \Pr(Y \mid X)$. An association rule is interesting if its support and confidence are greater than or equal to the user-specified minimum support and minimum confidence, respectively. The mining of association rules is concerned with finding all the interesting rules in a database. It has been studied extensively by many researchers and many algorithms have been proposed in the data-mining literature (e.g., [3]).

This problem definition is not developed for handling any changes in the underlying patterns. Data mining without taking the changes into consideration can result in severe degradation of performance, especially when the discovered association rules are used for classification (e.g., [22]). Since it is common that we need to predict the future based on historical data, the mining of changes in association rules is an important problem.

In our previous work [5], we proposed to mine changes in association rules. This paper presents the problem definition more formally. Furthermore, we generalize the work so that different fuzzy data-mining techniques can be used for tackling this problem. Given an association rule associated with a sequence of supports and a sequence of confidences in different time periods, we propose to use linguistic variables and linguistic terms to represent the changes in its supports and confidences. We also propose to build a fuzzy decision tree to discover the regularities governing how the association rule changes over time. The fuzzy decision tree can then be converted to a set of fuzzy rules. These fuzzy rules are called fuzzy meta-rules because they are rules about rules. The fuzzy meta-rules can be examined by human users and used for predicting how the association rule will change in the future.

The rest of this paper is organized as follows. In Section 2, we present the formal definition of mining changes in association rules over time. In Section 3, we describe an approach for representing changes in the supports and confidences of association rules using linguistic variables and linguistic terms. We also discuss how to apply an algorithm for building fuzzy decision trees to mine fuzzy meta-rules in this same section. To evaluate the performance of our approach, we applied it to a set of synthetic datasets. The experimental results are provided in Section 4. In Section 5, we discuss the related work in the literature. Finally, we conclude this paper with a summary in Section 6.


2. The problem definition

Let us suppose that a set of transactions, $D_j$, is collected in time period $t_j$, $j = 1, \ldots, n$, where $t_1, \ldots, t_n$ are consecutive and $t_j$ happens before $t_k$ if $j < k$. A set of association rules, $R_j = \{r_{j1}, \ldots, r_{js_j}\}$, is discovered in $D_j$, $j = 1, \ldots, n$. Let the antecedent and the consequent of an association rule, $X \Rightarrow Y$, be denoted as $\mathit{antecedent}(X \Rightarrow Y) = X$ and $\mathit{consequent}(X \Rightarrow Y) = Y$, respectively. The rule $X \Rightarrow Y$ holds in $D_j$ with support,

\[
\mathit{support}_j(X \Rightarrow Y) = \frac{|\{d_j \in D_j \mid X \cup Y \subseteq d_j\}|}{|D_j|} \tag{1}
\]

and confidence,

\[
\mathit{confidence}_j(X \Rightarrow Y) = \frac{|\{d_j \in D_j \mid X \cup Y \subseteq d_j\}|}{|\{d_j \in D_j \mid X \subseteq d_j\}|}, \tag{2}
\]

where $|S|$ denotes the cardinality of set $S$.

Rules $r_{ju} \in R_j$ and $r_{kv} \in R_k$, $j, k \in \{1, \ldots, n\}$, $j < k$, represent the same association relationship if $\mathit{antecedent}(r_{ju}) = \mathit{antecedent}(r_{kv})$ and $\mathit{consequent}(r_{ju}) = \mathit{consequent}(r_{kv})$.

Definition 1. Given an association rule, $r_{ju} \in R_j$, if there exists another association rule, $r_{kv} \in R_k$, $j < k$, such that $\mathit{antecedent}(r_{ju}) = \mathit{antecedent}(r_{kv})$ and $\mathit{consequent}(r_{ju}) = \mathit{consequent}(r_{kv})$, $r_{ju}$ is equivalent to $r_{kv}$, denoted $r_{ju} \equiv r_{kv}$, because they represent the same association relationship.

It is important to note that although $r_{ju} \equiv r_{kv}$, $\mathit{support}_j(r_{ju})$ and $\mathit{confidence}_j(r_{ju})$ may be different from $\mathit{support}_k(r_{ju})$ and $\mathit{confidence}_k(r_{ju})$, respectively, because the rule may change.

Definition 2. Given two association rules, $r_{ju} \in R_j$ and $r_{kv} \in R_k$, $j < k$, such that $r_{ju} \equiv r_{kv}$, $r_{ju}$ changes during the period from $t_j$ to $t_k$ if $\mathit{support}_j(r_{ju}) \neq \mathit{support}_k(r_{ju})$ and/or $\mathit{confidence}_j(r_{ju}) \neq \mathit{confidence}_k(r_{ju})$. We say that $r_{ju}$ is a changed rule in $t_k$.

For rule $r_{ju} \in R_j$, it is possible that its support and confidence are greater than or equal to the user-specified thresholds in $t_j$ but become less than the thresholds in $t_k$, $j < k$. In this case, no corresponding rule can be found in $R_k$.

Definition 3. Given $R_j$ and $R_k$, $j < k$, if $r_{ju} \in R_j$ and there does not exist $r_{kv} \in R_k$ such that $r_{ju} \equiv r_{kv}$, we say that $r_{ju}$ is perished in $t_k$ and $r_{ju}$ is a perished rule in $t_k$. In this case, the support and confidence of $r_{ju}$ in $t_k$ are missing, denoted $\mathit{support}_k(r_{ju}) = ?$ and $\mathit{confidence}_k(r_{ju}) = ?$.

Similarly, for rule $r_{kv} \in R_k$, it is possible that its support and confidence are less than the user-specified thresholds in $t_j$ but become greater than or equal to the thresholds in $t_k$, $j < k$. In this case, no corresponding rule can be found in $R_j$.


Definition 4. Given $R_j$ and $R_k$, $j < k$, if $r_{kv} \in R_k$ and there does not exist any $r_{ju} \in R_j$ such that $r_{ju} \equiv r_{kv}$, we say that $r_{kv}$ is added in $t_k$ and $r_{kv}$ is an added rule in $t_k$. In this case, the support and confidence of $r_{kv}$ in $t_j$ are missing, denoted $\mathit{support}_j(r_{kv}) = ?$ and $\mathit{confidence}_j(r_{kv}) = ?$.

It is important to note that an added or a perished rule is a special case of a changed rule: the support and confidence of an added rule change from below the user-specified thresholds to above the thresholds, whereas the support and/or confidence of a perished rule change from above the thresholds to below the thresholds. By revealing how a rule changed in the past, one can predict whether it will be added or perished or to what degree it will change in the future.

For each rule in $R_1 \cup \cdots \cup R_n$, we are interested in mining a set of fuzzy meta-rules, which represent the regularities governing how the rule changes during the period from $t_1$ to $t_n$. Based on the discovered fuzzy meta-rules, how the rule will change in $t_{n+1}$ can be predicted.

Definition 5. For $r \in R_1 \cup \cdots \cup R_n$, a fuzzy meta-rule of support is an implication of the form $S^r_{j_1} = s^r_{p_1} \wedge \cdots \wedge S^r_{j_h} = s^r_{p_h} \Rightarrow S^r_{j_q} = s^r_{p_q}$, where $S^r_{j_k}$ is a linguistic variable representing $\Delta\mathit{support}_{j_k}(r) = \mathit{support}_{j_k+1}(r) - \mathit{support}_{j_k}(r)$, which is the difference in support of $r$ during the period from $t_{j_k}$ to $t_{j_k+1}$, and $s^r_{p_k}$ is a linguistic term in $T(S^r_{j_k})$, which denotes the set of linguistic terms that the linguistic variable $S^r_{j_k}$ can take on (i.e., the domain of $S^r_{j_k}$), for $k = 1, \ldots, h, q$ and $j_1 < \cdots < j_h < j_q$.

Definition 6. For $r \in R_1 \cup \cdots \cup R_n$, a fuzzy meta-rule of confidence is an implication of the form $C^r_{j_1} = c^r_{p_1} \wedge \cdots \wedge C^r_{j_h} = c^r_{p_h} \Rightarrow C^r_{j_q} = c^r_{p_q}$, where $C^r_{j_k}$ is a linguistic variable representing $\Delta\mathit{confidence}_{j_k}(r) = \mathit{confidence}_{j_k+1}(r) - \mathit{confidence}_{j_k}(r)$, which is the difference in confidence of $r$ during the period from $t_{j_k}$ to $t_{j_k+1}$, and $c^r_{p_k}$ is a linguistic term in $T(C^r_{j_k})$, which denotes the set of linguistic terms that the linguistic variable $C^r_{j_k}$ can take on (i.e., the domain of $C^r_{j_k}$), for $k = 1, \ldots, h, q$ and $j_1 < \cdots < j_h < j_q$.

Through the mining of fuzzy meta-rules, one can examine the regularities governing how an association rule changes during the period from $t_1$ to $t_n$. Furthermore, the discovered fuzzy meta-rules can be used to predict how the association rule will change in $t_{n+1}$. The ability to predict how association rules will change allows accurate results to be achieved when the association rules discovered in the past are used for classification in the future.

Example 1. Let us consider the association rules concerning items $i_1$, $i_2$, $i_3$, and $i_4$ discovered in three consecutive time periods, $t_1$, $t_2$, and $t_3$. Assume that an association rule discovered in time period $t_1$ was $r : \{i_1, i_2, i_3\} \Rightarrow \{i_4\}$, whose support and confidence in $t_1$ are $\mathit{support}_1(r) = 37.8\%$ and $\mathit{confidence}_1(r) = 95.0\%$, respectively.


In time period $t_2$, the association rule becomes $r : \{i_1, i_2, i_3\} \Rightarrow \{i_4\}$, whose support and confidence in $t_2$ are $\mathit{support}_2(r) = 34.9\%$ and $\mathit{confidence}_2(r) = 94.8\%$, respectively. Then in time period $t_3$, the association rule becomes $r : \{i_1, i_2, i_3\} \Rightarrow \{i_4\}$, whose support and confidence in $t_3$ are $\mathit{support}_3(r) = 28.4\%$ and $\mathit{confidence}_3(r) = 94.5\%$, respectively.

The support of the association rule decreases in the period from $t_1$ to $t_2$ and in the period from $t_2$ to $t_3$. A fuzzy meta-rule of support discovered in these rules would be:

Change in support this period = Fairly decrease ⇒ Change in support next period = Highly decrease

where Change in support this period and Change in support next period are linguistic variables, while Fairly decrease and Highly decrease are linguistic terms. This fuzzy meta-rule of support states that if the support decreases fairly in this period, then it will decrease highly in the next period. Based on this fuzzy meta-rule of support, the support of the association rule in time period $t_j$ can be predicted given the support of this rule in time period $t_{j-1}$ and that in time period $t_{j-2}$.

On the other hand, the confidence of the association rule is more or less the same in the period from $t_1$ to $t_2$ and in the period from $t_2$ to $t_3$. A fuzzy meta-rule of confidence discovered in these rules would be:

Change in confidence this period = More or less the same ⇒ Change in confidence next period = More or less the same

where Change in confidence this period and Change in confidence next period are linguistic variables and More or less the same is a linguistic term. This fuzzy meta-rule of confidence states that if the confidence stays more or less the same in this period, then it will stay more or less the same in the next period. Based on this fuzzy meta-rule of confidence, the confidence of the association rule in time period $t_j$ can be predicted given the confidence of this rule in time period $t_{j-1}$ and that in time period $t_{j-2}$.
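The definitions of this section can be illustrated with a short Python sketch. It computes support and confidence as in Eqs. (1) and (2) and labels rules as changed, perished, or added between two periods in the sense of Definitions 2–4. The function names, the representation of transactions as sets, and the representation of a rule set as a dictionary are assumptions made for illustration only; they are not part of the paper.

```python
def support(transactions, x, y):
    """Eq. (1): fraction of transactions that contain every item of X and Y."""
    items = set(x) | set(y)
    return sum(items <= t for t in transactions) / len(transactions)


def confidence(transactions, x, y):
    """Eq. (2): among transactions containing X, the fraction also containing Y."""
    covered = [t for t in transactions if set(x) <= t]
    if not covered:
        return 0.0  # assumption: confidence is reported as 0 when X never occurs
    items = set(x) | set(y)
    return sum(items <= t for t in covered) / len(covered)


def classify_rules(rules_j, rules_k):
    """Compare the rule sets of an earlier period j and a later period k.

    Each argument maps a rule key (antecedent, consequent) to its
    (support, confidence) pair in that period (hypothetical layout).
    """
    status = {}
    for rule, measures in rules_j.items():
        if rule not in rules_k:
            status[rule] = "perished"      # Definition 3
        elif rules_k[rule] != measures:
            status[rule] = "changed"       # Definition 2
        else:
            status[rule] = "unchanged"
    for rule in rules_k:
        if rule not in rules_j:
            status[rule] = "added"         # Definition 4
    return status


# Toy data, invented for this sketch.
d1 = [frozenset("abc"), frozenset("ab"), frozenset("ac"), frozenset("b")]
print(support(d1, {"a"}, {"b"}), confidence(d1, {"a"}, {"b"}))
```

In practice the rule sets would come from an association rule miner applied to each partition with the same minimum support and minimum confidence; the sketch only shows the bookkeeping between two periods.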

3. Mining fuzzy meta-rules

For each rule, $r \in R_1 \cup \cdots \cup R_n$, we have a sequence of supports, $S^r = (\mathit{support}_1(r), \ldots, \mathit{support}_n(r))$, and a sequence of confidences, $C^r = (\mathit{confidence}_1(r), \ldots, \mathit{confidence}_n(r))$. $S^r$ and $C^r$ are then converted to sequences of differences, $\Delta S^r = (\Delta\mathit{support}_1(r), \ldots, \Delta\mathit{support}_{n-1}(r))$ and $\Delta C^r = (\Delta\mathit{confidence}_1(r), \ldots, \Delta\mathit{confidence}_{n-1}(r))$, respectively. By sliding a window of width $w$ across $\Delta S^r$, it is divided into a set of subsequences, $\Delta S^r_1, \ldots, \Delta S^r_{n-w}$, where $\Delta S^r_j = (\Delta\mathit{support}_j(r), \ldots, \Delta\mathit{support}_{j+w-1}(r))$. Similarly, $\Delta C^r$ is also divided into a set of subsequences, $\Delta C^r_1, \ldots, \Delta C^r_{n-w}$, where $\Delta C^r_j = (\Delta\mathit{confidence}_j(r), \ldots, \Delta\mathit{confidence}_{j+w-1}(r))$. We can then mine a set of fuzzy meta-rules of support in subsequences $\Delta S^r_1, \ldots, \Delta S^r_{n-w}$ and a set of fuzzy meta-rules of confidence in subsequences $\Delta C^r_1, \ldots, \Delta C^r_{n-w}$.
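The differencing and windowing steps can be sketched in a few lines of Python. The helper names are hypothetical, the input is the support sequence used in this section's worked example (in percent), and the rounding to one decimal place is an assumption added only to keep floating-point noise out of the output. Missing values (rules that are perished or not yet added) are not handled here.

```python
def differences(values, ndigits=1):
    """Period-to-period differences: element j is values[j + 1] - values[j]."""
    return [round(b - a, ndigits) for a, b in zip(values, values[1:])]


def sliding_windows(sequence, w):
    """All contiguous subsequences of width w, in order."""
    return [tuple(sequence[j:j + w]) for j in range(len(sequence) - w + 1)]


supports = [37.8, 34.9, 28.4, 29.3, 28.9, 29.7]   # n = 6 periods
delta_s = differences(supports)                    # n - 1 = 5 differences
windows = sliding_windows(delta_s, 3)              # n - w = 3 subsequences
for window in windows:
    print(window)
```

With $n$ supports there are $n - 1$ differences, so a window of width $w$ yields $(n - 1) - w + 1 = n - w$ subsequences, which matches the index range used in the text.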


Example 2. Let us consider an association rule, $r$. Let us suppose that its supports in 6 consecutive periods are given by the sequence $S^r = (37.8\%, 34.9\%, 28.4\%, 29.3\%, 28.9\%, 29.7\%)$ and its confidences in these 6 periods are given by the sequence $C^r = (95.0\%, 94.8\%, 94.5\%, 92.7\%, 92.8\%, 94.9\%)$. We have $\Delta S^r = (-2.9\%, -6.5\%, 0.9\%, -0.4\%, 0.8\%)$ and $\Delta C^r = (-0.2\%, -0.3\%, -1.8\%, 0.1\%, 2.1\%)$. By sliding a window of width $w = 3$ across $\Delta S^r$, we obtain a set of subsequences, $\Delta S^r_1 = (-2.9\%, -6.5\%, 0.9\%)$, $\Delta S^r_2 = (-6.5\%, 0.9\%, -0.4\%)$, and $\Delta S^r_3 = (0.9\%, -0.4\%, 0.8\%)$. Similarly, by sliding the window across $\Delta C^r$, we obtain a set of subsequences, $\Delta C^r_1 = (-0.2\%, -0.3\%, -1.8\%)$, $\Delta C^r_2 = (-0.3\%, -1.8\%, 0.1\%)$, and $\Delta C^r_3 = (-1.8\%, 0.1\%, 2.1\%)$.

For simplicity, we only discuss how to mine fuzzy meta-rules of support in subsequences $\Delta S^r_1, \ldots, \Delta S^r_{n-w}$ in the rest of this section. It is easy to extend the description to mine fuzzy meta-rules of confidence in subsequences $\Delta C^r_1, \ldots, \Delta C^r_{n-w}$.

3.1. Linguistic variables and linguistic terms

Given subsequences $\Delta S^r_1, \ldots, \Delta S^r_{n-w}$, where $\Delta S^r_j = (\Delta\mathit{support}_j(r), \ldots, \Delta\mathit{support}_{j+w-1}(r))$, $j = 1, \ldots, n - w$, we define a set of linguistic variables, $L^r = \{S^r_1, \ldots, S^r_w\}$, such that $S^r_k$ represents $\Delta\mathit{support}_{j+k-1}(r)$ in $\Delta S^r_j$ for $k = 1, \ldots, w$. The value of $S^r_k$ is a linguistic term in $T(S^r_k) = \{s^r_{k1}, \ldots, s^r_{kh_k}\}$, where $s^r_{kp}$, $p \in \{1, \ldots, h_k\}$, is a linguistic term defined by a fuzzy set, $F^r_{kp}$, that is defined on $[-1, 1]$, which is the domain of $\Delta\mathit{support}_{j+k-1}(r)$, and whose membership function is $\mu_{F^r_{kp}}$ so that

\[
\mu_{F^r_{kp}} : [-1, 1] \rightarrow [0, 1]. \tag{3}
\]

The degree of compatibility of $x \in [-1, 1]$ with $S^r_k = s^r_{kp}$ is given by $\mu_{F^r_{kp}}(x)$. Since it may not be trivial to define the fuzzy sets, one may consider using algorithmic approaches (e.g., [13,18]) to generate the membership functions of the fuzzy sets directly from data.

Given $\Delta S^r_j$ and a linguistic term, $s^r_{kp} \in T(S^r_k)$, which is characterized by a fuzzy set, $F^r_{kp}$, the degree of membership of the values in $\Delta S^r_j$ with respect to $F^r_{kp}$ is given by $\mu_{F^r_{kp}}(\Delta\mathit{support}_{j+k-1}(r))$. The degree to which $\Delta S^r_j$ is characterized by $S^r_k = s^r_{kp}$, $\lambda_{S^r_k = s^r_{kp}}(\Delta S^r_j)$, is defined as follows:

\[
\lambda_{S^r_k = s^r_{kp}}(\Delta S^r_j) = \mu_{F^r_{kp}}(\Delta\mathit{support}_{j+k-1}(r)). \tag{4}
\]

If $\lambda_{S^r_k = s^r_{kp}}(\Delta S^r_j) = 1$, $\Delta S^r_j$ is completely characterized by $S^r_k = s^r_{kp}$. If $\lambda_{S^r_k = s^r_{kp}}(\Delta S^r_j) = 0$, $\Delta S^r_j$ is undoubtedly not characterized by $S^r_k = s^r_{kp}$. If $0 < \lambda_{S^r_k = s^r_{kp}}(\Delta S^r_j) < 1$, $\Delta S^r_j$ is partially characterized by $S^r_k = s^r_{kp}$. In the case where $\Delta\mathit{support}_{j+k-1}(r)$ is missing because $\mathit{support}_{j+k-1}(r) = ?$ and/or $\mathit{support}_{j+k}(r) = ?$, $\lambda_{S^r_k = s^r_{kp}}(\Delta S^r_j) = 0.5$, which indicates that there is no information available concerning whether $\Delta S^r_j$ is or is not characterized by $S^r_k = s^r_{kp}$.

Each subsequence $\Delta S^r_j \in \{\Delta S^r_1, \ldots, \Delta S^r_{n-w}\}$ is represented by a set of ordered triples,

\[
o^r_j = \{(S^r_1, s^r_{11}, \lambda_{S^r_1 = s^r_{11}}(\Delta S^r_j)), \ldots, (S^r_1, s^r_{1h_1}, \lambda_{S^r_1 = s^r_{1h_1}}(\Delta S^r_j)), \ldots, (S^r_w, s^r_{w1}, \lambda_{S^r_w = s^r_{w1}}(\Delta S^r_j)), \ldots, (S^r_w, s^r_{wh_w}, \lambda_{S^r_w = s^r_{wh_w}}(\Delta S^r_j))\}.
\]


Example 3. Let us consider the association rule, $r$, described in Example 2. We have a set of subsequences of supports, $\Delta S^r_1 = (-2.9\%, -6.5\%, 0.9\%)$, $\Delta S^r_2 = (-6.5\%, 0.9\%, -0.4\%)$, and $\Delta S^r_3 = (0.9\%, -0.4\%, 0.8\%)$. Each subsequence is then represented by three linguistic variables, $S^r_1$ (which represents Change in support 1 period ago), $S^r_2$ (which represents Change in support this period), and $S^r_3$ (which represents Change in support next period). Each linguistic variable can take one of 5 linguistic terms, whose membership functions are defined as follows:

\[
\mu_{\text{Highly decrease}}(x) = \begin{cases} 1 & \text{if } -1 \le x \le -0.1, \\ -\frac{1}{0.05}(x + 0.05) & \text{if } -0.1 \le x \le -0.05, \\ 0 & \text{otherwise,} \end{cases}
\]

\[
\mu_{\text{Fairly decrease}}(x) = \begin{cases} \frac{1}{0.05}(x + 0.1) & \text{if } -0.1 \le x \le -0.05, \\ -\frac{1}{0.05}\,x & \text{if } -0.05 \le x \le 0, \\ 0 & \text{otherwise,} \end{cases}
\]

\[
\mu_{\text{More or less the same}}(x) = \begin{cases} \frac{1}{0.05}(x + 0.05) & \text{if } -0.05 \le x \le 0, \\ -\frac{1}{0.05}(x - 0.05) & \text{if } 0 \le x \le 0.05, \\ 0 & \text{otherwise,} \end{cases}
\]

\[
\mu_{\text{Fairly increase}}(x) = \begin{cases} \frac{1}{0.05}\,x & \text{if } 0 \le x \le 0.05, \\ -\frac{1}{0.05}(x - 0.1) & \text{if } 0.05 \le x \le 0.1, \\ 0 & \text{otherwise,} \end{cases}
\]

\[
\mu_{\text{Highly increase}}(x) = \begin{cases} \frac{1}{0.05}(x - 0.05) & \text{if } 0.05 \le x \le 0.1, \\ 1 & \text{if } 0.1 \le x \le 1, \\ 0 & \text{otherwise.} \end{cases}
\]

$\Delta S^r_1$ is then represented by a set of ordered triples, $o^r_1$, where

$o^r_1$ = {($S^r_1$, Highly decrease, 0), ($S^r_1$, Fairly decrease, 0.58), ($S^r_1$, More or less the same, 0.42), ($S^r_1$, Fairly increase, 0), ($S^r_1$, Highly increase, 0),
($S^r_2$, Highly decrease, 0.3), ($S^r_2$, Fairly decrease, 0.7), ($S^r_2$, More or less the same, 0), ($S^r_2$, Fairly increase, 0), ($S^r_2$, Highly increase, 0),
($S^r_3$, Highly decrease, 0), ($S^r_3$, Fairly decrease, 0), ($S^r_3$, More or less the same, 0.82), ($S^r_3$, Fairly increase, 0.18), ($S^r_3$, Highly increase, 0)}

Similarly, $\Delta S^r_2$ and $\Delta S^r_3$ are represented by $o^r_2$ and $o^r_3$, respectively.
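The conversion of a subsequence into ordered triples can be reproduced with the following Python sketch. It encodes the five membership functions of this example and applies them to the first subsequence, expressed as fractions rather than percentages. The function and variable names (`to_triples`, `TERMS`, `"S1"` and so on) are illustrative assumptions; the value 0.5 for a missing difference follows the convention stated in Section 3.1.

```python
def highly_decrease(x):
    if -1 <= x <= -0.1:
        return 1.0
    if -0.1 <= x <= -0.05:
        return (-0.05 - x) / 0.05
    return 0.0


def fairly_decrease(x):
    if -0.1 <= x <= -0.05:
        return (x + 0.1) / 0.05
    if -0.05 <= x <= 0:
        return (0.0 - x) / 0.05
    return 0.0


def more_or_less_the_same(x):
    if -0.05 <= x <= 0:
        return (x + 0.05) / 0.05
    if 0 <= x <= 0.05:
        return (0.05 - x) / 0.05
    return 0.0


def fairly_increase(x):
    if 0 <= x <= 0.05:
        return x / 0.05
    if 0.05 <= x <= 0.1:
        return (0.1 - x) / 0.05
    return 0.0


def highly_increase(x):
    if 0.05 <= x <= 0.1:
        return (x - 0.05) / 0.05
    if 0.1 <= x <= 1:
        return 1.0
    return 0.0


TERMS = {
    "Highly decrease": highly_decrease,
    "Fairly decrease": fairly_decrease,
    "More or less the same": more_or_less_the_same,
    "Fairly increase": fairly_increase,
    "Highly increase": highly_increase,
}

MISSING_DEGREE = 0.5  # used when a difference is unknown (None in this sketch)


def to_triples(window, variables):
    """Return (variable, term, degree) for every variable/term combination."""
    triples = []
    for variable, x in zip(variables, window):
        for term, mu in TERMS.items():
            degree = MISSING_DEGREE if x is None else mu(x)
            triples.append((variable, term, degree))
    return triples


o1 = to_triples((-0.029, -0.065, 0.009), ["S1", "S2", "S3"])
for variable, term, degree in o1:
    if degree > 0:
        print(variable, term, round(degree, 2))
```

Up to floating-point rounding, the non-zero degrees agree with the triples listed for the first subsequence above.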


3.2. Building fuzzy decision trees

To discover interesting relationships hidden in $o^r_1, \ldots, o^r_{n-w}$, we propose to use fuzzy decision trees, which combine symbolic decision trees with approximate reasoning offered by fuzzy representation. Specifically, we apply FID [17] to $o^r_1, \ldots, o^r_{n-w}$ in order to construct a fuzzy decision tree. FID extends ID3 [25] by using splitting criteria based on fuzzy restrictions and using different inference procedures to exploit fuzzy sets and allow numerical values to be inferred. In our approach, we use FID to build fuzzy decision trees, which can then be converted into a set of fuzzy meta-rules.

$S^r_w$ is used as the class attribute for FID to induce a fuzzy decision tree. FID starts to build the fuzzy decision tree by deciding which of $S^r_1, \ldots, S^r_{w-1}$ is to be the root node. To do so, it first calculates the information gain by branching on $S^r_k$, $k = 1, \ldots, w - 1$, by

\[
I(S^r_w; S^r_k) = H(S^r_w) - H(S^r_w \mid S^r_k), \tag{5}
\]

where $H(S^r_w)$ is the entropy of $S^r_w$, which is given by

\[
H(S^r_w) = \sum_{l=1}^{h_w} -\Pr(S^r_w = s^r_{wl}) \log \Pr(S^r_w = s^r_{wl}), \tag{6}
\]

and $H(S^r_w \mid S^r_k)$ is the conditional entropy of $S^r_w$ given $S^r_k$ and is calculated by

\[
H(S^r_w \mid S^r_k) = \sum_{i=1}^{h_k} \Pr(S^r_k = s^r_{ki}) \sum_{l=1}^{h_w} -\Pr(S^r_w = s^r_{wl} \mid S^r_k = s^r_{ki}) \log \Pr(S^r_w = s^r_{wl} \mid S^r_k = s^r_{ki}). \tag{7}
\]

$\Pr(S^r_w = s^r_{wq})$, $q \in \{1, \ldots, h_w\}$, is the probability that a subsequence is characterized by $S^r_w = s^r_{wq}$, which is computed by

\[
\Pr(S^r_w = s^r_{wq}) = \frac{\sum_{j=1}^{n-w} \lambda_{S^r_w = s^r_{wq}}(\Delta S^r_j)}{\sum_{j=1}^{n-w} \sum_{l=1}^{h_w} \lambda_{S^r_w = s^r_{wl}}(\Delta S^r_j)}, \tag{8}
\]

and $\Pr(S^r_w = s^r_{wq} \mid S^r_k = s^r_{kp})$, $q \in \{1, \ldots, h_w\}$, $p \in \{1, \ldots, h_k\}$, is the probability that a subsequence is characterized by $S^r_w = s^r_{wq}$ given that it is characterized by $S^r_k = s^r_{kp}$, which is calculated by

\[
\Pr(S^r_w = s^r_{wq} \mid S^r_k = s^r_{kp}) = \frac{\sum_{j=1}^{n-w} \min\left(\lambda_{S^r_w = s^r_{wq}}(\Delta S^r_j), \lambda_{S^r_k = s^r_{kp}}(\Delta S^r_j)\right)}{\sum_{j=1}^{n-w} \sum_{i=1}^{h_k} \sum_{l=1}^{h_w} \min\left(\lambda_{S^r_w = s^r_{wl}}(\Delta S^r_j), \lambda_{S^r_k = s^r_{ki}}(\Delta S^r_j)\right)}. \tag{9}
\]

FID then selects the linguistic variable in $\{S^r_1, \ldots, S^r_{w-1}\}$ that has the maximum information gain to be the root node. After the root node is chosen, FID assigns each of $o^r_1, \ldots, o^r_{n-w}$ a new degree of membership for the ordered triples concerning $S^r_w$ according to its value of the selected linguistic variable. Let us suppose that $S^r_a$, $a \in \{1, \ldots, w - 1\}$, is selected as the root node. The degree of membership of the ordered triples in $o^r_j$, $j = 1, \ldots, n - w$, is changed to $\min\left(\lambda_{S^r_w = s^r_{wq}}(\Delta S^r_j), \lambda_{S^r_a = s^r_{ap}}(\Delta S^r_j)\right)$. This in effect divides $o^r_1, \ldots, o^r_{n-w}$ into different fuzzy subsets, one for each branch from $S^r_a$. FID goes on to build a fuzzy decision tree for each subset until the information gain given by Eq. (5) is zero (or less than a threshold) or all linguistic variables have been used along the path. When a leaf contains subsequences of non-unique classes, we use the majority class in the leaf as the consequent.
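The information-gain computation of Eqs. (5)–(9) can be sketched in Python as follows. This is not the FID implementation; it is a minimal illustration under stated assumptions: each subsequence is a dictionary mapping a variable name to the membership degrees of its terms (a hypothetical layout), the logarithm is taken to base 2 (the text does not fix the base), $\Pr(S^r_k = s^r_{ki})$ is computed in the same way as Eq. (8), and the normalisation of the conditional term runs over every pair of terms, following Eq. (9) as stated in the text.

```python
import math


def term_probabilities(examples, var):
    """Eq. (8): membership mass of each term of `var`, normalised over all its terms."""
    totals = {}
    for example in examples:
        for term, degree in example[var].items():
            totals[term] = totals.get(term, 0.0) + degree
    mass = sum(totals.values())
    return {term: value / mass for term, value in totals.items()}


def conditional_probabilities(examples, cls, var):
    """Eq. (9): min-combined memberships, normalised over all (class term, var term) pairs."""
    combined = {}
    for example in examples:
        for p, degree_p in example[var].items():
            for q, degree_q in example[cls].items():
                combined[(q, p)] = combined.get((q, p), 0.0) + min(degree_q, degree_p)
    mass = sum(combined.values())
    return {pair: value / mass for pair, value in combined.items()}


def entropy_term(p):
    """-p log2 p, with the usual convention that the term is 0 when p is 0."""
    return -p * math.log2(p) if p > 0 else 0.0


def information_gain(examples, cls, var):
    """Eq. (5): H(class) - H(class | var), using Eqs. (6) and (7)."""
    h_cls = sum(entropy_term(p) for p in term_probabilities(examples, cls).values())
    p_var = term_probabilities(examples, var)
    p_cond = conditional_probabilities(examples, cls, var)
    h_cond = sum(
        p_var[p] * sum(entropy_term(pc) for (q, pp), pc in p_cond.items() if pp == p)
        for p in p_var
    )
    return h_cls - h_cond


# Toy subsequences with crisp memberships, invented for this sketch:
# variable A determines the class C, variable B carries no information about it.
examples = [
    {"A": {"lo": 1, "hi": 0}, "B": {"lo": 1, "hi": 0}, "C": {"dec": 1, "inc": 0}},
    {"A": {"lo": 1, "hi": 0}, "B": {"lo": 0, "hi": 1}, "C": {"dec": 1, "inc": 0}},
    {"A": {"lo": 0, "hi": 1}, "B": {"lo": 1, "hi": 0}, "C": {"dec": 0, "inc": 1}},
    {"A": {"lo": 0, "hi": 1}, "B": {"lo": 0, "hi": 1}, "C": {"dec": 0, "inc": 1}},
]
root = max(["A", "B"], key=lambda v: information_gain(examples, "C", v))
print(root)
```

The variable with the largest gain would be chosen as the root, after which the memberships of each subsequence are combined with the branch condition by `min` and the procedure is repeated on each fuzzy subset.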


After FID builds the fuzzy decision tree, the tree can be used for inference. To decide the classification assigned to an unseen subsequence, we find the leaves whose conditions along the paths are satisfied by the subsequence and then combine their decisions into a single crisp value, which provides the predicted value of $S^r_w$ in the subsequence. Let $e$ be an unseen subsequence and $v$ be a leaf of the fuzzy decision tree. Let us further suppose that there are $m$ ($\le w - 1$) conditions along the path to $v$ and they are $S^r_{k_1} = s^r_{k_1 p_1}, \ldots, S^r_{k_m} = s^r_{k_m p_m}$, $k_i \in \{1, \ldots, w - 1\}$, $p_i \in \{1, \ldots, h_{k_i}\}$, $i = 1, \ldots, m$. The level of satisfaction of $v$ is determined by $\min\left(\lambda_{S^r_{k_1} = s^r_{k_1 p_1}}(e), \ldots, \lambda_{S^r_{k_m} = s^r_{k_m p_m}}(e)\right)$. The level of satisfaction of each fuzzy set $s^r_{wq} \in T(S^r_w)$ is given by the sum of the levels of satisfaction of all the leaves with $S^r_w = s^r_{wq}$ as the consequent. We use the Fuzzy-Mean method to calculate the defuzzified value as follows:

\[
\delta = \frac{\sum_{l=1}^{h_w} \alpha_l \zeta_l}{\sum_{l=1}^{h_w} \alpha_l}, \tag{10}
\]

where $\alpha_l$ is the degree of satisfaction of $s^r_{wl}$ and $\zeta_l$ is the crisp representation of $s^r_{wl}$, which is the centroid of $s^r_{wl}$.

Furthermore, the fuzzy decision tree built can be transformed to a set of fuzzy rules, called fuzzy meta-rules because they are rules about rules. Each path from the root to a leaf in the fuzzy decision tree is transformed to a fuzzy meta-rule where each condition along the path becomes a conjunct in the antecedent of the rule and the leaf becomes the consequent of the rule. The extraction of fuzzy meta-rules from the fuzzy decision tree allows human users to easily understand how the association rules change because of the affinity of linguistic variables and linguistic terms to the human knowledge representation.

FID also supports different methods to deal with non-unique classes, different defuzzification methods, and the use of weights in tree construction and inference. Interested readers are referred to [17] for the details.
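The inference step and the conversion of tree paths into fuzzy meta-rules can be sketched as follows. The representation of a leaf as a pair (list of conditions, consequent term), the function names, and the two toy leaves are assumptions made for illustration; they do not reproduce FID's data structures. The centroids used for the two terms are those of the symmetric triangular sets Fairly decrease and More or less the same from Example 3.

```python
def leaf_satisfaction(conditions, example):
    """Minimum membership of the example over the conditions on the path to a leaf."""
    return min(example[variable][term] for variable, term in conditions)


def fuzzy_mean(leaves, example, centroids):
    """Eq. (10): satisfaction-weighted mean of the centroids of the consequent terms."""
    satisfaction = {}
    for conditions, consequent in leaves:
        level = leaf_satisfaction(conditions, example)
        satisfaction[consequent] = satisfaction.get(consequent, 0.0) + level
    total = sum(satisfaction.values())
    if total == 0:
        return None  # assumption: no prediction when no leaf is satisfied
    return sum(level * centroids[term] for term, level in satisfaction.items()) / total


def to_meta_rule(conditions, consequent, class_variable):
    """Render one root-to-leaf path as a fuzzy meta-rule string."""
    antecedent = " ∧ ".join(f"{variable} = {term}" for variable, term in conditions)
    return f"{antecedent} ⇒ {class_variable} = {consequent}"


# Two hypothetical leaves of a depth-one tree, invented for this sketch.
THIS = "Change in support this period"
NEXT = "Change in support next period"
leaves = [
    ([(THIS, "Fairly decrease")], "Fairly decrease"),
    ([(THIS, "More or less the same")], "More or less the same"),
]
centroids = {"Fairly decrease": -0.05, "More or less the same": 0.0}
unseen = {THIS: {"Fairly decrease": 0.58, "More or less the same": 0.42}}

print(fuzzy_mean(leaves, unseen, centroids))
for conditions, consequent in leaves:
    print(to_meta_rule(conditions, consequent, NEXT))
```

The defuzzified value is a predicted change in support; adding it to the most recent support gives the predicted support for the next period.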

4. Experimental results

In our experiments, we first generated a synthetic dataset using the tool provided by [16]. The parameter setting for generating the synthetic dataset is listed in Table 1. The parameters and the method for the generation of the synthetic dataset are detailed in [3].

Table 1
Parameter setting for generating the synthetic dataset

Parameter                                                  Value
Number of transactions                                     1000
Average size of transactions                               5
Average size of the maximal potentially large itemsets(a)  2
Number of maximal potentially large itemsets               20
Number of items                                            100

(a) An itemset is large if its support is greater than or equal to minimum support.

We used the synthetic dataset as the transactions collected in time period $t_1$ (i.e., $D_1$). Apriori [3], which is a well-known algorithm for mining association rules, was then applied to $D_1$. By setting minimum support to 25% and minimum confidence to 90%, a set of 13 association rules was discovered and stored in $R_1$. The association rules in $R_1$ are shown in Table 2.

Table 2
Association rules in R1

Association rule            Support (%)   Confidence (%)
r1:  {i99} ⇒ {i63}          25.1          96.4
r2:  {i99} ⇒ {i28}          25.1          96.4
r3:  {i39} ⇒ {i69}          27.1          96.0
r4:  {i69} ⇒ {i39}          27.1          95.3
r5:  {i39} ⇒ {i51}          27.1          96.0
r6:  {i51} ⇒ {i39}          27.1          94.6
r7:  {i69} ⇒ {i51}          27.4          96.4
r8:  {i51} ⇒ {i69}          27.4          96.0
r9:  {i63} ⇒ {i28}          30.3          90.7
r10: {i47} ⇒ {i28}          53.4          95.6
r11: {i39, i69} ⇒ {i51}     26.2          96.6
r12: {i39, i51} ⇒ {i69}     26.2          96.6
r13: {i51, i69} ⇒ {i39}     26.2          95.5

We then generated another 124 datasets, $D_2, \ldots, D_{125}$, in such a way that (i) $r_1$, $r_2$, $r_{10}$, and $r_{13}$ changed in the period from $t_1$ to $t_{125}$, (ii) $r_9$ was perished in $t_{125}$, (iii) a new rule, $r_{15}$, was added in $t_{119}$, (iv) $r_{14}$ was added, changed, and perished periodically during the period from $t_1$ to $t_{125}$, and (v) all the other rules remained the same in the period from $t_1$ to $t_{125}$. Fig. 1 shows how $r_1$, $r_2$, $r_9$, $r_{10}$, $r_{13}$, $r_{14}$, and $r_{15}$ changed in the period from $t_1$ to $t_{125}$. The association rules discovered in $D_{125}$ and stored in $R_{125}$ are given in Table 3.

Each rule in $R_1 \cup \cdots \cup R_{124}$ is associated with a sequence of supports and a sequence of confidences. In our experiments, we set the width of the window to 20. We divided the sequence of supports into a set of subsequences of supports and the sequence of confidences into a set of subsequences of confidences by sliding the window across them.

We defined 20 linguistic variables, $S^r_1, \ldots, S^r_{20}$, to represent each subsequence of support. $S^r_{20}$ represents Change in support next period, $S^r_{19}$ represents Change in support this period, and $S^r_k$, $k \in \{1, \ldots, 18\}$, represents Change in support $19 - k$ period(s) ago. The value of $S^r_k$, $k = 1, \ldots, 20$, can take one of 5 linguistic terms whose membership functions are defined as follows:

\[
\mu_{\text{Highly decrease}}(x) = \begin{cases} 1 & \text{if } -1 \le x \le -0.1, \\ -\frac{1}{0.05}(x + 0.05) & \text{if } -0.1 \le x \le -0.05, \\ 0 & \text{otherwise,} \end{cases}
\]

\[
\mu_{\text{Fairly decrease}}(x) = \begin{cases} \frac{1}{0.05}(x + 0.1) & \text{if } -0.1 \le x \le -0.05, \\ -\frac{1}{0.05}\,x & \text{if } -0.05 \le x \le 0, \\ 0 & \text{otherwise,} \end{cases}
\]

[Figure 1: seven panels, (a)–(g), each plotting support, confidence, minimum support, and minimum confidence (vertical axis: Value, 0 to 100) against Time Period (horizontal axis ticks at t1, t26, t51, t76, and t101).]

Fig. 1. The changes in r1, r2, r9, r10, r13, r14, and r15 in the period from t1 to t125: (a) r1, (b) r2, (c) r9, (d) r10, (e) r13, (f) r14, and (g) r15.


Table 3
Association rules in R125

Association rule            Support (%)   Confidence (%)
r1:  {i99} ⇒ {i63}          32.5          96.8
r2:  {i99} ⇒ {i28}          34.7          99.5
r3:  {i39} ⇒ {i69}          27.1          96.0
r4:  {i69} ⇒ {i39}          27.1          95.3
r5:  {i39} ⇒ {i51}          27.1          96.0
r6:  {i51} ⇒ {i39}          27.1          94.6
r7:  {i69} ⇒ {i51}          27.4          96.4
r8:  {i51} ⇒ {i69}          27.4          96.0
r10: {i47} ⇒ {i28}          40.4          90.9
r11: {i39, i69} ⇒ {i51}     26.2          96.6
r12: {i39, i51} ⇒ {i69}     26.2          96.6
r13: {i51, i69} ⇒ {i39}     26.0          98.0
r14: {i63, i99} ⇒ {i28}     29.8          94.6
r15: {i28, i99} ⇒ {i63}     25.0          97.7

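\[
\mu_{\text{More or less the same}}(x) = \begin{cases} \frac{1}{0.05}(x + 0.05) & \text{if } -0.05 \le x \le 0, \\ -\frac{1}{0.05}(x - 0.05) & \text{if } 0 \le x \le 0.05, \\ 0 & \text{otherwise,} \end{cases}
\]

\[
\mu_{\text{Fairly increase}}(x) = \begin{cases} \frac{1}{0.05}\,x & \text{if } 0 \le x \le 0.05, \\ -\frac{1}{0.05}(x - 0.1) & \text{if } 0.05 \le x \le 0.1, \\ 0 & \text{otherwise,} \end{cases}
\]

\[
\mu_{\text{Highly increase}}(x) = \begin{cases} \frac{1}{0.05}(x - 0.05) & \text{if } 0.05 \le x \le 0.1, \\ 1 & \text{if } 0.1 \le x \le 1, \\ 0 & \text{otherwise.} \end{cases}
\]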

Similarly, we defined 20 linguistic variables, $C_1^r, \ldots, C_{20}^r$, to represent each subsequence of confidence. $C_{20}^r$ represents Change in confidence next period, $C_{19}^r$ represents Change in confidence this period, and $C_k^r$, $k \in \{1, \ldots, 18\}$, represents Change in confidence $19 - k$ period(s) ago. The value of $C_k^r$, $k = 1, \ldots, 20$, can take from 5 linguistic terms whose membership functions are defined in the following:

$$\mu_{\text{Highly decrease}}(x) = \begin{cases} 1 & \text{if } -1 \le x \le -0.1, \\ \frac{-1}{0.05}(x + 0.05) & \text{if } -0.1 \le x \le -0.05, \\ 0 & \text{otherwise,} \end{cases}$$

$$\mu_{\text{Fairly decrease}}(x) = \begin{cases} \frac{1}{0.05}(x + 0.1) & \text{if } -0.1 \le x \le -0.05, \\ \frac{-1}{0.05}(x) & \text{if } -0.05 \le x \le 0, \\ 0 & \text{otherwise,} \end{cases}$$

$$\mu_{\text{More or less the same}}(x) = \begin{cases} \frac{1}{0.05}(x + 0.05) & \text{if } -0.05 \le x \le 0, \\ \frac{-1}{0.05}(x - 0.05) & \text{if } 0 \le x \le 0.05, \\ 0 & \text{otherwise,} \end{cases}$$

$$\mu_{\text{Fairly increase}}(x) = \begin{cases} \frac{1}{0.05}(x) & \text{if } 0 \le x \le 0.05, \\ \frac{-1}{0.05}(x - 0.1) & \text{if } 0.05 \le x \le 0.1, \\ 0 & \text{otherwise,} \end{cases}$$

$$\mu_{\text{Highly increase}}(x) = \begin{cases} \frac{1}{0.05}(x - 0.05) & \text{if } 0.05 \le x \le 0.1, \\ 1 & \text{if } 0.1 \le x \le 1, \\ 0 & \text{otherwise.} \end{cases}$$
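As a minimal sketch (not part of the experiments reported here), the five membership functions above can be written in Python as follows. The function names, the `fuzzify` and `best_term` helpers, and the use of closed intervals at the breakpoints are assumptions made for illustration; because adjacent pieces agree at the shared breakpoints, the choice between ≤ and < there does not affect the membership values.

```python
# Illustrative encoding of the five linguistic terms for a change x in [-1, 1].
# Helper names are hypothetical; the breakpoints follow the definitions above.

def highly_decrease(x: float) -> float:
    if -1.0 <= x <= -0.1:
        return 1.0
    if -0.1 < x <= -0.05:
        return -(x + 0.05) / 0.05
    return 0.0

def fairly_decrease(x: float) -> float:
    if -0.1 <= x <= -0.05:
        return (x + 0.1) / 0.05
    if -0.05 < x <= 0.0:
        return -x / 0.05
    return 0.0

def more_or_less_the_same(x: float) -> float:
    if -0.05 <= x <= 0.0:
        return (x + 0.05) / 0.05
    if 0.0 < x <= 0.05:
        return -(x - 0.05) / 0.05
    return 0.0

def fairly_increase(x: float) -> float:
    if 0.0 <= x <= 0.05:
        return x / 0.05
    if 0.05 < x <= 0.1:
        return -(x - 0.1) / 0.05
    return 0.0

def highly_increase(x: float) -> float:
    if 0.05 <= x <= 0.1:
        return (x - 0.05) / 0.05
    if 0.1 < x <= 1.0:
        return 1.0
    return 0.0

TERMS = {
    "Highly decrease": highly_decrease,
    "Fairly decrease": fairly_decrease,
    "More or less the same": more_or_less_the_same,
    "Fairly increase": fairly_increase,
    "Highly increase": highly_increase,
}

def fuzzify(x: float) -> dict:
    """Return the degree of membership of x in each linguistic term."""
    return {name: mu(x) for name, mu in TERMS.items()}

def best_term(x: float) -> str:
    """Return the linguistic term in which x has the highest membership."""
    degrees = fuzzify(x)
    return max(degrees, key=degrees.get)

if __name__ == "__main__":
    print(fuzzify(-0.075))
    print(best_term(0.02))
```

Under these definitions a change can belong to two neighbouring terms at once; for example, a change of -0.075 lies halfway between Highly decrease and Fairly decrease.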

Each subsequence was then converted to a set of ordered triples. After that, we applied FID to these ordered triples to build fuzzy decision trees. In our experiments, each subsequence was given the same weight. To use FID, we set the minimum information gain level at which an attribute will not be used for expansion to 0, and we decided to use (i) set-based inferences; (ii) the majority class in a leaf as the consequent when the leaf contains non-unique classes; and (iii) the fuzzy-mean method to calculate the defuzzified value in the inference.

The fuzzy decision trees were converted to a set of fuzzy meta-rules. The discovered fuzzy meta-rules were then used to predict how the supports and confidences of the association rules would change in t125. The predicted association rules were stored in R̂125 (Table 4). It is important to note that r9 perished in t119 and was therefore not found in R̂125. Our approach is able to predict the changed rules (i.e., r1, r2, r10, and r13), the perished rule (i.e., r9), the added rules (i.e., r14 and r15), and the other unchanged rules (i.e., r3, r4, r5, r6, r7, r8, and r11) in t125. The difference between the actual association rules in R125 and the predicted association rules in R̂125 is shown in Fig. 2.

In the rest of this section, we present some of the fuzzy meta-rules discovered by our approach. Fig. 3 shows the fuzzy decision tree representing a set of fuzzy meta-rules of support for r1. The fuzzy decision tree was converted to a set of fuzzy meta-rules of support by traversing the tree from the root to each leaf. For example, let us consider the left-most tree branch in the fuzzy decision tree. This tree branch was converted to the following fuzzy meta-rule of support:

    Change in support this period = Highly decrease ∧ Change in support 5 periods ago = Highly decrease ⇒ Change in support next period = Fairly decrease
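The root-to-leaf conversion described above can be sketched in Python as follows. This is an illustration only: the nested `(attribute, branches)` tuple encoding and the function names are hypothetical and are not FID's internal data structure. The two tree fragments contain only the branches quoted in this section (the left-most branch of the support tree for r1 and the right-most branch of the confidence tree for r15); all other branches are omitted.

```python
# A node is either a leaf (a linguistic term, given as a string) or a pair
# (attribute_name, {linguistic_term: child_node}). Hypothetical encoding.

def tree_to_rules(node, target, conditions=()):
    """Enumerate every root-to-leaf path as (antecedent, consequent)."""
    if isinstance(node, str):
        # Leaf: the conditions collected on the path imply target = leaf term.
        return [(list(conditions), (target, node))]
    attribute, branches = node
    rules = []
    for term, child in branches.items():
        rules.extend(
            tree_to_rules(child, target, conditions + ((attribute, term),))
        )
    return rules

def format_rule(rule):
    """Render a rule in a plain-text form similar to the meta-rules shown."""
    antecedent, (target, term) = rule
    lhs = " AND ".join(f"{attr} = {value}" for attr, value in antecedent)
    return f"{lhs} => {target} = {term}"

# Fragment of the support tree for r1: only the left-most branch is encoded.
support_fragment = (
    "Change in support this period",
    {
        "Highly decrease": (
            "Change in support 5 periods ago",
            {"Highly decrease": "Fairly decrease"},
        )
    },
)

# Fragment of the confidence tree for r15: only the right-most branch.
confidence_fragment = (
    "Change in confidence this period",
    {"Highly increase": "Highly decrease"},
)

if __name__ == "__main__":
    for rule in tree_to_rules(support_fragment, "Change in support next period"):
        print(format_rule(rule))
    for rule in tree_to_rules(confidence_fragment, "Change in confidence next period"):
        print(format_rule(rule))
```

Each path contributes one conjunct per internal node visited, so a tree with n leaves yields n fuzzy meta-rules.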


Table 4
Association rules in R̂125

Association rule              Support (%)   Confidence (%)
r1:  {i99} ⇒ {i63}            32.3          96.8
r2:  {i99} ⇒ {i28}            36.0          100.0
r3:  {i39} ⇒ {i69}            27.1          96.0
r4:  {i69} ⇒ {i39}            27.1          95.3
r5:  {i39} ⇒ {i51}            27.1          96.0
r6:  {i51} ⇒ {i39}            27.1          94.6
r7:  {i69} ⇒ {i51}            27.4          96.4
r8:  {i51} ⇒ {i69}            27.4          96.0
r10: {i47} ⇒ {i28}            40.5          90.9
r11: {i39, i69} ⇒ {i51}       26.2          96.6
r12: {i39, i51} ⇒ {i69}       26.2          96.6
r13: {i51, i69} ⇒ {i39}       26.2          98.7
r14: {i63, i99} ⇒ {i28}       28.7          95.3
r15: {i28, i99} ⇒ {i63}       25.0          97.2
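To make the comparison between Table 3 (actual) and Table 4 (predicted) concrete, the following Python sketch computes the absolute prediction error per rule. The values are transcribed from the two tables; the choice of mean absolute error as a summary measure and all function names are assumptions for illustration and are not part of the reported evaluation.

```python
# Values transcribed from Table 3 (actual, R125) and Table 4 (predicted).
RULES = ["r1", "r2", "r3", "r4", "r5", "r6", "r7", "r8",
         "r10", "r11", "r12", "r13", "r14", "r15"]

ACTUAL_SUPPORT = [32.5, 34.7, 27.1, 27.1, 27.1, 27.1, 27.4, 27.4,
                  40.4, 26.2, 26.2, 26.0, 29.8, 25.0]
PREDICTED_SUPPORT = [32.3, 36.0, 27.1, 27.1, 27.1, 27.1, 27.4, 27.4,
                     40.5, 26.2, 26.2, 26.2, 28.7, 25.0]

ACTUAL_CONFIDENCE = [96.8, 99.5, 96.0, 95.3, 96.0, 94.6, 96.4, 96.0,
                     90.9, 96.6, 96.6, 98.0, 94.6, 97.7]
PREDICTED_CONFIDENCE = [96.8, 100.0, 96.0, 95.3, 96.0, 94.6, 96.4, 96.0,
                        90.9, 96.6, 96.6, 98.7, 95.3, 97.2]

def abs_errors(actual, predicted):
    """Absolute difference, in percentage points, for each rule."""
    return [abs(a - p) for a, p in zip(actual, predicted)]

def mean_abs_error(actual, predicted):
    """Average absolute difference over all rules."""
    errors = abs_errors(actual, predicted)
    return sum(errors) / len(errors)

def worst_rule(actual, predicted):
    """Rule with the largest absolute error, together with that error."""
    errors = abs_errors(actual, predicted)
    index = max(range(len(errors)), key=errors.__getitem__)
    return RULES[index], errors[index]

if __name__ == "__main__":
    print("Support MAE:", round(mean_abs_error(ACTUAL_SUPPORT, PREDICTED_SUPPORT), 3))
    print("Confidence MAE:", round(mean_abs_error(ACTUAL_CONFIDENCE, PREDICTED_CONFIDENCE), 3))
    print("Largest support error:", worst_rule(ACTUAL_SUPPORT, PREDICTED_SUPPORT))
```

With the tabulated values, the mean absolute error is roughly 0.21 percentage points for support and 0.17 percentage points for confidence, and the largest support error (about 1.3 percentage points) occurs for r2.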

The fuzzy meta-rule of support obtained from the left-most branch of the tree in Fig. 3 states that if the support decreased significantly both in this period and 5 periods ago, it would fairly decrease in the next period.

Fig. 4 shows the fuzzy decision tree representing fuzzy meta-rules of confidence for r15. The fuzzy decision tree was converted to a set of fuzzy meta-rules of confidence by traversing the tree from the root to each leaf. For example, let us consider the right-most tree branch in the fuzzy decision tree. This tree branch was converted to the following fuzzy meta-rule of confidence:

    Change in confidence this period = Highly increase ⇒ Change in confidence next period = Highly decrease

This fuzzy meta-rule states that if the confidence increased significantly in this period, it would decrease significantly in the next period.

[Figure: two charts comparing Actual and Predicted values for association rules r1 to r8 and r10 to r15; (a) y-axis: Support, 0% to 45%; (b) y-axis: Confidence, 86% to 100%; x-axis: Association Rule.]
Fig. 2. Actual and predicted association rules. (a) Support of actual and predicted association rules. (b) Confidence of actual and predicted association rules.

[Figure: fuzzy decision tree with root node $S_{19}^{r_1}$ and internal nodes $S_{14}^{r_1}$ and $S_{1}^{r_1}$; branches and leaves are labelled with the linguistic terms Highly decrease, Fairly decrease, More or less the same, Fairly increase, and Highly increase.]
Fig. 3. Fuzzy decision tree representing fuzzy meta-rules of support for r1.

[Figure: fuzzy decision tree with root node $C_{19}^{r_{15}}$ and internal node $C_{7}^{r_{15}}$; branches and leaves are labelled with the same five linguistic terms.]
Fig. 4. Fuzzy decision tree representing fuzzy meta-rules of confidence for r15.

5. Discussions

In addition to fuzzy decision trees, many fuzzy techniques, including linguistic summaries [27], a context-sensitive fuzzy clustering method [15], an information-theoretic fuzzy approach [24], fuzzy ISA hierarchies [20], fuzzy association rules [6,8,9], and a fuzzy classification approach [4], have been applied to data mining. However, these techniques assume that the data characteristics and the underlying associations hidden in the data are stable over time, and they perform data mining in the whole database.

To deal with the data collected in different time periods, the maintenance of discovered association rules (e.g., FUP [12]), active data mining [2], and the measurement of difference between two datasets [14] have been proposed in the data-mining literature. Incremental updating techniques (e.g., FUP) can be used to update the discovered association rules if there are additions, deletions, or modifications of any tuples in a database after the discovery of a set of association rules. Active data mining is concerned with representing and querying the shape of the history of parameters for the discovered association rules. In [14], a framework has been proposed to measure the difference between two datasets by building two models (one from each dataset) and measuring the amount of work required to transform one model to the other. Although these techniques can be used to track the variations in supports and confidences of association rules, none of them was developed for mining rule sets to discover and predict rule changes.

Although the mining of rule changes over time is an important problem, it has received little attention. To the best of our knowledge, in addition to our previous work [5], this problem has only been studied in [21,23,26]. Ref. [21] is concerned with finding whether a decision tree built in a time period is applicable in other time periods. Given two datasets collected at two different time periods, this approach builds a decision tree based on one of the datasets and then builds another based on the other dataset such that the latter tree uses the same attribute and chooses the same cut point for the attribute as the former at each step of partitioning. The approach, according to [21], can be used to identify three categories of changes in the context of decision tree building: partition change, error rate change, and coverage change. Compared with [21], instead of building a decision tree in the next time instance to ensure that it resembles the first, our approach is to mine for changes in association rules discovered at different time periods.

Following the idea presented in [21], an approach has been proposed in [23] to find whether a set of association rules discovered in a time period is applicable in other time periods. To do so, it employs the chi-square test to determine whether there are any changes in the supports and confidences of the association rules discovered in different time periods. Unlike this approach, our approach is developed for mining rules to represent the changes and to predict any changes in the future.
A framework for analyzing data-mining results, called higher-order mining, has been proposed in [26]. In this framework, a first-order rule is an association rule discovered in a dataset whereas a second-order rule is a sequence of first-order rules discovered in different datasets. Given a second-order rule, the supports or confidences of its first-order rules can be considered as a time series, which can be analyzed by time series analysis (e.g., ARIMA [7]). If the underlying datasets are collected over different time periods, this framework can be used to find the changes in the association rules discovered. Since the supports and/or confidences of the first-order rules of a second-order rule may fall below the user-specified thresholds in some time periods, the time series may contain missing values. However, time series analysis was not developed to deal with missing values. Consequently, it is difficult for one to use this framework for mining changes in association rules. Furthermore, the changes are represented by the statistical model constructed, such that the discovered changes are embedded in the parameters of the model. Unlike this framework, our approach is to mine rules that represent the changes.

6. Conclusions

In this paper, we present the definition of the problem of mining changes in association rules over time. The proposed approach allows different fuzzy data-mining techniques to be used for tackling this problem. In a given time period, an association rule is (i) a changed rule if its support (confidence) in the period is different from its support (confidence) in the previous period; (ii) a perished rule if its support and/or confidence become less than the user-specified thresholds in the period; and (iii) an added rule if its support and confidence become greater than or equal to the user-specified thresholds in the period.

To discover the regularities governing the changes in association rules in a comprehensible fashion, we propose to use linguistic variables and linguistic terms to represent the changes and to use fuzzy decision trees to discover the changes. The fuzzy decision trees can then be converted to fuzzy rules. These fuzzy rules are called fuzzy meta-rules because they are rules about rules. Furthermore, the discovered fuzzy meta-rules can be used to predict any change in the association rules in the future. To evaluate the performance of our approach, we make use of a set of synthetic datasets, each of which is a set of transactions collected in a specific time period. A set of association rules is discovered in each dataset.
Fuzzy decision trees are then constructed in the discovered association rules to mine the changes in these rules. The experimental results show that our approach is very promising.

Acknowledgements

This work was supported in part by The Hong Kong Polytechnic University under Grant A-P209 and Grant G-V958.

References

[1] R. Agrawal, T. Imielinski, A. Swami, Mining association rules between sets of items in large databases, in: Proc. ACM SIGMOD Internat. Conf. Management of Data, Washington DC, 1993, pp. 207–216.
[2] R. Agrawal, G. Psaila, Active data mining, in: Proc. 1st Internat. Conf. on Knowledge Discovery and Data Mining, Montreal, Canada, 1995.
[3] R. Agrawal, R. Srikant, Fast algorithms for mining association rules, in: Proc. 20th Internat. Conf. on Very Large Data Bases, Santiago, Chile, 1994, pp. 487–499.
[4] W.-H. Au, K.C.C. Chan, Classification with degree of membership: a fuzzy approach, in: Proc. 1st IEEE Internat. Conf. on Data Mining, San Jose, CA, 2001, pp. 35–42.
[5] W.-H. Au, K.C.C. Chan, Fuzzy data mining for discovering changes in association rules over time, in: Proc. 11th IEEE Internat. Conf. on Fuzzy Systems, Honolulu, HI, 2002, pp. 890–895.
[6] W.-H. Au, K.C.C. Chan, Mining fuzzy association rules in a bank-account database, IEEE Trans. Fuzzy Systems 11 (2) (2003) 238–248.
[7] G.E.P. Box, G.M. Jenkins, G.C. Reinsel, Time Series Analysis: Forecasting and Control, third ed., Prentice-Hall, Englewood Cliffs, NJ, 1994.
[8] K.C.C. Chan, W.-H. Au, Mining fuzzy association rules, in: Proc. 6th Internat. Conf. on Information and Knowledge Management, Las Vegas, Nevada, 1997, pp. 209–215.
[9] K.C.C. Chan, W.-H. Au, Mining fuzzy association rules in a database containing relational and transactional data, in: A. Kandel, M. Last, H. Bunke (Eds.), Data Mining and Computational Intelligence, Physica-Verlag, New York, NY, 2001, pp. 95–114.
[10] K.C.C. Chan, A.K.C. Wong, APACS: a system for the automatic analysis and classification of conceptual patterns, Comput. Intell. 6 (3) (1990) 119–131.
[11] K.C.C. Chan, A.K.C. Wong, A statistical technique for extracting classificatory knowledge from databases, in: G. Piatetsky-Shapiro, W.J. Frawley (Eds.), Knowledge Discovery in Databases, AAAI/MIT Press, Menlo Park, CA; Cambridge, MA, 1991, pp. 107–123.
[12] D.W. Cheung, J. Han, V.T. Ng, C.Y. Wong, Maintenance of discovered association rules in large databases: an incremental updating technique, in: Proc. 12th Internat. Conf. on Data Engineering, New Orleans, Louisiana, 1996, pp. 106–114.
[13] M. Fajfer, C.Z. Janikow, Bottom-up fuzzy partitioning in fuzzy decision trees, in: Proc. 19th Internat. Conf. North American Fuzzy Information Processing Society, Atlanta, GA, 2000, pp. 326–330.
[14] V. Ganti, J. Gehrke, R. Ramakrishnan, W.-Y. Loh, A framework for measuring changes in data characteristics, in: Proc. 18th ACM SIGMOD-SIGACT-SIGART Symp. on Principles of Database Systems, Philadelphia, PA, 1999, pp. 126–137.
[15] K. Hirota, W. Pedrycz, Fuzzy computing for data mining, Proc. IEEE 87 (9) (1999) 1575–1600.
[16] IBM Quest Data Mining Project, Quest Synthetic Data Generation Code [http://www.almaden.ibm.com/software/quest/Resources/datasets/syndata.html], 1996.
[17] C.Z. Janikow, Fuzzy decision trees: issues and methods, IEEE Trans. Systems Man Cybernetics, Part B: Cybernetics 28 (1) (1998) 1–14.
[18] C.Z. Janikow, M. Fajfer, Fuzzy partitioning with FID3.1, in: Proc. 18th Internat. Conf. of the North American Fuzzy Information Processing Society, New York, NY, 1999, pp. 467–471.
[19] J. Kacprzyk, S. Zadrozny, On linguistic approaches in flexible querying and mining of association rules, in: H.L. Larsen, J. Kacprzyk, S. Zadrozny, T. Andreasen, H. Christiansen (Eds.), Flexible Query Answering Systems: Recent Advances, Proc. 4th Internat. Conf. on Flexible Query Answering Systems, Physica-Verlag, Heidelberg, Germany, 2001, pp. 475–484.
[20] D.H. Lee, M.H. Kim, Database summarization using fuzzy ISA hierarchies, IEEE Trans. Systems Man Cybernetics, Part B: Cybernetics 27 (4) (1997) 671–680.
[21] B. Liu, W. Hsu, H.-S. Han, Y. Xia, Mining changes for real-life applications, in: Proc. 2nd Internat. Conf. on Data Warehousing and Knowledge Discovery, London Greenwich, UK, 2000.
[22] B. Liu, W. Hsu, Y. Ma, Integrating classification and association rule mining, in: Proc. 4th Internat. Conf. on Knowledge Discovery and Data Mining, New York, NY, 1998, pp. 80–86.
[23] B. Liu, W. Hsu, Y. Ma, Discovering the set of fundamental rule changes, in: Proc. 7th ACM SIGKDD Internat. Conf. on Knowledge Discovery and Data Mining, San Francisco, CA, 2001, pp. 335–340.
[24] O. Maimon, A. Kandel, M. Last, Information-theoretic fuzzy approach to data reliability and data mining, Fuzzy Sets and Systems 117 (2001) 183–194.
[25] J.R. Quinlan, Induction of decision trees, Machine Learning 1 (1986) 81–106.
[26] M. Spiliopoulou, J.F. Roddick, Higher order mining: modelling and mining the results of knowledge discovery, in: N.F.F. Ebecken, C.A. Brebbia (Eds.), Data Mining II, Proc. 2nd Internat. Conf. on Data Mining Methods and Databases for Engineering, Finance, and Other Fields, WIT Press, Southampton, UK, 2000, pp. 309–320.
[27] R.R. Yager, On linguistic summaries of data, in: G. Piatetsky-Shapiro, W.J. Frawley (Eds.), Knowledge Discovery in Databases, AAAI/MIT Press, Menlo Park, CA; Cambridge, MA, 1991, pp. 347–363.