Information Sciences 399 (2017) 81–97
A lattice-based approach for mining high utility association rules

Thang Mai (a), Bay Vo (b,c), Loan T.T. Nguyen (d,e,∗)

(a) Institute of Research and Development, Duy Tan University, Da Nang, Vietnam
(b) Faculty of Information Technology, Ho Chi Minh City University of Technology, Ho Chi Minh City, Vietnam
(c) College of Electronics and Information Engineering, Sejong University, Seoul, Republic of Korea
(d) Division of Knowledge and System Engineering for ICT, Ton Duc Thang University, Ho Chi Minh City, Vietnam
(e) Faculty of Information Technology, Ton Duc Thang University, Ho Chi Minh City, Vietnam
Article history: Received 25 August 2016; Revised 25 February 2017; Accepted 28 February 2017; Available online 2 March 2017.
Keywords: Data mining; High utility itemsets; High utility association rules; Lattice
Abstract

Most businesses focus on profits. For example, supermarkets often analyze sales activities to investigate which products bring the most revenue, as well as to find customer trends based on their carts. To this end, a number of studies have examined high utility itemsets (HUIs). Traditional association rule mining algorithms only generate a set of highly frequent rules, but these rules do not reveal which association rules have high utility. Therefore, Sahoo et al. (2015) proposed an approach to generate utility-based non-redundant high utility association rules and a method for reconstructing all high utility association rules. This approach includes three phases: (1) mining high utility closed itemsets (HUCI) and generators; (2) generating high utility generic basic (HGB) association rules; and (3) mining all high utility association rules based on HGB. The third phase is time-consuming when the HGB list is large and each rule in HGB has many items in both antecedent and consequent. To overcome this limitation, in this paper we propose an algorithm for mining high utility association rules using a lattice. Our approach has two phases: (1) building a high utility itemset lattice (HUIL) from a set of high utility itemsets; and (2) mining all high utility association rules (HARs) from the HUIL. The experimental results show that mining HARs using the HUIL is more efficient than mining HARs from HGB (generated from HUCI and generators) in terms of runtime and memory usage.
1. Introduction

Data mining techniques explore the interesting and useful information hidden in databases, where the meaning of interest and usefulness depends on the problem formulation and application domain. Association rule mining plays an important role in many decision systems, and association rules help to determine the relationships among objects in a database. Traditional association rule mining [1], based on the support-confidence framework, provides an objective measure of the rules that are of interest to users. In this approach, all items are given the same importance by considering the presence
∗ Corresponding author.
E-mail addresses: [email protected], [email protected] (T. Mai), [email protected] (B. Vo), [email protected] (L.T.T. Nguyen).
http://dx.doi.org/10.1016/j.ins.2017.02.058
0020-0255/© 2017 Elsevier Inc. All rights reserved.
of items in a transaction, and the utility of items is not considered. Many investigations have examined high utility itemset mining (HUIM) [6–10,17–19,24–26,28], in which users specify the utility of each item in the database, and an itemset whose utility is no lower than a minimum utility threshold (min-util) is called a high utility itemset (HUI). The problem of HUIM is widely recognized as more difficult than frequent itemset mining (FIM) [2]. In FIM, the downward-closure property states that the support of an itemset is anti-monotonic: the supersets of an infrequent itemset are infrequent and the subsets of a frequent itemset are frequent. This property is very powerful for pruning the search space. In HUIM, the utility of an itemset is neither monotonic nor anti-monotonic, so a high utility itemset may have a superset or subset with lower, equal or higher utility. Therefore, pruning techniques developed for FIM cannot be directly applied in HUIM, and many recent algorithms have focused on mining HUIs, especially on candidate elimination [8–10,18,19,28]. However, algorithms for generating association rules from HUIs have received little attention. Sahoo et al. [16] proposed an approach for mining high utility association rules from HUIs. Its main contributions are mining non-redundant rules (high utility generic basic association rules, HGB), and mining rules from HGB by checking, for each rule in the current result list, whether further rules can be formed from its antecedent and a subset of its consequent. However, this approach consumes considerable time and memory, due to the cost of checking: (1) whether the temporary itemsets, combined from the antecedent and a subset of the consequent of a current rule, are high utility itemsets; and (2) whether the newly generated rules satisfy the threshold and whether they are duplicates.
In this paper, we propose an approach to mine high utility association rules using a high utility itemset lattice. The main contributions of this paper are as follows:
• We propose an algorithm for building a HUIL from mined HUIs.
• We propose an algorithm to generate all high utility association rules from the HUIL.
• We carry out experiments to show the efficiency of the proposed method, especially with regard to the reusability of the HUIL.
The rest of this paper is organized as follows. Section 2 outlines current work related to mining HUIs, the generation of high utility association rules from HUIs, and mining association rules from lattices. Section 3 presents basic definitions and the problem statement. Section 4 describes our proposed algorithm for building a HUIL from a list of HUIs. The algorithm for mining all high utility association rules is presented in Section 5. Section 6 then discusses our experimental results and evaluates the performance of the algorithm with regard to both runtime and memory usage. Conclusions and future work are given in Section 7.

2. Related works

2.1. High utility itemset mining

High utility itemset mining (HUIM) has been studied in recent years to handle databases in which items can appear more than once in each transaction and each item has a weight (e.g. profit or utility). Several studies on HUIM have been carried out. Liu et al. [9] proposed a two-phase algorithm with the concepts of Transaction Utility (TU) and Transaction Weighted Utility (TWU) to prune the search space of high utility itemsets. Because the TWUs of itemsets satisfy the downward-closure property, any frequent itemset mining algorithm can be modified to mine HUIs; the authors accordingly modified the Apriori algorithm.
Although this algorithm reduces the search space of utility mining, it still has performance issues for the following reasons: (1) a high number of candidates are generated with the Apriori approach; and (2) the TWU of an itemset is often much higher than its utility. To address the issue of a large number of candidates, Tseng et al. [18] proposed the UP-Growth algorithm, which includes two steps: (1) construct the UP-Tree; and (2) identify high utility itemsets from a set of potential high utility itemsets (PHUIs). In the first database scan, the algorithm accumulates the TWU of each item. In the second scan, items whose TWU values are less than the specified min-util threshold are removed from each transaction. Four strategies are applied in this algorithm: (1) Discarding Global Unpromising items (DGU) to eliminate low utility items and their utilities from transaction utilities; (2) Decreasing Global Node utilities (DGN) to reduce overestimated utilities; (3) Discarding Local Unpromising items (DLU) to remove the utilities of low utility items from path utilities; and (4) Decreasing Local Node utilities (DLN) to discard the item utilities of descendant nodes during local UP-Tree construction. Tseng et al. [19] improved the UP-Growth algorithm and proposed UP-Growth+ to further reduce the overestimated utilities. Liu and Qu [10] proposed HUI-Miner to discover high utility itemsets with a list data structure called a utility list. It first creates an initial utility list for each length-1 itemset of a promising item. HUI-Miner then recursively constructs a utility list for each length-k itemset from a pair of utility lists of length-(k − 1) itemsets.
For mining high utility itemsets, the utility list of an itemset keeps, for each transaction containing the itemset, the transaction id (TID), the utility of the itemset in that transaction, and the sum of the utilities of the remaining items that can extend the itemset to super-itemsets in that transaction.
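This utility-list idea can be sketched as follows. This is a simplified illustration only; the class and function names are ours, not from the HUI-Miner source, and the join assumes both input lists share a common prefix P.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Entry:
    tid: int     # id of a transaction containing the itemset
    iutil: int   # utility of the itemset in that transaction
    rutil: int   # sum of utilities of items that may still extend it there

@dataclass
class UtilityList:
    itemset: frozenset
    entries: List[Entry]

    def utility(self) -> int:
        # u(X): total utility of the itemset over its transactions
        return sum(e.iutil for e in self.entries)

def join(pa: "UtilityList", pb: "UtilityList",
         p: "Optional[UtilityList]" = None) -> "UtilityList":
    """Build the utility list of a length-k itemset from two length-(k-1)
    lists sharing prefix P; the prefix utility is counted in both inputs,
    so it is subtracted once per shared transaction."""
    by_tid = {e.tid: e for e in pb.entries}
    prefix = {e.tid: e.iutil for e in p.entries} if p else {}
    out = [Entry(ea.tid,
                 ea.iutil + by_tid[ea.tid].iutil - prefix.get(ea.tid, 0),
                 by_tid[ea.tid].rutil)
           for ea in pa.entries if ea.tid in by_tid]
    return UtilityList(pa.itemset | pb.itemset, out)
```

With the example database in Tables 1 and 2 and processing order A < C < E < F, the list of {A} is [(t1, 12, 13), (t5, 9, 6)] and the list of {C} is [(t1, 5, 8), (t5, 5, 1)]; joining them yields the list of {A, C}, whose total utility is 31, matching Table 3.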
Sahoo et al. [15] proposed a new generic basic scheme to mine non-redundant association rules: redundancy is defined, and a corresponding basic is extracted. A basic scheme defines a subset of all association rules from which all association rules can be derived. Sahoo et al. [16] proposed the Fast High Utility Itemset Miner (FHIM) and defined the Promising Utility Co-Occurrence Structure (PUCS) to further reduce the number of candidate itemsets. Zida et al. [28] proposed the efficient high utility itemset mining (EFIM) algorithm for fast mining of high utility itemsets. EFIM calculates the remaining utility (called the sub-tree utility) at the parent node rather than at child nodes in the depth-first search, and thus it can prune more nodes in the search space. Moreover, the upper bound in EFIM is tighter, as locally unpromising items are removed from the sub-tree utility. In FHM and HUI-Miner, locally unpromising items cannot be removed from the remaining utility, since these algorithms use a vertical database. Therefore, the sub-tree utility is a tighter upper bound than the remaining utility used in HUI-Miner, and generally also better than the TWU-based bounds used by UP-Growth [18], UP-Growth+ [19] and Two-Phase [9].

2.2. High utility association rule mining

Sahoo et al. [16] proposed an approach for mining high utility association rules which has three phases: (1) mining high utility closed itemsets (HUCI) and their generators; (2) executing the HGB algorithm to generate the high utility generic basic (HGB), a set of high utility basic association rules defined as

HGB = {R: g → h \ g | h ∈ HUCI, g ≠ ∅, g ⊂ h, conf(R) ≥ min-uconf, and there is no g′ ⊂ g with conf(g′ → h \ g′) ≥ min-uconf};

and (3) running the HAR algorithm to mine all HARs based on the result of the HGB algorithm. The overall process to obtain all high utility association rules is named HGB-HAR.
The main idea of the HAR algorithm in phase 3 includes the following two steps:
- Step 1: Let HUCI be the set of all high utility closed itemsets and HGB be the high utility generic basic. If R: X → Y ∈ HGB, then for every subset Z of Y such that X ∪ Z is a high utility itemset, R′: X → Z is also a valid association rule.
- Step 2: Let R: X → Y be a valid association rule in the utility-confidence framework. Then for any subset Z ⊂ Y, if X ∪ Z is a high utility itemset, R′: X → Z is also a valid association rule.
For each rule R: X → Y generated by the HGB algorithm, all subsets Z of the consequent Y are considered, and the utility of the itemset X ∪ Z is re-calculated. These tasks can cause the process to take a long time to finish.

2.3. Lattice-based approaches for mining association rules

The use of a concept lattice [14] is an effective approach for data analysis and knowledge discovery, especially for mining association rules. Choi [3] proposed a variant for building a frequent closed itemset lattice. Vo and Le [21] presented an extension of the Eclat algorithm [27] to quickly build a frequent itemset lattice (FIL). Vo and Le [22] proposed the Lattice-FI algorithm for quickly mining closed frequent itemsets and generators, and MNAR_Lattice for rapidly generating minimal non-redundant association rules from the lattice. Vo et al. [20] proposed a lattice-based approach for mining the most generalized association rules.

3. Definitions and problem statement

Definition 1. Given a finite set of items I = {i1, i2, …, im}, each item ip (1 ≤ p ≤ m) has a unit profit p(ip). An itemset X is a set of k distinct items {i1, i2, …, ik}, where ij ∈ I, 1 ≤ j ≤ k, and k is the size of X. A transaction database D = {T1, T2, …, Tn} is a set of transactions in which each transaction Td (1 ≤ d ≤ n) has a unique identifier, called a Tid.
Each item ip in a transaction Td is associated with a quantity q(ip, Td), the number of units of item ip appearing in transaction Td.

Definition 2. The utility of an item i in a transaction Td is denoted u(i, Td) and defined as p(i) × q(i, Td) if i ∈ Td. For example, the utility of item A in t5 from Tables 1 and 2 is u(A, t5) = 3 × 3 = 9.

Definition 3. The utility of an itemset X in a transaction Td is denoted u(X, Td) and defined as
u(X, Td) = Σ_{xi ∈ X} u(xi, Td).
Definition 4. The utility of an itemset X in database D is the total utility of X over all transactions containing X:
u(X) = Σ_{X ⊆ Td ∧ Td ∈ D} u(X, Td).
Definition 5. The support of an itemset X, denoted supp(X), indicates how frequently X appears in database D and is defined as the proportion of transactions in D that contain X. For example, the support of X = {A, C, E} in database D from Table 1 is 2/9.
Table 1 An example of transaction database D.

Tid   Transaction
t1    A(4), C(1), E(6), F(2)
t2    D(1), E(4), F(5)
t3    B(4), D(1), E(5), F(1)
t4    D(1), E(2), F(6)
t5    A(3), C(1), E(1)
t6    B(1), F(2), H(1)
t7    D(1), E(1), F(4), G(1), H(1)
t8    D(7), E(3)
t9    G(10)

Table 2 An example utility table.

Item   Utility
A      3
B      4
C      5
D      2
E      1
F      1
G      2
H      1
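Definitions 2–5 can be checked directly against the example database above. The following sketch (the helper names are ours) reproduces the values used in the text:

```python
# Transaction database from Table 1: tid -> {item: quantity}
DB = {
    't1': {'A': 4, 'C': 1, 'E': 6, 'F': 2},
    't2': {'D': 1, 'E': 4, 'F': 5},
    't3': {'B': 4, 'D': 1, 'E': 5, 'F': 1},
    't4': {'D': 1, 'E': 2, 'F': 6},
    't5': {'A': 3, 'C': 1, 'E': 1},
    't6': {'B': 1, 'F': 2, 'H': 1},
    't7': {'D': 1, 'E': 1, 'F': 4, 'G': 1, 'H': 1},
    't8': {'D': 7, 'E': 3},
    't9': {'G': 10},
}
# Unit profits from Table 2
PROFIT = {'A': 3, 'B': 4, 'C': 5, 'D': 2, 'E': 1, 'F': 1, 'G': 2, 'H': 1}

def u_item(i, td):
    """Definition 2: u(i, Td) = p(i) x q(i, Td)."""
    return PROFIT[i] * DB[td][i]

def u_in_tx(X, td):
    """Definition 3: utility of itemset X in transaction Td."""
    return sum(u_item(i, td) for i in X)

def u(X):
    """Definition 4: utility of X over all transactions containing X."""
    return sum(u_in_tx(X, td) for td in DB if set(X) <= set(DB[td]))

def supp(X):
    """Definition 5: fraction of transactions containing X."""
    return sum(1 for td in DB if set(X) <= set(DB[td])) / len(DB)

print(u_item('A', 't5'))     # 9, as in the example of Definition 2
print(u({'A', 'C', 'E'}))    # 38
print(supp({'A', 'C', 'E'})) # 2/9 ≈ 0.2222
```

For instance, {A, C, E} occurs in t1 (utility 12 + 5 + 6 = 23) and t5 (9 + 5 + 1 = 15), giving u({A, C, E}) = 38 and supp = 2/9.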
Definition 6. An itemset X is a high utility itemset if its utility is equal to or greater than the user-specified minimum utility threshold; if its utility is lower than the minimum utility threshold, it is a low utility itemset.

Definition 7. An itemset Y is called the closure of an itemset X, denoted γ(X), if Y is the greatest superset of X such that supp(X) = supp(Y). An itemset X is a high utility closed itemset if X = γ(X) and u(X) ≥ min-util.

Definition 8. An itemset X is called a HUI generator if it is a high utility itemset and there is no subset Z of X such that supp(X) = supp(Z).

Definition 9. The local utility value of an item xi in an itemset X, denoted luv(xi, X), is the sum of the utilities of xi in all transactions containing X:
luv(xi, X) = Σ_{X ⊆ Td ∧ Td ∈ D} u(xi, Td).
Definition 10. Let X = {x1, x2, …, xn} be an itemset; the utility unit array of X is defined as U(X) = {u1, u2, …, un}, where ui = luv(xi, X), i ∈ {1, 2, …, n}.

Definition 11. The local utility value of an itemset X in an itemset Y, X ⊆ Y, is denoted luv(X, Y) and defined as the sum of the local utility values of each item xi ∈ X in Y:
luv(X, Y) = Σ_{xi ∈ X} luv(xi, Y).
Definition 12. A high utility association rule R is an implication between two high utility itemsets X, Y ⊆ I of the form X → Y. The utility confidence of rule R, denoted uconf(R), is defined as uconf(R) = luv(X, X ∪ Y) / u(X). R: X → Y is called a high utility association rule if uconf(R) is greater than or equal to a user-defined minimum utility confidence (min-uconf) threshold.

Property 1. Let R1: X → Y and R2: X → Z (Y ⊂ Z) be two association rules in the utility-confidence framework. If R1 is not a valid rule, then R2 is an invalid rule.

Definition 13. Let R1: X1 → Y1 and R2: X2 → Y2 be two valid association rules in the utility-confidence framework. R2 is redundant with regard to R1 if X2 ∪ Y2 ⊆ X1 ∪ Y1, R1.utility ≥ R2.utility, support(R1) = support(R2), X1 ⊆ X2 and Y2 ⊆ Y1, where Ri.utility is the utility of the rule Ri, i = 1, 2, and the support of a rule R: X → Y is supp(X ∪ Y).
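As a concrete check of Definitions 11 and 12, the sketch below (the helper names are ours) recomputes the utility confidence of B → E on the example database of Tables 1 and 2:

```python
# Transaction database (Table 1) and unit profits (Table 2)
DB = {
    't1': {'A': 4, 'C': 1, 'E': 6, 'F': 2},
    't2': {'D': 1, 'E': 4, 'F': 5},
    't3': {'B': 4, 'D': 1, 'E': 5, 'F': 1},
    't4': {'D': 1, 'E': 2, 'F': 6},
    't5': {'A': 3, 'C': 1, 'E': 1},
    't6': {'B': 1, 'F': 2, 'H': 1},
    't7': {'D': 1, 'E': 1, 'F': 4, 'G': 1, 'H': 1},
    't8': {'D': 7, 'E': 3},
    't9': {'G': 10},
}
PROFIT = {'A': 3, 'B': 4, 'C': 5, 'D': 2, 'E': 1, 'F': 1, 'G': 2, 'H': 1}

def u(X):
    """Definition 4: utility of X over all transactions containing X."""
    return sum(PROFIT[i] * DB[td][i]
               for td in DB if set(X) <= set(DB[td]) for i in X)

def luv(X, Y):
    """Definition 11: local utility of X within Y (X a subset of Y)."""
    return sum(PROFIT[i] * DB[td][i]
               for td in DB if set(Y) <= set(DB[td]) for i in X)

def uconf(X, Y):
    """Definition 12: uconf(X -> Y) = luv(X, X u Y) / u(X)."""
    return luv(X, set(X) | set(Y)) / u(X)

print(uconf({'B'}, {'E'}))  # 0.8: B -> E is valid for min-uconf = 80%
print(uconf({'E'}, {'F'}))  # 0.8181..., reported as 81% in Table 4
```

Here luv(B, BE) = 16 (only t3 contains {B, E}) and u(B) = 20 (t3 and t6), so uconf(B → E) = 16/20 = 0.8.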
Fig. 1. High utility itemset lattice from the database in Tables 1 and 2.
Problem statement. Given a transaction database D, a user-specified minimum utility threshold min-util and a user-specified minimum utility confidence threshold min-uconf, the problem of mining all high utility association rules from D is to find all rules with a utility no less than min-util and a utility confidence no less than min-uconf.

4. Building a lattice of high utility itemsets (HUIL)

Existing approaches build lattices of two kinds: concept lattices and frequent itemset lattices. In this paper, we propose a new kind of lattice, called a high utility itemset lattice, built from the list of mined high utility itemsets. From the HUIL we can extract much useful information, such as high utility closed itemsets and generators, and high utility association rules.

4.1. HUIL structure

The HUIL structure contains a root node, child nodes, and connections between pairs of nodes. The root node is the empty itemset, with utility and support equal to 0. The connections between pairs of nodes specify their parent-child relationships. Each node stores an itemset, its utility and its support, and is named after the items in its itemset. For example, in Fig. 1, Root is the root node, B is a node with utility = 20 and support = 2, and node B has two child nodes, {BE, BF}.

4.2. Algorithm for building the HUIL

Firstly, the algorithm calls the BuildLattice procedure to set up the root node of the lattice. Secondly, it traverses all HUIs (the set of HUIs is sorted by the number of items per itemset, ascending). For each HUI, it resets the IsTraversed flags of the root node and the relevant child nodes, then calls the InsertLattice procedure to add the HUI to the lattice. Within the InsertLattice procedure, the variable Flag determines whether node {X} can be added directly to the current node. If the current rootNode
Algorithm 1 Building HUIL.
Input: Set of HUIs sorted by number of items ascending, TableHUI
Output: Lattice of HUIs with root node rootNode
BuildLattice()
1. Set rootNode = lattice node of the empty itemset;
2. For j = 1 to TableHUI.Levels.Count do
3.   For each X in TableHUI.Levels[j] do
4.     Set rootNode.IsTraversed = False;
5.     For each childNode in rootNode.Children do
6.       If childNode.Itemset ⊂ X then
7.         Set childNode.IsTraversed = False;
8.         ResetLattice(childNode);
9.       End
10.    End
11.    InsertLattice(X, rootNode);
12.  End
13. End
ResetLattice(latticeNode)
14. Set latticeNode.IsTraversed = False;
15. For each child in latticeNode.Children do
16.   ResetLattice(child);
17. End
InsertLattice(X, rootNode)
18. If rootNode.IsTraversed then
19.   return;
20. End
21. Set Flag = True, rootNode.IsTraversed = True;
22. For each childNode in rootNode.Children do
23.   If childNode.Itemset ⊂ X then
24.     Set Flag = False;
25.     InsertLattice(X, childNode);
26.   End
27. End
28. If Flag = True then
29.   rootNode.Children.Add(X);
30. End
Table 3 HUIs extracted from the database in Tables 1 and 2 with minimum utility = 20.

Itemset   Utility   Itemset   Utility   Itemset   Utility
A         21        AE        28        AFE       20
B         20        BE        21        BDE       23
D         22        BF        23        BFE       22
G         22        DE        37        DFE       36
E         22        DF        24        ACFE      25
F         20        FE        36        BDFE      24
AC        31        ACE       38
has child nodes where childNode.Itemset ⊂ X (line 23), the InsertLattice procedure is called recursively (line 25) to insert node {X} into the lattice with each such childNode as the root node. If there is no childNode ∈ rootNode.Children with childNode.Itemset ⊂ X, then X is added as a child node of the current rootNode (line 29).

4.3. Illustrations

The HUIs extracted by the FHIM algorithm of Sahoo et al. [16] from the sample database in Tables 1 and 2 are shown in Table 3. Algorithm 1 is then applied, as follows. Let level-i be the set of HUIs whose itemsets have i items (i > 0). The results in Table 3 contain four levels: level-1 = {G, B, A, D, F, E}, level-2 = {BE, BF, AE, AC, DE, DF, EF}, level-3 = {BEF, BDE, AEF, ACE, DEF}, and level-4 = {BDEF, ACEF}. Firstly, an empty node is initialized and added to the lattice as the root node rootNode. Secondly, the level-1 HUIs are processed: G is connected directly to rootNode, after which B, A, D, F and E are also inserted directly under rootNode as child nodes. Thirdly, for level-2, BE is processed: since rootNode has a child node B, the algorithm calls InsertLattice recursively with B as the root node and adds a connection between B and BE; similarly, a connection is added between E and BE. This process then continues with the rest of the HUIs. The lattice of high utility itemsets for the database given in Tables 1 and 2 is shown in Fig. 1. This HUIL contains a root node, which is an empty node, and 20 high utility itemset nodes.
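The construction above can be sketched in Python. This is a simplified reading of Algorithm 1 (the names are ours): a fresh visited set per insertion plays the role of the IsTraversed flags and their reset.

```python
class Node:
    """A HUIL node: an itemset with its utility and child links."""
    def __init__(self, itemset, utility=0):
        self.itemset = frozenset(itemset)
        self.utility = utility
        self.children = []

def insert(x, node, visited):
    # InsertLattice: descend into every child whose itemset is a proper
    # subset of X; if none exists, X becomes a direct child of this node.
    if id(node) in visited:
        return
    visited.add(id(node))
    direct = True
    for child in node.children:
        if child.itemset < x.itemset:   # proper subset
            direct = False
            insert(x, child, visited)
    if direct:
        node.children.append(x)

def build_lattice(huis):
    """huis: iterable of (itemset, utility), processed by ascending size."""
    root = Node(set())
    for items, util in sorted(huis, key=lambda p: len(p[0])):
        insert(Node(items, util), root, set())  # fresh visited set per HUI
    return root

def count_nodes(root):
    # Count distinct itemset nodes (a node may have several parents).
    seen, stack = set(), list(root.children)
    while stack:
        n = stack.pop()
        if id(n) not in seen:
            seen.add(id(n))
            stack.extend(n.children)
    return len(seen)

# The 20 HUIs of Table 3 (min-util = 20)
HUIS = [('A', 21), ('B', 20), ('D', 22), ('G', 22), ('E', 22), ('F', 20),
        ('AC', 31), ('AE', 28), ('BE', 21), ('BF', 23), ('DE', 37),
        ('DF', 24), ('EF', 36), ('ACE', 38), ('AEF', 20), ('BDE', 23),
        ('BEF', 22), ('DEF', 36), ('ACEF', 25), ('BDEF', 24)]
root = build_lattice([(set(s), u) for s, u in HUIS])
print(count_nodes(root))  # 20 itemset nodes, as in Fig. 1
```

Running this reproduces the structure described above: the root has the six level-1 children {G, B, A, D, F, E}, and node B has exactly the children BE and BF.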
5. Mining HARs from HUIL

5.1. Algorithm

Vo and Le [22] proposed the MNARs algorithm for mining association rules from frequent itemset lattices. Vo et al. [20] then modified the lattice and developed a new algorithm to mine all association rules. In this paper, we use the lattice built in Section 4 to generate all high utility association rules. We name our algorithm LARM. In this approach we prune the search space using Property 1.

Algorithm 2 LARM.
Input: HUIL with root node rootNode, min-uconf
Output: Set of high utility association rules RuleSet
FindHuiRulesFromLattice()
1. Set RuleSet = ∅;
2. For each childNode in rootNode.Children do
3.   FindRules(childNode);
4. End
FindRules(latticeNode)
5. If latticeNode.IsFlag = False then
6.   EnumerateHARs(latticeNode);
7.   Set latticeNode.IsFlag = True;
8.   For each childNode in latticeNode.Children do
9.     FindRules(childNode);
10.  End
11. End
EnumerateHARs(latticeNode)
12. Set Queue = ∅, MarkLNode = ∅;
13. For each childNode in latticeNode.Children do
14.   Queue.Enqueue(childNode);
15.   MarkLNode.Add(childNode);
16. End
17. While Queue ≠ ∅ do
18.   Set Li = Queue.Dequeue();
19.   Set uconf = CalculateConfidence(latticeNode, Li);
20.   If uconf ≥ min-uconf then
21.     Set R: latticeNode.Itemset → Li.Itemset \ latticeNode.Itemset;
22.     Set R.conf = uconf;
23.     Set RuleSet = RuleSet ∪ {R};
24.     For each Lc in Li.Children do
25.       Queue.Enqueue(Lc);
26.       MarkLNode.Add(Lc);
27.     End
28.   End
29. End
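The EnumerateHARs loop and the Property 1 cut can be sketched as follows. This is a minimal illustration only: the node names and uconf values are those of the B sub-lattice from Fig. 1 and the Section 5.2 walkthrough, and CalculateConfidence is replaced by a lookup table.

```python
from collections import deque

class Node:
    def __init__(self, name):
        self.name = name       # itemset written as a string, e.g. 'BE'
        self.children = []

def enumerate_hars(ante, uconf, min_uconf):
    """EnumerateHARs: BFS below node ante. A pair (ante, li) stands for the
    rule ante.Itemset -> (li.Itemset minus ante.Itemset). By Property 1,
    children of an invalid consequent node are never enqueued."""
    rules, queue, marked = [], deque(), set()
    for c in ante.children:
        queue.append(c)
        marked.add(c.name)
    while queue:
        li = queue.popleft()
        if uconf[(ante.name, li.name)] >= min_uconf:
            rules.append((ante.name, li.name))
            for c in li.children:
                if c.name not in marked:   # avoid processing a node twice
                    queue.append(c)
                    marked.add(c.name)
    return rules

# Sub-lattice rooted at B (Fig. 1) and the uconf values of Section 5.2
B, BE, BF, BEF, BDE, BDEF = (Node(n) for n in
                             ('B', 'BE', 'BF', 'BEF', 'BDE', 'BDEF'))
B.children = [BE, BF]
BE.children = [BEF, BDE]
BF.children = [BEF]
BDE.children = [BDEF]
UCONF = {('B', 'BE'): 0.8, ('B', 'BF'): 1.0, ('B', 'BEF'): 0.8,
         ('B', 'BDE'): 0.8, ('B', 'BDEF'): 0.8}

print(len(enumerate_hars(B, UCONF, 0.8)))  # 5 rules, as in the walkthrough
```

Raising min-uconf to 90% leaves only B → F: the rule through BE fails, so BE's children are never enqueued via BE, and the rule through BEF is rejected when reached via BF.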
The LARM algorithm generates all high utility association rules from the HUIL. Initially, the algorithm traverses all child nodes of the lattice root. For each child node, it calls the FindRules procedure (line 3). The FindRules(latticeNode) procedure finds rules using latticeNode as the antecedent via the EnumerateHARs procedure (line 6), then calls FindRules recursively (line 9) for all child nodes of latticeNode. In the EnumerateHARs procedure, a queue stores the child nodes of latticeNode, and each child node is also marked so that it is not processed twice. For each node Li taken from the queue, we consider the rule R: latticeNode.Itemset → Li.Itemset \ latticeNode.Itemset; if the utility confidence R.uconf is not less than the specified min-uconf (line 20), the rule is added to the returned results (line 23). To calculate the utility confidence of a rule (line 19), we use the utility-list structure attached to each itemset (from the FHIM algorithm). If R is a valid rule, all child nodes of Li are added to the queue (lines 24–27); otherwise, the remaining itemsets in the queue are processed.

5.2. Illustrations

Considering the database given in Tables 1 and 2, the LARM algorithm generates 22 high utility association rules, as shown in Table 4. Firstly, the algorithm initializes the RuleSet variable. Secondly, it scans all children {G, B, A, D, F, E} of rootNode. Considering node G: there are no nodes associated with G, and thus no rules are generated with antecedent G. Stepping into node B, the algorithm calls the FindRules procedure to find valid high utility association rules with antecedent B, where the consequent is taken from the child nodes of the sub-lattice rooted at B. The algorithm then calls FindRules recursively for each child node of B. The following steps show the details of mining on lattice node B with min-uconf = 80%.
Table 4 Extracted high utility association rules with min-uconf = 80%.

Rule       Uconf (%)   Rule       Uconf (%)   Rule      Uconf (%)   Rule      Uconf (%)
A → C      100         A → E      100         A → CE    100         B → E     80
B → F      100         B → DE     80          B → FE    80          B → DEF   80
D → E      100         E → F      81          F → E     90          F → D     80
F → DE     80          AC → E     100         AE → C    100         BE → D    100
BE → F     100         BE → DF    100         DF → E    100         AFE → C   100
BDE → F    100         BEF → D    100
Table 5 Characteristics of the test datasets.

Dataset      # of transactions   # of items   Size (MB)
Chess        3,196               75           0.63
Foodmart     4,141               1,559        0.17
Mushroom     8,124               119          1.03
Retail       88,162              16,470       6.42
Chainstore   1,112,949           46,086       79.2
Accidents    340,183             468          63.1
- Call the FindRules procedure with lattice node B as the parameter. Since the IsFlag flag of B is false, call EnumerateHARs with lattice node B.
- Initialize Queue = ∅ and MarkLNode = ∅. Add all child nodes of B to both: Queue = {BE, BF}, MarkLNode = {BE, BF}.
- Consider Li = Queue.Dequeue() = BE. RB1: B → BE \ B, RB1.uconf = 80% ≥ min-uconf. RB1 is valid, so add all child nodes of BE: Queue = {BF, BEF, BDE}, MarkLNode = {BE, BF, BEF, BDE}.
- Next, consider Li = BF. RB2: B → BF \ B, RB2.uconf = 100% > min-uconf. RB2 is valid; its child nodes are already marked, so Queue = {BEF, BDE}, MarkLNode = {BE, BF, BEF, BDE}.
- Similarly, Li = BEF: RB3: B → EF, RB3.uconf = 80%. RB3 is valid; BEF has no child nodes.
- Consider Li = BDE: RB4: B → DE, RB4.uconf = 80%. RB4 is valid, so add the child nodes of BDE: Queue = {BDEF}, MarkLNode = {BE, BF, BEF, BDE, BDEF}.
- Consider Li = BDEF: RB5: B → DEF, RB5.uconf = 80%. The BDEF node has no child nodes. The queue is now empty, so processing of lattice node B stops. The valid rules with antecedent B are RuleSet = {RB1, RB2, RB3, RB4, RB5}.
- Call the FindRules procedure recursively with the child nodes {BE, BF} of B. The running steps are similar to those for node B.
After processing child node B of rootNode, the algorithm continues with the remaining child nodes {A, D, F, E} and carries out steps similar to those reported above for node B.

6. Experimental results

We executed the proposed algorithms and the HGB-HAR algorithm to evaluate their performance with regard to runtime and memory usage. The experiments were implemented and tested on a system with the following configuration: Intel Core i7-6500U 2.5 GHz (4 CPUs), 16 GB of RAM, running Windows 10 64-bit. The source code was written in C# using Visual Studio 2015 Community and .NET Framework 4.5. The test datasets [5] have the characteristics shown in Table 5. In this section we show the results for most datasets, as seen below. The set of HUIs used as input to these algorithms was generated by the FHIM algorithm of Sahoo et al. [16]. We executed both the LARM and HGB-HAR algorithms on each dataset, keeping min-util fixed and varying min-uconf, and also keeping min-uconf fixed and varying min-util, in order to compare the performance of LARM and HGB-HAR more thoroughly. Specifically, using steps of 10% for min-uconf from 10% to 90%, we tested with min-util ∈ {27.5%, 28%, 28.5%, 29%, 29.5%} for the Chess dataset, min-util ∈ {0.03%, 0.035%, 0.04%, 0.045%, 0.05%} for Foodmart, min-util ∈ {10%, 11%, 12%, 13%, 14%} for Mushroom, min-util ∈ {0.01%, 0.02%, 0.03%, 0.04%, 0.05%} for Retail, min-util ∈ {0.004%, 0.005%, 0.01%, 0.02%, 0.03%} for Chainstore, and min-util ∈ {11%, 12%, 13%, 14%, 15%} for Accidents. For reference, Table 6 lists the numbers of HARs for each of the above datasets with min-uconf ∈ {60%, 70%, 80%} and the corresponding min-util values.
It can be observed that the number of HARs mined on Foodmart, Chess and Mushroom decreased slightly for the same min-util as min-uconf varied from 60% to 80%. On the Retail, Accidents and Chainstore datasets, the number of HARs decreased sharply when min-util was fixed and min-uconf increased; on Chainstore in particular, the number of HARs decreased by 20%–40% for each 10% step in min-uconf. Moreover, fewer HARs overall were mined from Chainstore.
Table 6 The number of high utility association rules with different datasets.

Dataset      min-util (%)   #HUIs    #HARs (min-uconf = 60%)   #HARs (min-uconf = 70%)   #HARs (min-uconf = 80%)
Foodmart     0.03           54,928   3,099,516                 3,098,322                 3,098,176
Foodmart     0.04           20,766   810,707                   810,488                   810,42
Foodmart     0.05           2,266    105,805                   105,785                   105,740
Foodmart     0.06           1,483    4,891                     4,891                     4,891
Chess        27.5           791      30,726                    30,144                    22,211
Chess        28.0           493      14,287                    14,197                    11,512
Chess        28.5           305      6,677                     6,668                     5,844
Chess        29.0           176      2,893                     2,893                     2,701
Chainstore   0.005          12,347   718                       439                       342
Chainstore   0.01           3,884    113                       77                        65
Chainstore   0.02           1,165    15                        12                        11
Chainstore   0.03           593      7                         6                         6
Mushroom     10             9,594    679,987                   636,490                   594,178
Mushroom     11             5,801    279,706                   268,547                   255,680
Mushroom     12             2,726    78,308                    77,259                    74,688
Mushroom     13             1,152    19,606                    19,606                    19,474
Retail       0.01           22,479   22,120                    13,642                    6,016
Retail       0.02           7,375    6,725                     3,827                     1,472
Retail       0.03           3,765    3,160                     1,755                     673
Retail       0.04           2,272    1,873                     1,033                     397
Accidents    11             2,367    88,388                    51,700                    23,911
Accidents    12             728      17,778                    11,332                    5,568
Accidents    13             189      2,453                     1,855                     1,024
Accidents    14             48       346                       290                       170
Table 7 Sample of pruning rates on some datasets.

Dataset      min-util (%)   min-uconf (%)   # pairs of itemsets in lattice   # pairs of itemsets in lattice using LARM   Pruning rate (%)
Chess        27.5           70              30,726                           30,726                                      –
Chess        27.5           80              30,726                           29,962                                      2.49
Chess        27.5           90              30,726                           22,480                                      26.84
Mushroom     10             70              707,250                          656,297                                     7.20
Mushroom     10             80              707,250                          617,186                                     12.73
Mushroom     10             90              707,250                          599,883                                     15.18
Retail       0.01           70              162,649                          78,949                                      51.46
Retail       0.01           80              162,649                          67,707                                      58.37
Retail       0.01           90              162,649                          62,642                                      61.47
Foodmart     0.04           70              925,085                          835,820                                     9.65
Foodmart     0.04           80              925,085                          835,570                                     9.66
Foodmart     0.04           90              925,085                          835,570                                     9.66
Chainstore   0.005          70              28,978                           20,105                                      30.61
Chainstore   0.005          80              28,978                           20,043                                      30.83
Chainstore   0.005          90              28,978                           20,022                                      30.81
Accidents    11             70              132,793                          103,352                                     22.17
Accidents    11             80              132,793                          73,652                                      44.53
Accidents    11             90              132,793                          38,744                                      70.82
6.1. Runtime

A key strength of the proposed LARM algorithm is its search space pruning, based on Property 1 as used in Algorithm 2 (LARM): let R1: X → Y and R2: X → Z (Y ⊂ Z) be association rules in the utility-confidence framework; if R1 is not a valid rule, then R2 is also not valid. The number of pairs of itemsets to check is thereby reduced. Table 7 shows the pruning rate, in terms of the number of itemset pairs, for some datasets. By applying Property 1, the LARM algorithm significantly reduces execution time. The following figures compare the execution times of the proposed LARM algorithm and the HGB-HAR algorithm [16] on standard datasets. The axes in Figs. 2–13 use a base-10 logarithmic scale. The runtimes shown for LARM are the sums of the time spent constructing the lattice and mining all HARs, denoted LARM; the runtime of the HGB-HAR algorithm is denoted HGB-HAR. Since the LARM runtime includes the time for building the lattice, mining the rules alone is much faster; to show the execution time of mining all HARs without the lattice-building time, we use the notation LARM-HUIL. In real applications, the HUI lattice can be built once and reused for mining rules with different min-uconf values, so HARs can be returned more quickly.
Fig. 2. Runtime for mining HARs on Foodmart with min-uconf = 70%.
Fig. 3. Runtime for mining HARs on Chess with min-uconf = 70%.
On the Foodmart dataset, with a fixed min-uconf = 70% and min-util decreasing from 0.05% to 0.03% (Fig. 2), the runtime of both algorithms increased, but LARM still outperformed HGB-HAR. Specifically, with min-util = 0.03% and min-uconf = 70%, the HGB-HAR algorithm needed 5764 s to find all HARs, whereas the LARM algorithm needed 1284 s, of which 862 s were spent building the HUIL; only 422 s were thus needed to find all HARs. Similarly, we evaluated the performance of the LARM algorithm on the Chess dataset by fixing min-uconf = 70% and decreasing min-util from 29.5% to 27.5% (Fig. 3). The runtimes of both the HGB-HAR and LARM algorithms increased with the number of HUIs, but the runtime of LARM remained better than that of HGB-HAR. Additionally, excluding the execution time of Algorithm 1, the LARM-HUIL line in Fig. 3 shows that HARs can be extracted quickly. Fig. 4 compares mining HARs on the Chainstore dataset when min-util is decreased from 0.03% to 0.004% with min-uconf kept at 70%. LARM again had a better execution time than HGB-HAR, especially as the number of HUIs increased. On the Retail dataset, with min-util = 0.01% and min-uconf = 70%, the LARM algorithm had to consider only 78,949 candidate rules, while HGB-HAR had to check 162,649 candidates to determine whether they were high utility association rules (Table 7). The execution time of the LARM algorithm was thus better than that of the HGB-HAR algorithm. Moreover, if the lattice-building runtime is excluded, LARM took even less time to generate HARs: in Fig. 5, with min-util = 0.04% and min-uconf = 70%, the runtime of LARM was 682 ms, of which 672 ms were spent building the HUIL, meaning that only 10 ms were needed to generate HARs from the lattice.
This result again indicates the advantage of using lattices for mining HARs. On the Mushroom dataset, we measured the runtimes of the LARM and HGB-HAR algorithms when mining HARs with the same min-uconf = 70% and min-util decreasing from 14% to 10% (Fig. 6). The average runtime of HGB-HAR was 2,253,000 ms, while that of LARM was 38,223 ms. The runtime of LARM already includes the time to construct the HUIL, which averaged 22,143 ms; extracting HARs from the HUIL took only an average of 16,079 ms.
Fig. 4. Runtime for mining HARs on Chainstore with min-uconf = 70%.
Fig. 5. Runtime for mining HARs on Retail with min-uconf = 70%.
Fig. 6. Runtime for mining HARs on Mushroom with min-uconf = 70%.
Accidents is one of the largest datasets, at 63.1 MB with 340,183 transactions and 468 items. The runtime gap between the HGB-HAR and LARM algorithms widened as min-util decreased. Moreover, the HGB-HAR algorithm took a very long time to find rules when min-util was below 14%: no results were returned after 10 hours of running, while LARM took only 15 s to find HARs with min-util = 11%. The number of itemset pairs to be considered was reduced by applying Property 1 in the LARM algorithm; in fact, the total number of rules that needed to be checked was reduced by 22.17% (Table 7) when mining HARs with LARM at min-util = 11% and min-uconf = 70%. Fig. 7 shows that mining high utility association rules from a large dataset can also be achieved effectively using the lattice
Fig. 7. Runtime for mining HARs on Accidents with min-uconf = 70%.
Fig. 8. Execution time on Foodmart with min-util = 0.03%.
approach. In our experiments on the Accidents dataset, the HAR set was returned within 435 ms for min-util = 11% and min-uconf = 70%, and within only 92 ms for min-util = 12% and min-uconf = 70%. A major problem when extracting high utility association rules under the utility-confidence framework is the number of extracted rules. We therefore also compared the runtime of the LARM algorithm with that of HGB-HAR by fixing min-util and varying min-uconf from 10% to 90%; the results are shown in Figs. 8–13. With the same HUI set, the execution time of LARM decreased as min-uconf increased, showing that LARM performs well in terms of processing speed. Fig. 8 compares the execution times of the LARM and HGB-HAR algorithms [16] for mining HARs with the same min-util threshold (0.03%) and min-uconf ∈ {10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%}: the average running time of HGB-HAR was 5,606,000 ms, while that of LARM was 1,300,000 ms. The LARM-HUIL line in Fig. 8 also illustrates the reusability of the high utility itemset lattice: generating all HARs from the HUIL took an average of only 420 ms when mining HARs on Foodmart with the above parameters. To discover all HARs in the Chess dataset we needed an average of 266 ms (Fig. 9), of which an average of 173 ms was spent building the HUIL from the HUIs with the min-util and min-uconf values given above; only 93 ms were thus needed to extract HARs from this HUIL. Similarly, the results obtained on the Chainstore dataset (Fig. 10) show that the execution time of LARM was lower than that of HGB-HAR. Considering the pruning rates of the LARM algorithm on the Chainstore dataset reported in Table 7, the average pruning rate for min-uconf ∈ {70%, 80%, 90%} is about 30%.
This percentage indicates that the LARM algorithm successfully eliminated invalid utility association rules, which explains its good performance. When mining high utility association rules from the Retail dataset with min-util = 0.03%, the execution time of LARM with min-uconf = 80% was 1838 ms, while that of HGB-HAR with min-uconf = 80% was 4255 ms, a runtime reduction of nearly 57% (Fig. 11).
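The lattice reuse that the LARM-HUIL lines illustrate can be sketched as follows. This is a toy model under our own assumptions, not the paper's implementation: each HUI carries a single utility value, the lattice links each HUI to its immediate supersets, and a simplified utility ratio stands in for the paper's utility-confidence measure.

```python
def build_huil(utilities):
    """Link each HUI to its immediate supersets (one extra item). Built once."""
    itemsets = list(utilities)
    return {X: [S for S in itemsets if X < S and len(S) == len(X) + 1]
            for X in itemsets}

def mine_hars(huil, utilities, min_uconf):
    """Walk the lattice edges and keep rules whose (simplified) confidence
    measure reaches min_uconf. Cheap, so it can be repeated per threshold."""
    rules = []
    for X, supersets in huil.items():
        for S in supersets:
            if utilities[S] / utilities[X] >= min_uconf:  # stand-in measure
                rules.append((X, S - X))
    return rules

# Toy data: three high utility itemsets with made-up utility values.
utilities = {frozenset('a'): 10.0, frozenset('ab'): 8.0, frozenset('abc'): 6.0}
huil = build_huil(utilities)        # the expensive step, paid once
for t in (0.9, 0.79, 0.7):          # rule mining repeated per min-uconf
    print(t, mine_hars(huil, utilities, t))
```

Only the cheap second step is repeated when min-uconf changes, which is why the LARM-HUIL curves sit far below the full LARM runtimes.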
Fig. 9. Execution time on Chess with min-util = 28%.
Fig. 10. Execution time on Chainstore with min-util = 0.005%.
Fig. 11. Execution time on Retail with min-util = 0.03%.
On the Mushroom dataset, as shown in Fig. 12, the average runtime of the LARM algorithm was 32,000 ms and that of the HGB-HAR algorithm was 3,133,000 ms, making the former about 98 times faster. This confirms that the LARM algorithm has a better runtime than the HGB-HAR algorithm. The HGB-HAR algorithm took a long time to complete the task of mining HARs from the Accidents dataset, while LARM needed only an average of 14.5 ms (Fig. 13). With this dataset, 6 ms were needed to construct the HUIL from the HUIs, which were extracted with the FHIM algorithm at min-util = 14% [13]. Then, using this HUIL, we could mine all
Fig. 12. Execution time on Mushroom with min-util = 11%.
Fig. 13. Execution time on Accidents with min-util = 14%.
HARs easily within an average of 9.5 ms. This result again demonstrates the good performance of LARM as well as the reusability of the HUIL.

6.2. Memory usage

The experiments also showed that the memory usage of LARM was lower than that of HGB-HAR. Both the LARM and HGB-HAR algorithms were run on the same set of HUIs for the given min-util and min-uconf. However, HGB-HAR generates rules based on HUCIs and generators, and in phase 2 of that algorithm many temporary itemsets are generated and must be checked to determine whether they are high utility itemsets. The overall memory usage of the HGB-HAR algorithm was therefore higher than that of the LARM algorithm. Moreover, the memory usage of the LARM algorithm is further reduced because the number of redundant pairs of HUIs that need to be checked is smaller (Table 7). Table 8 compares the memory usage of the LARM and HGB-HAR algorithms when mining HARs with fixed min-util and min-uconf varying from 60% to 90%. Although the differences in memory usage are not large, they still indicate that our proposed approach is more advantageous. We also mined HARs from the above test datasets with fixed min-uconf = 70% and various min-util values to compare the memory usage of the HGB-HAR and LARM algorithms; the results are presented in Table 9. On the Chess dataset, we mined HARs using both the HGB-HAR and LARM algorithms with various min-uconf values and fixed min-util = 28%; the memory usage comparison is shown in Table 8. We also evaluated the memory consumption of both algorithms on this dataset with various min-util values and fixed min-uconf = 70%: the difference in memory usage between the algorithms increased as min-util decreased from 29.5% to 27.5% (Table 9). On the Chess dataset, the LARM algorithm used less memory than the HGB-HAR algorithm. We repeated the memory usage evaluation for both the LARM and HGB-HAR algorithms on the Mushroom dataset. First, we fixed min-util = 11% and varied min-uconf from 60% to 90% in steps of 10%. The difference in memory usage between the two algorithms was small: the average memory usage of LARM on the Mushroom dataset was 249 MB, while that of the HGB-HAR
Table 8. Memory usage (MB) for mining HARs with fixed min-util and various min-uconf.

Dataset     min-util (%)  min-uconf (%)  HGB-HAR  LARM
Chess       28            60             62       57
Chess       28            70             62       56
Chess       28            80             58       53
Chess       28            90             53       50
Mushroom    11            60             265      260
Mushroom    11            70             257      252
Mushroom    11            80             248      242
Mushroom    11            90             249      243
Retail      0.03          60             231      228
Retail      0.03          70             231      228
Retail      0.03          80             231      227
Retail      0.03          90             231      223
Foodmart    0.03          60             2307     2232
Foodmart    0.03          70             2306     2231
Foodmart    0.03          80             2306     2231
Foodmart    0.03          90             2306     2231
Chainstore  0.005         60             1257     1141
Chainstore  0.005         70             1100     1020
Chainstore  0.005         80             980      926
Chainstore  0.005         90             863      830
Accidents   14            60             377      306
Accidents   14            70             373      306
Accidents   14            80             372      306
Accidents   14            90             369      306
Table 9. Memory usage (MB) for mining HARs with various min-util and fixed min-uconf = 70%.

Dataset     min-util (%)  HGB-HAR    LARM
Chess       27.5          83         65
Chess       28            62         56
Chess       28.5          53         51
Chess       29            49         47
Chess       29.5          46         46
Mushroom    10            555        538
Mushroom    11            257        252
Mushroom    12            119        112
Mushroom    13            72         68
Mushroom    14            55         53
Retail      0.01          499        432
Retail      0.02          324        317
Retail      0.03          244        233
Retail      0.04          202        193
Retail      0.05          200        171
Foodmart    0.03          2306       2232
Foodmart    0.035         1278       1255
Foodmart    0.04          623        616
Foodmart    0.045         281        265
Foodmart    0.05          111        102
Chainstore  0.004         1379       1280
Chainstore  0.005         1356       1261
Chainstore  0.01          1100       1020
Chainstore  0.02          921        871
Chainstore  0.03          811        790
Accidents   11            Undefined  376
Accidents   12            Undefined  316
Accidents   13            Undefined  307
Accidents   14            373        306
Accidents   15            366        306
algorithm on this dataset was 255 MB (Table 8). Second, we fixed min-uconf = 70% and varied min-util from 10% to 14%: the LARM algorithm required 205 MB on average, while HGB-HAR required 211 MB on average (Table 9). Although the difference in memory usage was not large, the LARM algorithm used less memory than the HGB-HAR algorithm. Regarding memory usage for mining high utility association rules on the Retail dataset, we ran experiments with fixed min-util = 0.03% and min-uconf ∈ {90%, 80%, 70%, 60%}: the HGB-HAR algorithm needed more memory than the LARM algorithm, an average of 231 MB versus 226.5 MB (Table 8). In addition, with fixed min-uconf = 70% and various min-util ∈ {0.05%, 0.04%, 0.03%, 0.01%}, the LARM algorithm needed on average 24 MB less memory than the HGB-HAR algorithm (Table 9). On the Foodmart dataset, we compared the memory usage of LARM and HGB-HAR by fixing min-util = 0.03% and varying min-uconf from 60% to 90%; the LARM algorithm required less memory than the HGB-HAR algorithm (Table 8). We also tested with fixed min-uconf = 70% and various min-util ∈ {0.03%, 0.035%, 0.04%, 0.045%, 0.05%} (Table 9). On this dataset the differences in memory usage between the LARM and HGB-HAR algorithms were not great, but they still indicate that LARM outperformed HGB-HAR in terms of memory usage. On the Chainstore dataset, we executed the LARM and HGB-HAR algorithms to mine all high utility association rules and observed that the LARM algorithm consumed less memory than the HGB-HAR algorithm. With fixed min-util = 0.005% and min-uconf varying from 60% to 90%, the average memory consumption of LARM was 979 MB and that of HGB-HAR was 1050 MB (Table 8). In the other setting, we fixed min-uconf = 70% and used various min-util ∈ {0.005%, 0.004%, 0.01%, 0.02%, 0.03%}.
The average ratio of the memory consumption of the LARM algorithm to that of the HGB-HAR algorithm was 93.72% (Table 9). We also mined HARs from the Accidents dataset with min-uconf increasing from 60% to 90% in steps of 10% and fixed min-util = 14%. The results in Table 8 indicate that the LARM algorithm outperformed the HGB-HAR algorithm with regard to memory usage. The HGB-HAR algorithm took too long to mine high utility rules on the Accidents dataset when min-util < 14%, and thus in Table 9 the memory usage values for min-util < 14% and min-uconf = 70% are undefined. This result again emphasizes the effectiveness of the lattice-based approach for mining high utility association rules.

7. Conclusion and future work

In this work we used the utility-confidence framework and the concept of lattices to mine high utility association rules, thereby obtaining the semantic relationships among high utility itemsets. To the best of our knowledge, this is the first study on mining high utility association rules using a lattice structure. We proposed an algorithm, called HUIL, to construct the lattice of a HUI set. This lattice is also the input of LARM, and the results of this work show that the resulting algorithm requires less runtime and memory. The experiments carried out in this study also show that the proposed algorithms can be used effectively in various recommendation systems. In the future, we will investigate how to improve the HUI generation phase. We then intend to study other interestingness measures [11,23] and integrate them into the current algorithm in order to obtain more useful information from transaction databases. Additionally, we will study constraint-based methods [4,12,13,25] and apply them to mining HUIs and high utility association rules.

Acknowledgment

This research is funded by Vietnam National Foundation for Science and Technology Development (NAFOSTED) under Grant number 102.05-2015.10.

References

[1] R. Agrawal, T. Imielinski, A. Swami, Mining association rules between sets of items in large databases, in: Proceedings of the 1993 ACM SIGMOD International Conference on Management of Data, 1993, pp. 207–216.
[2] R. Agrawal, R. Srikant, Fast algorithms for mining association rules, in: Proceedings of the International Conference on Very Large Data Bases, VLDB'94, 1994, pp. 487–499.
[3] V. Choi, Faster algorithms for constructing a concept (Galois) lattice, arXiv:cs.DM/0602069, 2006.
[4] H.V. Duong, T.C. Truong, B. Vo, An efficient method for mining frequent itemsets with double constraints, Eng. Appl. Artif. Intell. 27 (2014) 148–154.
[5] P. Fournier-Viger, A. Gomariz, A. Soltani, T. Gueniche, SPMF: a Java open-source pattern mining library, J. Mach. Learn. Res. 15 (1) (2014) 3389–3393.
[6] P. Fournier-Viger, C. Wu, S. Zida, V.S. Tseng, Faster high utility itemset mining using estimated utility co-occurrence pruning, in: Proceedings of the 21st International Symposium on Methodologies for Intelligent Systems, 2014, pp. 83–92.
[7] W. Gan, J.C. Lin, P. Fournier-Viger, H. Chao, More efficient algorithms for mining high-utility itemsets with multiple minimum utility thresholds, in: Proceedings of the International Conference on Database and Expert Systems Applications, DEXA(1), 2016, pp. 71–87.
[8] G. Grahne, J. Zhu, Fast algorithms for frequent itemset mining using FP-trees, IEEE Trans. Knowl. Data Eng. 17 (10) (2005) 1347–1362.
[9] Y. Liu, W. Liao, A. Choudhary, A two-phase algorithm for fast discovery of high utility itemsets, in: Proceedings of the 9th Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining, 2005, pp. 689–695.
[10] M. Liu, J. Qu, Mining high utility itemsets without candidate generation, in: Proceedings of the 21st ACM International Conference on Information and Knowledge Management, 2012, pp. 55–64.
[11] L. Nguyen, B. Vo, T. Hong, CARIM: an efficient algorithm for mining class association rules with interestingness measures, Int. Arab J. Inf. Technol. 12 (6A) (2015) 627–634.
[12] D. Nguyen, L.T.T. Nguyen, B. Vo, W. Pedrycz, Efficient mining of class association rules with the itemset constraint, Knowl.-Based Syst. 103 (2016) 73–88.
[13] D. Nguyen, B. Vo, B. Le, CCAR: an efficient method for mining class association rules with itemset constraints, Eng. Appl. Artif. Intell. 37 (2015) 115–124.
[14] U. Priss, Lattice-based information retrieval, Knowl. Organ. 27 (3) (2000) 132–142.
[15] J. Sahoo, A.K. Das, A. Goswami, An effective association rule mining scheme using a new generic basis, Knowl. Inf. Syst. 43 (1) (2015) 127–156.
[16] J. Sahoo, A.K. Das, A. Goswami, An efficient approach for mining association rules from high utility itemsets, Expert Syst. Appl. 42 (13) (2015) 5754–5778.
[17] V.S. Tseng, C. Wu, P. Fournier-Viger, P.S. Yu, Efficient algorithms for mining top-k high utility itemsets, IEEE Trans. Knowl. Data Eng. 28 (1) (2016) 54–67.
[18] V.S. Tseng, C. Wu, B. Shie, P.S. Yu, UP-Growth: an efficient algorithm for high utility itemset mining, in: Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2010, pp. 253–262.
[19] V.S. Tseng, C. Wu, B. Shie, P.S. Yu, Efficient algorithms for mining high utility itemsets from transactional databases, IEEE Trans. Knowl. Data Eng. 25 (8) (2013) 1772–1786.
[20] B. Vo, T. Hong, B. Le, A lattice-based approach for mining most generalization association rules, Knowl.-Based Syst. 45 (2013) 20–30.
[21] B. Vo, B. Le, Mining traditional association rules using frequent itemsets lattice, in: Proceedings of the 39th International Conference on Computers & Industrial Engineering, 2009, pp. 1401–1406.
[22] B. Vo, B. Le, Mining minimal non-redundant association rules using frequent itemsets lattice, J. Intell. Syst. Technol. Appl. 10 (1) (2011) 92–106.
[23] B. Vo, B. Le, Interestingness for association rules: combination between lattice and hash tables, Expert Syst. Appl. 38 (9) (2011) 11630–11640.
[24] B. Vo, H. Nguyen, B. Le, Mining high utility itemsets from vertical distributed databases, in: Proceedings of the International Conference on Computing and Communication Technologies, 2009, pp. 1–4.
[25] B. Vo, T. Le, W. Pedrycz, G. Nguyen, S.W. Baik, Mining erasable itemsets with subset and superset itemset constraints, Expert Syst. Appl. 69 (2017) 50–61.
[26] U. Yun, H. Ryang, K.H. Ryu, High utility itemset mining with techniques for reducing overestimated utilities and pruning candidates, Expert Syst. Appl. 41 (8) (2014) 3861–3878.
[27] M.J. Zaki, C.J. Hsiao, Efficient algorithms for mining closed itemsets and their lattice structure, IEEE Trans. Knowl. Data Eng. 17 (4) (2005) 462–478.
[28] S. Zida, P. Fournier-Viger, J.W. Lin, C. Wu, V.S. Tseng, EFIM: a fast and memory efficient algorithm for high-utility itemset mining, Knowl. Inf. Syst. (2016) 1–31, doi:10.1007/s10115-016-0986-0.