Discrete Applied Mathematics
Parallel algorithms for enumerating closed patterns from multi-relational data

Hirohisa Seki*, Masahiro Nagao¹

Department of Computer Science, Nagoya Institute of Technology, Showa-ku, Nagoya, 466-8555, Japan
Article history: Received 14 February 2016; Received in revised form 15 March 2018; Accepted 28 March 2018.

Keywords: Multi-relational data mining; Formal concept analysis; Closed patterns; Parallel algorithm; Load-balancing
Abstract. This paper presents parallel algorithms for enumerating closed patterns from multi-relational data. In multi-relational data mining (MRDM), patterns are represented as logical formulae and involve multiple tables (relations) of a relational database. Since the expressive framework of MRDM makes the task of pattern mining costly compared with conventional itemset mining, we propose parallel algorithms for computing closed patterns on multi-core processors. In particular, we present new load-balancing strategies which try to fully exploit the task-parallelism intrinsic in the search process of the problem, and give some experimental results which show the effectiveness of the proposed methods. We then apply our proposed methods to compute closed patterns, a.k.a. concept intents, for binary object-attribute relational data, and show by experiments that the performance of our method is comparable to the existing method.
* Corresponding author. E-mail address: [email protected] (H. Seki).
¹ Current affiliation: Fujitsu Limited, Japan.
https://doi.org/10.1016/j.dam.2018.03.080
0166-218X/© 2018 Elsevier B.V. All rights reserved.

1. Introduction

We propose parallel algorithms for enumerating closed patterns, a.k.a. concept intents, from multi-relational data. Multi-relational data mining (MRDM) handles data and patterns (or queries) represented in the form of logical formulae, and it has been studied extensively for more than a decade (e.g., [7,8] and references therein) in the fields of data mining and inductive logic programming (ILP). The expressive formalism of MRDM allows us to use complex and structured data in a uniform way, including trees and graphs in particular, and multi-relational patterns in general. Since the computation in MRDM is costly compared with conventional itemset mining, scalability and efficiency have been primary concerns in MRDM [2].

One of the problems in frequent pattern mining is handling the vast number of patterns generated in the mining process. To overcome this problem, concise representations of frequent patterns, such as closed patterns and minimal generators (a.k.a. free sets), have been studied ([1,3,21,28]; see [4] for a survey). In MRDM, condensed representations such as closed or free patterns have also been studied in c-armr [5] and RelLCM2 [10], to mention a few.

Since the notion of closed patterns is closely related to that of formal concepts, it has been studied extensively in the field of Formal Concept Analysis (FCA). FCA has been developed as a field of applied mathematics, which provides us with a clear mathematization of the notions of concept and conceptual hierarchy [9]. Many methods have been studied for efficiently computing closed patterns in frequent itemset mining (FIM) as well as in FCA (see, e.g., the excellent surveys in [16,27]). Moreover, several approaches have been proposed to parallelize algorithms for computing closed patterns, fully utilizing multi-core processors to improve the efficiency and scalability required in pattern mining. Among others, Krajca et al. [13] proposed a parallel version of the algorithm Close-by-One (CbO) [15] in FCA. Negrevergne et al. [18] proposed a parallel closed itemset mining algorithm based on LCM (Linear time Closed itemset Miner) [26], which is one of the most successful algorithms in FIM. The recent work by Negrevergne et al. [19] proposes ParaMiner, a generic parallel algorithm for mining closed patterns on multi-core processors. It is, however, not immediately clear that their approach is readily applicable to closed pattern mining in multi-relational data, and no load-balancing strategy is discussed.

The main aim of this work is to propose a novel parallelization model for mining closed patterns from multi-relational databases (MRDBs) on multi-core processors. We mainly address efficient load-balancing strategies for computing closed patterns in parallel and their effectiveness for handling relational patterns in multi-relational data. We first show by experiments that the conventional approaches proposed by Krajca et al. and Negrevergne et al. do not work well, because of imbalance in the workloads among the threads on multi-core processors. The search process for computing closed patterns based on the algorithm CbO or LCM constitutes a recursive-call tree. In the conventional approaches, hereafter called the subtree-wise parallelization, we first generate sequentially a set of subtrees in a breadth-first manner from the root of the tree down to a user-specified depth, called a depth limit. Then, closed patterns are computed by searching each subtree in parallel. However, our experiments show that this subtree-wise parallelization approach leads to poor performance on our test MRDBs. We then propose a new approach to load-balancing that does not use a depth limit. In this approach, called the node-wise parallelization, the search process is performed for each node in the search tree in parallel, thereby realizing a finer-grained parallelization.
We also propose another load-balancing approach, called the parallel-for based parallelization, which achieves an even finer-grained parallelization, trying to make full use of the task-parallelism intrinsic in the search process of the problem; we give some experimental results which show the effectiveness of the proposed method. Finally, we apply our proposed methods to compute closed patterns for binary object-attribute relational data, and show by some experiments that the performance of our node-wise parallelization method is comparable to the existing method by Krajca et al.

The rest of this paper is organized as follows. We first summarize some basic notations and definitions of closed pattern mining in MRDM, followed by a closed pattern mining algorithm for MRDBs, in Section 2. We then explain our approach to parallelizing closed pattern mining from MRDBs in Section 3, and show the effectiveness of our methods by some preliminary experimental results in Section 4. In Section 5, we show some experimental results of applying our proposed methods to compute closed patterns for binary object-attribute relational data. Finally, we give a summary of this work in Section 6.

A preliminary version of this paper appeared in Nagao and Seki [17]. That version provides experimental results using only two multi-relational datasets under a limited condition, which are included in Sections 4.2 and 4.3. In the current paper we provide further experimental results using more datasets under different conditions. Also new in the current paper is that we further apply the proposed parallelization methods to binary object-attribute relational data for computing concepts in Section 5.

2. Closed pattern mining in MRDBs

2.1. Multi-relational data mining

We recall some basic notions of MRDM, which can be found in, e.g., [7,8].
We assume some familiarity with the notions of (inductive) logic programming (e.g., [20]), although we introduce some notions and terminology in the following. An atom is an expression of the form p(t1, …, tn), where p is a predicate symbol of arity n, denoted by p/n, and each ti is a term, i.e., a constant or a variable. A substitution θ = {X1/t1, …, Xn/tn} is an assignment of terms to variables. The result of applying a substitution θ to an expression E (i.e., an atom or a conjunction of atoms here) is the expression Eθ, in which all occurrences of the variables Xi have been simultaneously replaced by the corresponding terms ti in θ. Eθ is called an instance of E. The set of variables occurring in E is denoted by Var(E).

A pattern is expressed as a conjunction l1 ∧ ⋯ ∧ ln of atoms, denoted simply by l1, …, ln. For a pattern C, let answerset(C, DB) be the set of substitutions θ such that Cθ is logically entailed by a database DB, denoted by DB |= Cθ.

Example 2.1. Consider the multi-relational database DB in Fig. 1, which consists of five relations, Customer, Parent, Buys, Male and Female. For each relation, we introduce a corresponding predicate symbol, i.e., customer, parent, buys, male and female, respectively. Consider now a pattern P of the form: customer(X), parent(X, Y), buys(X, pizza), meaning that a customer X has a child Y and X buys pizza. For a substitution θ, Pθ is logically entailed by DB, denoted by DB |= Pθ, if there exists a tuple (a1, a2) such that a1 ∈ Customer, (a1, a2) ∈ Parent, and (a1, pizza) ∈ Buys. The set of such substitutions θ is: answerset(P, DB) = {{X/allen, Y/bill}, {X/allen, Y/jim}, {X/carol, Y/bill}}.
□
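To make the evaluation of answerset concrete, the following sketch interprets a pattern over a small Datalog-style database by backtracking over the relations, and also checks the linked-atom bias condition of Definition 2.1 below. Table contents beyond those implied by Example 2.1, and all helper names, are hypothetical.

```python
# A hypothetical sketch: patterns are lists of (predicate, args) pairs,
# variables are capitalized strings, constants are lowercase strings.
DB = {
    "customer": {("allen",), ("carol",), ("diana",)},
    "parent":   {("allen", "bill"), ("allen", "jim"), ("carol", "bill")},
    "buys":     {("allen", "pizza"), ("carol", "pizza"), ("diana", "cake")},
}

def is_var(t):
    return t[0].isupper()

def answerset(pattern, db):
    """All substitutions theta with DB |= (pattern)theta, as in Example 2.1."""
    def extend(theta, atoms):
        if not atoms:
            yield dict(theta)
            return
        (pred, args), rest = atoms[0], atoms[1:]
        for tup in db[pred]:
            binding = dict(theta)   # fresh copy; discarded if the match fails
            if all((binding.setdefault(a, v) == v) if is_var(a) else a == v
                   for a, v in zip(args, tup)):
                yield from extend(binding, rest)
    return list(extend({}, pattern))

def satisfies_bias(pattern, key_pred):
    """Definition 2.1: every atom must be linked to the key atom."""
    key = next(a for a in pattern if a[0] == key_pred)
    linked_vars = {t for t in key[1] if is_var(t)}
    rest = [a for a in pattern if a is not key]
    changed = True
    while changed:                  # fixed point over shared variables
        changed = False
        for a in rest[:]:
            a_vars = {t for t in a[1] if is_var(t)}
            if a_vars & linked_vars:
                linked_vars |= a_vars
                rest.remove(a)
                changed = True
    return not rest                 # no atom left unlinked

P = [("customer", ("X",)), ("parent", ("X", "Y")), ("buys", ("X", "pizza"))]
print(answerset(P, DB))   # the three substitutions of Example 2.1
print(satisfies_bias(P, "customer"))                                   # True
print(satisfies_bias(P[:2] + [("buys", ("Z", "pizza"))], "customer"))  # False
```

The second bias check fails because Z shares no variable with any atom linked to customer(X), exactly the situation ruled out in Section 2.1.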
Fig. 1. An example of a Datalog database DB with predicate symbol customer as a key.
In MRDM, one of the predicate symbols is often specified as a key (e.g., [5,6]), which determines the entities of interest and what is to be counted. The key is thus to be present in all patterns considered. In Example 2.1, the key is the predicate symbol customer. We assume henceforth that the arity of a key is 1 to simplify the explanation, and we call an atom whose predicate symbol is a key a key atom.

Given a database DB and a conjunction C containing a key atom key(X), the support (or frequency) of C, denoted by supp(C, DB, key), is defined to be the number of different instances of the key atom that answer C, divided by the total number of instances of the key atom. C is said to be frequent if supp(C, DB, key) is no less than some user-defined threshold min_sup.

A pattern containing a key will not always be meaningful; e.g., let C = customer(X), parent(X, Y), buys(Z, pizza) be a conjunction in Example 2.1. Variable Z in C is not linked to variable X in the key atom customer(X); an object represented by Z has nothing to do with the key object X, which makes C inappropriate as an intended pattern to be mined. To specify the so-called language bias, we use the following condition, which is due to the notion of linked literals [11].

Definition 2.1 (Linked Atom [11]). Let C be a pattern which contains a key atom key(X). An atom l in C is said to be linked to key(X) if either X ∈ Var(l) or there exists an atom l1 in C such that l1 is linked to key(X) and Var(l1) ∩ Var(l) ≠ ∅. A pattern C is said to satisfy the bias condition if key(X) ∈ C and, for each l ∈ C, l is linked to key(X). The set of patterns satisfying the bias condition is denoted by L. □

2.2. Closed patterns in MRDM

As in [25], we consider an equivalence relation ∼DB on the set of patterns: two patterns C1 and C2 are said to be equivalent with respect to database DB if and only if answerset(C1, DB) = answerset(C2, DB).
Given two patterns Cg and Cs, Cs is more specific than Cg if Cs θ-subsumes Cg, i.e., all atoms in Cgθ occur in Cs.

Definition 2.2 (Closed Pattern). Let DB be a database and ∼DB the equivalence relation on the set L of patterns. A pattern C is said to be closed (w.r.t. DB and L) iff C is the most specific pattern in the equivalence class to which it belongs: {C1 ∈ L | C ∼DB C1}. For a pattern C, its closure, denoted by Clo(C), is the closed pattern which is the most specific in the equivalence class of C: {C1 ∈ L | C ∼DB C1}. □

Stumme [25] showed that the set of frequent closed patterns forms a semi-lattice. Taking the above bias condition into consideration, we can show [24] that the set of frequent closed patterns in our framework also forms a semi-lattice.

Example 2.2. Continued from Example 2.1. Fig. 2 shows the semi-lattice constructed from the set C of closed patterns associated with DB in Example 2.1 with support count 1, where each pattern C ∈ C has customer(X) as a key atom, denoted
Fig. 2. The semi-lattice of the set of closed patterns associated with DB in Example 2.1. In the figure, a substitution θ of the form θ = {X/t1, Y/t2} is denoted simply by (t1, t2). Moreover, the name of each person in the tables in DB is abbreviated to its first character.
by key(X) for short, C is supposed to contain at most two variables (i.e., X, Y), and the 2nd argument of predicate buys is a constant. □

2.3. Sequential algorithm for mining closed patterns from MRDBs

Given an MRDB, we employ a closed pattern mining algorithm called ffLCM (Algorithm 1) [22], a sequential algorithm based on the notion of closure extension (originally proposed in CbO [15,21,26]) and applied to MRDM, taking our bias condition into consideration. We assume that each atom in the set A of atoms is totally ordered with an index i ≥ 0, where i is a natural number, and that the index of the key atom key(X) is 0. Each atom in a conjunction C = l1, …, ln is supposed to be ordered in ascending order according to the total order on A. The ith prefix of C, denoted by C[i], is the prefix of C that consists only of atoms whose indices are no greater than i. The algorithm is invoked initially with (Clo(key(X)), 0). In line 6, the canonicity test [15] is employed to avoid duplicate generation of the same patterns; namely, it checks whether C[i − 1] = C′[i − 1], that is, whether the (i − 1)-prefix of C is preserved. The algorithm recursively calls itself in line 7; the search process of the algorithm thus constitutes the recursive-call tree shown in Fig. 3. We call such a tree a ppc-extension (prefix-preserving closure extension) tree [26] hereafter. In Algorithm 1, pi (i ≥ 0) refers to the ith atom in A.

Algorithm 1: Closed Pattern Mining: ffLCM(C, j)
input: closed conjunction C, integer j, the set A of atoms, minimum support min_sup, MRDB DB, key key
1: output C;
2: for i ← j + 1 to |A| do
3:     if pi ∈ C then continue;                // pi: ith atom in A
4:     if C ∧ pi does not satisfy the bias condition, or is infrequent then continue;
5:     C′ ← Clo(C ∧ pi);                       // C′: closure of C ∧ pi
6:     if C′ fails the canonicity test then continue;
7:     call ffLCM(C′, i);
8: end
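The recursion of Algorithm 1 can be sketched in the binary object-attribute setting of Section 5, where a pattern is an attribute set and Clo reduces to intersecting the rows whose objects contain the pattern. The data and all names here are hypothetical; since there is no key attribute in this setting, the initial call uses j = −1 instead of 0.

```python
# A hypothetical sketch of Algorithm 1's ppc-extension recursion, with
# attribute sets in place of relational patterns so that Clo() is a set
# intersection over rows.
ROWS = {"o1": {0, 1, 2}, "o2": {0, 2}, "o3": {1, 2}}   # objects -> attributes
N_ATTRS = 3

def extent(c):
    return {o for o, attrs in ROWS.items() if c <= attrs}

def clo(c):
    objs = extent(c)
    return frozenset(set.intersection(*(ROWS[o] for o in objs))
                     if objs else range(N_ATTRS))

def fflcm(c, j, out, min_supp_count=1):
    out.append(c)                                    # line 1: output C
    for i in range(j + 1, N_ATTRS):                  # line 2
        if i in c:                                   # line 3
            continue
        if len(extent(c | {i})) < min_supp_count:    # line 4: infrequent
            continue
        c2 = clo(c | {i})                            # line 5: closure
        if {x for x in c2 if x < i} != {x for x in c if x < i}:
            continue                                 # line 6: canonicity test
        fflcm(c2, i, out)                            # line 7: recurse
    return out

# initial call with the closure of the empty pattern; no key, so j = -1
closed = fflcm(clo(frozenset()), -1, [])             # 4 closed sets here
```

The canonicity test in line 6 is exactly the prefix-preservation check C[i − 1] = C′[i − 1], restated for attribute indices.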
The correctness of the algorithm ffLCM and its computational complexity are shown in [22]. We note the difference in computational complexity between closed pattern mining in multi-relational data and that in a conventional transaction database. Given a transaction database T, LCM [26] enumerates all frequent closed patterns in O(∥T(P)∥ × |I|) time for each pattern P, where ∥T(P)∥ is the total size of the occurrence set T(P), defined by ∥T(P)∥ = Σ_{t ∈ T(P)} |t|, and I is the set of items. On the other hand, Algorithm 1 takes O(|answerset(C, DB)| × |DB|² × |A|²) time for enumerating all frequent closed patterns from each pattern C, where |DB| is the size of DB, i.e., the number of tuples in DB.
Fig. 3. A tree structure of computing closed patterns.
One way to see this difference is that, to compute answersets in MRDM, we have to perform natural-join operations multiple times to compute a counterpart of T. The time complexity required for computing closures in MRDM is therefore a price intrinsic to the representation of multi-relational data.

3. Parallel algorithms for closed pattern mining in MRDBs

In this section, we present some parallel algorithms to compute closed patterns in an MRDB. In particular, we exploit the task-parallelism existing in Algorithm 1.

3.1. The subtree-wise approach

The conventional approach to closed itemset mining using task-parallelism is to compute in parallel each of the subtrees of the ppc-extension tree of a given problem [13,18]. Since the computation of each subtree can be done independently, the given problem is broken down into subproblems (i.e., the computation of subtrees), and the solutions to the subproblems are combined to obtain the final result. The conventional approach is therefore based on a divide-and-conquer algorithm, which is suitable for a parallel implementation on multi-core processors.

Algorithm 2: Parallel Closed Pattern Mining: st_para-ffLCM
input: closed conjunction C, integer j, minimum support min_sup, MRDB DB, key key, depth limit L, the number of threads P
1: compute the ppc-extension tree T rooted at Clo(key(X)) in a breadth-first manner up to depth L;
2: for all leaves (C, j) of T do
3:     store (C, j) to queue Q;
4: end
5: invoke P threads;
6: for each thread r (r = 0, …, P − 1) do in parallel
8:     while ((C, j) = Q.poll()) ≠ null do
9:         call ffLCM(C, j);
10:    end
11: endfpar

Algorithm 2 shows an application of the aforementioned subtree-wise parallelization to the sequential Algorithm 1, while Fig. 4 illustrates its parallelization schema. We assume that we have P (P ≥ 1) threads which can be executed in parallel and that there exists a single queue Q, in which tasks generated during computation are stored. Each thread is able to access the queue Q, and it will either poll the queue for a task or offer a task to the queue.

The algorithm consists of two phases. In the first phase (lines 1–4), we generate sequentially a set of subtrees in a breadth-first manner from the root of the initial ppc-extension tree T down to a user-specified depth limit L. We then store all the roots (C, j) of those subtrees into the queue Q in lines 2–4, where C is a closed conjunction and j is an integer. In the second
Fig. 4. A subtree-wise parallelization schema.
Fig. 5. The average execution time and its standard deviation of the parallel execution phase in the subtree-wise parallelization: the mutagenesis dataset. P: #(threads). The average execution time is per thread (processor).
phase, called the parallel execution phase, we perform the sequential computation (line 9) of each subtree in Q in parallel, using P threads (lines 6–11). More specifically, each thread polls the queue Q and retrieves a task, say (C, j), unless Q is empty. The thread then computes sequentially all the closed conjunctions descending from (C, j) by calling the sequential algorithm ffLCM(C, j). Each thread repeats this process until the queue Q is empty. If Q is empty, i.e., Q.poll() = null, then the thread terminates its computation.

In our preliminary implementation, however, this subtree-wise parallelization approach showed poor performance when applied to our test MRDM datasets (see Section 4.1); some statistics (i.e., the average execution time of the parallel execution phase and its standard deviation) are shown in Fig. 5 for the mutagenesis dataset (the details of our experiments are explained in Section 4.1). In particular, the figure shows that the standard deviation σ of the average execution time m becomes large, and the ratio σ/m is about 68.0% when the number P of threads is 8. We can thus see that one of the reasons for the poor performance is the imbalance in the workloads assigned to the threads.

3.2. Node-wise parallelization approach

The idea behind our approach is to reduce the imbalance in the workloads assigned to the threads. In the subtree-wise parallelization approach, when the single queue Q becomes empty, a thread polling Q terminates its computation, regardless of the loading states of the other threads. It therefore cannot help the other threads, even if they still have load to process. In fact, we can see this situation in Fig. 5, where the standard deviation of the execution time among the threads becomes large when P ≥ 4. We thus need finer-grained parallelism, so that a larger number of finer-grained tasks offers better task-parallelism.
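The subtree-wise schema of Algorithm 2, and the early-termination behavior just discussed, can be sketched over an abstract search tree; the stub children() stands in for ppc-extension generation, and the deliberately unbalanced tree shape and all names are hypothetical.

```python
import queue
import threading

def children(node):
    # Stub for ppc-extension generation; shaped so subtrees differ in size.
    depth, label = node
    if depth + label >= 4:
        return []
    return [(depth + 1, label), (depth + 1, label + 1)]

def mine_subtree(node, out):          # sequential ffLCM stand-in (line 9)
    out.append(node)
    for c in children(node):
        mine_subtree(c, out)

def subtree_wise(root, depth_limit, n_threads):
    results, lock = [], threading.Lock()
    # Phase 1 (lines 1-4): breadth-first expansion down to the depth limit.
    frontier = [root]
    for _ in range(depth_limit):
        results.extend(frontier)      # patterns above the frontier
        frontier = [c for n in frontier for c in children(n)]
    q = queue.Queue()
    for n in frontier:
        q.put(n)
    # Phase 2 (lines 6-11): each thread polls whole subtrees until Q is empty.
    def worker():
        while True:
            try:
                n = q.get_nowait()
            except queue.Empty:
                return                # Q empty: this thread terminates
            local = []
            mine_subtree(n, local)
            with lock:
                results.extend(local)
    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Because a worker returns as soon as Q is empty, a thread that drew small subtrees finishes early and cannot assist one still mining a large subtree, which is the imbalance visible in Fig. 5.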
To do that, we adopt the node-wise parallelization approach shown in Algorithm 3, illustrated schematically in Fig. 6. As in the subtree-wise parallelization algorithm, we assume that we have P (P ≥ 1) threads which can be executed in parallel and that there exists a single queue Q for storing tasks generated during computation. The difference between this algorithm and Algorithm 2 is that individual nodes of the ppc-extension tree are now stored in the queue Q. Namely, we first store the initial root node into Q (line 1). Then, the computation of each node stored in Q is done in parallel using P threads (lines 3–14). Newly generated nodes are stored into Q (line 11). Unlike the subtree-wise parallelization approach, the algorithm consists of a single phase, which is executed in parallel. As a byproduct of this approach, the user-specified depth-limit parameter is no longer necessary.
Algorithm 3: Parallel Closed Pattern Mining: node_para-ffLCM
input: closed conjunction C, minimum support min_sup, the set A of atoms, MRDB DB, key key, the number of threads P
1: store (Clo(key(X)), 0) to queue Q;
2: invoke P threads;
3: for each thread r (r = 0, …, P − 1) do in parallel
5:     while ((C, j) = Q.poll()) ≠ null do
6:         for i ← j + 1 to |A| do
7:             if pi ∈ C then continue;
8:             if C ∧ pi does not satisfy the bias condition or is infrequent then continue;
9:             C′ ← Clo(C ∧ pi);
10:            if C′ fails the canonicity test then continue;
11:            store (C′, i) to queue Q;
12:        end
13:    end
14: endfpar
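Algorithm 3 can be sketched similarly: every discovered node goes back into the shared queue, so any idle thread can pick it up. The pseudocode leaves termination handling open; this sketch (tree stub and all names hypothetical) uses Python's task bookkeeping (Queue.task_done/join) plus a stop flag.

```python
import queue
import threading

def children(node):
    # Stub for ppc-extension generation (hypothetical unbalanced tree).
    depth, label = node
    if depth + label >= 4:
        return []
    return [(depth + 1, label), (depth + 1, label + 1)]

def node_wise(root, n_threads):
    q = queue.Queue()
    q.put(root)
    results, lock = [], threading.Lock()
    stop = threading.Event()

    def worker():
        while not stop.is_set():
            try:
                node = q.get(timeout=0.01)
            except queue.Empty:
                continue              # queue may refill; do not exit yet
            with lock:
                results.append(node)  # output the closed pattern at this node
            for c in children(node):  # line 11: new nodes become new tasks
                q.put(c)
            q.task_done()

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    q.join()                          # every queued node fully processed
    stop.set()
    for t in threads:
        t.join()
    return results
```

Since children are enqueued before the parent's task_done, the queue's unfinished-task count stays positive while any work remains, so q.join() cannot return prematurely.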
Fig. 6. Node-wise parallelization schema.
3.3. A further fine-grained approach based on parallel-for

Our next approach aims at achieving even finer-grained task-parallelism. The idea behind this approach is to compute the for-loop (line 6) in Algorithm 3 in parallel using the available threads. We therefore incorporate a parallel-for based parallelization into the aforementioned node-wise parallelization.

Algorithm 4 shows our proposed method, illustrated schematically in Fig. 7. As in the two algorithms above, we assume that we have P (P ≥ 1) threads which can be executed in parallel and that there exists a single queue Q. The difference between this algorithm and Algorithm 3 is that the task of a single node (C, j) in a ppc-extension tree is now divided into P smaller subtasks (C, j, k) (k = 1, …, P), where the additional argument k specifies the indices i for which the foreach-loop (line 6) is performed. Namely, in processing a subtask (C, j, k), the foreach-loop is processed only for indices i satisfying i ≡ k mod P, thereby decomposing the single task (C, j) of the node-wise parallelization into P smaller subtasks (C, j, 1), …, (C, j, P). These smaller tasks are stored in the queue Q: in line 1 for the initial node, and in line 10 for intermediate nodes. An available thread among the P threads polls for one of these smaller subtasks in the queue Q (line 5) and processes it if Q is not empty.

4. Experimental results

4.1. Implementation and test data

We have implemented our proposed methods in Java 1.7 on a PC with two processors (AMD Opteron 4180 c32 processors) with 6 cores each, running at 2.6 GHz, with 32 GB of main memory, working under Ubuntu 11.04 (64 bit). We use two datasets often used in the field of ILP; one is the mutagenesis dataset,1 and the other is an English corpus of the Penn Treebank Project.2 The mutagenesis dataset, for example, contains 30 chemical compounds.
Each compound is

1 http://www.cs.ox.ac.uk/activities/machlearn/mutagenesis.html.
2 https://web.archive.org/web/19970614160127/http://www.cis.upenn.edu:80/~treebank/.
Algorithm 4: Parallel Closed Pattern Mining: para-for-ffLCM
input: closed conjunction C, minimum support min_sup, the set A of atoms, MRDB DB, key key, the number of threads P
1: store (Clo(key(X)), 0, 1), …, (Clo(key(X)), 0, P) to queue Q;
2: invoke P threads;
3: for each thread r (r = 0, …, P − 1) do in parallel
5:     while ((C, j, k) = Q.poll()) ≠ null do
6:         foreach i ∈ [j, |A|] s.t. i ≡ k mod P do
7:             if pi ∈ C then continue;
8:             if C ∧ pi does not satisfy the bias condition or is infrequent then continue;
9:             C′ ← Clo(C ∧ pi);
10:            store (C′, i, 1), …, (C′, i, P) to queue Q;
11:        end
12:    end
13: endfpar
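The decomposition used by Algorithm 4 only changes which loop indices a subtask owns: subtask (C, j, k) handles the indices i with i ≡ k (mod P). A minimal sketch of this striping (names hypothetical):

```python
def stripe(j, n_atoms, k, n_threads):
    """Indices handled by subtask (C, j, k): i in (j, n_atoms], i = k mod P."""
    # k runs over 1..P, so k % n_threads maps k = P to residue 0
    return [i for i in range(j + 1, n_atoms + 1)
            if i % n_threads == k % n_threads]

# Subtasks (C, j, 1), ..., (C, j, P) together partition one node's loop:
P, j, n_atoms = 3, 2, 11
stripes = [stripe(j, n_atoms, k, P) for k in range(1, P + 1)]
assert sorted(i for s in stripes for i in s) == list(range(j + 1, n_atoms + 1))
```

Each stripe is pushed as its own task, so several threads can cooperate on one expensive node instead of a single thread owning its entire loop.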
Fig. 7. Parallel-for based parallelization schema. P: the number of threads, N: the size of the set A of atoms. A single task of computing the for-loop for (C , j) in the node-wise parallelization approach is decomposed into P subtasks of computing the foreach-loops for (C , j, 1), . . . , and for (C , j, P).
represented by a set of facts using predicates such as atom and bond, for example. The number of predicate symbols is 19. The number of instances of the key atom (active(X)) is 230, and the minimum support is min_sup = 1/230. We assume that patterns contain at most 4 (resp., 8) distinct variables for the mutagenesis (resp., English corpus) dataset.

4.2. Results of the node-wise parallelization approach

Fig. 8 summarizes some statistics of the node-wise parallelization (Section 3.2) for a test on the mutagenesis dataset. From the figure, we can see that the workload balancing among the threads has been improved compared with that of the subtree-wise approach (Fig. 5); when the number of threads is P = 8, the standard deviation of the execution time has been reduced to about 41% of that of the subtree-wise approach. This improvement leads to a speedup of the execution time compared with the subtree-wise approach, as shown in Fig. 9. However, the speedup ratio, i.e., the ratio of the execution time of one thread to that of P threads, is still limited; for example, when P = 8, the speedup ratio is at most about 4.07 on the mutagenesis dataset.

Fig. 9 also shows some results on the English corpus dataset. In this case, the subtree-wise approach performs well; its speedup ratio is about 5.95 when P = 8, and the standard deviation of the execution time is small enough: σ/m = 4.78 × 10⁻³. The node-wise parallelization performs even better; when P = 8, its speedup ratio is about 6.34 and the standard deviation of the execution time is an order of magnitude smaller than that of the subtree-wise approach.

4.3. Results of the parallel-for based approach

Fig. 8 also shows some statistics of the parallel-for based parallelization (Section 3.3) for the same test datasets as above.
For the mutagenesis dataset, we can see that the workload balancing among the threads has been further improved compared with that of the node-wise approach: when the number of threads is P = 8, the standard deviation of the execution time has been reduced to less than about 0.2% of that of the node-wise approach. This improvement leads to a significant speedup of the execution time compared with the subtree-wise approach, as shown in Fig. 10. When the number of threads is P = 8, the speedup ratio is now about 6.47, which confirms the effect of good workload balancing among the threads.
Fig. 8. The average execution time and its standard deviation of the node-wise parallelization and the parallel-for parallelization: the mutagenesis dataset. P: #(threads). The average execution time is per thread (processor).
Fig. 9. Results of the node-wise parallelization approach: the Mutagenesis dataset (left) and the English corpus (right). The execution time of the subtree-wise approach is the total execution time of the two phases of the algorithm.
Fig. 10. Results of the parallel-for based approach: the Mutagenesis dataset (left) and the English corpus (right). The execution time of the subtree-wise approach is the total execution time of the two phases of the algorithm.
For the English corpus dataset, the parallel-for based parallelization works less effectively than the other two approaches; its speedup ratio is about 5.36 when P = 8. As explained in Section 4.2, the other two approaches already perform well in this case, and the standard deviations σ of the execution time of both approaches are small enough. The parallel-for based parallelization makes σ even smaller, but its finer decomposition incurs additional overhead in this case. For this dataset, the node-wise parallelization gives the best result.

4.4. Results of further experiments

We have used two more datasets. One is the Northwind database (NW),3 which contains the sales data for a fictitious company called Northwind Traders. The Northwind database contains 13 tables; among others, table orders contains 830 tuples (i.e., there are 830 order_id's), and table order_details contains 2155 tuples. The other is the PKDD'99 Financial

3 https://northwinddatabase.codeplex.com/.
Fig. 11. Relative speedup of the proposed methods: the NorthWind dataset (left) and the Financial dataset (right).
Fig. 12. Standard deviations of execution time: the Northwind database (left) and the Financial dataset (right). P: #(threads). The execution time is per thread (processor).
database,4 which contains 8 tables. Among them, table loan contains 682 tuples (i.e., there are 682 loan_id’s), consisting of 606 successful and 76 not successful loans, and the other tables contain information and transactions of the loans. Figs. 11 and 12 show the effects of the aforementioned three parallelization methods. Overall, the parallel-for based approach works well for both datasets. The standard deviations of the execution time are the smallest among the three approaches, and the workload balancing among the threads is well implemented, compared with the other two approaches. In particular, when the number of threads is P = 8, the speedup ratio is now more than 6.0 in both of the datasets. The node-wise approach works well for the Financial dataset; the standard deviations of the execution times are almost the same as those of the parallel-for based approach. However, the node-wise approach works less effectively for the NW; the standard deviations of the execution times of the approach are rather large for P = 8, which results in poor speedup. Next, we have examined the effects of imposing minimum support thresholds min_sup. Fig. 13 and Table 1 summarize the effects of the aforementioned three parallelization methods for the mutagenesis dataset and the English corpus dataset when min_sup = 0.1. The table shows that the std’s of the execution times of the parallel-for based approach are small also for both of the datasets, compared with the other two approaches. More specifically, for the mutagenesis dataset, the parallel-for based approach works well for the workload balancing among the threads as shown in Fig. 13; when the number of threads is P = 8, the speedup ratio is more than 7.0, which is better than that of computing all concepts (i.e., min_sup_count = 1). 
For the English corpus dataset, the parallel-for based approach works less effectively; when P = 8, the speedup ratio for min_sup = 0.1 is about 4.60, while it was 5.34 when computing all the concepts. In this case, the speedup ratio of the parallel-for based approach is almost comparable with that of the node-wise approach, and it still outperforms the subtree-wise approach, as shown in Fig. 13.
To see the reason for this result, we consider the overhead cost of parallelization, i.e., the time needed to manage multiple threads of computation. As a rough estimate of the overhead (per thread), we define the overhead ratio by overhead/Tseq, where overhead = (T1 + · · · + TP − Tseq)/P, P is the number of threads, Ti is the execution time of the ith thread (1 ≤ i ≤ P), and Tseq is the sequential execution time. That is, the numerator of overhead is the total execution time of the P threads minus the sequential execution time Tseq, which can be regarded as the overhead due to parallelization. In Table 1, we show the overhead ratios of the three parallelization methods when P = 8. For the English corpus dataset, the overhead ratio of the parallel-for approach is larger than that of the node-wise approach, which can be considered a reason for the reduced effectiveness of the approach in this case.
4 https://relational.fit.cvut.cz/dataset/Financial.
Please cite this article in press as: H. Seki, M. Nagao, Parallel algorithms for enumerating closed patterns from multi-relational data, Discrete Applied Mathematics (2018), https://doi.org/10.1016/j.dam.2018.03.080.
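As a sanity check, the overhead ratio defined above can be computed directly from per-thread timings. The following Python sketch uses hypothetical timings, not measurements from our experiments:

```python
def overhead_ratio(thread_times, t_seq):
    """Per-thread parallelization overhead relative to the sequential
    time: overhead = (T1 + ... + TP - Tseq) / P, ratio = overhead / Tseq."""
    p = len(thread_times)
    overhead = (sum(thread_times) - t_seq) / p
    return overhead / t_seq

# Hypothetical timings: P = 4 threads, sequential run of 1000 ms.
print(overhead_ratio([270.0, 260.0, 265.0, 285.0], 1000.0))  # 0.02
```

A negative ratio, as observed in some table entries below, means the P threads together spent less time than the sequential run, e.g., due to cache effects.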
Fig. 13. Relative speedup of the proposed methods: the Mutagenesis dataset (left) and the English corpus dataset (right): min_sup = 0.1.

Table 1
Standard deviations of execution times: min_sup = 0.1. P: #(threads). The execution time (ms) is per thread (processor). o.h.: overhead ratios in percent (%) for P = 8.

                 Mutagenesis dataset               English corpus dataset
#Concepts        2536                              2275
Tseq (ms)        9664                              3371

P        Subtree    Node      para-for     Subtree    Node      para-for
2        0.0        0.0       0.0          5.0        0.50      0.0
4        500        31.6      0.433        55.7       7.58      7.66
8        727        1193      2.45         88.0       22.7      11.6
o.h.     (−1.92)    (6.01)    (1.20)       (0.852)    (6.16)    (7.97)
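The standard deviations reported here are taken over the per-thread execution times (our reading of the table captions), so a small value indicates a well-balanced workload. A minimal sketch of the computation with hypothetical timings:

```python
from statistics import pstdev  # population standard deviation

# Hypothetical per-thread execution times (ms) for P = 4 threads.
thread_times = [2412.0, 2410.0, 2415.0, 2411.0]
print(round(pstdev(thread_times), 3))  # 1.871 (nearly balanced workload)
```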
5. Parallel computing of formal concepts for binary relations

5.1. Node-wise parallelization for binary relations

We now apply our proposed parallelization methods to computing the formal concepts of a given binary object-attribute relation. Instead of Algorithm 1 for MRDM, we use Algorithm 5 for computing formal concepts; it is the same as in Close-by-One by Kuznetsov [15], which is used by Krajca et al. [13] for their parallel algorithm. Let ⟨G, M, I⟩ be a given formal context such that G = {0, 1, . . . , m} and M = {0, 1, . . . , n} for some m, n ≥ 0. Given a formal concept ⟨A, B⟩ and y ∈ M, Algorithm 5 recursively computes all formal concepts which are obtained by adding j ∈ M such that j > y.

Algorithm 5: GenerateFrom(⟨A, B⟩, y)
input: a formal concept ⟨A, B⟩, an attribute y.

1  output B;
2  if y > |M| then return;
3  for j from y upto |M| do
4      if j ∉ B then
5          ⟨C, D⟩ ← Clo(⟨A, B⟩, j);
6          if ⟨C, D⟩ fails the canonicity test then continue;
7          call GenerateFrom(⟨C, D⟩, j + 1);
8      end
9  end
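As an executable illustration of the structure of Algorithm 5, the following Python sketch enumerates all formal concepts of a small binary context with a sequential Close-by-One search. It is our schematic reading of the algorithm (the context representation and the helper names are ours), not the authors' implementation:

```python
def concepts(context, n_attrs):
    """Enumerate all formal concepts of a binary context by a
    sequential Close-by-One search in the style of Algorithm 5.
    `context` maps each object to its set of attribute indices."""
    objects = set(context)

    def extent(B):
        # Objects possessing every attribute in B.
        return {g for g in objects if B <= context[g]}

    def intent(A):
        # Attributes shared by every object in A (all attributes if A is empty).
        out = set(range(n_attrs))
        for g in A:
            out &= context[g]
        return out

    found = []

    def generate_from(A, B, y):
        found.append((frozenset(A), frozenset(B)))   # output <A, B>
        for j in range(y, n_attrs):
            if j not in B:
                C = extent(B | {j})                  # Clo(<A, B>, j)
                D = intent(C)
                # Canonicity test: closing must add no attribute below j.
                if {x for x in D if x < j} == {x for x in B if x < j}:
                    generate_from(C, D, j + 1)

    generate_from(objects, intent(objects), 0)
    return found

# Toy context: object g0 has attributes {0, 1}, g1 has {1, 2}.
print(len(concepts({"g0": {0, 1}, "g1": {1, 2}}, 3)))  # 4 concepts
```

In the node-wise parallelization, each recursive call generate_from(C, D, j + 1) would be submitted as an independent task instead of being executed in place.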
Our implementation environment is the same as that in Section 4.1. We use several data tables from the UCI Machine Learning Repository (http://archive.ics.uci.edu/ml/). Fig. 14 contains some results of the three approaches to computing concepts, using the mushroom dataset and the anonymous web dataset (AWD). We set the depth limit L = 2 in the subtree-wise approach. From the figure, we can see that all three methods work equally well for the AWD, while the parallel-for based approach works less effectively for the mushroom dataset. To see the reason for this, Table 2 shows some statistics of the three approaches for both datasets. There is a large difference between the node-wise approach and the parallel-for based approach in the overhead rate (shown in parentheses in the table) when #(threads) P = 8 for the mushroom dataset, while the workload balancing among the threads is almost the same in both approaches. The reason for this is that the parallel-for based approach
Fig. 14. Results of the three parallelization approaches to computing concepts: the mushroom dataset (left) and the anonymous web dataset (right).
Table 2
Standard deviations of execution times of the three parallelization methods for binary relations. P: #(threads). The execution time (ms) is per thread (processor). o.h.: overhead ratios in percent (%) for P = 8.

                 Mushroom dataset                  AWD
#Concepts        238,708                           129,007
Tseq (ms)        5782                              42,538

P        Subtree    Node      para-for     Subtree    Node      para-for
2        0.0        0.320     0.239        0.0        0.276     0.380
4        0.560      0.767     0.548        0.287      0.774     0.618
8        9.32       1.48      1.08         2.19       1.51      0.807
o.h.     (0.0)      (−10.8)   (12.6)       (0.0)      (−10.8)   (−10.8)
generates more tasks of finer granularity than the node-wise parallelization approach, which causes overhead in the parallel computation of the concepts for the mushroom dataset.
Next, we examine the effects of the depth limit L in the subtree-wise approach proposed by Krajca et al. [13], by varying it from 1 to 3 in Fig. 15 and Table 3. For short, we denote by STd the subtree-wise approach with depth limit L = d, and by σd the standard deviation of the execution times of STd when P = 8. The figure shows that there are some differences in the improvement of the speedup ratios of ST1 and ST3 between the mushroom dataset and the AWD. This difference indicates the role of the depth limit L in the subtree-wise approach; namely, the larger L becomes, the smaller σd becomes. On the other hand, a larger L requires more time for the serial computation that first generates sequentially a set of subtrees from the root of an initial ppc-extension tree, i.e., the sequential computation corresponding to line 1 in Algorithm 2. We thus consider another parameter, tserial, the ratio of this serial execution time to Tseq, the (totally) sequential execution time; the efficiency of the subtree-wise approach depends on the balance between σd and tserial.
The mushroom dataset is an example where the positive effects of the smaller σd outweigh the cost of the larger tserial. Namely, when P = 8, the standard deviation σ3 of ST3 is much smaller than σ1, and tserial of ST3 is not so large as to cancel the positive effects of the smaller σ3, which makes ST3 better than ST1. For the AWD, the situation is the opposite: ST3 works less effectively than ST1. ST3 has a much larger tserial than ST1; although σ3 is much smaller than σ1, its effects are limited, and not enough to compensate for the negative effects of the larger tserial. ST3 thus works worse than ST1 in this case.
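The limiting effect of tserial can be related to the classical serial-fraction bound (Amdahl's law). The following is our illustration under that classical model, not an analysis taken from the experiments:

```python
def amdahl_bound(serial_fraction, p):
    """Upper bound on the speedup with p threads when a fixed fraction
    of the total work must run sequentially (Amdahl's law)."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / p)

# With tserial = 12.1% (as reported for ST3 on the AWD in Table 3)
# and P = 8, the speedup can never exceed about 4.33, no matter how
# well the parallel part is load-balanced:
print(round(amdahl_bound(0.121, 8), 2))  # 4.33
```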
In both datasets, ST2 shows the best performance, and our node-wise parallelization approach is almost comparable with it. In summary, as far as these experimental results are concerned, the proposed node-wise parallelization approach is an alternative to the conventional subtree-wise one, and it has the advantage that there is no need to give a depth limit in advance.

5.2. Results of further experiments

We have used two more datasets from the UCI Machine Learning Repository. One is the adult dataset (https://archive.ics.uci.edu/ml/datasets/Adult), whose size is |G| × |M| = 48,842 × 14; we have omitted its numerical attributes from our experiments. The other is the Internet advertisements dataset (Ads, https://archive.ics.uci.edu/ml/datasets/Internet+Advertisements), whose size is |G| × |M| = 3279 × 1553; its missing values and numerical attributes have been preprocessed in a standard way, namely by replacing missing values with the means of the other values and by equi-width binning. Based on
Fig. 15. Results of the subtree-wise approach with varying depth limit L, compared with the node-wise parallelization approach to computing concepts: the mushroom dataset (left) and the anonymous web dataset (right).
Fig. 16. Relative speedup of the proposed methods: the adult dataset (left) and the Ads dataset (right).
Table 3
Standard deviations of execution times of the subtree-wise approach with varying depth limit L. P: #(threads). The execution time (ms) is per thread (processor). tserial (o.h.): the ratio of the serial execution time to Tseq (overhead ratio) in percent (%) when P = 8, respectively.

                 Mushroom dataset           AWD
Tseq (ms)        5782                       42,538

P          L = 1     L = 2     L = 3       L = 1     L = 2     L = 3
2          0.0       0.0       0.1         0.0       0.0       0.0
4          696       0.560     0.1         708.1     0.287     0.287
8          605       9.32      1.14        2109      2.19      0.553
tserial    0.671     1.54      4.97        0.099     0.691     12.1
o.h.       1.85      0.0       2.07        0.104     0.0       1.22
the observations in Section 5.1, we have used the aforementioned two parallelization methods, the subtree-wise approach with depth limit L = 2, i.e., ST2, and the node-wise approach, in the following experiments.
Fig. 16 and Table 4 show the effects of the two methods. Overall, the node-wise approach works well for both datasets. Its standard deviations of the execution times are smaller than or comparable to those of the subtree-wise approach ST2, and the workload is relatively well balanced among the threads. ST2 works well for the adult dataset, while it works less effectively for the Ads dataset. In the latter case, the standard deviation σ2 of the execution times of ST2 is rather large, and tserial, the ratio of the computation time of the serial part to Tseq, is also large compared with the former case. These factors restrict the expected effects of the subtree-wise approach.
Next, we have examined the effects of imposing a minimum support threshold min_sup, which are shown in Fig. 17 and Table 5. The results are for the mushroom dataset and the AWD when min_sup = 0.1. For the mushroom dataset, the node-wise approach works well, attaining a speedup ratio higher than that of the subtree-wise approach. Its standard deviation is smaller than that of the subtree-wise approach when P = 8, as we can see from Fig. 17. For the AWD, the node-wise approach works less effectively, and no parallelization benefit is obtained; when the number of threads is P = 8, the speedup ratio is less than 1. In this case, the speedup ratio of the node-wise approach is almost comparable with that of the subtree-wise approach, as shown in Fig. 17. As the table shows, the overheads become large, so that the
Table 4
Standard deviations of execution times of the subtree-wise and the node-wise approaches. P: #(threads). The execution time (ms) is per thread (processor). tserial (o.h.): the ratio of the serial execution time to Tseq (overhead ratio) in percent (%) when P = 8, respectively.

                 Adult dataset         Ads
Nc               425,693               13,380
Tseq (ms)        32,311                2081

P          Subtree    Node       Subtree    Node
2          0.569      0.220      0.491      1.59
4          0.517      1.76       19.0       1.92
8          0.822      3.24       147        1.80
tserial    0.683      –          4.14       –
o.h.       (0.875)    (5.03)     (2.56)     (2.11)
Table 5
Standard deviations of execution times of the subtree-wise and the node-wise approaches: min_sup = 0.1. P: #(threads). The execution time (ms) is per thread (processor). tserial (o.h.): the ratio of the serial execution time to Tseq (overhead ratio) in percent (%) when P = 8, respectively.

                 Mushroom dataset      AWD
Nc               4896                  8
Tseq (ms)        983                   31

P          Subtree    Node       Subtree    Node
2          0.1        0.399      0.0        0.602
4          0.339      0.817      0.346      3.86
8          6.88       2.36       0.066      4.33
tserial    7.95       –          78.7       –
o.h.       (3.11)     (4.86)     (−2.58)    (98.4)
Fig. 17. Relative speedup of the proposed methods: min_sup = 0.1. The mushroom dataset (left) and AWD (right).
resulting speedup is limited. On the other hand, the subtree-wise approach has a large tserial, which decreases the effect of its parallelization.

6. Concluding remarks

In this paper we have proposed parallel algorithms for enumerating closed patterns from multi-relational databases. Since efficiency and scalability have been major concerns in MRDM [2], we have developed parallel algorithms to address these issues, focusing on the task-parallelism underlying the problem. More specifically, we have shown by experiments that the conventional approach [13,18], called subtree-wise parallelization, does not work well for MRDM due to workload imbalance among the cores of multi-core processors. To remedy this imbalance, we have proposed two approaches with new load-balancing strategies: node-wise parallelization and parallel-for based parallelization. These approaches enable us to exploit the task-parallelism in a given problem at a finer granularity than subtree-wise parallelization. We have shown that the proposed parallelization approaches achieve better performance on some test multi-relational datasets. Furthermore, we have shown by experiments that the node-wise parallelization approach, when applied to computing closed patterns for binary object-attribute relational data, works comparably to the existing method by Krajca et al. [13].
In this paper, we have shown the effectiveness of our methods experimentally. We plan to accumulate more experimental results using different support thresholds on different datasets. A theoretical analysis of our parallel algorithms will also be an interesting research problem to explore. Another research direction is to study other types of parallel algorithms in MRDM. In this work, we have studied parallel algorithms for MRDM using the task-parallelism in the problem. Krajca et al. [14] have proposed an algorithm which allows us to perform closed pattern mining in a distributed manner using the Map-Reduce framework. In [12,23], we have proposed an approach to mining closed patterns from distributed multi-relational databases. It will be interesting to further investigate data-parallelism such as Map-Reduce in MRDM.

Acknowledgements

The authors would like to thank the anonymous reviewers for their constructive and useful comments on the previous version of the paper. This work was partially supported by JSPS Grant-in-Aid for Scientific Research (C) JP15K00305.

References

[1] Y. Bastide, N. Pasquier, R. Taouil, G. Stumme, L. Lakhal, Mining minimal non-redundant association rules using frequent closed itemsets, in: Computational Logic, CL 2000, in: LNCS, vol. 1861, Springer, 2000, pp. 972–986.
[2] H. Blockeel, M. Sebag, Scalability and efficiency in multi-relational data mining, SIGKDD Explor. Newslett. 4 (2) (2003) 1–14.
[3] J.-F. Boulicaut, A. Bykowski, C. Rigotti, Free-sets: a condensed representation of boolean data for the approximation of frequency queries, Data Min. Knowl. Discov. 7 (1) (2003) 5–22.
[4] T. Calders, C. Rigotti, J.-F. Boulicaut, A survey on condensed representations for frequent sets, in: Constraint-Based Mining and Inductive Databases, in: LNCS, vol. 3848, Springer, 2006, pp. 64–80.
[5] L. De Raedt, J. Ramon, Condensed representations for inductive logic programming, in: Proc. Intl. Conf. on the Principles of Knowledge Representation and Reasoning, KR '04, 2004, pp. 438–446.
[6] L. Dehaspe, Frequent Pattern Discovery in First-order Logic (Ph.D. thesis), Dept. Computer Science, Katholieke Universiteit Leuven, 1998.
[7] S. Dzeroski, Multi-relational data mining: An introduction, SIGKDD Explor. Newslett. 5 (1) (2003) 1–16.
[8] S. Dzeroski, N. Lavrač (Eds.), Relational Data Mining, Springer, 2001.
[9] B. Ganter, R. Wille, Formal Concept Analysis: Mathematical Foundations, Springer, 1999.
[10] G.C. Garriga, R. Khardon, L. De Raedt, On mining closed sets in multi-relational data, in: Proc. Intl. Joint Conf. on Artificial Intelligence, IJCAI '07, 2007, pp. 804–809.
[11] N. Helft, Induction as nonmonotonic inference, in: Proc. Intl. Conf. on the Principles of Knowledge Representation and Reasoning, KR '89, 1989, pp. 149–156.
[12] Y. Kamiya, H. Seki, Distributed mining of closed patterns from multi-relational data, J. Adv. Comput. Intel. Intel. Inform. 19 (6) (2015) 804–809.
[13] P. Krajca, J. Outrata, V. Vychodil, Parallel algorithm for computing fixpoints of Galois connections, Ann. Math. Artif. Intel. 59 (2) (2010) 257–272.
[14] P. Krajca, V. Vychodil, Distributed algorithm for computing formal concepts using Map-Reduce framework, in: Advances in Intelligent Data Analysis VIII, Intl. Symp. on Intelligent Data Analysis, IDA '09, Springer, 2009, pp. 333–344.
[15] S.O. Kuznetsov, Learning of simple conceptual graphs from positive and negative examples, in: Proc. Third European Conf. on Principles of Data Mining and Knowledge Discovery, PKDD '99, Springer-Verlag, 1999, pp. 384–391.
[16] S.O. Kuznetsov, S.A. Obiedkov, Comparing performance of algorithms for generating concept lattices, J. Exp. Theoret. Artif. Intel. 14 (2–3) (2002) 189–216.
[17] M. Nagao, H. Seki, Towards parallel mining of closed patterns from multi-relational data, in: Proc. 2015 IEEE 8th Intl. Workshop on Computational Intelligence and Applications, IWCIA '15, 2015, pp. 103–108.
[18] B. Négrevergne, A. Termier, J. Méhaut, T. Uno, Discovering closed frequent itemsets on multicore: Parallelizing computations and optimizing memory accesses, in: Proc. Intl. Conf. on High Performance Computing & Simulation, HPCS '10, 2010, pp. 521–528.
[19] B. Négrevergne, A. Termier, M. Rousset, J. Méhaut, ParaMiner: a generic pattern mining algorithm for multi-core architectures, Data Min. Knowl. Discov. 28 (3) (2014) 593–633.
[20] S.-H. Nienhuys-Cheng, R. de Wolf, Foundations of Inductive Logic Programming, in: LNAI, vol. 1228, Springer, 1997.
[21] N. Pasquier, Y. Bastide, R. Taouil, L. Lakhal, Discovering frequent closed itemsets for association rules, in: Proc. Intl. Conf. on Database Theory, ICDT '99, Springer, 1999, pp. 398–416.
[22] H. Seki, Y. Honda, S. Nagano, On enumerating frequent closed patterns with key in multi-relational data, in: Discovery Science, DS 2010, in: LNAI, vol. 6332, Springer, 2010, pp. 72–86.
[23] H. Seki, Y. Kamiya, Merging closed pattern sets in distributed multi-relational data, in: Proc. Intl. Conf. on Concept Lattices and their Applications, CLA '14, 2014, pp. 71–82.
[24] H. Seki, S. Tanimoto, Distributed closed pattern mining in multi-relational data based on iceberg query lattices: Some preliminary results, in: Proc. Intl. Conf. on Concept Lattices and their Applications, CLA '12, 2012, pp. 115–126.
[25] G. Stumme, Iceberg query lattices for datalog, in: Conceptual Structures at Work, in: LNCS, vol. 3127, Springer, 2004, pp. 109–125.
[26] T. Uno, T. Asai, Y. Uchida, H. Arimura, An efficient algorithm for enumerating closed patterns in transaction databases, in: Discovery Science, DS 2004, in: LNAI, vol. 3245, Springer, 2004, pp. 16–31.
[27] S.B. Yahia, T. Hamrouni, E.M. Nguifo, Frequent closed itemset based algorithms: A thorough structural and analytical survey, SIGKDD Explor. Newslett. 8 (1) (2006) 93–104.
[28] M.J. Zaki, Mining non-redundant association rules, Data Min. Knowl. Discov. 9 (3) (2004) 223–248.