J Supercomput DOI 10.1007/s11227-016-1919-0
Multi-query processing of XML data streams on multicore

Soo-Hyung Kim¹ · Kyong-Ha Lee² · Yoon-Joon Lee¹
Abstract The multicore architecture has become the norm for computing systems in recent years, as it provides CPU-level support for parallelism. However, existing algorithms for processing XML streams do not fully take advantage of this facility, since they were not devised to run in parallel. In this article, we propose several methods for efficiently parallelizing finite state automata (FSA)-based XML stream processing. We transform a large collection of XPath expressions into multiple FSA-based query indexes and then process XML streams in parallel by virtue of this index-level parallelism. Each core works only with its own query index, so no synchronization issues occur while filtering XML streams with multiple path patterns given by users. We also present an in-memory MapReduce model that makes it possible to process a large collection of twig pattern joins over XML streams simultaneously. Twig pattern joins in our approach are performed by multiple H/W threads in a shared and balanced way. Extensive experiments show that our algorithm outperforms conventional algorithms on an 8-core CPU by up to ten times when processing 10 million XPath expressions over XML streams.
1 School of Computing, KAIST, 291 Daehak-ro, Yuseong-gu, Daejeon 34141, Korea
2 Division of Convergence Technology Research, KISTI, 245 Daehak-ro, Yuseong-gu, Daejeon 34141, Korea
Keywords Data streams · XML · Query processing · Parallel processing · Multicore architecture
1 Introduction

XML has become one of the most popular formats for data representation and transmission on the Internet [5]. Data in various applications are encoded in XML by virtue of its simplicity and extensibility. XML data are, in general, generated dynamically; typical examples include stock tickers, network traffic data, web services, and log streams. Such XML data, known as XML streams, typically require real-time processing, meaning that the data must be processed as soon as they are delivered. In this respect, processing XML streams in a timely manner has been a great challenge, and querying such continuous, unbounded, and sequentially accessed XML streams has gained significant attention over the past decade [2,11,14,20,32,33,42,44]. Specifically, XML stream processing finds structural matchings, i.e., matching elements, from a series of small XML documents against a given set of XPath expressions.

Meanwhile, the multicore CPU, also known as a chip-level multiprocessor (CMP), has been rapidly adopted in various computing systems as a more cost-effective architecture, delivering more computing facilities, such as chip-level H/W thread support, with less power consumption [21]. However, conventional XML stream processing algorithms do not immediately benefit from the features of the multicore, since they were not designed to work in parallel. Most conventional algorithms are serial, and only a few studies on parallel XML stream processing have been reported in the literature. Moreover, these recent studies fail to address the issue of handling a massive set of queries over XML streams.
They consider only processing a single XPath expression [15,23,30,33,39] or processing multiple simple path patterns rather than complex twig patterns [44].

To address these issues, we propose a set of algorithms called Distributed query Index for XML stream querY processing on multicore (DIXY). Our algorithms process XPath expressions over XML streams in parallel and can handle many XPath expressions simultaneously in a shared and balanced way. To achieve this, we devise a new partitioning scheme that partitions an FSA-based query index into multiple indexes at runtime, rather than following conventional data or query partitioning schemes. In our approach, each core performs path pattern filtering with its own query index, which guarantees workload balance across cores. Twig pattern joins, which join path pattern solutions to find the relevant twig pattern matchings in XML streams, are also performed in parallel; this is achieved by implementing the join operations with an in-memory MapReduce programming model. To the best of our knowledge, this is the first study on processing a massive set of XPath expressions over XML streams in parallel on a multicore CPU. The main contributions of this article are summarized as follows:
- Index-level partitioning scheme: In our approach, an NFA-style query index for all XPath expressions given by users is transformed into as many DFA-style query indexes as there are cores in the system. Each core then works only with its own query index, and no communication between cores is required for state transitions, so no synchronization issues arise while filtering XML streams. This scheme also balances workloads naturally across cores, since every core performs exactly one state transition for each incoming element.
- Sharing input scans and path solutions: Processing a single XPath expression at a time is wasteful when a massive set of XPath expressions is given by users, and in practice many XPath expressions in the query set share common linear path patterns. In our approach, we share input scans and path solutions so as to avoid redundant processing of path patterns and save computation and I/O. Path pattern solutions are also shared by multiple twig pattern join operations: while joining path solutions to find twig patterns, the group of join operations assigned to a reducer shares the path solutions.
- Multiple twig pattern joins in parallel: Multiple twig pattern join operations are distributed across reducers and executed simultaneously, as many at a time as there are H/W threads. We also implement each twig pattern join with a holistic twig join algorithm, which is proven to be optimal for a certain subset of XPath expressions, to improve both computational and I/O efficiency.
- Runtime workload balancing and multi-query optimization: In parallel processing, a straggling task delays the whole job. The native runtime scheduling scheme of the MapReduce programming model often performs poorly when input data are severely skewed [26,27].
To address this issue, we exploit a dynamic shuffling scheme that balances the workload across reducers at runtime. We group similar twig pattern join operations into as many groups as there are reducers, such that the total cost of the join operations in each group is equal or close to that of the other groups; each group is then assigned to a reducer so that multiple joins are performed at once in each reducer. To achieve this, we first estimate the cost of each twig pattern join operation before the actual join. This estimation is simple because the worst-case I/O and CPU time complexity of the holistic twig join algorithm is linear in the total size of the input path solutions [6,10], and the sizes of the path solutions are naturally counted when the path pattern filtering step finishes. The cost estimation also accounts for path solutions shared by multiple twig pattern queries: the more common path patterns two twig patterns have, the more likely their join operations are to be grouped together. We then assign twig pattern joins to reducers at runtime so that every reducer carries approximately the same join cost.
- Experimental evaluation: For performance evaluation, extensive experiments were conducted with two different datasets, XMark and Treebank. We compared our algorithms with various conventional systems, and further evaluated the scalability and elapsed time of DIXY by varying the number of cores, the number of queries, query types, the probability of sub-operators/predicates, and selectivity. In addition, we measured the memory usage of DIXY.

The remainder of this article is organized as follows. Section 2 describes the preliminary knowledge that helps in understanding our approach. Section 3 describes our approach in detail. A quantitative analysis of our approach is given in Sect. 4. Section 5 presents the results of our extensive experiments. Related work is discussed in Sect. 6. Finally, we conclude in Sect. 7.
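As a concrete illustration of the runtime workload balancing described in the contributions above, the cost-based grouping of twig join operations can be sketched with a greedy longest-processing-time heuristic. The heuristic and all names below are our illustrative choices, not necessarily the paper's exact grouping algorithm; costs stand in for the summed sizes of each join's input path solutions.

```python
import heapq

def balance_joins(join_costs, n_reducers):
    """join_costs: {join_id: estimated cost}; returns one group per reducer
    such that the total cost per group is approximately equal."""
    # heap entries: (current load, reducer index, assigned joins)
    heap = [(0, i, []) for i in range(n_reducers)]
    heapq.heapify(heap)
    # assign the most expensive joins first, always to the lightest group
    for jid, cost in sorted(join_costs.items(), key=lambda kv: -kv[1]):
        load, i, joins = heapq.heappop(heap)
        joins.append(jid)
        heapq.heappush(heap, (load + cost, i, joins))
    return [joins for _, _, joins in sorted(heap, key=lambda t: t[1])]

# hypothetical per-join cost estimates for two reducers
groups = balance_joins({'q1': 9, 'q2': 7, 'q3': 4, 'q4': 4, 'q5': 2}, 2)
```

In this toy run both reducers end up with a total cost of 13, so neither straggles behind the other.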
2 Preliminaries

2.1 XML streams and XPath query processing

An XML stream is modeled as an unbounded series of XML documents that arrive in real time. A single XML document is a rooted, ordered, and labeled tree where each node corresponds to an element or a value, and edges represent either element–element or element–value relationships between two nodes. The total order on the nodes in an XML tree is obtained by a preorder traversal of the tree nodes. Figure 1 presents an example of an XML document and its tree structure. Note that in the tree structure, each element is labeled with an interval-based numbering scheme, e.g., (2, 13, 2) for element B. The interval-based numbering scheme, also known as the region numbering scheme, helps promptly identify the relationship between any two nodes in an XML tree without tree traversal. The labels are ternary tuples, i.e., (start, end, level) [12]. For any two XML tree nodes u and v, u is an ancestor of v if and only if u.start < v.start and u.end > v.end. A node u is a parent of a node v if and only if u is an ancestor of v and v.level = u.level + 1. For example, element A is a parent of element B since A's start < B's start, A's end > B's end, and A's level + 1 = B's level. Also, a node u precedes a node v in document order if and only if u.start < v.start.
Fig. 1 Sample XML document and its tree structure
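As a small illustration of the numbering scheme above, the relationship tests can be sketched as follows; the labels are taken from Fig. 1, and the function names are ours.

```python
# Labels are (start, end, level) tuples under the interval-based
# (region) numbering scheme.

def is_ancestor(u, v):
    """u is an ancestor of v iff u's interval strictly contains v's."""
    return u[0] < v[0] and u[1] > v[1]

def is_parent(u, v):
    """u is a parent of v iff u is an ancestor exactly one level up."""
    return is_ancestor(u, v) and v[2] == u[2] + 1

def precedes(u, v):
    """u precedes v in document order iff u starts first."""
    return u[0] < v[0]

A = (1, 24, 1)   # element A from Fig. 1
B = (2, 13, 2)   # element B from Fig. 1
print(is_parent(A, B))   # A is a parent of B
```

No tree traversal is needed: each test is a constant-time comparison on the labels.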
Fig. 2 XPath expression samples, twig patterns, and their decomposed path patterns
Querying XML documents identifies elements whose values and structures match a given query. XPath (XML Path Language) is a basic query language for finding matches in an XML document [7]. A single XPath expression is commonly modeled as a twig pattern whose nodes are connected by either parent–child (/) or ancestor–descendant (//) axes. A predicate, denoted by '[ ]', checks a condition on either a value or a substructure. Wildcards are also used to accept any element. Most existing approaches to twig pattern matching start by decomposing a twig pattern into several linear path patterns (see Fig. 2). Once all the instances matched by the linear path patterns are found, they are joined together to produce the instances matched by the twig pattern. Numerous algorithms for twig pattern query processing have been reported in the literature, and some of them have been proven to be optimal for a certain class of twig patterns. Readers are referred to a survey of XML query processing techniques [17]. In this article, we use a simplified version of the XPath 1.0 query language defined as follows:

QUERY ::= ( LOCATION_STEP )+
LOCATION_STEP ::= AXIS NODE ( [ PREDICATE ] )*
NODE ::= element_name | *
PREDICATE ::= NODE ( AXIS NODE )* | element_name OP value
AXIS ::= / | //
OP ::= > | < | = | ≤ | ≥

2.2 Automata-based XML stream processing

Automata-based XML stream processing algorithms are known to be efficient for processing a large set of structural queries [42]. In these algorithms, a query index is built in the form of finite state automata before runtime, and nodes in the XML streams are then evaluated using state transitions triggered by start- or end-element events. To build a query index, XPath expressions are first decomposed into linear path patterns as shown in Fig. 2. For example, the XPath expression /A/B[C]/F is decomposed into three linear path patterns: /A/B, /A/B/C, and /A/B/F.
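The decomposition step just described can be sketched as follows. This is an illustrative sketch, not the paper's implementation; it handles only the structural predicates of the simplified grammar (value predicates such as [x > 5] are out of scope here).

```python
import re

# One location step: an axis (/ or //), a node test (name or *), and
# zero or more bracketed predicates.
STEP = re.compile(r'(//?)([\w*]+)((?:\[[^\]]*\])*)')

def decompose(twig):
    """Decompose a twig pattern into its linear path patterns."""
    paths, prefix = [], ''
    for axis, node, preds in STEP.findall(twig):
        prefix += axis + node
        pred_list = re.findall(r'\[([^\]]*)\]', preds)
        if pred_list:
            paths.append(prefix)               # path to the branching node
        for pred in pred_list:
            paths.append(prefix + '/' + pred)  # predicate branch
    paths.append(prefix)                       # main root-to-leaf path
    return paths

print(decompose('/A/B[C]/F'))  # ['/A/B', '/A/B/C', '/A/B/F']
```

The output reproduces the example from the text: the branch point /A/B, the predicate branch /A/B/C, and the main path /A/B/F.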
Fig. 3 Example of an NFA-style query index
A nondeterministic finite automaton (NFA)-style query index that shares common prefixes among queries has been presented in the literature [14,42]. In this approach, a single query index A is defined by a quintuple (Q, Σ, Q0, δ, F), where Q is a finite set of states, Σ is a finite set of input XML elements, Q0 is the starting state, δ is a transition function Q × Σ → Q, and F is a set of accepting states, a subset of Q whose members correspond to identifiers of the given queries. Figure 3 presents an example of the NFA-style query index built from the set of XPath expressions shown in Fig. 2. When a start-element event arrives, states in the query index transition to their next states with respect to δ, and a snapshot of the current states is stored in a runtime stack; this is done simply by pushing the identifiers of the states onto the stack. Note that δ is typically implemented with a single hash table to guarantee constant lookup time, as shown in Fig. 3. When an end-element event arrives, backtracking is performed by popping the runtime stack. The benefit of this approach is the ease of inserting and deleting queries. On the other hand, it suffers from an exponentially increasing number of state transitions if queries are deep and contain many //-axes.

To address this limitation, a deterministic finite automaton (DFA)-style query index has also been reported in the literature. Since only a single state transition needs to be tracked for each element event in the DFA-style query index, the computational complexity of path pattern filtering is guaranteed to be O(1). However, building a full DFA from an NFA with n states requires a powerset construction whose time complexity is O(2^n). XMLTK avoids the powerset construction by lazily building the DFA at runtime from a given set of queries over incoming XML streams [18]. Figure 4 presents a partial snapshot of the DFA-style query index in XMLTK, translated from the NFA in Fig. 3.
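The NFA-style index and its runtime stack can be sketched as follows. This is a minimal illustration loosely following Fig. 3 (state and path IDs are simplified, and //-axis handling is omitted): δ is a single hash table keyed by (state, element), active states are snapshotted on a stack at each start-element event and popped at end-element.

```python
class NFAIndex:
    def __init__(self):
        self.delta = {}        # (state, element) -> set of next states
        self.accepting = {}    # state -> list of accepted path IDs
        self.stack = [{0}]     # state 0 is the starting state

    def add_transition(self, state, elem, nxt):
        self.delta.setdefault((state, elem), set()).add(nxt)

    def start_element(self, elem):
        nxt = set()
        for s in self.stack[-1]:
            # try the literal element name, then the wildcard
            nxt |= self.delta.get((s, elem), set())
            nxt |= self.delta.get((s, '*'), set())
        self.stack.append(nxt)  # snapshot of the new active states
        # report path patterns accepted at this element
        return [p for s in nxt for p in self.accepting.get(s, [])]

    def end_element(self):
        self.stack.pop()        # backtrack

# index the decomposed paths /A/B and /A/B/C from Fig. 2
idx = NFAIndex()
idx.add_transition(0, 'A', 1)
idx.add_transition(1, 'B', 2)
idx.add_transition(2, 'C', 3)
idx.accepting[2] = ['p2']
idx.accepting[3] = ['p1']

idx.start_element('A')
print(idx.start_element('B'))   # ['p2']
print(idx.start_element('C'))   # ['p1']
```

Each hash-table lookup is constant time, but because multiple NFA states can be active at once, a single element event may trigger several transitions, which is exactly the cost the DFA-style index avoids.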
It is noteworthy that in the DFA-style query index, the sizes of the query ID lists, each of which is stored in a bucket of another hash table and associated with a state, increase rather sharply, while the number of active runtime states/transitions decreases to one.

Fig. 4 Building a DFA-style query index by merging states in an NFA-style query index shown in Fig. 3

Since each state in the DFA is built by combining corresponding NFA states, a single
DFA state can have many redundant query IDs associated with accepting states in the corresponding NFA. The query IDs are kept in a hash table for constant-time lookups; however, the query IDs in a single bucket still need to be scanned sequentially. Therefore, sequential scans of the query IDs can severely degrade the overall performance if the given query set is large (see Fig. 7).

2.3 Parallelization strategies for XML query processing

Several parallelization strategies for XML query processing have been reported in the literature [3,4]. These strategies are threefold: data partitioning, query partitioning, and hybrid partitioning. In the data partitioning strategy, each thread processes a portion of the XML data with the same query set. Within this strategy, there are two choices for partitioning the XML data. The first is to partition each XML document into multiple chunks, as shown in Fig. 5a. The problem with this approach in XML stream processing is that XML streams are unbounded, so it is hard to partition an XML stream
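A minimal sketch of the data partitioning strategy described above, assuming a trivial stand-in match function (a real system would run full path pattern filtering over each portion); all names here are ours, for illustration only.

```python
from concurrent.futures import ThreadPoolExecutor

QUERIES = ['/A/B', '/A/C']          # the SAME query set for every worker

def matches(document, query):
    # stand-in for real path pattern filtering over a document
    return query in document

def process_portion(docs):
    return [(d, q) for d in docs for q in QUERIES if matches(d, q)]

def data_partitioned_filtering(stream, n_workers=4):
    # round-robin the stream's documents across the workers
    portions = [stream[i::n_workers] for i in range(n_workers)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        results = pool.map(process_portion, portions)
    return [m for part in results for m in part]
```

Note that this sketch partitions a finite list; as the text points out, an unbounded stream cannot simply be split up front, which is exactly the weakness of this strategy for stream processing.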