An efficient B+-tree design for main-memory database systems with strong access locality




Information Sciences 232 (2013) 325–345

Contents lists available at SciVerse ScienceDirect

Information Sciences journal homepage: www.elsevier.com/locate/ins

Pei-Lun Suei (a), Victor C.S. Lee (b,*), Shi-Wu Lo (c), Tei-Wei Kuo (a,d,e,*)

(a) Department of Computer Science and Information Engineering, National Taiwan University, Taipei, Taiwan, ROC
(b) Department of Computer Science, City University of Hong Kong, Hong Kong
(c) Department of Computer Science and Information Engineering, National Chung-Cheng University, Chia-Yi, Taiwan, ROC
(d) Graduate Institute of Networking and Multimedia, National Taiwan University, Taipei, Taiwan, ROC
(e) Research Center for Information Technology Innovation, Academia Sinica, Taipei, Taiwan, ROC

Article info

Article history:
Received 9 January 2011
Received in revised form 21 November 2011
Accepted 20 December 2012
Available online 5 January 2013

Keywords: Main-memory database; RFID applications; Access locality; Index structures

Abstract

This paper is motivated by the strong demands of many main-memory database applications with strong locality in data access, such as front-end logistical systems. We propose to adopt an auxiliary-tree approach with a tree-merging algorithm to efficiently handle bursty data insertions with keys in a small range and to avoid significant overheads in tree rebalancing. A range-based deletion algorithm is then proposed to process data deletions with strong access locality in a batch fashion. The capability of the proposed approach is evaluated by a series of experiments with a wide range of workloads and a variety of locality patterns, where different tree index structures are compared in terms of performance and memory space requirements.

© 2013 Elsevier Inc. All rights reserved.

1. Introduction

In recent years, advances in hardware and software technology have opened up many new database application domains. It is now perfectly feasible to store and manipulate a database of hundreds of megabytes or even several gigabytes entirely in volatile memory for applications with extremely high performance requirements. Example applications are those with several radio-frequency identification (RFID) readers that identify and track a large number of products, animals, vehicles, and even persons within a very short duration of time. Sometimes, the number of objects, such as RFID tags per truck container, could be large, and the time duration, such as the passing time of a truck through a control gate, could be very short [21,31]. Many other applications also have the characteristics of high-rate data access on keys with strong access locality, including the process lines of product manufacturing management [6,21,31], logistics management [23], goods retail management [31], and inventory management [31]. As Fig. 1 shows, the example application of the process lines of product manufacturing needs large numbers of key insertions in a small range and in a short duration of time. These applications (and our industry collaboration projects, such as one with the Institute for Information Industry, the major software-development institute in Taiwan) inspire us to explore an efficient index design for main-memory databases (rather than for disk-based databases) to improve data access performance significantly.

The handling of data that flow in continuously and rapidly has been widely explored in the literature [1,10]. With the feasibility of accommodating an entire database system in main memory, the performance of such a database

* Corresponding authors. Tel.: +886 233664888x315 (T.-W. Kuo), +852 34428617 (V.C.S. Lee). E-mail addresses: [email protected] (P.-L. Suei), [email protected] (V.C.S. Lee), [email protected] (S.-W. Lo), [email protected] (T.-W. Kuo). 0020-0255/$ - see front matter © 2013 Elsevier Inc. All rights reserved. http://dx.doi.org/10.1016/j.ins.2012.12.018


Fig. 1. An example application of the process lines of product manufacturing.

system is significantly improved and becomes much more predictable, and excellent research results have been reported in the literature, e.g., [8,12,15,32]. Researchers [5,7,22,25] also started to explore efficient index structures, such as the AVL tree, B+-tree [7], T-tree [22], and their derivatives [5,25], for main-memory database systems. Among many index structures, the B+-tree is one of the most popular, although it could suffer from significant overheads in tree reorganization due to insertions and deletions [3,7,19,24,26,29]. Similar to B+-trees, T-trees and AVL trees would suffer from significant overheads in tree rebalancing when data are updated, inserted, or deleted in a skewed fashion in main-memory (and non-main-memory) databases. Such skewed data updates could often be observed in bursty data with keys in a small range (referred to as strong access locality for the rest of this paper). Note that AVL trees are, in general, no better than T-trees in data manipulations with strong access locality, even though each T-tree node could hold more data than an AVL-tree node does (because each T-tree node only has two pointers). This observation motivates this work in exploring data manipulations with strong access locality over a short period of time.

How to reduce the overheads in tree rebalancing has received a lot of attention in past years. Many excellent results were proposed based on different lazy-rebalancing and/or overflow-buffer strategies, especially for B+-trees [4,11,18,20]. The studies [20,27,28] proposed to rebuild a tree from the temporary structures resulting from lazy rebalancing/overflow buffering when some threshold condition is satisfied. Because of these deferred rebalancing strategies, tree rebalancing can be done more efficiently in a batch-oriented fashion.
Similar to the rebalancing problem due to data insertions, methodologies that delete one datum at a time might also result in tremendous overheads in tree rebalancing [16,26]. Besides the approaches extended from lazy-rebalancing and overflow-buffer strategies, researchers have also explored methodologies for deleting data with keys in a given range, e.g., [9,29,33], where a number of leaf nodes are deleted or modified before a tree reorganization is done. Although excellent results have been achieved in reducing the overheads of tree rebalancing for data insertions and deletions, little work has been done for data insertions and deletions when strong access locality is considered for bursty data manipulation requests, especially over main-memory databases. In this paper, an auxiliary-tree approach is proposed for bursty data manipulations with strong access locality over main-memory databases. B+-trees are explored as the target in this study, even though the proposed approach could also be applied to other index trees, such as the B-tree, the 2-3-4-tree, the R-tree [13] for spatial data, the UB-tree [2] for multi-dimensional data, and the Bx-tree [17] for moving-object locations, which are different variations of B+-tree-based index trees for different applications. The ideas in data insertions and deletions could be adopted by using the concepts of node composing and tree merging. We propose to adopt an auxiliary tree to cache the tentative results of a selected collection of data insertions, and an effective tree-merging algorithm is proposed to merge an auxiliary tree into a B+-tree. The auxiliary tree is built very efficiently, and its construction differs from that of a conventional B+-tree. An efficient batch deletion algorithm based on a range-based strategy is then proposed for data deletions with strong access locality to reduce the overheads in tree reorganization.
The proposed approach is evaluated by a series of experiments with a wide range of workloads and a variety of locality patterns. The experimental results show that the proposed methodology greatly outperforms a naive B+-tree batch insertion method (NBBI) that we implemented in the experiments, as well as the conventional B+-tree methods, in terms of both response time and space utilization. It was shown that NBBI and B+-trees under the proposed approach need more memory space than AVL trees and T-trees, but the B+-trees tended to deliver better performance. With main memory as the target storage medium for databases, the size of a leaf/internal node is no longer restricted by the size of a disk sector (or the size of a logical disk block). In other words, it is possible to explore a proper node size in the experiments (or for applications) to optimize memory space usage and system performance.

The rest of this paper is organized as follows: Section 2 provides the motivation of this work. Section 3 presents the batch insertion and range-based deletion algorithms; their properties are also shown in Section 3. Section 4 summarizes the experimental results. Section 5 concludes the paper.


2. Motivation and related work

2.1. Motivation

This paper is motivated by the strong demands of main-memory database applications in the handling of bursty data access with strong locality. Before proceeding with further discussions, the terminology of B+-trees is summarized, and the motivation is presented. A B+-tree is a variant of the B-tree structure, in which leaf nodes have no child nodes and internal nodes have at least one child node. A leaf node of a B+-tree of order P has the following structure: (⟨DPtr_0, Key_0⟩, ..., ⟨DPtr_{n-1}, Key_{n-1}⟩, P_next), where Key_{i-1} < Key_i for 0 < i < n and n < P, each DPtr_i is a data pointer to the physical data record with the key value Key_i, and P_next is a pointer to the neighboring leaf node. An internal node contains at most (P - 1) keys and P pointers and has the following structure: (Ptr_0, Key_0, Ptr_1, Key_1, ..., Key_{n-1}, Ptr_n), where Key_{i-1} < Key_i for 0 < i < n and n < P, and each Ptr_i is a pointer to a child node. For every key value X in the sub-tree pointed at by Ptr_i, Key_{i-1} ≤ X < Key_i for 0 < i < n, X < Key_0 for i = 0, and Key_{n-1} ≤ X for i = n. An example B+-tree of order 3 with height 2 is shown in Fig. 2, where the height of a tree is the maximum number of nodes from the root to a leaf node. For simplicity of presentation, data are referred to by their corresponding keys. Any insertion, deletion, or modification of a piece of data is treated as that of its corresponding key in the B+-tree under consideration. When data are manipulated, the B+-tree under consideration could need corresponding adjustment, and merging and splitting of nodes might happen in tree rebalancing. We shall use the following example to illustrate the rebalancing problem due to bursty access with strong locality: Consider a B+-tree of order 3, as shown in Fig. 3a, and the insertions of three data 20, 25, and 30 with strong access locality (i.e., keys in a small range).
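As a concrete illustration of the definitions above, the leaf and internal node layouts might be sketched as follows in Python (a hypothetical rendering; the class and field names are ours, not the paper's):

```python
import bisect

class LeafNode:
    """Leaf of a B+-tree of order P:
    (<DPtr_0, Key_0>, ..., <DPtr_{n-1}, Key_{n-1}>, P_next), with n < P."""
    def __init__(self, order):
        self.order = order        # P: a leaf holds at most P - 1 keys
        self.keys = []            # strictly increasing key values
        self.dptrs = []           # dptrs[i] points at the record for keys[i]
        self.next = None          # P_next: the neighboring leaf node

    def insert(self, key, dptr):
        # Keep keys sorted; a full B+-tree implementation would split on overflow.
        i = bisect.bisect_left(self.keys, key)
        self.keys.insert(i, key)
        self.dptrs.insert(i, dptr)
        return len(self.keys) <= self.order - 1   # False signals overflow

class InternalNode:
    """Internal node: (Ptr_0, Key_0, Ptr_1, ..., Key_{n-1}, Ptr_n),
    at most P - 1 keys and P child pointers."""
    def __init__(self, order):
        self.order = order
        self.keys = []
        self.ptrs = []

    def child_for(self, key):
        # Every X in the sub-tree at ptrs[i] satisfies keys[i-1] <= X < keys[i].
        return self.ptrs[bisect.bisect_right(self.keys, key)]
```

The `child_for` lookup encodes the separator-key invariant directly: a key equal to keys[i] descends into the sub-tree to the right of that separator, matching Key_{i-1} ≤ X < Key_i.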
After the insertion of 20 into the tree, a rebalancing activity with a node split occurs, where new and affected nodes are marked in gray, as shown in Fig. 3b. New internal and leaf nodes are created. The insertions of 30 and 25 trigger rebalancing again, as shown in Fig. 3c and d. The frequent rebalancing activities clearly result from insertions of data with strong access locality. Even though there are only four nodes with four pieces of data initially, the insertion of three more pieces of data results in two rebalancing operations with the creation of three additional leaf nodes and three additional internal nodes. As astute readers might point out, deletions or modifications of data with strong access locality would also result in a similarly serious rebalancing problem. Such problems could be a serious threat to many main-memory database applications that need restricted and predictable system performance. This observation motivates this work to propose a proper methodology with manipulation algorithms to resolve the overhead problems of many main-memory database applications with strong access locality.

Fig. 2. A B+-tree of order 3.

Fig. 3. Data insertion in a conventional B+-tree.


2.2. Related work

In this section, we review prior work, including the well-known memory index structures, bulk loading methods, batch insertion methods, and range-based deletion methods. The T-tree is a widely used index structure for main-memory database systems [22], where the tree reorganization process of a T-tree is similar to that of an AVL tree, with operations such as LL and LR rotations. The major difference between a T-tree and an AVL tree is that each node of a T-tree, namely a T-node, contains more than one datum. The left pointer of a T-node points to a child node with keys less than the minimal key value of the T-node; its right pointer points to a child node with keys larger than the maximal key value of the T-node. Because a T-tree must maintain tree balance, tree reorganization overhead becomes serious when the inserted/deleted data have a skewed distribution. We include the T-tree index structure as one of the comparison targets in the experiments.

To reduce the overhead of tree reorganization, bulk loading for data insertion is a popular technique. Sort-based and buffer-based bulk loading methods are two popular strategies [4]. Sort-based methods first sort the insertion data and, when applied to a B+-tree, build an index tree for the sorted data in a bottom-up manner. Sorting cost is an issue, especially for large amounts of insertion data. Buffer-based methods use memory buffers to store new or updated data for internal nodes except the root node. When a memory buffer is full or reaches a threshold, the data in the buffer are pushed down to the next level of the tree. The buffer-based method makes good use of memory space to defer data changes and avoid frequent disk writing. Our proposed batch insertion algorithm shares the idea of inserting data efficiently in bulk.
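The T-node layout described above can be sketched as follows (a minimal, hypothetical rendering for illustration only; names are our own, and rebalancing rotations are omitted):

```python
class TNode:
    """T-tree node: holds several keys; the left child sub-tree holds keys
    smaller than min(keys) and the right child holds keys larger than max(keys)."""
    def __init__(self, keys):
        self.keys = sorted(keys)
        self.left = None          # sub-tree with keys < self.keys[0]
        self.right = None         # sub-tree with keys > self.keys[-1]

    def search(self, key):
        # Descend left or right only when the key falls outside this
        # node's [min, max] range; otherwise it must be in this node.
        if key < self.keys[0]:
            return self.left.search(key) if self.left else None
        if key > self.keys[-1]:
            return self.right.search(key) if self.right else None
        return key if key in self.keys else None
```

Because each T-node bounds an entire key range, a lookup touches at most one node per level, which is why a T-node holding several keys needs only two pointers.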
Different from the sort-based bulk loading methods, our proposed batch insertion method only needs to sort small pieces of data rather than all insertion data. The contiguous pieces of sorted data are then quickly built into an auxiliary tree. This design greatly reduces the sorting cost for large volumes of data. Unlike the buffer-based bulk loading methods, the proposed batch insertion method collects large amounts of insertion data into an auxiliary tree, and the auxiliary tree is later merged directly with the main index tree, rather than adjusting the index tree level by level. Some studies [27,28] design multiple indexes that buffer newly inserted data in an index tree stored in high-performance storage, e.g., main memory, and merge it with another index tree. For example, the method of buffering in B-tree partitions [11] (which combines multiple partitions into one) is very similar to a merge sort. The basic LSM-tree (Log-Structured Merge-Tree) [28] is designed for transactional log records and is a disk-based index structure with two index trees: one resident on disk and the other resident in memory. A rolling merge process [28] is proposed to combine the changes stored in the memory index tree into the disk index tree, in a way similar to a merge sort, when the memory index tree reaches a threshold size. The rolling merge process starts from the leftmost sequence of entries in the memory index tree and reorganizes them into a leaf node of the disk index tree. In order to reduce disk I/O cost, a memory buffer stores successive leaf nodes in the pages of a buffer block until the block is full; then the block is written back to disk. Different from the LSM-tree, our target index is based on the B+-tree and focuses on main memory. LHAM (the Log-structured History data Access Method) [27] is an index structure for transaction-time databases.
The idea of LHAM is to divide the time domain into successive intervals and assign each interval to a separate storage component, such as main memory (storing the most recent records), disks, or optical disks. When a component becomes full, data are migrated to the next component, and a rolling merge process, which follows the idea of the LSM-tree [28], is performed. A cursor [27] is used to compare the keys of the to-be-merged trees, and then some record versions are moved to the target component. This merging behavior is also similar to a merge sort. Block buffering is applied to reduce I/O operations and disk overheads in terms of seeks and rotational delays. Our proposed method shares the idea of tree merging with this prior work [11,27,28]. However, there are fundamental differences between our proposed method and the past work, e.g., [11,27,28]. Firstly, the high-performance requirements and good memory space usage of main-memory database systems are the goals of this work, rather than reducing I/O operations and disk overheads; we focus our study on main-memory-resident index trees. Secondly, the methods for building an auxiliary tree and the methodology for merging two index trees are quite different. The works in [11,27,28] did not especially focus on how to insert data into the memory index tree (a B+-tree), and their merging operation is similar to a merge sort. Unlike the method for building a conventional B+-tree, our auxiliary tree is built through the proposed pile-up/bound operations, which create fewer nodes and result in less tree rebalancing. Moreover, our method uses the largest possible sub-tree of the auxiliary tree as the unit for merging and merges it directly with the main index tree in a top-down order. No additional structure is needed while the index trees are being merged. The proposed methods could eliminate tremendous data insertion and deletion overheads due to tree rebalancing when strong access locality of data is considered.
Different from the past work in batch deletions, e.g., [9,29,33], where a number of leaf nodes are deleted or modified before a tree reorganization is done, our range-based deletion algorithm does not need rebalancing for every data deletion in a node or a sub-tree. Instead, all the data in the deletion area are deleted at once, and rebalancing is then done only once.

3. An auxiliary-tree approach for B+-trees

3.1. Overview

In this section, we present the proposed auxiliary-tree-based approach to maintain the results of insertions with strong access locality as an auxiliary B+-tree, and later merge the auxiliary tree into the B+-tree under consideration, as shown in


Fig. 4. A to-be-merged auxiliary tree and a B+-tree.

Fig. 4. In the example, an auxiliary tree T2, which is also a B+-tree, is merged into the B+-tree T1, where all the keys of T2 are larger than those of the sub-tree T13, possibly requiring a new internal node in the merging process. The example illustrates the concept of using a (sub-)tree as the unit for merging in our proposed batch insertion algorithm, instead of batch-merging one or several leaf nodes at a time. A batch insertion algorithm is proposed in Section 3.2 as an efficient method for building an auxiliary tree and for merging the auxiliary tree into a B+-tree using the largest possible sub-tree of the merged trees. The batch insertion algorithm should first locate the correct position in the target B+-tree and then merge the auxiliary tree into the final B+-tree. The rationale behind the algorithm is to find a proper internal node with a height equal to the height of the auxiliary tree for merging, where the height of a node is the number of nodes from the node to a leaf, including itself. Moreover, the merging of the auxiliary tree into a B+-tree should consider the keys of the trees. In particular, the minimal key of the auxiliary tree is added to the parent node of the selected node (under consideration for merging) of the B+-tree. In some cases, the auxiliary tree might need to be partitioned into sub-trees in the merging process. A range-based deletion algorithm is presented later in Section 3.3 for bursty deletions with strong access locality. The main idea of the proposed algorithm is to separate the keys to be deleted from the tree and remove them in a batch, so as to avoid excessive rebalancing overheads. The remaining keys after a deletion might form more than one (sub-)tree, and the trees can be merged by the proposed tree-merging algorithm described above. As a comparison, consider the insertions of 20, 25, and 30 into the B+-tree shown in Fig. 3a.
The insertions of 20, 25, and 30 by the proposed batch insertion method result in an auxiliary B+-tree, as shown in Fig. 5a. In the example shown in Fig. 5a, the auxiliary tree has the same height as the B+-tree in Fig. 3a. An internal node, i.e., the root in this example, is created for the proposed batch insertion, as shown in Fig. 5b. Compared with Fig. 3d, fewer additional nodes are created in the batch insertion, and no rebalancing is experienced. To illustrate an example of data deletion, let the keys in T12 of Fig. 4 be deleted. The remaining keys, such as those of T11 and T13, will form more than one tree, and merging of trees would be needed. When there is no ambiguity, we refer to any deletion/insertion/modification of data as that of their corresponding keys for the rest of this paper.

3.2. An auxiliary tree and batch insertion

In this section, we present the auxiliary-tree idea and its associated algorithms for batch insertions. In Section 3.2.1, the batch insertion algorithm is presented, where an auxiliary tree is created from a collection of arriving insertions. The pile-up and bound algorithms, which are presented in Section 3.2.2, are invoked when the auxiliary tree needs to be merged into a B+-tree.

3.2.1. Batch-insertion algorithm

In the batch insertion algorithm, new data are first inserted into an auxiliary B+-tree. The main steps are to classify data into groups, and then pile up or bound these groups into an auxiliary B+-tree in a bottom-up manner. Next, the main and the auxiliary B+-trees are merged into a final B+-tree by the same bound operation. Before proceeding with further discussion, two notations are defined: Queue_i denotes the ith queue, which is an array for storing B+-trees, and Queue_ij denotes the jth B+-tree in the ith queue. As shown in Algorithm 1, insertion requests are stored in a data buffer upon their arrival, and an auxiliary tree will be constructed by the algorithm for the buffered requests.
Note that the invocation time of the batch insertion algorithm could

Fig. 5. An auxiliary tree and result built by the proposed batch insertion method.


be determined based on the application characteristics, or simply triggered when the buffer is full. Firstly, the buffered data are sorted and grouped into several leaf nodes (lines 2-3). Each leaf node is filled with the maximum number of data, i.e., P - 1 data for a tree of order P. This maximizes space utilization and reduces the number of nodes. The data pointers in a leaf node are linked to the physical storage space of the data records with their corresponding key values. Each leaf node can be regarded as an independent B+-tree. Subsequently, the leaf nodes are combined to form the auxiliary tree.

Algorithm 1. The Batch-Insertion algorithm
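The sort-and-pack step (lines 2-3 of Algorithm 1) can be sketched as follows; `build_leaves` is a hypothetical helper name, and the sample buffer is the 19-key input used in the Fig. 6 example:

```python
def build_leaves(buffered_keys, order):
    # Sort the buffered insertions and pack them into leaf nodes holding the
    # maximum of order - 1 keys each; only the last leaf may be partially
    # filled. Each resulting leaf is an independent one-node B+-tree.
    keys = sorted(buffered_keys)
    cap = order - 1
    return [keys[i:i + cap] for i in range(0, len(keys), cap)]

# The 19 buffered keys of the Fig. 6 example, for a tree of order 3:
buffer = [8, 15, 20, 1, 45, 27, 36, 41, 44, 31, 19, 37, 24, 11, 23, 14, 39, 29, 5]
leaves = build_leaves(buffer, 3)
```

Note that 19 keys at 2 keys per leaf yield nine full leaves plus one leaf containing only key 45, matching the ten trees T1 to T10 of Fig. 6a.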

There are two core algorithms (also referred to as operations), called pile-up and bound, in the batch insertion algorithm for constructing the auxiliary tree and combining it with the main B+-tree. V B+-trees in Queue_1 are piled up or bound into a single B+-tree, where V is less than or equal to P (line 6) and is denoted as a processing unit in each iteration (inside the for loop of line 5). When the heights of all the V B+-trees are the same, the pile-up algorithm is used to combine these trees (lines 9-12); otherwise, the bound algorithm is used (line 15). These two algorithms are presented in Section 3.2.2. The batch insertion algorithm repeats the pile-up and bound operations until all B+-trees are combined into one (lines 4-19). This final B+-tree is the auxiliary tree built for the buffered data. Lastly, it is combined with the main B+-tree by using the bound algorithm (line 23). It is easy to notice that the number of pile-up operations is far greater than the number of bound operations, because most B+-trees have the same height during the construction process.

3.2.2. Pile-up and bound algorithms

When the heights of the n B+-trees are the same, the pile-up operation (Algorithm 4) is used to combine the trees. An internal node is created for all the n B+-trees (line 4). Key_i of the internal node is set to the key value of the leftmost leaf node


of the (i+1)th B+-tree, and accordingly, Ptr_i is linked to the ith B+-tree (lines 7-9). The new tree produced by the pile-up algorithm satisfies the key constraints of the B+-tree, which implies that the tree built by pile-up operations is still a B+-tree. The bound operation (Algorithm 6) is used to combine two B+-trees of different heights. There are two steps in the bound operation: the first is to search for an initial bounding node in the taller B+-tree (lines 3-7), and the second is to combine the two trees (lines 8-34). For convenience of description, the taller and the shorter B+-trees are called T1 and T2, respectively. The minimal key value of T2 is the key to be searched for in T1. The search for the initial bounding node starts at the root node of T1, which is recursively traversed downwards from the top until the height of a child node in T1 is the same as the height of T2 (lines 3-7). This child node of T1 is the initial bounding node. The initial bounding node of T1 has two properties: (1) it has the same height as T2; (2) if there is no key intersection between T2 and the sub-tree rooted at the initial bounding node of T1, T2 can be combined with T1 directly; otherwise, T2 cannot be combined with T1 directly, and each of the sub-trees of T2 has to perform the bound operation recursively. If there is no intersection, T2 can be directly plugged into the initial bounding node of T1 (lines 8-21): the minimum key value of T2, which is the first key value of the leftmost leaf node of T2, is inserted into the parent node of the initial bounding node of T1. If there is no overflow, a pointer to T2 is created in this parent node (line 17). Otherwise, splitting is repeatedly triggered until an upper node has enough space to store the promoted key value (line 19). This splitting can propagate all the way up to create a new node and hence a new level for the tree.
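The pile-up step described above can be sketched with a hypothetical dict-based tree representation (not the paper's actual structures), in which each tree caches its minimum key and height:

```python
def make_leaf(keys):
    # A leaf node doubles as an independent B+-tree of height 1.
    return {'keys': list(keys), 'children': None,
            'min': keys[0], 'height': 1}

def pile_up(trees):
    # Combine same-height B+-trees under one new internal node:
    # Key_i is the minimum key of the (i+1)th tree; Ptr_i links the ith tree.
    assert len({t['height'] for t in trees}) == 1, "pile-up needs equal heights"
    return {'keys': [t['min'] for t in trees[1:]],
            'children': trees,
            'min': trees[0]['min'],
            'height': trees[0]['height'] + 1}

# Piling up T1..T3 of Fig. 6a yields an internal node with keys {8, 14},
# as in node N1 of Fig. 6b:
n1 = pile_up([make_leaf([1, 5]), make_leaf([8, 11]), make_leaf([14, 15])])
```

Since no keys are moved and no nodes are split, a pile-up creates exactly one new node per n combined trees, which is the source of its efficiency over per-key insertion.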
This behavior is the same as that of a key insertion in a conventional B+-tree, whether or not a split happens. Therefore, the tree built by the bound operation with no key intersections is still a B+-tree. If the keys of T2 are distributed throughout the sub-tree rooted at the initial bounding node of T1, a bound operation is performed on each of the child nodes of T2 (lines 23-26). This may occur recursively, down to the leaf level of T2. In this case, each individual datum in a leaf node of T2 is independently inserted into T1, which resembles a conventional B+-tree insertion (lines 28-30). The new tree, which combines two sub-trees through a bound operation with key intersections, still follows the key constraints of the B+-tree and is thus also a B+-tree. All in all, the bound algorithm also returns a B+-tree.

To provide better performance, we should allow all user transactions to query the main index and the auxiliary tree while they are being merged, and a concurrency control policy is required to guarantee consistent data access. Based on the proposed batch insertion method, an exclusive write lock/latch is held only on the parent node of the bounding node of the main index tree (referred to as the P-node) while it and the auxiliary tree are being merged. If the minimum key of the auxiliary tree is inserted into the P-node and an overflow happens, the exclusive write lock/latch will be held on all nodes on the path from the root to the P-node to ensure consistent data access, because of the potential for node splitting. If there is no key intersection between the main index tree and the auxiliary tree, a shared read lock/latch is held on the root node of the auxiliary tree; otherwise, the shared read lock/latch is held on the root node of each sub-tree. Fig. 6 shows a simple example to illustrate the batch insertion algorithm.
The new input buffered data {8, 15, 20, 1, 45, 27, 36, 41, 44, 31, 19, 37, 24, 11, 23, 14, 39, 29, 5} have to be inserted into a B+-tree of order 3. In Fig. 6a, ten B+-trees (T1 to T10) are created and filled with the sorted data. Note that the nodes in B+-trees T1 to T9 are full. Next, the pile-up operation is executed to create three internal nodes linking T1 to T9, since their heights are the same. The ith key of a newly created internal node is the minimum key value of its (i+1)th child sub-tree. For instance, in Fig. 6b, the first key value 8 of internal node N1 is the minimum key value of its second child sub-tree. Every three B+-trees are piled up to form a new B+-tree; the last B+-tree T4 is kept for a later operation. Similarly, the three B+-trees in Fig. 6b are piled up to form the new B+-tree shown in Fig. 6c. Lastly, the remaining two B+-trees T1 and T2 in Fig. 6c have different heights; therefore, the bound operation is performed. The comparison key value of T2 is 45, and the initial bounding node of T1 is the node Point. Since there is no intersection between T2 and the sub-tree rooted at node Point of T1, T2 can be combined with T1 directly by inserting the minimum key value 45 into the

Fig. 6. Examples of a batch data insertion.



Fig. 7. An example for batch insertion with key intersections.

parent node of node Point. Since the parent node of node Point has insufficient space to store key 45, the parent node is split, and the middle key 41 of the parent node is promoted to the root. However, the root node is also full, so splitting is performed again. A new root node is created to store the promoted key 31. Fig. 6d shows the final auxiliary tree built for the buffered data. From this example, we can observe the ease of constructing the auxiliary tree.

Fig. 7 is used to illustrate why an auxiliary B+-tree cannot always be combined with the existing main index tree directly. This situation happens when key intersections appear between the main index tree and the auxiliary B+-tree. In the example, two sub-trees of the auxiliary index tree have to perform the bound operation separately, and one sub-tree requires going down to the leaf level to perform the bound operation for each individual leaf node. Fig. 7a is the main index tree, and Fig. 7b is the auxiliary B+-tree that needs to be merged into the main tree. Because the data in tree (b) intersect the data in tree (a), tree (b) cannot be merged into tree (a) immediately, and each sub-tree of tree (b) needs to be bound into tree (a) recursively. Sub-tree (b1)/sub-tree (b3) in Fig. 7b can be combined into the left/right part of tree (a) directly because there is no key intersection. The leaf nodes of the sub-tree led by node {210, 395} (shown in (b2) of Fig. 7b) have key intersections with tree (a). Hence, the bound operations are performed down to the lowest level for each leaf node. Leaf nodes {160, 170} and {210, 220} can still be combined into tree (a) directly; however, leaf node {395, 420} cannot. Finally, keys 395 and 420 are individually inserted into tree (a). Fig. 7c shows the resulting tree of combining tree (a) and tree (b).

3.3. A range-based deletion algorithm

In many main-memory database systems, such as those for logistical systems, selected data might become obsolete or need to be deleted after some period of time, because they have already been moved to and stored at back-end systems. Such data often have strong access locality, such as RFID tags read by some readers within a certain period of time. In this section, a range-based deletion algorithm is proposed, with the help of the batch insertion algorithm, to reduce potential rebalancing overheads. Note that each per-key deletion in a B+-tree might result in the borrowing of keys from neighboring nodes to the left or right of an affected node, or the merging of an affected node with a neighboring one. Any parent nodes of affected nodes might need to be revised in a similar way. Bursty deletions with strong access locality would very likely generate extremely high overheads in tree rebalancing. The proposed range-based deletion algorithm (i.e., Algorithm 2) is designed to perform the deletion process efficiently for a range of data stored in the tree structure. Large amounts of data are deleted in a batch manner along the boundaries (referred to as searching paths) that partition the tree into deletion and non-deletion areas. Moreover, only the affected nodes on the boundaries of the deletion and non-deletion areas are revised, rather than revising every affected node after deleting one datum or node. First, it is necessary to determine the deletion area. Two parameters are used to specify the deletion range: startVal and endVal, which represent the start and end key values, respectively, of the range to be deleted. All keys between these two parameters need to be deleted. If the value of startVal is null, the left part of the tree, whose keys are less than or equal to endVal, will be deleted.
Similarly, if endVal is null, the right part of the tree, i.e., all keys greater than or equal to startVal, will be deleted. The next step of the range-based deletion algorithm is to find the searching paths from the root node to the leaf nodes where startVal and endVal are located (lines 2–3); the nodes on these paths are called searching nodes. The searching paths are then traversed backward, and nodes in the deletion area are deleted and reorganized level by level (lines 4–19). The reorganization steps (lines 7, 14) of the range-based deletion algorithm are similar to those of the conventional B+-tree: the DelReorganize algorithm (Algorithm 5) handles key borrowing or node merging with a neighboring node. The major differences lie in how node deletion and space recycling are handled. The range-based deletion algorithm


deletes data and recycles space in a batch manner. At each level of the searching paths, the child nodes rooted at the searching node and lying in the deletion area are deleted from the leaf level upwards in depth-first-search order. Then, the keys and pointers in the searching node are adjusted. The detailed space-recycling procedure is shown in Algorithm 3. We use the following examples to illustrate the proposed algorithms. Suppose that we delete the data in the left part of the tree in Fig. 8. The searching path in Fig. 8 is {D, B, A}, and it is traversed backward along {A, B, D} to perform the deletions. The data in leaf node A are deleted first (see Fig. 8a).

Algorithm 2. The Range-based Deletion Algorithm
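The core idea — dropping whole sub-trees that lie entirely inside the deletion range in one step, while visiting only the nodes on the two boundary searching paths — can be sketched as follows. This is a simplified model of ours, not the paper's Algorithm 2: key borrowing, node merging, and space recycling are omitted, and internal nodes carry no separator keys. A null startVal or endVal opens the range on that side, matching the semantics described above.

```python
class Node:
    """Toy tree node: a leaf holds sorted keys; an internal node holds children."""
    def __init__(self, keys=None, children=None):
        self.keys = keys or []
        self.children = children or []

    @property
    def is_leaf(self):
        return not self.children

    def min_key(self):
        return self.keys[0] if self.is_leaf else self.children[0].min_key()

    def max_key(self):
        return self.keys[-1] if self.is_leaf else self.children[-1].max_key()


def delete_range(root, start_val=None, end_val=None):
    """Delete all keys in [start_val, end_val]; None means an open bound."""
    lo = float("-inf") if start_val is None else start_val
    hi = float("inf") if end_val is None else end_val
    return _delete(root, lo, hi)


def _delete(node, lo, hi):
    if node.max_key() < lo or node.min_key() > hi:
        return node                      # outside the deletion area: untouched
    if lo <= node.min_key() and node.max_key() <= hi:
        return None                      # whole sub-tree in deletion area: drop at once
    if node.is_leaf:
        node.keys = [k for k in node.keys if k < lo or k > hi]
        return node if node.keys else None
    # An internal node on a boundary searching path: recurse only where needed.
    node.children = [c for c in (_delete(c, lo, hi) for c in node.children)
                     if c is not None]
    return node.children[0] if len(node.children) == 1 else node


def all_keys(node):
    """In-order key listing, for inspection."""
    if node is None:
        return []
    if node.is_leaf:
        return list(node.keys)
    return [k for c in node.children for k in all_keys(c)]
```

For example, deleting the range [3, 6] from a two-level tree with leaves {1, 2}, {3, 4}, {5, 6}, {7, 8} drops the two middle leaves in one step each and never descends into the outer leaves.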



Fig. 8. An example of range-based data deletion and space recycling.

Assume that the node is not empty after the deletion. Node B is the next node to be processed. Since node C is its only child node located in the deletion area, all the data of node C are deleted, and the keys and pointers in searching node B are updated accordingly (see Fig. 8c). Node D is the last searching node, and its child node E is located in the deletion area. Before node E is deleted, a depth-first search deletes its three child nodes F, G, and H first. Fig. 8d shows the resulting tree. Two boundary nodes, startBNode and endBNode, record the leading live nodes along the searching paths (lines 8–10, 15–17) during data deletion. If startVal or endVal is null, startBNode or endBNode is initialized to null accordingly. Otherwise, startBNode initially points to the leaf node containing the largest key smaller than startVal, and endBNode initially points to the leaf node containing the smallest key greater than endVal. If the nodes along a searching path are not empty after deletion at a certain level, the boundary nodes are moved upward towards the root. Finally, the sub-trees rooted at the boundary nodes remain after deletion (lines 30–33). If the deleted range is clustered in the leftmost or rightmost part of the tree, only one boundary node operates, and one sub-tree remains after deletion. If the range lies in the middle of the tree, two boundary nodes operate, and two cases may occur: either both boundary nodes point to the same node and only one sub-tree remains (lines 23–25), or two sub-trees are left and the bound algorithm combines them into the resulting tree (lines 26–28). Fig. 9 gives further examples by deleting different key ranges from the tree in Fig. 6d. Fig. 9a and b shows the resulting trees after deleting keys between 1 and 20 (the left sub-trees) and keys between 37 and 45 (the right sub-trees) from the B+-tree in Fig. 6d, respectively. Fig. 9c and d shows the resulting trees after deleting keys between 27 and 36 and keys between 23 and 29, respectively. Note that these two batch deletions remove leaf nodes in the middle of the B+-tree, so the tree is partitioned into sub-trees. In Fig. 9c, the bound algorithm combines the two sub-trees of different heights; in Fig. 9d, the two sub-trees of the same height are combined by the pile-up algorithm. It is worth mentioning that the tree produced by the range-based deletion algorithm is still a B+-tree. When a large number of data in one part of the tree are deleted, the range-based deletion algorithm avoids heavy reorganization by deleting the data in a batch manner; only a few nodes along the searching paths have to be adjusted.
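The rule illustrated by Fig. 9c and 9d can be stated as a tiny model (ours, not the paper's code): two remaining sub-trees of equal height are joined by a pile-up operation, while sub-trees of different heights are joined by a bound operation. Heights are abstracted to integers here; the worst case of a bound operation may add one extra level through a root split, which this sketch ignores.

```python
def combine(height_a, height_b):
    """Return (operation, resulting height) for joining two remaining sub-trees."""
    if height_a == height_b:
        return ("pile-up", height_a + 1)       # a new parent node links both roots
    return ("bound", max(height_a, height_b))  # the shorter tree merges into the taller
```

So deleting a middle range that leaves sub-trees of heights 3 and 2 (as in Fig. 9c) triggers a bound, while heights 2 and 2 (as in Fig. 9d) trigger a pile-up.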

Fig. 9. Examples of range-based data deletion.


3.4. Properties

Theorem 1. A tree built by the proposed batch insertion algorithm is a B+-tree.

Proof. The proposed batch insertion algorithm has two major operations: the pile-up and bound operations. From Section 3.2.2, we know that the tree built by the pile-up and bound operations, with or without key intersections, is a B+-tree. Hence, the tree built by the batch insertion algorithm is a B+-tree. □

Lemma 1. When a tree is being built for a large amount of new input data, the B+-trees in a processing unit have either one or two distinct heights.

Proof. In the first iteration of the while loop (line 4 in Algorithm 1), i.e., at the leaf level of the B+-tree, all the leaf nodes in a processing unit (which contains at most P trees) have the same height, so there is one distinct height at the leaf level. When the number of leaf nodes is a multiple of P, all the leaf nodes can be piled up into new trees by pile-up operations, and the resulting trees of the first iteration have the same height. If the number of leaf nodes is not a multiple of P and the remainder of the number of leaf nodes divided by P equals 1, the last leaf node remains unprocessed; the preceding nodes, whose number is a multiple of P, can still be piled up into new trees of the same height, so the new trees after the first iteration have two distinct heights. On the other hand, if the number of leaf nodes is not a multiple of P and the remainder is larger than 1, all leaf nodes can still be piled up entirely, and the new trees after the first iteration have one distinct height. Afterward, at the beginning of each new iteration of the while loop of Algorithm 1, the trees in a processing unit were generated by pile-up operations, by bound operations, or were left unprocessed from the previous iteration. Assume there are M trees in a processing unit, where M is less than or equal to P. The M trees are divided into two groups: the first (M − 1) trees and the last one. Two cases are possible. (1) When (M − 1) is larger than or equal to 2, the first (M − 1) trees were produced by pile-up operations of the last iteration, so their heights are the same. If the height of the last tree equals that of the first (M − 1) trees, there is one distinct height in the processing unit; otherwise, there are two. (2) When (M − 1) is less than 2, M is less than or equal to 2, i.e., there are at most two trees in the processing unit, whose heights are either identical or different. In both cases, there are at most two distinct heights in a processing unit. □

Theorem 2. Let D denote the number of insertion data and P the order of the B+-tree. Let N = ⌈D/(P − 1)⌉ denote the number of leaf nodes created for storing the D insertion data. When an auxiliary B+-tree is built for the D insertion data, the total numbers of executions of the pile-up and bound operations are TP_1(N) and TB_1(N), respectively:

$$
TP_i(K)=
\begin{cases}
0 & \text{if } K=1\\
\lfloor K/P \rfloor + TP_{i+1}(\lceil K/P \rceil) & \text{if } K>1,\ (K \bmod P)=1\\
\lceil K/P \rceil + TP_{i+1}(\lceil K/P \rceil) & \text{if } K>1,\ (K \bmod P)\neq 1
\end{cases}
\tag{1}
$$

$$
TB_i(K)=
\begin{cases}
0 & \text{if } K=1\\
1 + TB_{i+1}(\lceil K/P \rceil) & \text{if } K>1,\ (K \bmod P)=1\\
TB_{i+1}(\lceil K/P \rceil) & \text{if } K>1,\ (K \bmod P)\neq 1
\end{cases}
\tag{2}
$$

Proof. From Lemma 1, there are at most two distinct tree heights in a processing unit. When the number of trees in a processing unit is a multiple of P, or the remainder of that number divided by P is larger than 1, the trees of the same height can be piled up into a new tree; otherwise, the single remaining sub-tree has to be bound with the previously processed tree. Whether the heights of the remaining sub-tree and the previously processed sub-tree are identical or different, the worst case for combining the two trees is handled by the bound algorithm; when the height of the previously processed sub-tree's initial bounding node equals that of the remaining tree's root node, the bound algorithm degenerates to a pile-up operation. The pile-up and bound operations are repeated until one complete B+-tree is built, and the formulas above count the total numbers of pile-up and bound operations. For example, for 75 insertion data (D = 75) and a B+-tree of order 4 (P = 4), the proposed batch insertion algorithm performs nine pile-up operations and one bound operation in total. □
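The recurrences of Eqs. (1) and (2) are directly executable; the sketch below (function names are ours) reproduces the worked example from the proof.

```python
from math import ceil

def tp(k, p):
    """Eq. (1): number of pile-up operations for k trees of B+-tree order p."""
    if k == 1:
        return 0
    nxt = ceil(k / p)                 # trees surviving to the next level
    if k % p == 1:
        return k // p + tp(nxt, p)    # the single leftover tree is bound, not piled
    return nxt + tp(nxt, p)

def tb(k, p):
    """Eq. (2): number of bound operations for k trees of B+-tree order p."""
    if k == 1:
        return 0
    nxt = ceil(k / p)
    if k % p == 1:
        return 1 + tb(nxt, p)         # the leftover tree triggers one bound
    return tb(nxt, p)

# Worked example from the proof: D = 75, P = 4, so N = ceil(75 / 3) = 25
# leaf nodes, giving nine pile-up operations and one bound operation.
```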


Theorem 3. Let D denote the number of insertion data and P the order of the B+-tree. Let N = ⌈D/(P − 1)⌉ denote the number of leaf nodes created for storing the D insertion data. When an auxiliary B+-tree is built for the D insertion data, the total number of nodes created is TN, where PN_1(N) is the number of nodes created by the pile-up operations, BN_1(N) is the number of nodes created by the bound operations in the worst case, and H_1(N) records the minimum height of the trees at which a bound operation occurs:

$$
TN = N + PN_1(N) + BN_1(N)
\tag{3}
$$

$$
PN_i(K)=
\begin{cases}
0 & \text{if } K=1\\
\lfloor K/P \rfloor + PN_{i+1}(\lceil K/P \rceil) & \text{if } K>1,\ (K \bmod P)=1\\
\lceil K/P \rceil + PN_{i+1}(\lceil K/P \rceil) & \text{if } K>1,\ (K \bmod P)\neq 1
\end{cases}
\tag{4}
$$

$$
BN_i(K)=
\begin{cases}
i - H_1(N) + 1 & \text{if } K=1\\
BN_{i+1}(\lceil K/P \rceil) & \text{if } K>1
\end{cases}
\tag{5}
$$

$$
H_i(K)=
\begin{cases}
i & \text{if } K=1 \text{ or } (K \bmod P)=1\\
H_{i+1}(\lceil K/P \rceil) & \text{if } K>1,\ (K \bmod P)\neq 1
\end{cases}
\tag{6}
$$

Proof. The pile-up operation creates one internal node to link the sub-trees of identical height in a processing unit; therefore, the number of nodes created by pile-up operations equals the number of pile-up operations. The worst case of the bound operation occurs when every node on the path from the parent of the initial bounding node in the main sub-tree up to its root has insufficient space to store the split keys. If more than one bound operation happens, the maximum number of nodes created during the bounding process equals the difference between the height of the main sub-tree's root and that of the to-be-merged sub-tree's root. When the root of the main sub-tree is split into two nodes, one more internal node is created as the new root, so the number of nodes created by the bound operations is increased by one. The total number of created nodes (TN) is the sum of the leaf nodes and the internal nodes created by the pile-up and bound operations. For example, for 75 insertion data (D = 75) and a B+-tree of order 4 (P = 4), the batch insertion algorithm creates 38 nodes in total in the worst case. □
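Eqs. (3)–(6) can likewise be evaluated mechanically; the sketch below (helper names are ours) checks the 38-node example from the proof.

```python
def _ceil_div(a, b):
    return -(-a // b)                     # integer ceiling

def pn(k, p):
    """Eq. (4): nodes created by pile-up operations for k trees."""
    if k == 1:
        return 0
    nxt = _ceil_div(k, p)
    return (k // p if k % p == 1 else nxt) + pn(nxt, p)

def h(i, k, p):
    """Eq. (6): the lowest level at which a bound operation occurs."""
    if k == 1 or k % p == 1:
        return i
    return h(i + 1, _ceil_div(k, p), p)

def bn(i, k, p, h1):
    """Eq. (5): worst-case nodes created by bound operations."""
    if k == 1:
        return i - h1 + 1
    return bn(i + 1, _ceil_div(k, p), p, h1)

def tn(d, p):
    """Eq. (3): total nodes = leaves + pile-up nodes + bound nodes."""
    n = _ceil_div(d, p - 1)
    return n + pn(n, p) + bn(1, n, p, h(1, n, p))
```

For D = 75 and P = 4 this yields N = 25 leaves, 9 pile-up nodes, and 4 bound nodes, i.e., 38 nodes in total, matching the proof.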

4. Performance evaluation

4.1. Experimental setup

The proposed batch insertion and range-based deletion algorithms were evaluated by simulation against the insertion and deletion algorithms of the well-known B+-tree and T-tree index structures for main-memory databases. We also implemented a naive method, called the naive B+-tree batch insertion (NBBI) method. The NBBI method buffers the insertion keys in a sorted list; when the buffer is full, the keys are inserted into the tree in a batch. Keys that fall into the same node are inserted directly into that node with only one tree traversal (for the first inserted key), so the NBBI method does not traverse the tree for every key insertion; when a node has no room for a key, tree rebalancing occurs. All experiments were run on a personal computer with an Intel Core 2 Duo E8400 CPU at 3.00 GHz and 2 GB of RAM. The major performance metric is the average response time in milliseconds. We use an RFID-based logistical transportation management system as a main-memory-database example to illustrate the simulation inputs. RFID readers deployed at a warehouse entrance detect the tags of the products contained in a truck as the truck passes through the entrance. In practice, the tagIDs of the products in a single box of a truck usually have consecutive values, so the tagIDs detected by an RFID reader in a given interval of time are close to each other, due to the locality of the RFID tags of the products in the same box. As a result, the data buffered by each RFID reader in roughly the same interval of time are sent to the database for manipulation with strong locality. Similarly, data stored in the database might become obsolete and be removed from the main-memory database as time passes; such a manipulation pattern also implies strong locality in data deletion.
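The NBBI baseline described above can be sketched as follows, with the main index modelled as a plain sorted list (the class and method names are ours, not from the paper; in the paper the index is a B+-tree, where flushing sorted, clustered keys lets one traversal serve every key landing in the same leaf node).

```python
import bisect

class NBBIBuffer:
    """Sketch of NBBI buffering: keys accumulate in a sorted list and are
    flushed to the index in one batch when the buffer fills up."""

    def __init__(self, index, capacity=100):      # capacity mirrors BufferSize
        self.index = index                        # the (model) main index, kept sorted
        self.capacity = capacity
        self.buffer = []

    def insert(self, key):
        bisect.insort(self.buffer, key)           # keep the buffer sorted
        if len(self.buffer) >= self.capacity:
            self.flush()

    def flush(self):
        for key in self.buffer:                   # one batch flush, in key order
            bisect.insort(self.index, key)
        self.buffer = []
```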
The experiments have two parts: the first evaluates performance over data insertions (Section 4.2), and the second over data deletions (Section 4.3). The parameters adopted in the simulation experiments are summarized in Table 1. In the first part, the tagIDs of the insertion data in an RFID reader buffer could be either sorted or unsorted. When the tagIDs are unsorted, the response time of data insertions includes both the sorting time of the tagIDs and their batch insertion time. The parameter DataOrdered was set to true or false, depending on the sorting status of the tagIDs. Note that, when a B+-tree or T-tree was considered in the experiments (as a comparison study), the data in an RFID reader buffer are inserted without consideration of the sorting

Table 1
Experiment parameters.

Parameter | Description | Setting
P | The order of a B+-tree, i.e., the node size of a T-tree | 4, 32, or 128
BufferSize | The capacity of the RFID reader buffer | 100
TotalDataAmount | The total number of insertion data per simulation run | 1000 K, 3000 K, 5000 K, or 10,000 K
InsertPattern | The distribution of insertion data | Skewing in some left, right, or middle part of a tree, a random pattern, or a worst-case pattern
DataOrdered | The sorting of data in the RFID reader buffer | Sorted or unsorted
DeletePattern | The distribution of deletion data | Skewing in some left, right, or middle part of a tree
DeleteDataRatio | The ratio of the number of deletion data to the total number of data | 3%, 5%, 10%, 15%, or 20%

status. Several distributions of insertion data were considered. When skewing to some left, right, or middle part of a tree was tested (denoted LeftSkew, RightSkew, and MiddleSkew, respectively), data were inserted into that part of the tree in a very biased fashion. For the random pattern, denoted RandomSkew, it was assumed that the tagIDs between boxes might not be consecutive, even though the tagIDs within each box remained consecutive. A worst-case pattern, denoted AdhocSkew, was used to evaluate the proposed algorithms when the sensed tagIDs were scattered over a wide range and without any order, so that no restriction was imposed on where data would be inserted into the tree. Each insertion pattern covers different data ranges. To synthesize the data ranges for data insertion, we first define a baseNum as the beginning value of a data range. For the RightSkew/LeftSkew patterns, insertion data are skewed to the right/left part of the tree: a specific value equal to the parameter BufferSize is added to/subtracted from the baseNum, and the generated keys can be sorted or unsorted; the next baseNum for the RightSkew/LeftSkew pattern is then set to the maximum/minimum generated key. For the RandomSkew and MiddleSkew patterns, we generate an array (referred to as randomAry) of random values computed by a random function and an array (referred to as mediumAry) of values distributed towards the middle of a sequence, respectively. A value extracted from randomAry/mediumAry is multiplied by BufferSize to obtain the baseNum for the RandomSkew/MiddleSkew pattern; in the same way, BufferSize is added to the baseNum to form the insertion data ranges. The data of the AdhocSkew pattern are generated by a random function without any distribution locality, for each data insertion and across all insertion rounds. The insertion process repeats until the simulated total number of data is reached.

The second part of the experiments compares the proposed range-based deletion algorithm with the conventional deletion algorithms of the B+-tree and T-tree. For fairness of the comparison, the trees used for data deletion by the B+-tree and by the proposed algorithm were the same (both built by the proposed batch insertion algorithm); the trees used for data deletion by the T-tree were built by a T-tree data insertion algorithm. We considered different skewed patterns of data deletion: deletions in some left, right, or middle part of a tree, denoted LeftRange, RightRange, and MiddleRange, respectively (see the parameter DeletePattern). The total number of data in a tree, denoted TotalDataAmount, ranged from 1,000,000 to 10,000,000. Different ratios of the number of deleted data to the total number of data, denoted DeleteDataRatio, were tested: 3%, 5%, 10%, 15%, and 20%.

Fig. 10. Batch insertions with sorted tagIDs. [Figure: average response time (sec.) versus order P and total data amount for B^L-tree, B+-tree, T-tree, and NBBI, without Quick Sort.]

Fig. 11. Batch insertions with unsorted tagIDs. [Figure: average response time (sec.) versus order P and total data amount for B^L-tree, B+-tree, T-tree, and NBBI, with Quick Sort.]

4.2. Batch insertion

8500 8000 7500 7000 6500 6000 5500 5000 4500 4000 3500 3000 2500 2000 1500 1000 500 0

B^L-tree

B+-tree

Left

Ske

Rig

T-tree

w

htS

kew

NBBI

Mid

dleS

kew

Ran

Total Number of Nodes (thousand)

Total Number of Nodes (thousand)

This section provides performance evaluation of the proposed batch insertion algorithm with different input data configurations to provide insights on data insertions to main-memory databases. Fig. 10 shows the performance comparison of the

dom

800 750 700 650 600 550 500 450 400 350 300 250 200 150 100 50 0

B^L-tree

B+-tree

Left

Rig

htS

Ske

Ske

w

T-tree

w

kew

NBBI

Mid

dleS

kew

Ran

dom

Ske

w

Data Insertion Pattern

Data Insertion Pattern

Fig. 12. Memory space allocation for inserting 10,000,000 tagIDs.

Total Records:1,000,000

990

B^L-tree B+-tree T-tree

880 770 660 550 440 330 220 110

Total Records:3,000,000

5400

Average Response Time (ms.)

Average Response Time (ms.)

1100

0

4860

B^L-tree B+-tree T-tree

4320 3780 3240 2700 2160 1620 1080 540 0 0 -2 812 0 P= --2 32 P= -20 4- 15 P= 8-12 5 P= --1 32 P= -15 4- 10 P= 8-12 0 P= --1 32 P= -10 4- 5 P= 8-12 P= --5 32 P= -5 4- 3 P= 8-12 P= --3 32 P= -3 4P=

0 -2 812 0 P= --2 32 P= -20 4- 15 P= 8-12 5 P= --1 32 P= -15 4- 10 P= 8-12 0 P= --1 32 P= -10 4- 5 P= 8-12 P= --5 32 P= -5 4- 3 P= 8-12 P= --3 32 P= -3 4P=

Order P--Data Delete Percentage (%)

Total Records:5,000,000 B^L-tree B+-tree T-tree

6880 6020 5160 4300 3440 2580 1720 860 0

Total Records:10,000,000

19000

Average Response Time (ms.)

Average Response Time (ms.)

8600 7740

Order P--Data Delete Percentage (%)

17100

B^L-tree B+-tree T-tree

15200 13300 11400 9500 7600 5700 3800 1900 0 0 -2 812 0 P= --2 32 P= -20 4- 15 P= 8-12 5 P= --1 32 P= -15 4- 10 P= 8-12 0 P= --1 32 P= -10 4- 5 P= 8-12 P= --5 32 P= -5 4- 3 P= 8-12 P= --3 32 P= -3 4P=

0 -2 812 0 P= --2 32 P= -20 4- 15 P= 8-12 5 P= --1 32 P= -15 4- 10 P= 8-12 0 P= --1 32 P= -10 4- 5 P= 8-12 P= --5 32 P= -5 4- 3 P= 8-12 P= --3 32 P= -3 4P=

Order P--Data Delete Percentage (%)

Order P--Data Delete Percentage (%)

Fig. 13. A LeftRange deletion pattern over a B+-tree built with a LeftSkew insertion pattern.

340

P.-L. Suei et al. / Information Sciences 232 (2013) 325–345

proposed batch insertion algorithm with the conventional B+-tree, NBBI, and T-tree insertion algorithms, where the tagIDs of the insertion data in the RFID reader buffer were sorted. Trees processed by the proposed algorithm, NBBI, conventional B+tree and T-tree are labeled as BL-tree, NBBI, B+-tree and T-tree in the figures, respectively. With the batch insertion algorithm, the higher the order of the BL-tree (P value) was, the shorter the response time was. It was because a large capacity of a node could store more keys, fewer nodes were required in a tree. In other words, fewer executions of pile-up and bound operations were needed, and the response time was thus reduced. The response time increased with the increasing number of input data. Moreover, the performance of the proposed algorithm tended to be insensitive to data insertion patterns. In general, the proposed batch insertion algorithm outperformed the conventional B+-tree and T-tree algorithms for every BL-tree order value P. Compared to B+-tree, the proposed batch insertion algorithm had at least 19% in performance gain when the amount of insertion data was small. With the increasing amount of insertion data, the performance gain reached 76%. We observe that the average response time of data insertions of the proposed method outperforms that of the NBBI method under different data insertion patterns. This is because the NBBI method still requires rebalancing the tree when there is no room in the leaf node to insert a key, and large amounts of data insertions with strong access locality result in severe tree rebalancing and further increase the response time. On the contrary, the proposed method reduces the serious overheads of tree rebalancing especially when insertion data have strong access locality through the simple pile-up and bound operations. The NBBI method outperforms the conventional B+-tree because it is not necessary to traverse the tree for each key insertion. Fig. 
11 shows the performance comparison of the proposed batch insertion algorithm with the conventional B+-tree, NBBI, and T-tree insertion algorithms, where the tagIDs of the insertion data in the RFID reader buffer were unsorted. In other words, tagID sorting was needed before the execution of the proposed batch insertion algorithm, and the overheads was thus included in the experimental results. As shown in the figures, the major difference between Figs. 10 and 11 was that the average response time of the proposed algorithm (for BL-trees) in the latter figure was slightly longer than the corresponding one in the former figure. It was because the average response time of the latter case had to include the tagID sorting time. However, the sorting time was very limited because the size of the RFID reader buffer (i.e., BufferSize) was relatively small, compared to the total number of data (i.e., TotalDataAmount). As shown in Fig. 11, the conventional B+-tree algorithm also needed a lot of time to process unsorted tagIDs. Nevertheless, the proposed batch insertion algorithm still outperformed

Total Records:3,000,000

Total Records:1,000,000 990

5400 B^L-tree B+-tree T-tree

Average Response Time (ms.)

Average Response Time (ms.)

1100

880 770 660 550 440 330 220 110 0

4860

B^L-tree B+-tree T-tree

4320 3780 3240 2700 2160 1620 1080 540 0

0 -2 812 0 P= --2 32 P= -20 4- 15 P= 8-12 5 P= --1 32 P= -15 4- 10 P= 8-12 0 P= --1 32 P= -10 4- 5 P= 8-12 P= --5 32 P= -5 4- 3 P= 8-12 P= --3 32 P= -3 4-

P=

0 -2 812 0 P= --2 32 P= -20 4- 15 P= 8-12 5 P= --1 32 P= -15 4- 10 P= 8-12 0 P= --1 32 P= -10 4- 5 P= 8-12 P= --5 32 P= -5 4- 3 P= 8-12 P= --3 32 P= -3 4-

P=

Order P--Data Delete Percentage (%)

Order P--Data Delete Percentage (%)

Total Records:5,000,000

Total Records:10,000,000

7740

19000 B^L-tree B+-tree T-tree

Average Response Time (ms.)

Average Response Time (ms.)

8600

6880 6020 5160 4300 3440 2580 1720 860 0

17100

B^L-tree B+-tree T-tree

15200 13300 11400 9500 7600 5700 3800 1900 0

0 -2 812 0 P= --2 32 P= -20 4- 15 P= 8-12 5 P= --1 32 P= -15 4- 10 P= 8-12 0 P= --1 32 P= -10 4- 5 P= 8-12 P= --5 32 P= -5 4- 3 P= 8-12 P= --3 32 P= -3 4-

P=

0 -2 812 0 P= --2 32 P= -20 4- 15 P= 8-12 5 P= --1 32 P= -15 4- 10 P= 8-12 0 P= --1 32 P= -10 4- 5 P= 8-12 P= --5 32 P= -5 4- 3 P= 8-12 P= --3 32 P= -3 4-

P=

Order P--Data Delete Percentage (%)

Order P--Data Delete Percentage (%)

Fig. 14. A RightRange deletion pattern over a B+-tree built with a LeftSkew insertion pattern.

P.-L. Suei et al. / Information Sciences 232 (2013) 325–345

341

the conventional B+-tree, NBBI, and T-tree algorithms. Fig. 11e shows the performance results of the proposed batch insertion algorithm for the AdhocSkew pattern. It shows that such a pattern would require more time during the creation of a tree, and the proposed batch insertion algorithm still outperformed the conventional B+-tree, NBBI, and T-tree algorithms. The proposed batch insertion algorithm had more than 17% in performance gain, compared to the conventional B+-tree algorithm, even when the amount of insertion data was small. The performance gain could reach 72% when the amount of insertion data was slightly large. The performance gain of the proposed algorithm was a little lower than the results in the previous paragraph because of the extra overheads in the tagID sorting. In addition to the average response time, we also compare the memory space usage. Fig. 12 shows the memory space usage for inserting 10,000,000 tagIDs under different data insertion patterns and with P being equal to 4 or 32. The total number of nodes created by the batch insertion algorithm was much fewer than that of the conventional B+-tree algorithm but more than that of the conventional T-tree algorithm. The sole difference between the NBBI method and the conventional B+-tree is the total key amounts of each insertion, which is the major factor that influences the response time, but the other acts of these two methods are the same. Hence, the total numbers of nodes created by these two methods are the same. Based on Figs. 10 and 11, it was observed that the performance gain became better when P increased from 4 to 32, but the performance gain was not significant when P increased from 32 (about one-third of the RFID reader buffer size, i.e., BufferSize) to 128 (more than BufferSize). When P was equal to 4, a huge number of nodes was created. It implied the consumption of much more main memory space. 
Although a large order P could reduce the total number of created nodes, as shown in Fig. 12, there was no need for a very large order P. In other words, it is preferable to determine the order P based on the buffer size so as to strike a better compromise between memory space usage and performance.

4.3. Range-based deletion

This section evaluates the performance of data deletion under different input data configurations, such as the percentage of deleted data relative to the total amount of data. Specifically, Figs. 13–15 show the performance comparison of the proposed range-based deletion algorithm and the conventional B+-tree and T-tree algorithms with respect to the average response time. The experiments explored performance under different tree orders and different numbers of data items stored in an index tree. Different skewed deletion patterns were tested, i.e., deletions of keys in the left, middle, or right part of a tree.

[Fig. 15 comprises four panels, one per total record count (1,000,000; 3,000,000; 5,000,000; 10,000,000), each plotting the average response time (ms) of the B^L-tree, B+-tree, and T-tree against the order P and the data-deletion percentage.]

Fig. 15. A MiddleRange deletion pattern over a B+-tree built with a LeftSkew insertion pattern.

Fig. 13 shows the performance of the proposed range-based deletion algorithm when the index tree was built with a LeftSkew insertion pattern and data were removed in the LeftRange pattern. The average response time of the proposed range-based deletion algorithm stayed below 80 ms for a wide range of settings. In contrast, the average response time of the conventional B+-tree algorithm grew with the ratio of deleted data to the overall data. In addition, a small tree order P generally resulted in poor performance because such a tree had a large number of nodes. Moreover, a node with a small P value could underflow (i.e., become less than half full) quickly after consecutive data deletions, triggering tree adjustments and, consequently, longer response times. As the P value increased, the response time of the conventional B+-tree algorithm decreased accordingly, but it remained higher than that of the T-tree algorithm. The proposed deletion algorithm outperformed the conventional B+-tree and T-tree algorithms because data were deleted in a batch fashion. We should point out that the performance improvement came not just from the batch deletion of a group of leaf nodes; a proper set of nodes along the search paths of the deleted leaf nodes was also deleted in a depth-first, batch fashion. As a result, the response time of the proposed range-based deletion algorithm remained very low across all experimental results.

Figs. 14 and 15 show the experimental results where data were removed in the RightRange and MiddleRange data-locality patterns, respectively. The results were similar to those shown in Fig. 13. Experiments were also conducted over trees built with the MiddleSkew, RightSkew, and RandomSkew data insertion patterns, and the results were again similar; due to limited space, they are not included in this paper. In general, the proposed range-based deletion algorithm outperformed the conventional B+-tree and T-tree algorithms. The response time of the proposed algorithm remained very low over a wide range of parameter settings. In particular, the performance of the range-based deletion algorithm was affected by neither the insertion pattern used to build the tree nor the data deletion pattern. Generally speaking, the proposed algorithm was also better than the conventional B+-tree algorithm in memory space usage, saving 40% of the memory space. Although the proposed data insertion algorithm required more memory space than a T-tree algorithm did, the proposed insertion and deletion algorithms greatly outperformed the corresponding T-tree algorithms when the data insertions and deletions had strong access locality.

5. Conclusion

This work is motivated by the strong demands of main-memory database applications with bursty, locality-bound data manipulations. Even though the proposed methodology can also be applied to many other tree structures, B+-trees are selected as the target of this study because of their popularity in database implementations. In particular, we propose an auxiliary-tree approach that can efficiently handle a huge volume of data with strong access locality. Our objective is to reduce the tremendous overheads due to tree rebalancing, which is often observed in practice under bursty data access with locality.
A batch insertion algorithm with tree-merging support is proposed to buffer insertion keys in an auxiliary tree for later merging with the main index tree in a batch fashion. The design favors dynamic data insertions with strong access locality. We then present a range-based deletion algorithm, also supported by the tree-merging capability, to delete data in a batch fashion without heavy rebalancing overheads. The nodes located inside the deletion area are removed in a depth-first-search order, and tree reorganization is done at once.

The capability of the proposed methodology was evaluated with a wide range of workloads and a variety of data locality patterns. The experimental results showed that the proposed methodology greatly outperforms conventional B+-tree methods in terms of both response time and space utilization. The batch insertion algorithm was evaluated with both sorted and unsorted input data. When sorting was not required, the batch insertion algorithm demanded only 29% of the processing time of the conventional B+-tree method; in the other case, the proposed algorithm attained a 65% saving in processing time compared to the conventional B+-tree methods. The average response time of the range-based deletion algorithm was kept at a very low level over a wide range of parameter settings and a variety of data locality patterns, and the range-based deletion algorithm also outperformed the conventional deletion algorithm. We must point out that the total number of nodes created by the batch insertion algorithm is much smaller than that of the conventional B+-tree method, which also yields better space utilization. We surmise that many other tree structures would receive similar benefits when the proposed methodology is applied.
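To illustrate the batch-deletion idea summarized above, the following is a simplified sketch of our own (it is not the paper's Algorithms 3–6): leaves lying wholly inside the deletion range are dropped as units, the two boundary leaves are trimmed, and reorganization happens once at the end rather than once per deleted key. The class `LeafChain` and its method names are hypothetical.

```python
from bisect import bisect_left, bisect_right

class LeafChain:
    """Toy stand-in for B+-tree leaves: a list of sorted key runs."""
    def __init__(self, leaves):
        self.leaves = [sorted(l) for l in leaves]

    def delete_one(self, key):
        # Conventional path: locate and remove one key; a real B+-tree
        # would check for underflow and rebalance here, per deletion.
        for leaf in self.leaves:
            i = bisect_left(leaf, key)
            if i < len(leaf) and leaf[i] == key:
                leaf.pop(i)
                return True
        return False

    def delete_range(self, lo, hi):
        # Batch path: drop whole leaves inside [lo, hi], trim the two
        # boundary leaves, and reorganize once at the end.
        kept = []
        for leaf in self.leaves:
            if not leaf or leaf[-1] < lo or leaf[0] > hi:
                kept.append(leaf)           # entirely outside the range
            else:
                trimmed = (leaf[:bisect_left(leaf, lo)]
                           + leaf[bisect_right(leaf, hi):])
                if trimmed:
                    kept.append(trimmed)    # boundary leaf, partly kept
                # leaves fully inside [lo, hi] are recycled as a unit
        self.leaves = kept                  # single reorganization step

# Usage: remove keys 5..10 in one pass instead of six deletions.
chain = LeafChain([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]])
chain.delete_range(5, 10)
assert chain.leaves == [[1, 2, 3], [4], [11, 12]]
```

The sketch captures why response time stays flat as the deletion percentage grows: the work is proportional to the number of leaves touched, and interior-node cleanup and rebalancing are paid once per range rather than once per key.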
For future research, we shall further explore the characteristics of different memory devices, such as flash memory, and propose efficient index structures for database applications over such devices [14,30]. With the rapid growth in the capacity of memory devices, memory-based database systems will become even more important in many application domains in the coming years.

Appendix A. RecycleNode algorithm

Algorithm 3. The RecycleNode algorithm


Appendix B. Pile-up algorithm

Algorithm 4. The Pile-Up algorithm

Appendix C. DelReorganize algorithm

Algorithm 5. The DelReorganize algorithm


Appendix D. Bound algorithm

Algorithm 6. The Bound algorithm

References

[1] B. Babcock, S. Babu, M. Datar, R. Motwani, J. Widom, Models and issues in data stream systems, in: Proceedings of the Twenty-First ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, 2002, pp. 1–16.
[2] R. Bayer, The universal B-tree for multi-dimensional indexing: general concepts, in: Proceedings of the International Conference on World-Wide Computing and Its Applications, Lecture Notes in Computer Science, Springer-Verlag, 1997, pp. 198–209.
[3] R. Bayer, E. McCreight, Organization and maintenance of large ordered indexes, Acta Informatica 1 (1972) 173–189.
[4] J.V.D. Bercken, B. Seeger, An evaluation of generic bulk loading techniques, in: Proceedings of the 27th International Conference on Very Large Data Bases, 2001, pp. 461–470.
[5] K.R. Choi, K.C. Kim, T*-tree: a main memory database index structure for real time applications, in: Proceedings of the Third International Workshop on Real-Time Computing Systems and Applications, 1996, pp. 81–88.
[6] M.G.C.A. Cimino, F. Marcelloni, Autonomic tracing of production processes with mobile and agent-based computing, Information Sciences 181 (2011) 935–953.
[7] D. Comer, The ubiquitous B-tree, ACM Computing Surveys 11 (1979) 121–137.
[8] D.J. DeWitt, R.H. Katz, F. Olken, L.D. Shapiro, M.R. Stonebraker, D. Wood, Implementation techniques for main memory database systems, in: Proceedings of the ACM SIGMOD International Conference on Management of Data, 1984, pp. 1–8.
[9] A. Gartner, A. Kemper, D. Kossmann, B. Zeller, Efficient bulk deletes in relational databases, in: Proceedings of the International Conference on Data Engineering, 2001, pp. 183–192.
[10] L. Golab, M.T. Ozsu, Issues in data stream management, ACM SIGMOD Record 32 (2003) 5–14.
[11] G. Graefe, B-tree indexes for high update rates, ACM SIGMOD Record 35 (2006) 39–44.
[12] L. Gruenwald, M.H. Eich, Choosing the best storage technique for a main memory database system, in: Proceedings of the Fifth Jerusalem Conference on Information Technology, 1990, pp. 1–10.


[13] A. Guttman, R-trees: a dynamic index structure for spatial searching, in: Proceedings of the ACM SIGMOD International Conference on Management of Data, 1984, pp. 47–57.
[14] Z. He, P. Veeraraghavan, Fine-grained updates in database management systems for flash memory, Information Sciences 179 (2009) 3162–3181.
[15] H. Garcia-Molina, K. Salem, Main memory database systems: an overview, IEEE Transactions on Knowledge and Data Engineering 4 (1992) 509–516.
[16] J. Jannink, Implementing deletion in B+-trees, ACM SIGMOD Record 24 (1995) 33–38.
[17] C.S. Jensen, D. Lin, B.C. Ooi, Query and update efficient B+-tree based indexing of moving objects, in: Proceedings of the 30th International Conference on Very Large Data Bases, 2004, pp. 768–779.
[18] K. Pollari-Malmi, E. Soisalon-Soininen, T. Ylönen, Concurrency control in B-trees with batch updates, IEEE Transactions on Knowledge and Data Engineering 8 (1996) 975–984.
[19] S.W. Kim, On batch-constructing B+-trees: algorithm and its performance evaluation, Information Sciences 144 (2002) 151–167.
[20] T.W. Kuo, C.H. Wei, K.Y. Lam, Real-time access control and reservation on B-tree indexed data, Journal of Real-Time Systems 19 (2000) 245–281.
[21] B.N. Lee, Y.-W. Kim, H.J. Kim, Evolution of RFID applications and its implications: standardization perspective, in: Proceedings of the Portland International Center for Management of Engineering and Technology, 2007, pp. 903–910.
[22] T.J. Lehman, M.J. Carey, A study of index structures for main memory database management systems, in: Proceedings of the Twelfth International Conference on Very Large Data Bases, 1986, pp. 294–303.
[23] J. Li, Q.S. Jia, X. Guan, X. Chen, Tracking a moving object via a sensor network with a partial information broadcasting scheme, Information Sciences 181 (2011) 4733–4753.
[24] S. Lim, M.H. Kim, Restructuring the concurrent B+-tree with non-blocked search operations, Information Sciences 147 (2002) 123–142.
[25] H. Lu, Y.Y. Ng, Z. Tian, T-tree or B-tree: main memory database index structure revisited, in: Proceedings of the Eleventh Australasian Database Conference, 2000, pp. 65–73.
[26] R. Maelbrancke, H. Olivie, Optimizing Jan Jannink's implementation of B+-tree deletion, ACM SIGMOD Record 24 (1995) 5–7.
[27] P. Muth, P. O'Neil, A. Pick, G. Weikum, The LHAM log-structured history data access method, The VLDB Journal 8 (2000) 199–221.
[28] P. O'Neil, E. Cheng, D. Gawlick, E. O'Neil, The log-structured merge-tree (LSM-tree), Acta Informatica 33 (1996) 351–385.
[29] A.A. Pande, Efficiently performing deletion of a range of keys in a B+-tree, US Patent 7,370,055, 2008.
[30] H. Roh, W.C. Kim, S. Kim, S. Park, A B-tree index extension to enhance response time and the life cycle of flash memory, Information Sciences 179 (2009) 3136–3161.
[31] D.-L. Wu, W.W.Y. Ng, D.S. Yeung, H.-L. Ding, A brief survey on current RFID applications, in: Proceedings of the International Conference on Machine Learning and Cybernetics, Baoding, 2009, pp. 2330–2335.
[32] O. Ulusoy, Research issues in real-time database systems: survey paper, Information Sciences 87 (1995) 123–151.
[33] C. Zou, B. Salzberg, On-line reorganization of sparsely-populated B+-trees, in: Proceedings of the ACM SIGMOD International Conference on Management of Data, 1996, pp. 115–124.