Journal of Parallel and Distributed Computing 49, 146–155 (1998)
Article No. PC981432
Parallel Dictionaries Using AVL Trees¹

Muralidhar Medidi
Department of Computer Science and Engineering, Northern Arizona University, Flagstaff, Arizona 86011

and

Narsingh Deo
Department of Computer Science, University of Central Florida, Orlando, Florida 32816
AVL (Adel’son-Vel’skii and Landis) trees are efficient data structures for implementing dictionaries. We present a parallel dictionary, based on AVL trees, on the EREW PRAM by proposing optimal algorithms to perform k operations with p (1 ≤ p ≤ k) processors. An explicit processor scheduling is devised to avoid simultaneous reads in our parallel algorithm for k searches, so the parallelization requires no additional memory. To perform multiple insertions and deletions, we identify the rotations (beyond the standard AVL tree rotations) required to restore balance and present parallel algorithms that perform p insertions/deletions in O(log n + log p) time with p processors. © 1998 Academic Press
1. INTRODUCTION

AVL trees, named after their inventors Adel’son-Vel’skii and Landis [1], are binary search trees which are locally balanced; that is, for any internal node, the heights of its left and right subtrees may differ by at most one. The local balance at each node guarantees that the height of an n-key search tree is always bounded above by 1.44 log(n + 2) [9]. Since AVL trees are the most efficient method of balancing binary search trees [2], they are utilized in a wide variety of applications such as databases, operating systems, and symbol tables in compilers.

Ellis [7] attempted to parallelize AVL trees, using a locking protocol for the insert and search operations. Kung and Lehman [12] proposed concurrent algorithms for all three dictionary operations by adding recovery pointers. Manber and Ladner [14] improved Kung and Lehman’s work by simplifying deletion, but introduced temporary cycles in the tree.

¹ Research supported in part by NSF Grant CDA-9115281. A preliminary version of this paper was presented at the Eighth International Parallel Processing Symposium (IPPS), 1994.
Manber [13] also presented concurrent algorithms in a leaf-oriented binary tree, but did not perform any rebalancing. All these concurrent algorithms for AVL trees require, in the worst case, O(p + log n) time to perform p operations with p processors and hence are optimal only for p ≤ log n processors. Several other researchers have designed parallel dictionaries using systolic arrays, primarily by maintaining a sorted list. Paul et al. proposed parallel dictionaries on 2–3 trees using a chaining, or pipelining, technique and presented optimal algorithms for performing p searches, insertions, or deletions with p processors in O(log n + log p) time on the EREW PRAM [10, 17]. However, no such parallel algorithms are known for AVL trees, even though AVL trees, unlike 2–3 trees, need no unused memory and are more cost-effective with respect to average search and insertion time [2, 11]. One of the major difficulties in parallelizing AVL-tree manipulation is their rebalancing scheme, which is complex compared to that of 2–3 trees. In particular, parallelization of insertions and deletions in AVL trees is complicated because (i) the rotations, defined for one insert or delete operation, do not extend to inserting or deleting p keys at a time, and (ii) the leaves of an AVL tree, unlike those of a 2–3 tree, are not necessarily all at the same depth, so Paul et al.’s algorithm cannot be adapted directly for insertions and deletions in AVL trees.

In this paper, we present parallel algorithms which perform p operations on an AVL tree in O(log n + log p) time with p processors on the weakest PRAM model, the EREW PRAM. Hence, when the processors available are fewer than the number of operations required, an obvious simulation yields optimal algorithms.

In Section 2, we set up the terminology. In Section 3, we present an optimal parallel algorithm, with explicit processor scheduling, for performing p searches in an AVL tree with p processors on the EREW PRAM. The parallel algorithms for multiple insertions and deletions are described in Sections 4 and 5, respectively. Section 6 describes how the AVL tree can be constructed in parallel and can support multiple order-statistic operations efficiently. Section 7 contains some concluding remarks.
2. BACKGROUND AND TERMINOLOGY

An AVL tree T is a binary tree in which the difference between the heights of the left and right subtrees of any node is at most one. Elements from a totally ordered domain are stored in the leaves, with smaller data to the left of larger ones. For each internal node v, we use k(v) to refer to the key value stored in it, and l(v) and r(v) to denote its left and right children, respectively. Moreover, k(v) always equals the key value of the largest element stored in node v’s left subtree. Such trees are usually referred to in the literature as external AVL trees.

When we insert a new node into an AVL tree, some external node is replaced by a new internal node (with two external nodes as its children), and the height of the parent of the new node may increase by one. As a result, the AVL tree property may be lost at those ancestors of the newly inserted node whose heights increased. When an insertion causes an AVL tree to lose its balance, applying exactly one of the four rotations—single rotations LL or RR and double rotations LR or RL—will restore it. For details about the operations on the AVL tree, see [9]. We assume, in the rest of this paper, that the reader is familiar with the search, insertion, and deletion routines.
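To make the leaf-oriented representation concrete, the following is a minimal sketch (ours, not taken from the paper) of an external AVL tree node and the sequential search it supports; the class and the function find_leaf are illustrative names, and balance maintenance is omitted.

```python
class Node:
    """Node of a leaf-oriented (external) AVL tree, as described above.

    An internal node stores a router key k(v) equal to the largest element
    in its left subtree; the elements themselves are kept only in leaves.
    (Illustrative sketch; rebalancing is not shown.)
    """
    def __init__(self, key, left=None, right=None):
        self.key = key                    # k(v); the element itself for a leaf
        self.left = left                  # l(v); None for a leaf
        self.right = right                # r(v); None for a leaf
        if left is None:
            self.height = 0               # leaves have height 0
        else:
            self.height = 1 + max(left.height, right.height)

    def is_leaf(self):
        return self.left is None

def find_leaf(root, a):
    """Return the leaf at which element a must reside if it is stored in the tree."""
    v = root
    while not v.is_leaf():
        # a belongs to the left subtree exactly when a <= k(v)
        v = v.left if a <= v.key else v.right
    return v

# Example: leaves 2, 4, 6, 8 under router keys 4 (root), 2, and 6.
root = Node(4, Node(2, Node(2), Node(4)), Node(6, Node(6), Node(8)))
assert find_leaf(root, 5).key == 6 and find_leaf(root, 4).key == 4
```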
Even though each element could be a large record, we will refer to its key as the element itself without any loss of generality. Further, for clarity in presentation, we assume that our AVL tree always contains a dummy element with a key value of infinity. Suppose an AVL tree T with n leaves is implemented in the shared memory, and a_1, a_2, ..., a_p are elements that may or may not be stored in the leaves. Further suppose that a_1 < a_2 < ··· < a_p and that processor P_i knows the data element a_i, for all i. If the elements a_1, a_2, ..., a_p are unordered, they can be sorted in O(log p) time with p processors [3]. Similarly, any duplicates in the sequence can be removed in O(log p) time with p processors on the EREW PRAM model. We show how to perform any of the three dictionary operations, namely search, insertion, and deletion, for these elements with p processors on the EREW PRAM in O(log n + log p) steps.

3. SEARCHES

If simultaneous access by several processors to the same memory location is allowed, as in the more powerful CREW PRAM model, then searching is very simple: processor P_i, 1 ≤ i ≤ p, performs the standard sequential search for a_i in the AVL tree T. Thus, multiple searches with as many processors can be performed in O(log n) time. Since such simultaneous reads are not allowed in the EREW PRAM model, we need a different approach.

3.1. Parallel Algorithm

For the parallel algorithm that searches for p elements in the AVL tree, we exploit the technique of chaining introduced by Paul et al. [10, 17]. We will also show that parallel searching on AVL trees is simpler and provide an explicit processor scheduling, which was not possible with 2–3 trees. For the sake of description, we follow the terminology used by Paul et al. [17].

Let a chain be a subsequence a_f, a_{f+1}, ..., a_l of the input sequence a_1, a_2, ..., a_p of elements. Processors P_f, P_{f+1}, ..., P_l are naturally associated with such a chain. The parallel search algorithm starts with the whole input sequence a_1, a_2, ..., a_p as a chain at the root of the AVL tree T. Among the processors of a chain a_f, a_{f+1}, ..., a_l, only the first one, processor P_f, is active. The active processor for the chain, P_f, knows its index f and the index l of the last element in its chain. If the chain has to be split into two chains a_f, ..., a_{m−1} and a_m, ..., a_l, then P_f invokes processor P_m to handle the latter chain and transmits the value l to P_m.

The search algorithm proceeds in stages. During each stage s, the active processor of each chain C accesses the data in some node v of the AVL tree T. The initial chain a_1, a_2, ..., a_p is thus at the root at stage one. During each stage, each active processor performs the computation, described next, for its chain. Suppose a chain C = a_f, ..., a_l is in node v at stage s. The chain C hits the node v if a_f ≤ k(v) < a_l. On the other hand, if the chain does not hit node v and, for example, a_l ≤ k(v), then all its elements can be present only in the left subtree of node v and the chain proceeds to l(v). Thus, a chain C that does not hit node v in stage s descends to node l(v) if a_l ≤ k(v) and to r(v) if a_f > k(v), for processing in stage s + 1.
If a chain hits a node v, on the other hand, then elements belonging to some prefix subsequence of the chain can be present in the left subtree of v while the remaining can be in the right subtree. In such a case, the chain C is split into two subchains C_1 = a_f, ..., a_{m−1} and C_2 = a_m, ..., a_l, where m = ⌈(f + l)/2⌉. If neither subchain hits the node v, then both proceed to the appropriate children. However, if one of them hits v, then that subchain stays back for stage s + 1. Note that at most one of the two subchains can hit the node v and remain there for the next stage s + 1. For later usage, we refer to the subchain C_1 (and the subchain C_2) as the left (and the right) chop of chain C, and to chain C as the parent of the subchains C_1 and C_2.

Thus, as the search progresses, different chains descend down the AVL tree toward the leaves that may contain the required elements. At first glance, it might seem that various chains may be reading the keys at different nodes and incur simultaneous reads. As Paul et al. [17] did for 2–3 trees, we can show, as follows, that the number of chains which may read the same node in a particular stage is bounded by a constant.

LEMMA 1. Let the elements a_f, a_{f+1}, ..., a_l be the only ones which passed through the edge e of T in stages 1 through s, for any s ≥ 1. If a chain C such that the element a_j belongs to C and j > l passed through e at stage s + 1, then a_{l+1} also belongs to the chain C.

Proof. The lemma is trivially true when j = l + 1. Hence, in the rest of this proof, we will assume that j > l + 1. We proceed by induction on the depth of the edge e in T. The base case, when edge e emanates from the root, obviously holds. Assume that the lemma is true for all edges of depth k. Let e_1 = (v, w) be an edge of depth k + 1 and edge e_2 = (parent(v), v) be its parent edge in T. Elements a_f, a_{f+1}, ..., a_l must have passed along edge e_2 before or during stage s − 1. Element a_j must have passed through e_2 by stage s. By the inductive hypothesis, a_{l+1} passed e_2 no later than stage s.

Case 1: Node w is a left child of v. If a_{l+1} passed e_2 at stage s then, by the inductive hypothesis, a_{l+1} passed e_2 in the same chain as a_j. If a_{l+1} passed e_2 before stage s, then a_{l+1} and a_j were, again, in the same chain at e_2, since otherwise a_{l+1} would not have been delayed at v. Since left chops of hit chains can be sent only to the left children and a_{l+1} (which is to the left of a_j) did not pass e_1 before stage s + 1, a_{l+1} and a_j must pass e_1 in the same chain at stage s + 1.

Case 2: Node w is a right child of v. The chain in which a_l passed e_2 did not contain a_{l+1}, because if it did then a_{l+1} would have passed e_1 not later than a_l. So, since a_j (and a_{l+1}) could not have been delayed at v, it passed e_2 at stage s, and by the inductive hypothesis its chain included a_{l+1}. Thus, a_{l+1} must belong to the chain with a_j which passed e_1 at stage s + 1. Hence we have the lemma.

LEMMA 2. Let the elements a_f, a_{f+1}, ..., a_l be the only ones which passed through edge e of T in stages 1 through s, for any s ≥ 1. If a chain C such that the element a_j belongs to C and j < f passed through e at stage s + 1, then a_{f−1} also belongs to the chain C.

Proof. Similar to the proof of Lemma 1.

Together, Lemmas 1 and 2 lead us to the following result.

COROLLARY 3. No more than two chains may pass each edge e of T at any single stage.
Proof. Let C = a_d, ..., a_e and C′ = a_g, ..., a_h be two chains. Further, let us write C < C′ if and only if a_e < a_g. If a chain C is sent from node u to node v along edge e in stage s, then by Lemmas 1 and 2 either C < C″ or C″ < C for all chains C″ sent along e during stages s′ < s. If chains C and C′ are sent to v along e in stage s, then all chains C″ with C < C″ < C′ must have been sent from u to v during the earlier stages s′ < s. Any additional chain passing edge e in stage s would lead to an immediate contradiction; hence, at most two chains may pass any edge in the tree at any single stage.

The corollary indicates that, for each stage s and any node v, at most three chains (two new and one old) could be present in v at stage s. Thus, processing in each stage requires only O(1) steps if we can schedule the reads by the three chains. Once a chain a_f, a_{f+1}, ..., a_l has arrived at a leaf b, processors P_{f+1}, ..., P_l have to be informed of the value k(b). This can be achieved in ⌈log p⌉ steps by recursive doubling: in step j, 0 ≤ j ≤ ⌈log p⌉ − 1, the ith processor that knows where a_i falls informs the (i + 2^j)th processor, if the latter processor does not yet know where its element falls. Elements move down, in this chaining algorithm, to a node v only when the range of keys in the subtree rooted at v can include them; hence the search is correct.

3.2. Complexity Analysis

Let us consider the time complexity of the parallel search presented. In every stage, each chain either descends one level down or hits a node. Further, a chain is halved whenever it hits a node. Thus, any search element may be contained in chains that hit nodes no more than ⌈log p⌉ times. Hence, each element arrives at a leaf in at most 1.44 log(n + 2) + ⌈log p⌉ stages. Thus, searching p elements in an AVL tree with p processors of the EREW PRAM requires O(log n + log p) time.

3.3. Processor Scheduling

Corollary 3 shows that at most two chains can arrive at a node in any stage; due to the way chains are split, only one of the earlier subchains can remain at a node in any stage. In a stage, however, all three chains may read the key simultaneously—a possibility not allowed under the EREW PRAM model. Simulating these simultaneous reads requires an excessive increase in either time or memory requirements [10]. Let us reconsider the two chains C and C′ (C < C′; i.e., all elements in C are smaller than those in C′) that may arrive at a node v in stage s. Further, let C″ be the chain, if any, which is stuck at node v from earlier stages. Since all chains in between C and C′ arrived at v earlier, chain C must be a right chop of its parent chain and chain C′ must be a left chop. Each chain, as it gets detached from its parent chain, remembers whether it is a left or a right chop. The subchain which has to remain at a node remembers itself as a center chop. Thus, in any stage at any node, we can have at most a left, a center, and a right chop. Then, we can schedule the three reads at the node without any conflicts. The time complexity thus increases only by a factor of at most 3, and the scheduling does not suffer any increase in the space requirements.
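As an illustration of the chained search just described, here is a small sequential simulation (ours, not the paper's PRAM implementation). The tree is written as nested tuples (key, left, right) with bare elements as leaves, and a chain is a triple (f, l, node); on the EREW PRAM, each split would activate one new processor, the left/center/right chop labels would schedule the at most three reads per node, and the final broadcast at a leaf would use recursive doubling.

```python
from math import ceil

# Minimal leaf-oriented tree for this sketch: an internal node is a tuple
# (key, left, right); a leaf is just its element.  key equals the largest
# element stored in the left subtree.
def is_leaf(v):
    return not isinstance(v, tuple)

def chained_search(root, a):
    """Sequentially simulate the staged chained search for the sorted queries a.

    A chain (f, l, node) means queries a[f..l] currently sit at `node`.
    Returns, for every query, the leaf element it lands on.
    """
    result = [None] * len(a)
    chains = [(0, len(a) - 1, root)] if a else []      # whole input starts at the root
    while chains:
        next_chains = []
        for f, l, v in chains:
            if is_leaf(v):                             # chain arrived at a leaf;
                for i in range(f, l + 1):              # PRAM: recursive-doubling broadcast
                    result[i] = v
                continue
            key, left, right = v
            if a[f] <= key < a[l]:                     # chain hits v: split at the middle
                m = ceil((f + l) / 2)
                parts = [(f, m - 1), (m, l)]           # left chop, right chop
            else:
                parts = [(f, l)]                       # chain misses v: moves as a whole
            for pf, pl in parts:
                if a[pf] <= key < a[pl]:
                    next_chains.append((pf, pl, v))    # this chop hits v: stays for next stage
                elif a[pl] <= key:
                    next_chains.append((pf, pl, left)) # whole chop descends to l(v)
                else:
                    next_chains.append((pf, pl, right))# whole chop descends to r(v)
        chains = next_chains
    return result

# Example: leaves 2, 4, 6, 8; each query lands on the leaf that would store it.
root = (4, (2, 2, 4), (6, 6, 8))
assert chained_search(root, [1, 4, 5, 9]) == [2, 4, 6, 8]
```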
4. INSERTIONS

Let a_1, a_2, ..., a_p be the elements to be inserted into the AVL tree T of n elements. To identify the correct external nodes at which they have to be inserted, we first perform the parallel search algorithm. Then the input sequence is split into chains so that element a_i, 1 ≤ i ≤ p, ends up at the leaf b_q, 1 < q ≤ n, if and only if b_{q−1} < a_i ≤ b_q. But, at the end of the search algorithm, different chains may end up at the same leaf. Since such chains have to be consecutive at a leaf, a parallel prefix operation suffices to coalesce them into one. There are n possible chains C_1, C_2, ..., C_n (some of which are empty), one per leaf.

First, we describe a parallel algorithm (for rebalancing) for the simpler case |C_q| ≤ 1 for all 1 ≤ q ≤ n. Each nonempty chain C_q has an associated processor P_i trying to insert a_i at leaf b_q. Processor P_i, for all i, creates a new internal node with key value a_i and with external nodes a_i and b_q as its children. This node is then made a child of the parent of b_q. Thus, an external node corresponding to b_q is replaced by a new internal node, increasing the height of this zero-height subtree to one. Processor P_i then stands by the internal node it just created.

The parallel algorithm for rebalancing then works in stages. We will now describe the processing done in stage s. The algorithm works such that after stage s, for all s, all internal nodes of height less than or equal to s satisfy the balance properties of AVL trees. Thus, at the beginning of stage s, any “old” node of height s − 1 can have its height increased by at most one and has a processor standing by. If the height of its parent is already s + 1, the parent was out of balance and the new insertions restored the balance at the parent. If so, the processor quits after updating the balance field of the parent. Note that if there are any insertions in the other subtree, any rebalancing needed in succeeding stages at the parent will be handled by a processor climbing up that subtree. On the other hand, if the parent was skewed toward the child whose height increased, the processor needs to perform a rotation at the parent. If a simple single or double rotation is applicable, then the processor performs the required work and becomes inactive, since the height of the subtree does not change. Figure 1 illustrates this case, where multiple insertions took place in the subtree T_1 and caused the height of the node marked with key A to increase. One of the processors inserting in the subtree T_1 is standing by at this node and performs the required LL rotation to restore the balance at node B before becoming inactive.

To restore the balance at a node in an AVL tree, only one of its four grandchildren may have increased in height by 1 for the four rotations to work.
FIG. 1. A single rotation that restores the balance after multiple insertions.
FIG. 2. Rotation which increases the height after multiple insertions.
Even in our simple case of inserting chains of length 1, however, we can come across situations where two grandchildren have increased in height, as shown in Fig. 2. Thus, when multiple insertions take place, rebalancing at a node v can increase its height. In such a case, the processor stands by the node v even after the rotation, for further rebalancing at its ancestors. The only other case left is when the parent was balanced and its height increased from s to s + 1. Note that at least one of its two children will have a processor standing by it, and the one associated with the leftmost child remains active at the parent for any rebalancing needed later. The processing required in stage s by the active processors, for any rebalancing and pointer updates, takes only O(1) steps and affects disjoint portions of the tree. At the end of stage s, all nodes of height less than or equal to s satisfy the balance conditions, and all nodes of (previous) height s whose height has increased have a processor standing by them. At the end of at most ⌈2 log n⌉ stages, all nodes, and hence the tree, will be balanced. The AVL tree, in this special case of multiple insertions, can thus support p insertions with p processors on the EREW PRAM model in O(log n) steps.

To insert a longer chain C_j = a_f ··· a_l at leaf b_j, we reduce the problem to inserting shorter chains. To insert C_j, we first insert the middle element a_m (m = ⌈(f + l)/2⌉) at leaf b_j. The two subchains a_f ··· a_{m−1} and a_{m+1} ··· a_l are then recursively inserted at leaves a_m and b_j, respectively. The middle element of every chain can be inserted in parallel by the simple insertion algorithm described earlier. The chains need to be split only ⌈log p⌉ times, and the insertion of shorter chains needs to wait only an interval of three stages to ensure that overlapping rebalancing does not occur. Thus, with this pipelined approach, we can insert p elements into an AVL tree and restore it with p processors on the EREW PRAM model in O(log n + log p) steps.
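The round structure induced by this repeated halving of chains can be sketched as follows (an illustrative sequential sketch with a hypothetical name insertion_rounds, not the paper's code): round 0 inserts the middle element of every chain, and each later round inserts the middle elements of the subchains produced by the previous round; on the PRAM, consecutive rounds would be pipelined three stages apart.

```python
from math import ceil

def insertion_rounds(chain):
    """Split a sorted chain into rounds of 'middle elements' (illustrative).

    Round 0 inserts the middle element of the chain; round r inserts the
    middle elements of the subchains created in round r - 1, so a chain of
    length p is consumed in about log2(p) + 1 rounds.
    """
    rounds = []
    current = [chain] if chain else []
    while current:
        middles, nxt = [], []
        for c in current:
            m = ceil((len(c) - 1) / 2)   # 0-indexed position of the middle element
            middles.append(c[m])
            if c[:m]:
                nxt.append(c[:m])        # left subchain, inserted in a later round
            if c[m + 1:]:
                nxt.append(c[m + 1:])    # right subchain, inserted in a later round
        rounds.append(middles)
        current = nxt
    return rounds

# Example: a chain of 7 elements destined for the same leaf.
assert insertion_rounds([10, 20, 30, 40, 50, 60, 70]) == [[40], [20, 60], [10, 30, 50, 70]]
```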
5. DELETIONS

In this section, for clarity, we omit details about rebalancing after deletions that are similar to those for insertions; we only identify features and present details where deletions differ significantly. Rebalancing after deletions in an AVL tree is markedly more difficult than in 2–3 trees and complicates the parallel algorithm.

As a first step in deleting the elements a_1 < a_2 < ··· < a_p from the AVL tree T, we run the parallel search algorithm with p processors. As in the insertion algorithm, the elements of chain C_q arrive at leaf b_q and fall to its left, for 1 ≤ q ≤ n. For each nonempty chain C_q (1 ≤ q ≤ n), the processor associated with the rightmost element checks whether its element is the same as the one stored in the leaf b_q. If so, the processor marks the leaf b_q for deletion and stands by it. All other elements in each chain, clearly, are not present in the tree, and their processors become inactive. The deletion algorithm thus seems simpler, as only one element of each chain will delete a node in the AVL tree. Note, however, that the rebalancing operations work only when the height of any subtree decreases by at most one. But the multiple deletions we are attempting can possibly delete all the keys in a subtree, which complicates the parallel deletion algorithm.

We first present the case when there exist at least two leaves between every consecutive pair of leaves marked to be deleted. First, all the marked leaves are deleted in parallel. Further, the parent of each such leaf is deleted; the other child is made a child of the parent of the deleted interior node. The processor then marks and stands by this parent node to take part in the rebalancing needed. Note that since there are at least two leaves between any two deleted leaves, there is at least one interior node that separates any two deleted interior nodes; hence, the simultaneous node deletions still leave the tree in a consistent state.

The parallel rebalancing algorithm, again, proceeds in stages. The rebalancing algorithm works such that after stage s, for all s ≥ 1, each marked node has a processor standing by it and the subtree under it satisfies the AVL tree balance conditions. Clearly, each unmarked node of (previous) height s at stage s is the root of a proper AVL tree. Each node v of height s + 1 in T which has a marked child has subtrees which may have shrunk in height by 1. The processor associated with the rightmost marked child takes over rebalancing at node v. If both children of v have decreased in height, then no rebalancing is needed in this stage and the processor marks v to take part in the subsequent stages. If only one of the children has shrunk in height, standard AVL tree rebalancing is performed to account for the deletions under it. If the height of node v has decreased due to rebalancing, node v is marked and a processor stands by it. Note that the other child may get marked in the current stage if there are any deletions in the subtree under it, and it will be rebalanced. See [15] for details about rebalancing. Thus, the AVL tree will be balanced in O(log n) stages in this special case of p deletions with p processors. There are no read or write conflicts in the steps described, and hence this simple parallel deletion algorithm can be implemented on the EREW PRAM.

Now we present the parallel rebalancing algorithm for the general case, which runs in phases. To ensure that there exist two unmarked leaves between any two consecutive leaves marked for deletion, we first compact the nonredundant deletions using a parallel prefix operation and then, in each phase, delete every third key in the sequence of keys remaining to be deleted. Thus, we delete at least one-third of the leaves that remain to be deleted in each phase. Therefore, this algorithm requires O(log p) phases, with each phase requiring at most O(log n) stages.
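A small sketch (ours; the name deletion_phases is hypothetical) of the phase schedule for the general case: each phase performs every third pending deletion, which guarantees at least two not-yet-deleted leaves between any two leaves removed in the same phase and consumes all p deletions in O(log p) phases.

```python
def deletion_phases(marked_leaves):
    """Group pending deletions into phases (illustrative sketch).

    marked_leaves is the left-to-right (sorted) list of leaves marked for
    deletion.  Each phase deletes every third pending leaf, so any two
    leaves deleted in the same phase have at least two not-yet-deleted
    marked leaves (and hence two tree leaves) between them.
    """
    phases = []
    pending = list(marked_leaves)
    while pending:
        phases.append(pending[::3])                            # every third pending deletion
        pending = [x for i, x in enumerate(pending) if i % 3]  # the rest wait for later phases
    return phases

# Example: 8 marked leaves are consumed in 5 phases.
assert deletion_phases(["b1", "b2", "b3", "b4", "b5", "b6", "b7", "b8"]) == [
    ["b1", "b4", "b7"], ["b2", "b6"], ["b3"], ["b5"], ["b8"]]
```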
As with the parallel insertions, we can pipeline the phases and achieve an O(log n + log p)-time algorithm for deleting p elements with p processors of the EREW PRAM. However, an interval of six stages is needed between any two phases to avoid overlapping restructurings. Hence we have the following.
THEOREM 4. An n-element AVL tree can support p dictionary operations optimally in O(log n + log p) time with p processors under the EREW PRAM model.

6. OTHER OPERATIONS

Construction. Since an n-element complete binary tree satisfies the AVL tree properties, and since the shape of this tree is regular and hence predictable, it can be constructed from a sorted list in O(log log n) time using n/log log n processors under the EREW model [4, 6].

Priority Deque Operations. Several researchers have proposed parallel data structures to support priority queue and even priority deque operations (see [16] for related literature). An AVL tree can also be used to support priority deque operations; however, the pointer overhead makes this option less appealing, in sequential computing, than implicit structures such as heaps. By augmenting the AVL tree so that every internal node keeps track of the number of elements in its left subtree, a simple parallel algorithm akin to the parallel search enables us to extract the k maximum (or minimum) elements from the AVL tree in O(log n + log p) time using p processors on the EREW PRAM. In fact, with such augmentation, even multiple selections can be performed within the same resource bounds. The parallel algorithms for insertions and deletions have to be slightly modified so that, during the ascent phase of rebalancing, processors update the number of elements in the left subtree of the nodes they visit.
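A brief illustrative sketch (ours, with hypothetical names such as left_count and select) of the augmentation just described: every internal node records the number of elements in its left subtree, so the ith smallest element is found by a single root-to-leaf descent; extracting the k minimum (or maximum) elements, or performing p such selections with the chained search, builds on this routine.

```python
class Node:
    """Leaf-oriented AVL node augmented with the size of its left subtree (sketch)."""
    def __init__(self, key, left=None, right=None):
        self.key = key                            # router key; the element itself for a leaf
        self.left = left
        self.right = right
        # Number of elements (leaves) in the left subtree; 0 for a leaf.
        self.left_count = left.size() if left else 0

    def size(self):
        if self.left is None:                     # a leaf holds exactly one element
            return 1
        return self.left_count + self.right.size()

def select(root, i):
    """Return the ith smallest element stored in the tree (1-indexed)."""
    v = root
    while v.left is not None:
        if i <= v.left_count:
            v = v.left                            # the ith smallest lies in the left subtree
        else:
            i -= v.left_count                     # skip the left subtree's elements
            v = v.right
    return v.key

# Example: leaves 2, 4, 6, 8.
root = Node(4, Node(2, Node(2), Node(4)), Node(6, Node(6), Node(8)))
assert [select(root, i) for i in range(1, 5)] == [2, 4, 6, 8]
```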
7. CONCLUSIONS

We have presented parallel algorithms to perform p searches, insertions, or deletions in an n-element AVL tree with p processors on the EREW PRAM, the weakest of the PRAM models, in O(log n + log p) time. Hence, compared to the sequential algorithms to perform p operations, our parallel algorithms achieve linear speedup with p processors, for all p ≤ n. We provided an explicit processor scheduling in our parallel algorithms to avoid simultaneous reads, in contrast to the similar work on 2–3 trees by Paul et al. [17], who must simulate the simultaneous reads that the EREW PRAM model prohibits. Since 2–3 trees (and, in fact, all height-balanced trees) can be binarized as red–black trees [8] with rebalancing operations similar to the rotations of AVL trees, our processor scheduling and algorithms can be adapted to provide parallel dictionaries on all height-balanced trees using the dichromatic (red–black) framework.

We believe that the chaining technique and the processor scheduling identified in this paper are useful in other parallelization contexts. For example, using these techniques, we obtained a novel and optimal algorithm for parallel merging [5] whose number of comparisons is within lower-order terms of the minimum possible. We are currently working on mapping parallel dictionaries onto more realistic models, such as hypercubes, and on addressing load-balancing issues.

REFERENCES

1. G. M. Adel’son-Vel’skii and E. M. Landis, An algorithm for the organization of information, Soviet Math. Dokl. 3 (1962), 1259–1263.
2. J. L. Baer and B. Schwab, A comparison of tree-balancing algorithms, Comm. Assoc. Comput. Mach. 20, No. 3 (March 1977), 322–330.
3. R. J. Cole, Parallel merge sort, SIAM J. Comput. 17, No. 4 (August 1988), 770–785.
4. S. K. Das and K. B. Min, A unified approach to construction of height-balanced trees, J. Parallel Distrib. Comput. 27 (1995), 71–78.
5. N. Deo, A. Jain, and M. Medidi, An optimal parallel algorithm for merging using multiselection, Inform. Process. Lett. 50 (1994), 81–87.
6. N. Deo, A. Jain, and M. Medidi, Parallel construction of (a, b)-trees, J. Parallel Distrib. Comput. 23 (1994), 442–448.
7. C. S. Ellis, Concurrent search and insertion in AVL trees, IEEE Trans. Comput. C-29, No. 9 (September 1980), 811–817.
8. L. Guibas and R. Sedgewick, A dichromatic framework for balanced trees, in “Proc. 19th Annual Symposium on Foundations of Computer Science,” pp. 8–21, 1978.
9. E. Horowitz and S. Sahni, “Fundamentals of Data Structures in Pascal,” Computer Science Press, New York, 1990.
10. J. JáJá, “An Introduction to Parallel Algorithms,” Addison–Wesley, Reading, MA, 1992.
11. P. L. Karlton, Performance of height-balanced trees, Comm. Assoc. Comput. Mach. 19, No. 1 (January 1976), 23–28.
12. H. T. Kung and P. L. Lehman, Concurrent manipulation of binary search trees, ACM Trans. Database Systems 5, No. 3 (September 1980), 354–382.
13. U. Manber, Concurrent maintenance of binary search trees, IEEE Trans. Software Engrg. SE-10, No. 6 (November 1984), 777–784.
14. U. Manber and R. E. Ladner, Concurrency control in a dynamic search structure, ACM Trans. Database Systems 9, No. 3 (September 1984), 439–455.
15. M. Medidi and N. Deo, “Parallel Dictionaries on AVL Trees,” Technical Report CS-TR-94-09, Dept. of Computer Science, University of Central Florida, Orlando, FL, January 1994.
16. M. Medidi and N. Deo, An optimal data structure for parallel priority deques, J. Comput. Software Engrg. (Special Issue on Parallel Algorithms and Architectures), to appear.
17. W. Paul, U. Vishkin, and H. Wagener, Parallel dictionaries on 2–3 trees, in “Proc. ICALP (Lecture Notes in Computer Science 154),” pp. 597–609, July 1983. [Also R.A.I.R.O. Inform. Theor./Theoret. Inform. 17 (1983), 397–404.]
MURALIDHAR MEDIDI is an assistant professor in computer science and a co-founder of the Center for Data Insight at Northern Arizona University, Flagstaff, AZ. Dr. Medidi received his Ph.D. in computer science from the University of Central Florida, Orlando; his Master’s in computer engineering from the Indian Institute of Technology, Kharagpur; and his Bachelor’s in electronics and communications from J. Nehru Technological University, Hyderabad. His research interests include graph algorithms, data mining, simulation, and parallel computing.

NARSINGH DEO holds the Charles E. Millican Eminent Scholar’s Chair in Computer Science at the University of Central Florida, Orlando. He is also the Director of UCF’s Center for Parallel Computation. Prior to accepting the current endowed chair in 1986, he was a professor (and the chairman for a three-year term) of computer science at Washington State University, Pullman, for nine years. Narsingh Deo received his Ph.D. (EE) from Northwestern University. A Fellow of the IEEE and a Fellow of the ACM, Dr. Deo has authored four textbooks and some 150 papers in graph theory, discrete optimization, combinatorial algorithms, and parallel computation.

Received February 17, 1997; revised January 7, 1998; accepted January 10, 1998