Multiple Templates Access of Trees in Parallel Memory Systems

JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING ARTICLE NO. PC981426

49, 22–39 (1998)

Multiple Templates Access of Trees in Parallel Memory Systems¹

Vincenzo Auletta,²,³ Amelia De Vivo, and Vittorio Scarano²,³

Dipartimento di Informatica ed Applicazioni “R. M. Capocelli,” Università di Salerno, 84081, Baronissi (SA), Italy

We study the problem of mapping the N nodes of a data structure on M memory modules so that they can be accessed in parallel by templates, i.e., distinct sets of nodes. In the literature, several algorithms are available for arrays (accessed by rows, columns, diagonals, and subarrays) and trees (accessed by subtrees, root-to-leaf paths, levels, etc.). Although some mapping algorithms for arrays allow conflict-free access to several templates at once (for example, rows and columns), no mapping algorithm is known for efficiently accessing subtree, path, and level templates in complete binary trees. In our paper, we first prove that any mapping algorithm that is conflict-free for the tree/level template has $\Omega(M/\log M)$ conflicts when access is done according to the path template, and vice versa. Therefore, no mapping algorithm can be found that is conflict-free on both path and tree (or path and level) templates. Our main result is an algorithm for mapping complete binary trees with $N = 2^M - 1$ nodes on M memory modules in such a way that:

• the number of conflicts for accessing an M-node subtree, M adjacent nodes in the same level, or M consecutive nodes of a root-to-leaf path is $O(\sqrt{M/\log M})$,
• the load (i.e., the ratio between the maximum and minimum number of data items mapped on each module) is 1 + o(1),
• the time complexity for retrieving the module where a given data item is stored is O(1), if a preprocessing phase of space and time complexity O(log N) is executed, or O(log log N), if no preprocessing is allowed.

The algorithm can be easily generalized to complete binary trees of any size. © 1998 Academic Press

¹ A preliminary version of this paper appeared in Proceedings of IEEE 11th International Parallel Processing Symposium, 1997, IEEE Comput. Soc. Press, New York.
² Partially supported by Progetto MURST 40% “Efficienza di Algoritmi e Progetto di Strutture Informative.”
³ E-mail: [email protected], [email protected].

0743-7315/98 $25.00 Copyright © 1998 by Academic Press. All rights of reproduction in any form reserved.

ACCESS OF TREES IN PARALLEL MEMORY SYSTEMS


1. INTRODUCTION

Consider a shared memory multiprocessor machine and a data structure stored in its memory modules. The processors of the machine read from and write to the memory through a bus or an interconnection network and concurrently access different sets of nodes (called templates) of the data structure. A conflict occurs if several processors try to access the same memory module to retrieve different nodes of the data structure. We say that an instance of a template has k conflicts if at most k + 1 different processors try to access the same memory module.

Suppose that a parallel algorithm is to be executed on the machine and that it requires access to the data structure according to given templates. Since different processors are not allowed to access the same memory module at the same step, the time to access the template nodes is proportional to the number of conflicts. Therefore, the overall performance of the parallel algorithm is affected by the mapping algorithm used to store data onto the parallel memory modules.

We study the problem of mapping the nodes of an N-item data structure on M memory modules so that they can be accessed in parallel by using templates, i.e., distinct sets of nodes. The problem can be viewed as a coloring problem where the distribution of nodes into memory modules is done by coloring the nodes with a color from the set {0, 1, 2, . . . , M − 1}. When the size S of an instance of the template (i.e., the number of nodes) is larger than M, we call an algorithm optimal if, for each instance, it produces O(S/M) conflicts. Moreover, when S ≤ M, algorithms that avoid conflicts, i.e., every node of the template is on a different memory module so that the access can be done in parallel without delay due to bottlenecks, are said to be conflict-free.

In the literature, several requirements for mapping strategies of data structures in parallel memory systems are outlined as follows:

(i) Efficiency. The goal is to minimize, for each instance of the considered template(s), the number of read/write operations in memory (i.e., the number of conflicts for each instance).
(ii) Versatility. Mapping strategies should allow “efficient” data access to an algorithm that uses different templates at once.
(iii) Memory load balance. The data structure is stored in M memory modules: the strategy should balance the load of each memory module (i.e., the number of data items stored in each module).
(iv) Efficient memory address retrieval. Efficient algorithms should be provided for retrieving the memory module where a given item y is stored.

Several authors have investigated the problem of accessing array data structures in parallel memory systems where the considered templates are rows, columns, diagonals, and subarrays (see [CH92, DS94, KP93] for recent results and a comprehensive bibliography). Mapping tree data structures in parallel memory systems has not received much attention (compared to arrays) [B93, CA91, DS94, DSP94, DSP95, GR87, W86]. In this case, the templates that are most often dealt with are:

Tree template (T-Template): any complete subtree of a given size;
Path template (P-Template): any root-to-leaf path;
Level template (L-Template): all the nodes within a level.
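The definition of a conflict given earlier in this section can be made concrete with a small sketch. It assumes that a mapping is available as a function from node identifiers to module numbers; the function name `conflicts` and the toy mapping `mod_color` are illustrative choices, not part of the paper.

```python
from collections import Counter


def conflicts(instance, color):
    """Number of conflicts of one template instance.

    `instance` is an iterable of node identifiers and `color` maps a node
    to its memory module. An instance has k conflicts when the busiest
    module is requested by k + 1 of its nodes.
    """
    per_module = Counter(color(v) for v in instance)
    return max(per_module.values()) - 1


# Hypothetical toy mapping (an assumption): node i is stored in module i mod M.
M = 4


def mod_color(i):
    return i % M


print(conflicts([0, 1, 2, 3], mod_color))  # every node on a different module
print(conflicts([0, 4, 8, 3], mod_color))  # nodes 0, 4, and 8 share module 0
```

An instance is accessed conflict-free exactly when this count is zero.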


AULETTA, DE VIVO, AND SCARANO

In particular, in [DS94, DSP94, DSP95] two optimal algorithms for accessing complete binary trees using the T-Template and the P-Template are provided.

Multiple templates access of data structures. If an algorithm uses two or more distinct templates to access the same data structure, mapping becomes a tougher problem. In fact, an optimal mapping strategy for one particular template can be extremely inefficient for other, essentially different templates. Therefore, we would like to have a more versatile mapping algorithm that is able to mediate among different templates. The benefits are clear: there is no need to change the mapping strategy when a different template is used, which avoids costly permutations. Recent research on mapping array data structures in parallel memory systems [KP93] has focused on the versatility of the mapping algorithm. In fact, algorithms are given that are conflict-free for several templates at the same time. On the contrary, to the best of our knowledge, the problem of multiple templates access of tree data structures has not yet been considered.

Our results. Our main contribution to the field is, then, a first step toward a “unifying” algorithm that maps an N-node complete binary tree on M memory modules providing efficient access to T-, P-, and L-templates.

Let U be a mapping algorithm, let t be a template in the given set of templates, and let I be an instance of t. We define

$$C_U(t) \stackrel{\mathrm{def}}{=} \max_{I} \{\text{number of conflicts achieved by } U \text{ on } I\}.$$

The cost of a multiple-template algorithm U is defined as

$$\mathrm{Cost}(U) \stackrel{\mathrm{def}}{=} \max_{t} C_U(t),$$

where the maximum ranges over all templates t in the set.

Let MAX(U) and MIN(U) be the maximum and minimum number of nodes mapped by U in the same memory module, respectively. We define

$$\mathrm{Load}(U) \stackrel{\mathrm{def}}{=} \frac{\mathrm{MAX}(U)}{\mathrm{MIN}(U)}.$$
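The definitions of Cost and Load above can be evaluated by brute force on a small tree. The sketch below is only an illustration under stated assumptions: the templates are given as explicit lists of instances, and the mapping `mod_color` is a toy choice that does not come from the paper.

```python
from collections import Counter


def conflicts(instance, color):
    """Conflicts of one instance: occupancy of the busiest module minus one."""
    per_module = Counter(color(v) for v in instance)
    return max(per_module.values()) - 1


def cost(instances_by_template, color):
    """Cost(U): maximum over templates of the worst instance of each template."""
    return max(
        max(conflicts(instance, color) for instance in instances)
        for instances in instances_by_template.values()
    )


def load(nodes, color, M):
    """Load(U): ratio between the fullest and the emptiest module."""
    counts = Counter(color(v) for v in nodes)
    per_module = [counts[c] for c in range(M)]
    if min(per_module) == 0:
        return float("inf")  # some module stores nothing
    return max(per_module) / min(per_module)


# Toy setting (an assumption): a 7-node tree on M = 3 modules, node i -> i mod 3.
M = 3


def mod_color(i):
    return i % M


toy_templates = {
    "path": [[0, 1, 3], [0, 2, 6]],
    "level": [[3, 4, 5], [4, 5, 6]],
}
print(cost(toy_templates, mod_color))
print(load(range(7), mod_color, M))
```

Enumerating instances explicitly is only feasible for tiny trees; it is meant to make the two measures tangible, not to analyze a real mapping.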

Let Retrieval(U) be the time required to retrieve the memory module where a node is stored.

Given the inherently different structure of the templates, it is reasonable to expect that an algorithm designed for one template performs badly on the others. In fact, one of our preliminary results is that any conflict-free algorithm for a level/tree template performs badly on a path template, i.e., Cost = $\Omega(M/\log M)$, and vice versa. The challenge was, then, to find a mapping algorithm that could mediate between opposite needs.

Dealing with multiple templates presents a certain amount of heterogeneity: different “logical” templates can have different sizes. A good example is given by the three templates presented previously for complete binary trees: levels can have as many as (N + 1)/2 nodes, while root-to-leaf paths all have the same size, i.e., log(N + 1). In an effort to standardize the efficiency measure for multiple templates access, we model the access similarly to the so-called finger updates [AGR94, GMPR77, HM82, K81]. What is given as input to access a template is a pointer (or finger) to a leader node and, from that node, an M-size instance of the template is retrieved. The leader for the T-template is the root x of the M-size subtree; the leader for the P-template is a node x and the instance is the final M-length subpath of the path joining the root with x; the leader for the L-template is a node x and the instance is composed of the M nodes in the same level starting from x (see Fig. 1).

FIG. 1. The finger access of tree, path, and level templates for complete binary trees.

Our main result is a mapping algorithm LABEL-TREE whose cost is

$$\mathrm{Cost}(\text{LABEL-TREE}) = O\left(\sqrt{\frac{M}{\log M}}\right),$$

where the set of templates is composed of the tree, path, and level templates and M memory modules are available to store an N-node complete binary tree. We also prove that Load(LABEL-TREE) = 1 + o(1) and give an algorithm that retrieves the memory module where a given node y is stored in time Retrieval(LABEL-TREE) = O(1), if a preprocessing phase can build an additional array of M items in time O(M) (which means one additional location used for each memory module), or Retrieval(LABEL-TREE) = O(log log N), if no preprocessing is allowed.
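The finger access described in this section can be sketched as three small generators of template instances. The sketch assumes the level-by-level numbering that the paper adopts in its notation section (root 0, children of i at 2i + 1 and 2i + 2); the function names are illustrative, and no check is made that an instance stays inside an N-node tree.

```python
def tree_instance(x, levels):
    """Nodes of the complete subtree with `levels` levels rooted at leader x."""
    nodes, frontier = [], [x]
    for _ in range(levels):
        nodes.extend(frontier)
        frontier = [c for v in frontier for c in (2 * v + 1, 2 * v + 2)]
    return nodes


def path_instance(x, length):
    """Final `length` nodes of the root-to-x path, listed from x upward."""
    nodes = []
    while len(nodes) < length:
        nodes.append(x)
        if x == 0:  # reached the root: the path is shorter than `length`
            break
        x = (x - 1) // 2
    return nodes


def level_instance(x, length):
    """Up to `length` nodes on the level of x, starting from x and going right."""
    level = (x + 1).bit_length() - 1
    last_on_level = 2 ** (level + 1) - 2
    return list(range(x, min(x + length - 1, last_on_level) + 1))


print(tree_instance(1, 2))
print(path_instance(9, 3))
print(level_instance(3, 3))
```

Truncating the level instance at the right end of the level is a choice made here for simplicity; the paper does not say how a leader too close to the end of its level is handled.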


Organization of the paper. In the next section we introduce some needed notation. In Section 3, some preliminary results are shown; in particular, we prove how optimal mapping algorithms for one template perform when access is performed according to other templates. We anticipate here that, while algorithms for tree and level templates look essentially the same, substantial differences arise in performance when optimal algorithms for a path template are used for tree/level accesses and vice versa. The main result of our paper is in Section 4; we first describe the algorithm LABEL-TREE when $N = 2^M - 1$ and prove that its cost is $O(\sqrt{M/\log M})$, its load is 1 + o(1), and its retrieval time is O(1) with moderately costly preprocessing or O(log log N) otherwise. In Section 5, we describe the generalization of the algorithm to complete binary trees of any size; some final remarks and open questions conclude the paper.

2. NOTATION AND DEFINITIONS

In the following, let $N = 2^M - 1$ be the number of nodes of a complete binary tree and let M be the number of memory modules available. We show in Section 5 how to remove this condition. The size of the T-template is $2^{\lfloor \log(M+1) \rfloor} - 1$, i.e., the size of the largest complete binary subtree with at most M nodes, and we denote by $m = \lfloor \log(M+1) \rfloor$ the height of the T-template.⁴ Whenever we say “tree” in this paper, we mean a complete binary tree.

In the following, assume the nodes of the complete binary tree to be labeled as in a level-by-level, left-to-right visit. Then, if the root is node 0, the children of node i are nodes $2i + 1$ and $2(i + 1)$ (the left and right child, respectively) and the parent of i is $\lceil i/2 \rceil - 1$. Also, assume that the root is at level 0.

The color set is the set of M colors available to color the nodes. We call a conflict that occurs while accessing an instance of a tree (path or level) template a tree-conflict (path-conflict or level-conflict). We call an optimal algorithm for the T-Template a T-O algorithm, an optimal algorithm for the P-Template a P-O algorithm, and an optimal algorithm for the L-Template an L-O algorithm.
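The node numbering and the height of the T-template defined above translate directly into integer arithmetic. The following is a minimal sketch; the function names are illustrative and are not taken from the paper.

```python
def left(i):
    """Left child of node i."""
    return 2 * i + 1


def right(i):
    """Right child of node i."""
    return 2 * (i + 1)


def parent(i):
    """Parent of node i (for i >= 1): ceil(i / 2) - 1."""
    return -(-i // 2) - 1


def level(i):
    """Level of node i, with the root at level 0: floor(log2(i + 1))."""
    return (i + 1).bit_length() - 1


def t_template_height(M):
    """m = floor(log2(M + 1)), the height of the T-template."""
    return (M + 1).bit_length() - 1


M = 10
m = t_template_height(M)
print(m, 2 ** m - 1)  # height and size of the T-template for M = 10
```

Using `bit_length` avoids floating-point logarithms, which can round incorrectly for large integers.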

3. SOME PRELIMINARY RESULTS

In this section, we show the relationship between the three templates previously introduced. What we show is that the T-template and the P-template are, in a way, strictly nonhomogeneous, i.e., efficient algorithms for storing data that are going to be accessed using one of the two templates perform badly when the other template is used.

3.1. The Relationships between Tree and Path Templates

First, we show that any T-O and P-O algorithm produces $\Omega(M/\log M)$ conflicts when data are accessed using the other template.

⁴ Throughout the paper, logarithms are base 2 unless otherwise specified.


LEMMA 1. Given M colors, an N-node tree B, and a T-O algorithm U, there is at least one path on B with $\lfloor M/m \rfloor$ path conflicts if algorithm U is used to color B with M colors.

Proof. Let $\{C_0, \ldots, C_{M-1}\}$ be the color set. Any T-O algorithm colors the nodes of the first m levels using a different color for each node. Assume, without loss of generality, that node i is labeled with color $C_i$, for $0 \le i \le M - 1$. To color the (M + 1)/2 leaves of the M-node subtree rooted at node 1, algorithm U must necessarily use all the colors used in the subtree rooted at node 2 and color $C_0$. This means that at level m there will be a node $x_1$ labeled $C_0$ and, therefore, that any root-to-leaf path in B that passes through $x_1$ has a color sequence with prefix $C_0, C_1, \ldots, C_0$. Since the coloring is conflict-free on subtrees of M nodes, there are no conflicts on any path passing through $x_1$ until level $2m - 1$. Now, if we repeat this argument for the subtree rooted at the left child of $x_1$, we can find another node $x_2$ colored with $C_0$ on level $2m$. In general, we can find a path and a sequence of nodes $x_i$, $1 \le i \le \lfloor M/m \rfloor$, colored with color $C_0$ at level $i \cdot m$.

LEMMA 2. Given M colors, an N-node tree B, and a P-O algorithm U, there is at least one M-node complete subtree with at least $\lfloor M/m \rfloor$ tree conflicts if algorithm U is used to color B with M colors.

Proof. Let us consider any root-to-leaf path and let C be the set of colors used in the last $m = |C|$ nodes of the path. Let x be the node on this path at level $M - m$. Any path from x to a leaf uses colors in C, given the path optimality of U. Then the M-node subtree rooted in x uses only the colors in C.

Thus, if one wants both the P-Template and the T-Template to be accessed using an algorithm U that is optimal for one of the two templates, then Cost(U) = $\Omega(\lfloor M/m \rfloor)$.

3.2. The Relationships between Path and Level Templates

What we show now is that any L-O and P-O algorithm produces $\Omega(M/\log M)$ conflicts when data are accessed using the other template.

LEMMA 3. Given M colors, an N-node tree B, and an L-O algorithm U, there is at least one path on B with $\lfloor M/m \rfloor$ path conflicts if algorithm U is used to color B with M colors.

Proof. The proof is similar to that of Lemma 1. Notice that, by the level optimality of U, all the colors are used at level log(M + 1). Then, pick a path from the root to the node at level log(M + 1) that has the same color as the root. Let the path grow to the node at level 2 log(M + 1) that has the same color as the root, and so on. Clearly, this path has $\lfloor M/m \rfloor$ path conflicts.

LEMMA 4. Given M colors, an N-node tree B, and a P-O algorithm U, there is at least one level on B with $\lfloor M/2m \rfloor$ level conflicts if algorithm U is used to color B with M colors.

Proof. The proof is similar to that of Lemma 2. In the last log(M + 1) levels of B the P-O algorithm can use log(M + 1) colors. Choose the M-node subtree rooted at node x (defined as in Lemma 2); in its leaves there are at least M/(2 log(M + 1)) level conflicts.
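The tension between path and tree templates stated in Lemma 2 can be observed on one very simple coloring. The sketch below assumes a particular path-conflict-free mapping chosen here for illustration (a node is colored by its level modulo M); it is not an algorithm from the paper, and it illustrates the lemma on one example rather than proving it.

```python
from collections import Counter


def level(i):
    """Level of node i in the level-by-level numbering (root at level 0)."""
    return (i + 1).bit_length() - 1


def subtree(x, levels):
    """Nodes of the complete subtree with `levels` levels rooted at x."""
    nodes, frontier = [], [x]
    for _ in range(levels):
        nodes.extend(frontier)
        frontier = [c for v in frontier for c in (2 * v + 1, 2 * v + 2)]
    return nodes


def root_path(x):
    """Nodes on the path from x up to the root."""
    nodes = [x]
    while x != 0:
        x = (x - 1) // 2
        nodes.append(x)
    return nodes


def conflicts(instance, color):
    per_module = Counter(color(v) for v in instance)
    return max(per_module.values()) - 1


M = 15
m = (M + 1).bit_length() - 1  # height of the T-template


def by_level(i):
    """Assumed toy mapping: color a node by its level modulo M."""
    return level(i) % M


# A root-to-leaf path of the tree with 2^M - 1 nodes has M nodes on M levels.
path_conflicts = conflicts(root_path(2 ** (M - 1) - 1), by_level)
tree_conflicts = conflicts(subtree(0, m), by_level)
print(path_conflicts, tree_conflicts, M // m)
```

With this mapping every path is conflict-free, while all leaves of an M-node subtree share one module, so the number of tree conflicts exceeds the floor(M/m) bound of the lemma.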


FIG. 2. The equivalence chart that summarizes the results in Section 3.

3.3. Putting It All Together

Notice that it is easy to show that a T-O algorithm has O(1) level conflicts. In fact, the reader can easily see that a mapping algorithm that has at most C tree conflicts has at most 3C = O(C) level conflicts, since any M adjacent nodes in a level are the leaves of at most 3 M-node subtrees. Given this relationship between tree and level templates, we can draw a “complexity” chart for the three templates (shown in Fig. 2) that essentially indicated to us that the biggest challenge for a multiple template mapping algorithm was to reconcile the tree and the path templates, the level template being somewhat related to the performance of the tree template. The algorithm that we present in the next section is focused, therefore, on achieving good performance for the two templates (tree and path) that differ the most, the result for the level template being straightforward because of the observations above.

4. THE ALGORITHM LABEL-TREE

In this section, we show the mapping algorithm LABEL-TREE and analyze its cost and load. We first informally describe the idea of the algorithm and then we give the algorithm. The algorithm is designed to take care of tree and path templates, but we subsequently prove that the simultaneous conflicts on path, tree, and level templates are Cost(LABEL-TREE) = $O(\sqrt{M/m})$ and that the load is 1 + o(1). Finally, we give the algorithm for memory address retrieval.

4.1. The Idea of the Algorithm

Roughly speaking, the algorithm divides B horizontally into blocks of m levels and colors the subtrees in each block with a P-O strategy that uses a set of k colors, with k to be specified later. The algorithm divides the color set into disjoint groups of k colors and assigns color groups to subtrees in the blocks in such a way that groups are reused as late as possible along each path. Then, after M/k blocks we can reuse a group of colors. Since our tree has height M and we introduce at most one path conflict every Mm/k levels, we have O(k/m) conflicts on the paths. We later show a mapping between color groups and M-node subtrees such that there are O(M/k) tree conflicts for any M-node subtree. Thus, it is easy to show that the cost of the algorithm (i.e., the maximum of path conflicts and tree conflicts) is minimized when $k = \Theta(\sqrt{Mm})$.
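The choice of k can be checked numerically: k/m grows with k while M/k shrinks, so their maximum is smallest where the two terms meet, that is, where k² = Mm. The following sketch finds the best integer k by exhaustive search; the helper name `best_k` is illustrative and constant factors hidden by the O-notation are ignored.

```python
import math


def best_k(M, m):
    """Integer k in [1, M] minimizing max(k / m, M / k), by exhaustive search."""
    return min(range(1, M + 1), key=lambda k: max(k / m, M / k))


M = 1023
m = (M + 1).bit_length() - 1  # floor(log2(M + 1)) = 10

k = best_k(M, m)
print(k, math.sqrt(M * m))    # the minimizer sits next to sqrt(M * m)
print(max(k / m, M / k), math.sqrt(M / m))
```

At the balance point both terms are close to sqrt(M/m), which matches the cost claimed for LABEL-TREE.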


More formally, we partition B horizontally into $\lceil M/m \rceil$ blocks, named $B_0, B_1, \ldots, B_{\lceil M/m \rceil - 1}$. Each block is composed of m levels, except the bottom one, which may contain fewer levels. So each $B_i$ consists of complete subtrees of height m whose roots are on level $i \cdot m$. We call the subtree rooted at the jth node (left to right) on level $i \cdot m$ the jth subtree of block $B_i$. Then, we divide the color set into p subsets, $G_0, G_1, \ldots, G_{p-1}$, of k or k + 1 colors, with k to be specified later.

Now, B can be viewed as a complete $2^m$-ary “macro-tree” of height $H = \lceil M/m \rceil$, where every node (except those on the last level) is a complete “micro-subtree” of B of height m, as follows: $B_0$ is the root of the macro-tree and, for each $i = 1, \ldots, H - 1$, the jth node of level i in the macro-tree corresponds to the jth subtree of $B_i$. Given a P-O algorithm for the macro-tree that assigns a color group to each node of the macro-tree, we describe a mapping algorithm LABEL-TREE whose cost is

$$\mathrm{Cost}(\text{LABEL-TREE}) = O\left(\max\left\{\frac{k}{m}, \frac{M}{k}\right\}\right).$$

4.2. Building Blocks of the Algorithm

In this section we present some algorithms used by LABEL-TREE. First, we describe the algorithm that is used to label a small complete subtree by using a color group in such a way that there are no path conflicts and $O(\sqrt{M/m})$ tree conflicts. Then, we show how to assign a color group to each small tree in such a way that the macro-tree is labeled with $O(\sqrt{M/m})$ path conflicts among groups. Finally, we show how to put the pieces together in LABEL-TREE.

MICRO-TREE LABELING ALGORITHM. We present here an algorithm that labels a micro-tree (a complete subtree of height m) in such a way that there are no path conflicts and only $O(\sqrt{M/m})$ tree conflicts; the algorithm is a modified version of the T-O algorithm given in [DSP94]. Algorithm MICRO-LABELING(S, F) takes as input a complete subtree S of height m and a color set $F = \{f_0, \ldots, f_{\ell-1}\}$, where

$$\ell \stackrel{\mathrm{def}}{=} h + 2^{m-L} - 1, \qquad L \stackrel{\mathrm{def}}{=} \lfloor \log \lceil \sqrt{Mm} \rceil \rfloor, \qquad h \stackrel{\mathrm{def}}{=} 2^L - 1.$$

The algorithm, shown in Fig. 3, consists of two steps. In the first step it assigns a distinct color to each of the nodes of the $L = \log(h + 1)$ top levels of S. To color the remaining part of S, the algorithm traverses the tree level by level, starting from the root. Each time a vertex j is visited, if j is on level $t \le m - L$ then MICRO-LABELING(S, F) colors all the descendants of j on level $t + L - 1$. Let the ith node of a subtree be the node labeled with label i in the standard level-by-level, left-to-right visit of the subtree. Let lc and rc be the subtrees rooted in the left child and right child of j, respectively. Let y be the ith node on level $L - 1$ of lc; if i = 0 then MICRO-LABELING(S, F) assigns to y a new color, otherwise it assigns to y the same color as the (i − 1)th vertex of rc. The colors for the nodes at level $L - 1$ of rc are obtained in an analogous manner.

MACRO-TREE LABELING ALGORITHM. Now, we describe an algorithm to color the macro-tree, i.e., the $2^m$-ary tree, using as a color set G the groups $G_0, G_1, \ldots, G_{p-1}$ together with the whole color set.


FIG. 3. Algorithm MICRO-LABELING.

In order to keep the cost bounded by $O(\sqrt{M/\log M})$, our algorithm uses the ROTATE operation (described in Fig. 4) on $G_i$ before the color group is assigned to a macro-node. As we will see in the proof of Theorem 1, ROTATE ensures that subtrees in the same block that are colored in the same way are not close. In Fig. 5 the algorithm MACRO-LABELING is given. Notice that the root $B_0$ is labeled with the whole set of colors.

Putting the pieces together. We divide the color set into $p = \lfloor M/\ell \rfloor$ groups with k or k + 1 colors each, where $k = \lfloor M/p \rfloor$. (Notice that $k = \Theta(\ell)$.) There are (M mod k) groups with k + 1 colors, while the remaining ones have only k colors. Each time MICRO-LABELING is called with a color group, it uses only $\ell$ of the colors available in the group. Putting the two algorithms together, we obtain the labeling algorithm LABEL-TREE shown in Fig. 6.
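The parameters used by the building blocks can be computed directly from M. The sketch below follows the definitions of m, L, h, ℓ, p, and k given in this subsection; the function name and the returned dictionary are illustrative. One assumption is labeled in the code: the groups with k + 1 colors are counted as the remainder M − p·k, which equals the M mod k stated in the text whenever p ≤ k.

```python
import math


def label_tree_parameters(M):
    """Parameters of LABEL-TREE for M memory modules (illustrative helper)."""
    m = (M + 1).bit_length() - 1           # m = floor(log2(M + 1))
    root = math.isqrt(M * m)
    ceil_sqrt = root if root * root == M * m else root + 1
    L = ceil_sqrt.bit_length() - 1         # L = floor(log2(ceil(sqrt(M * m))))
    h = 2 ** L - 1
    ell = h + 2 ** (m - L) - 1             # colors used by MICRO-LABELING
    p = M // ell                           # number of color groups
    k = M // p                             # base size of a group
    # Assumption: the M - p * k leftover colors are spread one per group.
    extra = M - p * k
    sizes = [k + 1] * extra + [k] * (p - extra)
    return {"m": m, "L": L, "h": h, "ell": ell, "p": p, "k": k, "sizes": sizes}


params = label_tree_parameters(1023)
print(params)
```

For M = 1023 the groups are slightly larger than ℓ, in line with the remark that k = Θ(ℓ), and the group sizes add up to M so that every color belongs to exactly one group.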

FIG. 4. Algorithm ROTATE.


FIG. 5. Algorithm MACRO-LABELING.

4.3. Analysis

Let us analyze, first, the number of tree conflicts produced by the algorithm MICRO-LABELING(S, F).

LEMMA 5. Algorithm MICRO-LABELING(S, F) colors a complete binary tree S of height m with a set F of $\ell$ colors so that there are at most $O(\sqrt{M/m})$ tree conflicts and no path conflicts.

Proof. In the for loop of lines 6–17, variable c is always equal to j's color and, therefore, the largest color index of F used is $2^{m-L} - 2 + h = \ell - 1$, which proves that MICRO-LABELING(S, F) never runs out of colors. Divide the set F into $F_1$, containing h colors (h as defined in LABEL-TREE), and $F_2$, containing the remaining colors. Let L be the number of levels of S colored by $F_1$.

FIG. 6. Algorithm LABEL-TREE.

FIG. 7. Example of the execution of MACRO-LABELING on B, where the first index is the group and the second is the offset as evaluated in LABEL-TREE.


By construction there are no path conflicts in S. In fact, there are no path conflicts above level L (all distinct colors are used), and a node below level L is colored either using a color among the first h colors (lines 13 and 17) that does not give path conflicts, or with a new color (lines 12 and 16) distinct from those of all its ancestors. As for tree conflicts: nodes on the top L levels are colored with different colors and, therefore, there is no tree conflict in the top part. On each level $t \ge L$, each color is used $O(2^t/h)$ times. Adding up over all levels, each color is used at most $O(M/h)$ times. Since $h = \Theta(\sqrt{Mm})$, the result follows.

Trivially:

LEMMA 6. Algorithm MACRO-LABELING colors the macro-tree, with the color set consisting of $G_0, G_1, \ldots, G_{p-1}$ and the whole color set, with $O(\sqrt{M/m})$ path conflicts (which is optimal).

We can now state the main result of the section. It turns out that, by minimizing the number of simultaneous conflicts on tree and path templates, level conflicts are also kept roughly at the same magnitude, i.e., $O(\sqrt{M/m})$.

THEOREM 1. Algorithm LABEL-TREE has

$$\mathrm{Cost}(\text{LABEL-TREE}) = O\left(\sqrt{M/m}\right)$$

when access can be done according to path, tree, and level templates.

Proof. Path-template: Algorithm LABEL-TREE achieves $O(\sqrt{M/m})$ path conflicts: in fact, for each root-to-leaf path q, by Lemma 5, there are no path conflicts among vertices of q in the same block and, by Lemma 6, there are $O(\sqrt{M/m})$ path conflicts among vertices of q in different blocks.

Tree-template: As for tree conflicts, consider an m-level subtree S of B: if S lies entirely in a block then, by Lemma 5, there are $O(\sqrt{M/m})$ tree conflicts in S. It remains to prove that if S belongs to blocks $B_i$ and $B_{i+1}$, then S has $O(\sqrt{M/m})$ tree conflicts. Denote by $S_i$ and $S_{i+1}$ the nodes of S that lie in $B_i$ and $B_{i+1}$, respectively. By Lemma 5, $S_i$ has $O(\sqrt{M/m})$ tree conflicts. Notice also that there is no conflict between nodes in $S_i$ and $S_{i+1}$, the two parts being colored with different color groups, and, therefore, we only need to bound the number of tree conflicts in $S_{i+1}$.

Let c be the number of levels of S in $S_{i+1}$. We remark that $S_{i+1}$ is a forest of $2^{m-c}$ complete subtrees which are labeled by LABEL-TREE with colors taken from the same group. But the set of colors assigned to each subtree is selected in such a way that, if a subtree of $S_{i+1}$ is colored with colors⁵ $\{x, (x + 1) \bmod k, \ldots, (x + \ell - 1) \bmod k\}$, then the next subtree is colored with the set $\{(x + 1) \bmod k, (x + 2) \bmod k, \ldots, (x + \ell) \bmod k\}$. Thus, two subtrees of $S_{i+1}$ that are colored with the same set of colors are k subtrees apart. In order to count the conflicts in $S_{i+1}$ we distinguish between two cases, depending on whether c is greater than L or not.

⁵ Here, we assume that the group has k colors; minor changes are needed if the group has k + 1 colors, and they do not affect the result.

Let us first consider the case $c \le L$. By the algorithm MICRO-LABELING, if the root of a subtree T is colored x then the other vertices of T are colored with colors $\{(x + 1) \bmod k, \ldots, (x + 2^c - 2) \bmod k\}$. Thus, the color x is used only in the subtrees whose roots are labeled with $\{x, (x - 1) \bmod k, \ldots, (x - 2^c + 2) \bmod k\}$. Since, by the ROTATE operation of LABEL-TREE, there are at most $2^{m-c}/k$ subtrees whose roots have the same color, we obtain that there are $O(2^c \cdot 2^{m-c}/k) = O(\sqrt{M/m})$ nodes in $S_{i+1}$ that are colored with the same color.

Suppose now that $c > L$. By the previous case, in the first L levels of $S_{i+1}$ there are $O(\sqrt{M/m})$ tree conflicts. Moreover, as noticed in Lemma 5, in level $t > L$ of each subtree of $S_{i+1}$ there are $O(2^t/h)$ tree conflicts. Thus, the total number of tree conflicts in $S_{i+1}$ is

$$O\left(\sqrt{M/m}\right) + 2^{m-c} \sum_{t=L+1}^{c} O\left(2^t/h\right) = O\left(\sqrt{M/m}\right),$$

by our choice of h.

Level-template: The number of L-conflicts on M adjacent nodes in a level is $O(\sqrt{M/m})$ since, as noticed at the beginning of Subsection 3.3, any M adjacent nodes in a level are the leaves of at most 3 M-node subtrees that, as proved before, are accessed with $O(\sqrt{M/m})$ conflicts.

4.4. Load Analysis

In this subsection we prove that our algorithm assigns each color to approximately the same number of nodes. In particular, we prove that Load(LABEL-TREE) is equal to 1 + o(1).

Call $T_0, T_1, \ldots, T_{2^m - 1}$ the binary subtrees rooted on the first level of block $B_1$. Divide these subtrees into $\lceil 2^m/p \rceil$ forests $F_0, F_1, \ldots, F_{\lceil 2^m/p \rceil - 1}$, where $F_r = \{T_{rp}, T_{rp+1}, \ldots, T_{(r+1)p-1}\}$ for $r < \lceil 2^m/p \rceil - 1$. Clearly, the last forest can contain fewer than p subtrees. We notice that all the roots of the subtrees of a forest $F_r$ are colored with colors from different groups. Let C(r, c) be the number of nodes in $F_r$ colored with c. What we want to find is an upper and a lower bound on C(r, c), for any color c and any forest $F_r$.

LEMMA 7.

For each forest $F_r$ and for each color c it holds that

$$C(r, c) \ge (2^m - 1) \sum_{t=0}^{\lceil M/m \rceil - 2} \left\lfloor \frac{2^{mt}}{k+1} \right\rfloor,$$

$$C(r, c) \le (2^m - 1) \sum_{t=0}^{\lceil M/m \rceil - 2} \left\lceil \frac{2^{mt}}{k} \right\rceil.$$

Proof. Without loss of generality, consider the forest $F_0$ and assume that $G_i = \{g_0, g_1, \ldots, g_{num}\}$ is the group containing c. From MACRO-LABELING, we have that for each block $B_j$ all the nodes of $T_l \cap B_j$, for $0 \le l \le p - 1$, are labeled with the color group $G_{(l+j) \bmod p}$ and they are the only nodes in $B_j$ to be colored with this group. Thus, there is only one tree $T'$ such that the nodes in $T' \cap B_j$ are colored with $G_i$. Let $S_1, S_2, \ldots, S_{2^{j-1}}$ be the subtrees of $T'$ contained in $B_j$. The algorithm LABEL-TREE assigns to the subtree $S_l$ the colors $\{l \bmod num, (l + 1) \bmod num, \ldots, (l + \ell) \bmod num\}$. Therefore, for each l, the num consecutive subtrees $S_l, S_{l+1}, \ldots, S_{l+\ell}$ contain exactly $2^m - 1$ nodes colored with c. Since the size of the group can be either k or k + 1, we obtain that the number of nodes in $B_j$ colored with c is at most $(2^m - 1)\lceil 2^{j-1}/k \rceil$ and at least $(2^m - 1)\lfloor 2^{j-1}/(k + 1) \rfloor$. Summing over all the blocks, we obtain the result.


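THEOREM 2. The load of LABEL-TREE is 1 + o(1).

Proof. By the previous lemma we have that, for each forest $F_r$ with $r < \lceil 2^m/p \rceil - 1$, the number of nodes colored with c in $F_r$ belongs to the interval

$$\left[ (2^m - 1) \sum_{t=0}^{\lceil M/m \rceil - 2} \left\lfloor \frac{2^{mt}}{k+1} \right\rfloor, \; (2^m - 1) \sum_{t=0}^{\lceil M/m \rceil - 2} \left\lceil \frac{2^{mt}}{k} \right\rceil \right].$$

On the other hand, the last forest can be incomplete and, thus, there could be fewer than $(2^m - 1) \sum_{t=0}^{\lceil M/m \rceil - 2} \lceil 2^{mt}/k \rceil$ nodes colored c in it. However, since we are considering a lower bound on the number of nodes with the same color, we suppose that there are no nodes colored c in this forest. Recall that, for each algorithm A, we use MAX(A) and MIN(A) to denote the maximum and minimum number of nodes mapped by A in the same memory module, respectively. Therefore we obtain that

$$\mathrm{MAX}(\text{LABEL-TREE}) \le \left\lceil \frac{2^m}{p} \right\rceil (2^m - 1) \sum_{t=0}^{\lceil M/m \rceil - 2} \left\lceil \frac{2^{mt}}{k} \right\rceil$$

and

$$\mathrm{MIN}(\text{LABEL-TREE}) \ge \left\lfloor \frac{2^m}{p} \right\rfloor (2^m - 1) \sum_{t=0}^{\lceil M/m \rceil - 2} \left\lfloor \frac{2^{mt}}{k+1} \right\rfloor.$$

Then Load(LABEL-TREE) is

$$
\begin{aligned}
\mathrm{Load}(\text{LABEL-TREE}) \stackrel{\mathrm{def}}{=} \frac{\mathrm{MAX}(\text{LABEL-TREE})}{\mathrm{MIN}(\text{LABEL-TREE})}
&\le \left(1 + \frac{1}{\lfloor 2^m/p \rfloor}\right) \cdot \frac{\sum_{t=0}^{\lceil M/m \rceil - 2} \left\lceil \frac{2^{mt}}{k} \right\rceil}{\sum_{t=0}^{\lceil M/m \rceil - 2} \left\lfloor \frac{2^{mt}}{k+1} \right\rfloor} \\
&\le \left(1 + \frac{1}{\lfloor 2^m/p \rfloor}\right) \cdot \frac{k+1}{k} \cdot \frac{\sum_{t=0}^{\lceil M/m \rceil - 2} \left(2^{mt} + k\right)}{\sum_{t=0}^{\lceil M/m \rceil - 2} \left(2^{mt} - (k+1)\right)} \\
&\le \left(1 + \frac{1}{\lfloor 2^m/p \rfloor}\right) \cdot \frac{k+1}{k} \cdot \left(1 + \frac{\sum_{t=0}^{\lceil M/m \rceil - 2} (2k+1)}{\sum_{t=0}^{\lceil M/m \rceil - 2} \left(2^{mt} - (k+1)\right)}\right) \\
&\le \left(1 + \frac{1}{\lfloor 2^m/p \rfloor}\right)^{3} = 1 + o(1)
\end{aligned}
$$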


and the theorem follows.

4.5. Memory Address Retrieval

We show here how to retrieve the color of a node of B. Notice, first, that the behavior of MICRO-LABELING for a set of colors $F = \{f_0, f_1, \ldots, f_{\ell-1}\}$ is such that, regardless of the colors in F, color $f_i$ is always assigned to the same nodes in the micro-tree. Then, we can precompute, in time O(M), a small table of size O(M) that gives, for each node j in the micro-tree, the position in F of the color assigned to j by MICRO-LABELING. We emphasize that the space requirements are moderate: the table can be stored in the parallel memory system, one item per module, incurring a constant size increase per module. We also briefly show how one can do without preprocessing (and without the additional table) at the cost of an O(log log N) increase in time complexity.

THE ALGORITHM FOR RETRIEVAL. Let y be the label (left-to-right) of a node in B. If $y \le 2^m - 1$, its color is trivially y itself. In general, y is on level $l_B = \lfloor \log(y + 1) \rfloor$ and its position within this level is $x_B = y - 2^{l_B} + 1$. We identify y with the pair $(l_B, x_B)$. To obtain the color of $(l_B, x_B)$, we must know:

(i) the color group G assigned to the macro-node S of the macro-tree containing y;
(ii) the color of G assigned to the root of the micro-tree associated with S.

How to determine the color group of the macro-node. Given the level $l_B$ of y in B and its position $x_B$ within the level, node y is in the binary subtree of height M − m rooted at level m of B that has index $t = \lfloor x_B / 2^{l_B - m} \rfloor$, and in the forest $f = \lfloor t/p \rfloor$. The group used to color the macro-node S containing y is

$$\left( t \bmod p + \left\lfloor \frac{l_B}{m} \right\rfloor - 1 \right) \bmod p.$$

How to determine the root used in the group. Let G be the group determined in the previous step. It is enough to find the offset used by G to color S. The position, on its level, of the macro-node S whose subtree includes y is

$$x_S = \left\lfloor \frac{x_B}{2^{\,l_B \bmod m}} \right\rfloor.$$
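The index computations described so far can be sketched in a few lines. This is only an illustration under stated assumptions: m and p are positive integers supplied by the caller (they are defined earlier in the paper), the bracket around $l_B/m$ in the group formula is read as a floor, and the function name `macro_node_indices` is hypothetical.

```python
def macro_node_indices(l_b: int, x_b: int, m: int, p: int) -> tuple[int, int, int, int]:
    """Return (t, f, group, x_s) for the node at level l_b, position x_b.

    Assumptions: l_b >= m (nodes above level m are colored directly),
    and the term l_B / m in the group formula is rounded down.
    """
    if l_b < m:
        raise ValueError("nodes above level m are colored directly")
    if not 0 <= x_b < (1 << l_b):
        raise ValueError("x_b must be a valid position on level l_b")
    t = x_b >> (l_b - m)                 # floor(x_B / 2^(l_B - m))
    f = t // p                           # forest index, floor(t / p)
    group = (t % p + l_b // m - 1) % p   # group index, floor assumed
    x_s = x_b >> (l_b % m)               # floor(x_B / 2^(l_B mod m))
    return t, f, group, x_s
```

For example, with m = 2 and p = 2, the node at level 5 and position 19 lies in subtree t = 2 and forest f = 1, is colored with group 1, and belongs to the macro-node at position 9 on its level.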

Let $l_S$ be the level of macro-node S in the macro-tree. Then

$$l_S = \left\lfloor \frac{l_B}{m} \right\rfloor.$$

The forest index f gives the offset used by the ancestor of S on level one in the macro-tree, and the offset used by group G is $\mathrm{offset} = \bigl(f + (x_S \bmod 2^{m(l_S - 1)})\bigr) \bmod |G|$.

How to determine the color of y. At the beginning of Section 4.5 we described how to build a small table (of size O(M)) that contains, for each node in a micro-tree, its


color, as assigned by MICRO-LABELING. As already noticed, this requires a preprocessing phase of O(M) space and time complexity. If such a preprocessing phase is allowed, then the complexity of the memory address retrieval algorithm is O(1).

Alternatively, we outline here an algorithm that, given a group G and the offset used to color S, gives the color of y in S (without preprocessing and additional memory) in time O(log log N). Intuitively, if y is in the top part (levels v < L) of S, then the color can be retrieved in O(1) time. If y lies in the bottom part (levels v ≥ L) of S, then a recursive algorithm reduces the retrieval of y’s color to the retrieval of a color that lies at most on level v − 1. Recursion halts at level L − 1 at the latest and, therefore, there are O(m) = O(log log N) recursion levels in total.

5. CONCLUSIONS

We give a mapping algorithm that achieves $O(\sqrt{M/m})$ conflicts for accessing path, tree, and level templates in a complete binary tree with $N = 2^M - 1$ nodes. The careful reader will recognize that our algorithm does not really depend on the height of the tree: it is enough to go on with the subdivision in blocks of m levels, as illustrated in Subsection 4.1, for the whole tree. The argument on the total cost follows easily.

Our algorithm, to the best of our knowledge, is the first that addresses the problem of multiple template access for complete binary tree data structures. The algorithm has optimal load and efficient retrieval of memory addresses. We have also shown that conflict-free access for tree and path templates at the same time (and for level and path templates) cannot be achieved.

The research in this field should now be focused on two goals. The first one is to find an algorithm for accessing a different set of multiple templates for complete binary trees. For example, “hybrid” templates, such as a small path followed by a complete subtree, look interesting.
A rather more challenging open question that is left by our paper is the optimality of our algorithm: a lower bound on the number of conflicts of a mapping algorithm for a complete binary tree where path, level, and tree templates are used has eluded us, so far.


VINCENZO AULETTA received his Laurea cum laude in scienze dell’informazione (computer science) from the University of Salerno in 1987. In 1991, he received his Dottorato di Ricerca (Ph.D.) in applied mathematics and computer science from the University of Napoli. He has been an assistant professor at the Dipartimento di Informatica ed Applicazioni “Renato M. Capocelli” of the University of Salerno since 1991. His research interests are design and analysis of sequential and parallel algorithms, approximation algorithms, communication networks, and parallel data structures.

AMELIA DE VIVO received her Laurea in scienze dell’informazione (computer science) from the University of Salerno in 1993. In 1997, she received her Masters degree in information technology at the IASSS (Vetri, Italy). She is currently a consultant for a software company working on high performance software on massively parallel machines. Her research interests are parallel data structures, parallel algorithms, and parallelization techniques in compilers.

VITTORIO SCARANO received the Laurea in scienze dell’informazione (computer science) from the University of Salerno in 1990. In 1995, he received the Dottorato di Ricerca (Ph.D.) in applied mathematics and computer science at the University of Naples. Vittorio Scarano visited the University Eotvos Lorand in Budapest, Hungary, in 1992 and, from 1992 to 1994, he visited the Department of Computer Science at the University of Massachusetts at Amherst doing research with Professor Arnold Rosenberg. He has been an assistant professor at the “Dipartimento di Informatica ed Applicazioni” of the University of Salerno since 1995. His research interests are theory of parallel computation, design and analysis of sequential and parallel algorithms, fault-tolerant algorithms, graph theory, and distributed multimedia on the Internet.

Received March 25, 1997; revised January 18, 1998; accepted January 26, 1998