COMPUTER VISION, GRAPHICS, AND IMAGE PROCESSING 42, 371-380 (1988)
Parallel Processing of Regions Represented by Linear Quadtrees*
S. K. BHASKAR AND AZRIEL ROSENFELD
Center for Automation Research, University of Maryland, College Park, Maryland 20742
AND
ANGELA Y. WU
Department of Computer Science and Information Systems, American University, Washington, D.C. 20016
Received August 20, 1987; accepted December 23, 1987

We show how computation of geometric properties of a region represented by a linear quadtree can be speeded up by about a factor of p by using a p-processor CREW PRAM model of parallel computation. Similar speedups are obtained for computing the union and intersection of two regions, and the complement of a region, using linear quadtree representations. © 1988 Academic Press, Inc.

1. INTRODUCTION
Various compact data structures for representing regions have been developed; they include border codes, run codes, medial axis transformations, and quadtrees [1]. Efficient algorithms have been developed for computing geometric properties of regions from these representations, and for computing the representations of Boolean combinations of sets directly from the representations of the component sets. In earlier papers [2-4] the authors have shown how region processing can be speeded up, using various models of parallel computation, when the regions are represented by run codes, border codes, or medial axes. In many cases, p-fold speedups are possible using p processors. This paper presents similar results for the quadtree representation.

A region quadtree [5-7] is a hierarchical data structure for storing digital images. It decomposes a $2^m \times 2^m$ image that is not homogeneous into four equal quadrants. Each nonhomogeneous quadrant (block) is decomposed into subquadrants, and so on, until each block is homogeneous. For a binary image, a homogeneous block contains all 1's (black region) or all 0's (white background). Each block can be represented as a node in a tree of degree four, and the block's four quadrants correspond to the node's four children. Thus the root (level 0) represents the entire image of size $2^m \times 2^m$; a node at level k represents a square region of size $2^{m-k} \times 2^{m-k}$. The leaf nodes are either black or white; the interior nodes are called gray nodes. Whenever no confusion arises, we will use the terms block and node interchangeably. Figures 1a and 1b show a binary image and its quadtree.

FIG. 1. A binary image and its quadtrees (figure not reproduced): (a) region; (b) quadtree; (c) linear quadtree (all black nodes, sorted): A1 = 003, A2 = 021, A3 = 023, A4 = 03, A5 = 122, A6 = 21, A7 = 3; (d) codes for the white nodes: B1 = 000, B2 = 001, B3 = 002, B4 = 01, B5 = 020, B6 = 022, B7 = 10, B8 = 11, B9 = 120, B10 = 121, B11 = 123, B12 = 13, B13 = 20, B14 = 22, B15 = 23.

*The support of the U.S. Air Force Office of Scientific Research under Grant AFOSR-86-0092 is gratefully acknowledged, as is the help of Sandy German and Barbara Bull in preparing this paper.

0734-189X/88 $3.00. Copyright © 1988 by Academic Press, Inc. All rights of reproduction in any form reserved.

A quadtree representation that explicitly stores the tree structure (referred to as a pointer-based quadtree) requires a large amount of space to store the gray nodes and the pointers from these nodes to their children. In this paper we will use a pointerless quadtree representation called the linear quadtree [8-11]. We use the linear quadtree notation of [8], in which only the "black" leaf nodes are stored; the "white" leaf nodes and the gray nodes are not stored. Each node is represented by a code obtained as follows. The four quadrants NW, NE, SW, and SE into which the region is divided by each gray node are assigned the codes 0, 1, 2, and 3, respectively. The code for a block in a quadtree is obtained by tracing the path from the root to the node: starting with a null string, append a 0, 1, 2, or 3 as the path moves from a node to its NW, NE, SW, or SE child. For example, A2 = 021 in Fig. 1 is the NE subquadrant of the SW subquadrant of the NW quadrant. Figures 1c and 1d show the encoding of the leaf nodes in Fig. 1b. We assume that the linear quadtree appears in the form of a list in which each black node is represented by its code. We further assume that these codes are listed in lexicographically sorted order. This corresponds to a preorder listing of the black nodes of the quadtree.
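The encoding can be illustrated with a minimal sequential Python sketch. It assumes the image is given as a list of rows of 0/1 values with row 0 at the top; the names `linear_quadtree` and `paint_codes` are hypothetical helpers, not part of [8]. A depth-first traversal that visits the quadrants in digit order 0, 1, 2, 3 emits the black codes in preorder, which coincides with lexicographic order because a code precedes all of its extensions.

```python
from typing import List

# Quadrant digits as in the text: 0 = NW, 1 = NE, 2 = SW, 3 = SE.
# Digit d selects row offset d // 2 and column offset d % 2 (row 0 is the top).


def linear_quadtree(img: List[List[int]]) -> List[str]:
    """Sorted codes of the black (1) leaves of a 2^m x 2^m binary image."""
    size = len(img)
    assert size > 0 and (size & (size - 1)) == 0, "image side must be a power of two"
    codes: List[str] = []

    def visit(row: int, col: int, side: int, code: str) -> None:
        cells = [img[r][c] for r in range(row, row + side) for c in range(col, col + side)]
        if all(cells):                  # homogeneous black block: a stored leaf
            codes.append(code)
        elif any(cells):                # gray node: recurse in digit order 0, 1, 2, 3
            half = side // 2
            for d in range(4):
                visit(row + (d // 2) * half, col + (d % 2) * half, half, code + str(d))
        # homogeneous white blocks are not stored

    visit(0, 0, size, "")
    return codes                        # depth-first digit order = preorder = sorted order


def paint_codes(codes: List[str], m: int) -> List[List[int]]:
    """Inverse direction: rasterize black-node codes into a 2^m x 2^m image."""
    n = 2 ** m
    img = [[0] * n for _ in range(n)]
    for code in codes:
        row = col = 0
        side = n
        for ch in code:
            side //= 2
            row += (int(ch) // 2) * side
            col += (int(ch) % 2) * side
        for r in range(row, row + side):
            for c in range(col, col + side):
                img[r][c] = 1
    return img
```

Round-tripping the black codes of Fig. 1c through `paint_codes(..., 3)` and `linear_quadtree` gives back the same sorted list 003, 021, 023, 03, 122, 21, 3.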
In a linear quadtree, no pointers are used. Since only the codes of the black nodes are stored, the space and time complexity of algorithms depends only on the number of black nodes. This is often more efficient than using quadtree representations in which both black and white leaf nodes are stored. If the smallest black square is at level k of the tree, then its size is $2^{m-k} \times 2^{m-k}$, its code is a string of length k, and the height of the quadtree is k. Note that there may be as few as one black node in the quadtree, but the number of black and white nodes is at least 3k, i.e., three times the height of the quadtree.

In this paper we present parallel algorithms to compute region properties such as the perimeter, area, and moments of the black regions, and to perform set operations such as union, intersection, and complement on binary images represented by linear quadtrees with black nodes. The model of computation we use is the shared memory CREW PRAM [12, 13]. Parallel processing of quadtrees using other models (such as mesh-connected networks and shuffle-exchange networks) is discussed in [15-18]. These references use the full region quadtree, or use both the black and white nodes of the linear quadtree, which can have many more nodes than the black node linear quadtrees used in this paper.

The CREW PRAM machine has p processors with common access to the same global memory. Any number of processors can read the same memory location simultaneously in constant time (O(1) steps). Any processor can write to a memory location in O(1) time, but concurrent writes to the same memory location are not allowed. In many of our algorithms, a weaker model of computation may be used without sacrificing efficiency.

2. REGION PROPERTIES
Suppose we are given a linear quadtree representing a region with n black squares, and p processors, where p < n. In this section we find the perimeter, area, center of gravity, and moments of the region. We will show that a parallel computation model weaker than the PRAM can also execute our algorithms.

2.1. Perimeter
Each processor is responsible for n/p nodes and executes the following concurrently with the other processors. For each of the n/p nodes assigned to processor i, let R be the next node to be processed, with code $r_1 r_2 \ldots r_k$. The size of R is $2^{m-k} \times 2^{m-k}$. Add the perimeter of R ($= 4 \cdot 2^{m-k}$) to an aggregate maintained by processor i. Whenever it is discovered that some side of R (or part of it) does not contribute to the perimeter, the appropriate length is subtracted from the aggregate. The codes of R's neighbors of size $2^{m-k} \times 2^{m-k}$ in each of the four directions are determined in O(k) time from $r_1 r_2 \ldots r_k$ by using the procedures outlined in [4]. Suppose the neighbor A in direction D ($D \in \{N, S, E, W\}$) has code $s_1 s_2 \ldots s_k$. Processor i then searches the linear quadtree for $s_1 s_2 \ldots s_k$. The search terminates if:
(1) A code equal to some proper prefix of $s_1 \ldots s_k$ is found (i.e., A is a subset of some other black node). This implies that R has a neighbor in direction D that is larger than itself. In this case a value equal to twice the side length of R ($= 2 \cdot 2^{m-k}$) is subtracted from the aggregate. (Twice, because the larger neighbor will not subtract the common border from its aggregate.)
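FIGURE 2. Example region used to illustrate the perimeter computation (figure not reproduced). The labeled points A to F are referred to in the text; the black nodes are R1 = 30, R2 = 320, R3 = 310, R4 = 1, R5 = 2, R6 = 330, R7 = 003, and R8 = 002.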
For an example, see Fig. 2. The W neighbor of R1 is 21, which is part of the larger square R5 (= 2). Since R1 would have contributed AD to the perimeter and R5 would have contributed AE, we subtract AD twice: once to account for the contribution from R1 (which should not have been made), and once more to account for the fact that R5 should have contributed only DE. The sequence of additions and subtractions for AE is
$$+AE\ (\text{from } R5) + AD\ (\text{from } R1) + DF\ (\text{from } R2) - 2AD\ (\text{from } R1) - 2DF\ (\text{from } R2) = AE - AD - DF = EF.$$
(2) A code containing $s_1 s_2 \ldots s_k$ as a proper prefix is found. This implies that A is a gray node and one of its black descendants is the neighbor of R in direction D. This descendant is smaller than R, and no adjustment to the aggregate is necessary. For example, the E neighbor of R5 is 3, but the first node found in a search for 3 is R1 (= 30), which will itself make the adjustments to the aggregate.
(3) No black neighbor in direction D exists. In this case, no adjustment is necessary. For example, the E neighbor of R7 is 012, which is not part of the linear quadtree.
(4) The neighbor in direction D is of the same size as R. We subtract the length of their common side from the aggregate. For example, the W neighbor of R7 is R8.
If t is the depth of the smallest node, the search for a neighbor can be done in time O(t log n), because the linear quadtree is ordered and we can use a binary search on it (each comparison of two codes takes O(t) time). The time necessary to determine the neighbor of a node in direction D is O(t). Thus, the aggregate in processor i is found in time O((n/p) t log n).
The aggregates in the p processors can be summed in O(log p) time. Thus the time needed to find the perimeter is O(log p + (n log n / p) t), where n is the number of black nodes, p is the number of processors, and t is the depth of the smallest node (= length of the longest code).
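The per-node search and the four cases can be sketched sequentially in Python. Several choices here are assumptions rather than the paper's: the equal-size neighbor is found by converting a code to a (row, column) position and back, which is one O(k) method but not necessarily the procedure of [4]; the p processors are simulated by contiguous chunks of the sorted list, summed pairwise; and all function names are hypothetical. Because no black code is a proper prefix of another, only the entries at and just before the binary-search insertion point of s need to be inspected.

```python
from bisect import bisect_left
from typing import List, Optional, Tuple

# Direction -> (row step, column step); row 0 is the northern edge of the image.
STEPS = {"N": (-1, 0), "S": (1, 0), "E": (0, 1), "W": (0, -1)}


def code_to_cell(code: str) -> Tuple[int, int]:
    """(row, column) of a block among the 2^k x 2^k blocks of level k = len(code)."""
    row = col = 0
    for ch in code:
        d = int(ch)                                  # 0 = NW, 1 = NE, 2 = SW, 3 = SE
        row, col = 2 * row + d // 2, 2 * col + d % 2
    return row, col


def neighbor_code(code: str, direction: str) -> Optional[str]:
    """Code of the equal-size neighbor in `direction`, or None if it is off the image."""
    k = len(code)
    row, col = code_to_cell(code)
    dr, dc = STEPS[direction]
    row, col = row + dr, col + dc
    if not (0 <= row < 2 ** k and 0 <= col < 2 ** k):
        return None
    return "".join(str(2 * ((row >> j) & 1) + ((col >> j) & 1)) for j in range(k - 1, -1, -1))


def node_contribution(codes: List[str], code: str, m: int) -> int:
    """Net perimeter contribution of one black node, following cases (1)-(4)."""
    side = 2 ** (m - len(code))
    total = 4 * side
    for direction in STEPS:
        s = neighbor_code(code, direction)
        if s is None:                                # case (3): outside the image
            continue
        i = bisect_left(codes, s)                    # O(log n) comparisons of codes
        if i < len(codes) and codes[i] == s:         # case (4): equal-size black neighbor
            total -= side
        elif i > 0 and s.startswith(codes[i - 1]):   # case (1): larger black neighbor
            total -= 2 * side
        # case (2) (smaller black descendants) or case (3) (white): no adjustment
    return total


def perimeter(codes: List[str], m: int, p: int = 4) -> int:
    """Simulate p processors, each summing the contributions of about n/p nodes."""
    chunk = max(1, -(-len(codes) // p))              # ceil(n / p)
    aggregates = [sum(node_contribution(codes, c, m) for c in codes[i:i + chunk])
                  for i in range(0, len(codes), chunk)]
    while len(aggregates) > 1:                       # pairwise (tree) summation
        aggregates = [sum(aggregates[i:i + 2]) for i in range(0, len(aggregates), 2)]
    return aggregates[0] if aggregates else 0
```

For the black codes 0, 13, 3 of a 4 × 4 image (m = 2), the three contributions are 8, 2, and 8, giving a perimeter of 18 unit edges. The helper also reproduces the neighbors quoted in the text, e.g. 21 as the W neighbor of R1 = 30 and 231 as the W neighbor of R2 = 320.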
It is possible to improve this time in the following ways:
(1) If enough memory is available, we can create a table which is directly indexed by the codes of the nodes. The number of entries in this table would be $(4^{t+1} - 1)/3$, where t is the height of the quadtree. The presence or absence of a black node can then be indicated by placing a mark in the table entry for that node, and instead of the O(t log n) time needed to search for a neighbor, only O(t) time is needed. For example, the W neighbor of R2 is 231, and we need to check for the presence or absence of marks in table entries 2, 23, and 231, in that order. The running time then becomes O(tn/p + log p).
(2) In our perimeter algorithm, the adjustments to the perimeter sums are made by the nodes finding neighbors of equal or larger sizes. The search in (1) can therefore stop after k steps, where k is the depth of the black node being processed (i.e., the size of the node is $2^{m-k} \times 2^{m-k}$). If there is a fair mix of large nodes (short codes) and small nodes (long codes), we can distribute the nodes so that the small nodes are evenly distributed among the processors. In this case the running time per processor is $O(\log p + \sum_i k_i)$, where the sum ranges over the n/p nodes assigned to the processor and $k_i$ is the length of the code for node i.
In our algorithm, the only time a processor needs the global memory is to search the quadtree to find neighbors. If each processor has a copy of the quadtree, then no shared memory is needed. Thus any network of processors which allows fast summing of the contributions of the processors (e.g., a tree network) can execute the algorithm without losing efficiency. The algorithms in the rest of this section do not even need the entire quadtree in each processor's memory; each processor only needs the nodes it is responsible for. Thus for these algorithms too, any parallel model of computation that can rapidly compute the sum of the numbers output by the processors can be used.

2.2. Area
The area A of the region is $\sum_i A_{R_i}$, where $A_{R_i}$ is the area of the black node $R_i$. This holds because the nodes in the quadtree correspond to disjoint squares. If we have n squares, we can assign n/p squares to each of the p processors. In time O(n/p) each processor computes the total area of the black squares it contains, and in a further O(log p) time the total area of the region can be found. Thus the total time to compute the area is O(n/p + log p).

2.3. Centroid
The centroid $(\bar{x}, \bar{y})$ of a region R is
$$\bar{x} = \frac{\iint_R x\,dx\,dy}{A}, \qquad \bar{y} = \frac{\iint_R y\,dx\,dy}{A},$$
where A is the area of the region. Now
$$\iint_R x\,dx\,dy = \sum_i \iint_{R_i} x\,dx\,dy,$$
where the $R_i$ are the black nodes, because the squares do not overlap. Suppose $R_i$ is a (black) node of side $a_i$ with its lower left corner at (p, q), so that its corners are (p, q), (p + a_i, q), (p, q + a_i), and (p + a_i, q + a_i). (Here p and q are coordinates, not the number of processors.) Then
$$\iint_{R_i} x\,dx\,dy = \int_{q}^{q+a_i}\int_{p}^{p+a_i} x\,dx\,dy = \frac{a_i}{2}\left[(p+a_i)^2 - p^2\right] = a_i^2\left(p + \frac{a_i}{2}\right).$$
Suppose we have n nodes in the linear quadtree. Assign n/p nodes to each processor. Each processor computes $\sum \iint_{R_i} x\,dx\,dy$ over all its black nodes in time O(n/p). In time O(log p) we can then compute $\iint_R x\,dx\,dy$. The area is also computed in time O(n/p + log p). Thus $\bar{x}$ (and similarly $\bar{y}$) can be computed in time O(n/p + log p).
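The area and centroid computations can be sketched in Python with exact rational arithmetic. Beyond the text, this sketch assumes that the image occupies $[0, 2^m] \times [0, 2^m]$ with y increasing upward (so a block's row, counted from the top, is flipped to obtain q), and it simulates the processors by contiguous chunks whose partial sums are then added; the function names are hypothetical. Apart from decoding a code into its corner, which takes time proportional to the code length, each node costs O(1) arithmetic, using $a_i^2$ for the area and $a_i^2(p + a_i/2)$ for the x integral.

```python
from fractions import Fraction
from typing import List, Tuple


def block_geometry(code: str, m: int) -> Tuple[int, int, int]:
    """Side a and lower-left corner (x0, y0) of a block, with y pointing up."""
    row = col = 0
    for ch in code:                                  # 0 = NW, 1 = NE, 2 = SW, 3 = SE
        d = int(ch)
        row, col = 2 * row + d // 2, 2 * col + d % 2
    a = 2 ** (m - len(code))
    return a, col * a, 2 ** m - (row + 1) * a        # rows are counted from the top


def node_moments(code: str, m: int) -> Tuple[Fraction, Fraction, Fraction]:
    """Area, integral of x, and integral of y over one black square."""
    a, x0, y0 = block_geometry(code, m)
    half = Fraction(a, 2)
    return Fraction(a * a), a * a * (x0 + half), a * a * (y0 + half)


def centroid(codes: List[str], m: int, p: int = 4) -> Tuple[Fraction, Fraction, Fraction]:
    """Return (area, x_bar, y_bar); each simulated processor sums about n/p nodes."""
    if not codes:
        raise ValueError("the region has no black nodes")
    chunk = -(-len(codes) // p)                      # ceil(n / p) nodes per processor
    partials = []
    for start in range(0, len(codes), chunk):
        sums = [Fraction(0), Fraction(0), Fraction(0)]
        for code in codes[start:start + chunk]:
            sums = [s + t for s, t in zip(sums, node_moments(code, m))]
        partials.append(sums)
    area, sum_x, sum_y = (sum(column, Fraction(0)) for column in zip(*partials))
    return area, sum_x / area, sum_y / area
```

For the black codes 0, 13, 3 of a 4 × 4 image, this gives area 9 and centroid (13/6, 37/18), which agrees with averaging the centers of the nine black pixels.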
2.4. Moments
The moment of inertia of R about the origin is given by
$$\iint_R (x^2 + y^2)\,dx\,dy = \iint_R x^2\,dx\,dy + \iint_R y^2\,dx\,dy = \sum_i \left( \iint_{R_i} x^2\,dx\,dy + \iint_{R_i} y^2\,dx\,dy \right),$$
where the $R_i$ are the black nodes, because the squares do not overlap. Each integral $\iint_{R_i} x^2\,dx\,dy$ can be evaluated in constant time; for example, with $R_i$ as in Section 2.3, $\iint_{R_i} x^2\,dx\,dy = \frac{a_i}{3}\left[(p+a_i)^3 - p^3\right]$. Thus, the moment of inertia about the origin can be computed in time O(n/p + log p).

3. SET OPERATIONS

3.1. Union
Suppose LQ1 = ($A_1, A_2, \ldots, A_r$) and LQ2 = ($B_1, B_2, \ldots, B_s$) are two black node linear quadtrees with the nodes listed in lexicographic order. To find the linear quadtree representing the union of the regions specified by LQ1 and LQ2, we merge the two lists and eliminate the duplicate entries. The merged list LQ may also contain a block $A_i$ which covers $B_j, B_{j+1}, \ldots, B_h$; these B's, being subsumed by another block, must be deleted from LQ. Finally, all four quadrants of a block may be present in the list, and these four nodes must be replaced by their parent node. This replacement may need to be repeated. The algorithm to find the union proceeds as follows:
Step 1. Merge the two ordered lists LQ1 and LQ2 to obtain a lexicographically sorted list LQ. Using p processors, merging takes O((r + s)/p + log p) time [13, 14] with the shared memory model. Step 2. Eliminate duplicate entries in LQ. Each processor is assigned (r + s)/p consecutive entries of LQ. In O((r + s)/p) time, each processor deletes duplicates within its own list. It only remains for each processor to check if the last entry in the
list of the preceding processor occurs within itself, and if so, to delete it. (One processor precedes another if the items it contains precede those contained in the other.) Thus, this step is completed in time O((r + s)/p).
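Step 2 can be sketched by simulating each processor on its own contiguous chunk of the merged list. This minimal sketch assumes, as holds for two duplicate-free inputs, that each code occurs at most twice in LQ; `dedup_chunked` is a hypothetical name. The only information a processor uses from outside its chunk is the last entry of the preceding chunk.

```python
from typing import List


def dedup_chunked(lq: List[str], p: int) -> List[str]:
    """Remove duplicate codes from a sorted merged list, one chunk per simulated processor."""
    chunk = max(1, -(-len(lq) // p))                 # about (r + s) / p entries each
    chunks = [lq[i:i + chunk] for i in range(0, len(lq), chunk)]
    result: List[str] = []
    for j, part in enumerate(chunks):                # each iteration is independent
        # Duplicates are adjacent in sorted order, so compare with the previous entry.
        local = [c for i, c in enumerate(part) if i == 0 or c != part[i - 1]]
        # Boundary check: drop our first entry if it repeats the preceding chunk's last entry.
        if j > 0 and local[0] == chunks[j - 1][-1]:
            local = local[1:]
        result.extend(local)
    return result
```

With p = 2, the list 0, 1, 1, 2 is split as (0, 1) and (1, 2); the second processor drops its leading 1 because it repeats the last entry of the first chunk.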
Step 3. Eliminate subsumed nodes. If the code of a node is a prefix of another, then the node with the shorter code subsumes the other. To eliminate subsumed nodes, the list LQ resulting from Step 2 is divided equally among the p processors, so that each processor contains a sorted list of length O((r + s)/p). Each processor first scans its own list and eliminates subsumed nodes by executing the following, assuming that the list contained in processor $p_j$ extends from $LQ_f$ to $LQ_l$:

    α ← LQ_f
    for i ← f + 1 to l do
        if LQ_i = αβ for some β then delete LQ_i
        else α ← LQ_i

This can be completed in time O((r + s)/p). It remains to eliminate codes of the form αβ when α is present in some processor $p_i$ and αβ is present in $p_j$ (j > i). To do this, we construct a binary tree with p leaves, each of which contains an (equal) portion of LQ. Each processor computes a pair (start, end), where "start" is the first element of its portion of LQ and "end" is the last element (these may be the same). Each node in the tree can store a (start, end) pair and an "Elim" field which contains a string (see Fig. 3). Intuitively, when the bottom-up pass sets the Elim field of a node to α, the square α in the node's left brother subsumes some of the codes contained in the node. In the top-down pass, the information collected about which nodes need to be eliminated is passed down to the descendant nodes. In each case, only the largest subsuming node is passed down. Starting from the leaf nodes, we move up to the root, setting the Elim fields as follows:

    for i := log2 p down to 1 do
        for all processors p_j at level i that have a right brother do (in parallel)
            if end(p_j) = α and start(right brother(p_j)) = αβ for some β then
                Elim(right brother(p_j)) ← α
                (the node αβ in the brother is subsumed by the node α in p_j)
            if end(p_j) = α and end(right brother(p_j)) = αβ for some β then
                PAIR(father(p_j)) ← (start(p_j), end(p_j))
            else
                PAIR(father(p_j)) ← (start(p_j), end(right brother(p_j)))
Next, in a top-down pass, the proper Elim fields are passed to descendants:

    for i ← 1 to log2 p do
        for each processor p_j at level i do
            if Elim(p_j) = α then send α to whichever of p_j's sons contains α as a prefix in its PAIR

If son $p_x$ of $p_j$ receives α and Elim($p_x$) = αβ, set Elim($p_x$) ← α (we eliminate the largest subsuming node). If Elim($p_x$) is null, set Elim($p_x$) to α.
Each of the processors $p_i$ at the leaves now eliminates all entries in its part of the list which contain Elim($p_i$) as a prefix.
FIGURE 3. Example of the Elim fields set in the binary tree built over the processors' portions of LQ (figure not reproduced).
This procedure is illustrated by the example in Fig. 3; it can be completed in time O(t((r + s)/p + log p)), where t is the length of the longest code (= depth of the quadtrees).

Step 4. At this step, the squares represented by all the nodes are disjoint, and they must be condensed to obtain the final result. If we have four nodes whose codes differ only in the last digit, we replace them by a single node, dropping the last digit. The maximum number of condensation rounds is O(t), where t is the height of the quadtree. At each level of the tree we have O(r + s) nodes, which are divided among the p processors. Because the list is sorted, each processor needs to examine at most three elements from its preceding processor to determine whether the preceding processor contains any nodes which could be condensed with its own nodes. Condensation stops when no further change is obtained. This takes O((r + s)t/p) steps.
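Steps 3 and 4 can be sketched sequentially in Python. The per-chunk loop is the local scan given in Step 3, but the cross-chunk step is a simple left-to-right carry of the last surviving code, used here as a stand-in for the bottom-up and top-down tree passes rather than an implementation of them. It relies on the fact that all codes with a given prefix are contiguous in sorted order, so any code subsumed by an earlier survivor is also subsumed by the last survivor. Condensation uses a stack instead of level-by-level rounds, and Step 1 is a plain sort; all names are hypothetical.

```python
from typing import List, Optional


def eliminate_subsumed(lq: List[str], p: int) -> List[str]:
    """Step 3: drop every code that has another code in the list as a proper prefix."""
    chunk = max(1, -(-len(lq) // p))
    parts: List[List[str]] = []
    for i in range(0, len(lq), chunk):               # local scan of one processor's chunk
        part = lq[i:i + chunk]
        alpha, kept = part[0], [part[0]]
        for code in part[1:]:
            if code.startswith(alpha):               # code = alpha + beta: subsumed by alpha
                continue
            alpha = code
            kept.append(code)
        parts.append(kept)
    result: List[str] = []
    alpha_prev: Optional[str] = None
    for kept in parts:                               # sequential stand-in for the tree passes
        kept = [c for c in kept if alpha_prev is None or not c.startswith(alpha_prev)]
        result.extend(kept)
        if kept:
            alpha_prev = kept[-1]                    # last surviving code so far
    return result


def condense(codes: List[str]) -> List[str]:
    """Step 4: replace four sibling codes c0, c1, c2, c3 by c, repeatedly."""
    stack: List[str] = []
    for code in codes:
        stack.append(code)
        while len(stack) >= 4:
            parent = stack[-1][:-1]
            if stack[-4:] != [parent + d for d in "0123"]:
                break
            del stack[-4:]
            stack.append(parent)                     # the parent may complete another group
    return stack


def union(lq1: List[str], lq2: List[str], p: int = 4) -> List[str]:
    """Union of two sorted black-node linear quadtrees."""
    merged = sorted(lq1 + lq2)                       # Step 1 (a plain sort here)
    deduped = [c for i, c in enumerate(merged) if i == 0 or c != merged[i - 1]]  # Step 2
    return condense(eliminate_subsumed(deduped, p))  # Steps 3 and 4
```

For example, the union of 00, 01 with 02, 03, 1 condenses the four children of 0 and returns 0, 1, while the union of 0, 1 with 2, 3 condenses all the way to the root code (the empty string).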
3.2. Intersection
Let LQ1 = ($A_1, \ldots, A_r$) and LQ2 = ($B_1, \ldots, B_s$) be as in Section 3.1. The $A_i$ are mutually disjoint, and so are the $B_j$. Thus the intersection of the regions represented by LQ1 and LQ2 is $\bigcup_{i,j} (A_i \cap B_j)$. Also, because the regions share the same coordinate system, and because of the nature of quadtree partitioning, for all i, j, $A_i \cap B_j$ is either empty or equal to the smaller of $A_i$ and $B_j$. Thus to find the intersection we need to do the following:
(1) Merge LQ1 and LQ2 using p processors in time O((r + s)/p + log p).
(2) Perform the "Eliminate duplicate entries" step of the union algorithm, outputting each code that occurs in both lists, since such a square lies in both regions.
(3) Perform the "Eliminate subsumed nodes" step of the union algorithm, except that each processor outputs the code of each square which is subsumed.
The time complexity is O(t((r + s)/p + log p)), where t is the length of the longest code.

3.3. Complement
Given two successive elements A and B in the linear quadtree representation of a region, we can output the codes of all white leaf nodes which would lie between A and B in a preorder traversal of the quadtree, as follows:

    while A ≠ B do    (Note: A < B lexicographically.)
    begin
        Let i be the first index such that A_i ≠ B_i. If no such i can be found because A is not long enough, append 0s to A until such an i can be found, and output A.
        while A_i ≠ B_i do increment A by 1 (mod 4). By this we mean that whenever a 4 is obtained, it is replaced by a 0 and a carry is passed to the previous digit.
        Remove all trailing 0s, and if A_i ≠ B_i, then output A.
    end
Suppose a linear quadtree T has n black nodes. Divide these equally among the p processors, with the last node in each processor repeated on the next processor, and use the procedure above to list the white nodes between any two black nodes. The time taken by the procedure is O(t), where t = max(length of A, length of B). Thus, the complement can be found in time O(maxi~j_
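The white leaves can also be listed with a different, interval-based technique, sketched below in Python; it is not the digit-incrementing procedure above. It assumes that a code of length k at depth m covers a contiguous range of $4^{m-k}$ pixel positions in preorder (Morton, or Z-order) numbering, so the white leaves between two successive black nodes are exactly the maximal aligned blocks that tile the gap between their ranges. As in the text, each gap depends only on two successive black codes, which is why the list can be split among processors with one node of overlap. All names are hypothetical.

```python
from typing import List, Tuple


def code_interval(code: str, m: int) -> Tuple[int, int]:
    """Half-open range of pixel positions (preorder/Morton order) covered by a block."""
    index = 0
    for ch in code:                                  # base-4 digits, 0 = NW ... 3 = SE
        index = 4 * index + int(ch)
    size = 4 ** (m - len(code))
    return index * size, (index + 1) * size


def gap_blocks(lo: int, hi: int, m: int) -> List[str]:
    """Maximal aligned quadtree blocks that exactly tile positions [lo, hi), in order."""
    out: List[str] = []
    while lo < hi:
        level = m                                    # start from a single pixel
        while level > 0 and lo % 4 ** (m - level + 1) == 0 and lo + 4 ** (m - level + 1) <= hi:
            level -= 1                               # the parent block starts at lo and fits
        size = 4 ** (m - level)
        index = lo // size
        out.append("".join(str((index >> (2 * j)) & 3) for j in range(level - 1, -1, -1)))
        lo += size
    return out


def complement(black: List[str], m: int) -> List[str]:
    """Codes of all white leaves, given the sorted black-node codes."""
    out: List[str] = []
    cursor = 0                                       # first position not yet accounted for
    for code in black:                               # each gap needs two successive codes only
        start, end = code_interval(code, m)
        out.extend(gap_blocks(cursor, start, m))
        cursor = end
    out.extend(gap_blocks(cursor, 4 ** m, m))        # white leaves after the last black node
    return out
```

Applied to the black codes of Fig. 1c with m = 3, this reproduces the white codes B1 to B15 of Fig. 1d in sorted order.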
discussion in Section 1). Thus an O(m) algorithm using linear quadtrees which contain all the m leaf nodes (white and black) implicitly has a time complexity of O(tn), where n is the number of black nodes only. We have also shown how to compute the union and intersection of two regions represented by linear quadtrees in time O(t((n1 + n2)/p + log p)), and the complement of a region in time O((n/p)t). The algorithms assume a CREW PRAM model of computation, but weaker models which support tree-like communication between processors could also be used.

REFERENCES
1. A. Rosenfeld and A. C. Kak, Digital Picture Processing, 2nd ed., Vol. 2, Chap. 11, Academic Press, New York, 1982.
2. A. Y. Wu and A. Rosenfeld, Parallel Processing of Encoded Bit Strings, University of Maryland CS-TR-1455, November 1984.
3. A. Y. Wu, S. K. Bhaskar, and A. Rosenfeld, Parallel Processing of Region Boundaries, University of Maryland CS-TR-1573, November 1985.
4. A. Y. Wu, S. K. Bhaskar, and A. Rosenfeld, Parallel Computation of Geometric Properties from the Medial Axis Transform, University of Maryland CS-TR-1691, August 1986.
5. H. Samet, The quadtree and related hierarchical data structures, ACM Comput. Surveys 16, 1984, 187-260.
6. G. M. Hunter and K. Steiglitz, Operations on images using quadtrees, IEEE Trans. Pattern Anal. Mach. Intell. 1, 1979, 145-153.
7. A. Klinger, Patterns and search statistics, in Optimizing Methods in Statistics (J. S. Rustagi, Ed.), pp. 303-337, Academic Press, New York, 1971.
8. I. Gargantini, An effective way to represent quadtrees, Comm. ACM 25, 1982, 905-910.
9. F. W. Burton and J. G. Kollias, Comment on the explicit quadtree as a structure for computer graphics, Comput. J. 26, 1983, 188.
10. J. P. Lauzon, D. M. Mark, L. Kikuchi, and J. A. Guevara, Two-dimensional run encoding for quadtree representation, Comput. Vision Graphics Image Process. 30, 1985, 56-69.
11. D. J. Abel and J. L. Smith, A data-structure and algorithm based on a linear key for a rectangle retrieval problem, Comput. Vision Graphics Image Process. 24, 1983, 1-13.
12. J. C. Wyllie, The Complexity of Parallel Computation, Technical Report TR79-387, Department of Computer Science, Cornell University, 1979.
13. A. Borodin and J. E. Hopcroft, Routing, merging, and sorting on parallel models of computation, in "Proceedings, ACM 14th Sympos. Theory of Comput., 1982," pp. 338-344.
14. C. P. Kruskal, Search, merging and sorting in parallel computation, IEEE Trans. Comput. 32, 1983, 942-946.
15. Y. Hung and A. Rosenfeld, Parallel Processing of Linear Quadtrees on a Mesh-Connected Computer, CS-TR-1817, University of Maryland, March 1987.
16. M. Martin, D. M. Chiarulli, and S. S. Iyengar, Parallel processing of quadtrees on a horizontally reconfigurable architecture computing system, in "Proceedings, Int'l Conf. Parallel Processing, 1986," pp. 895-902.
17. G. G. Mei and W. Liu, Parallel processing for quadtree problems, in "Proceedings, Int'l Conf. Parallel Processing, 1986," pp. 402-454.
18. G. G. Mei and W. Liu, Quadtree problems on a two dimensional shuffle exchange network, in "Proceedings, Comput. Vision Pattern Recognit., 1986," pp. 140-147.