Geometric retrieval in parallel

Geometric retrieval in parallel

JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING 5,92-102 (1988) RESEARCH NOTE Geometric Retrieval in Parallel MARTIN DAVID KATZ California State U...

646KB Sizes 0 Downloads 68 Views

JOURNAL

OF PARALLEL

AND DISTRIBUTED

COMPUTING

5,92-102

(1988)

RESEARCH NOTE Geometric Retrieval in Parallel MARTIN DAVID KATZ California State University, Fullerton, California 92631

AND DENNIS

J. VOLPER

University of California, Irvine, California 92717 Received April 24, 1986 We present a data structure for parallel retrieval of the sum of values associated with points within a region of a plane. For a square grid containing n points, we are able to update a value or retrieve the sum of the values in a region in @log n) time, with only O(n”‘) processors. For points uniformly distributed in a square, a similar result holds for the expected time. Furthermore, the structure is such that when the number of processors is between 0 and n ‘I3, the speedup due to adding processors is linear. @ 1988 Academic Press, Inc.

1. INTRODUCTION

We describe a data structure appropriate for efficiently and quickly performing region queries on a multiprocessor machine, such as a MIMD. Assume that there are values associated with various points in a plane. A region query specifies a region and is answered by the accumulation of the values associated with points within that region. For example, one might need to determine the number of people within regions for traffic planning, population growth studies, or electoral redistricting. In this example the accumulation is the summation of the values associated with points which fall within the region, but our data structure is powerful enough to allow the use of other accumulative operators such as minimum and maximum. While many data structures limit the query regions to orthogonal rectangles, our structure allows general polygonal shapes. Our structure also allows efficient updates of the 92 0743-73 15/88 $3.00 Cop&M 8 1988 by Academic

F’ms, Inc. All right.5 of mpmduction in any form reserved.

GEOMETRIC

RETRIEVAL

IN PARALLEL

93

values associated with points. A survey of some common data structures supporting region queries can be found in [lo]. This paper is divided into two parts. In the first part (Section 2) we assume that the points form a grid. The complexity analysis for this part is the usual worst case analysis, in which we evaluate the algorithm given the worst possible input data. Points fall naturally into a grid in a number of important problems to which people are applying parallel computing including medical imaging, physics, and VLSI [2, 91. In the second part (Section 3) we assume a random distribution of the points. In this part we use probabilistic analysis to derive the expected cost of the algorithm, and the probability, given a particular input datum, that the cost will exceed a given constant times the expected cost. Kaufman and Denning [7] supply a suitable introduction to probabilistic analysis. Many parallel algorithms are inefficient in the sense that the increase in the number of processors is not accompanied by a proportional increase in the speed of solution. This was the case with the algorithm used to obtain the log n speed for a parallel solution of the minimal spanning tree problem [4]. Other parallel algorithms have a higher time-processor product than the bestknown serial algorithm. As an example, using systolic arrays for multiplication of matrices achieves a time-processor product of o(n3) [8], while Strassen’s serial algorithm achieves U(n2.*l) [ 11. The speedup for algorithms using our data structure is linear in the number of processors used. Further, our algorithm achieves a speed which is log n times the theoretical minimum for the problem [3]. We know of no serial algorithm with a better space time product than our parallel algorithm. 1.1. ComputationalModel We are given a collection of points in an N X N region of a plane. In the first case the points form a square grid, n = N2. In the second case we assume n = Np points are distributed according to a uniform probability distribution within the region. In both cases a value is associated with each point. The values are elements of a commutative semigroup. Two examples of commutative semigroups are maximum and addition over the nonnegative integers. We will refer to the semigroup operator as sum. The data structure permits efficient implementation of the following tasks: Retrieve(R): Compute the sum of the values associated with all points in region R. Insert(k, x): Add a point at location k with value = x. Delete(k): Remove the point at k. Update(k, x): Delete(k) follwed by Insert(k, x). Retrievaltime is defined as the time to retrieve a region. We measure this as the number of semigroup operations, because the traversal time is domi-

94

KATZ AND VOLPER

nated by the time to perform semigroup operations. Update time is defined as the time to update a single point. Again, we measure this in semigroup operations, because the time to perform the semigroup operations dominates the traversal time. The total space for our data structure is o(N813) for N2 points. The processors are assumed to be multiple instruction, multiple data. Each memory location is accessed by no more than one processor at a time. Each processor accesses a set of memory cells for updates, and another set for retrievals. The set of cells used by a processor for update (or retrieval) is disjoint from the set used by any other processor for update (or retrieval). Thus, memory can be arranged such that each processor is connected to two blocks of dual-ported memory, a configuration which makes physical implementation of a large system feasible. For simplicity, one can consider all of the memory as a single, multiported RAM memory. 1.2. General Approach Many data structures (e.g., a range query structure) have an orientation. Our approach is to build many of these structures with different orientations, each covering the same data. We call each of these oriented structures a rotation. Each rotation contains a substructure which solves a query with a given orientation. Each substructure also contains structures to extend the solution to other queries with closely related orientations. Ours is a three-level structure, each level based on a partition of the data into groups of decreasing geometric size. For a query region, the large partitions are used to retrieve the interior of the region leaving the edges to be retrieved using the smaller partitions. It is at the middle level that the rotations occur. The work reported here is based on [5, 61. The success of this approach suggests that multiple data structures with different orientations may be useful for other parallel processing problems. 2. POINTS ON A SQUARE GRID We are given an N X N grid of points (for example, at integer coordinates); thus, n = N2. Technically, the query region is a polygon; however, a polygon can be partitioned into triangles so it suffices to consider queries which are arbitrary triangular-shaped regions. The output of the query is the sum of the values associated with the points within such a triangle. 2.1. The Range Query Structure Our data structures will need to perform one-dimensional range queries in a number of situations. In all these cases the query structure is implemented

GEOMETRIC

RETRIEVAL

95

IN PARALLEL

as a binary tree [ 1 I], in which each leaf contains the value of a data element (e.g., the value of a point or the sum of values in a square), and each internal node contains the sum of the values of its children (see Fig. 1). In all our data structures the trees are arranged so that computation of leaf membership can be done in constant time. To update the value of an element, the value is replaced and then the value of each affected internal node is replaced by the sum of its two children. A retrieval is performed by traversing the tree from the root, adding the value of a node if the elements covered by the node are entirely inside the region, and traversing the children of the node if some of the elements covered by the node are inside the region. For example, updating the value of element 4 in Fig. 1 would change the values of the nodes currently containing the values 8, 1526, and 48. No more than one node on each level of the tree must be modified to perform an update. Retrievals use nodes as far up the tree as possible. No more than two nodes on each level of the tree will be used to perform a retrieval. Actually, at any level, up to four nodes may be traversed, however, two of them are traversed only to reach lower levels of the tree. For example, the retrieval of the sum of elements 3 through 7 can be done using the nodes containing the values 15, 12, and 6. Both update and retrieval time of the range query structure are O(log N) using a single processor.

2.2. The Data Structure We will partition the N X N grid into squares to convert the retrieval into smaller regions (see Fig. 3). The squares form N2j3 rows of N2j3 squares each. This choice of size for the squares is designed to achieve a balance between the retrieval costs of the different cases discussed in Section 2.4.

1

2

3 FIG.

1. Sample

4 range

5 query

6 data structure.

7

8

96

KATZ AND VOLPER

The retrieval algorithm used for a square depends on the square’s relationship to the edges of the triangle. A range query structure is associated with each row of squares to permit the retrieval of a contiguous subset of squares in the row. The value of a leafin this structure is the sum of the values of the points within the corresponding square. This structure will handle those squares lying completely within the triangle. Each square contains a halfphase substructure which is used when one or more edges of the triangle cross the square. 2.3. The Half-Plane Substructure Half-plane retrieval is the calculation of the sum of the values associated with points which lie to one side of a line in a square. This data structure also permits the value of a point to be updated. As constructed here, the structure may be used for the intersection of multiple half-planes within the square. Each square contains 2N213 rotations (rotated substructures) corresponding to angles evenly spread over one half circle. Each rotation is cut into 4N2j3 “rectangles” which are truncated at the edges of the square. The 4N2j3 rectangles of a single rotation are shown in Fig. 2. For each rotation, the rectangles form a linear array which partitions the square. Within each rotation, each point falls into exactly one of its rectangles. The rectangles of a rotation are organized into a range query structure. Each rectangle is associated with a leaf of this structure. The value of a leaf is the sum of the values of points within the corresponding rectangle. In Ref. [5] it is shown that, when the area of a rectangle is less than 1, the grid points which fall within the rectangle must be collinear. Because our rectangles have been designed to guarantee an area less than 1, the points within a rectangle are collinear. Again, because they are grid points, in addition to being collinear, they are also spaced at even intervals. Thus the points within a rectangle form an array and can be organized into a range query structure. This allows fast computation of queries regarding points within a particular rectangle.

FIG. 2. One rotation in the half-plane data structure.

GEOMETRIC

RETRIEVAL

IN PARALLEL

97

Thus we have three classesof range query structures: one for rows of squares with squares as leaves, one within squares with rectangles as leaves, and one within rectangles with individual points as leaves. Each row of squares has a structure of the first class, each rotation of each square has a structure of the second class, and each rectangle has a structure of the third class. Consider the intersection of a line with a single square. For a data structure with the described geometry, given any line and the rotation closest to the slope of that line, no more than five rectangles of a single rotation will overlap the line. Therefore, by selecting the correct rotation, a half-plane can be retrieved in O(log N) time, that is, a range query on the rectangles included in the half-plane and range queries on the individual points within each of the rectangles overlapped by the half-plane boundary. To update the value of a point, each of the N2j3 rotations must be updated separately. For each rotation this means first updating the range query structure of the rectangle within which the updated point lies. As a side effect this update produces an updated value for the sum of all points within the rectangle. This updated value for the rectangle can now be used to update the range query tree which has rectangles as its leaves. This has the side effect of producing an updated value for the square. This affects the range query structure for the single row containing that square, which may be updated in @log N) time. Thus, for a single processor, the update time is O(N2j310g N). If there are P processors ( 1 d P G N213)we have each processor update N213jP rotations for an update time of O((N2’310g N)/P). 2.4. Performing a Triangle Retrieval We superimpose the triangle on the lattice of squares and classify each square according to its relationship to the edges of the triangle. The following cases are possible: 1. The square is inside the triangle. 2. One edge crosses the square. 3. Two or three edges cross the square. To perform a retrieval, we determine which rows of squares overlap the triangular region and which squares in those rows contain portions of the edges of the triangle. The squares which are entirely contained within the triangle (Case 1; see squares labeled 1 in Fig. 3) are retrieved by the range query on the rows of squares. Each square which has only one edge passing through it (Case 2) is retrieved by half-plane retrieval. When two or more edges cross the square (Case 3), we retrieve the square in three steps. The first step is to select the rotation closest to one of the edges and retrieve all rectangles crossing this edge, called the alignment edge. Next, the rectangles in the interior of the triangle (not crossing an edge) are retrieved. Finally, the rectangles which overlap the other edge(s) are retrieved.

KATZ AND VOLPER

98

I: : :I: : :I: : :I: : :I: : :I: : :I: : :I: : :I: : :I

FIG. 3. Squares in the N squares.

X

N grid with a sample triangle and the corresponding classified

We examine the three cases separately to see that each takes O(N2’310g N) time. Case 1. At most all of the O(N2j3) rows fall within a triangle. Each row takes time @log N). Case 2. No more than O(N213) squares can be crossed by an edge. Using the half-plane retrieval structure (i.e., choosing the correct rotation) each of these squares requires @log N) time to retrieve. Case 3. Again, no more than U(N213) squares can be crossed by two or more edges. Retrieval of rectangles of each of these squares which lie in the interior of the region may be done in time @log N). Since the alignment edge crosses U( 1) rectangles per square, it crosses a total of O(N2j3) rectangles. A rectangle will be crossed by an other edge only if there is more than one edge in the square. This requires that the edges be within X&N”’ of each other. This limits the total number of rectangles crossed by an other edge across all squares to O(N2j3). The points in each of these rectangles may be retrieved in time @log N). Thus, a single processor could retrieve a triangle in O(N2’310g N) time. Assume we have P processors (1 G P =SN2”). Define f = rN2’3/Pl. As discussed in Section 2.3, we can achieve an update time of O((N21310g N)/P). For retrieval of squares inside the triangle (Case l), the range query structure of each row of squares may be independently traversed. We assign a processor to each f rows of squares to achieve a retrieval time of O((N21310g N)/P). For retrieval of squares crossed by one edge (Case 2), each square may be retrieved independently in time @log N). Thus, assigning a processor to each f squares achieves a retrieval time of O((N2’310g N)/P).

GEOMETRICRETRIEVALINPARALLEL

99

For retrieval of squares crossed by two or more edges (Case 3), we first assign a processor to each f such squares to retrieve the rectangles entirely inside the triangle. Each of these squares takes O(log N) time for a total of O((N’“log N)/P). Then we assign one processor to each frectangles crossed by an edge. Each of these rectangles takes O(log N) time for a total of O((Nzf310g N)/P). Each processor determines to which components of the data structure it should be assigned in constant time. Therefore, P processors could perform a retrieval in O((N2’310g N)/P) time. Since [3] implies that the average of the update and retrieval times has a lower bound of 0(N2’3/P), we have come close to the minimum number of processors needed to achieve a particular speed. THEOREM 1. On the square grid, the triangle data structure can be used to retrieve convex polygons, with v vertices, using P processors (1 < P =GN213), with an update time of O((N21310g N)/P) and a retrieval time of O((vN2’310g N)/P).

ProoJ Any convex polygon with v vertices can be expressed as the disjoint union of O(v) triangles. n

3. UNIFORMLY DISTRIBUTED POINTS Our previous discussion required that the points be located at integral coordinates. Many applications do not meet this constraint. In this section we find the expected complexity when the position of each point is randomly selected. We assume that N* points are uniformly and independently distributed over the N X N square region of the plane. We define the data structure such that there are Npj3 rows of Np13 squares. Each square contains 2N*13 rotations, and each rotation contains 4Npj3 rectangles. The time to update a single rotation or to perform a range query on the array of rectangles in a square is still O(log N). The only difference between this and the case where the points form a grid is that now retrieval of the points within rectangles which are overlapped by an edge cannot be done using a range query since the points are not necessarily collinear. The points in a rectangle will be retrieved individually. However, since the area of each rectangle is 0(1/N*), the expected number of points per rectangle is O(l). Therefore, the expected time to retrieve a rectangle crossed by an edge is O(1). Assigning processors as before, the expected retrieval time is O((N*“log N)/P), and update time is O((Np’310g N)/P) for P processors.

100

KATZ

AND

VOLPER

3. I. Probabilistic Analysis of Retrieval We wish to determine a bound on the probability that, when the points are uniformly distributed, the time to perform a retrieval is substantially greater than the expected time. This is important in parallel processing, because the time to perform an operation is the maximum time taken by a processor. We actually develop a stronger result. We bound the probability that, given a collection of points, the retrieval time of any rectangle will significantly increase the time to retrieve the entire data structure beyond the expected time. DEFINITION 2. Define p (=r) as the probability that exactly r points are in a box. Define p (x) as the probability that more than r points are in a box. Define P (x) as the probability that at least one box contains more than I points. LEMMA

3. Given n Bernoulli trials with probability

l/m per trial. Given

r 3 2nlm: Pt=r)>PW). Proof pt=r) = Wm - lYr

m”



p(=r+

l)r)= Cp(=r+i)
n

i=l

THEOREM 4. Given n Bernoulli trials with probability k > 0 and n/2 2 r 2 2/k, then p(=r) < Jz/r!k?

Proof Apply Sterling’s approximation

and simplify.

n! (kn - l)n--I r!(n - r)! (kn)

p(=r)=from Sterling’s approximation,

-- n 1 (kn-l)“-’ d-- n - r r!e’ k”(n - r)n-’ I6 and (kn - 1)” = ka(n - l/k)“, <

because m<

I/kn per trial. If

GEOMETRIC

RETRIEVAL

IN PARALLEL

101

because ((a - @/(a - c))“-’ < .Cb and e” 3 1 for a > b, c > 0, <-.

V2 r!k’

n

3.2. Analysis for the Triangle Data Structure We obtain an upper bound on the probability that any rectangle contains more than cp 1ogxN points by assuming that the probability that a point will be placed in each rectangle equals the maximum for all rectangles. The maximum probability of success on a single Bernoulli trial l/kn = NeP/2. Thus, k= 2. From Theorem 4, we know that r > 1 and k = 2 implies p (=r) < fi/r!kr. From Lemma 3 we know that p (x) < p (=r). The probability that more than r points are in any rectangle in the data structure is at most p (x), times the total number of rectangles (8N4p’3). Thus, for cp log2N 2 4,

Since fi2’ < r! for r k 4, Pti(xp log2N) < 8Np(4’3-2c). Therefore, the probability that there exists a rectangle whose retrieval time is greater than cp log2N decreases with the size of the problem for any c > 5. This decrease is polynomial in the size of the problem with the degree of the polynomial dependent on the choice of c. Thus, the probability is small that the time for any retrieval will exceed the expected time by more than a constant.

4. SUMMARY

We have presented a data structure which permits efficient retrieval of the sum of values in a region of the plane. We have shown that the data structure is amenable to parallel execution. By assignment of a processor to each rotation and to each slice, update and retrieval times are reduced to qlog N), when the points are arranged in an N X N grid. The number of processors required is only O(N213). Further, we have shown that expected retrieval time is @log N) with O(Np13) processors for Np uniformly distributed points. We have shown that the probability that any retrieval will require more than O(cp log N) is very small for large Np. These data structures can be used in such applications as air pollution simulation, electoral redistricting, and traffic planning. In [5] we derive similar results for several other types of retrievals. These

102

KATZ AND VOLPER

include generalized polygons in two dimensions and half-spaces in higher dimensions.

REFERENCES I. Aho, A. V., Hopcroft, J. E., and Ullman, J. D. The Design and Analysis of Computer Algorithms. Addison-Wesley, Reading, MA 1974. 2. Buzbce, B. L., and Raveche, H. J. Conference Report, Conference on Forefronts of LargeScale Computational Problems. Parallel Computing 1 (1984), 307-3 15. 3. Fredman, M. L. The inherent complexity of dynamic data structures which accommodate range queries. Proc. 1980 IEEE Symposium on Foundations of Computer Science, pp. 19 l199. 4. Hirschberg, D. S., and Volper, D. J. A parallel solution for the minimum spanning tree problem. Proc. Seventeenth Annual Conference on Information Sciences and Systems, 1983, pp. 680-684. 5. Katz, M. D. Geometric retrieval: Data structures and computational complexity. Ph.D. dissertation, Information and Computer Science, University of California, Irvine, Apr. 1985. 6. Rata, M. D., and Volper, D. J. Data structures for retrieval on square grids. SZAMJ. Comput. 15,4 (Nov. 1986), 919-931. 7. Kaufman, E. G., and Denning, P. J. Operating Systems Theory. Prentice-Hall, Englewood Cliffs, NJ, 1973. 8. Kung, H. T., and Leisetson, C. E. Algorithms for VLSI arrays. In Mead, C., and Conway, L. (Eds.). Introduction to VLSI Systems. Addison-Wesley, Reading, MA, 1980, p. 288. 9. Mead, C., and Conway, L. (Eds). Introduction to VLSI Systems. Addison-Wesley, Reading, MA, 1980. 10. Rosenberg, J. B. Geographical data structures compared: A study of data structures supporting region queries. IEEE Trans. Computer-Aided Design. CAD-4, 1 (Jan. 1985), 53-67. 11. Willard, D. E. New data structures for orthogonal range queries. SIAM J. Comput. 14, 1 (1985), 232-253.