Linear space adaptive data structures for planar range reporting




Accepted Manuscript

Ananda Swarup Das, Prosenjit Gupta

PII: S0020-0190(16)00002-8
DOI: http://dx.doi.org/10.1016/j.ipl.2016.01.001
Reference: IPL 5379

To appear in: Information Processing Letters

Received date: 15 August 2014
Revised date: 22 December 2015
Accepted date: 5 January 2016

Please cite this article in press as: A.S. Das, P. Gupta, Linear space adaptive data structures for planar range reporting, Inf. Process. Lett. (2016), http://dx.doi.org/10.1016/j.ipl.2016.01.001

This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

Linear Space Adaptive Data Structures for Planar Range Reporting

Ananda Swarup Das, Prosenjit Gupta

IBM India Research Labs, New Delhi, India; NIIT University, Neemrana, Rajasthan, India

Abstract

Let S be a set of n points on an n × n integer grid. The maximal layer of S is the set of points in S that are not dominated by any other point in S. Considering Q as an axes-parallel query rectangle, we design an adaptive, space-efficient data structure using layers of maxima (iterative maximal layers) for reporting the points in Q ∩ S. Our data structure needs linear space and can be queried in time O(log^ε n + A log^ε n + k). Here A is the number of layers of maxima with points in the query rectangle, k is the size of the output and ε is an arbitrarily small constant in the range (0, 1). Also, A ≤ k. In the worst case, the query time of our data structure is O(k log^ε n + k). Our model of computation is the word RAM with each word of size Θ(log n).

1. Introduction

In this work, we study the problem of orthogonal range searching for planar points, a query frequently encountered in geographical information systems (GIS). GIS applications need to preprocess huge volumes of data into space-efficient data structures that can be queried efficiently. We present a linear space adaptive data structure for the problem. The notion of adaptive data structures is not new, and instances of such data structures can be found in [1, 5]. As stated in [5], intuitively, an adaptive algorithm is one whose running time is small for easier instances of the problem while it may be large for difficult instances. Thus, an adaptive algorithm exploits special instances (sortedness of the data, for example) and performs better on them.

For a concrete example, consider the problem of computing the maximal layer of a planar point set. Given a point set, the maximal layer is the subset of the points that are not dominated by any other point in the set. Assuming p = (p_x, p_y) and q = (q_x, q_y) to be two points in R^2 with distinct coordinates, the point p dominates q if q_x < p_x and q_y < p_y. It is known that the maximal layer of a point set can be computed in O(n log n) time. However, if the points are already sorted in non-increasing order of their x-coordinates, then one can traverse the point set from right to left. The first point (the one with the maximum x-coordinate) is sure to be a maximal point. Each subsequent point is a maximal point if and only if its y-coordinate is greater than the y-coordinate of the last discovered maximal point. Thus, if the data is already sorted, the maximal layer can be found in O(n) time.

The previous and only known adaptive data structure for orthogonal planar range searching [1] solves the problem in the pointer machine model. Our model of computation is the word RAM with each word of size Θ(log n). To the best of our knowledge, this is the first adaptive data structure for the problem in the word RAM model. The solution that we propose decomposes the point set into maximal layers (which we discuss shortly) and can be queried in time O((1 + A) log^ε n + k), where A is the number of layers of maxima with points in the query rectangle, k is the size of the output and ε is an arbitrarily small constant in the range (0, 1). Also, A ≤ k. Thus, if A = k / log^ε n, our algorithm has a query time of O(log^ε n + k). The proposed data structure needs linear space. In the worst case, that is when A = k, our query algorithm has a running time of O(k log^ε n + k). Thus, our data structure has a worst-case performance guarantee on par with the best known linear space data structure of [4] for the problem.

In this context, it should be mentioned that beating O(k log^ε n) is a long-standing open problem. Though our result is not an improvement in the worst-case sense, it is interesting because it shows that the penalty of O(log^ε n) can be charged to a potentially smaller term, which in this case is the number of layers of maxima with at least one output point. A similar idea of decomposing the point set into untangled chains can be found in [1].
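The linear-time scan over presorted points described in the introduction can be sketched as follows. This is only an illustration: the function name `maximal_layer_presorted` and the tuple representation of points are assumptions made here, and the input is assumed to have distinct coordinates and to be sorted by decreasing x-coordinate already.

```python
from typing import List, Tuple

Point = Tuple[int, int]  # (x, y); representation chosen for this sketch


def maximal_layer_presorted(points: List[Point]) -> List[Point]:
    """Return the maximal points of `points`.

    Assumes distinct coordinates and that `points` is already sorted by
    decreasing x-coordinate, so the scan itself takes O(n) time.
    """
    maxima: List[Point] = []
    best_y = float("-inf")  # largest y among points seen so far (all have larger x)
    for x, y in points:
        if y > best_y:  # no point with larger x also has larger y
            maxima.append((x, y))
            best_y = y
    return maxima


if __name__ == "__main__":
    pts = [(5, 2), (4, 1), (3, 4), (2, 3), (1, 5)]  # sorted by decreasing x
    print(maximal_layer_presorted(pts))  # [(5, 2), (3, 4), (1, 5)]
```

The adaptivity is visible in the cost split: sorting costs O(n log n) in general, but the scan itself never needs more than one comparison per point.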

1.1. Layers of Maxima

Layers of maxima, as defined in [3], are the lists of maximal points discovered iteratively as follows. Find the maximal points of the set S and denote this set as S_1. Remove these points from S and find the maximal points of the remaining set; denote the new set as S_2. Iterate until S is empty. The iteration index i at which a point p becomes a maximal point is defined to be its layer. See Figure 1. For each layer, the points can be arranged in increasing order of their x-coordinates and joined by alternating horizontal and vertical segments such that the layers are disjoint, as shown in Figure 1. We use the iteration indices to denote the layers.

Figure 1: The layers of maxima for a planar point set. The segments with arrowheads are semi-infinite segments.

1.2. Our Contributions

We assume our model of computation to be the unit-cost RAM with a word size of log n bits. The primary result of this work is a linear space solution to the fundamental problem of reporting points in a query rectangle. The problem is defined as follows.

Problem 1. Given a static data set S of n points on an n × n integer grid, preprocess S into a linear space data structure such that given an axes-parallel query rectangle Q, the points in S ∩ Q can be reported efficiently.

We distinguish the number of layers of maxima that intersect the query rectangle from A, the number of layers of maxima with at least one point in the query rectangle. The output size is denoted by k, and A ≤ k. By Q, we denote the orthogonal query rectangle, and n denotes the size of the data set. Throughout this work, ε is an arbitrarily small constant. The constants of proportionality in the big-Oh notation increase as ε decreases. For the above problem, we present the following result.

Theorem 1. There exists a linear space data structure for a static data set S of n points on an n × n grid such that given an axes-parallel query rectangle Q, we can report the points in S ∩ Q in time O(log^ε n + A log^ε n + k).

Though the above result is not an improvement in the worst-case sense, it is interesting because it shows that the penalty of O(log^ε n) can be charged to a potentially smaller term, which in this case is the number of layers of maxima with at least one output point. The following table summarizes the efficiency of the different linear space data structures known for the problem in the word RAM model.

Query Time                              Source
O(log n / log log n + k log^ε n)        [6]
O(log^ε n + k log^ε n)                  [7]
O(log^ε n + k log^ε n)                  [4]
O(log^ε n + A log^ε n + k)              this work

Table 1: In the last row, the term A denotes the number of maximal layers with at least one point in the query rectangle Q. The result stated in the last row is adaptive in nature. All the results are in the word RAM model.

2. Range Reporting Using Layers of Maxima

A brief sketch of our solution is as follows. The points of the set S are stored in layers of maxima. We first find a point p which is certain to be inside the query rectangle Q, unless the result is an empty set. Suppose that p belongs to layer S_i. We then traverse the layer S_i and report the points of S_i that fall inside Q. Using the layer S_i as a reference, we then check whether the layers S_j for j > i and the layers S_j for j < i have points inside Q. This is done by considering the layers S_{i+1}, S_{i+2}, ... in that order for the former and the layers S_{i−1}, S_{i−2}, ... in that order for the latter. If we come across a layer S_j that intersects Q but does not have any points inside Q, then there are two possibilities. Either there are no further points inside Q, or there are layers numbered higher (or lower) than j which contain points inside Q. In the latter case, we again find a point which is certain to be in Q and identify the layer to which this point belongs. We keep repeating the above steps until we find a layer which does not intersect Q.

As the above summary indicates, we at times have to find a point that is inside Q, unless there are no further points to report. This is done by using a solution to the range successor problem. In our solution, the bulk of the query cost can be attributed to the query time of a range successor query. In the following, we first state some important properties of the layers of maxima in Subsection 2.1, define the range successor problem and its solution in Subsection 2.2, and then describe our solution to the range reporting problem.
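Since everything that follows operates on the layers of maxima of Section 1.1, a small sketch of how the layers can be computed may help. The code below is not the in-place algorithm of [2]; it is a standard sort-and-binary-search approach chosen here for brevity, and the name `layers_of_maxima` is hypothetical. It assumes distinct coordinates. Layers are 0-indexed in the code, so `layers[0]` corresponds to S_1, and each layer is returned in decreasing order of y-coordinate.

```python
from bisect import bisect_right
from typing import List, Tuple

Point = Tuple[int, int]  # (x, y); representation chosen for this sketch


def layers_of_maxima(points: List[Point]) -> List[List[Point]]:
    """Split `points` (distinct coordinates) into layers of maxima.

    Points are processed by decreasing x. A point joins the first layer
    whose current largest y is smaller than its own y, because then no
    point of that layer dominates it. Runs in O(n log n) time.
    """
    layers: List[List[Point]] = []
    neg_tops: List[int] = []  # neg_tops[i] = -(largest y in layer i), kept increasing
    for p in sorted(points, key=lambda q: -q[0]):
        i = bisect_right(neg_tops, -p[1])  # first layer with largest y < p.y
        if i == len(layers):
            layers.append([p])
            neg_tops.append(-p[1])
        else:
            layers[i].append(p)
            neg_tops[i] = -p[1]
    # Within a layer, y grows as x shrinks, so reversing gives decreasing y.
    return [layer[::-1] for layer in layers]


if __name__ == "__main__":
    pts = [(1, 5), (2, 3), (3, 4), (4, 1), (5, 2)]
    print(layers_of_maxima(pts))
    # [[(1, 5), (3, 4), (5, 2)], [(2, 3), (4, 1)]]
```

The binary search is valid because the largest y-coordinate strictly decreases from one layer to the next among the points processed so far: every point of a later layer is dominated by some already processed point of the preceding layer.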

2.1. A Few Properties of the Layers of Maxima

The points of the set S can be arranged as layers of maxima in O(n log n) time and O(n) space [2]. From the topmost and the lowermost point of each layer of maxima we draw two semi-infinite lines: one directed leftwards to −∞ starting from the topmost point, and the other directed downwards to −∞ starting from the lowermost point. Note that given a query rectangle Q, any point p ∈ Q ∩ S belongs to exactly one layer of maxima. Other properties, illustrated in Figures 2 and 3, are:

Property 1. A layer of maxima can enter the query rectangle Q by intersecting the upper horizontal or the left vertical boundary of Q. Similarly, it can exit by intersecting the lower horizontal or the right vertical boundary of Q.

Property 2. There may be some layers S_k with no points in Q that nevertheless intersect Q.

Figure 2: Different cases, panels (a) to (d), showing a maximal layer entering the query rectangle Q from the upper horizontal or left vertical boundary and exiting from the lower horizontal or right vertical boundary.

Figure 3: The point with the maximum x-coordinate inside the rectangle belongs to the layer S_i. The layers S_{i−1} and S_{i+1} intersect the rectangle but have no points in the rectangle.

2.2. The Range Successor Problem

Problem 2. Given a set S of n points in R^2, preprocess it into a data structure such that given a three-sided query rectangle Q = (−∞, a] × [c, d], we can report the point with the largest x-coordinate in S ∩ Q.

In the literature, this problem is known as the range successor query. It is known from Theorem 1 of [7] that the points of S can be preprocessed into a data structure of size O(n) that answers range successor queries in time O(log^ε n). We summarize this result in the following theorem.

Theorem 2. ([7]) A set S of n points in R^2 can be preprocessed into a linear space data structure such that given a three-sided query rectangle Q = (−∞, a] × [c, d], we can report the point with the largest x-coordinate in S ∩ Q in O(log^ε n) time.

The range successor query plays a pivotal role in our solution, as we use it to find a point in the query rectangle whenever we encounter a layer of maxima that intersects the query rectangle but has no output point in the rectangle. See Figure 3.

We start our discussion by stating the steps for preprocessing and then move to the query algorithm.

2.3. Preprocessing

Layers of maxima:
1. Compute the layers of maxima of the points of S.
2. For every point p in the layer S_i, find the points p_1 and p_2 in the layer S_{i+1} (that is, the layer to the left of S_i) such that p_1 is the first point just above p in S_{i+1} and p_2 is the first point just below p in S_{i+1}. See Figure 4.

3. Similarly, for each point p, find the two points p_3 and p_4 in the layer S_{i−1} (that is, the layer to the right of S_i). We call p_1, p_2, p_3, p_4 the navigational pointers of p.
4. Store the navigational pointers along with the point p.

Figure 4: Four pointers for the point p.

Each layer of maxima is stored as an array in which the points appear in decreasing order of their y-coordinates. The iteration index m of a layer is used to denote the corresponding array A_m storing its points.

The Range Successor Data Structure: Build a data structure D which is an instance of the range successor data structure of [7].

2.4. Query Algorithm

Any layer of maxima can enter a query rectangle by intersecting the upper horizontal boundary or the left vertical boundary. Similarly, it can exit a query rectangle by intersecting the lower horizontal boundary or the right vertical boundary. Given a query rectangle Q = [a, b] × [c, d], we can find the point with the maximum x-coordinate in (−∞, b] × [c, d]. Let the point p thus reported belong to the maximal layer S_i. In this paper, we explain our algorithm by assuming that the layer S_i enters Q by intersecting the upper horizontal boundary and exits by intersecting the lower horizontal boundary. The other cases can be handled analogously. Traverse the layer S_i starting from the point p and report all the points of S_i that are in Q. After traversing the layer S_i, we need to traverse the layers that are to the left and right of S_i. Since these two traversals are algorithmically symmetric, we elaborate on how to traverse the layers S_j with j > i. By now, it must be clear that our algorithm mainly (a) traverses suitable maximal layers for reporting points and (b) follows navigational pointers to jump across consecutive maximal layers. With reference to our traversal of the layers S_j, j > i, a maximal layer crossing the query rectangle can be one of the following three types:

(i) A Good Layer: A layer S_j is a good layer if at least one of the points pointed to by the navigational pointers from the layer S_{j−1} is in the query rectangle Q.

(ii) A Bad Layer: A layer S_j which crosses Q but has no points in Q. See Figure 5. Bad layers have the property that they enter Q by intersecting the upper horizontal boundary and exit by intersecting the lower horizontal boundary, provided S_i, the layer with the point of maximum x-coordinate, enters and exits Q by intersecting the upper and the lower horizontal boundaries respectively and we are traveling towards the left starting from S_i.

Figure 5: The point with the maximum x-coordinate inside the rectangle belongs to the layer S_i. The layer S_{i+1} intersects the rectangle but has no points in the rectangle.

(iii) A Tricky Layer: A layer S_j for which the points pointed to by the navigational pointers from the previous layer S_{j−1} are outside Q and yet the layer has at least one point in Q. See Figure 6. Tricky layers have the property that they enter Q by intersecting the left vertical boundary and exit by intersecting the lower horizontal boundary, provided S_i, the layer with the point of maximum x-coordinate, enters and exits Q by intersecting the upper and the lower horizontal boundaries respectively and we are traveling towards the left starting from S_i.

Figure 6: The point with the maximum x-coordinate inside the rectangle belongs to the layer S_i. Both points in the layer S_{i+1} pointed to by the navigational pointers of the point p are outside the query rectangle Q and yet the layer S_{i+1} has a point in Q.

Thus, while traversing from right to left starting from S_i, for any layer which is not a good layer, we consider the vertical segment starting from the point pointed to by the upper navigational pointer and check whether it intersects the upper horizontal boundary. This check confirms whether the layer is indeed a bad layer, given that a tricky layer enters the rectangle by intersecting the left vertical boundary. Note that a tricky layer will have a point below p = (p_x, p_y), the point with maximum x-coordinate in the layer from which the current navigational pointers were followed. Additionally, once we encounter a tricky layer, no subsequent layer intersecting Q can be a bad layer, provided S_i enters and exits Q by intersecting the upper and the lower horizontal boundaries respectively and we are traveling towards the left starting from S_i. Thus, once we are certain that the layer currently encountered is a tricky one, we find the topmost point in the rectangle [a, p_x] × (−∞, p_y). See Figure 6.

However, if the layer is a bad layer, then we know that it enters Q by intersecting the upper horizontal boundary and exits by intersecting the lower horizontal boundary, as we are traveling towards the left starting from S_i. Let p = (p_x, p_y) be the point pointed to by the upper navigational pointer that we followed. Since we are traveling from right to left, any layer in Q which is to the right of the bad layer has already been traversed. See Figure 5. Therefore, if the current layer is a bad one, we find the point with maximum x-coordinate in (−∞, p_x] × [c, d]. Next, we summarize the steps of our algorithm.

1. Find the point p with maximum x-coordinate in Q. Let p be a point on the layer S_i.
2. Check the points p_1, p_2 of the layer S_{i+1} pointed to by the navigational pointers from the points on S_i. By step 2 of preprocessing, any point on S_i has pointers to two points p_1, p_2 of the layer S_{i+1}. We start our traversal of the layer S_{i+1} from either p_1 or p_2, whichever is in Q.
3. If any of the points on S_{i+1} pointed to by a navigational pointer from S_i is in Q, then S_{i+1} is a good layer. Therefore, we traverse the layer and report the points that are in Q.
4. If S_{i+1} or any layer to the left of S_i is a bad layer and p = (p_x, p_y) is the point pointed to by the upper navigational pointer, we find the point with maximum x-coordinate in (−∞, p_x] × [c, d].
5. Similarly, if a layer is a tricky layer and p = (p_x, p_y) is the point with maximum x-coordinate in the layer from which the current navigational pointers were followed, we find the topmost point in the rectangle [a, p_x] × (−∞, p_y).
6. We continue our traversal to the left until we encounter a layer that does not intersect Q.

2.5. Handling Tricky and Bad Layers to the Right of S_i, the Layer with the Point Having Maximum x-coordinate

Any tricky layer to the right of S_i can occur in ways similar to those demonstrated in Figure 7.

Figure 7: The points pointed to by the navigational pointers from the layer S_j to S_{j−1} are outside Q and yet there are points from the layer S_{j−1} that are in Q.

Any layer to the right of S_i with points in Q will exit the rectangle by intersecting the right vertical boundary. Thus, for any layer S_{j−1} which is a tricky layer, all its points that are in Q = [a, b] × [c, d] must be above the topmost point in the current layer S_j. We therefore select the topmost point p = (p_x, p_y) from the layer S_j. We also find the point of intersection (x′, y′) of the layer S_j with the upper horizontal boundary of Q. We then run a range successor query to find the point with minimal x-coordinate in the rectangle [x′, b] × [p_y, ∞). For a bad layer, the upper navigational pointer will point to a point above the upper horizontal boundary of Q while the lower navigational pointer will point to a point to the right of the right vertical boundary of Q. In this scenario too, we can use a range successor query to find the minimal point and subsequently the layer to traverse. We are now ready to prove Theorem 1 through the lemmas below.
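Before the analysis, the building blocks of Sections 2.3 and 2.4 can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' implementation: `range_successor` is a brute-force linear scan standing in for the O(log^ε n) structure D of [7], `nav_pointers` recomputes by binary search what the preprocessing would store with each point, and all names are hypothetical. Layers are assumed to be stored in decreasing order of y-coordinate, and coordinates are assumed distinct.

```python
from bisect import bisect_left
from typing import List, Optional, Tuple

Point = Tuple[int, int]
Rect = Tuple[int, int, int, int]  # (a, b, c, d) stands for [a, b] x [c, d]


def inside(p: Point, q: Rect) -> bool:
    a, b, c, d = q
    return a <= p[0] <= b and c <= p[1] <= d


def range_successor(points: List[Point], a: int, c: int, d: int) -> Optional[Point]:
    """Brute-force stand-in for D: max-x point in (-inf, a] x [c, d]."""
    cands = [p for p in points if p[0] <= a and c <= p[1] <= d]
    return max(cands, default=None)  # tuples compare by x first


def nav_pointers(p: Point, neighbor: List[Point]) -> Tuple[Optional[int], Optional[int]]:
    """Indices in `neighbor` of the points just above and just below p."""
    neg_ys = [-q[1] for q in neighbor]  # increasing, since y is decreasing
    j = bisect_left(neg_ys, -p[1])      # first index whose y is below p.y
    above = j - 1 if j > 0 else None
    below = j if j < len(neighbor) else None
    return above, below


def report_from(layer: List[Point], idx: int, q: Rect) -> List[Point]:
    """Report the points of `layer` in q, given that layer[idx] is in q.

    The points of a layer that lie in q are consecutive in its array,
    so scanning outwards from idx costs O(1) per reported point.
    """
    lo = idx
    while lo > 0 and inside(layer[lo - 1], q):
        lo -= 1
    hi = idx
    while hi + 1 < len(layer) and inside(layer[hi + 1], q):
        hi += 1
    return layer[lo:hi + 1]


if __name__ == "__main__":
    s1 = [(1, 5), (3, 4), (5, 2)]  # layer S_1, decreasing y
    s2 = [(2, 3), (4, 1)]          # layer S_2, decreasing y
    q = (2, 5, 2, 4)               # Q = [2, 5] x [2, 4]
    seed = range_successor(s1 + s2, q[1], q[2], q[3])
    print(seed)                                  # (5, 2)
    print(report_from(s1, s1.index(seed), q))    # [(3, 4), (5, 2)]
    print(nav_pointers(seed, s2))                # (0, 1)
```

In the example, the upper pointer from (5, 2) leads to (2, 3), which lies in Q, so S_2 is a good layer in the terminology above and can be traversed from that point without another range successor query.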

Lemma 1. The total storage space needed by the data structures is O(n).

Proof. Each point of S belongs to exactly one layer of maxima. Each layer is maintained in a linear array of size equal to the number of points present in that layer. Thus, the total storage space needed by all the arrays is O(n). By Theorem 2, the total storage space needed by the range successor data structure D is O(n). Also, each point in the data set has at most four pointers. Hence, the total storage space needed is O(n). □

Lemma 2. The time needed by the query algorithm is O((1 + |A|) log^ε n + k), where |A| is the number of maximal layers having points in Q and k is the number of points to be reported. As |A| ≤ k, in the worst case (that is, when |A| = k), the time needed by our query algorithm is O(log^ε n + k log^ε n + k).

Proof. The most expensive steps in our query algorithm are the ones where we need to run a range successor query. This is needed when we encounter a bad layer or a tricky layer. For each application of the range successor query, we either get a point which is inside Q or we can infer that there are no more points to be reported for Q. Thus, the number of times we need to repeat the range successor query is bounded by the number of maximal layers with points in Q. Hopping from one layer to the next takes O(1) time, as we have maintained pointers. Hence, the total query time of our algorithm is O(log^ε n + |A| log^ε n + k). Note that |A| ≤ k. Also, note that reporting the k_i output points of a layer can be done in O(k_i) time, since these points occur consecutively in the layer's array, and k = Σ_i k_i. □

Lemma 3. Our query algorithm reports all the points that are in the query rectangle Q.

Proof. Any point in Q is also a point of some maximal layer intersecting Q. We first find the point with the maximum x-coordinate in Q. The correctness of the range successor data structure of [7] ensures that the point will be found. We traverse the layer S_i holding the point with maximum x-coordinate, thereby reporting the points of S_i in Q. All other points in Q will be present in the layers that are either to the left or to the right of S_i. We traverse these layers until we encounter a bad layer or a tricky layer, at which point we repeat the range successor query to find a maximal or minimal point, depending on the direction (left or right) in which we are traversing. If the point is inside Q, we traverse further. The correctness of the range successor data structure again ensures that we discover the maximal layer that should be traversed next. □

Combining Lemma 1 and Lemma 2, we conclude Theorem 1.

References

[1] D. Arroyuelo, F. Claude, R. Dorrigiv, S. Durocher, M. He, A. López-Ortiz, J. I. Munro, P. K. Nicholson, A. Salinger, and M. Skala. Untangled monotonic chains and adaptive range search. Theor. Comput. Sci., 412(32):4200–4211, 2011.
[2] H. Blunck and J. Vahrenhold. In-place algorithms for computing (layers of) maxima. Algorithmica, 57(1):1–21, 2010.
[3] A. L. Buchsbaum and M. T. Goodrich. Three-dimensional layers of maxima. In ESA, volume 2461 of Lecture Notes in Computer Science, pages 257–269. Springer, 2002.
[4] T. M. Chan, K. G. Larsen, and M. Patrascu. Orthogonal range searching on the RAM, revisited. In Symposium on Computational Geometry, pages 1–10, 2011.
[5] E. D. Demaine, A. López-Ortiz, and J. I. Munro. Adaptive set intersections, unions, and differences. In SODA, pages 743–752, 2000.
[6] Y. Nekrich. A linear space data structure for orthogonal range reporting and emptiness queries. Int. J. Comput. Geometry Appl., 19(1):1–15, 2009.
[7] Y. Nekrich and G. Navarro. Sorted range reporting. In SWAT, pages 271–282, 2012.