Computers & Geosciences 35 (2009) 1843–1853
A new hierarchical triangle-based point-in-polygon data structure

Juan J. Jiménez, Francisco R. Feito, Rafael J. Segura

Departamento de Informática, Universidad de Jaén, Campus Las Lagunillas s/n, E-23071 Jaén, Spain
Article history: Received 21 April 2008; received in revised form 6 August 2008; accepted 20 September 2008

Abstract
The point-in-polygon or containment test is fundamental in computational geometry and is applied intensively in geographical information systems. When this test is repeated several times with the same polygon, a data structure is necessary in order to reduce the linear time needed to obtain an inclusion result. In the literature, different approaches, such as grids or quadtrees, have been proposed for reducing the complexity of these algorithms. We propose a new data structure based on hierarchical subdivisions by means of tri-cones, which reduces the time needed for this inclusion test. This data structure, named the tri-tree, decomposes the whole space by means of tri-cones and classifies the edges of the polygon. For the inclusion test, only the edges classified in one tri-cone are considered. The tri-tree has a set of properties which makes it specially suited to this aim, with a complexity similar to other data structures but well suited to polygons which represent regions. We include a theoretical and practical study in which the new data structure is compared with classical ones, showing a significant improvement.
Keywords: Inclusion test; Geometric algorithms; Computer graphics; Geographical information systems; Hierarchical data structure; Spatial decomposition
1. Introduction

The containment test is widely used in areas like computational geometry, computer graphics and geographical information systems (GIS). In some situations we must consider the inclusion not only of a point in a polygon, but of many points and polygons, and a common task is to assign points to polygons. The point-in-polygon operation is popular in GIS analysis and makes sense from both the discrete-object and the field perspective (Longley et al., 2001). The point-in-polygon test can be used in a very large set of applications, such as the superposition of polygons, the classification of entities or queries about properties of regions. Imagine an area in which it is necessary to determine the spatial coincidence between hydrologic unit codes and regions. A point-in-polygon routine is needed for assigning regions to hydrologic units. These hydrologic units are useful for the prediction of zones of erosion and deposition in a catchment, and can be expanded when torrential rains occur. This situation arises in the "Andalucía" region of Spain.

The inclusion test consists of obtaining a boolean result regarding the containment of a point in the interior or on the boundary of a polygon represented by a set of ordered vertices. The definition of polygon changes for different applications, and special algorithms have been developed for different types of
polygons (like convex or non-convex) and for different definitions of polygons (like simple or non-simple). We consider a polygon as an ordered set of vertices (in counter-clockwise order) which form edges making a closed loop without edge intersections (a simple polygon). We allow convex or non-convex polygons, manifold or non-manifold, and polygons with or without holes (Fig. 1). We employ this type of polygon, which represents spatial areas from a GIS perspective.

When an inclusion test is needed repeatedly for the same polygon and there is enough memory in the system for extra information, data structures are introduced as pre-processing in order to reduce the time complexity of the algorithms. In these cases, some features of the polygon are classified or arranged, and the inclusion time is reduced by means of a directed search in these structures.

In this paper we present a new data structure specially suited to the point-in-polygon test. This triangle-based structure recursively subdivides the space occupied by the polygon into tri-cones¹ and classifies the edges of the polygon at pre-processing time. Two bounding triangles delimit the area occupied by the classified edges. At query time the points are classified in this data structure, and a point-in-polygon test is performed only with the edges classified in the corresponding tri-cone (Fig. 2). For the containment test several algorithms could be used, but the ray-crossing method (Haines, 1994) and the triangle-based algorithm (Feito et al., 1995) are well suited.
Corresponding author. Tel.: +34 953 212 884; fax: +34 953 212 472.
E-mail addresses: [email protected] (J.J. Jiménez), [email protected] (F.R. Feito), [email protected] (R.J. Segura).
doi:10.1016/j.cageo.2008.09.013
¹ The term tri-cone comes from the 3D term tetra-cone (a cone generated by a tetrahedron). In 2D it is a cone generated by a triangle.
Fig. 1. Types of simple polygons allowed: (a) convex; (b) non-convex; (c) non-manifold (two coincident vertices); (d) with holes. Types of polygons not allowed: (e) non-simple.
Fig. 2. Inclusion is obtained using only some geometry classified in a tri-cone.
The new data structure is suitable for polygons which represent regions because it fits the contour well, which means a quick inclusion test. The expected complexity is O(log n) for a query, where n is the number of vertices of the polygon, with O(n) space and O(n log n) construction time. This new data structure is constructed in less time than other data structures such as quadtrees, obtaining an improvement in the inclusion test and being suitable for several inclusion methods such as the ray-crossings or the triangle-based ones.

In the next section we review the data structures traditionally used for the point-in-polygon test, as well as their relevant properties. A review of point-in-polygon methods is also given. Then we propose the new data structure and describe its fundamental features. We follow with the point-in-polygon algorithms adapted to the use of this data structure. Finally we carry out a time study in which we measure the times obtained for diverse inclusion methods and data structures.
Fig. 3. Representation of crossings-count (a) and triangle-based (b) methods.
2. Background

The main data structures used for the point-in-polygon test are grids, quadtrees, k-d trees, bsp-trees, layers of edges, slices or trapezoids (Huang and Shih, 1997; Samet, 2006). Some of these data structures are suitable for particular containment methods (e.g. the ray-crossing method can be used with grids, quadtrees, k-d trees, layers of edges or trapezoids). Let us examine some of these data structures. First we review the principal point-in-polygon methods.

Point-in-polygon methods can be classified as methods without pre-processing (with O(n) query time) and methods with pre-processing (performed in order to reduce the query time). Whether pre-processing pays off depends on several factors: the number of times the test is repeated with the same polygon, the memory available for the data structures, and whether there is enough time for the pre-processing (in some situations there is insufficient time because the polygons are transmitted on the fly or deformed interactively).

2.1. Point-in-polygon methods

For non-convex polygons: All these methods have linear complexity for a query. These methods do not perform any pre-processing, but in some situations it is possible to store some type of information in order to reduce the constant associated with the time complexity:

Ray-crossings (crossings-count): At present, the ray-crossings algorithm (Haines, 1994) is considered the fastest point-in-polygon algorithm without pre-processing. Essentially it casts an infinite ray from the test point (normally parallel to the x-axis) and counts the number of intersections with the edges of the polygon. If this number is odd the point is inside the polygon; if even, the point is outside (Fig. 3a). A minimal sketch appears after this list.

Triangle-based: This algorithm (Feito et al., 1995) is based on computing the inclusion of the point in a set of triangles formed by an arbitrary and common point (named the original vertex) and each of the edges of the polygon (Fig. 3b). Depending on the signed sum over the triangles in which the test point is included, the inclusion in the polygon is obtained.

Sum of angles (winding number) (Hormann and Agathos, 2001; Preparata and Shamos, 1985): This technique is less efficient than the previous algorithms due to the trigonometric calculations. It is based on summing the angles formed by the test point and each pair of consecutive vertices of the polygon. If the point is inside the polygon this sum is 360°; if the point is outside the polygon it is 0°.

Pixel-based method (Haines, 1994): This method is based on a discrete representation of the polygon. When the polygon and the point are rasterized, the colour obtained for the point determines its inclusion state. This method is specially suited to raster representations of objects.
Fig. 4. Wedges for a convex polygon.
Coded coordinate method (Chen and Townsend, 1987): The vertices of the polygon are processed against the test point. A two-bit code is assigned for each quadrant and for the coordinate axes, and each vertex is coded depending on its position with respect to the test point. The different possibilities for the positions of two consecutive vertices of the polygon are identified. Only edges in the second and third quadrants are used for a horizontal ray from the test point towards negative infinity, using the ray-crossings method.
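As a concrete reference for the list above, the following is a minimal sketch of the ray-crossings test (our illustration, not the authors' implementation); it casts a horizontal ray towards positive infinity and toggles a parity flag on each crossing, using the usual half-open rule to handle shared vertices:

```python
def ray_crossings(polygon, p):
    """Minimal crossings-count test; polygon is a list of (x, y) vertices.
    Returns True when p is inside (boundary handling simplified)."""
    inside = False
    n = len(polygon)
    for i in range(n):
        (x1, y1), (x2, y2) = polygon[i], polygon[(i + 1) % n]
        # does the edge straddle the horizontal line through p (half-open)?
        if (y1 > p[1]) != (y2 > p[1]):
            # x-coordinate where the edge crosses that horizontal line
            x_cross = x1 + (p[1] - y1) * (x2 - x1) / (y2 - y1)
            if x_cross > p[0]:          # crossing to the right of p
                inside = not inside
    return inside
```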
For convex polygons: In general, the time for a query is reduced with respect to the methods developed for non-convex polygons, or the constant of the time complexity is better.
Sign of offset (half plane) (Taylor, 1994): It calculates or stores (in O(n) time) a set of half-plane equations for the edges of the polygon. The side on which the test point is classified with respect to each edge of the polygon determines the inclusion of the point in the polygon: only if the point is on the same side as the centroid of the polygon with respect to every edge is the point inside the polygon. This method has O(n) query time.

Wedge method (Preparata and Shamos, 1985): The polygon is divided into wedges (Fig. 4) using an arbitrary point inside the polygon, in O(n) time and O(n) space. Then the wedge in which the test point is included is determined by means of a binary search. For this wedge, the side of the edge on which the point lies is determined, and hence the inclusion in the polygon. This method has O(log n) complexity for a query. We note that it is not necessary to pre-process the polygon to construct the wedges; they can be constructed at query time during the binary search when needed. A sketch follows.
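The following is a minimal sketch of this O(log n) scheme for a convex polygon, under the simplifying assumption that the first vertex (rather than an interior point) is used as the apex of the wedges:

```python
def point_in_convex(poly, p):
    """O(log n) wedge test for a convex CCW polygon; poly[0] is the apex."""
    n = len(poly)
    def cross(o, a, b):
        return (a[0]-o[0])*(b[1]-o[1]) - (a[1]-o[1])*(b[0]-o[0])
    # outside the cone spanned at the apex -> outside the polygon
    if cross(poly[0], poly[1], p) < 0 or cross(poly[0], poly[n-1], p) > 0:
        return False
    lo, hi = 1, n - 1                     # binary search for the wedge
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if cross(poly[0], poly[mid], p) >= 0:
            lo = mid
        else:
            hi = mid
    # inside iff p lies on the inner side of edge poly[lo] -> poly[lo+1]
    return cross(poly[lo], poly[lo + 1], p) >= 0
```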
2.2. Strategies for reducing complexity

The following strategies can be applied to some point-in-polygon methods in order to obtain a query complexity lower than linear. To obtain this query complexity it is necessary to pre-process the polygon, which introduces a pre-processing time complexity and a space complexity. These strategies are well suited to the case of several test points in the same polygon.

Grids (Žalik and Kolingerova, 1997): The plane is subdivided regularly into axis-aligned cells of the same size and the edges of the polygon are classified into these cells in O(n√n) time with O(n) space. If a cell has no edge in its interior, this cell is marked as empty if a representative point in it is outside the polygon, and marked as full if this point is inside the polygon. If a point is classified in a full or empty cell, the result obtained is the status of this cell. If a point is classified in a cell with edges (marked as grey), a point-in-polygon test is performed only with the edges classified in this cell, in O(√n) time.

Quadtrees and k-d trees (Samet, 2006): These data structures subdivide the space into axis-aligned cells recursively in O(n log n) time, adapting to the input data. A tree is constructed until a predetermined depth is reached or until a predetermined number of features is classified in a cell. These cells are marked as full, empty or grey, and a point-in-polygon test is performed when the point is inside a grey cell, in O(log n) time.

Bsp-trees (Samet, 2006): These subdivide the space using arbitrarily oriented lines for each subdivision, in O(n²) time and O(n²) space, and expected O(n log n) time and O(n log n) space, respectively (a bsp-tree generalizes a k-d tree, which has axis-aligned orientations). These lines subdivide the space into two regions (inside and outside) and fit the edges of the polygon. When a point is filtered through the bsp-tree, an inclusion result is obtained directly in O(n) time, and expected O(log n) time.

Swath method (Žalik et al., 2003): The polygon is divided into swaths (trapezoids) in O(n log n) time and O(n²) space. A binary search is performed in O(log n) time to locate the swath in which the point lies; afterwards, the ray-crossing algorithm is applied to the polygon's edges located in the corresponding swath, in O(n) time.

Convex decompositions (Li et al., 2007): It is possible to decompose a polygon into a set of convex polygons and perform an inclusion query by finding the convex polygon in which the point is located and applying a point-in-convex-polygon test with it. The pre-processing complexity is O(n log n) with O(n) storage, and the query time is O(log n).

Bounding areas (Ericson, 2005): This technique is derived from bounding volumes in 3D space. It consists of generating a 2D bounding hierarchy based on axis-aligned bounding boxes (AABB), oriented bounding boxes (OBB), circles or other types of bounding areas. The inclusion test consists of traversing the hierarchy of bounding areas until reaching a leaf node and obtaining the inclusion with the edges classified in this bounding area. The construction time is O(n log n) with O(n log n) storage. The query time is O(log n).

Layers of edges (Martinez et al., 2006; Wang et al., 2005): The edges of the polygon are classified into layers in pre-processing according to the occlusion relations between the edges viewed orthogonally in a given direction. There is no occlusion among edges in the same layer. A simple inclusion test can be performed with the layers, searching the edge of a layer using a binary search. These algorithms pre-process the edges in O(n) to O(n²) time, depending on the polygon shape and test direction, and in O(n) space. The inclusion test has complexity between O(log n) and O(n), depending on the construction of the layers.
3. The new data structure

The previous strategies attempt to reduce the number of edges to examine in the point-in-polygon test: they trade additional storage cost and pre-processing time for a reduced query time complexity. We propose a new data structure specially suited to the inclusion test in GIS applications. In the new data structure we use a spatial subdivision similar to the wedge method (Preparata and Shamos, 1985), but applicable to non-convex polygons.

The proposed method decomposes the space into equal-sized regions with origin at the centroid of the polygon (Fig. 5). Each region is subdivided recursively into two sub-regions, forming a tree of regions. For each region the edges of the polygon are classified until an established criterion is satisfied. These regions
are defined in terms of triangles and are called tri-cones. The inclusion test locates the tri-cone in which the point lies and performs a point-in-polygon test with the edges classified in that tri-cone. Beforehand, a point-in-triangle test is performed with a pair of bounding triangles which delimit the geometry classified in the tri-cone. Several criteria can be used to decide whether to subdivide a tri-cone into two sub-tri-cones:
- Whether the number of edges classified in the tri-cone is higher than a certain threshold. The threshold could be expressed as a percentage of the number of edges of the polygon.
- Whether the level of the tri-cone in the corresponding tree is lower than a certain threshold. This threshold could be expressed as a function of the number of edges of the polygon.
In the following subsections we formally define the concepts associated with the new data structure.

3.1. Tri-tree definition

First, the tri-cone concept is introduced as the basic unit of the decomposition. We use several tri-cones for decomposing the space and forming a tree of tri-cones, so each tri-cone is recursively subdivided into new tri-cones.

Definition 1. Let T = V0V1V2 be a triangle with positive orientation. We define the tri-cone associated to the triangle T (Fig. 6), denoted ∠T, as the region delimited by the supporting lines of V0V1 and V0V2 such that T is inside this region. The triangle T is named the generating triangle of the tri-cone ∠T.

We use one generating triangle for each tri-cone. This triangle is given a fixed counter-clockwise (positive) orientation. The first vertex of the triangle definition is the origin of the tri-cone. The origin of the tri-cone together with each of the remaining vertices of the triangle defines two rays; the delimited region lies between the two rays (in a counter-clockwise sweep).
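With this definition, membership of a point in a tri-cone reduces to two orientation (signed-area) tests against the rays from the origin. A minimal sketch (our illustration, not the authors' code):

```python
def cross(o, a, b):
    """Twice the signed area of triangle (o, a, b); positive when CCW."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def in_tricone(p, v0, v1, v2):
    """p lies in the tri-cone of the CCW triangle v0 v1 v2: it must not be
    clockwise of ray v0->v1 nor counter-clockwise of ray v0->v2."""
    return cross(v0, v1, p) >= 0 and cross(v0, v2, p) <= 0
```

For the generating triangle (0,0), (1,0), (0,1) these two tests reduce to sign checks on the point's coordinates, which is the simplification mentioned below.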
Definition 2. A tree of tri-cones or tri-tree is a hierarchical data structure whose root node has four equal-sized tri-cones as children, all with a common origin and covering the whole space without overlapping, except at the borders. Each tri-cone is divided recursively into two equal-sized tri-cones (Fig. 6) until a predefined threshold is reached.

We use the centroid of the polygon as the common origin to form the tri-tree associated to a polygon. The centroid does not have to be inside the polygon. Initially we have chosen a tri-cone which represents a quadrant for each of the four first-level tri-cones (Fig. 5). Once the first-level tri-cones have been defined, the edges of the polygon are classified in them. If the previously defined criteria (thresholds) are met, the tri-cone is subdivided into two equal-sized sub-tri-cones and the edges of the parent tri-cone are classified in its children.

In order to construct the tri-tree, we can use generating triangles for the tri-cones that simplify the calculations of the inclusion test, for example using the sign of the barycentric coordinates. We can use the generating triangle (0,0), (1,0), (0,1) for the definition of a first-level tri-cone (Fig. 7), translate the polygon so that the centroid is placed at the point (0,0), and define symmetric generating triangles for the three remaining tri-cones, which simplifies the calculations.

3.2. Classification of edges in a tri-cone

In order to classify the edges of the polygon in each of the tri-cones we must consider several configurations between the edge and the tri-cone (Fig. 8). This allows us to classify the edges by means of inclusions of vertices in tri-cones instead of intersections between edges:

Property 1. Let F be a polygon with associated tri-tree TT_F, and Ei = PiPi+1 an edge of F. Let T = V0V1V2 be a triangle such that ∠T ∈ TT_F. If Pi or Pi+1 is inside the tri-cone ∠T, then the edge Ei is classified in the tri-cone ∠T (Fig. 8b and c).

Property 1 classifies edges using their endpoints; it covers edges with an endpoint inside the tri-cone.

Property 2. Let F be a polygon with associated tri-tree TT_F, and Ei = PiPi+1 an edge of F. Let T = V0V1V2 be a triangle such that ∠T ∈ TT_F. If V1 or V2 is inside the tri-cone ∠V0PiPi+1, then the edge Ei is classified in the tri-cone ∠T (Fig. 8d).
Fig. 5. Decomposition in equal sized regions. Left: first level decomposition. Right: decomposition at different levels.
The latter property applies Property 1 to a different tri-cone and edge: the new tri-cone is formed by the origin of the old tri-cone and the endpoints of the old edge (∠V0PiPi+1), and the new edge by the non-origin vertices of the old generating triangle (V1V2). This property covers the situation of an edge intersecting the tri-cone but without either endpoint inside it.
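A minimal sketch of the combined classification test, reusing in_tricone and cross from the sketch in Section 3.1 (the re-orientation of ∠V0PiPi+1 when the edge runs clockwise around V0 is our assumption; the paper does not spell it out):

```python
def edge_in_tricone(p_i, p_j, v0, v1, v2):
    # Property 1: an endpoint of the edge lies inside the tri-cone
    if in_tricone(p_i, v0, v1, v2) or in_tricone(p_j, v0, v1, v2):
        return True
    # Property 2: v1 or v2 lies inside the tri-cone formed by v0 and the
    # edge endpoints; order the endpoints so that this triangle is CCW
    a, b = (p_i, p_j) if cross(v0, p_i, p_j) >= 0 else (p_j, p_i)
    return in_tricone(v1, v0, a, b) or in_tricone(v2, v0, a, b)
```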
Fig. 6. Left: a tri-cone defined by a triangle. Right: subdivision into two sub-tri-cones.
Fig. 7. Generating triangle for improving calculations for construction and classification.
Fig. 8. Different situations for the classification of edge P1P2 in a tri-cone. (a) Edge not classified. (b) and (c) Edge classified: P1 or P2 is inside ∠V0V1V2. (d) Edge classified: V1 or V2 is inside ∠V0P1P2.
Fig. 9. Edges classified in a tri-cone.
Property 3. An edge of a polygon is classified in a given tri-cone if and only if it satisfies Property 1 or Property 2.

According to this set of properties, an edge is classified in a tri-cone if the full edge or any part of it lies inside the tri-cone. This definition of classification allows us to work only with the edges of the polygon which intersect the tri-cone (Fig. 9). These conditions are checked for each edge/tri-cone pair at each level of the tri-tree. An edge can be classified in more than one tri-cone. At a given level only the edges classified in the parent tri-cone are checked against the current tri-cones, until a threshold is reached (the number of edges classified is lower than a value, or a predetermined level of the tree is reached).

3.3. Bounding triangles

We also define a pair of bounding triangles associated to a tri-cone:

Definition 3. Let F be a polygon with associated tri-tree TT_F. Let T = V0V1V2 be a triangle such that ∠T ∈ TT_F. We define the bounding triangle of the geometry of F contained in ∠T as the triangle Tb = V0V′1V′2 where: (1) V′1 is on the ray from V0 to V1; (2) V′2 is on the ray from V0 to V2; (3) all vertices Pi of the edges classified in ∠T are on the same side as V0 with regard to the supporting line of V′1V′2.

Definition 4. Let F be a polygon with associated tri-tree TT_F. Let T = V0V1V2 be a triangle such that ∠T ∈ TT_F. We define the inner bounding triangle of the geometry of F contained in ∠T as the triangle Tib = V0V″1V″2 where: (1) V″1 is on the ray from V0 to V1; (2) V″2 is on the ray from V0 to V2; (3) all vertices Pi of the edges classified in ∠T are on the opposite side to V0 with regard to the supporting line of V″1V″2.

By using these bounding triangles a rejection test can be performed for an early exit when a point is outside the bounding triangle, or when the point is inside the inner bounding triangle of the tri-cone in which it has been classified (Fig. 10). For a given tri-cone there is more than one possible bounding or inner bounding triangle. We can use bounding triangles which fit as tightly as possible to the edges classified in the tri-cone, but computing them is expensive (Fig. 10a). A brute-force algorithm generates a set of candidate triangles from pairs of vertices of the edges classified in the tri-cone: each candidate is formed by the tri-cone limited by the line passing through a pair of vertices, and is tested for being a bounding triangle. The triangle finally selected could be the one with the smallest area (or the largest area for the inner bounding triangle). In order to speed up the construction of the tri-tree we can first test, as bounding triangle, the triangle generated by the two classified vertices most distant from the origin (or the two nearest vertices for the inner bounding triangle). If it is not a bounding triangle we can use an isosceles bounding triangle determined by the farthest (or nearest) vertex from the centroid among the vertices of the edges classified in the tri-cone (Fig. 10b). In most situations these bounding triangles give satisfactory results.

Fig. 10. Bounding triangles in a tri-cone. (a) Ideal but expensive bounding triangles. (b) Bounding triangles used.
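As a sketch of the fast heuristic of Fig. 10b (our reconstruction, not the authors' code): cut the tri-cone with a line perpendicular to its bisector, placed at the distance of the farthest classified vertex from the origin. Every classified vertex has a bisector coordinate no larger than its distance to the origin, so all lie on the origin's side of the cut, yielding an isosceles bounding triangle; the inner bounding triangle is analogous with the nearest vertex.

```python
import math

def isosceles_bounding_triangle(v0, v1, v2, points):
    """points: vertices of the edges classified in the tri-cone of v0 v1 v2.
    Assumes the cone opens less than 180 degrees (true for quadrants)."""
    def sub(a, b):
        return (a[0] - b[0], a[1] - b[1])
    def unit(a):
        l = math.hypot(a[0], a[1])
        return (a[0] / l, a[1] / l)
    u1, u2 = unit(sub(v1, v0)), unit(sub(v2, v0))
    bis = unit((u1[0] + u2[0], u1[1] + u2[1]))        # unit bisector of the cone
    r = max(math.hypot(*sub(p, v0)) for p in points)  # farthest classified vertex
    # intersect the cutting line {x : (x - v0) . bis = r} with both rays
    t1 = r / (u1[0] * bis[0] + u1[1] * bis[1])
    t2 = r / (u2[0] * bis[0] + u2[1] * bis[1])
    return (v0,
            (v0[0] + t1 * u1[0], v0[1] + t1 * u1[1]),
            (v0[0] + t2 * u2[0], v0[1] + t2 * u2[1]))
```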
4. Adaptation of the algorithms to the new structure

We have mentioned four different algorithms for the point-in-polygon test with non-convex polygons: the ray-crossings, the triangle-based, the winding-number and the pixel-based methods. The last is suitable for raster representations. The ray-crossings and winding-number concepts are very closely related from a mathematical perspective (Hormann and Agathos, 2001), the winding number being slower than the ray-crossings method. Therefore we study the adaptation of the ray-crossings and the triangle-based methods to this new data structure. For both methods we have to show that, using only the edges classified in a tri-cone, we can determine the inclusion status of a point in the polygon.

The main idea behind the use of this data structure for the point-in-polygon test is to descend the tree, classifying the test point in the tri-cones of each level, until we obtain the leaf tri-cone in which the point is included. Then, if the point is inside the bounding triangle but outside the inner bounding triangle of the tri-cone, a classic point-in-polygon test is performed only with the edges classified in this tri-cone. In the case of inclusion of the test point in the inner bounding triangle, the pre-calculated inclusion state of the origin of the tri-tree is returned.
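Put together, the query looks roughly like the following sketch (node fields and helpers such as in_triangle are hypothetical names; local_test stands for the adapted ray-crossings or triangle-based test of Sections 4.1 and 4.2). A point outside the bounding triangle of its leaf tri-cone lies beyond all classified edges and can be rejected immediately:

```python
def point_in_polygon_tritree(tree, p):
    # descend: pick the child tri-cone containing p at each level
    node = next(c for c in tree.root.children if in_tricone(p, *c.tricone))
    while node.children:
        a, b = node.children
        node = a if in_tricone(p, *a.tricone) else b
    if not in_triangle(p, *node.bounding):        # beyond all classified edges
        return False                              # early rejection
    if in_triangle(p, *node.inner_bounding):      # nearer than all classified edges
        return tree.origin_inside                 # pre-computed state of the origin
    return local_test(p, node.edges, tree)        # Sections 4.1 / 4.2
```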
When a polygon has a hole in its interior, we must introduce false edges in order to obtain a correct result (Fig. 11). These false edges affect neither the parity in the crossings-count algorithm nor the overall balance of positive and negative triangles in the triangle-based algorithm. A false edge must be traversed twice, once in each direction.

Fig. 11. Introduction of false edges for treating polygons with holes.
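For instance, a polygon with one hole can be flattened into a single edge list as in the following sketch (the bridge between outer[0] and hole[0] is an arbitrary choice of ours; the paper only requires that the false edge be traversed once in each direction):

```python
def edges_with_hole(outer, hole):
    """outer is the CCW boundary, hole its CW inner boundary (assumption).
    The doubled bridge edge leaves parity and signed-triangle sums intact."""
    edges = [(outer[i], outer[(i + 1) % len(outer)]) for i in range(len(outer))]
    edges.append((outer[0], hole[0]))    # false edge, outward direction
    edges += [(hole[i], hole[(i + 1) % len(hole)]) for i in range(len(hole))]
    edges.append((hole[0], outer[0]))    # false edge, back again
    return edges
```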
4.1. The ray-crossings method

This method (Haines, 1994) casts an infinite ray from the test point and counts the intersections with the edges of the polygon: if this number is odd the point is inside the polygon, and if even, outside. Let us now discuss this method within a tri-cone. In order to apply the ray-crossings method properly we need to choose an appropriate direction for the ray: the ray must cross the edges of the polygon in the same way as if all the edges of the polygon were considered. To ensure this, the ray must be fully included in the tri-cone.

A simple solution is to cast a ray from the test point towards infinity such that its supporting line passes through the origin of the tri-cone (Fig. 12, left). This ensures that the ray is inside the tri-cone and crosses the same edges of the original polygon while considering only the edges classified in the tri-cone. It can be shown that this test is equivalent to the point-in-polygon test with all the edges of the polygon: decompose the edges of the polygon into two sets, those which do not intersect the tri-cone and those which do. As the point is inside the tri-cone and the ray is fully inside it, there can only be intersections with edges classified in the tri-cone, and none with the edges not classified in it.

Another alternative is to apply the ray-crossings method with a ray directed towards a point inside the tri-cone whose inclusion state is known: we can cast the ray in the opposite direction, from the test point to the origin of the tri-cone (Fig. 12, right). Beforehand it is necessary to classify the origin of the tri-cone as being inside or outside the polygon; this calculation is common to all the tri-cones of the tri-tree.

Which of these two methods is better depends on the geometry of the polygon and on the position of the origin of the tri-tree, but essentially the performance is determined by the number of intersections finally calculated. For example, for a convex or star-shaped polygon, the second method obtains better times for points included in the polygon; if the test point has a higher probability of being inside this type of polygon, the second method is preferable.

Fig. 12. Methods for generating a ray for the ray-crossings algorithm.
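A sketch of the first variant: count crossings along the ray from the test point directed away from the origin (so its supporting line passes through the origin and the ray stays inside the tri-cone), using only the classified edges. The parametric intersection formulation is ours:

```python
def cross2(a, b):
    return a[0] * b[1] - a[1] * b[0]

def ray_crossings_tricone(p, v0, edges):
    """Parity of crossings along the ray from p directed away from v0
    (the tri-cone origin), over the classified edges only."""
    d = (p[0] - v0[0], p[1] - v0[1])          # ray direction, away from origin
    inside = False
    for a, b in edges:
        e = (b[0] - a[0], b[1] - a[1])
        w = (a[0] - p[0], a[1] - p[1])
        denom = cross2(d, e)
        if denom == 0:                        # edge parallel to the ray
            continue
        t = cross2(w, e) / denom              # position along the ray
        s = cross2(w, d) / denom              # position along the edge
        if t > 0 and 0 <= s < 1:              # half-open rule at shared vertices
            inside = not inside
    return inside
```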
4.2. The triangle-based method
The triangle-based algorithm (Feito et al., 1995) is based on computing the inclusion of the point P in the set of triangles formed by an arbitrary, common point O (the original vertex) and each of the edges of the polygon (Fig. 13a), and obtaining the number of triangles in which the point is included. The algorithm accumulates +1 multiplied by the sign of the triangle for each triangle containing the point, and considers some special cases, like inclusion on an edge shared by two triangles of the covering (Fig. 13b), accumulating +1/2 multiplied by the sign of each of these triangles. Finally the inclusion state is obtained from this sum: the point is inside the polygon when the sum is greater than zero. When the test point lies on an edge of the polygon (Fig. 13c), the algorithm reports the point as included in the polygon.
Fig. 13. (a) Inclusion in some triangles of a covering. (b) Inclusion on an edge shared by two triangles of a covering. (c) Inclusion on a polygon's edge.
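A minimal sketch of the method for a point in general position (the ±1/2 boundary cases of Fig. 13b and c are omitted for brevity; cross is the signed-area helper from Section 3.1):

```python
def triangle_based_inclusion(polygon, o, p):
    """Sum the signs of the covering triangles (o, a, b) that contain p;
    p is inside the polygon when the sum is positive."""
    total = 0
    n = len(polygon)
    for i in range(n):
        a, b = polygon[i], polygon[(i + 1) % n]
        sign = 1 if cross(o, a, b) > 0 else -1
        # p lies in triangle (o, a, b) when it is on the triangle's side
        # of all three supporting lines (orientation given by sign)
        if (sign * cross(o, a, p) >= 0 and
                sign * cross(a, b, p) >= 0 and
                sign * cross(b, o, p) >= 0):
            total += sign
    return total > 0
```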
As in the case of the ray-crossings method, the triangle-based method can be applied within a tri-cone: we must count the triangles of the covering in which the point is included. We can again decompose the triangles of the covering of the polygon into two sets: those whose polygon edge intersects the tri-cone (equivalently, the triangles of the covering which intersect the tri-cone), and those whose polygon edge does not intersect the tri-cone. If the point is inside the tri-cone, the point is not included in any triangle of the covering whose edge has not been classified in the tri-cone, and we can discard that set of triangles for the inclusion test. The point can only be inside the triangles whose edges have been classified in the tri-cone.
5. Experimental results and discussion

Let us now analyse the storage, construction and inclusion times for the new data structure. For each tri-cone we must store information about the bounding triangle and the inner bounding triangle, and a list of the edges classified in the tri-cone (pointers to the vertices). All bounding triangles have a common origin, which is why only two vertices need to be stored per triangle. The number of edges classified in a tri-cone depends on the geometry of the polygon.

The construction time depends on the number of levels and the number of edges classified per node at each level of the tree. We can distinguish between two types of polygons: first, those with a uniform distribution of edges over the nodes of each level of the tree (e.g. convex polygons); second, those with almost all edges classified in only a few nodes per level (e.g. saw-shaped polygons, Fig. 14a).

For the first case, let n be the number of edges of the polygon. For each node of the first level approximately n/4 edges are classified; for each node of the second level n/8; for each node of the third level n/16; and so on. To reach 1 = n/n edges classified per node we need a tree depth of 2 log n. At each level of the tree ki·n edges are consulted for classification, where ki is a constant (at level i there are 2^i nodes, each with approximately n/2^i edges classified). The overall number of edges visited is the number of edges consulted per level multiplied by the number of levels, 2 log n · k·n with k = Σ ki, giving a construction time of O(n log n).

For the second case, the number of edges classified in some nodes of each level (normally one or two nodes) is almost n, and for the remaining nodes of the same level it is a constant represented
by ki, giving O(n + ki) time for the level. Therefore, for a tree depth of 2 log n the construction time is O(n log n).

The storage of this data structure is measured by the number of edges stored in the leaves of the tree. For the first type of polygon, there are 2^i nodes per level, each with approximately n/2^i edges classified, and therefore O(n) edges classified in the leaves of the tree. For the second case, imagine a polygon which zigzags along the y-axis. In this situation there are some branches whose leaves hold almost n edges, while the other leaves hold a constant number of edges. The number of edges classified is k·n, with k a constant, and therefore the storage is O(n).

Finally, the inclusion time is the sum of the time needed to find the deepest node of the tree in which the point is included, that is 2 log n, plus the time needed to visit the edges classified in that node. For the first type of polygon, the number of edges classified is n/(2 · 2^log n), giving a query time of O(log n). For the second type of polygon, the number of edges classified in the node is O(n) in the worst case, giving a query time of O(n + log n), that is, O(n). Normally, due to the shape of the polygons in GIS applications, the expected query time is O(log n).
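Restating the counts above as formulas (our compact summary of the uniform case, with k aggregating the per-level constants ki):

$$T_{\mathrm{build}}(n) \;\approx\; \sum_{i=1}^{2\log n} k_i\, n \;=\; O(n \log n), \qquad T_{\mathrm{query}}(n) \;\approx\; 2\log n + \frac{n}{2\cdot 2^{\log n}} \;=\; O(\log n).$$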
Although these complexities are similar to those of other data structures, there are some reasons for using tri-trees instead of, for example, quadtrees. We emphasize the following:

- The construction of a tri-tree for a closed polygon, when the centroid is inside the polygon, implies that there are no empty nodes as in the case of quadtrees, which is why the resulting tree is usually more balanced. When the centroid is not inside the polygon we can use another point inside the polygon as the origin of the tri-tree.
- The classification of edges in tri-cones (or bounding triangles) is performed by means of a point-in-triangle test, with less cost than the edge-rectangle intersection test needed to classify the edges in cells in the case of quadtrees (Ericson, 2005).
- The cost of classifying a point in a tri-cone comes from determining the side on which the point lies with regard to the rays which form its border. Since a parent and a child tri-cone share one side of the border (a ray), the position of the point with regard to this ray can be reused from parent to child. The classification time of a point in a tri-tree is therefore similar to that obtained for a quadtree.
- Although there exist some cases in which the tri-tree presents linear query time (Fig. 14a), when using a quadtree there exist other polygons with similar query times (Fig. 14b).
Fig. 14. (a) Saw-shaped polygon in a tri-tree. (b) Polygon with edges condensed in a cell in a quadtree.
Fig. 16. Shapes of figures used in tests. (a) Polygon similar to GIS areas. (b) Polygon with zigzag edges. (c) Star-shaped polygon.
Fig. 15. Better adjustment of bounding triangles in a tri-cone than boxes in a quadtree.
- The additional cost of determining whether a point is inside a bounding triangle reduces to calculating the side on which the point lies with regard to the single new edge which the triangle adds to the tri-cone, because the bounding triangle shares two edges with the frontier of the tri-cone. This is more expensive than a point-in-box test, but on average the time is reduced thanks to the early rejection provided by the adjustment of the bounding triangles to the contour of the polygon (Fig. 15).

The reduced construction time, with a similar storage cost per node and a lower point-in-cell inclusion time, leads us to use this data structure instead of others in situations in which the decomposition must be constructed on the fly (Bertolotto and Egenhofer, 2001), while remaining suitable for static data as well. As the bounding triangles of this data structure fit better to the geometry of the polygons normally used in GIS applications, the inclusion of points is rejected or accepted quickly, which improves the performance of such applications.

5.1. Times study

We have studied the construction time of a quadtree, a grid and a tri-tree for non-convex polygons with different numbers of vertices and different shapes (Fig. 16), and then the query time for these data structures. The results can be seen in Tables 1–3. We can conclude that the construction time is best for a tri-tree, followed by a quadtree and a grid. For complex shapes, a grid has a better construction time than a quadtree. In query time we observe the best times for a tri-tree, followed by a grid and a quadtree. Only for star-shaped figures do we obtain a better query time for a quadtree than for a grid.

We have used a combination of thresholds in order to decide whether or not to subdivide a tri-cone. The first threshold is the
maximum depth of the tree, which is a function of the number of edges of the polygon. We look for a subdivision in which, on average, the number of edges classified per tri-cone is approximately equal to 16. The number of subdivisions used (and consequently the level) is less than or equal to the number of edges divided by the average number of edges classified per tri-cone. The second threshold prevents descending in the tree when the number of edges classified is lower than a predefined value; here we again use the average number of edges classified per tri-cone, which for the tests is 16. Empirically we have tried different levels and minimum numbers of edges classified per tri-cone, and the best results are obtained with the combination of thresholds described.

In Table 4 we show storage statistics for polygons with the shapes of Fig. 16, formed by 512 vertices, with a maximum number of subdivisions of 64 (a quadtree of level 3, a grid of 8 × 8 and a tri-tree of level 5). The minimum number of edges classified per node is 16. We can conclude that the tri-tree classifies a greater number of edges than the other data structures, due to its greater number of levels, and consequently has a greater storage cost, but it obtains better construction and query times.

We have also implemented and tested the new data structure against a quadtree and a grid applied to determining the region of a map in which a point is included. We used the "Andalucía" map (Spain) with 778 regions, formed by 87 922 vertices (Fig. 17). The regions are non-convex (some with holes) and range from 20 to 472 edges, with five regions (on the border of Andalucía) having more than 1000 edges. Table 5 shows the global construction time of the quadtrees, grids and tri-trees for each region of the map, and the time for determining the regions in which one million random points, generated inside the Andalucía bounding box, are included, using these data structures. Each region is surrounded by a bounding box, and a hierarchy of bounding boxes (an R-tree) is used for better performance. The time for accessing this hierarchy is not included in the measurements because it is the same in all cases; only the inclusion time in one or more regions (depending on the situation) is considered. The algorithms have been implemented in gcc on a Linux system using an Intel Pentium IV with a 1.6 GHz processor and 1 GB RAM. We can see that the new data structure has a lower construction time than the quadtree and the grid, and the time for the inclusion test is much improved with the new data structure.
6. Conclusions

We have obtained a new data structure specially suited to the point-in-polygon problem using the crossings-count and the triangle-based methods.
Table 1. Times obtained for polygons similar to typical GIS areas (Fig. 16a) with different numbers of vertices.

                                      16 vert.  32 vert.  64 vert.  128 vert.  256 vert.  512 vert.  1024 vert.
Pre-processing time
  Quadtree (s)                         0.0010    0.0021    0.0039    0.0077     0.0173     0.0378     0.0853
  Grid (s)                             0.0010    0.0022    0.0043    0.0087     0.0192     0.0415     0.0925
  Tri-tree (s)                         0.0008    0.0017    0.0028    0.0058     0.0097     0.0189     0.0397
Query time for 1 million test points
  Ray-crossing (s)                    22.4102   26.8838   33.4968   57.4145    86.4693   148.2816   199.7390
  Ray-crossing with quadtree (s)       3.5078    4.2861    5.5932    8.2251    10.9053    16.4911    19.9607
  Ray-crossing with grid (s)           3.0531    3.4754    5.0420    7.2632    10.1831    15.1338    17.2717
  Ray-crossing with tri-tree (s)       0.4509    0.4923    0.5360    0.7140     0.9511     1.1559     1.3383
  Triangle-based (s)                  21.0030   25.2431   31.3347   53.7086    81.3446   138.5809   186.4454
  Triangle-based with quadtree (s)     3.2783    4.0132    5.2225    7.6727    10.1681    15.3979    18.6445
  Triangle-based with grid (s)         2.8480    3.2450    4.7170    6.7659     9.4877    14.0845    16.0772
  Triangle-based with tri-tree (s)     0.4210    0.4605    0.5000    0.6667     0.8889     1.0787     1.2484
Table 2. Times obtained for polygons with zigzag edges (Fig. 16b) with different numbers of vertices.

                                      16 vert.  32 vert.  64 vert.  128 vert.  256 vert.  512 vert.  1024 vert.
Pre-processing time
  Quadtree (s)                         0.0010    0.0020    0.0041    0.0104     0.0215     0.0476     0.1029
  Grid (s)                             0.0010    0.0020    0.0040    0.0097     0.0198     0.0432     0.0950
  Tri-tree (s)                         0.0013    0.0021    0.0042    0.0087     0.0111     0.0203     0.0421
Query time for 1 million test points
  Ray-crossing (s)                    23.3309   27.9625   34.8724   61.7457    93.6541   160.2627   213.3177
  Ray-crossing with quadtree (s)       3.5122    4.0820    5.8801   11.1092    13.5528    20.7666    24.0792
  Ray-crossing with grid (s)           3.0500    3.1595    4.6902    8.0980    10.5013    15.7537    17.7385
  Ray-crossing with tri-tree (s)       0.7327    0.6081    0.8040    1.0710     1.0884     1.2415     1.4192
  Triangle-based (s)                  21.7032   26.0844   32.3792   57.5449    87.4455   149.3594   198.5644
  Triangle-based with quadtree (s)     3.2715    3.8221    5.4903   10.3631    12.6366    19.3899    22.4915
  Triangle-based with grid (s)         2.8322    2.9500    4.3879    7.5436     9.7842    14.6615    16.5117
  Triangle-based with tri-tree (s)     0.6841    0.5689    0.7500    1.0000     1.0172     1.1587     1.3239
Table 3. Times obtained for star-shaped polygons (Fig. 16c) with different numbers of vertices.

                                      16 vert.  32 vert.  64 vert.  128 vert.  256 vert.  512 vert.  1024 vert.
Pre-processing time
  Quadtree (s)                         0.0010    0.0022    0.0036    0.0072     0.0166     0.0369     0.0834
  Grid (s)                             0.0010    0.0022    0.0040    0.0083     0.0187     0.0405     0.0908
  Tri-tree (s)                         0.0008    0.0017    0.0025    0.0052     0.0089     0.0180     0.0380
Query time for 1 million test points
  Ray-crossing (s)                    12.8820   15.9607   28.4723   49.2124    75.6607   131.5595   176.5710
  Ray-crossing with quadtree (s)       2.8413    3.6371    4.1820    6.2297     8.4758    13.0398    15.8080
  Ray-crossing with grid (s)           3.0412    3.4001    4.6902    6.9293     9.9179    14.7691    16.9543
  Ray-crossing with tri-tree (s)       0.3933    0.4382    0.4786    0.6401     0.8727     1.1008     1.2810
  Triangle-based (s)                  12.0731   14.9866   26.6345   46.0359    71.1765   123.1831   165.0042
  Triangle-based with quadtree (s)     2.6554    3.4055    3.9048    5.8113     7.9029    12.1753    14.7657
  Triangle-based with grid (s)         2.8201    3.2229    4.3879    6.4548     9.2406    13.7451    15.7817
  Triangle-based with tri-tree (s)     0.3987    0.4202    0.4464    0.5977     0.8156     1.0274     1.1950
Table 4. Storage statistics for polygons with different shapes (Fig. 16).

                        Max. nodes   Nodes   Edges tested   Edges classif.   Edges tested/node   Edges classif./node
GIS area polygon
  Quadtree                   84        56        5248            1412              93.71               25.21
  Grid                       64        64        8192             870             128.00               13.59
  Tri-tree                  124        86        5280            2336              61.39               27.16
Zigzag polygon
  Quadtree                   84        84        6144            1864              73.14               22.19
  Grid                       64        64        8192            1126             128.00               17.59
  Tri-tree                  124       106        5792            2722              54.64               25.67
Star-shaped polygon
  Quadtree                   84        80        6016            1756              75.20               21.95
  Grid                       64        64        8192             640             128.00               10.03
  Tri-tree                  124        82        5248            2224              64.00               27.12

All polygons have 512 vertices and a maximum space decomposition of 64 cells; the minimum number of edges classified per cell is 16. Max. nodes: total maximum number of nodes or cells in the decomposition. Nodes: total effective number of nodes in the decomposition. Edges tested: total number of edges tested against the nodes or cells of the decomposition. Edges classif.: total number of edges classified in the nodes or cells of the decomposition.
Fig. 17. Map of Andalucía and a detail of an expanded complex region.
Table 5. Construction time of the data structures for the regions shown on the Andalucía map, and query time for one million random test points.

                                        Andalucía region
Pre-processing time
  Quadtrees (s)                               7.22
  Grids (s)                                   9.57
  Tri-trees (s)                               4.48
Query time for 1 million test points
  Ray-crossing (s)                          107.20
  Ray-crossing with quadtrees (s)            15.19
  Ray-crossing with grids (s)                13.62
  Ray-crossing with tri-trees (s)             1.32
  Triangle-based (s)                        101.30
  Triangle-based with quadtrees (s)          14.14
  Triangle-based with grids (s)              12.68
  Triangle-based with tri-trees (s)           1.23
This data structure fits better to the geometry of a polygon, obtaining better construction and query times than other data structures such as grids and quadtrees. The proposed method is suitable for complex polygons with a large number of edges, with or without holes, and is invariant under translation, rotation and scaling. The method can be combined with other strategies and data structures in order to obtain better performance. We believe this data structure can also be used for other applications, such as obtaining the edges in the proximity of a given point, or visibility problems; in these cases we can choose a given level of the tree to vary the extent or the number of related edges.
Acknowledgements

This work has been partially supported by the Spanish Ministry of Education and Science and the European Union (via ERDF
funds) through the research project TIN2007-67474-C03-03, by the Consejería de Innovación, Ciencia y Empresa of the Junta de Andalucía through the research projects P06-TIC-01403 and P07-TIC-02773, and by the University of Jaén through the research project UJA-08-16-02.

References

Bertolotto, M., Egenhofer, M.J., 2001. Progressive transmission of vector map data over the world wide web. Geoinformatica 5 (4), 345–373.
Chen, M., Townsend, P., 1987. Efficient and consistent algorithms for determining the containment of points in polygons and polyhedra. In: Proceedings of Eurographics '87, Amsterdam, Netherlands, pp. 423–437.
Ericson, C., 2005. Real-Time Collision Detection, first ed. Morgan Kaufmann Publishers, San Francisco, 594 pp.
Feito, F.R., Torres, J.C., Ureña, A., 1995. Orientation, simplicity and inclusion test for planar polygons. Computers & Graphics 19 (4), 595–600.
Haines, E., 1994. Point in polygon strategies. In: Heckbert, P. (Ed.), Graphics Gems IV. Academic Press, San Diego, CA, pp. 24–46.
Hormann, K., Agathos, A., 2001. The point in polygon problem for arbitrary polygons. Computational Geometry: Theory and Applications 20 (3), 131–144.
Huang, C., Shih, T., 1997. On the complexity of point-in-polygon algorithms. Computers & Geosciences 23 (1), 109–118.
Li, J., Wang, W., Wu, E., 2007. Point-in-polygon tests by convex decomposition. Computers & Graphics 31 (4), 636–648.
Longley, P.A., Goodchild, M.F., Maguire, D.J., Rhind, D.W., 2001. Geographic Information Systems and Science, first ed. Wiley, Chichester, West Sussex, 454 pp.
Martinez, F., Rueda, A.J., Feito, F.R., 2006. The multi-L-REP decomposition and its applications to a point-in-polygon inclusion test. Computers & Graphics 30 (6), 947–958.
Preparata, F.P., Shamos, M.I., 1985. Computational Geometry: An Introduction. Springer, Berlin, 390 pp.
Samet, H., 2006. Foundations of Multidimensional and Metric Data Structures. Morgan Kaufmann, San Francisco, 1024 pp.
Taylor, G., 1994. Point in polygon test. Survey Review 32, 479–484.
Wang, W., Li, J., Wu, E., 2005. 2D point-in-polygon test by classifying edges into layers. Computers & Graphics 29 (3), 427–439.
Žalik, B., Jezernik, A., Rizman Žalik, K., 2003. Polygon trapezoidation by sets of open trapezoids. Computers & Graphics 27 (5), 791–800.
Žalik, B., Kolingerova, I., 1997. A cell-based point-in-polygon algorithm suitable for large sets of points. Computers & Geosciences 23 (1), 109–118.