DATAK-01559; No of Pages 25 Data & Knowledge Engineering xxx (2016) xxx–xxx
Contents lists available at ScienceDirect
Data & Knowledge Engineering
Nearest neighbor query processing using the network voronoi diagram
journal homepage: www.elsevier.com/locate/datak
Mei-Tzu Wang ⁎
Department of Information Management, Chinese Culture University, 55, Hwa-Kang Road, Yang-Ming-Shan, Taipei, Taiwan, ROC
Abstract
The purpose of this paper is twofold: to develop sound and complete rules for constructing the network voronoi diagram (NVD) on a road network, and to provide the algorithms and data structures needed to implement that construction. To compute the NVD, attention is focused on how to distinguish divisible paths, i.e., those whose midpoints are exactly their border points, from others. Research shows that the dominance relation introduced in this paper plays an important role in making that distinction. To generate and prune candidate paths concurrently, a border-point binary tree is introduced. The pre-computed NVD is organized as linked lists and is available for access by an NVD list-based query search method (NVDL), which can compute NN in a single step. Experiments show that the NVDL method reduces execution time by 28% for sparse data distribution on a real road network compared to the existing INE method. The NVDL's query time remains nearly constant regardless of how data points are distributed on the road network or where the query point is positioned. In addition, this approach prevents the NVDL from experiencing the slow convergence condition that often occurs when using the incremental approach. © 2016 Published by Elsevier B.V.
Article history: Received 25 April 2014 Received in revised form 21 February 2016 Accepted 23 February 2016 Available online xxxx
Keywords: Moving object Nearest neighbor Network voronoi diagram Query processing Spatial databases
1. Introduction
As a type of spatial query, the nearest neighbor approach has been widely studied in the literature and applied in various applications [9,22,24,25,29]. Spatial object modeling that supports an increasing number of applications has been extended from merely describing the object's static feature (i.e., point, line, and region) with spatial operations to incorporating a discrete time dimension called the spatiotemporal data model [13,14,37]. In recent decades, the time dimension incorporated into the model has become a continuous type with the development of mobile computing that includes GPS. The data model then is called a moving object, which can support modern applications. This model can be used in applications related to satellites, passengers, and public security with the following post queries: “Which satellite will approach this spacecraft's route within the next minute?”; “Which taxi is closest to a passenger who requests service?”; “Did the trucks with dangerous goods approach a high-risk facility?” [14]. A moving query point's position can be estimated within a small deviation if a motion vector is stored in a moving object's database that can be updated in an event-driven approach [38]. There are two spaces for spatial queries, namely, unconstrained space and constrained space. Each has its own applications. In recent decades, the restricted space (e.g., road network), which is used in the second and third examples set forth above, has received more attention [6,8,18,20,28,35,39]. The major difference between the two types of spaces is that the distance used by the restricted space is measured by network distance, which is more costly than the Euclidean distance used by unrestricted space. This paper proposes a new method for evaluating a NN (nearest neighbor) query on restricted space where the query point is modeled as a moving object equipped with GPS service and the data points are static objects. 
This method will be particularly adequate for applications that demand a real-time response or that have sparse data points. A potential application is emergency services, e.g., a seriously hurt pedestrian who must quickly find the nearest hospital, or people who must quickly find the nearest exit because the large building they are in is on fire.
Why do we want another method for evaluating NN when the literature already describes many? There are two general approaches to finding NN: an online incremental approach and a pre-computed approach using indexes. The incremental approach is efficient in the average case. However, it has a problem when the query point is far from all data points: finding NN takes a long time because the incremental approach expands the search area gradually from the query point, visiting at each step the vertex that is most likely to lead to NN; we call this “the slow convergence condition.” Clearly, this case is impermissible in real-time applications. The second approach uses spatial indices to filter out numerous data points and so reduce the exact computation of NN in the refinement step. That approach must prepare a spatial index, and the filtering rate, which affects performance, is not always satisfactory. Furthermore, it repeats the computation for all candidates, although different types of distance are used during different steps. Clearly, this approach cannot guarantee a real-time response either.
One naïve idea for NN search in road networks is to pre-compute the network distances for all pairs of points, with one point representing a possible position of the query point and the other representing the location of a data point. This pre-computing method is straightforward but infeasible because the possible positions of a query point correspond to uncountably many points of the road network. The problem can be solved by computing over larger units than single points, such as edges, once it is realized that all possible positions with the same NN are clustered. Such a set of positions forms a continuous part of the road network, such as an edge, an edge fragment, or successive edges or edge fragments. As an example, the road network in Fig. 1 is partitioned into several blocks, each of which has the same NN, as shown in Fig. 2. When the query point is positioned at the point indicated in Fig. 1, the NN d6 is found by looking up the 12th list in Fig. 2, which indicates that all points in edge (i, j) ranging from 12.5 to 16 relative to the initial vertex i have the NN d6.
Our attention is now focused on the pre-computing method without indexing because that method is most likely to provide a uniform real-time response. The network voronoi diagram (NVD)-based method is our choice, which means we need to construct a network voronoi diagram for a road network that contains the set of data points in which we are interested. To the author's knowledge, there is no existing algorithm to construct the NVD. If we can do this work, then not only can we evaluate NN in real time in a single step without the disadvantages of the two NN approaches set forth above, but those who intend to find k-NN based on the NVD, such as VN3, can also use the constructed NVD. The contributions of this paper are summarized as follows.
1. Propose a set of sound and complete rules for constructing the NVD.
2. Provide data structures and algorithms to implement the NVD.
3. Organize the resulting NVD in linked lists to be accessed by the NN query evaluation method.
4. Conduct experiments on a real road network to show that the NVD can be completely constructed by the proposed rules.
5. Conduct empirical experiments on a real network to show that NVDL has better performance than the incremental approach in situations involving sparse data distribution.
The rest of the paper is organized as follows. Section 2 reviews previous works, and Section 3 introduces basic terminology. In Section 4, some theorems are developed to distinguish divisible paths from others; thus, the proposed rules are developed. In Section 5, data structures, including adjacency lists, BPB trees, and NVD lists, are introduced. Section 6 proposes algorithms to generate the NVD, and Section 7 conducts experiments on a real network to show that the NVD can be completely constructed; in addition, the NVDL's performance is evaluated. Section 8 provides our conclusion.
⁎ Tel.: +886 2 28610511 35922. E-mail address: [email protected].
http://dx.doi.org/10.1016/j.datak.2016.02.003 0169-023X/© 2016 Published by Elsevier B.V.
Please cite this article as: M.-T. Wang, Nearest neighbor query processing using the network voronoi diagram, Data Knowl. Eng. (2016), http://dx.doi.org/10.1016/j.datak.2016.02.003
2. Related works
The incremental approach seems to be very efficient for finding the nearest neighbor on the road network because it always intends to visit the vertex that is most likely closer to NN among all of the vertices under consideration. The crucial factor in increasing efficiency is the strategy adopted to determine the next one to visit. However, not all strategies are always adequate in all conditions.
Fig. 1. Road network example.
Fig. 2. The nearest neighbor of each edge fragment in the road network example.
Papadias et al. [32] propose an incremental algorithm for road networks called the incremental Euclidean restriction approach (IER). It begins by computing the Euclidean nearest neighbor Ne of query point q. Next, it calculates the network distance between Ne and q, which is denoted by nBound. Among the data points that have not yet been visited, the data point de with the minimum Euclidean distance to q, denoted by min_ed, is selected and checked to determine whether min_ed is greater than nBound. If the answer is ‘yes,’ then data point Ne is claimed as q's nearest neighbor on the road network and the algorithm terminates. Otherwise, de's network distance to q, denoted by ndisx, is calculated and compared with nBound. If ndisx is less than nBound, then ndisx replaces the nBound value and, accordingly, de replaces Ne. The process continues until the nearest neighbor is found. This approach is based on the so-called Euclidean lower-bound property, i.e., the network distance between two points cannot be smaller than their Euclidean distance. All remaining data points with Euclidean distances to q > min_ed will therefore have network distances greater than nBound. Accordingly, they cannot be the nearest neighbors of q.
Another incremental algorithm proposed by Papadias et al. [32] is the incremental network expansion approach (INE), which has been found to be better than IER in terms of average performance; IER's performance declines as a result of false hits when the ordering of data points by Euclidean distance differs significantly from their ordering by network distance. INE does not have this disadvantage. As an incremental approach, INE expands the search area by incrementally including more adjacent road segments until NN is found.
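As a rough illustration of the two procedures described above, the following Python sketch runs both on a toy network. It is not the implementation from [32]: it assumes that the query point and all data points sit on vertices, that vertex coordinates are available for the Euclidean step, and that edge lengths respect the Euclidean lower-bound property. The names `ier_nn`, `ine_nn`, `adj`, `coords` and the example network are hypothetical.

```python
import heapq
import math

def network_distances(adj, src):
    """Dijkstra: network distance from src to every reachable vertex."""
    dist = {src: 0.0}
    heap = [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale queue entry
        for v, w in adj[u]:
            if d + w < dist.get(v, math.inf):
                dist[v] = d + w
                heapq.heappush(heap, (d + w, v))
    return dist

def ier_nn(adj, coords, data, q):
    """IER-style search: scan data points in Euclidean order, stop at nBound."""
    # Simplification (assumption): all network distances are computed up front.
    net = network_distances(adj, q)
    by_euclid = sorted(data, key=lambda d: math.dist(coords[q], coords[d]))
    best = by_euclid[0]                 # Euclidean nearest neighbor Ne
    n_bound = net[best]                 # its network distance, nBound
    for d in by_euclid[1:]:
        min_ed = math.dist(coords[q], coords[d])
        if min_ed > n_bound:            # lower bound: no later point can win
            break
        if net[d] < n_bound:            # ndisx < nBound, so de replaces Ne
            best, n_bound = d, net[d]
    return best, n_bound

def ine_nn(adj, data, q):
    """INE-style search: expand outward from q until a data point is settled."""
    dist = {q: 0.0}
    heap = [(0.0, q)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue
        if u in data:
            return u, d
        for v, w in adj[u]:
            if d + w < dist.get(v, math.inf):
                dist[v] = d + w
                heapq.heappush(heap, (d + w, v))
    return None, math.inf

# Hypothetical toy network: d1 is close to q in Euclidean terms but far by road.
coords = {"q": (0, 0), "x": (0, 2), "y": (1, 2), "d1": (1, 0), "d2": (0, 3)}
edges = [("q", "x", 2.0), ("x", "y", 1.0), ("y", "d1", 2.0), ("x", "d2", 1.0)]
adj = {v: [] for v in coords}
for u, v, w in edges:
    adj[u].append((v, w))
    adj[v].append((u, w))
data = {"d1", "d2"}

print(ier_nn(adj, coords, data, "q"))
print(ine_nn(adj, data, "q"))
```

On this toy network d1 is the Euclidean nearest neighbor of q, yet the detour through x and y makes d2 the network nearest neighbor. IER therefore has to examine a false hit before settling on d2, whereas the expansion reaches d2 directly.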
However, INE cannot always have good performance: when all of the data points are far from the query point, it takes a long time to approach the nearest neighbor. We call this problem “the slow convergence condition.”
The island approach proposed by Huang et al. [19] aims to improve the performance of k-NN queries. Each data point di has a network island, and the distance from every vertex within the island to the center, di, is pre-computed. The method begins with the edge on which query point q is positioned. The edge's two adjacent vertices and the data points (if any) on it, together with their distances to the query point, are inserted into priority queues Qv and Qd, respectively. Next, the vertex closest to q is removed from queue Qv, and its adjacent vertices are checked. If an adjacent vertex lies within a data point's island, then the vertex and the data point, along with their distances to q, are inserted into priority queues Qv and Qd, respectively. The process continues until there is no vertex in priority queue Qv whose distance to q plus the minimum radius is less than the k-th element of Qd. It seems that the island can help alleviate the so-called slow convergence condition: the larger the island is, the fewer steps are needed to approach NN. However, a larger island implies possibly more vertices in the island, and all of their distances to the central data point must be pre-computed. The approach enables us to vary the island's radius, which creates another problem: How do we assign an appropriate radius to an island so that overall efficiency can be achieved, particularly when the cardinality of the dataset is large? First, the distribution of data points on the road network might have to be investigated.
Jensen et al. [21] propose a client–server system to answer the k-NN query, assuming that the query point and data points are moving objects.
The client begins by issuing a request for nearest neighbor candidates from the server; the returned candidate set is very small and relevant. Although the small dataset can reduce the client's computation time, the set must be maintained and the client must decide when to refresh it from the server to respond to the moving objects. The distance between each two nodes is pre-computed, and the length of the shortest path between two data points can be determined from four path values. This approach abandons Dijkstra's algorithm for the entire dataset. Instead, it chooses to locally compute a small relevant set. Thus, the computation time is reduced. However, this approach requires maintenance time and might suffer from transmission delay between the client and server, particularly when the communication network is congested. It also creates a problem regarding whether a large candidate set is sufficient for the client.
The performance of spatial query processing can be improved by using an index. The famous spatial index R-tree, introduced by Guttman [15], is an extension of the B-tree and contains both internal nodes and leaf nodes. Each node typically corresponds to a disk page and has a number of entries ranging from a specified minimum value to a maximum value. Each internal node's entry consists of an n-dimensional minimum bounding rectangle (MBR) and a pointer to a child node. Each leaf node's entry contains the spatial object's MBR and a pointer to the spatial object's data record in the spatial database. There is a containment relation between an entry's MBR and its child node's MBRs. Two MBRs can overlap. R-tree is widely used by various queries, e.g., queries by linear constraints (QBLC) [12], nearest neighbor (NN), k-nearest neighbor [41,16], and skyline [27], which was introduced by [5]. There are several variants of R-tree, including R*-tree [1]. Fixed network R-Tree (FNR-Tree) is used by
range query [10]; TB-tree and 3D-R-tree are used by NN query [11]; graph strip tree (GStree) is used by querying the moving object's trajectory [30]; and network-based R-tree (NBR-tree) is used by path query [22]. The index-based NN (k-NN) query search approach must pre-compute an index structure and generally involves two steps. In the filter step, an index is used to find candidates based on filter distance. Next, the maximum, nBound, of these candidates' exact object distances is obtained. In the second step, a range query is performed on the index to obtain all candidates whose filter distance is less than or equal to nBound. After that, these candidates are ordered by the exact object distance in descending order, and the top k candidates are retrieved as the result. This approach's validity relies upon the so-called lower-bounding property: the filter distance between each two objects (measured by the index) is always lower than or equal to their exact object distance. Seidl et al. [34] propose an index-based k-NN search method to perform a similarity search on data points that are high dimensional. This method divides the typical two steps into multiple steps and seeks to reduce candidates by embedding the incremental concept. The candidates are iteratively generated, with the maximum filter distance gradually adjusted to be smaller than the previous one. As an index-based approach, this method relies on the lower-bounding property, which is similar to the Euclidean lower-bound property assumed by IER [32]. Therefore, many false hits may occur when using this method. For moving query point's NN query, the time parameterized index, TPR-tree, is employed by Benetis et al. [2]. This index is essentially an R*-tree with indexed points and bounding rectangle augmented by the velocity vector, and several results are presented. The NVD is defined as a collection of voronoi polygons called NVPs, each of which has the same closest generator [23,31]. 
The VN3 proposed by Kolahdouzan et al. [23] uses NVD to calculate k-NN where the distances between adjacent border points (i.e., the points located between two adjacent voronoi polygons) are pre-computed, and within each voronoi polygon, the distances between all edges and their generator are also pre-computed to find k-NN. In VN3, it is assumed that NVD has been constructed and NVPs have been stored in a table. The NVPs are considered as regular polygons, and each query point's NVP can be found by using the R-tree index. The main weakness of this method is that it uses NVD but does not describe how to obtain NVD, that is, no algorithm and data structure has been presented to show how to construct the so-called regular polygons along with border points, particularly when the network is large. There are two types of uncertainty related to a moving object's position: location uncertainty and existence uncertainty. Pfoser et al. [33] indicate that imprecise position sampling by GPS and measuring by interpolation cause the former to occur. By taking location uncertainty into consideration, [7] introduces probabilistic range query and NN query to provide a set of tentative answers instead of a definite answer. Uncertain trajectory is modeled by [36] as a series of three-dimensional cylinders that—along with spatio-temporal predicates—can capture a moving object's uncertain location. Existence uncertainty refers to a situation in which we cannot say a moving object definitely exists at a particular location, which also calls for probabilistic versions of range queries, nearest neighbors, and skyline queries [40]. In addition, uncertain spatial selection conditions, such as closer and very closer, instead of exact quantities can be used in location-based spatial queries (LBSQ) [3,4].
3. Basic terminology
Essentially, a road network is modeled as a graph G = (V, E), where V is the set of vertices representing the crossroads or ends of the road segments and E is the set of edges representing the road segments [26]. The set of data points, denoted by D, on a road network is modeled as a finite set of points on edges. The query point q, assumed to be a moving object, is modeled as a function of time whose range is the set of all points of all edges. To generate candidate paths, as will be explained in this section, a traversal of the network must visit both vertices and data points.
Definition 1. An edge fragment (or fragment) is a continuous part of an edge in E.
An edge fragment may be part of an edge or an edge itself. However, it cannot be a single point, nor can it be empty.
Definition 2. A general path is a sequence of edges or edge fragments (x1, x2), (x2, x3), ..., (xm − 1, xm) in a road network, where xi, i = 2, ..., m − 1, is a vertex, and xi, i = 1, m, is a vertex or an interior point of some edge, such that the terminal vertex of an edge or edge fragment is the initial vertex of the next edge or edge fragment.
The definition allows a general path to start and end with a data point. Hereafter, the general path and path will be used interchangeably. Moreover, a shorter form of path representation is used if there is no ambiguity; for example, (a,..., c, d2) in Fig. 1 is a shorter form of path (a, b), (b, c), (c, d2). Assuming all paths in the road network are bidirectional, every path has a reverse path, that is, a path with the reverse direction. As an example, the reverse path of path p = (x1, x2,..., xm − 1, xm) is (xm, xm − 1,..., x2, x1), denoted by $p^{re}$.
Definition 3. A prefix of a path p is a subpath of p that shares at least the first edge or edge fragment with path p.
Definition 4. The concatenation of two general paths p1 = (v1, ..., vk, b) and p2 = (b, vk + 1, vk + 2, ..., vn), denoted by p1 + p2, is the path (v1, ..., vk, b, vk + 1, vk + 2, ..., vn) if b is a vertex, or (v1, ..., vk, vk + 1, vk + 2, ..., vn) if b is an interior point of the edge (vk, vk + 1).
Definition 5. The network distance between two points x, y in a road network, denoted by dist(x, y), is the sum of the lengths of all of the edges and edge fragments that belong to a shortest path joining x and y.
It is the network distance, not the Euclidean distance, that defines the nearest neighbor in a road network.
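To make the network distance concrete for a position in the interior of an edge, the sketch below computes the distance from such a position to a vertex by leaving the edge through either endpoint and keeping the shorter route. It is only an illustration under stated assumptions: the helper names `vertex_distances` and `dist_from_edge_point`, the (u, v, offset) encoding of a position, and the toy triangle network are all hypothetical, edges are undirected, and the target is a vertex.

```python
import heapq
import math

def vertex_distances(adj, src):
    """Dijkstra over vertices; adj maps a vertex to {neighbor: edge length}."""
    dist = {src: 0.0}
    heap = [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale queue entry
        for v, w in adj[u].items():
            if d + w < dist.get(v, math.inf):
                dist[v] = d + w
                heapq.heappush(heap, (d + w, v))
    return dist

def dist_from_edge_point(adj, u, v, offset, target):
    """Network distance from the point `offset` along edge (u, v),
    measured from u, to the vertex `target`."""
    length = adj[u][v]
    if not 0.0 <= offset <= length:
        raise ValueError("offset must lie on the edge")
    # Undirected network: distances from target equal distances to target.
    to_target = vertex_distances(adj, target)
    via_u = offset + to_target.get(u, math.inf)
    via_v = (length - offset) + to_target.get(v, math.inf)
    return min(via_u, via_v)

# Hypothetical triangle: the direct edge (a, c) is longer than the detour via b.
adj = {
    "a": {"b": 4.0, "c": 10.0},
    "b": {"a": 4.0, "c": 3.0},
    "c": {"a": 10.0, "b": 3.0},
}

print(dist_from_edge_point(adj, "a", "c", 1.0, "b"))  # leaves the edge through a
print(dist_from_edge_point(adj, "a", "c", 9.0, "b"))  # leaves the edge through c
```

The two calls show that positions on the same edge can reach the same target through different endpoints, which is why a single edge can be split between several nearest neighbors.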
Definition 6. Given a set of data points D on a road network, the nearest neighbor of a moving query point q is the set of data points with the smallest network distance to q's current position qt, i.e., {di ∈ D | dist(di, qt) ≤ dist(dj, qt), ∀ dj ∈ D}.
Definition 7. Given a dataset D on a road network, the network voronoi block with generator di ∈ D, denoted by nvbdi, is the set of points of edges such that for every x ∈ nvbdi, we have dist(di, x) ≤ dist(dj, x) for any other data point dj ∈ D. The NVD is the union of all network voronoi blocks in the road network, i.e., NVD = ∪di∈D nvbdi.
The network voronoi block is represented as a combination of edges and edge fragments or represented as a path. Fig. 3 shows the NVD in the example of the road network in Fig. 1. Six network voronoi blocks correspond to six data points. For example, nvbd1 is represented as a path (a, b, b1), where b1 is a point between nvbd1 and nvbd2 in edge (b, c). The point located between two network voronoi blocks can have two nearest neighbors.
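Once the blocks are known, answering an NN query reduces to locating the query position within the fragments of its edge. The sketch below shows one possible lookup, assuming each edge stores its fragments sorted by offset from the initial vertex. The dictionary layout and the function name `lookup_nn` are hypothetical and are not the NVD list structure defined later in the paper. Only the fragment of edge (i, j) from 12.5 to 16 with nearest neighbor d6 is taken from the introduction; the first fragment and its generator "d5" are invented for illustration.

```python
from bisect import bisect_right

# edge -> fragments (start, end, generator), sorted by start offset measured
# from the edge's initial vertex. The (0.0, 12.5, "d5") entry is made up;
# only the 12.5-16 fragment with generator d6 comes from the text.
nvd_list = {
    ("i", "j"): [(0.0, 12.5, "d5"), (12.5, 16.0, "d6")],
}

def lookup_nn(nvd_list, edge, offset):
    """Return the set of nearest neighbors for a position on `edge`."""
    fragments = nvd_list[edge]
    starts = [start for start, _, _ in fragments]
    k = bisect_right(starts, offset) - 1      # last fragment starting at or before offset
    if k < 0:
        raise ValueError("offset lies before the edge's first fragment")
    start, end, generator = fragments[k]
    if offset > end:
        raise ValueError("offset lies beyond the edge's last fragment")
    result = {generator}
    # A point shared by two consecutive fragments has both generators as NN.
    if k > 0 and offset == start:
        result.add(fragments[k - 1][2])
    return result

print(lookup_nn(nvd_list, ("i", "j"), 14.0))
print(lookup_nn(nvd_list, ("i", "j"), 12.5))
```

A binary search over fragment starts keeps the lookup cost independent of how far the data points are from the query position, which is the property the pre-computed approach is after.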
Definition 8. Let nvbdi, nvbdj be two distinct network voronoi blocks for a dataset D in the road network. If there are two edge fragments e1, e2 and point b is the intersection of e1 and e2, then b is called a border point between nvbdi and nvbdj provided the following conditions are satisfied.
$$\forall x\,\forall d_k\,\bigl(x \in e_1,\ d_k \in D \rightarrow \mathrm{dist}(d_i, x) \le \mathrm{dist}(d_k, x)\bigr) \tag{1}$$
$$\forall x\,\forall d_k\,\bigl(x \in e_2,\ d_k \in D \rightarrow \mathrm{dist}(d_j, x) \le \mathrm{dist}(d_k, x)\bigr) \tag{2}$$
The two voronoi blocks (nvbdi, nvbdj) are said to be adjacent to each other. If e1 and e2 are contained in one edge, then the border point b is the edge's interior point. Otherwise, it is located between two edges. The query point positioned somewhere in e1 shall travel the shortest path $p_1^{re}$ = (b,..., di) to reach the nearest neighbor, di. Likewise, the query point positioned somewhere in e2 shall travel the shortest path p2 = (b,..., dj) to reach the nearest neighbor, dj. Clearly, p1 and p2 must satisfy the following conditions.
$$\forall x\,\forall d_k\,\bigl(x \in p_1,\ d_k \in D \rightarrow \mathrm{dist}(d_i, x) \le \mathrm{dist}(d_k, x)\bigr) \tag{3}$$
$$\forall x\,\forall d_k\,\bigl(x \in p_2,\ d_k \in D \rightarrow \mathrm{dist}(d_j, x) \le \mathrm{dist}(d_k, x)\bigr) \tag{4}$$
Otherwise, di and dj would not be the nearest neighbors for all points of e1 and e2, respectively. The concatenation of p1 and p2, denoted as p, has a border point, b, which is also p's midpoint. In Fig. 4, a query point positioned at any point in the path (di, v1, v2, v3, b1) shall travel along the shortest path (b1, v3, v2, v1, di) to reach the nearest neighbor, di. Likewise, a query point positioned at any point in the path (b1, dj) shall travel along the shortest path (b1, dj) to reach the nearest neighbor, dj. The border point b1, located at p1's midpoint, divides the path into two halves, with (di, v1, v2, v3, b1) ⊂ nvbdi and (b1, dj) = nvbdj. A path's midpoint is not necessarily a common border point of two voronoi blocks, and a path may contain a border point that is not its midpoint. If the latter occurs, there are some points in this path from which the query point should travel another path to reach its nearest neighbor. Fig. 4 shows p2 containing a border point, b1, that is not equal to its midpoint, m2. Another path, p1, also contains b1; in that path, b1 coincides with the midpoint. A query point positioned somewhere in the edge fragment (v3, b1), which lies in both p1 and p2, should travel the subpath (v3, v2, v1, di) of $p_1^{re}$ to reach the nearest neighbor, di, rather than travel the subpath (b1, dj) of p2 to reach dj, which is not its nearest neighbor. Note that although traveling the subpath (b1, v3, v4, v1, di) of $p_2^{re}$ also reaches the nearest neighbor di, this is not the shortest path.
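The difference between the two paths of Fig. 4 can be checked numerically. The sketch below locates the midpoint of a path and tests, by brute force from the definitions, whether that midpoint is equidistant from both end data points at exactly half the path length and has no closer data point. This is not the rule set developed in Section 4. The edge lengths are invented (the figure gives none), data points are placed on vertices, and the names `midpoint`, `dist_to` and `is_divisible` are hypothetical.

```python
import heapq
import math

def vertex_distances(adj, src):
    """Dijkstra over vertices; adj maps a vertex to {neighbor: edge length}."""
    dist = {src: 0.0}
    heap = [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale queue entry
        for v, w in adj[u].items():
            if d + w < dist.get(v, math.inf):
                dist[v] = d + w
                heapq.heappush(heap, (d + w, v))
    return dist

def midpoint(adj, path):
    """Return (u, v, offset): the point halfway along `path`, on edge (u, v)."""
    edges = list(zip(path, path[1:]))
    half = sum(adj[a][b] for a, b in edges) / 2.0
    walked = 0.0
    for a, b in edges:
        if walked + adj[a][b] >= half:
            return a, b, half - walked
        walked += adj[a][b]
    raise ValueError("path needs at least one edge")

def dist_to(adj, point, target):
    """Network distance from a point (u, v, offset) on an edge to a vertex."""
    u, v, offset = point
    d = vertex_distances(adj, target)
    return min(offset + d[u], adj[u][v] - offset + d[v])

def is_divisible(adj, path, data):
    """Brute-force check that the path's midpoint behaves as a border point."""
    m = midpoint(adj, path)
    half = sum(adj[a][b] for a, b in zip(path, path[1:])) / 2.0
    to_i = dist_to(adj, m, path[0])
    to_j = dist_to(adj, m, path[-1])
    nearest = min(dist_to(adj, m, d) for d in data)
    return (math.isclose(to_i, half) and math.isclose(to_j, half)
            and math.isclose(nearest, half))

# Topology of Fig. 4 with made-up lengths.
adj = {
    "di": {"v1": 1.0},
    "v1": {"di": 1.0, "v2": 1.0, "v4": 3.0},
    "v2": {"v1": 1.0, "v3": 1.0},
    "v3": {"v2": 1.0, "v4": 3.0, "dj": 5.0},
    "v4": {"v1": 3.0, "v3": 3.0},
    "dj": {"v3": 5.0},
}
data = {"di", "dj"}
p1 = ["di", "v1", "v2", "v3", "dj"]
p2 = ["di", "v1", "v4", "v3", "dj"]

print(midpoint(adj, p1), is_divisible(adj, p1, data))
print(midpoint(adj, p2), is_divisible(adj, p2, data))
```

With these lengths the midpoint of p1 falls on edge (v3, dj) and is equidistant from di and dj, whereas the midpoint of p2 falls on edge (v4, v3) and is closer to di, mirroring the roles of b1 and m2 in the discussion above.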
Fig. 3. The NVD for the road network example.
Fig. 4. Candidate paths p1 = (di,v1,v2,v3,dj), p2 = (di,v1,v4,v3,dj) and candidate circuit p3 = (di,v1,v4,v3,v2,v1,di).
4. Rules for constructing a network voronoi diagram
If a path's midpoint matches a border point, the nearest neighbors of all of the points of this path can be determined. Searching for border points in paths lies at the core of the construction of the NVD. Our challenge is to distinguish such paths from others. In Fig. 4, nvbdi is represented by thick lines, whereas nvbdj is represented by thin lines. For the road network in Fig. 5(a), the paths whose midpoints match a border point are indicated by the symbol “v” in Fig. 5(b).
4.1. Basic theorems
Definition 9. Let di, dj ∈ D, where D is a dataset on the road network. A general path joining di and dj is called a candidate path if there is no other data point in between. In addition, di is called an adjacent data point of dj and vice versa if di ≠ dj.
Clearly, only candidate paths must be considered for finding border points.
Lemma 1. If a border point b is located between two adjacent network voronoi blocks nvbdi, nvbdj, then there exists a candidate path p whose midpoint matches b, the subpath (di,..., b) of p belongs to nvbdi, and the subpath (b,..., dj) of p belongs to nvbdj.
Proof. For any border point b, by definition there exist two edge fragments e1, e2 such that {b} = e1 ∩ e2. For every point of e1, di is the nearest neighbor, and for every point of e2, dj is the nearest neighbor. Let pxi denote the shortest path joining x and di, and let │pxi│ be its length. Also let pyj denote the shortest path joining y and dj, and let │pyj│ be its length. Then, we have
$$\forall x\,\forall d_k\,\bigl(x \in e_1,\ d_k \in D \rightarrow \lvert p_{xi} \rvert \le \mathrm{dist}(x, d_k)\bigr) \tag{5}$$
$$\forall y\,\forall d_k\,\bigl(y \in e_2,\ d_k \in D \rightarrow \lvert p_{yj} \rvert \le \mathrm{dist}(y, d_k)\bigr) \tag{6}$$
Because {b} = e1 ∩ e2, we replace x, pxi in Eq. (5) by b, pbi, respectively, and replace y, pyj in Eq. (6) by b, pbj, respectively. Also, dk in Eq. (5) is replaced by dj, and dk in Eq. (6) is replaced by di. We have
$$\lvert p_{bi} \rvert \le \mathrm{dist}(b, d_j) = \lvert p_{bj} \rvert \tag{7}$$
$$\lvert p_{bj} \rvert \le \mathrm{dist}(b, d_i) = \lvert p_{bi} \rvert \tag{8}$$
Fig. 5. (a) A road network. (b) Three divisible paths.
Combining (7) and (8) gives │pbi│ = │pbj│. Thus, b matches the midpoint of p, the concatenation of $p_{bi}^{re}$ and pbj. We assume that p is a candidate path joining di and dj. Otherwise, (5) or (6) would not be valid, thus violating the definition of a border point. Next, we prove that path pbi = (b,..., di) ⊂ nvbdi and path pbj = (b,..., dj) ⊂ nvbdj. Suppose that the statement is false. Without loss of generality, we assume that z is a point of pbj and z ∉ nvbdj. This implies that there is another data point dx that is closer to z via a shortest path p3. We have
$$\forall d_k\,\bigl(d_k \in D \rightarrow \lvert p_3 \rvert \le \mathrm{dist}(z, d_k)\bigr) \tag{9}$$
Because z ∉ nvbdj, we have the following inequalities:
$$\lvert p_3 \rvert < \mathrm{dist}(z, d_j) \tag{10}$$
$$\mathrm{dist}(z, d_j) \le \lvert p_5 \rvert \tag{11}$$
In Eq. (11), p5 is the subpath (z,..., dj) of p and dist(z, dj) is the network distance between z and dj. Let the subpath (b,..., z) of p be denoted by p4. We have
$$\lvert p_4 \rvert + \lvert p_3 \rvert < \lvert p_4 \rvert + \lvert p_5 \rvert = \lvert p_{bj} \rvert \tag{12}$$
This means that the path (p4 + p3) that joins b and dx is shorter than the path pbj that joins b and dj, violating the assumption that dj is the nearest neighbor of b. We conclude that z must belong to nvbdj. This proves the Lemma. □
Lemma 1 states that for every border point in the road network, there is a candidate path whose midpoint matches the border point. Thus, all border points can be obtained by investigating such paths. Recall that the edge fragment (v3, b1) contained in the second half of p2 in Fig. 4 has the nearest neighbor di, but not dj. We should collect b1 from path p1, whose midpoint matches b1, rather than from p2. The former allows us to determine the nearest neighbor of the whole path, whereas the latter does not.
Definition 10. A candidate circuit is a candidate path in which the initial vertex (point) of the first edge (fragment) coincides with the terminal vertex (point) of the last edge (fragment).
The query point positioned somewhere in the edge fragment (m2, v3) of the second half of p2 in Fig. 4 shall travel the subpath (v3, v2, v1, di) of the candidate circuit p3 to reach its nearest neighbor di. The set of data points adjacent to data point di is denoted by ndrp(di). The set of candidate paths joining di and its adjacent data point dj is denoted by np(di, dj). If the candidate path joining two adjacent data points is unique, then the cardinality of np(di, dj) is │np(di, dj)│ = 1. Similarly, the set of candidate circuits with data point di is denoted by np(di, di).
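The sets ndrp(di) and np(di, dj) can be illustrated by a small enumeration on the topology of Fig. 4. The sketch below is only one possible reading of the definitions. It assumes data points sit on vertices, it enumerates simple paths only (so candidate circuits such as p3, which revisit a vertex, are not produced), and the function name `candidate_paths` and the adjacency-list encoding are hypothetical.

```python
from collections import defaultdict

def candidate_paths(adj, data, di):
    """Group the simple paths that leave di and stop at the first data point
    reached. Keys play the role of ndrp(di); each value plays np(di, dj)."""
    found = defaultdict(list)

    def extend(path):
        for nxt in adj[path[-1]]:
            if nxt in path:
                continue                         # keep the path simple
            if nxt in data:
                found[nxt].append(path + [nxt])  # no other data point in between
            else:
                extend(path + [nxt])

    extend([di])
    return dict(found)

# Topology of Fig. 4 (edge lengths are irrelevant for the enumeration).
adj = {
    "di": ["v1"],
    "v1": ["di", "v2", "v4"],
    "v2": ["v1", "v3"],
    "v3": ["v2", "v4", "dj"],
    "v4": ["v1", "v3"],
    "dj": ["v3"],
}
data = {"di", "dj"}

paths = candidate_paths(adj, data, "di")
print(sorted(paths))        # adjacent data points of di
for p in paths["dj"]:
    print(p)                # candidate paths joining di and dj
```

Exhaustive enumeration of this kind grows quickly with network size, which is one reason generating and pruning candidate paths concurrently matters for the construction.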
Lemma 2. Suppose that p is a candidate path joining two data points di, dj in the road network. If the intersection of p and any other candidate path does not include the interior points of p, then we have the following results:
O
(1) P is divided by its midpoint m into two parts that belong to nvbdi and nvbdj, if di ≠ dj. (2) P fully belongs to nvbdi if di = dj.
Proof. We have two cases. Suppose that di and dj are distinct, i.e., di ≠ dj. If x is the initial vertex (point) of the first edge (fragment) or the terminal vertex (point) of the last edge (fragment) of p, then x coincides with data point di or dj. Thus, di or dj is clearly its nearest neighbor. Let x be an interior point of p and assume that x has the nearest neighbor dk, where dk ≠ di, dk ≠ dj. Let path ps = (x,..., dk) be the shortest path joining x and dk, and let the concatenation of the subpath (di,..., x) of p and path ps be denoted as p1. Clearly, p1 is a candidate path and the subpath (di,..., x) of p is contained in the intersection of p and p1, thus violating the assumption of not allowing p's interior points to appear in the intersection. Thus, dk must be di or dj.

Next, we prove that p is divided by its midpoint into two network voronoi blocks. The subpath (di,.., x) of p that joins di and x must be unique because the existence of another path joining di and x, say p2, would cause the intersection of p and p2 + (x,.., dj) to be equal to p2, thus violating our assumption. Similarly, the subpath (x,.., dj) of p joining x and dj must be unique for the same reason. Thus, we have dist(x, di) ≤ dist(x, dk) for every x in the subpath (di,..., m) of p and every dk ∈ D, and dist(y, dj) ≤ dist(y, dk) for every y in the subpath (m,..., dj) of p and every dk ∈ D. Suppose di = dj. Path p is a candidate circuit. Based on similar reasoning as that set forth above, we can conclude that di is the nearest neighbor for all points of p. This proves the Lemma. □

In Fig. 6(a), the edge (a, b) contains three data points, di, dj, and dk. Two candidate paths (di,..., dj), (di,..., dk) intersect at di, which is not in the interior of either (di,..., dj) or (di,..., dk). According to Lemma 2, both paths are divided by their respective midpoints into nvbdi, part of nvbdj, and part of nvbdk. In Fig. 6(b), four candidate paths (di,.., dj), (di,.., d2), (di,.., d3), and (di,.., d4) intersect at di, which is not in the interior of any of these paths. Thus, they are divided by their respective midpoints into five parts, where paths (bj,..., dj), (b2,..., d2), (b3,..., d3), and (b4,.., d4) belong to nvbdj, nvbd2, nvbd3, and nvbd4, respectively, and paths (b2,..., bj), (b3,..., b4) constitute the entire nvbdi. In Fig. 6(c), none of the paths (di,..., dj), (di,..., dk), and (dj,..., dk) intersects the others at interior points, and thus they are divided by their respective midpoints into three network voronoi blocks. In Fig. 6(d), two paths intersect at the starting and ending points, resulting in two network voronoi blocks. In Fig. 6(e), the candidate path (dj,..., di) intersects with the candidate circuit (dj, v1, v2, dj) at their starting point, making nvbdi = (di, b1), nvbdj = (b1, dj, v2, v1, dj).

Fig. 6. Various candidate paths. (a) In an edge. (b) Intersecting at a single point. (c) Intersecting not at an interior point. (d) Intersecting at starting and ending points. (e) Intersecting with a candidate circuit at a single point.
Definition 11. A candidate path pk = (di,..., dk) is said to be dominated by another candidate path p ∈ ∪dz∈(ndrp(di) ∪ di) np(di, dz) if p is shorter than pk and shares a common prefix subpath with pk and if p's midpoint is contained in the interior of that common prefix subpath.
A candidate path can be dominated by several candidate paths, and the dominance relation among them has a transitive property. The following Lemma confirms our intuition about the dominated path: it is not the type of path that we are looking for to find the border point.
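The dominance condition of Definition 11 can be restated compactly in symbols. The following is only a restatement under one assumption that the definition does not spell out: pc is taken to be the longest common prefix subpath of p and pk, and the midpoint of p is located by its distance |p|/2 from di.

```latex
% Assumption: p_c is the longest common prefix subpath of p and p_k,
% with all distances measured from d_i along the paths.
p \text{ dominates } p_k
\iff
|p| < |p_k|
\quad\text{and}\quad
0 < \tfrac{1}{2}\,|p| < |p_c| .
```

For instance, with hypothetical lengths |p| = 6, |pk| = 9, and |pc| = 5, both conditions hold (6 < 9 and 0 < 3 < 5), so p would dominate pk.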
Lemma 3. Suppose that pk is a candidate path joining two distinct data points, di and dk. If pk is dominated by some path p, then pk cannot have a border point that matches its midpoint.
Proof. Let pc = (di,..., v) be a common prefix subpath of paths p = (di,..., dj) and pk = (di,..., dk), and let it contain p's midpoint, mij. In addition, let mik be the midpoint of path pk. Because pk is dominated by p, pk is longer than p. There are two cases involving p.

First, di and dj are distinct, i.e., di ≠ dj. Because p is shorter than pk, mij is closer to di than mik in pk. In addition, mik is in pc or between v and dk in pk. Let x be an interior point between mij and mik in pc for the former and between mij and v in pc for the latter. Both assure that x is contained in pc and is located between mij and mik. Therefore, the distance from x to di along pc^re is less than the distance to dk along pk and greater than the distance to dj along p. The consequence is that there is some point x located between di and mik in pk such that dj is closer to it than di. Thus, pk's midpoint cannot be a border point between di and dk in pk.

Second, di coincides with dj, i.e., path p is a candidate circuit. If pk's midpoint mik is located between di and v in pc, then every x in pc between mik and v shall travel the subpath (x,..., di) of p, which does not contain mij, to reach di because it is shorter than both the subpath (x,..., dk) of pk by assumption and the subpath (x,.., di) of p^re that contains mij. The consequence is that there is some x located between mik and dk in pk such that di is closer to it than dk. Thus, pk's midpoint cannot be a border point between di and dk in pk. If mik is between v and dk in pk, we shall find some x between mik and dk such that di is closer to x than dk. Let ps denote the subpath of p from v to di, which does not contain mij. Let x be a point located between mik and dk in pk whose distance to mik along pk^re is less than 0.5 × δ, where δ = │pc│ − │ps│. Let p3, p4, and p5 denote the subpaths of pk from v to mik, from mik to x, and from x to dk, respectively. Then,

$$|p_5| = 0.5\,|p_k| - |p_4| = |p_c| + |p_3| - |p_4| = |p_s| + \delta + |p_3| - |p_4| > |p_s| + |p_3| + |p_4|.$$

Thus, there is some x located between mik and dk in pk such that di is closer to it than dk. Thus, pk's midpoint cannot be a border point between di and dk in pk. This proves the Lemma. □
The shortest path that dominates pk, as will be verified in Theorem 1, has the privilege of having the border point at the midpoint: it can be divided by its midpoint into two distinct network voronoi blocks.

Lemma 4. Suppose that pk is a candidate circuit starting from data point di. If pk is dominated by a path that connects two distinct data points, then pk does not entirely belong to nvbdi.

Proof. Along the same lines of reasoning as in the first case in Lemma 3, we can find a subpath that is contained in the common prefix subpath of p and pk such that the nearest neighbor is not di. Thus, pk cannot entirely belong to nvbdi. □
Definition 12. A path pk that runs from data point di to a vertex of degree 1 and has no other data point in between is said to be dominated by another path p ∈ ∪dz∈(ndrp(di) ∪ di) np(di, dz) if p and pk have a common prefix subpath and p's midpoint is contained in the interior of that common prefix subpath.

In that case, path pk cannot entirely belong to nvbdi. Unlike candidate paths, such a path can be dominated even by longer candidate paths.
Lemma 5. If a path pk terminated with a vertex of degree 1 does not contain other data points except for the starting point and is dominated by some path that is not a circuit, then pk does not totally belong to nvbdi.
Fig. 7(a) shows that data point di is connected to adjacent data points dj, d7, and d8 through paths p = (di, v1, v4, dj), p1 = (di, v1, v4, v2, d7) and p2 = (di, v1, v4, v2, v3, d8), respectively. Path p is shorter than both p1 and p2, and its midpoint b1 is contained in the interior of the common prefix subpath (di, v4). According to Lemma 3, both p1 and p2 are dominated by p. The subpath (b1,.., v4) of p belongs to nvbdj but does not belong to either nvbd7 or nvbd8, causing only nvbdj to be adjacent to nvbdi.
Definition 13. A candidate path that joins two distinct data points di, dj is called a divisible path if it can be divided by its midpoint into nvbdi and nvbdj.
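To make the division concrete, the following minimal sketch cuts a path at its midpoint and returns the two halves that would be assigned to nvbdi and nvbdj. It assumes the path has already been established as divisible; the edge-list representation, the function name split_at_midpoint, the border label "b", and the edge lengths are all hypothetical choices made for illustration, not part of the paper.

```python
from typing import List, Tuple

# Hypothetical representation: (from, to, length) of an edge or edge fragment.
Edge = Tuple[str, str, float]


def split_at_midpoint(path: List[Edge], border: str = "b") -> Tuple[List[Edge], List[Edge]]:
    """Cut a path at half its length; return (half nearer the start, half nearer the end)."""
    half = sum(w for _, _, w in path) / 2
    first: List[Edge] = []
    second: List[Edge] = []
    walked = 0.0  # distance covered before the current edge
    for u, v, w in path:
        if walked + w <= half:
            first.append((u, v, w))           # edge lies entirely before the midpoint
        elif walked >= half:
            second.append((u, v, w))          # edge lies entirely after the midpoint
        else:
            cut = half - walked               # midpoint falls inside this edge
            first.append((u, border, cut))
            second.append((border, v, w - cut))
        walked += w
    return first, second


# Illustrative path (di, v1, v2, dj) with made-up lengths 2, 3, and 1.
path = [("di", "v1", 2.0), ("v1", "v2", 3.0), ("v2", "dj", 1.0)]
to_di, to_dj = split_at_midpoint(path)
```

Here the border point falls on the edge (v1, v2), one unit from v1, so each half has length 3.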
Theorem 1. Suppose that p is a candidate path joining two distinct data points di, dj. For any other path pk ∈ ∪dz∈(ndrp(di) ∪ di) np(di, dz) that has a common prefix subpath pc with p, if either p is not longer than pk and the midpoint mij of p is contained in the interior of pc, or neither the midpoint mij of p nor the midpoint mik of pk is in the interior of pc, then p is a divisible path.
Proof. If x is the initial vertex (point) of the first edge (fragment) or the terminal vertex (point) of the last edge (fragment) in p, then x coincides with data point di or dj. Thus, its nearest neighbor is obviously di or dj. Let us suppose that x is an interior point of p. We shall verify that x's nearest neighbor is di when it is between di and mij and is dj when it is located between mij and dj. Let ps be the shortest path that joins x and its nearest neighbor dx. It can be seen that dx is not in the interior of p because p is a candidate path and that dx cannot possibly be connected to x via di or dj because that would cause ps to be longer than (x,.., di) or (x,..., dj) of p. Thus, there remain three possible cases showing how x is connected to dx.

Case 1. ps and p intersect at x. The concatenation of the subpath pc = (di,..., x) of path p and ps, denoted by p2, is a path that belongs to ∪dz∈(ndrp(di) ∪ di) np(di, dz). Suppose that mij is contained in the interior of the common prefix subpath pc of paths p and p2 and that p is shorter than or has the same length as p2. Then, the subpath (x,..., dj) of p is also shorter than or the same length as ps. As a shortest path joining x and its nearest neighbor dx, ps cannot be longer than the subpath (x,.., dj) of p. Thus, both must have the same length, and dj is also the nearest neighbor. Thus, x is contained in p between mij and dj and x ∈ nvbdj.

Suppose that paths p and p2's respective midpoints, mij and mix, are not in the interior of their common prefix subpath pc. Path ps cannot be longer than pc because dx is the nearest neighbor of x. Thus, mix cannot be contained in the interior of ps and will therefore coincide with x. Thus, di is also the nearest neighbor of x, with the consequence that x is contained in p between di and mij and x ∈ nvbdi.

Fig. 7. An example of divisible paths. (a) One divisible path dominates two candidate paths. (b) Three divisible paths.
Case 2. ps is formed by concatenating the subpath (x,...,vx) of p and the shortest path from vx to dx, where x is contained between di and vx, x ≠ vx, in p. Path p2 formed by concatenating the subpath (di,..., x) of p and ps belongs to ∪dz∈ (ndrp(di)∪di) np(di, dz). Suppose that mij is contained in the interior of the common prefix subpath pc = (di,...., vx) of p and p2 and that p is shorter than or has the same length as p2. Then, the subpath (x,.., dj) of p would not be longer than ps. As a shortest path joining x and its nearest neighbor dx, ps cannot be longer than the subpath (x,..., dj) of p. Thus, both must be the same length, and dj is also the nearest neighbor of x. Moreover, x must be located between mij and dj in p because p and p2's respective midpoints, mij and mix, coincide and x must be located between mix and dx in p2 (otherwise, dx would not be x's nearest neighbor). Thus, x is located between mij and dj in p and x ∈ nvbdj.
Suppose that p and p2's respective midpoints, mij and mix, are not contained in the interior of their common prefix subpath pc. Then, ps must be longer than the subpath (di,..., x) of p because x is contained in the interior of pc but mix is not. Therefore, dx could not possibly become the nearest neighbor of x, violating our assumption. Accordingly, this case cannot exist.
Case 3. ps is formed by concatenating the subpath (x,..., vx) of p^re and the shortest path from vx to dx, where x is between vx and dj, x ≠ vx, in p. Path p2, formed by concatenating the subpath (di,..., vx) of p and the subpath (vx,..., dx) of ps, belongs to ∪dz∈(ndrp(di) ∪ di) np(di, dz). Paths p and p2's common prefix subpath (di,.., vx) is denoted by pc. Suppose that mij is in the interior of the common prefix subpath pc and that p is shorter than or has the same length as p2. Then, the subpath (vx,..., dj) of p would not be longer than the subpath (vx,..., dx) of p2. It can be easily derived that the subpath (x, dj) of p is shorter than ps by the amount │p2│ − │p│ + 2 × │p5│, where p5 is the subpath (vx,.., x) of p. Therefore, dx will not be the nearest neighbor of x, violating our assumption. Accordingly, such a case cannot exist.
Suppose that paths p and p2's respective midpoints, mij and mix, are not contained in the interior of pc. If x is located between vx and mij, then the subpath (x,.., di) of p is shorter than ps by two times the length of the subpath (vx,..., mix) of p2. If x is located between mij and dj, then the subpath (x,..., dj) of p is shorter than ps by two times the sum of the lengths of the subpath (vx,.., mix) of p2 and the subpath (x,..., mij) of p. Thus, dx cannot possibly be the nearest neighbor of x, violating our assumption. Therefore, such a case cannot exist.

Combining the three cases above, it is concluded that for every x in the subpath (di,.., mij) of p, di is its nearest neighbor; for every x in the subpath (mij,.., dj) of p, dj is its nearest neighbor. This proves the theorem. □

Theorem 1 states that every candidate path joining distinct data points is a divisible path if it is not dominated by others. In Fig. 7(b), path p1 = (di, v1, v2, dj) shares a common prefix subpath (di, v1) with a longer path p2 = (di, v1, v3, dk) and shares another common prefix subpath (di, v1, v2) with a longer path p3 = (di, v1, v2, d9). The midpoint mij of p1 is found in the interior of these two common prefix subpaths. According to Theorem 1, p1 is a divisible path. Moreover, p4 = (dj, v2, d9) and p5 = (dj, v2, v1, v3, dk) share a common prefix subpath (dj, v2), and the midpoints of both are not in the interior of the common prefix subpath (dj, v2). It is the same case for p4 and p1^re as well as p5 and p1^re. Thus, all of these are divisible paths, resulting in four network voronoi blocks.

In Fig. 8, paths p2 and p4 are dominated by p1 and p3, respectively. According to Lemma 3, their midpoints cannot possibly be the border points for two end data points. Because paths p1 and p3 are not dominated by others, they are also divisible paths, according to Theorem 1. The edges and edge fragments depicted as solid lines in Fig. 8(b), (d) belong to nvbdi, whereas those depicted as dotted lines belong to nvbdj. Note that those divisions are consistent: the edge (di, v1) in both divisions is contained in nvbdi and the edge (v4, dj) in nvbdj.

At this moment, one might notice that the nearest neighbors for edge (v4, dk) are not yet determined. Fortunately, they can be found by visiting ndrp(dj) or ndrp(dk). The adjacent data point di in ndrp(dj) is connected to dj through p1^re = (dj, v4, v2, v1, di) or p3^re = (dj, v4, v3, v1, di). Another adjacent data point, dk, is connected to dj through p5 = (dj, v4, dk). All of these paths share the common prefix subpath (dj, v4), and none of their midpoints are in its interior. By Theorem 1, p5 is a divisible path with (b3, dk) belonging to the network voronoi block of dk, and (dj, v4, b3) belonging to nvbdj in Fig. 8(f). The division of p5 is also consistent with those of p1^re and p3^re: the edge (dj, v4) belongs to nvbdj in all of these divisions.
Theorem 2. Suppose that p is a candidate circuit starting from data point di. For any other candidate path pk ∈ ∪dz∈ndrp(di) np(di, dz) that has a common prefix subpath pc = (di, vk) with p, if either p is not longer than pk and the midpoint of p is contained in the interior of pc, or neither the midpoint mij of p nor the midpoint mik of pk is in the interior of pc, then p entirely belongs to nvbdi.

Proof. With the same line of reasoning as used in Theorem 1, we can derive that p entirely belongs to nvbdi. □

The theorem states that every candidate circuit that is not dominated by others entirely belongs to one network voronoi block.
Determination of that path's network voronoi block is independent of those whose midpoints are not located in the interior of the shared prefix subpaths. In Fig. 4, path p1 shares a common prefix subpath (di, v1) with paths p2 and p3, and no midpoints of these three paths are contained in the interior of that subpath. According to Theorem 1, p1 is a divisible path. Path p2 shares a prefix subpath (di, v1, v4, v3) with circuit p3, is longer than it, and therefore is dominated by it. Thus, p2 is not a divisible path by Lemma 3. The candidate circuit p3 is not dominated, and according to Theorem 2, it belongs entirely to nvbdi.
Corollary 1. A path terminated with a vertex of degree 1 and containing no other data points except for starting data point di will entirely belong to nvbdi if it is not dominated by a candidate path that connects two distinct data points.
Fig. 8. Consistent divisions of three divisible paths: p1, p2, p3.
Note that before all candidate paths that share a common prefix subpath with this path have been visited, we cannot say that pk is not dominated.
Theorem 3. For each edge in the road network, we can always find a candidate path or a path terminated with a vertex of degree 1 that contains this edge and is not dominated.

Proof. There are two possibilities for how an edge e = (vk, vk+1) is connected to others in the road network. First, both vertices are of degree > 1. Let path p1 = (di, vk) be the shortest path joining vk and its nearest neighbor di, and let p2 = (vk+1, dj) be the shortest path joining vk+1 and its nearest neighbor dj. Path p is formed by concatenating p1, edge (vk, vk+1), and p2. In what follows, we shall be primarily concerned with whether p is dominated by others.

Let ps = (di,..., dx) be a candidate path that shares with p a common prefix subpath pc = (di, vx) and whose midpoint mix is contained in pc. We shall prove that p is not dominated by ps. If vx is in p1, then the path formed by concatenating the subpath (dx, vx) of the reverse of ps and the subpath (vx, vk) of p is shorter than the subpath (di, vk) of p by two times the length of the subpath (mix, vx) of pc, violating the assumption that di is the nearest neighbor of vk. If vx is in p2, then the subpath (vk+1, vx, dx) of ps cannot be shorter than the subpath (vk+1, vx, dj) of p2 because dj is the nearest neighbor of vk+1. Thus, p cannot be dominated by ps.

Second, vk or vk+1 is of degree 1. With no loss of generality, let vk+1 be of degree 1. With the same line of reasoning as in the first case, we can verify that p is not dominated by other paths.

Combining the above cases, it is concluded that every edge in the road network will be contained either in some divisible path or in a circuit that is not dominated by others. Thus, all points of the edges can obtain their nearest neighbors using our method. This proves the theorem. □

In Fig. 9, di and dj are the nearest neighbors of vk and vk+1, respectively. The path p = p1 + (vk, vk+1) + p2 is not dominated by any of the following paths: p1 + p4, p3 + p2 and p5. Thus, p is a divisible path, and the nearest neighbors of all points in the edge (vk, vk+1) will be determined.
Fig. 9. The path p1 + (vk, vk+1) + p2 determines the nearest neighbors to all points of the edge (vk, vk+1).
4.2. Generating rules

Recall that by definition, candidate paths are only dominated by shorter paths. This suggests that we should visit and prune candidate paths in ∪dz∈(ndrp(di) ∪ di) np(di, dz) in order of path length. In this way, it can be quickly determined, according to Theorem 1, whether a candidate path is a divisible path at the time it is visited, without necessarily generating all candidate paths. Note that any path that has not yet been fully generated and processed by the time this path is visited cannot be shorter than this path. Similarly, a candidate circuit can be determined as totally belonging to one network voronoi block at the time it is visited, according to Theorem 2.

When a divisible path is found based on Theorem 1, all of the paths dominated by that path can be pruned according to Lemmas 3 and 4 and Corollary 1. Similarly, when a candidate circuit is found to belong entirely to one network voronoi block based on Theorem 2, all candidate paths that are not circuits and that are dominated by that circuit are pruned according to Lemma 3.

The case in which a candidate circuit pk is dominated by another, shorter candidate circuit p is somewhat more complex for determining NNs. Whether pk entirely belongs to one network voronoi block depends on the existence of a shorter path that joins two distinct data points. If such a path exists and it dominates pk, then pk should be pruned by Lemma 4. Otherwise, pk entirely belongs to one network voronoi block and should not be pruned. Fortunately, according to Theorem 3, this case can be processed more efficiently by treating p as a path joining two distinct data points and assuming that pk is dominated by p. In other words, when the candidate circuit pk is dominated by another, shorter candidate circuit p, then pk is simply pruned. This will not violate the completeness of our rules. Similar reasoning is also applied to paths terminated with a vertex of degree 1.

Two patterns cannot exist in a candidate path because they would induce a shorter path that connects the same two data points. First, an edge is followed immediately by its reverse edge, such as (di, v1), (v1, v2), (v2, v1), and (v1, dj). Second, an edge appears more than once in a path, such as (di, v1), (v1, v2), (v2, v3), (v3, v1), (v1, v2), and (v2, dj). However, an edge can be followed by other edges and then a reverse edge such as (di, v1), (v1, v2), (v2, v3), (v3, v2), (v2, v1), and (v1, dj). The rules for constructing a NVD are given below.

1. For each data point di, the paths in ∪dz∈(ndrp(di) ∪ di) np(di, dz) are visited in order of path length.
2. If a candidate path that joins two distinct data points is found to be not dominated by others by the time it is visited, then it is a divisible path (by Theorem 1).
3. If a candidate circuit is found to be not dominated by others by the time it is visited, then it entirely belongs to a single network voronoi block (by Theorem 2).
4. If a path terminated with a vertex of degree 1 is found to be not dominated by others by the time all candidate paths are visited, then it entirely belongs to a single network voronoi block (by Corollary 1).
5. If a candidate path (including a circuit) or a path terminated with a vertex of degree 1 is dominated by others, then it should be pruned (by Lemmas 3, 4, and 5 and Theorem 3).
6. If a path has one of the following patterns, it should be pruned.
(1) An edge is repeated in this path.
(2) An edge is followed immediately by the reverse edge.
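As a rough illustration of how rules 1, 2, 3, 5, and 6 interact, the sketch below classifies a set of already generated candidate paths for one data point. It is a simplification under stated assumptions, not the paper's algorithm: paths are given in full rather than generated incrementally, paths terminated with a vertex of degree 1 (rule 4) are left out, "repeated edge" is read as repetition in the same direction, dominance is tested against the longest shared prefix of whole fragments, and all names and edge lengths are hypothetical.

```python
import heapq
from typing import Dict, List, Tuple

Edge = Tuple[str, str, float]  # hypothetical: directed fragment (from, to, length)
Path = List[Edge]              # assumed non-empty


def length(path: Path) -> float:
    return sum(w for _, _, w in path)


def has_forbidden_pattern(path: Path) -> bool:
    """Rule 6: an edge repeated, or an edge immediately followed by its reverse."""
    seen = set()
    prev = None
    for u, v, _ in path:
        if (u, v) in seen or prev == (v, u):
            return True
        seen.add((u, v))
        prev = (u, v)
    return False


def dominates(p: Path, pk: Path) -> bool:
    """Definition 11: p is shorter and its midpoint is interior to the shared prefix."""
    prefix = 0.0
    for e1, e2 in zip(p, pk):
        if e1 != e2:
            break
        prefix += e1[2]
    return length(p) < length(pk) and 0 < length(p) / 2 < prefix


def classify(paths: List[Path]) -> Dict[int, str]:
    """Visit paths shortest first (rule 1) and label each one by its index."""
    heap = [(length(p), i) for i, p in enumerate(paths)]
    heapq.heapify(heap)
    visited: List[Path] = []       # shorter candidate paths seen so far
    result: Dict[int, str] = {}
    while heap:
        _, i = heapq.heappop(heap)
        p = paths[i]
        if has_forbidden_pattern(p):
            result[i] = "pruned"                      # rule 6
            continue
        if any(dominates(q, p) for q in visited):
            result[i] = "pruned"                      # rule 5
        elif p[0][0] == p[-1][1]:
            result[i] = "whole block"                 # rule 3: undominated circuit
        else:
            result[i] = "divisible"                   # rule 2
        visited.append(p)
    return result


# Made-up lengths for a configuration shaped like Fig. 7(a).
p = [("di", "v1", 2.0), ("v1", "v4", 3.0), ("v4", "dj", 1.0)]
p1 = p[:2] + [("v4", "v2", 2.0), ("v2", "d7", 2.0)]
p2 = p[:2] + [("v4", "v2", 2.0), ("v2", "v3", 1.0), ("v3", "d8", 2.0)]
labels = classify([p2, p, p1])
```

With these lengths, p is visited first and accepted as divisible, after which p1 and p2 are pruned because the midpoint of p lies inside their shared prefix (di, v1, v4).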
Corollary 2. The rule set presented above is sound and complete for constructing a NVD.
Proof. The rule set is sound because it is generated based on the verified theorems and corollaries. We must still prove completeness: all possible border points and network voronoi blocks can be derived by these rules. From Theorem 3, we know that for each edge we can always find a path/circuit that contains this edge and is not dominated, that is, a divisible path or a path entirely belonging to one network voronoi block. Thus, the nearest neighbors of all points of all edges can be found using these rules. This proves the corollary. □

5. Data structures
The road network is represented by a collection of adjacent lists, which, according to Horowitz [17], is storage efficient. We need to visit both vertices and data points to generate candidate paths. Therefore, we also need adjacent lists for data points. To make it efficient to generate and prune candidate paths, we need a tree-type data structure. There are two types of trees. First, a k-ary border-point search tree (BPS-tree) is used to explain how it works, thus helping generate and prune candidate paths. Second, the border-point binary tree (BPB-tree) is transformed from BPS-tree and used for implementation. Finally, the NVD is represented as linked lists called NVD lists to store each edge's fragments and their associated nearest neighbors.
5.1. Adjacent list representation of road networks and data points
Every vertex in the road network is associated with an adjacent list, as is every data point on the road network. The vertex/data point's object ID, its type (1 represents the vertex, 0 represents the data point), and the number of adjacent objects are indicated by attributes vobject, vtype, and vcount, respectively, in the head node, as shown in Fig. 10(a).
Fig. 10. The representation of the road network and the dataset. (a) Head node. (b) Adjacent node. (c) Adjacent list of vertex c. (d) Adjacent list of data point d2.
There are several adjacent nodes, each of which describes how an adjacent object is connected to the road network. The adjacent object's ID, its type (1 for vertex, 0 for data point), the edge incident with it and the object stored in vobject, edge length, and its own adjacent list are indicated by the attributes aobject, type, aedge, length, and listindex, respectively, in Fig. 10(b). The attributes dist1 and dist2 indicate the distance to the initial vertex and the object indicated by vobject on the joined edge, respectively. The flag attribute indicates whether the adjacent object is the joined edge's terminal vertex, a data point located between the joined edge's terminal vertex and the object indicated by vobject, or others in the road network. The value is 1 for the first two cases and 0 for other cases. Fig. 10(c) indicates that the vertex c has three adjacent objects. The adjacent data point d2 has its own adjacent list 11 in Fig. 10(d), which contains d2's two adjacent objects: the initial vertex c and the terminal vertex d of the joined edge e3.

5.2. Border-point search trees
The border-point search tree (BPS-tree) allows the candidate paths to be generated and pruned concurrently. The root node corresponds to a data point on the road network, and the parent–child relation on the tree corresponds to the adjacent relation on objects in the road network. However, it is unnecessary to fully generate all candidate paths in ∪dz∈(ndrp(di) ∪ di) np(di, dz) for finding border points. Most often, the paths to be pruned are recognized before they are fully generated, based on rules 1, 5, and 6. The costs of path generation will therefore be reduced. Furthermore, we can prune a subtree instead of a single path at a time. Thus, the cost can be further reduced. The todis attribute in the node of the BPS-tree in Fig. 11(a) records the length of the path from the node's representing object to the root node's representing object in the road network. The node's degree is indeterminate because the number of child nodes, which corresponds to the number of adjacent nodes of this object, is not constant. Fig. 11(b) shows a BPS-tree with the root node corresponding to the data point d1 in the road network example. Although a candidate path (d1, b, c, d2) is fully generated to find a border point, another candidate path (d1, b, c, e, d3) is not fully generated at all. Indeed, the latter is pruned immediately after node N5 is visited, according to rule 5. The path (d1, a) terminated with a vertex of degree 1 is not dominated and belongs entirely to nvbd1, according to rule 4.

Fig. 11. BPS-tree. (a) Node structure. (b) BPS-tree for data point d1 with a divisible path (d1, b, c, d2).
5.3. Border-point binary trees
The border-point binary tree (BPB-tree) is a binary-tree version of the BPS-tree and presents two advantages. First, every node has a fixed size, so the nchd attribute of a BPS-tree node, which records the number of child links, can be eliminated from the BPB-tree node, as shown in Fig. 13(a). Second, the binary tree occupies less storage than its corresponding k-ary BPS-tree (k > 2) [17]. The BPS-tree is transformed into the BPB-tree as follows. The root node's representing object is unchanged. Each node's first child in the BPS-tree becomes that node's left child in the BPB-tree, and each subsequent child becomes the right child of the previously created sibling: a node's second child in the BPS-tree becomes the right child of its first child, its third child becomes the right child of its second child, and so on. The BPS-tree for d2 in Fig. 12 is transformed into the BPB-tree in Fig. 13(b) for the road network example. The node structure of the BPB-tree is shown in Fig. 13(a). The attribute todis records the length of the path from the node's representing object to the root node's representing object. A candidate path in the BPB-tree is found by following the parent link, plink, back to the root node, e.g., (d2, c, e, d3), (d2, c, b, d1), (d2, d, e, d3).
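The BPS-tree to BPB-tree transformation is the classic first-child/next-sibling encoding. The following is a minimal Python sketch of it, not the paper's implementation: the class layouts and the helper names to_bpb and path_to_root are assumptions made for illustration, and only the attribute names todis and plink come from the text.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class BPSNode:
    """k-ary BPS-tree node (illustrative layout, not the paper's exact structure)."""
    obj: str                                  # represented vertex or data point
    todis: float = 0.0                        # path length back to the root object
    children: List["BPSNode"] = field(default_factory=list)


@dataclass
class BPBNode:
    """Fixed-size binary node: left = first child, right = next sibling."""
    obj: str
    todis: float = 0.0
    plink: Optional["BPBNode"] = None         # logical parent in the BPS-tree
    left: Optional["BPBNode"] = None
    right: Optional["BPBNode"] = None


def to_bpb(node: BPSNode, plink: Optional[BPBNode] = None) -> BPBNode:
    """Convert a BPS-tree into its BPB-tree counterpart."""
    out = BPBNode(node.obj, node.todis, plink)
    prev = None
    for child in node.children:
        converted = to_bpb(child, out)
        if prev is None:
            out.left = converted              # first child becomes the left child
        else:
            prev.right = converted            # later children chain through right links
        prev = converted
    return out


def path_to_root(node: Optional[BPBNode]) -> List[str]:
    """Recover a candidate path by following plink back to the root."""
    path = []
    while node is not None:
        path.append(node.obj)
        node = node.plink
    return path[::-1]
```

Because siblings share the same plink, a candidate path is recovered without ever following a right link, which is why the parent link is kept in addition to the two child links.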
5.4. NVD Lists
The NVD constructed for the road network example, as indicated by Fig. 3, is stored in the twelve NVD lists in Fig. 2, which are implemented as objects of the nvdlist class. The head node in Fig. 14(a) records an edge's initial and terminal vertices, the edge length, the number of edge fragments, and two pointers. Each linked node shown in Fig. 14(b) corresponds to one fragment and records its nearest neighbor and its range within the edge. The NVD list of edge e2 is shown in Fig. 14(c). Two adjacent edge fragments that have the same nearest neighbor, as in Fig. 14(d), should be merged, as shown in Fig. 14(e).
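As a rough illustration of the NVD list and of the refinement step that merges adjacent fragments sharing a nearest neighbor, consider the sketch below. The class names Fragment and NVDList, the field names, and the method refine are hypothetical; the two pointers of the head node are replaced by a Python list for brevity.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Fragment:
    """Linked node: one edge fragment and its nearest neighbor."""
    start: float          # offset from the edge's initial vertex
    end: float
    nn: str


@dataclass
class NVDList:
    """Head node for one edge; the fragment count is len(fragments)."""
    initial: str
    terminal: str
    length: float
    fragments: List[Fragment] = field(default_factory=list)

    def refine(self) -> None:
        """Merge adjacent fragments that share a nearest neighbor."""
        merged: List[Fragment] = []
        for frag in sorted(self.fragments, key=lambda f: f.start):
            touches = merged and abs(merged[-1].end - frag.start) < 1e-9
            if touches and merged[-1].nn == frag.nn:
                merged[-1].end = frag.end     # extend the previous fragment
            else:
                merged.append(Fragment(frag.start, frag.end, frag.nn))
        self.fragments = merged
```

Keeping the fragments sorted by start offset also makes a later lookup by position a simple scan or binary search.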
5.5. Priority queues
The priority queue is required to implement a shortest-path-first policy, which is suggested by rule 1. The queue maintains every path's current end node along with its path length so that the path with the minimum length can be selected next for a visit and possible extension. A path terminated with a vertex of degree 1, however, should be inserted at the tail of the queue because all candidate paths sharing a common prefix subpath with it, regardless of whether they are shorter or longer, must be visited before this path can be claimed as not being dominated.
6. Algorithms for generating a network voronoi diagram
6.1. Creating adjacent lists and initializing NVD lists
Fig. 12. BPS-tree for data point d2 with three divisible paths: (d2, c, b, d1), (d2, c, e, d3), (d2, d, e, d3).
Fig. 13. BPB-tree for data point d2 with three divisible paths: (d2, c, b, d1), (d2, c, e, d3), (d2, d, e, d3).
The information about how road segments are connected in a road network and where data points are located on edges is recorded in a file called adjb1.dat. Every object's adjacent vertices or data points in the road network are arranged in successive records so that they can be processed efficiently. One algorithm reads the file and produces adjacent lists in memory, i.e., the objects of the class adjlist. The nested attribute listindex is essential for generating paths, and its value is found by matching an adjlist object's nested attribute aobject value to another adjlist object's vobject attribute value. All of the edges are derived from the extent of the adjlist class. If a vertex's adjacent list contains an adjacent vertex, an edge joining the two vertices is found. If a vertex's adjacent list contains an adjacent data point, or if a data point's adjacent list contains an adjacent vertex, then an edge with one of its adjacent vertices is known. In this case, the other adjacent vertex of the edge can be found by checking another adjlist object. To prevent an edge from being counted twice, an edge is counted only in the case where the vobject attribute value is greater than the adjacent node's aobject attribute value.
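The rule for counting each edge once in Section 6.1 can be sketched as follows. This is a simplified illustration under stated assumptions: the adjacency lists are given as a plain dictionary, object identifiers are comparable strings, and the step that reassembles an edge split by a data point is omitted. The function name derive_edges is hypothetical; vobject and aobject mirror the attribute names in the text.

```python
from typing import Dict, List, Set, Tuple


def derive_edges(adj: Dict[str, List[str]]) -> Set[Tuple[str, str]]:
    """Collect each undirected link once from symmetric adjacency lists."""
    edges: Set[Tuple[str, str]] = set()
    for vobject, neighbors in adj.items():
        for aobject in neighbors:
            # Each link appears in both objects' lists; keep only the
            # orientation where the list owner has the greater identifier.
            if vobject > aobject:
                edges.add((aobject, vobject))
    return edges
```

Any total order on identifiers would serve equally well; the comparison only has to pick exactly one of the two orientations.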
6.2. Rules applied to BPB-trees
For reasons of concurrency and efficiency, the rules proposed in the previous section are now transformed for application to BPB-trees:
A. Generate a BPB-tree for each data point in the road network according to the shortest-path-first strategy (according to rule 1).
B. If a candidate path that joins two distinct data points has been fully generated and a corresponding queue node has been inserted into and is now removed from the priority queue for a visit, then it is a divisible path (according to rule 2).
C. If a candidate circuit has been fully generated and a corresponding queue node has been inserted into and is removed from the priority queue for a visit, then it entirely belongs to a single NVB (according to rule 3).
D. If a path terminated with a vertex of degree 1 is generated, then a corresponding queue node is added at the tail of the priority queue by changing the path length to an extremely large value. If this path is not eventually pruned by rule E or F, then it entirely belongs to one network voronoi block (according to rule 4).
E. If a divisible path p is determined by rule B, then the subtree with root node immediately below p's midpoint is pruned by removing corresponding queue nodes from the priority queue, except for other paths with the same length as p (according to rule 5).
F. If a candidate circuit p is determined as entirely belonging to one network voronoi block by rule C, then the subtree with root node immediately below p's midpoint is pruned by removing corresponding queue nodes, except for other paths with the same length as p (according to rule 5).
G. If a path satisfies one of the following conditions, it must be pruned: (a) an edge is repeated in the path; (b) an edge is immediately followed by its reverse edge (according to rule 6).
Rules B and C are valid. If a candidate path has been fully generated and the corresponding queue node is removed from the priority queue for a visit, then no other path maintained in the queue can be shorter than it and dominate it. Rule D is also valid. If a path terminated with a vertex of degree 1 is generated and it triggers a corresponding queue node added at the tail of the priority queue, then all of the candidate paths maintained in the queue will be visited before this path can be determined as entirely belonging to a single network voronoi block.
Fig. 14. The NVD list. (a) Head node. (b) Linked node. (c) Edge e2's NVD list with two NNs d1, d2 for two fragments. (d) Edge e3's NVD list with one NN for two fragments. (e) Refinement of (d).
6.3. Generating BPB-trees
The algorithm BPB-tree-construction shown in Table 1 generates a BPB-tree for each data point according to rules A–G. Only data points' adjacent lists are visited (lines 1–3). The paths of the tree are generated iteratively based on the shortest-path-first policy (lines 5–45). For the root node r of the tree, which represents an object o, all adjacent objects of o in the adjacent list are visited to determine whether a child node must be added to the tree for path extension (lines 7–25). If adding such a child node to the path would induce pattern (a) or (b) of rule G, then the addition is not allowed (lines 8–17). Otherwise, if the visited adjacent object is the first one whose child node does not cause pattern (a) or (b) of rule G, the child node is added to the path as the left child of the node being extended (lines 18–21). The second and subsequent adjacent objects that do not cause pattern (a) or (b) of rule G are added to the path as the right child of the most recently created node (lines 22–23). Note that the sibling relation in the original BPS-tree corresponds to the parent-right child relation in the BPB-tree. The tree nodes that correspond to these adjacent objects have the same plink attribute value in the BPB-tree, i.e., they have the same parent in the original BPS-tree (line 24). Next, all corresponding queue nodes are added to the priority queue; each records how far the current object (represented by the end node of the path) is from the root object (represented by the root node) along that path. After all of the adjacent objects of the root object have been visited by checking the adjacent list and their corresponding queue nodes have been stored in the priority queue (lines 6–25), the queue node with the minimum path length is removed from the queue according to rule A (lines 27–28).
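The admission test for a new child node (patterns (a) and (b) of rule G) can be sketched as below. The paper's algorithm walks the ancestors through plink; this sketch assumes instead that the path from the root is already available as a list, and the function name violates_rule_g is hypothetical. Edges are treated as directed, so (u, v) and (v, u) count as distinct.

```python
from typing import List


def violates_rule_g(path: List[str], new_obj: str) -> bool:
    """True if extending `path` with `new_obj` repeats a directed edge (a)
    or immediately reverses the last edge (b)."""
    if not path:
        return False
    if len(path) >= 2 and path[-2] == new_obj:
        return True                                   # pattern (b): u->v then v->u
    new_edge = (path[-1], new_obj)
    return any((path[i], path[i + 1]) == new_edge     # pattern (a): directed edge reused
               for i in range(len(path) - 1))
```

The scan over earlier edges costs time proportional to the path length, which matches the ancestor check counted in part B of the complexity analysis.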
There are five cases related to the queue node's corresponding node in the BPB-tree.
First, the corresponding node in the tree is of the vertex type (recorded in the type attribute of the tree node) with a degree greater than 1 (line 29). In this case, the path indicated by this queue node is extended (lines 29–30, line 45) by visiting all of this vertex's adjacent objects (lines 7–25).
Second, the corresponding tree node is of the data-point type, and the represented data point is different from the object represented by the root node. This means that a candidate path joining two distinct data points has been fully generated (line 31). According to rule B, this path is a divisible path, and the algorithm FindBorder is invoked to compute the nearest neighbors of all of the points of this path (line 32). Moreover, the subtree whose root node is of the vertex type and lies immediately below the midpoint of that path is pruned, except for the paths whose length is equal to that of the path, according to rule E (line 33).
Third, the corresponding tree node is of the data-point type, and the represented data point is the same as that represented by the root node. This means that a candidate circuit has been fully generated (line 34). According to rule C, the data point is the nearest neighbor of all of the points of this circuit (lines 35–36). Moreover, the subtree whose root node is of the vertex type and lies immediately below the midpoint of the circuit is pruned, except for the paths whose length is equal to that of the circuit, according to rule F (lines 37–38).
Fourth, the corresponding node in the tree is of the vertex type with a degree of 1, and the queue node contains an ordinary path length (not equal to 2^30) (line 39). This means that it is a path terminated with a vertex of degree 1. Thus, it must be reinserted at the tail of the queue with its path length replaced by the very large value 2^30 (line 40) so that all candidate paths are checked for dominance according to rule D.
Fifth, the corresponding node in the tree is of the vertex type with a degree of 1, and the queue node contains a path length equal to 2^30 (line 41). When this occurs, all of the candidate paths in the queue have already been checked, and no candidate path will be found to dominate this path. According to rule D, the data point represented by the root node is the nearest neighbor for all of the points of this path (lines 42–43).
Finally, when the queue is empty, the tree is complete (line 46).
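The deferral of degree-1 paths in the fourth and fifth cases can be illustrated with a small priority queue. This sketch uses a binary heap from Python's heapq module, which is an assumption (the paper only states that the queue is implemented as a tree); the class name PathQueue and its methods are hypothetical, and only the sentinel value 2^30 is taken from the text.

```python
import heapq
from itertools import count
from typing import List, Tuple

TAIL = 2 ** 30   # sentinel path length from the text: visit after everything else


class PathQueue:
    """Min-heap of (path length, insertion order, end node)."""

    def __init__(self) -> None:
        self._heap: List[Tuple[float, int, str]] = []
        self._order = count()     # tie-breaker so equal lengths pop in insertion order

    def push(self, length: float, node: str) -> None:
        heapq.heappush(self._heap, (length, next(self._order), node))

    def defer(self, node: str) -> None:
        """Re-queue a degree-1 path so that it is visited last (rule D)."""
        self.push(TAIL, node)

    def pop(self) -> Tuple[float, str]:
        length, _, node = heapq.heappop(self._heap)
        return length, node

    def __bool__(self) -> bool:
        return bool(self._heap)
```

When a popped node is a degree-1 vertex with an ordinary length, the caller invokes defer; when the same node is popped again with length TAIL, every competing candidate path has already been examined.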
Table 1. Algorithm BPB-tree-construction.
Table 2. Algorithm FindBorder.
6.4. Finding border points
Table 3. Algorithm DDBorder.
Table 4. Algorithm BorderInLast.
Algorithm FindBorder, shown in Table 2, locates a path's border point and stores the nearest neighbors for all of the edges in the corresponding nvdlist objects. Beginning with the leaf node of the path, it travels the path through the border point until it reaches the root node. An edge of the path may have one nearest neighbor for all of its points or two nearest neighbors for two distinct fragments. There are three cases involving the path. First, the path corresponds to part of an edge in the road network, i.e., two data points lie on one edge, e.g., Fig. 6(a); Algorithm DDBorder in Table 3 is invoked in line 5. Second, the path has more than one edge, and the border point is in the last edge fragment; Algorithm BorderInLast in Table 4 is invoked in line 7. Third, the border point is not in the last edge fragment; Algorithm BorderInUp in Table 5 is invoked in line 9. Some of the variables initialized in lines 2–3 are updated while traveling. The path's starting and ending data points a, b remain constant throughout the process.
Algorithm DDBorder, shown in Table 3, computes the distance between the edge's initial vertex and the border point in line 2 or 6, depending on the dir attribute value of vertex b. Two new nodes are linked to the nvdlist object that describes this edge through the head and next pointers. The two edge fragments' ranges and associated nearest neighbors are stored in the two nodes, respectively (lines 3, 7).
Algorithm BorderInLast, shown in Table 4, computes the distance between the border point, which is in the last edge fragment, and vertex v, which corresponds to the tree node immediately above the border point of the path (line 1). The points between b and the border point belong to nvb_b, whereas the points between the border point and v belong to nvb_a. Two linked nodes are added to the same nvdlist object in line 4 or 7, depending on the dir attribute value of vertex b (line 3). The other edges and edge fragments of the path belong to nvb_a, and new nodes are linked to their respective nvdlist objects (lines 8–16).
Algorithm BorderInUp, shown in Table 5, locates the border point by checking every visited node's todis attribute value while traveling the path toward the root node. The points between b and the border point belong to nvb_b (lines 1–7). If the edge that contains the border point does not contain the starting data point a, then it is divided into two parts, one for nvb_a and the other for nvb_b (lines 8–14). The remaining edges and edge fragments along the path from the border point to the starting data point, which the root node represents, belong to nvb_a (lines 15–21). Otherwise, if the edge that contains the border point also contains data point a (line 22), then the points between data point a and the border point belong to nvb_a, and all other points of the path in this edge belong to nvb_b (lines 23–27).
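The core idea behind the border-point algorithms, splitting a divisible path at its midpoint and labeling each edge or fragment with the nearer data point, can be sketched in a few lines. This is a simplified illustration, not a reconstruction of Tables 2 to 5: it walks from the starting data point a toward b rather than from the leaf toward the root, it ignores the dir attribute, and the names split_at_border, Segment, and Label are hypothetical. Offsets are measured from the first endpoint of each segment.

```python
from typing import List, Tuple

Segment = Tuple[str, str, float]               # (from, to, length) along the path a -> b
Label = Tuple[str, str, float, float, str]     # (from, to, start, end, nearest neighbor)


def split_at_border(path: List[Segment], a: str, b: str) -> List[Label]:
    """Label every segment of a divisible path with its nearest neighbor."""
    half = sum(length for _, _, length in path) / 2.0   # border point = midpoint
    labels: List[Label] = []
    walked = 0.0
    for u, v, length in path:
        if walked + length <= half:            # entirely on a's side
            labels.append((u, v, 0.0, length, a))
        elif walked >= half:                   # entirely on b's side
            labels.append((u, v, 0.0, length, b))
        else:                                  # border point lies inside this segment
            cut = half - walked
            labels.append((u, v, 0.0, cut, a))
            labels.append((u, v, cut, length, b))
        walked += length
    return labels
```

The three branches loosely correspond to the fragments assigned to nvb_a, the fragments assigned to nvb_b, and the single edge that is divided between the two blocks.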
6.5. NVD list-based evaluation method
The proposed NVD list-based evaluation method (NVDL) uses pre-computed nvdlist objects to find nearest neighbors. The query point is assumed to be equipped with a global positioning system that can indicate where the query point is positioned. By looking up the corresponding nvdlist object, the nearest neighbor can quickly be found.
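A single-step NVDL lookup might look like the sketch below. It assumes that the query position is already expressed as an edge identifier plus an offset from the edge's initial vertex, and that each edge's fragments are kept sorted by start offset in a dictionary; the names NVDTable and nearest_neighbor and the sample data are hypothetical.

```python
from bisect import bisect_right
from typing import Dict, List, Tuple

# edge id -> fragments sorted by start offset: (start, end, nearest neighbor)
NVDTable = Dict[str, List[Tuple[float, float, str]]]


def nearest_neighbor(nvd: NVDTable, edge: str, offset: float) -> str:
    """Return the precomputed NN for a query at `offset` along `edge`."""
    fragments = nvd[edge]                          # one hash lookup
    starts = [start for start, _, _ in fragments]
    i = bisect_right(starts, offset) - 1           # last fragment starting at or before offset
    start, end, nn = fragments[max(i, 0)]
    if not (start <= offset <= end):
        raise ValueError("offset lies outside the edge")
    return nn
```

No network expansion takes place at query time: the cost is one table lookup plus a search over the few fragments of a single edge, which is consistent with the nearly constant query time reported for NVDL.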
Table 5. Algorithm BorderInUp.
7. Performance evaluation
7.1. Run-time complexity of BPB-tree generation
The algorithm that creates all of the vertices' and data points' adjacent lists in memory has a complexity of O(n + m), where n and m denote the number of vertices and data points, respectively, because all of these lists' adjacent objects are clustered in the input data file. For each adjacent object x of each adjacent list, its own adjacent list must be found, which requires a nested loop; the complexity is O((n + m)^2). Initializing the entire NVDlist also has a complexity of O((n + m)^2) because it must find all edges from adjacent lists whose head nodes and/or adjacent nodes are of the vertex type. The algorithm BPB-tree-construction analyzed below represents the main task involved in constructing a NVD, namely generating candidate paths/circuits and determining border points. A BPB-tree is built in lines 5–46 of the BPB-tree-construction algorithm, which consists of five parts: (A) generating nodes of the BPB-tree (lines 7–25); (B) checking whether the patterns specified by rule G exist when a node is about to be added to a path (as a child node) for extension (lines 8–17); (C) maintaining a priority queue for selecting the path whose length is currently minimal in the queue for extension (lines 24, 27–28, 40); (D) determining candidate paths/circuits and their border points/midpoints and handling nodes that represent a vertex of degree 1 (lines 31–32, 34–37, 39–43); and (E) pruning the subtree whose root is immediately below some divisible path's border point or candidate circuit's midpoint (lines 33, 37–38). Note that the BPB-tree and the BPS-tree are isomorphic, i.e., there is a one-to-one onto function between them, and thus the complexity of creating them is the same. However, the latter is easier to understand and express. When describing the path navigation from a leaf node back toward the root node via a logical parent link in the BPB-tree, the number of nodes visited is essentially the height of the corresponding BPS-tree. Therefore, the BPS-tree instead of the BPB-tree is used in the analysis below.
Lemma 6. Given a road network with k edges and m data points, no path in the BPS-tree can have a length greater than 2(k + m), and the average path length is (k + m)/m.
Proof. Recall that if data point d_k is on edge (v_i, v_j) in the road network, then it is treated as two edges, (v_i, d_k) and (d_k, v_j), in creating a BPS-tree, and the edges (v_1, v_2) and (v_2, v_1) are considered distinct in a directed path. Suppose that there is a path p = (d_1, v_2, v_3, …, v_{2k+1}, d_{2k+2}) with length 2(k + m) + 1 in a BPS-tree. We assume that only k edges exist in the road network, and thus, all of the paths in a BPS-tree can have a maximum of 2(k + m) distinct edges. This means that some edge (v_i, v_{i+1}) in p exists such that (v_{i+j}, v_{i+j+1}) = (v_i, v_{i+1}), violating rule G(a). Thus, no path in the BPS-tree can have a length greater than 2(k + m). If a path contains 2k + 2m edges, then on average, one data point will appear for every (k + m)/m edges in this path. Recall that a candidate path/circuit contains only two data points: one is at level 1 (the root node) and the other is the leaf node. Thus, the average path length is (k + m)/m. The lemma is proved. □
In a real road network, a road segment usually intersects with 1, 2, or 3 other road segments, and thus, the vertex is of degree 2, 3, or 4, respectively. Assuming a vertex's degree is 3 on average, the outgoing degree of each branch node in the BPS-tree will be 2. In other words, each branch node has two child nodes.
Lemma 7. Assume there are k edges and m data points, each vertex has a degree of 3 in the road network, and all edges' lengths in the BPS-tree are equal. Then, the average number of generated nodes (including those nodes that have been pruned) in the BPS-tree can be expressed as a function of r: 3 + (1 − 1/r) × 4 + (1 − 1/r)^2 × 8 + (1 − 1/r)^4 × 16 + (1 − 1/r)^6 × 32 + (1 − 1/r)^10 × 64 + (1 − 1/r)^14 × 128 + (1 − 1/r)^22 × 256 + (1 − 1/r)^30 × 512 + (1 − 1/r)^46 × 1024 + (1 − 1/r)^62 × 2048 + (1 − 1/r)^94 × 4096 + (1 − 1/r)^126 × 8192 + … + (1 − 1/r)^x × 2^r, where r = (k + m)/m and x is determined by its ancestor and the border point of a candidate path.
Proof. When extending a path by adding an edge (a, b), there is a probability 1/r (= m/(k + m)) that b is a data point, and a probability 1 − 1/r (= k/(k + m)) that b is a vertex. At level 1, there is only the root node. At level 2, two nodes, n21 and n22 (the subscripts of nodes increase from left to right), exist because the data point represented by the root node is assumed to be in the interior of an edge in the road network. At level 3, n31, n32 (n33, n34) exist if their parent, n21 (n22), is a vertex. Thus, we have (1 − 1/r) × 4 nodes at level 3, on average, and the average number of nodes at level 4 is (1 − 1/r)^2 × 8. At level 5, n51, n52, n53, n54 exist only if n41 and n42 are both vertices. The reason is that if n41 (n42) is a data point, the newly generated candidate path will have its border point lie on edge (n21, n31), and the subtree with root node n31 will be pruned. Thus, n42 (n41), even if it is a vertex, will be pruned and unable to generate n53, n54 (n51, n52). Note that only vertices can generate child nodes. The reasoning is the same for the other nodes at level 5. Thus, the average number of nodes at level 5 is (1 − 1/r)^2 × (1 − 1/r)^2 × 4 × 4. Following the same reasoning, the average number of nodes at level 6 is (1 − 1/r)^4 × (1 − 1/r)^2 × 4 × 8. At level 7, the nodes n71, n72, n73, n74, n75, n76, n77, n78 will exist only if n61, n62, n63, n64 are all vertices, because if any of n61, n62, n63, n64 is a data point, it will generate a candidate path. Moreover, the border point will lie on edge (n31, n41), and the subtree with root node n41 will be pruned, including n61, n62, n63, n64. Thus, there is no chance to generate n71, n72, n73, n74, n75, n76, n77, n78 from n61, n62, n63, n64. The reasoning is the same for the other nodes at level 7. Accordingly, the average number of nodes at level 7 is (1 − 1/r)^6 × (1 − 1/r)^4 × 8 × 8.
Note that any data point that occurs at level 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20 will first generate a candidate path, and then a subtree with its root node at level 4, 4, 5, 5, 6, 6, 7, 7, 8, 8, 9, 9, 10, 10, 11, 11, respectively, will be pruned. Therefore, the average numbers of nodes at levels 8, 9, 10, 11, 12, 13, 14, …, r + 1 are (1 − 1/r)^14 × 128, (1 − 1/r)^22 × 256, (1 − 1/r)^30 × 512, (1 − 1/r)^46 × 1024, (1 − 1/r)^62 × 2048, (1 − 1/r)^94 × 4096, (1 − 1/r)^126 × 8192, …, (1 − 1/r)^x × 2^r, respectively. The number of nodes in a BPS-tree is the sum of the nodes at all of these levels. The lemma is proved. □
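The series in Lemma 7 is easy to evaluate numerically. The sketch below assumes that the exponent of (1 − 1/r) grows by 1, 1, 2, 2, 4, 4, 8, 8, … from level 3 onward, a pattern inferred from the terms listed in the lemma rather than stated there; the function name expected_nodes_per_level is hypothetical.

```python
from typing import Dict


def expected_nodes_per_level(r: int) -> Dict[int, float]:
    """Average node count at levels 1..r+1 of a BPS-tree, following the series in Lemma 7."""
    p = 1.0 - 1.0 / r                  # probability that a path extension ends at a vertex
    counts = {1: 1.0, 2: 2.0}          # root node plus its two children (the leading 3)
    exponent = 0
    for level in range(3, r + 2):
        exponent += 2 ** ((level - 3) // 2)       # increments 1, 1, 2, 2, 4, 4, ...
        counts[level] = p ** exponent * 2 ** (level - 1)
    return counts
```

For r = 10, this sketch gives about 29.3 nodes at level 8, the largest of any level, and a total of about 147.1 over levels 1 to 11, in line with the figures of 29 and 147 quoted in the text for k/m = 9.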
We observe three cases: k/m = 9 (i.e., r = 10), 19 (i.e., r = 20), and 29 (i.e., r = 30). This shows that the number of nodes per level first increases and then decreases dramatically to nearly zero, even far before level r + 1 (i.e., the average path length is r). The reason is that pruning a larger subtree becomes possible at higher levels, so a large number of nodes are pruned. The maximum numbers of nodes, 29, 109, and 254, occur at levels 8, 10, and 12 for k/m = 9, 19, and 29, respectively. After that, the numbers of nodes per level decrease: 25, 96, and 172 at levels 9, 11, and 13, and 8, 0, and 0 at levels 11, 16, and 17, respectively. The average numbers of nodes in a BPS-tree for k/m = 9, 19, and 29 are 147, 568, and 1282, respectively, which are not greater than twice (k/m)^2. Based on similar reasoning, the number of candidate paths/circuits in a BPS-tree can be derived by the following formula: 1/r × 2 + (1 − 1/r) × 1/r × 4 + … + (1 − 1/r)^10 × 1/r × 64 + (1 − 1/r)^14 × 1/r × 128 + … For k/m = 9, 19, and 29, the amounts are 15, 28, and 43, respectively, a multiple of k/m. Additionally, the total number of nodes pruned in all subtrees can be derived as follows: (1 − 1/r)^2 × 1/r × 8/2 × 1 + (1 − 1/r)^4 × 1/r × 16/2 × 1 + … + (1 − 1/r)^14 × 1/r × 128/8 × 11 + (1 − 1/r)^22 × 1/r × 256/8 × 11 + (1 − 1/r)^30 × 1/r × 512/16 × 26 + (1 − 1/r)^46 × 1/r × 1024/16 × 26 + … For k/m = 9, 19, and 29, the numbers of pruned nodes are 18, 42, and 69, respectively, a multiple of k/m. It is easy to see that the maximal queue size is equal to the maximal number of leaves (the nodes at the highest level that have not been pruned at that moment during generation of the BPS-tree). For k/m = 9, 19, and 29, the amount is 29, 109, and 254, respectively, a multiple of k/m.
Theorem 4. Given a road network with n vertices, k edges, and m data points in the road network, the complexity of creating a BPS-tree is O((k/m)^2 log(k/m)^2).
Proof. Let s and f be the number of generated nodes in a BPS-tree and the size of the queue, respectively. In part A of the BPB-tree-construction algorithm, all of a vertex's adjacent objects in the road network are visited in a while loop (lines 7–25) for path extension. The number of child nodes generated for a node x in the tree may be lower than the number of adjacent objects of x in the road network because of the pruning of illegal patterns. The next vertex to visit is removed from the priority queue (lines 27–30, 45–46, 5–6), and the process is repeated until the queue is empty (line 27). The number of executions of the nested loop is proportional to the sum of the outgoing degrees of branch nodes, which is the number of generated nodes in the BPS-tree, i.e., the complexity is O((k/m)^2). In part B, all ancestors are checked for illegal patterns when a node is about to be added to a path as a child node for extension. The number of ancestors can vary from one to the height of the BPS-tree. Thus, the number of executions of the while loop (lines 11–16) is proportional to s multiplied by the height of the tree, i.e., O((k/m)^2 × log(k/m)^2). In part C, every generation of a node of the BPS-tree triggers the insertion of a corresponding node into the queue to be visited later. The priority queue is implemented as a tree in which the complexity of an inserting operation is O(log f) and that of a removing operation is O(1). Thus, the complexity for part C is s × O(log f) = O((k/m)^2 × log(k/m)). In part D, a divisible path's border point is found by navigating the path. It is also required to find the midpoints of candidate circuits that are not dominated. The complexity involved in part D relates to the number of candidate paths/circuits multiplied by the height of the BPS-tree, i.e., O((k/m) × log(k/m)^2). In part E, pruning is triggered by each generated candidate path/circuit. The complexity is proportional to the number of pruned nodes multiplied by the height of the queue, i.e., O(k/m) × O(log f) = O((k/m) log(k/m)). Combining these five parts, it is seen that the average complexity of constructing a BPB-tree is O((k/m)^2 log(k/m)^2). The theorem is proved. □
Fig. 15. CPU time (without counting output time) of NVDL and INE for different sizes and data distributions: {d1, d2, d3, d4, d5, d6}, {d1, d4}, {d1, d2}, respectively, in the road network example.
From Theorem 4, it is clear that the complexity of constructing a NVD is O((k/m)^2 × log(k/m)^2) × m = O(k^2/m × log(k/m)^2).
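Written out step by step, and assuming one BPB-tree is built per data point (m trees in total), the multiplication by m simplifies as follows:

```latex
\begin{aligned}
m \cdot O\!\left(\left(\tfrac{k}{m}\right)^{2}\log\left(\tfrac{k}{m}\right)^{2}\right)
  &= O\!\left(m \cdot \tfrac{k^{2}}{m^{2}} \cdot \log\left(\tfrac{k}{m}\right)^{2}\right) \\
  &= O\!\left(\tfrac{k^{2}}{m}\,\log\left(\tfrac{k}{m}\right)^{2}\right).
\end{aligned}
```

For a fixed number of edges k, the bound therefore decreases as the number of data points m grows, because each tree is pruned earlier.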
7.2. Experimental evaluation using the road network example
The query evaluation method NVDL is measured and compared with INE on the road network example described by Fig. 1 in this section. We use three datasets for the road network example: the dense dataset D1 = {d1, d2, d3, d4, d5, d6}, the sparse dataset with data points not clustered D2 = {d1, d4}, and the sparse dataset with data points clustered D3 = {d1, d2}. All of the algorithms are implemented with Dev-C++ 4.9.9.2 and executed on a computer with an Intel(R) Pentium(R) M processor (1.86 GHz and 1.5 GB of RAM) running MS Windows 7. For datasets D1, D2, and D3, three respective NVDs are pre-computed by performing the algorithms described in Tables 1 to 5 using the respective files adjb1, adjb2, and adjb3, which describe how each vertex/data point is connected to others in the road network example. Each record of these input files contains a vertex/data point, one of its adjacent objects, and other information available for computing adjacent lists, as described in Fig. 10. These voronoi diagrams are stored in files nvd1, nvd2, and nvd3, respectively. When the road network example with dataset D1 is to be tested, the NVDL reads the file adjb1 and creates a class called adjlist that contains a vertex/data point and its adjacent nodes. Then, the NVD is constructed and stored in the file nvd1. After that, the file nvd1 is read, and the NVDlist class is created; it contains the nvdlist objects described in Fig. 14 for NN evaluation. As many as 50,000 query points are randomly selected by generating their respective positioned edge = rand() % 12 and their distance from the edge's initial vertex = rand()/32767.0 * edge_length. This ensures that almost all of the edges are chosen for a test and that the query point's location on the edge is arbitrary. After that, the query is issued 50,000 times for these 50,000 generated query points, and the total CPU time (without including the time for output) is obtained.
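The query-point generation described above can be mirrored in a few lines. The sketch below is in Python for brevity, although the experiments were implemented with Dev-C++; it substitutes Python's seeded random generator for rand(), and the function name generate_queries and the seed parameter are hypothetical.

```python
import random
from typing import List, Tuple


def generate_queries(edge_lengths: List[float], n: int, seed: int = 0) -> List[Tuple[int, float]]:
    """Draw n query points as (edge index, offset from the edge's initial vertex)."""
    rng = random.Random(seed)
    queries = []
    for _ in range(n):
        edge = rng.randrange(len(edge_lengths))        # plays the role of rand() % 12
        offset = rng.random() * edge_lengths[edge]     # plays the role of rand()/32767.0 * edge_length
        queries.append((edge, offset))
    return queries
```

Fixing the seed makes the same 50,000 query points reproducible across runs, which is convenient when comparing two methods on identical input.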
The experiment is executed 10 times, and the average CPU time per query is obtained. Fig. 15 shows this CPU time per query for NVDL. The same experiment is also performed for datasets D2 and D3. For INE, the road network is implemented by reading a file that describes how each vertex is connected to other vertices in the road network example. Then, the edgelist class and adjlist class are created in memory to record each edge's information and each vertex's adjacent vertices, respectively. Like NVDL, the same procedure for generating 50,000 query points is performed. The INE process navigates the path from the edge at which the query point is positioned and then chooses the best adjacent edges to travel, and so on, until a data point is found. The experiment is performed for three data sets, D1, D2 and D3, each of which is performed 10 times, and the average CPU time per query point is computed. It is seen from Fig. 15 that NVDL achieves from 18 to 63 times better CPU time for the dataset from dense to sparse data distributions. The reason for this finding is that NVDL uses the pre-computing approach without a filtering step, whereas INE must compute from edge to edge until a data point is found. The experiments also show that INE has a significant performance decline t6:1 t6:2
Table 6 Candidate paths/circuits and paths terminated with a vertex of degree 1 that are not dominated in the real network with six museums/exhibition halls.
t6:3 t6:4
Candidate path
t6:5
Candidate circuit
t6:6
Path terminated with a vertex of degree 1
Amount Average number of edges Amount Average number of edges Amount Average number of edges
m24
m26
m28
m31
m34
m35
4 6.5 4 7 2 6
1 5 0 0 2 2.5
2 16 8 9.25 3 4
3 6.3 4 7 0 0
4 10.75 32 16.75 4 8.25
2 3 0 0 0 0
Please cite this article as: M.-T. Wang, Nearest neighbor query processing using the network voronoi diagram, Data Knowl. Eng. (2016), http://dx.doi.org/10.1016/j.datak.2016.02.003
786 787 788 789 790 791 792 793 794 795 796 797 798 799 800 801 802 803 804 805 806 807 808 809 810 811 812
M.-T. Wang / Data & Knowledge Engineering xxx (2016) xxx–xxx t7:1 t7:2
23
Table 7. Candidate paths/circuits and paths terminated with a vertex of degree 1 that are not dominated in the real network with two museums/exhibition halls.

| Data point | Candidate paths: amount | Candidate paths: average number of edges | Candidate circuits: amount | Candidate circuits: average number of edges | Paths terminated with a vertex of degree 1: amount | Paths terminated with a vertex of degree 1: average number of edges |
|---|---|---|---|---|---|---|
| m31 | 2 | 7 | 4 | 7 | 3 | 4.7 |
| m35 | 2 | 7 | 48 | 18.5 | 9 | 10 |
for a sparse dataset. In this case, there is a chance that the nearest data point is far from the query point, particularly for a sparse dataset with clustered data points.
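The edge-by-edge behavior described above can be illustrated with a small sketch. It is not the INE implementation used in the experiments; it is a simplified Dijkstra-style expansion written in the same spirit, under the assumption (taken from the paper's model) that data points are vertices of the network. The names `undirected` and `ine_style_nn`, the dictionary-based adjacency structure, and the toy network are all hypothetical.

```python
import heapq


def undirected(edges):
    """Build a symmetric adjacency map {vertex: {neighbor: length}}."""
    adj = {}
    for u, v, length in edges:
        adj.setdefault(u, {})[v] = length
        adj.setdefault(v, {})[u] = length
    return adj


def ine_style_nn(adj, data_points, edge, offset):
    """Expand outward from a query point until a data point is settled.

    The query point lies on `edge` = (u, v), `offset` away from u.
    Returns (nearest data point, network distance, vertices settled).
    """
    u, v = edge
    length = adj[u][v]
    # The query point reaches u after `offset` and v after `length - offset`.
    heap = [(offset, u), (length - offset, v)]
    heapq.heapify(heap)
    settled = set()
    while heap:
        dist, node = heapq.heappop(heap)
        if node in settled:
            continue  # a shorter route to this vertex was already processed
        settled.add(node)
        if node in data_points:
            return node, dist, len(settled)
        for nxt, weight in adj[node].items():
            if nxt not in settled:
                heapq.heappush(heap, (dist + weight, nxt))
    return None, float("inf"), len(settled)


# Toy network (hypothetical): the closest data point m1 is three edges away.
network = undirected([("a", "b", 100.0), ("b", "c", 50.0),
                      ("c", "m1", 30.0), ("a", "m2", 400.0)])
result = ine_style_nn(network, {"m1", "m2"}, ("a", "b"), 20.0)
```

The third component of the result counts how many vertices had to be settled before a data point was reached. That count grows when data points are sparse or clustered far from the query point, which is the situation in which the text reports a performance decline for INE.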
7.3. Experimental evaluation using a real road network
In this section, we employ a real road network to show that candidate paths/circuits, together with paths terminated with a vertex of degree 1, can be used to construct a complete NVD. Then, the performance of NVDL and INE is evaluated against the real road network. We use the road network of Taipei, created by the Ministry of Transportation and Communications of Taiwan. An area containing six museums/exhibition halls, including the National Museum of History, is selected. To extract the area from the road network, the Excel file that stores the road network's edges and their adjacent vertices is imported into MS SQL Server, where all road segments within the area are selected into a table; unnecessary columns are eliminated and some needed columns are derived. For example, the edge length is derived from the positions of the two vertices with which the edge is incident. Then, the six museums/exhibition halls are integrated into the road network by modifying six road segments, because our model treats data points as vertices to facilitate computation. The resulting table has 256 tuples, each of which stores information about a vertex/data point and one of its adjacent vertices/data points. The table includes 122 distinct edges (each connecting two vertices in the original network) and 102 vertices/data points. The table is then exported from the database as an ordinary data file, adjb1_tr.dat. Next, the NVD construction program described in Tables 1 through 5 is run, and all of the records in the file adjb1_tr.dat are read, one by one, into main memory. The adjacency lists are then generated in memory in the format described in Fig. 10. For query processing, a hash table with 180 elements is used to store the 122 NVD lists, whose format is described in Fig. 14. In this manner, the right NVD list can be located quickly while processing a query.
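As a rough illustration of the loading step described above, the sketch below builds adjacency lists from tuples that each pair a vertex with one adjacent vertex, deriving the edge length from the two positions. The record layout `(vertex, x, y, neighbor, nx, ny)` and the function names are assumptions made for this example; the actual layout of adjb1_tr.dat and the format of Fig. 10 are not reproduced here. The reported counts are consistent with each undirected segment being stored once per direction (122 + 6 = 128 segments after the six data points are inserted, giving 256 tuples), although the text does not state this explicitly.

```python
import math
from collections import defaultdict


def build_adjacency(records):
    """Build {vertex: {neighbor: length}} from (vertex, x, y, neighbor, nx, ny).

    The Euclidean length of each segment is derived from the positions of
    its two end vertices, as the text describes for the edge-length column.
    """
    adj = defaultdict(dict)
    for vertex, x, y, neighbor, nx, ny in records:
        adj[vertex][neighbor] = math.hypot(nx - x, ny - y)
    return adj


def count_segments(adj):
    """Count undirected segments, treating (u, v) and (v, u) as one."""
    return len({frozenset((u, v)) for u in adj for v in adj[u]})


# Hypothetical records: one road segment v1-v2 and a data point m24
# attached to v2, each stored once per direction.
records = [
    ("v1", 0.0, 0.0, "v2", 3.0, 4.0),
    ("v2", 3.0, 4.0, "v1", 0.0, 0.0),
    ("v2", 3.0, 4.0, "m24", 3.0, 10.0),
    ("m24", 3.0, 10.0, "v2", 3.0, 4.0),
]
adjacency = build_adjacency(records)
```

Here four tuples yield two undirected segments, mirroring the two-tuples-per-segment reading of the 256-tuple table.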
In this experiment, the road network includes six museums/exhibition halls; this represents a dense data distribution, as there are relatively more data points than in the distribution used in the next experiment. Experimental evidence shows that, by applying our rules to the real road network, the NVD is completely constructed and stored in 122 NVD lists. The border points are determined by candidate paths that are not dominated by others, and the edges have one, two, or three nearest neighbors depending on whether they have none, one, or two border points on them. In this real road network, one edge has three nearest neighbors: all query points positioned between the edge's initial vertex and the point at distance 112.86 from the initial vertex have nearest neighbor m24, those positioned between the point at distance 112.86 and the point at distance 410.51 from the initial vertex have nearest neighbor m35, and those outside these edge fragments have nearest neighbor m31. The paths terminated with a vertex of degree 1 on the road network must be placed at the bottom of the queue and are visited after all of the candidate paths/circuits have been generated. The reason for this rule is that, according to our system, such paths cannot dominate others and will belong entirely to the root's NVB should
Fig. 16. Running time of NVDL and INE for different sizes and data distributions: {m24, m26, m28, m31, m34, m35}, {m31, m35}, respectively, in a real road network with 122 edges and 96 vertices.
they survive after the pruning process is completed. Table 6 shows, for each of the six museums/exhibition halls, how many candidate paths/circuits are not dominated and how many paths terminated with a vertex of degree 1 are not dominated, along with the average number of edges contained in those paths/circuits. As an example, in the BPB-tree corresponding to museum point m24, there are four candidate paths that are not dominated by others (containing an average of 6.5 edges), four candidate circuits that are not dominated by others (containing an average of 7 edges), and two paths terminated with a vertex of degree 1 that are not dominated (containing an average of 6 edges), whereas for data point m35 there are only candidate paths. It is clear that many paths/circuits have been dominated and pruned from the BPB-trees during NVD construction. To emulate a road network with sparse data distribution in a simple way, we remove four museums/exhibition halls from the real road network, which results in a road network with two data points, m31 and m35, located not too far from each other. In this second experiment, a query point that is far from both data points may have to travel a longer path to reach its nearest neighbor. The road network produced by integrating the two museums/exhibition halls into the original road network has 248 tuples containing 122 distinct edges (each connecting two vertices in the original road network) and 98 distinct vertices/data points. Table 7 shows the number of candidate paths/circuits that are not dominated and the number of paths terminated with a vertex of degree 1 that are not dominated for the two museums/exhibition halls. Both m31 and m35 have two candidate paths that are not dominated, with an average of 7 edges each. However, m35 has more and longer candidate circuits than m31.
The results of the experiment show that 17 edges have nearest neighbor m31 and 103 edges have nearest neighbor m35, as expected. The remaining two edges have both m31 and m35 as nearest neighbors, each for a different edge fragment.

Next, the performance of NVDL and INE is evaluated against the real road network. First, the road network with six museums/exhibition halls, which is of the dense type, is used. Each moving query point is characterized by the edge on which it is positioned and its distance from that edge's initial vertex. These values are obtained using a pseudo-random number generator, as described in Section 7.2. However, obtaining the edge number is more involved. We first compute k = rand() % 122, and then retrieve the k-th element of an array that stores all 122 edges to obtain the edge number. We cannot assume that the edge number ranges between 0 and 121, because edge numbers are defined in the real road network and belong to a large interval of integers. Consequently, the edge number is used as a key that is hashed into a hash table of 180 elements (more than 122, to allow for possible collisions) to find the corresponding NVD list. NVDL needs the edge's NVD list, which stores the edge's nearest neighbors for its various fragments. NVDL reads two files for each experiment: a file storing the 122 NVD lists and another file storing 122 records, each containing an edge number and a pointer to the next record for handling hash collisions. Both are output files of the NVD construction program. INE reads one file that stores each vertex's adjacent objects. NVDL and INE each generate 5000 query points and find their nearest neighbors. Both are run 10 times, and the average execution time per query is computed. Clearly, the more query points are included in the experiments, the closer the average execution time will be to the real case.
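The look-up described above can be sketched as follows. This is an illustrative reconstruction rather than the paper's code: the modulo hash function, the chained buckets, the representation of an NVD list as `(fragment_end, neighbor)` pairs, and the edge numbers 300457 and 300637 are all assumptions made for the example. Only the table size of 180 and the border points 112.86 and 410.51 (from the three-neighbor edge reported earlier) come from the text. Because that edge's length is not given, the last fragment is left open-ended with `math.inf`.

```python
import bisect
import math
import random

TABLE_SIZE = 180  # table size reported in the text


class NVDTable:
    """Hash table mapping an edge number to its NVD list (chained buckets)."""

    def __init__(self, size=TABLE_SIZE):
        self.buckets = [[] for _ in range(size)]

    def _slot(self, edge_no):
        # Assumed hash function: the paper does not specify one.
        return edge_no % len(self.buckets)

    def insert(self, edge_no, fragments):
        """fragments: (end_offset, neighbor) pairs sorted by end_offset."""
        self.buckets[self._slot(edge_no)].append((edge_no, fragments))

    def nearest(self, edge_no, offset):
        """Return the nearest neighbor for a point `offset` along the edge."""
        for key, fragments in self.buckets[self._slot(edge_no)]:
            if key == edge_no:
                ends = [end for end, _ in fragments]
                # First fragment whose end is at or beyond the offset.
                i = min(bisect.bisect_left(ends, offset), len(fragments) - 1)
                return fragments[i][1]
        raise KeyError(edge_no)


table = NVDTable()
# Edge with two border points, hence three nearest neighbors.
table.insert(300457, [(112.86, "m24"), (410.51, "m35"), (math.inf, "m31")])
# A second edge that collides with the first under the assumed hash.
table.insert(300637, [(math.inf, "m35")])

# Query generation in the style of k = rand() % 122: pick an array index,
# then read the real edge number stored at that position.
edge_numbers = [300457, 300637]
k = random.randrange(len(edge_numbers))
neighbor = table.nearest(edge_numbers[k], 200.0)
```

The query cost is one bucket probe plus a search within a list of at most three fragments, which does not depend on how far the nearest data point is from the query point.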
Imagine a situation in which many tourists positioned at various places want to visit the nearest museum/exhibition hall. Thousands of NN queries may be issued at the same time, and the query processor must respond very quickly. Fig. 16 shows that INE performs slightly better than NVDL for a road network with dense data distribution but significantly worse than NVDL for a road network with sparse data distribution. The reason is that INE adopts the incremental approach, so its computation time is proportional to the number of edges in the path between the query point and the data point, whereas NVDL adopts the pre-computing approach without using an index and can obtain the NN in one step. Thus, the optimal condition for NVDL is a road network with sparse data distribution. In addition, NVDL is well suited to real-time applications.
8. Conclusions
This paper proposes rules for constructing a network voronoi diagram and verifies that they are sound and complete. These rules are applied to a BPB-tree so that generating and pruning the BPB-tree can be performed concurrently. Experimental evidence shows that an NVD has been completely constructed for the real road network with 122 edges, 96 vertices, and six data points; it is available for use by an NN query processor that needs further processing. More specifically, VN3 can use this method to construct the NVD and then search k-NN, where k > 1. The proposed NN query evaluation method, called NVDL, has three advantages. First, it is a pre-computing approach that does not include a traditional filtering step. Thus, redundant or repeated computations for different types of distance (one for the filter distance in an index, the other for the exact distance in a refinement step) do not occur. In addition, a border point's two nearest neighbors can be obtained in a single look-up step, provided that the border point is within the edge's interior. Second, it does not encounter the “slow convergence condition,” which often occurs in the incremental approach. Experimental evidence shows that NVDL is better than INE under such a condition. Third, position uncertainty is tolerated, at least to some extent. If the position uncertainty is at most an edge's length, NVDL can reply with all of the linked nodes of the corresponding NVD list as possible solutions. Based on the results of this research, the proposed method is most appropriate for real-time applications with sparse data distribution.
Mei-Tzu Wang received her B.S. in mathematics from National Cheng-Kung University in 1977, her M.S. in information engineering from Tamkang University in 1980, and her Ph.D. in management science from Tamkang University in 1993. She is an associate professor in the department of information management at Chinese Culture University. Her research interests include databases, network security, and information theory.