Direction relations and two-dimensional range queries: optimisation techniques


Data & Knowledge Engineering 27 (1998) 313-336

Yannis Theodoridis a,*, Dimitris Papadias b, Emmanuel Stefanakis a, Timos Sellis a

a Computer Science Division, Department of Electrical and Computer Engineering, National Technical University of Athens, Zographou, 15773 Athens, Greece
b Department of Computer Science, Hong Kong University of Science and Technology, Clearwater Bay, Kowloon, Hong Kong, China

Received 12 December 1997; accepted 12 December 1997

Abstract

This paper defines direction relations (e.g., north, northeast) between two-dimensional objects and shows how they can be efficiently retrieved using B-, KDB- and R-tree-based data structures. Essentially, our work studies optimisation techniques for 2D range queries that arise during the processing of direction relations. We test the efficiency of alternative indexing methods through extensive experimentation and present analytical models that estimate their performance. The analytical estimates are shown to be very close to the actual results and can be used by spatial query optimizers in order to predict the retrieval cost. In addition, we implement modifications of the existing structures that yield better performance for certain queries. We conclude the paper by discussing the most suitable method depending on the type of the range and the properties of the data. © 1998 Published by Elsevier Science B.V. All rights reserved.

Keywords: Spatial databases; Direction relations; Indexing methods; Performance analysis

1. Introduction

The most common types of spatial relations in geographic applications include (cardinal) directions (e.g., south, northwest) [13], topological relations that describe concepts of neighbourhood and incidence (e.g., overlap, disjoint) [8], and distances (e.g., near, far) [17]. These types of relations have been used in a wide range of topics, such as Spatial Query Languages [12,25], Reasoning [4,17] and Consistency Checking Mechanisms [9,14]. This paper describes implementations of direction relations in Spatial Database Management Systems (SDBMS) and Geographical Information Systems (GIS). Despite the fact that directions constitute an important class of user queries, they have not been studied extensively in spatial access methods. Query processing and optimisation techniques have mainly focused on window queries [15], topological relations [4,26] and nearest neighbour queries [32]. The main reason for this is the lack of universally accepted definitions; unlike topology, where there seems to exist a set of widely accepted relations [8,6], there is no such set of directions. Various types of direction relations have been used to match different needs that range from cognitive modelling [18] to image similarity retrieval [21] and from navigation [19] to user interfaces [31].

* Corresponding author. E-mail: [email protected]
0169-023X/98/$ - see front matter © 1998 Published by Elsevier Science B.V. All rights reserved. PII: S0169-023X(97)00060-8

Here we define direction relations between objects in two-dimensional space. Our work extends previous attempts to formalise direction relations, which have concentrated mainly on point objects [24] or Minimum Bounding Rectangles [29,27]. Then we show how these relations can be transformed into range queries and retrieved in spatial DBMSs using different indexing methods. Essentially we compare the efficiency of indexing methods in two-dimensional range queries in the context of geographic applications. We also propose modifications of the traditional (spatial) data structures that yield improved performance for certain classes of queries.

The results of this paper are directly applicable to Spatial Databases and GISs where the formalization of spatial relations is crucial for user interfaces and query optimisation strategies. Although we deal with geographic examples, potential applications for directions (and 2D range queries in general) include other domains such as CAD and VLSI design. In these domains, the queries can be very similar, but the linguistic terms used to express them may vary (e.g., above instead of north).

The paper is organised as follows: in Section 2 we define several direction relations between objects in 2D-space and describe their transformation to two-dimensional ranges.
In Section 3 we discuss the retrieval of direction relations (and the corresponding ranges) using alternative access methods (B-trees, KDB-trees and R-trees) and provide analytical formulas for their estimated performance. Section 4 tests the alternative implementations and compares analytical predictions with the actual results; the relative performance of the indexing methods is also discussed. Section 5 is concerned with modifications of indexing methods that increase performance for some queries, and Section 6 concludes with comments on future work.

2. Direction relations as range queries

In this paper we follow a projection-based¹ approach, that is, direction relations are defined using projection lines perpendicular to the coordinate axes. We assume an absolute frame of reference and a pair of orthocanonic x- and y-axes. Hereafter, p denotes the primary object (the object to be located) and q the reference object (the object in relation to which the primary object is located). Let $p_i$ be a point of object p and $q_j$ be a point of object q; $p_{i,x}$ is the x-coordinate of point $p_i$, $p_{i,y}$ is the y-coordinate of point $p_i$, and so on. The relation all_north between p and q denotes that all points of p are north of all points of q:

$$all\_north(p, q) \equiv \forall p_i \forall q_j \, (p_{i,y} > q_{j,y})$$

All_north can be characterised as a low resolution relation because its area of acceptance is large. On the other hand, we can define a higher resolution version of all_north as:

$$all\_strictly\_north(p, q) \equiv \forall p_i \forall q_j \, (p_{i,y} > q_{j,y}) \wedge \forall p_i \exists q_j \, [(p_{i,x} > q_{j,x}) \wedge (p_{i,y} > q_{j,y})] \wedge \forall p_i \exists q_j \, [(p_{i,x} < q_{j,x}) \wedge (p_{i,y} > q_{j,y})]$$

According to this relation, all points of object p must be in the region bounded by the horizontal line that passes through q's northmost point and by the vertical lines that also bound q. For example, although Belgium lies all_north of Italy, it does not satisfy the relation all_strictly_north (Fig. 1a).

¹ An alternative approach is based on the cone-shaped concept of direction, i.e., direction relations are defined using angular regions between objects [16].

Although the previous discussion refers to actual two-dimensional objects, spatial access methods usually use approximations to efficiently retrieve candidates that could satisfy a query [3]. In this paper we examine methods based on Minimum Bounding Rectangles (MBRs). MBRs are the most common approximations in spatial applications because they need only two points for their representation; in particular, each object q is represented as an ordered pair $(q'_l, q'_u)$ of representative points that correspond to the lower left ($q'_l$) and the upper right point ($q'_u$) of the MBR $q'$ that covers q. Fig. 1b illustrates how the map of Fig. 1a can be approximated by MBRs. Multidimensional access methods for non-point objects (e.g. R-trees) explicitly store MBRs, while methods for point objects (e.g. KDB-trees) or even one-dimensional ones (e.g. B-trees) can be used to compare object locations using representative points.

In order to answer the query 'find all objects p that satisfy the relation R with respect to an object q' we have to retrieve all MBRs $p'$ that satisfy a set of range constraints $R'$ with respect to the MBR $q'$ of object q. Table 1 illustrates the direction relations on which we concentrate in this paper, and a mapping from direction relations R between actual objects to constraints $R'$ between MBR representative points. All relations concern the direction north, and they are applicable even if the objects are non-contiguous (i.e., they have disconnected components). Depending on the application needs, a large number of additional directions can be defined and implemented accordingly.
These relations can be chosen so that several properties are satisfied: they can be pairwise disjoint, provide a complete coverage, form a relation algebra, etc. The set that we study here does not satisfy any of these properties, because our goal is to show how direction relations of different resolution can be retrieved in spatial data structures and to compare the cost of retrieval.

Fig. 1. MBR approximations for the map of Europe.

According to the last column of Table 1, each direction relation is transformed to a set of 1 up to 4 constraints that specify a range of the 2D space where the representative points $p'_l$ and $p'_u$ of candidate MBRs $p'$ should belong. In particular, the four potential ranges are:
(1) Range of $p'_l$ on the y-axis
(2) Range of $p'_l$ on the x-axis
(3) Range of $p'_u$ on the y-axis
(4) Range of $p'_u$ on the x-axis

The relations are chosen in a way that the number, the type (restricted on one or on both ends) of constraints and the axes involved differ. The performance of each indexing method is expected to be similar for relations with similar range constraints. Therefore the behaviour of an indexing method for any spatial relation can be predicted from other relations with similar range characteristics [35].

Table 1
Direction relations between objects and mapping to range constraints between MBRs

Relation R between objects: $all\_north(p, q) \equiv \forall p_i \forall q_j \, (p_{i,y} > q_{j,y})$
  Range constraints R' for MBRs p' to be retrieved: 1 constraint on y-axis: $p'_{l,y} > q'_{u,y}$
  Example: an(NL, AU)

Relation R between objects: some_north(p, q) [definition illegible]
  Range constraints R' for MBRs p' to be retrieved: 2 constraints on y-axis [illegible]
  Example: sn(BU, GR)

Relation R between objects: $all\_strictly\_north(p, q) \equiv \forall p_i \forall q_j \, (p_{i,y} > q_{j,y}) \wedge \forall p_i \exists q_j \, [(p_{i,x} > q_{j,x}) \wedge (p_{i,y} > q_{j,y})] \wedge \forall p_i \exists q_j \, [(p_{i,x} < q_{j,x}) \wedge (p_{i,y} > q_{j,y})]$
  Range constraints R' for MBRs p' to be retrieved: 1 constraint on y-axis: $p'_{l,y} > q'_{u,y}$; 2 constraints on x-axis: $q'_{l,x} < p'_{l,x}$ and $p'_{u,x} < q'_{u,x}$
  Example: asn(GE, IT)

Relation R between objects: some_strictly_north(p, q) [definition illegible]
  Range constraints R' for MBRs p' to be retrieved: 2 constraints on y-axis and 2 constraints on x-axis [illegible]
  Example: ssn(BE, FR)

Relation R between objects: $all\_north\_east(p, q) \equiv \forall p_i \forall q_j \, [(p_{i,x} > q_{j,x}) \wedge (p_{i,y} > q_{j,y})]$
  Range constraints R' for MBRs p' to be retrieved: 1 constraint on y-axis: $p'_{l,y} > q'_{u,y}$; 1 constraint on x-axis: $p'_{l,x} > q'_{u,x}$
  Example: ane(DE, NL)

Relation R between objects: some_north_east(p, q) [definition illegible]
  Range constraints R' for MBRs p' to be retrieved: 2 constraints on y-axis and 2 constraints on x-axis [illegible]
  Example: sne(AU, CH)

Because MBRs differ from the actual objects they enclose, they are not always adequate to express the relation between the actual objects. For this reason, spatial queries involve the following two-step strategy [12,22]:
• First a filter step based on MBRs is used to rapidly eliminate MBRs of objects that could not possibly satisfy the query and select a set of potential candidates.
• Then during a refinement step each candidate is examined (by using computational geometry techniques) and false hits are detected and eliminated.

Unlike topological relations, where the refinement step is the rule [26], only two direction relations of Table 1, some_strictly_north and some_north_east, need a refinement step. For the rest of the relations all retrieved MBRs correspond to objects that satisfy the query. Details about the retrieval of direction relations using MBRs can be found in [28]. In the next sections we will show how the above results can be applied to indexing techniques available in commercial DBMS (namely, B-trees [5], KDB-trees [30] and R-trees [15,2]).
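The filter step for the three "all" relations can be sketched as plain predicates on MBR corner coordinates. This is only an illustration: the `MBR` type, the function names and the sample rectangles are hypothetical, and the x-axis constraints used for all_strictly_north are derived here from its definition in Section 2 (p must lie between the vertical lines that bound q) rather than copied from Table 1.

```python
from typing import Callable, Dict, List, NamedTuple


class MBR(NamedTuple):
    """Minimum bounding rectangle: lower-left (lx, ly) and upper-right (ux, uy)."""
    lx: float
    ly: float
    ux: float
    uy: float


def all_north(p: MBR, q: MBR) -> bool:
    # One constraint on the y-axis: lower edge of p' above upper edge of q'.
    return p.ly > q.uy


def all_strictly_north(p: MBR, q: MBR) -> bool:
    # One y-axis constraint plus two x-axis constraints (derived, see lead-in).
    return p.ly > q.uy and p.lx > q.lx and p.ux < q.ux


def all_north_east(p: MBR, q: MBR) -> bool:
    # One constraint per axis, both on the lower-left point of p'.
    return p.ly > q.uy and p.lx > q.ux


def filter_step(candidates: Dict[str, MBR], q: MBR,
                relation: Callable[[MBR, MBR], bool]) -> List[str]:
    """Return the ids of all candidate MBRs that satisfy the relation w.r.t. q."""
    return sorted(name for name, p in candidates.items() if relation(p, q))


# Made-up rectangles in the unit square (not data from the paper).
q = MBR(0.4, 0.4, 0.6, 0.6)
candidates = {
    "a": MBR(0.45, 0.70, 0.55, 0.80),  # above q, inside its x-extent
    "b": MBR(0.70, 0.70, 0.90, 0.90),  # above and to the right of q
    "c": MBR(0.10, 0.65, 0.50, 0.75),  # above q, overhanging on the left
    "d": MBR(0.30, 0.50, 0.50, 0.90),  # overlaps q
}

north = filter_step(candidates, q, all_north)
strictly_north = filter_step(candidates, q, all_strictly_north)
north_east = filter_step(candidates, q, all_north_east)
```

For these three relations the text states that no refinement step is required, so the filter output is already the answer set; for some_strictly_north and some_north_east the candidates would still have to be checked against the actual geometries.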

3. Retrieval of direction relations

The retrieval of direction relations in existing DBMSs could be accomplished either by maintaining traditional indexes (e.g. B-trees) or, alternatively, by incorporating Abstract Data Types (ADTs) with specialised indexes defined by external code (e.g. KDB-trees or R-trees). Application developers should decide which method is the most appropriate for their application needs. In the rest of the section we describe how B-, KDB- and R-trees can be used to retrieve direction relations and provide analytical formulas that estimate the performance of the three methods. Existing formulas focus on traditional retrieval (match or range queries on B-trees and KDB-trees, and overlap queries on R-trees). Our work extends previous research by estimating the expected cost (i.e., number of disk accesses) for the retrieval of direction relations and the corresponding range queries. In the following discussion we assume that both data and query rectangles follow a random (i.e., uniform-like) distribution over the unit square address space. Table 2 lists the symbols to be used hereafter in our discussion.

Table 2
List of symbols

Symbol                  Description
$p$                     Spatial object
$p_i$                   A point of object p
$p_{i,x}$, $p_{i,y}$    x and y coordinates of point $p_i$
$p'$                    MBR of p
$(p'_l, p'_u)$          Ordered pair of lower ($p'_l$) and upper ($p'_u$) point of $p'$
$|p|_x$, $|p|_y$        Size of p's projection on axis x and y respectively
$R$                     Spatial relation
$R'$                    Set of range constraints for R
$C(R)$                  Retrieval cost (in terms of disk accesses) of relation R
$r$                     Range for B-tree range query
$Q^l$, $Q^u$            Query windows for KDB-tree (lower or upper) range query
$Q$                     Query window for R-tree range query
$P$                     R-tree node rectangle

3.1. Retrieval of direction relations using B-trees

The first solution for the retrieval of direction relations includes the maintenance of a group of four alphanumeric indexes, such as B-trees², each corresponding to one of the four numbers: $p'_{l,x}$, $p'_{l,y}$, $p'_{u,x}$, $p'_{u,y}$. The processing of a query of the form 'find all objects p that satisfy a given direction relation R with respect to object q' using the above set of four B-trees involves the following steps:

Step 1. Depending on the relation to be retrieved, select the B-tree(s) to be searched (according to the last column of Table 1).
Step 2. Search each B-tree involved to find the corresponding answer sets.
Step 3. If more than one index is involved, find the intersection set (a 'realistic' assumption is that this procedure is executed in main memory).
Step 4. If necessary, follow a refinement step for the selected objects.

As an example, consider the query: 'Find the countries p all_north_east of Switzerland (CH)'. In this case, two out of four B-trees, those for the $p'_{l,y}$ and $p'_{l,x}$ parameters, need to be searched, because only $p'_{l,x}$ and $p'_{l,y}$ participate in the range constraints for all_north_east. The two trees are illustrated in Fig. 2a³ where each label indicates the appropriate coordinate for the corresponding object p. The sets {CZ', LU', BE', PL', UK', NL', IR', DE', SW', NO', FI', IC'} and {SW', CZ', PL', YU', HU', FI', RO', AL', GR', BU'} are the two answer sets. The intersection set {CZ', PL', SW', FI'} contains the object IDs that satisfy the query (illustrated as the dark shaded area in Fig. 2b). A refinement step is not needed for the retrieval of all_north_east; that is, all retrieved MBRs correspond to objects that satisfy the query.

Fig. 2. Retrieval of relation all_north_east using B-trees: (a) B-tree for $p'_{l,y}$ and B-tree for $p'_{l,x}$; (b) answer area.

If the data keys are stored in a B-tree of height h with L leaf nodes, then the average cost C(r) for the retrieval of a constraint with length (range) r is [6]:

$$C(r) = h + r \cdot L \qquad (1)$$

² In particular, the implementation used in existing systems is the B+-tree index [20] and, in this paper, we will think of B+-trees when we use the term B-trees.
³ For illustration reasons, in the examples of the tree structures we assume a branching factor of 4, i.e., each node contains at most four entries.
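Steps 1 to 3 can be illustrated for all_north_east with sorted key lists standing in for the B-trees. This is a minimal sketch under stated assumptions: the helper names, the tuple layout `(lx, ly, ux, uy)` and the sample rectangles are hypothetical, and a real B-tree would replace the in-memory sorted lists.

```python
from bisect import bisect_right
from typing import Dict, List, Set, Tuple

Rect = Tuple[float, float, float, float]  # (lx, ly, ux, uy), hypothetical layout
Index = Tuple[List[float], List[str]]     # sorted keys and matching object ids


def build_index(mbrs: Dict[str, Rect], coord: int) -> Index:
    """Sort the objects on one corner coordinate (stand-in for one B-tree)."""
    pairs = sorted((rect[coord], oid) for oid, rect in mbrs.items())
    return [key for key, _ in pairs], [oid for _, oid in pairs]


def greater_than(index: Index, bound: float) -> Set[str]:
    """Range scan: ids of all objects whose key is strictly greater than bound."""
    keys, ids = index
    return set(ids[bisect_right(keys, bound):])


def all_north_east(indexes: Dict[str, Index], q: Rect) -> Set[str]:
    # Step 1: only the indexes on the lower-left corner are needed.
    # Step 2: one range scan per index.
    north = greater_than(indexes["ly"], q[3])  # p'_l,y > q'_u,y
    east = greater_than(indexes["lx"], q[2])   # p'_l,x > q'_u,x
    # Step 3: intersect the two answer sets in main memory.
    return north & east


# Made-up rectangles in the unit square (not data from the paper).
mbrs = {
    "a": (0.70, 0.70, 0.80, 0.80),
    "b": (0.70, 0.10, 0.80, 0.20),
    "c": (0.10, 0.70, 0.20, 0.80),
    "d": (0.65, 0.65, 0.90, 0.90),
}
indexes = {name: build_index(mbrs, i) for i, name in enumerate(("lx", "ly", "ux", "uy"))}
answer = all_north_east(indexes, (0.4, 0.4, 0.6, 0.6))
```

All four indexes are built, as in the text, even though this particular relation touches only two of them; relations with constraints on the upper-right point would use the remaining two.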


For the computation of the cost C(r) we need estimated values for the parameters of Eq. 1, namely h, L and r. Let m be the maximum number of entries in a B-tree node, c the average capacity of a node, and N the total number of keys stored in the leaf nodes. The following equations [1] describe the average h and L values:

$$h = 1 + \left\lceil \log_{c \cdot m} \frac{N}{c \cdot m} \right\rceil \qquad (2)$$

$$L = N / (c \cdot m) \qquad (3)$$

What remains in order to have a complete expression for Eq. 1 is the value of parameter r, which depends on the particular constraint. Table 3 provides the values of r for the possible range constraints of Table 1, if we assume that $|p|_x \cdot |p|_y$ is the size of a data object MBR, and $|q|_x \cdot |q|_y$ the size of a query object MBR (the constraints refer only to the x-coordinate since constraints for the y-coordinate can be expressed in a similar way). Using information from Table 3 and Eqs. 2 and 3 we can estimate the expected cost for each constraint (Eq. 1) and, summing up, the expected cost C(R, k) for the retrieval of a spatial relation R with k constraints ($1 \le k \le 4$), i.e.,

$$C(R, k) = \sum_{i=1}^{k} C(r_i) = \sum_{i=1}^{k} (h + r_i \cdot L) \qquad (4)$$
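The B-tree cost estimate of Eqs. 1 to 4 is straightforward to evaluate numerically. The sketch below assumes that Eq. 2 has the form h = 1 + ceil(log_{c·m}(N / (c·m))), which is a reconstruction consistent with Eq. 3 and not a verified transcription; the function names are hypothetical, and the parameter values are the ones reported for B-trees in Section 4.

```python
import math
from typing import Iterable


def btree_height(n_keys: int, m: int, c: float) -> int:
    """Average height h (assumed form of Eq. 2), with effective fanout c * m."""
    fanout = c * m
    return 1 + math.ceil(math.log(n_keys / fanout, fanout))


def btree_leaves(n_keys: int, m: int, c: float) -> float:
    """Average number of leaf nodes L (Eq. 3)."""
    return n_keys / (c * m)


def constraint_cost(r: float, n_keys: int, m: int, c: float) -> float:
    """Expected disk accesses for one range constraint of length r (Eq. 1)."""
    return btree_height(n_keys, m, c) + r * btree_leaves(n_keys, m, c)


def relation_cost(ranges: Iterable[float], n_keys: int, m: int, c: float) -> float:
    """Expected cost of a relation: one B-tree search per constraint (Eq. 4)."""
    return sum(constraint_cost(r, n_keys, m, c) for r in ranges)


# Parameters from the experimental section: N = 10 000, m = 126, c = 0.67.
N, M, C = 10_000, 126, 0.67
height = btree_height(N, M, C)
one_constraint = relation_cost([0.5], N, M, C)          # r = 0.5 is an arbitrary example
four_constraints = relation_cost([0.5, 0.5, 0.1, 0.1], N, M, C)
```

Because every constraint pays the full descent cost h, the estimate grows with the number of constraints even when the individual ranges are small, which is the effect the text attributes to relations such as some_strictly_north.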

The cost depends significantly on the particular direction relation because the number of B-trees to be searched is equal to the number of range constraints involved in the definition of that relation. All_north involves only one constraint ($p'_{l,y} > q'_{u,y}$), while some_strictly_north contains four: [...]

Table 3
Average values for range r of a constraint

Constraint       Average range r
[illegible]      $r = (1 - |q|_x)/2$
[illegible]      $r = |q|_x$
[illegible]      $r = (1 + |q|_x)/2$
[illegible]      $r = (1 - (2 \cdot |p|_x + |q|_x))/2$

3.2. Retrieval of direction relations using KDB-trees

[...] sets. Table 4 presents the query windows for each direction relation, assuming the unit work space [0,1] as global space. Query windows $Q^l$ and $Q^u$ refer to the KDB-trees for the lower and upper MBR corner points, respectively, expressed as a set of four values that correspond to the coordinates $(Q^l_{l,x}, Q^l_{l,y}, Q^l_{u,x}, Q^l_{u,y})$ and $(Q^u_{l,x}, Q^u_{l,y}, Q^u_{u,x}, Q^u_{u,y})$ respectively.
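A query window expressed as four values can be applied to the stored corner points as a simple 2D range test. The sketch below uses a linear scan in place of a real KDB-tree traversal; the names are hypothetical, and the window used for all_north_east, $(q'_{u,x}, q'_{u,y}, 1, 1)$ on the lower-left points, is derived here from the constraints of that relation and is an assumption rather than an entry read from Table 4.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]
Window = Tuple[float, float, float, float]  # (lx, ly, ux, uy) in the unit square


def north_east_window(q: Window) -> Window:
    """Window Q^l for all_north_east (assumed): strictly right of and above q'_u."""
    _, _, q_ux, q_uy = q
    return (q_ux, q_uy, 1.0, 1.0)


def window_query(lower_points: Dict[str, Point], window: Window) -> List[str]:
    """Ids of objects whose lower-left point falls inside the window.

    The lower bounds are exclusive because the constraints are strict (>);
    a linear scan stands in for the KDB-tree search.
    """
    lx, ly, ux, uy = window
    return sorted(
        oid for oid, (x, y) in lower_points.items()
        if lx < x <= ux and ly < y <= uy
    )


# Made-up lower-left corner points (not data from the paper).
lower_points = {"a": (0.7, 0.7), "b": (0.7, 0.1), "c": (0.1, 0.7)}
q = (0.4, 0.4, 0.6, 0.6)
hits = window_query(lower_points, north_east_window(q))
```

Only one of the two point indexes is touched here, which mirrors the remark in the text that all_north_east specifies a single query window.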

Consider again the query: 'Find the countries p all_north_east of Switzerland (CH)'. In this case, only one out of the two KDB-trees needs to be searched, because only one query window ($Q^l$) is specified (Fig. 3b). The involved KDB-tree is illustrated in Fig. 3a where each label indicates the lower left point $p'_l$ for each object MBR $p'$. The nodes to be searched are X and Y (1st level entries), D, F and G (2nd level entries), i.e., the ones that intersect $Q^l$ (these node entries are coloured grey in Fig. 3a). Among the entries of nodes D, F and G, the ones that satisfy the range query are CZ', PL', SW' and FI'.

Table 4
Query windows for the retrieval of direction relations using KDB-trees
(columns: Relation, Range query $Q^l$, Range query $Q^u$, Illustration)

Legible entries: all_north(p, q): $(0, q'_{u,y}, 1, 1)$; some_north(p, q): $(0, q'_{u,y}, 1, 1)$; all_strictly_north(p, q): $(q'_{l,x}, q'_{u,y}, q'_{u,x}, 1)$. The entries for some_strictly_north(p, q), all_north_east(p, q) and some_north_east(p, q) are illegible.

Fig. 3. Retrieval of relation all_north_east using KDB-trees.

To our knowledge, there are no analytical models in the literature that describe the retrieval cost of KDB-trees. Robinson [30] estimates that the expected cost of a window query with size Q should be $C(Q) = Q \cdot P$, where P is the total number of nodes (pages) of the tree structure. However, this estimation is oversimplified. In order to facilitate analysis, we consider a KDB-tree as a simple R-tree with two properties: (a) point data, and (b) non-overlapping nodes. According to [23,11], the estimated retrieval cost (i.e., number of disk accesses) of an overlap query with size $|q|_x \cdot |q|_y$ using R-trees (and consequently KDB-trees) is:

$$C(|q|_x, |q|_y) = \sum_{j=1}^{h} \left\{ N_j \cdot (n_{j,x} + |q|_x) \cdot (n_{j,y} + |q|_y) \right\} \qquad (5)$$

where h is the height of the tree structure, m is the maximum number of entries in a KDB-tree node, c is the average capacity of a node, N is the total number of data entries, and $n_{j,x} \cdot n_{j,y}$ is the average node size at level j (the root is assumed at level j = h and the leaf-nodes at level j = 1). The expression for computing the height h of the R-tree (or KDB-tree) is [11]:

$$h = 1 + \left\lceil \log_{c \cdot m} \frac{N}{c \cdot m} \right\rceil \qquad (6)$$

the expected number of nodes $N_j$ at level j is:

$$N_j = \frac{N}{(c \cdot m)^j} \qquad (7)$$

and the average node sizes $n_{j,x}$ and $n_{j,y}$ are given by:

$$n_{j,x} = n_{j,y} = (1/N_j)^{1/2} \qquad (8)$$

Using information from Table 4 and Eqs. 6, 7 and 8 we can estimate the cost for each query window $Q^l$ and $Q^u$ (using Eq. 5) and, summing up, the expected cost C(R) for the retrieval of a direction relation R:

$$C(R) = C(|Q^l|_x, |Q^l|_y) + C(|Q^u|_x, |Q^u|_y) \qquad (9)$$

3.3. Retrieval of direction relations using R-trees

The R-tree multidimensional data structure [15] stores the MBRs of the actual data objects in the leaf nodes and each intermediate node stores rectangles (node rectangles) which enclose all rectangles that correspond to lower level nodes. The fact that R-trees permit overlap among node entries sometimes leads to unsuccessful hits on the tree structure. Several variants of the original method [33,2] have been proposed to address the problem of performance degradation caused by the overlapping regions or excessive dead-space (space in the intermediate nodes, not covering lower level nodes) respectively.

In order to retrieve objects that satisfy a direction relation with respect to a reference object q we have to specify the MBRs $p'$ that could approximate such objects (Table 1) and then to search the R-tree nodes that contain these MBRs. Table 5 presents the constraints for the node rectangles P for each direction relation of Table 1. Notice that the same relation between node rectangles and the reference MBR holds for all the levels of the tree structure. For instance, the intermediate nodes that could enclose other (intermediate or leaf) nodes P that satisfy the general constraint ($P_{u,y} > q'_{u,y}$) should also satisfy the same constraint. This conclusion is applicable to all direction relations.

Table 5
Constraints for the R-tree node rectangles

Relation                      Constraints for the node rectangles P to be searched
all_north(p, q)               $(P_{u,y} > q'_{u,y})$
some_north(p, q)              [illegible]
all_strictly_north(p, q)      [illegible]
some_strictly_north(p, q)     [illegible]
all_north_east(p, q)          $(P_{u,y} > q'_{u,y}) \wedge (P_{u,x} > q'_{u,x})$
some_north_east(p, q)         [illegible]

Fig. 4 shows how the MBRs of Fig. 1b are grouped and stored in an R-tree. At the lower level MBRs of countries (denoted by two letters) are grouped into seven leaf nodes A, B, C, D, E, F and G. At the next level, the seven leaf nodes are grouped into two larger intermediate nodes X and Y, which in turn compose the root node of the tree. If we consider the query 'Find the countries p all_north_east of Switzerland (CH)', the nodes to be searched are X and Y (1st level), B, E, F and G (2nd level), i.e., the ones that satisfy the range constraint $(P_{u,y} > q'_{u,y}) \wedge (P_{u,x} > q'_{u,x})$. Among the entries of leaf nodes B, E, F and G, the countries all_north_east of Switzerland are SW', FI', CZ' and PL'.

Fig. 4. Retrieval of relation all_north_east using R-trees.

We have implemented and tested the most popular R-tree variations on the retrieval of direction relations using MBRs of variable sizes [28]. R*-trees [2] have consistently better performance than both R-trees [15] and R+-trees [33] (which were shown to be inefficient for direction relations) and are used in the rest of the paper (hereafter, when we use the term R-tree, we imply the R*-tree).

Most of the work in the literature [23,11,36] has dealt with the expected performance of R-trees for processing overlap queries, i.e., the retrieval of data objects p that share common area with a query window q. Eq. 5 describes the expected cost of retrieval for both R- and KDB-trees, where the height of the tree is given by Eq. 6. The fact that the data object MBRs and R-tree nodes at each level are possibly overlapping makes the computation of the R-tree node size different and not as straightforward as the KDB-tree one (Eq. 8). If $N_j$ is the number of nodes at level j, and $D_j$ is the density⁴ of node rectangles then the average sizes $n_{j,x}$ and $n_{j,y}$ of node rectangles at each direction are given by the following formula [36]:

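$$n_{j,x} = n_{j,y} = (D_j / N_j)^{1/2} \qquad (10)$$

where

$$D_j = \left\{ 1 + \frac{D_{j-1}^{1/2} - 1}{(c \cdot m)^{1/2}} \right\}^2 \qquad (11)$$

and

$$N_j = \frac{N_{j-1}}{c \cdot m} \qquad (12)$$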

Notice from Eqs. 11 and 12 that $D_j$ and $N_j$ can be computed recursively using $D_0$ and $N_0$, which denote the density D and the amount N, respectively, of the data object MBRs. Therefore, we can estimate the retrieval cost of an overlap query just with the knowledge of two data set properties (density D and amount N) and the query window q.

⁴ We define density D of a set of N objects to be the sum of the object areas $s_i$ divided by the global area: $D = \left( \sum_{i=1}^{N} s_i \right) / \text{global area}$.

Fig. 5. Query windows for the estimation of the retrieval cost (legible panels: all_strictly_north(p, q), some_north(p, q)).

Since Eq. 5 expresses the expected performance of R-trees on overlap queries with a query window q, in order to estimate the cost of a direction relation R(p, q) we need the following transformation:

$$R(p, q) \Rightarrow overlap(P, Q).$$

In other words, the retrieval of a direction relation using R-trees is equivalent (in terms of cost) to the retrieval of an overlap query using an appropriate query window Q. The necessary transformation Q for each direction relation R should take into consideration the corresponding constraint of the node rectangles in Table 5 (and not data MBRs), because access to nodes (not data) is the single factor for the retrieval cost [28]. The appropriate query windows Q for the range constraints of Table 5 are illustrated in Fig. 5. Using information from Fig. 5 and Eqs. 10, 11 and 12 we can estimate the expected cost for the query window Q (Eq. 5), which equals the expected cost C(R) for the retrieval of a direction relation R:

$$C(R) = C(|Q|_x, |Q|_y) = \sum_{j=1}^{h} \left\{ N_j \cdot (n_{j,x} + |Q|_x) \cdot (n_{j,y} + |Q|_y) \right\} \qquad (13)$$

Intuitively R-trees perform better than other indexing methods in cases where many constraints are involved in the definition of the direction relation of interest. The experimental results of the next section demonstrate that this is actually the case when four constraints are involved, while for one constraint, B-trees have a better performance and for the intermediate relations (two or three constraints) KDB-trees are strong competitors.
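The level-by-level cost model for KDB-trees and R-trees can be sketched as follows. Several things here are assumptions rather than confirmed details of the paper: the summation is taken over all levels j = 1..h, the height is computed as h = 1 + ceil(log_{c·m}(N / (c·m))), the KDB-tree node side is (1/N_j)^(1/2), and the R-tree density recursion is D_j = (1 + (sqrt(D_{j-1}) - 1)/sqrt(c·m))^2. The function names are hypothetical; the parameter values are those quoted in Section 4.

```python
import math


def tree_height(n: int, m: int, c: float) -> int:
    """Assumed height formula, with effective fanout c * m."""
    fanout = c * m
    return 1 + math.ceil(math.log(n / fanout, fanout))


def kdb_cost(qx: float, qy: float, n: int, m: int, c: float) -> float:
    """Expected node accesses of a qx-by-qy window on a KDB-tree (point data)."""
    fanout = c * m
    total = 0.0
    for j in range(1, tree_height(n, m, c) + 1):
        nodes = n / fanout ** j          # expected number of nodes at level j
        side = math.sqrt(1.0 / nodes)    # non-overlapping nodes tile the unit square
        total += nodes * (side + qx) * (side + qy)
    return total


def rtree_cost(qx: float, qy: float, n: int, density: float, m: int, c: float) -> float:
    """Expected node accesses of a qx-by-qy window on an R-tree."""
    fanout = c * m
    nodes, dens = float(n), density      # level 0: the data MBRs themselves
    total = 0.0
    for _ in range(tree_height(n, m, c)):
        dens = (1.0 + (math.sqrt(dens) - 1.0) / math.sqrt(fanout)) ** 2
        nodes = nodes / fanout
        side = math.sqrt(dens / nodes)   # average node extent per axis
        total += nodes * (side + qx) * (side + qy)
    return total


# Parameters from the experimental section (m = 84 for KDB-trees, 50 for R-trees).
point_query = kdb_cost(0.0, 0.0, 10_000, 84, 0.67)
small_window = rtree_cost(0.01, 0.01, 10_000, 0.3, 50, 0.67)
large_window = rtree_cost(0.10, 0.10, 10_000, 0.3, 50, 0.67)
```

One sanity check on the assumed summation range: for a zero-size window on a KDB-tree every level contributes exactly one node access, so the estimate equals the tree height, which is what a point lookup in a non-overlapping partition should cost.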

4. Performance comparison

In the previous section we have argued that the performance of each indexing method significantly depends on the particular range. In this section we present several experimental results that justify our argument. In order to experimentally quantify the performance, we created tree structures by inserting 10 000 randomly generated MBRs. We tested three data files:
• the first file contains small MBRs: the average area of each rectangle is 0.003% of the global area (i.e., D = 0.3),
• the second file contains medium MBRs: the average area of each rectangle is 0.25% of the global area (i.e., D = 25),
• the third file contains large MBRs: the average area of each rectangle is 5% of the global area (i.e., D = 500).

The search procedure used three query files for each data file containing 100 rectangles, also randomly generated, with similar size properties as the data rectangles. We used the same data/query files for the retrieval of direction relations with B-trees, KDB-trees and R-trees. The performance of each data structure per relation (number of disk accesses for each data/query size combination) is illustrated in Fig. 6. As a first conclusion we notice the way that the data and query size affect the performance of the data structures. The performance of B-trees and KDB-trees depends significantly on the query size, while for R-trees the data size is the main factor responsible for the high or low performance of the structure. The main conclusions that can be drawn from Fig. 6 are:
• The B-tree is the most efficient structure for 1-constraint relations only (e.g. all_north). When more range constraints (even on the same axis) are involved it is not competitive.
• The KDB-tree is the most competitive index when two constraints for the same representative point are involved (e.g. all_north_east). It is also very competitive when both representative points are involved and the corresponding query windows are very selective (e.g. all_strictly_north, some_strictly_north).
• The R-tree is the winner when four constraints are involved (some_strictly_north, some_north_east). However, when the data rectangles are large, the R-tree is not able to organise the data efficiently and, therefore, it usually loses against the B-tree or the KDB-tree even for relations involving many constraints.

From the above results, it is evident that there is no overall winner; the performance of the structures is affected by the number of constraints involved and the size of the data and query rectangles.

In order to evaluate the quality of the analytical models proposed in Section 3, we compared the expected and the experimental cost for the retrieval of the direction relations. For the analytical estimation of the B-tree, KDB-tree and R-tree indexing mechanisms we used the formulas of Eqs. 4, 9 and 13, respectively, and the following typical values: number of data N = 10 000, average node capacity c = 0.67, maximum number of entries in a node m = 126 or 84 or 50 (for B-trees, KDB-trees and R-trees, respectively). Table 6 lists the relative errors of the actual results compared to the predictions of the proposed analytical models for the various data/query size combinations. The estimations of the performance of all indexing methods are very close to the experimental results. In most cases, the relative error is lower than 15%. Only for large data and relations with large answer sets, such as all_north and all_north_east, may the relative error be higher. Such analytical models are useful to spatial query optimisers because they permit cost prediction. Since the most appropriate index for some relations (all_strictly_north, some_strictly_north etc.) is not known beforehand, but depends on the size of the data and query MBRs, the analytical models can be used to select the most suitable method.
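How an optimiser might use such estimates can be sketched in a few lines: compute the relative error of a prediction against a measurement, and pick the index with the lowest predicted cost. The function names and all numbers below are illustrative placeholders and are not results from the paper.

```python
from typing import Dict


def relative_error(estimated: float, measured: float) -> float:
    """Relative error of an analytical estimate against a measured cost."""
    return abs(estimated - measured) / measured


def cheapest_method(estimates: Dict[str, float]) -> str:
    """Name of the indexing method with the lowest estimated disk accesses."""
    return min(estimates, key=estimates.get)


# Hypothetical estimated costs (disk accesses) for one relation and one
# data/query size combination; in practice these would come from the cost models.
estimates = {"B-tree": 120.0, "KDB-tree": 45.0, "R-tree": 60.0}
choice = cheapest_method(estimates)
error = relative_error(estimated=45.0, measured=50.0)
```

As the text notes, what matters most to the optimiser is that the ordering of the methods is predicted correctly; the exact cost is of secondary importance.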

Fig. 6. Performance comparison (disk accesses per data/query size combination). Panels: all_north, some_north, all_strictly_north, some_strictly_north, all_north_east, some_north_east; horizontal axis: data/query size combinations (small, medium, large); vertical axis: disk accesses.

Fig. 7 illustrates the expected performance of the three data structures for the direction relations implemented. For the analytical estimations we used data and query MBRs with the size of each side varying from 0.001 up to 0.256 of the global unit space (i.e., the area of each rectangle varied from $0.001^2$ up to $0.256^2$ of the global area) and assumed that the data and query MBRs are of equal size, a property that usually appears in real geographic (and other) applications. According to Fig. 7, the choice of the most efficient method depends on the data and query size for three out of six direction relations (some_north, all_strictly_north and some_strictly_north). R-trees perform better for these relations when the object size is small but, beyond a certain threshold, KDB-trees (or B-trees for the some_north relation) are the best choice. The knowledge of such a threshold is very important for a spatial query optimiser. It is also important to notice that the ordering of the three data structures per relation is the same in the experimental and the analytical results of Figs. 6 and 7, respectively, since query optimisers primarily need to know the best solution and only secondarily the exact retrieval cost. In general, there is usually no overall winner, even for a specified direction relation. Therefore, we study modifications or extensions of the three data structures which could be more suitable for certain range queries. In the next sections we propose such extensions of B-, KDB- and R-trees.

Table 6
Relative error in estimating the performance of the tree structures

Direction relation      B-trees   KDB-trees   R-trees
all_north               0-25%     0-30%       0-25%
some_north              0-10%     0-10%       0-10%
all_strictly_north      0-20%     5-15%       5-30%
some_strictly_north     0-10%     5-15%       0-30%
all_north_east          0-25%     5-30%       0-20%
some_north_east         0-15%     0-15%       5-25%
Total average           5%        10%         15%

Fig. 7. Analytical performance comparison (disk accesses per data-query size). Panels: all_north, some_north, all_strictly_north, some_strictly_north, all_north_east, some_north_east; horizontal axis: MBR side length, from 0.001 to 0.256 of the unit space.
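The threshold mentioned above can be located by evaluating each cost model over increasing MBR sizes and recording where the cheapest structure changes. The following is a minimal sketch of that idea, not the procedure used in the paper: the names `best_structure` and `find_thresholds` are hypothetical, and the two linear cost curves are placeholders invented for illustration. They are not Eqs. 4, 9 and 13 and carry no experimental meaning.

```python
from typing import Callable, Dict, List, Tuple

# A cost model maps the MBR side length to an estimated number of disk accesses.
CostModel = Callable[[float], float]


def best_structure(models: Dict[str, CostModel], side: float) -> str:
    """Return the name of the structure with the lowest estimated cost."""
    return min(models, key=lambda name: models[name](side))


def find_thresholds(
    models: Dict[str, CostModel], sides: List[float]
) -> List[Tuple[float, str, str]]:
    """Scan increasing sizes and report (side, previous_best, new_best)
    at every size where the cheapest structure changes."""
    changes = []
    previous = None
    for side in sides:
        current = best_structure(models, side)
        if previous is not None and current != previous:
            changes.append((side, previous, current))
        previous = current
    return changes


# Placeholder curves (assumption): invented for illustration only.
toy_models: Dict[str, CostModel] = {
    "R-tree": lambda s: 20 + 600 * s,
    "KDB-tree": lambda s: 50 + 100 * s,
}

# Side lengths doubling from 0.001 to 0.256, as on the axis of Fig. 7.
sides = [0.001 * 2 ** i for i in range(9)]

changes = find_thresholds(toy_models, sides)
for side, old, new in changes:
    print(f"at side {side:.3f}: {old} -> {new}")
```

An optimiser could store such thresholds per direction relation and consult them at query time instead of re-evaluating every model.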

5. Extensions

One- and multi-dimensional data structures have been designed to efficiently answer some particular types of queries. For example, B-trees index alphanumeric data in order to answer match or range queries, KDB-trees are efficient for match or range queries on multi-dimensional points and R-trees were designed for overlap queries on multi-dimensional MBRs. However, direction relations between spatial objects in 2D space do not follow directly one of these categories and, therefore, in Section 3 we presented the appropriate procedures in order to use one of these ‘classic’ data structures. In this section we propose extensions of the three data structures that facilitate efficiency for some 2D range queries.

5.1. Extensions of B-trees

In the case of B-trees, we propose schemes for the maintenance of information regarding the MBR extents in the index; we call the proposed structures composite B-trees. As shown in Section 3, the number of B-trees to be searched is equal to the number of constraints that compose the query. This inconvenience may be overcome if the B-tree accommodates additional information regarding the MBRs in its leaf nodes. That is, a composite key may be maintained instead of a simple numerical value (i.e., the coordinate value of one MBR corner). This composite key consists of the coordinate values of the two MBR corners in one axis (x- or y-axis) or even of all four coordinates of the two MBR corners. One of these coordinates is the primary component, and the B-tree is built on its value. The rest of the coordinates are used for the elimination of irrelevant MBRs.

The key of the composite B-tree considered here consists of two values that represent the lower and upper coordinates of each MBR in one of the two axes. One of these values is the primary component. This scheme reduces each MBR into two line segments, which represent its extents over the two dimensions of the space and are indexed separately. Clearly, the efficient retrieval of direction relations can be achieved when four composite B-trees are maintained: a $B_{l-x}$-tree that keeps $p'_{l-x}$ as the primary component and $p'_{u-x}$ as the secondary component and, similarly, a $B_{u-x}$-tree, a $B_{l-y}$-tree and a $B_{u-y}$-tree. Some relations imply the search of only one B-tree (such as all_north), while the rest imply exactly two searches (i.e., one for each axis, such as all_north_east).

In addition, provided that the information regarding the MBRs’ extents over each axis is maintained in two indices, the approach to satisfy a query is not unique. For instance, relation some_north requires no index to be searched for the x-axis, but it may be satisfied by searching either a $B_{l-y}$-tree (condition to be fulfilled: $q'_{l-y} < p'_{l-y} < q'_{u-y}$; refinement condition: $p'_{u-y} > q'_{u-y}$) or a $B_{u-y}$-tree (condition to be fulfilled: $p'_{u-y} > q'_{u-y}$; refinement condition: $q'_{l-y} < p'_{l-y} < q'_{u-y}$).

Fig. 8. Performance comparison of the classic and the composite B-tree: (a) small data / query; (b) medium data / query; (c) large data / query.
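The two alternative search plans for some_north can be illustrated with a small sketch. It is written under stated assumptions and is not the authors' implementation: a sorted Python list searched with `bisect` stands in for a composite B-tree, the names `MBR`, `CompositeIndex`, `some_north_via_lower` and `some_north_via_upper` are hypothetical, and the conditions are taken to be "lower y of p strictly between the lower and upper y of q" and "upper y of p above the upper y of q".

```python
from bisect import bisect_left, bisect_right
from typing import List, NamedTuple, Tuple


class MBR(NamedTuple):
    oid: int
    lx: float
    ly: float
    ux: float
    uy: float


class CompositeIndex:
    """Sorted list standing in for a composite B-tree (assumption).

    Each entry is (primary, secondary, oid); only the primary
    component is used for searching, the secondary for refinement."""

    def __init__(self, entries: List[Tuple[float, float, int]]):
        self.entries = sorted(entries)
        self.keys = [e[0] for e in self.entries]

    def open_range(self, low: float, high: float):
        """Entries with low < primary < high."""
        start = bisect_right(self.keys, low)
        end = bisect_left(self.keys, high)
        return self.entries[start:end]

    def greater_than(self, low: float):
        """Entries with primary > low."""
        return self.entries[bisect_right(self.keys, low):]


def some_north_via_lower(index_ly: CompositeIndex, q: MBR) -> List[int]:
    # Search on the lower y coordinate, refine on the upper y coordinate.
    hits = index_ly.open_range(q.ly, q.uy)
    return sorted(oid for ly, uy, oid in hits if uy > q.uy)


def some_north_via_upper(index_uy: CompositeIndex, q: MBR) -> List[int]:
    # Search on the upper y coordinate, refine on the lower y coordinate.
    hits = index_uy.greater_than(q.uy)
    return sorted(oid for uy, ly, oid in hits if q.ly < ly < q.uy)


data = [
    MBR(1, 0.0, 0.5, 1.0, 1.5),   # qualifies
    MBR(2, 0.0, 1.2, 1.0, 2.0),   # lower y above the query: fails
    MBR(3, 0.0, 0.2, 1.0, 0.8),   # upper y inside the query: fails
    MBR(4, 0.0, -0.5, 1.0, 1.5),  # lower y below the query: fails
    MBR(5, 2.0, 0.9, 3.0, 1.1),   # qualifies, x extent is irrelevant
]
index_ly = CompositeIndex([(p.ly, p.uy, p.oid) for p in data])
index_uy = CompositeIndex([(p.uy, p.ly, p.oid) for p in data])
query = MBR(0, 0.0, 0.0, 1.0, 1.0)

print(some_north_via_lower(index_ly, query))
print(some_north_via_upper(index_uy, query))
```

Both plans return the same answer set; they differ only in how many entries the primary search has to scan before refinement, which is why the choice of tree depends on the query window and the data distribution.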

Compared to the other methods, the composite B-tree is the most efficient one for large data when more than one constraint is involved (e.g., some_north and some_strictly_north for large data files), since R-trees are unable to index such data efficiently. It is, however, sensitive to query size: large query windows are not handled efficiently by composite B-trees [37].

5.2. Extensions of KDB-trees

A similar approach can be considered for KDB-trees. KDB-trees handle two-dimensional data efficiently when the search procedure involves one of the two representative points of the MBRs. However, most of the direction relations involve both points and, as a consequence, the intersection of two answer sets should be computed (Step 3 in Section 3.2). The determination of both the answer sets and their intersection can be a highly time-consuming procedure when dealing with large databases. We propose the maintenance of the opposite MBR corner, along with the corner on which indexing is based, in the leaf nodes of a composite KDB-tree. The additional point serves for the fast elimination of irrelevant MBRs. Clearly, the efficient retrieval of direction relations can be achieved when two composite KDB-trees are maintained: a $KDB_l$-tree, where indexing is based on the lower-left corners of the MBRs, and a $KDB_u$-tree, where indexing is based on the upper-right corners of the MBRs.

A range relation query may be satisfied by searching one of the two composite KDB-trees. As with the composite B-tree, the selection of the most effective tree is not trivial and should consider: (a) the direction relation involved; (b) the query window size and position; and (c) the distribution of MBRs over the work space. The three schemes of the composite B-tree can also be applied for the selection of the most efficient composite KDB-tree, provided that linear ranges are substituted by area ranges. We have implemented and tested the three schemes of composite KDB-trees and again found the third one to be the most efficient solution. A comparison of the classic KDB-tree and the third scheme is illustrated in Fig. 9. As a general conclusion, the composite KDB-tree outperforms the classic KDB-tree in most cases. The opposite happens only for the all_north and all_north_east relations, where only one index is accessed (see Table 4). Compared to the other methods, the composite KDB-tree is the most efficient one for the some_north relation and small or medium data, where the corresponding query window is too selective, but it is inefficient for the remaining relations.

Fig. 9. Performance comparison of the classic and the composite KDB-tree: (a) small data / query; (b) medium data / query; (c) large data / query.

5.3. Extensions of R-trees

R-trees handle two-dimensional data efficiently when the search procedure involves both axes of the work space. However, several direction relations, such as all_north and some_north, involve search on only one axis. In such cases, the information regarding the other axis, which is maintained in the two-dimensional R-tree, is useless. Clearly, a 1D R-tree (i.e., a segment tree), which is an index of the MBR extents along the axis of interest, would be more efficient than the 2D R-tree because it is more compact (tree nodes accommodate a larger number of entries, since two instead of four coordinates are stored per MBR) and more effective (the MBR extents along the other axis do not affect the maintenance of the index). For the retrieval of direction relations that involve both axes, two 1D R-trees are needed to index the MBR extents over each axis separately. Each index provides a set of MBRs, and the intersection of the two sets composes the qualified set. In such cases, though, the 2D R-tree is expected to be more efficient.

We have implemented and tested the 1D R-tree in comparison to the classic 2D version. The results are illustrated in Fig. 10. According to Fig. 10, the 1D R-tree outperforms the 2D R-tree only for the all_north and some_north relations. In the other cases, where two 1D indexes need to be searched, it is inefficient. Some_north is the only relation where the 1D R-tree is the overall winner, provided that data MBRs are not large.

Fig. 10. Performance comparison of the 2D R-tree and the 1D R-tree: (a) small data / query; (b) medium data / query; (c) large data / query.
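The compactness argument for the 1D R-tree can be checked with simple arithmetic. The sketch below assumes 4-byte coordinates, 4-byte pointers and 1008 usable bytes per node. These sizes are not stated in the text; they are assumptions chosen only because they reproduce the node capacities m = 126, 84 and 50 quoted for the analytical estimates. The function names `max_entries` and `leaf_nodes` are hypothetical.

```python
import math


def max_entries(coords_per_entry: int, payload_bytes: int = 1008,
                coord_bytes: int = 4, pointer_bytes: int = 4) -> int:
    """Maximum number of entries per node for a given entry layout.

    All byte sizes are assumptions, not values given in the paper."""
    entry_bytes = coords_per_entry * coord_bytes + pointer_bytes
    return payload_bytes // entry_bytes


def leaf_nodes(n_objects: int, m: int, fill: float = 0.67) -> int:
    """Approximate number of leaf nodes for n_objects entries when
    nodes are, on average, filled to the fraction `fill` of capacity m."""
    return math.ceil(n_objects / (fill * m))


m_btree = max_entries(1)   # one coordinate per entry
m_kdb = max_entries(2)     # one 2D point per entry
m_rtree_2d = max_entries(4)  # two 2D corners per entry
m_rtree_1d = max_entries(2)  # one 1D extent (lower, upper) per entry

print(m_btree, m_kdb, m_rtree_2d, m_rtree_1d)
print(leaf_nodes(10_000, m_rtree_2d), leaf_nodes(10_000, m_rtree_1d))
```

Under these assumptions a 1D R-tree node holds 84 entries instead of 50, so the same 10 000 extents occupy noticeably fewer leaf nodes than the corresponding 2D MBRs.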

6. Conclusion

This paper describes implementations of direction relations for spatial database systems. Direction relations constitute a special type of query for spatial access methods, which so far have been concerned with traditional window queries, topological relations and nearest neighbour queries. Although direction queries are as important as these, they have not been extensively implemented. In this work we defined direction relations between points and used these definitions as a basis for relations between objects. For the purposes of the paper we used a set of six object relations based on the north direction, but a large number of additional ones can be defined depending on the application domain. Then we showed how these relations can be transformed to range queries and retrieved in existing DBMS using three alternative indexing techniques: B-, KDB- and R-trees. We also provided analytical formulas for the expected performance which, in most cases, proved to be very accurate for all the indexing methods (relative error usually lower than 15%), a fact that renders the derived formulas suitable for query optimisers.

The main conclusion that arises from the experimental and analytical comparison of the alternative indexing techniques is that no single data structure performs best for all queries; the performance depends on the following factors:
- the number and the type of range constraints involved in the definition of the direction relations of interest,
- the data size (i.e., the size of the primary MBRs), and
- the query size (i.e., the size of the reference MBRs).

Besides the classic implementations of the three data structures, appropriate extensions of them (namely composite B-trees, composite KDB-trees and 1D R-trees, respectively) were also implemented in order to facilitate efficiency for some types of queries. The guidelines to a spatial query optimiser that should choose the most suitable indexing method for a specific input query are summarised in Table 7. These guidelines can be extended to include other spatial data structures (e.g., Grid files) and other direction relations. We are currently working on the conjunction of the above guidelines with other data structures, which adopt different techniques for organising the spatial objects’ approximations [34]. Progress can also be achieved in specialised data structures for queries that involve direction relations or combinations of several types of spatial information (e.g., ‘find the k-nearest land parcels northeast of a lake’).

Table 7
Decision rules for the efficient retrieval of direction relations

Columns: B-trees (Classic, Comp.); KDB-trees (Classic, Comp.); R-trees (2D, 1D)
Rows (direction relation as a set of range constraints):
- 1 constraint on one axis (e.g., all_north)
- 2 constraints on one axis (e.g., some_north)
- 1 constraint on the first axis PLUS 1 constraint on the second axis (e.g., all_north_east)
- 1 constraint on the first axis PLUS 2 constraints on the second axis (e.g., all_strictly_north)
- 2 constraints on the first axis PLUS 2 constraints on the second axis (e.g., some_strictly_north, some_north_east)
a May not be efficient for large data.
b May not be efficient for large queries.

Acknowledgements

Yannis Theodoridis, Emmanuel Stefanakis and Timos Sellis were partially supported by the European Commission under the TMR project ‘CHOROCHRONOS: A Research Network for Spatiotemporal Database Systems’. Emmanuel Stefanakis and Timos Sellis were also supported by the Department of Research and Technology of Greece (YPER’94). Dimitris Papadias was supported by a DAG Grant from Hong Kong Research Grants Council.

References

[1] D.S. Batory, B+ trees and indexed sequential files: A performance comparison, Proc. ACM SIGMOD Intl. Conf. on Management of Data, 1981.
[2] N. Beckmann, H.P. Kriegel, R. Schneider, B. Seeger, The R*-tree: An efficient and robust access method for points and rectangles, Proc. ACM SIGMOD Conf. Management of Data, 1990.
[3] T. Brinkhoff, H.P. Kriegel, R. Schneider, Comparison of approximations of complex objects used for approximation-based query processing in spatial database systems, Proc. 9th IEEE Intl. Conf. on Data Engineering, 1993.
[4] A.G. Cohn, The challenge of qualitative spatial reasoning, ACM Computing Surveys 27(3) (1995) 323-325.
[5] D. Comer, The ubiquitous B-Tree, ACM Computing Surveys 11(2) (1979) 121-137.
[6] Z. Cui, A.G. Cohn, D.A. Randell, Qualitative and topological relationships in spatial databases, Proc. of the 3rd Intl. Symp. on Large Spatial Databases (SSD), 1993.
[7] E. Clementini, J. Sharma, M. Egenhofer, Modeling topological spatial relations: Strategies for query processing, Intl. Journal of Computer and Graphics 18(6) (1994) 815-822.
[8] M. Egenhofer, Reasoning about binary topological relations, Proc. of the 2nd Intl. Symp. on Large Spatial Databases (SSD), 1991.
[9] M. Egenhofer, J. Sharma, Assessing the consistency of complete and incomplete topological information, Geographical Systems 1(1) (1993) 47-68.
[10] C. Faloutsos, T. Sellis, N. Roussopoulos, Analysis of object oriented spatial access methods, Proc. of ACM SIGMOD Intl. Conf. on Management of Data, 1987.
[11] C. Faloutsos, I. Kamel, Beyond uniformity and independence: Analysis of R-trees using the concept of fractal dimension, Proc. of the 13th ACM SIGACT-SIGMOD-SIGART Symp. on Principles of Database Systems (PODS), 1994.
[12] A.U. Frank, MAPQUERY: Database query language for retrieval of geometric data and its graphical representation, ACM SIGGRAPH 16(3) (1982) 199-207.
[13] A.U. Frank, Qualitative spatial reasoning: Cardinal directions as an example, Intl. Journal of Geographic Information Systems 10(3) (1996) 269-290.
[14] M. Grigni, D. Papadias, C. Papadimitriou, Topological inference, Proc. of the Intl. Joint Conf. of Artificial Intelligence, 1995.
[15] A. Guttman, R-trees: A dynamic index structure for spatial searching, Proc. of ACM SIGMOD Intl. Conf. on Management of Data, 1984.
[16] D. Hernandez, Qualitative Representation of Spatial Knowledge (Springer Verlag LNAI, 1994).
[17] D. Hernandez, Qualitative distances, Proc. of the 2nd Intl. Conf. on Spatial Information Theory (COSIT), 1995.
[18] A. Herskovits, Language and Spatial Cognition (Cambridge University Press, 1986).
[19] P.D. Holmes, E. Jungert, Symbolic and geometric connectivity graph methods for route planning in digitized maps, IEEE Trans. on Pattern Analysis and Machine Intelligence 14(5) (1992) 549-565.
[20] D. Knuth, The Art of Computer Programming, Vol. 3, Sorting and Searching (Addison-Wesley, 1973).
[21] S.-Y. Lee, M.-C. Yang, J.-W. Chen, Signature file as a spatial filter for iconic image database, Journal of Visual Languages and Computing 3 (1992) 373-397.
[22] J. Orenstein, Spatial query processing in an object-oriented database system, Proc. of ACM SIGMOD Intl. Conf. on Management of Data, 1986.
[23] B. Pagel, H. Six, H. Toben, P. Widmayer, Towards an analysis of range query performance, Proc. of the 12th ACM SIGACT-SIGMOD-SIGART Symp. on Principles of Database Systems (PODS), 1993.
[24] D. Papadias, T. Sellis, Qualitative representation of spatial knowledge in two-dimensional space, Very Large Data Bases Journal 3(4) (1994) 479-516.
[25] D. Papadias, T. Sellis, A pictorial query-by-example language, Journal of Visual Languages and Computing 6(1) (1995) 53-72.
[26] D. Papadias, Y. Theodoridis, T. Sellis, M. Egenhofer, Topological relations in the world of minimum bounding rectangles: A study with R-trees, Proc. of ACM SIGMOD Intl. Conf. on Management of Data, 1995.
[27] D. Papadias, M.J. Egenhofer, J. Sharma, Hierarchical reasoning about direction relations, Proc. of the 4th ACM Workshop on GIS, 1996.
[28] D. Papadias, Y. Theodoridis, Spatial relations, minimum bounding rectangles and spatial data structures, Intl. Journal of Geographical Information Science 11(2) (1997) 111-138.
[29] D. Peuquet, Z. Ci-Xiang, An algorithm to determine the directional relationship between arbitrarily shaped polygons in the plane, Pattern Recognition 20(1) (1987) 65-74.
[30] J.T. Robinson, The K-D-B-Tree: A search structure for large multidimensional dynamic indexes, Proc. of ACM SIGMOD Intl. Conf. on Management of Data, 1981.
[31] N. Roussopoulos, C. Faloutsos, T. Sellis, An efficient pictorial database system for PSQL, IEEE Trans. on Software Engineering 14(5) (1988) 639-650.
[32] N. Roussopoulos, F. Kelley, F. Vincent, Nearest neighbor queries, Proc. of ACM SIGMOD Intl. Conf. on Management of Data, 1995.
[33] T. Sellis, N. Roussopoulos, C. Faloutsos, The R+-tree: A dynamic index for multi-dimensional objects, Proc. of the Intl. Conf. on Very Large Data Bases (VLDB), 1987.
[34] E. Stefanakis, Y. Theodoridis, T. Sellis, Y.-C. Lee, Point representation of spatial objects and query window extension: A new technique for spatial access methods, Intl. Journal of Geographical Information Science 11(6) (1997) 529-554.
[35] Y. Theodoridis, D. Papadias, Range queries involving spatial relations: A performance analysis, Proc. of the 2nd Intl. Conf. on Spatial Information Theory (COSIT), 1995.
[36] Y. Theodoridis, T. Sellis, A model for the prediction of R-tree performance, Proc. of the 15th ACM SIGACT-SIGMOD-SIGART Symp. on Principles of Database Systems (PODS), 1996.
[37] Y. Theodoridis, D. Papadias, E. Stefanakis, Supporting direction relations in spatial database systems, Proc. of the 7th Intl. Symp. on Spatial Data Handling (SDH), 1996.

Yannis Theodoridis received his Diploma Degree in Electrical Engineering in 1990 and his Ph.D. in Electrical and Computer Engineering in 1996, both from the National Technical University of Athens, Greece. Since 1996, he has been a Research Engineer in the Knowledge and Database Systems Laboratory, National Technical University of Athens, Greece. His research interests include spatial and spatiotemporal database systems, multimedia database systems, and geographical information systems. Dr. Theodoridis has published several articles in refereed journals and international conferences in the above areas. Dr. Theodoridis is a member of IEEE and ACM.

Dimitris Papadias is an Assistant Professor at the Computer Science Dept. of the Hong Kong University of Science and Technology. Before joining HKUST he held research and teaching positions at the Institute for Intelligent Robotics and Information Systems (Canada), the Electrical and Computer Engineering Dept., National Technical University of Athens (Greece), the Department of Geoinformation, Technical University of Vienna (Austria), the Department of Computer Science and Engineering, University of California at San Diego, the National Center for Geographic Information and Analysis, University of Maine (USA), and the Artificial Intelligence Research Division, German National Research Center for Information Technology (GMD).


Emmanuel Stefanakis was born in Athens, Greece, in 1969. He received his Ph.D. degree in Computer Science, Department of Electrical and Computer Engineering, National Technical University of Athens, Greece, 1997, M.Sc.E. degree, Department of Geodesy and Geomatics Engineering, University of New Brunswick, Canada, 1994, and Dipl.Eng. degree (with distinction), Department of Rural and Surveying Engineering, National Technical University of Athens, Greece, 1992. He has received several scholarships and awards in Greece and Canada and has several publications in international journals, books and conferences. Since 1991 he has been involved in several research projects dealing with database and GIS applications. His research interests include Database Systems, Geographic Information Systems and Digital Mapping.

Timos Sellis received his Diploma Degree in Electrical Engineering in 1982 from the National Technical University of Athens, Greece. In 1983 he received his M.Sc. degree from Harvard University and in 1986 his Ph.D. from the University of California at Berkeley, where he was a member of the INGRES group, both in Computer Science. In 1986, he joined the Department of Computer Science of the University of Maryland, College Park, as an Assistant Professor in 1992. Between 1992 and 1996 he was an Associate Professor at the Computer Science Division of the National Technical University of Athens, in Athens, Greece, where he is currently a Full Professor. His research interests include extended relational database systems, active database systems, and spatial, image and multimedia database systems. Prof. Sellis has published over 60 articles in refereed journals and international conferences in the above areas. He is a recipient of a Presidential Young Investigator (PYI) award for 1990-1995. He is a member of the editorial boards of the International Journal on Intelligent Information Systems, and GeoInformatica. Prof. Sellis is a member of IEEE and ACM.