ARTICLE IN PRESS Information Systems 35 (2010) 592–614
Contents lists available at ScienceDirect
Information Systems journal homepage: www.elsevier.com/locate/infosys
Modelling and querying geographical data warehouses
Joel da Silva, Anjolina G. de Oliveira, Robson N. Fidalgo, Ana Carolina Salgado, Valéria C. Times
Center for Informatics, Federal University of Pernambuco, P.O. Box 7851, Recife-PE 50.732-970, Brazil
Article info
Keywords: SOLAP Geographical data warehouse Geographical and Multidimensional Query Language (GeoMDQL)
Abstract A number of proposals for integrating geographical (Geographical Information Systems, GIS) and multidimensional (data warehouse, DW; online analytical processing, OLAP) processing are found in the database literature. However, most of the current approaches do not provide a GDW (geographical data warehouse) metamodel or a query language that allows multidimensional and spatial operators to be specified together. To address this, this paper discusses the UML class diagram of a GDW metamodel and proposes its formal specifications. We then present a formal metamodel for a geographical data cube and propose the Geographical Multidimensional Query Language (GeoMDQL) as well. GeoMDQL is based on well-known standards such as the MultiDimensional eXpressions (MDX) language and the OGC simple features specification for SQL, and has been specifically defined for spatial OLAP environments based on a GDW. We also present the GeoMDQL syntax and a discussion regarding the taxonomy of GeoMDQL query types. Additionally, aspects related to the GeoMDQL architecture implementation are described, along with a case study involving the Brazilian public healthcare system in order to illustrate the proposed query language. © 2009 Elsevier B.V. All rights reserved.
1. Introduction Support to the decision-making process may involve the use of technologies, such as DW (data warehouse) [25], OLAP (on-line analytical processing) [21,3,49] and GIS (Geographical Information Systems) [52,8]. DW is a typical database for supporting decision-making activities that is usually implemented with the star model, which is organized through fact tables and dimension tables. OLAP is a specific software category for multidimensional processing of data extracted from the DW and can be interpreted from different perspectives and with different degrees of details. GIS are specific systems for supporting geographical decisions that acquire, manipulate, visualize
Funded by BZG.
Corresponding author. Tel.: +55 5537448900.
0306-4379/$ - see front matter © 2009 Elsevier B.V. All rights reserved. doi:10.1016/j.is.2009.10.005
and analyze spatial objects. In recent years, a number of researchers from the Information Technology community have thoroughly investigated the problem of integrating these technologies [1,28,27,42,11,43]. This new type of integrated environment has been referred to as SOLAP (spatial OLAP) [42] and benefits the strategic decision-making process by broadening the analysis universe regarding the business of an organization. As these technologies were originally conceived for different purposes, their integration is anything but trivial and, as a result, no consensus has yet been reached regarding the most appropriate way to do so. Nevertheless, conventional and geographical data must be integrated in a single database, which corresponds to a GDW (geographical data warehouse). According to Fidalgo et al. [13], a GDW can be defined as an extension of the traditional DW approach by adding a geographical component. Basically, this consists of extending the star model through the inclusion of geographical properties (including descriptive and geometrical data), which are defined as
GDW dimensions and/or measures. The dimensions can store the geometries and descriptions of the geographical objects, whereas spatial measures can only store geometries. A GDW must maintain the traditional characteristics of a DW [25], i.e. it must be subject-oriented, integrated, non-volatile and time-variant. Furthermore, a GDW must support storage, indexing structures, aggregation functions and georeferenced analysis in maps or tables [13,44]. This paper proposes a metamodel definition that supports the specification of GDW schemas and avoids spatial data redundancy by normalizing geometrical data. This metamodel extends the GeoDWFrame (Geographical DW Framework) definitions [13] by taking into account spatial measures. Moreover, it provides a means of storing the descriptions of the location of spatial objects and normalizing the geometrical data referring to these objects in order to improve query response time and storage requirements in a GDW [45,46]. We then define a geographical data cube and propose a query language, named GeoMDQL (Geographical and Multidimensional Query Language), which has an integrated syntax for the simultaneous use of multidimensional and spatial operators and for retrieving data stored in a GDW based on our proposed metamodel. An example of the application of this query language is also given to provide comparisons between GeoMDQL and other related approaches as well as to illustrate advantages that might be obtained through the use of our SOLAP query language proposal. This paper is organized as follows. Section 2 gives an overview of some existing approaches for GDW modelling and developing a query language to retrieve geographical or multidimensional data. Section 3 illustrates the main issues concerning our GDW metamodel by describing a running example. Section 4 presents the definitions for our GDW metamodel. Our geographical data cube specifications are given in Section 5. 
Section 6 discusses the query language proposal, including a description of the GeoMDQL grammar and a discussion regarding the taxonomy of GeoMDQL query types. Section 7 shows our implementation results by illustrating the application of our metamodels and GeoMDQL to real data and performing a comparative analysis between GeoMDQL and other query languages. Section 8 outlines our final considerations on the work reported in this paper and points out some important issues for future work. 2. Related work A data model is an important component of a decision support application. The query language chosen for assisting in the development of such an application is also crucial. In this section, some relevant studies related to GDW modelling as well as geographical and multidimensional query languages are discussed. 2.1. Metamodels for geographical data warehouses A GDW model represents dimensions, measures and geographical types and how they can be arranged to obtain a GDW schema. Some GDW models are discussed below.
Stefanovic et al. [20] define three types of dimensions based on the members of spatial references of a hierarchy: non-spatial, spatial-to-non-spatial and totally spatial. The authors also differentiate numerical and spatial measures. A method for constructing and materializing spatial cubes as well as the efficiency and efficacy of algorithms for materializing cubes are also investigated. Rivest et al. [42] discuss the importance of new tools that enable exploring the potential of the spatial and temporal dimensions of a GDW in a spatial–temporal analysis process. The authors have extended the previous definition of spatial measures to address spatial measures that are computed by metrical or topological operators. Furthermore, the authors stress the need for more advanced querying capabilities to provide end users with topological and metrical operators. Another important study was carried out by Pedersen and Tryfona [37], who present a formal definition of a data model for a GDW given at the conceptual level. This model is based on the work performed by Han et al. [20], which considers spatial measures as a collection of geometries, whereas dimensions are always seen as non-spatial. However, this data model has limited expressiveness due to the restricted number of spatial object types that are taken into account. More recently, Malinowski and Zimanyi [29] have presented a GDW data model based on the MADS spatial–temporal conceptual model [35], which is, in turn, an object-based entity-relationship approach (ERC+) [47] that uses pictograms to represent geographical object properties and topological relationships. In this model, a dimension may contain one or more related levels to represent a geometrical attribute. Moreover, the fact table is represented as an existing n-ary relationship among these dimensions. 
The fact table attributes consist of measures and a spatial measure is seen as a geometry or function that computes a geometrical property, such as the length or surface of a spatial object. Further details on this metamodel formalization can be found in [47]. In [7], the MuSD (multigranular spatial data warehouse) data model is proposed, which is based on the concepts of spatial fact, spatial dimension and multi-level spatial measure. According to this model, a spatial fact registers a geographical event that is relevant for analysis. A spatial dimension describes the geometrical properties of facts with geographical meaning, while a multi-level spatial measure registers multiple geometries with different degrees of detail (e.g. an accident may be represented by a point along a road, a road segment or the whole road). Thus, the innovation of this model is the representation of spatial measures according to multiple levels of geometrical granularity. Zghal et al. [53] propose a metamodel and the CASME (Computer Aided Spatial Mart Engineering) tool. The metamodel is based on the UML class diagram and also addresses spatial dimensions, spatial hierarchies and spatial measures. Four types of dimensions are identified: non-geometrical spatial dimension, geometrical to non-geometrical spatial dimension, entirely geometrical dimension and temporal dimension. In this model, a measurement is numerical when it contains only
numerical data and spatial when it corresponds to a collection of pointers to spatial objects. Our GDW data model proposal intends to: (1) be formal, in order to ensure a concise model free of inconsistencies; (2) make use of a standard language for modelling, such as UML, which is unambiguous and easy to understand; (3) be based on standards from DW and GIS fields, such as CWM (Common Warehouse Metamodel) [31] for DW and those defined by OGC [5] for GIS so that it can be easily extended and reused; and (4) make use of pictograms to improve the expressiveness of the model. As shown in this section, there are a number of GDW modelling approaches, but none considers all the points discussed above. Despite employing pictograms, the work described in [29] is based on the ERC+ data model, which is neither a standard nor as widely used as an ER schema or UML class diagram. The metamodel outlined in [29] does not take into account the use of reference standards. The same is true for the work discussed in [53]. With regard to the results given by Zghal et al. [53], the modelling language is based on UML, but pictograms are not used to enhance the expressiveness of the GDW schema. Table 1 shows the data model approaches discussed in this section classified according to the issues intended to be covered by our proposed data model. 2.2. Languages for spatial and multidimensional querying MDX (multidimensional expressions) [51] is one of the most important approaches to querying multidimensional data. By using MDX, users can perform many complex queries in a multidimensional data cube, making available configurable data that cross different perspectives and aggregation levels by using multidimensional operators. Although similar to traditional SQL, MDX is not an SQL extension, but a special query language with a large number of analytical functions (e.g. drill-down, drill-up and slice/dice) and has been optimized for querying multidimensional data. 
Currently, MDX is supported by most OLAP vendors. The ISO SQL OLAP [16] specification was published in the year 2000 and is an improvement over the traditional SQL query language. ISO SQL OLAP offers a large number of functions for querying multidimensional databases by providing OLAP operators and is also supported by some existing OLAP suppliers. The MD-CAL (multidimensional calculus) [2] is another work that aims to provide a multidimensional query language. MD-CAL performs calculus operations from a fact table in a multidimensional data source, supporting high-level analysis on multidimensional data and allowing
the use of built-in scalar and aggregated functions in its expressions. Another approach for multidimensional querying is the data cube [19], which provides support for performing multidimensional data grouping, subtotals, cross-tabulation, drill-up and drill-down operators. The SQL-M approach [36] is composed of a data model, a formal algebra and a multidimensional query language for multidimensional data analysis. The authors of SQL-M stress that one of the greatest advantages of their work is the capability of manipulating complex and irregular hierarchies. Some studies have been developed in the research area of geographical query languages as well. The spatial SQL proposed by Egenhofer [9] is composed of two modules: (1) a query language and (2) a presentation language. The first module is based on the traditional SQL and preserves the SELECT-FROM-WHERE clause, whereas the presentation language, denominated GPL (Graphical Presentation Language), allows users to customize how spatial objects are presented. Another relevant work is GeoSQL [12], which is also based on the traditional SQL and is similar to the spatial SQL. The GeoSQL query language has been implemented as a module of an object-oriented GIS prototype denominated YH-GIS. In a query expression, non-spatial constraints are expressed using logical and comparison operators, similarly to the SQL WHERE clause. Spatial constraints are expressed using logical expressions and spatial predicates, which are based on the relationships found among the geographical features. SQL/SDA [26] extends the traditional SQL for spatial analysis and is based on the OGC SFS 4SQL (OGC simple features specification for SQL) specification [5]. It therefore offers a large number of spatial functions for the management and querying of geographical features. A graphical user interface written in Java provides icons that represent the most common spatial functions in order to help users in expressing their queries. 
Based on the OGC SFS 4SQL specification, a visual query language for spatial databases is proposed in [30]. The method chosen for this approach is the translation from the queries expressed in flow diagrams to an SQL-based spatial extension. Moreover, an improved graphical user interface has been developed for users to express their queries. One of the most important proposals related to spatial query languages is the ISO SQL MM [22]. This specification is an effort to include spatial processing functionalities in the traditional SQL and is also based on the OGC SFS 4SQL, resulting in the provision of a large number of functions for the management, storage, analysis and retrieval of geographical features.
Table 1. A comparative analysis among GDW data models.

Related work               Formal definitions   Standard modelling languages   DW/GIS standards   Pictograms
Han et al.                 No                   No                             No                 No
Rivest et al.              No                   No                             No                 No
Pedersen and Tryfona       Yes                  No                             No                 No
Malinowski and Zimanyi     No                   No                             No                 Yes
Damiani and Spaccapietra   Yes                  No                             GIS (OGC)          No
Zghal et al.               No                   Yes                            No                 No
As we can see, none of the previously outlined papers provides a query language with capabilities to integrate both multidimensional and spatial operators in a single syntax. The work that could be compared with ours [39] proposes an object-oriented geographical database, which has been extended to support links to analytical data stored in a multidimensional cube. Thus, starting from a spatial query result, it is possible to retrieve the multidimensional data related to these links. However, a query language that fully integrates multidimensional and spatial operators is not discussed. Moreover, the geographical database and the multidimensional data sources have not been fully integrated, as no geographical DW has yet been used. Our proposal has three major advantages over the language approaches discussed above: (1) it integrates multidimensional and spatial operators in a single syntax; (2) it proposes a set of new specially designed operators; and (3) it includes a graphical interface that allows users to pose queries and visualize the answers. 3. A GDW metamodel In order to facilitate the understanding of our GDW metamodel concepts and illustrate its application, this section provides a detailed description of its class diagram and a discussion on a GDW schema given as a running example.
3.1. The metamodel class diagram Our metamodel has been specified using OCL (Object Constraint Language) restrictions [33] and UML (Unified Modelling Language) [32] so that its specification is easy to understand. This metamodel is based on the relational package of CWM (Common Warehouse Metamodel) [31] and SFS (simple features specification) for SQL of OGC [5] in order to facilitate its usage and extension in other studies. It defines how the concepts (e.g. measures and conventional and geographical dimensions) of a GDW schema can be organized and how they relate to each other. It also provides a set of stereotypes with pictograms that are meant to assist and guide the project designer in the GDW modelling activity and serves as a basic metamodel for CASE tools aimed at the conceptual modelling and automatic generation of logical GDW schemas. Moreover, through its OCL restrictions, it allows the consistency of the generated models to be checked. Fig. 1 displays the UML class diagram for our metamodel, which is detailed as follows. CWM uses the terms table, row, and column for relation, tuple, and attribute, respectively. We use the corresponding terms interchangeably. The Schema, Table, Column, PrimaryKey and ForeignKey classes are part of the CWM relational package. The Schema class represents the dimensional and geographical schema of a GDW. A schema is a named set of zero or more tables, described by the Table class. Tables are composed of zero or
Fig. 1. The class diagram for the proposed metamodel.
more columns (Column) with at least one primary key restriction (PrimaryKey) and zero or more foreign keys (ForeignKey). These foreign keys associate columns of one table with columns of another table. Tables can be specialized in fact tables (FactTable) and dimension tables (DimensionTable), whereas columns are specialized in attributes of a table (TAttribute), degenerated dimensions (Degenerated) and measures (Measure). The latter can be specialized in common measures (Common) and spatial measures (Spatial). Spatial measures are specialized in classes that are associated with an SFS class to standardize and represent geometries of point (PointM), line strings (LineStringM), polygon (PolygonM), geometry collections (GeometryCollectionM), multiple points (MultiPointM), multiple line strings (MultiLineStringM) and multiple polygons (MultiPolygonM) types. A dimension table (DimensionTable) can be specialized in three different dimensions: conventional, hybrid and geographical. In the first case, as in a traditional DW, a GDW also provides support for dimensions that store only conventional data. The other two types of dimensions model the concepts of the GeoDWFrame proposal [13]. While the hybrid dimension deals with location descriptions and conventional data (e.g. client’s address plus age and gender), geographical dimensions are specialized in composite (Composite) and primitive (Primitive) dimensions. The geographical composite dimensions store only location descriptions (e.g. country names), whereas the geographical primitive dimensions maintain the coordinate data (e.g. geometries of countries) related to the corresponding location descriptions kept in composite or hybrid dimensions. While a primitive geographical dimension represents the geometrical data used to handle (e.g. 
draw, query and index) spatial objects, a composite geographical dimension contains attributes to represent its primary key, the location description of the geo-objects and its foreign keys to primitive geographical dimensions. It is worth noting that, without the primitive geographical dimensions, a dimension with spatial data would have to store its location descriptions and geo-references together. Due to the intrinsic redundancy of dimensional data and the high costs of storing the geo-references (compared to foreign keys to primitive dimensions), keeping these data together in a single dimension table has proven to have a high storage cost [45,46]. A primitive geographical dimension is also specialized in classes that are associated with an SFS class in order to standardize and represent geometries of point (PointP), line strings (LineStringP), polygons (PolygonP), geometry collections (GeometryCollectionP), multiple points (MultiPointP), multiple line strings (MultiLineStringP) and multiple polygons (MultiPolygonP). All concepts presented in this section were applied in the implementation of GeoDWCASE [15]. This is a CASE tool to help in the modelling activities and construction process of geographical data warehouses by making available a set of UML-based classes. This tool provides classes for modelling fact tables, dimension tables, geographical dimensions and hybrid dimensions. Moreover, some pictograms that may be used as UML
stereotypes for the available classes are found in the GeoDWCASE interface. This helps users in the identification and understanding of the semantic meaning of each class. Figs. 2 and 3 specify the metamodel stereotypes that are related to facts and dimensions of a GDW, respectively. These figures have tables with the following columns: (1) stereotype: the name of the stereotype; (2) pictogram: the icon associated with the stereotype (empty when nonexistent); and (3) description: the stereotype textual description. By using GeoDWCASE, the designer may check the model consistency at any time. For example, to ensure that a primitive geographical dimension table will not contain foreign keys, the OCL instruction <rule body="self.foreignkeys->forAll(r | not r.oclIsKindOf(Table))"/> is used (see footnote 1). An example of a data model built using this tool is displayed in Fig. 12 (see Section 7.1). When finalizing the modelling tasks and validating the model, the designer may transform the GDW conceptual schema into its respective logical schema. GeoDWCASE facilitates this task by automatically generating the logical schema to be implemented. This implementation is written in Java and performs the transformation of the XMI elements (conceptual schema) into tables and their respective columns (logical schema) according to the DDL syntax (Data Definition Language) of the target SDBMS. As each spatial DBMS has a particular set of instructions for the treatment of geometries, the GeoDWCASE tool needs to implement a specific automatic transformation module for each spatial DBMS. Three of these modules have been implemented and transform GeoDWM schemas into DDL instructions of the following SDBMS: PostgreSQL with PostGIS, Oracle Spatial and MySQL spatial extensions. After executing the transformation, a .sql file is generated with the logical model (see footnote 2). This file may then be visualized in GeoDWCASE, with no need for an external editor. 3.2. 
A running example An instance of the metamodel discussed in the previous section is displayed in Fig. 4 to illustrate our metamodel definitions, which will be given in the next section. This example considers an environmental application that aims to monitor areas at risk of flooding as well as their precipitation rates. Fig. 4 shows a GDW schema for this application study. In this schema, a fact table has two measures: (1) a conventional measure for storing the precipitation values and (2) a spatial measure for keeping the geometries of flood areas. We also consider that precipitation rates are derived from data acquisition platforms designed and built for the remote collection of precipitation data. These platforms are located in certain cities and within some hydrological basins.
1 All metamodel OCL rules can be obtained by accessing the following Web address: http://www.cin.ufpe.br/golapware/geodwcase/geodwm.gmfgen.
2 The SQL script generated by the GeoDWCASE Tool for the GDW schema presented in Fig. 12 (see Section 7.1), according to the PostgreSQL/PostGIS SDBMS syntax, can be found at http://www.cin.ufpe.br/golapware/geodwcase/gdwDatasusPostGIS.sql.
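The consistency rule quoted in Section 3.1, which forbids foreign keys in a primitive geographical dimension table, can be approximated outside of OCL. The following is a minimal sketch in Python, not the GeoDWCASE implementation: the `Table` class, its `kind` field and the function name are hypothetical and introduced only for illustration, and the third table is deliberately invalid.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Table:
    """Hypothetical stand-in for a metamodel table (not the GeoDWCASE API)."""
    name: str
    kind: str  # e.g. "primitive", "composite", "hybrid", "conventional", "fact"
    foreign_keys: List[str] = field(default_factory=list)  # referenced table names


def primitive_dimensions_with_foreign_keys(tables):
    """Return the names of primitive geographical dimensions that declare a foreign key."""
    return [t.name for t in tables if t.kind == "primitive" and t.foreign_keys]


schema = [
    Table("pgd_city", "primitive"),
    Table("cgd_location", "composite", ["pgd_city", "pgd_state"]),
    Table("pgd_state", "primitive", ["pgd_city"]),  # deliberately violates the rule
]
violations = primitive_dimensions_with_foreign_keys(schema)
print(violations)
```

An empty result would mean that every primitive geographical dimension in the toy schema satisfies the rule.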
Fig. 2. Our metamodel stereotypes for facts.
Fig. 3. Our metamodel stereotypes for dimensions.
As shown in the GDW schema given in Fig. 4, the table names have prefixes that associate them to our metamodel class definitions. This schema includes a composite geographical dimension (namely cgd_location) that contains a primary key attribute, four location descriptions for the following spatial objects: city, micro-region, meso-region and state (i.e. four location names) and four foreign keys to the corresponding primitive geographical dimensions (pgd_city, pgd_micro_region, pgd_meso_region and pgd_state). By storing the geometrical data only in primitive geographical dimensions, spatial data redundancy is avoided, as only foreign keys to these dimensions are kept together with their respective location descriptions in composite geographical dimensions. The great advantage of our metamodel is related to this issue, especially due to the intrinsic redundancy of dimensional data, the high costs of storing geometries and the high number of spatial dimensions that may be found in GDW schemas. In order to investigate how spatial data redundancy affects query
performance over a GDW and validate our metamodel proposal, we have carried out some experimental work to compare redundant and non-redundant GDW schemas. These performance results can be found in [45,46]. An instance of a GDW schema for the rainfall application outlined above is given in Fig. 5. Note that without primitive and composite geographical dimensions, the geometrical and descriptive attribute values would have to be stored in the same tuple, which results in high storage costs, as shown in [45,46]. For example, consider a GDW dimension with a hundred rainfall data collection platforms that are in the same meso-region, micro-region, state and city. If our data modelling approach is considered, the values of the geometrical and location description attributes for meso-region, micro-region, state and city will be stored in the respective primitive and composite geographical dimensions and there is no geometrical data redundancy (as shown in Fig. 5).
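The hundred-platform example can be made concrete by counting how many geometry values each layout stores. The sketch below uses toy in-memory rows rather than a real SDBMS; the placeholder strings standing in for geometries, the key values and the variable names are assumptions made only for illustration, and the layout loosely follows the pgd_*, cgd_location and hd_DCP tables of Fig. 4.

```python
# Toy data (hypothetical): 100 data collection platforms that share the same
# city, micro-region, meso-region and state, as in the example above.
LEVELS = ["city", "micro_region", "meso_region", "state"]
N_PLATFORMS = 100

# Redundant layout: each platform row repeats the four names and four geometries.
redundant_rows = [
    {
        "dcp_pk": i,
        **{f"{lvl}_name": f"{lvl} A" for lvl in LEVELS},
        **{f"{lvl}_geom": f"<geometry of {lvl} A>" for lvl in LEVELS},
    }
    for i in range(N_PLATFORMS)
]
redundant_geoms = sum(
    1 for row in redundant_rows for col in row if col.endswith("_geom")
)

# Normalized layout: one primitive dimension per level stores each geometry once,
# one composite row stores the names plus foreign keys, and every platform row
# keeps only a foreign key to that composite row.
primitive = {lvl: {1: f"<geometry of {lvl} A>"} for lvl in LEVELS}
composite = {
    1: {
        **{f"{lvl}_name": f"{lvl} A" for lvl in LEVELS},
        **{f"{lvl}_fk": 1 for lvl in LEVELS},
    }
}
platform_rows = [{"dcp_pk": i, "location_fk": 1} for i in range(N_PLATFORMS)]
normalized_geoms = sum(len(rows) for rows in primitive.values())

print(redundant_geoms, normalized_geoms)  # prints: 400 4
```

In this toy setting the redundant layout stores 400 geometry values against 4 in the normalized one. The actual storage and query-time effects depend on geometry size and on the SDBMS, and are reported in [45,46].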
Fig. 4. A GDW schema.
Fig. 5. A GDW schema instance.
4. The GDW metamodel definitions

Before presenting the GeoMDQL language, its syntax and operators, we propose some formal definitions for our GDW metamodel. Despite the widespread interest in DW formalization in the field of DB research, little attention has been devoted to providing formal specifications for a GDW. Our GDW metamodel definition: (1) requires geometrical and descriptive data of spatial objects to be normalized in order to minimize geometrical data redundancy (mainly because geometries have a high storage cost); and (2) is based on well-accepted, open standards (i.e. OMG Common Warehouse Metamodel [31] and OpenGIS simple features specification for SQL [5]). In our approach, we assume that the GDW schema is based on the relational data model [41]. Thus, our metamodel for a GDW has two families of relation schemas: one family of DW dimension schemas and another of DW fact schemas. The domains of the attributes in these relation schemas are defined as follows.

Definition 1 (Conventional and geometrical domain). A conventional domain is a set of values of a basic type, such as integer, real and string. A geometrical domain is a set of values of a geometrical type, such as Point, LineString, Polygon, GeometryCollection, MultiPoint, MultiLineString and MultiPolygon. Given an attribute $A$, we use the notation $\mathrm{dom}(A)$ to denote the domain of the attribute $A$.

Definition 2 (Common/location/geometrical attribute). Let $A$ be an attribute. If $\mathrm{dom}(A)$ is a conventional domain, $A$ is either a location attribute or a common attribute. It is a location attribute when it contains descriptions of locations of spatial objects (e.g. country name), otherwise it is a common attribute (e.g. descriptions of a customer profile). If $\mathrm{dom}(A)$ is a geometrical domain, $A$ contains geometries of spatial objects (e.g. geometries of states) and is called a geometrical attribute.

Definition 3 (DW dimension schema). A DW dimension schema is a relation schema, denoted by $D(PK, FK_1, \ldots, FK_r, CA_1, \ldots, CA_m, LA_1, \ldots, LA_t, G_1, \ldots, G_p)$, made up of a schema name $D$ and a list of attributes $PK, FK_1, \ldots, FK_r, CA_1, \ldots, CA_m, LA_1, \ldots, LA_t, G_1, \ldots, G_p$, in which: (1) $PK$ is the primary key attribute; (2) the sublist $(FK_i)$ may be empty or $1 \le i \le r$ and each $FK_i$ is a foreign key attribute to a relation in the family of DW dimension schemas; (3) the sublist $(CA_j)$ may be empty or $1 \le j \le m$ and each $CA_j$ is a common attribute; (4) the sublist $(LA_k)$ may be empty or $1 \le k \le t$ and each $LA_k$ is a location attribute; (5) the sublist $(G_l)$ may be empty or $1 \le l \le p$ and each $G_l$ is a geometrical attribute.
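Definition 3 can be mirrored by a small record type in which each sublist may be empty. The sketch below is one possible Python encoding and not part of the proposed metamodel: the class name `DimensionSchema` and its field names are hypothetical, while the attribute names of cgd_location and d_time are taken from the schema of Fig. 4.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class DimensionSchema:
    """Hypothetical encoding of D(PK, FK_1..FK_r, CA_1..CA_m, LA_1..LA_t, G_1..G_p)."""
    name: str                         # schema name D
    pk: str                           # primary key attribute PK
    fks: Tuple[str, ...] = ()         # foreign key attributes, r >= 0
    common: Tuple[str, ...] = ()      # common attributes, m >= 0
    location: Tuple[str, ...] = ()    # location attributes, t >= 0
    geometric: Tuple[str, ...] = ()   # geometrical attributes, p >= 0

    def attributes(self) -> Tuple[str, ...]:
        """Attribute list in the order used by Definition 3."""
        return (self.pk, *self.fks, *self.common, *self.location, *self.geometric)


cgd_location = DimensionSchema(
    name="cgd_location",
    pk="location_pk",
    fks=("city_fk", "micro_region_fk", "meso_region_fk", "state_fk"),
    location=("city_name", "micro_region_name", "meso_region_name", "state_name"),
)
d_time = DimensionSchema(
    name="d_time",
    pk="time_pk",
    common=("day", "month_num", "month_name", "year"),
)

print(len(cgd_location.attributes()), len(d_time.attributes()))  # prints: 9 5
```

Keeping the four sublists as separate fields makes the values of r, m, t and p directly available as tuple lengths.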
Definition 4 (DW dimension relation). Given a DW dimension schema $D(PK, FK_1, \ldots, FK_r, CA_1, \ldots, CA_m, LA_1, \ldots, LA_t, G_1, \ldots, G_p)$, the DW dimension relation $dr$ of $D$, also denoted by $dr(D)$, is a subset of the Cartesian product $\mathrm{dom}(PK) \times \mathrm{dom}(FK_1) \times \cdots \times \mathrm{dom}(FK_r) \times \mathrm{dom}(CA_1) \times \cdots \times \mathrm{dom}(CA_m) \times \mathrm{dom}(LA_1) \times \cdots \times \mathrm{dom}(LA_t) \times \mathrm{dom}(G_1) \times \cdots \times \mathrm{dom}(G_p)$. The degree of $dr(D)$ is $n = 1 + r + m + t + p$, i.e. the number of attributes of $D$.

The relations of a DW dimension schema can be geographical, conventional or hybrid. We then define three relation schemas as given by Definitions 5–7. A geographical DW dimension schema can either have location attributes or geometrical attributes. While a conventional DW dimension schema can only have common attributes, a hybrid DW dimension schema must have common and location attributes. Note that conventional and hybrid DW dimension schemas cannot have geometrical attributes, nor can the geographical DW dimension schema have common attributes. Considering the intrinsic data redundancy in the DW dimension relations, a geographical DW dimension schema is specialized into primitive and composite in order to normalize the geometries of spatial objects. Thus, the geometrical attributes are stored in the primitive geographical DW dimension relations and the location attributes are stored in the composite geographical DW dimension relations (see Section 3).

Definition 5 (Geographical DW dimension schema). A geographical DW dimension schema is a relation schema based on Definition 3 that can be either primitive or composite:

Primitive geographical DW dimension schema: In this case, we have: (1) $r = m = t = 0$; in other words, there is no foreign key, common attribute or location attribute; and (2) $p = 1$, i.e. there is one geometrical attribute.
Composite geographical DW dimension schema: In this case, we have: (1) $t \ge 1$, i.e. there is at least one location attribute; (2) $r = t$, i.e. there is one foreign key for each location attribute, and each foreign key references a primitive geographical DW dimension relation; (3) $m = p = 0$, i.e. there is no common attribute or geometrical attribute.

As illustrated in the GDW schema given in Fig. 4, pgd_state(state_pk, state_geom), pgd_city(city_pk, city_geom), pgd_basin_location(basin_location_pk, basin_geom) and pgd_DCP_location(dcp_location_pk, dcp_geom) are examples of primitive geographical DW dimension schemas, as each has exactly one geometrical attribute; dom(state_geom) and dom(city_geom) are of the multipolygon type; dom(basin_geom) is of the polygon type; and dom(dcp_geom) is of the point type. For the composite geographical DW dimension schema, consider the example cgd_location(location_pk, city_fk, city_name, micro_region_fk, micro_region_name, meso_region_fk, meso_region_name, state_fk, state_name) shown in Fig. 4. Note that this schema does not contain any geometrical attribute, but it does have four foreign keys, namely city_fk, micro_region_fk, meso_region_fk and state_fk, pointing to four primitive geographical DW dimension relations (dr(pgd_city), dr(pgd_micro_region), dr(pgd_meso_region) and dr(pgd_state)).

Definition 6 (Conventional DW dimension schema). A conventional DW dimension schema is a relation schema based on Definition 3, in which: (1) $r \ge 0$, i.e. there are zero or more foreign keys, none of which refers to a primitive geographical DW dimension relation; (2) $m \ge 1$, i.e. there is at least one common attribute; (3) $t = p = 0$, i.e. there is no location attribute or geometrical attribute.

In Fig. 4, d_time(time_pk, day, month_num, month_name, year) is an example of a conventional DW dimension schema with only common attributes.

Definition 7 (Hybrid DW dimension schema). A hybrid DW dimension schema is a relation schema based on Definition 3, in which: (1) $r \ge 1$, i.e. there is at least one foreign key to a primitive and/or composite geographical DW dimension relation; (2) $m \ge 1$, i.e. there is at least one common attribute; (3) $t \ge 1$, i.e. there is at least one location attribute for each foreign key to a primitive geographical DW dimension relation (for other foreign keys, there is no location attribute); (4) $p = 0$, i.e. there is no geometrical attribute.

As shown in Fig. 4, hd_hidrographic_basin(basin_pk, basin_name, basin_location_fk) is an example of a hybrid DW dimension schema. This schema has a foreign key (basin_location_fk) to the primitive geographical DW dimension relation dr(pgd_basin_location). Fig. 4 also displays another hybrid DW dimension schema, hd_DCP(dcp_pk, dcp_name, instal_date, location_fk, dcp_location_fk), with two foreign keys: (1) location_fk, which refers to the composite geographical DW dimension relation dr(cgd_location), and (2) dcp_location_fk, which refers to the primitive geographical DW dimension relation dr(pgd_dcp_location).

Definition 8 (DW fact schema).
A DW fact schema is a relation schema, denoted by $F(PK, CA_1, \ldots, CA_m, LA_1, \ldots, LA_t, CM_1, \ldots, CM_r, GM_1, \ldots, GM_q)$, made up of a schema name $F$ and a list of attributes $PK, CA_1, \ldots, CA_m, LA_1, \ldots, LA_t, CM_1, \ldots, CM_r, GM_1, \ldots, GM_q$, in which: (1) $PK$ is the primary key attribute, which is composed of the list $(FK_1, FK_2, \ldots, FK_p)$, in which each $FK_i$, $1 \le i \le p$, is a foreign key that refers to a DW dimension schema; (2) the sublist $(CA_j)$ may be empty or $1 \le j \le m$ and each $CA_j$ is a common attribute; (3) the sublist $(LA_k)$ may be empty or $1 \le k \le t$ and each $LA_k$ is a location attribute; (4) the sublist $(CM_l)$ may be empty or $1 \le l \le r$ and each $CM_l$ is a measure that is a common attribute; (5) the sublist $(GM_s)$ may be empty or $1 \le s \le q$ and each $GM_s$ is a measure that is a geometrical attribute.

Definition 9 (DW fact relation). Given a DW fact schema $F(PK, CA_1, \ldots, CA_m, LA_1, \ldots, LA_t, CM_1, \ldots, CM_r, GM_1, \ldots, GM_q)$, in which $PK$ is the sublist $(FK_1, \ldots, FK_p)$, the DW fact relation $fr$ of $F$, also denoted by $fr(F)$, is a subset of the Cartesian product $dom(FK_1) \times \cdots \times dom(FK_p) \times dom(CA_1) \times \cdots \times dom(CA_m) \times dom(LA_1) \times \cdots \times dom(LA_t) \times dom(CM_1) \times \cdots \times dom(CM_r) \times dom(GM_1) \times \cdots \times dom(GM_q)$. The degree of $fr(F)$ is $n = p + m + t + r + q$.

Fig. 4 shows f_meteorology(time_fk, location_fk, basin_fk, dcp_fk, precipitation, flooding_area) as an example of a DW fact schema with four foreign keys (i.e. time_fk, location_fk, basin_fk and dcp_fk), a measure that is a common attribute (precipitation), representing rainfall rates derived from all the data collection platforms, and a measure that is a geometrical attribute (flooding_area), representing areas covered by water.

Definition 10 (Schema of a geographical data warehouse). A schema of a geographical data warehouse is a pair $GDW = \langle DS, FS \rangle$, in which $DS$ is a non-empty finite family of DW dimension schemas and $FS$ is a non-empty finite family of DW fact schemas.

Fig. 4 is an example of a geographical data warehouse schema as defined in Definition 10.

Definition 11 (Geographical data warehouse schema instance). Given a $GDW = \langle DS, FS \rangle$, a geographical DW schema instance is a pair $GDWI = \langle DR, FR \rangle$, where: (1) $DR$ is a non-empty finite family of DW dimension relations and each element in $DR$ is a $dr(D)$ such that $D \in DS$; and (2) $FR$ is a non-empty finite family of DW fact relations, where each element in $FR$ is a $fr(F)$ such that $F \in FS$.

Finally, Fig. 5 is an example of a geographical data warehouse schema instance.

5. Geographical data cube definitions

Based on the GDW definitions presented in the previous section, several data cubes may be instantiated. Thus, we now present further formalizations related to data cubes, including definitions for level, level members, dimension, dimension hierarchy and data cube.

Definition 12 (Level/level members). Given a $GDW = \langle DS, FS \rangle$, a relation schema $RS$ such that $RS \in DS$ or $RS \in FS$ and a relation $r(RS)$, a level $l$, defined from $r(RS)$ and denoted by $l = \langle A, M \rangle$, is made up of a level name $l$, an attribute $A$ of $r(RS)$ and a set $M$ of level members defined as follows:
if $A$ is a location attribute and $r(RS)$ is a composite geographical dimension relation (see Definition 5) or a hybrid dimension relation (see Definition 7), then $M$ is a set of tuples $(v, g)$ in which each $v \in dom(A)$ (i.e. each $v$ is a location attribute value) and $g$, called member geometry, is its associated geometry as described in Definitions 5 and 7. In this case, we say that level $l$ is geographical;

otherwise, $M$ is a set of attribute values of $A$. In this case, we say that level $l$ is conventional.
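
As a minimal sketch of Definition 12, the Python fragment below builds the members of a geographical level by pairing each location attribute value of a composite dimension relation with the geometry reached through its foreign key. Only the table and column names come from Fig. 4; the row contents, the WKT strings and the helper name level_members are assumptions made for illustration.

```python
# Hypothetical rows of dr(pgd_state): primary key -> geometry (made-up WKT text).
pgd_state = {
    1: "MULTIPOLYGON(((0 0, 4 0, 4 4, 0 4, 0 0)))",
    2: "MULTIPOLYGON(((4 0, 8 0, 8 4, 4 4, 4 0)))",
}

# Hypothetical rows of dr(cgd_location), reduced to the columns used here.
cgd_location = [
    {"location_pk": 10, "state_fk": 1, "state_name": "Pernambuco"},
    {"location_pk": 11, "state_fk": 1, "state_name": "Pernambuco"},
    {"location_pk": 12, "state_fk": 2, "state_name": "Paraiba"},
]


def level_members(relation, attribute, fk=None, primitive=None):
    """Return the member set M of a level l = <A, M>.

    When fk and primitive are given, the level is treated as geographical
    and M holds (value, geometry) pairs; otherwise the level is treated as
    conventional and M holds plain attribute values.
    """
    if fk is None or primitive is None:
        return {row[attribute] for row in relation}
    return {(row[attribute], primitive[row[fk]]) for row in relation}


# Geographical level State: each member carries its member geometry.
state_level = level_members(cgd_location, "state_name", "state_fk", pgd_state)

# Conventional level over a plain attribute.
pk_level = level_members(cgd_location, "location_pk")
```

Because M is a set, the two rows that share the state Pernambuco yield a single level member.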
Each dimension in a data cube is a hierarchy between levels defined from a relation of a GDW. The precise definitions are given below.

Definition 13 (Dimension/dimension hierarchy). Given a $GDW = \langle DS, FS \rangle$ and a relation $r(RS)$ such that $RS \in DS$ or $RS \in FS$, a dimension $d = (L \cup \{All\}, \le_h)$, defined from $r(RS)$, is made up of a dimension name $d$ and the totally ordered set $(L \cup \{All\}, \le_h)$, called dimension hierarchy, in which: (1) $L$ is a set of levels and each $l \in L$ is defined from $r(RS)$; moreover, all levels in $L$ are either conventional (in this case, $d$ is a conventional dimension) or geographical ($d$ is a geographical dimension); (2) $\le_h$ is a total order over $L \cup \{All\}$ such that for $l = \langle A, M \rangle$, $l' = \langle A', M' \rangle \in L$, $l \le_h l'$ iff (a) the levels in $L$ are conventional and, for each $m \in M$, there is an $m' \in M'$ such that $m \le m'$; or (b) the levels in $L$ are geographical and, for each $(v, g) \in M$, there is a $(v', g') \in M'$ such that $g \subseteq g'$; (3) $All$ is an additional value, which is the greatest element of $(L \cup \{All\}, \le_h)$, i.e. $l \le_h All$ for all $l \in L$. The value $All$ added to the dimension hierarchy represents the aggregation over the entire dimension.

In Fig. 6, we have specified two geographical levels, State and City, based on the DW dimension relations cgd_location, pgd_state and pgd_city outlined in our running example (see Section 3.2). As we can see, all members of the geographical levels State (Fig. 6(A)) and City (Fig. 6(B)) have geographical information coming from the primitive geographical DW dimension relations pgd_state and pgd_city, respectively. The geographical dimension Location is defined by the dimension hierarchy using these two levels, as shown in Fig. 6(C).

The aggregation process in the definition of a data cube is performed by an aggregation function $f$ that takes a multiset $R$ built from tuples of a DW fact relation $fr$ and produces a single value. In order to define the domain of $f$, we cannot use a projection over $fr$, since the projection does not accept duplicates. Thus, we use a multiset of tuples as the domain of $f$. The tuples of the DW fact relation on which $f$ acts are defined from the levels of a set $D_S$ of dimensions (remember that the primary key of a DW fact relation is a list of foreign keys to DW dimension relations). Thus, in order to define the domain of $f$, we first introduce an operation called fact projection, which produces a multiset $P$. This operation is like a projection, but gives a multiset of tuples instead of a relation (i.e. a set of tuples), thus allowing duplication. We then define $f$ as a partial function whose domain is a DW fact relation $fr$ and whose domain of definition is a multiset $R$ of tuples built from the tuples of $fr$ on which $f$ operates. The precise definition is given as follows:

Definition 14 (Fact projection). Given a DW fact schema $F(PK, CA_1, \ldots, CA_m, LA_1, \ldots, LA_t, CM_1, \ldots, CM_r, GM_1, \ldots, GM_q)$, in which $PK$ is $(FK_1, \ldots, FK_p)$, and a $fr(F)$ with degree $n = p + m + t + r + q$, as defined in Definition 9, the fact projection $\Pi_{fk_1, \ldots, fk_x, ca_1, \ldots, ca_y, la_1, \ldots, la_s, cm_1, \ldots, cm_z, gm_1, \ldots, gm_w}$, in which $0 \le x \le p$, $0 \le y \le m$, $0 \le s \le t$, $0 \le z \le r$ and $0 \le w \le q$, maps the $n$-tuples of $fr$ to a multiset of $l$-tuples, in which $l = x + y + s + z + w$ and $l \le n$.

Definition 15 (Aggregation function). An aggregation function $f_i \in F_g$, which is a countable set $F_g = \{f_1, f_2, \ldots\}$, is a partial function $f_i : fr \to V \cup \bot$ in which: (1) $fr \subseteq dom(PK) \times dom(CA_1) \times \cdots \times dom(CA_m) \times dom(CM_1) \times \cdots \times dom(CM_r) \times dom(GM_1) \times \cdots \times dom(GM_q)$ is a DW fact relation on which $f_i$ operates, such that $PK$ is $FK_1 \times FK_2 \times \cdots \times FK_p$; (2) $V$ is a conventional or a geometrical domain;
Fig. 6. Examples of geographical levels.
(3) the domain of definition of $f_i$, i.e. the multiset $R$, is defined as follows: (a) let $P = \Pi_{fk_1, \ldots, fk_x, ca_1, \ldots, ca_y, la_1, \ldots, la_s, cm_1, \ldots, cm_z, gm_1, \ldots, gm_w}$ be a fact projection over $fr$, where $0 \le x \le p$, $0 \le y \le m$, $0 \le s \le t$, $0 \le z \le r$ and $0 \le w \le q$; (b) let $D_S$ be a set of dimensions; (c) $R$ is a multiset of $l$-tuples of $P$ selected from the levels of $D_S$. The elements of $fr$ in which $f_i$ is undefined are mapped to $\bot$.

Definition 16 (Data cube). A data cube (or cube) $C$ is a three-tuple $\langle D_C, fr(F), f \rangle$ in which: (1) $D_C$ is a list of dimensions and has the same elements as $D_S$, which is the set of dimensions used in the domain of definition of $f$; (2) $fr$ is a DW fact relation; (3) $f : fr \to V \cup \bot$ is an aggregation function by which the facts of $C$ are defined; (4) the dimension of the cube $C$ is equal to the length of $D_C$. We use the notation $C^n$ to denote a cube $C$ whose list $D_C$ has length $n$. If $D_C$ has at least one geographical dimension, we say that $C$ is a geographical cube, denoted by $GC$.

A taxonomy of aggregation functions for SOLAP applications can be found in [6]. A query language based on this taxonomy and on the data cube definitions given above is proposed in the next section.

6. The GeoMDQL query language

As shown in Section 2, there are a large number of proposals in the database literature regarding geographical and multidimensional query languages. However, there is no query language with a unified syntax that allows the use of spatial and multidimensional operators for querying a geographical data warehouse. Thus, we propose GeoMDQL (Geographical and Multidimensional Query Language), a query language based on both MDX [51] and OGC SFS4SQL [5] that offers a unified syntax for the statement of queries containing both multidimensional and spatial operators. Positional operators defined according to the work described in [18,4] were also included in the GeoMDQL language. Section 6.1 discusses the GeoMDQL query types and its syntax is detailed in Section 6.2.
The GeoMDQL processor architecture is then outlined in Section 6.3.

6.1. A taxonomy of GeoMDQL query types

The aim of the GeoMDQL query language is to integrate the query syntax and operators from both geographical and multidimensional environments. Thus, with GeoMDQL, users can specify three types of queries: GEO, MD and GEOMD, which are detailed as follows.

GEO: A GEO request only contains geographical parameters for performing a spatial query. For this query type, users can employ spatial operators such as distance, intersects, contains, cover and crosses for evaluating spatial relationships between two geographical features. These GeoMDQL operators are based on the spatial operators given in [10,5] so that they may be related to well-known standards. As an example, we consider a land use data cube, in which users wish to know which farms intersect the Capibaribe River or which farms are totally or partially contained in the Capibaribe River basin. The result of these queries will always be a geographical feature or a set of geographical features displayed on a map.

MD: An MD query only contains multidimensional parameters and allows the execution of a multidimensional query in the geographical data cube. This query only contains the OLAP operators found in [51], based on the MDX language. For example, in a retail data cube, users may wish to know the list of the 10 best-selling products for each category and for a specific month of the year. To formulate this query, the user can use well-known multidimensional operators, such as rank, drill-up, slice or dice, and the request result will always be a multidimensional data table.

GEOMD: A GEOMD request consists of a combination of the two previous query types. This request can be further classified into two types: (1) Mapping GEOMD, corresponding to a multidimensional request that displays data with geographical correspondences on a map; and (2) Integration GEOMD, in which both multidimensional and spatial constraints are specified and used in the request processing. As an example of a Mapping GEOMD query, we consider users querying a geographical cube to show on a map the location of their best customers. In this case, a query similar to an MD query is formulated using only multidimensional operators and the location of the query results is then displayed on a map.
To this end, the GeoMDQL query processor automatically executes the queries in a geographical cube to identify the MD query results that have geographical correspondences. In this case, the query results are always displayed using both maps and tables. In the case of an Integration GEOMD, users can use both multidimensional and spatial operators for formulating their requests. As an example from land-use analysis, users may wish to know which farms intersect the Capibaribe River and produced over 10 tons of rice in 2005. For an Integration GEOMD, the results can be displayed using maps and/or tables. User query results can be given as an input parameter for further queries, which can always be mapped to one of the query types given previously. The GeoMDQL operators resulting from the combination of multidimensional and spatial operators are listed in the following section.
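
The taxonomy above can be sketched as a small decision function. This is only an illustration under stated assumptions: the three boolean inputs and the names QueryType and classify are hypothetical, and treating a map display request as the marker of a Mapping GEOMD query is our reading of the text, not a rule stated by the query processor.

```python
from enum import Enum


class QueryType(Enum):
    GEO = "GEO"
    MD = "MD"
    MAPPING_GEOMD = "Mapping GEOMD"
    INTEGRATION_GEOMD = "Integration GEOMD"


def classify(uses_spatial, uses_multidimensional, shown_on_map=False):
    """Assign one of the query types of Section 6.1 from three flags.

    The flags are assumed inputs: whether the request uses spatial
    operators, whether it uses multidimensional operators, and whether
    its results are to be displayed on a map.
    """
    if uses_spatial and uses_multidimensional:
        return QueryType.INTEGRATION_GEOMD
    if uses_spatial:
        return QueryType.GEO
    if uses_multidimensional:
        return QueryType.MAPPING_GEOMD if shown_on_map else QueryType.MD
    raise ValueError("a request needs at least one family of operators")


# Farms that intersect a river and produced over 10 tons of rice in 2005.
rice_farms = classify(uses_spatial=True, uses_multidimensional=True)

# Best customers, formulated with OLAP operators only and shown on a map.
best_customers = classify(False, True, shown_on_map=True)
```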
6.2. The GeoMDQL syntax

GeoMDQL includes the following sets of operators: (1) all operators originally defined for MDX [51], as our language is based on it. It should be pointed out that, for this reason, users are able to define and implement new functions using the MDX UDF tool [51]; (2) the MDX operators that were re-implemented (overloaded) in order to enable the manipulation of the geographical data (these are displayed in Table 2); (3) the geographical operators listed in Table 3, which are based on [5,10] and were implemented in the GeoMDQL language in order to enable the processing of queries involving spatial operations on data from a geographical and multidimensional cube; and (4) new operators, specified and implemented especially for the GeoMDQL language, which are displayed in Table 4.

For the new operators group given in Table 4, the GeoMDQL operators were identified by: (1) combining two types of operators selected from the two different areas (i.e. spatial and multidimensional). The operators HighestDistance, LowestDistance, RankArea and RankLength listed in Table 4 can be seen as examples of such a combination; (2) applying a spatial operator to multidimensional data and vice versa. The first and the last operators given in Table 4 are examples of this second approach. The operators shown in this table are those specially designed for GeoMDQL that have been implemented thus far. However, this does not imply that this set of operators is complete. An investigation into a comprehensive set of SOLAP operators and operators based on human reasoning regarding space is beyond the scope of this paper.

Regarding the operators that were re-implemented (overloaded) (see Tables 2 and 3), all retain the same semantics, but required considerable implementation effort because we had to rewrite these operators in order to account for the new structure in which the data are organized (see Definition 16).

We have defined the GeoMDQL grammar using the EBNF [17], as this formalism is found in many academic papers. It has also been applied to some commercial areas and the original MDX grammar was developed using this formal representation as well. Fig. 7 displays the specification of the main elements of a GeoMDQL query. The main component of the GeoMDQL grammar is geomdql_statement, which is defined as a select_statement. This select_statement may contain the following definitions: (1) (WITH formula_specification)?; (2) SELECT axis_specification_list; (3) FROM cube_specification; (4) (WHERE slicer_specification)?; (5) (cell_props)? and (6) (ON MAP)?. According to the EBNF syntax, the character "?" indicates that an element is optional, whereas the uppercase elements (e.g. WITH, SELECT, FROM, WHERE and ON MAP) are terminal elements of the GeoMDQL language.

The formula_specification element maintains the original definition of the MDX [51] language and allows the specification of formulas for the creation of sets of calculated members based on the values stored in a data cube (see Definition 16). For instance, the statement WITH MEMBER Measures.[Profit Percent] AS '(Measures.[Store Sales] - Measures.[Store Cost]) / (Measures.[Store Cost])' makes use of two measures stored in a data cube (i.e. [Store Sales] and [Store Cost]) to create a new calculated member named Profit Percent, the value of which is calculated at run time.
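
The order of the six components of select_statement can be illustrated with a short string builder. This is a sketch under stated assumptions rather than part of the GeoMDQL processor: the function name build_select_statement and the cube name [Meteorology] are hypothetical, and the function only concatenates the parts in the order listed above, without validating them against the grammar of Fig. 7.

```python
def build_select_statement(axes, cube, formula=None, slicer=None,
                           cell_props=None, on_map=False):
    """Concatenate the parts of a select_statement in the listed order.

    Optional parts (marked with "?" in the EBNF) are skipped when absent.
    """
    parts = []
    if formula:
        parts.append("WITH " + formula)
    parts.append("SELECT " + ", ".join(axes))
    parts.append("FROM " + cube)
    if slicer:
        parts.append("WHERE " + slicer)
    if cell_props:
        parts.append(cell_props)
    if on_map:
        parts.append("ON MAP")
    return " ".join(parts)


query = build_select_statement(
    axes=["{Measures.[Precipitation]} ON COLUMNS",
          "{[Location].[State].Members} ON ROWS"],
    cube="[Meteorology]",  # hypothetical cube name
    on_map=True,
)
print(query)
```

Keeping the optional clauses as keyword arguments mirrors the "?" markers of the grammar: leaving them out yields a plain MDX-style statement, while on_map=True appends the GeoMDQL-specific ON MAP clause.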
Table 2. GeoMDQL operators overloaded from MDX.

Operator: DrilldownLevel
Description: Drills down the members of a set at a specified level to one level below. Alternatively, it drills down to a specified level in the dimension hierarchy.
Syntax: DrilldownLevel(<set of Level Members>), DrilldownLevel(<set of Level Members>, <Level>)

Operator: DrilldownMember
Description: Drills down the members in a set that are found in a second set.
Syntax: DrilldownMember(<set of Level Members>, <set of Level Members>)

Operator: DrillupLevel
Description: Drills up the members of a set at a specified level to one level above. Alternatively, drills up to a specified level in the hierarchy.
Syntax: DrillupLevel(<set of Level Members>), DrillupLevel(<set of Level Members>, <Level>)

Operator: DrillupMember
Description: Drills up the members in a set that are found in a second set.
Syntax: DrillupMember(<set of Level Members>, <set of Level Members>)

Operator: Ancestor
Description: Returns the ancestor of a member at a certain level.
Syntax: Ancestor(<Level Member>, <Level>), Ancestor(<Level Member>, <Numeric Expression>)

Operator: Descendants
Description: Returns the set of descendants of a member to a specified level.
Syntax: Descendants(<Level Member>), Descendants(<Level Member>, <Level>), Descendants(<Level Member>, <Numeric Expression>)

Operator: Ascendants
Description: Returns the set of the ascendants of a given member.
Syntax: Ascendants(<Level Member>)

Operator: Members
Description: Returns the set of members of a level.
Syntax: <Level>.Members

Operator: Children
Description: Returns the children of a member.
Syntax: <Level Member>.Children

Operator: Siblings
Description: Returns the siblings of a given member, including the member itself.
Syntax: <Level Member>.Siblings

Operator: All
Description: Returns the top level of a specified hierarchy.
Syntax: <Dimension Hierarchy>.All
Table 3. The GeoMDQL operators based on OGC SFS4SQL.

Operator: Distance
Description: Returns the Cartesian distance between two members of a geographical dimension.
Syntax: Distance(<Level Member> | <Member Geometry>, <Level Member> | <Member Geometry>)

Operator: Positional operators
Description: These operators determine whether a member is at a certain position in relation to another member. For example, At_North_Of determines whether a member is to the north of another member.
Syntax: At_North_Of(<Level Member> | <Member Geometry>, <Level Member> | <Member Geometry>)

Operator: Topological operators
Description: These operators determine topological relationships between two members. For example, Intersects determines whether one member intersects another member.
Syntax: Intersects(<Level Member> | <Member Geometry>, <Level Member> | <Member Geometry>)

Operator: Intersection
Description: Returns the intersection area between two members.
Syntax: Intersection(<Level Member> | <Member Geometry>, <Level Member> | <Member Geometry>)

Operator: Union
Description: Returns a geometry that is the union set of two members or a set of members.
Syntax: Union(<Level Member> | <Member Geometry>, <Level Member> | <Member Geometry>), Union(<set of Level Members>)

Operator: Buffer
Description: Returns a geometry that represents all points whose distance from this geometry is less than or equal to a specified distance.
Syntax: Buffer(<Level Member>, <Numeric Expression>)

Operator: Area
Description: Returns the member area.
Syntax: Area(<Level Member>)

Operator: Length
Description: Returns the member length.
Syntax: Length(<Level Member>)

Operator: Equals
Description: Determines whether one member is equal to another.
Syntax: Equals(<Level Member> | <Member Geometry>, <Level Member> | <Member Geometry>)
Table 4. Operators specifically designed for GeoMDQL.

Operator: DrillOut
Description: For a given level, returns the neighboring members of a certain member.
Syntax: DrillOut(<Level Member>)

Operator: HighestDistance
Description: Ranks all members that are within a given distance from a certain member and shows the results in descending order.
Syntax: HighestDistance(<Level Member>, <set of Level Members>), HighestDistance(<Level Member>, <Numeric Expression>, <set of Level Members>)

Operator: LowestDistance
Description: Ranks all members that are within a given distance from a certain member and shows the results in ascending order.
Syntax: LowestDistance(<Level Member>, <set of Level Members>), LowestDistance(<Level Member>, <Numeric Expression>, <set of Level Members>)

Operator: RankArea
Description: Ranks all members according to their area.
Syntax: RankArea(<set of Level Members>)

Operator: RankLength
Description: Ranks all members according to their length.
Syntax: RankLength(<set of Level Members>)

Operator: Point
Description: Returns all centroid points for a given set of polygons.
Syntax: Point(<set of Level Members>)
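
To make the intent of two Table 4 operators concrete, the sketch below mimics RankArea and Point on plain coordinate lists. It is not the GeoMDQL implementation: the farm outlines are invented, the descending order used by rank_area is an assumption (the table does not state the direction), and point averages the vertices, which equals the true centroid only for simple shapes such as rectangles.

```python
def polygon_area(ring):
    """Area of a simple polygon given as a list of (x, y) vertices (shoelace formula)."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def rank_area(members):
    """Member names ordered from largest to smallest area (direction assumed)."""
    return sorted(members, key=lambda name: polygon_area(members[name]),
                  reverse=True)


def point(members):
    """One representative point per member, computed as the vertex average."""
    return {
        name: (sum(x for x, _ in ring) / len(ring),
               sum(y for _, y in ring) / len(ring))
        for name, ring in members.items()
    }


# Hypothetical farm outlines in an arbitrary planar coordinate system.
farms = {
    "farm_a": [(0, 0), (2, 0), (2, 2), (0, 2)],
    "farm_b": [(0, 0), (4, 0), (4, 3), (0, 3)],
    "farm_c": [(0, 0), (1, 0), (0, 1)],
}

ranking = rank_area(farms)
centres = point(farms)
```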
Fig. 7. The GeoMDQL language syntax.
Note that axis_specification_list represents the axis definitions of a GeoMDQL query, which is similar to the original MDX syntax. For an axis specification, the multidimensional and spatial operators can be used for navigating and handling members of the dimension hierarchies of $D_C$ in a data cube $C = \langle D_C, fr(F), f \rangle$ (see Definition 13). Some of the GeoMDQL multidimensional operators are inherited from MDX, as shown in the following examples. The operator FILTER returns a set resulting from filtering another set based on a search condition. Thus, the statement FILTER({[Location].[City].Members}, (Measures.[Population]) > 500000) ON ROWS may be used on the ROWS axis of a query to display all members of the level City in the dimension $Location = (\{State, City, All\}, \le_h)$ (where the dimension hierarchy $\le_h$ is shown in Fig. 6) that are associated with the measure Population with a value larger than 500,000. Similarly, the TOPCOUNT operator returns a specified number of items taken from the top of an optionally ordered set.
The statement TOPCOUNT({[Location].[State].Members}, 5, Measures.[NeonateDeaths]) ON COLUMNS may be used on the COLUMNS axis to display the members of the level State of the dimension Location that are related to the top five values of the measure Neonate Deaths.

Besides being able to use the original MDX operators, users can select any of the overloaded operators listed in Table 2. Based on MDX, these operators have been overloaded in GeoMDQL to provide a means of navigating and analyzing a geographical data cube, defined according to Definition 16. For instance, to change the data granularity of a given dimension, the aggregation and disaggregation operations, namely drill-up and drill-down, may be chosen. These correspond to the DRILLUPLEVEL and DRILLDOWNLEVEL operators, respectively. The following query statement given as an example drills down from the level State to a level $l$ such that $l \le_h State$ in the dimension Location as defined in Fig. 6: DRILLDOWNLEVEL([Location].[State].Members). Similarly, the statement DRILLUPLEVEL([Location].[State].Members) may be used to drill up from the level State to a level $l'$ of the dimension Location such that $State \le_h l'$. Moreover, the DRILLDOWNMEMBER and DRILLUPMEMBER operators are used for navigating in the dimension hierarchies of a data cube by reducing or increasing the data granularity of certain members of a given level. For example, the statement DRILLDOWNMEMBER([Location].[Region].Members, [Location].[Region].[South], [Location].[Region].[North]) drills down the members of a set that are found in a second set, applying the drill-down operation only to the members South and North of the level Region.

GeoMDQL also provides a set of OGC-based operators, which are listed in Table 3 and are used to solve spatial queries based on the geometrical data of geographical dimension tables. While the positional operators [4,18] determine whether two members stand in a given cardinal direction relationship to one another (e.g. At_South_Of, At_East_Of, At_West_Of, At_North_East_Of, At_South_East_Of, At_North_West_Of, At_South_West_Of), the topological operators identify whether topological relationships (e.g. Intersects, Touches, Crosses, Within, Overlaps and Contains) between two members can be satisfied.

Table 4 displays the operators that may be used in an axis specification and have been specially designed for the GeoMDQL language, such as HIGHESTDISTANCE and LOWESTDISTANCE. For instance, the statement SELECT LOWESTDISTANCE([Location].[City].[Recife], 5, [Location].[City].Members) ON ROWS may be used on the ROWS axis to display the five closest City level members to the city of Recife. In this case, Location is clearly a geographical dimension. Moreover, the DRILLOUT operator may be used to identify all the neighbors of a member of a given geographical level. To illustrate this, the following query specification may be used to recover all neighbors of the city of Recife: DRILLOUT([Location].[City].[Recife]). As HIGHESTDISTANCE is just the inverse operator of LOWESTDISTANCE, an example of its application is not given here.

The cube_specification element syntax is kept as the original MDX definition and indicates which data cube is being requested. While cell_props maintains the original MDX specification, slicer_specification corresponds to specific GeoMDQL restrictions that are formulated by using any of the spatial operators listed in Table 3. To illustrate this, consider the following GeoMDQL slicer given as an example: AT_NORTH_OF([Location].[State], [Location].[State].[Pernambuco]). This shows that a spatial operator may be used in the slicer specification of a GeoMDQL query to recover the members of the level State that are located to the north of the State of Pernambuco. ON MAP is a specific GeoMDQL clause used to display the query results on a map.

Using the GeoMDQL query language syntax presented in this section, any query type discussed in Section 6.1 can be formulated and executed by our prototype system. Furthermore, if a user wishes to query an ad hoc chosen area by selecting it from a given map, a GeoMDQL operator is available for identifying the selected area, the geometry of which is given as a parameter for the GeoMDQL slicer specification clause. For example, with the statement (WITHIN([Location].[City], (POLYGON((55.51 9.00, ..., 55.51 9.00))))) users can restrict the query context by retrieving cities located within the polygon given as the second parameter of the spatial operator WITHIN. Such queries call for a graphical user interface that allows users to interact with previous query results when performing new queries by just clicking and selecting objects.
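
The behaviour described for LOWESTDISTANCE, HIGHESTDISTANCE and At_North_Of can be approximated on point data as follows. This is a sketch under several assumptions: the coordinates are invented, members are reduced to a single point each, the numeric argument is read as a count (following the Recife example above, although the description in Table 4 could also be read as a distance threshold), and "north of" is decided by comparing y values only.

```python
import math

# Hypothetical centroids (x, y) in an arbitrary planar coordinate system.
cities = {
    "Recife": (0.0, 0.0),
    "Olinda": (0.0, 1.0),
    "Jaboatao": (-1.0, -1.0),
    "Caruaru": (-5.0, 0.0),
}


def distance(a, b):
    """Cartesian distance between two members, in the spirit of Distance in Table 3."""
    (ax, ay), (bx, by) = cities[a], cities[b]
    return math.hypot(ax - bx, ay - by)


def lowest_distance(member, members, n=None):
    """Other members by ascending distance from member, cut to n if given."""
    ranked = sorted((m for m in members if m != member),
                    key=lambda m: distance(member, m))
    return ranked if n is None else ranked[:n]


def highest_distance(member, members, n=None):
    """Other members by descending distance from member, cut to n if given."""
    ranked = sorted((m for m in members if m != member),
                    key=lambda m: distance(member, m), reverse=True)
    return ranked if n is None else ranked[:n]


def at_north_of(a, b):
    """True if a lies to the north of b, comparing y coordinates only."""
    return cities[a][1] > cities[b][1]


nearest = lowest_distance("Recife", cities, n=2)
farthest = highest_distance("Recife", cities, n=1)
north_of_recife = [c for c in cities if c != "Recife" and at_north_of(c, "Recife")]
```

A real implementation would delegate these computations to the spatial functions of the underlying DBMS and operate on full geometries rather than on single points.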
6.3. The GeoMDQL architecture

Our system architecture is displayed in Fig. 8 and is an instance of the GOLAPA architecture [14]. This architecture
Fig. 8. Geographical–multidimensional architecture.
is composed of three layers (I, II and III), which provide data, geographical and multidimensional processing, and the graphical user interface, respectively. Throughout these three layers, a metadata (METADATA) source is used to provide the integration of the geographical and multidimensional processing.

The first layer (I) contains the geographical data warehouse (GDW), which is based on our metamodel (see Section 4). As stated previously, our GDW metamodel normalizes geometrical data, provides geographical data at any dimensional level and stores the descriptive data of geographical features. The open source SDBMS used for creating the GDW is PostgreSQL, with its spatial extension PostGIS. For the extraction, transformation and loading of the geographical and multidimensional data, we have used scripts based on the PostgreSQL PL/pgSQL language. However, other DBMSs with a spatial extension can be used (e.g. Oracle Spatial and MySQL).

The metadata source (METADATA) plays an important role in this work. The integration metadata are accessed by the analytic-multidimensional and geographical processing engine component (see Fig. 8) whenever a GeoMDQL request is received. Thus, this engine can discover whether the multidimensional data have geographical correspondences. A geographical correspondence is the information representing the geometry of the spatial object. The metadata source implementation is currently based on XML technology. Geographical and multidimensional data cubes are defined using XML files. In the metadata source implementation, the concepts formally defined in Section 5 were used.

The implementation of the second layer (II) is based on an extension of the Mondrian OLAP server [38]. In order to
Fig. 9. Schema and Cube elements of the Mondrian XML schema.
extend the Mondrian OLAP server to provide a means of specifying geographical and multidimensional data cubes and processing geographical data, we have modified the XML schema related to the definitions, dimensions, hierarchies and levels of a data cube that is found in this server. To this end, new attributes and elements were added to this schema to allow both the manipulation of geographical data and the execution of geographical and multidimensional operations by the modules in Layer II. Fig. 9 partially shows the Mondrian XML schema with the schema and data cube definitions. This representation follows the XML Spy [48] tool syntax. For the element Schema, the following attributes were added: (1) isGeoSchema, to indicate whether or not the current cube schema is geographical; and (2) geoMDSchema, to define the geographical schema name. Similarly, for the element Cube, two attributes, isGeoCube and geoCube, were added to specify whether the data cube is geographical and to store the geographical cube name, respectively. For the element Dimension, shown in Fig. 10(A), the attributes isGeoDimension and geoDimension were included to indicate whether the dimension is geographical and to keep the geographical dimension name, respectively. This
was also the case with regard to the element Hierarchy, which has similar attributes, namely isGeoHierarchy, which indicates the hierarchy type (i.e. geographical or not), and geoHierarchy, which maintains the geographical hierarchy name. Regarding the element Level of the Mondrian Schema (Fig. 10(B)), the following four attributes were included: isGeoLevel, geoLevel, geometryColumn and geoJoinColumn. These are, respectively, used in our prototype system to: (1) indicate whether a certain level is geographical; (2) represent the name of the GDW dimension table that is associated with the level; (3) specify which column of the dimension table has the geometry of the level members; and (4) maintain the primary key of the dimension table that has the geographical data to be used in the level definition. The remaining elements and attributes shown in Figs. 9 and 10 have not been changed and maintain the original definitions provided by the Mondrian OLAP server. From this extended XML schema, a set of Java classes is created and is then used by our processing engine to allow the processing of GEO, MD and GEOMD queries in the GDW. In Fig. 11, we present a possible instance of the XML schema detailed above. This instance is based on the
Fig. 10. Dimension, Hierarchy and Level elements of the Mondrian XML schema.
Fig. 11. Example of an XML metadata file.
geographical data warehouse presented in Fig. 5. This XML file shows the specification of a schema containing a geographical cube $GC = \langle \{Time, Location, DCP\}, f\_meterology, Sum \rangle$, where the set of levels of the conventional dimension Time contains the levels Year and Month. The geographical dimension Location contains the levels State and City. The third dimension of the cube, DCP (data collect platform), is also geographical. Finally, the measure Precipitation is defined, which is aggregated using the aggregation function Sum. The second layer (II) implements the Geographical Online Analytical Processing Engine (GOLAPE) component of the GOLAPA architecture. In Layer II of our architecture, the Analytic-Multidimensional and Geographical Processing Engine is responsible for receiving and processing geographical and/or multidimensional requests. It comprises software modules for query processing, query optimization and query management. This engine is implemented by extending the Mondrian OLAP server [38] to provide support for spatial query processing. In this layer, all three types of queries listed in Section 6.1 can be processed. GEO, MD or GEOMD queries are expressed using the GeoMDQL query language presented in Section 6. In the query processing module, a parser written in Java implements the grammar of the GeoMDQL language presented in Section 6.2. This parser performs the lexical analysis, splitting the input into lexemes so that the tokens defined in the grammar can be recognized. It also performs the syntactic analysis, which determines whether a sentence belongs to the language defined by the grammar. After performing the parsing, the query processor identifies the query type (i.e. GEO, MD
or integration or mapping GEOMD), which are detailed in Section 6.1. After parsing and classifying a GeoMDQL query, the processing mechanism of Layer II accesses the metadata source in order to retrieve the information necessary for completing the processing and execution of the query. With this information, instructions are generated and used by the query management module to retrieve the geographical and/or multidimensional data requested in the query. The query optimization module is still under development and aims to implement a cache and a query rewriting mechanism in order to improve processing performance. A set of Java classes was created based on the XML schema partially displayed in Figs. 9 and 10 to manipulate instances of the schema, such as the one shown in Fig. 11. This XML document defines the structure of the geographical and multidimensional cube that is being processed. The information contained in the XML file is also used to generate the SQL scripts that access the GDW tables in order to retrieve the geographical and multidimensional data and compute the data aggregations defined for the current data cube. In the third layer (III) of our architecture, we have the graphical user interface (GUI), which allows the specification of queries and the visualization of results. For this layer, we take two implementation approaches. The first targets the Web, following current trends and allowing users to access the multidimensional and geographical processing environment through the Internet. For this approach, the OLAP client JPivot [40] was extended to enable the specification of geographical and/or multidimensional queries in the GeoMDQL
syntax and subsequent submission of these queries to the query mechanism in Layer II. The query result viewer module was designed for graphically displaying the results in charts, tables and/or maps, using HTML as well as SVG (Scalable Vector Graphics) [50] technology. In the second approach, we implemented a desktop client, which is based on the extension and integration of Java Plugin Framework (JPF) [23], OpenJUMP [34] and JRubik [24] technologies. This second approach also displays the results of the queries using graphs, maps and tables. However, it offers a series of additional functions for formatting the results, exporting data to other formats and working in parallel with multiple windows, which, in a Web implementation, would perform poorly and be costly to implement. In this interface layer, the query editor module allows users to specify queries in the GeoMDQL syntax by clicking on the tables, graphs and maps and interacting with a visual representation of the structure of the geographical data cube on which the query will be executed. As we can see, the GeoMDQL system architecture presented in this section is based on open, extensible standards. Thus, it is suitable for developing decision-support environments at a low cost and according to current market standards. In the next section, we describe a case study based on the public healthcare field in order to validate the ideas proposed in this paper.

7. A GeoMDQL application

In order to demonstrate the use of our metamodels, query language and other proposed ideas, we present a case study with real data in this section. To illustrate the
GeoMDQL syntax and applicability, GeoMDQL query examples are listed in this section that have been designed for querying a Geographical Data Warehouse with data obtained from the Brazilian public healthcare system. This GDW structure is responsible for managing significant amounts of historical data that include georeferenced locations and enables us to explore the capabilities of multidimensional and geographical systems by improving the manipulation and evaluation of the data. The selected data include annual information on child mortality rates, women’s health, disease control and oral health, providing us with a means of analyzing the health and living conditions of the Brazilian population through the use of tables, charts and maps. Hence, this GDW may help authorities in the planning, management and evaluation of public policies on healthcare assistance as well as being an efficient means of delineating objectives in order to improve the existing public healthcare services offered to the Brazilian population. The GDW schema (see Fig. 12) was designed with our GeoDWCASE tool [15]. Based on this GDW schema, a large number of data cubes can be created. For this case study, we have designed a geographical and multidimensional data cube named BrPublicHealth. This data cube has measures such as Population, Deaths Under One Year of Age, Live Births, Underweight Live Births, Neonate Deaths, People Covered by the Family Health Program and Maternal Deaths. Time is a conventional dimension (i.e. does not contain geometrical data or location attributes) and is composed of a hierarchy with the level Year. Location is a geographical dimension composed of a hierarchy with the following geographical levels: Country, State, Meso Region, Micro Region and City. Climate is a geographical dimension containing a level
Fig. 12. A GDW for public healthcare analysis.
named Zone, which stores the geometries of all climatic zones in Brazil.
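To make the link between this cube and the extended Mondrian metadata of Section 6 more concrete, the sketch below reads a small XML fragment and lists its geographical levels. The attribute names (isGeoSchema, isGeoCube, isGeoDimension, isGeoHierarchy, isGeoLevel, geoLevel, geometryColumn, geoJoinColumn) are those described in the text; the schema name, the table and column names, the reduced set of levels and the shape of the generated SQL are assumptions made only for illustration and are not taken from the actual prototype.

```python
import xml.etree.ElementTree as ET

# Hypothetical metadata fragment. Attribute names follow the text; every
# schema, table and column name below is an illustrative assumption, and only
# two of the Location levels are shown for brevity.
METADATA = """
<Schema name="PublicHealth" isGeoSchema="true" geoMDSchema="PublicHealthGeo">
  <Cube name="BrPublicHealth" isGeoCube="true" geoCube="BrPublicHealthGeo">
    <Dimension name="Time" isGeoDimension="false">
      <Hierarchy isGeoHierarchy="false">
        <Level name="Year" isGeoLevel="false"/>
      </Hierarchy>
    </Dimension>
    <Dimension name="Location" isGeoDimension="true" geoDimension="LocationGeo">
      <Hierarchy isGeoHierarchy="true" geoHierarchy="LocationGeoHierarchy">
        <Level name="State" isGeoLevel="true" geoLevel="d_state"
               geometryColumn="geom" geoJoinColumn="state_id"/>
        <Level name="City" isGeoLevel="true" geoLevel="d_city"
               geometryColumn="geom" geoJoinColumn="city_id"/>
      </Hierarchy>
    </Dimension>
  </Cube>
</Schema>
"""


def geographic_levels(xml_text):
    """Return (dimension, level, table, key column, geometry column) per geographical level."""
    root = ET.fromstring(xml_text)
    found = []
    for dimension in root.iter("Dimension"):
        for level in dimension.iter("Level"):
            if level.get("isGeoLevel") == "true":
                found.append((
                    dimension.get("name"),
                    level.get("name"),
                    level.get("geoLevel"),
                    level.get("geoJoinColumn"),
                    level.get("geometryColumn"),
                ))
    return found


def geometry_lookup_sql(table, key_column, geometry_column):
    """Build an illustrative SQL statement that fetches the geometries of a level."""
    return f"SELECT {key_column}, {geometry_column} FROM {table}"


levels = geographic_levels(METADATA)
for dim, lvl, table, key, geom in levels:
    print(dim, lvl, geometry_lookup_sql(table, key, geom))
```

Reading the geographical flags from the metadata rather than hard-coding them mirrors the role the text assigns to the metadata source: the engine consults it to find out which multidimensional members have a geometry and where that geometry is stored.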
7.1. Query examples

To exploit the data cube outlined above, several GeoMDQL queries can be defined to help in the monitoring of Brazilian public healthcare. To illustrate this, Table 5 presents queries that have been grouped according to the taxonomy of query types discussed in Section 6.1 of this paper. For the query listed in the last row of Table 5, Fig. 13 shows the results generated by running our system, which implements the proposed GeoMDQL query language. Fig. 13(A) shows a table with live birth rates and the number of neonate deaths for each Brazilian state located
northeast of the Federal District. Fig. 13(B) displays the map resulting from this query processing and Fig. 13(C) displays the chart that users can optionally enable to graphically exhibit the multidimensional data. With the functions available on our query interface, users can interact with the resulting table, configuring its appearance, swapping its rows and columns and changing how the data and data labels are displayed. It is also possible to interact with the resulting table in order to navigate through the hierarchy levels, increasing the level of detail of the measures, which is automatically reflected in the graph and map. As in every application that works with maps, our interface also offers functions so that users can configure the display of the map resulting from a query, such as altering colors and labels. It is also possible to add graphs
Table 5. Examples of GeoMDQL queries for the public health data cube.

Query: For the year 2000, show the population rates and the number of neonate deaths for each Brazilian State and grouped by region.
Query type: MD
GeoMDQL syntax: SELECT [Measures].[Population], [Measures].[Neonate Deaths] ON COLUMNS, DRILLDOWNLEVEL([Location].[Region].MEMBERS) ON ROWS FROM [BrPublicHealth] WHERE [Time].[Year].[2000]

Query: Select all cities located within an ad hoc chosen area.
Query type: GEO
GeoMDQL syntax: SELECT [Location].[City].MEMBERS FROM [BrPublicHealth] WHERE (WITHIN([Location].[City], (POLYGON((-55.5 -19.00, ..., -55.5 -19.00))))) ON MAP

Query: For the year 2002, show the 10 Brazilian States with the highest number of neonate deaths, highlighting the results on a map.
Query type: Mapping GEOMD
GeoMDQL syntax: SELECT [Measures].[Neonate Deaths] ON COLUMNS, TOPCOUNT([Location].[States].Members, 10, Measures.[Neonate Deaths]) ON ROWS FROM [BrPublicHealth] WHERE [Time].[Year].[2002] ON MAP

Query: Show the number of live births and neonate deaths for the year 2002 and for States located northeast of the Federal District.
Query type: Integration GEOMD
GeoMDQL syntax: SELECT [Measures].[Live births], [Measures].[Neonate Deaths] ON COLUMNS, [Location].[State].MEMBERS ON ROWS FROM [BrPublicHealth] WHERE (([Time].[Year].[2002]) and (AT_NORTHEAST_OF([Location].[State], [Location].[State].[FEDERAL DISTRICT]))) ON MAP
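The four query types in Table 5 can be told apart by a few surface features of the query text. The sketch below is only a heuristic inferred from those four rows (presence of measures, of a spatial operator, and of the ON MAP clause); it is not the classification performed by the GeoMDQL parser, whose grammar is given in Section 6.2. The function name and the short operator list are assumptions, and the polygon coordinates are left elided.

```python
import re

# Spatial operators that occur in the Table 5 queries. This is a sample for
# illustration only, not the full GeoMDQL operator set.
SPATIAL_OPERATORS = ("WITHIN", "AT_NORTHEAST_OF")


def classify(query):
    """Guess the query type from surface features (heuristic, not the real parser)."""
    text = query.upper()
    has_measures = "[MEASURES]" in text or "MEASURES." in text
    has_spatial = any(
        re.search(r"\b" + op + r"\s*\(", text) for op in SPATIAL_OPERATORS
    )
    on_map = re.search(r"\bON\s+MAP\b", text) is not None
    if has_spatial and has_measures:
        return "Integration GEOMD"   # measures restricted by a spatial predicate
    if has_spatial:
        return "GEO"                 # purely spatial selection
    if has_measures and on_map:
        return "Mapping GEOMD"       # multidimensional result shown on a map
    return "MD"                      # conventional multidimensional query


QUERIES = {
    "q1": "SELECT [Measures].[Population], [Measures].[Neonate Deaths] ON COLUMNS, "
          "DRILLDOWNLEVEL([Location].[Region].MEMBERS) ON ROWS "
          "FROM [BrPublicHealth] WHERE [Time].[Year].[2000]",
    "q2": "SELECT [Location].[City].MEMBERS FROM [BrPublicHealth] "
          "WHERE (WITHIN([Location].[City], (POLYGON((...))))) ON MAP",
    "q3": "SELECT [Measures].[Neonate Deaths] ON COLUMNS, "
          "TOPCOUNT([Location].[States].Members, 10, Measures.[Neonate Deaths]) ON ROWS "
          "FROM [BrPublicHealth] WHERE [Time].[Year].[2002] ON MAP",
    "q4": "SELECT [Measures].[Live births], [Measures].[Neonate Deaths] ON COLUMNS, "
          "[Location].[State].MEMBERS ON ROWS FROM [BrPublicHealth] "
          "WHERE (([Time].[Year].[2002]) and (AT_NORTHEAST_OF([Location].[State], "
          "[Location].[State].[FEDERAL DISTRICT]))) ON MAP",
}

for name, query in QUERIES.items():
    print(name, classify(query))
```

A real implementation would classify on the parse tree rather than on regular expressions, since string matching cannot distinguish an operator name from, say, a member that happens to be called WITHIN.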
Fig. 13. A GeoMDQL query based on birth rates and neonate deaths.
to a map, as shown in Fig. 13(B). The display of the graphs can also be easily configured, including colors, captions, titles and type of graph. Fig. 13(D) displays the window of the query editor in which the GeoMDQL queries are specified. The layout of the windows containing the graphs, tables and query editor is not fixed, so users can move, minimize and maximize them according to their needs. Another example of a GeoMDQL query of the Integration GEOMD type is displayed in Fig. 14. In this query (see Fig. 14(A)), the objective is to retrieve the measures Population, Maternal Deaths, Neonate Deaths and Deaths Under One Year of Age for all micro-regions with a population of more than 250,000 inhabitants in 2002 and located within an ad hoc area defined by a polygon. To this end, the FILTER and WITHIN operators of the GeoMDQL language are used. The table resulting from the query is displayed in Fig. 14(B) and the map in Fig. 14(C). As we can see, a pie graph was added over each micro-region of the map, graphically depicting the measures Maternal Deaths, Neonate Deaths and Deaths Under One Year of Age. Optionally, users can interact with the table resulting from a query and apply a drill-down operation to any member. In our application, a user's action is reflected in the other components (i.e. maps or graphs). To exemplify this situation, a query is displayed in Fig. 15, in which the user interacts with the result of the previous query (given in Fig. 14) and applies a drill-down operation to the micro-regions MEIA PONTE and JANUARIA. With this operation, our application retrieves the measures in question (i.e. Population, Maternal Deaths, Neonate Deaths and Deaths Under One Year of Age) for each of the cities located within these two micro-regions, displaying the results in the table (Fig. 15(A)) and map (Fig. 15(B)). As we can see in Fig. 15(B), the map has two layers that correspond to the levels Micro-Region and City of the geographic dimension Location in the BrPublicHealth data cube.
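The interaction just described (a population filter followed by a drill-down whose effect is mirrored on the map) can be pictured with a toy model. In the sketch below, the micro-region names MEIA PONTE and JANUARIA come from the text, but every number and every city name is a made-up placeholder rather than data from the BrPublicHealth cube; the spatial WITHIN test is omitted, and the function names are hypothetical.

```python
# Toy slice of the Location dimension. All populations and city names are
# placeholders invented for this sketch, not values from the case study.
MICRO_REGIONS = {
    "MEIA PONTE": {"Population": 300_000,
                   "cities": {"City A": 180_000, "City B": 120_000}},
    "JANUARIA": {"Population": 260_000,
                 "cities": {"City C": 160_000, "City D": 100_000}},
    "OTHER": {"Population": 90_000,
              "cities": {"City E": 90_000}},
}


def filter_members(members, measure, threshold):
    """Keep members whose measure exceeds the threshold (the role FILTER plays in the text)."""
    return [name for name, data in members.items() if data[measure] > threshold]


def drill_down(members, selected):
    """List each selected micro-region followed by its cities, tagging rows with their level."""
    rows = []
    for name in selected:
        rows.append(("Micro-Region", name, members[name]["Population"]))
        for city, population in members[name]["cities"].items():
            rows.append(("City", city, population))
    return rows


def map_layers(rows):
    """Derive the map layers from the levels present in the table rows."""
    layers = []
    for level, _, _ in rows:
        if level not in layers:
            layers.append(level)
    return layers


kept = filter_members(MICRO_REGIONS, "Population", 250_000)
table_rows = drill_down(MICRO_REGIONS, kept)
for level, name, population in table_rows:
    print(level, name, population)
print(map_layers(table_rows))
```

Deriving the map layers from the table rows, instead of maintaining them separately, is one simple way to keep the table and the map consistent after each user action.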
7.2. A comparative analysis

This section provides a comparative analysis between our work and other studies on multidimensional and geographical query languages. To this end, the following query example given in Piet [11] is used: List the total number of units sold, cost per product, form of advertising of the sale (promotion media, e.g. radio, TV, newspaper) and stores located in provinces bisected by some river. The query is displayed in Fig. 16(A) according to the syntax of the language GISOLAP-QL [11]. The same query was rewritten in the GeoMDQL syntax, as shown in Fig. 16(B). As can be seen in Fig. 16(A), the GISOLAP-QL language does not have an integrated syntax, nor does it have SOLAP operators. The lack of an integrated syntax may lead to non-intuitive queries that are harder to write, particularly when more complex queries are submitted based on a combination of different operators. Another case for comparison with the GeoMDQL language is the SQL language with a spatial extension used by the MapWarehouse proposal [43]. This example further highlights the advantages of the GeoMDQL language over both the approach proposed by the MapWarehouse work and all other work based on extensions of the SQL language for querying data in a SOLAP environment. Fig. 17(A) displays the SQL instruction of the MapWarehouse for the following query: Retrieve the corn crop areas within a given rectangular window for each mesoregion (region) and for each region (micro-region) in the state of Paraíba in May 2003. Fig. 17(B) shows how this query is expressed in GeoMDQL. As we can see, the GeoMDQL instruction is considerably simpler than that illustrated in
Fig. 14. A GeoMDQL query based on an ad hoc defined area.
Fig. 15. A GeoMDQL query based on a drill-down operation.
Fig. 16. Comparing GeoMDQL and GISOLAP-QL [11].
Fig. 17(A). It should be stressed that the examples given in this section are relatively simple SOLAP queries. If queries involving more operators are specified, the complexity of the SQL instruction is increased considerably. Our work differs from other approaches discussed in Section 2 with respect to the following issues that may affect decision making: (1) a GDW is used to integrate spatial and multidimensional data and to normalize geometries in order to speed up the query processing time; (2) a query language that both integrates multidimensional and spatial operators in a unique syntax, and proposes a set of specially designed SOLAP operators is used to allow users to write concise query sentences and use a new set of operators (e.g. allowing the representation
of areas ranked according to their extents); (3) query results may be displayed using maps, tables and graphics; and (4) pictograms and standards are used to improve the proposed data model expressiveness and to facilitate the system prototype reuse and maintenance, respectively.
Fig. 17. GeoMDQL and Mapwarehouse Spatial SQL [43] Comparison.

8. Conclusions

In this article, definitions for a GDW (geographical data warehouse) data model as well as geographical and multidimensional data cubes are presented. The GDW definitions provide a data model that normalizes the geometrical data in order to improve query response time and storage requirements in a GDW [45,46]. Although several approaches have been proposed for integrating multidimensional and geographical processing, none offers a query language with a single syntax for simultaneously using multidimensional and spatial operators. This paper presented the GeoMDQL language, a new language based on MDX and specified for use in SOLAP environments to retrieve geographical and multidimensional data stored in a GDW. To demonstrate the application of our metamodels and the GeoMDQL query language, some points related to a public healthcare case study were also briefly presented. For this case study, a GDW containing information from the national health department was built and may be used for monitoring and evaluating actions and services related to the Brazilian public healthcare system. Many other similar applications can be developed, and their results are important to the Brazilian economy and government as well. It is important to mention that the implementations are all based on open and extensible standards, which simplifies the evolution and reuse of the proposed solution. SOLAP capabilities of current commercial information systems are based on the separate use of OLAP and GIS tools. However, if users need results expressible in terms of both multidimensional and spatial data, then a SOLAP tool should be used. The development of SOLAP tools has received considerable attention from researchers in recent years. Our approach for building a SOLAP tool differs from other related studies because it uses: (1) a GDW with integrated and normalized geometrical data; (2) a query language with an integrated and concise syntax and a set of specially designed SOLAP operators; (3) tables, maps and graphics for displaying query results; and (4) a data model with pictograms and based on standards, such as UML, OGC and CWM. All the issues listed above may be important for supporting users' decision-making tasks. However, little research has been undertaken to assess the potential application of SOLAP tools from a user-needs perspective. This empirical work is proposed as future work. The following are other planned directions for future work: in Layer II of the GOLAPA architecture, the query optimization module will be implemented; in Layer III, improvements will be added to achieve better user interaction with charts, tables and maps; GeoMDQL will also be extended to handle complex hierarchies that represent the partial containment of spatial objects as well as to manipulate spatial–temporal operators.

References
[1] S. Bimonte, A. Tchounikine, M. Miquel, Towards a spatial multidimensional model, in: Proceedings of 8th ACM International Workshop on Data Warehousing and OLAP, Bremen, Germany, 2005.
[2] L. Cabibbo, R. Torlone, Querying multidimensional databases, in: Proceedings of the 6th International Workshop on Database Programming Languages, Estes Park, USA, 1998, pp. 319–335.
[3] S. Chaudhuri, U. Dayal, Data warehousing and OLAP for decision support, in: Proceedings of the ACM SIGMOD International Conference on Management of Data, ACM, Tucson, USA, 1997, pp. 507–508.
[4] S. Cicerone, P. Di Felice, Cardinal directions between spatial objects: the pairwise-consistency problem, Information Science 164 (1–4) (2004) 165–188.
[5] Open Geospatial Consortium. Simple features specification for sql, Technical Report, 1999, http://portal.opengeospatial.org/files/?artifact_id=829.
[6] J. da Silva, A. de Oliveira, A.C. Salgado, V. Times, R. Fidalgo, C. Souza, A set of aggregation functions for spatial measures, in: Proceedings of the Eleventh ACM international workshop on Data warehousing and OLAP, ACM Press, Napa Valley, USA, 2008.
[7] M.L. Damiani, S. Spaccapietra, Spatial data warehouse modelling, in: Processing and Managing Complex Data for Decision Support, Idea Group Publishing, Hershey, USA, 2006, pp. 21–27.
[8] M.N. Demers, Fundamentals of Geographic Information Systems, third ed., Wiley, USA, 2002.
[9] M.J. Egenhofer, Spatial sql: a query and presentation language, IEEE Transactions on Knowledge and Data Engineering 6 (1) (1994) 86–95.
[10] M.J. Egenhofer, J.R. Herring, Categorizing binary topological relationships between regions, lines and points in geographic databases, Technical Report, Department of Surveying Engineering, University of Maine, 1991.
[11] A. Escribano, L. Gomez, B. Kuijpers, A.A. Vaisman, Piet: a GIS-OLAP implementation, in: Proceedings of ACM 10th International Workshop on Data Warehousing and OLAP, Lisbon, Portugal, 2007, pp. 73–80.
[12] H. Chen, F. Wang, J. Sha, S. Yang, Geosql: a spatial query language of object-oriented GIS, in: Proceedings of the 2nd International Workshop on Computer Science and Information Technologies, Ufa, Russia, 2000, pp. 215–219.
[13] R.N. Fidalgo, V.C. Times, J. Silva, et al., Geodwframe: a framework for guiding the design of geographical dimensional schemas, in: Proceedings of the Data Warehousing and Knowledge Discovery, Zaragoza, Spain, 2004, pp. 26–37.
[14] R.N. Fidalgo, V.C. Times, J. Silva, et al., Providing multidimensional and geographical integration based on a GDW and metamodels, in: Proceedings of the Brazilian Symposium on Databases, Brasília, Brazil, 2004, pp. 148–162.
[15] R.L. Fonseca, R.N. Fidalgo, J. Silva, V.C. Times, Geodwcase: Uma ferramenta para projeto de data warehouses geográficos, in: Proceedings of Brazilian Symposium on Databases, Demos Session, João Pessoa, Brazil, 2007.
[16] International Organization for Standardization, Database Language SQL-Amendment 1: On-Line Analytical Processing (SQL/OLAP), 1999.
[17] L.M. Garshol, Bnf and ebnf: What are they and how do they work?, 2006, www.garshol.priv.no/download/text/bnf.html.
[18] R.K. Goyal, M.J. Egenhofer, Consistent queries over cardinal directions across different levels of detail, in: Proceedings of the 11th International Workshop on Database and Expert System Applications, London, UK, 2000, pp. 876–880.
[19] J. Gray, A. Bosworth, et al., Data cube: a relational aggregation operator generalizing group-by, cross-tab, and sub-totals, Data Mining and Knowledge Discovery 1 (1) (1997) 29–53.
[20] J. Han, N. Stefanovic, K. Koperski, Selective materialization: an efficient method for spatial data cube construction, in: Proceedings of the Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD'98), 1998, pp. 144–158.
[21] W.H. Inmon, Building the Data Warehouse, third ed., Wiley, USA, 2002.
[22] ISO. ISO/IEC WD—SQL Multimedia and Application Packages. Part 3: Spatial, 2003.
[23] JPF. Java plugin framework, http://jpf.sourceforge.net/, Last Visit September 2009.
[24] JRubik. Jrubik, http://rubik.sourceforge.net/jrubik/, Last Visit September 2009.
[25] R. Kimball, M. Ross, The Data Warehouse Toolkit: The Complete Guide to Dimensional Modeling, second ed., Wiley, New York, 2002.
[26] H. Lin, B. Huang, Sql/sda: a query language for supporting spatial data analysis and its web-based implementation, IEEE Transactions on Knowledge and Data Engineering 13 (4) (2001) 671–682.
[27] E. Malinowski, E. Zimányi, Spatial hierarchies and topological relationships in the spatial multidimer model, in: Proceedings of 22nd British National Conference on Databases, Sunderland, UK, 2005, pp. 17–28.
[28] E. Malinowski, E. Zimányi, Requirements specification and conceptual modeling for spatial data warehouses, in: Proceedings of the On The Move Federated Conferences and Workshops, Montpellier, France, 2006, pp. 1616–1625.
[29] E. Malinowski, E. Zimányi, Representing spatiality in a conceptual multidimensional model, in: Proceedings of the 12th Annual ACM International Workshop on Geographic Information Systems, Washington, USA, 2004, pp. 12–22.
[30] A.J. Morris, A.I. Abdelmoty, B.A. El-Geresy, et al., A filter flow visual querying language and interface for spatial databases, GeoInformatica 8 (2) (2004) 107–141.
[31] OMG. Common warehouse metamodel (CWM) specification 1.1, 2001.
[32] OMG, Unified Modeling Language: Superstructure, Object Modeling Group, 2004.
[33] OMG, Object Constraint Language Specification, Object Modeling Group, 2005.
[34] OpenJUMP, Openjump, http://openjump.org/, Last Visit September 2009.
[35] C. Parent, et al., Spatio-temporal conceptual models: data structures + space + time, in: Proceedings of International Symposium on Advances in Geographic Information Systems, Kansas City, USA, 1999, pp. 26–33.
[36] D. Pedersen, K. Riis, T. Bach Pedersen, A powerful and sql-compatible data model and query language for OLAP, Australian Computer Science Communications 24 (2) (2002) 121–130.
[37] T. Pedersen, N. Tryfona, Pre-aggregation in spatial data warehouses, in: Proceedings of International Symposium on Advances in Spatial and Temporal Databases, Redondo Beach, USA, 2001, pp. 460–480.
[38] Pentaho. Mondrian, http://mondrian.pentaho.org/, Last Visit September 2009.
[39] E. Pourabbas, M. Rafanelli, A pictorial query language for querying geographic databases using positional and OLAP operators, SIGMOD Record 31 (2) (2002) 22–27.
[40] JPivot Project. Jpivot, http://jpivot.sourceforge.net/, Last Visit September 2009.
[41] S.B. Navathe, R. Elmasri, Fundamentals of Database Systems, fifth ed., Addison-Wesley, Reading, MA, 2006.
[42] S. Rivest, et al., Solap technology: merging business intelligence with geospatial technology for interactive spatio-temporal exploration and analysis of data, Journal of International Society for Photogrammetry and Remote Sensing (2005) 17–33.
[43] M.C. Sampaio, A.G. de Sousa, C. de S. Baptista, Towards a logical multidimensional model for spatial data warehousing and OLAP, in: Proceedings of the 9th ACM International Workshop on Data Warehousing and OLAP, Arlington, USA, 2006, pp. 83–90.
[44] J. Silva, V.C. Times, A.C. Salgado, et al., An open source and web based framework for geographic and multidimensional processing, in: Proceedings of the 2006 ACM Symposium on Applied Computing, Dijon, France, 2006, pp. 63–67.
[45] T.L. Siqueira, R.R. Ciferri, V.C. Times, C.D. Ciferri, Investigating the effects of spatial data redundancy in query performance over geographical data warehouses, in: Proceedings of Brazilian Symposium on GeoInformatics, Rio de Janeiro, Brazil, 2008.
[46] T.L.L. Siqueira, R.R. Ciferri, V.C. Times, C.D. de A. Ciferri, A spatial bitmap-based index for geographical data warehouses, in: Proceedings of the 24th Annual ACM Symposium on Applied Computing, Honolulu, Hawaii, 2009.
[47] S. Spaccapietra, C. Parent, ERC+: an object based entity relationship approach, in: Proceedings of Conceptual Modelling, Database and Case: An integrated View of Information Systems Development, Wiley, New York, 1992.
[48] XML Spy. Altova xml spy, http://www.altova.com/, Last Visit September 2009.
[49] E. Thomsen, OLAP Solutions: Building Multidimensional Information Systems, second ed., Wiley, New York, NY, USA, 2002.
[50] W3C, Scalable vector graphics, http://www.w3.org/tr/svg11/, February 2008.
[51] M. Whitehorn, R. Zare, M. Pasumansky, Fast Track to MDX, second ed., Springer, Berlin, 2005.
[52] M. Worboys, M. Duckham, GIS, A Computing Perspective, second ed., CRC Press, Boca Raton, FL, 2004.
[53] H.B. Zghal, S. Faiz, H.B. Ghézala, Casme: a case tool for spatial data marts design and generation, in: Proceedings of Design and Management of Data Warehouses, Berlin, Germany, 2003.