Ontology-driven geographic information integration: A survey of current approaches


ARTICLE IN PRESS Computers & Geosciences 35 (2009) 710–723

Contents lists available at ScienceDirect

Computers & Geosciences journal homepage: www.elsevier.com/locate/cageo

Agustina Buccella a,*, Alejandra Cechich a, Pablo Fillottrani b

a GIISCO Research Group, Departamento de Ciencias de la Computación, Universidad Nacional del Comahue, Buenos Aires 1400, Neuquen 8300, Argentina
b Departamento de Ciencias e Ingenieria de la Computación, Universidad Nacional del Sur, Avda Alem 1253, Bahia Blanca 8000, Argentina

Article info

Article history: Received 17 April 2007; received in revised form 25 January 2008; accepted 15 February 2008

Keywords: Geographic Information Systems; Data integration; Heterogeneous databases; Formal ontologies

Abstract

Integrating different information sources is a growing research area within many application domains. This is particularly true for the geographic information domain, which faces new challenges because newer and better technologies are capturing large amounts of information about the Earth. This trend is compounded by the increasing distribution of GIS (Geographic Information Systems) on the Web, which is leading to a proliferation of geospatial information repositories and, consequently, to the need to integrate information across repositories in order to obtain consistent information. To overcome this situation, many proposals use ontologies in the integration process. In this paper we analyze and compare the most widely referenced proposals for geographic information integration, focusing on those that use ontologies as semantic tools to represent the sources and to facilitate the integration process.

© 2008 Elsevier Ltd. All rights reserved.

* Corresponding author. Tel.: +54 299 449 0300; fax: +54 299 449 0313.
doi:10.1016/j.cageo.2008.02.033

1. Introduction

Nowadays, GPS (Global Positioning System) technology is widely used in cell phones, cars, and other devices. The geographic information captured in this way is analyzed and stored at different levels of detail in Geographic Information Systems (GIS), which may be distributed on the Web. A quick search for geographic information on the Web returns several links representing different parts of our world, possibly from more than one system. For example, information about the rivers of a country can be obtained by querying two or more different systems. Although the distribution of information is one of the problems here, it is not the only one: the systems are often developed by different agencies with different points of view and vocabularies, which leads to heterogeneity problems. These problems arise in every communication between interoperating systems, where interoperability refers to the various interactions between information from different sources, including the task of data integration. Ontologies have been widely proposed as tools to help resolve heterogeneity problems. For example, several proposals use formal ontologies to enrich a conceptual schema and thus improve the integration process (Fonseca et al., 2002, 2003; Hakimpour, 2003; Hakimpour and Geppert, 2002) and the query process (Zhang, 2005).

Generally speaking, two essential tasks are involved in a data integration process: semantic enrichment and mapping discovery (Sotnykova et al., 2005). The main goal of semantic enrichment is to reconcile semantic heterogeneity by adding semantic information to the data. Many approaches add this extra semantic information through metadata or ontologies. For example, extensions of common data models such as entity-relationship diagrams (Parent et al., 1999) and object-oriented models (Borges et al., 2001) have been proposed in order to add geographic features to these data models. We are particularly interested in approaches that use ontologies, because ontologies are intended to facilitate knowledge sharing and reuse between various agents, both software and human (Fensel, 2003).

The semantic enrichment task is a prerequisite for the second task, mapping discovery. Several surveys describe and analyze methodologies, frameworks, and systems proposed for semantic matching, i.e. ontology matching. For example, the surveys by Shvaiko and Euzenat (2005) and Rahm and Bernstein (2001) analyze and compare methodologies and tools for building an integrated system; Rahm and Bernstein (2001) classify techniques by applying a set of criteria, such as schema-based vs. instance-based matching, element vs. structure granularity, etc. Surveys of ontology-based systems for data integration can be found in Kalfoglou and Schorlemmer (2003), Euzenat and Shvaiko (2007), Wache et al. (2001) and Klein (2001). The work presented in Euzenat and Shvaiko (2007) describes and analyzes a wide set of ontology matching proposals. Although it focuses on methodologies for improving matching, it also takes into account new approaches to complete integrated systems. In addition, further information, with links to several approaches, can be found at OntologyMatching.org (footnote 1), where proposals are compared using a more extensive classification than that of Rahm and Bernstein (2001); this classification considers criteria such as the use of linguistic resources (e.g. whether a thesaurus or lexicon is used) and model-based reasoning (e.g. whether DL reasoning is applied). However, all these surveys focus on conventional systems and do not analyze systems based on geographic information. Given the hypothesis tested by Mark et al. (2001) that “geographic and non-geographic entities are ontologically distinct in a number of ways”, a different analysis is needed when geographic elements are involved. In this work we analyze several proposals that integrate geographic information sources.
In selecting the proposals, we take into account features from the ontological perspective as well as from the integration process itself, and we focus on the mapping discovery task. Semantic enrichment tasks are outside the scope of this work, although we evaluate how geographic semantic information is represented and applied. This paper is organized as follows. Section 2 describes the basic concepts and conventions used throughout the paper. Section 3 describes our classification, which is based on the main features involved in integrating geographic information. Section 4 then briefly describes 11 proposals widely referenced in the literature, introducing their main characteristics. Section 5 presents a discussion of advantages, disadvantages, and missing aspects, aimed at identifying common problems and particularities among the proposals. Finally, we outline future trends that follow from our analysis.

1 http://www.ontologymatching.org


2. Basic concepts

Data or information integration is concerned with unifying data that share some common semantics but originate from unrelated sources (Calvanese and Giacomo, 2005; Ullman, 2000). Unfortunately, heterogeneity is one of the most common problems an integration process faces. For example, two systems sharing data about rivers can exhibit several types of heterogeneity (Hakimpour, 2003):

- heterogeneity in the conceptual model: one system represents a river as an object class and the other as a relationship;
- heterogeneity in the spatial model: rivers can be represented by polygons (or a segment of pixels) in one system and by lines in the other;
- structure or schema heterogeneity: both systems hold the name of a river, but one of them also keeps information about the border; and
- semantic heterogeneity: one system may consider a river a natural stream of water larger than a creek, with a border, while the other defines a river as any natural stream of water reaching from the sea, a lake, etc. into the land.

To resolve several of these heterogeneities, ontologies are used. The concept was introduced by Gruber (1993) as “a formal explicit specification of a shared conceptualization”. A conceptualization refers to an abstract model of how people commonly think about a real thing in the world, e.g. a chair. Explicit specification means that the concepts and relations of the abstract model receive explicit names and definitions. A shared conceptualization means that the knowledge described in the ontology is accepted by a community. In addition, because ontologies should be as formal as possible (a formal explicit specification), a logical formalism such as description logic (DL) (Baader et al., 2003) or frame logic (F-Logic) (Kifer et al., 1995) is often used to represent them. Ontologies therefore provide semantic representations of real-world knowledge, allowing us to define a set of terms, interconnections, and rules of inference for a particular domain. Ontologies capture real-world semantics, which means that a real thing can be represented (by means of a model) as it is in the real world. The definition of ontology we adopt in this paper draws on three definitions proposed in the literature (Katifori et al., 2007; Visser et al., 1997; Noy and McGuiness, 2001):

Definition. An ontology is a 4-tuple $O = \langle C, P, I, A \rangle$, in which C is a set of classes, P a set of properties, I a set of instances, and A a set of axioms. Classes represent entities or objects of the real world; properties are associated with a class (i.e. as attributes) or represent relations between classes (such as generalization/specialization, aggregation, composition, etc.); instances represent individuals of classes; and axioms represent additional constraints involving classes and/or properties.
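The 4-tuple definition can be made concrete with a small data structure. The sketch below is only an illustration under stated assumptions: the class name `Ontology`, its field names, the `DATATYPES` set, and the sample hydrographic ontology are all hypothetical, and axioms are stored as uninterpreted text rather than being passed to a DL reasoner.

```python
from dataclasses import dataclass, field

# Datatypes accepted as property ranges besides classes (assumption of this sketch).
DATATYPES = {"string", "float", "int"}


@dataclass
class Ontology:
    """Minimal container for the 4-tuple O = <C, P, I, A>."""
    classes: set = field(default_factory=set)     # C: class names
    properties: set = field(default_factory=set)  # P: (name, domain, range) triples
    instances: set = field(default_factory=set)   # I: (individual, class) pairs
    axioms: list = field(default_factory=list)    # A: constraints, stored as text only

    def instances_of(self, cls):
        """Return the individuals asserted to belong to `cls`."""
        return {ind for ind, c in self.instances if c == cls}

    def undeclared_references(self):
        """Report properties or instances that mention classes missing from C."""
        problems = []
        for name, domain, rng in sorted(self.properties):
            if domain not in self.classes:
                problems.append(f"property {name}: unknown domain {domain}")
            if rng not in self.classes and rng not in DATATYPES:
                problems.append(f"property {name}: unknown range {rng}")
        for ind, cls in sorted(self.instances):
            if cls not in self.classes:
                problems.append(f"instance {ind}: unknown class {cls}")
        return problems


# Hypothetical hydrographic ontology.
hydro = Ontology(
    classes={"WaterBody", "River", "City"},
    properties={
        ("subClassOf", "River", "WaterBody"),  # generalization/specialization
        ("name", "River", "string"),           # attribute of a class
        ("crosses", "River", "City"),          # relation between classes
    },
    instances={("Limay", "River"), ("Neuquen", "City")},
    axioms=["River ⊓ City ⊑ ⊥"],              # rivers and cities are disjoint
)

print(hydro.instances_of("River"))        # {'Limay'}
print(hydro.undeclared_references())      # []
```

Keeping properties as (name, domain, range) triples lets attributes and inter-class relations live in the same set P, mirroring the definition, at the cost of not distinguishing relation kinds.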


In this context, the problem of integrating different sources of information (databases, web pages, etc.) is shifted to integrating the ontologies corresponding to these sources or domains, and ontology merging, ontology mapping, and ontology integration (Klein, 2001; Euzenat and Shvaiko, 2007) become the main tasks of an integration process. Ontology merging creates a new ontology from two or more existing ontologies with overlapping parts. Ontology mapping relates similar (according to some metric) concepts or relations from different sources to each other by an equivalence relation; a mapping results in a virtual integration. Finally, ontology integration denotes the inclusion of one ontology into another, usually by means of bridge axioms. The main difference from ontology merging is that no new ontology is created; instead, the ontology into which the other is included is modified. In ontological terms, these tasks have to deal with semantic heterogeneity problems. There is ontological heterogeneity (Visser et al., 1997) when two systems make different ontological assumptions about their knowledge of the domain (e.g. one system classifies animals as birds, reptiles, and mammals, and the other as carnivores and herbivores). These mismatches between ontologies can be divided into two main categories: language mismatches and ontology mismatches (Chalupsky, 2000; Visser et al., 1997; Wiederhold, 1994). The former occur when ontologies written in different languages are combined. For example, in Chalupsky (2000), syntax and expressiveness mismatches denote differences between the languages used to represent the ontologies and between their expressiveness. Mismatches can also occur within the same language, such as the use of abbreviations, acronyms, punctuation, different data types, etc.; these are called linguistic mismatches (Madhavan et al., 2001). The latter, ontology mismatches, occur when ontologies describing overlapping domains are combined. This category covers different representations of a real-world domain: what matters is not whether the language is the same, but whether the ontologies use different concepts for the same real-world entity. Another example is the classification by Visser et al. (1997), in which ontology mismatches are classified according to the two sub-processes involved in creating an ontology: conceptualizing a domain and explicating the conceptualization. Conceptualization mismatches may appear when two or more conceptualizations differ in their ontological concepts or in the way these concepts are related (e.g. by using different generalization/specialization relations). Explication mismatches arise not from the conceptualization of the domain but from the way the conceptualization is specified (e.g. by using different definitions for the same concept).

Alternatively, other works in the literature have focused on the ontological components within a federated system instead of on the way ontologies are represented (e.g. by analyzing mismatches). In this way, Wache et al. (2001) define three main approaches depending on the components of the system: the single ontology approach, the multiple ontology approach, and the hybrid ontology approach (Fig. 1). The single ontology approach (Fig. 1(a)) uses a global ontology as a shared vocabulary; that is, all information sources are related through a single ontology. Quick development is the main advantage of this approach; its main problem, however, is that a global integrated ontology must be managed, which involves administration, maintenance, consistency, and efficiency problems that are very hard to solve. In the multiple ontology approach (Fig. 1(b)), each information source is represented by an independently built ontology. This approach alleviates the problems of the single ontology approach: adding a new source is easy because only one new ontology must be built. However, the lack of a shared vocabulary is a real problem when the ontologies are compared. The hybrid ontology approach (Fig. 1(c)) emerged to overcome the drawbacks of the first two approaches. Here, a shared vocabulary contains the basic terms of the domain and relates the information in the local ontologies, and new information sources can be added easily. However, the shared vocabulary can become a problem if it grows rapidly.

Fig. 1. Classification of the use of ontologies (Wache et al., 2001). [Figure: (a) single ontology approach, with all sources related through one global ontology; (b) multiple ontology approach, with one local ontology per source; (c) hybrid ontology approach, with local ontologies built on a shared vocabulary.]

Building on the hybrid approach, two approaches (Calvanese et al., 2001; Calvanese and Giacomo, 2005) have emerged for specifying mappings in an integrated system. A mapping is a directed map between concepts of different sources (or ontologies) (Euzenat and Shvaiko, 2007), and the way these mappings are defined determines the mapping approach. The global-as-view (GAV) approach defines global concepts as views over the sources; that is, each concept of the global view is mapped to a query over the sources. Fig. 2(a) shows how, given a global ontology (G), mappings are defined over the sources $S_1$, $S_2$, and $S_3$. The other approach, local-as-view (LAV), defines the sources as views over the global view, so the information the sources contain is described in terms of this global view, as Fig. 2(b) shows. The advantages and disadvantages of each approach have to be considered when an integration process is initiated (Calì, 2002).
For example, in the LAV approach restrictions over the sources can be defined easily, whereas in the GAV approach it is easier to define restrictions over the global view. With respect to scalability, in the LAV approach adding a new source to the integrated system does not require changes to the global view, while in the GAV approach adding a new source requires changes to the global view. However, extending the global view itself is easier in the GAV approach. Finally, with respect to query processing, LAV requires reasoning mechanisms to answer queries, whereas in GAV conventional mechanisms (essentially, unfolding the view definitions) can be used. To take advantage of the benefits of both approaches, a further approach called global-local-as-view (GLAV) has been proposed (Friedman et al., 1999; Fagin et al., 2003). This approach associates views over the global schema with views over the sources; i.e. mappings can be specified in both directions. A GLAV system can be considered a generalization of both GAV and LAV (Calì, 2003).
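The mappings of Fig. 2 can be turned into a toy query answerer that contrasts GAV and LAV. This is a minimal sketch under strong assumptions: the source contents are invented, every mapping is a single atom, and views are treated as sound (each source holds a subset of the global relation). Real LAV systems need view-based query rewriting, and GLAV mappings, with queries on both sides, are not covered.

```python
# Hypothetical source extensions: S1 lists rivers, S2 cities, S3 (river, city) pairs.
SOURCES = {
    "S1": {("Limay",), ("Negro",)},
    "S2": {("Neuquen",), ("Viedma",)},
    "S3": {("Negro", "Viedma")},
}

# GAV, as in Fig. 2(a): each global predicate is defined over the sources.
# A definition here is simply the list of sources whose tuples populate it.
GAV = {"River": ["S1"], "City": ["S2"], "crosses": ["S3"]}

# LAV, as in Fig. 2(b): each source is described as a view over the global
# ontology. Only single-atom views such as S1(x) <- River(x) are supported.
LAV = {"S1": "River", "S2": "City", "S3": "crosses"}


def answer_gav(predicate):
    """Answer a global atom by unfolding it into its source definitions."""
    result = set()
    for source in GAV.get(predicate, []):
        result |= SOURCES[source]
    return result


def answer_lav(predicate):
    """Certain answers to a global atom under sound single-atom views:
    every source whose view is that atom contributes its whole extension."""
    result = set()
    for source, view in LAV.items():
        if view == predicate:
            result |= SOURCES[source]
    return result


# A conjunctive query over the global ontology, answered through GAV unfolding:
# rivers that cross some city known to the system.
cities = answer_gav("City")
print(sorted(pair for pair in answer_gav("crosses") if (pair[1],) in cities))
# [('Negro', 'Viedma')]

# Adding a new river source S4: LAV only needs a description of S4, while the
# GAV definition of River must be edited before S4's rivers become visible.
SOURCES["S4"] = {("Colorado",)}
LAV["S4"] = "River"
print(sorted(answer_lav("River")))  # [('Colorado',), ('Limay',), ('Negro',)]
print(sorted(answer_gav("River")))  # [('Limay',), ('Negro',)]
GAV["River"].append("S4")
print(sorted(answer_gav("River")))  # [('Colorado',), ('Limay',), ('Negro',)]
```

The last lines mirror the scalability argument above: the new source is visible under LAV as soon as it is described, whereas GAV needs an edit to the global definitions.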

Fig. 2. GAV and LAV approaches. (a) Global-as-view approach and (b) local-as-view approach. [Figure: a global ontology (G) with the concepts River and City and the binary relation crosses, and three sources S1, S2, and S3. Mappings in (a): River(x) ← S1(x); City(x) ← S2(x); crosses(x, y) ← S3(x, y). Mappings in (b): S1(x) ← River(x); S2(x) ← City(x); S3(x, y) ← crosses(x, y).]

3. Comparison criteria

The comparison criteria have been chosen in order to evaluate three main aspects of the construction of an integrated system: semantic representation, mapping discovery, and geographic information. Accordingly, we divided the classification into ontological, integration, and geographical aspects, respectively, in order to analyze how the proposals use ontologies to perform the integration process and how the geographical features (within the ontologies) are taken into account in the process. Fig. 3 shows our classification with these three main aspects or branches.

Fig. 3. Comparison criteria. [Figure: a tree with three branches. Ontological aspects: formal ontologies (yes, no); expressiveness (high, medium, low); ontological components (source ontologies, top-level ontologies). Integration aspects: integration process (manual, inferences); mapping approach (LAV, GAV, GLAV). Geographical aspects.]

On the first branch, with respect to the ontological aspects, we analyze whether the ontologies are formal or non-formal. We consider ontologies formal only if they are represented in a formal (or logic) language, and only with formal ontologies can reasoners be used to infer implicit relations. Non-formal ontologies, in contrast, are represented with other structures or data models, such as controlled vocabularies or taxonomies (De Brujin, 2003); in this case, strategies that do not involve reasoners are used to find relations among ontologies. Further, on the same branch, we evaluate the expressiveness of the ontologies. For example, ontologies represented only by a taxonomy of super/subclass relations are considered less expressive than those in which other types of associations are allowed. Accordingly, a “low” score implies the use of languages with limited capabilities to represent the properties and axioms of the 4-tuple in the ontology definition (e.g. when ontologies are only taxonomies in which only generalization/specialization relations are modeled). A “medium” score means that, besides these relations, other relations are possible, such as whole-of or part-of relations. Finally, a “high” score means that there is no restriction on the representation of the ontologies; that is, all the previous relations can be used together with others defined by the user (such as an isOwner property between the Building and Owner classes). The last analysis performed on the ontological branch concerns the ontological components. These components are analyzed in order to determine which of them must be built: for example, whether only source ontologies are built, whether a top-level ontology is also needed, or whether other types of ontologies are necessary to find mappings. This information helps deduce which integration process will be applied, so this aspect is also closely related to the integration branch.

In the integration branch, we analyze whether the construction of mappings among the ontologies is a manual process or whether it can be performed semi- or fully automatically. We consider a process manual when there is no tool or method to create mappings among ontologies, so that they must be created by an expert user browsing the source ontologies. In semi- or fully automatic processes, a method is proposed to calculate similarities, for example by using a set of similarity functions such as Tversky's feature-based model, distance-based functions, or name-based (syntactic) functions. Such similarity measures are useful in an integration process when non-formal ontologies are used and/or for complex cases that reasoners do not typically handle, such as specific cardinality constraints. When such functions are used, we evaluate whether the proposals implement tools to apply them, and whether reasoning capabilities (inferences) are used to infer similarities from the data.
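Two of the similarity functions named in the integration branch can be sketched as follows. `tversky` implements Tversky's ratio model, S(A, B) = |A ∩ B| / (|A ∩ B| + α|A − B| + β|B − A|), using set cardinality as the salience function; `name_similarity` is one possible name-based (syntactic) measure, built here on Python's standard `difflib.SequenceMatcher`. The feature labels, the normalization step, and the default weights are assumptions for illustration, not values taken from any surveyed proposal.

```python
from difflib import SequenceMatcher


def tversky(features_a, features_b, alpha=0.5, beta=0.5):
    """Tversky's ratio model with set cardinality as the salience function."""
    a, b = set(features_a), set(features_b)
    common = len(a & b)
    denominator = common + alpha * len(a - b) + beta * len(b - a)
    return common / denominator if denominator else 0.0


def _normalize(name):
    """Lower-case a concept name and treat underscores as spaces."""
    return name.lower().replace("_", " ").strip()


def name_similarity(name_a, name_b):
    """Name-based (syntactic) similarity in [0, 1] from difflib's ratio."""
    return SequenceMatcher(None, _normalize(name_a), _normalize(name_b)).ratio()


# Hypothetical relation-based features of two categories from two ontologies.
river = {"is-a:water body", "has-part:bank", "flows-into:sea"}
stream = {"is-a:water body", "has-part:bank", "smaller-than:river"}

print(round(tversky(river, stream), 3))          # 0.667
print(name_similarity("River", "river"))         # 1.0
print(round(name_similarity("Water_Body", "waterbody"), 3))
```

With α = β = 1 the ratio model reduces to the Jaccard coefficient, and with α = β = 0.5 to the Dice coefficient; unequal weights make the measure asymmetric.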
Also on the integration branch, we analyze which of the three mapping approaches (GAV, LAV, or GLAV) each proposal chooses. The advantages and disadvantages of each approach are described in Section 2; aspects such as modifiability and scalability therefore depend on which approach is selected.

Finally, we analyze the representation of geographic information. Here, we consider whether the method recognizes that the information is geographic and whether this makes any difference to the integration method. We also consider how the geographic information is represented, and whether a set of topological relations is provided to relate geographic concepts when modeling the source ontologies. The most common topological relations are disjointness, adjacency, equality, inclusion, covering, overlap, etc. These relations are important because each topological relation imposes a number of conditions that constrain the geometries of the participating spatial entities (topological constraints).
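To show how each topological relation constrains the participating geometries, the sketch below classifies the relation between two regions. It assumes closed, non-degenerate, axis-aligned rectangles instead of arbitrary polygons, and the relation names follow a common region-region set (disjoint, meet for adjacency, equal, inside/contains, coveredBy/covers, overlap). The `Box` type and function names are hypothetical; a production system would compute the full 9-intersection (DE-9IM) matrix with a GIS library instead.

```python
from typing import NamedTuple


class Box(NamedTuple):
    """Closed axis-aligned rectangle [xmin, xmax] x [ymin, ymax]."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float


def _within(a, b):
    """True if a lies inside b, boundaries allowed to touch."""
    return b.xmin <= a.xmin and a.xmax <= b.xmax and b.ymin <= a.ymin and a.ymax <= b.ymax


def _strictly_within(a, b):
    """True if a lies in the interior of b, with no boundary contact."""
    return b.xmin < a.xmin and a.xmax < b.xmax and b.ymin < a.ymin and a.ymax < b.ymax


def topological_relation(a, b):
    """Name the relation of region a with respect to region b."""
    if a == b:
        return "equal"
    if a.xmax < b.xmin or b.xmax < a.xmin or a.ymax < b.ymin or b.ymax < a.ymin:
        return "disjoint"
    interiors_meet = (a.xmin < b.xmax and b.xmin < a.xmax and
                      a.ymin < b.ymax and b.ymin < a.ymax)
    if not interiors_meet:
        return "meet"          # adjacency: only the boundaries touch
    if _strictly_within(a, b):
        return "inside"
    if _strictly_within(b, a):
        return "contains"
    if _within(a, b):
        return "coveredBy"     # inclusion with boundary contact
    if _within(b, a):
        return "covers"
    return "overlap"


parcel = Box(0, 0, 10, 10)
print(topological_relation(Box(2, 2, 4, 4), parcel))      # inside
print(topological_relation(Box(10, 0, 12, 10), parcel))   # meet
print(topological_relation(Box(0, 0, 5, 5), parcel))      # coveredBy
print(topological_relation(Box(5, 5, 15, 15), parcel))   # overlap
print(topological_relation(Box(20, 20, 30, 30), parcel))  # disjoint
```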

3.1. Related work

Several surveys compare and evaluate proposals focused on data or information integration (Kalfoglou and Schorlemmer, 2003; Euzenat and Shvaiko, 2007; Wache et al., 2001; Klein, 2001). Section 1 briefly compared them and explained how they differ from our work. With respect to Euzenat and Shvaiko (2007), which introduces a more complete classification, the main difference lies in the geographical aspects. On the one hand, we focus on works that integrate geographical sources in order to analyze how ontologies have been designed for integration. On the other hand, we also consider conventional aspects such as the construction of a top-level ontology or the use of inferences in the integration process. For example, the classification of syntactic techniques within the element-level classification of Euzenat and Shvaiko (2007) is covered by our analysis of the integration process. In addition, we defined the comparison criteria in accordance with the main characteristics of the proposals. For example, we observe that only some proposals include an instance-level comparison within the integration process. Instance-level techniques for finding similarities are outside the scope of our work, because they require taking into account other methodologies involving resources such as National Mapping Agencies (NMAs) or geographic data sets.


Finally, we selected the proposals mainly because of their relevance (how often they are cited) and their novelty (how recent and novel they are).

4. Proposals for geographic information integration

This section describes 11 proposals that address the integration of geographic information.

4.1. The BUSTER system

Visser (2004) introduces the BUSTER system (Bremen University Semantic Translator for Enhanced Retrieval), which follows the hybrid ontology approach (Fig. 1). The proposed architecture provides two subsystems: the first acts as an intelligent search engine to solve information retrieval problems, and the second provides information integration or semantic translation to solve heterogeneity problems. Both tasks are associated with three levels: syntactic, structural, and semantic. The first phase of this approach is the acquisition phase, which consists of gathering information about the data sources to be integrated. A special database (one for each source), named comprehensive source description (CDS), stores this information. Then, in the second phase, named the query phase, a user submits a query to the network, which is matched against a lookup ontology. When a match is found, a connection is established to the actual information sources. Note that each information source is represented by a specific ontology, named source ontology, which contains an explicit description of the concepts covered by the data source. Thus, this proposal implements the GAV approach, because a concept in the lookup ontology corresponds to concepts of the source ontologies. To establish a communication channel to the data sources, wrappers are used at the syntactic level. Each wrapper handles a specific file or data format (such as ODBC data sources, XML data files, or specific GIS formats).
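The query phase and the syntactic level just described can be pictured with a toy GAV lookup. Everything in this sketch is hypothetical: the source names, their formats, the attribute names, and the `LOOKUP` table are invented, and the code is not BUSTER's implementation. It also omits the mediator and the semantic-level context transformation that BUSTER adds on top of the wrappers.

```python
import csv
import io
import json


# Wrappers (syntactic level): each one turns one source format into dicts.
def csv_wrapper(text):
    return list(csv.DictReader(io.StringIO(text)))


def json_wrapper(text):
    return json.loads(text)


# Hypothetical sources in different formats, each with its own vocabulary.
SOURCES = {
    "hydro_db": (csv_wrapper, "id,river_name\n1,Limay\n2,Negro\n"),
    "atlas": (json_wrapper, '[{"nombre": "Colorado"}]'),
}

# Lookup ontology (GAV): a global term maps to (source, source attribute) pairs.
LOOKUP = {
    "River.name": [("hydro_db", "river_name"), ("atlas", "nombre")],
}


def query(global_term):
    """Match a global term in the lookup ontology and collect source values."""
    values = []
    for source, attribute in LOOKUP.get(global_term, []):
        wrapper, raw = SOURCES[source]
        for record in wrapper(raw):
            values.append(record[attribute])
    return values


print(query("River.name"))  # ['Limay', 'Negro', 'Colorado']
```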
Then, at the structural level, a mediator is used to combine, integrate, and abstract the information obtained from the wrappers. BUSTER allows the use of different mediators, which are configured by rules describing how the data are integrated. Finally, at the semantic level, a context transformation process transforms data from a source context to a goal context. This transformation is done by applying rules and by re-classification. Transformation by rules uses predefined functions to transform data, for example to convert area measures from hectares to acres (Visser, 2004). A semantic mediator named MECOTA (MEdiator with COntext Transformation) (Wache, 1999) is used here. MECOTA is a rule-based mediator designed to reconcile structural and semantic heterogeneity. It is based on the work of Sciore et al. (1994), which compares semantic values with respect to a context. Thus, MECOTA defines the concept of context transformation, meaning that “a value has to be considered in its context and may be transformed into another context” (Wache, 1999), by applying two kinds of rules: combination rules and replacement rules. Combination rules construct new information for structural heterogeneity problems, such as adding desired information from other sources. Replacement rules transform information into another structure; for example, a rule that converts a price in dollars into a price in pounds. In context transformation by re-classification (Stuckenschmidt and Visser, 2000), concepts of one data source are mapped to concepts of another data source by using the information in the CDS, so that terms of the source context are exchanged for terms of the goal context. By means of property specifications, necessary and sufficient conditions are defined in order to perform the re-classification process, that is, to re-classify the concepts of one context into another. The FaCT reasoner (Horrocks, 1998) is used for this task. With respect to spatial representation and reasoning, BUSTER proposes the use of new place name structures (Hill, 2000) based on qualitative spatial models in order to improve spatial reasoning capabilities.

4.2. Proposal of Kavouras et al.

Kavouras et al. (2003) present a methodology to explore and identify the semantic information provided by categories in geographic ontologies. The main idea of the proposal is to prepare the ground for the integration process. A set of semantic relations is extracted by applying Natural Language Processing (NLP) techniques based on the works of Ravin (1990) and Jensen and Binot (1987). The sources from which the system extracts these relations are two repositories of geographic information and one lexical dictionary.
They are CORINE LC, which stores information on the member states of the European Community; GDDD (Geographical Data Description Directory), which contains information on the digital geographic information available from Europe's NMAs; and WordNet (Miller, 1995), a lexical database for the English language. The categories of these sources are explained by definitions written in natural language. Semantic relations such as hyperonym, is-part-of, has-parts, and adjacent-to are extracted by analyzing the parts of these definitions for a set of representative categories. To determine the similarity between two categories, a similarity measure (Tversky, 1977) counts the number of relations the categories have in common. In this way, the methodology classifies similar and compound terms, providing an overview of the heterogeneity of the ontologies. The methodology therefore provides measures of the extent to which different ontologies can be integrated, as well as of the associations between category types.

4.3. Proposal of Schwering et al.

In this proposal, Schwering and Raubal (2005) introduce a method based on spatial relations between different geospatial concepts. To analyze similarities between concepts, a reduced group of spatial relations is extracted from the set of natural-language spatial relations formalized by Shariff et al. (1998). They present a case study that uses OS MasterMap (footnote 2) as the information source, and a shared vocabulary that maps between the user query and this source; in this way, the proposal implements the GAV approach. This shared vocabulary contains only terms describing properties and relations between concepts, such as “flooding area is next to river”, in which “next to” is a spatial relation and “flooding area” and “river” are concepts or terms. Concepts are thus described in terms of their relations to other concepts. When a user performs a query, spatial relations and a set of dimensions must be used. The idea of dimensions is based on the work of Goldstone (1994), which describes and compares multidimensional scaling (MDS) models. For example, when comparing countries, dimensions can be “political affiliation” and “climate”. Dimensions are thus the relations of a concept with respect to other concepts. Finally, similarities are calculated by using distance functions.

4.4. Proposal of Sotnykova et al.

Sotnykova et al. (2005) propose a methodology for the integration of spatio-temporal conceptual schemas using conceptual models and DLs. The sources to be integrated are represented with the MADS conceptual data model (Modeling of Application Data with Spatio-temporal features; Parent et al., 1999) with multiple representation capabilities (Parent et al., 2005). These capabilities allow geographic information to be manipulated through multiple perspectives on the same information. The proposal is based on the specification and use of inter-schema knowledge, which includes the identification of elements (or sets of elements) in two schemas that describe the same facts in the real world, and the specification of the extent to which the data instances and their type definitions relate to each other.
With this concept in mind, the proposal focuses not on the semantic matching activity but on a methodology for building an integrated system. An expert designer is responsible for finding the possible mappings between two MADS conceptual models; the authors claim that this manual similarity process is greatly simplified when MADS models are used. First, the MADS conceptual models are translated into DL. Then, the designer defines the correspondences using a set of operators and the topological and temporal relations included in the methodology. These correspondences are defined between entities (semantic correspondences, SC) and between properties (property semantic correspondences, PSC). Matching rules (MR) are also included to determine, via their key attribute values, which instances represent the same real-world objects. Next, the designer chooses an integration technique for the schema elements involved in the previously defined SC. This integration applies to both schemas and instances.

Four structural solutions (named patterns) are proposed to integrate the concepts, depending on the type of SC defined. The first is the fusion pattern, in which information is preserved: the data sets of the source schemas are merged and neither attribute values nor instances are lost. In the generalization-partition pattern, the overlapping part of the populations is extracted and modeled as a subtype of the two source populations. Finally, the last two patterns, named multi-representation patterns, provide structural solutions that preserve integrity constraints and structures: one links the structures being analyzed, and the other integrates them in a way similar to the fusion pattern. The designer must take aspects such as reversibility, precision, and loss of information into account. The proposal can also be characterized as GAV because the global view is created from the mappings generated when the sources are merged; each concept in the global view therefore represents concepts in the sources. Another important feature of this proposal is the use of a reasoner (RACER; Haarslev and Möller, 2001) to validate the three main parts of the process: the translation of MADS conceptual models, the definition of SC, and the application of patterns. Validation in DLs allows the satisfiability of the resulting models to be checked; when an unsatisfiable model is detected, the designer must resolve it.

² http://www.ordnancesurvey.co.uk/oswebsite/products/osmastermap/
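To make the matching rules and the first two integration patterns more concrete, the following is a minimal Python sketch. It is not Sotnykova et al.'s implementation, which works on MADS schemas translated into DL and validated with RACER; it simply assumes that each source population is a list of dictionaries sharing a key attribute. The function names `match_by_key`, `merge_records`, `fuse`, and `partition` and the river records are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Record = Dict[str, object]


def match_by_key(a: List[Record], b: List[Record], key: str) -> List[Tuple[Record, Record]]:
    """Matching rule (MR): pair instances whose key attribute values are equal."""
    index = {r[key]: r for r in b}
    return [(r, index[r[key]]) for r in a if r[key] in index]


def merge_records(r1: Record, r2: Record) -> Record:
    """Merge two matched instances; conflicting values are both kept, not dropped."""
    merged: Record = {}
    for attr in sorted(set(r1) | set(r2)):
        if attr not in r2:
            merged[attr] = r1[attr]
        elif attr not in r1 or r1[attr] == r2[attr]:
            merged[attr] = r2[attr]
        else:
            merged[attr] = (r1[attr], r2[attr])  # conflict: preserve both values
    return merged


def fuse(a: List[Record], b: List[Record], key: str) -> List[Record]:
    """Fusion-style merge: no instance and no attribute value is lost."""
    pairs = {r1[key]: merge_records(r1, r2) for r1, r2 in match_by_key(a, b, key)}
    fused = [pairs.get(r[key], dict(r)) for r in a]
    fused += [dict(r) for r in b if r[key] not in pairs]
    return fused


def partition(a: List[Record], b: List[Record], key: str) -> Tuple[Set[object], Set[object], Set[object]]:
    """Generalization-partition style split by key: (only in a, overlap, only in b).

    The overlap is the population that would be modeled as a common subtype.
    """
    keys_a = {r[key] for r in a}
    keys_b = {r[key] for r in b}
    return keys_a - keys_b, keys_a & keys_b, keys_b - keys_a


# Hypothetical river instances from two sources sharing the key "id".
source_a = [{"id": 1, "name": "Limay", "width_m": 120}, {"id": 2, "name": "Neuquen"}]
source_b = [{"id": 1, "name": "Rio Limay", "basin": "Negro"}, {"id": 3, "name": "Colorado"}]

print(fuse(source_a, source_b, "id"))
print(partition(source_a, source_b, "id"))
```

Keeping conflicting values as a pair, rather than choosing one, mirrors the fusion pattern's requirement that no attribute value be lost; the multi-representation patterns are not covered by this sketch.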

4.5. Proposal of Hakimpour et al.

Hakimpour (2003) introduces an architecture and a methodology based on DL reasoning to create an integrated system of geographic information. The proposal is aimed at schema integration, that is, the process of finding similarities between semantically related elements from different schemas. The process is not automatic, since an expert user is needed to perform some tasks. The architecture contains three main components: source ontologies, a reasoning system for merging ontologies, and a global integrated schema; the approach is therefore a hybrid one (Fig. 1). Source ontologies are created from the elements of the source schemas, taking into account a top-level ontology (an ontology with general terms and minimal constraints). The results are formal ontologies written in DLs, all committing to the same top-level ontology. Because the sources are defined as views over the top-level ontology, the proposal follows the LAV approach. Four similarity relations are used to define the mappings: an equality relation denotes two equal elements, a disjoint relation denotes two elements with no instances in common, an overlapping relation denotes two elements whose extensions partially coincide (their conjunction is satisfiable, but neither subsumes the other), and a specialization relation denotes a specialization/generalization link between two elements. The reasoning component is used to find these similarity relations. As the source ontologies are based on a common top-level ontology, the reasoning system only has to find similarity relations among the source ontologies. The PowerLoom³ reasoning system is used to evaluate the DL definitions. Finally, the methodology suggests a set of steps to create the global integrated schema from the similarity relations found. Supervision by an expert user is sometimes needed, for example when two elements are linked by an overlapping relation.
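Because every source concept is expressed over the same top-level vocabulary, the similarity relations can, in principle, be read off by comparing concept definitions. The sketch below assumes a deliberately simple setting and does not reproduce the PowerLoom-based reasoning of the original. In this setting, concepts are conjunctions of top-level primitives, and the top-level ontology only declares pairwise disjointness. The primitive names and the function `similarity_relation` are hypothetical. The specialization relation is reported in both directions, as "specialization" and "generalization".

```python
from itertools import product
from typing import FrozenSet, Set, Tuple

# A source concept is modeled as a conjunction of top-level primitives.
Concept = FrozenSet[str]


def similarity_relation(c: Concept, d: Concept, disjoint: Set[Tuple[str, str]]) -> str:
    """Relation of concept c to concept d, both defined over the same top-level ontology.

    Assumes each concept is satisfiable on its own and that the only axioms of the
    top-level ontology are pairwise disjointness declarations between primitives.
    """
    # c AND d is unsatisfiable exactly when it contains a declared disjoint pair.
    if any((x, y) in disjoint or (y, x) in disjoint for x, y in product(c, d)):
        return "disjoint"
    if c == d:
        return "equality"
    if c > d:  # c has all conjuncts of d and more, so c is subsumed by d
        return "specialization"
    if d > c:  # the converse: c subsumes d
        return "generalization"
    return "overlap"  # c AND d is satisfiable, but neither subsumes the other


# Hypothetical top-level primitives and disjointness axioms.
DISJOINT = {("Natural", "Artificial"), ("Flowing", "Standing")}
river = frozenset({"WaterBody", "Natural", "Flowing"})
watercourse = frozenset({"WaterBody", "Flowing"})
canal = frozenset({"WaterBody", "Artificial", "Flowing"})
natural_water = frozenset({"WaterBody", "Natural"})

print(similarity_relation(river, watercourse, DISJOINT))          # specialization
print(similarity_relation(river, canal, DISJOINT))                # disjoint
print(similarity_relation(watercourse, natural_water, DISJOINT))  # overlap
```

In a real DL setting with richer axioms, structural set comparison is no longer sufficient, which is why the original proposal relies on a reasoner.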

4.6. Proposal of Hess et al.

Hess and Iochpe (2004) define a semantic integration method that uses ontologies as mediators. The method starts by translating conceptual schemas (geographic database schemas) into a data format (GML 3)⁴ in which heterogeneities can be handled. Two translators (syntactic and semantic) are responsible for solving conflicts and heterogeneities during this process, which is managed by an expert user. The whole process is driven by an ontology (such as a top-level or a domain ontology). Once this process is finished, the conceptual schemas are ready to be integrated. A global ontology (possibly the same one used in the translation process) acts as a mediator, because each conceptual schema is compared with this ontology: the proposal maps conceptual schemas to the same global ontology instead of comparing them with each other. Like the previous proposal, this is a LAV approach. The integration process is again defined at syntactic and semantic levels. At the syntactic level, the Levenshtein (1966) distance function is applied to concept and attribute names (between the conceptual schemas and the ontology). At the semantic level, several functions comparing different elements are defined. First, the nearest neighbor function (Holt, 2000) returns a similarity value based on the attributes each concept possesses. Then, two more functions compare three types of relationships: taxonomic (is-a), aggregation, and composition relationships; the last two are compared using the same similarity function. Finally, all these functions are combined to return a similarity value between 0 and 1, with each function weighted to give more importance to one or more elements (or functions) of the model. Using a threshold, mappings such as equality, dissimilarity, intersection, and containment are then determined from the result of the functions.
In some cases new concepts are added, for example when a concept has no matching candidate in the ontology. The ontology on which the conceptual schemas are based can then be updated to include these new concepts, so that a new ontology is obtained when the integration process is finished.
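The combination of a syntactic distance with weighted semantic functions and a threshold can be sketched as follows. This is a simplified illustration, not Hess and Iochpe's implementation. The nearest-neighbor function is approximated by averaging, for each attribute, its best name match. Relationship comparison is omitted. The weights (0.4/0.6) and thresholds (0.85/0.5) are hypothetical. Containment is left out because it would need a directional test.

```python
from typing import Set, Tuple


def levenshtein(s: str, t: str) -> int:
    """Edit distance (insertions, deletions, substitutions), computed row by row."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, 1):
        cur = [i]
        for j, ct in enumerate(t, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (cs != ct)))  # substitution
        prev = cur
    return prev[-1]


def name_similarity(a: str, b: str) -> float:
    """Syntactic level: Levenshtein distance normalized to [0, 1]."""
    a, b = a.lower(), b.lower()
    longest = max(len(a), len(b)) or 1
    return 1.0 - levenshtein(a, b) / longest


def attribute_similarity(attrs_a: Set[str], attrs_b: Set[str]) -> float:
    """Stand-in for the nearest-neighbor function: each attribute of a is paired
    with its best-matching attribute name in b, and the scores are averaged."""
    if not attrs_a or not attrs_b:
        return 0.0
    return sum(max(name_similarity(x, y) for y in attrs_b) for x in attrs_a) / len(attrs_a)


def combined_similarity(schema_concept: Tuple[str, Set[str]],
                        onto_concept: Tuple[str, Set[str]],
                        w_name: float = 0.4, w_attrs: float = 0.6) -> float:
    """Weighted combination of the individual functions, kept in [0, 1]."""
    (name_a, attrs_a), (name_b, attrs_b) = schema_concept, onto_concept
    return (w_name * name_similarity(name_a, name_b)
            + w_attrs * attribute_similarity(attrs_a, attrs_b)) / (w_name + w_attrs)


def classify(score: float, equal_at: float = 0.85, related_at: float = 0.5) -> str:
    """Threshold the combined score into a mapping type (containment omitted)."""
    if score >= equal_at:
        return "equality"
    if score >= related_at:
        return "intersection"
    return "dissimilarity"


# Hypothetical schema concept compared with a hypothetical ontology concept.
schema_rivers = ("Rivers", {"name", "length"})
onto_river = ("River", {"name", "length", "basin"})
score = combined_similarity(schema_rivers, onto_river)
print(round(score, 3), classify(score))
```

Normalizing the Levenshtein distance by the longer name keeps every component in [0, 1], so a single threshold can be applied to the weighted sum.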


4.7. MDSM methodology

In this proposal, Rodríguez and Egenhofer (2004) define the Matching Distance Similarity Measure (MDSM) as a technique for determining semantic similarity among spatial entity classes. Each concept of the source ontologies is represented by a structure defining three features and three semantic relations. As features, parts represent structural elements of a concept (or class), such as the “roof” and “floor” of a building; functions represent the purpose of a concept, that is, what is done to or with a class (for example, “educate” is a function of “college”); and attributes correspond to additional characteristics that describe distinguishing features of a class, such as the “name” or “birthdate” of a person. As semantic relations, synonym relations represent the set of synonyms defined for each term; hyperonym relations represent the is-a relation, defining a hierarchical structure in which terms inherit all the features of their superordinate terms; and meronym relations represent part–whole relations, distinguishing between the two roles involved, part-of and whole-of. To perform comparisons, MDSM combines two approaches to similarity assessment: feature matching and semantic distance. Parts, functions, and attributes are compared using a function in which common features increase the similarity and different features decrease it (based on Tversky's (1977) model). An operator identifies the nearest common superclass of two concepts and calculates their depth in the hierarchy. A function combining these results is then applied: a weighted sum in which the similarity values for the parts, functions, and attributes of the compared classes are each multiplied by a weight w. The approach proposes two ways of assigning values to these weights: variability and commonality.
Both are based on the contextual information associated with classes. Here, contexts denote a user's intended actions or operations, together with the classes to which these operations can be applied. Variability analyzes how distinguishing features vary within each of the three feature types (parts, functions, and attributes) across the ontologies: a feature's relevance decreases if it is shared by all classes, i.e., high frequencies translate into low relevance. Commonality is the inverse: high frequencies translate into high relevance. Ontologies and contexts are defined using the source information and at least one thesaurus, such as WordNet (Miller, 1995), to define the semantic relations. An expert user is responsible for performing this task.
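The MDSM computation summarized above can be sketched as follows. The per-feature comparison uses a Tversky-style ratio model with an asymmetry parameter `alpha`. In MDSM that parameter is derived from the depth of the two classes relative to their nearest common superclass, whereas here it is simply an input; this is an assumption of the sketch. The weighting heuristic (mean feature frequency over a context, approximated as a plain list of classes) is our simplified reading of variability and commonality, not the exact formulas of Rodríguez and Egenhofer. The classes and features are hypothetical.

```python
from typing import Dict, List, Set

FEATURE_TYPES = ("parts", "functions", "attributes")
EntityClass = Dict[str, Set[str]]  # feature type -> features of that type


def ratio_model(a: Set[str], b: Set[str], alpha: float) -> float:
    """Tversky-style ratio model: shared features raise the score, distinct ones lower it."""
    common = len(a & b)
    denom = common + alpha * len(a - b) + (1 - alpha) * len(b - a)
    return common / denom if denom else 1.0  # convention: two empty sets are identical


def context_weights(context: List[EntityClass], mode: str = "variability") -> Dict[str, float]:
    """Weights for the three feature types, derived from feature frequencies in a context.

    variability: frequent features are less informative (raw weight = 1 - mean frequency)
    commonality: frequent features are more informative (raw weight = mean frequency)
    """
    n = len(context)
    raw: Dict[str, float] = {}
    for t in FEATURE_TYPES:
        counts: Dict[str, int] = {}
        for cls in context:
            for feature in cls[t]:
                counts[feature] = counts.get(feature, 0) + 1
        if not counts:
            raw[t] = 0.0  # no features of this type in the context
            continue
        mean_freq = sum(c / n for c in counts.values()) / len(counts)
        raw[t] = 1.0 - mean_freq if mode == "variability" else mean_freq
    total = sum(raw.values())
    if total == 0:
        return {t: 1.0 / len(FEATURE_TYPES) for t in FEATURE_TYPES}
    return {t: v / total for t, v in raw.items()}


def mdsm_similarity(a: EntityClass, b: EntityClass,
                    weights: Dict[str, float], alpha: float = 0.5) -> float:
    """Weighted sum of per-feature-type similarities; the weights sum to 1."""
    return sum(weights[t] * ratio_model(a[t], b[t], alpha) for t in FEATURE_TYPES)


# Hypothetical entity classes forming the context.
stadium = {"parts": {"field", "stands", "roof"}, "functions": {"play", "watch"},
           "attributes": {"name", "capacity"}}
arena = {"parts": {"floor", "stands", "roof"}, "functions": {"play", "watch", "perform"},
         "attributes": {"name", "capacity"}}
college = {"parts": {"classroom", "roof"}, "functions": {"educate"}, "attributes": {"name"}}

weights = context_weights([stadium, arena, college])
print(weights)
print(round(mdsm_similarity(stadium, arena, weights), 3))
```

With variability weighting, the attributes "name" and "capacity", which are widely shared in this small context, receive the lowest weight, so they contribute little to distinguishing the classes.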

³ PowerLoom Manual, 2006. http://www.isi.edu/isd/LOOM/PowerLoom/documentation/manual/manual.html
⁴ OpenGIS Consortium, Geography Markup Language, http://www.opengis.net/gml/

4.8. ODGIS system

The ODGIS (Ontology-Driven Geographic Information Systems) system (Fonseca, 2001) introduces a framework for the integration of geographic information. The framework has two main aspects: knowledge generation, in which the ontologies are specified, and knowledge use, in which a group of components interact to answer a query (using mechanisms to retrieve instances of classes from the ontologies). In knowledge generation, ontologies at different levels of detail are specified. First, a top-level ontology describing the basic concepts is defined. Then, more specific ontologies (domain ontologies) are created for the communities involved in the integration process, such as the tourism or transportation communities. Other ontologies, such as task or application ontologies, are also proposed. All these ontologies are built by expert users using the same structure as the MDSM approach, that is, distinguishing features (parts, functions, and attributes; Rodríguez and Egenhofer, 2003) and a set of semantic relations (is-a, part-of, and whole-of). This multi-level ontology approach thus applies different integration mechanisms at each level of detail. The main contribution of the approach is the use of roles denoting the different functions an object can take depending on the perspective. For example, a lake is always a lake, but it can play the role of a fish habitat or of a reference point; each entity of an ontology can therefore play many roles. Roles and hierarchy mechanisms are used as tools to integrate the different ontologies. For hierarchies, the integration is made at the first possible intersection going upward in the ontology tree. For roles, a role in one class can be matched to another class or role. When the integration involves low-level ontologies, such as application ontologies, new classes can be created through inheritance. These new classes can play many roles corresponding to other classes in the ontologies, and since each role can come from a different ontology, ontology integration is achieved through these classes.

4.9. MDSM + TR methodology

Janowicz (2005) proposes an extension of the MDSM approach (Rodríguez and Egenhofer, 2004) described above. By adding thematic roles (TR) (Sowa, 2000) to the function feature, the proposal aims to avoid wrong matches. A TR⁵ is a semantic relationship between a predicate (e.g., a verb) and an argument (e.g., a noun phrase) of a sentence. Elements included in TR are, among others: an agent, which performs the action (e.g., Bill ate his soup quietly); a theme/patient, which undergoes the action (e.g., the falling rocks crushed the car); a goal, describing what the action is directed towards (e.g., the caravan continued on toward the distant oasis); and a location, where the action occurs (e.g., Johnny and Linda played carelessly in the park). To denote these TR, a matrix (Sowa, 2000) is used in which six rows represent verb categories (such as action and process) and four columns represent different kinds of participants (such as agent or location). Importantly, each class plays a specific TR depending on the context. For example, a Person who arrives at a sport arena is regarded as actor (agent), whereas the sport arena is regarded as location. The proposal therefore changes the definition of functions by appending to each function name the TR the class plays in that function, so that functions and TR are related. To calculate similarity values, two new functions are defined: the first compares the TR of two classes, allowing partial matching; the second determines the weight to be given to the function feature. Finally, as in MDSM, a weighted sum (each similarity value multiplied by its weight w) forms the final function, and variability and commonality can again be used to define the weights.

⁵ http://www.jfsowa.com/ontology/thematic.htm

4.10. GeoNis system

In the GeoNis system, Stoimenov et al. (2006) define a hybrid ontology approach based on a semantic mediator architecture. Based on a top-level ontology and a reference model, the approach proposes an offline methodology for discovering mappings between concepts from local geographic ontologies. As in other approaches, the local ontologies, the top-level ontology, and the reference model are defined using hierarchical (is-a) relations and semantic relations such as synonyms, hyperonyms, meronyms, and topological relations (arc-node, route node-route, and point-event). The ontologies are represented in DL. As in Hakimpour's (2003) proposal, mappings are defined through four types of similarity relations: equality, dissimilarity, intersection, and containment. However, three types of semantic mappings have to be found: direct mappings between the two local ontologies, indirect mappings across the top-level ontology, and mappings across the common reference model. Expert users are responsible for discovering all mappings except those that can be discovered by the inference capabilities of a reasoner. The first step is to find a mapping between concepts from the two local ontologies by means of the is-a or synonym relations.
If no mapping is found, each local ontology is mapped to the top-level ontology (as parallel processes). The resulting mappings are then compared (within the top-level ontology) to determine whether there is a correspondence between the respective concepts of the local ontologies. Thus, this proposal implements a GAV mapping. Finally, IF–THEN rules, written in DL, are used as part of the inference process so that GeoNis can discover new mappings that the previous steps cannot find.

4.11. Proposal of Aerts et al.

Aerts et al. (2006) describe a methodology to develop an integrated geographic system, focusing on semantic heterogeneity in topographic databases. Geographic databases are translated into OWL⁶ ontologies in order to take advantage of reasoners such as Pellet⁷. Each ontology is based on a top-level ontology that contains only a limited vocabulary to express the meaning of topographic feature classes; in this way, a LAV approach is applied. Each concept of the ontologies contains a set of necessary and sufficient conditions deduced by applying Formal Concept Analysis (FCA; Stumme and Madche, 2001). FCA is used as a heuristic to extract these conditions from the definitions of topographic features of the National Mapping Agencies (NMAs). An NMA is an entity that collects and manages geodata on behalf of different government departments. To extract necessary and sufficient conditions, properties (of the geographic databases) are categorized as classifying or non-classifying. Classifying properties are those that distinguish one topographic feature from another. Conversely, non-classifying properties are not enough to differentiate features (e.g., the fact that a motorway has a width is not enough to distinguish it from other kinds of roads). After this classification is assigned to the feature classes in the FCA, these properties are translated into necessary and sufficient conditions in OWL, expressed in terms of the aforementioned top-level ontology. Finally, the reasoner is responsible for inferring hierarchical and equivalence relations.

⁶ OWL Web Ontology Language Semantics and Abstract Syntax, 2004. http://www.w3.org/TR/owl-semantics/

5. Discussion

In this section the 11 approaches described in the previous section are discussed, taking into account the comparison criteria described in Section 3. Table 1 contains the three features of the ontological aspect (Fig. 3) as applied to the proposals. The first two columns analyze how the ontologies are represented: the first column indicates whether the proposals are based on formal ontologies or propose another representation, and the second column evaluates the expressiveness of the ontologies. As can be seen, various combinations of these first two columns are possible.

Table 1. Ontology aspect applied to the proposals

Proposal           Formal ontology   Expressiveness   Ontological components
BUSTER             Yes               Medium           Source ontologies
Kavouras et al.    No                Medium           Source ontologies
Schwering et al.   No                Medium           Source ontologies
Sotnykova et al.   Yes               High             Source ontologies
Hakimpour et al.   Yes               Low              Source ontologies and a top-level ontology
Hess et al.        No                Medium           Source ontologies and a top-level ontology
MDSM               No                Medium           Source ontologies
ODGIS              No                Low              Ontologies with different levels of detail or views
MDSM + TR          No                Medium           Source ontologies
GeoNis             Yes               Medium           Source ontologies and a top-level ontology
Aerts et al.       Yes               Low              Source ontologies and a top-level ontology
For example, proposals such as MDSM, MDSM + TR, and ODGIS represent the ontologies with a structure in which parts, functions, and attributes are defined. For the first two, the expressiveness of these ontologies is medium, as they can be represented using both meronym (part–whole) and hyperonym (taxonomic) relations. In the case of ODGIS, the expressiveness is low because only taxonomies are allowed. The same holds for Aerts et al. and Hakimpour et al., although their ontologies are formal. On the other hand, BUSTER, Kavouras et al., and Schwering et al. use different ways of modeling the ontologies (in the case of BUSTER, a rule-based mediator), all with a medium level of expressiveness. Finally, the Sotnykova et al. proposal scores high on the second feature because the full range of capabilities provided by formal languages can be applied to model the ontologies.

The proposals by Kavouras et al. and ODGIS aim to improve the definition of ontologies in order to indirectly improve the process of discovering mappings. In the case of Kavouras et al., the authors mainly focus on enriching the ontologies by using geographic resources such as geographic repositories. Clearly, the way the ontologies are defined will be crucial when an integration process is applied. As the last feature of the ontological aspect, the ontological components are analyzed in order to evaluate how many must be built: for example, whether only source ontologies are built, whether there is also a top-level ontology, or whether other types of ontologies are needed to find mappings. The third column of Table 1 shows which components are defined by each proposal. We analyze this column together with the inferences column of the integration process feature (Table 2). In the first two columns of Table 2, we analyze the way mappings are discovered.

⁷ http://www.mindswap.org/2003/pellet/
Table 2. Integration and geographic information aspects applied to the proposals
(Manual and Inferences describe the integration process; together with Mapping approach they form the integration aspect.)

Proposal           Manual   Inferences   Mapping approach   Geographic information
BUSTER             Yes      Yes          GAV                Qualitative model
Kavouras et al.    No       No           GAV                Topological relations
Schwering et al.   No       No           GAV                Topological relations
Sotnykova et al.   Yes      No           GAV                MADS model
Hakimpour et al.   No       Yes          LAV                No
Hess et al.        No       No           LAV                No
MDSM               No       No           GAV                No
ODGIS              No       No           GLAV               No
MDSM + TR          No       No           GAV                No
GeoNis             No       Yes          GAV                Topological relations
Aerts et al.       No       Yes          LAV                Topological relations

We can observe that all proposals that use a top-level ontology to mediate between the source ontologies can take advantage of a reasoner to calculate inferences (similarities). For example, the proposal by Hakimpour et al., GeoNis, and the proposal by Aerts et al. construct source ontologies that commit to a previously defined top-level ontology, and apply a reasoner. Something similar happens in the proposal by Hess et al., although a reasoner is not applied because the ontologies are not formal. In the ODGIS approach, more than one ontology is built in order to provide more information about the domain and thus facilitate the integration process; however, creating these ontologies is not an easy task. The remaining proposals do not use top-level ontologies and consequently do not use reasoning capabilities to calculate similarities. Instead, they generally apply a set of similarity functions, as in MDSM and Kavouras et al., where functions based on Tversky's model are used.

Proposals that perform some manual step during the integration process require the assistance of an expert user. For example, BUSTER needs an expert user, although it uses inferences during the query process. Systems with manual integration use formal ontologies and show medium/high expressiveness; a possible reason is that such expressiveness in ontology languages makes any automatic mapping-discovery process intractable.

The third column contains information about the approach applied to specify mappings: LAV, GAV, or GLAV. For example, ODGIS implements a GLAV approach because mappings can be specified in either direction, and the resulting ontologies can be integrated using both views. Other proposals, such as BUSTER, Schwering et al., MDSM + TR, Sotnykova et al., Kavouras et al., MDSM, and GeoNis, implement a GAV approach because the source ontologies act as primitives for creating the shared vocabulary. Proposals such as Hakimpour et al., Hess et al., and Aerts et al. follow a LAV approach and use top-level ontologies to assist the mapping discovery process.

Finally, the last column of Table 2 analyzes the representation of geographic information; only four proposals provide topological relations to model the ontologies. In the BUSTER approach, qualitative models are used at the data level to retrieve instances from the sources. Only the proposal by Sotnykova et al. provides a full representation that allows models to represent spatio-temporal features. However, several works propose incorporating spatial characteristics into the semantics of DL languages (Haarslev et al., 1998; Renz, 2002; Cohn and Hazarika, 2001) in order to compute regions and topological relations automatically. In addition, the qualitative spatial reasoning (QSR) area is still under research (Renz, 2007).

As a general conclusion of our analysis, we note that several considerations must be made when an integrated system is built. Several proposals represent the ontologies in a formal language in order to take advantage of inference mechanisms. In general, the source ontologies must then commit to the same top-level ontology so that the reasoning system can start the integration process. The source ontologies are therefore not independent, because different communities must agree on the top-level ontology. Nevertheless, in integrated systems the use of formal ontologies, and consequently of reasoners, should be mandatory. When ontologies are formally represented in a logic language, several advantages follow. For example, reasoning capabilities such as consistency checking, concept subsumption, and detection of cardinality restrictions can be exploited during the integration process. Moreover, adding a new information source, or new information from an existing source, becomes easier than in other systems, because aspects such as redundancy and inconsistency are checked automatically.

Other proposals involve a set of functions that analyze the schemas syntactically and semantically; these functions are applied to the source ontologies to find mappings among them. We think that such functions are useful when the ontologies are incomplete (that is, information about the domain is missing) and/or as the starting point of an integration process when no top-level ontology is involved. For example, when the ontologies are incomplete, a syntactic function might find a mapping between elements of two ontologies that a reasoner would not find. Methods that use a thesaurus to find synonym relations are also useful in this case.
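As a rough illustration of this idea, the sketch below first uses syntactic and thesaurus-based functions to produce seed mappings when no top-level ontology is available. It then derives implicit generalization/specialization mappings from the taxonomies. The taxonomies, the synonym table, and the 0.85 threshold are hypothetical. `difflib.SequenceMatcher` stands in for whatever syntactic function a real system would use. The upward walk over is-a links is a minimal stand-in for a DL reasoner and assumes acyclic taxonomies.

```python
from difflib import SequenceMatcher
from typing import Dict, List, Set, Tuple

Taxonomy = Dict[str, str]  # child concept -> parent concept (is-a); assumed acyclic


def terms(tax: Taxonomy) -> Set[str]:
    return set(tax) | set(tax.values())


def ancestors(tax: Taxonomy, concept: str) -> List[str]:
    """All superconcepts reachable by following is-a links upward."""
    found = []
    while concept in tax:
        concept = tax[concept]
        found.append(concept)
    return found


def seed_mappings(src: Taxonomy, tgt: Taxonomy, synonyms: Dict[str, Set[str]],
                  threshold: float = 0.85) -> Set[Tuple[str, str]]:
    """Candidate equivalences from string similarity or a thesaurus lookup."""
    seeds = set()
    for s in terms(src):
        for t in terms(tgt):
            similar = SequenceMatcher(None, s.lower(), t.lower()).ratio() >= threshold
            synonym = t.lower() in synonyms.get(s.lower(), set())
            if similar or synonym:
                seeds.add((s, t))
    return seeds


def infer_specializations(src: Taxonomy, tgt: Taxonomy,
                          seeds: Set[Tuple[str, str]]) -> Set[Tuple[str, str]]:
    """(narrower, broader) pairs across the two ontologies implied by the seeds:
    anything below a matched concept specializes its counterpart."""
    inferred = set()
    for s0, t0 in seeds:
        inferred |= {(s, t0) for s in terms(src) if s0 in ancestors(src, s)}
        inferred |= {(t, s0) for t in terms(tgt) if t0 in ancestors(tgt, t)}
    return inferred


# Hypothetical local taxonomies and thesaurus entry.
source = {"River": "Watercourse", "Stream": "Watercourse", "Watercourse": "WaterBody"}
target = {"Brook": "Waterway", "Waterway": "HydroFeature"}
thesaurus = {"watercourse": {"waterway"}}

seeds = seed_mappings(source, target, thesaurus)
print(sorted(seeds))
print(sorted(infer_specializations(source, target, seeds)))
```

Here the thesaurus supplies the only seed ("Watercourse", "Waterway"), a match that pure string comparison would miss; "River", "Stream", and "Brook" are then mapped as specializations without any further name matching.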
Another alternative is to replace the role of the top-level ontology with similarity functions that find the first mappings; a reasoner can then be applied to find other, implicit mappings, such as those involving generalization/specialization relations.

Another consideration is the representation of the ontologies and the proposed structure. This last item is related to the approach applied by each proposal (Fig. 1) with respect to the ontologies. Most of the proposals implement a hybrid ontology approach as the main architecture of the system, so local ontologies and a shared ontology are components of the system. As noted earlier, this approach has several advantages over the others. However, to avoid the semantic heterogeneity problems that may arise between a local ontology and the shared ontology (ontology mismatches; Visser et al., 1997), a mapping component is also defined. It maps terms and expressions defined by a local ontology to terms and expressions of the shared ontology, or vice versa (depending on whether the GAV or the LAV approach is applied). We think that systems based on the hybrid approach will have all its inherent advantages and that, depending on the representation of the ontologies, new capabilities could be added. For example, when the ontologies are represented in a formal language, adding a new information source, or new information from an existing source, will be easy because aspects such as redundancy and inconsistency will be checked automatically. Another advantage of formal ontologies concerns querying the system: query languages based on the formal representation of ontologies can be implemented (Zhang, 2005). With respect to the representation of the ontologies, the proposal by Sotnykova et al. is the only one that allows the ontologies to be represented formally and includes geographic features. However, although a logic language is used, the proposal does not exploit it to find similarities, unlike the GeoNis proposal; instead, an expert user is responsible for creating mappings among sources. Similarly, all proposals need user interaction in some part of the process. In general, it is not possible to determine all mappings among ontologies fully automatically, so an expert user must be involved.
For example, when inconsistencies are found in the ontologies, an expert user is responsible for resolving them (even when formal ontologies are used).

6. Future trends

Several of the compared proposals are still under development and, as we have explained, some aspects need to be addressed to achieve good integration. Although we have found a set of common mechanisms for building an integrated system (e.g., the use of ontologies), other important aspects must be taken into account by all proposals to achieve geographic integration. These aspects include:

Representation of the geographic information: The nature of the representation of geographic information is one of the main aspects to be considered. However, few proposals implement mechanisms to model this information, and even when it is modeled, the integration process does not take it into account. Here we should consider the following hypothesis tested by Mark et al. (2001): “geographic and non-geographic entities are ontologically distinct in a number of ways”. Their experiments tested the degree to which ordinary people can code the geographic domain at the conceptual level, and they conclude that a set of geographic terms has a higher frequency of occurrence. In principle, knowing which terms these are could simplify the integration process. The expressiveness of the ontologies is also a very important aspect, because the elements of the ontologies must represent concepts in the real world. For example, if only taxonomies can be represented, important information about the domain will be missing (e.g., user-defined properties to model the owners of a building). Obviously, aspects such as decidability and computational resources must be taken into account when more expressiveness is added, and also when the ontologies are large.

Formal representation of ontologies: Less than half of the analyzed proposals provide a formal model for representing ontologies, and only one exhibits high expressiveness. Formal ontologies are therefore not very common in this domain, and work in this area seems necessary. The proposals that apply formal ontologies have also applied formal reasoning mechanisms to improve both the integration process and the evolution of the integrated system. Although some of them use top-level ontologies, which can interfere with the independence of the system (as all communities must agree on the same structure), they exploit all the capabilities of a logic language, so inferences such as consistency checking and concept subsumption are taken into account. In addition, with emerging research in spatial reasoning, more implicit relations among the ontologies can be found, and spatial reasoning (Cohn and Hazarika, 2001) can be applied in the query process.
The integration process: In general, the integration processes described here involve a set of steps to find suitable mappings among source ontologies. The elements of the ontologies are compared by applying different mechanisms, such as reasoning, syntactic and semantic functions, or structural analysis, all of which are based on the represented semantic knowledge. However, when systems involve geographic information, the integration processes must change: geographic objects, fields, quantitative and qualitative relations, spatio-temporal variation, and scale are new concepts that must be taken into account. For example, granularity, a very common feature of geographic models, should be considered and analyzed when looking for mappings. Two ontologies with different granularities will have no elements in common unless mechanisms to resolve granularity issues are implemented (Fonseca, 2001). Integration methods should therefore add new techniques to handle these features.

Similarly, all these concerns also depend on the source ontologies themselves. There are more than 8000 works discussing ontologies in several domains of computer science, and more than 500 articles that discuss the representation of ontologies.⁸ But how can we evaluate an ontology's quality? How can we determine whether an ontology is complete, or whether it adequately represents the real world? Work such as that of Guarino and Welty (2000, 2004) addresses these issues by proposing a methodology for validating taxonomies in ontologies. This type of work is crucial, because any integration method, even one using reasoners or similarity functions, will produce unsatisfactory results if it is based on low-quality ontologies.
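As an illustration of the kind of check such a methodology enables, the sketch below tests one OntoClean constraint (Guarino and Welty, 2000, 2004): an anti-rigid property (~R) must not subsume a rigid one (+R). The geographic classes, their rigidity labels and the asserted is-a links are hypothetical, and only direct is-a links are checked; a full analysis would also consider inferred subsumptions and OntoClean's other meta-properties (identity, unity and dependence).

```python
from enum import Enum


class Rigidity(Enum):
    RIGID = "+R"       # rigid: essential to all of its instances
    NON_RIGID = "-R"   # non-rigid: not essential to at least one instance
    ANTI_RIGID = "~R"  # anti-rigid: essential to none of its instances


# Hypothetical rigidity labels for a small geographic taxonomy.
RIGIDITY = {
    "Place": Rigidity.RIGID,
    "City": Rigidity.RIGID,
    "Forest": Rigidity.RIGID,
    "Capital": Rigidity.ANTI_RIGID,        # a city can stop being a capital
    "ProtectedArea": Rigidity.ANTI_RIGID,  # protection status can be revoked
}

# Hypothetical (subclass, superclass) links asserted in the ontology under review.
IS_A = [
    ("City", "Place"),
    ("Capital", "City"),
    ("Forest", "ProtectedArea"),  # questionable modelling choice
]


def rigidity_violations(is_a, rigidity):
    """Return the is-a links in which an anti-rigid class subsumes a rigid one."""
    return [
        (sub, sup)
        for sub, sup in is_a
        if rigidity[sup] is Rigidity.ANTI_RIGID and rigidity[sub] is Rigidity.RIGID
    ]


print(rigidity_violations(IS_A, RIGIDITY))  # [('Forest', 'ProtectedArea')]
```

Assigning meta-properties to each class is the expensive, largely manual part of such a methodology; once the labels exist, the constraint check itself is a simple scan over the taxonomy.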

Acknowledgments
We would like to thank the reviewers of this article for their helpful comments, and we are deeply grateful to Boyan Brodaric for his kind assistance in reviewing the final version of this manuscript. This work is partially supported by the UNComa project 04/E072 (Identificación, Evaluación y Uso de Composiciones Software).

References
Aerts, K., Maesen, K., Van Rompaey, A., 2006. A practical example of semantic interoperability of large-scale topographic databases using semantic web technologies. In: Proceedings of the AGILE’06: 9th Conference on Geographic Information Science, Visegrád, Hungary, pp. 35–42.
Baader, F., Calvanese, D., McGuinness, D., Nardi, D., Patel-Schneider, P. (Eds.), 2003. The Description Logic Handbook: Theory, Implementation and Applications. Cambridge University Press, Cambridge, UK, 574pp.
Borges, K., Davis, C., Laender, A., 2001. OMT-G: an object-oriented data model for geographic applications. GeoInformatica 5 (3), 221–260.
Calì, A., 2003. Reasoning in data integration systems: why LAV and GAV are siblings. In: Proceedings of the ISMIS’03: 14th International Symposium on Methodologies for Intelligent Systems. Lecture Notes in Computer Science, vol. 2871. Springer, Berlin, Heidelberg, pp. 562–571.
Calì, A., Calvanese, D., Giacomo, G.D., Lenzerini, M., 2002. On the expressive power of data integration systems. In: Proceedings of the ER’02: 21st International Conference on Conceptual Modeling. Lecture Notes in Computer Science, vol. 2503. Springer, Berlin, Heidelberg, pp. 338–350.
Calvanese, D., Giacomo, G.D., 2005. Data integration: a logic-based perspective. Artificial Intelligence Magazine 26 (1), 59–70.
Calvanese, D., Giacomo, G.D., Lenzerini, M., 2001. A framework for ontology integration. In: Proceedings of the SWWS’01: 1st Semantic Web Working Symposium at the Emerging Semantic Web. Stanford University, Palo Alto, CA, USA, pp. 303–316.
Chalupsky, H., 2000. OntoMorph: a translation system for symbolic knowledge. In: Proceedings of the KR’00: 7th International Conference on Principles of Knowledge Representation and Reasoning, Breckenridge, CO, USA, pp. 471–482.
Cohn, A.G., Hazarika, M., 2001. Qualitative spatial representation and reasoning: an overview. Fundamenta Informaticae 46 (1–2), 1–29.
De Bruijn, J., 2003. Using ontologies: enabling knowledge sharing and reuse on the semantic web. Technical Report DERI-2003-10-29, DERI (Digital Enterprise Research Institute), Galway, Ireland, 60pp.
Euzenat, J., Shvaiko, P., 2007. Ontology Matching. Springer, Berlin, Heidelberg, DE, 341pp.
Fagin, R., Kolaitis, P.G., Miller, R.J., Popa, L., 2003. Data exchange: semantics and query answering. In: Proceedings of the ICDT’03: International Conference on Database Theory, Siena, Italy, pp. 207–224.
Fensel, D., 2003. Ontologies: Silver Bullet for Knowledge Management and Electronic Commerce, second ed. Springer, Berlin, Germany, 184pp.

⁸ http://citeseer.ist.psu.edu/cs

Fonseca, F., 2001. Ontology-driven geographic information systems. Ph.D. Dissertation, University of Maine, Orono, ME, USA, 131pp.
Fonseca, F., Egenhofer, M., Agouris, P., Câmara, G., 2002. Using ontologies for integrated geographic information systems. Transactions in GIS 6 (3), 231–257.
Fonseca, F., Davis, C., Câmara, G., 2003. Bridging ontologies and conceptual schema in geographical information integration. GeoInformatica 7 (4), 307–321.
Friedman, M., Levy, A., Millstein, T., 1999. Navigational plans for data integration. In: Proceedings of the AAAI/IAAI’99: 16th National Conference on Artificial Intelligence and 11th Innovative Applications of Artificial Intelligence. American Association for Artificial Intelligence, Menlo Park, CA, USA, pp. 67–73.
Goldstone, R., 1994. Similarity, interactive activation, and mapping. Journal of Experimental Psychology: Learning, Memory, and Cognition 20 (1), 3–28.
Gruber, T., 1993. A translation approach to portable ontology specifications. Knowledge Acquisition 5 (2), 199–220.
Guarino, N., Welty, C., 2000. A formal ontology of properties. In: Proceedings of the EKAW’00: 12th European Workshop on Knowledge Acquisition, Modeling and Management. Lecture Notes in Computer Science, vol. 1937. Springer, London, UK, pp. 97–112.
Guarino, N., Welty, C., 2004. An overview of OntoClean. In: Staab, S., Studer, R. (Eds.), Handbook on Ontologies. Springer, Berlin, Heidelberg, DE, pp. 151–172.
Haarslev, V., Möller, R., 2001. RACER system description. In: Proceedings of the IJCAR’01: 1st International Joint Conference on Automated Reasoning. Lecture Notes in Computer Science, vol. 2083, London, UK, pp. 701–706.
Haarslev, V., Lutz, C., Möller, R., 1998. Foundation of spatiotemporal reasoning with description logics. In: Proceedings of the KR’98: 6th International Conference on Principles of Knowledge Representation and Reasoning. Morgan Kaufmann, Trento, Italy, pp. 112–123.
Hakimpour, F., 2003. Using ontologies to resolve semantic heterogeneity for integrating spatial database schemata. Ph.D. Dissertation, Zurich University, Switzerland, 191pp.
Hakimpour, F., Geppert, A., 2002. Global schema generation using formal ontologies. In: Proceedings of the ER’02: 21st International Conference on Conceptual Modeling. Lecture Notes in Computer Science, vol. 2503. Springer, Berlin, Heidelberg, pp. 307–321.
Hess, G.N., Iochpe, C., 2004. Ontology-driven resolution of semantic heterogeneities in GDB conceptual schemas. In: Proceedings of the GEOINFO’04: VI Brazilian Symposium on GeoInformatics. Campos do Jordão, Brazil, pp. 247–263.
Hill, L.L., 2000. Core elements of digital gazetteers: placenames, categories, and footprints. In: Proceedings of the ECDL’00: 4th European Conference on Research and Advanced Technology for Digital Libraries. Springer, London, UK, pp. 280–290.
Holt, A., 2000. Understanding environmental and geographical complexities through similarity matching. Complexity International 7, 1–16.
Horrocks, I., 1998. The FaCT system. In: Proceedings of the TABLEAUX’98: Automated Reasoning with Analytic Tableaux and Related Methods. Lecture Notes in Computer Science, vol. 1397. Springer, Berlin, pp. 307–312.
Janowicz, K., 2005. Extending semantic similarity measurement with thematic roles. In: Proceedings of the GeoS’05: First International Conference on GeoSpatial Semantics. Springer, Mexico City, Mexico, pp. 137–152.
Jensen, K., Binot, J., 1987. Disambiguating prepositional phrase attachments by using on-line dictionary definitions. Computational Linguistics 13 (3/4), 251–260.
Kalfoglou, Y., Schorlemmer, M., 2003. Ontology mapping: the state of the art. The Knowledge Engineering Review 18 (1), 1–31.
Katifori, A., Halatsis, C., Lepouras, G., Vassilakis, C., Giannopoulou, E., 2007. Ontology visualization methods: a survey. ACM Computing Surveys 39 (4), 1–42 (Article 10).
Kavouras, M., Kokla, M., Tomai, E., 2003. Comparing categories among geographic ontologies. Computers & Geosciences 31 (2), 145–154 (special issue).
Kifer, M., Lausen, G., Wu, J., 1995. Logical foundations of object-oriented and frame-based languages. Journal of the Association for Computing Machinery (ACM) 42 (4), 741–843.
Klein, M., 2001. Combining and relating ontologies: an analysis of problems and solutions. In: Proceedings of the IJCAI’01: 17th International Joint Conference on Artificial Intelligence, Seattle, WA, USA, pp. 53–62.
Levenshtein, V.I., 1966. Binary codes capable of correcting deletions, insertions, and reversals. Soviet Physics Doklady 10, 707–710.


Madhavan, J., Bernstein, P.A., Rahm, E., 2001. Generic schema matching with Cupid. In: Proceedings of the VLDB’01: 27th International Conference on Very Large Data Bases. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, pp. 49–58.
Mark, D.M., Skupin, A., Smith, B., 2001. Features, objects, and other things: ontological distinctions in the geographic domain. In: Proceedings of the COSIT’01: International Conference on Spatial Information Theory. Lecture Notes in Computer Science, vol. 2205. Springer, Berlin, pp. 488–502.
Miller, G.A., 1995. WordNet: a lexical database for English. Communications of the ACM 38 (11), 39–41.
Noy, N.F., McGuinness, D.L., 2001. Ontology development 101: a guide to creating your first ontology. Technical Report KSL-01-05 and SMI-2001-0880, Stanford Knowledge Systems Laboratory and Stanford Medical Informatics, Stanford University, Palo Alto, CA, USA, 25pp.
Parent, C., Spaccapietra, S., Zimányi, E., 1999. Spatio-temporal conceptual models: data structures + space + time. In: Proceedings of the GIS’99: 7th ACM International Symposium on Advances in Geographic Information Systems. ACM Press, New York, NY, USA, pp. 26–33.
Parent, C., Spaccapietra, S., Zimányi, E., 2005. The MurMur project: modeling and querying multi-representation spatio-temporal databases. Information Systems 31 (8), 733–769.
Rahm, E., Bernstein, P.A., 2001. A survey of approaches to automatic schema matching. Very Large Data Bases Journal 10 (4), 334–350.
Ravin, Y., 1990. Disambiguating and interpreting verb definitions. In: Proceedings of the ACL’90: 28th Annual Meeting on Association for Computational Linguistics, Pittsburgh, PA, USA, pp. 260–267.
Renz, J., 2002. Qualitative spatial reasoning with topological information. Lecture Notes in Computer Science, vol. 2293. Springer, New York, NY, USA, 207pp.
Renz, J., 2007. Qualitative spatial and temporal reasoning: efficient algorithms for everyone. In: Proceedings of the IJCAI’07: 20th International Joint Conference on Artificial Intelligence, Hyderabad, India, pp. 526–531.
Rodríguez, M., Egenhofer, M., 2003. Determining semantic similarity among entity classes from different ontologies. IEEE Transactions on Knowledge and Data Engineering 15 (2), 442–456.
Rodríguez, M., Egenhofer, M., 2004. Comparing geospatial entity classes: an asymmetric and context-dependent similarity measure. International Journal of Geographical Information Science 18 (3), 229–256.
Schwering, A., Raubal, M., 2005. Spatial relations for semantic similarity measurement. In: Proceedings of the ER’05: 24th International Conference on Conceptual Modeling. Lecture Notes in Computer Science, vol. 3770. Springer, Berlin, Heidelberg, pp. 259–269.
Sciore, E., Siegel, M., Rosenthal, A., 1994. Using semantic values to facilitate interoperability among heterogeneous information systems. ACM Transactions on Database Systems 19 (2), 254–290.


Shariff, A., Egenhofer, M., Mark, D., 1998. Natural-language spatial relations between linear and areal objects: the topology and metric of English-language terms. International Journal of Geographical Information Science 12 (3), 215–245.
Shvaiko, P., Euzenat, J., 2005. A survey of schema-based matching approaches. Journal on Data Semantics IV, 146–171.
Sotnykova, A., Vangenot, C., Cullot, N., Bennacer, N., Aufaure, M., 2005. Semantic mappings in description logics for spatio-temporal database schema integration. Journal on Data Semantics III, 143–167.
Sowa, J., 2000. Knowledge Representation: Logical, Philosophical and Computational Foundations. Brooks Cole Publishing Co, Pacific Grove, CA, USA, 512pp.
Stoimenov, L., Stanimirovic, A., Djordjevic-Kajan, S., 2006. Discovering mappings between ontologies in semantic integration process. In: Proceedings of the AGILE’06: 9th Conference on Geographic Information Science, Visegrád, Hungary, pp. 213–219.
Stuckenschmidt, H., Visser, U., 2000. Semantic translation based on approximate re-classification. In: Proceedings of the Workshop on Semantic Approximation Granularity and Vagueness, Breckenridge, CO, USA, pp. 110–118.
Stumme, G., Maedche, A., 2001. FCA-Merge: bottom-up merging of ontologies. In: Proceedings of the IJCAI’01: 17th International Joint Conference on Artificial Intelligence, Seattle, WA, pp. 225–230.
Tversky, A., 1977. Features of similarity. Psychological Review 84 (4), 327–352.
Ullman, J.D., 2000. Information integration using logical views. Theoretical Computer Science 239 (2), 189–210.
Visser, P., Jones, D., Bench-Capon, T., Shave, M., 1997. An analysis of ontology mismatches; heterogeneity versus interoperability. In: Proceedings of the AAAI’97: Spring Symposium on Ontological Engineering. Stanford University, Palo Alto, CA, USA, pp. 164–172.
Visser, U., 2004. Intelligent information integration for the semantic web. Lecture Notes in Computer Science, vol. 3159. Springer, Berlin, Heidelberg, 150pp.
Wache, H., 1999. Towards rule-based context transformation in mediators. In: Proceedings of the EFIS’99: 2nd International Workshop on Engineering Federated Information Systems. Infix-Verlag, Kühlungsborn, Germany, pp. 107–122.
Wache, H., Vögele, T., Visser, U., Stuckenschmidt, H., Schuster, G., Neumann, H., Hübner, S., 2001. Ontology-based integration of information: a survey of existing approaches. In: Proceedings of the IJCAI’01: 17th International Joint Conference on Artificial Intelligence, Seattle, WA, pp. 108–117.
Wiederhold, G., 1994. An algebra for ontology composition. In: Proceedings of the Monterey Workshop on Formal Methods, Monterey, CA, USA, pp. 56–61.
Zhang, Z., 2005. Ontology query languages for the semantic web: a performance evaluation. M.Sc. Thesis, University of Georgia, Athens, 158pp.
