SWRL rule-selection methodology for ontology interoperability

DATAK-01541; No of Pages 20
Data & Knowledge Engineering xxx (2015) xxx–xxx

Contents lists available at ScienceDirect

Data & Knowledge Engineering journal homepage: www.elsevier.com/locate/datak

Tarcisio Mendes de Farias (a), Ana Roxin (b,⁎), Christophe Nicolle (b)

(a) Active3D, Dijon, France
(b) Checksem, Laboratory LE2I (UMR CNRS 6306), Univ. Bourgogne Franche-Comté, Dijon, France

Article info

Article history: Received 6 March 2015; Received in revised form 11 September 2015; Accepted 11 September 2015; Available online xxxx

Keywords: Ontology alignment; OWL; SWRL; SPARQL; Backward-chaining reasoning; Interoperability

Abstract

Data interoperability represents a great challenge for today's enterprises. Indeed, they use various information systems, each relying on several different models for data representation. Ontologies, and notably ontology matching, have been recognized as interesting approaches for solving the data interoperability problem. In this paper, we focus on improving the performance of queries addressed over ontology alignments expressed through SWRL rules. Indeed, when executing queries over complex and numerous alignments, the number of SWRL rules highly impacts the query execution time. Moreover, when hybrid or backward-chaining reasoning is applied, the query execution time may grow exponentially. Still, the reasoners involved deliver good performance (in terms of execution time) when applied over reduced and simpler rule sets. Based on this observation, and to address the issue of improving query execution time, we describe a novel approach that allows, for a given query, ignoring unnecessary rules. The proposed Rule Selector (RS) is a middleware between the considered systems and the reasoner present on the triple-store side. Through the benchmarks realized, we show that our approach considerably reduces query execution time. © 2015 Published by Elsevier B.V.

1. Introduction

The need for information-system interoperation has grown significantly since 1989, with the advent of the World Wide Web (WWW) and the popularization of the personal computer. For most companies, this changed the way of doing business. Not only do they have to be more reactive to changes in market forces, but they also need to be capable of adapting their business offers [1]. For doing so, companies mainly rely on enterprise information systems (EISs), which are highly dynamic. Thus, achieving interoperability through standards such as XML [2] or STEP [3] is no longer adapted for delivering interoperability among EISs.

EIS interoperability is defined as "the capacity that two or more enterprises, and their systems, have of cooperating over a period of time towards a common objective" [4]. Given the importance of EISs from an economic perspective, data interoperability among such systems has become one important issue, slowing down an industry-wide adoption of EISs [4].

Regarding EIS interoperability, one main difficulty comes from the fact that different experts, with several modeling practices and constraints (e.g. real-time capabilities, security, etc.), can produce different conceptual models for representing the same ensembles of data. This was also noticed by the authors of [5], who state that such practices can lead, for example, to several conceptual representations with the same semantics. Regarding schema heterogeneity (i.e. data-model heterogeneity), the authors of [5] suggest transforming these conceptual models into a fine-grained model based on a fact-oriented modeling approach, as described in [6] (i.e. an attribute-free conceptualization). The authors of [7] believe that successful enterprise information integration approaches depend on the development of tools for tackling schema heterogeneity and meta-data management. Moreover, as EIS interoperability focuses

⁎ Corresponding author. Tel.: +33 668 795 154. E-mail addresses: [email protected] (T.M. de Farias), [email protected] (A. Roxin), [email protected] (C. Nicolle).

http://dx.doi.org/10.1016/j.datak.2015.09.001 0169-023X/© 2015 Published by Elsevier B.V.

Please cite this article as: T.M. de Farias, et al., SWRL rule-selection methodology for ontology interoperability, Data Knowl. Eng. (2015), http://dx.doi.org/10.1016/j.datak.2015.09.001

2

T.M. de Farias et al. / Data & Knowledge Engineering xxx (2015) xxx–xxx

on data modeling and querying, another crucial aspect in this context is how to retrieve query results in an adequate and predictable time.

Based on those remarks, delivering EIS interoperability relies on mitigating semantic (and schema) heterogeneity issues and achieving an adequate query response time. For doing so, ontologies have been recognized as interesting and promising in what concerns data interoperability, mainly because they handle formal representations of data over which one may perform reasoning [8,9]. In order to resolve the problem of data interoperability, one has to define semantic links between those models — this is known as ontology matching [10,11]. This process may be done either manually or automatically. Generally, its output is a list of pairs of concepts that correspond to each other. Such an alignment between ontological models can be implemented by means of logical rules and can be performed at query time. Therefore, EIS interoperability can be delivered using an ontology-based approach coupled with a rule-based mapping for representing ontology alignments.

For providing dynamic interoperability, logic rules for interoperability (i.e. alignment rules) are applied by a reasoner at query execution time. Thus, we don't have to rewrite a static file (e.g. based on XML) for every data or even schema change. This is because alignment rules are solely considered at query execution time. Still, if schemas are modified, then new alignment rules are generated without needing to recompute data. We thus mitigate schema heterogeneity by defining rules for aligning ontologies. Depending on the complexity of the considered ontology model, the ontology matching process may generate thousands of alignments. Among the existing alignment formats, the Semantic Web Rule Language (SWRL) [12] allows specifying complex alignments in the form of Horn-like rules. Those rules are stored with the ontology model and can be directly applied over existing data at query time, thus avoiding the storage of redundant data in the knowledge base (KB).

In this paper, we focus on improving the performance of queries addressed over ontology alignments expressed through SWRL rules. When performing a query over such complex and various alignments, query execution time is often highly increased. This is because general-purpose reasoners which support SWRL (e.g. KAON2 [13]) don't distinguish rules designed for interoperating ontologies from other rules. Still, these reasoners have good performance (in terms of query execution time) when handling limited and simple rule sets. Based on this, and to address the issue of improving query execution time, we describe a novel approach that allows ignoring unnecessary rules when answering a query addressed over several aligned ontologies. Thus, by preventing the reasoner from performing useless inferences, we minimize query execution time. In an EIS context, it is crucial that a query process delivers results in almost real-time. To the best of our knowledge, this is the first research work that addresses the problem of optimizing query execution time by limiting the number of logical rules in the context of ontology interoperability.

Our contributions in this paper are summarized as follows:
1. We describe our approach for improving the performance of queries addressed over ontology alignments described in terms of SWRL rules.
2. We define a novel architecture for our rule selector (RS), which contains two main modules:
   a. The Rule Pre-processing module is responsible for transforming the existing SWRL rules describing the ontology alignment into canonical rules. In other words, this module performs a normalization of the existing rule set.
   b. The Query execution module aims at two goals:
      i. identify the necessary and sufficient rule subset that allows answering a given query;
      ii. if needed, remove rules from the previously identified rule set (step 2.b.i) that produce rule cycles (e.g. A → B → A).
3. We specify the algorithms used by the two previously described modules:
   a. the algorithm for the Rule Pre-processing module;
   b. the algorithm for the Query execution module.
4. We provide an evaluation of the performance of our approach, by applying 4 different queries over an alignment of 2 ontologies. Moreover, we compare our approach with existing ones.

The rest of this paper is organized as follows: Section 2 gives a more comprehensive view on the context of this work; Section 3 is a survey of related work. Next, we describe our approach, the proposed architecture and algorithms (Section 4); then we present the evaluation results (Section 5). We end this article with a summary and a discussion on future works (Section 6).

2. Scientific background

The quantity of information available has increased exponentially in the last decades with the progress of information and communication technologies. The problem of managing semantic heterogeneity is a consequence of this growth. To mitigate this problem, several solutions rely on semantic techniques, particularly ontology matching [10,11]. Before discussing the ontology matching approach, we first formally define what an ontology is in the context of the Semantic Web.

Definition 1. (Ontology). An ontology O is a tuple O = (S, A), S being its signature (conceptual entities used for knowledge representation) and A its associated set of axioms (expressed in a specific ontology language or knowledge representation formalism¹) [14].

¹ Often first-order logic.
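Definition 1 can be made concrete as a small data structure. The following is only an illustrative sketch (the class names, the tuple encoding of axioms and the example individuals are all assumptions, not part of the paper):

```python
# A minimal sketch of Definition 1: an ontology O = (S, A) as a pair of a
# signature (classes, instances, properties) and a set of axioms.
# All names and the axiom encoding are illustrative only.
from dataclasses import dataclass, field

@dataclass
class Signature:
    classes: set = field(default_factory=set)     # C
    instances: set = field(default_factory=set)   # I
    properties: set = field(default_factory=set)  # P

@dataclass
class Ontology:
    signature: Signature
    axioms: set = field(default_factory=set)      # A, e.g. tagged tuples

onto = Ontology(
    Signature(classes={"Person"}, instances={"alice"}, properties={"knows"}),
    axioms={("instantiation", "alice", "Person")})
print("Person" in onto.signature.classes)  # True
```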



Definition 2. (Signature of an ontology). The signature of an ontology is defined as the union S = C ∪ I ∪ P of the ensemble of classes C, instances I and properties P [14].

Definition 3. (Conceptual and concrete entities). When distinguishing abstract objects of the domain from concrete data values, the ensembles C, I and P become respectively C = C ∪ D (concepts C and data types D), I = I ∪ V (individuals I and data values V) and P = R ∪ T (relations R and types T) [14].

Definition 4. (TBox and ABox). In a Description Logic (DL) knowledge base (or ontology), the conceptual statements form the Terminological Box (set of TBox axioms), whereas the instance-level statements form the Assertional Box (set of ABox assertions) [15].

Table 1 lists several particular types of axioms that are common to most ontology formalisms. For a specific mapping of these axioms to existing ontology languages, one may consult [16]. An ontology is defined by means of ontology description languages, and the resulting specification contains the schema knowledge regarding the ensemble of classes, C, and properties, P. This knowledge is expressed using the axioms for subsumption, domain, range or disjointness, as listed in Table 1. A KB contains assertions about the instances in I, assertions that use instantiation and assertion axioms (see Table 1). For the rest of this paper, we will use the term "ontology" without distinguishing it from "knowledge base (KB)".

When adding logical rules to DL-based ontologies, we obtain rule systems, which are seen today as a key knowledge representation and reasoning technology. Indeed, logical rules allow reasoning over machine-understandable descriptions of a domain of knowledge (the KB itself).

Definition 5. (Inference). The mechanism that allows deriving new assertions from existing axioms according to rules is called inference [17].

Definition 6. (Rule or rule axiom). A rule is composed of a rule head (also called consequent) and a body (also called antecedent). If the body of a rule is true, then its head is derived as a new assertion [12,18].

Rule-based systems rely on rules for expressing knowledge through the if–then logic programming paradigm, which is considered a special form of axiomatization [14]. Several languages exist for integrating such rules into ontologies. Among them we may cite SWRL (Semantic Web Rule Language), for adding SWRL-style rules on top of OWL (Web Ontology Language) [19], and its restriction, DL-Safe Rules [20]. SWRL extends the set of existing OWL axioms in order to include Horn clauses [12].

Definition 7. (Horn clause). A Horn clause is an implication from an antecedent, usually a set of atomic formulae, to a consequent, a single atomic formula [12,18].

The main ontology description languages, OWL [19] and OWL 2 [21], are directly based on the knowledge description languages called Description Logics (DL) [22]; therefore, they both include a logical framework comprising a syntax and a model-theoretic semantics. Another "heritage" from DL is the concern for practical reasoning and for having effective reasoners (mainly extended from DL reasoners such as Pellet [23] and HermiT [24]). Effective reasoning means reaching conclusions: the axioms present in the considered KB (and forming the explicit knowledge) are processed computationally in order to deduce implicit knowledge. Such a reasoning process is mainly performed based on formal logic, because formal logic allows specifying (through formal semantics) the consequences of a set of axioms.
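Definitions 5–7 can be illustrated with a toy forward-chaining loop: each Horn rule is a (body, head) pair, and the head is derived whenever all body atoms hold. This is only a sketch (unary predicates over one variable, with hypothetical class names), not tied to any SWRL engine:

```python
# A toy illustration of Definitions 5-7: rules are Horn clauses (body, head);
# inference derives the head whenever every body atom holds for an individual.
# Unary predicates keep the sketch short; all names are hypothetical.
def forward_chain(facts, rules):
    """Saturate `facts` (predicate, individual) under single-variable rules."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for body, head in rules:
            # individuals satisfying every body predicate
            candidates = set.intersection(
                *[{ind for p, ind in facts if p == b} for b in body])
            for ind in candidates:
                if (head, ind) not in facts:
                    facts.add((head, ind))   # derive the head (Definition 5)
                    changed = True
    return facts

facts = {("Person", "alice")}
rules = [(["Person"], "Agent"),        # Person(X) -> Agent(X)
         (["Agent"], "LegalEntity")]   # Agent(X) -> LegalEntity(X)
print(("LegalEntity", "alice") in forward_chain(facts, rules))  # True
```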

Table 1. Axiom types common to ontology formalisms [14]. Each row gives the axiom type, its definition, its notation and its first-order logic (FOL) expression.

Instantiation. Assigns an instance to a class. Notation: α∧(i, C). FOL: [C(i)], i ∈ I, C ∈ C.
Assertion. Relates two instances by means of a property. Notation: α→(i1, p, i2). FOL: [p(i1, i2)], i1, i2 ∈ I, p ∈ P.
Subsumption. For two classes, it states that any instance of the subsumed class is also an instance of the subsuming class. For two properties, it states that any two instances connected by the subsumed property are also linked by the subsuming one. Notation: α△(E1, E2). FOL: [∀x: E1(x) → E2(x)], E1, E2 ∈ C ∪ P.
Domain. States the domain class of a property. Notation: αD→(p, D). FOL: [∀x, y: p(x, y) → D(x)], p ∈ P, D ∈ C.
Range. States the range class of a property. Notation: α→R(p, R). FOL: [∀x, y: p(x, y) → R(y)], p ∈ P, R ∈ C.
Disjointness. For two classes, it states that they have no element in common. Notation: α⊕(C1, C2). FOL: [∀x: C1(x) ∧ C2(x) → ⊥], C1, C2 ∈ C.
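The first-order readings in Table 1 can be executed directly over a small fact set. The sketch below reads the domain and disjointness axioms as integrity checks (rather than as inference rules, which is how a DL reasoner would treat the domain axiom); all individual and class names are hypothetical:

```python
# A toy sketch executing two of Table 1's first-order readings:
# the domain axiom  p(x, y) -> D(x)  and disjointness  C1(x) ∧ C2(x) -> ⊥,
# both read here as integrity checks. All names are hypothetical.
class_of = {("Person", "alice"), ("Building", "b1"), ("Person", "b1")}
prop = {("owns", "alice", "b1")}

def domain_violations(p, D):
    # p(x, y) -> D(x): subjects of p not asserted to be instances of D
    return {x for q, x, y in prop if q == p and (D, x) not in class_of}

def disjointness_violations(C1, C2):
    # C1(x) ∧ C2(x) -> ⊥: individuals asserted in both classes
    return ({x for c, x in class_of if c == C1}
            & {x for c, x in class_of if c == C2})

print(domain_violations("owns", "Person"))           # set(): alice is a Person
print(disjointness_violations("Person", "Building")) # {'b1'}: inconsistent
```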



Definition 8. (Consequence of an ontology). The consequences of an ontology are represented by the set of assertions implicitly entailed by the ontology. These assertions directly impact the answering of queries addressed to the ontology [25].

With the development of Semantic Web technologies and the growing number of their applications, the heterogeneity and the number of existing ontologies are rising exponentially. This raises the problem of determining whether two ontologies have the same meaning [10,11]. Answering this question comes down to performing ontology matching.

Definition 9. (Ontology matching). Matching is the process of identifying correspondences between entities of different ontologies [10].

Definition 10. (Ontology alignment). An alignment is a set of correspondences between one or more ontologies. The alignment is the output of the process of ontology matching [10].

When performing ontology matching, the consequences of each ontology involved in the process have to be taken into account. This raises the complexity of the ontology matching process, which is also negatively impacted by the fact that ontology languages such as OWL [19] and OWL 2 [21] fail to give sufficiently expressive representations of the knowledge necessary for performing correct ontology matching. As an example, in OWL-DL [21], we cannot define the property uncleOf using the properties brotherOf and parentOf, because OWL-DL (on its own) doesn't support axiomatic rules [26]. As stated previously, SWRL adds the possibility to declare arbitrary Horn clauses over OWL-DL ontologies. Still, this can lead to undecidability in crucial reasoning tasks such as subsumption in OWL [27]. In order to regain decidability in such contexts, DL-safety [18,20] restrictions are implemented.

Expressing ontology alignments through Horn rules (rule axioms) [10] provides a solution for resolving the heterogeneity problem. Indeed, several research works have identified the combination of axiomatic rules and Description Logics as a solution for achieving interoperability (see Definition 11) among ontologies [16]. When addressing a query to an ontology alignment, the contained Horn clauses have to be interpreted. This results in materializing the derived facts that come from these rules. In the case of forward-chaining, all facts entailed by the different Horn clauses are precomputed, therefore significantly increasing the size of the administered data.

Definition 11. (Interoperability). In this work, we define interoperability as the capability to share data between different ontologies.

For the interpretation of Horn rules at query execution, traditional logic programming languages, such as Prolog, use a method called SLD (Selective Linear Definite) resolution [28]. SLD stands for "SL resolution with Definite clauses" and consists in interpreting the top clause (the query) as a negation of a conjunction of sub-goals (sub-queries). For doing so, it relies on a backward reasoning technique [29,30]. For OWL-based reasoners that have to manage SWRL rules (Horn clauses), four main approaches exist:

• SWRL rules can be translated into FOL (First-Order Logic) — in this case, reasoning tasks are demonstrated by means of theorem provers (this is the case for the Hoolet reasoner [31]);
• OWL-DL can be translated into rules, which are then passed to a forward-chaining or backward-chaining engine (this is used by the Bossam [32], KAON2 [13,33], DL2DB [34], SWRL-IQ [35] and SWRL2COOL [36] & O-Device [37] reasoners);
• SWRL rules are translated into DL axioms based on a rolling-up technique [38]. These rules are then processed by an OWL-DL reasoner based on the Tableaux calculus (this is the case for the Pellet [23] and RacerPro [39] reasoners) or the Hypertableau calculus [40] (e.g. the HermiT reasoner [24]);
• Reasoning is performed by applying a query rewriting approach. In this approach, SWRL rules are taken into account during the query rewriting process (this is the case for the reasoner implemented in the Stardog triple store [41], a semantic graph database).

Some reasoners use a hybrid approach, where both backward- and forward-chaining techniques are implemented to improve query execution time. This is the case for DL2DB [34] and Jena [42]; still, the latter doesn't support the SWRL syntax. Table 2 summarizes the main reasoners which, to the best of our knowledge, support both OWL and SWRL languages.

Deriving implicit facts from SWRL rules at query execution can indeed significantly extend query processing time. Moreover, all reasoners which perform inferences at query execution time are OWL/SWRL general-purpose reasoners (see Table 2). They don't distinguish rules used for ontology alignment from rules used for internal ontology inference (i.e. rules written without the intention of aligning ontologies). For optimizing the execution time of queries addressed over aligned ontologies, we can exploit the fact that the ensemble of SWRL rules is designed for the ontology alignment. Therefore, we can rewrite these rules (according to the procedure described in subsection 4.2.1). A general-purpose reasoner will then have to process only the rules relevant for answering a given query. By reducing the number of rules considered for query answering, we consequently improve query execution time, as shown through the benchmarks presented in Section 5.
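The distinction between alignment rules and internal ontology rules can be approximated syntactically: an alignment rule bridges ontologies, so its head predicate lives in a different namespace than at least one body predicate. This is a hedged sketch of that idea only (the predicate naming scheme `prefix:LocalName` and the rule below are assumptions, not the paper's procedure):

```python
# A sketch of the distinction drawn above: a rule is treated as an alignment
# rule when its head predicate belongs to a different ontology namespace than
# some body predicate. Predicate names like "onto1:A" are hypothetical.
def is_alignment_rule(body, head):
    ns = lambda pred: pred.split(":", 1)[0]   # namespace prefix of a predicate
    return any(ns(b) != ns(head) for b in body)

print(is_alignment_rule(["onto1:A"], "onto2:B"))  # True: cross-ontology rule
print(is_alignment_rule(["onto1:A"], "onto1:B"))  # False: internal rule
```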


Table 2. Reasoners supporting both OWL and SWRL, compared on: reasoning at query execution time; ABox materialization; DL-expressivity; methodology; large-scale triple store; query answering support.

KAON2 [13,33]. Reasoning at query execution time: yes. ABox materialization: no. DL-expressivity: SHIQ(D). Methodology: reduces the KB to a disjunctive datalog program. Large-scale triple store: no. Query answering support: yes.

RacerPro 2.0 [39]. Reasoning at query execution time: no. ABox materialization: yes. DL-expressivity: SHIQ. Methodology: based on the Tableaux calculus. Large-scale triple store: yes, when associated with the AllegroGraph triple store. Query answering support: yes.

Hoolet [31]. Reasoning at query execution time: yes. ABox materialization: no. DL-expressivity: SHOIN(D). Methodology: an inference engine based on the first-order theorem prover Vampire.

Pellet 2.0 [23]. Reasoning at query execution time: no. ABox materialization: yes, with support for incremental reasoning. DL-expressivity: SROIQ(D). Methodology: based on the Tableaux calculus. Query answering support: yes.

HermiT 1.3.8 [24]. Reasoning at query execution time: no. ABox materialization: yes; partial materialization during each consistency test. DL-expressivity: SROIQ(D). Methodology: based on the Hypertableau calculus.

SWRL2COOL [36] & O-Device [37]. Reasoning at query execution time: no. ABox materialization: yes. DL-expressivity: SHIF(D), with support for some nominal and cardinality constructors (ON). Methodology: conversion of SWRL rules to COOL (CLIPS Object-Oriented Language) for performing reasoning based on CLIPS; CLIPS is a RETE-based production rule engine.

SWRL-IQ [35]. Reasoning at query execution time: yes. ABox materialization: no. DL-expressivity: SHI. Methodology: an inference engine based on XSB Prolog; it therefore reduces the KB to a datalog program.

DL2DB [34]. Reasoning at query execution time: yes. ABox materialization: no. DL-expressivity: SHI. Methodology: reduces the KB to a datalog program; it is also based on rewriting techniques for answering queries.

Bossam [32]. Reasoning at query execution time: no. ABox materialization: yes. DL-expressivity: SHI. Methodology: an inference engine based on the forward-chaining RETE algorithm.

Stardog [41]. Reasoning at query execution time: yes. ABox materialization: no. DL-expressivity: SROIQ(D). Methodology: based (mostly) on a query rewriting technique. Large-scale triple store: yes. Query answering support: yes.

In this article, we propose an optimization of the number of ontology alignment rules that are considered at query time. This optimization is developed in the form of a rule selector that performs the following two main actions:

• Action 1: it selects the subset of rules that must necessarily be considered when answering the query. In other words, it attempts to construct the necessary and sufficient set of rules needed for query processing.
• Action 2: it then verifies whether there are existing cycles among the pre-selected rules.

For illustrating the problem addressed by Action 1, consider the program P1 with the following rules:

r1: c1(X) → c2(X)
r2: c2(X) → c1(X)    (1.1)

When answering the query "retrieve all instances of type c2", only rule r1 has to be considered. For illustrating the problem addressed by Action 2, additional definitions are needed. Rules can form rule cycles, which can be further classified as simple or composed.

Definition 12. (Simple cycle). A pair of rules forms a simple cycle if a predicate is present both in the head of one rule and in the body of the second one, and vice versa.

Definition 13. (Composed cycle). A set of rules having more than two elements forms a composed cycle if, for a rule rn, there exists a predicate px in its head and a predicate py in its body that can be reduced to a set of predicates including px.

Given these two definitions, the rules of the logical program P1 (see rule set 1.1) form a simple cycle. We can interpret rules r1 and r2 as a bi-conditional (c1(X) ↔ c2(X)). When aiming at achieving interoperability, it is crucial to be able to correctly interpret such rules, notably because this case often appears when working with ontology alignments.
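Action 1 can be approximated as backward reachability over rule heads: start from the predicates used in the query and collect every rule that could contribute to deriving them. The following is a naive sketch of that idea, not the paper's algorithm (rules are hypothetical (body, head) pairs of predicate names):

```python
# A naive sketch (not the paper's algorithm) of Action 1: starting from the
# query's predicates, walk backwards over rule heads to collect every rule
# that could contribute to the answer.
from collections import deque

def select_rules(rules, query_predicates):
    """Return the subset of `rules` transitively relevant to the query."""
    selected, seen = set(), set()
    todo = deque(query_predicates)
    while todo:
        pred = todo.popleft()
        if pred in seen:
            continue
        seen.add(pred)
        for i, (body, head) in enumerate(rules):
            if head == pred and i not in selected:
                selected.add(i)        # this rule can derive `pred`
                todo.extend(body)      # its body predicates are now needed
    return [rules[i] for i in sorted(selected)]

# Program P1: r1: c1(X) -> c2(X), r2: c2(X) -> c1(X).
p1 = [(["c1"], "c2"), (["c2"], "c1")]
# For the query "retrieve all instances of type c2", this over-approximation
# selects both rules (r2 is reached through r1's body); Action 2 then drops
# the cyclic r2, leaving only r1, as in the example with P1.
print(len(select_rules(p1, ["c2"])))  # 2
```

The over-selection of r2 is exactly why Action 2 exists: backward reachability alone cannot distinguish a genuine dependency from a cycle.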


Let us consider another program P2 (see rule set 1.2). In this case, the rules r3 and r4 form another simple cycle. Besides, we identify a composed cycle. Indeed, the property q1 in r5 is reduced to c1 ∧ p2 by applying r3, and the property p2 is reduced to c1 ∧ q1 ∧ p3 by applying r4. We therefore obtain the following rule cycle: c1 ∧ q1 ∧ p3 → p3.

r3: c1(X) ∧ p2(X, Y) → q1(X, Y)
r4: c1(X) ∧ q1(X, Y) ∧ p3(Y, Z) → p2(X, Z)
r5: q1(X, Y) → p3(X, Y)    (1.2)
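Both kinds of cycles from Definitions 12–13 can be found on a predicate dependency graph: add an edge from each body predicate to the head predicate of every rule, then search for a cycle. This is a sketch of that check (not the paper's Action 2 implementation, which works on the canonical rules):

```python
# A sketch of a cycle check over Definitions 12-13: build a dependency graph
# with an edge body-predicate -> head-predicate for every rule, then detect a
# cycle with a depth-first search (GREY marks nodes on the current DFS stack).
def has_cycle(rules):
    graph = {}
    for body, head in rules:
        for b in body:
            graph.setdefault(b, set()).add(head)
    WHITE, GREY, BLACK = 0, 1, 2
    color = {}

    def dfs(node):
        color[node] = GREY
        for nxt in graph.get(node, ()):
            c = color.get(nxt, WHITE)
            if c == GREY or (c == WHITE and dfs(nxt)):
                return True    # reached a predicate already on the stack
        color[node] = BLACK
        return False

    return any(dfs(n) for n in list(graph) if color.get(n, WHITE) == WHITE)

# Program P2: r3 and r4 form a simple cycle (q1 <-> p2), and r5 closes a
# composed cycle through p3.
p2_rules = [(["c1", "p2"], "q1"),
            (["c1", "q1", "p3"], "p2"),
            (["q1"], "p3")]
print(has_cycle(p2_rules))  # True
```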

The RS described in this article positions itself as a middleware between the system initiating the query and the triple store containing the KB. Given the reasoners described in Table 2, we have chosen to apply our approach over a Stardog triple store, which uses a backward-chaining reasoner (based on query rewriting techniques) for performing inferences. Our choice is mainly motivated by the three following reasons: Stardog supports SWRL rules, it allows performing reasoning at query execution time (backward-chaining), and it is a large-scale triple store. Indeed, our RS solely targets hybrid or backward-chaining reasoning approaches (as a reminder, in forward-chaining reasoning all entailed facts are materialized). Furthermore, we need a reasoner associated with a large-scale triple store for handling large KBs when considering (numerous) aligned ontologies.

The next section discusses existing related works, notably approaches that address data interoperability over ontologies and SPARQL query optimization. We also consider the case of an application using OWL and SWRL and having performance issues.

3. Related work

3.1. SPARQL query rewriting method for interoperability

The authors of [43] present a SPARQL query rewriting approach for achieving interoperability over different ontologies stored in federated RDF (Resource Description Framework) databases. An RDF database is a graph database especially conceived for storing and retrieving RDF data; it is also called a triple store [44]. In the context of such databases, queries are formulated using the SPARQL query language [44] and are addressed to one or several SPARQL endpoints. Such queries are rewritten using the alignments defined among the considered ontologies. Those alignments are expressed using a specific format, defined by the authors. Ref. [43] fails to justify why such a specific alignment format is needed, especially as Ref. [10] exhaustively defines existing and possible alignment formats. Makris et al. further specify their approach in [45] by providing a set of functions for graph pattern rewriting, based on a set of alignments.

Correndo et al. [46] propose a similar approach for query rewriting in order to retrieve data from different SPARQL endpoints. Nevertheless, the proposed algorithm only considers information described as a graph pattern; for example, it does not consider constraints expressed within the SPARQL reserved word FILTER. Compared to this method, the one described in [43] has the advantage of relying on Description Logics, therefore supporting different query types (SELECT, CONSTRUCT, etc.), along with different SPARQL solution modifiers (LIMIT, ORDER BY, etc.). In this case, graph pattern operators remain unchanged during the rewriting process. Both methods avoid discussing cases involving several source and target ontologies. Correndo et al. [46] justify implementing SPARQL query rewriting for ontology interoperability by pointing at the fact that defining ontology alignments on top of the logical layer implies reasoning over a huge amount of data, which compromises query execution time.

3.2. Query optimization

Since 2006, several works have addressed SPARQL query optimization issues. This is mainly due to the complexity associated with the evaluation of SPARQL queries, which is PSPACE-complete [47,48]. The proposed optimization strategies are basically divided into logical and physical optimizations [49]. On the one hand, logical optimization is based on equivalency rules (e.g. join order, P1 AND P2 ≡ P2 AND P1). Additional equivalency rules have been proposed in [47,50,51] and [52]. These equivalency rules allow executing a query differently while retrieving the same results with a better performance. The difficult task when considering logical query optimizers consists in deciding whether an equivalency rule can improve the estimated query execution time. For doing so, most approaches rely on heuristics and cost-based optimization methods. The principle is to choose the query execution plan (among several possible equivalent ones) that has the best estimated costs. On the other hand, physical optimization concerns the existing algorithms for computing logical operators (e.g. AND, OPT, SORT, etc.). So, an optimization strategy based on physical optimization comes down to choosing which algorithms have the best estimated execution times in the operator's context [49]. Some of these algorithms may have different requirements regarding the input data, for example, whether it must be sorted. Some of them achieve faster results, notably for reasons such as fitting the input data into the main memory.

In [53], Kollia et al. present an algorithm for SPARQL query evaluation under the OWL 2 Direct Semantics [21]. Their proposed optimizations address improving the query response time for SPARQL-OWL queries (a superset of SPARQL-DL). The optimizations implemented are graph pattern reordering and rewriting. Another interesting idea is to take into account the class-property hierarchy in order to reduce the number of verifications needed.

In [54], Beimel et al. propose a framework for access control in the health-care domain using OWL and SWRL rules. The DL-reasoner for OWL and the SWRL engine are the two components responsible for processing an incoming data-access request. Such a request is represented as an individual that belongs to a data-access rule class. In this work, Beimel et al. point out a few limitations


of their framework. These limitations are mainly due to the high complexity of the DL-reasoner and the SWRL engine, when applied for general purposes. This is why, as another optimization, the authors suggest developing their own special-purpose reasoner. Our approach detailed in the next section doesn't replace or improve the logical or physical query optimizations described in this subsection. However, our approach can be associated to them for globally improving the execution time of queries addressed over ontologies aligned using Horn-like rules. 3.3. Comparison of our approach to existing ones Approaches [43] and [46] (described in subsection 3.1) succeed in optimizing query execution time, but they lack extended inference capabilities, as provided by reasoners and rule engines. Because of this, we encourage the use of SWRL rules as it allows interoperating ontologies while inferring new alignment rules. As an example, let us consider three inter-operable ontologies Onto1, Onto2 and Onto3 (and their respective namespaces), their alignments being expressed through SWRL rules as shown by 2.1. Initial SWRL rules

SPARQL query Inferred factðsÞ

swrl1 : onto1:Að?xÞ → onto2:Bð?xÞ swrl2 : onto2:Bð?xÞ → onto3:C ð?xÞ Q : SELECT ?x WHERE f ?x rdf : type onto3:C: g onto1:Að?xÞ → onto3:C ð?xÞ

ð2:1Þ

ð2:2Þ ð2:3Þ

Considering a rule engine for interpreting those rules and the SPARQL query Q (2.2), the rule engine infers the transitive relation onto1:A(?x) → onto3:C(?x) (see 2.3). Therefore, query Q retrieves all instances of type onto1:A, onto2:B and onto3:C. When applying a query rewriting approach (as described in [43,46]) and considering only the alignments in 2.1, query Q is rewritten to "SELECT ?x WHERE { ?x rdf:type onto2:B . }". The alignment rule swrl1 is ignored and the instances of type onto1:A are not retrieved. This example clearly demonstrates that query rewriting approaches (such as [43] or [46]) fail to retrieve all relevant results, because they are not capable of inferring new alignment rules. The methodology we describe in this article can be seen as a trade-off between optimizing query execution time and performing inference tasks over interoperable ontologies. Our approach selects the alignments relevant for answering a given query. This way, we only infer the data necessary to answer the query and, consequently, improve the query execution time. Another advantage of our approach over those relying on query rewriting is that it allows addressing queries over a set of aligned ontologies. In other words, our approach does not need to explicitly know which are the source or the target ontologies (see Definition 14), as is the case for the approaches described in [43] and [46]. For example, an approach based on query rewriting does not consider queries formulated using terms from different ontologies, whereas our approach does. The following sections describe how our approach allows addressing queries over federated ontologies. Moreover, in Section 5, we use benchmarks for comparing our solution with the query rewriting approaches [43] and [46].

Definition 14. The target ontology is the ontology that we want to interoperate with. The source ontology is the ontology that contains the data (the ontology's ABox) to be made interoperable.
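The transitive inference of listing 2.3 can be reproduced with a naive forward-chaining closure over single-predicate rules; a minimal sketch (hypothetical encoding, not the paper's implementation):

```python
# Naive forward-chaining closure over single-predicate alignment rules
# (hypothetical encoding; not the paper's implementation).

def chain_closure(rules):
    """Close a set of rules (body, head) under transitivity."""
    closure = set(rules)
    changed = True
    while changed:
        changed = False
        for (a, b) in list(closure):
            for (c, d) in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))  # infer a new alignment rule
                    changed = True
    return closure

rules = {("onto1:A", "onto2:B"),   # swrl1
         ("onto2:B", "onto3:C")}   # swrl2
inferred = chain_closure(rules)
assert ("onto1:A", "onto3:C") in inferred  # the fact in listing 2.3
```

A query rewriting approach, by contrast, only applies the explicitly stated rules backwards and never produces the inferred pair.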

Fig. 1. Example of ontology interoperability achieved through an ontology alignment expressed by means of SWRL rules.


Despite these extensive studies, to the best of our knowledge, there is no work addressing SPARQL query optimization in the context of semantic interoperability between OWL ontologies aligned with Horn-like rules such as SWRL rules.

4. Our approach

4.1. Overall presentation

As mentioned previously, ontology interoperability can be achieved through ontology alignments, expressed in various existing alignment formats. One advantage of ontology alignment formats is that they rely on languages that are easier to manipulate and more expressive than OWL-based languages. SWRL rules can be seen as an alignment format: they specify ontology alignments by means of positive Horn clauses. When sharing data from one ontology to another, the SWRL rules defining the alignment of those ontologies are not rewrite rules, therefore avoiding the storage of redundant data [10]. Fig. 1 displays an example of two ontologies matched by means of an alignment expressed through SWRL rules composing a rule set, shown above both ontologies. Normally, the body of a positive Horn rule used as an alignment contains predicates from ontologies different from the one to which the elements in the rule head belong. This is explained by the fact that data is shared from several source ontologies to one target ontology. Considering Fig. 1, let us suppose there is a system that queries the target ontology Onto1 and that is also able to access the data from the source ontology Onto2. In this case, only the rules which map the data from Onto2 to Onto1 must be considered (i.e. r2). If the query is addressed to Onto2, then Onto2 is considered the target ontology and Onto1 the source ontology. An ontology is classified as either a target or a source depending on which ontology the query is addressed to. In some cases, an ontology can be a target and a source at the same time. This is the case if the query is addressed to Onto1 and Onto2 simultaneously (for example, the query "Q = {x | x ∈ C ∧ x ∈ A}" uses concepts from both Onto1 and Onto2). The ontology matching process between two or more ontologies can produce thousands of alignments. Describing these alignments as SWRL rules significantly decreases query performance when using hybrid or backward-chaining reasoners. However, the performance of such reasoners remains acceptable as long as the rule set over which they are applied contains a limited number of rules and/or the rules are not complex.
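The target/source classification described above can be sketched as a simple namespace filter on rule heads (hypothetical rule encoding; `select_by_target` is not part of the paper's implementation):

```python
# Sketch: keep only the rules whose head belongs to the target ontology,
# i.e. the ontology the query is addressed to (hypothetical encoding).

def select_by_target(rules, target_ns):
    """rules: list of (body_predicates, head_predicate) pairs."""
    return [r for r in rules if r[1].startswith(target_ns + ":")]

rules = [(["onto1:A(?x)"], "onto2:B(?x)"),   # maps Onto1 data to Onto2
         (["onto2:C(?x)"], "onto1:D(?x)")]   # maps Onto2 data to Onto1
# Querying Onto1 makes it the target: only the second rule is relevant.
assert select_by_target(rules, "onto1") == [(["onto2:C(?x)"], "onto1:D(?x)")]
```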

Fig. 2. Proposed architecture for our Horn Rule Selector (RS).


Our approach for improving query performance, in terms of both time and retrieved results, consists in dividing the overall rule set into two subsets: an activated and a deactivated rule set. The activated rule set contains all rules which have to be considered for retrieving the data from the source ontologies (as defined by the query). The deactivated rule set comprises the remaining rules from the initial rule set, which are ignored by the reasoner when answering the query. Fig. 2 shows the architecture of the proposed Rule Selector (RS). The rules are selected from the raw rule set to compose the activated rule set, according to the query. The RS comprises two phases:

• The preprocessing phase rewrites Horn-like rules and creates two rule sets, one representing concept subsumptions (RSCS) and the other representing property subsumptions (RSPS). These two subsets are used during the preprocessing phase for merging the TBoxes of the ontologies, only in terms of subsumption relations (i.e. rdfs:subClassOf and rdfs:subPropertyOf), into a separate KB.
• The query execution phase selects, based on the query graph patterns, rules from the preprocessed rule set (PRS) and executes the initial query by considering only the selected rules.

In the following sections, we give a more detailed description of these two phases.

4.2. Implementation

4.2.1. Preprocessing phase

In the context of data mining, data preprocessing is recommended because real-world data are generally incomplete, noisy and inconsistent. Similarly, in the context of OWL ontology interoperability, preprocessing a set of positive Horn rules improves the execution and the results of the inference tasks performed by the reasoners.
The Rule Set Preprocessing (RSP) module is in charge of:

• Rewriting the rules contained in the ontology alignment:
  ◦ Normalizing rules into canonical rules
  ◦ Defining an explicit range and domain for all rules
• Identifying two rule subsets: one for concept specialization and one for property specialization.

In this work, for achieving ontology interoperability, we use an ontology alignment defined through a set of SWRL rules. In this case, the simplest structure of a SWRL rule has a body composed of elements from the source ontologies and a head composed of elements from the target ontology. Based on this assumption, we define a canonical rule for interoperability as follows.

Definition 15. (Canonical rule for interoperability). A canonical rule for interoperability is a SWRL rule whose body is composed of predicates from source ontologies and whose head is a unique predicate from the target ontology.

Canonical rule: P_1^1 ∧ P_2^1 ∧ … ∧ P_i^j → P_k^l, where P_i^j represents predicate P_i as defined in ontology j, and j ≠ l    (4.1)

The rule in 4.1 is the general form of a canonical rule. The complete process covered by the RSP module for rewriting the rules contained in the ontology alignment is summarized in Algorithm 1. The following paragraphs present how the RSP module transforms a given rule set into a set of preprocessed rules (PRS, as shown in Fig. 2), according to the steps listed in Algorithm 1.

4.2.1.1. Rewriting of interoperability rules

4.2.1.1.1. Transformation into canonical rules. As illustrated in Fig. 2, the RSP module's first task is to attempt to transform the initial raw rule set into canonical rules (Definition 15). This prevents backward-chaining reasoners from performing useless tasks. We name this first transformation procedure the normalization of interoperability rules. Rule normalization means reducing the predicates in a rule's body until no rule in the predefined raw rule set can reduce an existing predicate to another predicate (or set of predicates) from any source ontology. Moreover, domain/range restrictions have to be checked when combining predicates for rule normalization. Therefore, ideally, a normalized rule has a body formed only by predicates from source ontologies, whereas its head contains predicate(s) from the target ontology. For a clearer view of this process, let us take the rule set listed in 4.2. For the sake of simplicity, we consider that the namespace of the URI (Uniform Resource Identifier) identifying a predicate specifies the ontology in which the predicate is defined. In other words, onto1:p1(?x,?y) means that predicate p1 is defined in the ontology Onto1. Again, the goal here is to transform the whole rule set into normalized rules. For a given rule, this means we want its body to contain only predicates from ontologies that are all different from the one used for referencing the


predicate in the rule's head.

swrl1: onto2:A(?x) → onto1:B(?x)
swrl2: onto1:B(?x) → onto2:A(?x)
swrl3: onto2:A(?x) ∧ onto1:p1(?x,?y) → onto2:p2(?x,?y)
swrl4: onto1:B(?x) ∧ onto2:p2(?x,?y) → onto2:p3(?x,?y)
swrl5: onto1:C(?x) ∧ onto2:p2(?x,?y) → onto2:p4(?x,?y)
swrl6: onto1:q1(?x,?y) → onto2:q2(?x,?y)    (4.2)

We note that rules swrl3, swrl4 and swrl5 have predicates from the same ontology both in their head and body. The other rules (swrl1, swrl2 and swrl6) are already canonical rules. When applied, the RSP module transforms the rules from 4.2 into the rules listed in 4.3.

Algorithm 1. Rule Set Preprocessing (RSP) module

swrl1: onto2:A(?x) → onto1:B(?x)
swrl2: onto1:B(?x) → onto2:A(?x)
swrl3′: onto1:B(?x) ∧ onto1:p1(?x,?y) → onto2:p2(?x,?y)
swrl4′: onto1:B(?x) ∧ onto1:p1(?x,?y) → onto2:p3(?x,?y)
swrl5′: onto1:C(?x) ∧ onto1:B(?x) ∧ onto1:p1(?x,?y) → onto2:p4(?x,?y)
swrl6: onto1:q1(?x,?y) → onto2:q2(?x,?y)    (4.3)
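The transformation from 4.2 to 4.3 can be sketched as an iterative body-predicate reduction (hypothetical encoding: predicates are plain prefixed names; variable bindings and domain/range checks are ignored):

```python
# Sketch of the normalization step turning listing 4.2 into 4.3
# (hypothetical encoding; not the paper's Algorithm 1).

def namespace(pred):
    return pred.split(":", 1)[0]

def normalize(rule, rules):
    body, head = list(rule[0]), rule[1]
    changed = True
    while changed:
        changed = False
        for i, p in enumerate(body):
            if namespace(p) != namespace(head):
                continue  # predicate already comes from a source ontology
            for other_body, other_head in rules:
                # reduce p only if the replacement body is canonical
                if other_head == p and all(
                        namespace(q) != namespace(head) for q in other_body):
                    body[i:i + 1] = list(other_body)
                    changed = True
                    break
            if changed:
                break
    return body, head

rules = [(["onto2:A"], "onto1:B"),   # swrl1
         (["onto1:B"], "onto2:A")]   # swrl2
# swrl3: onto2:A ∧ onto1:p1 → onto2:p2 becomes canonical (swrl3'):
body, head = normalize((["onto2:A", "onto1:p1"], "onto2:p2"), rules)
assert (body, head) == (["onto1:B", "onto1:p1"], "onto2:p2")
```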


The RSP module performs the following actions:

• Rules swrl1 and swrl2 are left unchanged, as they are already in canonical form.
• In rule swrl3, the RSP reduces predicate onto2:A(?x) by swrl2, replacing it with onto1:B(?x). Rule swrl3 thus becomes canonical.
• In rule swrl4, the RSP reduces predicate onto2:p2(?x,?y) by swrl3 into onto2:A(?x) ∧ onto1:p1(?x,?y). The resulting body (onto1:B(?x) ∧ onto2:A(?x) ∧ onto1:p1(?x,?y)) is not yet canonical, so the RSP performs another iteration, reducing onto2:A(?x) through swrl2 into onto1:B(?x). This results in the canonical rule swrl4′ listed in 4.3.
• Considering rule swrl5, the RSP verifies that predicate onto1:C(?x) is irreducible (which is true), then moves on to predicate onto2:p2(?x,?y), which can be reduced by applying swrl3, as detailed above. This results in the canonical rule swrl5′ listed in 4.3.
• Finally, swrl6 is already canonical and is not further reduced.

One remark concerning the process described above: if rule swrl2 (onto1:B(?x) → onto2:A(?x)) had not existed, then the reductions of onto2:p2(?x,?y) would not have been applied. Indeed, we apply a predicate reduction only if it results in a canonical representation of this predicate. Such a canonical representation is composed of one or more predicates, all referencing ontologies different from the one present in the head of the rule being normalized.

4.2.1.1.2. Specification of rule domain and range. After normalizing the initial rules, the RSP identifies the rules that contain a property in their head. The identified rules are rewritten by specifying the domain and range of the property considered. This information is retrieved by means of a SPARQL query, using RDFS reasoning. In our approach, for better reasoning optimization, we only consider ranges for object properties; object-property domains can be deduced by performing subsumption reasoning tasks.
Considering the rules listed in 4.3, the rules containing a property in their head are swrl3′, swrl4′, swrl5′ and swrl6. The first three have a domain defined: onto1:B for swrl3′ and swrl4′, and onto1:C ∧ onto1:B for swrl5′. Therefore, only rule swrl6 has no domain defined. The RSP checks the TBox of Onto1 for a domain definition for property q1. For doing so, the following SPARQL query is addressed to Onto1: "SELECT ?x WHERE {onto1:q1 rdfs:domain ?x}". Suppose this query returns onto1:D as the domain concept for q1. Then swrl6 is replaced by the rule listed in 4.4. The RSP further checks whether onto1:D can be reduced by applying additional rules. As this is not the case, swrl6′ is not modified further. Moreover, in this example all properties appearing in the rules' heads are datatype properties; their range is thus not specified.

swrl6′: onto1:D(?x) ∧ onto1:q1(?x,?y) → onto2:q2(?x,?y)    (4.4)
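The rdfs:domain lookup used to rewrite swrl6 can be sketched over a TBox modeled as a plain set of triples (hypothetical data; a real implementation issues the SPARQL query above against the triple store):

```python
# Sketch of the rdfs:domain lookup behind the rewrite of swrl6
# (hypothetical TBox content).

tbox = {
    ("onto1:q1", "rdfs:domain", "onto1:D"),
    ("onto1:B", "rdfs:subClassOf", "onto2:A"),
}

def domain_of(prop, triples):
    """Equivalent of: SELECT ?x WHERE { <prop> rdfs:domain ?x }"""
    return [o for (s, p, o) in triples if s == prop and p == "rdfs:domain"]

assert domain_of("onto1:q1", tbox) == ["onto1:D"]
# swrl6 is then rewritten with onto1:D(?x) added to its body (cf. 4.4).
```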

If we add another rule, for example swrl7: onto3:E(?x) → onto1:D(?x), to the rule set listed in 4.3, then the RSP further reduces swrl6′ by replacing onto1:D(?x) with onto3:E(?x). This results in a new rule swrl6″, as illustrated in 4.5.

swrl1: onto2:A(?x) → onto1:B(?x)
swrl2: onto1:B(?x) → onto2:A(?x)
swrl3′: onto1:B(?x) ∧ onto1:p1(?x,?y) → onto2:p2(?x,?y)
swrl4′: onto1:B(?x) ∧ onto1:p1(?x,?y) → onto2:p3(?x,?y)
swrl5′: onto1:C(?x) ∧ onto1:B(?x) ∧ onto1:p1(?x,?y) → onto2:p4(?x,?y)
swrl7: onto3:E(?x) → onto1:D(?x)
swrl6′: onto1:D(?x) ∧ onto1:q1(?x,?y) → onto2:q2(?x,?y)
swrl6″: onto3:E(?x) ∧ onto1:q1(?x,?y) → onto2:q2(?x,?y)    (4.5)

The listing above (4.5) represents the final preprocessed rule set (PRS) and is used as input for the Query Execution module (see Section 4.2.2).

4.2.1.2. Creation of rule subsets. The second task of the RSP module is to output the two rule subsets RSCS (Rule Set for Concept Specialization) and RSPS (Rule Set for Property Specialization):

• RSCS contains only the rules generating subsumption relationships between concepts (OWL classes). For example, rule swrl1 in 4.2 generates the subsumption relationship "onto2:A rdfs:subClassOf onto1:B". Rule swrl1 is thus included in the RSCS.
• RSPS contains only the rules generating subsumption relationships among properties (OWL object properties). For example, swrl6 in 4.2 generates the subsumption relationship "onto1:q1 rdfs:subPropertyOf onto2:q2" between onto1:q1 and onto2:q2. Rule swrl6 is thus included in the RSPS.

As illustrated in Fig. 2, RSCS and RSPS are inputs for the Merging TBoxes (MT) module. Indeed, the MT module is responsible for merging the ontologies considered for interoperability, based on the rules present in RSCS and RSPS. These rules are interpreted as subsumption relationships among concepts and properties of the considered ontologies' TBoxes (see Definition 4). The instantiation of these subsumption relationships (rdfs:subClassOf, rdfs:subPropertyOf) allows merging the considered TBoxes in the form of a taxonomy. For the sake of optimization, the resulting TBox is stored in a new repository (the Merged TBox Knowledge Base, MTKB), different from the one containing the initial ontologies (including their ABoxes, which are unchanged during the pre-processing phase). This improves the query performance of the SR module, which is responsible for selecting the pre-processed rules necessary for the execution of a given query. The SR module uses this separate repository (the MTKB) for retrieving subsumption information to improve the rule selection process.
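The derivation of subsumption triples from RSCS/RSPS rules can be sketched as follows (hypothetical encoding in which each rule carries an explicit class/property marker):

```python
# Sketch of the MT module's input generation: one-to-one rules from
# RSCS/RSPS become subsumption triples (hypothetical encoding).

def subsumption_triples(rules):
    rscs, rsps = [], []
    for body, head, kind in rules:
        if len(body) != 1:
            continue  # only one-to-one rules yield plain subsumptions
        if kind == "class":
            rscs.append((body[0], "rdfs:subClassOf", head))
        else:
            rsps.append((body[0], "rdfs:subPropertyOf", head))
    return rscs, rsps

rules = [(["onto2:A"], "onto1:B", "class"),       # swrl1
         (["onto1:q1"], "onto2:q2", "property")]  # swrl6
rscs, rsps = subsumption_triples(rules)
assert rscs == [("onto2:A", "rdfs:subClassOf", "onto1:B")]
assert rsps == [("onto1:q1", "rdfs:subPropertyOf", "onto2:q2")]
```

Materializing these triples in the MTKB yields the taxonomy used by the SR module during rule selection.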
Another advantage of having a separate repository is that the ontologies (specified in OWL) and their ABoxes are kept isolated from the resulting alignment (specified as a list of SWRL rules). In case the initial rule set is


modified (e.g. insertion of new rules, modification of existing rules), this strategy allows deleting the previous MTKB and creating a new one (corresponding to a new execution of the pre-processing phase).

4.2.2. Query execution phase

Once the raw rule set has been preprocessed, we can select the specific rules that are necessary to answer the considered query. This is the main task of the SR module in Fig. 2. Once these rules are selected, the SR module passes them to the Remove Rule Cycles (RRC) module. This module checks whether the selected rules contain simple or complex cycles and then removes those cycles. Finally, the Activated Rule Set (ARS) is generated as output. In other words, our approach for optimizing queries executed over a KB containing ontologies aligned by means of a set R of positive Horn rules (SWRL rules) comprises two main sub-tasks:

1. The first task consists in selecting only the subset of R necessary to answer the considered query. This improves query execution time, as the reasoner has fewer rules to consider and, consequently, less (unnecessary) data to infer.
2. The second task is to remove any existing rule cycles from the previously constituted rule set. The result of this task is the ARS. Using only the ARS for answering the considered query avoids infinite loops during query execution (loops that would have been caused by SWRL rules containing cycles).

In the following sections, we give a detailed overview of the processes related to these sub-tasks. The complete process covered by the Query Execution module is summarized in Algorithm 2.

Algorithm 2. Query Execution (QE) phase

4.2.2.1. First sub-task — selection of necessary rules. For addressing this first sub-task, we have developed a SPARQL Query Parser (QP). As shown in Fig. 2, the initial query is passed to the QP module, which analyzes it and isolates the concepts and properties it contains. Moreover, if the query contains information about domain/range restrictions for object properties, this information is used by the SR module to rank the selected rules. The following paragraphs illustrate the functioning of the QP module on a precise example, namely query Q′ (see 4.6).

Q′: SELECT ?x ?y WHERE { ?x rdf:type onto2:A . ?x onto2:p2 ?y . }    (4.6)

When applied to Q′, the QP module outputs the list U of all URIs of the resources mentioned in Q′: U = {onto2:A, onto2:p2}. The predicate rdf:type is not in the list U because we use class expressions directly as predicates in the Horn rules' definitions. Indeed, the SWRL syntax for the triple "?x rdf:type onto2:A" is "onto2:A(?x)".
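The URI extraction performed by the QP module can be sketched with a naive regular expression over prefixed names (a real parser would use a SPARQL grammar; `extract_uris` is a hypothetical helper):

```python
# Sketch of the QP module's URI extraction (naive regex over prefixed
# names; not a full SPARQL parser).
import re

def extract_uris(sparql):
    names = re.findall(r"\b([A-Za-z]\w*:\w+)\b", sparql)
    # rdf:type is dropped: class expressions are used directly as
    # predicates in the SWRL rules, e.g. onto2:A(?x).
    return [n for n in names if n != "rdf:type"]

q = "SELECT ?x ?y WHERE { ?x rdf:type onto2:A . ?x onto2:p2 ?y . }"
assert extract_uris(q) == ["onto2:A", "onto2:p2"]  # the list U
```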


The QP module also outputs the domain/range restrictions extracted from the query. When considering Q′, the QP module determines that onto2:A is a domain restriction for property onto2:p2. The output of the QP module is passed to the SR module. Therefore, the SR module has three input sources (see Fig. 2):

• The output of the QP module (e.g. the list U);
• The preprocessed rule set (PRS) as listed in 4.5;
• The MTKB containing the merged TBoxes of the considered ontologies.

The first action performed by the SR module is to filter the rules in the PRS and to select only those rules that can reduce the elements in U. Indeed, unselected rules from the PRS would only infer data unnecessary for answering Q′. The second action performed by the SR module aims at further reducing the number of selected rules. For doing so, it first identifies the rules that have the same property in their head, and then uses Eq. (4.7) to calculate each rule's score. After calculating the scores, the rules are ranked in descending order of score (a high score means a high relevance for answering the given query).

score = (w1 · C + w2 · N) / (w1 + w2)    (4.7)

◦ C takes the value 0 or 1; C equals 1 if and only if the considered rule's body contains a domain/range restriction for the predicate present in the rule's head, and this restriction matches the domain/range restriction defined in the query or one of its super- or sub-classes.
◦ N is the normalized number of elements present both in the query and in the rule's body. N is calculated as the ratio between the number of elements from the rule's body present in the considered query (Nq) and the number of predicates the rule's body contains (Nr).

N = Nq / Nr    (4.8)
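Eqs. (4.7) and (4.8) combine into a one-line scoring function; the values below anticipate the example weights used later in the text (w1 = 2, w2 = 1):

```python
# Eqs. (4.7) and (4.8) as a scoring function (sketch; the weights are
# the example values used in the text).

def score(C, Nq, Nr, w1=2, w2=1):
    N = Nq / Nr                              # Eq. (4.8)
    return (w1 * C + w2 * N) / (w1 + w2)     # Eq. (4.7)

# swrl3': matching domain restriction (C = 1), no body element in Q'
assert abs(score(C=1, Nq=0, Nr=2) - 2 / 3) < 1e-9   # ≅ 0.66
# swrl8: no domain match, no overlap
assert score(C=0, Nq=0, Nr=2) == 0.0
```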

◦ w1 and w2 are weights used to specify the relative importance of the values C and N in the rule score computation. We define w1 > w2, as we choose to give more credit to the domain/range restriction information. Indeed, this type of information is more precise (and therefore more valuable) for rule selection than the frequency of query terms in a given rule's body.

Additionally, we define the following thresholds:

• Tscore is the minimum score a rule must reach to enter the ARS. If Tscore < 0, no ranking is performed and all previously SR-selected rules are considered;
• Trules is the maximum number of rules having the same head that are allowed in the ARS. Having Trules ≥ 1 ensures that at least one rule is considered.

When applying the previous theoretical aspects to our example, we obtain the following:

• The QP module has built the list U with the following elements: U = {onto2:A, onto2:p2}. Therefore, only the PRS rules (see 4.5) that have elements from U in their head are selected. This leads us to rules swrl2 and swrl3′.
• The QP module has determined that onto2:A is a domain restriction for property onto2:p2. Therefore, when computing the rule score, the parameter C equals 1 for:
  ◦ Rules having onto2:p2 in their head and onto2:A in their body as the domain restriction for onto2:p2.
  ◦ Rules having in their body a sub- or super-class of onto2:A as the domain restriction for onto2:p2 (the rule's head). This is the case for rule swrl3′.

At this point, we only have two rules for which a score should be calculated. The output of the SR module is given in 4.9.

swrl2: onto1:B(?x) → onto2:A(?x)
swrl3′: onto1:B(?x) ∧ onto1:p1(?x,?y) → onto2:p2(?x,?y)    (4.9)

At this point, no selected rules share the same head, so in this example there is no need to perform the score computation. However, in order to illustrate this process, let us suppose (for the sake of example) that our rule list (4.5) contains an additional, previously selected rule, called swrl8 (as illustrated in listing 4.10).

swrl2: onto1:B(?x) → onto2:A(?x)
swrl3′: onto1:B(?x) ∧ onto1:p1(?x,?y) → onto2:p2(?x,?y)
swrl8: onto1:C(?x) ∧ onto1:p1(?x,?y) → onto2:p2(?x,?y)    (4.10)

To illustrate the rank computation, let us suppose that w1 = 2·w2, w2 = 1, Tscore = 0.25 and Trules = 9 (the number of rules in our example rule set). Only the rules with the same predicate in their head are considered during the ranking process (i.e., in 4.10, the rules with onto2:p2 in their head, namely swrl3′ and swrl8). The SR module then computes the scores for swrl3′ and swrl8. For query Q′, the QP module outputs the domain restriction onto2:A for onto2:p2. Next, the SR module retrieves the super- and sub-classes of onto2:A, based on the subsumption information in the MTKB. For


a clearer view, Fig. 3 illustrates those super- and sub-classes of onto2:A. The double-arrowed link between A and B means that they are equivalent concepts. Equivalent concepts are included in the retrieved taxonomy: Taxonomy(onto2:A) = {onto2:I, onto2:A, onto1:B, onto1:F, onto1:G, onto1:H, onto1:T}. Therefore, C is equal to 1 for rules containing elements from the set Taxonomy(onto2:A) as the domain restriction for onto2:p2 in their body (while onto2:p2 is the rule's head). When applying this to our two rules, we obtain the following:

• For swrl3′, onto2:p2 has the domain restriction onto1:B, so C = 1.
• For swrl8, C = 0 because onto1:C does not belong to Taxonomy(onto2:A). In other words, swrl8 does not infer new values of onto2:p2 for instances of onto2:A.

Besides C, N is computed for these two rules using Eq. (4.8). For both of them, N = 0. Indeed, Nq, the number of elements from the rule's body present in the considered query, equals 0 for both rules. Therefore, the scores are calculated as follows:

Score(swrl3′) = (w1 · C + w2 · N) / (w1 + w2) = (2·1 + 1·0) / (2 + 1) ≅ 0.66
Score(swrl8) = (w1 · C + w2 · N) / (w1 + w2) = (2·0 + 1·0) / (2 + 1) = 0    (4.11)

After calculating the score for the rules with the same head, the SR module selects rules by applying the previously defined thresholds Tscore and Trules: it considers only the rules with a score ≥ Tscore and, among those, only the first Trules rules. The threshold Trules is mainly used when, for a given query, it is more important to retrieve a quick result than a complete one (all relevant facts retrieved). This can be a strategy for reducing the amount of inferred data and, consequently, the query execution time. However, if we want to be able to retrieve all results inferred by applying the rules with the same head, then Trules must be set to the cardinality of the PRS. In our previous example (i.e. Q′ and the rule set listed in 4.10), the score is calculated for the rules having onto2:p2 in their head. Only rules swrl3′ and swrl2 are selected from listing 4.10: the computed score for rule swrl3′ is higher than the threshold Tscore (0.66 > 0.25), and swrl2 was previously selected (swrl2 does not participate in the ranking process). Furthermore, swrl8 is eliminated during the ranking process because its score is 0 (i.e. score(swrl8) < Tscore). After applying the Tscore elimination criterion for rules with the same head, the SR module applies the second criterion, namely the Trules threshold. As there are fewer than 9 (i.e. Trules = 9) rules having onto2:p2 in their head, the SR module still outputs rules swrl2 and swrl3′.

4.2.2.2. Second sub-task — removing eventual rule cycles. For addressing this second sub-task of the Query Execution module, a specific procedure has been implemented in the Remove Rule Cycles (RRC) module. When we consider our initial example, the input for the RRC module consists of the rules listed in 4.9. However, those rules contain no rule cycle (neither simple nor complex). Indeed, given the processes implemented before this step, most existing rule cycles have already been removed (notably because the SR module uses rules from the PRS). Therefore, in order to illustrate the functioning of the RRC module, let us take the following new example, illustrated by Fig. 4. In this example, we consider four predicates (p1, p2, p3 and p4) and the rules relating them (R12, R21, R13, R41, R34 and R34′).

Fig. 3. Illustration of a portion of the Merged TBox Knowledge Base (MTKB).


Fig. 4 corresponds to the following list of rules (see 4.12):

R12: p1(?x,?y) → p2(?x,?y)
R21: p2(?x,?y) → p1(?x,?y)
R13: p1(?x,?y) → p3(?x,?y)
R34: A(?x) ∧ p3(?x,?y) → p4(?x,?y)
R34′: B(?x) ∧ p3(?x,?y) → p4(?x,?y)
R41: p4(?x,?y) → p1(?x,?y)    (4.12)
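The cycles in listing 4.12 can be found by a depth-first search over the rule dependency graph, where each rule contributes an edge from its property body predicate to its head (hypothetical encoding; the class atoms A and B are ignored):

```python
# Sketch: detecting the cycles of listing 4.12 by depth-first search
# over the rule dependency graph (hypothetical encoding).

def find_cycles(edges):
    graph = {}
    for src, dst in edges:
        graph.setdefault(src, []).append(dst)
    cycles = []

    def dfs(node, path):
        for nxt in graph.get(node, []):
            if nxt == path[0]:
                cycles.append(path + [nxt])      # closed a cycle
            elif nxt not in path:
                dfs(nxt, path + [nxt])

    for start in graph:
        dfs(start, [start])
    return cycles

edges = [("p1", "p2"), ("p2", "p1"),                 # R12, R21
         ("p1", "p3"), ("p3", "p4"), ("p4", "p1")]   # R13, R34/R34', R41
cycles = find_cycles(edges)
assert ["p1", "p2", "p1"] in cycles           # simple cycle c1
assert ["p1", "p3", "p4", "p1"] in cycles     # complex cycle c2
```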

Based on those rules, the RRC module identifies the following cycles:

• A simple cycle: c1 ≡ p1 → p2 → p1
• A complex cycle: c2 ≡ p1 → p3 → p4 → p1

For removing the identified cycles, the RRC module performs the following actions:

• For removing the simple cycle, it removes either R12 or R21. The data inferred through the removed rule is temporarily materialized. We choose to remove the first rule returned by the algorithm.
• For removing the complex cycle, the RRC module can remove one of the following rule subsets: {R13}, {R34, R34′} or {R41}. When choosing which rule subset to remove, the RRC considers the subsets with the lowest cardinality, under the hypothesis that the fewer rules a removed subset contains, the less inferred data has to be materialized. In our example, this means either {R13} or {R41}. Once the list of subsets with the lowest cardinality is identified, we choose to eliminate its first member. Again, the data from the removed rules is temporarily materialized, which is why the RRC module prefers the rule subset with the lowest cardinality.

If we go back to our initial example, the ARS used for answering query Q′ contains the two rules listed in 4.9. These rules are necessary and sufficient for answering the initial query Q′. The following section describes the methodology implemented and the data used for testing the performance of our approach.

5. Findings

In order to measure the efficiency of the proposed approach, we consider two OWL ontologies (Onto1 and Onto2), for which we define an alignment through SWRL rules. Onto1 and Onto2 are two ontologies from the Architecture/Engineering/Construction (AEC) and Facility Management (FM) domain. They each represent a different data format used for exchanging building-related information.
Table 3 lists, for each ontology considered, the number of classes, properties (both object and data properties), inverse properties and triples in the TBox, along with the DL expressivity. One can consider that each ontology represents a different data model, as used by different enterprises. We use the namespaces onto1 and onto2 for our two OWL ontologies, respectively. The alignment between those two ontologies comprises 474 SWRL rules, some of them containing more than 15 predicates in their body. Listing 4.13 illustrates an example of such a SWRL rule. Its body contains only terms from Onto2. This rule maps to Onto1 the category of

Fig. 4. Considered example of rule cycles.

Please cite this article as: T.M. de Farias, et al., SWRL rule-selection methodology for ontology interoperability, Data Knowl. Eng. (2015), http://dx.doi.org/10.1016/j.datak.2015.09.001

16

T.M. de Farias et al. / Data & Knowledge Engineering xxx (2015) xxx–xxx Table 3 Characteristics of Onto1 and Onto2. OWL entities

Onto1

Onto2

Classes Object properties Data properties Inverse properties Triples in the TBox DL expressivity

30 32 125 7 2212 ALCHIF(D)

802 1292 247 115 9978 ALUIF(D)

a construction product described using terms from Onto2. For the sake of simplicity, we note Cij the class Cj in ontology i, respectively pkl the pl property in ontology k, where i, j, k, l ∈ ℕ⁎. onto2:C 21 ð?x1Þ ∧ onto2:C 22 ð?x6Þ ∧ onto2:C 23 ð?x3Þ ∧ onto2:C 23 ð?x7Þ ∧ onto2:C 24 ð?x5Þ ∧ onto2:C 25 ð?x4Þ ∧ onto2:C 26 ð?x2Þ ∧ onto2:p21 ð?x4; ?x5Þ ∧ onto2:p22 ð?x5; ?x6Þ ∧ onto2:p23 ð?x2; ?x4Þ ∧ onto2:p24 ð?x2; ?x1Þ ∧ onto2:p25 ð?x2; ?x3Þ ∧ onto2:p28 ð?x7; ?x8Þ ∧ onto2:p26 ð?x5; ?x7Þ ∧ onto2:p27 ð?x6; ‘‘Category”Þ ∧ onto2:p28 ð?x3; ‘‘ProductResource”Þ → onto1:p11 ð?x1; ?x8Þ

ð4:13Þ

For our experiments, we used a Stardog 2.2.1 triple store [41], which played the role of the server and was encapsulated in a virtual machine with the following configuration: one Intel Xeon E5-2430 microprocessor at 2.2 GHz with 2 cores out of 6, 8 GB of DDR3 RAM, and a Java heap size of 6 GB for the Java Virtual Machine. This triple store contains four repositories, which we name KB1, KB2, KB3 and KB4. Each repository stores the KB formed by Onto1 and Onto2 (i.e. the TBox and the ABox of each ontology). For the considered example, the ABox of one repository contains more than 832,000 triples; the number of TBox triples for Onto1 and Onto2 is listed in Table 3. For experimentation purposes, we implemented a different number of interoperability rules (rules of the alignment between Onto1 and Onto2) for each repository. Table 4 lists the number of rules considered along with their characteristics.

The client machine has the following configuration: one Intel Core i5-3470 microprocessor at 3.2 GHz with 4 cores, 4 GB of DDR3 RAM at 800 MHz, and a Java heap size of 1 GB. Our proposed RS is executed on the client machine. We aim at reproducing an industry scenario in which the client machine executes two applications based on the different models represented by Onto1 and Onto2, respectively.

In our experiments, we did not take the rule pre-processing time into account: a rule set only has to be pre-processed when first defined or when it is modified. For completeness, pre-processing the 474 SWRL rules took on average 0.87 s, with a standard deviation of 0.04, over 30 executions on the client machine. The RS loads the pre-processed rules into main memory in order to quickly perform rule selection.
Once this step is completed, for each rule in the ARS, the RS populates the triple store with the triple corresponding to a SWRL rule instance. In other words, for each rule in the ARS, we insert the related instance of the swrl:Imp class [55]. Indeed, when stored in a triple store, SWRL rules become OWL individuals described by means of terms from the SWRL ontology, in which the swrl:Imp class describes a SWRL rule [55]. This implementation speeds up rule activation: it is unnecessary to insert all the triples describing a given rule; only the triples declaring it as an instance of swrl:Imp are needed, since the triples representing each rule's body and head are already present in the triple store. Therefore, we add one triple per ARS rule. These triples are removed once query execution finishes.

Table 5 shows the queries used in our experiments. For all experiments, w1 = 2, w2 = 1 and Trules ≫ 474. Tscore equals 0 for query Q1, but Tscore = 0.0001 for queries Q2, Q3 and Q4; we use a positive value close to 0 in order to keep all rules whose associated score differs from zero. Query Q1 has only one element and we want to be sure that all rules concerning it are selected, hence Tscore = 0. The selected rules are therefore those having onto1:p11 (or its sub-properties) in their head. With Q2, we query the values of onto1:p11 for instances of onto2:C21. This domain restriction is taken into account, so that the selected rule set is no larger than the one selected for Q1. For Q2, Algorithm 2 selects all acyclic rules whose body comprises onto2:C21 (or its super- or sub-classes) as a domain restriction for onto1:p11, and whose head contains onto1:p11 (or its sub-properties).
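The activation step described above can be sketched as a pair of SPARQL updates. The swrl:Imp class is standard SWRL vocabulary [55], but the rule URIs and the exact shape of the updates below are our illustration, not the authors' implementation.

```python
SWRL_NS = "http://www.w3.org/2003/11/swrl#"

def activation_updates(ars_rule_uris):
    """Build SPARQL updates that activate, then deactivate, the ARS rules.

    Only one triple per rule is inserted: the triples describing each rule's
    body and head are already stored, so typing the rule as swrl:Imp is enough
    to switch it on for the reasoner.
    """
    triples = " ".join(f"<{uri}> a <{SWRL_NS}Imp> ." for uri in ars_rule_uris)
    insert_update = f"INSERT DATA {{ {triples} }}"
    delete_update = f"DELETE DATA {{ {triples} }}"  # run once query execution finishes
    return insert_update, delete_update
```

Sending the first update before the query and the second afterwards mirrors the one-triple-per-rule activation scheme described in the text.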

Table 4
Interoperability rules defined for each repository / knowledge base (KB).

     Number of rules   Characteristics
KB1  474               All the rules contained in the initial rule set (all the rules forming the alignment between Onto1 and Onto2).
KB2  266               All subsumption rules, along with all the rules that have elements from Onto1 in their head.
KB3  178               All rules from KB2 minus some of the rules that have elements from Onto1 in their head (we aimed at reducing the inferred data).
KB4  Variable          All the rules contained in the Activated Rule Set (ARS).


Table 5
List of queries addressed over the considered knowledge bases.

Query name   SPARQL query
Q1           SELECT ?x ?y WHERE {?x onto1:p11 ?y .}
Q2           SELECT ?x ?y WHERE {?x a onto2:C21 . ?x onto1:p11 ?y .}
Q3           SELECT ?x ?u WHERE {?x a onto1:C11 . ?y a onto2:C22 . ?z a onto1:C12 . ?y onto2:p21 ?z . ?y onto2:p22 ?x . ?x onto1:p11 ?u .}
Q4           SELECT ?x ?y WHERE {?d onto1:p12 ?s . ?c onto2:p23 ?s . ?c onto2:p24 ?x . ?x a onto2:C23 . ?x onto2:p25 ?w . ?w onto2:p26 ?y . FILTER(?y > 1.26)}
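Queries such as those in Table 5 can be submitted to any SPARQL 1.1 endpoint (Stardog included) over HTTP. The sketch below uses only the standard SPARQL protocol; the endpoint URL and the onto1/onto2 namespace IRIs are illustrative placeholders, not the ones used in the experiments.

```python
import json
import urllib.parse
import urllib.request

# Illustrative namespace IRIs; the actual IRIs of Onto1 and Onto2 are not given here.
PREFIXES = """PREFIX onto1: <http://example.org/onto1#>
PREFIX onto2: <http://example.org/onto2#>
"""

def run_select(endpoint, query):
    """POST a SELECT query to a SPARQL 1.1 endpoint and return its result bindings."""
    body = urllib.parse.urlencode({"query": PREFIXES + query}).encode()
    request = urllib.request.Request(
        endpoint, data=body,
        headers={"Accept": "application/sparql-results+json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)["results"]["bindings"]
```

For example, `run_select("http://localhost:5820/KB4/query", "SELECT ?x ?y WHERE {?x onto1:p11 ?y .}")` would issue Q1 against a local endpoint (URL hypothetical).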

The graph pattern used for query Q3 is more complex. In this case, the calculated rule score also considers the number of terms (described using URIs) present both in the rule's body and in Q3 (i.e. the calculation of Nq, and consequently, of N). As a consequence, some unnecessary rules may become part of the ARS set; for Q3 with Tscore = 0.0001, only one unnecessary rule is present in the ARS. Nevertheless, we reduced the initial set of 474 rules to 4 rules. While the initial rule set overloaded the RAM after 3 min of query runtime, our approach retrieved the results of Q3 in less than 500 ms. Q4 is a query example with a constraint expressed by the keyword FILTER.

Moreover, Q2, Q3 and Q4 are all composed of terms (each identified by a URI) from both ontologies. The approaches described in [43] and [46] only allow answering queries that address a single ontology from the considered ensemble of inter-operable ontologies: they use alignments for rewriting queries expressed with terms from one ontology, and the rewritten query is then applied over another ontology. With our approach, a query can use terms from any ontology within the ensemble of inter-operable ontologies (e.g. Q2 contains the terms onto2:C21 and onto1:p11 from Onto2 and Onto1, respectively). This gives the user more flexibility for constructing queries, and also allows developing information systems employing terms from the whole ensemble of inter-operable ontologies. This benefit is demonstrated by queries Q2, Q3 and Q4, which return correct results (see Table 6) even though they contain terms from both considered ontologies (Onto1 and Onto2).

The queries listed in Table 5 are each run 30 times over the knowledge bases KB1, KB2, KB3 and KB4. Table 6 shows the results we obtained.
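The score-and-threshold selection discussed above can be illustrated with a simplified stand-in. The paper's actual scoring formula (with weights w1 and w2 and threshold Tscore) is defined earlier in the article; the linear combination and the argument names below are our assumptions, for illustration only.

```python
def select_rules(rule_matches, w1=2.0, w2=1.0, t_score=0.0001):
    """Keep the rules whose weighted score exceeds the threshold Tscore.

    `rule_matches` maps a rule id to (n_head, n_body): the number of query
    terms matched in the rule's head and in its body, respectively.
    This linear scoring is a simplified stand-in for the paper's formula.
    """
    scores = {rule: w1 * n_head + w2 * n_body
              for rule, (n_head, n_body) in rule_matches.items()}
    return {rule for rule, score in scores.items() if score > t_score}
```

A threshold close to zero keeps every rule with any match at all, which is why a few superfluous rules can slip into the ARS, as observed for Q3 and Q4.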
We also display the arithmetic mean (column "Mean execution time") and the standard deviation. The "#Rule set" column displays the number of rules in the KB. The "#Results" column displays the number of tuples retrieved as a result for the considered query (e.g. {?x, ?y} for Q1).

Table 6
Query performance evaluation.

Query   Knowledge base   Mean execution time (s)   Standard deviation (σ)   #Rule set   #Results
Q1      KB1              –                         –                        474         0
        KB2              –                         –                        266         0
        KB3              18.04                     0.18                     178         1671
        KB4              1.93                      1.27                     16          22834
Q2      KB1              –                         –                        474         0
        KB2              –                         –                        266         0
        KB3              22.88                     0.56                     178         103
        KB4              0.15                      0.01                     2           103
Q3      KB1              –                         –                        474         0
        KB2              –                         –                        266         0
        KB3              25.19                     0.42                     178         2
        KB4              0.21                      0.02                     4           2
Q4      KB1              –                         –                        474         0
        KB2              0.62                      0.71                     266         16
        KB3              0.44                      0.67                     178         16
        KB4              0.34                      0.07                     9           16

A "–" in Table 6 means that no results were retrieved for the considered query, because the memory heap size (6 GB) of the Java Virtual Machine was exceeded, generally after about 3 min of query execution. To properly answer query Q1, our methodology selected 16 rules from the PRS set, out of the initial set of 474 rules. The results show that, without our approach, no result is retrieved when the entire raw rule set is considered, due to memory overload after about 3 min of query execution over KB1. When executed over KB2, Q1 shows that even after reducing the raw rule set to 266 rules and removing rule cycles, still no result is retrieved. When executed over KB3 (which holds less than 40% of the rules from the initial set), Q1 returns less than 7% of all expected results, because several of the rules relevant for Q1 had been removed when conceiving our test knowledge bases. Moreover, compared to Q1 over KB4, Q1 over KB3 takes 9 times longer and retrieves 10 times fewer results. Since KB4 considers only ARS rules, the results of Q1 over KB4 represent the gain (in terms of query execution time and results retrieved) achieved by implementing our approach. When applied to Q2, our method takes into account the domain restriction defined within Q2 and creates an ARS set composed of only 2 rules instead of the 16 rules selected for Q1 (which did not have any domain information for the property onto1:p11). This reduces the query's mean response time by a factor of about 13. Moreover, the standard deviation of Q2 over KB4 shows that the response time stays very close to the mean: with our approach, the response time is almost the same for any execution of Q2.
For the Q1 and Q2 experiments, our rule-selection method was able to create an ARS set composed only of the rules pertinent to answering these queries, and to retrieve all possible results. For Q3 and Q4, our approach defines an ARS set that contains some unnecessary rules; it nevertheless reduces the number of initial rules and, consequently, the inference tasks and query execution time. Superfluous rules are selected because the threshold Tscore is close to zero (Tscore = 0.0001). A solution for eliminating those rules would be to specify higher values of Tscore when a better response time is needed. Still, calculating the optimal threshold is not trivial, and an error in this calculation risks deleting rules that are necessary for answering the considered query. For all experiments performed, the mean query execution times were considerably reduced. For the experiments using Q2, Q3 and Q4, the standard deviation of the query response time is much lower with our approach, meaning that the response time is more concentrated around the mean. This does not hold for Q1 over KB3 and KB4, which can be explained by the fact that Q1 returns fewer results over KB3 than over KB4, due to the absence from KB3 of some rules relevant for properly answering Q1.

Table 7 compares our approach with the query rewriting methods [43,46] (described in Subsection 3.1). The test environment is the same as described previously in this section. In these new tests, the query rewriting process is executed by the client machine; once it finishes, the rewritten query is sent to the server hosting the Stardog triple store. Only Q1 is considered in these experiments, because Q2, Q3 and Q4 contain terms from different ontologies (and cannot be processed by the approaches described in [43] and [46]). Moreover, Correndo et al.'s approach [46] cannot process Q4 because this query contains a SPARQL construct (FILTER) that it does not support.
Each test executed Q1 30 times. Table 7 shows that both query rewriting approaches have the same mean and standard deviation for the query execution time, which is expected since these methods are quite similar. The experiments in Table 7 demonstrate that when Q1 is executed without reasoning tasks (e.g. DL subsumptions), its mean execution time is drastically reduced; however, in this case not all results are retrieved (1497 results are missing because of missing inferences). Through these tests, we also notice that Stardog implements a cache that reduces query execution time when a query is addressed more than once: the first execution of Q1 without reasoning lasted about 2 s, and the second one took only 0.1 s using Correndo et al.'s [46] or Makris et al.'s [43] approach. Our approach took about 2 s to answer Q1 on both the first and the second execution, because it must involve a reasoner for querying over rule-based aligned ontologies. These experiments suggest that Stardog's query cache cannot be fully exploited in this setting. All executions of Q1 with reasoning using Correndo et al.'s [46] or Makris et al.'s [43] approach show a standard deviation significantly lower than the mean execution time, which indicates that the cache optimization does not considerably improve query execution time when reasoning is applied. Compared to [43] and [46], our approach retrieves the same results with a mean execution time about 17 times lower. Furthermore, for the first execution of Q1 without reasoning, the considered query rewriting methods had almost the same execution time as our approach; moreover, our approach retrieves all the correct results, while the other considered approaches do not.
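The measurement protocol used throughout these experiments (30 runs per query, reporting mean and standard deviation) can be reproduced with a small harness such as the one below; `run_query` stands in for the SPARQL call to the triple store and is our placeholder, not the authors' code.

```python
import statistics
import time

def benchmark(run_query, repetitions=30):
    """Time a query callable; return (mean, sample standard deviation) in seconds."""
    durations = []
    for _ in range(repetitions):
        start = time.perf_counter()
        run_query()
        durations.append(time.perf_counter() - start)
    return statistics.mean(durations), statistics.stdev(durations)
```

Note that with a warm cache the first run can dominate the standard deviation, which is consistent with the relatively large σ observed for Q1 over KB4.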

Table 7
Query performance evaluation between our approach and query rewriting approaches for interoperability.

Query                  Approach               Mean execution time (s)   Standard deviation (σ)   #Rule set   #Results
Q1 without reasoning   Makris et al. [43]     0.12                      0.44                     0           21337
                       Correndo et al. [46]   0.12                      0.44                     0           21337
Q1 with reasoning      Makris et al. [43]     33.46                     0.83                     0           22834
                       Correndo et al. [46]   33.46                     0.83                     0           22834
                       Our approach           1.93                      1.27                     16          22834


6. Conclusion and future work

In this article, we investigated the problem of answering queries addressed over ontology alignments specified by means of SWRL rules. We based our work on the assumption that removing unnecessary rules allows reasoners to perform only the required inferences, thereby minimizing query execution time. Our approach identifies the subset of rules that have to be taken into account by the reasoner when processing a given query. The proposed RS is a middleware between the considered systems (which address queries to the triple store) and the reasoner present on the triple store side.

Our approach can be applied to ontology-based enterprise models and/or systems whose interoperability is achieved by means of Horn-like rules. The presented contribution considerably reduces the execution time of queries addressed over such models or systems. In practice, implementing our approach enables ontology-based enterprise systems to seamlessly share and dynamically process data in the KB. Our experimental results show that our solution achieves interoperability among models at the level of the KB (expressed as ontologies, named graphs, etc.), based on a rule set and a rule engine. Among the advantages offered by our approach, we may cite:

• the possibility to compose queries with terms from different models;
• the avoidance of data redundancy;
• the respect of constraints imposed by industrial contexts regarding query execution time.

Future work concerns several possible optimizations, among them the optimization of the computation of the Tscore parameter. We want to investigate which parameter combinations perform best and which alternatives exist for ranking rules. We would also like to extend our work by investigating the possible advantages brought by the use of SWRL built-ins [12]; our approach is already generic enough to support alignments employing such constructs.
Additional future work concerns the analysis of other techniques for ontology alignment, such as model-driven interoperability (MDI) [56,57]. A final remark concerns the study of alternative query rewriting methods that can be applied in some specific cases. Indeed, for a simple cycle such as the one illustrated in Fig. 4, if we want to avoid temporarily materializing the data inferred by one of the rules, an alternative solution consists in rewriting the considered query. The goal of this rewriting is to normalize the query's vocabulary by making use of the equivalences defined among ontology elements (concepts, properties, etc.); these equivalences have to be mined from our merged TBox (MTKB). While this idea may look appealing, we need to precisely compute the cost of applying it, which means performing additional benchmarks to highlight the potential benefits of such a query rewriting.

Acknowledgements

This work has been financed by the French company ACTIVe3D (see http://www.active3d.net) and supported by the Burgundy Regional Council (see http://www.region-bourgogne.fr) through CIFRE convention number 2013/1200.

References

[1] N. Pal, M. Lim, Emergence of the agile enterprise: building organizations for the global, digital economy, in: N. Pal, D. Pantaleo (Eds.), The Agile Enterprise: Reinventing Your Organization for Success in an On-Demand World, Springer, US, 2005.
[2] The World Wide Web Consortium (W3C), XML Technology, www.w3.org/standards/xml (Online; accessed 2015-09-13).
[3] ISO, Standard for the exchange of product data — ISO 10303 (STEP), ISO TC184/SC4, ISO, 1994 (http://www.tc184-sc4.org [Online; accessed 2015-09-13]).
[4] C. Agostinho, et al., Towards a sustainable interoperability in networked enterprise information systems: trends of knowledge and model-driven technology, Comput. Ind., 2015.
[5] Esma Yahia, Alexis Aubry, Hervé Panetto, Formal measures for semantic interoperability assessment in cooperative enterprise information systems, Comput. Ind. 63 (5) (2012) 443–457.
[6] T. Halpin, Fact-oriented modelling: past, present and future, in: Conceptual Modelling in Information Systems Engineering, 2007, pp. 19–38.
[7] Alon Y. Halevy, Naveen Ashish, Dina Bitton, Michael Carey, Denise Draper, Jeff Pollock, Arnon Rosenthal, Vishal Sikka, Enterprise information integration: successes, challenges and controversies, Proceedings of the 2005 ACM SIGMOD International Conference on Management of Data (SIGMOD '05), ACM, New York, NY, USA, 2005, pp. 778–787.
[8] Rami Rifaieh, Ahmed Arara, Aïcha-Nabila Benharkat, MurO: a multi-representation ontology as a foundation of enterprise information systems, in: Gautam Das, Ved Prakash Gulati (Eds.), Intelligent Information Technology, volume 3356 of Lecture Notes in Computer Science, Springer, Berlin Heidelberg, 2005, pp. 292–301.
[9] Silvana Castano, Alfio Ferrara, Stefano Montanelli, Ontology-based interoperability services for semantic collaboration in open networked systems, in: D. Konstantas, et al. (Eds.), Interoperability of Enterprise Software and Applications, Springer, London, 2006, pp. 135–146.
[10] J. Euzenat, P. Shvaiko, Ontology Matching, second edition, Springer-Verlag, Berlin Heidelberg, Germany, 2013, http://dx.doi.org/10.1007/978-3-642-38721-0.
[11] Pavel Shvaiko, Jérôme Euzenat, Ontology matching: state of the art and future challenges, IEEE Trans. Knowl. Data Eng. 25 (1) (2013) 158–176.
[12] Ian Horrocks, Peter F. Patel-Schneider, Harold Boley, Said Tabet, Benjamin Grosof, Mike Dean, SWRL: a Semantic Web Rule Language combining OWL and RuleML, W3C Member Submission, May 21st 2004 (http://www.w3.org/Submission/SWRL/ [Online; accessed 2015-09-13]).
[13] B. Motik, R. Studer, KAON2 — a scalable reasoning tool for the Semantic Web, Proceedings of the 2nd European Semantic Web Conference, 2005.
[14] Stephan Grimm, Andreas Abecker, Johanna Völker, Rudi Studer, Ontologies and the Semantic Web, in: John Domingue, Dieter Fensel, James A. Hendler (Eds.), Handbook of Semantic Web Technologies, Springer-Verlag, Berlin Heidelberg, 2011 (Chapter 13).
[15] Volker Haarslev, Hsueh-Ieng Pai, Nematollaah Shiri, Uncertainty reasoning for ontologies with general TBoxes in description logic, in: Paulo Cesar G. Da Costa, et al. (Eds.), Uncertainty Reasoning for the Semantic Web I, volume 5327 of LNCS, Springer, Berlin Heidelberg, 2008, pp. 385–402.
[16] John F. Sowa, Knowledge Representation: Logical, Philosophical, and Computational Foundations, Brooks Cole Publishing Co., Pacific Grove, CA, 2000 (http://www.jfsowa.com/krbook/ [Online; accessed 2015-02-26]).
[17] The World Wide Web Consortium (W3C), Semantic Web: Inference, http://www.w3.org/standards/semanticweb/inference (Online; accessed 2015-09-13).


[18] Boris Motik, Ulrike Sattler, Rudi Studer, Query answering for OWL-DL with rules, J. Web Semant. 3 (1) (2005) 41–60 (http://www.cs.ox.ac.uk/boris.motik/pubs/mss05query-journal.pdf [Online; accessed 2015-09-13]).
[19] The World Wide Web Consortium (W3C), OWL Web Ontology Language Semantics and Abstract Syntax, http://www.w3.org/TR/2004/REC-owl-semantics-20040210/, 2004 (Online; accessed 2015-09-13).
[20] B. Motik, Reasoning in Description Logics Using Resolution and Deductive Databases, Ph.D. thesis, Universität Karlsruhe (TH), 2006.
[21] The World Wide Web Consortium (W3C), OWL 2 Web Ontology Language Direct Semantics, second edition, 2012 (http://www.w3.org/TR/2012/REC-owl2-direct-semantics-20121211/ [Online; accessed 2015-09-13]).
[22] F. Baader, D. Calvanese, D. McGuinness, D. Nardi, P. Patel-Schneider (Eds.), The Description Logic Handbook, Cambridge University Press, Cambridge, 2003.
[23] Evren Sirin, Bijan Parsia, Bernardo Cuenca Grau, Aditya Kalyanpur, Yarden Katz, Pellet: a practical OWL-DL reasoner, Web Semant. Sci. Serv. Agents World Wide Web 5 (2) (2007) 51–53.
[24] Birte Glimm, Ian Horrocks, Boris Motik, Giorgos Stoilos, Zhe Wang, HermiT: an OWL 2 reasoner, J. Autom. Reason. 53 (3) (2014) 245–269.
[25] Jérôme Euzenat, Networks of ontologies and alignments, in: M2R SWXO Lecture Notes.
[26] Kim Bruce, Lecture 18: Axiomatic Semantics & Type Safety, CSCI 131, Pomona College, 2011 (http://www.cs.pomona.edu/~kim/CSC131F11/Lectures/Lecture18/Lecture18.pdf [Online; accessed 2015-09-13]).
[27] Cristina Baroglio, Piero A. Bonatti, Jan Maluszynski, Massimo Marchiori, Axel Polleres, Sebastian Schaffert, Reasoning Web: 4th International Summer School 2008, Venice, Italy, September 7–11, 2008, Tutorial Lectures, 1st edition, Springer Publishing Company, Incorporated, 2008.
[28] Krzysztof R. Apt, From Logic Programming to Prolog, Prentice Hall, London, 1997.
[29] Jean Gallier, SLD-resolution and logic programming, in: Logic for Computer Science: Foundations of Automatic Theorem Proving, 2003, chapter 9 (originally published by Wiley, 1986; https://cs.uwaterloo.ca/~david/cl/sld-gallier.pdf [Online; accessed 2015-09-13]).
[30] S.J. Russell, P. Norvig, Artificial Intelligence: A Modern Approach, 3rd edition, 2009.
[31] Hoolet, http://owl.man.ac.uk/hoolet/ (Online; accessed 2015-09-13).
[32] Minsu Jang, Joo-chan Sohn, Bossam: an extended rule engine for OWL inferencing, Proceedings of RuleML 2004 (LNCS vol. 3323), Nov. 8, 2004.
[33] Boris Motik, Ulrike Sattler, A comparison of reasoning techniques for querying large description logic ABoxes, in: Miki Hermann, Andrei Voronkov (Eds.), Proceedings of the 13th International Conference on Logic for Programming, Artificial Intelligence, and Reasoning (LPAR'06), Springer-Verlag, Berlin, Heidelberg, 2006, pp. 227–241.
[34] Jing Mei, Li Ma, Yue Pan, Ontology query answering on databases, in: Isabel Cruz, Stefan Decker, Dean Allemang, Chris Preist, Daniel Schwabe (Eds.), Proceedings of the 5th International Conference on the Semantic Web (ISWC'06), Springer-Verlag, Berlin, Heidelberg, 2006, pp. 445–458.
[35] Daniel Elenius, SWRL-IQ: a Prolog-based query tool for OWL and SWRL, Proceedings of the OWL: Experiences and Directions Workshop 2012, Heraklion, Crete, Greece, May 27–28, 2012.
[36] Emmanouil Rigas, Georgios Meditskos, Nick Bassiliades, SWRL2COOL: object-oriented transformation of SWRL in the CLIPS production rule engine, in: Maglogiannis, et al. (Eds.), Artificial Intelligence: Theories and Applications, volume 7297 of LNCS, Springer, Berlin Heidelberg, 2012, pp. 49–56.
[37] G. Meditskos, N. Bassiliades, A rule-based object-oriented OWL reasoner, IEEE Trans. Knowl. Data Eng. 20 (3) (2008) 397–410.
[38] Bijan Parsia, Evren Sirin, Bernardo Cuenca Grau, Edna Ruckhaus, Daniel Hewlett, Cautiously approaching SWRL, Elsevier Science, 2005 (https://cs.uwaterloo.ca/~gweddell/cs848/SWRL_Parsia_et_al.pdf [Online; accessed 2015-09-13]).
[39] Volker Haarslev, Kay Hidde, Ralf Möller, Michael Wessel, The RacerPro knowledge representation and reasoning system, Semant. Web 3 (3) (2012) 267–277.
[40] Boris Motik, Rob Shearer, Ian Horrocks, Hypertableau reasoning for description logics, J. Artif. Intell. Res. 36 (1) (2009) 165–228.
[41] Clark & Parsia, The Stardog Manual, http://docs.stardog.com/ (Online; accessed 2015-09-13).
[42] Apache Jena, Reasoners and Rule Engines: Jena Inference Support, http://jena.apache.org/documentation/inference/ (Online; accessed 2015-09-13).
[43] Konstantinos Makris, Nektarios Gioldasis, Nikos Bikakis, Stavros Christodoulakis, Ontology mapping and SPARQL rewriting for querying federated RDF data sources, Proceedings of the 2010 International Conference on On the Move to Meaningful Internet Systems: Part II (OTM'10), Springer-Verlag, Berlin, Heidelberg, 2010, pp. 1108–1117.
[44] Dean Allemang, James Hendler, Semantic Web for the Working Ontologist: Effective Modeling in RDFS and OWL, Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 2008.
[45] Konstantinos Makris, Nektarios Gioldasis, Nikos Bikakis, Stavros Christodoulakis, SPARQL Rewriting for Query Mediation Over Mapped Ontologies, http://www.music.tuc.gr/reports/SPARQLREWRITING.PDF, 2010 (Online; accessed 2015-09-13).
[46] Gianluca Correndo, Manuel Salvadores, Ian Millard, Hugh Glaser, Nigel Shadbolt, SPARQL query rewriting for implementing data integration over linked data, Proceedings of the 2010 EDBT/ICDT Workshops (EDBT '10), ACM, New York, NY, USA, 2010, pp. 4:1–4:11.
[47] Jorge Pérez, Marcelo Arenas, Claudio Gutierrez, Semantics and complexity of SPARQL, Proceedings of the 5th International Conference on the Semantic Web (ISWC'06), Springer-Verlag, Berlin, Heidelberg, 2006, pp. 30–43.
[48] Jorge Pérez, Marcelo Arenas, Claudio Gutierrez, Semantics and complexity of SPARQL, ACM Trans. Database Syst. 34 (3) (2009) 16:1–16:45.
[49] Sven Groppe, Data Management and Query Processing in Semantic Web Databases, 1st edition, Springer Publishing Company, Incorporated, 2011.
[50] Sven Groppe, Jinghua Groppe, Volker Linnemann, Using an index of precomputed joins in order to speed up SPARQL processing, in: Jorge Cardoso, José Cordeiro, Joaquim Filipe (Eds.), Proceedings of the 9th International Conference on Enterprise Information Systems (ICEIS 2007), INSTICC, Funchal, Madeira, Portugal, June 12–16, 2007, pp. 13–20.
[51] Sven Groppe, Jinghua Groppe, Dirk Kukulenz, Volker Linnemann, A SPARQL engine for streaming RDF data, Proceedings of the 3rd International Conference on Signal-Image Technology and Internet-Based Systems (SITIS 2007), Shanghai, China, December 16–19, 2007, pp. 154–161 (this paper received an honorable mention at the SITIS'07 conference).
[52] Ralf Heese, Query graph model for SPARQL, Proceedings of the 2006 International Conference on Advances in Conceptual Modeling: Theory and Practice (CoMoGIS '06), Springer-Verlag, Berlin, Heidelberg, 2006, pp. 445–454.
[53] Ilianna Kollia, Birte Glimm, Ian Horrocks, SPARQL query answering over OWL ontologies, Proceedings of the 8th Extended Semantic Web Conference on The Semantic Web: Research and Applications — Volume Part I (ESWC'11), Springer-Verlag, Berlin, Heidelberg, 2011, pp. 382–396.
[54] Dizza Beimel, Mor Peleg, Using OWL and SWRL to represent and reason with situation-based access control policies, Data Knowl. Eng. 70 (6) (2011) 596–615.
[55] The SWRL Ontology, http://www.daml.org/rules/proposal/swrl.owl (Online; accessed 2015-09-13).
[56] A.J. Berre, B. Elvesæter, N. Figay, C. Guglielmina, S.G. Johnsen, D. Karlsen, T. Knothe, S. Lippe, The ATHENA interoperability framework, in: R.J. Gonçalves, et al. (Eds.), Enterprise Interoperability II, Springer, London, 2007, pp. 569–580.
[57] Hervé Panetto, Monica Scannapieco, Martin Zelm, INTEROP NoE: interoperability research for networked enterprises applications and software, in: On the Move to Meaningful Internet Systems 2004: OTM 2004 Workshops, Springer, Berlin Heidelberg, 2004.
