Int. J. Human-Computer Studies (2002) 56, 665–720 doi:10.1006/ijhc.1010 Available online at http://www.idealibrary.com.on
A cooperative framework for integrating ontologies
Jesualdo Tomás Fernández-Breis and Rodrigo Martínez-Béjar
Departamento de Ingeniería de la Información y las Comunicaciones, Universidad de Murcia, 30071-Espinardo (Murcia), Spain. emails:
[email protected],
[email protected]
(Received 21 July 1999 and accepted in revised form 8 May 2002)
Nowadays, there are systems and frameworks that support ontology construction processes. However, ontology integration processes have not been sufficiently specified to date. In this article, by making use of a cooperative philosophy, we describe a real framework for the integration of ontologies supplied by a predetermined set of (expert) users, who may be interconnected through a communication network. This framework is based on a set of well-defined assumptions that guarantee the consistency of the ontology derived from the ontology integration process. Moreover, in the approach presented here, every (expert) user may consult the ontology derived up to a given moment in order to refine his or her private ontology. In addition to this, the model proposed in this work allows the experts involved in the construction of the ontology to use their own terminology when querying the global ontology obtained up to a given instant from their own cooperative work. The validation of the framework is also included in this work.
© 2002 Elsevier Science Ltd. All rights reserved.
KEYWORDS: ontology; knowledge integration; knowledge acquisition.
1. Introduction
Recently, there has been increased interest in ontologies and in the idea of reusing existing (domain) ontologies (Crow & Shadbolt, 2001). Moreover, there is a high level of consensus that costly human resources as well as intense organizational work are normally required to construct ontologies. Thus, on the one hand, experts and knowledge engineers must work together in order to create an ontology. On the other, it is necessary to coordinate experts’ work and to manage the information flowing between the human agents involved in the ontology’s construction. In this sense, a complex elicitation process between experts and knowledge engineers is usually required. This situation presents several problems, such as the need to hold meetings between experts and knowledge engineers and to establish a consensus on the task schedule that must be adhered to in order to construct an ontology (Hameed, Sleeman & Preece, 2001). Fortunately, the limited availability of experts can be overcome to some extent by exploiting the possibilities of a communication network to which a set of user nodes is connected. In this manner, (expert) user nodes can be distributed in both space and time while all of them are solving a certain task.
1071-5819/02/$ - see front matter
According to Reimer (1998), Knowledge Integration can be seen from two points of view: integration of different knowledge bases and integration of different representations of the same knowledge at different formalization levels. This work is focused on the second perspective, that is, we will deal with the problem of ontology integration. Thus, the aim of this work is to use a set of end-users interconnected via a communication network to generate an ontology as a result of integrating a set of ontologies that can be provided by them. To be more precise, we will restrict ourselves to organizational and definitional aspects of a given set of ontologies, in order to obtain their integration into a global ontology, which will be sent to a certain user as a response to his or her information request.
We therefore intend to perform ontological integration, which has been considered a complete process rather than a single ontological activity (Pinto & Martins, 2001). Different authors have structured this process in different ways, although the community seems to agree on the objectives of such a process. For these authors, the process aggregates, combines and assembles different source ontologies to form the resulting ontology, possibly after reused ontologies have undergone some changes. In McGuiness, Fikes, Rice and Wilder (2000), ontology integration consists of the iteration of three steps: (1) finding overlapping areas within the ontologies; (2) relating concepts; (3) checking the consistency, coherency and non-redundancy of the result.
In the approach introduced here, the basic idea is that several experts on the same topic are encouraged to work on an ontology construction process in a cooperative way. However, working cooperatively can give rise to several problems (Fernández-Breis & Martínez-Béjar, 2000a). Some of these are as follows.
Redundant information. Two different experts might attempt to describe the same part of the domain knowledge.
Given this eventuality, it would be desirable for the system to be capable of managing this possible situation so that redundancies could be avoided. Use of synonymous terms for a concept. Apart from dealing with redundant information, different experts may employ different terminology for the same concept. In other words, there might be correspondence between different terms employed for a given concept (Shaw & Gaines, 1989). During the ontology construction process, information concerning the use of synonymous terms for a concept must be stored and managed, since a particular terminology should not be imposed on any expert during the Knowledge Acquisition process. However, an ontology would strive towards ‘‘consensual knowledge’’, that is, a fixed terminology. Synonyms are possible but, ideally, everybody should agree on the terminology. The Ontolingua Server overcomes all of these problems (Farquhar, Fikes & Rice, 1997), since there cannot be two identical concept identifiers. Concepts are internally represented as qualified concepts with respect to the ontology to which they belong. In addition to this, when a user manifests his or her intention to use an ontology in order to extend it, this system (i.e. the Ontolingua Server) never allows this end-user to add information that is inconsistent with the contents of the ontology that is being built. Every cooperative work-based tool should include a component that permits dialogue among the agents involved. This issue has been studied, for example, in the Ontolingua Server, which allows several users to modify the same ontology using the concept of a shared session. It also provides mechanisms that allow agents to
communicate via electronic mail. In Tadzebao and WebOnto (Domingue, 1998), and KARAT (Abecker, Aitken, Schmalhofer & Tschaitschain, 1998), the referred dialogue is considered as one of the most important activities. Thus, in KARAT, dialogue is promoted through knowledge sharing, making it visible to all agents. Moreover, in these systems, dialogue is established with the purpose of facilitating (refinement-oriented) discussion between two different experts over a pre-created ontology. From our perspective, cooperative dialogue possesses a different nature in that the agents are on one hand the collective of ontology suppliers (i.e. expert-users), and on the other, the global ontology generators.
Benjamins and Fensel (1998) have presented (KA)², an initiative for cooperative development of an ontology about research on Knowledge Acquisition. In this work, the use of the World Wide Web (WWW) is presented as an important factor for cooperative ontology development. Firstly, the WWW represents the most important knowledge source in the world. Secondly, the technology underlying the WWW allows cooperation in ontology construction processes, and the ontology is centralized while its instances are distributed over the WWW. ONTOBROKER (Fensel, Decker, Erdmann & Studer, 1998) shows how distributed Organizational Memories can be consulted by any agent by means of the WWW.
The structure of this paper is as follows. In Section 2, the system implemented is presented. The implementation and evaluation processes are also briefly described. In Section 3, the main assumptions considered in this work for integrating ontologies are described. Moreover, the framework, based on these assumptions, that permits integrating ontologies is put forward in this section. Section 4 shows an example that can help to understand how the framework and system work. Section 5 contains a brief comparison of our system with other ontological tools. Finally, in Section 6 we present some conclusions.
2. The system
In this section, the implementation and validation of the system are addressed. Also, an example of the way in which the system works is presented at the end of this section. The implementation subsection deals with kinds of system users and the facilities with which they are provided. The application of the system to different domains is described in the validation subsection. Finally, an example is shown, addressing the way in which the integration framework, formalized in the following section, works.
2.1. IMPLEMENTATION
The general objective of the application was to develop a system that could serve as a framework for cooperatively built, integration-derived (i.e. global) ontologies. In addition to this, we pursued a more ambitious goal in that every user could benefit from what other users had already contributed to an integration-derived ontology. In other words, the system was also designed to facilitate knowledge sharing. The system implemented does not allow a user either to modify another user’s contribution to a global ontology, or to see (ontological) knowledge, belonging to
another user, that has not already been incorporated into that particular ontology. Instead, the system makes it possible to access the benefits produced by the integration of the knowledge contributions (of the set of users) represented by means of ontologies.
In this system, two kinds of users are defined: normal and expert. Normal users are “information consultants”, or people who are curious about some topic. On the other hand, expert users are “integration-derived ontology constructors”, namely, those who generate knowledge for the system, so that normal users can consult it. Also, every expert user can obtain information about other expert users’ contributions, provided that they have already input knowledge into an already existing integration-derived ontology. The decision about who is a normal/expert user is automatic. In this sense, as a result of an evaluation of the data supplied by a user, the system decides whether a user is normal or expert. Such an evaluation is founded on the fact that the class of appropriate experts is predefined by the system administrator(s) so that only those thought to be real experts can contribute to the ontology integration process.
We have implemented a client–server infrastructure in such a way that each user corresponds to a client and the ontology construction engine resides in the server. In other words, every user knows only of the existence of the server and cannot access another user; hence this user has to make service requests to the server (i.e. access is exclusively via the server). Expert users must enter their particular ontologies into the system by means of files. In particular, ontologies entering the system are constructed from a text file, so that the concepts contained in the file correspond to the concepts in the (relevant) ontology, obtained by a level-by-level extraction process starting from the root node.
In addition to this, users must specify in these files information about each concept, including its associated terms, its synonymous terms, its specific attributes, its mereological parents (if any), its taxonomic parents (if any) and its specializations. Given a concept, we call specialization a pair (concept, attribute), where concept stands for the concept of which the concept in question is a child, and attribute represents the attribute (i.e. the view) considered, thus establishing that the concept in question is a child concept of concept. The file format as well as the approach presented here have also been utilized elsewhere for enterprise modelling (Fernández-Breis & Martínez-Béjar, 2000b), and it can be described as follows. The file comprises the set of concepts which are part of the ontology. Each concept is defined through its attributes, its name and its parent concepts, either mereological or taxonomic ones. The following lines explain in more detail the information to be supplied for each concept.
* Concept
  * Name of the concept.
  * Alternative names for the concept. Its format is as follows.
    – Number of alternative names.
    – List of alternative names.
  * Specific attributes. Format.
    – Number of specific attributes.
    – List of specific attributes’ names.
  * Mereological parents
    – Number of mereological parent concepts.
    – List of mereological parent concepts’ names.
  * Taxonomic parents
    – Number of taxonomic parent concepts.
    – List of taxonomic parent concepts’ names.
  * Specialization list
    – Number of specializations.
    – List of specializations.
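The per-concept record listed above can be illustrated with a small reader. The paper does not give the concrete syntax of the file, so the following is only a sketch under stated assumptions: one value per line, every list preceded by its count, and each specialization written as `concept,attribute`. The names `ConceptRecord` and `parse_concepts`, as well as the sample data, are hypothetical and are not taken from the implemented system.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple


@dataclass
class ConceptRecord:
    """One concept entry, mirroring the fields listed in the text."""
    name: str
    alternative_names: List[str] = field(default_factory=list)
    specific_attributes: List[str] = field(default_factory=list)
    mereological_parents: List[str] = field(default_factory=list)
    taxonomic_parents: List[str] = field(default_factory=list)
    # Each specialization is a (parent concept, attribute) pair.
    specializations: List[Tuple[str, str]] = field(default_factory=list)


def _read_counted_list(lines: Iterator[str]) -> List[str]:
    """Read a count, then that many items (assumed layout: one per line)."""
    count = int(next(lines))
    return [next(lines) for _ in range(count)]


def _parse_specialization(raw: str) -> Tuple[str, str]:
    """Split 'concept,attribute' (assumed syntax) into a pair."""
    parent, attribute = (part.strip() for part in raw.split(",", 1))
    return parent, attribute


def parse_concepts(text: str) -> List[ConceptRecord]:
    """Parse all concept records from the text of an ontology file."""
    lines = iter(line.strip() for line in text.splitlines() if line.strip())
    records = []
    for name in lines:  # each record starts with the concept name
        record = ConceptRecord(name=name)
        record.alternative_names = _read_counted_list(lines)
        record.specific_attributes = _read_counted_list(lines)
        record.mereological_parents = _read_counted_list(lines)
        record.taxonomic_parents = _read_counted_list(lines)
        record.specializations = [
            _parse_specialization(raw) for raw in _read_counted_list(lines)
        ]
        records.append(record)
    return records


# Invented sample data: two concepts in the assumed layout.
SAMPLE = """\
PERSON
1
PEOPLE
2
NAME
HEIGHT
1
FACULTY
0
0
STUDENT
0
1
DEGREE
0
1
PERSON
1
PERSON,ROLE
"""

for record in parse_concepts(SAMPLE):
    print(record)
```

Reading each list through its count keeps the reader independent of concept names, at the cost of failing with an exception on a truncated or malformed file.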
Figure 1 represents the menu options available for experts, while the options for normal users are constrained due to their role in the system. To be more precise, the latter cannot send new information in terms of ontologies to the server (update the server) because they are not experts, and given that they cannot have their own ontology file, they cannot view particular ontologies. They are only allowed to view the result of the integration process. Let us suppose that the user has selected the working topic. Once the user has chosen to look up the integration of the previously selected topic (and if the process has finished successfully), the (normal or expert) user will be presented with a first visualization of the ontology like the one shown in Figure 2. The user may also change the names of the concepts belonging to the ontology presented. Finally, the ontology can be visualized as a tree, displaying all the available information regarding the concept selected in the tree. The user may also change the names of the concepts, but in this case the possible names are restricted to the set formed by the current name and the alternative ones.
2.2. EXAMPLE
Suppose that there are two different user-nodes, identified by UNA and UNB, respectively, working on the construction of an ontology about the Faculty of Sciences at the University of Murcia. Let us assume that the situation at t=t1 is the one reflected in Figures 3 and 4, where for every concept only the specific (i.e. non-inherited) attributes have been written in both mereological ontologies. Let us also assume that at t2>t1 the user-node UNA wishes to view the (global) ontology constructed so far. Then, by applying the proposed integration algorithm, the ontology shown in Figure 5 will be
Figure 1. Expert user menu.
Figure 2. A visualization of a user ontology.
Figure 3. Ontologies at instant t1.
Figure 4. Visualization of the ontology UNA by using the implemented system.
Figure 5. Integrated ontology at t2.
built. That algorithm, which is detailed in further sections, can be summarized as follows. The first step is to select, amongst the set of ontologies to be integrated, the best subset of ontologies that can be integrated together, those that are neither equivalent nor inconsistent. The selected ontologies are inserted into a new ontology, which is called the integration-derived ontology. Then, the algorithm continues by detecting
Figure 6. Instantiated, integrated ontology for UNA.
synonymous concepts and transforming the ontologies. The result is that all of the ontologies will use the same terminology. This ontology is called the instantiated, integration-derived ontology. The final step is to merge the branches from the previously obtained instantiated, integration-derived ontology in order to unify knowledge and generate the final ontology, which is called the transformed, integration-derived ontology. Then, as the consultation request comes from UNA, and by applying (some of) the definitions pointed out in the following section, this user node will be provided with the ontology shown in Figure 6. It can be observed that the concept PEOPLE has been converted into PERSON in the ontology corresponding to UNB, since both concepts have been found to have the same attributes (according to the name-based equality criterion). After that, suppose that at t3>t2, an (external) end-user sends an information request to the system. Then, by applying the mentioned algorithm, he or she will be provided with the ontology depicted in Figure 7 (see also Figure 8). Note that the ontology corresponding to UNB was selected as the one to which knowledge (e.g. the attribute HEIGHT) has been added, as this ontology has more nodes than the ontology corresponding to UNA.
2.3. VALIDATION
The system has been evaluated by people with different knowledge levels, but mainly by undergraduate and postgraduate students. The system has been used for 1 year by final (fifth) year Computer Science students and by some Ph.D. students. The domains to which the system has been applied range from information technology (e.g. videoconferencing) to medicine (e.g. leukemia). The jobs consisted of building different ontologies about the domains assigned to the students according to the knowledge acquired from interviewing different experts. Each student was assigned a topic and his/her work consisted of interviewing an expert and building the corresponding ontology. Therefore, each student had to act as a knowledge engineer and acquire knowledge from the experts. Thus, once the students had obtained the ontologies, other students assigned to the same topic had to use the system to integrate their results, in order to see whether or not the system and underlying framework were useful for them. In some sense, when the students are using the system, they are acting as quasi-experts since their ontologies represent knowledge acquired from an expert.
Figure 7. Transformed ontology at t3.
Figure 8. A transformed ontology constructed with the implemented system.
The framework used for validating the usefulness of the system and integration framework is based on the measurement of knowledge gained by users. The gain is measured by comparing the user’s ontology with the transformed, integration-derived ontology, and then counting the concepts, relations and attributes that appear in the transformed, integration-derived ontology but did not appear in the expert’s private ontology. Each domain required a different workload in order to generate the ontology so that the size of the ontologies obtained by each student depended on the domain. This made
the size of the ontologies heterogeneous across the different domains that were dealt with in this project. Therefore, we think that the results should not be analysed and compared in absolute terms but in relative ones. Thus, percentage values are presented as the result of the evaluation of the system and the methodology used for integrating knowledge represented by means of domain ontologies. Three domains have been selected for showing results: (1) videoconferencing over multicast networks; (2) integrated circuits; and (3) the climate. Let us introduce these domains.
Videoconferencing over multicast networks. This application domain deals with the transmission of information for videoconferencing using multicast IP instead of using the traditional unicast IP. This job was assigned to two students who interviewed experts in this application domain.
Integrated circuits. This domain was assigned to three students who interviewed different experts on the design and implementation of integrated circuits. These experts provided information about the characteristics of integrated circuits, the technology used for making them and so on.
Climate. Two students were assigned the study of the climate and the different elements that determine the climate of a region and different atmospheric events such as storms, tropical rains and so on. As a result of their assignment, two ontologies were built and integrated by using our system.
2.3.1. Measuring knowledge gain. This subsection explains how the knowledge that users gain by using the system is measured. For this, a weighted average of knowledge gain has been calculated according to the three knowledge categories used for making this evaluation, namely, concepts, attributes and relations. Moreover, the same procedure has been used for all knowledge categories. The formula is based on the following elements.
* Value of a knowledge category in each expert ontology.
* Value of a knowledge category in the transformed, integration-derived ontology.
Let us first introduce the notation used in the formulae. $n$ is the number of ontologies that take part in the integration process. $|X_i|$ stands for the number of knowledge entities [i.e. concepts (C), attributes (A) or relations (R)] in the ontology $i$. Thus, the number of concepts in the ontology $i$ is written as $|C_i|$, the number of attributes as $|A_i|$, and the number of relations as $|R_i|$. Therefore, $X \in \{A, C, R\}$ and $i = 1..n$. Local gain ($D_i$) represents the knowledge the expert $i$ gains by using the integration framework. That is, the percentage of knowledge included in the transformed, integration-derived ontology but not in the ontology $i$. It can be considered the gain with respect to the ontology $i$; $X_{int}$ stands for the corresponding knowledge category in the transformed, integration-derived ontology. Therefore, the average-weighted gain is calculated by using the following formulae:
\[
\mathrm{gain} = \frac{\sum_{i=1}^{n} |X_i| \, D_i}{\sum_{i=1}^{n} |X_i|},
\qquad
D_i = \frac{|X_{int}| - |X_i|}{|X_i|}.
\]
Let us explain the reasons for using these formulae. The local gain calculates the percentage of knowledge added by the transformed, integration-derived ontology, which is an appropriate manner for obtaining relative results. It was stated earlier that we are interested in obtaining relative values instead of absolute ones due to the fact that ontologies corresponding to different domains had a different size. Thus, the computation of the absolute knowledge gain produced, in our opinion, unreliable results. The knowledge gain is obtained for the three mentioned knowledge categories by calculating the weighted average of the local gains. The local gains are weighted in order to consider the different quantity of knowledge that each ontology contains. This consideration must be taken into account in order to obtain realistic results for a specific application domain.
Let us present an example. Let us suppose that there are two ontologies, namely $O_1$ and $O_2$, such that the number of concepts in $O_1$, written $|C_1|$, is 108 and the number of concepts in $O_2$, written $|C_2|$, is 65. After integrating both ontologies, a transformed, integration-derived ontology, $O_{int}$, has been achieved and the number of concepts in $O_{int}$, written $|C_{int}|$, is 135. Let us now calculate the local gain for $O_1$ and $O_2$, namely $D_1$ and $D_2$, respectively:
\[
D_1 = \frac{135 - 108}{108} \times 100 = 25\%,
\qquad
D_2 = \frac{135 - 65}{65} \times 100 \approx 107\%.
\]
Hence, the knowledge gain for the concepts can be calculated as follows:
\[
\mathrm{gain} = \frac{108 \times 25 + 65 \times 107}{108 + 65} = 55.8\%.
\]
A non-weighted average would have produced a 66% gain but this would not have been very realistic because small ontologies would have had more influence than large ones on the final result.
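The weighted gain and the worked example above can be checked numerically. The following is a minimal sketch; the function names `local_gain` and `weighted_gain` are illustrative choices and do not come from the implemented system. Note that exact arithmetic gives a local gain of about 107.7% for the smaller ontology and therefore a weighted gain of about 56.1%; the 55.8% quoted in the text follows from using the value 107% for that local gain.

```python
from typing import Sequence


def local_gain(integrated_size: int, own_size: int) -> float:
    """Local gain D_i = (|X_int| - |X_i|) / |X_i|, returned as a fraction."""
    return (integrated_size - own_size) / own_size


def weighted_gain(integrated_size: int, sizes: Sequence[int]) -> float:
    """Average of the local gains, each weighted by its ontology size |X_i|."""
    weighted_sum = sum(
        size * local_gain(integrated_size, size) for size in sizes
    )
    return weighted_sum / sum(sizes)


# Figures from the worked example: |C_1| = 108, |C_2| = 65, 135 after integration.
sizes = [108, 65]
integrated = 135

d1, d2 = (local_gain(integrated, size) for size in sizes)
print(f"D1 = {d1:.1%}, D2 = {d2:.1%}")             # D1 = 25.0%, D2 = 107.7%
print(f"weighted gain = {weighted_gain(integrated, sizes):.1%}")  # 56.1%
print(f"non-weighted average = {(d1 + d2) / 2:.1%}")
```

Because each local gain is multiplied by the size it was divided by, the weighted gain reduces to the total number of added entities divided by the total size of the source ontologies, here (27 + 70) / 173.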
2.3.2. Results of the evaluation. The previous way of measuring knowledge gain has been applied to three selected domains, and the results are presented in Table 1. As can be appreciated, using the system and the integration framework is an advantage for users because the transformed, integration-derived ontology has more knowledge than each individual ontology. A high value in the knowledge gain indicates that the different experts have specified different parts of the domain while a low knowledge gain indicates that the experts are describing the domain in a very similar way. Furthermore, some conclusions can be drawn from these results.
Table 1. Evaluation results for the chosen domains

Domain                Concepts (%)    Attributes (%)    Relations (%)
Videoconferencing     14.1            50                25
Integrated circuits   19.8            63.6              33
Climate               55.8            36.3              54
A higher rate of agreement between experts, and therefore a lower knowledge gain, occurs with the concepts. There are several reasons for this. It is not very surprising that different experts find the same conceptual entities within a given domain, although, on the other hand, the synonym detection capability of the system can also contribute to this effect. As for the results, a higher knowledge gain appears in the attribute category. The system itself can contribute to this disagreement: attributes are considered equivalent only when they have the same name, because they lack internal structure. On the other hand, it is more difficult to agree on the attributes than on the concepts, since each expert can define a concept from different, although compatible, perspectives; that is, they may share the definition of the basic characteristics of a concept, but each expert adds different minor characteristics to that concept. The results concerning relations are similar to the ones concerning concepts, because relations occur among concepts, so that if experts find a similar set of conceptual entities in an application domain, the set of relations among them is likely to be very similar too. The nature of the domains can be another reason, since the first two domains belong to the technology area, whereas the third one does not. In the third domain, which involves climatic change, no agreement exists amongst the scientific community, although knowing in which category the largest gain occurs is not the main point of discussion.
The system has also been directly evaluated by experts in different application domains, such as environmental planning. Thus, we prompted a set of experts located at different (geographic) sites to construct an ontology about visual landscape assessment, which is a subtask performed in environmental planning.
These experts belong to different Spanish institutions such as the Technical University of Madrid, the Spanish Scientific Research Council and the University of Murcia. Consequently, this ontology has been crucial for a real project and a KBS for use in landscape assessment has been developed. As an example of the usefulness of this system, Figure 9 shows a (real) concept hierarchy corresponding to an integration-derived ontology for this domain. Finally, it should be noted that the system has not been evaluated on large application domains due to the problematic nature of evaluating KA frameworks and tools (Shadbolt, O’Hara & Crow, 1999). However, we are aware of the desirability of studying framework performance on integrating large ontologies (about 100–200 concepts).
3. Formalizing a framework for ontological integration
This section provides the complete formalization of the system that has been presented in the previous section. This formalization process is carried out over various steps. First of all, the assumptions used in this work are presented in order to establish the basic notions that must be taken into account when reading further parts of this paper. These assumptions mostly deal with the ontological model that has been used for this research. Thus, the following sub-section represents the formalization of the ontological model. Therefore, all of the significant parts of an ontology (according to our model) will be formalized. Formalizing the ontological model has extreme significance for this paper since ontologies are the way in which we represent the domain knowledge.
Figure 9. An ontology for the visual Landscape Assessment task from the vegetation–land use viewpoint.
Once the ontological model has been presented in a formal manner, the framework for integrating knowledge must be formalized too. Knowledge Integration is seen in this work as ontological integration, since knowledge is represented here by means of ontologies. Therefore, the defined and formalized framework is focussed on providing an infrastructure that allows for the integration of ontologies. So, the starting point of the integration framework is a set of ontologies belonging to different users. There may also be more than one ontology from the same user, but in this case the oldest one would be considered to have become obsolete and would therefore be discarded from the integration process.
When the integration process is triggered by a user, this process will have to go through different steps before completing its task. One basic operation of the integration process is to compare the knowledge covered by different ontologies. Therefore, different functions to check for equivalence between ontologies or concepts are necessary, as well as functions to find inconsistencies between ontologies and concepts. Equivalences and inconsistencies have been considered in this work from two points of view, namely, (1) attributes and (2) the organizational structure. Another function needed is one for deciding when two concepts are synonymous, which is a function that will permit users to use their own terminology, provided that the system is able to detect synonymy among concepts. The appendix contains a series of algorithms based on the definitions that appear across this section for the integration of knowledge represented by means of ontologies. In the following lines, we formalize everything that we have discussed until now.
3.1. ASSUMPTIONS AND ONTOLOGICAL FUNCTIONS
3.1.1. Assumptions. An ontology is viewed in this work as a specification of a domain knowledge conceptualization (van Heijst, Schreiber & Wielinga, 1997). In addition,
ontologies will be represented here by means of multiple hierarchical restricted domains (MHRD) in a similar sense to that employed by other authors (see, for instance, Eschenbach & Heydrich, 1995). In particular, we give the term MHRD to a set of concepts holding the following.
* They are defined through a set of attributes, so the presence of axioms between these attributes will not be considered.
* There can be taxonomic relations among the concepts, so that attribute (multiple) inheritance is permitted.
* There can also be mereological relationships among concepts.
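The three conditions above suggest a simple data structure. The following is only a sketch of one possible representation, not the formalization used in the paper: the class name `Concept`, the function `all_attributes` and the sample concepts are hypothetical. It also assumes that the taxonomy is acyclic and that attributes are inherited only along taxonomic (is-a) links, not along mereological (part-of) links.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class Concept:
    """A concept defined by its own (specific) attributes and its parents."""
    name: str
    specific_attributes: Set[str] = field(default_factory=set)
    taxonomic_parents: Set[str] = field(default_factory=set)     # is-a links
    mereological_parents: Set[str] = field(default_factory=set)  # part-of links


def all_attributes(name: str, ontology: Dict[str, Concept]) -> Set[str]:
    """Specific attributes plus those inherited from every taxonomic ancestor.

    Assumes an acyclic taxonomy; part-of links are deliberately ignored here.
    """
    concept = ontology[name]
    attributes = set(concept.specific_attributes)
    for parent in concept.taxonomic_parents:
        attributes |= all_attributes(parent, ontology)
    return attributes


# Invented example with multiple inheritance and one part-of link.
ontology = {
    "PERSON": Concept("PERSON", {"NAME"}),
    "STUDENT": Concept("STUDENT", {"DEGREE"}, taxonomic_parents={"PERSON"}),
    "EMPLOYEE": Concept("EMPLOYEE", {"SALARY"}, taxonomic_parents={"PERSON"}),
    "DEPARTMENT": Concept("DEPARTMENT", {"BUDGET"}),
    "ASSISTANT": Concept(
        "ASSISTANT",
        {"COURSE"},
        taxonomic_parents={"STUDENT", "EMPLOYEE"},
        mereological_parents={"DEPARTMENT"},
    ),
}

print(sorted(all_attributes("ASSISTANT", ontology)))
# ['COURSE', 'DEGREE', 'NAME', 'SALARY']
```

Storing attributes as plain name sets matches the assumption that attributes have no internal structure, so set union is enough to model multiple inheritance.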
In addition to this, the ontology representation schema adopted here includes “structural” axioms, that is, axioms that result from the relations concept has attribute, concept 1 is a concept 2 and concept 1 is a part of concept 2. Moreover, this schema also embodies other axioms (referred to as properties in Section 3) derived from some properties concerning interconceptual relationships in taxonomic and mereological organizations. We must clarify that defining ontologies without non-structural axioms does not mean that users cannot define these kinds of axioms as (part of) the specification of a conceptualization. What we do is to split up the classic definition of ontology (i.e. the one including structural and non-structural axioms) into two parts, so that we use the term ontology for the whole specification of a conceptualization excluding non-structural axioms.
Regarding mereological relationships among concepts, there are a variety of theories for building a mereological ontology that define the part-of relation and its properties. Among these theories, Classical Extensional Mereology (Simons, 1987; Borst, 1997) is one of the most popular in the Knowledge Acquisition Community. From the perspective of this theory, transitivity states that when an individual is a proper part of a second individual that is in turn a proper part of a third, then the first is also a proper part of the third. However, transitivity seems to be inappropriate in situations like “Maria’s head is part of Maria”, and “Maria is part of the Faculty of Informatics”. In this case, we could deduce that “Maria’s head is part of the Faculty of Informatics”. Hence, transitivity is not allowed in this paper for mereological organizations, although we will assume that other (mereological) properties are present in the framework proposed later in this work.
Thus, the (mereological) part-of ontologies defined here are assumed to satisfy irreflexivity [nothing is a (proper) part of itself] and asymmetry (if one thing is a proper part of another, then the second is not a proper part of the first). On the other hand, although there are other properties that may be present in mereological ontologies (e.g. overlapping, disjointness and consistency), we will restrict ourselves to analysing mereological organizations by taking into account the previous (mereological) properties and some other (new) attribute-based ones, which will be introduced later in this article. The framework used to integrate a set of pre-existent ontologies will be constructed from a local, operative network (for example, a set of users connected to the Internet or some intranet). This network will be considered as a mechanism for transmitting the user-nodes’ ontologies (one ontology per user-node) for a further integration of
these ontologies. Some considerations should be made concerning the attributes and their representation in this ontological model. An attribute is identified only by its name because it has neither internal structure nor value. Attributes are just their names, so that the attribute “leg” of a person would be equivalent to the attribute “leg” of a UEFA championship’s round if both attributes were compared in isolation. Therefore, the ontological functions presented in further sections only use the names of attributes to make comparisons between them. This is a drawback of the approach, but the way in which the functions have been defined allows scope for future extension (only the function that compares two attributes would need to be changed). On the other hand, if (a part of) the knowledge belonging to the ontology of a particular user-node at the instant t, written Oi(t), is inconsistent with the knowledge included in the ontology that results from the integration of those ontologies at the instant t, written Oint(t) [assuming an ontological inconsistency definition based on both the organizational structure (i.e. taxonomic/mereological) and the conceptual definition (through a set of attributes)], the knowledge coming from Oi(t) will be taken as valid. This policy has been adopted by taking into account that the expert (i.e. a special kind of user-node) may have made queries to Oint(t) in building his or her (private) ontology.
3.1.2. Ontological functions.
By taking into account the elements constituting an ontology in this paper, the following ontological functions based on previous work (see, for example, Martínez-Béjar, Benjamins & Martín-Rubio, 1997) will be used. The following ontology (Figure 10) will be used as a reference for describing the functions and definitions that will be presented in the following sections. In this example, each concept is represented by a rectangle, the label of which stands for the name of the concept.
Attributes are also written inside the rectangles, whereas labelled arrows from the child concept to the parent represent relationships. Thus, in order to ensure that all concepts within a conceptual hierarchy taking part in an ontology are associated with (multiple) hierarchies by means of is-a or part-of relationships, the following functions are proposed. Firstly, we can define a function to compute taxonomic relationships between concepts.
Figure 10. Expert A’s ontology.
TMHRD(t) function. Let MHRD(t) be a multiple hierarchical restricted domain until the instant t. The taxonomic multiple hierarchical restricted domain until the instant t, written TMHRD(t), is defined as a set of concepts such that they are inter-associated in (multiple) hierarchies by means of is-a relationships. In a formal manner, TMHRD(t) is defined as a set of concepts holding that: [Cardinal(TMHRD(t)) ≥ 2] ∧ [for all ci(t) ∈ TMHRD(t) exists cj(t) ∈ TMHRD(t) s.t. IS-A(ci(t), cj(t)) ∨ IS-A(cj(t), ci(t))], where IS-A(a, b) stands for ‘a is a b’. In the previous example (see Figure 10), TMHRD(t)={publication, person, newspaper, magazine, book, staff, student, novel, essay} because the following relationships hold: IS-A(newspaper, publication); IS-A(magazine, publication); IS-A(book, publication); IS-A(staff, person); IS-A(student, person); IS-A(novel, book); IS-A(essay, book).
Similarly, a function to compute mereological relationships can be defined as follows:
M2HRD(t) function. Let MHRD(t) be a multiple hierarchical restricted domain until the instant t. The mereological multiple hierarchical restricted domain until the instant t, written M2HRD(t), is defined as a set of concepts such that they are inter-associated in (multiple) hierarchies by means of part-of relationships. In a formal manner, M2HRD(t) is defined as a set of concepts holding that: [Cardinal(M2HRD(t)) ≥ 2] ∧ [for all ci(t) ∈ M2HRD(t) exists cj(t) ∈ M2HRD(t) s.t. PART-OF(ci(t), cj(t)) ∨ PART-OF(cj(t), ci(t))], where PART-OF(a, b) stands for ‘a is a part of b’. According to the previous example, M2HRD(t)={library, publication, person, newspaper, magazine, article} since the following relationships hold: PART-OF(publication, library); PART-OF(person, library); PART-OF(article, newspaper); PART-OF(article, magazine).
At this point, it is easy to establish a function to compute the taxonomic parent/child concepts, if any, of a given concept.
T-parents/T-children function. Let ci(t) be a concept belonging to a non-empty TMHRD(t). The set of taxonomic parents of ci(t) until the instant t, written T-parents(ci(t)), is defined as the set {cj(t) ∈ TMHRD(t) s.t. IS-A(ci(t), cj(t))}. The counterpart for child concepts can be written as follows: The set of taxonomic child concepts of ci(t) until the instant t, written T-children(ci(t)), is defined as the set {ck(t) ∈ TMHRD(t) s.t. IS-A(ck(t), ci(t))}. Let us take the concept “book” from the previous example as the reference for illustrating these two functions. T-parents(book)={publication} since IS-A(book, publication) is an existing relationship in the library domain. T-children(book)={novel, essay} since the following relationships hold in the domain ontology: IS-A(novel, book); IS-A(essay, book).
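
The functions defined so far can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that an ontology's relationships are stored as sets of pairs; the names `IS_A`, `PART_OF`, `linked_concepts`, `parents` and `children` are hypothetical and do not come from the framework itself. The pairs are the ones listed for Expert A's ontology, with (a, b) read as 'a is a b' or 'a is a part of b'.

```python
# Expert A's relationships as listed in the text.
# A pair (a, b) in IS_A reads "a is a b"; in PART_OF it reads "a is a part of b".
IS_A = {
    ("newspaper", "publication"), ("magazine", "publication"),
    ("book", "publication"), ("staff", "person"), ("student", "person"),
    ("novel", "book"), ("essay", "book"),
}
PART_OF = {
    ("publication", "library"), ("person", "library"),
    ("article", "newspaper"), ("article", "magazine"),
}


def linked_concepts(relation):
    """Concepts taking part in at least one pair of the relation.

    Applied to IS_A this plays the role of TMHRD(t); applied to PART_OF,
    the role of M2HRD(t).
    """
    return {concept for pair in relation for concept in pair}


def parents(concept, relation):
    """T-parents (relation=IS_A) or M-parents (relation=PART_OF)."""
    return {b for (a, b) in relation if a == concept}


def children(concept, relation):
    """T-children (relation=IS_A) or M-children (relation=PART_OF)."""
    return {a for (a, b) in relation if b == concept}


tmhrd = linked_concepts(IS_A)
m2hrd = linked_concepts(PART_OF)
pmhrd = tmhrd | m2hrd  # concepts linked through taxonomy or mereology
```

With this representation, `parents("book", IS_A)` and `children("book", IS_A)` reproduce the T-parents and T-children values computed above for the concept 'book'.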
Now, we can proceed in a similar manner in order to define parent/child concepts in mereological organizations. This can be achieved as follows:
M-parents/M-children function. Let ci(t) be a concept belonging to a non-empty M2HRD(t). The set of mereological parents of ci(t) until the instant t, written M-parents(ci(t)), is defined as the set {cj(t) ∈ M2HRD(t) s.t. PART-OF(ci(t), cj(t))}. For example, M-parents(publication)={library} since PART-OF(publication, library). The counterpart for child concepts can be written as follows: The set of mereological child concepts of ci(t) until the instant t, written M-children(ci(t)), is defined as the set {ck(t) ∈ M2HRD(t) s.t. PART-OF(ck(t), ci(t))}. For example, M-children(library)={publication, person} since PART-OF(publication, library) and PART-OF(person, library).
In general, a concept can take part in both mereological and taxonomic organizations. This possibility is reflected in the following function.
PMHRD(t) function. Let TMHRD(t) be a taxonomic multiple hierarchical restricted domain until the instant t and let M2HRD(t) be a mereological multiple hierarchical restricted domain until the instant t. The partial multiple hierarchical restricted domain until the instant t, written PMHRD(t), is defined as the union set TMHRD(t) ∪ M2HRD(t). According to the example used as a reference in this section: PMHRD(t)={publication, person, newspaper, magazine, book, staff, student, novel, essay} ∪ {library, publication, person, newspaper, magazine, article}={library, publication, person, newspaper, magazine, book, staff, student, novel, essay, article}.
Independently of the relationships concerning a certain concept, it may be necessary to know the conceptual attributes which have not been inherited. For that purpose, we will utilize the following function.
SPE function. Let ci(t) be a concept belonging to a non-empty PMHRD(t).
The set of specific attributes until the instant t, written SPE(ci(t)), is defined as the set of attributes of ci(t) which have not been inherited from any other concept. For example, the set of specific attributes of the concept ‘book’ is: SPE(book)={author, ISBN, style}. In an analogous manner, it may sometimes be necessary to know the conceptual attributes that have been inherited from other concepts. For this purpose, we will employ the next function.
INH-T function. Let ci(t) be a concept belonging to a non-empty PMHRD(t) and let Cta(t) be a subset of T-parents(ci(t)). Then, the set of inherited attributes from Cta(t) associated to ci(t), written INH-T(ci(t), Cta(t)), can be obtained in the following manner:
$$\text{INH-T}(c_i(t), C_{ta}(t)) = \bigcup_{\substack{j=1 \\ c_j(t) \in C_{ta}(t)}}^{\mathrm{Cardinal}(C_{ta}(t))} \mathrm{SPE}(c_j(t)).$$
According to our example, the set of inherited attributes of the concept ‘book’ is INH-T(book)={title, editor, pages}.
By accounting for the two kinds of attributes (i.e. inherited and specific ones) that a given concept can possess within the knowledge representation schema used here, it is easy to derive the following function.
ATT function. Let ci(t) be a concept belonging to a non-empty PMHRD(t). The set of attributes for ci(t) until the instant t, written ATT(ci(t)), is defined as the union set INH-T(ci(t), T-parents(ci(t))) ∪ SPE(ci(t)). The complete set of attributes of the concept ‘book’ is ATT(book)={title, editor, pages} ∪ {author, ISBN, style}={author, editor, ISBN, pages, style, title}.
At this point, it is convenient to provide a summary of the ontological functions presented up to now. Table 2 summarizes the functions that can be applied to a single ontology in order to obtain its different sub-structures and Table 3 summarizes the functions that can be applied to a concept already belonging to an ontology. Finally, a function to calculate the number of attributes in common between any two concepts can be defined as follows:
Degree of overlapping function. Let ci(t) and cj(t) be two different concepts belonging to a PMHRD(t). The degree of overlapping between ci(t) and cj(t) at the instant t, written degree of overlapping(ci(t), cj(t)), is defined as the cardinal of the intersection set ATT(ci(t)) ∩ ATT(cj(t)). Let us calculate the degree of overlapping of the concepts ‘book’ and ‘person’. ATT(book)={author, editor, ISBN, pages, style, title}; ATT(person)={name, id, address, role}; ATT(book) ∩ ATT(person)={author, editor, ISBN, pages, style, title} ∩ {name, id, address, role}=∅. Therefore, the degree of overlapping between book and person is 0.
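
The attribute functions can be sketched in Python as follows. This is only an illustration under stated assumptions: the dictionaries `SPE` and `T_PARENTS` and the function names are hypothetical, SPE(publication) is inferred from the inherited attributes reported for 'book', and SPE(newspaper) is inferred from the attribute set ATT(newspaper)={date, editor, pages, title} used in this section's worked examples. As in the definition of INH-T, only the specific attributes of the given parent concepts are collected.

```python
# Specific (non-inherited) attributes per concept.
# SPE(publication) and SPE(newspaper) are inferred from the worked examples.
SPE = {
    "publication": {"title", "editor", "pages"},
    "book": {"author", "ISBN", "style"},
    "newspaper": {"date"},
    "person": {"name", "id", "address", "role"},
}

# Taxonomic parents of the concepts used below (Expert A's ontology).
T_PARENTS = {
    "book": {"publication"},
    "newspaper": {"publication"},
}


def inh_t(parent_concepts):
    """INH-T: union of the specific attributes of the given parent concepts."""
    inherited = set()
    for parent in parent_concepts:
        inherited |= SPE.get(parent, set())
    return inherited


def att(concept):
    """ATT: inherited attributes united with the concept's specific ones."""
    return inh_t(T_PARENTS.get(concept, set())) | SPE.get(concept, set())


def degree_of_overlapping(concept_a, concept_b):
    """Number of attributes that the two concepts have in common."""
    return len(att(concept_a) & att(concept_b))
```

For instance, `degree_of_overlapping("book", "person")` evaluates to 0 because the two attribute sets are disjoint, whereas 'book' and 'newspaper' share the three attributes inherited from 'publication'.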
Table 2
Functions for an ontology
Function      Informal description
TMHRD         Calculate the set of concepts linked through taxonomy
M2HRD         Calculate the set of concepts linked through mereology
PMHRD         Calculate the set of concepts linked through taxonomy or mereology
Table 3
Functions for a single concept
Function      Informal description
T-parents     Calculate the taxonomic parents of a concept
T-children    Calculate the taxonomic children of a concept
M-parents     Calculate the mereological parents of a concept
M-children    Calculate the mereological children of a concept
SPE           Calculate the specific attributes of a concept
INH-T         Calculate the inherited taxonomic attributes of a concept
ATT           Calculate the attributes of a concept, specific and inherited
Let us now calculate the degree of overlapping between ‘book’ and ‘newspaper’. ATT(book)={author, editor, ISBN, pages, style, title}; ATT(newspaper)={date, editor, pages, title}; ATT(book) ∩ ATT(newspaper)={author, editor, ISBN, pages, style, title} ∩ {date, editor, pages, title}={editor, pages, title}. Therefore, the degree of overlapping between book and newspaper is 3.
By using (some of) the above functions, some properties of inter-conceptual relationships underlying an ontology defined as above can be expressed in a formal manner such as the following:
For all c(t), c′(t) ∈ M2HRD(t), c(t) ≠ c′(t), [c′(t) ∈ M-parents(c(t)) → c(t) ∉ M-parents(c′(t))] (mereological asymmetry)
For all c(t), c′(t) ∈ TMHRD(t), c(t) ≠ c′(t), [c′(t) ∈ T-parents(c(t)) → c(t) ∉ T-parents(c′(t))] (taxonomic asymmetry)
For all c(t), c′(t), c″(t) ∈ TMHRD(t), c(t) ≠ c′(t) ≠ c″(t), [(c′(t) ∈ T-parents(c(t))) ∧ (c″(t) ∈ T-parents(c′(t))) → (c″(t) ∈ T-parents(c(t)))] (taxonomic transitivity)
For all c(t), c′(t), c″(t) ∈ M2HRD(t), c(t) ≠ c′(t) ≠ c″(t), not[(c′(t) ∈ M-parents(c(t))) ∧ (c″(t) ∈ M-parents(c′(t))) → (c″(t) ∈ M-parents(c(t)))] (mereological non-transitivity)
For all c(t) ∈ M2HRD(t), c(t) ∉ M-parents(c(t)) (mereological irreflexivity)
For all c(t) ∈ TMHRD(t), c(t) ∉ T-parents(c(t)) (taxonomic irreflexivity).
At this point, we can establish a property stating that if two concepts are linked by the relationship part-of, and both are linked to another concept ck by means of the is-a relationship, they inherit the properties defined in ck. Formally, this property can be written as follows:
For all c(t), c′(t) ∈ PMHRD(t) [(c′(t) ∈ M-parents(c(t))) ∧ (T-parents(c(t)) ∩ T-parents(c′(t)) ≠ ∅) → (degree of overlapping(c(t), c′(t)) > 0)].
Finally, if two concepts have neither taxonomic parent concepts nor specific attributes in common, then we can say that both concepts have nothing in common. To express this formally:
For all c(t), c′(t) ∈ PMHRD(t) [(SPE(c(t)) ∩ SPE(c′(t)) = ∅) ∧ (T-parents(c(t)) ∩ T-parents(c′(t)) = ∅) → (degree of overlapping(c(t), c′(t)) = 0)].
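
As an illustration of how the irreflexivity and asymmetry properties could be checked mechanically, the following Python sketch tests them on relations stored as sets of pairs. The representation and the function names are assumptions made for this example only. The closure function shows how the is-a pairs implied by taxonomic transitivity could be derived; no such closure is computed for part-of, since transitivity is not assumed for mereological organizations.

```python
# Expert A's relationships; (a, b) reads "a is a b" / "a is a part of b".
IS_A = {
    ("newspaper", "publication"), ("magazine", "publication"),
    ("book", "publication"), ("staff", "person"), ("student", "person"),
    ("novel", "book"), ("essay", "book"),
}
PART_OF = {
    ("publication", "library"), ("person", "library"),
    ("article", "newspaper"), ("article", "magazine"),
}


def is_irreflexive(relation):
    """No concept is its own parent."""
    return all(a != b for (a, b) in relation)


def is_asymmetric(relation):
    """For distinct concepts, (a, b) in the relation excludes (b, a)."""
    return all((b, a) not in relation for (a, b) in relation if a != b)


def transitive_closure(relation):
    """All pairs implied by transitivity (meant for is-a only)."""
    closure = set(relation)
    while True:
        implied = {(a, d) for (a, b) in closure for (c, d) in closure if b == c}
        if implied <= closure:
            return closure
        closure |= implied
```

On Expert A's ontology both relations pass the two checks, and the closure of `IS_A` additionally contains pairs such as `("novel", "publication")`.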
3.2. THE FRAMEWORK FOR INTEGRATING KNOWLEDGE
All of these previous functions are necessary because working with several ontologies requires comparison of the knowledge stored in each of them in order to integrate them. Now, the integration process can be presented. All of the ontologies to be integrated must cover the same topic, that is, the same domain. This is the first consideration to be taken into account when integrating ontologies by means of the framework presented in this work. The next stage is to select which ontologies are going to be finally integrated since there can be equivalent or inconsistent ontologies in the set of source ontologies to be integrated. In order to perform this operation, it is important to know whether any
two ontologies meet logical requirements for integration. Two ontologies cannot be integrated together if they are either inconsistent or equivalent. It is obvious that inconsistent ontologies cannot be integrated together because they would incorporate contradictory knowledge into the integration-derived ontology. On the other hand, equivalent ontologies are not adequate for integration because they would incorporate redundancy into the integration-derived ontology. After calculating the set of ontologies that is compatible for integration with each source ontology, a decision must be made concerning what the best sets amongst them are. This decision must be taken according to the goal to be achieved. In our case, the system attempts to maximize information quality and quantity. So, our criterion will be the highest number of ontologies. Thus, the best subset will be the one holding this criterion. At that moment, the integration process can start. A new ontology is going to be generated; this knowledge is created not from scratch however, but from users’ ontologies. The next operation to perform is to establish the previously selected set of ontologies as being part of the new ontology. This is done (in practice) by inserting each source ontology into the integration-derived ontology as a mereological child of the root concept, whose name is the topic that is being integrated. The formalization for this part of the integration process can be found in Section 3.2.1. So far, we have arrived at a point at which the set of (source) ontologies have been filtered (by removing those that are inconsistent or equivalent) and placed into a common ontology. Section 3.2.2 formalizes the following steps of the process. The next stage is transforming each (source) ontology’s terminology into a shared terminology. This feature would be desirable for further steps of the integration process. 
The ontology obtained from this terminology unification process is called the instantiated and integrated ontology. In order to perform this vocabulary unification, one of the ontologies that takes part in the integration process must be taken as a reference. More precisely, the terminology of the reference ontology must be taken as the reference for the process. The question now is which ontology must be considered the reference for completing this terminological unification. In this work, the reference terminology depends on which user has requested integration of the topic. If the user has a private ontology on that topic, this ontology will be considered the reference for unifying the terminology of all the ontologies. Otherwise, the decision depends on the criterion we choose for making the selection. In our case, the criterion is the ontology with the largest number of concepts, so we potentially minimize the number of changes to be made. The unification process can briefly be described as follows. For each concept C of each ontology included in the integration-derived ontology, if there exists a concept C′ in the reference ontology such that both concepts are synonyms, then we change the name of C to the name of C′. We have arrived at the final stage of the integration process. Once the terminology has been unified, the system is able to generate the final transformed ontology from the instantiated and integrated one. In order to do this, our approach will take one of the sub-ontologies that are mereological children of the root of the instantiated and transformed ontology as the skeleton of the transformed one. In addition to this, new concepts, attributes and relationships will be added to this one to obtain the desired final ontology that will be presented to the user. In this case, it is not significant whether
or not the reference ontology belongs to the user that requested the integration of the topic, so the criterion that has been used here is to select the ontology with the highest number of concepts as the reference one. All the ontologies are sorted according to the number of concepts they have and they will be inserted into the transformed ontology in the order established by that sorting process. The intention behind inserting the ontologies in this way is to minimize the number of modifications to be made to the transformed ontology, since the more ontologies are included, the more knowledge the transformed ontology has. Through the adoption of such a policy, it is more probable that the knowledge of the remaining ontologies has already been included in the transformed ontology. Therefore, the next stage is the transformation of the reference ontology by inserting new concepts, relationships and attributes that belong to the other ontologies taking part in the process. This mechanism processes one ontology at each cycle, and this ontology is compared with the transformed ontology built up to that moment in order to discover which new knowledge can be added to the transformed ontology. Each ontology is processed in a concept-by-concept manner, checking for an equivalent or synonymous concept in the transformed ontology. If such a concept exists, both concepts must be unified in terms of attributes and parent concepts. If there is no equivalent or synonymous concept in the transformed ontology, the concept must be added to the transformed ontology in its corresponding place according to its relationships. Before expressing the integration mechanism that has been described in this section in a more formal manner, some ontologies that are used in this section to illustrate the definitions are shown. In particular, four ontologies are used for this purpose, i.e. those appearing in Figures 10–13 inclusive.
3.2.1. A framework for integrating ontologies.
As mentioned previously, the goal of this work is to design a framework to construct ontologies from those provided by a concrete set of user-nodes. Consequently, the system must be able to face plausible (consistency) conflicts between the ontologies that are to be integrated at an instant t. In particular, every time a user inserts or modifies knowledge in his or her (private) ontology, such knowledge will be incorporated in Oint(t). It should be noticed that several user-nodes could send knowledge to construct the integrated ontology at the same time. So, it is necessary to distinguish between different pieces of knowledge provided by different user-nodes. Thus, the modus operandi of the framework proposed here to integrate a set of ontologies into another one is detailed in the following lines.
Figure 11. Expert B’s ontology.
Figure 12. Expert C’s ontology.
Figure 13. Expert D’s ontology.
Let Oi(t) be an ontology collected from the ith user-node until the instant t; i=1,..., n; n=number of active user-nodes at t. Then, the first assumption relative to the ontology integration model proposed here can be written as follows.
Principle 1: User-node-driven integration. All of the knowledge in Oint(tj) must be consistent with Oi(ti) for all i in {1,2,...,n} and tj>ti. That ensures that Oint(t) will always contain the latest version of all user-nodes’ ontologies.
By considering the ontological functions introduced in Section 3, and by supposing a name-based equality criterion, the following definition concerning the comparison between the different elements of two ontologies can be established. Two concepts from two different ontologies are said to be equivalent from the perspective of attributes corresponding to each concept if they have the same set of attributes. Formulaically speaking:
Definition 1: Attribute-based conceptual equivalency. Let PMHRDi(t) and PMHRDj(t) be two partial hierarchical restricted domains until the instant t, and let c(t) and c′(t) be two concepts such that c(t) ∈ PMHRDi(t) and c′(t) ∈ PMHRDj(t). c(t) and c′(t) are said to be conceptually equivalent from the attribute point of view, written A equivalency(c(t), c′(t)), iff ATT(c(t))=ATT(c′(t)).
Let us illustrate this definition with the following example. Let us take the concepts “staff” and “article” from the ontologies corresponding to Expert B (Figure 11) and Expert C (Figure 12). ATT(staffB)={salary, position, name, id, address, role}; ATT(staffC)={salary, position, name, id, address, role}. The concepts “staffB” and “staffC” have the same attributes, so both concepts meet the A equivalency. ATT(articleB)={length, author, keywords}; ATT(articleC)={subject}. The concepts “articleB” and “articleC” do not have the same attributes, so both concepts do not meet the A equivalency. Similarly, we can extend the previous definition to deal with sets of concepts as follows.
Two sets of concepts are equivalent from the attribute point of view if both sets are empty, or if for every concept in one set there is a concept in the other set such that both concepts are equivalent from the attribute point of view. Formally expressed:
Definition 2: Extended attribute-based conceptual equivalency. Let Ci and Cj be two sets of concepts. Ci and Cj are said to be two conceptually equivalent concept sets from the attribute point of view, written EA equivalency(Ci, Cj), iff the following holds: [Ci=Cj=∅] or [(for all c(t) in Ci, exists c′(t) in Cj s.t. A equivalency(c(t), c′(t))) and (for all c(t) in Cj, exists c′(t) in Ci s.t. A equivalency(c(t), c′(t)))].
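
Definitions 1 and 2 can be sketched in Python as follows. This is a minimal illustration, assuming that each concept is represented only by its attribute set ATT and that a concept set is passed as a list of such attribute sets; the function and variable names are hypothetical. The attribute sets are the ones reported in the text for Experts B and C.

```python
def a_equivalency(att_c, att_c_prime):
    """Definition 1: two concepts are A-equivalent iff their ATT sets are equal."""
    return att_c == att_c_prime


def ea_equivalency(atts_i, atts_j):
    """Definition 2: every concept in each set has an A-equivalent one in the other.

    atts_i and atts_j are lists of attribute sets, one entry per concept.
    """
    if not atts_i and not atts_j:
        return True  # both concept sets are empty
    covered_i = all(any(a_equivalency(a, b) for b in atts_j) for a in atts_i)
    covered_j = all(any(a_equivalency(b, a) for a in atts_i) for b in atts_j)
    return covered_i and covered_j


PERSON_ATT = {"name", "id", "address", "role"}
ATT_B = {
    "people": PERSON_ATT,
    "staff": PERSON_ATT | {"salary", "position"},
    "student": PERSON_ATT | {"number"},
    "article": {"length", "author", "keywords"},
}
ATT_C = {
    "person": PERSON_ATT,
    "staff": PERSON_ATT | {"salary", "position"},
    "student": PERSON_ATT | {"number"},
    "article": {"subject"},
}

set_b = [ATT_B[name] for name in ("people", "staff", "student")]
set_c = [ATT_C[name] for name in ("person", "staff", "student")]
```

Here `ea_equivalency(set_b, set_c)` holds, while the two 'article' concepts are not A-equivalent. Because the comparison is name-based, swapping `a_equivalency` for a richer attribute comparison would be the only change needed to relax that criterion.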
Let us take the same ontologies to present an example for this definition. Let us suppose that we have the following sets of concepts: SetB={people, staff, student}; SetC={person, staff, student}. ATT(peopleB)={name, id, address, role}; ATT(staffB)={salary, position, name, id, address, role}; ATT(studentB)={number, name, id, address, role}; ATT(personC)={name, id, address, role}; ATT(staffC)={salary, position, name, id, address, role}; ATT(studentC)={number, name, id, address, role}. If we apply the definition, we can see that the following A equivalencies exist: A equivalency(peopleB, personC); A equivalency(staffB, staffC); A equivalency(studentB, studentC). Every concept in each set therefore has an A equivalent concept in the other set, so EA equivalency(SetB, SetC) holds.
Now, by considering that a concept may have the same parent or child concepts as some other concept and yet have no attributes in common with it, the following can be written. Two concepts are inconsistent from the attribute point of view if all the following conditions hold: the terms given to both concepts are the same; the concepts have no attributes in common; the respective sets of mereological/taxonomic parent concepts, if any, of both concepts are equivalent from the attribute point of view; the respective sets of mereological/taxonomic child concepts, if any, of both concepts are equivalent from the attribute point of view. Now, formulaically speaking:
Definition 3: Attribute-based conceptual inconsistency. Let PMHRDi(t) and PMHRDj(t) be two partial hierarchical restricted domains until the instant t, and let c(t) and c′(t) be two concepts such that c(t) ∈ PMHRDi(t) and c′(t) ∈ PMHRDj(t).
c(t) and c′(t) are said to be conceptually inconsistent from the attribute point of view, written A inconsistency(c(t), c′(t)), iff the following holds: (NAME(c(t))=NAME(c′(t))) and (ATT(c(t)) ∩ ATT(c′(t))=∅) and EA equivalency(M-parents(c(t)), M-parents(c′(t))) and EA equivalency(T-parents(c(t)), T-parents(c′(t))) and EA equivalency(M-children(c(t)), M-children(c′(t))) and EA equivalency(T-children(c(t)), T-children(c′(t))), where NAME(x) stands for the term given to the concept x in its corresponding PMHRD(t), x ∈ {c(t), c′(t)}.
An example for this definition is given by the concepts “articleB” and “articleC”, since they have the same name and ATT(articleB) ∩ ATT(articleC)=∅, where ATT(articleB)={length, author, keywords} and ATT(articleC)={subject}. Therefore, both concepts are inconsistent from the attribute point of view. If we make use of Definition 2 in the case of having two sets of concepts with no equivalent concepts, from the attribute point of view, between these sets, we can say
that two sets of concepts are disjoint if there is no concept in one set equivalent from the attribute point of view to another concept in the other set. We can also state this as follows:
Definition 4: Disjoint concept set. Let Ci and Cj be two sets of concepts. Ci and Cj are said to be two disjoint concept sets, written dis concept sets(Ci, Cj), iff: (Ci ≠ ∅ or Cj ≠ ∅) and [for all c(t) in Ci, not (exists c′(t) in Cj s.t. A equivalency(c(t), c′(t)))].
Let SetB={publication, magazine, article} be the set comprised of concepts belonging to Expert B’s ontology; let SetC={article, book, people} be the set comprised of concepts belonging to Expert C’s ontology. For every concept that belongs to SetB there is no concept belonging to SetC such that there is an A equivalency between them, and vice versa.
Given two concepts allocated in their respective (taxonomic or mereological) organizations, we can make use of structural properties to establish conceptual equivalency as well as conceptual inconsistency between these concepts as follows. Two concepts are equivalent from the organizational structure point of view if the next two conditions hold. Firstly, the respective sets of mereological/taxonomic parent concepts, if any, for both concepts are equivalent from the attribute point of view. Secondly, the respective sets of mereological/taxonomic child concepts, if any, for both concepts are equivalent from the attribute point of view. In a formulaic fashion:
Definition 5: Organizational structure-based conceptual equivalency. Let PMHRDi(t) and PMHRDj(t) be two partial hierarchical restricted domains until the instant t, and let c(t) and c′(t) be two concepts such that c(t) ∈ PMHRDi(t) and c′(t) ∈ PMHRDj(t).
c(t) and c′(t) are said to be conceptually equivalent from the organizational structure point of view, written OS equivalency(c(t), c′(t)), iff the following holds: EA equivalency(M-parents(c(t)), M-parents(c′(t))) and EA equivalency(T-parents(c(t)), T-parents(c′(t))) and EA equivalency(M-children(c(t)), M-children(c′(t))) and EA equivalency(T-children(c(t)), T-children(c′(t))).
“People” and “person” are two OS equivalent concepts, one belonging to Expert B’s ontology and one to Expert C’s ontology. Let us see how all of the conditions are met. M-parents(people)={library}; M-parents(person)={library}. Therefore, EA equivalency(M-parents(people), M-parents(person))=true. T-parents(people)=T-parents(person)=∅. Therefore, EA equivalency(T-parents(people), T-parents(person))=true. M-children(people)=M-children(person)=∅. Therefore, EA equivalency(M-children(people), M-children(person))=true. T-children(people)={staffB, studentB}; T-children(person)={staffC, studentC}. Therefore, EA equivalency(T-children(people), T-children(person))=true since A equivalency(staffB, staffC) and A equivalency(studentB, studentC).
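
The check carried out above for “people” and “person” can be sketched in Python. The `Ontology` class and the function names are hypothetical, introduced only for this example. Only the people/person branches of Expert B's and Expert C's ontologies are encoded, and the attributes of 'library' are not listed in the text, so an empty attribute set is assumed for it in both ontologies.

```python
from dataclasses import dataclass


@dataclass
class Ontology:
    att: dict      # concept name -> full attribute set ATT(c)
    is_a: set      # pairs (child, parent)
    part_of: set   # pairs (part, whole)


def parents(relation, concept):
    return {b for (a, b) in relation if a == concept}


def children(relation, concept):
    return {a for (a, b) in relation if b == concept}


def ea_equivalency(o1, concepts1, o2, concepts2):
    """Definition 2, comparing concepts by their attribute sets."""
    atts1 = [o1.att[c] for c in concepts1]
    atts2 = [o2.att[c] for c in concepts2]
    return all(a in atts2 for a in atts1) and all(b in atts1 for b in atts2)


def os_equivalency(o1, c1, o2, c2):
    """Definition 5: parent and child sets are EA-equivalent in both organizations."""
    return (
        ea_equivalency(o1, parents(o1.part_of, c1), o2, parents(o2.part_of, c2))
        and ea_equivalency(o1, parents(o1.is_a, c1), o2, parents(o2.is_a, c2))
        and ea_equivalency(o1, children(o1.part_of, c1), o2, children(o2.part_of, c2))
        and ea_equivalency(o1, children(o1.is_a, c1), o2, children(o2.is_a, c2))
    )


PERSON_ATT = {"name", "id", "address", "role"}
STAFF_ATT = PERSON_ATT | {"salary", "position"}
STUDENT_ATT = PERSON_ATT | {"number"}

# ASSUMPTION: ATT(library) is not given in the text; an empty set is used.
expert_b = Ontology(
    att={"library": set(), "people": PERSON_ATT, "staff": STAFF_ATT, "student": STUDENT_ATT},
    is_a={("staff", "people"), ("student", "people")},
    part_of={("people", "library")},
)
expert_c = Ontology(
    att={"library": set(), "person": PERSON_ATT, "staff": STAFF_ATT, "student": STUDENT_ATT},
    is_a={("staff", "person"), ("student", "person")},
    part_of={("person", "library")},
)
```

Under these assumptions `os_equivalency(expert_b, "people", expert_c, "person")` holds, whereas comparing “people” with “staff” fails already on the mereological parents.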
Proceeding in a similar way, any two concepts will be inconsistent from the organizational structure point of view if the three following conditions hold: the concepts are equivalent from the attribute perspective; for any pair of elements, each belonging to the respective sets of mereological/taxonomic parent concepts of the concepts in question, there is no equivalence from the attribute point of view between those elements (i.e. the sets are disjoint); for any pair of elements, each belonging to the respective sets of mereological/taxonomic child concepts of the concepts in question, there is no equivalence from the attribute point of view between those elements (i.e. the sets are disjoint). Now, if we express this in a formal way, we obtain the following:
Definition 6: Organizational structure-based conceptual inconsistency. Let PMHRDi(t) and PMHRDj(t) be two partial hierarchical restricted domains until the instant t, and let c(t) and c′(t) be two concepts such that c(t) ∈ PMHRDi(t) and c′(t) ∈ PMHRDj(t). c(t) and c′(t) are said to be conceptually inconsistent from the organizational structure point of view, written OS inconsistency(c(t), c′(t)), iff all the following conditions hold: A-equivalency(c(t), c′(t)) and dis concept set(M-parents(c(t)), M-parents(c′(t))) and dis concept set(T-parents(c(t)), T-parents(c′(t))) and dis concept set(M-children(c(t)), M-children(c′(t))) and dis concept set(T-children(c(t)), T-children(c′(t))).
To illustrate this definition, let us use the ontologies belonging to Expert A (Figure 10) and Expert D (Figure 13). The concepts “novelA” and “novelD” are OS inconsistent because they satisfy the conditions of the previous definition.
Condition 1: ATT(novelA)={chapters}; ATT(novelD)={chapters}. Therefore, A-equivalency(novelA, novelD)=true.
Condition 2: M-parents(novelA)=∅; M-parents(novelD)={magazine}. Therefore, dis concept set(M-parents(novelA), M-parents(novelD))=true.
Condition 3: T-parents(novelA)={book}; T-parents(novelD)=∅. Therefore, dis concept set(T-parents(novelA), T-parents(novelD))=true.
Condition 4: M-children(novelA)=∅; M-children(novelD)=∅. Therefore, dis concept set(M-children(novelA), M-children(novelD))=true.
Condition 5: T-children(novelA)=∅; T-children(novelD)=∅. Therefore, dis concept set(T-children(novelA), T-children(novelD))=true.
Based on both the organizational structure and conceptual aspects pointed out in previous definitions, we can establish that two different concepts are said to be synonymous if they are equivalent from both the attribute and the organizational structure point of view. In a formal fashion:
Definition 7: Synonymous concepts. Let PMHRDi(t) and PMHRDj(t) be two partial hierarchical restricted domains until the instant t, and let c(t) and c′(t) be two concepts such that c(t) ∈ PMHRDi(t) and c′(t) ∈ PMHRDj(t). c(t) and c′(t) are said to be synonymous concepts, written syn concepts(c(t), c′(t)), iff the following holds: A equivalency(c(t), c′(t)) and OS equivalency(c(t), c′(t)).
If we look at Figures 11 and 12, the concepts “peopleB” and “personC” are found to be synonymous concepts since they are both A and OS equivalent. ATT(peopleB)={name, id, address, role}; ATT(personC)={name, id, address, role}. Therefore, A-equivalency(peopleB, personC)=true. Moreover, as was shown in the example for Definition 5, they are OS equivalent, so that OS equivalency(peopleB, personC)=true.
On the other hand, we will say that two different concepts are incompatible if there is some attribute-based or organizational structure-based inconsistency between them. In a formal manner, the following can be defined:
Definition 8: Incompatible concepts. Let PMHRDi(t) and PMHRDj(t) be two partial hierarchical restricted domains until the instant t, and let c(t) and c′(t) be two concepts such that c(t) ∈ PMHRDi(t) and c′(t) ∈ PMHRDj(t). c(t) and c′(t) are said to be incompatible concepts, written incompatible concepts(c(t), c′(t)), iff the following holds: A inconsistency(c(t), c′(t)) or OS inconsistency(c(t), c′(t)).
An example for this definition is given by the concepts “articleB” and “articleC”, since they are A-inconsistent. ATT(articleB)={length, author, keywords}; ATT(articleC)={subject}. Therefore, A inconsistency(articleB, articleC)=true.
That implies that incompatible_concepts(articleB, articleC)=true. Note that, from the last two definitions, it cannot be deduced that the negation of the syn_concepts predicate implies the incompatible_concepts predicate, since there are intermediate states between the two (although the two predicates cannot be true at the same time). At this point, it is easy to establish when two ontologies are either equivalent or inconsistent, in the following way. Two ontologies are equivalent if, for every concept within one of the ontologies, there is a concept in the other ontology that is equivalent to it from both the organizational
structure and the attribute point of view. We can also express this as follows:
Definition 9: Equivalent ontologies.
Let PMHRDi(t) and PMHRDj(t) be two partial hierarchical restricted domains until the instant t corresponding to the ontologies Oi(t) and Oj(t), respectively, considered at the instant t. Oi(t) and Oj(t) are said to be two equivalent ontologies, written equivalent_ontologies(Oi(t), Oj(t)), iff the following holds: [for all c(t) in PMHRDi(t), there exists c′(t) in PMHRDj(t) s.t. syn_concepts(c(t), c′(t))] and [for all c(t) in PMHRDj(t), there exists c′(t) in PMHRDi(t) s.t. syn_concepts(c(t), c′(t))].
No two of the four ontologies used as examples in this work are equivalent. However, if we consider parts of the different ontologies, we can find equivalent sub-ontologies. Let us take a sub-ontology from each of expert B’s and expert C’s ontologies, containing only the root concept (i.e. Library), its mereological child (“people” and “person”, respectively) and that child’s mereological and taxonomic descendants. In other words, we will consider the “person”/“people” branches of the ontologies, but not the “publication” branch (see Figures 14 and 15). These sub-ontologies are equivalent ontologies, since for every concept in one of them a synonymous concept exists in the other: syn_concepts(libraryB, libraryC)=true; syn_concepts(peopleB, personC)=true; syn_concepts(staffB, staffC)=true; syn_concepts(studentB, studentC)=true.
Two ontologies are inconsistent if there is some concept in one of the ontologies that is inconsistent with another concept in the other ontology, rendering both concepts
Figure 14. B’s sub-ontology.
Figure 15. C’s sub-ontology.
inconsistent (from the organizational structure or the attribute point of view). We can express this formally as follows:
Definition 10: Inconsistent ontologies
Let PMHRDi(t) and PMHRDj(t) be two partial hierarchical restricted domains until the instant t corresponding to the ontologies Oi(t) and Oj(t), respectively, considered at the instant t. Oi(t) and Oj(t) are said to be two inconsistent ontologies, written inconsistent_ontologies(Oi(t), Oj(t)), iff the following holds: there exist c(t) in PMHRDi(t) and c′(t) in PMHRDj(t) s.t. incompatible_concepts(c(t), c′(t)).
The ontologies corresponding to Figures 14 and 15 (B’s and C’s ontologies) are inconsistent since there are two incompatible concepts (one from each ontology), namely, “articleB” and “articleC”.
Two ontologies are compatible if and only if they are neither equivalent nor inconsistent. Formally:
Definition 11: Compatible ontologies
Let Oi(t) and Oj(t) be two ontologies. Oi(t) and Oj(t) are said to be compatible ontologies iff not (equivalent_ontologies(Oi(t), Oj(t))) and not (inconsistent_ontologies(Oi(t), Oj(t))).
The ontologies corresponding to Figs. 13 and 14 (A’s and B’s ontologies, respectively) are two compatible ontologies, since they are neither equivalent nor inconsistent. They are not equivalent since B’s ontology contains no concept that is synonymous with any of the following concepts of A’s ontology: {book, novel, essay}. On the
other hand, they are not inconsistent because no two incompatible concepts (one from each ontology) exist. Therefore, both ontologies are compatible.
At this point, it is useful to summarize the definitions introduced in this section. Table 4 describes the functions applicable to two concepts, Table 5 describes the functions designed for sets of concepts, and Table 6 presents the functions for two ontologies.
Now, we will define the concept of initialized ontology. An ontology is initialized if the following conditions hold: topic is the content of its root, which has no mereological children and has an attribute named number_of_active_user_nodes; and every ontology corresponding to a user-node has a suffix added to its root concept, namely, that user-node’s identifier. In a formal fashion, the following can be written:
Definition 12: Integration-derived ontology
Let topic be the topic about which an end-user requires (ontological) information; let number_of_active_user_nodes be an attribute of the root concept of an ontology, written Oinit(t), stating the number of user-nodes participating in an ontology integration process; and let Oi be the ontology corresponding to the ith user-node, i = 1, …, val(n), where val(n) is the value of the attribute number_of_active_user_nodes. Let candidates(t) be the set of ontologies to be integrated. The first step is to remove from candidates(t) those ontologies that produce redundancy or inconsistency. For this purpose, let compatiblei(t) be the set of ontologies Oj(t) belonging to candidates(t) that are compatible with Oi(t) and with the elements in compatiblei(t), i = 1..val(n). This operation thus calculates the sets of ontologies that can take part in the integration process together. Let subset(t) be the best set amongst all the compatiblei(t), i = 1..val(n).
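The construction of the compatiblei(t) sets can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the names `compatible_set`, `COMPATIBLE_PAIRS` and `compatible` are hypothetical, the greedy scan order is an assumption, the pairwise results are those reported for experts A to D in the Library example of this section, and "best set" is interpreted here (also as an assumption) as the largest set containing the requester's ontology.

```python
from typing import Callable, Dict, List


def compatible_set(i: str, candidates: List[str],
                   compatible: Callable[[str, str], bool]) -> List[str]:
    """Build compatible_i: start from O_i and add every candidate that is
    compatible with all ontologies already chosen (greedy scan, assumed order)."""
    chosen = [i]
    for j in candidates:
        if j not in chosen and all(compatible(j, k) for k in chosen):
            chosen.append(j)
    return chosen


# Pairwise compatibility as reported in the Library example:
# only (A, B) and (B, D) are compatible pairs.
COMPATIBLE_PAIRS = {frozenset(("A", "B")), frozenset(("B", "D"))}


def compatible(x: str, y: str) -> bool:
    return frozenset((x, y)) in COMPATIBLE_PAIRS


candidates = ["A", "B", "C", "D"]
compatible_sets: Dict[str, List[str]] = {
    i: compatible_set(i, candidates, compatible) for i in candidates
}

# Assumed selection rule: the largest compatible set that contains the
# ontology of the expert who requested the integration.
requester = "A"
subset = max((s for s in compatible_sets.values() if requester in s), key=len)
```

With these inputs, D is left out of B's set because it is not compatible with A, which is already in that set.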
Table 4. Functions for 2 concepts

| Function | Informal description |
|---|---|
| Degree_of_overlapping | Calculate the degree of overlapping between two concepts |
| A_equivalency | Check for the equivalency of two concepts from the attribute viewpoint |
| A_inconsistency | Check for the inconsistency of two concepts from the attribute viewpoint |
| OS_equivalency | Check for the equivalency of two concepts from the organizational viewpoint |
| OS_inconsistency | Check for the inconsistency of two concepts from the organizational viewpoint |
| Synonym_concepts | Test whether two concepts are synonymous |
| Incompatible_concepts | Test whether two concepts can take part in the same integration process |

Table 5. Functions for sets of concepts

| Function | Informal description |
|---|---|
| EA_equivalency | Check for the equivalency of two sets of concepts from the attribute point of view |
| dis_concept_sets | Test whether two sets of concepts are disjoint |

Table 6. Functions for 2 ontologies

| Function | Informal description |
|---|---|
| Equivalent_ontologies | Test whether two ontologies are equivalent |
| Inconsistent_ontologies | Test whether two ontologies are inconsistent |
| Compatible_ontologies | Test whether two ontologies can take part in the same integration process without introducing redundancy or inconsistency |
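As a rough illustration of how the functions summarized in Tables 4 and 6 compose, the following Python sketch implements Definitions 7 to 11 on top of four pluggable concept-level predicates. The `Concept` and `Predicates` types and the function signatures are hypothetical. The toy predicates at the bottom are placeholders chosen only to make the example runnable; they are not the definitions of A-equivalency, OS-equivalency, A-inconsistency or OS-inconsistency given earlier in the paper.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, Sequence


@dataclass(frozen=True)
class Concept:
    name: str
    attributes: FrozenSet[str]


Pred = Callable[[Concept, Concept], bool]


@dataclass(frozen=True)
class Predicates:
    a_equivalency: Pred
    os_equivalency: Pred
    a_inconsistency: Pred
    os_inconsistency: Pred


def syn_concepts(p: Predicates, c1: Concept, c2: Concept) -> bool:
    # Definition 7: A-equivalent and OS-equivalent.
    return p.a_equivalency(c1, c2) and p.os_equivalency(c1, c2)


def incompatible_concepts(p: Predicates, c1: Concept, c2: Concept) -> bool:
    # Definition 8: A-inconsistent or OS-inconsistent.
    return p.a_inconsistency(c1, c2) or p.os_inconsistency(c1, c2)


def equivalent_ontologies(p: Predicates, oi: Sequence[Concept],
                          oj: Sequence[Concept]) -> bool:
    # Definition 9: every concept has a synonym in the other ontology.
    return (all(any(syn_concepts(p, c, d) for d in oj) for c in oi)
            and all(any(syn_concepts(p, c, d) for d in oi) for c in oj))


def inconsistent_ontologies(p: Predicates, oi: Sequence[Concept],
                            oj: Sequence[Concept]) -> bool:
    # Definition 10: some pair of concepts is incompatible.
    return any(incompatible_concepts(p, c, d) for c in oi for d in oj)


def compatible_ontologies(p: Predicates, oi: Sequence[Concept],
                          oj: Sequence[Concept]) -> bool:
    # Definition 11: neither equivalent nor inconsistent.
    return (not equivalent_ontologies(p, oi, oj)
            and not inconsistent_ontologies(p, oi, oj))


# Placeholder predicates (assumptions for this example only).
toy = Predicates(
    a_equivalency=lambda a, b: a.attributes == b.attributes,
    os_equivalency=lambda a, b: True,
    a_inconsistency=lambda a, b: (a.name == b.name
                                  and a.attributes != b.attributes
                                  and a.attributes.isdisjoint(b.attributes)),
    os_inconsistency=lambda a, b: False,
)

people_b = Concept("people", frozenset({"name", "id", "address", "role"}))
person_c = Concept("person", frozenset({"name", "id", "address", "role"}))
article_b = Concept("article", frozenset({"length", "author", "keywords"}))
article_c = Concept("article", frozenset({"subject"}))
```

Passing the four concept-level predicates in as parameters keeps the ontology-level functions independent of how equivalency and inconsistency are actually decided.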
Consequently, the integration-derived ontology Oinit(t) is the one that meets the following properties:
\[
\begin{aligned}
&\mathrm{root}(O_{init}(t)) = \mathit{topic},\\
&\mathrm{ATT}(\mathrm{root}(O_{init}(t))) = \{\mathit{number\_of\_active\_user\_nodes}\},\\
&\text{for all } i,\ \mathrm{root}(O_i(t)) = \mathrm{root}(O_i(t))\_\mathit{user\_node}_i,\\
&\text{M-children}(\mathrm{root}(O_{init}(t))) = \bigcup_{i=1}^{val(n)} \mathrm{root}(O_i(t)), \quad i = 1, \ldots, \mathrm{card}(\mathit{subset}(t)),
\end{aligned}
\]
where root(Oi(t)) stands for the root concept of the ith ontology at the instant t, and user_nodei stands for the ith user-node’s identifier.
For example, let us assume that there are four users, referred to as expert A (OA(t)), expert B (OB(t)), expert C (OC(t)) and expert D (OD(t)), who are building an ontology of the Library domain. Suppose also that, at a certain instant, their private ontologies are those defined by Figures 13–16. Let us show how the system proceeds in order to initialize the ontologies for the integration process. The first step is to build the set of ontologies that will be candidates for taking part in the integration process: candidates(t)={OA(t), OB(t), OC(t), OD(t)}. Now, we can calculate the compatible ontologies for each candidate ontology (see Table 7).
compatible_{OA(t)}(t)={OA(t), OB(t)} because: compatible_ontologies(OA(t), OB(t))=true; and compatible_ontologies(OA(t), OC(t))=false; and compatible_ontologies(OA(t), OD(t))=false.
compatible_{OB(t)}(t)={OB(t), OA(t)} because: compatible_ontologies(OB(t), OA(t))=true; and
Figure 16. Integration-derived ontology.
Table 7. Compatibilities among ontologies

| Compatibilities | OA | OB | OC | OD |
|---|---|---|---|---|
| OA | X | X | | |
| OB | X | X | | |
| OC | | | X | |
| OD | | X | | X |
compatible_ontologies(OB(t), OC(t))=false; and compatible_ontologies(OB(t), OD(t))=true; but compatible_ontologies(OA(t), OD(t))=false, so OD(t) cannot be included in compatible_{OB(t)}(t).
compatible_{OC(t)}(t)={OC(t)} because: compatible_ontologies(OC(t), OA(t))=false; and compatible_ontologies(OC(t), OB(t))=false; and compatible_ontologies(OC(t), OD(t))=false.
compatible_{OD(t)}(t)={OD(t), OB(t)} because: compatible_ontologies(OD(t), OA(t))=false; and
compatible_ontologies(OD(t), OB(t))=true; and compatible_ontologies(OD(t), OC(t))=false.
At this point, the system must decide which set of compatible ontologies to use for the integration. Let us suppose that the integration request has been made by expert A. Expert A’s ontology must be included in the chosen subset, so the candidate subsets are reduced to two: compatible_{OA(t)}(t) and compatible_{OB(t)}(t), which contain the same ontologies, so that subset(t)={OA(t), OB(t)}. Now the integration-derived ontology can be obtained; the result of this process is shown in Figure 16.
By looking at the above definition, it will be noted that (ontological) contributions are linked to a root concept (i.e., topic) through a mereological connection. The reason for adopting this policy is that we assume that each contribution is a piece of knowledge taking part in a set, namely, the integration-derived ontology.
3.2.2. Using integration-derived ontologies.
According to the contents of previous sections, when user-nodes are building their own ontologies, they might be interested in accessing the current integrated ontology, written Oint(t) (i.e. that which is derived from the integration of these ontologies). This can be useful when some user-node wishes to know what other user-nodes have submitted to Oint(t). On the other hand, it should be noticed that, given a topic on which an integrated ontology is to be constructed, not all the user-nodes have to know the same about all of the different aspects related to that topic. Moreover, the use of different terms to express the same concept could complicate the Oint(t) construction process. So, a synonymous term management policy is a reasonable feature, one that has already been incorporated in other frameworks like WordNet (Miller, 1990). In the framework proposed in this article, every time a user-node sends a query to Oint(t), he or she will see a ‘virtual’ Oint(t) ontology.
In this ontology, concept nodes that are found to be synonyms of others contained in the ontology corresponding to that user-node will be replaced by the corresponding terms used by the user-node in question. This strategy can be expressed in the following manner. Given an integration-derived ontology Oint(t) and an ontology corresponding to a user-node Oi(t), both ontologies considered at a particular instant, the instantiated, integrated ontology for that user-node will be the one holding the following conditions:
* Its (topological) links are the same as those of Oint(t).
* Every concept within it must belong either to the partial multiple hierarchical restricted domain until the instant t corresponding to the ontology Oi(t) (if there is at least one concept belonging to the partial multiple hierarchical restricted domain until the instant t corresponding to Oint(t) that is equivalent to the concept under question from both the attribute and the organizational structure point of view) or, otherwise, to that one corresponding to Oint(t).
* Every concept belonging to the partial multiple hierarchical restricted domain until the instant t corresponding to Oint(t) must belong either to the partial multiple hierarchical restricted domain until the instant t corresponding to the ontology Oi(t) (if there is at least one concept belonging to the instantiated integrated ontology that
is equivalent to the concept under question, from both the attribute and the organizational structure point of view) or, otherwise, it belongs to the instantiated, integrated ontology.
This ontology Oi(t) is the reference ontology for the instantiation process, since the terminology of the integration-derived ontology is adapted to Oi(t)’s terminology. The selection of the reference ontology deserves further explanation. The system handles two types of users, expert and normal. The main difference is that normal users cannot build their own ontologies, so they can only request the integration of a specific topic. Therefore, when an expert requests information about a topic, if he or she has built an ontology about that topic, that ontology will be used as the reference ontology. If the expert does not have his or her own ontology on that topic, the expert will be treated as a normal user and the reference ontology will be decided according to the maximum information content principle introduced later in this paper. We can also state the above as follows:
Definition 13: Instantiated, integrated ontology.
Let Oi(t) be an ontology considered at the instant t and let Oint(t) be the integration-derived ontology at the instant t. The instantiated, integrated ontology with respect to Oi(t), written O^i_int(t) or IDO(t), is defined as an ontology holding all the following conditions.
(1) Its (topological) links are the same as those of Oint(t).
(2) For all c′(t) ∈ PMHRD′(t) the following holds:
\[
c'(t) \in
\begin{cases}
PMHRD_i(t) & \text{iff exists } c_{int}(t) \in PMHRD_{int}(t) \text{ s.t. } \mathit{syn\_concepts}(c'(t), c_{int}(t)),\\
PMHRD_{int}(t) & \text{otherwise.}
\end{cases}
\]
(3) For all c_int(t) ∈ PMHRD′(t) the following holds:
\[
c_{int}(t) \in
\begin{cases}
PMHRD_i(t) & \text{iff exists } c'(t) \in PMHRD'(t) \text{ s.t. } \mathit{syn\_concepts}(c_{int}(t), c'(t)),\\
PMHRD'(t) & \text{otherwise,}
\end{cases}
\]
where PMHRDx(t) stands for the partial multiple hierarchical restricted domain until the instant t corresponding to the ontology Ox(t), x ∈ {int, i}, and PMHRD′(t) stands for the partial multiple hierarchical restricted domain until the instant t corresponding to the ontology O^i_int(t).
According to the example followed in this section, we can build the instantiated, integrated ontology with respect to O2(t), written O^2_int(t), as shown in Figure 17, where the number written in superscript in O^2_int(t) has been omitted for readability reasons. That figure represents the instantiated, integrated ontology for the integration-derived ontology that appears in Figure 16. Let us suppose that the integration request was made by expert A so that OA(t) (Figure 10) is the ontology to take as the reference for
Figure 17. An instantiated, integrated ontology.
instantiating the ontology. Let us now illustrate how Definition 13 is applied to the integration-derived ontology from Figure 16. That ontology comprises two experts’ ontologies (expert A’s and expert B’s). Every concept of each ontology must be checked in order to search for a synonymous concept in the other ontology. The two ontologies comprise the following concepts:
PMHRDA(t)={publication, newspaper, magazine, book, article, novel, essay, person, staff, student}
PMHRDB(t)={publication, newspaper, magazine, article, people, staff, student}.
The concepts belonging to PMHRDA(t) will not change their names, since OA(t) is the reference ontology. Therefore, this checking step only concerns the concepts included in PMHRDB(t). When checking for synonymous concepts, the algorithm can respond in two ways, affirmative or negative. If the response is negative, nothing is done because there is no synonymous concept in the reference ontology. If the response is affirmative, two situations are possible. Either the names are the same, in which case nothing is done (this is the case of publication, newspaper, magazine, article, staff and student), or the names are different, in which case the name of the concept that belongs to PMHRDB(t) is replaced by the name of its synonymous concept in the reference ontology. This is the case of people, whose name is changed to person. However, the name assigned by expert B to this concept is not lost, but kept as an alternative name for the concept. This information is kept in the concept “person” of PMHRDA(t) and PMHRDB(t).
Although helping user-nodes to build their own ontologies can be thought to be a very powerful utility of an integrated ontology, its main function must be to yield
information for an end-user request (i.e. not an expert user-node) about a topic proposed by him or her. Moreover, in order to show that information in a compact way, it does not seem a good solution to provide the user with every complete ontology produced by every (expert) user-node. Instead, the policy adopted here is to show the requesting end-user a single ontology obtained from the integration of all of those ontologies in Oint(t). However, this integration can be very complex since, in general, the same concepts (even with different attributes) can be present in different ontologies in Oint(t). To face this problem, we will follow an algorithm based on the principle that every user wishes to obtain the maximum amount of information concerning a certain topic. In particular, in building the ontology that is to be shown to the user, we will start from the (private) ontology with the maximum number of concepts. Formally, this principle can be expressed as follows:
Principle 2: Maximum information content.
Let Og(t) be a global ontology obtained as a result of integrating some ontologies until the instant t and let Oi(t) be the ith ontology included in Og(t) until the instant t, i = 1, 2, …, n, where n is the number of user-nodes whose respective ontologies are taking part in Og(t) at the instant t. Then, the information supplied to the user will have at least as many concepts as Om(t) has, Om(t) being an ontology in Og(t) such that
\[
\mathrm{Cardinal}(PMHRD_m(t)) = \max_{i=1}^{n} \mathrm{Cardinal}(PMHRD_i(t)),
\]
where PMHRDx(t) stands for the partial multiple hierarchical restricted domain until the instant t corresponding to the ontology Ox(t), x ∈ {m, i}.
Now, by using the definitions introduced earlier in this paper, we can write an algorithm for generating “transformed” ontologies.
By the term transformed, we mean that these ontologies are not intended to be the precise translation of the total information contained in Og(t), but an integration of a portion of that information, namely, the knowledge [inferred from Og(t)] consistent with the assumptions put forward in Section 2. Thus, every time an end-user sends an information request to the system, he or she will be provided with an ontology. Such an ontology will have the concepts of the user-node ontology with the maximum number of concepts. In addition, every concept in the ontology shown to the user will have the attributes obtained from the integration-derived ontology, for which the contributions of all user-nodes are taken into account. These considerations are reflected in the following definitions.
If the set of ontologies belonging to IDO(t) is sorted in decreasing order of their number of concepts, then the first ontology of this ordered set is termed the personalized ontology at that instant. We can write this by means of the following definition:
Definition 14: Personalized ontology.
Let IDO(t) be an integration-derived ontology until the instant t; and let Onto_set(t) be a non-empty ordered set containing the ontologies belonging to IDO(t) and holding that Cardinal(PMHRDi(t)) ≥ Cardinal(PMHRDj(t)) iff i < j, where PMHRDk(t) stands for the partial multiple hierarchical restricted domain corresponding to the kth element
of Onto_set(t), written Onto_setk(t), k ∈ {i, j}, 1 ≤ i, j ≤ Cardinal(M-children(root(IDO(t)))). The personalized ontology until t corresponding to Onto_set(t), written PO(t), is defined as Onto_set1(t).
This definition can be seen as the initialization of the transformed, integration-derived ontology that is the final result of the process. PO(t) can be considered as the skeleton used for merging different experts’ ontologies, so that new concepts, attributes and relationships will be included in it in order to generate the transformed, integration-derived ontology. Definition 14 explains the structure of Onto_set(t) when there is no explicit reference ontology for the process, so that the ontologies are ordered according to maximum information content. When the process has a reference ontology, Onto_set(t) is ordered in a different way:
* Onto_set1(t)=reference ontology.
* The rest of Onto_set(t) is a non-empty ordered set containing the ontologies belonging to IDO(t) and holding that Cardinal(PMHRDi(t)) ≥ Cardinal(PMHRDj(t)) for i < j, such that 1 < i, j ≤ Cardinal(M-children(root(IDO(t)))), since Onto_set1(t) is the reference ontology.
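The two orderings of Onto_set(t) described above can be sketched as follows. This is a minimal illustration, assuming each ontology is represented only by the set of concept names in its PMHRD; the function name `order_onto_set` and its parameters are hypothetical, and the concept sets are those listed for experts A and B in the instantiation example.

```python
from typing import Dict, List, Optional, Set


def order_onto_set(ontologies: Dict[str, Set[str]],
                   reference: Optional[str] = None) -> List[str]:
    """Return ontology identifiers ordered as Onto_set(t).

    Without a reference ontology, ontologies are sorted by decreasing number
    of concepts (maximum information content). With a reference ontology,
    it is placed first and the rest keep the decreasing order.
    """
    ranked = sorted(ontologies, key=lambda k: len(ontologies[k]), reverse=True)
    if reference is not None:
        if reference not in ontologies:
            raise KeyError(f"unknown reference ontology: {reference}")
        ranked.remove(reference)
        ranked.insert(0, reference)
    return ranked


pmhrd = {
    "A": {"publication", "newspaper", "magazine", "book", "article",
          "novel", "essay", "person", "staff", "student"},
    "B": {"publication", "newspaper", "magazine", "article",
          "people", "staff", "student"},
}

onto_set_default = order_onto_set(pmhrd)            # no reference ontology
onto_set_ref_b = order_onto_set(pmhrd, "B")         # expert B is the requester
personalized = onto_set_default[0]                  # PO(t) = Onto_set_1(t)
```

Because Python's `sorted` is stable, ontologies with the same number of concepts keep their input order; how ties should really be broken is not specified in the text.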
By taking the previous example into account, we can state that in this case Onto_set(t)={OA(t), OB(t)}, and therefore PO(t)=OA(t). Note that the ontology assigned to PO(t) will always be the reference ontology where one exists. If there is no reference ontology, the maximum information content principle is followed, as stated in Definition 14. If the integration request had been made by expert B, then PO(t) would have been OB(t) and Onto_set(t)={OB(t), OA(t)}.
A concept is an updated concept of another if the terms given to both concepts are the same and the specific attributes of the latter are a subset of the specific attributes of the former. In a formal fashion:
Definition 15: Updated concept
Let PMHRDpo(t) be the partial multiple hierarchical restricted domain corresponding to a personalized ontology until the instant t; let PMHRDi(t) be a partial multiple hierarchical restricted domain until the instant t corresponding to an incoming ontology; and let c1(t) and c2(t) be two concepts belonging to PMHRDi(t) and PMHRDpo(t), respectively. Then, c2(t) is said to be an updated concept of c1(t), written updated_concept(c1(t), c2(t)), if (NAME(c1(t))=NAME(c2(t))) and (SPE(c1(t)) ⊆ SPE(c2(t))), where NAME(x) stands for the term given to the concept x in PMHRDj(t), x ∈ {c1(t), c2(t)}, j ∈ {po, i}.
Let us suppose that c1(t) is the concept “staff” included in OA(t), whose set of specific attributes is {salary, position}, and c2(t) is a concept “staff” whose set of specific attributes is {salary, position, degree}. Then c2(t) is an updated concept of c1(t), since the conditions mentioned above are met.
Now, five new definitions are going to be introduced: personalized updated ontology (PUO), ambiguity-free ontology (AFO), semantic conflict-free ontology (SCFO), transformed ontology (TO) and transformed, integration-derived ontology (TIDO). So
far, an example of the application of each definition has been presented just after the definition, but this will not happen with the following definitions. Because there are dependencies among those definitions, and in order to use the same example across the section, their practical application will appear after the TIDO definition.
A personalized ontology PO(t) is said to be a personalized updated ontology if, given a concept from an incoming ontology, there is some concept in PO(t) that corresponds to a parent (concept) of that concept in the incoming ontology, such that the two are synonyms and the parent’s set of specific attributes is included in that of the concept of PO(t). (In this case, the concept of the incoming ontology will be incorporated as a taxonomic or mereological child concept of that concept of the personalized ontology.) Expressing this in a formal way, we obtain:
Definition 16: Personalized, updated ontology
Let PMHRDpo(t) be the partial multiple hierarchical restricted domain corresponding to a personalized ontology until t, written PO(t); let cpo(t) be a concept belonging to PMHRDpo(t); and let cin(t) be a concept belonging to a partial candidate incoming ontology, written Oin(t). Then, the personalized, updated ontology constructed from PO(t), cpo(t), cin(t) and Oin(t), written PUO(PO(t), cpo(t), cin(t), Oin(t)), is defined as a modification of PO(t) s.t. T-children(cpo(t))=T-children(cpo(t)) ∪ {cin(t)} (if more_knowledge(cpo(t), cin(t))=‘taxonomic’) or M-children(cpo(t))=M-children(cpo(t)) ∪ {cin(t)} (if more_knowledge(cpo(t), cin(t))=‘mereological’), where more_knowledge(cpo(t), cin(t))=‘taxonomic’ iff there exists c′in(t) ∈ PMHRDin(t) s.t. [c′in(t) ∈ T-parents(cin(t))] and updated_concept(c′in(t), cpo(t)); more_knowledge(cpo(t), cin(t))=‘mereological’ iff there exists c′in(t) ∈ PMHRDin(t) s.t.
[c′in(t) ∈ M-parents(cin(t))] and updated_concept(c′in(t), cpo(t)).
When searching for the correct allocation of the incoming concept cin(t) in the ontology PO(t), each concept cpo(t) must be checked because each concept potentially has a relationship with cin(t). PUO is the definition in charge of discovering the actual relationships between cin(t) and the concepts in PO(t). If a relationship is found between cpo(t) and cin(t), the latter will be added as a child of the former. These relationships are established when parent concepts of cin(t) are found in PO(t). This is indicated by the function updated_concept, since cin(t)’s parents in PO(t) will usually have been modified when they were included in PO(t).
Given a personalized ontology PO(t), it is said to be an ambiguity-free ontology given a concept cin(t) belonging to an incoming ontology if the following conditions hold: the term given to this concept incorporates [in Oin(t)] its ontology’s supplier (i.e. a user-node identifier) as a suffix; and PO(t) is an updated personalized ontology; and
the set of specific attributes of cin(t) are the specific attributes of cin(t) in the partial multiple hierarchical restricted domain corresponding to PO(t). We can also state the above as follows:
Definition 17: Ambiguity-free ontology
Let PMHRDpo(t) be a partial multiple hierarchical restricted domain corresponding to a personalized ontology until t, written PO(t); let cin(t) be a concept belonging to a partial multiple hierarchical restricted domain corresponding to a candidate incoming ontology, written Oin(t); and let cpin(t) be cin(t) in PMHRDpo(t). Then, the ambiguity-free ontology constructed from PO(t), cin(t) and Oin(t), written AFO(PO(t), cin(t), Oin(t)), is defined as a modification of PO(t) holding the following conditions:
(1) NAME(cpin(t))=NAME(cin(t)).
(2) SPE(cpin(t))=SPE(cin(t)).
(3) For all c(t) in PMHRDpo(t), PUO(PO(t), c(t), cin(t), Oin(t)),
where PUO(PO(t), c(t), cin(t), Oin(t)) stands for the personalized updated ontology constructed from PO(t), c(t), cin(t) and Oin(t).
Given a personalized ontology, written PO(t), we can define it as a semantic conflict-free ontology with respect to a concept, written cin(t), belonging to an incoming ontology if one of the following conditions holds.
* For every concept in PO(t), its set of specific attributes also contains the attributes of cin(t). (This occurs if all these concepts are equivalent to cin(t) from both the organizational structure and the attribute point of view.)
* PO(t) is an ambiguity-free ontology with respect to cin(t). (This occurs if there is a concept belonging to the partial multiple hierarchical restricted domain corresponding to PO(t) such that the terms given to both concepts are the same and both concepts are not equivalent from the attribute or the organizational structure point of view.)
Formally:
Definition 18: Semantic conflict-free ontology
Let PMHRDpo(t) be a partial multiple hierarchical restricted domain corresponding to a personalized ontology until t, written PO(t); let cin(t) be a concept belonging to a partial multiple hierarchical restricted domain until t, written PMHRDin(t), corresponding to a candidate incoming ontology, written Oin(t). Then, the semantic conflict-free ontology constructed from PO(t), cin(t) and Oin(t), written SCFO(PO(t), cin(t), Oin(t)), is defined as follows.
A modification of PO(t) holding that for all c(t) in PMHRDpo(t), SPE(c(t))=SPE(c(t)) ∪ SPE(cin(t)) if syn_concepts(c(t), cin(t))†.
A modification of PO(t) holding that SPE(c(t))=SPE(c(t)) ∪ SPE(cin(t)) if there exists c(t) ∈ PMHRDpo(t) s.t. [not (syn_concepts(c(t), cin(t)))] and (NAME(c(t))=NAME(cin(t))); or
† This condition would be more significant if the A-equivalency between concepts had not just been defined as a name comparison.
A modification of PO(t) holding that AFO(PO(t), cin(t), Oin(t)) if for all c(t) ∈ PMHRDpo(t), [not (syn_concepts(c(t), cin(t)))], where AFO(PO(t), cin(t), Oin(t)) stands for the ambiguity-free ontology constructed from PO(t), cin(t) and Oin(t).
Given a personalized ontology PO(t), it is said to be a transformed ontology obtained from PO(t) and an incoming ontology Oin(t) if it is a semantic conflict-free ontology with respect to every concept belonging to the partial multiple hierarchical restricted domain until the instant t corresponding to Oin(t). This can be expressed formally as follows.
Definition 19: Transformed ontology.
Let PO(t) be a personalized ontology until t and let PMHRDin(t) be the partial multiple hierarchical restricted domain until the instant t corresponding to an incoming ontology, written Oin(t). Then, the transformed ontology obtained from PO(t) and Oin(t), written TO(PO(t), Oin(t)), is defined as a modification of PO(t) holding that for all c(t) in PMHRDin(t), SCFO(PO(t), c(t), Oin(t)), where SCFO(PO(t), c(t), Oin(t)) stands for the semantic conflict-free ontology constructed from PO(t), c(t) and Oin(t).
Let us assume a personalized ontology PO(t) and a non-empty ordered set, whose cardinal is at least two, containing the ontologies belonging to an IDO(t). Then, TO(t) is said to be the transformed integration-derived ontology obtained from PO(t) if TO(t) is a transformed ontology obtained from itself and from every ontology within this set. Expressed formally, we arrive at the following definition:
Definition 20: Transformed integration-derived ontology.
Let IDO(t) be an instantiated, integration-derived ontology until the instant t; let Onto_set(t) be a non-empty ordered set containing the ontologies belonging to IDO(t) and holding that Cardinal(Onto_set(t)) ≥ 2 and that Cardinal(PMHRDi(t)) ≥ Cardinal(PMHRDj(t)) iff i < j, where PMHRDk(t) stands for the partial multiple hierarchical restricted domain corresponding to the kth element of Onto_set(t); and let PO(t) be a personalized ontology until t. Then, the transformed integration-derived ontology obtained from PO(t) and Onto_set(t), written TIDO(PO(t), Onto_set(t)), is defined as a modification of PO(t) holding that for all O(t) in Onto_set(t)\{PO(t)}, TO(PO(t), O(t)). Finally, we will assume that TIDO(PO(t), Onto_set(t))=PO(t) if Cardinal(Onto_set(t))=1, that is, PO(t) remains unmodified if there is only one ontology in Onto_set(t).
Let us identify each element of this definition in our example. IDO(t) is the ontology shown in Figure 17, and Onto_set(t) is the set of ontologies {OA(t), OB(t)}, which meets the ordering condition established. PO(t) is OA(t), since OA(t) is the reference ontology. Therefore, the ontology obtained when applying TIDO(OA(t), {OA(t), OB(t)}\{OA(t)}) is the ontology shown in Figure 18. This ontology is the result of applying TIDO({OB(t)}), that is, TO(PO(t), OB(t)) holds. The TO function is applied in this case only to OB(t). Therefore, SCFO is applied to each component of PMHRDB(t), which is the set {publication, newspaper, magazine, article, person (people), staff, student}.
Figure 18. The transformed, integration-derived ontology.
The result of each application of SCFO will be the inclusion of each concept of PMHRDB(t) in PO(t) in order to obtain the final transformed integration-derived ontology. The SCFO definition is applied when the system tries to include a concept of an ontology belonging to Onto_set(t) into PO(t). The first situation occurs when it is applied to a concept which has a synonymous concept in PO(t). In our example, that would happen when applying the definition to the concepts “publication”, “newspaper”, “magazine”, “article”, “person”, and “student” in PMHRDB(t). They all have synonymous concepts in PO(t), so that each concept needs only to be merged with its respective synonym. The second situation arises when the concept has the same name as one in PO(t) but they are not synonymous. This is the case of the concepts “staffB” and “staffPO”. Hence, the attributes of both concepts must be merged and the corresponding new relationships appended to PO(t). The third situation does not arise in this example and it corresponds to the occurrence of a concept that is new for PO(t), that is, it has neither a synonymous concept nor a concept with the same name in PO(t). Consequently, the AFO definition is applied in order to allocate this new concept in its adequate position in PO(t). This definition adds a new concept cpin(t) to PO(t). The name of the new concept will be the original one, NAME(cin(t)). Its specific attributes will be the same as those of the original concept. Its allocation in the ontology (given by its relationships) is decided by applying the PUO definition, which is applied once per concept in PO(t). Our example does not reflect this situation but it would have arisen if the integration request had been made by expert B. This example is shown in the next section.
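The three situations distinguished above can be summarized as a simple case analysis. The Python sketch below is only an illustration and not the system's implementation: the function name `scfo_case`, the returned labels, and the `synonym_in_po` table (assumed to have been produced beforehand by the synonym detection of the framework) are all hypothetical.

```python
from __future__ import annotations


def scfo_case(po_names: set[str], incoming_name: str,
              synonym_in_po: dict[str, str]) -> str:
    """Name the situation that applies to one incoming concept.

    `synonym_in_po` maps an incoming concept name to the name of its
    synonymous concept in PO(t), when such a synonym has been detected.
    """
    if synonym_in_po.get(incoming_name) in po_names:
        return "merge-with-synonym"   # first situation
    if incoming_name in po_names:
        return "merge-same-name"      # second situation: merge attributes and relations
    return "insert-new-concept"       # third situation: AFO, then PUO
```

Checking for a synonym before checking for name equality mirrors the order in which the situations are presented: a concept such as “staff”, which shares its name with a concept of PO(t) without being its synonym, falls through to the second case.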
4. Revisiting the example
This example attempts to show the integration and transformation process from a different and more intuitive viewpoint. It is based on the example used for illustrating
Figure 19. The instantiated, integration-derived ontology for this example.
the concepts introduced in the previous section but in this case expert B is in charge of requesting integration. The compatibility sets for the ontologies to be integrated are shown in Table 7. The ontologies used for the integration are OA(t) and OB(t). The integration-derived ontology for this case is the same as that which was obtained earlier and shown in Figure 16. In the case of the instantiated, integration-derived ontology, the situation changes since the vocabulary must be adapted to that of B. Figure 19 shows the instantiated, integration-derived ontology. According to Definition 14, PO(t) = OB(t) and Onto_set(t) = {OB(t), OA(t)}, given that OB(t) is the reference ontology in this case. The next step is to apply Definition 20 to obtain the transformed integration-derived ontology, for which TO (see Definition 19) is applied to PO(t) and Onto_set(t) = {OA(t)}. In this definition, the SCFO concept (Definition 18) is invoked for each concept in PMHRDA(t).
PMHRDA(t) = {publication, newspaper, magazine, book, article, novel, essay, person, staff, student}
Let us see how SCFO works for each concept. PO(t) is OB(t) (see Figure 20); c(t) is publicationA. By applying the SCFO(PO(t), c(t), OA(t)) function, the system searches in PO(t) for a synonymous concept of c(t). As a result of this search, the concept “publication” in PO(t) is considered a synonym of c(t). Thus, the first condition (see Definition 18) is triggered and PO(t) does not experience any change, provided that publicationA and publicationPO have the same attributes.
Figure 20. PO(t) at the beginning of the process.
For c(t) in {newspaperA, magazineA, articleA, personA, studentA} the process would be similar and the ontology would not be modified. At this point, PO(t) would still be OB(t). However, PO(t) is changed when the concepts {bookA, novelA, essayA and staffA} are dealt with.
SCFO(PO(t), book, OA(t)): there is neither a synonymous concept of book nor a concept with the name “book”, so the AFO definition must be invoked. AFO(PO(t), book, OA(t)): a new concept for PO(t) is generated so that its name is “book” and its set of specific attributes is {author, ISBN, style}. Then, the PUO definition (Definition 19) is required to allocate the concept in PO(t). The only PUO call that modifies PO(t) is PUO(PO(t), publicationpo, bookA, OA(t)), because “publication” is a taxonomic parent of “book” in OA, and publicationpo is an updated concept of publicationA. Therefore, the new concept bookpo must be linked to publicationpo as its taxonomic child. The result of this step is shown in Figure 21.
SCFO(PO(t), novel, OA(t)): there is neither a synonymous concept of novel nor a concept with the name “novel”, so the AFO definition must again be invoked. AFO(PO(t), novel, OA(t)): a new concept for PO(t) is generated, its name being “novel” and its set of specific attributes being {chapters}. As before, the PUO definition is invoked and PO(t) is modified by PUO(PO(t), bookpo, novelA, OA(t)), because “novel” is a taxonomic child of “book” in OA, and bookpo is an updated concept of bookA. Therefore, the new concept novelpo must be linked to bookpo as its taxonomic child. The process would be similar for SCFO(PO(t), essay, OA(t)), so this step will not be shown. The result of these two steps is shown in Figure 22.
Finally, there is one more concept to process, staffA. SCFO(PO(t), staff, OA(t)): there is a concept in PO(t) with the name “staff”, so both concepts must be merged.
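The insertion of “book” and “novel” described above can be sketched as follows. This is a simplified illustration under several assumptions, not the AFO and PUO definitions themselves: the dictionary layout, the function name `insert_new_concept` and the `updated` table (which records, for each concept of the incoming ontology, its updated counterpart in PO(t)) are hypothetical, only taxonomic parent links are handled, and the attribute set of “publication” is left empty because it is not listed in the text.

```python
from __future__ import annotations

Concept = dict  # assumption: {"attributes": set of names, "parents": set of names}


def insert_new_concept(po: dict[str, Concept], incoming: dict[str, Concept],
                       name: str, updated: dict[str, str]) -> None:
    """Copy a new concept into PO(t) and link it under its parents' counterparts."""
    source = incoming[name]
    # AFO-like step: same name and same specific attributes as the original concept.
    po[name] = {"attributes": set(source["attributes"]), "parents": set()}
    # PUO-like step: link only to parents that already have a counterpart in PO(t).
    for parent in source["parents"]:
        counterpart = updated.get(parent)
        if counterpart in po:
            po[name]["parents"].add(counterpart)
    updated[name] = name  # the new concept is now its own counterpart


# Fragment of OA(t) used in the walkthrough (placeholder attributes for "publication").
o_a = {
    "publication": {"attributes": set(), "parents": set()},
    "book": {"attributes": {"author", "ISBN", "style"}, "parents": {"publication"}},
    "novel": {"attributes": {"chapters"}, "parents": {"book"}},
}
po = {"publication": {"attributes": set(), "parents": set()}}
updated = {"publication": "publication"}
for concept_name in ("book", "novel"):
    insert_new_concept(po, o_a, concept_name, updated)
```

Processing “book” before “novel” matters in this sketch, since “novel” can only be linked once bookpo exists in PO(t); this matches the order followed in the walkthrough.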
Figure 21. PO(t) after inserting the concept book.
Figure 22. PO(t) after processing novel and essay.
Before merging, SPE(staffpo) = {salary, degree} and SPE(staffA) = {salary, position}. After merging, SPE(staffpo) = {salary, degree, position}. Both concepts have the same relationships in their respective ontologies, so no new relations need to be added to PO(t). The transformed integration-derived ontology has finally been obtained and it is shown in Figure 23.
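The merge of the specific attributes of the two “staff” concepts amounts to a set union. The short Python sketch below reproduces the figures given above; the function name `merge_specific_attributes` is hypothetical and is used only for illustration.

```python
def merge_specific_attributes(po_attributes: set, incoming_attributes: set) -> set:
    """Return the union of the specific attributes of two same-name concepts."""
    return po_attributes | incoming_attributes


spe_staff_po = {"salary", "degree"}    # SPE(staffpo) before merging
spe_staff_a = {"salary", "position"}   # SPE(staffA)
merged = merge_specific_attributes(spe_staff_po, spe_staff_a)
```

The shared attribute “salary” appears once in the result, so the merged concept carries the three attributes salary, degree and position.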
Figure 23. The transformed, integration-derived ontology.
5. Comparison with other ontological engineering tools
In Duineveld, Stoter, Weiden, Kenepa and Benjamins (2000) some ontological engineering tools were evaluated according to different parameters such as interfaces, ontological facilities and cooperation facilities. The systems evaluated were Ontolingua (Rice, Farquhar, Piernot & Gruber, 1996; Farquhar et al., 1997), WebOnto (Domingue, 1998), Protégé-Win (Eriksson, Fergeson, Shahar & Musen, 1999), OntoSaurus (ISX, 1991), and ODE (Fernández, Gómez-Pérez, Pazos J. & Pazos A., 1999). The information in this section is based on the results shown in the aforementioned paper. Let us compare our framework and system facilities with those provided by these tools.
5.1. ONTOLOGICAL FEATURES
This section compares the facilities offered by the previously named ontological tools with ours with respect to the development of domain ontologies. All of the above tools include the taxonomic relationship. Multiple inheritance is supported by all of them as well although, for instance, ODE does not include any inference engine so that inheritance can be performed by itself. Most of them allow the possibility of defining exhaustive and/or disjoint decompositions whereas our system does not provide this facility. The mereological relationship is not included in these tools “due to the non-consensus on the semantics of the part-of relation”. However, this relation is defined and employed in our framework, although a simple model of mereology has been used. The mereological relationship has been said to be composed of different
relations by authors such as Artale, Franconi, Guarino and Pazzi (1996) and Fridman-Noy and Hafner (1997). These authors state that there is no single part-of (part-whole) relation but several part-whole ones, such as Component/Object (e.g. engine/car), Member/Collection (e.g. tree/forest), Portion/Mass (e.g. slice/pie), Stuff/Object (e.g. steel/bike), Feature/Activity (e.g. grasping/stacking) and Place/Area (e.g. city/country). Part of our ongoing work is focused on extending our ontological model to cover all of these part-whole relations.
5.2. COOPERATION FACILITIES
In this section, the possibilities offered by the aforementioned tools for constructing ontologies cooperatively are analysed in relation to those presented here.
5.2.1. Ontolingua. Cooperation is seen in a different way in Ontolingua. Several users can edit the same ontology simultaneously and the changes made by one user are immediately visible to the rest of the users, whereas in our approach each user has a private ontology and cooperation is performed via the integration mechanism. With our approach, users do not need to be notified of the changes made by another user, but each user works independently of the rest so that their work is simplified. Ontolingua also provides a different user-dependent access policy. Furthermore, an ontology can be locked for editing by a user while still being available for browsing. In Ontolingua, accessibility constraints are imposed when the ontology is being modified. In our approach, these constraints have a different nature. In our system, there are different types of users, namely, experts and normal users. Normal users do not possess any ontology whereas expert users can possess ontologies on the different topics covered by the system.
5.2.2. WebOnto. The cooperation approach used in WebOnto is similar to the one used in Ontolingua in that users modify the same ontology but when one user is modifying it, other users can only browse through it (in Ontolingua, users receive notification of any modification but they are not locked as in WebOnto). Our approach is in some sense more appropriate for users, since they can modify their ontologies as soon as they have knowledge to append, modify or remove, without being locked by other users.
5.2.3. OntoSaurus. Here, users also cooperate to build the same ontology so that locking is necessary, although it provides the facility of sending emails to other system users.
The locking method is different from the one used by Ontolingua or WebOnto because the changes are not immediately available to the other users and they cannot browse through the ontology until the user who is modifying the ontology finishes the modifications. As a last point, we must point out that neither Protégé-Win nor ODE provides cooperative construction of ontologies.
6. Conclusion and discussion
In this paper, we have presented a framework that allows cooperative construction of ontologies; various (expert) agents work on the creation of an ontology from their particular contributions, which are meant to be “private” ontologies corresponding to each agent. Moreover, these agents can be geographically dispersed. The basic operation in this process is the integration of a set of predefined ontologies. In this sense, we must clarify that the term “ontology integration” must be understood in a different sense to that underlying the ontology inclusion term in the Ontolingua Server (Farquhar et al., 1997). In the Ontolingua Server, the term “ontology inclusion” is related to the reuse of ontologies in order to create other ones, so that the task of creating an ontology is less costly. The inclusion process is performed by a vocabulary translation in order to adapt the terminology used in the first ontology to that of the ontology into which it is to be included. Thus, this translation is reflected in the axioms, so that those belonging to one ontology can be included in the other one. However, in the present work, the choice of the most appropriate terminology to be included in the integration-derived ontology that a certain end-user (i.e. not an agent) will finally be shown is a principled, dynamic process. To be more precise, this choice is a function of various parameters: (1) the agent that requests integration; (2) any one ontology present in the integration-derived ontology has to be consistent with the rest of the ontologies present in the integration-derived ontology; (3) no single ontology present in the integration-derived ontology can be redundant with the rest of the ontologies present in the integration-derived ontology; and (4) the amount of knowledge that a certain ontology contains.
The fourth parameter is evaluated by the maximum information content introduced in this paper, while the second and third are methods used to solve some of the most significant problems with cooperative work, inconsistency and redundancy, respectively. The first parameter can be split into two parts, namely, framework-side and user-side. The framework-side is only triggered when the user-side does not establish any constraint concerning the terminology to use, so that the framework applies the maximum information content principle. The user-side part is in charge of deciding which terminology to use. Currently, the only facility that the system provides for this is the use of the terminology of the expert user who applied for integration as the reference terminology. Therefore, an increase in the client facilities, such as allowing the requesting user to choose the terminology to use from the terminology used by other experts, should be provided. Euzenat (1996) has presented a system for collaborative construction of Consensual Knowledge Bases. Such a system is based on peer-reviewed journals: before some piece of knowledge is introduced into the Knowledge Base, it must be submitted to, and accepted by, a given community. In order to achieve this, a protocol for submitting knowledge is defined. In this protocol, a consensual response is required from the agents in the community before accepting changes in the Knowledge Base. This consensus principle guarantees the consistency of the introduced knowledge and leads to collaborative dialogue among the experts. An important concern underlying this approach is that the community must use the same terminology. In our approach, this is overcome via a mechanism for synonymous concept management that allows each
agent to operate with his/her particular vocabulary. We consider that this is necessary in order to maintain linguistic efficiency during the Knowledge Acquisition phase. In García (1996), a system for building an ontology relative to a domain, which is divided into homogeneous subdomains (allowing several experts for each subdomain), is presented. This work is similar to Euzenat’s, in that the integration of ontologies is carried out in a hierarchical way. In García’s work, the problem of integration of similar concepts coming from different ontologies is treated. The detection and definition of such a type of concept relies on the carrying agent and implies that each agent knows the terminology used by the other agents, in order to establish a “bridge” between these concepts. The main problem of this approach is that the consensus necessary to establish such a “bridge” between two similar concepts increases the number of messages necessary for collaboration. On the other hand, this process implies that an agent must know the vocabulary of the other agents and examine the other ontologies. On the contrary, in the approach suggested in this paper, agents do not have to know the terminology employed by other agents, as each agent uses his or her own terminology. In this approach, the refinement of private ontologies is also possible as a result of feedback from the collaborative dialogue. Domain merging, approached via ontology merging, is the central point in Wiederhold’s (1994) work. In order to solve the problem of synonymous concepts, he proposes a similar framework to that presented in this article, in the sense that his approach maintains the original terminologies associated to the particular ontologies. However, it differs in the way that such conflicts are detected. In his work, it is assumed that terms never mean the same thing unless explicitly instructed.
If it is decided that two concepts mean the same thing, a rule must explicitly indicate this matching. This set of rules should be managed by a set of collaborators, who are supposed to know the whole terminology associated to a set of possible subdomains. In the system proposed in our work, the detection of synonymous concepts is performed by the system. Another difference with our approach is that Wiederhold’s merging is only focused on terms along the common parts of domains involved in the merging process. In order to achieve this, an algebra with binary operators (union, intersection and difference) is defined. Moreover, in our framework, the refinement of private ontologies is permitted as a result of the cooperative process whereas in Wiederhold’s approach this kind of refinement is not possible, since ontologies subject to the merging process are considered a final product. Visser and his colleagues (Visser, Jones, Bench-Capon & Shave, 1998) have also presented an approach to integrate different sources of knowledge, so that ontological mismatches are (manually) solved by means of a set of functions that play the role of Wiederhold’s matching rules. In Fridman-Noy and Musen (1999), a different approach for integrating ontologies is presented. In this approach, two different processes are introduced: merging and aligning. Merging is the process by which two ontologies are merged into a single ontology; additionally, both source ontologies have to cover similar or overlapping domains. In our work, every ontology must represent the same domain in order to be eligible for integration, although the semantics of both processes are similar. Ontological alignment implies linking concepts from both source ontologies. On the other hand, the SMART algorithm (Fridman-Noy & Musen, 1999; Musen & Fridman-Noy, 1999) merges concepts whose names match or are linguistically similar (for
example, Military-Unit and Modern-Military Unit). However, internal and structural properties are not considered. The same authors presented a new algorithm called PROMPT in Fridman-Noy and Musen (2000), which is the evolution of SMART and takes into account not only linguistic similarities but also the structure of the ontology in order to detect concepts suitable to be merged. In PROMPT-based systems, the user always takes the decision about which concepts are synonyms (and are therefore merged) because PROMPT is designed to support the merging process, while our system performs automatic ontological integration. In that work, an element similar to our reference ontology, called preferred ontology, can also be found and it is also used as a reference for solving conflicts. In the future, we want to increase the participation of users in the integration process by providing an interactive integration process as well as facilities for defining user preferences. Inconsistencies are another interesting issue to discuss. In Tamma and Bench-Capon (2001), two types of inconsistencies are specified: semantic and structural. In that paper, structural inconsistencies are those that arise due to differences in conceptual properties. Semantic inconsistencies are those caused by the difference in semantics and the granularity level of the representation. For us, ontological semantics include structure, so that all the inconsistencies found are semantic, although they can be drawn from a structural (relations-based) point of view or from an attribute point of view. The work presented in this article is also similar to APECKS (Tennison & Shadbolt, 1998). In this approach, all agents (i.e. knowledge engineers as well as experts) are shown the different ontologies in order to compare them. However, in APECKS all of the ontologies are shown to any user, so that any user consulting a given ontology (depending on his or her familiarity with the topic being treated) may become confused.
This confusion may be caused by the (eventual) complexity and inconsistency of the large number of concepts that he or she may be shown. In our approach, this only occurs automatically among expert-nodes (those who supply ontologies), while end-users (those who only wish to learn or know something about a given topic) are shown only one (system-constructed) ontology in order to avoid confusion. The system proposed by Nakata (Nakata, Voss, Juhnke & Kreifelts, 1998) presents a similar problem. In Nakata’s system, there is no mention of possible restrictions on user access to the system (or parts of it) in terms of security, so that external (non-authorized) users could modify knowledge in a non-desirable manner for suppliers of (collaborative) knowledge. In the system proposed here, this problem is overcome, since end-users can only consult, but not modify, the integration-derived ontology and user-nodes can only see/modify their own ontologies. Another significant characteristic of the approach suggested here is that, through cooperative dialogue provided by the aforementioned framework, agents can consult the current contents of a cooperatively constructed ontology at a given instant in order to collect as much information as possible when building their respective ontologies. The synonymous concept management adapts the information required by the agent under question to his or her own terminology. In other words, each agent has his or her particular view of the time-dependent evolution of the integration-derived ontology. Our notion of candidate ontologies is different to that presented in Pinto and Martins (2001). In that approach, candidate ontologies are ontologies not yet taking part in the
integration process whereas in our approach, candidate ontologies are ontologies that take part in such a process. However, in both cases some candidate ontologies might be discarded, and would therefore not take part in the activity of constructing the resulting ontology, although both approaches have different criteria for discarding candidates. In Shaw and Gaines (1989), conceptual systems are compared. The integration and transformation process is partly a comparison process but both approaches are significantly different. In the previously cited approach, conceptual systems are built and agreed upon by experts taking part in the process. Then, concepts have attributes assigned to them, following which each attribute is individually valued by each expert. After that, the experts exchange and compare their results. As a result of this comparison, the concepts are grouped into four categories, namely, consensus, conflict, correspondence and contrast. Then, the conflicts must be solved or flagged as irreparable. This is a manual process and requires active participation by the experts. On the other hand, our work makes the comparison of conceptual systems (ontologies in our case) automatic. In our framework, the four relationships can be distinguished. Consensus is included into our framework as the equivalence of concepts; irreparable conflicts are the source of inconsistencies in our framework; correspondence is the case of synonymy; and finally, contrast appears when experts define independent parts of the same domain. For us, the unique expert task is to create knowledge, and the system is in charge of comparing this knowledge to other experts’ knowledge. The way in which concepts are categorized also differs. In Shaw and Gaines (1989) categorization is based on the quantitative result of the comparison process whereas our decisions are made on the basis of qualitative factors, such as the existence of common attributes or common relationships. 
The framework described in this paper could also be used in the cooperative work-based development of corporate memories, as defined in Dieng, Corby, Giboin and Ribiere (1998). To accomplish this task, each corporation department can be viewed as a particular supplier of the ontology concerning information produced in that department. Thus, the framework proposed permits the fusion of different ontological models. Moreover, the use of the ontology view concept allows the vocabulary employed in a global corporate memory to adapt to that which is utilized in a particular department, without imposing any predetermined common vocabulary. Regarding future work, it may be observed that the approach presented in this work presents a limitation concerned with attribute management problems. For example, (ontology) problems involving synonymous attributes when an ontology is generated from the integration of other ontologies in the stated manner should be subject to treatment. Another possible research line related to attribute management is the possibility that attributes could be defined through structures, that is, (name-of-attribute, value)-style pairs. We think that this would enrich the performance of the system since a greater variety of (real) problems could be contemplated. This would also permit the establishment of numerical similarity measurements between attributes and/or concepts, in line with those adopted in Shaw and Gaines (1989). By dealing with the definition of organizational structure-based inconsistency between two concepts, it could be said that we have given a conservative definition. The reason for affirming this can be explained by looking at the organizational (sub)structures in Figure 24.
Figure 24. Two organizational structures.
It could be thought that the identifier C is defined in a different way in each (mereological) ontology, but with C alluding to the same concept. The problem is how to detect such a situation by using the name equality-based criterion to decide on the (attribute-based) equivalence between two concepts. In addition to this, we may find a real context where C represents a polysemic concept so that C is defined with a different meaning in each ontology (concerning the same topic) although it is the same word. In this case, it is obvious that they are different concepts and that the ontologies are inconsistent; the system should therefore not allow for the integration of these concepts. The lack of an ontology editor is another system constraint because all of the modifications that users wish to make must be performed outside the system. This implies that, apart from the graphical visualization of the ontologies, information is exchanged between user and system via files. However, an ontological editor prototype that will simplify editing is already available (though not yet fully tested). An alternative solution would be to make use of the ontology construction facilities that already exist. We also plan to extend the approach in order to contemplate more (real) situations than the present ones for mereological organizations in the current system. Thus, transitivity should be allowed for some domains. For instance, the system should infer that if “a finger is a part of a hand” and “a hand is a part of an arm”, then “a finger is a part of an arm”. Also, a fusion of other mereological practical theories (see, for example, Borst, 1997) with that which is put forward in this work should be subject to study, in order to augment system performance. Another plausible improvement could be the division of the mereology used here into different types of part-of relations (Artale et al., 1996; Fridman-Noy & Hafner, 1997).
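The transitive inference mentioned above for part-of relations could be sketched as follows. This is only an illustration of the intended extension, which is not part of the current system; the function name `part_of_closure` and the representation of the relation as (part, whole) pairs are assumptions, and the closure would only be meaningful in domains where transitivity is admitted.

```python
from __future__ import annotations


def part_of_closure(direct: set[tuple[str, str]]) -> set[tuple[str, str]]:
    """Return the transitive closure of a set of (part, whole) pairs."""
    closure = set(direct)
    while True:
        # Chain every (a, b) with every (b, d) to infer (a, d).
        inferred = {(a, d) for (a, b) in closure for (c, d) in closure if b == c}
        if inferred <= closure:
            return closure  # fixed point reached: nothing new can be inferred
        closure |= inferred


direct_parts = {("finger", "hand"), ("hand", "arm")}
all_parts = part_of_closure(direct_parts)
```

For the finger, hand and arm example, the only inferred pair is (finger, arm), and the two stated relations are preserved unchanged.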
Furthermore, the inclusion of new types of relationships such as topology, temporal precedence, causality, similarity, dependence, etc., is currently under scrutiny, since we anticipate that these will enhance our ontological model and the integration framework, in addition to producing a more realistic system. On the other hand, another interesting topic is the integration of the system with existing ontology servers. At present the system only handles its own format; the intention is to extend it to manage different ontological specification languages, which would make possible the integration of ontologies from different ontology servers, such as the Ontolingua server. This option is, however, still unavailable. As stated in Klein (2001), several approaches seek to overcome mismatches at different levels (i.e. ontological, linguistic, terminological and so on) although there are other important problems such as ontology versioning. An extension for managing ontology versioning would also be an interesting feature to add to the system. Finally, we agree with the authors in Guarino and Welty (2000) on the need for cleaner
taxonomies. Our taxonomies are organized according to the notion of specializations. However, we believe that this is not enough to achieve clean taxonomies; hence we are planning to add formalisms like those suggested in Guarino and Welty (2000) in order to accomplish such a task.
This paper has been possible thanks to the financial support of the Séneca Foundation, Coordination Centre for Research, through the Séneca Program (FPI). Thanks go to Richard Barbosa at the University of Murcia for reading over the final draft of this article and for offering some useful suggestions. The research has been performed under projects FIT-150200-2001-320, FIT-070000-2001-785, and Seneca PL/3/FS/00.
References
Abecker, A., Aitken, S., Schmalhofer, F. & Tschaitschain B. (1998). KARATEKIT: tools for the Knowledge-Creating Company. Proceedings of the 11th Banff Knowledge Acquisition for Knowledge Based Systems Workshop, Vol. 2, pp. KM-1.1-KM-1.18. Banff, Canada.
Artale, A., Franconi, A., Guarino, N. & Pazzi, L. (1996). Part-whole relations in object-centered systems: an overview. Data and Knowledge Engineering. Amsterdam: North Holland, Elsevier (October, 2).
Benjamins, V. R. & Fensel, D. (1998). Community is knowledge! in (KA)2. Proceedings of the 11th Banff Knowledge Acquisition for Knowledge Based Systems Workshop, Vol. 2, pp. KM-2.1-KM-2.18. Banff, Canada.
Borst, W. N. (1997). Construction of engineering ontologies for knowledge sharing and reuse. Ph.D. Thesis, University of Twente, Enschede, The Netherlands.
Crow, L. & Shadbolt, N. (2001). Extracting focused knowledge from the semantic web. International Journal of Human-Computer Studies, 54, 155–184.
Dieng, R., Corby, O., Giboin, A. & Ribiere, M. (1998). Methods and Tools for Corporate Knowledge Management. Proceedings of the 11th Banff Knowledge Acquisition for Knowledge Based Systems Workshop, Vol. 2, pp. KM-3.1-KM-3.20. Banff, Canada.
Domingue, J. (1998). Tadzebao and WebOnto: discussing, browsing, and editing ontologies on the web. Proceedings of the 11th Workshop on Knowledge Acquisition, Modelling and Management, KAW’98. Banff, Canada.
Duineveld, A. J., Stoter, R., Weiden, M. R., Kenepa, B. & Benjamins, V. R. (2000). WonderTools? A comparative study of ontological engineering tools. International Journal of Human-Computer Studies, 52, 1111–1133.
Eriksson, H., Fergeson, R., Shahar, Y. & Musen, M. A. (1999). Automatic generation of ontology editors. Proceedings of 12th Banff Workshop on Knowledge Acquisition, Modelling and Management: 4.4. Banff, Canada.
Eschenbach, C. & Heydrich, W. (1995). Classical mereology and restricted domains. International Journal of Human-Computer Studies, 43, 723–740.
Euzenat, J. (1996). Corporate memory through cooperative creation of knowledge based systems and hyper-Documents. In Proceedings of the 10th Workshop on Knowledge Acquisition, Modelling and Management, pp. 36.1–36.20. Banff, Canada.
Farquhar, A., Fikes, R. & Rice, J. (1997). The Ontolingua Server: a tool for collaborative ontology construction. International Journal of Human-Computer Studies, 46, 707–727.
Fensel, D., Decker, S., Erdmann, M. & Studer, M. (1998). Ontobroker: or how to enable intelligent access to the WWW. Proceedings of the 11th Banff Knowledge Acquisition for Knowledge Based Systems Workshop, Vol. 2, pp. SHARE-8.1-SHARE-8.18. Banff, Canada.
Fernández, M., Gómez-Pérez, A., Pazos, J. & Pazos, A. (1999). Building a chemical ontology using methontology and the ontology design environment. IEEE Intelligent Systems, 41, 37–46.
Fernández-Breis, J. T. & Martínez-Béjar, R. (2000a). A cooperative tool for facilitating knowledge management. Expert Systems with Applications, 18, 315–330.
Fernández-Breis, J. T. & Martínez-Béjar, R. (2000b). A web-based framework for integrating knowledge. In: Industrial Knowledge Management: A Micro-level Approach, 123–138.
Fridman-Noy, N. & Hafner, C. D. (1997). The state of the art in ontology design. A survey and comparative review. AI Magazine, Fall, 18, 53–97.
Fridman-Noy, N. & Musen, M. A. (1999). An algorithm for merging and aligning ontologies: automation and tool support. 16th National Conference on Artificial Intelligence, Workshop on Ontology Management, Orlando, FL.
Fridman-Noy, N. & Musen, M. A. (2000). PROMPT: algorithm and tool for automated ontology merging and alignment. 17th National Conference on Artificial Intelligence, Austin, Texas.
Garcia, C. (1996). Cooperative building of an ontology within multi-expertise framework. Proceedings of COOP’96, pp. 435–454. Juan-les-Pins, France.
Guarino, N. & Welty, C. (2000). A formal ontology of properties. In R. Dieng & O. Corby, Eds. Knowledge Engineering and Knowledge Management: Methods, Models and Tools, Lecture Notes in Artificial Intelligence 1937, pp. 97–113. Berlin: Springer-Verlag.
Hameed, A., Sleeman, D. & Preece, A. (2001). Detecting mismatches among experts’ ontologies acquired through knowledge elicitation. Proceedings of the 21st SGES International Conference on Knowledge Based Systems and Applied Artificial Intelligence. Cambridge, UK.
Klein, M. (2001). Combining and relating ontologies: an analysis of problems and solutions. Proceedings of the International Joint Conference on Artificial Intelligence. Seattle, Washington, USA.
ISX Corporation (1991). LOOM Users Guide, version 1.4.
Martínez-Béjar, R., Benjamins, V. R. & Martín-Rubio, F. (1997). Designing operators for constructing domain knowledge ontologies. In E. Plaza & R. Benjamins, Eds. Knowledge Acquisition, Modelling and Management, Lecture Notes in Artificial Intelligence, pp. 159–173. Germany: Springer-Verlag.
McGuinness, D. L., Fikes, R., Rice, J. & Wilder, S. (2000). An environment for merging and testing large ontologies. In A. Cohn, F. Giunchiglia & B. Selman, Eds. KR2000: Principles of Knowledge Representation and Reasoning, pp. 483–493. San Francisco, USA.
Miller, G. A. (1990). WORDNET: an online lexical database. International Journal of Lexicography, 3–4, 235–312.
Musen, M. A. & Fridman-Noy, N. (1999). SMART: automated support for ontology merging and alignment. Proceedings of the 12th Banff Workshop on Knowledge Acquisition, Modelling, and Management. Banff, Alberta, Canada.
Nakata, K., Voss, A., Juhnke, M. & Kreifelts, T. (1998). Knowledge extraction, integration and management. Proceedings of the Second International Conference on Practical Aspects of Knowledge Management (PAKM’98), pp. 20-1–20-11. Basel, Switzerland, 29–30 October.
Pinto, H. S. & Martins, J. P. (2001). Ontology integration: how to perform the process. Proceedings of the International Joint Conference on Artificial Intelligence. Seattle, Washington, USA.
Reimer, U. (1998). Knowledge integration for building organizational memories. Proceedings of the 11th Banff Knowledge Acquisition for Knowledge Based Systems Workshop, Vol. 2, pp. KM-6.1–KM-6.20. Banff, Canada.
Rice, J., Farquhar, A., Piernot, P. & Gruber, T. (1996). Lessons learned using the web as an application interface. Proceedings of the CHI’96 Conference on Human Factors in Computing Systems, pp. 103–110. ACM, Vancouver, BC, Canada.
Shadbolt, N. R., O’Hara, K. & Crow, L. (1999). The experimental evaluation of knowledge acquisition techniques and methods: history, methods, and new directions. International Journal of Human-Computer Studies, 51, 729–755.
Shaw, M. L. G. & Gaines, B. R. (1989). A methodology for recognizing conflict, correspondence, consensus and contrast in a knowledge acquisition system. Knowledge Acquisition, 1, 341–363.
Simons, P. (1987). Parts: A Study in Ontology. Oxford: Clarendon Press.
Tamma, V. A. M. & Bench-Capon, T. J. M. (2001). A knowledge model to support inconsistency management when reasoning with shared knowledge. Proceedings of the International Joint Conference on Artificial Intelligence. Seattle, Washington, USA.
Tennison, J. & Shadbolt, N. (1998). APECKS: a tool to support living ontologies. Proceedings of the 11th Workshop on Knowledge Acquisition, Modelling and Management. Banff, Canada.
Van Heijst, G., Schreiber, A. T. & Wielinga, B. J. (1997). Using explicit ontologies in KBS development. International Journal of Human-Computer Studies, 45, 183–292.
Visser, P. R. S., Jones, D. M., Bench-Capon, T. J. M. & Shave, M. J. R. (1998). Assessing heterogeneity by classifying ontology mismatches. In N. Guarino, Ed. Formal Ontology in Information Systems, Proceedings FOIS’98, Trento, Italy, pp. 148–162. Amsterdam, The Netherlands: IOS Press.
Wiederhold, G. (1994). Interoperation, mediation and ontologies. Proceedings of the FGCS’94 Workshop on Heterogeneous Cooperative Knowledge-Bases, pp. 33–48. Tokyo, Japan, December 1994.
Appendix

Ontology integration (O-int) algorithm
Input: $O_{init}(t)$, $O_{in}(t)$
Output: $IDO_{|Compatibles_{max}(t)|+1}(t)$
$IDO_1(t) = \text{Initialization Ontology}(O_{init}(t), O_{in}(t))$
$Candidates(t) = \{\{O_i(t),\ i = 1, 2, \ldots, n\} \cup \{O_j(t),\ j = 1, 2, \ldots, m\},\ O_i \in O_{in}(t),\ O_j \in O_{init}(t)\ \text{s.t.}\ O_j \notin Candidates(t)\ \text{if exists}\ O_i\ \text{s.t.}\ NAME(O_i) = NAME(O_j)\}$
$Compatibles_i(t) = \{O_i(t)\} \cup \{O_k\ \text{s.t.}\ \text{compatible ontologies}(O_i, O_k),\ i = 1, 2, \ldots, |Candidates(t)|,\ O_k \in Candidates(t)\}$
$max = \text{Select}(Compatibles(t))$
$IDO_{i+1}(t) = \{UO(IDO_i(t), O_l(t)),\ O_l(t) \in Compatibles_{max}(t),\ i = 1, 2, \ldots, |Compatibles_{max}(t)|\}$

Initialization Ontology algorithm
Inputs: $O_{init}(t)$, $O_{in}(t)$
Output: $O_{init}(t)$
$root(O_{init}(t)) = topic$
$\text{M-children}(root(O_{init}(t))) = 1$
$ATT(root(O_{init}(t))) = \{\text{number of active user nodes}\}$
$root(O_i(t)) = root(O_i(t))\ \text{user node}_i,\ i = 1, 2, \ldots, |O_{in}(t)|$

Select function
This is a heuristic function that selects the most promising group of compatible ontologies according to a specific criterion, such as the group with the highest number of ontologies or the group with the highest number of concepts.
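The candidate, grouping and selection steps of the O-int algorithm can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Ontology` record, the `compatible` predicate and the `unite` function are hypothetical stand-ins for the paper's ontology model, its compatibility test and the UO operation, and the Select heuristic is assumed to be "largest group", one of the criteria named above.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, List


@dataclass(frozen=True)
class Ontology:
    # Hypothetical, flattened ontology: a name and a set of concept names.
    name: str
    concepts: FrozenSet[str]


def candidates(o_in: List[Ontology], o_init: List[Ontology]) -> List[Ontology]:
    # Every incoming ontology is a candidate; a stored ontology is kept
    # only if no incoming ontology carries the same name.
    incoming_names = {o.name for o in o_in}
    return list(o_in) + [o for o in o_init if o.name not in incoming_names]


def compatible_groups(cands: List[Ontology],
                      compatible: Callable[[Ontology, Ontology], bool]
                      ) -> List[List[Ontology]]:
    # One group per candidate: the candidate itself plus every other
    # candidate that the supplied predicate declares compatible with it.
    return [[o_i] + [o_k for o_k in cands
                     if o_k is not o_i and compatible(o_i, o_k)]
            for o_i in cands]


def select(groups: List[List[Ontology]]) -> int:
    # Assumed heuristic: index of the group with the most ontologies.
    return max(range(len(groups)), key=lambda i: len(groups[i]))


def integrate(start: Ontology, group: List[Ontology],
              unite: Callable[[Ontology, Ontology], Ontology]) -> Ontology:
    # Fold the selected group into the running integration-derived ontology.
    result = start
    for o in group:
        result = unite(result, o)
    return result
```

A caller would supply its own compatibility test and union, for example treating two ontologies as compatible when they share a concept and uniting them by merging their concept sets.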
Transformed integration-derived ontology algorithm
Input: $OntoSet(t)$
Output: $TIDO_{|OntoSet(t)|}$
$PO_1(t) = OntoSet_1(t)$
$TIDO_1(t) = PO_1(t)$
$TIDO_{i+1}(t) = TO(PO_i(t), O_i(t)),\ i = 1, 2, 3, \ldots, |OntoSet(t)|$

Transformed ontology algorithm
Inputs: $PO(t)$, $O(t)$
Output: $PO_{|O(t)|}(t)$
$PO_{i+1}(t) = SCFO(PO_i(t), c_i(t), O(t)),\ i = 1, 2, \ldots, |O(t)|,\ c_i(t) \in O(t)$

Overload operator
Syntax: $A(t)\ B(t)$, where $A$ stands for an ontology and $B$ for a set of concepts.
Semantics: it generates a new ontology $C(t)$. For all $c(t) \in C(t)$:
$c(t) \in B(t)$ if there exists $c'(t) \in A(t)$ such that $NAME(c(t)) = NAME(c'(t))$;
$c(t) \in A(t)$ otherwise.

Semantic conflict-free ontology algorithm
Inputs: $PO(t)$, $c_{in}(t)$, $O_{in}(t)$
Output: $PO'(t)$
$PO'(t) = PO(t)\ \{$
$\quad \{c_{i1}(t)\ \text{s.t. exists}\ c_{i2}(t) \in PMHRD_{po}(t)\ \text{s.t.}\ (NAME(c_{i1}(t)) = NAME(c_{i2}(t))) \wedge \text{A equivalency}(c_{i2}(t), c_{in}(t)) \wedge \text{T equivalency}(c_{i2}(t), c_{in}(t))\}$ (condition 1)
$\quad$ XOR $\{c_{j1}(t)\ \text{s.t. exists}\ c_{j2}(t) \in PMHRD_{po}(t)\ \text{s.t.}\ (NAME(c_{j1}(t)) = NAME(c_{j2}(t))) \wedge \neg(\text{A equivalency}(c_{j2}(t), c_{in}(t)) \wedge \text{T equivalency}(c_{j2}(t), c_{in}(t))) \wedge NAME(c_{in}(t)) = NAME(c_{j2}(t))\}$ (condition 2)
$\quad$ XOR $\{AFO(PO(t), c_{in}(t), O_{in}(t))\ \text{if for all}\ c(t) \in PMHRD_{po}(t),\ \neg(\text{A equivalency}(c(t), c_{in}(t)) \wedge \text{T equivalency}(c(t), c_{in}(t))) \wedge (NAME(c_{in}(t)) \neq NAME(c(t)))\}$
$\}$
if condition 1 then
$c_{i2}(t) = c_{i1}(t)$
$SPE(c_{i1}(t)) = SPE(c_{i2}(t)) \cup SPE(c_{in}(t))$
if condition 2 then
$c_{j2}(t) = c_{j1}(t)$
$SPE(c_{j1}(t)) = SPE(c_{j2}(t)) \cup SPE(c_{in}(t))$

Ambiguity free ontology algorithm
Inputs: $PO(t)$, $c_{in}(t)$, $O_{in}(t)$
Output: $PO'(t)$
$PO'(t) = PO(t)\ \{c_{in}(t)\ \text{s.t.}\ NAME(c_{in}(t)) = NAME(c_{in}(t))\ UN_i\} \cup \{PUO(PO(t), c(t), c_{in}(t), O_{in}(t))\ \text{for all}\ c(t) \in PMHRD_{po}(t)\} \cup \{c_{pin}(t) \in PO(t)\ \text{s.t.}\ SPE(c_{pin}(t)) = SPE(c_{in}(t))\}$

Updated personalized ontology algorithm
Inputs: $PO(t)$, $c_{po}(t)$, $c_{in}(t)$, $O_{in}(t)$
Output: $PO'(t)$
$PO'(t) = PO(t)\ \{c'_{po}(t)\}$
where $c'_{po}(t) = c_{po}(t)$ and
$\text{T-children}(c'_{po}(t)) = \text{T-children}(c_{po}(t)) \cup \{c_{in}(t)\}$ if $\text{more knowledge}(c_{po}(t), c_{in}(t)) =$ ‘taxonomic’
$\text{M-children}(c'_{po}(t)) = \text{M-children}(c_{po}(t)) \cup \{c_{in}(t)\}$ if $\text{more knowledge}(c_{po}(t), c_{in}(t)) =$ ‘mereological’
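The updated personalized ontology step, together with the name-based replacement performed by the overload operator, can be sketched in Python as follows. This is an illustration under stated assumptions rather than the authors' code: the `Concept` record, the dictionary representation of an ontology and the `more_knowledge` callback are hypothetical, and `overload` only covers the case used here, in which a concept of the second operand replaces the same-named concept of the first.

```python
from dataclasses import dataclass, replace
from typing import Callable, Dict, FrozenSet


@dataclass(frozen=True)
class Concept:
    # Hypothetical concept record: children are stored by name.
    name: str
    t_children: FrozenSet[str] = frozenset()  # taxonomic children
    m_children: FrozenSet[str] = frozenset()  # mereological children


Ontology = Dict[str, Concept]  # assumed representation: concepts indexed by name


def overload(a: Ontology, b: Ontology) -> Ontology:
    # Each concept of `a` is replaced by the same-named concept of `b`
    # when one exists; otherwise the concept of `a` is kept.
    return {name: b.get(name, concept) for name, concept in a.items()}


def update_personalized(po: Ontology, c_po: Concept, c_in: Concept,
                        more_knowledge: Callable[[Concept, Concept], str]
                        ) -> Ontology:
    # Copy c_po and extend the child set indicated by more_knowledge,
    # then overload the personalized ontology with the updated copy.
    kind = more_knowledge(c_po, c_in)
    updated = c_po
    if kind == "taxonomic":
        updated = replace(c_po, t_children=c_po.t_children | {c_in.name})
    elif kind == "mereological":
        updated = replace(c_po, m_children=c_po.m_children | {c_in.name})
    return overload(po, {updated.name: updated})
```

The function returns a new dictionary and leaves its inputs untouched, which mirrors the distinction between $PO(t)$ and $PO'(t)$ in the appendix.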