Available online at www.sciencedirect.com
ScienceDirect
Procedia Computer Science 127 (2018) 426–435
www.elsevier.com/locate/procedia
The First International Conference On Intelligent Computing in Data Sciences
Towards a Novel and Generic Approach for OWL Ontology Weighting
Hasna Abioui∗, Ali Idarrou, Ali Bouzit, Driss Mammass
Laboratory IRF-SIC, University Ibn Zohr, Agadir, Morocco
Abstract
Semantic search is regarded, by web-related enterprises as well as by academic research, as a key technology, ensuring important improvements in terms of shared data understanding while leading to refined and targeted interpretations. Accordingly, ontologies are the focal asset for a well-functioning semantic search approach, given their ability to share, represent and reuse explicit and semantic domain specifications. Nowadays, a multitude of ontologies containing up to hundreds of thousands of concepts have been proposed. Thus, our challenge as researchers goes beyond conceptualizing or creating ontologies to being able to choose the fitting and suitable one, taking specific criteria into account. This paper falls within this context, as it presents a novel approach for weighting OWL ontologies in order to choose the most appropriate one from a set of proposed ontologies. Our approach takes into account not only the taxonomic structure but also the semantic aspect of the ontology. Furthermore, semantic relationships and specific concepts are favored, since they reflect the semantic richness of the ontology.
© 2018 The Authors. Published by Elsevier B.V.
This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/3.0/).
Selection and peer-review under responsibility of International Neural Network Society Morocco Regional Chapter.
Keywords: Ontologies; Ontology Weighting; Taxonomic Structure; Semantic Relationships
∗ Corresponding author. Tel.: +212-670-581-533.
E-mail address: [email protected]
1877-0509 © 2018 The Authors. Published by Elsevier B.V.
10.1016/j.procs.2018.01.140

1. Introduction
Inspired by ontology, the science referring to a fundamental branch of metaphysics in philosophy that studies the general properties of what exists, the ontology as an object appeared in the early 1990s and was adopted by the computer science community. Since then, the notion of ontology has become the focal point of many computer science research works in several fields, due to its unavoidable role as a means to share, represent and reuse explicit and semantic domain specifications. It is regarded as a key technology for the success not only of the historical communities that saw its birth, namely knowledge engineering and artificial intelligence, but also of various recent applications, essentially natural language processing, information retrieval and the semantic web, where the use of ontologies has become a very common practice. During the past few years, the conception and development of ontologies
are no longer limited to artificial intelligence laboratories: domain experts can now carry out such tasks themselves, since a variety of editors dedicated to creating and editing ontologies exist, such as Protégé¹, Swoop [1] and OntoEdit [2]. Hence, a multitude of domain and generic ontologies containing up to hundreds of thousands of concepts have been proposed. For instance, the biomedical community has produced large, standard and structured vocabularies offering a common biomedical terminology that can be shared and reused across various fields. Some of the most powerful ontologies in the clinical and biomedical domains are the SNOMED-CT project [3], which results from merging two other ontologies, SNOMED-RT and Clinical Terms; GALEN [4], the European project offering terminology reusability in clinical systems; and the Gene Ontology (GO) [5], a biological terminology gathering all biological processes concerning cells, molecules and their components. Thus, creating or having an ontology, regardless of the purpose of its use or the application in which it will take part, is no longer the challenge; having the fitting and suitable one for that purpose is. This is especially true when choosing between several ontologies for a specific domain, where finding the most consistent, coherent and rich one becomes the challenge. Against this background, some initiatives have stressed the importance of carefully choosing the most suitable ontology for a specific application, by setting indicators and criteria that must be taken into account when selecting an ontology. In [6], the authors argue that efficiently choosing the appropriate ontology depends on compliance with semantic web standards, the relevance of metadata and the quality of the vocabulary describing the contents.
Meanwhile, [7], [8] and [9] propose ontology evaluation and ranking tools based on either a set of terms representing the semantic content or a set of metrics describing various aspects and characteristics of ontologies. As can be noted, there are no established criteria to adopt when choosing an ontology: the choice may depend on the objectives behind the use of the ontology, or on its richness, as reflected by the number of concepts, the relations between them, axioms, structures and levels of granularity. Given all these points, and the huge number of ontologies existing for different fields, finding the best method for selecting an ontology, so as to facilitate the selection phase and make it more targeted, remains a priority. In this paper, we propose a new strategy aimed at evaluating the importance of an ontology compared to the others from the same application field. The approach is based on a novel OWL ontology weighting method, also presented here, which exploits all the information provided by the ontology, namely its structural and semantic data.
¹ Protégé is available at https://protege.stanford.edu/
The remainder of the paper is organized as follows. Section 2 sheds light on the general background of the research work and on related works. Section 3 defines the approach objectives, based on the limits of previous work, then describes the ontology weighting approach with preprocessing and implementation details. Finally, the paper ends with the description of the performed experiments.
2. Background and Related Work
This section provides background information on the main focal points of our research, namely the concept of ontology and ontology weighting, together with an overview of related works.
2.1. Ontology
In recent years, the use of ontologies has become more prevalent and commonplace, notably in heterogeneous environments, where they help manage and solve heterogeneity problems arising from concept misperception [10], generally due to different data structures and semantic misinterpretation. Thus, ontologies are described as practical, uniform conceptual models providing sets of knowledge presented as structured and representative data models of a specific domain. They allow communication, interchange and interoperability among diverse data systems, offer features that promote the reuse and sharing of data, and facilitate information retrieval by integrating semantics. All this serves to disambiguate the meanings of concepts in resources through the use of classes, instances of classes, relations between classes, axioms and, finally, functions.
Various definitions of the term ontology have been proposed in the literature by different scientific communities. [11] links this variety to the multiple inheritances of ontology, summarized in four inheritance types: (1) The philosophical inheritance, treating the study of being and its properties. (2) The logical inheritance,
which presents the ontology as a set of concepts linked by taxonomic relationships. (3) The terminological inheritance, shedding light on the difference between the term and the concept. (4) Finally, the inheritance based on the hierarchical structure of documents.
The basic and most popular definition adopted by researchers and used in the literature is the one proposed by Gruber in [12], later updated by [13], which defines an ontology as a formal and explicit specification of a shared conceptualization. This definition emphasizes three main keywords describing an ontology: formalization of the conceptualization, in order to avoid all ambiguities, which is why the ontology format should be machine readable and based on natural language; normalization, which gives more structure and meaning to the resource data; and, finally, sharing, which aims at interchanging knowledge by making it accepted by a group forming a community rather than by a few individuals.
Regarding the ontology structure, as stated before, it is presented as sets of domain knowledge organized in the form of concepts, or generic classes, linked through specialization and semantic relations. Hence, the notion of inheritance arises when moving from generic concepts to more specific ones, which retain the properties of the concepts above them. An ontology structure is formally defined by [14] and [15] as a set of five elements, $O = \{C, R, H^C, R_S, A_X\}$, where $C$ and $R$ represent concepts and relations respectively, $H^C$ is the taxonomy of concepts, $R_S$ defines the semantic relations, and $A_X$ is the set of axioms expressed in a description logic (DL) language.
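The five-element definition can be made concrete as a small data structure. The following Python sketch is only illustrative: the class name `Ontology`, its field names and the choice to treat R as the union of taxonomic and semantic links are our assumptions, not part of [14] or [15].

```python
from dataclasses import dataclass, field


@dataclass
class Ontology:
    """Illustrative container for O = {C, R, H^C, R_S, A_X}; field names are hypothetical."""

    concepts: set[str]                             # C: concept (class) names
    taxonomy: set[tuple[str, str]]                 # H^C: (subconcept, superconcept) pairs
    semantic_relations: set[tuple[str, str, str]]  # R_S: (subject, property, object) triples
    axioms: list[str] = field(default_factory=list)  # A_X: DL axioms, kept as opaque strings here

    @property
    def relations(self) -> set:
        """R, assumed here to be the taxonomic links plus the semantic links."""
        return set(self.taxonomy) | set(self.semantic_relations)


# Usage: a three-concept toy ontology with two subsumptions and one semantic link.
toy = Ontology(
    concepts={"Thing", "Person", "Pet"},
    taxonomy={("Person", "Thing"), ("Pet", "Thing")},
    semantic_relations={("Person", "has_pet", "Pet")},
)
print(len(toy.concepts), len(toy.relations))  # 3 3
```

Counting |R| this way (subsumptions plus semantic links) matches how relations are counted in the weighting formulas later in the paper.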
Based on those five elements, the ontology is a complete formal knowledge model that can be part of any information retrieval system and take part in its different phases, essentially: (1) when browsing a data collection, to obtain relevant navigation, since the semantic aspect is handled by analyzing concepts and the semantic relationships between them; (2) during indexing, where ontologies promote indexing quality, since they allow the indexed objects to be classified using the taxonomic relationships the ontology provides; (3) during semantic annotation, which aims to describe and further enrich the document content using formal structures and metadata defining the document knowledge domain; unlike indexing, where only the taxonomy is taken into account, semantic annotation considers all types of relationships, with the advantage given to semantic ones; (4) when analyzing requests, in order to interpret them semantically and then translate them using the appropriate language and conceptual model for the resources.
Three types of ontologies are distinguished. Upper-level ontologies contain general concepts that may concern various application domains, such as the BFO (Basic Formal Ontology) [16], DBpedia [17] and YAMATO [18] ontologies. Domain ontologies are specific and limited to one domain of knowledge, such as biomedical ontologies, music ontologies, etc. Finally, the third type, more particular and limited to a specific application field within a domain, comprises application ontologies. EFO (Experimental Factor Ontology)² is one application ontology limited to the biomedical field.
² Experimental Factor Ontology, available at https://www.ebi.ac.uk/efo/
2.2. Ontology weighting
Ontology weighting consists of assigning a weight to the ontology by weighting its components. It can be considered a relevant factor to evaluate the quality and consistency of the ontology. Furthermore, to compute concept similarities in general, whether the concepts belong to the ontology or to the document, adopting weighted concept-based similarity approaches is essential. Various methods aiming to weight ontologies or their concepts have been proposed. In our research context, ontology weighting with the aim of comparing and ranking ontologies based on weighted concepts, two strategies are retained. The first one has a probabilistic aspect and starts from the premise that, in a specialization hierarchy, a generic concept includes its specific ones; therefore, the latter have a lower likelihood than the generic concept. This explains why concept weights decrease when exploring the specialization hierarchy from the root, the most generic concept, towards the leaves, which represent the most specific concepts. Weighting based on the probabilistic model thus favors generic parents over specific concepts, which normally have a finer granularity. In doing so, an important part of the semantic informational content is ignored, since only generic concepts are kept, while choosing specific, finer-grained concepts should provide precise and targeted similarities or descriptions, depending on the aim of the comparison. The second, opposite strategy arose in that context: it favors specific concepts over generic ones and relies on the hierarchy depth and the paths separating concepts in order to compare
them. [19] adopts the first strategy, calculating a probabilistic weight for a child concept from the weight of its parent concept and the total number of its children; then, based on that probabilistic weight, it proposes a Bayesian-based approach to weight specific domain ontologies. [23] and [20] adopt the second strategy: they favor specific concepts, which correspond to leaves in a graph, by assigning coefficients that are incremented downward along the specialization hierarchy. So, when two concepts from the same hierarchical level are compared, the leaf always has a bigger coefficient than the concept having descendants, which expresses less information. The work presented in this paper is inspired by [23], with the difference that the latter compares ontologies by weighting only their concepts. Our method, in addition to weighting concepts, takes the ontology relations into account and distinguishes taxonomic relations from semantic ones, since we believe it is necessary to compare all ontology components in order to favor one ontology over others. All the works cited above restrict the ontology to a taxonomy of concepts, which means that the information content is extracted only from the specialization hierarchy; hence, they ignore other semantic components and features of the ontology, namely the semantic relationships between concepts and the description logic reasoning the ontology provides. [22] relies on a density factor to calculate concept weights, following the rule that the more a concept is related to other concepts, the more important it is and the better the interpretation and similarity it yields compared to other concepts. In that case, the concept weight is calculated from the number of relations going into and out of the concept, relative to the total number of ontology relations.
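The probabilistic strategy and the density factor of [22] can be contrasted on a toy hierarchy. In this Python sketch, the even split of a parent's weight among its children is an assumed concrete rule for the probabilistic strategy (the exact formula of [19] is not reproduced here), and both function names are hypothetical; the density function follows the description of [22] above.

```python
from collections import Counter


def probabilistic_weights(children: dict[str, list[str]], root: str) -> dict[str, float]:
    """First strategy (in the spirit of [19]). ASSUMED rule: a child's weight is its
    parent's weight divided by the parent's number of children (tree-shaped hierarchy)."""
    weights = {root: 1.0}
    stack = [root]
    while stack:
        parent = stack.pop()
        kids = children.get(parent, [])
        for kid in kids:
            weights[kid] = weights[parent] / len(kids)
            stack.append(kid)
    return weights


def density_weights(relations: list[tuple[str, str]]) -> dict[str, float]:
    """Density factor (in the spirit of [22]): incoming plus outgoing relations of a
    concept, divided by the total number of relations in the ontology."""
    degree = Counter()
    for source, target in relations:
        degree[source] += 1  # outgoing relation of `source`
        degree[target] += 1  # incoming relation of `target`
    total = len(relations)
    return {concept: count / total for concept, count in degree.items()}


children = {"Thing": ["Animal", "Plant"], "Animal": ["Dog", "Cat", "Cow"]}
print(probabilistic_weights(children, "Thing")["Dog"])  # 1/6: weight shrinks with depth
relations = [("Dog", "Animal"), ("Cat", "Animal"), ("Animal", "Thing")]
print(density_weights(relations)["Animal"])  # 1.0: the generic concept scores highest
```

Both toy outputs favor the generic concept (Animal) over its specializations, which is the behavior the following critique targets.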
Unlike the previously cited works, [22] does not take the ontology structure into account, although it is an important criterion since, as stated before, specific concepts should be favored over generic ones. For instance, a generic concept may have a higher density factor than a concept it subsumes. By ignoring the hierarchical structure, the generic concept is retained as the more representative and descriptive concept and receives the higher weight, instead of the specific concept, which must have the same relations and properties by logical reasoning, since each property of the generic concept is inherited by the specific concept. Another approach, proposed by [21], weights concepts rigidly, defining five levels of relevance between a document term and a concept, represented by five constant values: direct (1.0), strong (0.7), normal (0.4), weak (0.2) and irrelevant (0.0), which are determined by a Natural Language Processing method using term frequency and inverse document frequency (tf-idf).
3. Our proposed approach for OWL ontology weighting
As stated before, this work presents a novel and generic approach to weight the OWL ontologies of a repository, in order to compare them semantically and structurally and hence facilitate the choice of the appropriate ontology for a specific use in terms of richness and abundance. This choice and ordering must be done automatically, taking into account the following factors:
• The number of ontology concepts: the more concepts an ontology contains, the better it is able to represent, enrich and describe a resource content.
• The number of relations linking these concepts: not only the number of concepts reflects the importance of an ontology, but also the number of relations linking concepts to each other.
In fact, two concepts linked by a single relation form a poorer combination, in terms of information quality, than the same two concepts linked by many relations.
• The concept depth: a specific concept basically carries finer-grained information than a generic concept; hence, when choosing between a generic and a specific concept, the latter must be retained. In other words, specific concepts are favored.
• The type of relations linking the concepts: an ontology obviously goes beyond a taxonomy or a thesaurus in terms of the information it provides, which reflects its semantic aspect. Against this background, we are interested in treating the semantic relations of an ontology. Ontologies may be similar in terms of the hierarchical description and structure of concepts, yet differ in the semantic relations assigned to a pair of concepts. This difference necessarily comes down to the fact that each ontology was conceived for a specific use or need. Thus, any information provided by semantic relations can enrich the basic relations linking two concepts. In this context, we privilege semantic relations between two concepts over taxonomic relations.
The ontology weighting process involves three essential stages, as shown in Fig. 1. First, the ontologies must be preprocessed; then a generic weighting method can be applied, which must work on any OWL ontology regardless of its conception while respecting all the criteria mentioned above.
Fig. 1. Ontology Weighting Process
Once the ontology is preprocessed, the second step consists of calculating the ontology total weight from the number of its concepts and relations relative to the total number of concepts and relations of all the ontologies. Finally, the last phase consists of assigning coefficients to the ontology components, namely concepts and relations. Individuals are not taken into account in this study. The next section details the approach proposed for ontology preprocessing.
3.1. Ontology preprocessing and coefficients assignment for concepts and relations
Basically, each ontology is conceived differently, depending on the specific needs and expectations at the origin of its conception. The first challenge was to find a generic method to assign coefficients to the ontology concepts in order to compute their weights; for that, ontology preprocessing was essential. Recall that our objective in this step is to browse the ontology in order to assign coefficients first to the concepts, respecting the hierarchical order by favoring specific concepts over generic ones, and then to the relations. However, the taxonomic structure of an ontology does not necessarily provide all the information it contains. Consequently, the ontology browsing is not limited to asserted axioms, which represent only explicit statements of prior knowledge; we also adopt an inference-based treatment, employing reasoners, which relies on existing knowledge and fundamental rules to obtain implicit information. Despite this exploitation of ontologies through inference, conception-related issues arose during ontology browsing and posed a problem when assigning coefficients to the concepts. We summarize these issues as follows:
• A concept equivalent to an intersection of concepts. In such cases, the resulting concept is not considered as a specific concept of those in the intersection.
Yet such information must be included in the ontology to ensure a balance between its semantic and taxonomic aspects. In such cases, we add to each resulting concept axioms indicating the kinship link connecting it with all the intersection concepts.
• A concept equivalent to an intersection of concepts and properties. Generally, such resulting concepts derive from a single class among those of the intersection, which we call the named class, whereas the other members of the expression serve as restrictions, which makes the resulting concept more specific than the named class from which it derives. In such a situation, the resulting concept is not considered as a specific concept of the class from which it derives, which leads to coefficient assignment problems since the hierarchical order is not respected. In that case, we add the axiom indicating the kinship link between the resulting concept and the named class.
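Both preprocessing fixes amount to adding a SubClassOf (kinship) axiom from each defined class to every named class of its intersection. The sketch below assumes a hypothetical in-memory encoding of class definitions (strings for named classes, tuples for property restrictions) rather than any particular OWL library, and the function name is ours.

```python
from typing import Union

# Hypothetical encoding: a named class is a string; a property restriction is a
# tuple such as ("some", "has_pet", "Dog").
Restriction = tuple[str, str, str]
Operand = Union[str, Restriction]


def add_kinship_axioms(
    equivalent_to: dict[str, list[list[Operand]]],
    subclass_of: set[tuple[str, str]],
) -> set[tuple[str, str]]:
    """Return the (subclass, superclass) axioms to add so that every class defined as
    an intersection becomes a subclass of each named class in that intersection.
    Restrictions are skipped: they only make the defined class more specific."""
    added: set[tuple[str, str]] = set()
    for cls, intersections in equivalent_to.items():
        for operands in intersections:
            for op in operands:
                if isinstance(op, str) and (cls, op) not in subclass_of:
                    added.add((cls, op))
    return added


equivalences = {
    "Woman": [["Person", "Female"]],                       # case 1: intersection of concepts
    "DogOwner": [["Person", ("some", "has_pet", "Dog")]],  # case 2: named class + restriction
}
asserted = {("Woman", "Person"), ("Person", "Thing")}
print(sorted(add_kinship_axioms(equivalences, asserted)))
# [('DogOwner', 'Person'), ('Woman', 'Female')]
```

Axioms that are already asserted are not duplicated, so only the missing kinship links are reported, which mirrors counting the added sub-class axioms per ontology.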
After the ontology preprocessing, we move to the next step, the assignment of coefficients to both concepts and relations. Regarding concepts, as mentioned before, coefficients are assigned according to the hierarchical order, exploiting the concept depth in the ontology in addition to the concept specification degree, which means that specific concepts are favored over generic ones. The concept coefficients are assigned as follows:
• The root concept coefficient is always initialized to 1, then incremented when passing from a generic level to the next, more specific one. Concepts at the same level have the same coefficient.
• Once the coefficients of generic concepts are assigned, we move to the specific ones, starting from the highest level: all highest-level leaves of the ontology receive as coefficient the last generic concept coefficient incremented by one. Coefficients are then incremented when moving from one level to the next. In this way, specific concepts are favored over generic ones.
Furthermore, at the same hierarchical level, concepts having many kinship relations are favored over those having just one. Thus, a specific concept resulting from the intersection of several concepts is assigned a coefficient according to the generic concept having the highest coefficient.
With respect to relations, we favor and give more importance to semantic relations than to subsumption relations, since we believe that semantic relations are one of the comparison criteria between ontologies having the same taxonomic structure. Hence, a semantic relation is assigned double the subsumption coefficient. Fig. 2 presents an example of concept and relation coefficient assignment for the ontology O1.
Fig. 2. Assignment of coefficients to concepts and relations of O1 ontology
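The coefficient rules can be sketched as follows. This is one possible reading, not the authors' implementation: we assume the depth of a concept is its longest path from the root (so a concept with several parents follows the parent with the higher coefficient), a generic concept gets 1 + depth, and leaves start at the largest generic coefficient plus one at the shallowest leaf level. Function and constant names are hypothetical.

```python
from functools import lru_cache

SUBSUMPTION_COEFF = 1                   # neutral coefficient for subsumption links
SEMANTIC_COEFF = 2 * SUBSUMPTION_COEFF  # semantic links get double


def concept_coefficients(parents: dict[str, list[str]], root: str) -> dict[str, int]:
    """One reading of the rules above (an assumption, not the authors' code):
    generic concept -> 1 + depth; leaves start one above the largest generic
    coefficient at the shallowest leaf level and grow by one per level."""
    concepts = {root} | set(parents) | {p for ps in parents.values() for p in ps}
    has_children = {p for ps in parents.values() for p in ps}

    @lru_cache(maxsize=None)
    def depth(concept: str) -> int:
        # Longest path from the root: with several parents (e.g. after preprocessing),
        # the concept follows the parent with the highest coefficient. Assumes a DAG.
        ps = parents.get(concept, [])
        return 0 if concept == root or not ps else 1 + max(depth(p) for p in ps)

    coeffs = {c: 1 + depth(c) for c in concepts if c == root or c in has_children}
    leaves = [c for c in concepts if c not in coeffs]
    if leaves:
        top_generic = max(coeffs.values())
        shallowest = min(depth(c) for c in leaves)
        for leaf in leaves:
            coeffs[leaf] = top_generic + 1 + depth(leaf) - shallowest
    return coeffs


parents = {"A": ["Thing"], "B": ["Thing"], "A1": ["A"], "A2": ["A"], "A21": ["A2"]}
print(concept_coefficients(parents, "Thing"))
```

For this toy hierarchy, the generic concepts Thing, A and A2 receive 1, 2 and 3, and the leaves B, A1 and A21 receive 4, 5 and 6, so every leaf outranks every generic concept.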
3.2. The ontology weighting
We believe that, in order to compare ontologies, not only the concepts must be compared; the number and type of relations linking those concepts should also be taken into account. Based on that, we define, using formula (1), the total weight $P_{Total}(O_k)$ of a repository ontology $O_k$ as the total number of $O_k$ components relative to the total number of components of all repository ontologies, multiplied by the number of ontologies in the repository. Fig. 3 presents an example of total weights for ontologies O1 and O2. As shown in the figure, both ontologies include a taxonomic structure and semantic relations.
$$P_{Total}(O_k) = \frac{|O_k| + |R_k|}{\sum_{i=1}^{N} \left(|O_i| + |R_i|\right)} \times N \qquad (1)$$
where $|O_k|$ ($|O_i|$) is the number of concepts of $O_k$ (respectively $O_i$), $|R_k|$ ($|R_i|$) is the number of relations of $O_k$ (respectively $O_i$), and $N$ is the number of ontologies contained in the repository.
Fig. 3. Total Weights of ontologies O1 and O2
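Formula (1) is a simple normalization. A minimal Python version, assuming each ontology is summarized by its pair (|O_k|, |R_k|) (the function name is ours), is:

```python
def total_weights(sizes: list[tuple[int, int]]) -> list[float]:
    """Formula (1). sizes[k] = (|O_k|, |R_k|): concept and relation counts of O_k."""
    n = len(sizes)                              # N, ontologies in the repository
    grand_total = sum(c + r for c, r in sizes)  # sum over i of (|O_i| + |R_i|)
    return [(c + r) / grand_total * n for c, r in sizes]


print(total_weights([(6, 4), (10, 20)]))  # [0.5, 1.5]
```

Because of the factor N, the weights always sum to N (their mean is 1), so a value above 1 marks an ontology with more concepts and relations than the repository average.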
The second step consists of distributing this total weight over all the elements composing the ontology, ensuring that the coefficient-assignment criteria mentioned above are taken into account. To do this, we calculate the margin $\varepsilon_k$ for an ontology $O_k$ through formula (2), based on the sums of the concept and relation coefficients (3). The margin is then used to calculate the weight of each element of the ontology.
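$$\varepsilon_k = \frac{1}{2}\left(\frac{P_{Total}(O_k)}{\left(S_C(O_k)\cdot|O_k|\right)^4} + \frac{P_{Total}(O_k)}{\left(S_R(O_k)\cdot|R_k|\right)^4}\right) \qquad (2)$$
$$S_C(O_k) = \sum_{i=1}^{|O_k|} \mathrm{Coeff}(C_i, O_k) \quad \text{and} \quad S_R(O_k) = \sum_{i=1}^{|R_k|} \mathrm{Coeff}(R_i, O_k) \qquad (3)$$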
where $P_{Total}(O_k)$ is the total weight of the ontology $O_k$ calculated using formula (1), $S_C(O_k)$ is the sum of the $O_k$ concept coefficients, $S_R(O_k)$ is the sum of the $O_k$ relation coefficients, and $|O_k|$ ($|R_k|$) is the number of concepts (respectively relations) of $O_k$. Then, we calculate the basic weights of concepts and relations, which do not depend on the coefficient of each individual component, according to formulas (4) and (5).
$$\lambda_{C_k} = \frac{P_{Total}(O_k) - 2\,\varepsilon_k\left(|R_k|\cdot S_R(O_k) + S_C(O_k)\right)}{2\,|O_k|} \qquad (4)$$
$$\lambda_{R_k} = \frac{P_{Total}(O_k) - 2\,\varepsilon_k\left(|O_k|\cdot S_C(O_k) + S_R(O_k)\right)}{2\,|R_k|} \qquad (5)$$
Once the basic weights of concepts and relations are calculated, it remains to deduce the effective weight of each element of the ontology, using the corresponding basic weights and coefficients.
The effective weight of each concept of the ontology is calculated through formula (6), while the effective weight of each relation is calculated using formula (7).
$$P(C_i, O_k) = \lambda_{C_k} + \varepsilon_k \cdot \mathrm{Coeff}(C_i, O_k) \qquad (6)$$
$$P(R_i, O_k) = \lambda_{R_k} + \varepsilon_k \cdot \mathrm{Coeff}(R_i, O_k) \qquad (7)$$
where $\lambda_{C_k}$ and $\lambda_{R_k}$ are, respectively, the basic weights of concepts and relations, $\varepsilon_k$ is the margin for moving from a level $i$ to the level $i+1$, $\mathrm{Coeff}(C_i, O_k)$ is the coefficient of the concept $C_i$ in the ontology $O_k$, and $\mathrm{Coeff}(R_i, O_k)$ is the coefficient of the relation $R_i$ in the ontology $O_k$. The sum of the component weights of an ontology is equal to the total weight already calculated by formula (1), as stated in formula (8). Fig. 4 represents the weighted version of the ontology O1.
$$\sum_{i=1}^{|O_k|} P(C_i, O_k) + \sum_{i=1}^{|R_k|} P(R_i, O_k) = P_{Total}(O_k) \qquad (8)$$
Fig. 4. Weighted O1 Ontology Components
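The weighting formulas chain together: the coefficient sums (3) give the margin (2), which gives the basic weights (4) and (5), which in turn give the effective weights (6) and (7). The Python sketch below transcribes formulas (2) to (7) as written, reading the trailing 4s in (2) as exponents; it expects $P_{Total}(O_k)$ from formula (1) and per-component coefficient lists as inputs. The function name and the toy input are hypothetical.

```python
def component_weights(p_total: float, concept_coeffs: list[int], relation_coeffs: list[int]):
    """Transcription of formulas (2) to (7).
    Returns (epsilon_k, concept weights, relation weights)."""
    n_c, n_r = len(concept_coeffs), len(relation_coeffs)  # |O_k|, |R_k|
    s_c, s_r = sum(concept_coeffs), sum(relation_coeffs)  # (3): S_C(O_k), S_R(O_k)
    eps = 0.5 * (p_total / (s_c * n_c) ** 4 + p_total / (s_r * n_r) ** 4)  # (2)
    lam_c = (p_total - 2 * eps * (n_r * s_r + s_c)) / (2 * n_c)           # (4)
    lam_r = (p_total - 2 * eps * (n_c * s_c + s_r)) / (2 * n_r)           # (5)
    concept_w = [lam_c + eps * k for k in concept_coeffs]                  # (6)
    relation_w = [lam_r + eps * k for k in relation_coeffs]                # (7)
    return eps, concept_w, relation_w


# Toy input: six concepts, five subsumption links (coefficient 1), two semantic links (2).
eps, cw, rw = component_weights(1.0, [1, 2, 2, 3, 4, 4], [1] * 5 + [2] * 2)
```

With a positive $\varepsilon_k$, equal coefficients yield equal weights and a higher coefficient always yields a higher weight, which is how specific concepts and semantic relations end up favored.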
4. Experimental Study

4.1. The Ontology Preprocessing Step Results

To demonstrate the feasibility and effectiveness of the approach, we applied it to several OWL ontologies. In this paper, we report the results of the preprocessing phase on three ontologies well known in the semantic web community: the People and Pizza ontologies, which were developed for educational purposes, and the DBpedia ontology [17]. When choosing ontologies for the experimental study, we required that they differ both in type, since we test both upper-level and application ontologies, and in size metrics. Table 1 shows the metrics of the ontologies before and after preprocessing; they differ significantly, notably for the metrics we are interested in, namely the numbers of classes, axioms and properties. As the table shows, the preprocessing concerns only the sub-class axioms, which we added to resolve the ambiguous situations explained in Section 3.1: 28 sub-class axioms were added to the People ontology, 14 to Pizza and 3681 to DBpedia. The logical axiom count of each ontology grows by the same amount, while the class, object property and equivalent class counts are unchanged.

Table 1. Ontologies Metrics Before and After the Preprocessing.

| Ontology | Preprocessing | Logical axiom count | Class count | Object property count | Sub-class axioms | Equivalent classes |
|---|---|---|---|---|---|---|
| People | Before | 108 | 60 | 14 | 33 | 21 |
| People | After | 136 | 60 | 14 | 61 | 21 |
| Pizza | Before | 712 | 100 | 8 | 259 | 15 |
| Pizza | After | 726 | 100 | 8 | 273 | 15 |
| DBpedia | Before | 7194 | 1173 | 1142 | 763 | 410 |
| DBpedia | After | 10875 | 1173 | 1142 | 4444 | 410 |
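The added-axiom counts quoted above follow directly from Table 1. The Python sketch below simply transcribes the table (the names `TABLE_1`, `COLUMNS` and `metric_deltas` are ours) and computes the after-minus-before difference of every metric.

```python
# Table 1 transcribed: for each ontology, the metric tuple before and after
# preprocessing, in the order given by COLUMNS.
COLUMNS = (
    "logical_axioms",
    "classes",
    "object_properties",
    "subclass_axioms",
    "equivalent_classes",
)

TABLE_1 = {
    "People": ((108, 60, 14, 33, 21), (136, 60, 14, 61, 21)),
    "Pizza": ((712, 100, 8, 259, 15), (726, 100, 8, 273, 15)),
    "DBpedia": ((7194, 1173, 1142, 763, 410), (10875, 1173, 1142, 4444, 410)),
}


def metric_deltas(before: tuple, after: tuple) -> dict:
    """Return after - before for each metric, keyed by column name."""
    return {col: a - b for col, b, a in zip(COLUMNS, before, after)}


if __name__ == "__main__":
    for name, (before, after) in TABLE_1.items():
        print(name, metric_deltas(before, after))
```

Only `subclass_axioms` and `logical_axioms` change (by 28, 14 and 3681 for People, Pizza and DBpedia), which is what one expects if preprocessing adds sub-class axioms and nothing else.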
4.2. The Ontologies Weighting Results

In this section, we present the weights of the People ontology components. When calculating the total weights, we assume that the repository contains both the People and Pizza ontologies. Table 3 shows the concepts and their weights grouped by coefficient. The ontology root, referred to as Thing, is the most generic concept of the hierarchy; hence, it has the weakest coefficient and, consequently, the lowest weight. The most specific concept, which is the leaf farthest from the root in terms of depth, has the strongest coefficient (equal to 10) and a correspondingly significant weight compared to the other concepts. As shown in Table 3, the most important generic concepts are the deepest ones, i.e., those with coefficient 4. Relation weights are likewise grouped by relation type. Recall that our aim is to favor semantic relations over subsumption ones. For that reason, we assign a neutral coefficient (1) to subsumption relations and its double (2) to semantic relations. Table 2 presents the ontology relations, divided into 61 subsumption relations and 46 semantic ones.

Table 2. People Ontology Relations Weights.

| Relation type | Number of relations | Coefficient | Relation weight |
|---|---|---|---|
| Subsumption | 61 | 1 | 0.0016123411214946778 |
| Semantic | 46 | 2 | 0.001612341121494681 |
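The coefficient rule for relations can be sketched in a few lines of Python. Here a relation is assumed to be a (subject, predicate, object) triple and subsumption is recognized by the assumed predicate label `rdfs:subClassOf`; the function names are ours. Summing formula (7) over all relations also shows how the counts of Table 2 enter the total weight: $\sum_i P(R_i, O_k) = |R_k|\,\lambda_{R_k} + \varepsilon_k \sum_i \mathrm{Coeff}(R_i, O_k)$, which for the People ontology gives $107\,\lambda_{R_k} + 153\,\varepsilon_k$, since $61 + 46 = 107$ and $61 \times 1 + 46 \times 2 = 153$.

```python
from collections import Counter
from typing import Iterable, Tuple

SUBSUMPTION = "rdfs:subClassOf"  # assumed label for subsumption (is-a) relations


def relation_coefficient(predicate: str) -> int:
    """Neutral coefficient 1 for subsumption, its double (2) for semantic relations."""
    return 1 if predicate == SUBSUMPTION else 2


def coefficient_histogram(relations: Iterable[Tuple[str, str, str]]) -> Counter:
    """Count relations per coefficient; each relation is (subject, predicate, object)."""
    return Counter(relation_coefficient(pred) for _, pred, _ in relations)


def relation_weight_sum(hist: Counter, lambda_r: float, epsilon: float) -> float:
    """Formula (7) summed over all relations: |R_k| * lambda_r + epsilon * sum(Coeff)."""
    n_relations = sum(hist.values())
    coeff_total = sum(coeff * count for coeff, count in hist.items())
    return n_relations * lambda_r + epsilon * coeff_total


if __name__ == "__main__":
    toy = [
        ("Student", "rdfs:subClassOf", "Person"),
        ("Person", "has_pet", "Animal"),  # hypothetical semantic relation
    ]
    print(coefficient_histogram(toy))
    # People ontology counts from Table 2: 61 subsumption, 46 semantic relations.
    print(relation_weight_sum(Counter({1: 61, 2: 46}), 0.0, 1.0))
```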
Table 3. People Ontology Concepts Weights.

| Concept group | Number of concepts | Coefficient | Concept weight |
|---|---|---|---|
| Generic | 1 | 1 | 0.0028753416666657235 |
| Generic | 9 | 2 | 0.002875341666665727 |
| Generic | 4 | 3 | 0.0028753416666657304 |
| Generic | 3 | 4 | 0.0028753416666657335 |
| Specific | 5 | 5 | 0.002875341666665737 |
| Specific | 16 | 6 | 0.0028753416666657404 |
| Specific | 7 | 7 | 0.002875341666665744 |
| Specific | 9 | 8 | 0.002875341666665774 |
| Specific | 5 | 9 | 0.002875341666665751 |
| Specific | 1 | 10 | 0.002875341666665754 |
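Table 3 is consistent with a coefficient that grows with depth in the class hierarchy: the root Thing has coefficient 1 and the deepest leaf has coefficient 10. The Python sketch below assumes, as a hypothesis rather than the coefficient definition given earlier in the paper, that a concept's coefficient is its breadth-first (shortest-path) depth below Thing plus one, and, as in Table 3, that coefficients up to 4 mark generic concepts. It also checks that the counts of Table 3 add up to the 60 classes reported for People in Table 1. The names `depth_coefficients`, `split_by_coefficient` and `TABLE_3_COUNTS` are ours.

```python
from collections import deque

# Number of People concepts per coefficient, transcribed from Table 3.
TABLE_3_COUNTS = {1: 1, 2: 9, 3: 4, 4: 3, 5: 5, 6: 16, 7: 7, 8: 9, 9: 5, 10: 1}


def depth_coefficients(children: dict[str, list[str]], root: str = "Thing") -> dict[str, int]:
    """ASSUMED rule: coefficient = breadth-first depth below `root` + 1.

    With multiple inheritance, a concept keeps the depth of its shortest path
    from the root, because BFS reaches it first along that path.
    """
    coeff = {root: 1}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for child in children.get(node, []):
            if child not in coeff:
                coeff[child] = coeff[node] + 1
                queue.append(child)
    return coeff


def split_by_coefficient(coeff: dict[str, int], generic_max: int = 4) -> tuple[list[str], list[str]]:
    """Split concepts into generic (coefficient <= generic_max) and specific ones."""
    generic = sorted(c for c, v in coeff.items() if v <= generic_max)
    specific = sorted(c for c, v in coeff.items() if v > generic_max)
    return generic, specific


if __name__ == "__main__":
    print(sum(TABLE_3_COUNTS.values()))  # 60 concepts, as in Table 1
    toy = {"Thing": ["animal", "person"], "animal": ["dog"], "person": ["adult"]}
    print(depth_coefficients(toy))
```

Under this assumption, the 17 concepts with coefficients 1 to 4 are the generic ones and the remaining 43 are specific.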
5. Conclusion

In this work, we detailed a new method for weighting ontologies and their components. Our method takes into account both kinds of ontology features, the asserted axioms and the inferred ones, unlike existing methods that reduce an ontology to its taxonomic structure and thereby ignore the semantic aspect it provides. Furthermore, our method is based on concepts as well as relations, which we consider the appropriate criteria when comparing ontologies. To refine the weights and make them semantically significant, we favored, when assigning weights, specific concepts over generic ones and semantic relations over subsumption ones. This paper has demonstrated the feasibility of our method; comparing ontologies to choose the most appropriate one for a specific use will be done within a particular context. Future work will investigate the effectiveness of the proposed approach in a concrete application domain: we will consider candidate ontologies from a specific domain, for instance the medical field, and choose the most appropriate one to serve as the semantic knowledge base of a semantic annotation process.

References

[1] A. Kalyanpur, B. Parsia, E. Sirin, B. Cuenca Grau, J. Hendler. (2006). “Swoop: A Web Ontology Editing Browser”. Web Semantics: Science, Services and Agents on the World Wide Web Vol. 4 (2): 144–153.
[2] Y. Sure, M. Erdmann, J. Angele, S. Staab, R. Studer, D. Wenke. (2002). “OntoEdit: Collaborative Ontology Development for the Semantic Web”. The Semantic Web, ISWC 2002: 221–235.
[3] C. Price, K. Spackman. (2000). “SNOMED clinical terms”. BJHC and IM-British Journal of Healthcare Computing and Information Management Vol. 17 (3): 27–31.
[4] A. Rector, J. Rogers, P. Pole. (1996). “The GALEN High Level Ontology”. Medical Informatics in Europe (MIE), Copenhagen.
[5] Gene Ontology Consortium. (2006). “The Gene Ontology (GO) project”. Nucleic Acids Research Vol. 34 (Database issue): D322–D326.
[6] DataLift D2.1 deliverable. (2011). “Méthodes et indicateurs pour la sélection d'ontologies fiables et utilisables”. http://datalift.org/project/deliverables/
[7] A. Lozano-Tello, A. Gómez-Pérez. (2004). “ONTOMETRIC: A Method to Choose the Appropriate Ontology”. Journal of Database Management (JDM) Vol. 15 (2): 18.
[8] H. Tan, P. Lambrix. (2009). “Selecting an Ontology for Biomedical Text Mining”. In Proceedings of the Workshop on BioNLP: 55–62.
[9] S. Tartir, I. Budak Arpinar. (2007). “Ontology Evaluation and Ranking using OntoQA”. In Proceedings of the International Conference on Semantic Computing (ICSC).
[10] A. Kunaefi, R. Sarno. (2013). “Ontology Mapping for ERP Business Process Variations”. In Seminar Nasional Teknologi Informasi dan Multimedia, STMIK AMIKOM.
[11] M. François Sy. (2012). “Utilisation d'ontologies comme support à la recherche et à la navigation dans une collection de documents”. Doctoral Thesis.
[12] T.R. Gruber. (1993). “A translation approach to portable ontology specifications”. In Knowledge Acquisition: 199–220.
[13] R. Studer, V. Richard Benjamins, D. Fensel. (1998). “Knowledge Engineering: Principles and Methods”. Data Knowl. Eng. Vol. 25 (1-2): 161–197.
[14] A. Maedche, S. Staab. (2002). “Ontology learning for the Semantic Web”. The Springer International Series in Engineering and Computer Science, Book 665.
[15] N. Aussenac-Gilles. (2008). “Le web sémantique, quel renouvellement pour la recherche d'information ?”. Recherche d'information : Etat des lieux et perspectives, Recherche d'information et web, Mohand Boughanem, Jacques Savoy: 97–132.
[16] R. Arp, B. Smith, A. Spear. (2015). “Building Ontologies with Basic Formal Ontology”. In University Press Scholarship Online, MIT Press, forthcoming.
[17] J. Lehmann, R. Isele, M. Jakob et al. (2012). “DBpedia - A Large-scale, Multilingual Knowledge Base Extracted from Wikipedia”. In Semantic Web - Interoperability, Usability, Applicability, an IOS Press Journal.
[18] R. Mizoguchi. (2010). “YAMATO: Yet Another More Advanced Top-level Ontology”. Proceedings of the Sixth Australian Ontology Workshop.
[19] A. Formica, M. Missikoff, E. Pourabbas, F. Taglino. (2016). “A Bayesian Approach for Weighted Ontologies and Semantic Search”. In Proceedings of the 8th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management, Vol. 2: 171–178.
[20] N. Seco, T. Veale, J. Hayes. (2004). “An intrinsic information content metric for semantic similarity in WordNet”. Proceedings of the 16th European Conference on Artificial Intelligence: 1089–1090.
[21] W.-D. Fang, L. Zhang, Y.-X. Wang, S.-B. Dong. (2005). “Toward a Semantic Search Engine Based on Ontologies”. In Proceedings of the 4th International Conference on Machine Learning and Cybernetics.
[22] W. Hayuhardhika, N. Purta, Sugiyanto, R. Sarno, M. Sidiq. (2013). “Weighted Ontology and Weighted Tree Similarity Algorithm for Diagnosing Diabetes Mellitus”. In Proceedings of the International Conference on Computer, Control, Informatics and Its Applications: 267–272.
[23] S. Ben Meftah, K. Khrouf, J. Feki, M. Ben Kraiem, C. Soulé-Dupuy. (2012). “Une approche pour l'extraction automatique de structures sémantiques de documents XML”. INFORSID: 523–538.