Improving the efficiency of NSGA-II based ontology aligning technology


Author's Accepted Manuscript

Improving the Efficiency of NSGA-II based Ontology Aligning Technology

Xingsi Xue, Yuping Wang

PII: S0169-023X(16)30365-2
DOI: http://dx.doi.org/10.1016/j.datak.2016.12.002
Reference: DATAK1576

To appear in: Data & Knowledge Engineering

Received date: 14 January 2014
Revised date: 25 May 2015
Accepted date: 8 December 2016

Cite this article as: Xingsi Xue and Yuping Wang, Improving the Efficiency of NSGA-II based Ontology Aligning Technology, Data & Knowledge Engineering, http://dx.doi.org/10.1016/j.datak.2016.12.002

This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting galley proof before it is published in its final citable form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

Improving the Efficiency of NSGA-II based Ontology Aligning Technology

Xingsi Xue a,b,c, Yuping Wang ∗,a

a School of Computer Science and Technology, Xidian University, Xi'an, Shaanxi, 710071 China
b College of Information Science and Engineering, Fujian University of Technology, Fuzhou, Fujian, 350118 China
c Fujian Provincial Key Laboratory of Big Data Mining and Applications, Fujian University of Technology, Fuzhou, Fujian, 350118 China

Abstract

There is evidence from the Ontology Alignment Evaluation Initiative (OAEI) that ontology matchers do not necessarily find the same correct correspondences. Therefore, several competing matchers are usually applied to the same pair of entities in order to increase the evidence towards a potential match or mismatch. How to select the proper matchers' alignments and efficiently tune them is one of the challenges in the ontology matching domain. To this end, we propose to use a Dynamic Alignment Candidates Selection Strategy and a Metamodel to raise the efficiency of optimizing the ontology alignment with NSGA-II: the former prescreens the less promising aligning results to be combined, and the latter prescreens the less promising individuals to be evaluated in NSGA-II. The experimental results show that, compared with the approach using NSGA-II alone, the Dynamic Alignment Candidates Selection Strategy and the Metamodel greatly reduce the time and main memory consumption of the tuning process while still ensuring the correctness and completeness of the alignments. Moreover, our proposal is also more efficient than the state-of-the-art ontology aligning systems.

Key words: ontology alignment, Dynamic Alignment Candidates Selection, Metamodel, NSGA-II

∗ Corresponding author. Email addresses: [email protected] (Xingsi Xue), [email protected] (Yuping Wang)

Preprint submitted to Data & Knowledge Engineering, January 20, 2017

1. Introduction

With the development of the Semantic Web, the number of ontologies is expected to grow explosively. Many of these ontologies may co-exist in the same area for similar application purposes. However, because of human subjectivity, these ontologies may define one entity with different names or in different ways, raising the so-called heterogeneity problem, which poses a barrier to semantic interoperability on the ontology level [1]. In order to support semantic interoperability across disparate ontologies in organizations in many domains, we must find the semantic correspondences among the ontologies' elements. This process is commonly known as ontology alignment, which can be described as follows: given two ontologies, each describing a set of discrete entities (which can be classes, properties, predicates, etc.), find the relationships (e.g., equivalence or subsumption) that hold between these entities [2]. Unfortunately, manually aligning ontologies is time-consuming, error-prone and clearly not possible on the Web scale. Thus the development of alignment systems to assist ontology alignment is crucial for the success of the Semantic Web.

Nowadays, numerous alignment systems have arisen, and each of them can provide, in a fully automatic or semi-automatic way, a numerical similarity value between elements from diverse ontologies that can be used to determine whether those elements are semantically similar or not. However, no single ontology matcher clearly dominates the others, and a matcher often performs well in some cases and not so well in others [3]. Therefore, both for design-time and run-time aligning, it is necessary to be able to take advantage of the best configuration of the various matchers in an alignment system. For these reasons, to achieve high accuracy for a large variety of ontologies, most current ontology alignment systems combine a set of different matchers by aggregating their independently obtained results [4].
How to select weights and thresholds in the ontology aligning process in order to aggregate the aligning results of various similarity measures into a satisfactory alignment is called meta-matching [5]. This can be viewed as an optimization problem and be addressed by evolutionary approaches such as Evolutionary Algorithms (EA). Nevertheless, for dynamic applications, it is necessary to perform the combination of similarity measures and the system self-tuning at run time, and thus, besides the quality (correctness and completeness) of the aligning results, the efficiency (with respect to execution time and main memory) of the aligning process is of prime importance, especially when a user cannot wait too long for the system to respond or when memory is limited. Therefore, state-of-the-art ontology meta-matching systems tend to adopt different strategies within the same infrastructure to improve the efficiency of the aligning process. Even so, the intelligent aggregation of multiple aligning results is still an open problem. According to reference [56], all ontology alignment processes based on evolutionary approaches developed so far evaluate the produced alignments with multi-objective "a priori" approaches, and using NSGA-II [6] (fast Non-dominated Sorting Genetic Algorithm), a popular multi-objective evolutionary algorithm, to solve the ontology alignment problem is able to overcome the well-known drawbacks of "a priori" methods.

Therefore, in this paper we utilize NSGA-II to optimize the process of combining four different basic similarity measures (Syntactic Measure, Linguistic Measure, Taxonomy-based Measure and Instance-based Measure) in a meta-matching system. In order to raise the efficiency of the optimizing process, before the tuning process a Dynamic Alignment Candidates Selection Strategy is utilized to prescreen the less promising aligning results to be combined, i.e. to discard the poorly performing ontology alignments, and thus to reduce the search space of NSGA-II. In order to reduce the number of time and memory consuming evaluations during the tuning process through NSGA-II, a Metamodel is further introduced to improve the efficiency. To the best of our knowledge, this is the first time that both a Dynamic Alignment Candidates Selection Strategy and a Metamodel assisted evolutionary approach are utilized to solve the ontology aligning problem.
The rest of the paper is organized as follows: section 2 discusses the related work; section 3 introduces the basic definitions; section 4 describes the Error Ratio based Dynamic Alignment Candidates Selection Strategy; section 5 presents NSGA-II for the Ontology Alignment Optimization Problem; section 6 formulates the metamodel-assisted NSGA-II; section 7 shows the experimental results; finally, section 8 draws conclusions and proposes future improvements.


2. Related Work

2.1. Alignment Candidates Selection

There is evidence from OAEI [7] that ontology matchers do not necessarily find the same correct correspondences. Usually, several competing matchers are applied to the same pair of entities in order to increase the evidence towards a potential match or mismatch [3]. Moreover, a system which aggregates many matchers does not always outperform one that aggregates only some of them. That is to say, some matchers may generate noise which affects the quality of the final aggregated result. Therefore, how to select the proper matchers' results is a very important problem. With respect to this problem, several state-of-the-art matching systems have proposed their own strategies.

RiMOM [8] utilizes more than eight different matchers. A strategy selection method is adopted based on three ontology feature factors, namely label similarity, structure similarity and label meaning, which are estimated from the two ontologies to be matched. The matching strategies suited to the highest factors are selected. However, the association between factors and strategies and the weights used to combine the similarity values are all predefined.

Falcon-AO [9] uses four elementary matchers. Similarly, the association between detected similarities and the matchers to be combined is predefined. The cardinality parameters are not considered at the mapping selection phase.

SAMBO and SAMBOdtf [10] have five basic matchers, which are combined using the weighted average of similarities, where the weights are predefined. A double-threshold strategy is adopted at the mapping selection phase: pairs above the upper threshold are retained as suggestions, those between the lower and the upper threshold are filtered using structural information, and the rest are discarded.
PROMPT [11], an algorithm for semi-automatic merging and alignment, is able to guide a user to the next possible point of merging or alignment, to suggest what operations should be performed there, and to perform certain operations automatically. PROMPT also determines possible inconsistencies in the state of the ontology which result from the user's actions, and suggests ways to remedy these inconsistencies. In this paper, we utilize an Error Ratio based Dynamic Alignment Candidates Selection Strategy to automatically discard the poorly performing ontology alignments, improving the efficiency of the optimizing process on the premise that the alignment quality is assured.

2.2. Multi-objective Evolutionary Algorithm and Metamodel

Besides selecting matchers, self-configuring or tuning matchers is another open problem in the ontology alignment domain. In this paper, we choose to use a Multi-objective Evolutionary Algorithm (MOEA), which is a powerful and robust tool for optimization in difficult search spaces, to tune the parameters of the meta-matching system. Among the various MOEAs, NSGA-II is very popular and has been successfully applied to many real-world problems. For example, NSGA-II has been applied to the combined economic and emission dispatch problem [12], the multi-objective reactive power planning problem [13], customer churn prediction in telecommunications [14], feature selection for facial expression recognition [11], etc. NSGA-II focuses on achieving not only a good diversity of Pareto optimal solutions but also a close approximation of the Pareto optimal front. However, when optimizing the ontology alignment through NSGA-II, a large number of evaluations are needed to achieve a sufficiently good approximation of the Pareto front, and the function evaluations for this problem are time and memory consuming. Specifically, the evaluation function call takes on average 17 seconds on an Intel Core (TM) i7 of 2.93 GHz and 168 GB memory in one generation. In our work, in order to reduce the number of time and memory consuming evaluations, a Metamodel, which can be understood as a surrogate evaluation model built from existing information [15], is introduced to approximate the objective function value using solutions that have already been evaluated during the tuning process through NSGA-II. As the core technology, the metamodeling approach helps NSGA-II to considerably improve the efficiency of the solving process by sparing it a large number of precise evaluations.
Nowadays, various metamodeling approaches for screening less promising solutions have been proposed, and the most frequently employed ones are based on Artificial Neural Networks (ANN) and the Gaussian Random Field Model (GRFM). With respect to ANN [16], multilayer perceptrons [17] or exactly interpolating radial basis function (RBF) networks [18] are used, either in their standard forms or with add-on features such as measures for the relative importance of input variables [19]. GRFM is also used to predict objective function values for new candidate solutions by exploiting information recorded during previous evaluations. Unlike ANN, GRFM provides not only estimates of function values but also confidence intervals for the predictions. Recent publications show that Metamodels based on GRFM are quite robust and have proved successful in many applications [20, 21, 22]. Therefore, in this paper, we use GRFM to accelerate the search process of NSGA-II.

3. Preliminaries

3.1. Ontology and Ontology Alignment

Many definitions of ontology have been given over the years, but the most frequently referenced one was given by Gruber in 1993, which defines an ontology as an explicit specification of a conceptualization. For convenience of the work in this paper, an ontology is defined as follows: An ontology is a 4-tuple $O = (C, P, I, A)$, where:

• $C$ is the set of classes, i.e. the set of concepts that populate the domain of interest,
• $P$ is the set of properties, i.e. the set of relations existing between the concepts of the domain,
• $I$ is the set of individuals, i.e. the set of objects of the real world, representing the instances of a concept,
• $A$ is the set of axioms, i.e. the main building blocks for fixing the semantic interpretation of the concepts and the relations [23].

In particular, individuals or instances are the basic, "ground level" components of an ontology. The individuals in an ontology may include concrete objects such as people, animals, tables, automobiles, molecules, and planets, as well as abstract individuals such as numbers and words [23]. In general, classes, properties and individuals are referred to as entities.

Ontologies are seen as the solution to data heterogeneity on the web. However, the existing ontologies could themselves introduce heterogeneity: given two ontologies, the same entity can be given different names or simply be defined in different ways, whereas both ontologies may express the same knowledge but in different languages [24]. To solve this problem, a so-called ontology alignment process is necessary. Formally, an alignment between two ontologies can be defined as follows: An alignment $A$ between two ontologies is a set of mapping elements. A mapping element is a 4-tuple $(e, e', n, r)$, where:


• $e$ and $e'$ are the entities of the first and the second ontology, respectively,
• $n$ is a confidence measure in some mathematical structure (typically in the [0, 1] range) holding for the correspondence between the entities $e$ and $e'$,
• $r$ is a relation (typically the equivalence) holding between the entities $e$ and $e'$.

The ontology alignment process can be defined as follows: The alignment process can be seen as a function $\phi$ which, from a pair of ontologies $O$ and $O'$ to be aligned, a set of parameters $p$ and a set of resources $r$, returns a new alignment $A_N$ between these ontologies: $A_N = \phi(O, O', p, r)$.

The ontology alignment process computes a mapping element by using a similarity measure, which determines the closeness value $n$ (related to a given relation $r$) between the entities $e$ and $e'$ in the range [0, 1], where 0 stands for complete inequality and 1 for complete equality. Next, we describe a general classification of the most used similarity measures.

3.2. Similarity Measures

Typically, similarity measures between the entities of the ontologies can be categorized into syntactic, linguistic, taxonomy-based and instance-based measures. In the following, we present some common similarity measures belonging to these four categories.

3.2.1. Syntactic Measure

Syntactic Measure computes a string distance or edit distance between the ontology entities. In our work, we utilize the Levenshtein distance [25], which calculates the number of operations, such as modification, deletion and insertion of a character, which are necessary to transform one string into another. Formally, the Levenshtein distance between two strings $s_1$ and $s_2$ is defined by the following equation:

$$\mathrm{Levenshtein}(s_1, s_2) = \max\left(0, \frac{\min(|s_1|, |s_2|) - d(s_1, s_2)}{\min(|s_1|, |s_2|)}\right) \tag{1}$$

where:

• $|s_1|$ and $|s_2|$ are the lengths of strings $s_1$ and $s_2$, respectively,
• $d(s_1, s_2)$ is the number of operations necessary to transform $s_1$ into $s_2$.

Another measure is the Jaro distance [25], an edit distance that uses the number of common characters in the two strings and the positions in which they appear. Given strings $s_1$ and $s_2$, the Jaro distance is defined as follows:

$$\mathrm{JaroDistance}(s_1, s_2) = \frac{1}{3}\left(\frac{com(s_1, s_2)}{|s_1|} + \frac{com(s_1, s_2)}{|s_2|} + \frac{com(s_1, s_2) - trans(s_1, s_2)}{com(s_1, s_2)}\right) \tag{2}$$

where:

• $|s_1|$ and $|s_2|$ are the lengths of strings $s_1$ and $s_2$, respectively,
• $com(s_1, s_2)$ is the number of common characters of $s_1$ and $s_2$,
• $trans(s_1, s_2)$ is the number of pairs consisting of common characters that appear in different positions.

3.2.2. Linguistic Measure

Linguistic measure calculates the similarity between ontology entities by considering linguistic relations such as synonymy, hypernymy, and so on. In the proposed work, WordNet [26], which is an electronic lexical database where the various senses of words are put together into sets of synonyms, is used to calculate a synonymy-based distance by considering the names of entities. Given two words $w_1$ and $w_2$, $\mathrm{LinguisticDistance}(w_1, w_2)$ equals:

• 1, if the words $w_1$ and $w_2$ are synonymous,
• 0.5, if the word $w_1$ is a hypernym of $w_2$ or vice versa,
• 0, otherwise.

3.2.3. Taxonomy-based Measure

Taxonomy-based measures consider only the specialization relation. The intuition behind taxonomic measures is that terms connected by a subsumption relation are already similar and, therefore, their neighbors may also be somehow similar. For instance, if the super-concepts are the same, the actual concepts are similar to each other; if the sub-concepts are the same, the compared concepts are also similar. In particular, in our work, the taxonomy distance is calculated through the well-known Similarity Flooding (SF) algorithm [? ], where an iterative fix-point computation (see equation (3)) is utilized to produce an alignment between the elements of two ontologies:

$$\delta^{i+1} = \mathrm{normalize}(\delta^{i} + f(\delta^{i})) \tag{3}$$

where the function $f$ increments the similarity value of an element pair based on the similarity of its neighbors, and the value from the previous iteration, $\delta^i$, is updated in each iteration. For the details of the SF algorithm, see also [? ].

3.2.4. Instance-based Measure

Instance-based measure exploits the similarity between instances to discover the correspondences between the concepts of the ontologies. This is motivated by the assumption that the real semantics of a concept is often better defined by the actual instances assigned to it. In this paper, we first use a Token-based measure, i.e. Q-Gram distance [4], to handle different conventions for describing data (e.g., "Jack Smith" and "Smith, Jack"), and a Character-based measure, i.e. Jaro Distance [5], to compare string values and recognize typographical errors (e.g., "Computre Science" and "Computer Science"). Then, we propose an additional function which applies the concept of upPropagation [27], in which the similarities between instances are propagated to their concepts.

To combine all the similarity measures mentioned above, an aggregation strategy is needed. In this work, we utilize the weighted average aggregation, which is defined as follows:

$$\phi(s(c), w) = \sum_{i=1}^{n} w_i s_i(c) \quad \text{with } \sum_{i=1}^{n} w_i = 1 \text{ and } w_i \in [0, 1] \tag{3}$$

where:

• $s(c)$ is the vector of similarity measure results,
• $w$ is the vector of weights,
• $n$ is the number of similarity measures.

Since the quality of the resulting alignment, i.e. the correctness and completeness of the correspondences found, needs to be assessed, we introduce in the next section some conformance measures which derive from the information retrieval field [28].
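As an illustration of how one basic measure feeds the weighted average aggregation, the following minimal Python sketch implements the normalized Levenshtein measure of equation (1) and the weighted average defined above. The function names, the handling of empty strings and the example weights are assumptions made for this sketch; they are not taken from the authors' implementation.

```python
from typing import Sequence


def edit_distance(s1: str, s2: str) -> int:
    """d(s1, s2): number of character insertions, deletions and substitutions."""
    previous = list(range(len(s2) + 1))
    for i, ch1 in enumerate(s1, start=1):
        current = [i]
        for j, ch2 in enumerate(s2, start=1):
            cost = 0 if ch1 == ch2 else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def levenshtein_similarity(s1: str, s2: str) -> float:
    """Equation (1): max(0, (min(|s1|, |s2|) - d(s1, s2)) / min(|s1|, |s2|))."""
    shortest = min(len(s1), len(s2))
    if shortest == 0:
        # Assumption: the text does not define the empty-string case.
        return 1.0 if s1 == s2 else 0.0
    return max(0.0, (shortest - edit_distance(s1, s2)) / shortest)


def weighted_average(similarities: Sequence[float],
                     weights: Sequence[float]) -> float:
    """Weighted average aggregation: sum of w_i * s_i(c) with sum(w_i) = 1."""
    if len(similarities) != len(weights):
        raise ValueError("one weight per similarity measure is required")
    if any(w < 0.0 or w > 1.0 for w in weights):
        raise ValueError("each weight must lie in [0, 1]")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(w * s for w, s in zip(weights, similarities))


if __name__ == "__main__":
    syntactic = levenshtein_similarity("author", "authors")
    # The other three values stand in for linguistic, taxonomy-based and
    # instance-based results; they are made-up numbers for illustration.
    print(weighted_average([syntactic, 1.0, 0.5, 0.0], [0.4, 0.3, 0.2, 0.1]))
```

Validating the weights inside `weighted_average` mirrors the two constraints stated with the formula, so an invalid weight vector fails early instead of silently producing a value outside [0, 1].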

3.3. Alignment Evaluation

The alignment is normally assessed on the basis of two measures commonly known as recall and precision. Recall (or completeness) measures the fraction of correct alignments found in comparison to the total number of existing correct alignments. A recall of 1 means that all of the alignments have actually been found, but it gives no information about the number of additional, falsely identified alignments. Typically, recall is balanced against precision (or correctness), which measures the fraction of found alignments that are actually correct. A precision of 1 means that all found alignments are correct, but it does not imply that all alignments have been found. Therefore, recall and precision are often balanced against each other with the so-called f-measure, which is the uniformly weighted harmonic mean of recall and precision. Given a reference alignment $R$ and an alignment $A$, recall, precision and f-measure are given by the following formulas:

$$recall = \frac{|R \cap A|}{|R|} \tag{4}$$

$$precision = \frac{|R \cap A|}{|A|} \tag{5}$$

$$f\text{-}measure = 2 \cdot \frac{precision \cdot recall}{precision + recall} \tag{6}$$
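Equations (4) to (6) translate directly into set operations. The sketch below is a minimal Python illustration, assuming that an alignment is represented as a set of (source entity, target entity) pairs and that the measures are defined as 0 when a denominator is empty; both choices are assumptions of this example rather than details given in the text.

```python
from typing import Set, Tuple

Correspondence = Tuple[str, str]  # (entity of first ontology, entity of second)


def evaluate_alignment(reference: Set[Correspondence],
                       found: Set[Correspondence]) -> Tuple[float, float, float]:
    """Return (recall, precision, f-measure) as in equations (4)-(6)."""
    correct = len(reference & found)                 # |R ∩ A|
    recall = correct / len(reference) if reference else 0.0
    precision = correct / len(found) if found else 0.0
    if precision + recall == 0.0:
        return recall, precision, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return recall, precision, f_measure


if __name__ == "__main__":
    reference = {("Person", "Human"), ("Paper", "Article"),
                 ("Author", "Writer"), ("Topic", "Subject")}
    found = {("Person", "Human"), ("Paper", "Article"), ("Review", "Subject")}
    print(evaluate_alignment(reference, found))
```

In this example two of the four reference correspondences are found and one found correspondence is wrong, so recall is 2/4, precision is 2/3 and the f-measure is 4/7.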

4. Error Ratio based Dynamic Alignment Candidates Selection Strategy

In this section, we present an Error Ratio based Dynamic Alignment Candidates Selection Strategy to prescreen the less promising alignments. In this way, the number of alignments that need to be combined is reduced, and the efficiency of the tuning process by NSGA-II is improved. In particular, a differentor based Alignment Candidates Aggregation approach is first introduced to self-adaptively obtain an aggregated alignment. Then, on this basis, a Dynamic Alignment Candidates Selection is proposed to discard the alignments having large distances from the aggregated alignment.

4.1. Differentor based Alignment Candidates Aggregation

For each similarity measure, an ontology alignment is generated which corresponds to a similarity matrix whose rows and columns are composed of the entities of the source ontology and the target ontology, respectively. In order to combine these similarity matrices into a final similarity matrix, it is necessary to determine a combination strategy which can appropriately estimate the quality of each similarity matrix. However, because a similarity measure may not work well for all entities in the ontologies, assigning a single weight to each similarity matrix is not a good solution. To solve this problem, we introduce a metric termed the differentor, which, when aggregating similarity matrices, assigns high weights to the entities a similarity measure handles well and low weights to the entities it handles badly. The differentor can be defined as the normalized number of mappings that suggest an unambiguous one-to-one mapping in a row of a similarity matrix [29]. For a similarity matrix produced by a similarity measure, the cell in the $i$th row and $j$th column represents a candidate alignment $\langle e_i, e_j, s, = \rangle$, where $e_i$ and $e_j$ are the $i$th entity of the source ontology and the $j$th entity of the target ontology, respectively, and $s$ refers to the similarity value between them. The differentor of the $i$th row is defined as follows:

$$differentor_i = \begin{cases} 1.0, & num^{i}_{max} = 1 \\ 1 - \dfrac{num^{i}_{max}}{n}, & \text{otherwise} \end{cases} \tag{7}$$

where $num^{i}_{max}$ is the number of cells with the maximum similarity in the $i$th row, and $n$ is the number of columns. In our work, a differentor-based similarity aggregation strategy is applied in the process of dynamic alignment selection. In detail, the steps of the differentor-based similarity aggregation are as follows:

(1) calculate the differentor for each row in all similarity matrices;
(2) for each similarity matrix, multiply the similarities in each row by the corresponding differentor;
(3) add all similarity matrices to form a final similarity matrix;
(4) normalize the final similarity matrix.

4.2. Dynamic Alignment Candidates Selection Strategy

In order to eliminate the poorly performing ontology alignments, we propose a novel approach based on the intuition that the poorly performing ontology alignments are those having large distances from the aggregated alignment. For all entities in two ontologies $O_1$ and $O_2$, each similarity measure generates a similarity matrix $S$, whose rows and columns are formed by the entities in $O_1$ and $O_2$, respectively; the value of an element in $S$ is the similarity value of the two corresponding entities. Given a set of similarity matrices $\{S_j\}$, we define the average bias of these multiple similarity matrices as follows:

$$average\_bias(\{S_j\}) = \frac{\sum_{j}\left(\sum_{e_{i1} \to e_{i2}} p\left(Map_j(e_{i1}, e_{i2}) \mid S_j, Map_{\{j\}}(e_{i1}, e_{i2})\right)\right)}{total\_number} \tag{8}$$

where:

• $e_{i1} \to e_{i2}$ means one mapping;
• $p(Map_j(e_{i1}, e_{i2}) \mid S_j, Map_{\{j\}}(e_{i1}, e_{i2}))$ is the probability of difference for $(e_{i1}, e_{i2})$ between the aggregated mapping $Map_{\{j\}}$ and $S_j$'s mapping $Map_j$, which can be calculated by the following formula:

$$p\left(Map_j(e_{i1}, e_{i2}) \mid S_j, Map_{\{j\}}(e_{i1}, e_{i2})\right) = \frac{\left|sim_{Map_j}(e_{i1}, e_{i2}) - sim_{Map_{\{j\}}}(e_{i1}, e_{i2})\right|}{\max\left(sim_{Map_j}(e_{i1}, e_{i2}),\ sim_{Map_{\{j\}}}(e_{i1}, e_{i2})\right)}$$

where $sim_{Map_j}(e_{i1}, e_{i2})$ and $sim_{Map_{\{j\}}}(e_{i1}, e_{i2})$ refer to the similarity value of $e_{i1}$ and $e_{i2}$ in $Map_j$ and $Map_{\{j\}}$, respectively;
• $total\_number$ is the number of pairs $(e_{i1}, e_{i2})$ whose similarities in the aggregated matrix and each $S_j$ do not both equal 0.

We take a specific example to illustrate how to calculate $p(Map_j(e_{i1}, e_{i2}) \mid S_j, Map_{\{j\}}(e_{i1}, e_{i2}))$. If $S_j$'s similarity on the mapping $(e_{i1}, e_{i2})$ is 0.5 and the aggregated similarity, i.e. the similarity value of the entities $e_{i1}$ and $e_{i2}$ in the aggregated matrix, is 0.6, then

$$p\left(Map_j(e_{i1}, e_{i2}) \mid S_j, Map_{\{j\}}(e_{i1}, e_{i2})\right) = \frac{|0.5 - 0.6|}{\max(0.5, 0.6)} = \frac{0.1}{0.6} = 0.167$$
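The differentor-based aggregation of Section 4.1 and the average bias of equation (8) can be sketched in a few lines of Python. This is only an illustrative reading of the definitions: the final normalization step is assumed here to be a division by the largest cell, and `total_number` is assumed to count every (matrix, entity pair) combination in which the two similarities are not both 0. Neither detail is spelled out in the text, and all function names are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def differentor(row: List[float]) -> float:
    """Equation (7) for one row of a similarity matrix."""
    num_max = row.count(max(row))
    return 1.0 if num_max == 1 else 1.0 - num_max / len(row)


def aggregate(matrices: List[Matrix]) -> Matrix:
    """Steps (1)-(4): weight each row by its differentor, sum, normalize."""
    n_rows, n_cols = len(matrices[0]), len(matrices[0][0])
    total = [[0.0] * n_cols for _ in range(n_rows)]
    for matrix in matrices:
        for i, row in enumerate(matrix):
            d = differentor(row)
            for j, value in enumerate(row):
                total[i][j] += d * value
    # Assumption: "normalize" is taken to mean dividing by the largest cell.
    largest = max(max(row) for row in total)
    if largest == 0.0:
        return total
    return [[value / largest for value in row] for row in total]


def difference_probability(sim_j: float, sim_agg: float) -> float:
    """|sim_j - sim_agg| / max(sim_j, sim_agg), taken as 0 when both are 0."""
    largest = max(sim_j, sim_agg)
    return 0.0 if largest == 0.0 else abs(sim_j - sim_agg) / largest


def average_bias(matrices: List[Matrix], aggregated: Matrix) -> float:
    """Equation (8), skipping cells where both similarities equal 0."""
    total, counted = 0.0, 0
    for matrix in matrices:
        for row, agg_row in zip(matrix, aggregated):
            for sim_j, sim_agg in zip(row, agg_row):
                if sim_j == 0.0 and sim_agg == 0.0:
                    continue
                total += difference_probability(sim_j, sim_agg)
                counted += 1
    return total / counted if counted else 0.0


if __name__ == "__main__":
    s1 = [[1.0, 0.0], [0.5, 0.5]]   # second row is ambiguous: differentor 0
    s2 = [[0.8, 0.2], [0.0, 1.0]]
    agg = aggregate([s1, s2])
    print(agg, average_bias([s1, s2], agg))
```

The ambiguous second row of `s1` receives a differentor of 0 and therefore contributes nothing to the aggregated matrix, which is the behaviour the differentor is meant to provide.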

4.3. Efficient Implementation of Dynamic Alignment Candidates Selection

To implement the Dynamic Alignment Candidates Selection, we define the average bias of $S_j$ as

$$average\_bias(S_j) = \left|average\_bias(\{S_j\}) - average\_bias(\{\bar{S}_j\})\right|$$

where $average\_bias(\{S_j\})$ is the average bias of all similarity matrices and $average\_bias(\{\bar{S}_j\})$ is the average bias of all similarity matrices without $S_j$. The proposed strategy is based on the hypothesis that the higher the average bias of the matrix $S_j$ is, the lower the probability of $S_j$ being selected for the final aggregation. In this work, the threshold is set empirically to 0.26, which achieves the highest average alignment quality on all test cases of the exploited dataset, and the similarity matrices with an average bias above 0.26 are screened out. In addition, if the average biases of all the similarity matrices are larger than the threshold, only the similarity matrix with the lowest average bias is selected as the final similarity matrix.

5. NSGA-II for Optimizing Ontology Alignment

The process of determining the optimal mapping set in order to yield the alignment with the best quality can be regarded as an optimizing process. In the following, the Multi-Objective Optimal Model for optimizing the ontology alignment is first presented, and then the details of a problem-specific NSGA-II [6], which is a popular Multi-Objective Evolutionary Algorithm, are given.

5.1. Multi-Objective Optimal Model for Optimizing the Ontology Alignments

In this section, the multi-objective optimization model for the ontology alignment problem is presented as follows:

$$\begin{cases} \max f(X) = \max(Recall(X), Precision(X)) \\ \text{s.t. } X = (x_1, x_2, \ldots, x_n)^T,\ x_i \in [0, 1] \\ \sum_{i=1}^{n-1} x_i = 1 \end{cases} \tag{9}$$

where the decision variable $X$ is an $n$-dimensional vector in which $x_i$, $i \in [1, n-1]$, represents the weight of the $i$-th alignment to be aggregated and $x_n$ is the threshold for filtering the aggregated alignment. The objective of this model is to maximize both the recall and the precision of the aggregated alignment.

Since modeling the meta-matching problem is a complex (nonlinear problem with many local optimal solutions) and time-consuming task (large scale problem), particularly when the number of similarity measures is significantly large, approximate methods are usually used for computing the parameters. From this point of view, evolutionary optimization methods could represent an efficient approach for addressing this problem.
Furthermore, in order to overcome the well-known drawback of the "a priori" approaches inherent in single-objective evolutionary optimization, in this paper a popular multi-objective evolutionary algorithm, i.e. NSGA-II, is utilized to solve the ontology meta-matching problem.
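To make the model of equation (9) concrete, the following Python sketch evaluates one decision vector $X$: the first $n-1$ components weight the similarity matrices, the last component is the threshold, and the two objectives are the recall and precision of the filtered alignment. It is a simplified illustration under stated assumptions: every cell whose aggregated similarity reaches the threshold is kept as a correspondence, entities are identified by their row and column indices, and the helper names are hypothetical. The `dominates` function shows the Pareto comparison on which non-dominated sorting relies.

```python
from typing import List, Sequence, Set, Tuple

Matrix = List[List[float]]
Pair = Tuple[int, int]  # (row index in source ontology, column index in target)


def objectives(x: Sequence[float], matrices: List[Matrix],
               reference: Set[Pair]) -> Tuple[float, float]:
    """Return (recall, precision) of the alignment encoded by X."""
    *weights, threshold = x
    if len(weights) != len(matrices):
        raise ValueError("X must hold one weight per matrix plus a threshold")
    n_rows, n_cols = len(matrices[0]), len(matrices[0][0])
    found: Set[Pair] = set()
    for i in range(n_rows):
        for j in range(n_cols):
            similarity = sum(w * m[i][j] for w, m in zip(weights, matrices))
            # Assumption: a cell is a correspondence if it reaches the threshold.
            if similarity >= threshold:
                found.add((i, j))
    correct = len(found & reference)
    recall = correct / len(reference) if reference else 0.0
    precision = correct / len(found) if found else 0.0
    return recall, precision


def dominates(a: Tuple[float, float], b: Tuple[float, float]) -> bool:
    """True if a is no worse than b in both objectives and better in one."""
    return (all(p >= q for p, q in zip(a, b))
            and any(p > q for p, q in zip(a, b)))


if __name__ == "__main__":
    matrices = [[[0.9, 0.1], [0.2, 0.8]],
                [[0.7, 0.6], [0.1, 0.9]]]
    reference = {(0, 0), (1, 1)}
    strict = objectives([0.5, 0.5, 0.5], matrices, reference)
    loose = objectives([0.5, 0.5, 0.3], matrices, reference)
    print(strict, loose, dominates(strict, loose))
```

With the lower threshold an extra, incorrect pair passes the filter, so precision drops while recall stays the same, and the stricter vector dominates the looser one.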


Table 1: Outline of NSGA-II

    t = 0;
    Initialize the population P_t;
    Evaluate P_t;
    while t < t_max do
        G_t = generateByGeneticOperators(P_t);   // generate λ variations
        Evaluate G_t;
        P_{t+1} = rankAndSelect(G_t ∪ P_t);      // select μ best individuals
        t = t + 1;
    end while

NSGA-II aims to obtain a well distributed set of points that are close to the Pareto front. Both closeness and diversity are addressed in the selection operator, where the population is sorted using non-domination ranks as the primary sorting criterion and crowding distances as the secondary sorting criterion. For the details of the non-domination rank and crowding-distance sorting, see [6]. Basically, NSGA-II is an elitist EA with $(\mu + \lambda)$ selection, where $\mu$ and $\lambda$ refer to the population size and the number of new solutions generated in one generation, respectively. The $(\mu + \lambda)$-EA starts by generating the initial population with $\mu$ individuals, and then generates a set of $\lambda$ new solutions through recombination and mutation. The new candidate solutions are evaluated and ranked in terms of their quality. The outline of NSGA-II is presented in Table 1. In the following, five basic steps of NSGA-II for optimizing the ontology alignment are presented.

5.2. Chromosome Encoding

We incorporate in a chromosome both the weights associated with the similarity measures and the threshold used to decide whether a pair of entities is an alignment or not. Therefore, one chromosome can be divided into two parts, one standing for the weights and the other for the threshold. Concerning the characteristics of the weights mentioned in Section 3.2.4 (each weight lies in [0, 1] and the weights sum to 1), our encoding mechanism represents them indirectly by defining the cut or separation points in the interval [0, 1] that limit the values of the weights. If $p$ is the number of weights required, the set of cuts can be represented as $c = \{c_1, c_2, \ldots, c_{p-1}\}$. The chromosome decoding is carried out by sorting the elements of $c$ in ascending order, which gives $c' = \{c'_1, c'_2, \ldots, c'_{p-1}\}$, and calculating the weights as follows:

$$w_k = \begin{cases} c'_1, & k = 1 \\ c'_k - c'_{k-1}, & 1 < k < p \\ 1 - c'_{p-1}, & k = p \end{cases} \tag{10}$$

Therefore, the length of a chromosome is $(n-1) \cdot cutLength + thresholdLength$, where $n$ is the number of weights, and $cutLength$ and $thresholdLength$ are the chromosome lengths of a cut and of the threshold, respectively.

5.3. Fitness Functions

Fitness functions are objective functions that evaluate the quality of the alignment obtained by using the weights and the threshold encoded in the chromosome. In our work, there are two fitness functions, which calculate the recall and the precision of the aggregated result, respectively.

5.4. Genetic Operators

5.4.1. Selection

As in nature, the most suitable chromosomes must have more opportunities to reproduce. The best chromosomes in a population are those that have the best fitness values, and the genetic information of these chromosomes can potentially provide the best solutions to the problem. However, the reproduction opportunities of the less suitable chromosomes should not be completely removed, because it is important to keep diversity in the population. In this article, in order to ensure the diversity of the population and accelerate the convergence of the algorithm, the selection operator first sorts the chromosomes of the population in descending order according to their crowding distances, which estimate the density of the solutions. Then we select the half of the chromosomes at the front of the population and randomly copy one of them at a time until a new population is formed.

5.4.2. Crossover

The crossover operator takes two chromosomes called parents and generates two children chromosomes, which are obtained by mixing the genes of the parents.
Crossover is applied with a certain probability, which is a parameter of the genetic algorithm. In this work, we use the common one-cut-point method to carry out the crossover operation on the population. First, a cut position in the chromosomes of the two parents is randomly determined; this cut point divides each parent into a left part and a right part. Then, the right parts of the parents are swapped to form two children.

5.4.3. Mutation

The mutation operator maintains diversity in the population and prevents premature convergence. In our work, for each bit in the chromosome we check whether mutation should be applied according to the mutation probability, and if so, the value of that bit is flipped.

5.5. Generation of the next generation population

First, we merge the current population and the new population and remove duplicate chromosomes. Then, the next population is selected by non-dominated and crowding-distance sorting; see [6].

5.6. Elitist Strategy

The elitist strategy copies the best chromosome (the elite) of the current population unaltered into the next population. This ensures the survival of the best solution obtained so far. In our work, we regard the individual with the highest f-measure as the elite of the current generation, and the elite is updated in each generation during the evolutionary process. When the algorithm terminates, the elite is recommended to the user as the optimal solution.

Metamodels have frequently been proposed to accelerate NSGA-II in the presence of a time-consuming fitness function. Next, we introduce the local Metamodel based on the Gaussian Random Field Model (GRFM) and its application in NSGA-II.

6. Gaussian Random Field Model

6.1. Local Metamodeling based on Gaussian Random Field Model

In our work, a metamodel is used to model a mathematical relationship, e.g., an approximate multivariate function, between the points that have already been evaluated. More precisely, given a set of points x1, x2, ..., xn ∈ R^d and the evaluations of the objective function at these points, y1 = f(x1), ..., yn = f(xn), the metamodel can be used to compute an approximation f̂(x) ≈ f(x)

Figure 1: Outputs of a GRFM with a one-dimensional input space. In this example, three points x(1), x(2), x(3) have been evaluated. The result of each approximate evaluation at a point x is represented by the mean value ŷ and the standard deviation ŝ of a one-dimensional Gaussian distribution.

for any point x ∈ R^d in a time that is considerably shorter than that of a precise evaluation. GRFM is a particular type of interpolation model which can not only predict the objective function value but also provide a measure of confidence for its prediction. Specifically, the output of GRFM consists of the mean value and the standard deviation of a one-dimensional Gaussian distribution, which represents the likelihood of the different possible outcomes of a precise function evaluation. Figure 1 illustrates an example with a one-dimensional input space. The Metamodel we use is based on the techniques discussed by Sacks et al. [30]; we do not go into details here, but a brief description is provided. Basically, the evaluation function is assumed to be the realization of a random process with a spatial index. The main assumption is that the random variables are correlated by a distance-based correlation function. In this work, we utilize the following Gaussian product kernel, proposed by Sacks et al. [30], as the correlation function between two points x and x′:

\[ c(\theta_1, \theta_2, \ldots, \theta_d) = \prod_{i=1}^{d} \exp\left(-\theta_i \cdot |x_i - x'_i|^2\right) \tag{11} \]

where θi, i = 1, 2, ..., d, denote the correlation parameters. First, the free parameters, i.e., the θ values, are estimated by the maximum likelihood method, using the given evaluations as the sample. Then the conditional distribution of the random field at the new point x′ is computed. Since the random field is Gaussian, this distribution is a one-dimensional conditional Gaussian distribution F(x′|X, y), whose mean value serves as the predictor ŷ(x′) and whose standard deviation serves as the confidence measure ŝ(x′) (see also Figure 1).

The number of training points, i.e., the solutions that need precise evaluation in our work, is the main factor that determines the time consumption of the Metamodel, and training becomes very time-consuming for large training sets. Therefore, it is recommended to use only the minimum necessary subset of the training points available. A well-performing heuristic is to use a small number of training points which are closest, with respect to the Euclidean metric, to each new point x′, and to train a new locally valid Metamodel. This strategy, termed local metamodeling by Giannakoglou et al. [16], is an efficient measure for speeding up the computation while still achieving high-quality Metamodels. For a systematic study of the number of neighbors we refer to Büche et al. [31], who suggest a reasonable choice for the neighborhood size. At least in empirical studies, this parameter setting has been shown to lead to sufficiently good approximations on typical test problems. In our work, the neighborhood size has been set to 6. Increasing it beyond 6 seems to improve the results slightly but, at the same time, increases the computation time significantly.

6.2. Prescreening procedure for NSGA-II

Metamodels can be integrated into evolutionary optimization procedures in two different ways: (1) some generations are evaluated by the exact evaluation function and the other generations are evaluated solely by the Metamodel; (2) in each generation (apart from the very first one), the Metamodel and the exact evaluation function are used cooperatively. The second approach, which is adopted in our work, turns out to be quite robust and has proved successful in many applications [21, 22, 31, 32].

Figure 2: Illustration of the hypervolume measure. The black rectangle represents the improvement brought by x.

In order to prescreen the less promising individuals in the new generation, a ranking algorithm applied to the offspring population is needed. This ranking algorithm is based on the value ŷ(x) predicted by the Metamodel and the corresponding standard deviation ŝ(x). First, following Torczon et al. [33, 34], we use the following formula to calculate the predicted value f̂(x) instead of using ŷ(x) directly. The idea is to increase the number of evaluations in promising but less explored regions of the search space by directing the search towards them.

\[ \hat{f}(x) = \hat{y}(x) + \hat{s}(x) \tag{12} \]
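To make the local metamodel and Eq. (12) concrete, the following Python sketch predicts ŷ and ŝ at a new point from its nearest evaluated neighbours. It is a simplified illustration under stated assumptions, not the implementation used in the experiments: the θ values are passed in rather than estimated by maximum likelihood, the predictor is the standard constant-mean form associated with the model of Sacks et al. [30], the training points are assumed to be distinct, and all function names are hypothetical.

```python
import math


def correlation(x, z, theta):
    """Gaussian product kernel of Eq. (11) between points x and z."""
    return math.exp(-sum(t * (a - b) ** 2 for t, a, b in zip(theta, x, z)))


def solve(A, b):
    """Solve A u = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    u = [0.0] * n
    for i in reversed(range(n)):
        u[i] = (M[i][n] - sum(M[i][j] * u[j] for j in range(i + 1, n))) / M[i][i]
    return u


def nearest(points, values, x, k=6):
    """Keep the k evaluated points closest to x (Euclidean metric)."""
    order = sorted(range(len(points)), key=lambda i: math.dist(points[i], x))[:k]
    return [points[i] for i in order], [values[i] for i in order]


def predict(points, values, x, theta, k=6):
    """Return (y_hat, s_hat) from a local model fitted on the k nearest points."""
    X, y = nearest(points, values, x, k)
    n = len(X)
    R = [[correlation(a, b, theta) for b in X] for a in X]
    r = [correlation(x, a, theta) for a in X]
    Ri_1, Ri_y, Ri_r = solve(R, [1.0] * n), solve(R, y), solve(R, r)
    beta = sum(Ri_y) / sum(Ri_1)              # estimated constant mean
    resid = [yi - beta for yi in y]
    Ri_res = solve(R, resid)
    y_hat = beta + sum(ri * w for ri, w in zip(r, Ri_res))
    sigma2 = sum(a * b for a, b in zip(resid, Ri_res)) / n
    var = sigma2 * (1.0 - sum(a * b for a, b in zip(r, Ri_r))
                    + (1.0 - sum(Ri_r)) ** 2 / sum(Ri_1))
    return y_hat, math.sqrt(max(var, 0.0))    # clamp tiny negative round-off


def optimistic(y_hat, s_hat):
    """Eq. (12): favour promising but poorly explored regions."""
    return y_hat + s_hat
```

At an already evaluated point the model interpolates, so ŷ equals the stored value and ŝ is zero; away from the data ŝ grows, which is what lets Eq. (12) reward unexplored regions when recall and precision are maximized.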

Then the Lebesgue measure of the dominated hypervolume, which refers to the volume of an n-dimensional hypercube, over a restricted solution space is employed to measure the improvement brought by new points [7]. Fleischer proved that, for countable spaces, the hypervolume measure H(P) of a population P takes its maximum if the set found covers the true Pareto set [35]. Furthermore, when a new point x is added, the hypervolume increases if and only if x is not dominated by any point in the set of existing non-dominated solutions Et, and thus Et ∪ {x} can be regarded as an improvement of Et; see Figure 2 for details.

\[ I(x) = \begin{cases} H(E_t \cup \{x\}) - H(E_t), & \text{if } E_t \text{ does not dominate } x \\ 0, & \text{otherwise} \end{cases} \tag{13} \]

Finally, a Constant Ratio (CR) selection strategy [36] is applied to choose the most promising offspring for precise evaluation. The CR strategy makes extensive use of the Metamodel information and thus has the potential to improve the convergence significantly. In our work, we set the selection ratio of CR to 0.25; the outline of the Metamodel-assisted NSGA-II is presented in Table 2.
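The improvement measure of Eq. (13) and the constant-ratio prescreening can be sketched in Python for the two-objective case used here (recall and precision, both maximized). This is an illustration only: the reference point (0, 0), the reading of CR as "keep a fixed fraction of the offspring ranked by I(x)" and all function names are assumptions, not details taken from the experiments.

```python
def dominates(a, b):
    """True if a dominates b when both objectives are maximized."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def hypervolume(points, ref=(0.0, 0.0)):
    """Area dominated by 2-D points (maximization) above the reference point."""
    area, best_y = 0.0, ref[1]
    # Sweep from the largest first objective; each new height adds one strip.
    for x, y in sorted(points, key=lambda p: (-p[0], -p[1])):
        if y > best_y:
            area += (x - ref[0]) * (y - best_y)
            best_y = y
    return area


def improvement(x, front, ref=(0.0, 0.0)):
    """Eq. (13): hypervolume gain of adding x to the non-dominated set."""
    if any(dominates(e, x) for e in front):
        return 0.0
    return hypervolume(front + [x], ref) - hypervolume(front, ref)


def constant_ratio_select(offspring, predicted, front, ratio=0.25):
    """Keep the top `ratio` share of offspring, ranked by predicted improvement."""
    count = max(1, int(ratio * len(offspring)))
    ranked = sorted(range(len(offspring)),
                    key=lambda i: improvement(predicted[i], front),
                    reverse=True)
    return [offspring[i] for i in ranked[:count]]
```

Here `predicted[i]` would hold the pair of f̂ values from Eq. (12) for offspring i; only the individuals returned by `constant_ratio_select` are passed on to the exact, expensive evaluation.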

Table 2: Outline of Metamodel-assisted NSGA-II

The outline of Metamodel-assisted NSGA-II
    t = 0;
    Initialize the population Pt;
    Evaluate Pt and insert the results into database D;
    while t < tmax do
        Gt = generateByGeneticOperators(Pt);
        Evaluate Gt with the metamodel derived from D;
        Choose the set of promising individuals Qt ⊆ Gt according to CR;
        Evaluate Qt precisely and insert the results into D;
        P(t+1) = rankAndSelect(Qt ∪ Pt);
        t = t + 1;
    end while

There are two distinguishing features of the Metamodel-assisted NSGA-II compared with the standard NSGA-II: (1) all exactly evaluated individuals are recorded in a database; (2) before deciding whether new points need to be evaluated, their objective function values are predicted by the Metamodel during the prescreening phase.

7. Experimental Results and Analysis

In the experiments, the well-known benchmarks provided by OAEI 2011 [7] are used. Each benchmark in the OAEI data set is composed of two ontologies to be aligned and a reference alignment used to evaluate the quality of the alignment. Moreover, according to OAEI policies, the benchmark reference alignments take into account only the matching between ontology classes and properties. The ontologies in the tests are described in OWL-DL and serialized in the RDF/XML format. Table 3 gives a brief description of the OAEI 2011 benchmarks.

7.1. Experiments Configuration

The similarity measures used are as follows:
• Levenshtein distance (Syntactic Measure),
• Linguistic distance (Linguistic Measure),


Table 3: Brief Description of Benchmarks

ID    Brief description
101   Strictly identical ontologies
103   A regular ontology and another with a language generalization
104   A regular ontology and another with a language restriction
201   Ontologies without entity names
203   Ontologies without entity names and comments
204   Ontologies with different naming conventions
205   Ontologies whose labels are synonymous
206   Ontologies whose labels are in different languages
221   A regular ontology and another with no specialization
222   A regular ontology and another with a flattened hierarchy
223   A regular ontology and another with an expanded hierarchy
224   Identical ontologies without instances
225   Identical ontologies without restrictions
228   Identical ontologies without properties
230   Identical ontologies with flattening entities
231   Identical ontologies with multiplying entities
301   A real ontology about bibliography made by MIT
302   A real ontology with different extensions and naming conventions
304   A regular ontology and another with a real ontology which is not equivalent but quite close

• Taxonomy distance (Taxonomy-based Measure),
• Q-Gram distance and Jaro distance (Instance-based Measure).

The NSGA-II uses the following parameters:
• Search space for each parameter: the continuous interval [0, 1],
• Numerical accuracy = 0.01,
• Fitness functions: recall and precision,
• Population size = 20 individuals,
• Crossover probability = 0.6,
• Mutation probability = 0.01,
• Max generation = 5.

After ten independent executions, we noticed that NSGA-II does not improve the results beyond the fifth generation, so we set a limit of five generations. The hardware configuration used to run the algorithms is given below:

• Processor: Intel Core (TM) i7,
• CPU speed: 2.93 GHz,
• RAM capacity: 4 GB.

The results of the experiments are given in the next section.

7.2. Results and Analysis

All the values shown in Table 4, Table 5 and Table 6 are averages over ten independent runs. Specifically, Table 4, where the symbols R and P refer to recall and precision respectively, compares the quality of the alignments obtained by the approach using NSGA-II only, the approach using NSGA-II with the dynamic alignment candidates selection strategy, the approach using metamodel-assisted NSGA-II, and our approach, which uses both the dynamic alignment candidates selection strategy and metamodel-assisted NSGA-II. Table 5 and Table 6 compare the average execution time and the main memory consumption per generation of the same four approaches, respectively. Table 7 shows the mean values of the results obtained by our approach and by the state-of-the-art ontology matching systems [?], where 1XX stands for the benchmarks in Table 3 whose IDs begin with the digit 1, and likewise for 2XX and 3XX.

As can be seen from Table 4, except for benchmark 205, the alignment quality obtained by the four approaches is identical on all benchmarks. With respect to benchmark 205, although the recall and precision of the four alignments differ slightly, the f-measure values obtained by the four approaches are the same. Therefore, we may conclude that, in terms of alignment quality, our proposal is effective. We can see from Table 5 that, compared with the approach using NSGA-II only, our approach dramatically reduces the execution time on all benchmarks; the improvement is 75.95% on average. Moreover, the two proposals, i.e., the dynamic alignment candidates selection strategy and the metamodel, are each effective in reducing the execution time when applied to NSGA-II individually. In Table 6, compared with the approach using NSGA-II only, the main memory consumption per generation of our approach is reduced dramatically, by 65.91% on average over all benchmarks,


Table 4: Comparison among the approach using NSGA-II only, the approach using NSGA-II with dynamic alignment candidates selection strategy, the approach using metamodel-assisted NSGA-II and our approach in terms of the quality of the alignments. Each cell gives F-measure (R, P).

ID    NSGA-II only        With selection strategy   With metamodel      Our approach
101   1.00 (1.00, 1.00)   1.00 (1.00, 1.00)         1.00 (1.00, 1.00)   1.00 (1.00, 1.00)
103   1.00 (1.00, 1.00)   1.00 (1.00, 1.00)         1.00 (1.00, 1.00)   1.00 (1.00, 1.00)
104   1.00 (1.00, 1.00)   1.00 (1.00, 1.00)         1.00 (1.00, 1.00)   1.00 (1.00, 1.00)
201   0.94 (0.90, 0.98)   0.94 (0.90, 0.98)         0.94 (0.90, 0.98)   0.94 (0.90, 0.98)
203   0.99 (0.98, 1.00)   0.98 (0.98, 0.99)         0.99 (0.98, 1.00)   0.99 (0.98, 1.00)
204   0.98 (0.98, 0.99)   0.98 (0.98, 0.99)         0.98 (0.98, 0.99)   0.98 (0.98, 0.99)
205   0.93 (0.89, 0.99)   0.93 (0.89, 0.99)         0.93 (0.90, 0.97)   0.93 (0.90, 0.97)
206   0.70 (0.67, 0.73)   0.70 (0.67, 0.74)         0.70 (0.67, 0.74)   0.70 (0.67, 0.74)
221   1.00 (1.00, 1.00)   1.00 (1.00, 1.00)         1.00 (1.00, 1.00)   1.00 (1.00, 1.00)
222   1.00 (1.00, 1.00)   1.00 (1.00, 1.00)         1.00 (1.00, 1.00)   1.00 (1.00, 1.00)
223   0.99 (0.98, 1.00)   0.99 (0.98, 1.00)         0.99 (0.98, 1.00)   0.99 (0.98, 1.00)
224   1.00 (1.00, 1.00)   1.00 (1.00, 1.00)         1.00 (1.00, 1.00)   1.00 (1.00, 1.00)
225   1.00 (1.00, 1.00)   1.00 (1.00, 1.00)         1.00 (1.00, 1.00)   1.00 (1.00, 1.00)
228   1.00 (1.00, 1.00)   1.00 (1.00, 1.00)         1.00 (1.00, 1.00)   1.00 (1.00, 1.00)
230   1.00 (1.00, 1.00)   1.00 (1.00, 1.00)         1.00 (1.00, 1.00)   1.00 (1.00, 1.00)
231   1.00 (1.00, 1.00)   1.00 (1.00, 1.00)         1.00 (1.00, 1.00)   1.00 (1.00, 1.00)
301   0.81 (0.81, 0.82)   0.81 (0.81, 0.82)         0.81 (0.81, 0.82)   0.81 (0.81, 0.82)
302   0.85 (0.80, 0.91)   0.85 (0.80, 0.91)         0.85 (0.80, 0.91)   0.85 (0.80, 0.91)
304   0.93 (0.88, 0.99)   0.93 (0.88, 0.99)         0.93 (0.88, 0.99)   0.93 (0.88, 0.99)

and the average value shows that our proposals are both effective in reducing the memory consumption when applied to NSGA-II individually. As can be seen from Table 7, for the benchmarks 1XX, the quality of all ontology matching systems is identical in terms of F-measure, and our approach outperforms all the others on the benchmarks 2XX. Although our approach ranks second on the benchmarks 3XX, the average values in Table 7 show that the quality of the alignments obtained by our approach is in general better than that of the state-of-the-art ontology matching systems. Since evolutionary approaches are efficient in finding an isomorphism between the sub-graphs modeling the two ontologies, particularly when the considered ontologies are characterized by a significant number of entities [1], the quality of the alignments obtained by our approach is in general better than that of the state-of-the-art ontology matching systems that do not use an evolutionary approach. With regard to the runtime, RiMOM, Falcon-AO, SAMBO and SAMBOdtf take approximately 7 seconds, 6 seconds, 12 seconds and 14 seconds, respectively, to determine an alignment. Our

Table 5: Comparison among the approach using NSGA-II only, the approach using NSGA-II with dynamic alignment candidates selection strategy, the approach using metamodel-assisted NSGA-II and our approach in terms of the execution time taken per generation. All times are in nanoseconds (ns).

ID        NSGA-II only     With selection strategy   With metamodel   Our approach
101       1,766,065,798    1,341,072,770             809,83,148       491,086,716
103       1,941,128,551    1,298,435,579             495,071,617      513,052,637
104       1,936,678,545    1,568,335,0435            882,905,649      531,648,017
201       26,237,786,925   19,667,79,222             1,145,529,885    6,527,802,835
203       23,129,665,922   17,339,927,051            1,010,275,796    5,760,455,309
204       23,137,994,633   18,344,536,264            1,123,146,872    5,757,619,474
205       22,538,261,706   16,904,326,108            9,861,906,611    5,636,454,913
206       22,593,189,358   16,954,422,419            9,905,963,746    5,676,888,543
221       23,208,903,087   17,839,725,069            9,637,467,637    5,909,427,595
222       22,472,139,336   15,481,838,986            6,743,963,548    1,501,238,286
223       28,851,935,372   21,707,637,323            12,777,241,825   7,419,041,247
224       22,796,107,326   17,109,449,105            11,126,330,025   5,736,132,665
225       23,220,966,073   19,431,607,369            10,194,908,989   5,852,889,962
228       5,622,022,072    4,243,440,157             2,520,212,764    1,486,276,329
230       19,158,248,325   16,435,714,629            8,480,898,759    4,921,782,238
231       22,996,779,479   17,255,009,349            1,010,296,756    5,801,469,182
301       11,337,855,230   8,525,370,306             6,009,761,353    2,900,399,632
302       7,734,034,912    5,530,778,167             3,776,707,575    1,124,264,677
304       17,247,855,524   13,926,727,330            7,525,316,457    4,284,469,953
Average   17,259,348,324   13,016,849,876            5,489,941,527    4,096,442,116

approach takes 4 seconds on average to obtain an alignment, which is less than the other state-of-the-art ontology matching systems. According to the experimental results shown above, compared with the approach using NSGA-II alone, the use of the Dynamic Alignment Candidates Selection Strategy and Metamodel-assisted NSGA-II greatly reduces the execution time and main memory consumption of the tuning process while preserving the correctness and completeness of the alignments.

8. Conclusion and Future Work

Ontology alignment is an important step in ontology engineering. Although much work has been done to tackle this problem, various challenges remain for researchers to deal with. One of these challenges is the selection of matchers and their self-configuration. For dynamic applications it is necessary to perform matcher combination and self-tuning at

Table 6: Comparison among the approach using NSGA-II only, the approach using NSGA-II with dynamic alignment candidates selection strategy, the approach using metamodel-assisted NSGA-II and our approach in terms of the main memory consumed per generation by the evaluation function. All values are in bytes.

ID        NSGA-II only   With selection strategy   With metamodel   Our approach
101       68,485,120     58,396,482                38,219,206       28,130,568
103       33,020,008     20,058,432                18,356,448       12,135,256
104       35,142,512     26,724,610                22,952,234       18,888,808
201       224,281,424    174,471,418               93,794,662       75,580,408
203       219,355,104    166,168,874               92,440,088       74,940,416
204       135,096,032    108,664,162               73,379,396       52,807,184
205       210,275,960    164,365,808               86,698,486       72,554,504
206       206,870,440    161,680,989               80,568,448       71,956,096
221       185,348,632    145,674,757               96,082,414       66,327,008
222       187,774,568    135,582,834               70,341,431       31,199,368
223       222,302,384    185,502,650               95,441,298       81,903,184
224       203,368,408    160,380,802               73,225,949       71,045,104
225       203,790,664    162,194,813               73,223,782       71,847,624
228       174,386,136    134,386,077               84,379,254       54,376,960
230       176,487,368    135,909,858               861,728,879      54,754,840
231       184,253,248    154,878,082               95,659,126       66,127,752
301       84,511,680     84,484,258                33,187,954       33,429,416
302       229,535,904    226,194,546               32,161,840       32,311,840
304       219,960,864    167,646,365               95,451,853       63,017,368
Average   168,644,550    135,440,306               111,436,460      54,385,984

run time, and thus the efficiency of the configuration search strategies becomes critical. To this end, in this paper, we propose to use the Dynamic Alignment Candidates Selection Strategy and Metamodel-assisted NSGA-II to tune the parameters of an ontology aligning system, improving the efficiency by prescreening the less promising alignments to be aggregated and the less promising individuals to be evaluated in NSGA-II, respectively. In terms of the quality of the alignment, the execution time and the main memory consumption, the experimental results show the efficiency of our approach in comparison with the approach based on NSGA-II alone. Our approach greatly reduces the execution time and main memory consumption of the tuning process while preserving the quality of the alignment. As the number of available matchers increases, the problem of their selection will become more critical, e.g., when the task is to handle more

Table 7: Comparison of our approach with the state-of-the-art ontology matching systems. Each cell gives F-measure (R, P).

ID        RiMOM               Falcon-AO           SAMBO               SAMBOdtf            Our approach
1XX       1.00 (1.00, 1.00)   1.00 (1.00, 1.00)   1.00 (1.00, 1.00)   1.00 (1.00, 1.00)   1.00 (1.00, 1.00)
2XX       0.92 (0.87, 0.97)   0.89 (0.89, 0.90)   0.70 (0.54, 0.98)   0.71 (0.56, 0.98)   0.96 (0.95, 0.98)
3XX       0.82 (0.82, 0.83)   0.87 (0.83, 0.93)   0.86 (0.80, 0.95)   0.85 (0.81, 0.91)   0.86 (0.83, 0.91)
Average   0.91 (0.89, 0.93)   0.92 (0.90, 0.94)   0.85 (0.78, 0.97)   0.85 (0.79, 0.96)   0.94 (0.92, 0.96)

than 50 matchers within one system. As a continuation of our research, a study is now being carried out on large-scale matcher selection. We are also interested in improving the efficiency of solving the large-scale ontology aligning problem, which is another challenging problem in the ontology matching domain. A feasible method could be to first partition the ontology into small segments and then carry out the aligning process between similar segments, so that not all data has to be kept in main memory and the execution time could also be improved.

9. Acknowledgement

This work is supported by the National Natural Science Foundation of China (Nos. 61272119 and 61503082), the Natural Science Foundation of Fujian Province (No. 2016J05145) and the China Scholarship Council.

References

[1] B. Huang, B. Buckley, T.-M. Kechadi, Multi-objective feature selection by using NSGA-II for customer churn prediction in telecommunications, Expert Systems with Applications 37 (5) (2010) 3638–3646.
[2] J. Euzenat, P. Valtchev, et al., Similarity-based ontology alignment in OWL-lite, in: ECAI, vol. 16, 333, 2004.
[3] P. Shvaiko, J. Euzenat, Ontology matching: state of the art and future challenges, IEEE Transactions on Knowledge and Data Engineering 25 (1) (2013) 158–176.
[4] J. Tang, Y. Liang, Z. Li, Multiple strategies detection in ontology mapping, in: Special interest tracks and posters of the 14th International Conference on World Wide Web, ACM, 1040–1041, 2005.


[5] J. Euzenat, P. Shvaiko, et al., Ontology matching, vol. 18, Springer, 2007.
[6] K. Deb, S. Agrawal, A. Pratap, T. Meyarivan, A fast elitist nondominated sorting genetic algorithm for multi-objective optimization: NSGA-II, in: International Conference on Parallel Problem Solving From Nature, Springer, 849–858, 2000.
[7] OAEI, Ontology Alignment Evaluation Initiative (OAEI), http://oaei.ontologymatching.org/2015.
[8] J. Tang, J. Li, B. Liang, X. Huang, Y. Li, K. Wang, Using Bayesian decision for ontology mapping, Web Semantics: Science, Services and Agents on the World Wide Web 4 (4) (2006) 243–262.
[9] N. Jian, W. Hu, G. Cheng, Y. Qu, Falcon-AO: Aligning ontologies with Falcon, in: Proceedings of K-CAP Workshop on Integrating Ontologies, 85–91, 2005.
[10] P. Lambrix, H. Tan, Q. Liu, SAMBO and SAMBOdtf results for the ontology alignment evaluation initiative 2008, in: Proceedings of the 3rd International Conference on Ontology Matching-Volume 431, CEUR-WS.org, 190–198, 2008.
[11] N. F. Noy, M. A. Musen, et al., Algorithm and tool for automated ontology merging and alignment, in: Proceedings of the 17th National Conference on Artificial Intelligence (AAAI-00). Available as SMI technical report SMI-2000-0831, 2000.
[12] Wikipedia, Ontology components, http://en.wikipedia.org/wiki/Ontology.
[13] S. Melnik, H. Garcia-Molina, E. Rahm, Similarity flooding: A versatile graph matching algorithm and its application to schema matching, in: Data Engineering, 2002. Proceedings. 18th International Conference on, IEEE, 117–128, 2002.
[14] S. Guha, R. Rastogi, K. Shim, ROCK: A robust clustering algorithm for categorical attributes, in: Data Engineering, 1999. Proceedings., 15th International Conference on, IEEE, 512–521, 1999.

[15] M. Emmerich, A. Giotis, M. Özdemir, T. Bäck, K. Giannakoglou, Metamodel-assisted evolution strategies, in: International Conference on Parallel Problem Solving from Nature, Springer, 361–370, 2002.
[16] A. Giotis, K. Giannakoglou, J. Périaux, A reduced-cost multi-objective optimization method based on the Pareto front technique, neural networks and PVM, in: Proceedings of the ECCOMAS, 2000.
[17] Y. Jin, M. Olhofer, B. Sendhoff, Managing approximate models in evolutionary aerodynamic design optimization, in: Evolutionary Computation, 2001. Proceedings of the 2001 Congress on, vol. 1, IEEE, 592–599, 2001.
[18] K. Giannakoglou, Design of optimal aerodynamic shapes using stochastic optimization methods and computational intelligence, Progress in Aerospace Sciences 38 (1) (2002) 43–76.
[19] K. C. Giannakoglou, A. P. Giotis, M. K. Karakasis, Low-cost genetic optimization based on inexact pre-evaluations and the sensitivity analysis of design parameters, Inverse Problems in Engineering 9 (4) (2001) 389–412.
[20] M. A. El-Beltagy, P. B. Nair, A. J. Keane, Metamodeling techniques for evolutionary optimization of computationally expensive problems: Promises and limitations, in: Proceedings of the 1st Annual Conference on Genetic and Evolutionary Computation-Volume 1, Morgan Kaufmann Publishers Inc., 196–203, 1999.
[21] A. Ratle, Accelerating the convergence of evolutionary algorithms by fitness landscape approximation, in: International Conference on Parallel Problem Solving from Nature, Springer, 87–96, 1998.
[22] H. Ulmer, F. Streichert, A. Zell, Evolution strategies assisted by Gaussian processes with improved preselection criterion, in: Evolutionary Computation, 2003. CEC'03. The 2003 Congress on, vol. 1, IEEE, 692–699, 2003.
[23] C. Bock, M. Gruninger, PSL: A semantic domain for flow models, Software & Systems Modeling 4 (2) (2005) 209–231.


[24] A. Maedche, S. Staab, Measuring similarity between ontologies, in: International Conference on Knowledge Engineering and Knowledge Management, Springer, 251–263, 2002.
[25] V. I. Levenshtein, Binary codes capable of correcting deletions, insertions and reversals, in: Soviet Physics Doklady, vol. 10, 707, 1966.
[26] G. A. Miller, WordNet: a lexical database for English, Communications of the ACM 38 (11) (1995) 39–41.
[27] S. Massmann, S. Raunich, D. Aumüller, P. Arnold, E. Rahm, Evolution of the COMA match system, in: Proceedings of the 6th International Conference on Ontology Matching-Volume 814, CEUR-WS.org, 49–60, 2011.
[28] C. v. Rijsbergen, Information retrieval, Department of Computing Science, University of Glasgow.
[29] P. Xu, Y. Wang, B. Liu, A differentor-based adaptive ontology-matching approach, Journal of Information Science 38 (5) (2012) 459–475.
[30] J. Sacks, W. J. Welch, T. J. Mitchell, H. P. Wynn, Design and analysis of computer experiments, Statistical Science (1989) 409–423.
[31] D. Buche, N. N. Schraudolph, P. Koumoutsakos, Accelerating evolutionary algorithms with Gaussian process fitness function models, IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews) 35 (2) (2005) 183–194.
[32] J. Dennis, V. Torczon, Managing approximation models in optimization, Multidisciplinary design optimization: State-of-the-art (1997) 330–347.
[33] M. Fleischer, The measure of Pareto optima applications to multiobjective metaheuristics, in: International Conference on Evolutionary Multi-Criterion Optimization, Springer, 519–533, 2003.
[34] M. Emmerich, B. Naujoks, Metamodel assisted multiobjective optimisation strategies and their application in airfoil design, in: Adaptive computing in design and manufacture VI, Springer, 249–260, 2004.


[35] K. Tu, M. Xiong, L. Zhang, H. Zhu, J. Zhang, Y. Yu, Towards imaging large-scale ontologies for quick understanding and analysis, in: International Semantic Web Conference, Springer, 702–715, 2005.
[36] F. Hamdi, B. Safar, C. Reynaud, H. Zargayouna, Alignment-based partitioning of large-scale ontologies, in: Advances in knowledge discovery and management, Springer, 251–269, 2010.
