Improving the Efficiency of NSGA-II based Ontology Aligning Technology

Xingsi Xue^{a,b,c}, Yuping Wang^{*,a}

a School of Computer Science and Technology, Xidian University, Xi'an, Shaanxi, 710071 China
b College of Information Science and Engineering, Fujian University of Technology, Fuzhou, Fujian, 350118 China
c Fujian Provincial Key Laboratory of Big Data Mining and Applications, Fujian University of Technology, Fuzhou, Fujian, 350118 China
Abstract

There is evidence from the Ontology Alignment Evaluation Initiative (OAEI) that ontology matchers do not necessarily find the same correct correspondences. Therefore, several competing matchers are usually applied to the same pair of entities in order to increase the evidence towards a potential match or mismatch. How to select the proper matchers' alignments and tune them efficiently has become one of the challenges in the ontology matching domain. To this end, in this paper, we propose to use a Dynamic Alignment Candidates Selection Strategy and a Metamodel to raise the efficiency of optimizing the ontology alignment through NSGA-II, by prescreening the less promising aligning results to be combined and the less promising individuals to be evaluated in NSGA-II, respectively. The experimental results show that, compared with the approach using NSGA-II alone, the utilization of the Dynamic Alignment Candidates Selection Strategy and the Metamodel greatly reduces the time and main memory consumption of the tuning process while at the same time ensuring the correctness and completeness of the alignments. Moreover, our proposal is also more efficient than state-of-the-art ontology aligning systems.

Key words: ontology alignment, Dynamic Alignment Candidates Selection, Metamodel, NSGA-II
*Corresponding author.
Email addresses: [email protected] (Xingsi Xue), [email protected] (Yuping Wang)
1. Introduction

With the development of the Semantic Web, there has been an explosion in the number of ontologies. Many of these ontologies co-exist in the same area for similar application purposes. However, because of human subjectivity, these ontologies may define one entity with different names or in different ways, raising the so-called heterogeneity problem, which poses a barrier to semantic interoperability on the ontology level [1]. In order to support semantic interoperability across disparate ontologies in many domains, we must find the semantic correspondences among the ontologies' elements. This process is commonly known as ontology alignment and can be described as follows: given two ontologies, each describing a set of discrete entities (which can be classes, properties, predicates, etc.), find the relationships (e.g., equivalence or subsumption) that hold between these entities [2]. Unfortunately, manually aligning ontologies is time-consuming, error-prone and clearly not possible on the Web scale. Thus the development of alignment systems to assist ontology alignment is crucial for the success of the Semantic Web. Nowadays, numerous alignment systems have arisen, and each of them can provide, in a fully automatic or semi-automatic way, a numerical value of similarity between elements from diverse ontologies that can be used to determine whether those elements are semantically similar or not. However, there is no single ontology matcher that clearly dominates the others: often one performs well in some cases and not so well in others [3]. Therefore, both for design-time and run-time aligning, it is necessary to be able to take advantage of the best configuration of the various matchers in an alignment system. For these reasons, to achieve high accuracy for a large variety of ontologies, most current ontology alignment systems combine a set of different matchers by aggregating their independently obtained results [4]. Selecting the weights and thresholds of the ontology aligning process so that the results of the various similarity measures are aggregated into a satisfactory alignment is called meta-matching [5]. This can be viewed as an optimization problem and addressed by evolutionary approaches such as Evolutionary Algorithms (EA). Nevertheless, for dynamic applications, it is necessary to perform the combination of similarity measures and system self-tuning at run time, and
thus, besides the quality (correctness and completeness) of the aligning results, the efficiency (with respect to execution time and main memory) of the aligning process is of prime importance, especially when a user cannot wait too long for the system to respond or when memory is limited. Therefore, state-of-the-art ontology meta-matching systems tend to adopt different strategies within the same infrastructure to improve the efficiency of the aligning process. Even so, the intelligent aggregation of multiple aligning results is still an open problem. According to the literature [5, 6], all ontology alignment processes based on evolutionary approaches developed so far evaluate the produced alignments with multi-objective "a priori" approaches, and using NSGA-II [6] (the fast Non-dominated Sorting Genetic Algorithm), a popular multi-objective evolutionary algorithm, to solve the ontology alignment problem is able to overcome the well-known drawbacks of the "a priori" methods. Therefore, in this paper we utilize NSGA-II to optimize the process of combining four different basic similarity measures (Syntactic Measure, Linguistic Measure, Taxonomy-based Measure and Instance-based Measure) in a meta-matching system. In order to raise the efficiency of the optimizing process, before the tuning process, a Dynamic Alignment Candidates Selection Strategy is utilized to prescreen the less promising aligning results, i.e. to discard the poorly performing ontology alignments to be combined, and thus reduce the search space of NSGA-II. In order to reduce the number of time- and memory-consuming evaluations during the tuning process through NSGA-II, a Metamodel is further presented to improve the efficiency. To the best of our knowledge, this is the first time that both a Dynamic Alignment Candidates Selection Strategy and a Metamodel-assisted evolutionary approach have been utilized to solve the ontology aligning problem.

The rest of the paper is organized as follows: Section 2 discusses the related work; Section 3 introduces the basic definitions; Section 4 describes the Error Ratio based Dynamic Alignment Candidates Selection Strategy; Section 5 presents NSGA-II for the Ontology Alignment Optimization Problem; Section 6 formulates the metamodel-assisted NSGA-II; Section 7 shows the experimental results; finally, Section 8 draws conclusions and proposes future improvements.
2. Related Work

2.1. Alignment Candidates Selection

There is evidence from the OAEI [7] that ontology matchers do not necessarily find the same correct correspondences. Usually several competing matchers are applied to the same pair of entities in order to increase the evidence towards a potential match or mismatch [3]. Moreover, a system which aggregates many matchers does not always outperform one that aggregates only some of them. That is to say, some matchers may generate noise which affects the quality of the final aggregated result. Therefore, how to select the proper matchers' results is a very important problem. With respect to this problem, several state-of-the-art matching systems have proposed their own strategies. RiMOM [8] utilizes more than eight different matchers. Based on the definition of three ontology feature factors estimated on the two ontologies to be matched, namely label similarity, structure similarity and label meaning, a strategy selection method is adopted: the matching strategies suited to the highest factors are selected. However, the association between factors and strategies and the weights used to combine the similarity values are all predefined. Falcon-AO [9] uses four elementary matchers. Similarly, the association between detected similarities and the matchers to be combined is predefined, and the cardinality parameters are not considered at the mapping selection phase. SAMBO and SAMBOdtf [10] have five basic matchers, which are combined using the weighted average of similarities, where the weights are predefined. At the mapping selection phase, a double-threshold strategy is adopted: pairs above the upper threshold are retained as suggestions, those between the lower and the upper threshold are filtered using structural information, and the rest are discarded. PROMPT [11], an algorithm for semi-automatic merging and alignment, is able to guide a user to the next possible point of merging or alignment, to suggest what operations should be performed there, and to perform certain operations automatically. PROMPT also determines possible inconsistencies in the state of the ontology which result from the user's actions, and suggests ways to remedy these inconsistencies. In this paper, we utilize an Error Ratio based Dynamic Alignment Candidates Selection Strategy to automatically discard the poorly performing ontology alignments, improving the efficiency of the optimizing process under the premise of alignment quality assurance.
2.2. Multi-objective Evolutionary Algorithm and Metamodel

Besides selecting matchers, self-configuring or tuning the matchers is another open problem in the ontology alignment domain. In this paper, we choose to use a Multi-objective Evolutionary Algorithm (MOEA), a powerful and robust tool for optimization in difficult search spaces, to tune the parameters of the meta-matching system. Among the various MOEAs, NSGA-II is very popular and has been successfully applied to many real-world problems, for example the combined economic and emission dispatch problem [12], the multi-objective reactive power planning problem [13], customer churn prediction in telecommunications [14] and feature selection for facial expression recognition [11]. NSGA-II focuses on achieving not only a good diversity of Pareto optimal solutions but also a close approximation of the Pareto optimal front. However, during the process of optimizing the ontology alignment through NSGA-II, a large number of evaluations are needed in order to achieve a sufficiently good approximation of the Pareto front, and the function evaluations for the problem of optimizing the ontology alignment are time and memory consuming. Specifically, an evaluation function call takes on average 17 seconds per generation on an Intel Core (TM) i7 at 2.93 GHz with 168 GB memory. In our work, in order to reduce the number of time- and memory-consuming evaluations, a Metamodel, which can be understood as a surrogate evaluation model built using existing information [15], is introduced to approximate the objective function values using solutions that have already been evaluated during the tuning process through NSGA-II. As the core technology, the metamodeling approach helps NSGA-II to considerably improve the efficiency of the solving process by reducing the number of precise evaluations required. Various metamodeling approaches for screening less promising solutions have been proposed, and the most frequently employed ones are based on artificial neural networks (ANN) and the Gaussian Random Field Model (GRFM). With respect to ANN [16], multilayer perceptrons [17] or exactly interpolating radial basis function (RBF) networks [18] are used, either in their standard forms or with add-on features such as measures for the relative importance of input variables [19]. GRFM is likewise used to predict objective function values for new candidate solutions by exploiting information recorded during previous evaluations. Unlike ANN, GRFM provides not only estimations of function values but also confidence intervals for the predictions. Recent publications show that Metamodels based on GRFM are quite robust and have proved successful in many
applications in the past [20, 21, 22]. Therefore, in this paper, we use GRFM to accelerate the search process of NSGA-II.

3. Preliminaries

3.1. Ontology and Ontology Alignment

Many definitions of ontology have been given over the years, but the most frequently referenced one is due to Gruber (1993), who defined an ontology as an explicit specification of a conceptualization. For the purposes of this paper, an ontology can be defined as follows. An ontology is a 4-tuple O = (C, P, I, A), where:

• C is the set of classes, i.e. the set of concepts that populate the domain of interest,
• P is the set of properties, i.e. the set of relations existing between the concepts of the domain,
• I is the set of individuals, i.e. the set of objects of the real world, representing the instances of a concept,
• A is the set of axioms, i.e. the main building blocks for fixing the semantic interpretation of the concepts and the relations [23].

In particular, individuals or instances are the basic, "ground level" components of an ontology. The individuals in an ontology may include concrete objects such as people, animals, tables, automobiles, molecules and planets, as well as abstract individuals such as numbers and words [23]. In general, classes, properties and individuals are referred to as entities.

Ontologies are seen as the solution to data heterogeneity on the web. However, the existing ontologies can themselves introduce heterogeneity: given two ontologies, the same entity can be given different names or simply be defined in different ways, whereas both ontologies may express the same knowledge but in different languages [24]. To solve this problem, a so-called ontology alignment process is necessary. Formally, an alignment between two ontologies can be defined as follows. An alignment A between two ontologies is a set of mapping elements. A mapping element is a 4-tuple (e, e', n, r), where:
• e and e' are the entities of the first and the second ontology, respectively,
• n is a confidence measure in some mathematical structure (typically in the [0, 1] range) holding for the correspondence between the entities e and e',
• r is a relation (typically the equivalence) holding between the entities e and e'.

The ontology alignment process can be defined as follows. The alignment process can be seen as a function φ which, from a pair of ontologies O and O' to be aligned, a set of parameters p and a set of resources r, returns a new alignment A_N between these ontologies: A_N = φ(O, O', p, r). The ontology alignment process computes a mapping element by using a similarity measure, which determines the closeness value n (related to a given relation r) between the entities e and e' in the range [0, 1], where 0 stands for complete inequality and 1 for complete equality. Next, we describe a general classification of the most used similarity measures.

3.2. Similarity Measures

Typically, similarity measures between the entities of two ontologies can be categorized into syntactic, linguistic, taxonomy-based and instance-based measures. In the following, we present some common similarity measures belonging to these four categories.

3.2.1. Syntactic Measure

A Syntactic Measure computes a string distance or edit distance between the ontology entities. In our work, we utilize the Levenshtein distance [25], which calculates the number of operations, such as modification, deletion and insertion of a character, that are necessary to transform one string into another. Formally, the Levenshtein-based similarity between two strings s1 and s2 is defined by the following equation:

    Levenshtein(s1, s2) = max(0, (min(|s1|, |s2|) - d(s1, s2)) / min(|s1|, |s2|))    (1)

where:

• |s1| and |s2| are the lengths of the strings s1 and s2, respectively,
• d(s1, s2) is the number of operations necessary to transform s1 into s2.

Another measure is the Jaro distance [25], an edit distance that uses the number of common characters in the two strings and the positions in which they appear. Given strings s1 and s2, the Jaro distance is defined as follows:

    JaroDistance(s1, s2) = (1/3) * (com(s1, s2)/|s1| + com(s1, s2)/|s2| + (com(s1, s2) - trans(s1, s2))/com(s1, s2))    (2)

where:

• |s1| and |s2| are the lengths of the strings s1 and s2, respectively,
• com(s1, s2) is the number of common characters of s1 and s2,
• trans(s1, s2) is the number of pairs consisting of common characters that appear in different positions.
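To make the syntactic measure concrete, the following is a minimal Python sketch of equation (1); the function names are illustrative, and the classic dynamic-programming edit distance is assumed for d(s1, s2):

    # Sketch of the syntactic measure of equation (1); helper names are
    # illustrative, not taken from the paper. Assumes non-empty strings.
    def edit_distance(s1, s2):
        # Classic dynamic-programming Levenshtein distance d(s1, s2).
        m, n = len(s1), len(s2)
        dist = [[0] * (n + 1) for _ in range(m + 1)]
        for i in range(m + 1):
            dist[i][0] = i
        for j in range(n + 1):
            dist[0][j] = j
        for i in range(1, m + 1):
            for j in range(1, n + 1):
                cost = 0 if s1[i - 1] == s2[j - 1] else 1
                dist[i][j] = min(dist[i - 1][j] + 1,        # deletion
                                 dist[i][j - 1] + 1,        # insertion
                                 dist[i - 1][j - 1] + cost)  # modification
        return dist[m][n]

    def levenshtein_similarity(s1, s2):
        # Normalized similarity of equation (1), clipped to [0, 1].
        shorter = min(len(s1), len(s2))
        return max(0.0, (shorter - edit_distance(s1, s2)) / shorter)

For example, levenshtein_similarity("author", "authors") returns (6 - 1)/6 ≈ 0.83.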
3.2.2. Linguistic Measure

A Linguistic Measure calculates the similarity between ontology entities by considering linguistic relations such as synonymy, hypernymy, and so on. In the proposed work, WordNet [26], an electronic lexical database in which the various senses of words are grouped into sets of synonyms, is used to calculate a synonymy-based distance by considering the names of the entities. Given two words w1 and w2, LinguisticDistance(w1, w2) equals:

• 1, if the words w1 and w2 are synonymous,
• 0.5, if the word w1 is a hypernym of w2 or vice versa,
• 0, otherwise.

3.2.3. Taxonomy-based Measure

Taxonomy-based measures consider only the specialization relation. The intuition behind taxonomic measures is that terms connected by a subsumption relation are already similar, and therefore their neighbors may also be somehow similar. For instance, if super-concepts are the same, the actual concepts
are similar to each other; if sub-concepts are the same, the compared concepts are also similar. In particular, in our work, the taxonomy distance is calculated through the well-known Similarity Flooding (SF) algorithm [13], where an iterative fix-point computation (see also equation (3)) is utilized to produce an alignment between the elements of the two ontologies:

    δ^{i+1} = normalize(δ^i + f(δ^i))    (3)

where the function f increments the similarity value of an element pair based on the similarity of its neighbors, and the value of the previous iteration (δ^i) changes in each iteration. For the details of the SF algorithm, see [13].

3.2.4. Instance-based Measure

An Instance-based Measure exploits the similarity between instances to discover the correspondences between the concepts of an ontology. This is motivated by the assumption that the real semantics of a concept is often better defined by the actual instances assigned to the concept. In this paper, we first use a token-based measure, i.e. the Q-Gram distance [4], to manage the use of different conventions for describing data (e.g., "Jack Smith", "Smith, Jack"), and a character-based measure, i.e. the Jaro distance [5], to compare string values and recognize typographical errors (e.g., "Computre Science", "Computer Science"). Then, we propose an additional function which applies the concept of upPropagation [27], in which the similarities between instances are propagated to their concepts.

To combine all the similarity measures mentioned above, an aggregation strategy is needed. In this work, we utilize the weighted average aggregation, defined as follows:

    φ(s(c), w) = Σ_{i=1}^{n} w_i · s_i(c),  with  Σ_{i=1}^{n} w_i = 1 and w_i ∈ [0, 1]    (3)
where:

• s(c) is the vector of similarity measure results,
• w is the vector of weights,
• n is the number of similarity measures.
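As a minimal sketch, assuming the per-measure similarity values s_i(c) for an entity pair have already been computed (all names here are illustrative), the weighted average aggregation is simply:

    # Weighted average aggregation of the equation above; names are illustrative.
    def aggregate(similarities, weights):
        # similarities: per-measure values s_i(c) for one entity pair.
        # weights: w_i with sum(weights) == 1 and each w_i in [0, 1].
        assert abs(sum(weights) - 1.0) < 1e-9, "weights must sum to 1"
        return sum(w * s for w, s in zip(weights, similarities))

    # Example: four measures (syntactic, linguistic, taxonomy, instance-based).
    score = aggregate([0.83, 1.0, 0.5, 0.7], [0.4, 0.3, 0.2, 0.1])  # = 0.802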
Since the quality of the resulting alignment, i.e. the correctness and completeness of the correspondences found, needs to be assessed, we next introduce some conformance measures which derive from the information retrieval field [28].

3.3. Alignment Evaluation

An alignment is normally assessed on the basis of two measures commonly known as recall and precision. Recall (or completeness) measures the fraction of correct alignments found in comparison with the total number of correct existing alignments. A recall of 1 means that all of the alignments have actually been found, but it provides no information about the number of additionally, falsely identified alignments. Typically, recall is balanced against precision (or correctness), which measures the fraction of found alignments that are actually correct. A precision of 1 means that all found alignments are correct, but it does not imply that all alignments have been found. Therefore, recall and precision are often balanced against each other with the so-called f-measure, the uniformly weighted harmonic mean of recall and precision. Given a reference alignment R and an alignment A, recall, precision and f-measure are given by the following formulas:

    recall = |R ∩ A| / |R|    (4)
    precision = |R ∩ A| / |A|    (5)
    f-measure = 2 · (precision · recall) / (precision + recall)    (6)
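These three measures can be computed directly from sets of correspondences. The following sketch assumes correspondences are represented as hashable tuples, an illustrative choice rather than the paper's data structure:

    # Recall, precision and f-measure of equations (4)-(6); correspondences
    # are assumed hashable, e.g. (entity1, entity2, relation) tuples, and
    # both input sets are assumed non-empty.
    def evaluate(reference, found):
        correct = len(set(reference) & set(found))
        recall = correct / len(reference)
        precision = correct / len(found)
        if precision + recall == 0:
            return recall, precision, 0.0
        f_measure = 2 * precision * recall / (precision + recall)
        return recall, precision, f_measure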
4. Error Ratio based Dynamic Alignment Candidates Selection Strategy

In this section, we present an Error Ratio based Dynamic Alignment Candidates Selection Strategy to prescreen the less promising alignment candidates. In this way, the number of alignments that need to be combined can be reduced, and the efficiency of the tuning process by NSGA-II can be improved. In particular, first of all, a differentor based Alignment Candidates Aggregation approach is introduced to self-adaptively obtain an aggregated alignment. Then, on this basis, a Dynamic Alignment Candidates Selection is proposed to discard the alignments having large distances from the aggregated alignment.

4.1. Differentor based Alignment Candidates Aggregation

For each similarity measure, an ontology alignment corresponding to a similarity matrix, whose rows and columns are composed of the entities coming
from the source ontology and the target ontology respectively, is generated. In order to combine these similarity matrices into a final similarity matrix, it is necessary to determine a combination strategy which can appropriately estimate the quality of each similarity matrix. However, since a similarity measure may not work well for all entities in the ontologies, assigning a single weight to each similarity matrix is not a good solution. To solve this problem, we introduce a metric termed the differentor, which, when aggregating similarity matrices, assigns high weights to the entities a similarity measure handles well and low weights to the entities it handles badly. The differentor can be defined as the normalized number of mappings that suggest an unambiguous one-to-one mapping in a row of a similarity matrix [29]. For a similarity matrix produced by a similarity measure, the cell in the ith row and jth column represents a candidate alignment <e_i, e_j, s, =>, where e_i and e_j are the ith and jth entity of the source ontology and the target ontology respectively, and s is the similarity value between them. The differentor of the ith row can be defined as follows:

    differentor_i = 1.0,                 if num_max^i = 1
                  = 1 - num_max^i / n,   otherwise    (7)

where num_max^i is the number of cells holding the maximum similarity in the ith row, and n is the number of columns. In our work, this differentor-based similarity aggregation strategy is applied in the process of dynamic alignment selection. In detail, the steps of the differentor-based similarity aggregation are as follows (see the sketch after this list):

(1) calculate the differentor for each row in all similarity matrices;
(2) multiply the similarities in each row by their corresponding differentor, for each similarity matrix;
(3) add all similarity matrices to form a final similarity matrix;
(4) normalize the final similarity matrix.
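A sketch of steps (1)-(4), with similarity matrices represented as plain nested lists; the max-based normalization in step (4) is our assumption, since the text does not fix a normalization method:

    # Differentor-based aggregation (equation (7) plus steps (1)-(4));
    # all matrices are nested lists of floats with the same shape.
    def differentor(row):
        num_max = row.count(max(row))  # cells holding the row maximum
        n = len(row)
        return 1.0 if num_max == 1 else 1.0 - num_max / n

    def aggregate_matrices(matrices):
        rows, cols = len(matrices[0]), len(matrices[0][0])
        final = [[0.0] * cols for _ in range(rows)]
        for m in matrices:
            for i, row in enumerate(m):
                d = differentor(row)            # steps (1) and (2)
                for j in range(cols):
                    final[i][j] += d * row[j]   # step (3)
        top = max(max(r) for r in final) or 1.0
        return [[v / top for v in r] for r in final]  # step (4): normalize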
4.2. Dynamic Alignment Candidates Selection Strategy

In order to eliminate the poorly performing ontology alignments, we propose a novel approach based on the intuition that the poorly performing alignments are those having large distances from the aggregated alignment. For all entities in two ontologies O1 and O2, each similarity measure generates a similarity matrix S, whose rows and columns are formed by the entities in O1 and O2 respectively; the value of an element in S is the similarity value of the two corresponding entities. Given a set of similarity matrices {S_j}, we define the average bias of these multiple similarity matrices as follows:
    average_bias({S_j}) = ( Σ_j Σ_{e_i1 → e_i2} p(Map_j(e_i1, e_i2) | S_j, Map_{{j}}(e_i1, e_i2)) ) / total_number    (8)

where:

• e_i1 → e_i2 denotes one mapping;
• p(Map_j(e_i1, e_i2) | S_j, Map_{{j}}(e_i1, e_i2)) is the probability of difference for (e_i1, e_i2) between the aggregated mapping Map_{{j}} and S_j's mapping Map_j, which can be calculated by the following formula:

    p(Map_j(e_i1, e_i2) | S_j, Map_{{j}}(e_i1, e_i2)) = |sim_Map_j(e_i1, e_i2) - sim_Map_{{j}}(e_i1, e_i2)| / max(sim_Map_j(e_i1, e_i2), sim_Map_{{j}}(e_i1, e_i2))

  where sim_Map_j(e_i1, e_i2) and sim_Map_{{j}}(e_i1, e_i2) refer to the similarity values of (e_i1, e_i2) in Map_j and Map_{{j}}, respectively;
• total_number is the number of pairs (e_i1, e_i2) whose similarities in the aggregated matrix and in each S_j are not both equal to 0.

To illustrate how p(Map_j(e_i1, e_i2) | S_j, Map_{{j}}(e_i1, e_i2)) is calculated, consider a specific example: if S_j's similarity on the mapping (e_i1, e_i2) is 0.5 and the aggregated similarity, i.e. the similarity value of the entities e_i1 and e_i2 in the aggregated matrix, is 0.6, then

    p(Map_j(e_i1, e_i2) | S_j, Map_{{j}}(e_i1, e_i2)) = |0.5 - 0.6| / max(0.5, 0.6) = 0.1 / 0.6 ≈ 0.167
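In code, equation (8) can be sketched as follows; the matrix-of-floats representation and the names are illustrative:

    # Average bias of a set of similarity matrices against their aggregation
    # (equation (8)); matrices and the aggregated matrix share one shape.
    def pair_bias(s_j, s_agg):
        # Probability of difference between one matrix's value and the aggregate.
        return abs(s_j - s_agg) / max(s_j, s_agg)

    def average_bias(matrices, aggregated):
        total, count = 0.0, 0
        for m in matrices:
            for i, row in enumerate(m):
                for j, s in enumerate(row):
                    s_agg = aggregated[i][j]
                    if s == 0.0 and s_agg == 0.0:
                        continue              # skip pairs that are 0 everywhere
                    total += pair_bias(s, s_agg)
                    count += 1
        return total / count if count else 0.0

    # Worked example from the text: |0.5 - 0.6| / max(0.5, 0.6) = 0.167
    assert abs(pair_bias(0.5, 0.6) - 0.1 / 0.6) < 1e-12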
4.3. Efficient Implementation of Dynamic Alignment Candidates Selection

To implement the Dynamic Alignment Candidates Selection, we define the average bias of S_j as

    average_bias(S_j) = |average_bias({S_j}) - average_bias({S̄_j})|

where average_bias({S_j}) is the average bias over all similarity matrices, and average_bias({S̄_j}) is the average bias over all similarity matrices without S_j. The proposed strategy is based on the hypothesis that the higher the average bias of the matrix S_j, the lower the probability of S_j being selected for the final aggregation. In this work, the threshold is set empirically to 0.26, the value that achieves the highest average alignment quality on all test cases of the exploited dataset, and any similarity matrix with an average bias above 0.26 is screened out. If the average biases of all the similarity matrices are larger than the threshold, then only the similarity matrix with the lowest average bias is selected as the final similarity matrix.
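A sketch of this screening rule, reusing average_bias from the previous sketch; note that evaluating both bias terms against the same aggregated matrix is our assumption, since the text leaves the baseline of average_bias({S̄_j}) implicit:

    # Error Ratio based screening: keep matrices whose bias stays at or below
    # the empirically chosen threshold 0.26; fall back to the least biased one.
    def select_candidates(matrices, aggregated, threshold=0.26):
        biases = []
        for k in range(len(matrices)):
            rest = matrices[:k] + matrices[k + 1:]
            bias = abs(average_bias(matrices, aggregated)
                       - average_bias(rest, aggregated))
            biases.append(bias)
        kept = [m for m, b in zip(matrices, biases) if b <= threshold]
        if not kept:                    # all biases above the threshold
            kept = [matrices[biases.index(min(biases))]]
        return kept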
5. NSGA-II for Optimizing the Ontology Alignment

The process of determining the optimal mapping set in order to yield the alignment with the best quality can be regarded as an optimization process. In the following, the Multi-Objective Optimal Model for optimizing the ontology alignment is first presented, and then the details of a problem-specific NSGA-II [6], a popular Multi-Objective Evolutionary Algorithm, are given.

5.1. Multi-Objective Optimal Model for Optimizing the Ontology Alignment

The multi-objective optimization model for the ontology alignment problem is presented as follows:

    max f(X) = max(Recall(X), Precision(X))
    s.t. X = (x1, x2, ..., xn)^T, xi ∈ [0, 1], Σ_{i=1}^{n-1} xi = 1    (9)

where the decision variable X is an n-dimensional vector in which xi, i ∈ [1, n - 1], represents the weight of the i-th alignment to be aggregated and xn is the threshold for filtering the aggregated alignment; the objective of this model is to maximize both the recall and the precision of the aggregated alignment. Since modeling the meta-matching problem is a complex task (a nonlinear problem with many local optima) and a time-consuming one (a large-scale problem), particularly when the number of similarity measures is significantly large, approximate methods are usually used for computing the parameters. From this point of view, evolutionary optimization methods represent an efficient approach to this problem. Furthermore, in order to overcome the well-known drawbacks of the "a priori" approaches brought by single-objective evolutionary optimization, a popular multi-objective evolutionary algorithm, NSGA-II, is utilized in this paper to solve the ontology meta-matching problem.
Table 1: Outline of NSGA-II
The outline of NSGA-II
t = 0;
Initialize the population P_t;
Evaluate P_t;
while t < t_max do
    G_t = generateByGeneticOperators(P_t);    // generate λ variations
    Evaluate G_t;
    P_{t+1} = rankAndSelect(G_t ∪ P_t);    // select μ best individuals
    t = t + 1;
end while

NSGA-II aims to obtain a well distributed set of points that are close to the Pareto front. Both closeness and diversity are addressed in the selection operator, where the population is sorted using non-domination ranks as the primary sorting criterion and crowding distances as the secondary sorting criterion. For the details of the non-domination rank and crowding-distance sorting, see [6]. Basically, NSGA-II is an elitist EA with (μ + λ) selection, where μ and λ refer to the population size and the number of new solutions generated in one generation, respectively. The (μ + λ)-EA starts by generating the initial population of μ individuals; the EA then generates a set of λ new solutions through recombination and mutation, and the new candidate solutions are evaluated and ranked in terms of their quality. The outline of NSGA-II is presented in Table 1; a Python-style sketch of this loop follows.
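Everything in the sketch below is illustrative scaffolding under the outline of Table 1; the callables (variate, rank_and_select, etc.) are placeholders for the problem-specific operators described in this section, not the paper's implementation:

    import random

    # Skeleton of the (mu + lambda) NSGA-II loop of Table 1; the operator
    # arguments are placeholders for the problem-specific steps of Section 5.
    def nsga2(evaluate, init_individual, variate, rank_and_select,
              mu=20, lam=20, t_max=5):
        # Each entry pairs an individual with its precise (recall, precision).
        population = [(ind, evaluate(ind))
                      for ind in (init_individual() for _ in range(mu))]
        for t in range(t_max):
            offspring = []
            for _ in range(lam):
                parents = random.sample(population, 2)
                child = variate(parents[0][0], parents[1][0])
                offspring.append((child, evaluate(child)))
            # Non-dominated sorting + crowding distance keeps the mu best.
            population = rank_and_select(population + offspring, mu)
        return population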
In the following, the five basic steps of NSGA-II for optimizing the ontology alignment are presented.

5.2. Chromosome Encoding

We incorporate in a chromosome both the weights associated with the similarity measures and the threshold used to decide whether a pair of entities is an alignment or not. Therefore, one chromosome can be divided into two parts: one stands for the weights and the other for the threshold. Concerning the characteristics of the weights mentioned in Section 3.2.4, our encoding mechanism represents them indirectly by defining the cut (or separation) points in the interval [0, 1] that limit the values of the weights. If p is the number of weights required, the set of cuts can be represented as c = {c_1, c_2, ..., c_{p-1}}. The chromosome decoding is carried out by queuing the elements of c in ascending order, obtaining c' = {c'_1, c'_2, ..., c'_{p-1}}, and calculating the weights as follows:

    w_k = c'_1,               k = 1
    w_k = c'_k - c'_{k-1},    1 < k < p    (10)
    w_k = 1 - c'_{p-1},       k = p

Therefore, the length of a chromosome is (p - 1) · cutLength + thresholdLength, where p is the number of weights, and cutLength and thresholdLength are the chromosome lengths of a cut and of the threshold, respectively.
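A sketch of this decoding step (the names are illustrative; the chromosome is taken to be a list of cut points followed by a trailing threshold):

    # Decode a chromosome of equation (10): p-1 cut points in [0, 1] plus a
    # trailing threshold; returns (weights summing to 1, threshold).
    def decode(chromosome):
        *cuts, threshold = chromosome
        cuts = sorted(cuts)
        weights = [cuts[0]]
        weights += [cuts[k] - cuts[k - 1] for k in range(1, len(cuts))]
        weights.append(1.0 - cuts[-1])
        return weights, threshold

    # Example: three cuts encode four weights.
    w, t = decode([0.2, 0.7, 0.4, 0.5])  # w = [0.2, 0.2, 0.3, 0.3], t = 0.5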
5.3. Fitness Functions

The fitness functions are the objective functions that evaluate the quality of the alignment obtained by using the weights and the threshold encoded in a chromosome. In our work, there are two fitness functions, calculating the recall and the precision of the aggregated result, respectively.

5.4. Genetic Operators

5.4.1. Selection

As in nature, the most suitable chromosomes must have more opportunities to reproduce. The best chromosomes in a population are those with the best fitness values, and the genetic information of these chromosomes can potentially provide the best solutions to the problem. However, the reproduction opportunities of the less suitable chromosomes should not be completely removed, because it is important to keep diversity in the population. In this article, in order to ensure the diversity of the population and accelerate the convergence of the algorithm, the selection operator first queues the chromosomes of the population in descending order according to their crowding distances, which estimate the density of the solutions. Then we select the half of the chromosomes at the front of the population and randomly copy one at a time until a new population is formed.

5.4.2. Crossover

The crossover operator takes two chromosomes, called parents, and generates two children chromosomes, obtained by mixing the genes of the parents. Crossover is applied with a certain probability, a parameter of the genetic algorithm. In this work, we use the common one-cut-point method to carry out the crossover operation on the population. First, a cut position in the chromosomes of the two parents is randomly determined; this position is a cut point which cuts each parent into two parts, the left part and the right part. Then, the right parts are swapped to form two children.

5.4.3. Mutation

The mutation operator ensures diversity in the population and prevents premature convergence. In our work, for each bit in the chromosome, we check whether mutation should be applied according to the mutation probability, and if so, the value of that bit is flipped.

5.5. Generation of the Next Population

First, we put the current population and the new population together and remove the redundant chromosomes. Then, the new population is selected by the non-dominated and crowding-distance sorting technology; see [6].

5.6. Elitist Strategy

The elitist strategy puts the best chromosome (the elite) of the current population unaltered into the next population. This ensures the survival of the best solution obtained so far. In our work, we regard the individual with the highest f-measure as the elite of the current generation, and the elite is updated in each generation during the evolutionary process. When the algorithm terminates, the elite is recommended to the user as the optimal solution.

In order to accelerate NSGA-II in the presence of a time-consuming evaluation function, Metamodels have frequently been proposed. Next, we introduce the local Metamodel based on GRFM and its application in NSGA-II.

6. Gaussian Random Field Model

6.1. Local Metamodeling based on Gaussian Random Field Model

In our work, the metamodel is used to model a mathematical relationship, e.g. an approximate multivariate function, between the points that have already been evaluated. More precisely, given a set of points x1, x2, ..., xn ∈ Rn and evaluations of the objective function at these points, y1 = f(x1), ..., yn = f(xn), the metamodel can be used to compute an approximation f̂(x) ≈ f(x)
for any point x ∈ Rn, in a time that is considerably faster than a precise evaluation.

Figure 1: Outputs of a GRFM with a one-dimensional input space. In this example, three points x(1), x(2), x(3) have been evaluated. The result of each approximate evaluation at a point x is represented by the mean value ŷ and standard deviation ŝ of a one-dimensional Gaussian distribution.

GRFM is a particular type of interpolation model which can not only predict the objective function value but also provide a measure of confidence for its prediction. Indeed, the output of a GRFM comprises both the mean value and the standard deviation of a one-dimensional Gaussian distribution, which represents the likelihood of different outcomes of a precise function evaluation. Figure 1 illustrates an example with a one-dimensional input space. The Metamodel we use is based on the techniques discussed by Sacks et al. [30], so we do not go into details here; a brief description follows. Basically, the evaluation function is taken to be the realization of a random process with a spatial index, and the main assumption is that the random variables are correlated by a distance-based correlation function. In this work, we utilize the following Gaussian product kernel, proposed by Sacks et al. [30], as the correlation function:

    c(θ1, θ2, ..., θd) = Π_{i=1}^{d} exp(-θi · |x_i - x'_i|^2)    (11)

where θi, i = 1, 2, ..., d, denote the correlation parameters.
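The kernel of equation (11) and the nearest-neighbor selection used for local metamodeling can be sketched as follows; all names are illustrative, and a full GRFM predictor would additionally fit the θ values by maximum likelihood and compute the conditional Gaussian, which we omit here:

    import math

    # Gaussian product kernel of equation (11): correlation between points
    # x and x2 in d dimensions, one correlation parameter theta_i per axis.
    def correlation(x, x2, theta):
        return math.prod(math.exp(-t * (a - b) ** 2)
                         for t, a, b in zip(theta, x, x2))

    # Local metamodeling: keep only the k evaluated points that are closest
    # (Euclidean metric) to the query point; k = 6 in our setting.
    def neighbors(query, archive, k=6):
        dist = lambda p: math.dist(query, p[0])  # p = (point, objective value)
        return sorted(archive, key=dist)[:k]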
First, the free parameters, i.e. the θ values, are estimated by means of the maximum likelihood method, using the given evaluations as a sample. Then the conditional distribution of the random field at a new point x' is computed. Since the random field is Gaussian, this distribution is a one-dimensional conditional Gaussian distribution F(x' | X, y), whose mean value serves as the predictor ŷ and whose standard deviation serves as the confidence measure ŝ(x') (see also Figure 1). The number of training points, i.e. the solutions needing precise evaluation in our work, is the main factor determining the time consumption, and for large training sets the fitting is very time consuming. Therefore, it is recommended to use only the minimum necessary subset of the total number of training points available. A well performing heuristic is to use a small number of training points which are closest, with regard to the Euclidean metric, to each new point x, and to train a locally valid Metamodel. This strategy, termed local metamodeling by Giannakoglou et al. [16], is an efficient measure for speeding up the computation while still achieving a high quality of the Metamodels. For a systematic study of the number of neighbors we refer to Büche et al. [31], where the authors suggest a reasonable choice for the neighborhood size; at least in empirical studies, it has been shown that this parameter setting leads to sufficiently good approximations on typical test problems. In our work, the neighborhood size has been set to 6. Any further increase of this value seems to improve the results only slightly, while increasing the computation time significantly.

6.2. Prescreening Procedure for NSGA-II

Metamodels can be integrated into evolutionary optimization procedures in two different ways: (1) some generations are evaluated by the exact evaluation function and other generations are evaluated solely by the Metamodel; (2) in each generation (apart from the very first one), the Metamodel and the exact evaluation function are used cooperatively. The second approach, which is adopted in our work, turns out to be quite robust and has proved successful in many applications [21, 22, 31, 32]. In order to prescreen the less promising individuals in the new generation, a ranking algorithm applied to the offspring population is needed. Basically, this ranking algorithm is designed based on the value ŷ(x) predicted by the Metamodel and its corresponding standard deviation ŝ(x). First, following Torczon et al. [33, 34], we use the following formula to calculate the predicted value f̂(x) instead of using ŷ directly; the idea is
to increase the number of evaluations in promising but less explored regions of the search space by directing the search towards them:

    f̂(x) = ŷ(x) + ŝ(x)    (12)

Figure 2: Illustration of the hypervolume measure. The black rectangle represents the improvement brought by x.

Then the Lebesgue measure of the dominated hypervolume, which refers to the volume of an n-dimensional hypercube, for a restricted solution space, is employed to measure the improvement brought by new points [7]. Fleischer proved that, for countable spaces, the hypervolume measure H(P) of a population P takes its maximum if the set found covers the true Pareto set [35]. Furthermore, when adding a new point x, the hypervolume increases if and only if x is not dominated by any point in the already existing set of non-dominated solutions E_t, and thus E_t ∪ {x} can be regarded as an improvement of E_t (see Figure 2 for more details):

    I(x) = H(E_t ∪ {x}) - H(E_t),   if x is not dominated by any point of E_t
         = 0,                       otherwise    (13)

Finally, a Constant Ratio (CR) selection strategy [36] is applied to choose the most promising offspring for precise evaluation. The CR strategy makes extensive use of the Metamodel information and thus has the potential to improve the convergence significantly. In our work, we set the selection ratio of CR to 0.25; the outline of the Metamodel-assisted NSGA-II is presented in Table 2.
Table 2: Outline of Metamodel-assisted NSGA-II
The outline of the Metamodel-assisted NSGA-II
t = 0;
Initialize the population P_t;
Evaluate P_t and insert the results into the database D;
while t < t_max do
    G_t = generateByGeneticOperators(P_t);
    Evaluate G_t with the metamodel derived from D;
    Choose the set of promising individuals Q_t ⊆ G_t according to CR;
    P_{t+1} = rankAndSelect(Q_t ∪ P_t);
    t = t + 1;
end while

There are two distinguishing features of the Metamodel-assisted NSGA-II compared with the standard NSGA-II: (1) all exactly evaluated individuals are recorded in the database; (2) before deciding whether new points need to be evaluated precisely, their objective function values are predicted by the Metamodel during the prescreening phase.

7. Experimental Results and Analysis

In the experiments, the well-known benchmarks provided by the OAEI 2011 [7] are used. Each benchmark in the OAEI data set is composed of two ontologies to be aligned and a reference alignment used to evaluate the quality of the alignment. Moreover, according to the OAEI policies, the benchmark reference alignments take into account only the matching between ontology classes and properties. The ontologies in the tests are described in OWL-DL and serialized in the RDF/XML format. Table 3 gives a brief description of the benchmarks of OAEI 2011.

7.1. Experiment Configuration

The similarity measures used are as follows:

• Levenshtein distance (Syntactic Measure),
• Linguistic distance (Linguistic Measure),
• Taxonomy distance (Taxonomy-based Measure),
• Q-Gram distance and Jaro distance (Instance-based Measure).
Table 3: Brief Description of the Benchmarks

ID    Brief description
101   Strictly identical ontologies
103   A regular ontology and another with a language generalization
104   A regular ontology and another with a language restriction
201   Ontologies without entity names
203   Ontologies without entity names and comments
204   Ontologies with different naming conventions
205   Ontologies whose labels are synonymous
206   Ontologies whose labels are in different languages
221   A regular ontology and another with no specialization
222   A regular ontology and another with a flattened hierarchy
223   A regular ontology and another with an expanded hierarchy
224   Identical ontologies without instances
225   Identical ontologies without restrictions
228   Identical ontologies without properties
230   Identical ontologies with flattened entities
231   Identical ontologies with multiplied entities
301   A real ontology about bibliography made by MIT
302   A real ontology with different extensions and naming conventions
304   A regular ontology and another, real ontology which is not equivalent but quite close
The NSGA-II uses the following parameters:

• Search space for each parameter: the continuous interval [0, 1],
• Numerical accuracy: 0.01,
• Fitness functions: recall and precision,
• Population size: 20 individuals,
• Crossover probability: 0.6,
• Mutation probability: 0.01,
• Maximum number of generations: 5.

After ten independent executions, we noticed that NSGA-II does not improve the results beyond the fifth generation, so we have set a limit of five generations. The hardware configuration used to run the algorithms is as follows:
• Processor: Intel Core (TM) i7,
• CPU speed: 2.93 GHz,
• RAM capacity: 4 GB.

The results of the experiments are given in the next section.

7.2. Results and Analysis

All the values shown in Tables 4, 5 and 6 are averages over ten independent runs. Specifically, Table 4, where the symbols R and P refer to recall and precision respectively, compares the quality of the alignments obtained by the approach using NSGA-II only, the approach using NSGA-II with the dynamic alignment candidates selection strategy, the approach using the metamodel-assisted NSGA-II, and our approach, which uses both the dynamic alignment candidates selection strategy and the metamodel-assisted NSGA-II. Tables 5 and 6 compare the average execution time and main memory consumption per generation of the same four approaches. Table 7 shows the mean values of the results obtained by our approach and by the state-of-the-art ontology matching systems [8, 9, 10], where 1XX stands for the benchmarks in Table 3 whose numbers begin with the digit 1, and similarly for 2XX and 3XX.

As can be seen from Table 4, except for benchmark 205, the alignment quality obtained by the four approaches is identical on all benchmarks. With respect to benchmark 205, although the recall and precision of the four alignments differ slightly, the f-measure values obtained by the four approaches are the same. Therefore, we may draw the conclusion that, from the point of view of alignment quality, our proposal is effective.
Table 4: Comparison among the approach using NSGA-II only, the approach using NSGA-II with the dynamic alignment candidates selection strategy, the approach using the metamodel-assisted NSGA-II and our approach, in terms of the quality of the alignments (F-measure (R, P))

ID    NSGA-II only       with selection     with metamodel     our approach
101   1.00 (1.00, 1.00)  1.00 (1.00, 1.00)  1.00 (1.00, 1.00)  1.00 (1.00, 1.00)
103   1.00 (1.00, 1.00)  1.00 (1.00, 1.00)  1.00 (1.00, 1.00)  1.00 (1.00, 1.00)
104   1.00 (1.00, 1.00)  1.00 (1.00, 1.00)  1.00 (1.00, 1.00)  1.00 (1.00, 1.00)
201   0.94 (0.90, 0.98)  0.94 (0.90, 0.98)  0.94 (0.90, 0.98)  0.94 (0.90, 0.98)
203   0.99 (0.98, 1.00)  0.98 (0.98, 0.99)  0.99 (0.98, 1.00)  0.99 (0.98, 1.00)
204   0.98 (0.98, 0.99)  0.98 (0.98, 0.99)  0.98 (0.98, 0.99)  0.98 (0.98, 0.99)
205   0.93 (0.89, 0.99)  0.93 (0.89, 0.99)  0.93 (0.90, 0.97)  0.93 (0.90, 0.97)
206   0.70 (0.67, 0.73)  0.70 (0.67, 0.74)  0.70 (0.67, 0.74)  0.70 (0.67, 0.74)
221   1.00 (1.00, 1.00)  1.00 (1.00, 1.00)  1.00 (1.00, 1.00)  1.00 (1.00, 1.00)
222   1.00 (1.00, 1.00)  1.00 (1.00, 1.00)  1.00 (1.00, 1.00)  1.00 (1.00, 1.00)
223   0.99 (0.98, 1.00)  0.99 (0.98, 1.00)  0.99 (0.98, 1.00)  0.99 (0.98, 1.00)
224   1.00 (1.00, 1.00)  1.00 (1.00, 1.00)  1.00 (1.00, 1.00)  1.00 (1.00, 1.00)
225   1.00 (1.00, 1.00)  1.00 (1.00, 1.00)  1.00 (1.00, 1.00)  1.00 (1.00, 1.00)
228   1.00 (1.00, 1.00)  1.00 (1.00, 1.00)  1.00 (1.00, 1.00)  1.00 (1.00, 1.00)
230   1.00 (1.00, 1.00)  1.00 (1.00, 1.00)  1.00 (1.00, 1.00)  1.00 (1.00, 1.00)
231   1.00 (1.00, 1.00)  1.00 (1.00, 1.00)  1.00 (1.00, 1.00)  1.00 (1.00, 1.00)
301   0.81 (0.81, 0.82)  0.81 (0.81, 0.82)  0.81 (0.81, 0.82)  0.81 (0.81, 0.82)
302   0.85 (0.80, 0.91)  0.85 (0.80, 0.91)  0.85 (0.80, 0.91)  0.85 (0.80, 0.91)
304   0.93 (0.88, 0.99)  0.93 (0.88, 0.99)  0.93 (0.88, 0.99)  0.93 (0.88, 0.99)
We can see from Table 5 that, compared with the approach using NSGA-II only, our approach dramatically improves the execution time on all benchmarks; the improvement is 75.95% on average. Moreover, the two proposals, i.e. the dynamic alignment candidates selection strategy and the metamodel, are each also effective in improving the execution time when applied in NSGA-II individually. Table 6 shows that, compared with the approach using NSGA-II only, the main memory consumption per generation of our approach is dramatically reduced, by 65.91% on average over all benchmarks, and the average values show that both proposals are also effective in reducing memory consumption when applied in NSGA-II individually.

As can be seen from Table 7, for the benchmarks 1XX, the quality of all ontology matching systems is identical in terms of f-measure, and our approach outperforms all the others on the benchmarks 2XX. Although our approach ranks second on the benchmarks 3XX, the average values in Table 7 show that the quality of the alignments obtained by our approach is in general better than that of the state-of-the-art ontology matching systems. Since evolutionary approaches are efficient in finding an isomorphism between the sub-graphs modeling the two ontologies, particularly when the considered ontologies are characterized by a significant number of entities [1], the quality of the alignments obtained by our approach is in general better than that of the state-of-the-art ontology matching systems that do not use an evolutionary approach. With regard to the runtime, RiMOM, Falcon-AO, SAMBO and SAMBOdtf take approximately 7 seconds, 6 seconds, 12 seconds and 14 seconds respectively to determine an alignment.
Table 5: Comparison among the approach using NSGA-II only, the approach using NSGA-II with the dynamic alignment candidates selection strategy, the approach using the metamodel-assisted NSGA-II and our approach, in terms of the execution time taken per generation

ID       Time (ns)          Time (ns)               Time (ns)          Time (ns)
         (NSGA-II only)     (with selection)        (with metamodel)   (our approach)
101      1,766,065,798      1,341,072,770           809,83,148         491,086,716
103      1,941,128,551      1,298,435,579           495,071,617        513,052,637
104      1,936,678,545      1,568,335,0435          882,905,649        531,648,017
201      26,237,786,925     19,667,79,222           1,145,529,885      6,527,802,835
203      23,129,665,922     17,339,927,051          1,010,275,796      5,760,455,309
204      23,137,994,633     18,344,536,264          1,123,146,872      5,757,619,474
205      22,538,261,706     16,904,326,108          9,861,906,611      5,636,454,913
206      22,593,189,358     16,954,422,419          9,905,963,746      5,676,888,543
221      23,208,903,087     17,839,725,069          9,637,467,637      5,909,427,595
222      22,472,139,336     15,481,838,986          6,743,963,548      1,501,238,286
223      28,851,935,372     21,707,637,323          12,777,241,825     7,419,041,247
224      22,796,107,326     17,109,449,105          11,126,330,025     5,736,132,665
225      23,220,966,073     19,431,607,369          10,194,908,989     5,852,889,962
228      5,622,022,072      4,243,440,157           2,520,212,764      1,486,276,329
230      19,158,248,325     16,435,714,629          8,480,898,759      4,921,782,238
231      22,996,779,479     17,255,009,349          1,010,296,756      5,801,469,182
301      11,337,855,230     8,525,370,306           6,009,761,353      2,900,399,632
302      7,734,034,912      5,530,778,167           3,776,707,575      1,124,264,677
304      17,247,855,524     13,926,727,330          7,525,316,457      4,284,469,953
Average  17,259,348,324     13,016,849,876          5,489,941,527      4,096,442,116
Our approach, in contrast, takes about 4 seconds on average to obtain an alignment, which is lower than the other state-of-the-art ontology matching systems. According to the experimental results shown above, compared with the approach using NSGA-II alone, the utilization of the Dynamic Alignment Candidates Selection Strategy and the Metamodel-assisted NSGA-II is able to greatly reduce the execution time and main memory consumption of the tuning process while at the same time ensuring the correctness and completeness of the alignments.

8. Conclusion and Future Work

Ontology alignment is an important step in ontology engineering. Although much work has been done to tackle this problem, there are still various challenges left for researchers to deal with. One of these challenges is the selection of matchers and their self-configuration.
Table 6: Comparison among the approach using NSGA-II only, the approach using NSGA-II with the dynamic alignment candidates selection strategy, the approach using the metamodel-assisted NSGA-II and our approach, in terms of the main memory consumed per generation by the evaluation function

ID       Memory (byte)      Memory (byte)           Memory (byte)      Memory (byte)
         (NSGA-II only)     (with selection)        (with metamodel)   (our approach)
101      68,485,120         58,396,482              38,219,206         28,130,568
103      33,020,008         20,058,432              18,356,448         12,135,256
104      35,142,512         26,724,610              22,952,234         18,888,808
201      224,281,424        174,471,418             93,794,662         75,580,408
203      219,355,104        166,168,874             92,440,088         74,940,416
204      135,096,032        108,664,162             73,379,396         52,807,184
205      210,275,960        164,365,808             86,698,486         72,554,504
206      206,870,440        161,680,989             80,568,448         71,956,096
221      185,348,632        145,674,757             96,082,414         66,327,008
222      187,774,568        135,582,834             70,341,431         31,199,368
223      222,302,384        185,502,650             95,441,298         81,903,184
224      203,368,408        160,380,802             73,225,949         71,045,104
225      203,790,664        162,194,813             73,223,782         71,847,624
228      174,386,136        134,386,077             84,379,254         54,376,960
230      176,487,368        135,909,858             861,728,879        54,754,840
231      184,253,248        154,878,082             95,659,126         66,127,752
301      84,511,680         84,484,258              33,187,954         33,429,416
302      229,535,904        226,194,546             32,161,840         32,311,840
304      219,960,864        167,646,365             95,451,853         63,017,368
Average  168,644,550        135,440,306             111,436,460        54,385,984
For dynamic applications it is necessary to perform matcher combination and self-tuning at run time, and thus the efficiency of the configuration search strategies becomes critical. To this end, in this paper, we propose to use the Dynamic Alignment Candidates Selection Strategy and the Metamodel-assisted NSGA-II to tune the parameters of an ontology aligning system, improving the efficiency by prescreening the less promising alignments to be aggregated and the less promising individuals to be evaluated in NSGA-II, respectively. In terms of alignment quality, execution time and main memory consumption, the experimental results show the efficiency of our approach in comparison with the approach based on NSGA-II alone: our approach is able to greatly reduce the execution time and main memory consumption of the tuning process while at the same time ensuring the quality of the alignment.
Table 7: Comparison of our approach with the state-of-the-art ontology matching systems (F-measure (R, P))

ID       RiMOM              Falcon-AO          SAMBO              SAMBOdtf           our approach
1XX      1.00 (1.00, 1.00)  1.00 (1.00, 1.00)  1.00 (1.00, 1.00)  1.00 (1.00, 1.00)  1.00 (1.00, 1.00)
2XX      0.92 (0.87, 0.97)  0.89 (0.89, 0.90)  0.70 (0.54, 0.98)  0.71 (0.56, 0.98)  0.96 (0.95, 0.98)
3XX      0.82 (0.82, 0.83)  0.87 (0.83, 0.93)  0.86 (0.80, 0.95)  0.85 (0.81, 0.91)  0.86 (0.83, 0.91)
Average  0.91 (0.89, 0.93)  0.92 (0.90, 0.94)  0.85 (0.78, 0.97)  0.85 (0.79, 0.96)  0.94 (0.92, 0.96)
As the number of available matchers increases, the problem of their selection will become more critical, e.g., when the task is to handle more than 50 matchers within one system. In continuation of our research, a study on large-scale matcher selection is now being carried out. We are also interested in improving the efficiency of solving the large-scale ontology aligning problem, which is another challenging problem in the ontology matching domain. A feasible method could be to first partition the ontology into various small segments and then carry out the aligning process between similar segments, so that not all data has to be kept in main memory and the execution time can also be improved.

9. Acknowledgement

This work is supported by the National Natural Science Foundation of China (Nos. 61272119 and 61503082), the Natural Science Foundation of Fujian Province (No. 2016J05145) and the China Scholarship Council.

References

[1] B. Huang, B. Buckley, T.-M. Kechadi, Multi-objective feature selection by using NSGA-II for customer churn prediction in telecommunications, Expert Systems with Applications 37 (5) (2010) 3638-3646.

[2] J. Euzenat, P. Valtchev, et al., Similarity-based ontology alignment in OWL-lite, in: ECAI, vol. 16, 333, 2004.

[3] P. Shvaiko, J. Euzenat, Ontology matching: state of the art and future challenges, IEEE Transactions on Knowledge and Data Engineering 25 (1) (2013) 158-176.

[4] J. Tang, Y. Liang, Z. Li, Multiple strategies detection in ontology mapping, in: Special Interest Tracks and Posters of the 14th International Conference on World Wide Web, ACM, 1040-1041, 2005.
[5] J. Euzenat, P. Shvaiko, et al., Ontology matching, vol. 18, Springer, 2007.

[6] K. Deb, S. Agrawal, A. Pratap, T. Meyarivan, A fast elitist non-dominated sorting genetic algorithm for multi-objective optimization: NSGA-II, in: International Conference on Parallel Problem Solving From Nature, Springer, 849-858, 2000.

[7] OAEI, Ontology Alignment Evaluation Initiative (OAEI), http://oaei.ontologymatching.org/2015.
[8] J. Tang, J. Li, B. Liang, X. Huang, Y. Li, K. Wang, Using Bayesian decision for ontology mapping, Web Semantics: Science, Services and Agents on the World Wide Web 4 (4) (2006) 243-262.

[9] N. Jian, W. Hu, G. Cheng, Y. Qu, Falcon-AO: Aligning ontologies with Falcon, in: Proceedings of the K-CAP Workshop on Integrating Ontologies, 85-91, 2005.

[10] P. Lambrix, H. Tan, Q. Liu, SAMBO and SAMBOdtf results for the ontology alignment evaluation initiative 2008, in: Proceedings of the 3rd International Conference on Ontology Matching, Volume 431, CEUR-WS.org, 190-198, 2008.

[11] N. F. Noy, M. A. Musen, et al., Algorithm and tool for automated ontology merging and alignment, in: Proceedings of the 17th National Conference on Artificial Intelligence (AAAI-00), available as SMI technical report SMI-2000-0831, 2000.

[12] Wikipedia, Ontology components, http://en.wikipedia.org/wiki/Ontology.

[13] S. Melnik, H. Garcia-Molina, E. Rahm, Similarity flooding: A versatile graph matching algorithm and its application to schema matching, in: Data Engineering, 2002. Proceedings. 18th International Conference on, IEEE, 117-128, 2002.

[14] S. Guha, R. Rastogi, K. Shim, ROCK: A robust clustering algorithm for categorical attributes, in: Data Engineering, 1999. Proceedings., 15th International Conference on, IEEE, 512-521, 1999.
[15] M. Emmerich, A. Giotis, M. Özdemir, T. Bäck, K. Giannakoglou, Metamodel-assisted evolution strategies, in: International Conference on Parallel Problem Solving from Nature, Springer, 361-370, 2002.

[16] A. Giotis, K. Giannakoglou, J. Périaux, A reduced-cost multi-objective optimization method based on the Pareto front technique, neural networks and PVM, in: Proceedings of the ECCOMAS, 2000.

[17] Y. Jin, M. Olhofer, B. Sendhoff, Managing approximate models in evolutionary aerodynamic design optimization, in: Evolutionary Computation, 2001. Proceedings of the 2001 Congress on, vol. 1, IEEE, 592-599, 2001.

[18] K. Giannakoglou, Design of optimal aerodynamic shapes using stochastic optimization methods and computational intelligence, Progress in Aerospace Sciences 38 (1) (2002) 43-76.

[19] K. C. Giannakoglou, A. P. Giotis, M. K. Karakasis, Low-cost genetic optimization based on inexact pre-evaluations and the sensitivity analysis of design parameters, Inverse Problems in Engineering 9 (4) (2001) 389-412.

[20] M. A. El-Beltagy, P. B. Nair, A. J. Keane, Metamodeling techniques for evolutionary optimization of computationally expensive problems: Promises and limitations, in: Proceedings of the 1st Annual Conference on Genetic and Evolutionary Computation, Volume 1, Morgan Kaufmann Publishers Inc., 196-203, 1999.

[21] A. Ratle, Accelerating the convergence of evolutionary algorithms by fitness landscape approximation, in: International Conference on Parallel Problem Solving from Nature, Springer, 87-96, 1998.

[22] H. Ulmer, F. Streichert, A. Zell, Evolution strategies assisted by Gaussian processes with improved preselection criterion, in: Evolutionary Computation, 2003. CEC'03. The 2003 Congress on, vol. 1, IEEE, 692-699, 2003.

[23] C. Bock, M. Gruninger, PSL: A semantic domain for flow models, Software & Systems Modeling 4 (2) (2005) 209-231.
[24] A. Maedche, S. Staab, Measuring similarity between ontologies, in: International Conference on Knowledge Engineering and Knowledge Management, Springer, 251-263, 2002.

[25] V. I. Levenshtein, Binary codes capable of correcting deletions, insertions and reversals, in: Soviet Physics Doklady, vol. 10, 707, 1966.

[26] G. A. Miller, WordNet: a lexical database for English, Communications of the ACM 38 (11) (1995) 39-41.

[27] S. Massmann, S. Raunich, D. Aumüller, P. Arnold, E. Rahm, Evolution of the COMA match system, in: Proceedings of the 6th International Conference on Ontology Matching, Volume 814, CEUR-WS.org, 49-60, 2011.

[28] C. van Rijsbergen, Information retrieval, Department of Computing Science, University of Glasgow.

[29] P. Xu, Y. Wang, B. Liu, A differentor-based adaptive ontology-matching approach, Journal of Information Science 38 (5) (2012) 459-475.

[30] J. Sacks, W. J. Welch, T. J. Mitchell, H. P. Wynn, Design and analysis of computer experiments, Statistical Science (1989) 409-423.

[31] D. Büche, N. N. Schraudolph, P. Koumoutsakos, Accelerating evolutionary algorithms with Gaussian process fitness function models, IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews) 35 (2) (2005) 183-194.

[32] J. Dennis, V. Torczon, Managing approximation models in optimization, Multidisciplinary Design Optimization: State-of-the-Art (1997) 330-347.

[33] M. Fleischer, The measure of Pareto optima: applications to multi-objective metaheuristics, in: International Conference on Evolutionary Multi-Criterion Optimization, Springer, 519-533, 2003.

[34] M. Emmerich, B. Naujoks, Metamodel assisted multiobjective optimisation strategies and their application in airfoil design, in: Adaptive Computing in Design and Manufacture VI, Springer, 249-260, 2004.
[35] K. Tu, M. Xiong, L. Zhang, H. Zhu, J. Zhang, Y. Yu, Towards imaging large-scale ontologies for quick understanding and analysis, in: International Semantic Web Conference, Springer, 702-715, 2005.

[36] F. Hamdi, B. Safar, C. Reynaud, H. Zargayouna, Alignment-based partitioning of large-scale ontologies, in: Advances in Knowledge Discovery and Management, Springer, 251-269, 2010.