Engineering Applications of Artificial Intelligence 36 (2014) 238–261
Ontology-based approach for measuring semantic similarity

Mohamed Ali Hadj Taieb, Mohamed Ben Aouicha, Abdelmajid Ben Hamadou
Multimedia Information System and Advanced Computing Laboratory, Sfax University, Sfax 3021, Tunisia
Article history: Received 3 March 2014; received in revised form 18 July 2014; accepted 19 July 2014.

Abstract
The challenge of measuring semantic similarity between words is to find a method that can simulate the human thinking process. The use of computers to quantify and compare semantic similarities has become an important area of research in various fields, including artificial intelligence, knowledge management, information retrieval and natural language processing. The development of efficient measures for computing concept similarity is fundamental for computational semantics. Several computational measures rely on knowledge resources, such as the WordNet "is a" taxonomy, to quantify semantic similarity. Several of these measures are based on taxonomical parameters chosen to express the semantics of content as faithfully as possible. This paper presents a new measure for quantifying the degree of semantic similarity between concepts and words based on the WordNet hierarchy, using a number of topological parameters related to the "is a" taxonomy. Our proposal combines, in a complementary way, the hyponyms and depth parameters. The measure also takes the problem of fine granularity into account: it is argued that WordNet sense distinctions are highly fine-grained, even for humans. We therefore propose a new method to quantify the hyponym subgraph of a given concept based on the depth distribution. Common-noun datasets (RG65, MC30 and AG203), a medical terms dataset (MED38) and a verbs dataset (YP130), each formed by word pairs, are used in the assessment. We first compute semantic similarities and then the correlation coefficient between human judgment and the computational measures. The results demonstrate that, compared to other currently available computational methods, the measure presented in this study yields better performance. Compared to several measures, it also shows good accuracy while covering all the pairs of the verbs dataset YP130. © 2014 Elsevier Ltd. All rights reserved.
Keywords: Semantic similarity; WordNet ontology; Taxonomic knowledge; Taxonomical parameters
1. Introduction

The measurement of semantic similarity is exploited in several research fields, including artificial intelligence, knowledge management, information retrieval and mining, and several biomedical applications. This notion is critical in determining the relatedness between a pair of concepts or words. Several efforts have been made to define Semantic Similarity (SS) measures using a lexical database such as WordNet. The estimation of semantic similarity between words/concepts, understood as their degree of taxonomical resemblance (Goldstone, 1991), has many direct and relevant applications. Some basic natural language processing tasks, such as word sense disambiguation (Patwardhan et al., 2003), synonym detection (Lin, 1998) or automatic spelling error detection and correction (Budanitsky and Hirst, 2001), rely on the assessment of words' semantic resemblance. Other direct applications can be found in the knowledge management field, including thesauri generation
(Curran, 2002), information extraction (Atkinson et al., 2009; Stevenson and Greenwood, 2005), semantic annotation (Sánchez et al., 2011), the biomedical domain (Sánchez and Batet, 2011), ontology merging (Formica, 2008; Gaeta et al., 2009) and ontology learning (Sánchez, 2010; Sánchez and Moreno, 2008), in which new concepts related to already existing ones should be discovered or acquired from texts. The most common way for people to compare two objects and acquire knowledge is through the similarity between those two objects. For humans, it is easy to say whether one word is more similar to a given word than another. For example, we can easily say that car is more similar to automobile than car is to journey. Semantic quantification relies on semantic resources by exploiting the knowledge they contain. Some of the most popular semantic similarity measures are implemented and evaluated using WordNet (http://wordnet.princeton.edu/) as the underlying reference ontology. Other works use different resources, such as the Wikipedia Category Graph (WCG) in Hadj Taieb et al. (2012, 2013a, 2013b) and Zesch (2010). In fact, there is a great difference between WordNet and the WCG.
WordNet is designed and maintained by experts and includes several types of semantic relations that are expressed explicitly. As for the WCG, the categories are proposed by volunteers, who often assign them without specifying the type of semantic relation involved. The WCG therefore needs a pre-treatment step before it can be used for this task. The proposed measures exploit the textual (gloss) and structural (taxonomic parameters) aspects of WordNet, and mainly the "is a" taxonomy. Textual measures are based on weighting the overlapping words within glosses. However, they are not well suited to WordNet because its glosses are too short to be used in such measures. Several measures, regarded as structural approaches, exploit the taxonomic parameters extracted from the "is a" taxonomy. Among these approaches we can cite:
- Edge-counting measures assess similarity based on the number of taxonomic links and the minimum path length between two concepts present in a given ontology (Wu and Palmer, 1994; Leacock and Chodorow, 1998; Li et al., 2003; Rada et al., 1989).
- Information content-based approaches quantify the similarity between concepts as a function of the information content (IC) that both concepts have in common in a given ontology. The basic idea is that general and abstract entities found in a discourse carry less IC than more concrete and specialized ones (Hadj Taieb et al., 2013a, 2013b; Zhou et al., 2008b; Meng et al., 2012; Tversky, 1977; Sánchez et al., 2011; Sebti and Barfroush, 2008; Seco et al., 2004).
- Feature-based measures estimate similarity according to the weighted sum of the numbers of common and non-common features (Sánchez et al., 2012). By features, authors usually consider taxonomic and non-taxonomic information modeled in the ontology, in addition to concept descriptions (e.g., glosses) retrieved from dictionaries (Tversky, 1977; Petrakis et al., 2006; Rodríguez and Egenhofer, 2003). However, these measures usually rely on non-taxonomic features that are rarely found in ontologies (Ding et al., 2004) and require the fine-tuning of weighting parameters to integrate heterogeneous semantic evidence (Petrakis et al., 2006).
- Gloss-based measures exploit the short definitions provided by WordNet in order to quantify the overlaps between the glosses of two concepts and those of their semantic neighbors (Banerjee and Pedersen, 2003; Lesk, 1986; Patwardhan and Pedersen, 2006).
- Hybrid measures combine measures conceived for different methods in order to merge their advantages (Zhou et al., 2008a).
In this paper, we propose a new structural measure based on taxonomical parameters extracted from the WordNet "is a" taxonomy. Our proposal mainly uses a new method for quantifying the subgraph formed by the hyponyms, together with the depth ratio between the two concepts concerned by the semantic similarity task and their lowest common subsumer. Furthermore, we treat the problem of the fine-grained representation of synsets inside WordNet, which can negatively affect the computed similarity degree. The rest of the paper is organized as follows. Section 2 provides an overview of the measures highlighted in structure-based approaches exploiting WordNet. Section 3 presents the different taxonomical parameters extracted from the WordNet "is a" taxonomy and used in previous works, in order to determine the most significant parameters for the semantic similarity task. Section 4 provides a detailed description of our proposed structural measure. Section 5 presents the datasets used for semantic similarity assessment and the metrics exploited to express their performance. Section 6 reports on the evaluation and comparison of our measure against currently available ones using known benchmarks. The final section is devoted
to presenting our conclusions and recommendations for future research.
2. Ontology-based semantic similarity measures: WordNet as case study

WordNet is a lexical database for the English language (Fellbaum, 1998). It was created and is being maintained at the Cognitive Science Laboratory of Princeton University under the direction of psychology professor George A. Miller. It groups words into sets of synonyms called synsets, provides short, general definitions, and records the various semantic relations between these synonym sets. WordNet is particularly well suited for similarity measures, since it organizes nouns and verbs into hierarchies of is-a relations. Fig. 1 illustrates a fragment of the WordNet 3.0 "is a" hierarchy. Several measures for determining semantic similarity between words and concepts have been proposed in the literature, and most of them have been tested on WordNet. Similarity measures are applied only to nouns and verbs in WordNet (taxonomic properties for adverbs and adjectives do not exist). The measures can be grouped into five classes: path-based measures, information content (IC) based measures, gloss-based measures, feature-based measures, and hybrid measures.

2.1. Path-only based measures

The main idea of path-based measures is that the similarity between two concepts is a function of the length of the path linking the concepts and of their positions in the taxonomy.

Rada et al. (1989) (Ra) state that ontologies can be seen as a directed graph in which concepts are interrelated mainly by means of taxonomic (is-a) and, in some cases, non-taxonomic links. A simple measure of similarity is the minimum path length linking the corresponding ontological nodes via is-a links. The longer the path, the more semantically distant the concepts are. Let us define path(c1, c2) = {l1, ..., lk} as a set of links connecting the concepts c1 and c2 in a taxonomy, and let |path(c1, c2)| = k be the length of this path. Then, considering all the possible paths from c1 to c2, their semantic distance as defined by Rada et al. (1989) is:

dis_Rad(c1, c2) = min_i |path_i(c1, c2)|    (1)
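For illustration, the following minimal sketch computes this distance over WordNet with NLTK's interface (an assumption of ours; the authors do not describe an implementation). The simulate_root flag links the hierarchies under a common virtual root so that a path always exists.

```python
# A minimal sketch of Rada's distance (Eq. (1)) using NLTK's WordNet
# interface; requires the 'wordnet' corpus (nltk.download('wordnet')).
from nltk.corpus import wordnet as wn

def rada_distance(s1, s2):
    # Minimum number of "is a" links separating the two synsets.
    return s1.shortest_path_distance(s2, simulate_root=True)

car = wn.synset('car.n.01')
journey = wn.synset('journey.n.01')
bicycle = wn.synset('bicycle.n.01')
print(rada_distance(car, bicycle))   # short path: semantically close
print(rada_distance(car, journey))   # longer path: semantically distant
```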
Hirst and St-Onge (1997) (HSO) define the similarity between concepts as a path distance between them. They categorize the semantic relations in WordNet into three types, namely extra-strong, strong and medium-strong relations:

sim_HS(c1, c2) = C - path_length - k * d    (2)

where d is the number of changes of direction in the path, and C and k are constant parameters (C = 8 and k = 1 are used by the authors); if no such path exists, sim_HS(c1, c2) is zero and the concepts are unrelated. The following path directions are considered: upward (such as hypernymy and meronymy), downward (such as hyponymy and holonymy) and horizontal (such as antonymy). Due to the non-taxonomic nature of some of the relations considered during the assessment, Hirst and St-Onge's measure captures a more general sense of relatedness than taxonomical similarity. Despite their simplicity, edge-counting approaches suffer from relying on the edges of the taxonomy to represent uniform distances. Sussna (1993) states that the distance represented by an edge should decrease with increasing depth. The author considers the depth of a node to be the longest path from the root of the taxonomy to the target concept.
Fig. 1. WordNet “is a” taxonomy fragment. The number between parentheses is the number of direct hyponyms of each concept including itself. The notation w#i#j refers to the synset number i of the word w among its j synsets.
2.2. Path and depth based measures

Wu and Palmer (1994) (WP) proposed a measure based on the semantic representation of verbs and analyzed its impact on lexical selection problems in machine translation. They define the semantic similarity between concepts c1 and c2 as follows:

Sim_WP(c1, c2) = 2H / (N1 + N2 + 2H)    (3)

where N1 and N2 refer to the number of "is a" links from c1 and c2, respectively, to their lowest common subsumer c, and H to the number of "is a" links from c to the root of the taxonomy.
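As a concrete illustration, the sketch below spells out Eq. (3) from the LCS depth and path lengths, next to NLTK's built-in wup_similarity; depth conventions differ slightly between implementations, so the two values may not coincide exactly.

```python
# A sketch of the Wu-Palmer measure (Eq. (3)); the helper below is ours,
# not the authors' implementation.
from nltk.corpus import wordnet as wn

def wu_palmer(s1, s2):
    lcs = s1.lowest_common_hypernyms(s2)[0]   # lowest common subsumer
    h = lcs.max_depth() + 1                   # "is a" links from the LCS to the root
    n1 = s1.shortest_path_distance(lcs)       # "is a" links from s1 to the LCS
    n2 = s2.shortest_path_distance(lcs)
    return 2.0 * h / (n1 + n2 + 2.0 * h)

car, bicycle = wn.synset('car.n.01'), wn.synset('bicycle.n.01')
print(wu_palmer(car, bicycle))
print(car.wup_similarity(bicycle))  # NLTK's own implementation, for comparison
```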
Sussna (1993) (Su) uses depth-relative scaling and several types of relations to define his semantic relatedness measure. For each relation r and its inverse r' (such as r: hypernymy and r': hyponymy), he defines a weight w(c1 ->r c2) from a given interval [min_r, max_r]. This weight is calculated from the local density, which corresponds to the number n_r(c1) of relations of type r leaving c1:

w(c1 ->r c2) = max_r - (max_r - min_r) / n_r(c1)    (4)

The point in the range for a relation r linking concepts c1 and c2 thus depends on the number n_r of edges of the same type leaving c1, which is denoted the type-specific fanout factor. The fanout factor reflects the dilution of the strength of the connotation between the source and target concepts, and takes into account the possible asymmetry between the two nodes, where the strength of connotation differs from one direction to the other. The distance between two adjacent nodes is then:

dist(c1, c2) = (w(c1 ->r c2) + w(c1 ->r' c2)) / (2 * max[depth(c1), depth(c2)])    (5)

For example, the hypernymy, hyponymy, holonymy and meronymy relations have weights between min_r = 1 and max_r = 2. As for synonymy and antonymy, he assigns the values 0 and 2.5, respectively. Finally, the semantic distance between two arbitrary nodes c1 and c2 is the sum of the distances between the pairs of adjacent nodes along the shortest path connecting them.
Leacock and Chodorow (1998) (LC) proposed measuring semantic similarity as the shortest path in the is-a hierarchy of WordNet concepts. The different noun hierarchies are combined into a single hierarchy so that a path exists between all nodes. In this measure, the similarity between two concepts is determined by the length of the shortest path that connects them in the WordNet taxonomy: the path length found is scaled to a value between 0 and 1, and similarity is then calculated as the negative logarithm of this value. The measure exploits only the "is-a" links and scales the path length by the depth D of the taxonomy:

sim_LC(c1, c2) = -log(length(c1, c2) / (2D))    (6)

Li et al. (2003) (Li) proposed a similarity measure that overcomes a weakness of the Rada edge-counting method. The Rada measure quantifies similarity based on the path length alone, which yielded good results for highly constrained medical semantic networks; when tested on more general semantic nets like WordNet, the results were not as good. Li et al. consider the shortest path length L between two concepts and the depth H of their lowest common subsumer to compute similarity. The similarity between concepts c1 and c2 is defined as the non-linear function:

sim_Li(c1, c2) = e^(-alpha*L) * (e^(beta*H) - e^(-beta*H)) / (e^(beta*H) + e^(-beta*H))    (7)

where alpha >= 0 and beta >= 0 are parameters scaling the contribution of the shortest path length and the depth, respectively. Based on an empirical study, the optimal parameters are alpha = 0.2 and beta = 0.6.

Al-Mubaid and Nguyen (2006) (AMN) proposed a cluster-based measure that combines the minimum path length and the taxonomical depth for measuring the distance between two given concepts. They define clusters for each of the branches in the hierarchy with respect to the root node. They measure the common specificity of two concepts by subtracting the depth of their lowest common subsumer (LCS) from the depth D of the taxonomy:

CSpec(c1, c2) = D - depth(LCS(c1, c2))    (8)

The common specificity expresses the fact that lower-level pairs of concept nodes are more similar than higher-level pairs, as in the Wu and Palmer approach. The proposed distance measure is then defined as follows:

dis_AN(c1, c2) = log((min_i |path_i(c1, c2)| - 1)^alpha * (CSpec)^beta + k)    (9)

where alpha > 0 and beta > 0 are the contribution factors of the path length and common specificity features and k is a constant (k = 1). Moreover, they performed their experiments giving the same weight to the two components (path length and common specificity) by using alpha = beta = 1.

Hao et al. (2011) (Ha) used the semantic distance between two concepts (the shortest path length |path(c1, c2)|) and the depth of their LCS in the lexical hierarchical tree based on WordNet to calculate the similarity of words. They proposed the following formula, where alpha and beta are smoothing factors:

Sim_Hao(c1, c2) = (1 - |path(c1, c2)| / (|path(c1, c2)| + Depth(LCS(c1, c2)) + beta)) * (Depth(LCS(c1, c2)) / (|path(c1, c2)| + Depth(LCS(c1, c2))/2 + alpha))    (10)

In the above formula, when Depth(LCS(c1, c2)) = 0, the two words have the least common attributes and the similarity is 0. The interval of alpha is [0, 1] and the increasing step is 0.1. The interval of beta is (Hliaoutakis, 2005; Devitt and Vogel, 2004) and the increasing step is 1. The experiments demonstrate that the correlation reaches its maximum value when alpha = 0 and beta = 1.0.

Liu et al. (2007) (Liu) presented a different measure to estimate semantic similarity between concepts in WordNet using edge-counting techniques. The fundamental idea of this measure is that the human judgment process for semantic similarity can be simulated by the ratio of common features to the total features of the words. To this end, they proposed the two following formulas:

S_Liu1(c1, c2) = alpha * d / (alpha * d + beta * l)    (11)

S_Liu2(c1, c2) = (e^(alpha*d) - 1) / (e^(alpha*d) + e^(beta*l) - 2)    (12)

where l is the shortest path length between c1 and c2, d is the depth of the subsumer of c1 and c2 in the hierarchical semantic net, and alpha and beta are smoothing factors (0 < alpha, beta <= 1). The experiments showed that the best correlations are reached for S_Liu1 with (alpha = 0.5 and beta = 0.55) and for S_Liu2 with (alpha = 0.25 and beta = 0.25). The correlations are, respectively, r = 0.91 and r = 0.92 for the MC30 dataset. However, these results hide the fact that the measure returns the value 0 for 8 of the 30 pairs, i.e., a coverage of 73%.
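A compact sketch of Eq. (7) with the reported optimal parameters follows; note that (e^(beta*H) - e^(-beta*H)) / (e^(beta*H) + e^(-beta*H)) is simply tanh(beta*H), and the path and depth values come from NLTK rather than the authors' code.

```python
# A sketch of the Li et al. non-linear measure (Eq. (7)) with the optimal
# parameters alpha = 0.2 and beta = 0.6 reported above.
import math
from nltk.corpus import wordnet as wn

def li_similarity(s1, s2, alpha=0.2, beta=0.6):
    L = s1.shortest_path_distance(s2)                   # shortest path length
    H = s1.lowest_common_hypernyms(s2)[0].max_depth()   # depth of the LCS
    return math.exp(-alpha * L) * math.tanh(beta * H)

print(li_similarity(wn.synset('car.n.01'), wn.synset('bicycle.n.01')))
```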
2.3. Information content-based measures

The IC-based similarity measure was first introduced by Resnik (1998). The concept has since been modified and extended by several authors to include other methods, although all of them rely on IC values assigned to the concepts in the ontology. IC-based measures are based on pairs (IC computing method, IC measure). Concerning the IC computing methods, they follow two strategies: statistical corpora analysis, and exploiting only the topological parameters of the "is a" taxonomy, known as intrinsic computing methods. A survey of IC measures is presented in the following subsections.
2.3.1. Similarity measures exploiting the IC

Several proposals have been discussed in the literature to express as faithfully as possible the semantic similarity between two concepts pertaining to an ontological structure such as the WordNet "is a" taxonomy.

Resnik. The measure of Resnik (1995) (Re) was the first to merge ontology and corpus. Guided by the intuition that the similarity between a pair of concepts may be judged by "the amount of shared information", Resnik defined the similarity between two concepts as the IC of their lowest common subsumer, LCS(c1, c2):

Sim_Res(c1, c2) = IC(LCS(c1, c2))    (13)

Jiang-Conrath. The notion of IC is also used in the approach of Jiang and Conrath (1997) (JC). This approach subtracts the IC of the LCS from the sum of the ICs of the individual concepts. It is worth noting that this is a dissimilarity measure, because the more different the terms are, the higher the difference between their ICs and the IC of their LCS will be:

Dis_JC(c1, c2) = (IC(c1) + IC(c2)) - 2 * IC(LCS(c1, c2))    (14)

Lin. The similarity measure described by Lin (1998) (Lin) uses the same elements as Dis_JC, but in a different way:

Sim_Lin(c1, c2) = 2 * IC(LCS(c1, c2)) / (IC(c1) + IC(c2))    (15)
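NLTK ships pre-computed Brown-corpus IC counts, so Eqs. (13)-(15) can be tried directly; note that NLTK's jcn_similarity returns the inverse of the Jiang-Conrath distance rather than the raw dissimilarity of Eq. (14).

```python
# Resnik, Jiang-Conrath and Lin (Eqs. (13)-(15)) over Brown-corpus IC;
# requires nltk.download('wordnet') and nltk.download('wordnet_ic').
from nltk.corpus import wordnet as wn
from nltk.corpus import wordnet_ic

brown_ic = wordnet_ic.ic('ic-brown.dat')
s1, s2 = wn.synset('car.n.01'), wn.synset('bicycle.n.01')

print(s1.res_similarity(s2, brown_ic))   # Eq. (13): IC(LCS(c1, c2))
print(s1.jcn_similarity(s2, brown_ic))   # 1 / Dis_JC, cf. Eq. (14)
print(s1.lin_similarity(s2, brown_ic))   # Eq. (15)
```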
Pirró. Pirró (2009) (Pi) proposes a similarity measure that is conceptually similar to the previous ones but is based on the feature-based theory of similarity posed by Tversky (1977). His
conceptualization is based on previous studies (Pirró and Seco, 2008; Seco, 2004) where the concept was first worked out. According to Tversky, the similarity of a concept c1 to a concept c2 is a function of the features common to c1 and c2, those in c1 but not in c2, and those in c2 but not in c1. He shows that the semantic similarity between concepts can be computed as follows:

Sim_tvr(c1, c2) = 3 * IC(LCS(c1, c2)) - IC(c1) - IC(c2)    (16)

Finally, he defines his own measure as follows:

Sim_P&S(c1, c2) = Sim_tvr(c1, c2) if c1 != c2; 1 if c1 = c2    (17)

Meng. This measure (Meng et al., 2012) (Me) is based on Lin's measure. From Eq. (15), we can see that it increases monotonically with Sim_Lin. It is expressed by the following equation:

Sim_Meng(c1, c2) = e^(Sim_Lin(c1, c2)) - 1    (18)

In these measures, the IC is computed in two principal ways, namely corpora-based computing using external resources and intrinsic computing based on the structure of the knowledge resource, which are explained below.

2.3.2. Corpora-based IC calculus methods

The earliest approach to calculating IC requires a corpus of text documents related to the domain of the ontology to determine the IC of a concept. Resnik (1995) was the first to consider the use of this idea, inspired from the work of Shannon (1948), for the purpose of semantic similarity judgments. His work estimates the frequencies of concepts in the taxonomy using noun frequencies from the Brown Corpus of American English (Francis and Kučera, 1982). The IC value is then calculated by the negative log likelihood equation:

IC(c) = -log(p(c))    (19)

where c refers to a concept and p to the probability of encountering c in a given corpus. The basic idea behind the use of the negative likelihood is that the more probable a concept's appearance is, the less information it conveys; more succinctly, specific words are more informative than general ones. Each noun that occurs in the corpus is counted as an occurrence of each taxonomic class that contains it:

Freq(c) = Σ_{w ∈ Word(c)} Count(w)    (20)

where Word(c) refers to the set of words subsumed by the concept c and Count(w) to the frequency of the word w in the corpus. Then, p(c) is computed as follows:

p(c) = Freq(c) / N    (21)

where N refers to the total number of observed nouns, except those which are not subsumed by any class in the ontology. This approach has several inadequacies, particularly those pertaining to the determination of a suitable corpus, the effort required to calculate probabilities, the need for a correct disambiguation of each noun, and the necessity of updating the probabilities whenever the corpus changes. For these reasons, several works have proposed using only the taxonomic structure, without recourse to external corpora. These approaches are called intrinsic information content methods.

2.3.3. Intrinsic IC computation methods

Several studies have indicated that knowledge resources can be exploited directly, with no need for external ones. The following paragraphs present these IC measures in the chronological order of their appearance in the literature.

Seco et al. (2004) (Seco) present a comprehensive intrinsic IC measurement connected only to the hierarchical structure of an ontology. The IC of a concept c depends on the concepts it subsumes:

IC(c) = 1 - log(|hypo(c)| + 1) / log(max_nodes)    (22)

where hypo(c) refers to a function that returns the hyponyms of a given concept, and max_nodes to a constant that represents the maximum number of concepts in the knowledge resource (in WordNet 3.0, max_nodes = 82115 for the noun POS (Part Of Speech)).

Sebti and Barfroush (2008) (Sebti) use the hierarchical structure of the resource and implicitly include the depth of a target concept. This method is based on the number of direct hyponyms of each concept along the path from the root to the target concept. In Fig. 1, the numbers on the left represent the number of direct hyponyms of each concept. For a better understanding of this method, Eq. (23) computes the IC of the concept vehicle#1#4:

IC(vehicle#1#4) = -log(1/4 * 1/7 * 1/38 * 1/8 * 1/46 * 1/15 * 1/15) = 18.2939    (23)

Zhou et al. (2008b) (Zhou1) present a new approach to overcome a limitation of the Seco method, which considers only the hyponyms of a given concept: concepts with the same number of hyponyms but different degrees of generality are considered equally similar. Zhou et al. therefore proposed to enhance hyponym-based IC computation with the relative depth of the concept in the taxonomy, integrated in a formula with a tuning factor:

IC(c) = k * (1 - log(|hypo(c)| + 1) / log(max_nodes)) + (1 - k) * (log(depth(c)) / log(max_depth))    (24)

In addition to hypo(c) and max_nodes, which have the same meaning as in Eq. (22), depth(c) refers to the depth of the concept c in the taxonomy and max_depth to the maximum depth of the taxonomy. The parameter k is a tuning factor that adjusts the weight of the two features in the IC formula. The authors performed experiments using WordNet with k set to 0.5.

Sánchez et al. (2011) (Sanch) followed another strategy and did not include the depth notion. They used the hyponyms through the leaves of the hyponym tree of a concept and integrated a novel parameter, subsumers(c). In fact, they consider that the leaves are enough to describe and differentiate a concept from any other. Formally, they define the leaves and subsumers of a concept c as: Leaves(c) = {l ∈ C | l ∈ hyponyms(c) and l is a leaf}, where C is the set of concepts of the taxonomy, and Subsumers(c) = {a ∈ C | c <= a} ∪ {c}, where c <= a means that c is a hierarchical specialization of a. Following a similar principle to related works, they consider that concepts with many leaves in their hyponym tree are general (i.e., they have low IC) because they subsume the meaning of many important terms. The IC formula is as follows:

IC(c) = -log(((|leaves(c)| / |subsumers(c)|) + 1) / (max_leaves + 1))    (25)
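Seco's intrinsic IC is straightforward to reproduce; the sketch below counts the transitive hyponym closure with NLTK and uses the max_nodes constant quoted above (the function names are ours).

```python
# A sketch of Seco's intrinsic IC (Eq. (22)) over the WordNet noun taxonomy.
import math
from nltk.corpus import wordnet as wn

MAX_NODES = 82115  # noun synsets in WordNet 3.0, as quoted above

def n_hyponyms(s):
    # Direct and indirect hyponyms, excluding the concept itself.
    return sum(1 for _ in s.closure(lambda x: x.hyponyms()))

def ic_seco(s):
    return 1.0 - math.log(n_hyponyms(s) + 1) / math.log(MAX_NODES)

print(ic_seco(wn.synset('entity.n.01')))  # root: IC close to 0
print(ic_seco(wn.synset('car.n.01')))     # specific concept: IC close to 1
```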
Meng et al. (2012) (Meng) present a formula that merges the principles used by Seco and Zhou. They also replace the term |hypo(c)| with another term that better expresses the contribution of the hyponyms to the IC of a concept, integrating the depth notion into the Seco formula. The method is expressed as follows:

IC(c) = (1 - log(Σ_{a ∈ hypo(c)} (1/depth(a)) + 1) / log(max_nodes)) * (log(depth(c)) / log(max_depth))    (26)

For a given concept c, depth(c) refers to the depth of the concept c in the taxonomy, max_depth to the maximum depth of the taxonomy, and max_nodes to the maximum number of concepts in the ontology.

Hadj Taieb et al. (2013a, 2013b) (Hadj) proposed a multistrategy approach for measuring semantic relatedness that exploits the noun and verb "is a" taxonomies and a weighting mechanism for overlapping words in glosses. A novel IC computing method is used in each strategy. This method quantifies the subgraph formed by the ancestors of a target concept c, as described in Fig. 2, by calculating the contribution of each ancestor pertaining to the subgraph modeling the IC. The IC of a given concept Con is computed as follows:

IC(Con) = (Σ_{c ∈ Hyper(Con)} Score(c)) * AverageDepth(Con)    (27)

where Score(c) refers to the contribution of each ancestor (hypernym) pertaining to the set Hyper(Con). This score is computed using the taxonomical parameters hyponym count (direct and indirect descendants) and depth (the longest path between the concept c and the root of the noun "is a" taxonomy, entity#1#1) as follows:

Score(c) = Σ_{c' ∈ DirectHyper(c)} (Depth(c') * |Hypo(c)| / |Hypo(c')|)    (28)

where c and c' are concepts (represented by synsets in WordNet), DirectHyper(c) is the set containing the direct parents of the concept c, and Hypo(c) is the set of direct and indirect descendants including the concept itself. The term AverageDepth(Con) gives information about the vertical distribution of the subsumers subgraph; the set Hyper(Con) contains the ancestors of the concept Con, including itself:

AverageDepth(Con) = (1 / |Hyper(Con)|) * Σ_{c ∈ Hyper(Con)} Depth(c)    (29)
2.4. Feature-based measures

Unlike the measures presented above, feature-based measures attempt to exploit the properties of the ontology to obtain similarity values. They are based on the assumption that each concept is described by a set of words indicating its properties or features, such as its "glosses" in WordNet. The more common characteristics two concepts have, and the fewer non-common characteristics, the more similar they are.

Tversky (1977) (Tver) proposes a measure which argues that similarity is not symmetric. Features between a subclass and its superclass have a larger contribution to the similarity evaluation than those in the inverse direction. For example, for the words Fruit and Apple, the similarity is greater from the first concept to the second than in the inverse sense. The measure is defined as follows:

Sim_Tversky(c1, c2) = |ψ(c1) ∩ ψ(c2)| / (|ψ(c1) ∩ ψ(c2)| + k * |ψ(c1) \ ψ(c2)| + (1 - k) * |ψ(c2) \ ψ(c1)|)    (30)

where ψ(c1) and ψ(c2) correspond to the description sets of concepts c1 and c2, respectively, and k is adjustable with k ∈ [0, 1]. From formula (30) it can be noted that the values of Sim_Tversky(c1, c2) vary from 0 to 1, increase with the commonality |ψ(c1) ∩ ψ(c2)|, and decrease with the differences between the two concepts, |ψ(c1) \ ψ(c2)| and |ψ(c2) \ ψ(c1)|. The definition of the feature set is important in this measure. It includes the information available in ontologies, in particular the sets of synonyms (called synsets in WordNet), definitions (i.e., glosses, containing textual descriptions of word senses) and different kinds of semantic relationships ("is a", "part of", etc.). In our implementation of this measure, we consider as features of a given concept c the set of synsets formed by its ancestors in the "is a" taxonomy, the meronyms, holonyms and attributes of each ancestor, and the hyponyms. The experiments are performed with k = 0.5.
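To make the feature-set idea concrete, the hedged sketch below instantiates Eq. (30) with ψ(c) approximated by the synset plus its taxonomical ancestors only, a simplification of the richer feature set (meronyms, holonyms, attributes) described above; k = 0.5 as in the reported experiments.

```python
# A simplified Tversky-style feature overlap (Eq. (30)); psi(c) is reduced
# to the synset plus its "is a" ancestors, which is only part of the
# feature set used in the paper's implementation.
from nltk.corpus import wordnet as wn

def psi(s):
    return {s} | set(s.closure(lambda x: x.hypernyms()))

def tversky(s1, s2, k=0.5):
    f1, f2 = psi(s1), psi(s2)
    common = len(f1 & f2)
    return common / (common + k * len(f1 - f2) + (1 - k) * len(f2 - f1))

print(tversky(wn.synset('car.n.01'), wn.synset('bicycle.n.01')))
```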
Rodríguez and Egenhofer (2003) (RE) computed the similarity as the weighted sum of similarities between the synsets, features (e.g., meronyms, attributes, etc.) and neighbor concepts (those linked via semantic pointers) of the evaluated concepts:

Sim_RE(c1, c2) = α * S_synsets(c1, c2) + β * S_features(c1, c2) + γ * S_neighbors(c1, c2)    (31)

where α, β and γ weight the contribution of each component, depending on the characteristics of the ontology, and S represents the overlap between the different features, computed as follows:

S(c1, c2) = |A ∩ B| / (|A ∩ B| + δ(c1, c2) * |A \ B| + (1 - δ(c1, c2)) * |B \ A|)    (32)

where A and B are the sets of terms evaluated for concepts c1 and c2, A \ B is the set of terms in A but not in B, and B \ A the set of terms in B but not in A. Finally, δ(c1, c2) is computed as a function of the depths of c1 and c2 in the taxonomy as follows:

δ(c1, c2) = depth(c1) / (depth(c1) + depth(c2)) if depth(c1) <= depth(c2); 1 - depth(c1) / (depth(c1) + depth(c2)) if depth(c1) > depth(c2)    (33)

In our implementation of this measure, we consider S_synsets(c) to be the set of words that compose the synset of c. For S_features(c), we include all synsets related to the ancestors of c in the "is a" taxonomy through the relations "part of", "attribute", "see also" and "similar to". As for S_neighbors(c), we take into consideration the neighborhood of the concept c, formed by the synsets linked to it via "is a" and "part of" relations up to depth 4. Experiments are performed with α = β = γ = 1.

Petrakis et al. (2006) (Pe) proposed a feature-based function called X-similarity, based on the overlap between synsets and the concepts' glosses extracted from WordNet (i.e., words extracted by parsing term definitions). They consider that two concepts are similar if their synsets and glosses, and those of the concepts in their neighborhood (using Semantic Relations (SR)), are lexically similar. The function is expressed as follows:

Sim_X-Similarity(c1, c2) = 1 if S_synsets(c1, c2) > 0; max{S_neighbors(c1, c2), S_glosses(c1, c2)} if S_synsets(c1, c2) = 0    (34)
Fig. 2. An excerpt of WordNet noun “is a” taxonomy which represents the IC of the concept wheeled vehicle#1#1 modeled by the ancestors subgraph connected by solid lines.
The similarity for the semantic neighborhoods S_neighbors is calculated as follows:

S_neighbors(c1, c2) = max_{i ∈ SR} |A_i ∩ B_i| / |A_i ∪ B_i|    (35)

where each semantic relation type (i.e., is-a and part-of in WordNet) is computed separately and the maximum (considering all the synsets of all concepts up to the root of each hierarchy) is taken. Equivalently, the similarities for the glosses S_glosses and the synonyms S_synsets are computed as follows:

S(c1, c2) = |A ∩ B| / |A ∪ B|    (36)

where A and B denote the sets of synsets or glosses of the concepts c1 and c2. In our implementation of this measure, S_synsets(c) is the set of words included in the synset c, S_glosses(c) is the set of stems that compose the glosses of the ancestors of c, and S_neighbors(c) includes the synsets linked via "is a" and "part of" relations up to depth 5.

Feature-based measures exploit more semantic knowledge than edge-counting approaches, evaluating both the commonalities and the differences of the compared concepts. However, by relying on
features like glosses or synsets (in addition to taxonomic and non-taxonomic relationships), those measures limit their applicability to ontologies in which this information is available. Another problem is their dependency on weighting parameters that balance the contribution of each feature. In all cases, those parameters should be tuned according to the nature of the ontology and even to the evaluated concepts, which hampers their applicability as a general-purpose solution. Only the definition of Petrakis et al. (2006) does not depend on weighting parameters, as the maximum similarity provided by each feature alone is taken. Pirró (2009) proposes a measure based on the Tversky approach, in which the common and distinct features are defined in information-theoretic terms; this redefined Tversky formulation of similarity is given by Eq. (16).

2.5. Gloss-based measures

This kind of measure is used to compute semantic relatedness, not only semantic similarity. It exploits the hypothesis that words are similar if their contexts are similar (Harris, 1954).

Lesk (1986) (Lesk) proposes an idea that exploits this principle: he supposes that related word senses are (often) defined using the same words. Therefore, he uses the
WordNet glosses of synsets, which are considered accurate and short definitions. For example, the word bank has two synsets:

bank#1#2: "a financial institution"
bank#2#2: "sloping land beside a body of water"

and the word lake has one synset:

lake#1#1: "a body of water surrounded by land"

The semantic relatedness is quantified as the gloss overlap, which is equal to the number of content words common to the two glosses. Thus, relatedness(bank#2#2, lake#1#1) = 3 and relatedness(bank#1#2, lake#1#1) = 0. This measure has the inconvenience that most glosses are short, which reduces the number of overlaps that can be found; a solution was proposed by Banerjee and Pedersen (2003) to extend the Lesk measure.

Banerjee and Pedersen (2003) (BP) proposed a measure named extended gloss overlap (EGO), derived from the Lesk measure (Lesk, 1986). This measure is based on the number of shared words (overlaps) in the definitions (glosses), and extends the glosses of the concerned concepts to include the glosses of other concepts to which they are related according to a given concept hierarchy. Whereas Lesk simply summed up the overlapping words, Banerjee and Pedersen (2003) assign to a group of n overlapping words the score n^2. For example, suppose that the expression "court of law" is shared by the glosses of the target concepts; the score is then equal to 3^2 = 9. This measure exploits the glosses of the neighbors of the target synsets, related by relations such as hypernymy ("car"-"vehicle"), hyponymy ("car"-"convertible"), meronymy ("car"-"accelerator"), holonymy ("car"-"train"), also-see ("enter"-"move in"), attribute ("measure"-"standard") and pertainym ("centennial"-"century").

Some other gloss-based measures construct co-occurrence vectors that represent the contextual profile of concepts (context vectors). To build the context vectors, they extract contextual words (using a fixed window of context) from the glosses assigned to concepts. These vectors capture a more general sense of concept likeness, not necessarily reduced to taxonomical similarity but extended to inter-concept relatedness. The semantic relatedness of two concepts c1 and c2 is computed as the cosine of the angle between their context vectors.

The measure of Patwardhan and Pedersen (2006) (PP) creates vectors from term glosses extracted from WordNet, calling them gloss vectors (GV). Glosses are brief notes about the meaning of a particular word sense. The gloss vector measure was able to obtain good correlation with human judgments in several domain-independent benchmarks. The gloss vector (pairwise) measure (GVP) is very similar to the "regular" gloss vector measure, except in the way it augments the glosses of concepts with adjacent glosses. The regular gloss vector measure first combines the adjacent glosses to form one large "super-gloss" and creates a single vector for each of the two concepts from the two "super-glosses". The pairwise gloss vector measure, on the other hand, forms separate vectors corresponding to each of the adjacent glosses (it does not form a single super-gloss); for example, separate vectors are created for the hyponyms, the holonyms, the meronyms, etc. of the two concepts. The measure then takes the sum of the individual cosines of the corresponding gloss vectors, i.e., the cosine of the angle between the hyponym vectors is added to the cosine of the angle between the holonym vectors, and so on.

Pesaranghader et al. (2013) (Ah) attempt to improve the gloss vector semantic relatedness measure for a more accurate
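The bank/lake example can be replayed with a naive gloss-overlap function over NLTK definitions; this toy version counts raw shared tokens, so without stop-word filtering and stemming its counts will not match the illustration above exactly.

```python
# A naive Lesk-style gloss overlap on WordNet definitions.
from nltk.corpus import wordnet as wn

def gloss_overlap(s1, s2):
    g1 = set(s1.definition().lower().split())
    g2 = set(s2.definition().lower().split())
    return len(g1 & g2)  # number of shared gloss tokens

lake = wn.synset('lake.n.01')
for bank in wn.synsets('bank', pos=wn.NOUN)[:2]:
    print(bank.name(), '<->', lake.name(), '=', gloss_overlap(bank, lake))
```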
estimation of relatedness between two input concepts. Their work criticizes the low/high frequency cut-off phase applied to the first-order co-occurrence matrix: this cutting step causes the loss of significant information that can be useful in computing semantic relatedness. They therefore propose a measure based on Pointwise Mutual Information (PMI) to overcome the foregoing problems. Their experiments were performed on resources from the biomedical domain. Moreover, Pesaranghader and Muthaiyah (2013) propose a semantic similarity measure for ontology alignment. Their measure is based on concept definitions, such as the glosses in WordNet, which are used to compute an information content vector for each concept from the probability vector extracted from its definitions. They then apply the cosine measure to estimate the similarity degree between two given concepts. Experiments are performed in the biomedical domain.

2.6. Hybrid measures

Hybrid measures combine the structural characteristics described above (such as path length, depth and local density) with some of the previously presented approaches. Zhou et al. (2008a) (Zhou2) proposed a measure that takes information content measures and path-based measures as parameters, with a tuning factor k that controls the contribution of each component (k = 0.5 in their experiments). Their measure is expressed by the following equation:

Sim_zhou(c1, c2) = 1 - k * (log(len(c1, c2) + 1) / log(2 * (deep_max - 1))) - (1 - k) * ((IC(c1) + IC(c2) - 2 * IC(LCS(c1, c2))) / 2)    (37)
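A sketch of this hybrid combination follows, plugging NLTK's Brown-corpus IC into the IC part; deep_max = 20 is our assumption for the WordNet 3.0 noun taxonomy, and since corpus IC is not normalized to [0, 1] as the formula expects, the output is only indicative.

```python
# A sketch of the Zhou et al. hybrid measure (Eq. (37)) with k = 0.5;
# DEEP_MAX and the use of raw (un-normalized) Brown-corpus IC are assumptions.
import math
from nltk.corpus import wordnet as wn
from nltk.corpus import wordnet_ic
from nltk.corpus.reader.wordnet import information_content

brown_ic = wordnet_ic.ic('ic-brown.dat')
DEEP_MAX = 20  # assumed maximum depth of the noun taxonomy

def zhou_hybrid(s1, s2, k=0.5):
    length = s1.shortest_path_distance(s2)
    lcs = s1.lowest_common_hypernyms(s2)[0]
    path_part = math.log(length + 1) / math.log(2 * (DEEP_MAX - 1))
    ic_part = (information_content(s1, brown_ic) + information_content(s2, brown_ic)
               - 2 * information_content(lcs, brown_ic)) / 2
    return 1 - k * path_part - (1 - k) * ic_part

print(zhou_hybrid(wn.synset('car.n.01'), wn.synset('bicycle.n.01')))
```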
The different methods cited above, pertaining to a variety of approaches, exploit the topological parameters extracted from the WordNet "is a" taxonomy. The authors of these works assume that the taxonomical structure is organized in a meaningful way, and try to express the semantic content of concepts (synsets in WordNet) so as to simulate the human intellectual process and ensure an expressive semantic similarity comparison between concepts or words. Following the same direction but with a different vision, we analyze the different topological parameters composing the "is a" taxonomy to deduce the most semantically expressive ones. These parameters are combined to conceive a new taxonomy-based approach, detailed in the next section.

Table 1 summarizes the related works according to the different components of WordNet. We also specify the nature of each measure: semantic similarity (SS) or semantic relatedness (SR). In general, the "is a" taxonomy is exploited for SS computing because it models the sharing of characteristics, whereas the use of the whole semantic network, including several kinds of relations ("is a", "part of", "synonym", "see also", etc.) and the glosses, characterizes SR measures. Although the main focus of this paper is the semantic similarity task, we cite measures designed for the semantic relatedness purpose because they exploit other WordNet components; they are included to study the contribution of WordNet features other than the "is a" taxonomy to semantic computing.

Table 2 shows the values of the different parameters needed for computing the semantic similarity between the two concepts "car#1#5" and "room#1#4" (see Fig. 1). Those parameters are then exploited to quantify similarity using the various measures cited in the table. The table shows that the estimated values pertain to a variety of scales. Therefore, for a specific application based on an SS or SR semantic measure, the interpretation of the similarity estimation provided for a word pair differs from one method to another.
Table 1
Exploitation of WordNet components into semantic similarity measures. For each measure, the table marks which WordNet components are used: topological parameters of the "is a" taxonomy (path, descendants, depth, ancestors, leaves, LCS), word overlaps and co-occurrence over glosses, features from the semantic network, IC computing method, IC measure, corpus, and the nature of the measure (SS or SR).

Ra: Rada et al. (1989); HSO: Hirst and St-Onge (1997); WP: Wu and Palmer (1994); Su: Sussna (1993); LC: Leacock and Chodorow (1998); Li: Li et al. (2003); AMN: Al-Mubaid and Nguyen (2006); Ha: Hao et al. (2011); Liu: Liu et al. (2007); Re: Resnik (1995); JC: Jiang and Conrath (1997); Lin: Lin (1998); Pi: Pirró (2009); Me: Meng et al. (2012); Seco: Seco et al. (2004); Sebti: Sebti and Barfroush (2008); Zhou1: Zhou et al. (2008b); Sanch: Sánchez et al. (2011); Meng: Meng et al. (2012); Hadj: Hadj Taieb et al. (2013a, 2013b); Tver: Tversky (1977); RE: Rodríguez and Egenhofer (2003); Pe: Petrakis et al. (2006); Lesk: Lesk (1986); BP: Banerjee and Pedersen (2003); PP: Patwardhan and Pedersen (2006); Ah: Pesaranghader et al. (2013); Zhou2: Zhou et al. (2008a).
Table 2
Example of applying semantic similarity measures on the concepts "car#1#5" and "room#1#4".

Concepts:
c1: car#1#5 – car, auto, automobile, machine, motorcar – (a motor vehicle with four wheels; usually propelled by an internal combustion engine; "he needs a car to get to work")
c2: room#1#4 – room – (an area within a building enclosed by walls and floor and ceiling; "the rooms were very small but they had a nice view")

Parameters of c1 and c2:
Depth(c1) = 11; Hypo(c1) = 41; Leaf(c1) = 32; HypoMeng(c1) = 3.10; OurHypo(c1) = 1124.67; Hyper(c1) = 13; LCS(c1, c2) = artifact#1#1
Depth(c2) = 7; Hypo(c2) = 196; Leaf(c2) = 149; HypoMeng(c2) = 20.17; OurHypo(c2) = 1242.36; Hyper(c2) = 8; path_length(c1, c2) = 10

IC values:
IC_Corpus(c1) = 7.0035; IC_Sebti(c1) = 31.2592; IC_Hadj(c1) = 39.1576; IC_Meng(c1) = 0.7278; IC_Nuno(c1) = 0.6718; IC_Sanchez(c1) = 9.8397; IC_Zhou(c1) = 0.8523
IC_Corpus(c2) = 5.8073; IC_Sebti(c2) = 19.7800; IC_Hadj(c2) = 12.1294; IC_Meng(c2) = 0.5068; IC_Nuno(c2) = 0.5335; IC_Sanchez(c2) = 8.1047; IC_Zhou(c2) = 0.7121

Parameters of LCS(c1, c2) = artifact#1#1:
Depth(lcs) = 4; Hypo(lcs) = 10699; OurHypo(lcs) = 93273.4415; Leaf(lcs) = 8119; HypoMeng(lcs) = 1106.3531; Hyper(lcs) = 5
IC_Corpus(lcs) = 2.4933; IC_Sebti(lcs) = 8.4472; IC_Hadj(lcs) = 6.1141; IC_Meng(lcs) = 0.2044; IC_Nuno(lcs) = 0.1801; IC_Sanchez(lcs) = 3.6883; IC_Zhou(lcs) = 0.4589

SemSim(c1, c2):
Ra = 10; HSO = 5; WP = 0.4444; LC = 1.3350; Li = 0.1331; AMN = 5.1119; Tver = 0.0710; Pe (X-Similarity) = 0.2954
Ha = 0.1111; Liu_1 = 0.2666; Liu_2 = 0.1331; BP = 79; PP (GV) = 0.4562; Zhou2 = 0.4541; RE = 0.3486

IC measures (rows) under the different IC computing methods (columns):

IC measure   Corpus     Seco       Sebti      Zhou       Hadj       Sanchez    Meng
Res          2.4934     0.1801     8.4471     0.4588     6.1140     3.6883     0.2044
Lin          0.3893     0.2988     0.3310     0.5866     0.2384     0.4110     0.3311
JC           0.1278     0.8451     34.1449    0.6467     39.0588    10.5677    0.8258
Pirro        -5.324     -1.745     -25.697    -0.187     -32.944    -6.879     -0.621
Meng         0.4759     0.3482     0.3923     0.7978     0.2692     0.5084     0.3926
3. Study of the WordNet "is a" taxonomical parameters

In WordNet, each word is expressed by a set of synsets (considered as concepts) representing the possible meanings of the word. The "is-a" taxonomy is mainly used to measure the degree of similarity between concepts or words. In order to compute the semantic similarity degree, authors exploit taxonomical parameters to express the likeness between two concepts. Table 3 contains statistics concerning WordNet 3.0, where each index word is represented by a fixed number of synsets (concepts) referring to its different meanings. The measure proposed in this paper uses the nominal and verbal "is a" taxonomies of WordNet, which are the most significant for semantic computing. In this section, we study the taxonomical parameters of the WordNet "is a" taxonomy exploited to design hierarchy-based measures (see Table 1). According to the related works presented in the previous section, the parameters used are mainly: descendants (hyponyms), depth, leaves and ancestors (hypernyms). The purpose of this study is the semantic interpretation of these parameters, the detection of dependencies between them, and the determination of their probability distributions.
Table 3
Statistics about WordNet 3.0.

Index words                               155287
  Noun index words                        117798
  Verb index words                        11529
  Adjective index words                   21479
  Adverb index words                      4481
Synsets                                   117659
  Noun synsets                            82115
  Verb synsets                            13767
  Adjective synsets                       18156
  Adverb synsets                          3621
Average synset number per index word      1.2420584
Average ancestors                         9.05122085
Average depth                             8.54842599
Number of leaf synsets                    64958 (79.1061%)
Fig. 3 shows the number of nouns (a) and verbs (b) represented by n synsets (y-axis). The shapes of the two parts (a) and (b) are very similar, despite the fact that the maximum number of synsets is 33 for a noun and 59 for a verb.
Fig. 3. The distribution of synsets over the nouns (a) and verbs (b) in logarithmic scale.
3.1. The taxonomical significance of depth

The depth of a concept is a parameter used in several SS computing measures (see Table 1). A concept at the top of the taxonomy is a general concept, while a concept at the bottom represents a specific concept that is semantically richer. The depth is significant in determining the specificity of a concept, because moving down the taxonomy from one level to another propagates data towards the descendants with the addition of certain specificities. However, the transition from concept c1 to concept c2 through an "is a" relation does not necessarily mean passing from depth i to depth i+1: two directly connected concepts do not necessarily have successive depths in the "is a" taxonomy (for example, in Fig. 1, the synsets instrumentality#3#3 and container#1#1 are directly connected, but the depth of the first synset is 5 while the depth of the second is 7). Going down from the root towards any target concept thus entails data enrichment: a subordinate concept inherits the basic features of the superordinate concept and adds its own specific features. Fig. 4 shows that the depth distribution is Gaussian for the noun "is a" taxonomy (4a), with parameters sigma^2 = 5.4143 and mu = 8.5484, and also for the verbal one (4b), with sigma^2 = 2.5403 and mu = 2.5292. The majority of nominal synsets have a depth between 6 and 11, while for verbal synsets (4b) the depth is mainly between 0 and 4. This illustrates the marked difference between the nominal and verbal "is a" taxonomies.
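The depth histogram behind Fig. 4a can be recomputed in a few lines; max_depth() + 1 is used so that depth counts nodes rather than edges, which is an assumption about the paper's convention.

```python
# Recomputing the depth distribution of the WordNet noun taxonomy (cf. Fig. 4a).
from collections import Counter
from nltk.corpus import wordnet as wn

depths = Counter(s.max_depth() + 1 for s in wn.all_synsets('n'))
total = sum(depths.values())
for d in sorted(depths):
    # depth, synset count, empirical probability P(depth)
    print(d, depths[d], round(depths[d] / total, 4))
```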
3.2. The hyponyms: significance and quantification methods

The set of hyponyms subsumed by a concept c, noted Hypo(c), also contains the concept c itself. This parameter is used, like the depth, to determine the generality/specificity feature. The hyponym distribution presented in Fig. 5 shows that the majority of synsets are specific concepts. Indeed, a concept that subsumes a great number of hyponyms (direct and indirect descendants) is a general concept, and conversely a concept with few hyponyms is more specific. This coincides with the notion of information content (IC): a general concept subsuming an important set of hyponyms is considered more probable, and is therefore represented by a low information content. This is not the case for leaf concepts, which have a high information content. This parameter was used by Seco et al. (2004) as an alternative for IC computing that is independent of any corpus processing task, computing the probability of each concept pertaining to the "is a" taxonomy.

In the literature, there are two methods, cited in the previous section, for quantifying the hyponym subgraph (HypoValue) formed by the direct and indirect descendants of a given concept in the "is a" taxonomy. The first, proposed by Seco et al. (2004), takes the cardinality of the hyponym set (Eq. (38)); the second, proposed by Meng et al. (2012), exploits the depth in order to take into consideration the specificity of each concept pertaining to the set of hyponyms (Eq. (39)):

HypoValue(c) = |hypo(c)|    (38)

HypoValue(c) = Σ_{c' ∈ Hypo(c)} 1 / Depth(c')    (39)

The depth and hyponym subgraph parameters can be used in a complementary way. This is explained by the fact that two leaves would have the same information content if we considered only the hyponym count; it is therefore important to invoke the depth parameter to obtain a more significant semantic quantification (for example, the hyponym count can be multiplied by the depth).
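The two quantifications can be compared directly; in the sketch below, depth is taken as max_depth() + 1 so that the root has depth 1 (the paper counts nodes on the longest path, and a zero depth would break Eq. (39)).

```python
# Eq. (38) (Seco: cardinality) versus Eq. (39) (Meng: depth-weighted)
# hyponym-subgraph quantification, sketched with NLTK.
from nltk.corpus import wordnet as wn

def depth(s):
    return s.max_depth() + 1  # node-counting convention, root depth = 1

def hypo_value_seco(s):       # Eq. (38)
    return sum(1 for _ in s.closure(lambda x: x.hyponyms()))

def hypo_value_meng(s):       # Eq. (39)
    return sum(1.0 / depth(c) for c in s.closure(lambda x: x.hyponyms()))

car = wn.synset('car.n.01')
print(hypo_value_seco(car), hypo_value_meng(car))
```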
Fig. 4. The depth distribution over the nominal (a) and verbal (b) "is a" taxonomy of WordNet 3.0.
3.3. The meaning of taxonomical ancestors

Ontologies modeling multiple inheritance may incorporate several direct subsumers per concept. In WordNet 3.0, 2.28% of the taxonomy nodes have multiple inheritance (Devitt and Vogel, 2004). However, as they are normally distributed through the depth of the WordNet taxonomy, the effect of multiple inheritance is restricted to specializations at deeper levels (Devitt and Vogel, 2004). Ontology-based related works (Jiang and Conrath, 1997; Leacock and Chodorow, 1998; Li et al., 2003; Lin, 1998; Rada et al., 1989; Pirró, 2009; Wu and Palmer, 1994) consider, in the case of multiple inheritance, only the subsumer that defines the maximum of shared characteristics between the target concept pair. However, a concept that inherits from several subsumers is more specific than one inheriting from a unique subsumer. For this reason, in order to better differentiate concepts, the definition of taxonomical subsumers recursively considers the whole set of generalizations of a concept, exploring all the taxonomical branches to which it belongs. With this strategy, it represents a broader and more realistic notion of a concept's concreteness than other works based solely on the taxonomical depth (Zhou et al., 2008b). Hadj Taieb et al. take multiple inheritance into consideration in their information content computing method (Hadj Taieb et al., 2013a, 2013b), assigning to each ancestor a score according to its depth in the "is a" taxonomy. The number of subsumers can stand in for the depth parameter, because multiple inheritance is not frequent within the WordNet "is a" taxonomy. In fact, the parameters of the normal distribution in Fig. 6 are very close to the parameters already cited for the depth distributions of the nouns (6a) (sigma^2 = 8.2442 and mu = 9.0512) and verbs (6b) (sigma^2 = 2.5954 and mu = 3.5393).
3.4. Taxonomical interpretation of leaves

The exploitation of leaf concepts relies on the fact that some inner nodes of the hyponym subgraph of a concept present multiple taxonomical inheritance, which may cause several paths to exist from that concept to a leaf. Following a similar principle to related works, we consider that concepts with many leaves in their hyponym subgraph are general, as they subsume the meaning of many salient concepts. Leaves, on the other hand, represent equally specialized concepts, as they are completely differentiated from any other concept in the taxonomy. The number of leaves can replace the number of hyponyms, because leaves account for 79% of all the nodes in the WordNet "is a" taxonomy.
Fig. 5. Hyponyms distribution over the nominal (a) and verbal (b) WordNet “is a” taxonomy.
3.5. Discussion

The previous paragraphs go some way towards outlining the topology of WordNet. We have looked at the distributions of depth and hyponyms, and at the notion of multiple inheritance and its significance within the noun and verb "is a" taxonomies. The parameter distributions of these two taxonomies are similar, with the slight difference that the verb "is a" taxonomy is shallower and has multiple roots (560 in WordNet 3.0). Concepts at higher levels of a hierarchy tend to represent more general and abstract meanings, while those lower down tend to represent more specific meanings; the same path length at different levels therefore represents different distances. To overcome this issue, the notion of depth has been incorporated to account for specificity. The motivation is that two nodes are semantically closer if they reside deeper in the hierarchy, so the measure of specificity should reflect this monotonic behavior. Given a node c in a taxonomy T, the depth of c, denoted depth(c), is the number of nodes along the longest path between c and the root, and the depth of the taxonomy is the depth of its deepest node. In contrast, the number of hyponyms or leaf nodes refers to the generality of a concept: when the number of hyponyms decreases, the concept becomes less general. Therefore, in our semantic similarity measure, we choose to exploit the hyponym count and the depth to compute the semantic similarity between two concepts or two words as faithfully as possible.
4. The proposed taxonomical semantic similarity measure
The estimation of semantic similarity between two concepts is based on the quantification of their common properties, which are well modeled in the "is a" taxonomy representing the inheritance paradigm between concepts. In this hierarchical structure, a subordinate concept inherits the basic features of its superordinate concept and adds its own specific features to form its meaning. Thus, a concept is an accumulation of the
propagated information from its distant ancestors down to its nearer ones, with each descendant adding its own specificities. Therefore, the common semantic content of two concepts is represented by their lowest common subsumer (LCS); a concept depends strongly on its direct parents and ancestors. For example, with reference to Fig. 7, LCS(communicator#1#1, golf club#2#2) = whole#2#2, which reflects a low similarity degree, whereas LCS(actor#1#2, artist#1#2) = person#1#3, which reflects a high similarity degree. In our proposal, the specificity degree of a concept is quantified as the product of the two most significant parameters deduced from the previous study: the hyponyms subgraph (TermHypo) and the depth (TermDepth).

Fig. 6. The ancestors number distribution over the nominal (a) and verbal (b) WordNet 3.0 "is a" taxonomy.

4.1. The hyponyms subgraph quantification
The goal of hyponyms subgraph quantification is to measure the specificity of a concept pertaining to the WordNet "is a" taxonomy. In contrast to the Seco (Eq. (38)) and Meng (Eq. (39)) methods
for quantifying the hyponyms (descendants) subgraph of a concept c, our proposed method HypoValue(c) is based on the depth probability distribution over the WordNet "is a" taxonomy:

$$\mathrm{HypoValue}(c) = \sum_{c' \in \mathrm{Hypo}(c)} P(\mathrm{depth}(c')) \qquad (40)$$
where Hypo(c) is the set of hyponyms of the concept c and depth(c) is the length of the longest path between the concept c and the root. The depth probability P(depth(c)) is computed as follows:

$$P(\mathrm{depth}(c)) = \frac{\left|\{c' \in C \mid \mathrm{depth}(c') = \mathrm{depth}(c)\}\right|}{N} \qquad (41)$$
where C is the set of concepts pertaining to the WordNet "is a" taxonomy and N is the cardinality of C. The specificity of a general concept is low because it subsumes a great number of concepts; it is therefore quantified using HypoValue(c)
as follows:

$$Spec_{Hypo}(c) = 1 - \frac{\log(\mathrm{HypoValue}(c))}{\log(max)} \qquad (42)$$

where

$$max = \mathrm{HypoValue}(root) \qquad (43)$$

Fig. 7. An excerpt of the "is a" taxonomy including synsets of the nouns "wood" and "forest".

Then, the semantic similarity degree based on the hyponyms parameter, TermHypo(c1, c2), is calculated as a Dice coefficient including the LCS, which represents the information shared by the target concepts:

$$\mathrm{TermHypo}(c_1, c_2) = \frac{2\, Spec_{Hypo}(\mathrm{LCS}(c_1, c_2))}{Spec_{Hypo}(c_1) + Spec_{Hypo}(c_2)} \qquad (44)$$

where SpecHypo(c) is based mainly on the quantification method HypoValue(c) (Eq. (40)) applied to the hyponyms subgraph of the concept c.

4.2. The depth parameter
As we have already stated, our proposal is based on two parameters, the hyponyms and the depth. For the second term, TermDepth, the Dice coefficient is applied to the depth parameter:

$$\mathrm{TermDepth}(c_1, c_2) = \frac{2\, \mathrm{Depth}(\mathrm{LCS}(c_1, c_2))}{\mathrm{Depth}(c_1) + \mathrm{Depth}(c_2)} \qquad (45)$$

Table 4
Examples of words having a great number of synsets.

Word    POS    Number of synsets
stock   Noun   17
life    Noun   14
line    Noun   30
take    Verb   42
make    Verb   49
carry   Verb   40
break   Verb   59
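To make Eqs. (40)–(45) concrete, here is a minimal sketch of the four quantities using NLTK's WordNet interface, restricted to the nominal taxonomy rooted at entity.n.01. This is our illustrative reading rather than the authors' JWNL implementation: NLTK's max_depth() counts edges on the longest hypernym path (a proxy for the paper's node count), and leaf concepts, whose Hypo(c) is empty, are floored at the smallest depth probability so that the logarithm in Eq. (42) stays defined.

```python
import math
from collections import Counter
from functools import lru_cache
from nltk.corpus import wordnet as wn

NOUNS = list(wn.all_synsets('n'))
N = len(NOUNS)
# Depth probability P(depth) over the nominal taxonomy (Eq. (41)).
P = {d: c / N for d, c in Counter(s.max_depth() for s in NOUNS).items()}
P_MIN = min(P.values())
ROOT = wn.synset('entity.n.01')

@lru_cache(maxsize=None)
def hypo_value(s):
    # Eq. (40): sum of depth probabilities over all direct and indirect hyponyms.
    hypos = set(s.closure(lambda x: x.hyponyms()))
    return sum(P[h.max_depth()] for h in hypos) or P_MIN  # leaf floor (assumption)

MAX = hypo_value(ROOT)  # Eq. (43)

def spec_hypo(s):
    # Eq. (42): specificity derived from the hyponyms subgraph.
    return 1 - math.log(hypo_value(s)) / math.log(MAX)

def lcs(c1, c2):
    # Lowest common subsumer of two noun synsets.
    return c1.lowest_common_hypernyms(c2)[0]

def term_hypo(c1, c2):
    # Eq. (44): Dice coefficient on SpecHypo through the LCS.
    return 2 * spec_hypo(lcs(c1, c2)) / (spec_hypo(c1) + spec_hypo(c2))

def term_depth(c1, c2):
    # Eq. (45): Dice coefficient on depth through the LCS.
    return 2 * lcs(c1, c2).max_depth() / (c1.max_depth() + c2.max_depth())
```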
Polysemous words in WordNet are represented by several synsets detailing their different meanings. Some words are covered by a great number of synsets, which is known as the fine-granularity problem (see Table 4). As a consequence, high similarity values can be produced for a pair of non-similar words because of a rarely used meaning. The depth term is therefore corrected by a factor that attempts to resolve the fine granularity of word senses in WordNet. Table 4 shows examples, across different parts of speech (POS), of words represented by a great number of synsets. A large number of synsets assigned to a word can negatively affect the estimation of semantic similarity between two words: the measure may provide a high degree for two words that are not semantically related. We therefore propose an adjustment factor that exploits the number of synsets of each word. This factor Λ is expressed as follows:

$$\Lambda(w_1, w_2) = \frac{\max(|\mathrm{Syn}(w_1)|, |\mathrm{Syn}(w_2)|)}{\alpha} \qquad (46)$$
where α is equal to the maximum number of synsets assigned to a word in the "is a" taxonomy; for example, α = 33 for the WordNet nominal taxonomy and α = 59 for the verbal one. Then, the semantic similarity between two words w1 and w2 is the maximum value obtained over the different synsets of the target words:

$$\mathrm{SemSim}(w_1, w_2) = \begin{cases} \displaystyle\max_{(c_1, c_2) \in \mathrm{Syn}(w_1) \times \mathrm{Syn}(w_2)} \mathrm{SemSim}(c_1, c_2) & \text{if } w_1 \neq w_2 \\ 1 & \text{otherwise} \end{cases} \qquad (47)$$

where Syn(w1) and Syn(w2) are the sets of concepts (synsets) pertaining to the ontological hierarchy that represent the words w1 and w2, respectively. The semantic similarity between two concepts c1 and c2 is computed as follows:

$$\mathrm{SemSim}(c_1, c_2) = \left|\mathrm{TermDepth}(c_1, c_2) - \Lambda(w_1, w_2)\right| \cdot \mathrm{TermHypo}(c_1, c_2) \qquad (48)$$
The proposed method thus combines two terms, the first based on the hyponyms and the second on the depth parameter. As an example, Fig. 8 illustrates the "is a" taxonomy fragment containing all synsets of the verb pair "recognize" and "welcome". If we apply the equation without the adjustment factor (Λ), which takes the number of senses of each word into account, we find SemSim("welcome", "recognize") = 0.7203. The use of Λ brings a remarkable improvement: Λ("recognize", "welcome") = max(3, 9)/59 = 0.1525, so the similarity degree becomes 0.6098, which is closer to the human judgment (0.50).
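Continuing the sketch above, the word-level measure of Eqs. (46)–(48) can be written as follows. The default α = 33 corresponds to the nominal taxonomy (α = 59 for the verbal one); note that synset counts in NLTK's WordNet may differ slightly from those used by the authors:

```python
def sem_sim(w1, w2, pos='n', alpha=33):
    # Eq. (47): 1 for identical words, otherwise the maximum over all synset pairs.
    if w1 == w2:
        return 1.0
    syn1, syn2 = wn.synsets(w1, pos), wn.synsets(w2, pos)
    lam = max(len(syn1), len(syn2)) / alpha  # adjustment factor, Eq. (46)
    # Eq. (48) applied to every candidate concept pair.
    return max(abs(term_depth(c1, c2) - lam) * term_hypo(c1, c2)
               for c1 in syn1 for c2 in syn2)

print(sem_sim('forest', 'wood'))  # the paper reports 0.76 on WordNet 3.0
```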
Fig. 8. An excerpt of “is a” taxonomy including synsets of verbs “recognize” and “welcome”.
Fig. 7 shows the "is a" fragment including all senses of the noun pair "forest" and "wood". When the factor Λ is ignored, SemSim("forest", "wood") = 1. Including the adjustment factor, Λ(forest, wood) = max(2, 8)/33 = 0.24, the similarity degree becomes 0.76, which is closer to the human judgment of 0.773.
5. Semantic similarity measures evaluation
The experiments exploit WordNet 3.0 and the JWNL (Java WordNet Library) package as an interface for querying the WordNet base.

5.1. Datasets
Table 5 lists the benchmarks, formed by human judgments, used to assess different measures for semantic similarity (SS) purposes. In this study, we experimentally evaluated machine-generated measurements of semantic similarity between words and compared them against human ratings obtained in the same settings. For the SS task, Rubenstein and Goodenough (1965) (RG65) obtained "synonymy judgments" from 51 human subjects on 65 pairs of words. The pairs ranged from "highly synonymous" to "semantically unrelated", and the participants were asked to rate them on a scale of 0.0–4.0 according to their similarity of meaning. Miller and Charles (1991) (MC30) extracted 30 pairs from the original 65 and obtained similarity judgments from 38 participants. For the same purpose, Agirre et al. (2009) created a semantic similarity dataset based on Fin353 (AG203), containing 203 pairs of terms from Fin353, each re-scored for similarity rather than relatedness; its word pairs pertain to different grammatical categories, namely nouns (N), verbs (V) and adjectives (A) (see Table 5). Yang and Powers (2006) created a dataset that contains 130 verb pairs² (YP130); its evaluation studies the performance of the proposed measure with the verbal "is a" taxonomy (in WordNet 3.0, 11,529 verbs grouped into 13,767 synsets with 560 roots). We also used a dataset formed by 38 medical term pairs (MED38: Appendix B) extracted from two different biomedical datasets (Pedersen et al., 2007; Hliaoutakis, 2005) and existing within WordNet. The first one (Pedersen et al., 2007) was created in collaboration with Mayo Clinic experts and is a set of word pairs referring to general medical disorders. The similarity of each concept pair was assessed by a group of 9 medical coders who were familiar with the notion of semantic similarity.

2 http://code.google.com/p/dkpro-similarity-asl/source/browse/trunk/de.tudarmstadt.ukp.similarity.experiments.wordpairs-asl/src/main/resources/datasets/wordpairs/en/yangPowers130.gold.pos.txt
Table 5
Datasets used in the evaluation of the SS task.

Dataset   Year        # Pairs   POS        Scores
RG65      1965        65        N          [0,4]
MC30      1991        30        N          [0,4]
AG203     2009        203       N, V, A    [0,10]
YP130     2006        130       V          [0,4]
MED38     2005/2007   38        N          [0,4]
After a normalization process, a final set of 30 word pairs, with averaged similarity scores provided by the experts on a scale from 1 to 4, was obtained. The second biomedical benchmark, proposed by Hliaoutakis (2005), is composed of a set of 36 word pairs extracted from the MeSH repository. The similarity between word pairs was assessed by 8 medical experts on a scale from 0 (non-similar) to 1 (synonyms). These datasets are often tested with biomedical ontologies (SNOMED CT and MeSH) because some medical terms do not exist in WordNet.
5.2. Evaluation metrics
Semantic similarity measures can be evaluated using correlation coefficients that correlate the scores computed by a measure with the judgments provided by humans on the different datasets. The Pearson product-moment correlation coefficient r can be employed as an evaluation metric: it indicates how well the results of a measure resemble human judgments, where a value of 0 means no correlation and 1 means perfect correlation. Pearson's r is calculated as follows:

$$r = \frac{n\sum x_i y_i - \left(\sum x_i\right)\left(\sum y_i\right)}{\sqrt{n\sum x_i^2 - \left(\sum x_i\right)^2}\ \sqrt{n\sum y_i^2 - \left(\sum y_i\right)^2}} \qquad (49)$$

where x_i refers to the ith element in the list of human judgments, y_i to the corresponding ith element in the list of SS values computed by a measure, and n is the number of word pairs. The standard method that statisticians use to measure the "significance" of their empirical analyses is the p-value (Neter et al., 1996). The p-value is a number between 0 and 1 representing the probability that the data would have arisen if the null hypothesis were true; the null hypothesis in this case is the statement "judgments and computed values are unrelated". Our results are significant given the very low p-values (p < 0.001). The p-value corresponding to each Pearson correlation coefficient is computed using a web interface³.

3 http://www.wessa.net/rwasp_correlation.wasp
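For completeness, Eq. (49) can also be computed directly instead of through the web interface; a minimal sketch:

```python
import math

def pearson(x, y):
    # Eq. (49): product-moment correlation between human ratings x and the
    # similarity scores y computed by a measure over n word pairs.
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    sxx, syy = sum(a * a for a in x), sum(b * b for b in y)
    return (n * sxy - sx * sy) / (
        math.sqrt(n * sxx - sx ** 2) * math.sqrt(n * syy - sy ** 2))
```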
Table 6
Assessment of different methods for hyponyms quantification using the Pearson coefficient (r).

Computing HypoValue   RG65   MC30   AG203   MED38   YP130
Seco method           0.83   0.79   0.63    0.61    0.60
Meng method           0.83   0.80   0.65    0.61    0.61
Our method            0.84   0.81   0.66    0.60    0.61
6. Experiments and interpretations
6.1. Comparing the hyponyms subgraph quantification
In this paper, we propose a new method for quantifying the subgraph of hyponyms (the direct and indirect descendants of a concept within the "is a" taxonomy). In order to compare the different methods, namely Seco et al. (2004) (Eq. (38)), Meng et al. (2012) (Eq. (39)) and our method (Eq. (40)), we measure the semantic similarity using only the hyponyms parameter (hypoValue), applying Eq. (50) to the human judgments of the previously cited datasets. Eq. (50) is inspired by the work of Seco et al., who proposed an intrinsic information content computing method based only on the hyponyms parameter (Eq. (22)). This equation exploits the lowest common subsumer of the concept pair and hypoValue(root), e.g. hypoValue(Entity#1#1) in the WordNet "is a" noun taxonomy:

$$\mathrm{Sim}(c_1, c_2) = 1 - \frac{\log(\mathrm{hypoValue}(\mathrm{LCS}(c_1, c_2)))}{\log(\mathrm{hypoValue}(root))} \qquad (50)$$
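Under the same assumptions as the sketch in Section 4, Eq. (50) reduces to a few lines:

```python
def sim_hypo_only(c1, c2):
    # Eq. (50): similarity from the hyponyms parameter alone, normalized by
    # the root's hypoValue (e.g. Entity#1#1 for the nominal taxonomy).
    l = c1.lowest_common_hypernyms(c2)[0]
    return 1 - math.log(hypo_value(l)) / math.log(hypo_value(ROOT))
```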
Table 6 shows that our quantification method performs better than the other methods on all datasets except MED38. Moreover, the good correlation values confirm the importance of the hyponyms parameter in the estimation of the semantic similarity degree between two concepts or two words.
6.2. Datasets' experimental results
In order to compare all the IC-based approaches (IC computing method combined with IC-based measure), we implemented and performed the experiments whose results are given in Appendix A. In general, the IC-based approaches with the majority of IC computing methods give good correlations (r ≥ 0.80), mainly for the datasets RG65 and MC30. The negative values obtained with the JC (Jiang and Conrath) measure are related to the dissimilarity nature of this measure. The Meng IC computing method outperforms all other methods with the different IC-based measures (for example, r = 0.87 with the Res measure for the dataset RG65). However, on the largest dataset, AG203, the experiments show decreased performance (r = 0.68 with the Meng IC computing method) relative to the correlations obtained for the datasets RG65 and MC30. Moreover, the method proposed by Seco et al. fails with the Pirro measure, although that same measure provides good results, competitive with other known IC measures (Lin, Res and JC), when paired with the other IC computing methods. The excellent correlations found for YP130 using IC-based measures (for example, r = 0.80 with the Hadj Taieb et al. method and the Lin formula) hide a limited coverage capacity: only 82 verb pairs among 130 (63%) are treated. This is due to the great number of verbal taxonomies (560 roots), which are not well developed and have minor depth. Concerning the dataset MED38, the results reflect the nature of WordNet, which is conceived for general-domain purposes; indeed, the best value is r = 0.65. Table 7 shows the results obtained with other taxonomical approaches in comparison with those obtained with our proposed approach. The length-based approaches provide dissimilarity values, which explains their negative correlations. In comparison with the results
presented in Appendix A, we remark that the IC-based approaches express the semantic similarity better than those in Table 7. The correlations for the corpus-based IC methods (Resnik, Lin, Jiang and Conrath (JC)) are computed using an external corpus. The WordNet::Similarity⁴ authors have pre-computed information content values from the British National Corpus (World Edition), the Penn Treebank (version 2), the Brown Corpus, the complete works of Shakespeare and SemCor (with and without sense tags)⁵. The values found for MED38 are very low due to the general nature of these resources. It is arguable that IC values obtained from very general corpora may differ from those obtained with more specialized ones. However, if we were calculating similarity between concepts of very specialized ontologies, such as MeSH, such corpora would presumably not contain many of the terms included in the ontology, which can affect the IC values and the corresponding similarity assessments. For these reasons, methods based only on the taxonomic structure, without recourse to external corpora, are very important, because specific resources for such specialized concepts are rare. Our measure outperforms all other measures on the largest dataset, AG203, with a correlation coefficient r = 0.76 (Appendix E). For the datasets RG65 and MC30, we obtain competitive results, respectively r = 0.88 (Appendix C) and r = 0.85 (Appendix D). We also found a good correlation, r = 0.70, exceeding all other results for the dataset MED38 (Appendix F), despite the general nature of the WordNet resource; our taxonomic measure thus obtains a gain of 5% over the highest correlation found with the pair (Seco et al. IC computing method, Meng IC-based measure: see Appendix A). Concerning the additional column for the dataset YP130 in Table 7, it indicates the number of verb pairs treated among the whole set of 130. The coverage percentage is indicated only for the dataset YP130 because the verbs in WordNet are divided into 560 independent components representing the verbal "is a" taxonomy; these components are not as deep as the noun "is a" taxonomy, and the LCS does not exist for a great number of verb pairs. The coverage is therefore considered only for the verbal dataset, which is not the case for the nominal datasets (MC30, RG65, AG203 and MED38). Moreover, our proposal provides the highest correlation for YP130, r = 0.66 (Appendix G), with a coverage of 100%, a 37% coverage gain over the highest correlation (r = 0.80) found with a coverage of only 63% for the pair (Hadj Taieb et al. IC computing method, Lin measure: see Appendix A). In order to show the impact of the adjustment factor Λ, we performed the experiments using our measure (Eq. (47)) without Λ. The results support its positive impact, with improvements reaching 6% for the datasets AG203 and YP130. Table 8 shows the p-values for the datasets used in the experiments with our measure, corresponding to the correlation coefficients in Table 7. As indicated in the table, all p-values are below 0.001, which indicates that the results obtained are significant.
4 marimba.d.umn.edu/cgi-bin/similarity/similarity.cgi
5 http://wn-similarity.sourceforge.net/
Table 7
Comparison between our approach and other methods on the SS datasets using the Pearson correlation coefficient (r). For YP130, the additional column gives the number of verb pairs covered among the 130.

Approach                                   RG65   MC30   AG203   MED38   YP130   Covered   Ref
Path-only
  Rada                                     0.75   0.66   0.53    0.31    0.05    82        (Rada et al., 1989)
  Hirst-St-Onge                            0.72   0.68   0.56    0.63    0.71    77        (Hirst and St-Onge, 1997)
Path-depth
  WP                                       0.79   0.75   0.57    0.52    0.79    57        (Wu and Palmer, 1994)
  LC                                       0.79   0.75   0.53    0.48    0.46    68        (Leacock and Chodorow, 1998)
  Li                                       0.86   0.81   0.64    0.64    0.77    57        (Li et al., 2003)
  Al-Mubaid and Nguyen                     0.84   0.80   0.67    0.64    0.78    90        (Al-Mubaid and Nguyen, 2006)
  Liu_1                                    0.84   0.77   0.64    0.55    0.80    73        (Liu et al., 2007)
  Liu_2                                    0.84   0.80   0.64    0.59    0.79    73        (Liu et al., 2007)
  Hao                                      0.84   0.82   0.67    0.69    0.70    73        (Hao et al., 2011)
IC (corpus)
  Resnik                                   0.72   0.72   0.64    0.08    0.60    95        (Resnik, 1995)
  Lin                                      0.72   0.70   0.57    0.32    0.65    83        (Lin, 1998)
  JC                                       0.75   0.73   0.46    0.29    0.61    100       (Jiang and Conrath, 1997)
IC (intrinsic computing methods): see Appendix A
Feature
  Tversky                                  0.68   0.63   0.49    0.49    0.70    96        (Tversky, 1977)
  Rodriguez and Egenhofer                  0.78   0.71   0.59    0.51    0.72    93        (Rodríguez and Egenhofer, 2003)
  Petrakis et al.                          0.78   0.68   0.58    0.57    0.60    126       (Petrakis et al., 2006)
Gloss
  EGO                                      0.32   0.34   0.28    0.56    0.46    128       (Banerjee and Pedersen, 2003)
  GV                                       0.82   0.88   0.63    0.69    0.63    120       (Patwardhan and Pedersen, 2006)
  GVP                                      0.67   0.74   0.54    0.58    0.61    120       (Patwardhan and Pedersen, 2006)
Hybrid
  Zhou                                     0.87   0.86   0.62    0.67    0.60    90        (Zhou et al., 2008)
  Our approach (without adjustment factor) 0.86   0.84   0.70    0.68    0.60    130
  Our approach (with adjustment factor)    0.88   0.85   0.76    0.70    0.66    130
Table 8
P-values computed using our measure over the different datasets.

Dataset    RG65       MC30       AG203      MED38      YP130
p-value    6.29E-14   3.84E-07   1.81E-26   2.65E-06   6.29E-14
7. Conclusion
The results obtained by our taxonomical-based approach outperform those attained by its homologs, which suggests that the initial assumption about the taxonomic structure of WordNet is correct. This approach combines the most significant taxonomical parameters, the depth and the hyponym subgraph of a concept, both referring to the notion of specificity. The study performed through the semantic interpretation of taxonomical parameters showed that the hyponyms and the depth are the most semantically expressive parameters within the WordNet "is a" taxonomy. We therefore proposed a new method for hyponym quantification based on the depth probability distribution. Moreover, we treated the fine-granularity problem with a correction
factor based on the number of synsets of the words underlying the concepts involved, c1 and c2. The results demonstrated that our taxonomical measure outperforms path-based, IC-based, hybrid and feature-based approaches. In fact, we reach a good correlation, r = 0.88, for the dataset RG65. For AG203, the largest dataset addressing the semantic similarity task, we reach the best correlation, r = 0.76. Our ontological measure also shows very good correlations on two specific datasets: the verbal benchmark YP130 (r = 0.66, with 100% coverage) and the dataset composed of medical terms, MED38 (r = 0.70). Our ontology-based similarity measure can be studied with other domain-specific ontologies that contain "is a" taxonomies, such as MeSH or SNOMED CT in the biomedical domain, or GO (Gene Ontology). Moreover, the measure can be extended by including other kinds of relations, such as "part of", in its computation.

Appendix A.
Results of IC approaches combining IC-computing methods and IC-based measures.
IC intrinsic computing methods (rows of the tables below) and their references:
Seco et al.: Seco et al. (2004)
Sebti and Barfroush: Sebti and Barfroush (2008)
Zhou et al.: Zhou et al. (2008)
Sanchez et al.: Sánchez et al. (2011)
Meng et al.: Meng et al. (2012)
Hadj Taieb et al.: Hadj Taieb et al. (2013a, 2013b)

IC-based measures (columns of the tables below) and their references:
Lin: Lin (1998)
Res (Resnik): Resnik (1995)
JC (Jiang and Conrath): Jiang and Conrath (1997)
Pir (Pirro): Pirró (2009)
Men (Meng): Meng et al. (2012)
RG65                   Lin    Res    JC     Pir    Men
Seco et al.            0.82   0.85   0.85   0.43   0.85
Sebti and Barfroush    0.69   0.82   0.79   0.80   0.85
Zhou et al.            0.81   0.83   0.83   0.83   0.86
Sanchez et al.         0.85   0.86   0.87   0.86   0.87
Meng et al.            0.84   0.87   0.87   0.87   0.87
Hadj Taieb et al.      0.80   0.58   0.56   0.64   0.82

MC30                   Lin    Res    JC     Pir    Men
Seco et al.            0.80   0.81   0.84   0.35   0.83
Sebti and Barfroush    0.71   0.79   0.73   0.76   0.82
Zhou et al.            0.81   0.82   0.82   0.81   0.84
Sanchez et al.         0.82   0.84   0.85   0.86   0.85
Meng et al.            0.84   0.85   0.86   0.86   0.86
Hadj Taieb et al.      0.76   0.55   0.54   0.67   0.77

AG203                  Lin    Res    JC     Pir    Men
Seco et al.            0.66   0.64   0.64   0.30   0.68
Sebti and Barfroush    0.62   0.55   0.54   0.63   0.65
Zhou et al.            0.58   0.59   0.60   0.63   0.62
Sanchez et al.         0.63   0.63   0.63   0.66   0.66
Meng et al.            0.66   0.65   0.63   0.68   0.68
Hadj Taieb et al.      0.64   0.50   0.45   0.54   0.66

MED38                  Lin    Res    JC     Pir    Men
Seco et al.            0.64   0.62   0.65   0.32   0.65
Sebti and Barfroush    0.57   0.52   0.57   0.58   0.62
Zhou et al.            0.52   0.52   0.53   0.52   0.62
Sanchez et al.         0.56   0.54   0.57   0.56   0.61
Meng et al.            0.59   0.64   0.60   0.60   0.64
Hadj Taieb et al.      0.52   0.50   0.43   0.52   0.52

YP130                  Lin    Res    JC     Pir    Men
Seco et al.            0.18   0.48   0.21   0.11   0.15
Sebti and Barfroush    0.76   0.62   0.52   0.70   0.78
Zhou et al.            0.64   0.72   0.60   0.72   0.64
Sanchez et al.         0.42   0.52   0.44   0.59   0.40
Meng et al.            0.79   0.70   0.60   0.73   0.52
Hadj Taieb et al.      0.80   0.40   0.22   0.25   0.80
Appendix B. The dataset MED38, composed of the first 21 pairs from Pedersen et al. (2007) and 17 pairs from Hliaoutakis (2005)

#  | Word1 | Word2 | Human ratings
1  | Anemia | Appendicitis | 0.031
2  | Dementia | Atopic Dermatitis | 0.06
3  | Osteoporosis | Patent Ductus Arteriosus | 0.156
4  | Sinusitis | Mental Retardation | 0.031
5  | Hypertension | Kidney Failure | 0.5
6  | Hyperlipidemia | Hyperkalemia | 0.156
7  | Hypothyroidism | Hyperthyroidism | 0.406
8  | Sarcoidosis | Tuberculosis | 0.406
9  | Asthma | Pneumonia | 0.375
10 | Lactose Intolerance | Irritable Bowel Syndrome | 0.468
11 | Urinary Tract Infection | Pyelonephritis | 0.656
12 | Psychology | Cognitive Science | 0.593
13 | Adenovirus | Rotavirus | 0.437
14 | Migraine | Headache | 0.718
15 | Hepatitis B | Hepatitis C | 0.562
16 | Carcinoma | Neoplasm | 0.75
17 | Pulmonary stenosis | Aortic stenosis | 0.531
18 | Breast feeding | Lactation | 0.843
19 | Pain | Ache | 0.875
20 | Measles | Rubeola | 0.906
21 | Down Syndrome | Trisomy 21 | 0.875
22 | Renal failure | Kidney failure | 1
23 | Abortion | Miscarriage | 0.825
24 | Delusion | Schizophrenia | 0.55
25 | Metastasis | Adenocarcinoma | 0.45
26 | Calcification | Stenosis | 0.5
27 | Mitral stenosis | Atrial fibrillation | 0.325
28 | Rheumatoid arthritis | Lupus | 0.275
29 | Carpal tunnel syndrome | Osteoarthritis | 0.275
30 | Diabetes mellitus | Hypertension | 0.25
31 | Acne | Syringe | 0.25
32 | Antibiotic | Allergy | 0.3
33 | Multiple sclerosis | Psychosis | 0.25
34 | Appendicitis | Osteoporosis | 0.25
35 | Depression | Cellulitis | 0.25
36 | Hyperlipidemia | Metastasis | 0.25
37 | Heart | Myocardium | 0.75
38 | Stroke | Infarct | 0.7
Appendix C. RG65

Word1 | Word2 | Score
Cord | Smile | 0.03
Rooster | Voyage | 0.08
Noon | String | 0.03
Fruit | Furnace | 0.15
Autograph | Shore | 0.18
Automobile | Wizard | 0.03
Mound | Stove | 0.19
Grin | Implement | 0.05
Asylum | Fruit | 0.15
Asylum | Monk | 0.03
Graveyard | Madhouse | 0.02
Glass | Magician | 0.10
Boy | Rooster | 0.05
Cushion | Jewel | 0.18
Monk | Slave | 0.20
Asylum | Cemetery | 0.02
Coast | Forest | 0.05
Grin | Lad | 0.13
Implement | Tool | 0.89
Cock | Rooster | 0.85
Boy | Lad | 0.80
Cemetery | Mound | 0.02
Magician | Oracle | 0.18
Crane | Implement | 0.31
Brother | Lad | 0.18
Sage | Wizard | 0.19
Oracle | Sage | 0.46
Bird | Crane | 0.46
Bird | Cock | 0.53
Food | Fruit | 0.09
Brother | Monk | 0.87
Asylum | Madhouse | 0.93
Furnace | Stove | 0.17
Magician | Wizard | 0.94
Hill | Mound | 0.85
Cord | String | 0.80
Glass | Tumbler | 0.82
Grin | Smile | 0.97
Serf | Slave | 0.78
Journey | Voyage | 0.80
Cushion | Pillow | 0.88
Cemetery | Graveyard | 0.97
Automobile | Car | 0.85
Glass | Jewel | 0.17
Shore | Woodland | 0.04
Monk | Oracle | 0.17
Boy | Sage | 0.20
Automobile | Cushion | 0.20
Mound | Shore | 0.38
Lad | Wizard | 0.18
Forest | Graveyard | 0.02
Food | Rooster | 0.01
Cemetery | Woodland | 0.02
Shore | Voyage | 0.13
Bird | Woodland | 0.04
Coast | Hill | 0.41
Furnace | Implement | 0.23
Crane | Rooster | 0.27
Hill | Woodland | 0.03
Autograph | Signature | 0.85
Coast | Shore | 0.88
Forest | Woodland | 0.94
Midday | Noon | 0.97
Gem | Jewel | 0.85
Car | Journey | 0.10
Appendix D. MC30

Word1 | Word2 | Score
Car | Automobile | 0.85
Gem | Jewel | 0.85
Food | Rooster | 0.01
Coast | Hill | 0.32
Forest | Graveyard | 0.02
Journey | Voyage | 0.75
Boy | Lad | 0.70
Coast | Shore | 0.76
Cemetery | Woodland | 0.02
Glass | Magician | 0.06
Shore | Woodland | 0.03
Journey | Car | 0.05
Noon | String | 0.00
Furnace | Stove | 0.15
Rooster | Voyage | 0.02
Coast | Forest | 0.04
Asylum | Madhouse | 0.87
Magician | Wizard | 0.94
Midday | Noon | 0.97
Bird | Cock | 0.44
Bird | Crane | 0.38
Tool | Implement | 0.78
Brother | Monk | 0.73
Chord | Smile | 0.09
Lad | Brother | 0.15
Lad | Wizard | 0.17
Food | Fruit | 0.07
Crane | Implement | 0.25
Monk | Oracle | 0.15
Monk | Slave | 0.18
Appendix E. AG203

Word1 | Word2 | Score
Tiger | Cat | 0.65
Tiger | Tiger | 1.0
Plane | Car | 0.40
Train | Car | 0.33
Television | Radio | 0.70
Media | Radio | 0
Bread | Butter | 0.46
Cucumber | Potato | 0.48
Doctor | Nurse | 0.58
Professor | Doctor | 0.45
Student | Professor | 0.20
Smart | Stupid | 0.11
Wood | Forest | 0.76
Money | Cash | 0.62
King | Queen | 0.70
King | Rook | 0.51
Bishop | Rabbi | 0.50
Fuck | Sex | 0.58
Football | Soccer | 0.82
Food | Rooster | 0.01
Money | Dollar | 0.51
Money | Currency | 0.76
Tiger | Jaguar | 0.64
Tiger | Feline | 0.69
Tiger | Carnivore | 0.47
Tiger | Mammal | 0.31
Tiger | Animal | 0.19
Tiger | Organism | 0.15
Tiger | Fauna | 0.19
Psychology | Psychiatry | 0.42
Psychology | Science | 0.64
Psychology | Discipline | 0.50
Planet | Star | 0.53
Planet | Moon | 0.46
Planet | Sun | 0.46
Precedent | Example | 0.66
Precedent | Antecedent | 0.21
Cup | Tableware | 0.57
Cup | Artifact | 0.24
Cup | Object | 0.09
Jaguar | Cat | 0.58
Mile | Kilometer | 0.36
Japanese | American | 0.47
Announcement | News | 0.27
Harvard | Yale | 0.67
Life | Death | 0.33
Type | Kind | 0.66
Street | Avenue | 0.72
Street | Block | 0.10
Cell | Phone | 0.63
Dividend | Payment | 0.59
Calculation | Computation | 0.91
Football | Basketball | 0.60
Football | Tennis | 0.53
Arafat | Jackson | 0.25
Physics | Chemistry | 0.54
Vodka | Gin | 0.57
Vodka | Brandy | 0.66
Drink | Eat | 0
Car | Automobile | 0.85
Gem | Jewel | 0.85
Journey | Voyage | 0.75
Boy | Lad | 0.70
Coast | Shore | 0.76
Asylum | Madhouse | 0.87
Magician | Wizard | 0.94
Midday | Noon | 0.97
Furnace | Stove | 0.15
Food | Fruit | 0.09
Bird | Cock | 0.44
Bird | Crane | 0.38
Seafood | Food | 0.66
Seafood | Lobster | 0.66
Lobster | Food | 0.44
Lobster | Wine | 0.11
Championship | Tournament | 0.52
Man | Woman | 0.37
Man | Governor | 0.16
Murder | Manslaughter | 0.73
Opera | Performance | 0.04
Mexico | Brazil | 0.42
Glass | Metal | 0.11
Aluminum | Metal | 0.71
Rock | Jazz | 0.56
Museum | Theater | 0.14
Shower | Thunderstorm | 0.32
Monk | Oracle | 0.15
Cup | Food | 0.22
Journal | Association | 0.05
Street | Children | 0
Car | Flight | 0.08
Space | Chemistry | 0.06
Situation | Conclusion | 0.16
Word | Similarity | 0.17
Peace | Plan | 0.04
Consumer | Energy | 0.02
Ministry | Culture | 0.15
Smart | Student | 0.11
Investigation | Effort | 0.54
Image | Surface | 0.16
Life | Term | 0.38
Start | Match | 0.11
Computer | News | 0.13
Board | Recommendation | 0.03
Drink | Car | 0.02
President | Medal | 0.02
Prejudice | Recognition | 0.27
Viewer | Serial | 0.11
Peace | Insurance | 0.57
Mars | Water | 0.03
Media | Gain | 0
Precedent | Cognition | 0.28
Announcement | Effort | 0.04
Line | Insurance | 0.06
Crane | Implement | 0.25
Drink | Mother | 0.18
Opera | Industry | 0.03
Volunteer | Motto | 0.13
Listing | Proximity | 0.07
Street | Place | 0.04
Monk | Slave | 0.18
Lad | Wizard | 0.17
Sugar | Approach | 0.04
Rooster | Voyage | 0.08
Noon | String | 0.03
Chord | Smile | 0.11
Professor | Cucumber | 0.04
King | Cabbage | 0.07
Cup | Entity | 0.29
Jaguar | Car | 0.03
Skin | Eye | 0.23
Century | Year | 0.39
Doctor | Personnel | 0.03
Hospital | Infrastructure | 0.03
Travel | Activity | 0.29
Five | Month | 0.17
Announcement | Production | 0.09
Morality | Importance | 0.33
Money | Operation | 0.05
Delay | News | 0.11
Governor | Interview | 0.11
Practice | Institution | 0.60
Century | Nation | 0.04
Coast | Forest | 0.04
Shore | Woodland | 0.03
Energy | Secretary | 0.01
Precedent | Group | 0.23
Production | Hike | 0.13
Stock | Phone | 0.07
Holy | Sex | 0.01
Stock | Cd | 0.08
Drink | Ear | 0.15
Delay | Racism | 0.17
Stock | Life | 0.03
Stock | Jaguar | 0.06
Reason | Hypertension | 0.12
Profit | Loss | 0.36
Dollar | Yen | 0.39
Dollar | Buck | 0.85
Phone | Equipment | 0.59
Liquid | Water | 0.72
Marathon | Sprint | 0.12
Theater | History | 0.09
Situation | Isolation | 0.23
Profit | Warning | 0.04
Media | Trading | 0
Chance | Credibility | 0.10
Precedent | Information | 0.45
Architecture | Century | 0.03
Population | Development | 0.25
Seven | Series | 0.10
School | Center | 0.21
Lad | Brother | 0.15
Observation | Architecture | 0.19
Coast | Hill | 0.32
Deployment | Departure | 0.20
Benchmark | Index | 0.40
Attempt | Peace | 0.04
Consumer | Confidence | 0.13
Start | Year | 0.12
Focus | Life | 0.03
Development | Issue | 0.20
Precedent | Collection | 0.43
Cup | Article | 0.44
Sign | Recess | 0.20
Problem | Airport | 0.13
Experience | Music | 0.18
Music | Project | 0.34
Direction | Combination | 0.15
Wednesday | News | 0.03
Glass | Magician | 0.10
Cemetery | Woodland | 0.02
Possibility | Girl | 0.13
Cup | Substance | 0.21
Forest | Graveyard | 0.02
Stock | Egg | 0.05
Month | Hotel | 0.15
Stock | Live | 0
Peace | Atmosphere | 0.19
Morality | Marriage | 0.14
Minority | Peace | 0.13
Atmosphere | Landscape | 0.11
Report | Gain | 0.05
Appendix F. MED38

Word1 | Word2 | Score
Anemia | Appendicitis | 0.37
Dementia | Atopic Dermatitis | 0.15
Osteoporosis | Patent Ductus Arteriosus | 0.06
Sinusitis | Mental Retardation | 0.15
Hypertension | Kidney Failure | 0.39
Hyperlipidemia | Hyperkalemia | 0.42
Hypothyroidism | Hyperthyroidism | 0.74
Sarcoidosis | Tuberculosis | 0.26
Asthma | Pneumonia | 0.70
Lactose Intolerance | Irritable Bowel Syndrome | 0.01
Urinary Tract Infection | Pyelonephritis | 0.89
Psychology | Cognitive Science | 0.42
Adenovirus | Rotavirus | 0.57
Migraine | Headache | 0.82
Hepatitis B | Hepatitis C | 0.79
Carcinoma | Neoplasm | 0.78
Pulmonary Stenosis | Aortic Stenosis | 0.72
Breast Feeding | Lactation | 0.60
Pain | Ache | 0.72
Measles | Rubeola | 0.97
Down Syndrome | Trisomy 21 | 0.97
Renal failure | Kidney failure | 0.97
Abortion | Miscarriage | 0.94
Delusion | Schizophrenia | 0.46
Metastasis | Adenocarcinoma | 0.06
Calcification | Stenosis | 0.13
Mitral stenosis | Atrial fibrillation | 0.59
Rheumatoid arthritis | Lupus | 0.35
Carpal tunnel syndrome | Osteoarthritis | 0.16
Diabetes mellitus | Hypertension | 0.20
Acne | Syringe | 0.06
Antibiotic | Allergy | 0.07
Multiple sclerosis | Psychosis | 0.16
Appendicitis | Osteoporosis | 0.02
Depression | Cellulitis | 0.20
Hyperlipidemia | Metastasis | 0.11
Heart | Myocardium | 0.20
Stroke | Infarct | 0.27
Appendix G. YP130

Verb1 | Verb2 | Score
Brag | Boast | 0.97
Divide | Split | 0.91
Build | Construct | 0.84
End | Terminate | 0.95
Swear | Think | 0.55
Split | Crush | 0.75
Depict | Recognize | 0.33
Consume | Eat | 0.91
Demonstrate | Show | 0.81
Furnish | Supply | 0.94
Scorn | Yield | 0.32
Bruise | Split | 0.39
Swear | Explain | 0.29
Merit | Deserve | 0.98
Build | Organize | 0.35
Hail | Judge | 0.44
Spin | Twirl | 0.88
Swing | Sway | 0.80
Seize | Refer | 0.56
Levy | Believe | 0.40
Refer | Carry | 0.41
Remember | Hail | 0.42
Alter | Highlight | 0.37
Imitate | Highlight | 0.19
Correlate | Levy | 0.25
Refer | Lean | 0.46
Ache | Spin | 0.30
Request | Concoct | 0.44
Situate | Isolate | 0.33
Discard | Arrange | 0.44
Hasten | Permit | 0.49
Supervise | Concoct | 0.26
Relieve | Hinder | 0.47
Divide | Figure out | 0.51
Want | Deserve | 0.49
Rotate | Situate | 0.34
Scrape | Lean | 0.29
Refine | Sustain | 0.49
Advise | Furnish | 0.36
Arrange | Study | 0.62
Swear | Vow | 0.73
Solve | Figure out | 0.95
Enlarge | Swell | 0.33
Drain | Tap | 0.31
Lean | Rest | 0.39
Weave | Intertwine | 0.58
Twist | Curl | 0.67
Accentuate | Highlight | 0.60
Position | Situate | 0.50
Concoct | Devise | 0.41
Furnish | Impress | 0.52
Clean | Concoct | 0.38
Postpone | Show | 0.39
Empty | Situate | 0.38
Circulate | Distribute | 0.84
Yell | Boast | 0.35
Recognize | Acknowledge | 0.86
Resolve | Settle | 0.66
Prolong | Sustain | 0.89
Tap | Knock | 0.81
Flush | Spin | 0.32
Refer | Direct | 0.71
Highlight | Restore | 0.23
Block | Hinder | 0.75
Show | Publish | 0.42
Hail | Acclaim | 0.92
Dissipate | Disperse | 0.91
Impose | Levy | 0.95
Swing | Break | 0.37
Stamp | Weave | 0.27
Rap | Tap | 0.81
Make | Earn | 0.39
Approve | Scorn | 0.23
Acknowledge | Distribute | 0.42
Hail | Address | 0.48
Welcome | Recognize | 0.61
Supply | Consume | 0.57
Call | Refer | 0.48
Sell | Market | 0.67
Explain | Boast | 0.29
Expect | Deserve | 0.45
Need | Deserve | 0.49
Swing | Crash | 0.32
Shake | Swell | 0.36
Terminate | Postpone | 0.38
Forget | Resolve | 0.46
Hasten | Accelerate | 0.32
Challenge | Yield | 0.30
Yield | Seize | 0.47
Arrange | Plan | 0.46
Evaluate | Terminate | 0.87
Seize | Request | 0.69
Hinder | Assist | 0.48
Finance | Build | 0.30
Research | Distribute | 0.34
Seize | Take | 0.44
Resolve | Examine | 0.53
Swell | Curl | 0.26
Build | Propose | 0.32
Boast | Yield | 0.35
Market | Sweeten | 0.31
Catch | Consume | 0.54
Make | Trade | 0.53
Swear | Describe | 0.43
Clip | Twist | 0.28
Sweat | Spin | 0.22
Lean | Grate | 0.29
Hinder | Yield | 0.44
Sustain | Lower | 0.50
Arrange | Explain | 0.42
Request | Levy | 0.46
Complain | Boast | 0.49
Boil | Tap | 0.31
Approve | Boast | 0.27
Recognize | Succeed | 0.79
Move | Swell | 0.71
Anger | Approve | 0.29
Twist | Fasten | 0.41
Swing | Bounce | 0.30
Distribute | Commercialize | 0.42
Dissipate | Isolate | 0.31
Resolve | Publicize | 0.28
Refer | Explain | 0.43
List | Figure out | 0.31
Weave | Print | 0.31
Express | Figure out | 0.68
Dilute | Market | 0.29
Twist | Intertwine | 0.65
Submit | Yield | 0.67
Approve | Support | 0.74
References Agirre, E., Alfonseca, E., Hall, K., Kravalova, J., Pasca, M., Soroa, A., 2009. A study on similarity and relatedness using distributional and WordNet-based approaches. In: Proceedings of the Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, NAACL'09, Association for Computational Linguistics, Stroudsburg, PA, USA, pp. 19–27. Al-Mubaid, H., Nguyen, H.A., 2006. A cluster-based approach for semantic similarity in the biomedical domain. In: Proceedings of the 28th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, EMBS 2006. IEEE Computer Society, New York, USA, pp. 2713–2717. Atkinson, J., Ferreira, A., Aravena, E., 2009. Discovering implicit intention-level knowledge from natural-language texts. Knowl.-based Syst. 22 (7), 502–508. Banerjee, S., Pedersen, T., 2003. Extended gloss overlaps as a measure of semantic relatedness. In: Proceedings of the 18th International Joint Conference on Artificial Intelligence (San Francisco, CA, USA), IJCAI'03, Morgan Kaufmann Publishers Inc., pp. 805–810. Budanitsky, A., Hirst, G., 2001. Semantic distance in WordNet: an experimental, application-oriented evaluation of five measures. Workshop on WordNet and Other Lexical Resources, Second meeting of the North American Chapter of the Association for Computational Linguistics, Pittsburgh, USA. Curran, J.R., 2002. Ensemble methods for automatic thesaurus extraction. In: Proceedings of the ACL-02 Conference on Empirical Methods in Natural Language Processing – Volume 10, EMNLP'02, Association for Computational Linguistics, Stroudsburg, PA, USA, pp. 222–229. Devitt, A., Vogel, C., 2004. The topology of WordNet: some metrics. In: Proceedings of GWC-04, 2nd Global WordNet Conference, pp. 106–111. Ding, L., Finin, T., Joshi, A., Pan, R., Cost, R.S., Peng, Y., Red-divari, P., Doshi, V., Sachs, J., 2004. Swoogle: a search and metadata engine for the semantic web. In: Proceedings of the Thirteenth ACM International Conference on Information and Knowledge Management, CIKM'04, ACM, New York, NY, USA, pp. 652–659. Formica, A., 2008. Concept similarity in formal concept analysis: An information content approach. Know.-based Syst. 21 (1), 80–87. Francis, W.N., Kučera, H., 1982. Frequency analysis of English usage: lexicon and grammar. Journal of English Linguistics, Houghton Mifflin, Boston 18 (1), 64–70. http://dx.doi.org/10.1177/007542428501800107. Gaeta, M., Orciuoli, F., Ritrovato, P., 2009. Advanced ontology management system for personalised e-learning. Know.-based Syst. 22 (4), 292–301. Goldstone, R.L., 1991. Similarity, interactive activation, and mapping. Journal of Experimental Psychology: Learning, Memory, and Cognition. University of Michigan 20, 3–28 (article-id 1805241). Fellbaum, C. (Ed.), May 1998. WordNet: An Electronic Lexical Database (Language, Speech, and Communication). The MIT Press. Hadj Taieb, M.A., Ben Aouicha, M., Tmar, M., Ben Hamadou, A., 2012. Wikipedia category graph and new intrinsic information content metric for word semantic
relatedness measuring, Data and Knowledge Engineering. Springer, Berlin Heidelberg, pp. 128–140. Hadj Taieb, M.A., Ben Aouicha, M., Ben Hamadou, A., 2013a. A new semantic relatedness measurement using wordnet features. Knowl. Inf. Syst. http://dx. doi.org/10.1007/s10115-013-0672-4. (published online 13 Aug 2013). Hadj Taieb, M.A., Ben Aouicha, M., Ben Hamadou, A., 2013b. Computing semantic relatedness using wikipedia features. Knowl.-based Syst. 50, 260–278. Hao, D., Zuo, W., Peng, T., He, F.,2011. An approach for calculating semantic similarity between words using wordnet. In: ICDMA, pp. 177–180. Harris, Z., 1954. Distributional structure. Word 10 (23), 146–162. Hirst, G., St-Onge, D., 1997. Lexical chains as representation of context for the detection and correction malapropisms. Hliaoutakis, A., 2005. Semantic Similarity Measures in the MESH Ontology and their Application to Information Retrieval on Medline (Ph.D. thesis). Technical Univ. of Crete (TUC), Department of Electronic and Computer Engineering, Crete, Greece. Jiang, J.J., Conrath, D.W., 1997. Semantic similarity based on corpus statistics and lexical taxonomy. CoRR cmp-lg/9709008. Leacock, C., Chodorow, M., 1998. Combining Local Context and WordNet Similarity for Word Sense Identification. In: Fellfaum, C. (Ed.), 1998. MIT Press, Cambridge, Massachusetts, pp. 265–283. Lesk, M., 1986. Automatic sense disambiguation using machine readable dictionaries: How to tell a pine cone from an ice cream cone. In: Proceedings of the 5th Annual International Conference on Systems Documentation, SIGDOC'86, ACM, New York, NY, USA, pp. 24–26. Li, Y., Bandar, Z.A., McLean, D., 2003. An approach for measuring semantic similarity between words using multiple information sources. IEEE Trans. on Knowl. and Data Eng 15 (4), 871–882. Lin, D., 1998. An information-theoretic definition of similarity. In: Proceedings of the Fifteenth International Conference on Machine Learning, ICML'98, Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, pp. 296–304. Liu, X., Zhou, Y., Zheng, R., 2007. Measuring semantic similarity in WordNet. In: Proceedings of the Sixth International Conference on Machine Learning and Cybernetics, Hong Kong, pp. 3431–3435. Meng, L., Gu, J., Zhou, Z., 2012. A new model of information content based on concept's topology for measuring semantic similarity in WordNet. Int. J. Grid Distrib. Comput. 5 (3). Miller, G.A., Charles, W.G., 1991. Contextual correlates of semantic similarity. Lang. Cogn. Processes 6 (1), 1–28. Neter, J., Kutner, M.H., Nachtsheim, C.J., Wasserman, W., 1996. Applied Linear Statistical ModelsIrwin, Chicago. Patwardhan, S., Pedersen, T., 2006. Using WordNet-based context vectors to estimate the semantic relatedness of concepts. In: Proceedings of the EACL 2006 Workshop Making Sense of Sense-Bringing Computational Linguistics and Psycholinguistics Together, vol. 1501, pp. 1–8. Patwardhan, S., Banerjee, S., Pedersen, T., 2003. Using measures of semantic relatedness for word sense disambiguation. In: Proceedings of the 4th International Conference on Computational Linguistics and Intelligent Text Processing, CICLing'03, Berlin, Heidelberg, Springer-Verlag, pp. 241–257.
Pedersen, T., Pakhomov, S.V.S., Patwardhan, S., Chute, C.G., 2007. Measures of semantic similarity and relatedness in the biomedical domain. J. Biomed. Inform. 40, 288–299. Pesaranghader, A., Muthaiyah, S., 2013. Definition-based information content vectors for semantic similarity measurement. In: Soft Computing Applications and Intelligent Systems. Communications in Computer and Information Science, Springer Berlin Heidelberg, vol. 378, pp. 268–282. Pesaranghader, A., Muthaiyah, S., Pesaranghader, A., 2013. Improving gloss vector semantic relatedness measure by integrating pointwise mutual information: Optimizing second-order co-occurrence vectors computed from biomedical corpus and UMLS. In: Proceedings of the International Conference on Informatics and Creative Multimedia (ICICM), pp. 196–201. Petrakis, E.G.M., Varelas, G., Hliaoutakis, A., Raftopoulou, P., 2006. X-similarity: computing semantic similarity between concepts from different ontologies. JDIM 4 (4), 233–237. Pirró, G., 2009. A semantic similarity metric combining features and intrinsic information content. Data Knowl. Eng. 68 (11), 1289–1308. Pirró, G., Seco, N., 2008. Design, implementation and evaluation of a new semantic similarity metric combining features and intrinsic information content. In: Proceedings of the OTM 2008 Confederated International Conferences, CoopIS, DOA, GADA, IS, and ODBASE Part II on On the Move to Meaningful Internet Systems, OTM'08, Springer-Verlag, Berlin, Heidelberg, pp. 1271–1288. Rada, R., Mili, H., Bicknell, E., Blettner, M., 1989. Development and application of a metric on semantic nets. IEEE Trans. Syst. Man Cybern. 19 (1), 17–30. Resnik, P., 1995. Using information content to evaluate semantic similarity in a taxonomy. In: Proceedings of the 14th International Joint Conference on Artificial Intelligence, IJCAI'95, Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, vol. 1, pp. 448–453. Resnik, P., 1998. Semantic similarity in a taxonomy: an information-based measure and its application to problems of ambiguity in natural language. J. Artif. Intell. Res. 11, 95–130. Rodríguez, M.A., Egenhofer, M.J., 2003. Determining semantic similarity among entity classes from different ontologies. IEEE Trans. Knowl. Data Eng. 15 (2), 442–456. Rubenstein, H., Goodenough, J.B., 1965. Contextual correlates of synonymy. Commun. ACM 8 (10), 627–633. Sánchez, D., 2010. A methodology to learn ontological attributes from the web. Data Knowl. Eng. 69 (6), 573–597. Sánchez, D., Batet, M., 2011. Semantic similarity estimation in the biomedical domain: an ontology-based information-theoretic perspective. J. Biomed. Inform. 44 (5), 749–759.
Sánchez, D., Moreno, A., 2008. Learning non-taxonomic relationships from web documents for domain ontology construction. Data Knowl. Eng. 64 (3), 600–623. Sánchez, D., Batet, M., Isern, D., 2011. Ontology-based information content computation. Know.-based Syst. 24 (2), 297–303. Sánchez, D., Isern, D., Millan, M., 2011. Content annotation for the semantic web: an automatic web-based approach. Knowl. Inf. Syst. 27 (3), 393–418. Sánchez, D., Solé-Ribalta, A., Batet, M., Serratosa, F., 2012. Enabling semantic similarity estimation across multiple ontologies: an evaluation in the biomedical domain. J. Biomed. Inform. 45 (1), 141–155. Sebti, A., Barfroush, A.A., 2008. A new word sense similarity measure in WordNet. In: IMCSIT, IEEE, pp. 369–373. Seco, N., 2004. Computational Models of Similarity and Lexical Ontologies (Ph.D. thesis). Department of Computer Science, University College Dublin. Seco, N., Veale, T., Hayes, J., 2004. An intrinsic information content metric for semantic similarity in WordNet. Proc. ECAI 4 (PhD thesis). Shannon, C.E., 1948. A mathematical theory of communication. Bell Syst. Tech. J. 27 (379–423, 623–656). Stevenson, M., Greenwood, M.A., 2005. A semantic approach to IE pattern induction. In: Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics, ACL'05, Association for Computational Linguistics, Stroudsburg, PA, USA, pp. 379–386. Sussna, M., 1993. Word sense disambiguation for free-text indexing using a massive semantic network. In: Proceedings of the Second International Conference on Information and Knowledge Management, CIKM'93, ACM, New York, NY, USA, pp. 67–74. Tversky, A., 1977. Features of similarity. Psychol. Rev. 84, 327–352. Wu, Z., Palmer, M., 1994. Verbs semantics and lexical selection. In: Proceedings of the 32nd Annual Meeting on Association for Computational Linguistics, ACL'94, Association for Computational Linguistics, Stroudsburg, PA, USA, pp. 133–138. Yang, D., Powers, D.M.W. Verb similarity on the taxonomy of wordnet. In In: Proceedings of the 3rd International WordNet Conference (GWC-06), Jeju Island, Korea, 2006. Zesch, T., 2010. Study of Semantic Relatedness of Words Using Collaboratively Constructed Semantic Resources (Ph.D. thesis). Darmstadt University of Technology. Zhou, Z., Wang, Y., Gu, J., 2008a. New model of semantic similarity measuring in WordNet. In: Proceedings of the 3rd International Conference on Intelligent System and Knowledge Engineering, Xiamen, China. Zhou, Z., Wang, Y., Gu, J., 2008b. A new model of information content for semantic similarity in WordNet. In: Proceedings of the International Conference on the Future Generation Communication and Networking Symposia, vol. 3, pp. 85–89.