Available online at www.sciencedirect.com
ScienceDirect
Procedia Computer Science 129 (2018) 110–114
www.elsevier.com/locate/procedia
2017 International Conference on Identification, Information and Knowledge in the Internet of Things
Deep Semantic Match Model for Entity Linking Using Knowledge Graph and Text
Angen Luo∗, Sheng Gao, Yajing Xu
Beijing University of Posts and Telecommunications, Beijing 100876, China
Abstract

In this paper, we address the problem of Entity Linking (EL) as aligning a textual mention to the referent entity in a knowledge base (e.g., Freebase). Most previous studies on EL mainly focus on designing various feature representations for the mentions and entities. However, these handcrafted features often ignore the internal meanings of words or entities, require tedious feature engineering and expensive computation, and lack adaptability to different scenarios. In this paper, we propose a Deep Semantic Match Model (DSMM) for EL using a knowledge graph and descriptive text. Specifically, the DSMM applies bidirectional Long Short Term Memory networks (BiLSTM) at multiple granularities to match mentions with candidate entities from two aspects: surface form match by a character-level BiLSTM (char-LSTM), and semantic match based on the "structural" context of entities and the textual context of mentions by a word-level BiLSTM (word-LSTM). Experimental results on the CoNLL benchmark dataset show that our proposed DSMM significantly outperforms existing baseline models on the EL task.

Copyright © 2018 Elsevier Ltd. All rights reserved.
Selection and peer-review under responsibility of the scientific committee of the 2017 International Conference on Identification, Information and Knowledge in the Internet of Things (IIKI2017).

Keywords: Deep semantic match; LSTM; Multiple granularities; Entity linking
1. Introduction

Entity linking (EL) is a fundamental component of numerous tasks in the field of Natural Language Processing (NLP). Given an entity mention and its context sentence, EL aims to identify the referent entity of the specific mention in a given Knowledge Base (KB). Existing studies on EL can be roughly divided into independent approaches and collective methods. Independent approaches [4] consider entity mentions in a document independently, while collective methods [3, 5] resolve entity mentions in a document simultaneously. All of these approaches typically utilize the context of mentions and the descriptive text associated with the candidate entities to design handcrafted feature representations and address EL by ranking the candidate entities.
∗ Corresponding author. Tel.: +86-185-1159-9242.
E-mail address: [email protected]

1877-0509 Copyright © 2018 Elsevier Ltd. All rights reserved.
Selection and peer-review under responsibility of the scientific committee of the 2017 International Conference on Identification, Information and Knowledge in the Internet of Things (IIKI2017).
10.1016/j.procs.2018.03.057
(a) The whole system.    (b) Structure of the char-LSTM and word-LSTM used in our proposed DSMM.
Fig. 1. The overview of our proposed DSMM. It consists of two submodules: the surface form match and the semantic match. In this example, the entity mention "President Clinton" occurs in the mention context "President Clinton is expected to propose a tax break on home sales". The candidate entity is "Bill Clinton" and the corresponding entity in Freebase is "/m/0157m".
However, handcrafted features used in previous studies, such as bag-of-words or entity popularity, are often over-specified and inefficient. Besides, they cannot precisely represent the semantic meanings of entities and mentions. Thus, applying word embeddings and neural networks to capture distributed representations has become increasingly popular in the field of EL [6, 7]. Almost all previous research on EL chooses Wikipedia as the reference KB because of its abundant descriptive text and hyperlinks for entities. In contrast, we use Freebase as the target knowledge base in this paper; it has replaced Wikipedia in many downstream tasks of EL, such as question answering and knowledge representation, owing to its well-defined structure, rich type information and well-defined schemas for entities. Given the limited textual context available in Freebase, we propose a novel Deep Semantic Match Model (DSMM) that measures the match score from two aspects: surface form match and semantic match. The overview of the DSMM is illustrated in Fig. 1(a). In the surface form match part, we apply a char-LSTM to capture the local representation, while in the semantic match part, we use a similar word-LSTM and the TransE [1] model, a knowledge representation method, to learn the global representations of the mention and the entity respectively. Afterwards, we calculate the match score as the sum of the surface form match and the semantic match. We evaluate the proposed method on the CoNLL benchmark dataset and achieve state-of-the-art performance.

The main contributions of this work are: 1) we propose a novel DSMM to match an entity mention with the referent entity by combining surface form match and semantic match; 2) we are the first to utilize the "structural" context of entities and tackle EL with Freebase, which severely lacks the textual context needed by traditional methods; 3) we improve the representations used in the surface form match and semantic match parts by applying a char-LSTM and by incorporating position information with context information, respectively.

2. Methodology

Given a mention m, the sentence x in which it occurs, and a KB denoted by a set of triples (h, r, t), where h, t ∈ E are entities and r ∈ R are relations (E is the set of entities and R is the set of relations), our model aims to find the best matching entity in the candidate entity set Em ⊆ E. In this section, we first give an overview of the DSMM, followed by an introduction to the common LSTM framework shared by the char-LSTM and word-LSTM, illustrated in Fig. 1(b). We then describe the surface form match and semantic match parts, followed by the training objective.

2.1. An Overview

Our proposed model is depicted in Fig. 1(a). As shown, it consists of two submodules (detailed in Section 2.2): (i) surface form match based on the local representation produced by a char-LSTM; (ii) semantic match based on the global representation produced by a word-LSTM and a knowledge representation method. The common framework of the char-LSTM and word-LSTM is shown in Fig. 1(b); the only differences between them are the position embeddings in the input and the attention layer after the LSTM layer in the word-LSTM. We measure the surface form match and semantic match using the local and global representations respectively, and use their sum to rank the candidate entities for a given (mention, sentence) pair.
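To make the overview concrete, the following is a minimal sketch of how the final ranking score could be assembled from the two submodules. The encoder callables char_encode, word_encode and transe_embedding are illustrative placeholders for the char-LSTM, word-LSTM and TransE components, not the authors' implementation.

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity used by both match scores."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-8))

def dsmm_score(mention, sentence, entity_name, entity_id,
               char_encode, word_encode, transe_embedding):
    """score(m, e) = surface form match + semantic match (Section 2.1 sketch)."""
    loc_m = char_encode(mention)            # char-LSTM over the mention string
    loc_e = char_encode(entity_name)        # char-LSTM over the entity name
    glo_m = word_encode(sentence, mention)  # word-LSTM (+ attention) over the context
    glo_e = transe_embedding(entity_id)     # TransE embedding of the entity
    return cosine(loc_m, loc_e) + cosine(glo_m, glo_e)
```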
2.2. Our Proposed Model

In our work, we employ the LSTM as the framework of our model. The LSTM is an RNN variant that aims to overcome the gradient vanishing problem of the standard RNN by introducing a gating mechanism. In this paper, we use a bidirectional LSTM (BiLSTM), formed by two LSTMs with opposite directions, to better capture the context information on both sides. Hence, at time step t, the hidden state h_t is the element-wise sum of the forward and backward hidden states, h_t = →h_t ⊕ ←h_t.

Surface Form Match Part. We utilize a char-LSTM to embed the surface forms of the mention and the entity as the local representations, and the surface form match is the similarity between the local representation of the mention and that of the entity. Take the embedding of the mention as an example. Given a mention M = {c1, c2, ..., cT} consisting of T characters, we use an embedding matrix W^char to transform every character into its distributed representation e^c_i = W^char i_c, where i_c is a one-hot vector that is zero at all positions except the i-th index. We then feed {e^c_1, e^c_2, ..., e^c_T} into a BiLSTM and use its final hidden state as the local representation of the mention, Loc_m; the local representation of an entity, Loc_e, is obtained in the same way. Afterwards, we compute the surface form match as m_l = cosine(Loc_m, Loc_e).

Semantic Match Part. We capture the global representation of the mention, Glo_m, and of the entity, Glo_e, by using the textual context of the mention and the "structural" context of the entity. In particular, we use a word-LSTM, identical to the char-LSTM except for its input and an additional attention layer, to handle the textual context of the mention. We transform every word of the mention's context sentence into a combination of its word embedding and position embedding by looking up the word embedding matrix W^word and the position embedding matrix W^p. The output of the BiLSTM layer, H = [h_1, h_2, ..., h_T], is then fed into the attention layer, and the output Glo_m is defined as a nonlinear transformation of the weighted sum of H:

Glo_m = tanh(H α^T)    (1)

where α is the attention weight vector, calculated as [8] demonstrates.
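As one possible reading of the word-LSTM described above, the PyTorch sketch below combines word and position embeddings, runs a BiLSTM whose forward and backward states are summed element-wise, and applies an attention layer in the spirit of [8] to produce Glo_m as in Eq. (1). The dimensions and the exact attention parameterization are assumptions for illustration, not the authors' code.

```python
import torch
import torch.nn as nn

class WordLSTMEncoder(nn.Module):
    """Illustrative word-LSTM: word + position embeddings -> BiLSTM -> attention."""

    def __init__(self, vocab_size, max_pos, word_dim=100, pos_dim=10, hidden=128):
        super().__init__()
        self.word_emb = nn.Embedding(vocab_size, word_dim)
        self.pos_emb = nn.Embedding(max_pos, pos_dim)
        self.lstm = nn.LSTM(word_dim + pos_dim, hidden,
                            bidirectional=True, batch_first=True)
        self.att = nn.Linear(hidden, 1, bias=False)  # simple attention scorer

    def forward(self, words, positions):
        # words, positions: LongTensors of shape (batch, T)
        x = torch.cat([self.word_emb(words), self.pos_emb(positions)], dim=-1)
        out, _ = self.lstm(x)                 # (batch, T, 2*hidden)
        fwd, bwd = out.chunk(2, dim=-1)
        h = fwd + bwd                         # h_t = ->h_t (+) <-h_t
        alpha = torch.softmax(self.att(torch.tanh(h)).squeeze(-1), dim=-1)  # (batch, T)
        # Eq. (1): Glo_m = tanh(H alpha^T), an attention-weighted sum of hidden states.
        return torch.tanh(torch.bmm(alpha.unsqueeze(1), h)).squeeze(1)
```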
As for Glo_e, since Freebase lacks descriptive context for entities, we exploit the structural constraints between entities to capture the semantic representation. In particular, we employ TransE [1], which trains the embeddings of entities and relations by enforcing E(s) + E(r) ≈ E(o) for every observed triple (s, r, o) ∈ κ. We use the trained entity embeddings as the initialization of Glo_e in the semantic match. Analogously, the semantic match is computed as m_g = cosine(Glo_m, Glo_e).

Training Objective. The surface form match and semantic match are both straightforward and crucial clues for EL, so we calculate the overall match score of a pair (m, e) as the sum of m_l and m_g: score(m, e) = m_l + m_g. At the training stage, an entity mention may have many candidate entities in Freebase. To effectively train the model, we use a hinge loss with negative samples as the training objective. Specifically, the loss function is defined as:

Loss = Σ_(m,e) max(0, γ − score(m, e) + score(m, e′))    (2)

where γ is a predefined margin, e is the gold-standard entity, and e′ is a corrupted entity randomly sampled from the entire entity vocabulary of the reference KB.
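A minimal sketch of the training objective in Eq. (2) is given below. It assumes score_fn returns a differentiable torch scalar (e.g., the DSMM score) and that negatives are drawn uniformly from the entity vocabulary; the margin value and batching scheme are placeholders, not the authors' settings.

```python
import random
import torch

def hinge_loss(score_fn, batch, entity_vocab, margin=1.0):
    """Eq. (2): sum of max(0, margin - score(m, e) + score(m, e')) over a batch.

    `batch` is a list of (mention, sentence, gold_entity) triples and `score_fn`
    must return a torch scalar so gradients can flow; both are assumptions.
    """
    losses = []
    for mention, sentence, gold in batch:
        corrupt = random.choice(entity_vocab)   # random negative entity e'
        while corrupt == gold:                  # avoid sampling the gold entity
            corrupt = random.choice(entity_vocab)
        pos = score_fn(mention, sentence, gold)
        neg = score_fn(mention, sentence, corrupt)
        losses.append(torch.clamp(margin - pos + neg, min=0.0))
    return torch.stack(losses).sum()
```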
At inference time, we calculate the match score of each (mention, candidate entity) pair and select the candidate with the highest score as the final result.

3. Experiments

3.1. Experiment Setting

We use a subset of Freebase (FB5M), released with the SimpleQuestions dataset, as our reference KB for EL. It contains 4,904,397 entities, 7,523 relations and 22,441,880 facts. We test the performance of the DSMM on the benchmark CoNLL dataset [3], which is partitioned into a train, a test-a and a test-b set; Table 1 summarizes its detailed properties. Following [3], only 27,816 mentions with valid entries in the KB are considered,
resulting in the exclusion of roughly 20% of the total mentions. We use the train set to train our model, the test-a set for parameter tuning and the test-b set to evaluate our model.
Table 1. Detailed properties of the CoNLL dataset.

          articles   mentions   unlinkable mentions   distinct mentions
train        946      23,396          4,855                 4,088
test-a       216       5,962          1,133                 1,662
test-b       231       5,571          1,124                 1,534
total      1,393      34,956          7,112                 5,598
Table 2. Candidate entity statistics on CoNLL. Uniq is the fraction of mentions that map to exactly one entity in the KB; Avg. is the average number of candidate entities per mention.

method        Uniq     Avg.
PPRforNED    17.9%     12.6
FB           16.2%     24.6
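For clarity, the Uniq and Avg. statistics reported in Table 2 could be computed as in the sketch below, assuming the candidate sets are stored as a dictionary from mention strings to lists of Freebase entity ids; the data structure and the example ids (other than /m/0157m) are illustrative assumptions.

```python
def candidate_statistics(candidates):
    """Uniq: fraction of mentions with exactly one candidate; Avg.: mean candidate count."""
    sets = [c for c in candidates.values() if c]   # mentions with at least one candidate
    uniq = sum(1 for c in sets if len(c) == 1) / len(sets)
    avg = sum(len(c) for c in sets) / len(sets)
    return uniq, avg

# Toy example with made-up candidate sets (placeholder ids):
uniq, avg = candidate_statistics({
    "President Clinton": ["/m/0157m", "/m/placeholder1"],
    "EU": ["/m/placeholder2"],
    "Britain": ["/m/placeholder3", "/m/placeholder4", "/m/placeholder5"],
})
# uniq == 1/3, avg == 2.0
```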
We compare our model with several state-of-the-art methods. Hoffart's model is a graph-based approach that finds a dense subgraph of entities in a document to address EL [3]. He's model uses deep neural networks to derive the representations of entities and mention contexts and applies them to EL [2]. Pershina's model proposes a Personalized PageRank algorithm to compute coherence and perform collective EL [5]. Yamada's model jointly maps words and entities into the same continuous vector space and applies the embeddings to learn features for EL [7].

To obtain candidate entities, we use the following two methods: 1) a public dataset built by [5] (denoted by PPRforNED); 2) constructing candidate entities using simple string matching rules over the "common.topic.aliase" and "type.object.name" properties of entities in Freebase (denoted by FB). Table 2 reports the statistics of the two methods. We compare our model with the other state-of-the-art models using PPRforNED, and evaluate the DSMM on both PPRforNED and FB. Finally, we report the empirical results of our models and the four models above on the CoNLL dataset, using the standard micro-averaged (aggregated over all mentions) and macro-averaged (aggregated over all documents) accuracies as the evaluation metrics.

3.2. Results

Table 3 shows the experimental results. It is reasonable that PPRforNED achieves a higher accuracy than FB, since the statistics in Table 2 show that the candidate entities of PPRforNED have lower ambiguity than those of FB. Further, Table 4 shows the comparison between our model and the other state-of-the-art methods. To better analyze the extent to which each module affects the performance of the DSMM, we test four variants of our framework based on what they use to compute the match score. SFM is short for surface form match; SFM1, SFM2 and SFM1+2 use the char-LSTM, the average of the word vectors, and a concatenation of both, respectively, to capture the local representation. Similarly, SM is short for semantic match, and DSMM is the complete model. We can see that the complete DSMM outperforms the state-of-the-art method of Yamada in micro accuracy and is comparable in macro accuracy. Moreover, SFM∗ and SM have comparatively low accuracy because they use only one representation to measure the match score: SFM∗ ignores the crucial contextual information, while SM does not consider the basic and straightforward surface form match. The experimental results also show that SFM1 is better than SFM2; the reason is that the char-LSTM is more robust in capturing the local representation. SFM1+2 further improves the performance significantly; for example, it may be impossible for SFM1 to link the mention "Britain" to the entity "United Kingdom", but SFM1+2 can resolve it. Besides, we observe a large boost in performance with the complete DSMM model. This demonstrates the validity of the idea behind this paper: applying neural networks to match a mention with the referent entity through surface form match and semantic match based on representations at multiple granularities.

4. Related Work

Entity linking (EL) can typically be regarded as the task of linking a mention to the referent entity in a knowledge base (KB). In general, there are two main lines of approach in early studies of the EL problem, namely the independent approach and the collective approach. The independent approach considers entity mentions in a document to be independent and resolves one mention at a time.
These approaches use local textual context features [4] of the mention, such as edit distance and Normalized Google Distance, and compare them to the textual features of documents associated with the candidate entities in the KB.
Table 3. Experimental results of our model on the CoNLL dataset using different candidate generation methods.

                      Micro    Macro
CoNLL (PPRforNED)      94.3     92.4
CoNLL (FB)             92.5     90.7

Table 4. Evaluation results on the CoNLL dataset (all values in %); some results are from [7].

                   Method       Micro    Macro
Our models         SFM1          85.9     83.7
                   SFM2          84.6     83.1
                   SFM1+2        87.8     86.5
                   SM            89.6     87.3
                   DSMM          94.3     92.4
Baseline models    Hoffart's     82.5     81.7
                   He's          85.6     84.0
                   Pershina's    91.8     89.9
                   Yamada's      93.1     92.6
In contrast, the collective approach [3, 5] considers entity mentions in one document collectively. These approaches try to model the interdependence between the different candidate entities for different mentions in a document, and reformulate EL as a global optimization problem. Both independent and collective methods rely on handcrafted features, which are time-consuming to extract and are low-level representations of entities and mentions. From this perspective, applying neural networks to automatically learn the representations of entities [2, 6] has been addressed in recent years. For example, [6] proposed a method based on deep neural networks to model representations of mentions, contexts of mentions, and entities. Our DSMM differs from these methods in two aspects. First, we combine surface form match and semantic match, measured by representations at multiple granularities, to rank the candidate entities. Second, we use Freebase as the KB and overcome the lack of context words for entities by leveraging the structural "context" of entities to learn the entity representation.

5. Conclusion and Future Work

In this paper, we propose a novel Deep Semantic Match Model (DSMM) for entity linking with Freebase. We combine surface form match and semantic match by exploiting the textual context and the "structural" context to perform EL. The results of experiments on the CoNLL benchmark dataset show that our proposed DSMM outperforms previous state-of-the-art models. In future work, we plan to improve our model in two aspects: 1) performing collective linking, since the entity assignments for the entity mentions in one document are always interdependent; 2) incorporating the type information of entities into the "structural" context to learn better global representations of entities.

References

[1] Bordes, A., Usunier, N., Garcia-Duran, A., Weston, J., Yakhnenko, O., 2013. Translating embeddings for modeling multi-relational data, in: Advances in Neural Information Processing Systems, pp. 2787–2795.
[2] He, Z., Liu, S., Li, M., Zhou, M., Zhang, L., Wang, H., 2013. Learning entity representation for entity disambiguation, in: ACL (2), pp. 30–34.
[3] Hoffart, J., Yosef, M.A., Bordino, I., Fürstenau, H., Pinkal, M., Spaniol, M., Taneva, B., Thater, S., Weikum, G., 2011. Robust disambiguation of named entities in text, in: Proceedings of the Conference on Empirical Methods in Natural Language Processing, Association for Computational Linguistics, pp. 782–792.
[4] Lin, T., Mausam, Etzioni, O., 2012. Entity linking at web scale, in: Joint Workshop on Automatic Knowledge Base Construction and Web-Scale Knowledge Extraction, pp. 84–88.
[5] Pershina, M., He, Y., Grishman, R., 2015. Personalized Page Rank for named entity disambiguation, in: NAACL.
[6] Sun, Y., Lin, L., Tang, D., Yang, N., Ji, Z., Wang, X., 2015. Modeling mention, context and entity with neural networks for entity disambiguation, in: IJCAI, pp. 1333–1339.
[7] Yamada, I., Shindo, H., Takeda, H., Takefuji, Y., 2016. Joint learning of the embedding of words and entities for named entity disambiguation. arXiv preprint arXiv:1601.01343.
[8] Zhou, P., Shi, W., Tian, J., Qi, Z., Li, B., Hao, H., Xu, B., 2016. Attention-based bidirectional long short-term memory networks for relation classification, in: The 54th Annual Meeting of the Association for Computational Linguistics, p. 207.