Entity set expansion with semantic features of knowledge graphs
Accepted Manuscript Entity set expansion with semantic features of knowledge graphs Jun Chen, Yueguo Chen, Xiangling Zhang, Xiaoyong Du, Ke Wang, Ji-Rong Wen
Web Semantics: Science, Services and Agents on the World Wide Web
Received date: 17 December 2017
Revised date: 13 June 2018
Accepted date: 12 September 2018

Please cite this article as: J. Chen, et al., Entity set expansion with semantic features of knowledge graphs, Web Semantics: Science, Services and Agents on the World Wide Web (2018), https://doi.org/10.1016/j.websem.2018.09.001

This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.
Entity Set Expansion with Semantic Features of Knowledge Graphs
Jun Chen (a,b), Yueguo Chen (a,b,∗), Xiangling Zhang (a,b), Xiaoyong Du (a,b), Ke Wang (c), Ji-Rong Wen (a)
(a) School of Information, Renmin University of China, China
(b) Key Laboratory of Data Engineering and Knowledge Engineering, MOE, China
(c) School of Computing Science, Simon Fraser University, Canada
Abstract
A large-scale knowledge graph contains a huge number of path-based semantic features, which provide a flexible mechanism for assigning and expanding the semantics/attributes of entities. A particular set of these semantic features can be exploited on the fly to support specific entity-oriented semantic search tasks. In this paper, we use entity set expansion as an example to show how these path-based semantic features can be effectively utilized in a semantic search application. The entity set expansion problem is to expand a small set of seed entities to a more complete set of similar entities. Traditionally, this problem is solved by exploiting the statistical co-occurrence of entities in web pages, where the semantic correlation among the seed entities is not well exploited. We propose to address the entity set expansion problem using the path-based semantic features of knowledge graphs. Our method first discovers relevant semantic features of the seed entities, which can be treated as the common aspects of these seed entities, and then retrieves relevant entities based on the discovered semantic features. Probabilistic models that handle the incompleteness of knowledge graphs are proposed to rank both entities and semantic features. Extensive experiments on a public knowledge graph (i.e., DBpedia V3.9) and three public test collections (i.e., CLEF-QALD 2-4, SemSearch-LS 2011, and INEX-XER 2009) show that our method significantly outperforms the state-of-the-art techniques.

Keywords: Knowledge Graph, Semantic Feature, Entity Set Expansion, Semantic Search, Ranking Model

1. Introduction
Imagine you are visiting a European oil painting exhibition, are greatly attracted by the works of Vincent_Van_Gogh and Paul_Gauguin, and want to learn more about painters with a similar style. A natural approach is to submit a query such as “Painters similar to Vincent Van Gogh and Paul Gauguin” to a search engine. Unfortunately, as illustrated in Fig.
1, the web search engine returns documents relevant to Vincent_Van_Gogh and Paul_Gauguin themselves (e.g., about the relationship between them and their representative paintings), which does not satisfy this information need (i.e., painters similar to them). To address such cases, the entity set expansion (ESE for short) problem was proposed. It aims to expand a small set of seed entities (seeds for short, e.g., Vincent_Van_Gogh and Paul_Gauguin) to a more complete set of similar entities, by first discovering the common aspects of the seeds (e.g., both Vincent_Van_Gogh and Paul_Gauguin are post-impressionist painters) and then retrieving similar entities that share these aspects (e.g., Henri_de_Toulouse-Lautrec and Paul_Cezanne).

Figure 1: Top-5 relevant results of a search engine for the query “Painters similar to Vincent Van Gogh and Paul Gauguin”.

∗ Corresponding author.

The ESE problem is of practical importance and can be used in many applications, such as web search (search by examples) [1, 2], item recommendation [3], dictionary construction [4], query refinement [5] and expansion [6]. For instance, an item recommendation system could suggest items to users based on the items they have browsed. To better illustrate the ESE problem, consider the following example:
Example 1. Given a query consisting of the seeds {Forrest_Gump, Apollo_13_(film), Philadelphia_(film)}, return a ranked list of entities relevant to the query, whose implicit intent is to find “Tom Hanks’ movies in which he plays a leading role”.
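As a minimal sketch of the first step in Example 1 (discovering the common aspects of the seeds), the Python snippet below indexes a hypothetical handful of triples about the seed films and intersects their one-hop (predicate, object) features. The triples, and the names `SEEDS`, `TRIPLES`, `one_hop_features` and `common_features`, are our own illustration rather than an extract of DBpedia V3.9 or the authors' code, and restricting features to single edges is a simplification of the path-based semantic features discussed here.

```python
from collections import defaultdict

# Hypothetical toy triples (subject, predicate, object) about the seed films;
# an illustration only, not an extract of DBpedia V3.9.
SEEDS = ["Forrest_Gump", "Apollo_13_(film)", "Philadelphia_(film)"]
TRIPLES = [
    ("Forrest_Gump", "starring", "Tom_Hanks"),
    ("Forrest_Gump", "starring", "Gary_Sinise"),
    ("Forrest_Gump", "director", "Robert_Zemeckis"),
    ("Forrest_Gump", "type", "Film"),
    ("Apollo_13_(film)", "starring", "Tom_Hanks"),
    ("Apollo_13_(film)", "starring", "Gary_Sinise"),
    ("Apollo_13_(film)", "director", "Ron_Howard"),
    ("Apollo_13_(film)", "type", "Film"),
    ("Philadelphia_(film)", "starring", "Tom_Hanks"),
    ("Philadelphia_(film)", "director", "Jonathan_Demme"),
    ("Philadelphia_(film)", "type", "Film"),
]


def one_hop_features(triples):
    """Map each subject entity to its set of (predicate, object) features."""
    features = defaultdict(set)
    for s, p, o in triples:
        features[s].add((p, o))
    return features


def common_features(seeds, triples):
    """Return the features shared by every seed, i.e. candidate common aspects."""
    feats = one_hop_features(triples)
    return sorted(set.intersection(*(feats[s] for s in seeds)))


print(common_features(SEEDS, TRIPLES))
# [('starring', 'Tom_Hanks'), ('type', 'Film')]
```

Here (starring, Tom_Hanks) matches the implicit intent of Example 1, while (type, Film) is a generic aspect shared by all films. Strict intersection is brittle, though: a single missing triple for one seed would eliminate a feature entirely, which is one reason to rank features probabilistically rather than require them of every seed.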
[Figure 2 image: subgraph around the seed films. Entity nodes shown: Forrest_Gump, Apollo_13_(film), Philadelphia_(film), Cast_Away, Contact, Tom_Hanks, Gary_Sinise, Robert_Zemeckis, Ron_Howard, Jonathan_Demme, Steve_Starkey, Brian_Grazer, Edward_Saxon, Jack_Rapke, Films_directed_by_Robert_Zemeckis, American_films, Film. Edge labels: starring, director, producer, subject, type.]
Figure 2: A subgraph of DBpedia V3.9. The dashed line indicates a missing predicate, i.e., starring.
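To illustrate why the incompleteness shown in Figure 2 matters when ranking candidates, here is a simplified scoring sketch in Python. It is not the probabilistic model proposed in this paper: the feature weights (the fraction of seeds having a feature), the back-off to a predicate-level prior when a candidate lacks a predicate entirely, the toy triples, and all function names are our assumptions. We also assume, for illustration only, that the dashed edge is Cast_Away's starring link to Tom_Hanks, and we add Contact's starring value (Jodie_Foster) ourselves as a contrasting case.

```python
from collections import defaultdict

# Hypothetical toy graph; NOT the paper's data. Cast_Away deliberately has no
# "starring" triple, mimicking the missing edge drawn as a dashed line.
SEEDS = ["Forrest_Gump", "Apollo_13_(film)", "Philadelphia_(film)"]
TRIPLES = [
    ("Forrest_Gump", "starring", "Tom_Hanks"),
    ("Forrest_Gump", "starring", "Gary_Sinise"),
    ("Forrest_Gump", "director", "Robert_Zemeckis"),
    ("Forrest_Gump", "producer", "Steve_Starkey"),
    ("Apollo_13_(film)", "starring", "Tom_Hanks"),
    ("Apollo_13_(film)", "starring", "Gary_Sinise"),
    ("Apollo_13_(film)", "director", "Ron_Howard"),
    ("Apollo_13_(film)", "producer", "Brian_Grazer"),
    ("Philadelphia_(film)", "starring", "Tom_Hanks"),
    ("Philadelphia_(film)", "director", "Jonathan_Demme"),
    ("Philadelphia_(film)", "producer", "Edward_Saxon"),
    ("Cast_Away", "director", "Robert_Zemeckis"),
    ("Cast_Away", "producer", "Steve_Starkey"),
    ("Cast_Away", "producer", "Jack_Rapke"),
    ("Contact", "starring", "Jodie_Foster"),
    ("Contact", "director", "Robert_Zemeckis"),
    ("Contact", "producer", "Steve_Starkey"),
]
TRIPLES += [(f, "type", "Film") for f in SEEDS + ["Cast_Away", "Contact"]]


def index(triples):
    """Map each entity to its (predicate, object) features and its predicates."""
    feats, preds = defaultdict(set), defaultdict(set)
    for s, p, o in triples:
        feats[s].add((p, o))
        preds[s].add(p)
    return feats, preds


def feature_weights(seeds, feats):
    """Weight of a feature = fraction of seeds that have it (an assumption)."""
    counts = defaultdict(int)
    for s in seeds:
        for f in feats[s]:
            counts[f] += 1
    return {f: c / len(seeds) for f, c in counts.items()}


def prior(feature, feats, preds):
    """Among entities using predicate p, the fraction whose value is o."""
    p, _ = feature
    users = [e for e in preds if p in preds[e]]
    return sum(feature in feats[e] for e in users) / len(users)


def p_feature(entity, feature, feats, preds):
    """Estimated probability that `entity` truly has `feature`."""
    if feature in feats[entity]:
        return 1.0  # observed in the graph
    if feature[0] in preds[entity]:
        return 0.0  # predicate observed, but only with other values
    return prior(feature, feats, preds)  # predicate missing: back off to prior


def rank(seeds, triples):
    """Score every non-seed subject by its weighted, normalized feature match."""
    feats, preds = index(triples)
    weights = feature_weights(seeds, feats)
    total = sum(weights.values())
    candidates = [e for e in feats if e not in seeds]
    scores = {
        e: sum(w * p_feature(e, f, feats, preds) for f, w in weights.items()) / total
        for e in candidates
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


for entity, score in rank(SEEDS, TRIPLES):
    print(f"{entity}: {score:.3f}")
# Cast_Away: 0.589
# Contact: 0.357
```

In this sketch Cast_Away outranks Contact even though its starring edge is absent: a missing predicate is treated as unknown and scored by the prior (0.75 for starring Tom_Hanks here), whereas Contact's observed starring value counts as evidence against the feature. Treating every absent triple as false would instead penalize Cast_Away for a gap in the graph.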